Why conversation will determine the future of the IoT

Apr 10, 2015

Hossein Eslambolchi

Chairman & CEO, Cyberflow Analytics

This week’s blog examines the migration to conversation for IoT from today’s unnatural, command-based model. We recently discussed Role and Reference Grammar (RRG), which will empower machines to converse with us via a new linguistic approach that connects phrases in any language to meanings and back again.

While conversation relies on natural language understanding (NLU), an exciting observation is that tracking context in conversation can do away with a lot of what artificial intelligence (AI) would call reasoning.

AI was founded in the 1950s with its strongest advocates claiming that whatever was proposed, a program could be written to emulate it. However, despite money, time and effort, nobody to date has been able to scale it to human levels.

Command-based technologies exploit this early AI approach. Provided a system has been loaded with all the necessary commands, and provided the user can remember what the commands are, the system works. As an added bonus, current voice recognition design limitations – the need to know all possible phrases in advance – are catered to.

The trouble is that users may not know a command or, worse, there may not be one yet. Yet this command-based model is the state of the art behind Wit.ai (acquired by Facebook) and Nest (acquired by Google).

The Thinking Solutions Language Algorithm implements RRG and works like any development environment today. Developers can add vocabulary in any language and link each word to a unique, system-wide meaning. It’s a different approach worth pursuing.

Commands v. Conversation

Conversation improves on commands because it allows the user to describe their intent in words. It is infinitely flexible, but conversation by machine is new, because it was, until recently, an AI-hard problem. Extracting NLU from AI is a terrific breakthrough.

Those of you who saw the 2004 movie I, Robot, starring Will Smith, would have seen the future: intelligent robots who could understand our language. Detective Del Spooner, a Luddite against the robotic revolution, has an archaic CD player in his house. In a funny scene, Dr. Calvin is guessing how to interact: “Play”, “On”, “Run?” followed by attempts to turn it off with “End program” and “Cancel.”

Ironically, Calvin, who as a robotics expert should understand language technology, was determined to use the archaic commands of 2015. Given the lack of context, even robots would have struggled, since it is unclear whom she was addressing. Compare her commands to the request: “Please play your current music, CD Player.” This conversational sentence identifies the target device and adds context to help limit responses. A toaster, vacuum and umbrella don’t play music, for example.
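The way a named device and a little context narrow down the possible responses can be sketched in a few lines of Python. This is only an illustration: the device registry, its capability labels, the extra "radio" entry and the `candidate_devices` function are all assumptions made for the example, and naive substring matching stands in for real language understanding.

```python
# Hypothetical device registry: names and capabilities are illustrative only.
DEVICES = {
    "cd player": {"play music", "stop music"},
    "toaster": {"toast bread"},
    "vacuum": {"clean floor"},
    "radio": {"play music"},
}


def candidate_devices(utterance, action):
    """Return the devices that could plausibly be the target of the utterance."""
    text = utterance.lower()
    # A device named in the utterance takes priority (naive substring match).
    named = [name for name in DEVICES if name in text]
    pool = named if named else list(DEVICES)
    # Context filter: keep only devices able to perform the requested action.
    return [name for name in pool if action in DEVICES[name]]


# A bare command leaves two possible targets.
print(candidate_devices("Play", "play music"))
# Naming the device in the sentence leaves exactly one.
print(candidate_devices("Please play your current music, CD Player.", "play music"))
```

The capability filter alone already rules out the toaster and the vacuum; naming the device removes the remaining ambiguity.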

Technology follows the best solution. While old CD players may be around in 2035 (the year in which I, Robot is set), it is unlikely that any machine will still have command interfaces.

Like old watches, some will remain for historical interest, but all new ones will use the current technology. In the case of IoT, they will be conversational interfaces. I predict the rapid adoption of conversational computing as fast as it develops.

Hollywood Upgrade

The Hollywood prediction engine for the future of robotics is broken. In Arthur C. Clarke’s 1968 book, 2001: A Space Odyssey, a superintelligent machine, the HAL 9000 computer, controls the ship and speaks with us. It seems intelligent and realistic, even today. Since then, Hollywood has scaled back on intelligence.

For Hollywood to catch up, more imagination is needed, as in the late 1960s, to make its robots more realistic. They need real conversational speech and emotions. We will explore the reasons why in a future blog.

Language is Infinite

As all human languages are infinite, there is no list that can possibly cover all sentences. We can continue to add vocabulary and extend sentences, but it is the undeveloped, underlying meaning where NLU’s most promising future lies. Today’s problems are often best addressed at the level of meaning, not at the level of words, as hinted at in the current intelligence tests like the Winograd Schema test (WST).

The WST in particular exploits the deficiencies in statistical approaches to language that inhibit their ability to connect meanings separated within a sentence. In context, the analysis of such meanings is not only demonstrable, but clearly a function of understanding, not just word patterns.

“I like you.” Depending on who says this to whom, the meanings of I and you are unbounded and only fully understood in context. Now what of the alternative word order in “It is you I like”? Here, “you I like” is treated as the same meaning, but a statistical system would consider them different. How about “I, well, you know, like you.” This also has the same meaning for those three words. And perhaps to make the point, “I, no matter what anyone says, and no matter how long they take to say it, like you. Really.”

Language is infinite, and the words we choose to use should not be imposed on us by “stupid” machines.

With the analysis of human languages based on word sequences, programmers have never been able to scale up. Try it and the combinations escalate quickly. While meaning is key to most disambiguation tasks, the preferred language models in the past ignored meaning and created excessive ambiguity.
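How quickly the combinations escalate can be checked with simple arithmetic. The sketch below counts every possible word sequence of a given length, grammatical or not, so it is an upper bound rather than a count of real sentences. The 1,000-word vocabulary and the function name `sequence_count` are assumptions chosen for illustration.

```python
def sequence_count(vocabulary_size, sentence_length):
    """Number of distinct word sequences of exactly the given length."""
    return vocabulary_size ** sentence_length


# With an assumed 1,000-word vocabulary, each extra word multiplies
# the number of possible sequences by 1,000.
for length in (1, 2, 3, 5, 10):
    print(length, sequence_count(1000, length))
```

A list of phrases known in advance can only ever cover a vanishing fraction of that space, which is why the argument here moves from word sequences to meaning.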

Today’s AI Lacks Scalability

The lack of scalability for AI is terminal. We have known for more than five years that it is time for innovation. As I mentioned above, the 1960s were a good time for it. Bar-Hillel’s 1960 critique of automatic translation argued that high-quality translation would depend on a universal encyclopedia of knowledge.

His contemporary from Bell Labs, John Pierce, noted around 1969 that speech recognition (ASR) requires human-like NLU, perhaps prompting his quote “Funding artificial intelligence is real stupidity.” Behind his comment is the observation that a lot of investment money was not producing results. By contrast, working prototypes are far less risky to expand than pure science models.

Yann LeCun, director of Facebook’s AI Research, points out that his “least favorite description” of a current research area, Deep Learning, is that “it’s like the brain.” Further, “AI has gone through a number of AI winters because people claimed things they couldn’t deliver.”

How do I contrast the approach of Deep Learning and its ability to generate features to the new Thinking Solutions Language Algorithm, a system that currently uses supervised machine learning to load accurate content, but without feature automation?

The answer is that they are very different. Most current solutions “learn” by storing new statistics – either as neural network synapses or as tables. Scientists at IBM proposed a solution more than 20 years ago to leverage the strength of statistical methods and processing power, with speech recognition in particular but with languages in general. The trouble is that research into computational methods for speech and translation still has not succeeded, and more of the same is unlikely to change that. Nuance Communications uses the statistical approach in its dictation product as well, but it is nowhere near 100% accurate. It doesn’t understand and can write nonsense.

Many of you will have experienced using a command-based voice system before. You could say something like: “Siri, please call my best friend’s home.” And get a response, “Who would you like to call?” That is a disastrous answer because you need to start again from scratch. Or at least repeat the phrase “my best friend’s home” which was not recognized the last time. That is a system taking you backwards. An all-or-nothing approach – recognize the sentence or recognize nothing – is demoralizing to users. Even bad typing only needs the incorrect letters changed, not the entire sentence. ASR is annoying when it is inaccurate.
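The alternative to all-or-nothing recognition is to keep whatever was understood and ask only about the gap. The following Python sketch illustrates the idea under heavy simplification: the slot names, the `PLACES` list, and the `interpret` and `follow_up` functions are hypothetical, and keyword spotting stands in for real understanding.

```python
import re

# Hypothetical places a call can be directed to.
PLACES = ("home", "work", "mobile")


def interpret(utterance, known_contacts):
    """Fill the slots that were understood and list the ones still missing."""
    words = re.findall(r"[a-z']+", utterance.lower())
    text = " ".join(words)
    state = {
        "action": "call" if "call" in words else None,
        "place": next((p for p in PLACES if p in words), None),
        "contact": next((c for c in known_contacts if c in text), None),
    }
    missing = [slot for slot, value in state.items() if value is None]
    return state, missing


def follow_up(state, missing):
    """Ask only about the first gap, echoing what was already understood."""
    if not missing:
        return "Calling {contact} at {place}.".format(**state)
    understood = ", ".join(f"{k}={v}" for k, v in state.items() if v is not None)
    return f"I have {understood}. What is the {missing[0]}?"


# The system does not know who "my best friend" is, but keeps the rest.
state, missing = interpret("Siri, please call my best friend's home.", [])
print(follow_up(state, missing))
```

Here the action and the place survive the failed lookup, so the user only has to supply the name instead of starting again from scratch.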

Internet of Things – IoT

Activity continues with voice-based systems for cars, home automation systems and many other areas as the number of devices connected to the internet explodes. Apple started the rush for such systems with its acquisition of Siri and the corresponding patents, which show the scope of the vision to make working with the IoT easier.

Imagine an accident with a fuel tanker on its side and your driverless car stopping to help. “Don’t stop” could be an important command because your car could otherwise loiter in a blast zone, a potentially fatal result as the tanker may explode. Understanding is critical.

Compare that to “when I get home, keep the lights on for a minute while I go inside.” The car would then turn off the engine when you arrive home, and per your instruction, leave the headlights on until you enter your house. Much of the action comes from unstated intent, and ambiguous words (when outside of context).

The debate we need to have is around system accuracy and ease of use. Ease of use is intrinsically tied to our language. Let’s look at a couple of aspects to understand where the debate needs to move, as nobody is producing systems like this yet, and with meaning-based systems, they could.

Let’s look at a quick analysis from one of the Thinking Solutions programmers to explain what a meaning-based system could be doing.

“When I get home” is a reference to a future time, not a location. My home is a location, but the sentence refers to the next time I arrive there. Notice it doesn’t mean that I will possess a home, like the sentence “I got an apple”. The word I refers to me, but if you say it, it refers to you. If someone else says it, it refers to them. That concept doesn’t facilitate a nice database model.
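The point about "I" can be made concrete: the word has no fixed entry to look up, so its referent has to be computed from who is speaking. A minimal Python sketch, assuming a hypothetical `Context` record and `resolve_pronoun` function that are not part of any system described here:

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Who is talking to whom in the current conversation."""
    speaker: str
    addressee: str


def resolve_pronoun(word, context):
    """Map 'I'/'me'/'you' to a participant; other words pass through unchanged."""
    lowered = word.lower()
    if lowered in ("i", "me"):
        return context.speaker
    if lowered == "you":
        return context.addressee
    return word


# The same word resolves to different people in different conversations.
print(resolve_pronoun("I", Context(speaker="Alice", addressee="car")))
print(resolve_pronoun("I", Context(speaker="Bob", addressee="car")))
```

The meaning lives in the conversation state rather than in a stored row, which is why it sits awkwardly in a conventional database model.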

“Keep the lights on for a minute while I go inside” is a verbal command telling the car’s IoT to keep the “headlights on”, not the interior car lights. It is obvious from the context as you need lights to see your back door. And “keep the lights on” is a command about the condition of the lights (not off) not the location of the lights – “on the car.” “For a minute” is the duration to keep the lights on, not a purpose like “for a burglar” which uses the same structure. “While I go inside” is also a duration. It has a specific end point, when the back door of the house closes. Note you could think that this means when I go inside the car, but that doesn’t make sense in context.
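The contrast between “for a minute” and “for a burglar” shows that identical structure can carry different meaning, depending on what the noun denotes. The toy Python classifier below makes that distinction with a small table of time units. The table, the function name `classify_for_phrase` and the three-word phrase format are assumptions for illustration, not how the Thinking Solutions system works.

```python
# Assumed table of time units, in seconds.
TIME_UNITS = {"second": 1, "minute": 60, "hour": 3600}


def classify_for_phrase(phrase):
    """Classify 'for <quantity> <noun>' as a duration in seconds or a purpose."""
    words = phrase.lower().split()
    if len(words) != 3 or words[0] != "for":
        raise ValueError("expected a phrase like 'for a minute'")
    quantity, noun = words[1], words[2]
    # Strip a plural 's' only for the time-unit lookup.
    unit = noun[:-1] if noun.endswith("s") else noun
    if unit in TIME_UNITS:
        count = int(quantity) if quantity.isdigit() else 1
        return ("duration", count * TIME_UNITS[unit])
    # Same structure, but the noun is not a unit of time.
    return ("purpose", noun)


print(classify_for_phrase("for a minute"))
print(classify_for_phrase("for a burglar"))
```

The decision rests on what the noun means, not on the word pattern, which is the same in both phrases.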

This example shows a future system in action, naturally taking context and applying it in a narrow domain. Command-based systems force the command to determine the context. Conversational systems will use the words to determine context, as people do.

NLU can be extracted from AI-hard because meaning can be understood independently of the senses. Simplifying a bit, a bright light can be known to hurt your eyes, which is bad, whether or not you experience vision. For language, NLU appears workable without the full integration with working sensory systems, as was the AI-hard model.

A fuel tanker on its side doesn’t need a knowledge of the smell of fuel, or the visual image of a tanker, or the feeling of an explosion. Disabled people deal with lack of sensory input. We should exploit that in the early adoption of NLU while others work on vision systems.

Offline Use

There has been a rush for internet-hosted applications for voice. One reason is the provider only needs a single copy of their large statistical tables. They can also be updated offline without impacting devices. The trouble is that this machine interface will become critical to everything and so the capability needs to be offline, stored on the device.

Offline use is needed for the IoT. Imagine your house lights staying on because the internet link is broken. Sure, there will be a blurring of the boundaries between the internet of today and its role in the future, but the role of conversation will become dominant. For reasons of performance, customization and accessibility, it will be on the device or localized to a house, not controlled by a cloud provider.
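A local-first design can be sketched as follows: try the on-device handler first, and treat the cloud as optional. Everything here is hypothetical, including the `LOCAL_COMMANDS` table, the `handle` function and the `broken_cloud` stand-in, and the exact-match lookup merely stands in for an on-device understanding step.

```python
def lights_off():
    """Stand-in for an action carried out on the local device."""
    return "lights off"


# Hypothetical requests the device can satisfy without any network access.
LOCAL_COMMANDS = {"turn off the lights": lights_off}


def handle(utterance, cloud=None):
    """Serve the request locally when possible; the cloud is only a fallback."""
    key = utterance.lower().strip(" .!")
    if key in LOCAL_COMMANDS:
        return LOCAL_COMMANDS[key]()
    if cloud is not None:
        try:
            return cloud(utterance)
        except ConnectionError:
            pass  # Internet link is down: fall through to the offline reply.
    return "Sorry, I cannot do that while offline."


def broken_cloud(utterance):
    """Simulates a cloud service that cannot be reached."""
    raise ConnectionError("internet link is down")


# The lights still respond even though the cloud is unreachable.
print(handle("Turn off the lights.", cloud=broken_cloud))
```

With this ordering, a broken internet link degrades only the requests that genuinely need the network.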

It is just a matter of time for teams of people to exploit the new technology for our devices everywhere. The language understanding revolution has a long way to go, but there are now a number of applications.

What’s the final test proving the IoT is ready? When Star Trek changes from the command, “Energize” or the personal, “Beam me up, Scotty”, to the conversational “I’m ready to go, Enterprise” and all produce the same result.

This article is published in collaboration with LinkedIn. Publication does not imply endorsement of views by the World Economic Forum.

Author: Dr. Hossein Eslambolchi is Chairman & CEO of Cyberflow Analytics.

Image: A man types on a computer keyboard in Warsaw. REUTERS/Kacper Pempel/Files

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.