Predictions about the future are important for our planning, so today I will give you my vision for the use of human languages by machines over the next 5 years. I wrote recently about my top ten predictions from a decade ago that have come to pass with the explosion of IP-centric developments. I also wrote that the “years between 2020-25 will be dominated by machines. Consumers and businesses will live in a robotic world — a scenario eerily reminiscent of the Terminator franchise.” How is that working out for us?
Scientific predictions are easiest when the innovation has already taken place and it is just a matter of application. We are awesome at applying technology that works, as can be seen in television miniaturization and Moore’s Law. Conversely, many predicted the rise of intelligent machines at the 1956 Dartmouth conference that coined the term “artificial intelligence,” or AI. Most of those predictions, such as automatic translation between languages, general robotics and conversation with machines, have yet to eventuate, because the technology was initially immature and the solutions were designed around its early constraints. In some cases, however, the original solutions continue to be pursued without further innovation to exploit today’s technology.
Thinking Solutions is an Australian company that has found that language understanding is possible without relying on today’s dominant approach: statistical analysis for machine learning. Not only that, human-like accuracy is demonstrably possible in this critical area, as needed for the next generation of machines.
Human conversation is considered an AI-hard problem – something that cannot be done without first solving all the other problems of artificial intelligence. Some would argue that a detailed understanding of every activity in the brain is needed first, but as has been seen since AI was founded, most problems turn out to be solvable independently of the entire solution. The problems of conversation can be broken down, so I will look at the pieces of the puzzle along with their solutions.
To work with language, a machine needs to understand (a) text at least, and ideally a bit more with (b) spoken language, plus the generation of (c) spoken responses. Some current technologies start with just text, pick out a few text fragments based on rules, and pass control to applications: asking our car to call home is one example. But the next human step in language is (d) understanding what is being said. A native speaker picks out every nuance of meaning in a sentence, correcting our choice of words if necessary, in a way today’s language systems cannot.
Next, we take our understanding and, in concert with our current situation, use (e) the context to clarify what was said. Interaction while tracking context is what we refer to as (f) conversation. Finally, with an understanding of what has been said, we (g) take action. The action can be to take note but do nothing, to physically move, or to decide what to respond verbally: perhaps a clarifying question or an informational response.
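The stages (a)–(g) above can be sketched as a pipeline. The following toy Python sketch threads an utterance through stub stages; every function name and the crude “meaning” structure are my own illustrative assumptions, not part of any real product.

```python
# A toy sketch of the (a)-(g) pipeline; every stage is a stub and the
# names are illustrative only.

def recognize(text):                      # (a)/(b): raw input to words
    return text.lower().split()

def understand(words):                    # (d): words to a crude "meaning"
    return {"action": words[0], "object": " ".join(words[1:])}

def apply_context(meaning, context):      # (e): fill gaps from the situation
    meaning.setdefault("room", context.get("room"))
    return meaning

def take_action(meaning):                 # (g): decide what to do or say
    return f"OK: {meaning['action']} {meaning['object']} ({meaning['room']})"

context = {"room": "kitchen"}
reply = take_action(apply_context(understand(recognize("Turn lights on")), context))
# reply == "OK: turn lights on (kitchen)"
```

Note how the (e) context stage supplies what the words alone do not say – which room is meant – which is exactly the gap command-only systems leave open.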
Text recognition starts with the letters entered on a computer via a keyboard. Some analysis stops at this level of recognition.
IBM Watson, for example, takes recognized text and, using sophisticated algorithms and special-purpose hardware, is able to determine the best known information that applies from a massive knowledge base constructed by the acquisition of knowledge from unstructured documents. Think ‘Big Data’ from documents, not human-like cognition, because the system does not understand the words.
Voice recognition has been popular in science fiction, although in practice it has yet to achieve anything like human-level accuracy. Worse, Hollywood is starting to assume our future will be limited to voice commands, with science fiction systems now operating unnaturally, as in “music on” or “turn water on.” Isn’t it easier to say whatever comes into our heads, such as “fire up the shower so it is nicely warm when I get to the bathroom”?
Nuance is the market leader in dictation software, yet its adoption may be limited by its accuracy. If you won’t bet your life on the market-leading dictation systems to transcribe accurately what is heard, the technology is not yet ready. Since the late 1960s, many have agreed that a lack of language understanding is what holds dictation back.
Nothing brings to mind the lack of scientific progress more than hearing a bad voice synthesizer. Part of the historical problem has been systems’ lack of a language representation: knowing where to place emphasis within words and phrases, and the appropriate speed of syllable production, is necessary to sound natural in any language and with the right accent.
In the industry today, language understanding is an open problem whose solutions are found wanting. Often described as simply needing more rules for accuracy, the problem comes back to scalability: the number of rules needed grows and grows until it exceeds programmers’ capacity to maintain them.
By contrast, machine learning techniques build up statistics through automated document analysis, but the meaning of the words is never provided. A current approach, Deep Learning, is seen as a way to extract useful features automatically. Google is currently investing in applications for Deep Learning with one of its founding fathers, Dr. Geoffrey Hinton.
More processing power today enables the technology, but the concepts were developed back in the 1980s. Are there better ways to get the result without the processing effort? Is an engineered approach, in which every piece of activity is understood perfectly, better than one in which a machine gathers features and applies them independently of human involvement? My personal preference is the former: engineered solutions in which every piece works consistently in a predetermined way.
Context tracking, an integral part of conversation, is an area of scientific inquiry seeing much improvement while suffering neglect at the big end of town. Professor Robert Van Valin, a scientific advisory board member at Thinking Solutions, models language in three dimensions – word sequence (grammar), word meaning (semantics) and conversation (pragmatics, or discourse). As the primary developer of Role and Reference Grammar, Van Valin offers a model that facilitates context tracking in conversation via a linking algorithm.
In working prototypes today, the linking algorithm converts language to its meaning, accurately disambiguating the correct references in conversation.
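To make “disambiguating references in conversation” concrete, here is a deliberately tiny Python illustration of one fragment of the task: resolving a pronoun to the most recently mentioned compatible referent. This is a toy of my own construction, not Van Valin’s linking algorithm, which maps syntax to semantic roles in a far richer way.

```python
# Toy reference tracking: resolve a pronoun to the latest compatible
# mention. The referent table and recency heuristic are illustrative only.

PRONOUN_FOR = {"Mary": "she", "John": "he", "the ball": "it"}

def resolve(pronoun, mentions):
    """Return the most recent mention whose pronoun form matches."""
    for name in reversed(mentions):
        if PRONOUN_FOR.get(name) == pronoun:
            return name
    return None

mentions = ["John", "Mary", "the ball"]  # order of mention in the discourse
resolve("she", mentions)  # "Mary"
resolve("it", mentions)   # "the ball"
```

Even this crude sketch shows why context must be tracked turn by turn: the right answer for “she” changes as the conversation moves on.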
Conversation has been a missing piece in machine interaction, because it requires not only understanding what is said but also addressing what else is going on at the moment. Today’s state-of-the-art systems use command-based solutions, in which commands produce predictable responses; they are not interactive like a conversation. Wit.ai, recently acquired by Facebook, implements such a command-based approach. A quote from Wit.ai co-founder Alex Lebrun summarizes the distance still to travel between commands and conversations, while suggesting there is no single cohesive solution:
“Wit.ai… have only scratched the surface of the problem. From simple commands, we’ll need to teach machines to understand more complex statements, be aware of the context, handle strong ambiguity, and generate natural, fluid interactions. I doubt we’ll ever see a single, monolithic solution provided by one company alone. Instead, a few startups will solve well-defined pieces of the problem. Natural language applications will be made up of these interlocking pieces.”
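The gap between commands and conversation can be seen in how little a command-based system actually does. The sketch below is a generic pattern matcher of my own devising (not Wit.ai’s actual implementation or API): fixed patterns yield predictable intents, and anything phrased naturally simply falls through.

```python
import re

# A toy command matcher in the style the text describes: fixed patterns,
# predictable responses, no context. Illustrative only.

COMMANDS = {
    r"\bmusic on\b": "play_music",
    r"\bturn (the )?water on\b": "water_on",
}

def match_command(utterance):
    for pattern, intent in COMMANDS.items():
        if re.search(pattern, utterance.lower()):
            return intent
    return None  # no understanding, so no graceful fallback

match_command("Music on")                                 # "play_music"
match_command("Fire up the shower so it is nicely warm")  # None
```

The second utterance is exactly the kind of natural phrasing discussed earlier; a command-based system has no way to map it to the intended action.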
Taking action at this point in the conversation is not complicated. Once we understand what is being said, the action can be resolved. That is the role of a programmer, a commodity skill for most tasks.
The range of mobile phone applications, such as the personal assistants, all work pretty much the same way with similar capabilities. They all lack conversational competency today, but once an action is determined, all provide consistent and accurate responses. Where understanding is not found, all work equally poorly in human terms.
This blog looks at the requirements for language systems to meet today’s need for machine understanding. There is much interest. Dr. Alan Turing, famous for his pioneering work in computer design and WWII code breaking, work that was a precursor to AI, devised a test for intelligence: conversation was used to test whether the other party was intelligent. Some today think answers to questions are a better test, as with the Winograd Schema Challenge. That approach is now a competition sponsored by Nuance Communications, though of course in the Turing test you could ask those questions anyway.
Facebook, which recently acquired Wit.ai and has a number of AI experts on the payroll, has proposed a simpler test: a few sentences are entered, and while a native speaker will always be able to answer questions about them correctly, the test is whether a machine can too. This test is a good example of language understanding as I have explained it.
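A toy version of that test makes the bar clear. The story sentences and the single-pattern parser below are my own illustrative assumptions, in the spirit of Facebook’s published task sets rather than their actual benchmark code; a real system must handle far more than one sentence shape.

```python
# Toy reading-comprehension check: read a short story, then answer a
# question any native speaker finds trivial. Illustrative only.

def answer(story, question):
    location = {}
    for sentence in story:
        words = sentence.rstrip(".").split()
        if "went" in words or "moved" in words:   # "<name> went to the <place>"
            location[words[0]] = words[-1]        # latest sentence wins
    who = question.rstrip("?").split()[-1]        # "Where is <name>?"
    return location.get(who)

story = ["Mary went to the kitchen.", "John moved to the garden.",
         "Mary went to the hallway."]
answer(story, "Where is Mary?")  # "hallway"
```

Note that even this trivial example requires tracking state across sentences – the latest mention of Mary overrides the earlier one – which is precisely the context tracking discussed above.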
Radical innovation to move away from computationally intensive statistical techniques has been lacking in AI generally, and in language and speech products specifically. A change to meaning-based systems promises to deliver not only the long-awaited conversational interface with machines, but also the platform for the next phase in the evolution of computing.
When humans have a solution to follow, our speed of adoption and innovation is consistently strong. The 2020 predictions are now looking as strong as the results from the 2015 predictions.
Do those of us at the forefront of this development need to worry about the 2020 prediction of the Terminator franchise coming to life? Fortunately, language systems alone are only one piece of the puzzle – a critical one to enhance our experience in the robotic world – but only a piece nonetheless.
This article is published in collaboration with LinkedIn. Publication does not imply endorsement of views by the World Economic Forum.
Author: Dr. Hossein Eslambolchi is Chairman & CEO of Cyberflow Analytics.
Image: Humanoid robot of British company RoboThespian “blushes” during the opening ceremony of the Hanover technology fair Cebit March 9, 2014, where Britain is this year’s partner country. REUTERS/Wolfgang Rattay