Back in September 2016, Google launched its Neural Machine Translation (GNMT) system, which uses deep learning to deliver more natural translations between languages.

Google Translate originally supported only a handful of languages when it launched 10 years ago; today that number has risen to 103. The system translates more than 140 billion words each day.

Building a computer system that translates between many languages is complex. The people at Google who built it wanted to find out just how clever their system was, so they came up with a challenge. They taught the machine to translate English to Japanese and vice versa. Then they taught it to translate English to Korean and back again. So far, so ordinary. But what followed was truly extraordinary.

Lost in (AI) translation

The researchers discovered that GNMT had taught itself to deliver ‘reasonable’ translations of Japanese to Korean – and vice versa – without using English as a bridge, even though it had never been taught that pair directly. It appears the machine has constructed its own internal language, one that reflects the concepts it uses to translate between the languages it has been trained on.
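The set-up behind that result can be sketched in a few lines of Python. This is an illustration only: the language codes and variable names are assumptions made for this sketch, not anything taken from Google's system. It lists the directions the system was taught and works out which directions between the three languages were never taught.

```python
from itertools import permutations

# Language codes are labels chosen for this sketch.
languages = ["en", "ja", "ko"]

# Directions the system was taught, as (source, target) pairs.
trained = {("en", "ja"), ("ja", "en"), ("en", "ko"), ("ko", "en")}

# Every ordered pair of distinct languages is a possible direction.
all_directions = set(permutations(languages, 2))

# Directions that never appeared in training.
untrained = sorted(all_directions - trained)

print(untrained)
```

The two directions left over, Japanese to Korean and Korean to Japanese, are the ones the system handled without direct teaching.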

The visualisation below captures how the system represents sentences internally as it translates in every direction between Japanese, Korean and English:

Image: Johnson et al

The researchers call this shared representation an ‘interlingua’. The work is at an early stage, and it is not yet clear whether the interlingua is basic or highly sophisticated. In the above graphic, part (a) shows the overall geometry of the translations. Sentences sharing the same meaning – not the same language – also share the same colour. Part (b) is a close-up of one of the grouped meanings, and part (c) breaks that group down by source language.

The Google Translate team explains that "within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network."
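The reasoning in that quote can be illustrated with a toy check. The sketch below is not Google's method or data: the three-dimensional vectors are invented for illustration, and `cosine` and `nearest` are hypothetical helper names. For each sentence vector it asks whether the closest other vector shares its meaning or its language.

```python
import math

# Invented toy vectors, NOT real model output.
# Keys are (meaning, language); values stand in for sentence representations.
vectors = {
    ("greeting", "en"): [0.90, 0.10, 0.05],
    ("greeting", "ja"): [0.85, 0.15, 0.10],
    ("greeting", "ko"): [0.88, 0.12, 0.00],
    ("weather", "en"): [0.10, 0.90, 0.05],
    ("weather", "ja"): [0.15, 0.85, 0.10],
    ("weather", "ko"): [0.12, 0.88, 0.00],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(key):
    """Return the key of the most similar vector other than `key` itself."""
    others = (k for k in vectors if k != key)
    return max(others, key=lambda k: cosine(vectors[key], vectors[k]))


# Count how often the nearest neighbour shares the meaning or the language.
same_meaning = sum(nearest(k)[0] == k[0] for k in vectors)
same_language = sum(nearest(k)[1] == k[1] for k in vectors)

print(same_meaning, same_language)
```

With these invented vectors the outcome is fixed by construction, so the sketch proves nothing about GNMT. It only shows the shape of the argument: if nearest neighbours share meaning rather than language, the representation is organised by semantics.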

In simple terms, the system seems to have created something of its own, with no human direction, to support its understanding of human languages.