DeepMind created an IQ test for AI, and it didn't do too well

Jul 17, 2018

This article is published in collaboration with Futurism.

A woman touches a screen on a robot developed by iFlytek at the outpatient hall of People's Liberation Army General Hospital in Beijing, China, March 16, 2017. Zhao Naiming/via REUTERS

You could be smarter than a robot. Image: REUTERS

Kristin Houser

Writer, Futurism

General Intelligence

AI has gotten pretty good at completing specific tasks, but it’s still a long way from having general intelligence, the kind of all-around smarts that would let AI navigate the world the same way humans or even animals do.

One of the key elements of general intelligence is abstract reasoning: the ability to think beyond the “here and now” to see more nuanced patterns and relationships and to engage in complex thought. On Wednesday, July 11, researchers at DeepMind, a Google subsidiary focused on artificial intelligence, published a paper detailing their attempt to measure various AIs’ abstract reasoning capabilities. To do so, they looked to the same tests we use to measure our own.

Human IQ

In humans, we measure abstract reasoning using fairly straightforward visual IQ tests. One popular test, called Raven’s Progressive Matrices, features several rows of images with the final row missing its final image. It’s up to the test taker to choose the image that should come next based on the pattern of the completed rows.

Image: DeepMind

The test doesn’t outright tell the test taker what to look for in the images — maybe the progression has to do with the number of objects within each image, their color, or their placement. It’s up to them to figure that out for themselves using their ability to reason abstractly.

To apply this test to AIs, the DeepMind researchers created a program that could generate unique matrix problems. Then, they trained various AI systems to solve these matrix problems.
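To make the idea of a problem generator concrete, here is a minimal sketch in Python. It is not DeepMind's program: the function name `make_progression_puzzle`, the choice of eight answer options, and the decision to represent each image by nothing more than a count of shapes are all assumptions made for illustration. The real puzzles are images, and the real generator covers many more kinds of rules.

```python
import random


def make_progression_puzzle(rng, n_choices=8):
    """Build one toy 3x3 puzzle (hypothetical format, not DeepMind's).

    Each cell holds a shape count. The hidden rule is that the count
    grows by the same step from left to right in every row.
    """
    step = rng.randint(1, 2)
    grid = []
    for _ in range(3):
        start = rng.randint(1, 3)
        grid.append([start + step * col for col in range(3)])

    # Hide the final image of the final row, as in a Raven's-style test.
    answer = grid[2][2]
    grid[2][2] = None

    # Offer the correct count alongside wrong counts, in random order.
    distractors = [count for count in range(1, 10) if count != answer]
    choices = rng.sample(distractors, n_choices - 1) + [answer]
    rng.shuffle(choices)
    return {"grid": grid, "choices": choices, "answer": answer}


rng = random.Random(0)  # fixed seed so the same puzzles can be regenerated
puzzle = make_progression_puzzle(rng)
```

Calling the function repeatedly with the same `rng` yields a stream of different puzzles that all share one abstract factor, the number of shapes, which is the property the training and test sets are built around.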

Finally, they tested the systems. In some cases, they used test problems with the same abstract factors as the training set — like both training and testing the AI on problems that required it to consider the number of shapes in each image. In other cases, they used test problems incorporating different abstract factors than those in the training set. For example, they might train the AI on problems that required it to consider the number of shapes in each image, but then test it on ones that required it to consider the shapes’ positions to figure out the right answer.
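The two testing setups described above can be sketched as follows. This is an illustrative outline only: the factor labels, the `make_problem` placeholder, and the set sizes are assumptions, not details from the paper. The point is simply that in one setup the test problems draw on the same abstract factors as the training problems, and in the other they draw on factors the system never saw in training.

```python
import random

# Hypothetical labels for the abstract factors the article mentions.
FACTORS = ("number", "position", "color")


def make_problem(rng, factor):
    """Placeholder problem that records only the factor it depends on."""
    return {"factor": factor, "puzzle_id": rng.getrandbits(32)}


def build_split(rng, train_factors, test_factors, n_train=100, n_test=20):
    """Generate a training set and a test set from the given factor lists."""
    train = [make_problem(rng, rng.choice(train_factors)) for _ in range(n_train)]
    test = [make_problem(rng, rng.choice(test_factors)) for _ in range(n_test)]
    return train, test


rng = random.Random(0)

# Setup 1: train and test on the same abstract factor.
same_train, same_test = build_split(rng, ["number"], ["number"])

# Setup 2: train on one factor, test on a factor held out of training.
held_train, held_test = build_split(rng, ["number"], ["position"])
```

Comparing a system's accuracy on `same_test` with its accuracy on `held_test` is what separates solving familiar kinds of problems from reasoning about unfamiliar ones.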

Better Luck Next Time

The results of the test weren’t great. When the training problems and test problems focused on the same abstract factors, the systems fared OK, correctly answering the problems 75 percent of the time. However, the AIs performed very poorly if the testing set differed from the training set, even when the difference was minor (for example, training on matrices that featured dark-colored objects and testing on matrices that featured light-colored objects).

Ultimately, the team’s AI IQ test shows that even some of today’s most advanced AIs can’t figure out problems we haven’t trained them to solve. That means we’re probably still a long way from general AI. But at least we now have a straightforward way to monitor our progress.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.