When you walk into a room, your eyes process your surroundings immediately. Refrigerator, sink, table, chairs: right, this is the kitchen. Your brain has taken data and come to a clear conclusion about the world around you, in an instant. But how does this actually happen?

Elissa Aminoff, a Research Scientist in the Department of Psychology and at the Center for the Neural Basis of Cognition at Carnegie Mellon University, shares her insights on what computer modelling can tell us about human vision and memory, as part of our series of interviews with ten pioneering female scientists.

What do you do?

What interests me is how the brain and the mind understand our visual environment. The visual world is really rich with information, and it’s extremely complex. So we have to find ways to break visual data down. What specific parts of our world is the brain using to give us what we see? In order to answer that question, we’re collaborating with computer scientists and using computer vision algorithms. The goal is to compare these digital methods with the brain. Perhaps they can help us find out what types of data the brain is working with.

Does that mean that our brains function like a computer? That’s something you hear a lot about these days.

No, I wouldn’t say that. It’s that computers are giving us the closest thing that we have right now to an analogous mechanism. The brain is really, really complex. It deals with massive amounts of data. We need help in organising this data and computers can do that. Right now, there are algorithms that can identify an object as a phone or as a mug, just like the brain. But are they doing the same thing? Probably not.

Nevertheless, the type of information used by a computer to come to the conclusion that it’s a mug might well be the same as the type of information the brain uses. That’s what we are testing right now: how relevant is a computer’s way of recognizing things to the way the brain does it?

So how does the brain recognize things?

There are two ways that information flows. In the first, which we call “bottom-up”, information begins with points of light entering our eyes and falling onto our retinae. These points are processed by our visual system and transformed into increasingly complex forms: from points to lines, to edges, to shapes and, ultimately, to objects and scenes. But the problem is that this array of light coming into our eyes is noisy and difficult to interpret, so just progressively building more and more complex interpretations of the light image would be rather slow.
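The bottom-up hierarchy described above can be loosely illustrated in code. The sketch below (my illustration, not the interviewee's model) shows the very first step of that chain: a tiny image of raw intensities is passed through a simple vertical-edge detector, roughly analogous to an orientation-selective cell in early visual cortex. The image and kernel values are invented for demonstration.

```python
# A tiny 5x5 "image": dark region on the left, bright region on the right.
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# A simple vertical-edge detector: it responds when the right-hand
# neighbourhood is brighter than the left-hand one.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, k):
    """Slide the 3x3 kernel over the image; large responses mark vertical edges."""
    n, m = len(img), len(img[0])
    out = []
    for i in range(n - 2):
        row = []
        for j in range(m - 2):
            row.append(sum(k[a][b] * img[i + a][j + b]
                           for a in range(3) for b in range(3)))
        out.append(row)
    return out

response = convolve(image, kernel)
for row in response:
    print(row)  # responses peak near the dark-to-bright boundary
```

Stacking further stages on top of responses like these, each combining the previous stage's features into something more complex, is the rough intuition behind the points-to-lines-to-shapes progression.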

To help solve this problem, our brains appear to use a wide array of “top-down influences”. That is, our experience and memories help us to anticipate and interpret what is in front of us. We’ve all seen a keyboard in front of a computer before, so if I show you a very blurry image of one, your experiences fill in the gaps before you have a clear picture.
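The top-down idea can be sketched as a simple Bayesian combination: noisy bottom-up evidence is weighed against prior expectations drawn from context. This is my own hedged illustration of the general principle, with invented numbers, not a description of any specific model from the research.

```python
def interpret(likelihood, prior):
    """Combine bottom-up evidence with top-down expectations (Bayes' rule)."""
    posterior = {obj: likelihood[obj] * prior[obj] for obj in likelihood}
    total = sum(posterior.values())
    return {obj: p / total for obj, p in posterior.items()}

# A blurry image alone is ambiguous: the evidence is split 50/50.
likelihood = {"keyboard": 0.5, "doormat": 0.5}

# But we are in an office, and experience says keyboards belong there.
office_prior = {"keyboard": 0.9, "doormat": 0.1}

print(interpret(likelihood, office_prior))
```

The same ambiguous evidence, read through the office prior, comes out strongly in favour of "keyboard": experience fills in the gaps before the picture is clear.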

Is that possibly why we sometimes make mistakes? The classic “I’m sure I saw someone” moment?

Yes, visual illusions exploit our unconscious expectations. Disconcertingly, these same predictions can also influence our memories. One study I performed looked at false memories. If I showed you an image with an oven in it, for example, you might later recall seeing an oven and a refrigerator, because you typically see ovens and refrigerators in the same space. In fact, one image I used was of my own kitchen, which had a strange set-up. My washer and dryer were stacked one on top of the other inside the kitchen, but when asked to recall the image, many people remembered seeing a refrigerator because that’s what should have been there.

What are you working on right now then?

We have a very good idea of how low-level information is processed. That is, the early bit where points of light are transformed into lines, etc. At the same time, we are also beginning to have a better understanding of how we process very high levels, that is, how a kitchen or a keyboard is represented in our brains. But we don’t know how to connect low-level visual input with such high-level information. And this is where computer vision models are proving to be extremely interesting. As working systems, they actually have to come up with a solution to this problem – how you take points of light and figure out what scene you are looking at. So they necessarily connect the dots and, in doing so, provide us with a “visual vocabulary” of the features that allow this process to be successful.
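One way to picture such a "visual vocabulary" is as a set of mid-level features that each scene category tends to contain; recognising the scene then amounts to matching the features you detect against those vocabularies. The toy sketch below is purely illustrative (the feature names and categories are invented), not the actual representation used in any computer vision model mentioned here.

```python
# Invented mid-level "vocabulary" for two scene categories.
scene_vocabulary = {
    "kitchen": {"countertop", "cabinet_door", "metallic_handle"},
    "office": {"keyboard", "monitor_edge", "desk_surface"},
}

def guess_scene(detected_features):
    """Pick the scene whose vocabulary overlaps most with the detected features."""
    return max(scene_vocabulary,
               key=lambda scene: len(scene_vocabulary[scene] & detected_features))

print(guess_scene({"metallic_handle", "cabinet_door"}))
```

The interesting scientific question is what sits in the middle: which features, built up from points of light, actually make this bridge from pixels to "kitchen" work.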

How do you see these studies being applied in the future?

On the computer technology side, the more we can improve computer vision systems, the more successful they will be in understanding the world in ways that are helpful to us: safer self-driving cars, robots that can make us breakfast, and so on. Right now a robot might see a carpet, four walls and a window. But to get breakfast, I need it to understand, “there is a refrigerator; there is the handle; I can open it and get some milk out.” If we can get computers to that level, they will be actually useful as assistants. For example, such robots would be a massive advance in helping care for the elderly or the disabled.

On the human side, it’s amazing how much we don’t know about how the brain understands the visual scene. That’s really incredible when you consider that the visual scene affects every aspect of our understanding. My expectations at the office and at the swimming pool are going to be radically different. Based on the visual input I receive, my language, my actions, even my goals will be different. The more effective we can be in understanding the environment around us, the more we can build models of how people generally reason about the world using this rich source of information.

This could also have very practical applications for medicine. Some people suffer from what is called topographic disorientation – they have great difficulty navigating in even simple environments. For example, if someone with this disorder were at the theatre and went to the bathroom, they would have no way of knowing how to get back to their seat. Because scene understanding is so integral to navigation, better models of scene processing will ultimately help us to better address this disorder.

What are some of the challenges that lie ahead for this kind of research?

We’re still working with crude human neuro-imaging techniques. The tools we have to visualise what is happening inside the human brain are exciting, but each point in our data is actually the average response over millions of neurons, making it very difficult to understand the micro-structure of neural information processing. The human brain contains some 86 billion neurons, each an individual cell that transmits information – and we are very far away from neuroscientific methods that will allow us to see how each of these units interacts with the others. We’re limited by that.

Where do you see things going next?

There is a lot more to do. I want to understand how visual recognition works both in terms of where in the brain, and when, things happen. And what is the “vocabulary” of vision? From there I would like to see how vision affects other aspects of our cognition, including memory and reasoning.

How can we encourage more women to get involved in STEM (science, technology, engineering and maths) subjects?

In psychological and neural sciences, if you look at graduate programs and post-doctoral positions, there isn’t as much of a gender gap. But the gap appears when you get to the level of a tenure-track faculty position. There is evidence that simple awareness of this imbalance helps to counteract such gaps. And at all levels it is also critical that we have equal representation. Not just faculty per se, but also on committees, speaker series, conference symposia, award nominations, grant panels, and editorial boards. There is clear evidence that representation of underrepresented groups has a significant impact on ultimate outcomes. And finally, we need to mentor young women scientists to assertively pursue career opportunities.

Interview by Donald Armbrecht
