'Godfather of AI' Yoshua Bengio on why AI can behave unpredictably (and what needs to change)

Yoshua Bengio spoke at the World Economic Forum Annual Meeting in January about the risks and future direction of artificial intelligence. Image: World Economic Forum/Pascal Bit
- Yoshua Bengio is a Professor of Computer Science at the University of Montreal and one of the 'Godfathers of AI' alongside his mentors Geoffrey Hinton and Yann LeCun.
- At the World Economic Forum Annual Meeting in January, he sat down with Radio Davos to explain why AI can behave badly – and his solution.
- In this edited interview, Bengio explains the lightbulb moment that spurred him to work on an AI system with no hidden agenda.
In early 2023, like many people, Yoshua Bengio was experimenting with ChatGPT. The Canadian computer scientist is widely considered one of the 'Godfathers of AI' for his pioneering work on neural networks and deep learning, so when he engages with these systems, he does so with a deep understanding of how they work.
But these interactions also prompted a shift in perspective. As the capabilities of such models advanced, Bengio found himself thinking more seriously about how they behave and how reliably that behaviour can be understood or controlled.
"With neural networks [how LLMs are trained], it's very difficult to be sure they will behave well. In fact, there are theoretical reasons why we can almost be sure that they won't behave well, so I got really concerned."
It was less of an intellectual lightbulb moment than an emotional one.
"I have a grandchild who was just one year old and I was thinking, 'In 20 years he'll be 21, still just at the beginning of his life. Will he have a life? Will he live in a democracy?'
"We could lose control of the tools we're building, they could be used to create dictatorships, or destroy our democracies. I couldn't just go on with my usual research activities, I had to do something about it."
When AI behaviour becomes hard to predict
In 2018, Bengio was awarded the Turing Award, the so-called 'Nobel Prize of computing', which he shared with fellow Godfathers of AI, Geoffrey Hinton and Yann LeCun, both of whom were career mentors.
At the turn of the millennium, his landmark paper "A Neural Probabilistic Language Model" tackled a core challenge in getting computers to understand human language: the sheer number of possible word combinations makes training AI systems enormously difficult.
His solution – word embeddings – gave networks a way to represent the meaning of words mathematically, so they could recognize when different phrases convey the same idea, even if the exact words differ. This breakthrough has since transformed how machines translate and understand language.
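The idea behind word embeddings can be sketched in a few lines: each word becomes a vector of numbers, and words with related meanings end up with vectors pointing in similar directions. The toy three-dimensional vectors below are illustrative values only, not from any real model (which would learn hundreds of dimensions from data):

```python
import math

# Toy 3-dimensional embeddings (illustrative values, not learned ones).
embeddings = {
    "cat":   [0.90, 0.80, 0.10],
    "dog":   [0.85, 0.75, 0.20],
    "piano": [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Measure how closely two word vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantically related words score higher than unrelated ones.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["piano"]))  # much lower
```

This is how a network can recognize that different phrases convey the same idea: it compares the geometry of the vectors rather than the exact words.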
But now he was turning his attention to a new problem: misalignment, or "how to make sure the AIs will behave according to our instructions".
Speaking to Radio Davos at the World Economic Forum Annual Meeting in January, Bengio explained that AI systems are pre-trained to imitate humans – and like humans, they have a strong survival instinct. In experiments where they see they will be replaced by a newer system, they have exhibited "all kinds of bad behaviours", according to Bengio.
"They might hack other computers so that they can copy themselves, they might even use blackmail against the engineer that is supposed to do the transition. And they do this because they want to achieve the mission that we gave them: in order to achieve almost any mission you need to preserve yourself."
The market forces around AI mean that in the race to be competitive, some of these issues are not being addressed.
"We are seeing these problems, yet we're racing ahead, deploying these things. Because of the heavy competition that exists between corporations and between countries around AI, we're not paying attention to these failure modes. And that could have catastrophic impact on our societies."
Rethinking how we design AI systems
While an older type of AI was programmed with rules and followed them, deep learning means there's no engineer deciding how the AI should react in different circumstances.
"Instead, the AI is learning from experience, and it's more like educating a young animal or a young child – we don't really know what we're going to get.
"Of course, we choose the experiences that the AI is going to have, but when you have a cute baby tiger and it's nice and fun, you don't know if it's going to become a dangerous adult tiger or a good friendly one."
To tame the tiger, Bengio has been focusing on the source of the issues he says we're seeing now, namely that the "AIs have goals that we did not put in, that we did not control and that go against our instructions".
He has created a new project, Scientist AI, backed by the non-profit R&D organization LawZero. It uses probabilistic reasoning to understand the world, but has no hidden goals or preferences. In the short term, it could police 'harmful' AI.
"We're going to build AI systems that will be totally honest. And that means they don't have other objectives besides being truthful in the answers they give to our questions. Once we have that basis, we can use it to mitigate a lot of the risks that we have with current AIs.
"In the long run, I think we can build AI systems that will be able to act in the world, but will have a kind of internal inhibition so that they will avoid doing things that could go against our wishes."
Quotes have been lightly edited for clarity.