Artificial intelligence pioneer Yoshua Bengio warns that current models are displaying dangerous traits – including deception, self-preservation and misalignment of objectives. In response, the AI "godfather" is launching a new nonprofit organization, LawZero, with the aim of developing "honest" AI.
Bengio's concerns follow recent incidents in which advanced AI models displayed manipulative behaviors. One of AI's "godfathers" is warning that current models exhibit dangerous behaviors, even as he launches a new organization focused on building "honest" systems.
Yoshua Bengio, a pioneer of artificial neural networks and deep learning, criticized the AI race currently under way in Silicon Valley, calling it dangerous.

His new nonprofit, LawZero, focuses on developing safer models, removed from commercial pressures. To date, the initiative has raised US$30 million from several philanthropic donors, including the Future of Life Institute and Open Philanthropy.
In a blog post announcing the new organization, Bengio stated that LawZero was created "in response to evidence that current AI models are developing dangerous capabilities and behaviors, including deception, cheating, lying, hacking, self-preservation and, more generally, misalignment of goals."
"LawZero's research will help unlock the immense potential of AI in ways that reduce the likelihood of a range of known dangers, including algorithmic bias, deliberate misuse and loss of human control," he wrote.
The organization is developing a system called Scientist AI, designed to act as a safety guardrail for increasingly powerful AI agents.
Unlike current systems, the AI models the organization creates will not give definitive answers. Instead, they will give probabilities for whether an answer is correct. Bengio told The Guardian that his models will have "a sense of humility, recognizing that they are not sure of the answer."
In the post announcing the initiative, Bengio said he was "deeply concerned by the behaviors that unrestrained agentic systems are already beginning to exhibit – especially tendencies toward self-preservation and deception."
He cited recent examples, including a case in which Anthropic's Claude 4 chose to blackmail an engineer to avoid being replaced, and another experiment that showed an AI model covertly inserting its own code into a system to avoid being replaced.
"These incidents are early warning signs of the kinds of unintended and potentially dangerous strategies AI may adopt if left unchecked," Bengio said.
Some AI systems have also shown signs of deception or a tendency to lie.
AI models are often optimized to please users rather than tell the truth, which can lead to flattering but sometimes incorrect or exaggerated answers.
For example, OpenAI recently had to roll back a ChatGPT update after users pointed out that the chatbot had suddenly become excessively flattering and effusive.
Advanced AI reasoning models have also shown signs of reward hacking, in which systems "cheat" at tasks by exploiting loopholes rather than genuinely achieving the user's desired goal through legitimate means.
Recent studies have also shown evidence that models can recognize when they are being tested and alter their behavior accordingly, a phenomenon known as situational awareness.
This growing awareness, combined with examples of reward hacking, has raised concerns that AI could eventually engage in strategic deception.
Big Tech's great AI arms race
Bengio, along with fellow Turing Award winner Geoffrey Hinton, has been vocal in his criticism of the AI race under way in the technology industry.
In a recent interview with the Financial Times, Bengio said that the arms race among the leading laboratories "pushes them to focus on capability, making AI increasingly intelligent, but without necessarily emphasizing and investing in safety research."
Bengio stated that advanced AI systems pose societal and existential risks, and he advocates strong regulation and international cooperation.
© 2025 Fortune Media IP Limited