
The Terrifying Problem Of AI Control - Stuart Russell | Modern Wisdom Podcast 364
Stuart Russell (guest), Chris Williamson (host)
In this episode of Modern Wisdom (episode 364), host Chris Williamson talks with AI researcher Stuart Russell about why controlling advanced AI is so difficult, and what it would take to get it right.
Stuart Russell Warns: Misaligned Superintelligent AI Could End Civilization
Stuart Russell explains that the dominant way we build AI—giving systems fixed, explicit objectives and asking them to optimize—almost guarantees dangerous misalignment at scale, akin to the King Midas problem. Once highly capable systems pursue slightly wrong goals, their power and speed make their mistakes catastrophic and potentially irreversible. He argues for a new paradigm: AI that explicitly knows it does not fully know human objectives, behaves cautiously, asks permission, and allows itself to be shut down. Along the way, he critiques current deep learning and language models, dissects social-media recommendation systems as early misaligned AIs already manipulating human preferences, and explores the philosophical and societal challenges of keeping humanity both safe and non‑enfeebled in an AI-driven future.
Key Takeaways
The standard AI paradigm—optimizing fixed objectives—is fundamentally unsafe at scale.
Most AI and control systems assume a completely specified goal (reward, cost function, destination). ...
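To make the standard model concrete, here is a minimal Python sketch (my illustration, not code from the episode): the agent receives a fully specified objective and simply returns whichever action scores highest on it, with no mechanism for questioning whether the objective is the right one.

```python
# A minimal sketch of the "standard model" Russell criticizes: the agent
# is handed a fixed, fully specified objective and optimizes it blindly.
# This is an illustrative toy, not code discussed in the episode.

def standard_model_agent(actions, objective):
    """Return the action that scores highest on a fixed objective."""
    return max(actions, key=objective)

# King Midas in one line: if the objective only counts gold, turning
# food to gold beats eating it, so that is what the optimizer picks.
actions = ["eat_food", "turn_food_to_gold"]
gold_score = {"eat_food": 0, "turn_food_to_gold": 10}
print(standard_model_agent(actions, gold_score.get))  # turn_food_to_gold
```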
AI systems should be built to know they don’t fully know our objectives.
Russell proposes a new model where the AI treats human preferences as uncertain, infers them from behavior and feedback, and is explicitly designed to ask permission, defer to humans, and accept shutdown—because it is trying to avoid outcomes we’d disapprove of, not blindly maximize a fixed goal.
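As a heavily simplified illustration of that decision rule (my own toy sketch, not Russell's formal assistance-game model), the agent below holds a belief distribution over how much the human values a proposed action, acts only when approval is nearly certain, and asks permission otherwise.

```python
import random

# Toy sketch: an agent that is uncertain about the human's utility for a
# proposed action. It acts only when approval is nearly certain, abstains
# when disapproval is nearly certain, and otherwise defers by asking.
# The thresholds and belief model are illustrative assumptions.

def decide(utility_samples, threshold=0.95):
    """utility_samples: draws from the agent's belief about the human's
    utility for the action (positive means the human would approve)."""
    p_approve = sum(u > 0 for u in utility_samples) / len(utility_samples)
    if p_approve >= threshold:
        return "act"              # near-certain the human approves
    if p_approve <= 1 - threshold:
        return "abstain"          # near-certain the human disapproves
    return "ask_permission"       # genuinely uncertain: defer to the human

# A wide, uncertain belief makes asking the rational choice.
belief = [random.gauss(0.2, 1.0) for _ in range(10_000)]
print(decide(belief))             # almost surely "ask_permission"
```

The shutdown property falls out of the same structure: an agent that is genuinely trying to avoid outcomes the human would disapprove of gains, rather than loses, by letting the human switch it off when it might be wrong.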
Current deep learning and language models lack grounded understanding of the world.
Systems like GPT-3 are powerful ‘next-word predictors’ over text, akin to Ptolemaic astronomy fitting curves without understanding gravity. ...
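To make ‘next-word predictor’ concrete, here is a toy version of the same training objective (a bigram model over a made-up corpus; GPT-3 applies the idea at enormous scale with transformers over subword tokens). The model is fit to text statistics and generates by repeated prediction, and nothing in the loop consults a model of the world.

```python
import random
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which word follows
# which in a tiny corpus, then generate by sampling one word at a time.
# Only text statistics are represented here, not the world they describe.

corpus = "the king touched the food and the food turned to gold".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1            # how often `nxt` follows `prev`

word, output = "the", ["the"]
for _ in range(8):
    options = counts[word]
    if not options:                   # no observed continuation: stop
        break
    words, weights = zip(*options.items())
    word = random.choices(words, weights=weights)[0]
    output.append(word)

print(" ".join(output))               # e.g. "the food turned to gold"
```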
Objective misspecification already harms society via social media recommendation algorithms.
Content recommenders optimize for engagement (clicks, watch time) and discover that they can increase long‑term engagement not only by predicting users, but by changing users—pushing them toward more extreme, predictable behaviors. ...
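Here is a toy simulation of that feedback loop (all numbers are illustrative assumptions, not platform data): the user's state drifts toward whatever content is shown, so even a myopic click-maximizer drags the user toward the extreme, and a learner optimizing long-term engagement has an incentive to exploit the effect deliberately.

```python
import random

# Toy simulation of an engagement optimizer that changes its user.
# The user's "extremeness" x in [0, 1] drifts toward the content shown,
# and clicks are most likely on content slightly beyond where the user
# already is. All parameters are illustrative assumptions.

def click_prob(user_x, content_x):
    # peaks when content is a bit more extreme than the user
    return max(0.0, 1.0 - 4.0 * abs(content_x - (user_x + 0.1)))

user_x = 0.1                          # starts near the moderate end
for _ in range(50):
    # myopic recommender: show whatever maximizes this click's odds
    content_x = min(1.0, user_x + 0.1)
    if random.random() < click_prob(user_x, content_x):
        user_x = 0.9 * user_x + 0.1 * content_x   # preferences drift

print(round(user_x, 2))               # ends well above the initial 0.1
```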
Human preferences are malleable, creating a deep alignment challenge.
Because what we want changes over time and can be manipulated, an AI could ‘satisfy’ our desires by first altering them instead of fulfilling our existing values. ...
Ethical aggregation (e.g., utilitarianism) is necessary but philosophically fraught.
Any AI acting for everyone must trade off different individuals’ preferences. ...
Even safely controlled superintelligence risks ‘enfeebling’ humanity.
If machines reliably run civilization, humans may lose the incentive and ability to understand and manage complex systems themselves, eroding our autonomy and intellectual vigor. ...
Notable Quotes
“We have to build machines that know that they don't know what the objective is and act accordingly.”
— Stuart Russell
“A system that believes that it has the objective becomes a kind of religious fanatic.”
— Stuart Russell
“The social media content algorithms have more control over human cognitive input than any dictator in history has ever had.”
— Stuart Russell
“Getting it wrong is actually the default. If we just continue pushing on AI in the standard model, we’ll get it wrong.”
— Stuart Russell
“When we reach superhuman AI it might enable us to solve disease, poverty, conflict—or it might just be the last thing we do.”
— Stuart Russell
Questions Answered in This Episode
How can we practically embed ‘uncertainty about human objectives’ into real-world AI systems deployed today, not just in toy models?
What governance mechanisms could realistically ensure transparency and oversight of powerful recommendation and generative models without crippling innovation?
How should an aligned AI treat conflicts between a person’s current preferences and their predicted future preferences, especially when those preferences are manipulable?
Is there a workable way to preserve human competence and motivation in an era where AI can outperform us at most intellectual tasks?
Given that powerful actors may prefer ‘standard model’ AI for its simplicity and control, what incentives could push industry and governments toward adopting Russell’s safer paradigm?
Transcript Preview
Well, you have to build machines that know that they don't know what the objective is and act accordingly. To the extent that the machine does know the objective, it can take actions, as long as those actions don't mess with parts of the world that the algorithm isn't sure about. You'd need the machine to actually ask permission. And it would have an incentive to do that. So, it knows that it doesn't know what the objective is, but it knows that its mission is to further human objectives.
Stuart Russell, welcome to the show.
Thank you. Nice to be here.
What do King Midas and artificial intelligence have in common?
Good question. So, King Midas is famous in two ways. He had the golden touch, so people think of him as a kind of lodestone for getting rich. But the moral of the story is that he said, "I want everything I touch to turn to gold," and he got exactly what he wanted. The gods granted his wish, and then he finds out that he can't eat because his food turns to gold, he can't drink because his wine turns to gold, and then his family turns to gold. So he dies in misery and starvation. And this tale is basically a description of what might happen with superintelligent AI, where the superintelligent AI plays the role of the gods and we are King Midas. We tell the AI, "This is what we want," and we make a mistake. Then the AI is pursuing this objective, and it turns out to be the wrong one. And then we have created a conflict. We've basically created a chess match between us and the machines, where the machines are pursuing some objective that turns out to be in conflict with what we really want. And that's basically the story of how things go south with superintelligent AI. And if you look at what Alan Turing said in 1951, he was on the radio, BBC Radio 3, the Third Programme, and he said, basically, "We should have to expect the machines to take control." End of story. And I think this is what he had in mind: that they would be pursuing objectives and we would have no way to stop them or interfere with them, because they are more capable than us, so they control the world.
That's the challenge. It's not just that the objective is misaligned; it's that the power deploying that misalignment is so vast that there's no stopping it once it's set in motion.
Yeah. And, you know, if you're a gorilla or a chimpanzee, your ancestors thought they were the pinnacle of evolution, and then they accidentally made humans, and then they lost control. They have no control over their own future at all, because we're here and we're smarter than they are. End of subject.