
Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431
Roman Yampolskiy (guest), Lex Fridman (host)
Roman Yampolskiy Warns: Superintelligent AI Almost Guarantees Human Doom
Roman Yampolskiy argues that creating superintelligent, self-improving AI is effectively an existential suicide mission for humanity. He distinguishes between existential risk (extinction), suffering risk (maximized, prolonged suffering), and ikigai risk (loss of meaning due to total technological unemployment and AI dominance). He contends that core safety problems—verification, control, value alignment, and explainability—are fundamentally unsolvable at the required 100% reliability over long horizons. Lex Fridman pushes back with more optimistic, incremental and engineering-based intuitions, but Yampolskiy concludes the only winning move is not to build uncontrollable superintelligence at all.
Key Takeaways
Superintelligent AI is viewed as nearly certain to destroy or dominate humanity.
Yampolskiy assigns roughly a 99.99% probability (and “many more nines”) that AGI will eventually destroy human civilization.
Control, verification, and explainability break down at superintelligence scale.
He claims we cannot formally prove long-term safety of self-improving, learning systems operating in the open world; verifiers themselves are fallible, explanations of trillion-parameter models are inherently lossy, and unknown unknowns plus possible deceptive behavior make full assurance impossible.
Incremental success with narrow AI does not generalize to safe superintelligence.
Current models can still be jailbroken and misbehave relative to their design, and every complex software system has bugs; scaling to systems that can affect the entire world simply scales the potential damage proportionally.
Open-source and rapid deployment create powerful tools for malevolent actors.
While open source helps debugging for ordinary tools, Yampolskiy argues that once systems become agents, releasing powerful models is akin to open-sourcing nuclear or bioweapons, enabling terrorists, psychopaths, or doomsday cults to cause massive harm.
Value alignment for many agents is likely intractable; “personal universes” are one workaround.
Humans lack a shared, formalizable set of values, making ‘align with humanity’ ill-defined; Yampolskiy suggests giving each person their own high-fidelity virtual universe where their values hold, converting a multi-agent alignment problem into many single-agent ones.
Ikigai risk may transform society even without extinction.
If AIs outperform humans in all cognitively and creatively meaningful work, most people may lose their sense of purpose and social role, leading to widespread existential emptiness even in materially comfortable conditions.
The only robust safety strategy may be not to create uncontrollable superintelligence.
Yampolskiy distinguishes beneficial, powerful narrow systems from uncontrollable general superintelligence; in his view, the only way to win this game is not to play it.
Notable Quotes
“If we create general superintelligences, I don't see a good outcome long term for humanity.”
— Roman Yampolskiy
“You're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?”
— Roman Yampolskiy
“The only way to win this game is not to play it.”
— Roman Yampolskiy
“We are like animals in a zoo.”
— Roman Yampolskiy
“My dream is to be proven wrong.”
— Roman Yampolskiy
Questions Answered in This Episode
Is it realistic or ethically acceptable to call for a global halt on developing more capable AI systems when geopolitical and economic pressures push in the opposite direction?
How could we ever know that a powerful AI system is not merely behaving well strategically while secretly planning a “treacherous turn” later?
Are personal virtual universes a genuine solution to value alignment and meaning, or just a sophisticated form of escapism and control?
What concrete safety capabilities or proofs, if any, would be sufficient to convince someone like Yampolskiy that pursuing superintelligence is not suicidal?
Is it possible that fears about uncontrollable AGI are themselves a kind of Pessimist Archive–style overreaction, and if so, what empirical evidence would clearly distinguish justified concern from misplaced technological pessimism?
Transcript Preview
If we create general superintelligences, I don't see a good outcome long term for humanity. So there is X-risk, existential risk, everyone's dead. There is S-risk, suffering risks, where everyone wishes they were dead. We also have an idea for I-risk, ikigai risks, where we lose our meaning. The systems can be more creative. They can do all the jobs. It's not obvious what you have to contribute to a world where superintelligence exists. Of course, you can have all the variants you mentioned where we are safe, we are kept alive, but we are not in control. We are not deciding anything. We are like animals in a zoo. There are, again, possibilities we can come up with as very smart humans, and then possibilities something 1,000 times smarter can come up with for reasons we cannot comprehend.
The following is a conversation with Roman Yampolskiy, an AI safety and security researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. He argues that there's almost a 100% chance that AGI will eventually destroy human civilization. As an aside, let me say that I will have many, often technical, conversations on the topic of AI, often with engineers building the state-of-the-art AI systems. I would say those folks put the infamous P(doom), or the probability of AGI killing all humans, at around 1 to 20%. But it's also important to talk to folks who put that value at 70, 80, 90, and, as in the case of Roman, at 99.99 and many more nines percent. I'm personally excited for the future and believe it will be a good one, in part because of the amazing technological innovation we humans create. But we must absolutely not do so with blinders on, ignoring the possible risks, including existential risks, of those technologies. That's what this conversation is about. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Roman Yampolskiy. What to you is the probability that superintelligent AI will destroy all human civilization?
What's the timeframe?
Let's say 100 years, in the next 100 years.
So the problem of controlling AGI or superintelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with a perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors. The difference between cybersecurity, narrow AI safety, and safety for general AI, for superintelligence, is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, new credit card, you move on. Here, if we're talking about existential risks, you only get one chance. So you're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?