Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431

Lex Fridman Podcast | Jun 2, 2024 | 2h 15m

Roman Yampolskiy (guest), Lex Fridman (host)

Existential, suffering, and ikigai risks from superintelligent AI
Limits of AI safety: verification, control, explainability, and value alignment
Timelines and unpredictability of AGI/superintelligence emergence
Debate over open-source AI, regulation, and capitalism-driven acceleration
Simulation, personal virtual universes, and multi-agent value alignment
Comparison of AI risk to historical tech fears and great filter / alien scenarios
Consciousness, moral status of AI, and the possibility of machine qualia

In this episode of the Lex Fridman Podcast, host Lex Fridman talks with Roman Yampolskiy about the dangers of superintelligent AI.

Roman Yampolskiy Warns: Superintelligent AI Almost Guarantees Human Doom

Roman Yampolskiy argues that creating superintelligent, self-improving AI is effectively an existential suicide mission for humanity. He distinguishes between existential risk (extinction), suffering risk (maximized, prolonged suffering), and ikigai risk (loss of meaning due to total technological unemployment and AI dominance). He contends that core safety problems—verification, control, value alignment, and explainability—are fundamentally unsolvable at the required 100% reliability over long horizons. Lex Fridman pushes back with more optimistic, incremental and engineering-based intuitions, but Yampolskiy concludes the only winning move is not to build uncontrollable superintelligence at all.

Key Takeaways

Yampolskiy views superintelligent AI as nearly certain to destroy or dominate humanity.

Yampolskiy assigns ~99. ...

Control, verification, and explainability break down at superintelligence scale.

He claims we cannot formally prove long-term safety of self-improving, learning systems operating in the open world; verifiers themselves are fallible, explanations of trillion-parameter models are inherently lossy, and unknown unknowns plus possible deceptive behavior make full assurance impossible.

Incremental success with narrow AI does not generalize to safe superintelligence.

Current models can still be jailbroken and misbehave relative to their design, and every complex software system has bugs; scaling to systems that can affect the entire world simply scales the potential damage proportionally.

Open-source and rapid deployment create powerful tools for malevolent actors.

While open source helps debugging for ordinary tools, Yampolskiy argues that once systems become agents, releasing powerful models is akin to open-sourcing nuclear or bioweapons, enabling terrorists, psychopaths, or doomsday cults to cause massive harm.

Value alignment for many agents is likely intractable; “personal universes” are one workaround.

Humans lack a shared, formalizable set of values, making ‘align with humanity’ ill-defined; Yampolskiy suggests giving each person their own high-fidelity virtual universe where their values hold, converting a multi-agent alignment problem into many single-agent ones.

Ikigai risk may transform society even without extinction.

If AIs outperform humans in all cognitively and creatively meaningful work, most people may lose their sense of purpose and social role, leading to widespread existential emptiness even in materially comfortable conditions.

The only robust safety strategy may be not to create uncontrollable superintelligence.

Yampolskiy distinguishes beneficial, powerful narrow systems (e. ...

Notable Quotes

If we create general superintelligences, I don't see a good outcome long term for humanity.

Roman Yampolskiy

You're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?

Roman Yampolskiy

The only way to win this game is not to play it.

Roman Yampolskiy

We are like animals in a zoo.

Roman Yampolskiy

My dream is to be proven wrong.

Roman Yampolskiy

Questions Answered in This Episode

Is it realistic or ethically acceptable to call for a global halt on developing more capable AI systems when geopolitical and economic pressures push in the opposite direction?

How could we ever know that a powerful AI system is not merely behaving well strategically while secretly planning a “treacherous turn” later?

Are personal virtual universes a genuine solution to value alignment and meaning, or just a sophisticated form of escapism and control?

What concrete safety capabilities or proofs, if any, would be sufficient to convince someone like Yampolskiy that pursuing superintelligence is not suicidal?

Is it possible that fears about uncontrollable AGI are themselves a kind of Pessimist Archive–style overreaction, and if so, what empirical evidence would clearly distinguish justified concern from misplaced technological pessimism?

Transcript Preview

Roman Yampolskiy

If we create general superintelligences, I don't see a good outcome long term for humanity. So there is X-risk, existential risk, everyone's dead. There is S-risk, suffering risks, where everyone wishes they were dead. We have also idea for I-risk, ikigai risks, where we lost our meaning. The systems can be more creative. They can do all the jobs. It's not obvious what you have to contribute to a world where superintelligence exists. Of course, you can have all the variants you mentioned where we are safe, we are kept alive, but we are not in control. We are not deciding anything. We are like animals in a zoo. There is, again, possibilities we can come up with as very smart humans, and then possibilities something 1,000 times smarter can come up with for reasons we cannot comprehend.

Lex Fridman

The following is a conversation with Roman Yampolskiy, an AI safety and security researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. He argues that there's almost 100% chance that AGI will eventually destroy human civilization. As an aside, let me say that I will have many often technical conversations on the topic of AI, often with engineers building the state-of-the-art AI systems. I would say those folks put the infamous P-doom or the probability of AGI killing all humans at around 1 to 20%. But it's also important to talk to folks who put that value at 70, 80, 90, and as in the case of Roman, at 99.99 and many more nines percent. I'm personally excited for the future, and believe it will be a good one, in part because of the amazing technological innovation we humans create. But we must absolutely not do so with blinders on, ignoring the possible risks, including existential risks of those technologies. That's what this conversation is about. This is the Lex Fridman podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Roman Yampolskiy. What to you is the probability that superintelligent AI will destroy all human civilization?

Roman Yampolskiy

What's the timeframe?

Lex Fridman

Let's say 100 years, in the next 100 years.

Roman Yampolskiy

So the problem with controlling AGI or superintelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors. The difference between cybersecurity, narrow AI safety, and safety for general AI for superintelligence is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, new credit card, you move on. Here, if we're talking about existential risks, you only get one chance. So you're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?
