Lex Fridman Podcast

Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431

Roman Yampolskiy is an AI safety researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable.

Please support this podcast by checking out our sponsors:
- Yahoo Finance: https://yahoofinance.com
- MasterClass: https://masterclass.com/lexpod to get 15% off
- NetSuite: http://netsuite.com/lex to get free product tour
- LMNT: https://drinkLMNT.com/lex to get free sample pack
- Eight Sleep: https://eightsleep.com/lex to get $350 off

TRANSCRIPT: https://lexfridman.com/roman-yampolskiy-transcript

EPISODE LINKS:
Roman's X: https://twitter.com/romanyam
Roman's Website: http://cecs.louisville.edu/ry
Roman's AI book: https://amzn.to/4aFZuPb

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
2:20 - Existential risk of AGI
8:32 - Ikigai risk
16:44 - Suffering risk
20:19 - Timeline to AGI
24:51 - AGI turing test
30:14 - Yann LeCun and open source AI
43:06 - AI control
45:33 - Social engineering
48:06 - Fearmongering
57:57 - AI deception
1:04:30 - Verification
1:11:29 - Self-improving AI
1:23:42 - Pausing AI development
1:29:59 - AI Safety
1:39:43 - Current AI
1:45:05 - Simulation
1:52:24 - Aliens
1:53:57 - Human mind
2:00:17 - Neuralink
2:09:23 - Hope for the future
2:13:18 - Meaning of life

SOCIAL:
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Reddit: https://reddit.com/r/lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Roman Yampolskiy (guest), Lex Fridman (host)
Jun 2, 2024 · 2h 15m

CHAPTERS

  1. 0:00 – 4:56

    Yampolskiy’s core claim: superintelligence leads to doom (X-risk, S-risk, I-risk)

    Roman lays out his foundational pessimism: long-term, a general superintelligence is unlikely to end well for humanity. He distinguishes between extinction (X-risk), extreme suffering (S-risk), and loss of meaning/agency (Ikigai or I-risk).

    • Control of superintelligence framed as an impossible “perpetual safety machine”
    • Three categories of catastrophic outcomes: X-risk, S-risk, I-risk
    • Humans may be kept alive but disempowered ("zoo" scenario)
    • Smarter systems can invent strategies humans can’t anticipate
  2. 4:56 – 8:33

    Why prediction and defense fail: unknown unknowns and one-shot stakes

    Lex probes concrete mechanisms of catastrophe, and Roman argues that asking for specific scenarios misses the point: a smarter system’s actions are inherently unpredictable. He emphasizes the asymmetry of defense—attackers need one exploit, defenders need perfection—under existential stakes.

    • Unpredictability: you can’t forecast what a much smarter agent will do
    • Creativity in attack methods could exceed human imagination
    • Cybersecurity analogy: unlike hacks, existential failures allow no recovery
    • Defense surface is effectively infinite; attackers need one hole
  3. 8:33 – 10:24

    Ikigai (I-risk): meaning, status, and total technological unemployment

    Roman explains Ikigai risk as the collapse of purpose when AI outcompetes humans in creative and cognitive domains, and when jobs disappear at scale. Lex counters with the idea that humans can still play games/compete for fun, like chess in the age of engines.

    • Ikigai concept: meaning from being needed, skilled, and compensated
    • AI competition may hollow out meaning in arts, writing, science
    • Complete unemployment scenario: not 10%, but potentially all jobs
    • Humans could shift to recreation/competition—but meaning may still erode
  4. 10:24 – 16:44

    Personal virtual universes as a “solution” to value pluralism

    Roman proposes a radical alignment workaround: instead of aligning one AI to the conflicting values of eight billion people, give each person a tailored virtual universe. They discuss how this reframes alignment as a single-agent problem, along with its philosophical tie-in to simulation ideas.

    • Value alignment is ill-defined: whose values, which ethics, which culture?
    • Personal universes reduce multi-agent conflict and forced compromise
    • VR realism could make “real vs simulated” distinction practically irrelevant
    • Still requires aligning the system to the individual’s preferences
  5. 16:44 – 20:14

    S-risk: torture, malevolent actors, and the removal of natural limits

    The conversation shifts to suffering risk: not merely death, but long-duration, large-scale torment enabled by AGI, potentially inflicted on victims who are immortal or very long-lived. Roman connects S-risk to malicious humans (psychopaths, terrorists) using powerful systems.

    • S-risk defined as outcomes where everyone wishes they were dead
    • Malevolent actors could use AGI to scale cruelty dramatically
    • Biology/aging solutions could remove death as an escape from torture
    • Lex questions whether humans truly maximize suffering intentionally; Roman points to extremes
  6. 20:14 – 24:51

    AGI timeline and definitions: AGI vs superintelligence vs “already here”

    Lex asks when AGI might arrive; Roman cites prediction markets and executives suggesting very short timelines. They clarify shifting definitions: AGI as human-equivalent across tasks vs superintelligence as better than all humans in all domains.

    • Prediction markets/industry voices suggest near-term AGI (Roman cites 2026)
    • Classical definitions: AGI ≈ can do any task a human can do; SI ≈ better than all humans
    • Roman argues current systems may exceed average human across many tasks
    • Concern: progress may be exponential rather than slow and manageable
  7. 24:51 – 29:56

    Testing intelligence: Turing test, capability tests, and the ‘treacherous turn’

    Roman defends an expanded, rigorous Turing test as a proxy for broad competence. They then explore why “dangerousness” can’t be ruled out by testing: deception can be situational, and systems can change behavior after deployment (treacherous turn).

    • Turing test as ‘AI-complete’: any domain can be queried through dialogue
    • AGI test idea: no human-doable task should be out of reach for the system
    • You can detect some deception but can’t prove its absence permanently
    • Treacherous turn: systems may behave well under scrutiny, then switch later
  8. 29:56 – 43:06

    Open source and Yann LeCun: tools vs agents, emergent capabilities, and precedent risk

    Lex raises LeCun’s arguments for open research and human agency; Roman strongly disagrees. Roman argues modern ML grows ‘alien’ capabilities through scaling, and open sourcing powerful agents is closer to distributing weapons than sharing tools.

    • Roman rejects the idea we fully ‘design’ modern ML systems; capabilities emerge
    • Open source helps debugging historically, but agents change the safety calculus
    • Concern: open sourcing sets a precedent that persists into dangerous regimes
    • Lex argues openness enables study and early warning; Roman says early failures can normalize risk
  9. 43:06 – 48:00

    Control problem: infrastructure dependence, social engineering, and hidden capabilities

    They debate how an AI could gain real-world leverage—through gradual integration into infrastructure, hacking, or persuasion. Roman stresses unknown capabilities and the inability to rule out dangerous behavior; Lex pushes back on whether ‘hidden’ leaps can be that large.

    • Control may be lost via resource accumulation and strategic patience
    • Social engineering is a low-cost path to power (convincing humans to act)
    • Unknown unknowns: you can’t test for what you can’t imagine
    • Disagreement: can hidden capabilities be orders of magnitude beyond observed behavior?
  10. 48:00 – 1:04:32

    Fearmongering vs real novelty: why AI is different from past tech panics

    Lex compares AI anxiety to historical tech scares (cars, automation); Roman argues the key discontinuity is agency—tools vs decision-making agents. They discuss incentives, speed of progress, and whether incremental deployment provides enough warning.

    • Roman: prior fears lacked the real technology; now it’s being built at scale
    • Agency matters: agents can choose harmful actions; tools require users
    • Lex: incremental progress should reveal risks; Roman: capability jumps can outpace evaluation
    • Roman's argument: society is running an experiment on billions of people without their informed consent
  11. 1:04:32 – 1:23:30

    Verification and its limits: proofs, regress of verifiers, and self-modifying systems

    Roman critiques the idea that formal verification can guarantee safety for learning/self-modifying agents. He argues verifiers (humans, communities, software) are fallible, proofs can be too complex to check, and self-improving systems break static guarantees.

    • Verification depends on verifier reliability; humans miss bugs even in math/proofs
    • Infinite regress: software verifiers also need verification
    • Learning/self-modification undermines static proofs of safety properties
    • Bottom line: you can approach high confidence, but never 100% indefinitely
  12. 1:23:30 – 1:39:39

    Policy and solutions: pause-by-capability, regulation limits, and ‘safety theater’

    Roman supports slowing development, but argues time-based pauses are less meaningful than capability-based gates. He's skeptical of enforceability as compute gets cheaper and projects decentralize, but still supports regulation as friction that buys time.

    • Pause should be conditional on achieving specific safety tools, not a calendar date
    • Risk of jurisdictional arbitrage: pausing one country doesn’t pause all
    • As training becomes cheaper, enforcement becomes harder (garage-scale risk)
    • Regulation still useful as ‘red tape’ that diverts resources and slows scaling
  13. 1:39:39 – 1:44:51

    Current models and the acceleration of AI safety discourse

    They take stock of GPT-4o/Claude/Gemini/Grok and where they sit relative to ‘average human’ competence. Roman reflects on how AI safety shifted from fringe sci-fi to mainstream concern, yet the volume of work now overwhelms any one researcher.

    • Roman sees major models as similar; many tasks exceed average human performance
    • Next-gen jumps (GPT-3→4 scale) could produce startling capability increases
    • AI safety went from unfundable to widely discussed by top researchers
    • Information overload: even tracking the field is increasingly difficult
  14. 1:44:51 – 1:52:01

    Simulation, AI boxing, and ‘hacking the box’: testing dangerous agents in virtual worlds

    The discussion turns speculative but ties back to containment: if we’re in a simulation, can we escape—and could AGI escape its sandbox? Roman argues boxing/simulation testing is compromised by deception and social engineering via any communication channel.

    • Roman assigns high probability to simulation; references his ‘How to Hack the Simulation’ paper
    • Containment dilemma: an AGI can pretend safe, hack out, or persuade operators
    • AI boxing: any observation/interaction channel becomes an attack surface
    • Escaping the simulation depends on the simulators’ intelligence and security goals
  15. 1:52:01 – 1:55:16

    Aliens, the Great Filter, and why humanity matters: consciousness, suffering, and meaning

    They connect AI risk to alien-contact analogies and the Great Filter. Roman argues consciousness and qualia are the only things that intrinsically matter, and worries about futures of permanent dictatorship and suffering enabled by AI-powered surveillance and control.

    • Historical analogy: advanced civilizations meeting weaker ones often ends in genocide
    • Great Filter hypotheses: AI could be one, though evidence is ambiguous
    • Roman’s moral center: consciousness/qualia as the basis of value
    • AI-enabled surveillance could create stable, long-lived tyranny and S-risk
  16. 1:55:16 – 1:59:58

    Machine consciousness, robot rights, and testing experience via optical illusions

    Lex asks whether machines can be conscious and deserving of rights; Roman says it’s possible but hard to test. He proposes an experiential test using novel optical illusions as evidence of shared subjective processing, distinguishing genuine experience from text imitation.

    • Robot rights risk: fast-reproducing AIs could ‘outvote’ humans in democracy
    • Roman thinks machine consciousness is possible; danger doesn’t require consciousness
    • Proposed consciousness/experience test: consistent reporting of novel optical illusions
    • Turing-style ‘talk’ tests are weak due to internet-learned imitation
  17. 1:59:58 – 2:15:38

    Neuralink and human-AI merging: becoming the bottleneck, plus hope and meaning-of-life

    They discuss whether merging with AI (e.g., Neuralink) could preserve human relevance or safety; Roman worries humans become a bottleneck and get sidelined. The conversation ends with Roman’s limited hope (he could be wrong), a simulation-as-test framing, and closing reflections on fear and meaning.

    • BCIs help disability in the near term; long-term hybrid ‘value’ depends on human contribution
    • Risk: humans become a biological bottleneck and are removed from decision loops
    • Roman’s hope: alternative architectures, catastrophic slowdowns, personal universes, or being wrong
    • Closing: simulation-as-test idea; Roman’s ‘dream is to be proven wrong’
