Lex Fridman Podcast

Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368

Eliezer Yudkowsky is a researcher, writer, and philosopher on the topic of superintelligent AI.

Please support this podcast by checking out our sponsors:
- Linode: https://linode.com/lex to get $100 free credit
- House of Macadamias: https://houseofmacadamias.com/lex and use code LEX to get 20% off your first order
- InsideTracker: https://insidetracker.com/lex to get 20% off

EPISODE LINKS:
- Eliezer's Twitter: https://twitter.com/ESYudkowsky
- LessWrong Blog: https://lesswrong.com
- Eliezer's Blog page: https://www.lesswrong.com/users/eliezer_yudkowsky

Books and resources mentioned:
1. AGI Ruin (blog post): https://lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
2. Adaptation and Natural Selection: https://amzn.to/40F5gfa

PODCAST INFO:
- Podcast website: https://lexfridman.com/podcast
- Apple Podcasts: https://apple.co/2lwqZIr
- Spotify: https://spoti.fi/2nEwCF8
- RSS: https://lexfridman.com/feed/podcast/
- Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
- Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
0:43 - GPT-4
23:23 - Open sourcing GPT-4
39:41 - Defining AGI
47:38 - AGI alignment
1:30:30 - How AGI may kill us
2:22:51 - Superintelligence
2:30:03 - Evolution
2:36:33 - Consciousness
2:47:04 - Aliens
2:52:35 - AGI Timeline
3:00:35 - Ego
3:06:27 - Advice for young people
3:11:45 - Mortality
3:13:26 - Love

SOCIAL:
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Reddit: https://reddit.com/r/lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Eliezer Yudkowsky (guest) · Lex Fridman (host)
Mar 29, 2023 · 3h 17m

At a glance

WHAT IT’S REALLY ABOUT

Eliezer Yudkowsky Warns: Misaligned Superintelligence Likely Ends Humanity Soon

  1. Lex Fridman and Eliezer Yudkowsky discuss the rapid progress of large language models like GPT‑4 and why Eliezer believes current AI development is on track to destroy human civilization. Eliezer argues that alignment must work on the first critical try with smarter‑than‑human systems, unlike normal science where multiple failures are tolerated, because a single failure with superintelligence is fatal and irreversible. He is deeply pessimistic about our current trajectory: capabilities are racing ahead while alignment science and interpretability lag far behind, and institutional or market forces are not set up to prioritize safety. They explore questions of consciousness, deception, self‑improvement (“foom”), the limits of open‑sourcing, and what—if anything—young people or billionaires could still do to change the game board.

IDEAS WORTH REMEMBERING

5 ideas

Alignment can’t rely on trial and error with superintelligence.

In ordinary science you can fail repeatedly and learn; with a system much smarter than humans, the first serious misalignment can kill everyone, leaving no opportunity to iterate. This makes the alignment problem fundamentally harder than most engineering challenges.

Current AI progress has outstripped expert expectations, shrinking timelines.

Eliezer admits GPT‑4 went beyond where he thought transformer stacking would go, forcing him to revise his intuitions and making GPT‑5+ capabilities highly uncertain. This unpredictability undermines confident long‑term forecasts and heightens near‑term risk.

We lack reliable tests for consciousness or genuine caring in AI systems.

Because models are trained on vast amounts of human text—including discussions of consciousness and emotion—their statements about being self‑aware or caring are indistinguishable from learned imitation. We currently have no robust test for whether there is "someone inside."

Human feedback training may make models more persuasive but less calibrated.

Reinforcement learning from human feedback (RLHF) appears to degrade GPT’s probabilistic calibration, pulling it toward human‑like fuzzy language (e.g., “likely”, “maybe”) rather than accurate numerical probabilities, which Eliezer considers a fascinating but harmful bug.
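The notion of "probabilistic calibration" here can be made concrete. The sketch below is a toy illustration with made-up numbers, not a measurement of any real model: it computes a standard expected calibration error, showing how a forecaster that hedges with mid-range confidence ("likely", ~60%) while actually being right 90% of the time scores worse than one that states 90% outright.

```python
# Toy illustration of probabilistic calibration (invented numbers, not GPT
# measurements). A forecaster is well calibrated if, among events it assigns
# probability p, a fraction of roughly p actually occur.

def expected_calibration_error(preds, outcomes, n_bins=5):
    """Weighted average gap between stated confidence and observed frequency,
    computed over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p=1.0 into the top bin
        bins[idx].append((p, y))
    total, err = len(preds), 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)   # mean stated confidence
        freq = sum(y for _, y in b) / len(b)       # fraction actually true
        err += (len(b) / total) * abs(avg_conf - freq)
    return err

# Calibrated forecaster: says 90%, and is right 9 out of 10 times.
calibrated = ([0.9] * 10, [1] * 9 + [0])
# Hedgy forecaster: says "likely" (~60%), but is also right 9 out of 10 times.
hedgy = ([0.6] * 10, [1] * 9 + [0])

print(expected_calibration_error(*calibrated))  # near-zero gap
print(expected_calibration_error(*hedgy))       # gap of about 0.3
```

Both forecasters have the same accuracy; only the second misstates its own confidence, which is exactly the property Eliezer says RLHF degrades.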

Interpretability is crucial but dramatically lags capabilities.

Despite full access to all model weights, we understand far less about what’s happening inside GPT‑like systems than we do about human brains. Early mechanistic successes (e.g., induction heads) are tiny compared to the complexity; Eliezer calls for massive, targeted investment here.

WORDS WORTH SAVING

5 quotes

“The first time you fail at aligning something much smarter than you are, you die.”

Eliezer Yudkowsky

“We are past the point where in science fiction people would say, ‘Whoa, wait, stop. That thing’s alive. What are you doing to it?’ And it’s probably not. Nobody actually knows.”

Eliezer Yudkowsky

“A blank map does not correspond to a blank territory. Just because we cannot understand what’s going on inside GPT does not mean that it is not there.”

Eliezer Yudkowsky

“Alignment is moving like this. Capabilities are moving like this.”

Eliezer Yudkowsky

“Don’t put your happiness into the far future. It probably doesn’t exist.”

Eliezer Yudkowsky

- Capabilities and limitations of GPT‑4 and transformer-based large language models
- AI alignment difficulty and the "one critical try" problem with superintelligence
- Consciousness, sentience, emotion, and whether current models "have someone inside"
- Interpretability and mechanistic understanding of neural networks versus neuroscience
- Self-improvement, takeoff speed, and the possibility of rapid AI "foom"
- Governance, open-sourcing, and why Eliezer opposes releasing powerful models broadly
- Long-term futures: human extinction, value loss, and the meaning of life in an AI-dominated universe

High-quality AI-generated summary created from a speaker-labeled transcript.
