Paul Christiano — Preventing an AI takeover

EPISODE INFO

Released: October 31, 2023
Duration: 3h 7m
Channel: Dwarkesh Podcast

EPISODE DESCRIPTION

Talked with Paul Christiano (world’s leading AI safety researcher) about:
* Does he regret inventing RLHF?
* What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
* Why he has relatively modest timelines (40% by 2040, 15% by 2030)
* Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon
* His current research into a new proof system, and how this could solve alignment by explaining a model's behavior
* and much more.

OPEN PHILANTHROPY
Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see this application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/
The deadline to apply is November 9th; make sure to check out those roles before they close.

EPISODE LINKS
Transcript: https://www.dwarkeshpatel.com/p/paul-christiano
Apple Podcasts: https://podcasts.apple.com/us/podcast/paul-christiano-preventing-an-ai-takeover/id1516093381?i=1000633226398
Spotify: https://open.spotify.com/episode/5vOuxDP246IG4t4K3EuEKj?si=VW7qTs8ZRHuQX9emnboGcA
Follow me on Twitter: https://twitter.com/dwarkesh_sp

TIMESTAMPS
00:00:00 - What do we want post-AGI world to look like?
00:24:25 - Timelines
00:45:28 - Evolution vs gradient descent
00:54:53 - Misalignment and takeover
01:17:23 - Is alignment dual-use?
01:31:38 - Responsible scaling policies
01:58:25 - Paul’s alignment research
02:35:01 - Will this revolutionize theoretical CS and math?
02:46:11 - How Paul invented RLHF
02:55:10 - Disagreements with Carl Shulman
03:01:53 - Long TSMC but not NVIDIA

SPEAKERS

Dwarkesh Patel
host
Paul Christiano
guest
Narrator
other

EPISODE SUMMARY

In this episode of Dwarkesh Podcast, Dwarkesh Patel talks with Paul Christiano about timelines, AI coups, and real alignment work. Christiano discusses how advanced AI could reshape economics, war, and governance, emphasizing that the most likely failure mode is a gradual handover of real-world control to opaque AI systems rather than a sudden, sci-fi style "escape".
