Skip to content
Dwarkesh PodcastDwarkesh Podcast

Paul Christiano — Preventing an AI takeover

Talked with Paul Christiano (world’s leading AI safety researcher) about: * Does he regret inventing RLHF? * What do we want post-AGI world to look like (do we want to keep gods enslaved forever)? * Why he has relatively modest timelines (40% by 2040, 15% by 2030), * Why he’s leading the push to get to labs develop responsible scaling policies, & what it would take to prevent an AI coup or bioweapon, * His current research into a new proof system, and how this could solve alignment by explaining model's behavior, * and much more. 𝐎𝐏𝐄𝐍 𝐏𝐇𝐈𝐋𝐀𝐍𝐓𝐇𝐑𝐎𝐏𝐘 Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see this application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/ The deadline to apply is November 9th; make sure to check out those roles before they close: 𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒 * Transcript: https://www.dwarkeshpatel.com/p/paul-christiano * Apple Podcasts: https://podcasts.apple.com/us/podcast/paul-christiano-preventing-an-ai-takeover/id1516093381?i=1000633226398 * Spotify: https://open.spotify.com/episode/5vOuxDP246IG4t4K3EuEKj?si=VW7qTs8ZRHuQX9emnboGcA * Follow me on Twitter: https://twitter.com/dwarkesh_sp 𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒 00:00:00 - What do we want post-AGI world to look like? 00:24:25 - Timelines 00:45:28 - Evolution vs gradient descent 00:54:53 - Misalignment and takeover 01:17:23 - Is alignment dual-use? 01:31:38 - Responsible scaling policies 01:58:25 - Paul’s alignment research 02:35:01 - Will this revolutionize theoretical CS and math? 02:46:11 - How Paul invented RLHF 02:55:10 - Disagreements with Carl Shulman 03:01:53 - Long TSMC but not NVIDIA

Dwarkesh PatelhostPaul Christianoguest
Oct 31, 20233h 7mWatch on YouTube ↗

Episode Details

EPISODE INFO

Released
October 31, 2023
Duration
3h 7m
Channel
Dwarkesh Podcast
Watch on YouTube
▶ Open ↗

EPISODE DESCRIPTION

Talked with Paul Christiano (world’s leading AI safety researcher) about:

  • Does he regret inventing RLHF?
  • What do we want post-AGI world to look like (do we want to keep gods enslaved forever)?
  • Why he has relatively modest timelines (40% by 2040, 15% by 2030),
  • Why he’s leading the push to get to labs develop responsible scaling policies, & what it would take to prevent an AI coup or bioweapon,
  • His current research into a new proof system, and how this could solve alignment by explaining model's behavior,
  • and much more.

𝐎𝐏𝐄𝐍 𝐏𝐇𝐈𝐋𝐀𝐍𝐓𝐇𝐑𝐎𝐏𝐘 Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see this application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/ The deadline to apply is November 9th; make sure to check out those roles before they close: 𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒 00:00:00 - What do we want post-AGI world to look like? 00:24:25 - Timelines 00:45:28 - Evolution vs gradient descent 00:54:53 - Misalignment and takeover 01:17:23 - Is alignment dual-use? 01:31:38 - Responsible scaling policies 01:58:25 - Paul’s alignment research 02:35:01 - Will this revolutionize theoretical CS and math? 02:46:11 - How Paul invented RLHF 02:55:10 - Disagreements with Carl Shulman 03:01:53 - Long TSMC but not NVIDIA

SPEAKERS

  • Dwarkesh Patel

    host
  • Paul Christiano

    guest
  • Narrator

    other

EPISODE SUMMARY

In this episode of Dwarkesh Podcast, featuring Dwarkesh Patel and Paul Christiano, Paul Christiano — Preventing an AI takeover explores paul Christiano on timelines, AI coups, and real alignment work Paul Christiano discusses how advanced AI could reshape economics, war, and governance, emphasizing that the most likely failure mode is a gradual handover of real-world control to opaque AI systems rather than a sudden, sci‑fi style ‘escape’.

RELATED EPISODES

The better AI gets, the smaller its share of the economy might get – Alex Imas and Phil Trammell

The better AI gets, the smaller its share of the economy might get – Alex Imas and Phil Trammell

What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang

What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang

How GPT, Claude, and Gemini are actually trained and served – Reiner Pope

How GPT, Claude, and Gemini are actually trained and served – Reiner Pope

Jensen Huang – Will Nvidia’s moat persist?

Jensen Huang – Will Nvidia’s moat persist?

Terence Tao – How the world’s top mathematician uses AI

Terence Tao – How the world’s top mathematician uses AI

Elon Musk – "In 36 months, the cheapest place to put AI will be space”

Elon Musk – "In 36 months, the cheapest place to put AI will be space”

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.