Dwarkesh Podcast

John Schulman (OpenAI Cofounder) — Reasoning, RLHF, & plan for 2027 AGI

John Schulman on how post-training tames the shoggoth, and the nature of the progress to come...

EPISODE LINKS
* Apple Podcasts: https://podcasts.apple.com/us/podcast/john-schulman-openai-cofounder-reasoning-rlhf-plan/id1516093381?i=1000655679622
* Spotify: https://open.spotify.com/episode/1ivzHH9RWciXe4O1rKtldf?si=53503781e05f4d8f
* Transcript: https://www.dwarkeshpatel.com/p/john-schulman/
* Me on Twitter: https://twitter.com/dwarkesh_sp/

SPONSOR
* CommandBar is an AI user assistant that any software product can embed to non-annoyingly assist, support, and unleash their users. Used by forward-thinking CX, product, growth, and marketing teams. Learn more at https://www.commandbar.com/

If you’re interested in advertising on the podcast, fill out this form: https://airtable.com/appxGOvFLDLP5dlzv/pagFVrbHRohW6F2bZ/form

TIMESTAMPS
00:00:00 - Pre-training, post-training, and future capabilities
00:17:20 - Plan for AGI 2025
00:29:43 - Teaching models to reason
00:40:10 - The Road to ChatGPT
00:51:33 - What makes for a good RL researcher?
01:00:18 - Keeping humans in the loop
01:14:36 - State of research, plateaus, and moats

John Schulman (guest) · Dwarkesh Patel (host)
May 15, 2024 · 1h 35m

Episode Details

EPISODE INFO

Released
May 15, 2024
Duration
1h 35m
Channel
Dwarkesh Podcast

SPEAKERS

  • John Schulman (guest)
  • Dwarkesh Patel (host)

EPISODE SUMMARY

In this episode of Dwarkesh Podcast, OpenAI cofounder John Schulman talks with Dwarkesh Patel about training, alignment, and near-term AGI. Schulman explains how large language models are first pre-trained to imitate the internet and then post-trained (via RLHF and related methods) into helpful, safe assistants with a narrower, chat-focused persona.

RELATED EPISODES

Machiavelli is the most misunderstood thinker of all time – Ada Palmer

The better AI gets, the smaller its share of the economy might get – Alex Imas and Phil Trammell

What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang

How GPT, Claude, and Gemini are actually trained and served – Reiner Pope

Jensen Huang – Will Nvidia’s moat persist?

Terence Tao – How the world’s top mathematician uses AI
