Dwarkesh Podcast

John Schulman (OpenAI Cofounder) — Reasoning, RLHF, & plan for 2027 AGI

John Schulman on how posttraining tames the shoggoth, and the nature of the progress to come...

EPISODE LINKS

  • Apple Podcasts: https://podcasts.apple.com/us/podcast/john-schulman-openai-cofounder-reasoning-rlhf-plan/id1516093381?i=1000655679622
  • Spotify: https://open.spotify.com/episode/1ivzHH9RWciXe4O1rKtldf?si=53503781e05f4d8f
  • Transcript: https://www.dwarkeshpatel.com/p/john-schulman/
  • Me on Twitter: https://twitter.com/dwarkesh_sp/

SPONSOR

  • CommandBar is an AI user assistant that any software product can embed to non-annoyingly assist, support, and unleash their users. Used by forward-thinking CX, product, growth, and marketing teams. Learn more at https://www.commandbar.com/

If you’re interested in advertising on the podcast, fill out this form: https://airtable.com/appxGOvFLDLP5dlzv/pagFVrbHRohW6F2bZ/form

TIMESTAMPS

  • 00:00:00 - Pre-training, post-training, and future capabilities
  • 00:17:20 - Plan for AGI 2025
  • 00:29:43 - Teaching models to reason
  • 00:40:10 - The Road to ChatGPT
  • 00:51:33 - What makes for a good RL researcher?
  • 01:00:18 - Keeping humans in the loop
  • 01:14:36 - State of research, plateaus, and moats

John Schulman (guest), Dwarkesh Patel (host)
May 15, 2024 · 1h 35m

Episode Details

EPISODE INFO

Released: May 15, 2024
Duration: 1h 35m
Channel: Dwarkesh Podcast

SPEAKERS

  • John Schulman (guest)
  • Dwarkesh Patel (host)

EPISODE SUMMARY

In this episode of Dwarkesh Podcast, Dwarkesh Patel talks with OpenAI cofounder John Schulman about training, alignment, and near-term AGI. Schulman explains how large language models are first pre-trained to imitate content from the internet and then post-trained (via RLHF and related methods) into helpful, safe assistants with a narrower, chat-focused persona.
