Dwarkesh Podcast: John Schulman (OpenAI Cofounder) — Reasoning, RLHF, & Plan for 2027 AGI
At a glance
WHAT IT’S REALLY ABOUT
OpenAI Cofounder John Schulman on Training, Alignment, and Near-Term AGI
- John Schulman explains how large language models are first pre-trained to imitate the internet and then post-trained (via RLHF and related methods) into helpful, safe assistants with a narrower, chat-focused persona.
- He expects rapid capability gains over the next few years, including long-horizon task execution, better coding and research assistance, and agents that can work on entire projects rather than single prompts.
- Schulman discusses alignment and safety plans if AGI arrives sooner than expected, emphasizing careful evaluation, possible pauses, and coordination among major labs to avoid unsafe racing dynamics.
- He also details how RLHF actually works in practice, why post-training has dramatically improved GPT‑4 since launch, and how future systems may learn online, use richer memory, and act more like persistent colleagues than search engines.
IDEAS WORTH REMEMBERING
5 ideas
Pre-training creates a broad, calibrated world-model; post-training shapes it into a specific assistant persona.
Base models learn to predict the next token across internet-scale data and can imitate many styles, while RLHF and fine-tuning narrow this into a helpful, instruction-following chatbot optimized for human approval rather than raw imitation.
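As a concrete anchor for that pre-training objective, here is a minimal sketch of next-token prediction with a cross-entropy loss. It assumes a PyTorch-style `model` that maps token ids to per-position vocabulary logits; the function name and shapes are illustrative, not OpenAI's code.

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """token_ids: LongTensor of shape (batch, seq_len).

    `model` is assumed to return logits of shape (batch, seq_len - 1, vocab_size)
    for the given input ids; this is a sketch of the objective, not a real API.
    """
    inputs = token_ids[:, :-1]    # context the model conditions on
    targets = token_ids[:, 1:]    # each position's "next token" label
    logits = model(inputs)
    # Average cross-entropy over every predicted position in the batch
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```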
Near-term models will likely handle full projects, not just single prompts.
Schulman anticipates that within a couple of years models will write multi-file codebases, test, iterate, and recover from errors, enabling sustained collaboration on tasks that today require human project management.
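To make that workflow concrete, here is a hedged sketch of the generic write/test/iterate loop such project-level collaboration implies. `propose_patch`, `apply_patch`, and `run_tests` are hypothetical caller-supplied hooks for illustration, not an API Schulman describes.

```python
def work_on_project(task, propose_patch, apply_patch, run_tests, max_rounds=10):
    """Generic agentic coding loop: draft a change, run the tests, feed failures back.

    All three callables are hypothetical stand-ins, supplied by the caller.
    """
    feedback = None
    for _ in range(max_rounds):
        patch = propose_patch(task, feedback)  # model drafts a multi-file edit
        apply_patch(patch)                     # write it into the working tree
        passed, errors = run_tests()           # compile and run the test suite
        if passed:
            return patch                       # tests pass: task complete
        feedback = errors                      # model sees its failures next round
    raise RuntimeError("no passing solution within the retry budget")
```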
Long-horizon training could unlock big jumps, but won’t alone solve AGI.
Extending coherence over hours to months via long-horizon RL may trigger phase-like transitions in capability, yet he expects remaining deficits—taste, ambiguity-handling, UI/real-world affordances—that still limit fully human-level performance.
Alignment will likely require incremental deployment, strong evaluations, and cross-lab coordination.
If AGI arrives earlier than expected, Schulman favors slowing new training and large-scale deployment, running intensive evals and red-teaming, monitoring deployed systems, and coordinating limits across big labs to avoid unsafe races.
Post-training is a major performance lever and a potential moat.
Most of GPT‑4’s improvement since launch (e.g., gains in chatbot Elo ratings) comes from better post-training—data quality, iterative RLHF, and process improvements—which requires complex infrastructure, tacit organizational knowledge, and skilled raters.
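The "iterative RLHF" lever starts from a reward model trained on pairwise rater comparisons. Below is a minimal sketch of the standard pairwise (Bradley–Terry-style) loss, assuming a `reward_model` that returns one scalar reward per response; it illustrates the technique, not OpenAI's actual code.

```python
import torch.nn.functional as F

def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    """Pairwise preference loss: push the reward of the rater-preferred
    response above the rejected one.

    `reward_model` is an assumed callable mapping a batch of token-id
    sequences to a (batch,) tensor of scalar rewards.
    """
    r_chosen = reward_model(chosen_ids)
    r_rejected = reward_model(rejected_ids)
    # -log sigmoid(r_chosen - r_rejected) is minimized when the
    # chosen response consistently outscores the rejected one
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```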
WORDS WORTH SAVING
5 quotes
I think even in one or two years, you could imagine having the models carry out a whole coding project… moving away from using the model like a search engine and more towards having a whole project that I'm doing in collaboration with the model.
— John Schulman
We might not wanna jump to having AIs run whole firms immediately, even if the models are good enough to actually run a successful business themselves.
— John Schulman
It seems like then you should be planning for the possibility you would have AGI very soon…
— Dwarkesh Patel
Yeah, I think that would be reasonable.
— John Schulman
Right now it's hard to get the models to do anything coherent. But if they started to get really good, I think we would have to take some of these questions seriously.
— John Schulman