
John Schulman (OpenAI Cofounder) — Reasoning, RLHF, & plan for 2027 AGI
John Schulman (guest), Dwarkesh Patel (host)
In this episode of the Dwarkesh Podcast, Dwarkesh Patel interviews OpenAI cofounder John Schulman about reasoning, RLHF, and the prospect of near-term AGI.
OpenAI Cofounder John Schulman on Training, Alignment, and Near-Term AGI
John Schulman explains how large language models are first pre-trained to imitate the internet and then post-trained (via RLHF and related methods) into helpful, safe assistants with a narrower, chat-focused persona.
He expects rapid capability gains over the next few years, including long-horizon task execution, better coding and research assistance, and agents that can work on entire projects rather than single prompts.
Schulman discusses alignment and safety plans if AGI arrives sooner than expected, emphasizing careful evaluation, possible pauses, and coordination among major labs to avoid unsafe racing dynamics.
He also details how RLHF actually works in practice, why post-training has dramatically improved GPT‑4 since launch, and how future systems may learn online, use richer memory, and act more like persistent colleagues than search engines.
Key Takeaways
Pre-training creates a broad, calibrated world-model; post-training shapes it into a specific assistant persona.
Base models learn to predict the next token across internet-scale data and can imitate many styles, while RLHF and fine-tuning narrow this into a helpful, instruction-following chatbot optimized for human approval rather than raw imitation.
Near-term models will likely handle full projects, not just single prompts.
Schulman anticipates that within a couple of years models will write multi-file codebases, test, iterate, and recover from errors, enabling sustained collaboration on tasks that today require human project management.
Long-horizon training could unlock big jumps, but won’t alone solve AGI.
Extending coherence over hours to months via long-horizon RL may trigger phase-like transitions in capability, yet he expects remaining deficits—taste, ambiguity-handling, UI/real-world affordances—that still limit fully human-level performance.
Alignment will likely require incremental deployment, strong evaluations, and cross-lab coordination.
If AGI arrives earlier than expected, Schulman favors slowing new training and large-scale deployment, running intensive evals and red-teaming, monitoring deployed systems, and coordinating limits across big labs to avoid unsafe races.
Post-training is a major performance lever and a potential moat.
Most of GPT‑4’s improvement since launch has come from post-training rather than from new pre-training, making post-training expertise a significant competitive advantage.
Models already show striking generalization across languages, modalities, and capabilities.
Fine-tuning on English often transfers to other languages, text-only fine-tuning improves image behavior, and a tiny number of corrective examples (dozens) can fix broad misbehavior like falsely claiming to send emails, indicating strong transfer from pre-training.
Future assistants will be persistent, proactive collaborators embedded in workflows.
Schulman expects a shift from one-off Q&A toward agents that track whole projects, understand your screen and tools, act in the background, and proactively suggest next steps—while humans ideally remain the primary drivers of goals.
Notable Quotes
“I think even in one or two years, you could imagine having the models carry out a whole coding project… moving away from using the model like a search engine and more towards having a whole project that I'm doing in collaboration with the model.”
— John Schulman
“We might not wanna jump to having AIs run whole firms immediately, even if the models are good enough to actually run a successful business themselves.”
— John Schulman
“It seems like then you should be planning for the possibility you would have AGI very soon…”
— Dwarkesh Patel
“Yeah, I think that would be reasonable.”
— John Schulman
“Right now it's hard to get the models to do anything coherent. But if they started to get really good, I think we would have to take some of these questions seriously.”
— John Schulman
Questions Answered in This Episode
If long-horizon RL unlocks coherent year-long planning, what concrete new risks and failure modes does that introduce beyond today’s systems?
How can we empirically distinguish between genuine reasoning improvements and better pattern-matching or prompt-following in future models?
What mechanisms—technical or institutional—could realistically enforce a pause or slowdown if multiple global actors are racing toward AGI?
How should society decide whose values and preferences get encoded into powerful frontier models as they begin to influence key institutions and economies?
At what point, if ever, should advanced AI systems themselves be treated as stakeholders in alignment decisions, and how would we know when we’ve crossed that threshold?
Transcript Preview
I think even in one or two years, you could imagine having the models carry out a whole coding project, moving away from using the model like a search engine and more towards having a whole project that I'm doing in collaboration with the model. We might not wanna jump to having AIs run whole firms immediately, even if the models are good enough to actually run a successful business themselves.
If there are no other bottlenecks and next year or so you've got AGI, what's the plan? Today, I have the pleasure of speaking with John Schulman, one of the co-founders of OpenAI, who leads the post-training team here. He also led the creation of ChatGPT and is the author of many of the most important and widely cited papers in AI and RL, including PPO. So John, really excited to chat with you. Thanks for coming on the podcast.
Thanks for having me on the podcast. I'm a big fan.
Oh, thank you. Thank you for saying that. So the first question I had is: we have these distinctions between pre-training and post-training. Beyond what is actually happening in terms of loss functions and training regimes, I'm curious, taking a step back conceptually: what kind of thing is pre-training creating? What does post-training do on top of that?
In pre-training, you're basically training to imitate all of the content on the internet or on the web, including websites and code and so forth. So you get a model that can generate content that looks like random web pages from the internet. The model is also trained to maximize likelihood, where it has to put a probability on everything. The objective is basically predicting the next token given the previous tokens; tokens are words or parts of words. And since the model has to put a probability on each token, and we're training to maximize log probability, it ends up being very calibrated. So it can not only generate all of the content of the web, it can also assign probabilities to everything. The base model can effectively take on all of these different personas or generate all these different kinds of content. Then, when we do post-training, we're usually targeting a narrower range of behavior, where we basically want the model to behave like a chat assistant. It's a more specific persona: it's trying to be helpful, it's not trying to imitate a person, it's answering your questions or doing your tasks. And we're optimizing a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just trying to imitate raw content from the web.
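The pre-training objective Schulman describes here can be sketched in a few lines. This is a minimal, hypothetical illustration, not OpenAI's actual training code: the vocabulary, context, and probabilities are invented for the example, and a real model would produce the distribution from a neural network over tens of thousands of tokens.

```python
import math

def next_token_log_likelihood(predicted_probs, actual_next_token):
    """Log probability the model assigned to the token that actually occurred.

    Training maximizes this value (equivalently, minimizes its negative,
    the cross-entropy loss), which is why the model must spread calibrated
    probabilities over every possible next token.
    """
    return math.log(predicted_probs[actual_next_token])

# Hypothetical model output: a distribution over the next token
# after some context, e.g. "the cat sat on the".
predicted = {"mat": 0.6, "floor": 0.3, "moon": 0.1}

# If the web text actually continued with "mat", this token's
# negative log probability is the loss term for this position.
loss = -next_token_log_likelihood(predicted, "mat")
```

Summing this loss over every token position in internet-scale text gives the maximum-likelihood objective; because the model is penalized for misallocated probability mass everywhere, not just on its top guess, it ends up calibrated in the way Schulman notes.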