John Schulman (OpenAI Cofounder) — Reasoning, RLHF, & plan for 2027 AGI

Dwarkesh Podcast · May 15, 2024 · 1h 35m

John Schulman (guest), Dwarkesh Patel (host)

- Pre-training vs post-training (RLHF) and what each stage produces
- Long-horizon reasoning, project-level agents, and future capabilities
- AGI timelines, deployment strategies, and coordination/safety concerns
- Generalization, transfer, and sample efficiency in large models
- Design and effects of RLHF, preference modeling, and the ‘model spec’
- Economic and governance questions around AI-run firms and human oversight
- Schulman’s view on the research ecosystem, moats, raters, and model behavior

In this episode of the Dwarkesh Podcast, host Dwarkesh Patel talks with OpenAI cofounder John Schulman about reasoning, RLHF, and planning for near-term AGI.

OpenAI Cofounder John Schulman on Training, Alignment, and Near-Term AGI

John Schulman explains how large language models are first pre-trained to imitate the internet and then post-trained (via RLHF and related methods) into helpful, safe assistants with a narrower, chat-focused persona.

He expects rapid capability gains over the next few years, including long-horizon task execution, better coding and research assistance, and agents that can work on entire projects rather than single prompts.

Schulman discusses alignment and safety plans if AGI arrives sooner than expected, emphasizing careful evaluation, possible pauses, and coordination among major labs to avoid unsafe racing dynamics.

He also details how RLHF actually works in practice, why post-training has dramatically improved GPT‑4 since launch, and how future systems may learn online, use richer memory, and act more like persistent colleagues than search engines.

Key Takeaways

Pre-training creates a broad, calibrated world-model; post-training shapes it into a specific assistant persona.

Base models learn to predict the next token across internet-scale data and can imitate many styles, while RLHF and fine-tuning narrow this into a helpful, instruction-following chatbot optimized for human approval rather than raw imitation.
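
The pre-training objective described above (next-token prediction under maximum likelihood) can be sketched in a few lines. This is a toy illustration only: the vocabulary and logits are made-up placeholders standing in for a real language model, not anything from the episode.

```python
import math

# Toy sketch of the pre-training objective: maximize the log-probability
# the model assigns to the actual next token. Minimizing the negative
# log-likelihood over internet-scale data is what yields the calibrated
# probabilities Schulman describes.

VOCAB = ["the", "cat", "sat", "mat"]  # hypothetical tiny vocabulary

def softmax(logits):
    # Convert raw scores into a probability distribution over the vocab.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_nll(logits, target_index):
    """Negative log-likelihood of the true next token.
    The training loss is this quantity averaged over all tokens."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical logits the model emits after seeing "the cat":
logits = [0.1, 0.2, 2.0, 0.3]  # one score per entry in VOCAB
loss = next_token_nll(logits, VOCAB.index("sat"))
print(round(loss, 3))
```

In a real system the logits come from a transformer and the loss is averaged over billions of tokens, but the objective is the same shape.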

Near-term models will likely handle full projects, not just single prompts.

Schulman anticipates that within a couple of years models will write multi-file codebases, test, iterate, and recover from errors, enabling sustained collaboration on tasks that today require human project management.

Long-horizon training could unlock big jumps, but won’t alone solve AGI.

Extending coherence over hours to months via long-horizon RL may trigger phase-like transitions in capability, yet he expects remaining deficits—taste, ambiguity-handling, UI/real-world affordances—that still limit fully human-level performance.

Alignment will likely require incremental deployment, strong evaluations, and cross-lab coordination.

If AGI arrives earlier than expected, Schulman favors slowing new training and large-scale deployment, running intensive evals and red-teaming, monitoring deployed systems, and coordinating limits across big labs to avoid unsafe races.

Post-training is a major performance lever and a potential moat.

Most of GPT‑4’s improvement since launch has come from post-training rather than new pre-training runs, making refined RLHF pipelines both a major performance lever and a potential competitive moat.

Models already show striking generalization across languages, modalities, and capabilities.

Fine-tuning on English often transfers to other languages, text-only fine-tuning improves image behavior, and a tiny number of corrective examples (dozens) can fix broad misbehavior like falsely claiming to send emails, indicating strong transfer from pre-training.

Future assistants will be persistent, proactive collaborators embedded in workflows.

Schulman expects a shift from one-off Q&A toward agents that track whole projects, understand your screen and tools, act in the background, and proactively suggest next steps—while humans ideally remain the primary drivers of goals.

Notable Quotes

I think even in one or two years, you could imagine having the models carry out a whole coding project… moving away from using the model like a search engine and more towards having a whole project that I'm doing in collaboration with the model.

John Schulman

We might not wanna jump to having AIs run whole firms immediately, even if the models are good enough to actually run a successful business themselves.

John Schulman

It seems like then you should be planning for the possibility you would have AGI very soon…

Dwarkesh Patel

Yeah, I think that would be reasonable.

John Schulman

Right now it's hard to get the models to do anything coherent. But if they started to get really good, I think we would have to take some of these questions seriously.

John Schulman

Questions Answered in This Episode

If long-horizon RL unlocks coherent year-long planning, what concrete new risks and failure modes does that introduce beyond today’s systems?

John Schulman explains how large language models are first pre-trained to imitate the internet and then post-trained (via RLHF and related methods) into helpful, safe assistants with a narrower, chat-focused persona.

How can we empirically distinguish between genuine reasoning improvements and better pattern-matching or prompt-following in future models?

He expects rapid capability gains over the next few years, including long-horizon task execution, better coding and research assistance, and agents that can work on entire projects rather than single prompts.

What mechanisms—technical or institutional—could realistically enforce a pause or slowdown if multiple global actors are racing toward AGI?

Schulman discusses alignment and safety plans if AGI arrives sooner than expected, emphasizing careful evaluation, possible pauses, and coordination among major labs to avoid unsafe racing dynamics.

How should society decide whose values and preferences get encoded into powerful frontier models as they begin to influence key institutions and economies?

He also details how RLHF actually works in practice, why post-training has dramatically improved GPT‑4 since launch, and how future systems may learn online, use richer memory, and act more like persistent colleagues than search engines.

At what point, if ever, should advanced AI systems themselves be treated as stakeholders in alignment decisions, and how would we know when we’ve crossed that threshold?

Transcript Preview

John Schulman

I think even in one or two years, you could imagine having the models carry out a whole coding project, moving away from using the model of like a search engine and more towards having a whole project that I'm, like, doing in collaboration with the model. We might not wanna jump to having AIs run whole firms immediately, even if the models are good enough to actually run a successful business themselves.

Dwarkesh Patel

D- if there's no other bottlenecks next year or y- or something, you got AGI, wh- what- what's the plan? Today, I have the pleasure to speak with John Schulman, who is one of the co-founders of OpenAI and leads the post-training team here. And, um, he also led the creation of ChatGPT and is the author of many of the most important and widely cited papers in AI and RL, including PPO and many others. So John, really excited to chat with you. Thanks for coming on the podcast.

John Schulman

Thanks for having me on the podcast. I'm a big fan.

Dwarkesh Patel

Oh, thank you. Tha- thank you for saying that. Um, so the first question I had is, we have these distinctions between pre-training and post-training. Beyond what is actually happening in terms of loss functions and training regimes, I'm just curious, taking a step back conceptually, like, what kind of thing is pre-training creating? What does post-training do on top of that?

John Schulman

In pre-training, you're basically training to imitate all of the content on, on the internet or on the web, um, including websites and code and so forth. Uh, so you get a model that can basically, um, generate, uh, content that looks like, uh, random web pages from the internet. And, um, th- the model is also trained to maximize likelihood, where it has to put a probability on everything. So it's, um... The objective is, uh, basically predicting the next token given the previous tokens. Tokens are, like, words or parts of words. And, uh, since the model has to put a probability on it, uh, and it's, we're training with, um... to maximize log probability, it ends up being very calibrated. So it can not only generate all of this, uh, the content of the web, it can also assign probabilities to everything. So, so the base model can effectively take on all of these different personas or generate, um, all these different kinds of content. And then, uh, when we do post-training, uh, we're usually targeting a narrower, um, range of behavior where we basically want the model to behave like this kind of chat assistant. And, uh, it's a, it's a more specific persona where it's, um, th- trying to be helpful. It's not trying to imitate a person. It's, um, answering your questions or doing your tasks. Um, and, uh, we're optimizing on a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just trying to imitate, uh, this raw content from the web.
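
The post-training objective Schulman contrasts with imitation ("producing outputs that humans will like") is commonly implemented by first fitting a reward model on human preference comparisons. A minimal sketch of that pairwise (Bradley-Terry) loss follows; the reward values are hypothetical placeholders, since a real system would score completions with a learned model.

```python
import math

# Sketch of the pairwise preference loss often used to train RLHF
# reward models: push the reward of the human-preferred completion
# above the reward of the rejected one.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss on one human comparison. Lower is better:
    the loss shrinks as the preferred completion's reward margin grows."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))

# Hypothetical reward scores for two completions of the same prompt:
loss_good_margin = preference_loss(2.0, -1.0)   # preferred completion clearly ahead
loss_bad_margin = preference_loss(-0.5, 1.5)    # preferred completion behind
print(loss_good_margin < loss_bad_margin)
```

The reward model trained this way then supplies the optimization target for policy updates (e.g. with PPO, which Schulman authored), in place of the raw imitation objective used in pre-training.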
