a16z
From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
CHAPTERS
Why GPT-5: Bringing reasoning and agentic behavior to the default experience
Mark and Jakub frame GPT-5 as an effort to make “reasoning models” mainstream, reducing the need for users to choose between fast responses and deep thinking modes. They discuss unifying prior model lines and automatically calibrating how much “thinking time” a prompt needs.
Measuring progress when classic benchmarks saturate: evals, competitions, and new milestones
The conversation turns to how OpenAI evaluates progress as many longstanding benchmarks approach ceiling performance. They argue that with reinforcement learning and targeted training, single eval scores can be misleading, so the field needs new measures tied to discovery and economic relevance.
Unexpected GPT-5 capabilities: hard science usefulness and “months of student work” automation
Mark and Jakub describe moments when GPT-5 (and, earlier, o3) crossed a threshold of practical usefulness, especially for math and physics reasoning. They cite “light bulb” reactions from professional scientists and discuss the models’ growing trustworthiness for derivations and technical work.
Roadmap to an automated researcher: extending time horizons, memory, and autonomous operation
Jakub lays out OpenAI’s north star: an automated researcher that can discover new ideas in ML and other sciences. Progress is framed as extending the time horizon of coherent reasoning—from hours today toward much longer planning, memory, and reliable autonomy.
Agency vs quality trade-offs: planning depth, stability, and why reasoning is the backbone
They address the observed tension where adding tools/steps can degrade quality, especially late in long chains. The guests connect “depth” and “stability” as the same core challenge: staying on-track over long horizons, with reasoning enabling course-correction under feedback.
Beyond verifiable tasks: tackling open-ended domains and redefining what “open-ended” means
Jakub argues that as problems get longer-horizon—even if well-defined—they become more open-ended in practice because they require choosing fields, programs, and directions. They discuss creative writing as another “extreme,” and how research ultimately forces work in less verifiable spaces.
Why reinforcement learning keeps paying off: combining pretraining’s world model with RL objectives
Jakub explains RL’s effectiveness as a versatile optimization layer once you have strong pretrained language models as a rich environment. He recounts OpenAI’s early RL roots, the difficulty of defining environments, and how language modeling enabled RL to operate in nuanced human contexts.
Reward modeling & enterprise mindset: expect rapid simplification and shifting best practices
Asked how non-RL experts should approach reward modeling, Jakub emphasizes that the tooling and best practices are evolving quickly. He suggests adopting an adaptive mindset—what’s hard and bespoke today may become simpler and more “human-like learning” over time.
GPT-5 Codex and real-world coding: messy environments, behavior specs, and latency presets
Mark describes Codex’s focus: turning raw reasoning intelligence into practical coding help in messy real-world repos. They highlight behavior tuning (proactivity, style, “laziness”) and better presets that spend less time on easy tasks and more time on hard ones.
From “vibe coding” to “vibe researching”: how AI changes the default way people build
They reflect on a Lee Sedol/AlphaGo-style moment for coding: watching models surpass one’s own coding ability feels formative and expands what seems possible. Mark shares that younger users already treat “vibe coding” as the default, and he hopes research will follow the same pattern.
What makes a great researcher: persistence, honest hypothesis testing, taste, and emotional management
Jakub and Mark outline traits of strong researchers: persistence through frequent failure, clear hypotheses, and truth-seeking honesty. Mark adds that experience builds problem selection instincts and the emotional skill to persevere—or pivot—over long timelines.
How breakthroughs happen internally: finding “bugs” in code and in mental models
Jakub gives a grounded view of progress: many pivotal moments come from discovering subtle bugs—either in software that invalidates experiments or in flawed assumptions. Fixing these can unlock stalled research programs and reshape thinking from first principles.
Building a resilient research culture: mission focus, protecting fundamental research, and diverse researcher archetypes
Mark emphasizes that OpenAI’s retention and resilience come from doing frontier fundamental research rather than copying competitors. They discuss hiring for people who’ve solved hard problems (often outside ML), supporting varied research styles, and protecting time/space for core algorithmic work.
Strategy, compute, and scale: portfolio allocation, staying flexible, and trust at the top
They discuss resource allocation as a dynamic portfolio problem in which compute is the key bottleneck, and ending up “second place at everything” is the risk of failing to prioritize. The closing reflects on OpenAI’s ability to keep learning at scale, and on the trust between Mark and Jakub forged during the early reasoning and RL efforts.