a16z | From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
At a glance
WHAT IT’S REALLY ABOUT
OpenAI leaders on GPT-5, evals, RL, and automated research vision
- GPT-5 is positioned as a unification move that brings “reasoning by default” to mainstream users, reducing confusion between instant-response GPT models and longer-thinking O-series models.
- OpenAI views traditional benchmarks as increasingly saturated and is shifting toward evals tied to long-horizon autonomy and economically relevant discovery rather than incremental percentage gains.
- The team highlights surprising GPT-5 gains in hard sciences, citing experiences where the model can automate work that might take human students months, and frames this as an early signal toward “automated researcher” capabilities.
- Reinforcement learning is described as a continuing engine of progress because it can be anchored to the rich “environment” created by language-model pretraining, enabling targeted capability improvements and extended reasoning reliability.
- They argue that sustaining frontier velocity requires protecting fundamental research from product pull, maintaining clear long-term objectives (automated researcher), and doing disciplined compute-driven portfolio prioritization.
IDEAS WORTH REMEMBERING
GPT-5’s product thesis is “reasoning without mode selection.”
They frame GPT-5 as removing the user burden of choosing between fast models and long-thinking models by automatically tuning “how much thinking” a prompt needs, making agentic behavior feel default rather than optional.
Benchmark saturation is pushing eval design toward real discovery and autonomy.
Moving from 96% to 98% on long-used tests matters less; they want evals that measure whether the model can operate autonomously for long periods and produce economically relevant novel outputs.
Competition results are treated as proxies for future research ability.
They cite IOI/AtCoder/IMO-style markers as meaningful because many elite human researchers share those backgrounds, even while acknowledging these too may eventually saturate.
Long-horizon agency and stability are the same core problem: consistency over time.
As models take more steps and use more tools, reliability depends on sustained reasoning, self-correction, and not “going off track” across long sequences rather than optimizing single-shot accuracy alone.
Hard-science usefulness is emerging as a “light bulb” moment for experts.
They describe GPT-5 Pro surprising physicists and mathematicians by producing nontrivial math/science help—sometimes compressing months of student effort—indicating practical research leverage beyond toy tasks.
WORDS WORTH SAVING
So the big thing that we are targeting with our research is producing an automated researcher, so automating the discovery of new ideas.
— Jakub Pachocki
I do feel like already it’s kind of transformed the default for coding. This past weekend, I was talking to some high schoolers and they were saying, “Oh, you know, actually the default way to code is vibe coding.”
— Mark Chen
I do think, you know, the future hopefully will be vibe researching.
— Mark Chen
Persistence is a very key trait. I think the special thing about research is that you’re trying to create something, or learn something, that is just not known. It’s not known to work.
— Jakub Pachocki
I think the danger is you end up second place at everything and not clearly leading at anything.
— Mark Chen
High quality AI-generated summary created from speaker-labeled transcript.