No Priors Ep. 113 | With OpenAI's Eric Mitchell and Brandon McKinzie
At a glance
WHAT IT’S REALLY ABOUT
OpenAI’s o3: Tool-Using Reasoning Model Redefines Deep, Steerable AI
- The episode explores OpenAI’s o3 reasoning model with researchers Eric Mitchell and Brandon McKinzie, focusing on how it ‘thinks before responding’ and uses tools to handle complex, multi-step tasks. They explain that o3 is trained heavily with reinforcement learning to solve hard problems, allocate compute at test time, and orchestrate tools like browsing and code execution. The conversation covers product tradeoffs between speed and depth, steerability for end users versus developers, and why tool use dramatically improves test-time scaling, especially in vision and coding. They also discuss future directions such as computer use, robotics, multi-agent collaboration, better evals, and how AI can accelerate AI research itself.
IDEAS WORTH REMEMBERING
5 ideas
Reasoning models benefit from ‘thinking time’ and dynamic compute allocation.
o3 can spend more compute at inference time to reason step by step, and empirical curves show that letting it think longer typically yields higher accuracy, especially on hard problems.
Tool use turns language models into higher-level agents rather than text generators.
By browsing, writing and running code, and iterating on results, o3 can autonomously decompose tasks like due diligence or research into sequences of tool calls, making its ‘thinking tokens’ far more productive.
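The orchestration pattern described here can be pictured as a loop in which the model alternates between reasoning and tool calls until it can answer. The sketch below is purely illustrative, with mock `browse` and `run_code` tools and a pre-scripted plan standing in for the model's own decisions; it is not OpenAI's implementation.

```python
# Hypothetical sketch of a tool-orchestration loop. Mock tools stand in
# for real browsing and sandboxed code execution.

def browse(query: str) -> str:
    # Mock search tool: returns a canned snippet instead of live results.
    return f"search results for: {query}"

def run_code(code: str) -> str:
    # Mock sandbox: evaluates a simple arithmetic expression only.
    return str(eval(code, {"__builtins__": {}}))

TOOLS = {"browse": browse, "run_code": run_code}

def agent_loop(task: str, plan: list) -> str:
    """Follow a plan of (tool, argument) steps, then answer.

    A real reasoning model would choose each step itself based on prior
    observations; here the plan is fixed so the sketch stays deterministic.
    """
    observations = []
    for tool_name, arg in plan:
        result = TOOLS[tool_name](arg)   # call the tool
        observations.append(result)      # feed the result back into context
    return f"Answer to '{task}' via {len(observations)} tool calls: {observations[-1]}"

print(agent_loop(
    "What is 17 * 23?",
    plan=[("browse", "17 * 23"), ("run_code", "17 * 23")],
))  # final observation is the computed value 391
```

The key property the episode highlights is visible even in this toy: the model's own tokens coordinate the work, while each concrete sub-task is deferred to whichever tool is best suited to it.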
Reinforcement learning on difficult, tool-based tasks is central to o3’s training.
Instead of only next-token prediction, o3 is optimized via RL to solve challenging, long-horizon tasks, learn when to call tools, and manage uncertainty and multi-step workflows.
Model steerability will matter as much as raw capability.
Users and API developers need to specify constraints like latency, cost, and depth of analysis; the vision is models that understand context (e.g., ‘this is an API call, be fast’) and adjust their reasoning budget accordingly.
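One way to picture this kind of steerability is a policy that maps request context to a reasoning budget. The field names, thresholds, and token counts below are entirely hypothetical, chosen only to illustrate the idea of context-dependent compute allocation; they do not correspond to any actual OpenAI API.

```python
# Hypothetical policy mapping request context to a thinking-token budget.
# All field names and numbers are illustrative, not a real API.

def reasoning_budget(context: dict) -> int:
    """Return a maximum thinking-token budget for a request."""
    if context.get("surface") == "api" and context.get("latency_sensitive"):
        return 256       # "this is an API call, be fast"
    if context.get("depth") == "deep_analysis":
        return 32_768    # the caller explicitly asked for thorough work
    return 4_096         # default middle ground

print(reasoning_budget({"surface": "api", "latency_sensitive": True}))  # 256
print(reasoning_budget({"depth": "deep_analysis"}))                     # 32768
```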
Browsing and vision tools sharply improve test-time scaling and reliability.
For images and current information, o3 can recognize its own uncertainty and then act (e.g., crop, zoom, search) to reduce that uncertainty, leading to much steeper performance gains with more thinking time compared to closed-book reasoning.
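The act-to-reduce-uncertainty pattern can be sketched as a simple control loop: estimate confidence, and if it falls below a threshold, take a tool action (here, zooming) and look again. Both functions below are mocks invented for illustration; only the control flow reflects the idea from the episode.

```python
# Mock sketch of uncertainty-driven tool use on an image:
# if confidence is low, zoom in and re-examine the region.

def classify(image: str, zoom: int = 1) -> tuple[str, float]:
    # Mock vision model: confidence improves as we zoom into the region.
    confidence = min(0.3 * zoom, 0.95)
    return f"label@{zoom}x", confidence

def answer_about_image(image: str, threshold: float = 0.8, max_zoom: int = 4) -> str:
    zoom = 1
    label, conf = classify(image, zoom)
    while conf < threshold and zoom < max_zoom:
        zoom *= 2                         # act: crop/zoom to reduce uncertainty
        label, conf = classify(image, zoom)
    return label

print(answer_about_image("street_sign.png"))  # "label@4x" once confidence passes 0.8
```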
WORDS WORTH SAVING
5 quotes
o3 is focused on thinking carefully before it responds, and these models are in some vaguely general sense smarter than models that don’t think before they respond.
— Eric Mitchell
You can feel this when you’re talking to o3… the longer it thinks, I really get the impression that I’m going to get a better result.
— Brandon McKinzie
You can just allocate compute a lot more efficiently because you can defer stuff that the model doesn’t have comparative advantage at to a tool that is really well suited to doing that thing.
— Eric Mitchell
It kind of drives me crazy in some sense that our models are not already just on my computer all day, watching what I’m doing… I hate typing.
— Brandon McKinzie
Evaluating the capabilities of a generally capable agent is really hard to do in a rigorous way… evals are a little underappreciated.
— Eric Mitchell
High quality AI-generated summary created from speaker-labeled transcript.