No Priors Ep. 113 | With OpenAI's Eric Mitchell and Brandon McKinzie
At a glance
WHAT IT’S REALLY ABOUT
OpenAI’s o3: Tool-Using Reasoning Model Redefines Deep, Steerable AI
- The episode explores OpenAI’s o3 reasoning model with researchers Eric Mitchell and Brandon McKinzie, focusing on how it ‘thinks before responding’ and uses tools to handle complex, multi-step tasks. They explain that o3 is trained heavily with reinforcement learning to solve hard problems, allocate compute at test time, and orchestrate tools like browsing and code execution. The conversation covers product tradeoffs between speed and depth, steerability for end users versus developers, and why tool use dramatically improves test-time scaling, especially in vision and coding. They also discuss future directions such as computer use, robotics, multi-agent collaboration, better evals, and how AI can accelerate AI research itself.
IDEAS WORTH REMEMBERING
5 ideas
Reasoning models benefit from ‘thinking time’ and dynamic compute allocation.
o3 can spend more compute at inference time to reason step by step, and empirical curves show that letting it think longer typically yields higher accuracy, especially on hard problems.
Tool use turns language models into higher-level agents rather than text generators.
By browsing, writing and running code, and iterating on results, o3 can autonomously decompose tasks like due diligence or research into sequences of tool calls, making its ‘thinking tokens’ far more productive.
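The orchestration pattern described here can be pictured as a loop in which the model alternates between reasoning and tool calls until it can answer. The sketch below is purely illustrative, with mock `browse` and `run_code` tools and a pre-scripted plan standing in for the model's own decisions; it is not OpenAI's implementation.

```python
# Hypothetical sketch of a tool-orchestration loop. Mock tools stand in
# for real browsing and sandboxed code execution.

def browse(query: str) -> str:
    # Mock search tool: returns a canned snippet instead of live results.
    return f"search results for: {query}"

def run_code(code: str) -> str:
    # Mock sandbox: evaluates a simple arithmetic expression only.
    return str(eval(code, {"__builtins__": {}}))

TOOLS = {"browse": browse, "run_code": run_code}

def agent_loop(task: str, plan: list) -> str:
    """Follow a plan of (tool, argument) steps, then answer.

    A real reasoning model would choose each step itself based on prior
    observations; here the plan is fixed so the sketch stays deterministic.
    """
    observations = []
    for tool_name, arg in plan:
        result = TOOLS[tool_name](arg)   # call the tool
        observations.append(result)      # feed the result back into context
    return f"Answer to '{task}' via {len(observations)} tool calls: {observations[-1]}"

print(agent_loop(
    "What is 17 * 23?",
    plan=[("browse", "17 * 23"), ("run_code", "17 * 23")],
))  # final observation is the computed value 391
```

The key property the episode highlights is visible even in this toy: the model's own tokens coordinate the work, while each concrete sub-task is deferred to whichever tool is best suited to it.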
Reinforcement learning on difficult, tool-based tasks is central to o3’s training.
Instead of only next-token prediction, o3 is optimized via RL to solve challenging, long-horizon tasks, learn when to call tools, and manage uncertainty and multi-step workflows.
Model steerability will matter as much as raw capability.
Users and API developers need to specify constraints like latency, cost, and depth of analysis; the vision is models that understand context (e.g., ‘this is an API call, be fast’) and adjust their reasoning budget accordingly.
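One way to picture this kind of steerability is a policy that maps request context to a reasoning budget. The field names, thresholds, and token counts below are entirely hypothetical, chosen only to illustrate the idea of context-dependent compute allocation; they do not correspond to any actual OpenAI API.

```python
# Hypothetical policy mapping request context to a thinking-token budget.
# All field names and numbers are illustrative, not a real API.

def reasoning_budget(context: dict) -> int:
    """Return a maximum thinking-token budget for a request."""
    if context.get("surface") == "api" and context.get("latency_sensitive"):
        return 256       # "this is an API call, be fast"
    if context.get("depth") == "deep_analysis":
        return 32_768    # the caller explicitly asked for thorough work
    return 4_096         # default middle ground

print(reasoning_budget({"surface": "api", "latency_sensitive": True}))  # 256
print(reasoning_budget({"depth": "deep_analysis"}))                     # 32768
```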
Browsing and vision tools sharply improve test-time scaling and reliability.
For images and current information, o3 can recognize its own uncertainty and then act (e.g., crop, zoom, search) to reduce that uncertainty, leading to much steeper performance gains with more thinking time compared to closed-book reasoning.
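The act-to-reduce-uncertainty pattern can be sketched as a simple control loop: estimate confidence, and if it falls below a threshold, take a tool action (here, zooming) and look again. Both functions below are mocks invented for illustration; only the control flow reflects the idea from the episode.

```python
# Mock sketch of uncertainty-driven tool use on an image:
# if confidence is low, zoom in and re-examine the region.

def classify(image: str, zoom: int = 1) -> tuple[str, float]:
    # Mock vision model: confidence improves as we zoom into the region.
    confidence = min(0.3 * zoom, 0.95)
    return f"label@{zoom}x", confidence

def answer_about_image(image: str, threshold: float = 0.8, max_zoom: int = 4) -> str:
    zoom = 1
    label, conf = classify(image, zoom)
    while conf < threshold and zoom < max_zoom:
        zoom *= 2                         # act: crop/zoom to reduce uncertainty
        label, conf = classify(image, zoom)
    return label

print(answer_about_image("street_sign.png"))  # "label@4x" once confidence passes 0.8
```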
WORDS WORTH SAVING
5 quotes
o3 is focused on thinking carefully before it responds, and these models are in some vaguely general sense smarter than models that don’t think before they respond.
— Eric Mitchell
You can feel this when you’re talking to o3… the longer it thinks, I really get the impression that I’m going to get a better result.
— Brandon McKinzie
You can just allocate compute a lot more efficiently because you can defer stuff that the model doesn’t have comparative advantage at to a tool that is really well suited to doing that thing.
— Eric Mitchell
It kind of drives me crazy in some sense that our models are not already just on my computer all day, watching what I’m doing… I hate typing.
— Brandon McKinzie
Evaluating the capabilities of a generally capable agent is really hard to do in a rigorous way… evals are a little underappreciated.
— Eric Mitchell
High quality AI-generated summary created from speaker-labeled transcript.