At a glance
WHAT IT’S REALLY ABOUT
Inside OpenAI Deep Research: Building a Broadly Capable Research Agent
- The episode features Isa Fulford discussing Deep Research, OpenAI’s agentic product that uses reinforcement learning, web browsing, and tools to perform complex, multi-step research tasks. She explains how the team moved from an internal demo to a production system by designing new datasets, tools, and evaluations grounded in real-world knowledge work. The conversation covers when reinforcement fine-tuning (RFT) is worth doing, how human experts and synthetic data shaped the model’s capabilities, and how Deep Research is already used across domains from science to fashion and travel. Fulford also outlines the path toward unified, trustworthy agents that can both research and take actions, along with the safety, memory, and UX challenges that must be solved first.
IDEAS WORTH REMEMBERING
5 ideas
Ground agent training in concrete, high-value user tasks.
Rather than focusing on flashy transactional demos (like ordering food), the team started from real knowledge-work tasks—literature reviews, product comparisons, travel planning—and built datasets and evals specifically around those practical outcomes.
Use reinforcement fine-tuning when tasks are critical, niche, or out-of-distribution.
RFT is most worth the effort when a task is either central to your business and needs a substantial quality boost, or so different from a model’s training distribution (e.g., specialized genomics workflows) that prompting alone cannot reach acceptable performance.
Human experts plus synthetic data are key to high-quality agent behavior.
Deep Research relied on expert-generated browsing trajectories and outcomes across many domains, supplemented with synthetic datasets, to teach the model what good research looks like without hand-specifying every step of the process.
Tooling must evolve alongside models to unlock richer capabilities.
Today’s Deep Research uses a text-based browser (with PDFs and images) and Python for analysis; future agents will need wider toolsets and training data that force them to choose, combine, and backtrack across tools to solve complex, multi-step problems.
Preventing and exposing hallucinations remains critical, especially as answers get longer.
Even though Deep Research hallucinates less than previous OpenAI models, comprehensive, well-written answers can be over-trusted, so features like citations are essential to let users inspect and verify sources.
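The "choose, combine, and backtrack across tools" idea above can be pictured as a dispatch loop that records a trace of every tool call so failures can be inspected and retried. This is a minimal illustrative sketch under assumed names (`search_tool`, `python_tool`, `run_agent` are hypothetical), not Deep Research's actual architecture:

```python
# Illustrative sketch of a tool-selecting agent loop.
# Tool names and the trace format are assumptions for this example,
# not OpenAI's API or Deep Research's real design.

def search_tool(query: str) -> str:
    """Stand-in for a text-based browser search."""
    return f"results for: {query}"

def python_tool(expression: str):
    """Stand-in for a Python analysis step (restricted eval for the demo)."""
    return eval(expression, {"__builtins__": {}}, {})

TOOLS = {"search": search_tool, "python": python_tool}

def run_agent(steps):
    """Execute a planned sequence of (tool, argument) steps, keeping a
    full trace so a failed step can be flagged and replanned."""
    trace = []
    for tool_name, arg in steps:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # A real agent would backtrack and replan here.
            trace.append((tool_name, arg, "unknown tool"))
            continue
        try:
            trace.append((tool_name, arg, tool(arg)))
        except Exception as exc:
            trace.append((tool_name, arg, f"error: {exc}"))
    return trace

trace = run_agent([("search", "RFT for genomics"), ("python", "2 + 2")])
```

The trace-keeping is the point: training data that forces multi-step tool use only helps if the system can see where a trajectory went wrong.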
WORDS WORTH SAVING
5 quotes
If you can't write a literature review, you're not gonna be able to write a new scientific paper.
— Isa Fulford
We just are gonna go for max thinking time every time.
— Isa Fulford
Everybody does kind of see a pretty clear path to this broadly capable agent.
— Isa Fulford
Anything that you would delegate to a coworker, it should be able to do.
— Isa Fulford
It really was one of those things where we thought that training on browsing tasks would work… but actually the first time you train a model on a new dataset and seeing it actually working was pretty incredible.
— Isa Fulford
High-quality AI-generated summary created from a speaker-labeled transcript.