No Priors Ep. 112 | With OpenAI Deep Research, Isa Fulford

No Priors · Apr 24, 2025 · 30m

Sarah Guo (host), Isa Fulford (guest), Narrator

Origin and goals of OpenAI’s Deep Research product
Reinforcement learning and reinforcement fine-tuning (RFT) for agentic behavior
Designing datasets, tools, and evaluations for browsing-based research
Human expert data, synthetic data, and generalization across domains
Safety, hallucinations, and guardrails for research and action-taking agents
Product evolution: private data, right actions, and unified agents
Real-world use cases and UX tradeoffs (depth vs. speed, memory, preferences)

Inside OpenAI Deep Research: Building a Broadly Capable Research Agent

The episode features Isa Fulford discussing Deep Research, OpenAI’s agentic product that uses reinforcement learning, web browsing, and tools to perform complex, multi-step research tasks. She explains how the team moved from an internal demo to a production system by designing new datasets, tools, and evaluations grounded in real-world knowledge work. The conversation covers when reinforcement fine-tuning (RFT) is worth doing, how human experts and synthetic data shaped the model’s capabilities, and how Deep Research is already used across domains from science to fashion and travel. Fulford also outlines the path toward unified, trustworthy agents that can both research and take actions, along with the safety, memory, and UX challenges that must be solved first.

Key Takeaways

Ground agent training in concrete, high-value user tasks.

Rather than focusing on flashy transactional demos (like ordering food), the team started from real knowledge-work tasks—literature reviews, product comparisons, travel planning—and built datasets and evals specifically around those practical outcomes.

Use reinforcement fine-tuning when tasks are critical, niche, or out-of-distribution.

RFT is most worth the effort when a task is either central to your business and needs a substantial quality boost, or so different from a model’s training distribution (e.g., …

Human experts plus synthetic data are key to high-quality agent behavior.

Deep Research relied on expert-generated browsing trajectories and outcomes across many domains, supplemented with synthetic datasets, to teach the model what good research looks like without hand-specifying every step of the process.

Tooling must evolve alongside models to unlock richer capabilities.

Today’s Deep Research uses a text-based browser (with PDFs and images) and Python for analysis; future agents will need wider toolsets and training data that force them to choose, combine, and backtrack across tools to solve complex, multi-step problems.

Preventing and exposing hallucinations remains critical, especially as answers get longer.

Even though Deep Research hallucinates less than previous OpenAI models, comprehensive, well-written answers can be over-trusted, so features like citations are essential to let users inspect and verify sources.

Agent memory and personalization will be essential for long, compound tasks.

As tasks expand from minutes to hours or days, agents must remember a user’s preferences and prior work instead of restarting each time, raising technical questions about context management and product questions about persistent memory and taste modeling.

Unified, coworker-like agents are plausible but safety and UX are hard blockers.

Fulford anticipates a single agent that can code, book travel, and manage work tasks, but stresses that issues like access to private data, unintended side effects, and how much users must confirm or override actions must be carefully resolved before broad deployment.

Notable Quotes

If you can't write a literature review, you're not gonna be able to write a new scientific paper.

Isa Fulford

We just are gonna go for max thinking time every time.

Isa Fulford

Everybody does kind of see a pretty clear path to this broadly capable agent.

Isa Fulford

Anything that you would delegate to a coworker, it should be able to do.

Isa Fulford

It really was one of those things where we thought that training on browsing tasks would work… but actually the first time you train a model on a new dataset and seeing it actually working was pretty incredible.

Isa Fulford

Questions Answered in This Episode

How can organizations outside major labs practically assemble the kind of expert data and evals needed to do effective reinforcement fine-tuning for their own agents?

What new failure modes might appear once Deep Research-style agents gain the ability to take real-world actions, not just read-only browsing?

How should users and companies decide what tasks to trust a unified agent with versus keeping in human hands, especially when private data and reputational risk are involved?

What are promising approaches to agent memory that preserve usefulness and personalization while maintaining strong privacy and safety guarantees?

How might the UX evolve so that users can implicitly communicate their time-vs-quality tradeoffs (e.g., ‘five minutes of your best work’) without complex settings or toggles?

Transcript Preview

Sarah Guo

(instrumental music plays) Hi, listeners, and welcome back to No Priors. Today, I'm joined by Isa Fulford, one of the pioneering minds behind OpenAI's Deep Research. This is a new agentic product that OpenAI released in February of this year, which uses reasoning and tools, like web browsing, to complete multi-step research tasks for you. Today, they're making it free to all US users. Welcome, Isa. Isa, thank you for doing this.

Isa Fulford

Thank you so much for having me.

Sarah Guo

You, uh, and your team have shipped, like, one of the most exciting AI products of late. Um, I use it a lot, Deep Research. Where, where did the idea come from? Tell me the, uh, origin story.

Isa Fulford

Yeah. So around a year ago now, we were very excited about the progress internally on, um, this new reinforcement learning algorithm. We were seeing a lot of progress on math problems and science problems and coding problems. And at the same time, I was working with, um, my friend Yash from... who works at OpenAI, on a few side projects and we're very interested in agents generally, and kind of wondered if we could apply the same algorithm to tasks that are maybe more, um, in line with what the average user would do every day. And so the two first things we were thinking about were online browsing tasks, because I think in a lot of different professions, people do just have to do a lot of research, synthesize a lot of information, and then come back with a, a report. And then we were also thinking about software engineering. We kind of have been working on those things. I've been focusing on, um, browsing. So to start, we kind of... with, with the math and coding problems that people were already training on, those datasets already exist. You know, you can have a math problem with a ground truth answer and you can train on those. Um, but for browsing, it's kind of more open-ended. You don't really have datasets like that that exist. So we really started by grounding, um, the research and what product use cases we actually wanted the final model to be good at. So we literally would write out just a list of things, like, "I hope the model that could find this list of products for me, um, and rank them by, like, re- these reviews from Reddit," or something like that. Or, "I want it to be able to write a literature review on this topic."

Sarah Guo

I feel like a lot of people when they think about, you know, browsing and agents, they land on the same, like, two, three transactional use cases that I actually don't think are particularly inspiring, right? So it tends to be like, order a burger on DoorDash-

Isa Fulford

Mm-hmm.

Sarah Guo

... or something like that. Or, uh, I feel like ordering flowers is also, like, a really common one. Why do you think you came up with, like, such a different set of goals for the agent?
