EVERY SPOKEN WORD
65 min read · 12,974 words
- 0:00 – 6:12
Deep research’s inception & evolution
- SGSarah Guo
(instrumental music plays) Hi, listeners, and welcome back to No Priors. Today, I'm joined by Isa Fulford, one of the pioneering minds behind OpenAI's Deep Research. This is a new agentic product that OpenAI released in February of this year, which uses reasoning and tools, like web browsing, to complete multi-step research tasks for you. Today, they're making it free to all US users. Welcome, Isa. Isa, thank you for doing this.
- IFIsa Fulford
Thank you so much for having me.
- SGSarah Guo
You, uh, and your team have shipped, like, one of the most exciting AI products of late. Um, I use it a lot, Deep Research. Where, where did the idea come from? Tell me the, uh, origin story.
- IFIsa Fulford
Yeah. So around a year ago now, we were very excited about the progress internally on, um, this new reinforcement learning algorithm. We were seeing a lot of progress on math problems and science problems and coding problems. And at the same time, I was working with, um, my friend Yash, who works at OpenAI, on a few side projects, and we were very interested in agents generally, and kind of wondered if we could apply the same algorithm to tasks that are maybe more, um, in line with what the average user would do every day. And so the first two things we were thinking about were online browsing tasks, because I think in a lot of different professions, people do just have to do a lot of research, synthesize a lot of information, and then come back with a, a report. And then we were also thinking about software engineering. We've kind of been working on those things; I've been focusing on, um, browsing. So to start: with the math and coding problems that people were already training on, those datasets already exist. You know, you can have a math problem with a ground truth answer and you can train on those. Um, but browsing is kind of more open-ended. You don't really have datasets like that that exist. So we really started by grounding, um, the research in what product use cases we actually wanted the final model to be good at. So we literally would write out a list of things, like, "I hope the model could find this list of products for me, um, and rank them by, like, these reviews from Reddit," or something like that. Or, "I want it to be able to write a literature review on this topic."
- SGSarah Guo
I feel like a lot of people when they think about, you know, browsing and agents, they land on the same, like, two, three transactional use cases that I actually don't think are particularly inspiring, right? So it tends to be like, order a burger on DoorDash-
- IFIsa Fulford
Mm-hmm.
- SGSarah Guo
... or something like that. Or, uh, I feel like ordering flowers is also, like, a really common one. Why do you think you came up with, like, such a different set of goals for the agent?
- IFIsa Fulford
Yeah. So I think before we focused on taking write actions, which those are examples of, we wanted to get really good at synthesizing information-
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
... from a large number of sources and mostly read-only tasks. That was for a number of reasons. Firstly, just a huge number of knowledge work professions mostly do that, so it would be quite useful for those groups of people. Secondly, I think the overall goal for OpenAI is to create an AGI that can make new scientific discoveries, and we kind of felt that a prerequisite to that is to be able to synthesize information. You know, if you can't write a literature review, you're not gonna be able to write a new scientific paper. So, it felt very in line with the, um, company's broader goals.
- SGSarah Guo
It's also very meta because you have, you know, helped make an AI that makes me better at learning, and it's learning. (laughs)
- IFIsa Fulford
Yeah. (laughs) Yeah. Oh, yeah, I hadn't thought about that. I love that. More practically, the read-only, um, read-only task is maybe a... the safety question is a bit more constrained, so it was a good thing to, to start with as well.
- SGSarah Guo
Yeah. It seems that in the, you know, read-only space, people were also not nearly as ambitious as you and Yash were. Okay, so you thought of these end evals... came up with a set of tasks that could be auto-gradable or fit a set of characteristics that made them a better fit for, um, the algorithms.
- IFIsa Fulford
Mm-hmm.
- SGSarah Guo
And then what?
- IFIsa Fulford
That was actually a huge process in itself. I think we initially had built a, um, a demo to pitch people on this idea, and there was no model training involved. It was fully just prompted models with the UI-
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
... pitching the vision of what this product could look like. And so I think after that, then we were at the point where we actually had to start thinking about, "How are we gonna do this? How are we gonna create the data? How are we gonna train the model? What tools do we have to create to enable the, the model to browse the internet, um, effectively?" And that was a lot of iteration. I was working very closely with, um, Edward Sun and a few, a few other people on this. And so, um, we also collaborated a lot with the RL team. I think it was definitely a big undertaking, and the... a good thing about it was we were able to work uninterrupted-
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
... for quite a few months, making, you know, the numbers on our evals go up. So, I think it was nice to have not too much, um, pressure to ship something really quickly, um, and we were just able to, to iterate and get it, get it to a good state.
- SGSarah Guo
Did you have a, a favorite, like, most important task?
- IFIsa Fulford
We had a few tasks. People would just propose different tasks. One of them, um, was to find all of the papers that Liam Fedus and Barret Zoph had written together. I think there are 11, and the model now can find most of them or all of them.
- SGSarah Guo
Okay.
- IFIsa Fulford
Um, we would always ask that question. And then another one, which the model actually can't answer anymore, probably for good reason, was finding the middle name of one of our coworkers. (laughs) And then personally, I think I started using it pretty early on for actually finding information for, like, um, product recommendations, travel. And actually quite a few people internally... we had kind of a, a Streamlit playground that people would just use. A lot of people had found it and were using it. Sam told me he used it to buy a bunch of things. Every time it would go down, people would message us like, "What happened? We need to use, um, the model," even when it was a previous version that honestly wasn't that good. So, I think that was a good-
- SGSarah Guo
That's a good sign.
- IFIsa Fulford
... initial sign.
- SGSarah Guo
Yeah. What can you say about, um, the actual bulk of the work, like the tool creation and the data creation?
- IFIsa Fulford
So for, for the data, we did a bunch of different things. We used human trainers. Um, for some of it, we kind of had to come up with new ways, new kinds of datasets, I guess, and we had to figure out how to design datasets to,
- 6:12 – 7:20
Data creation
- IFIsa Fulford
um, exercise the kind of skills that we wanted the model to learn. And then... you have to make a way to grade those datasets as you're training on them, and then you also have to make good tools for the model to be able to, like, actually complete the task successfully. So right now, we just have the browsing tool, which is a text-based browser, but it can see embedded images and, like, open PDFs, and then it also has access to a Python tool so it can do analysis and calculations and plot graphs and things like that. But you can imagine in future versions, we'll just expand the tool set, and so the model will just become more capable. But we'll also need to make datasets that actually make the model exercise all of those different tools and figure out how to use them and backtrack and, you know, all these different things during training, so that it's actually able to, like, flexibly answer new problems from users in the product.
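The setup described here (a browsing tool, a Python tool, and a grader that scores the final answer against a known outcome) can be sketched as a toy harness. Everything below, the function names, the tiny in-memory corpus, and the string-match grader, is an illustrative assumption, not OpenAI's actual tooling:

```python
# Toy sketch of a tool-using episode: the agent "browses" a corpus, runs a
# calculation with a Python tool, composes an answer, and an outcome-based
# grader scores the result. All names here are illustrative, not real APIs.

def browse(query: str, corpus: dict) -> str:
    """Stand-in for a text-based browser: return 'pages' whose title matches."""
    return "\n".join(text for title, text in corpus.items()
                     if query.lower() in title.lower())

def run_python(expression: str) -> str:
    """Stand-in for the analysis tool: evaluate a simple arithmetic expression."""
    allowed = set("0123456789+-*/(). ")
    assert set(expression) <= allowed, "unexpected characters"
    return str(eval(expression))

def grade(answer: str, ground_truth: str) -> float:
    """Outcome-based grader: reward 1.0 if the required fact appears in the answer."""
    return 1.0 if ground_truth in answer else 0.0

# One tiny episode: gather a fact, compute, answer, grade.
corpus = {"Widget pricing": "A widget costs 12 dollars."}
page = browse("widget", corpus)
total = run_python("12 * 3")
answer = f"Three widgets cost {total} dollars (source: {page!r})"
print(grade(answer, "36"))  # 1.0
```

The point of the sketch is the shape of the loop: during RL training, only the grader's score is needed as signal, the intermediate tool calls are learned rather than supervised, which matches the "you just need the task and the outcome" framing later in the conversation.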
- SGSarah Guo
It is clear that reinforcement fine-tuning on very powerful base models can do very useful things now. That's super exciting. What advice would you have for, um, startups or other companies who are thinking about doing RFT for a particular task as to, like, when it's worth doing or when they can just try to do,
- 7:20 – 9:05
Reinforcement fine-tuning
- SGSarah Guo
um, sort of just traditional orchestration where agents are a component?
- IFIsa Fulford
So I think in general, you will always get a model better at a specific task if you train on that task. But we also see a lot of generalization from training on one kind of task to, you know, other domains. So you can train a reasoning model on mostly math, coding, and other reasoning kinds of problems, and it will be good at writing. But if you, you know, trained it on that specific task, it would be better at it. I think if you have a very specific task that you think is so different to anything that the model was likely trained on, and you try it a bunch of times yourself and you've tried a lot of different prompts and it's just really not good at it, maybe it's some genetic sequencing task or something that's just so out of distribution for the model that-
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
... it doesn't know how to figure it out, I think that is a good time to try reinforcement fine-tuning. Or if you have a task that is so critical to your, like, business workflow that getting the extra 10, 15%-
- SGSarah Guo
In quality, yeah.
- IFIsa Fulford
... performance is really make or break, then probably try it. But if it's something that you think, oh, the model's pretty good at but it gets things wrong s- you know, some percentage of the time, and then you see with every next model that's released, it gets a little bit better, it might not be worth the effort if the model naturally is just gonna get better at those things. So that would be my recommendation.
- SGSarah Guo
Okay, great. Great advice. You've talked about needing to use human experts to create some of this data. I think of browsing as a somewhat universal task.
- IFIsa Fulford
Mm-hmm.
- SGSarah Guo
So I guess there are good and better- there are, you know, worse and better browsers. Like, where do you feel like you need expertise, or what do you know about browsing expertise that you didn't before? Or information gathering expertise?
- IFIsa Fulford
Yeah, I guess it's one of those things where basically every single profession involves, you know,
- 9:05 – 11:23
Why human expert data matters
- IFIsa Fulford
having a question or wanting to do research in a domain and then having to find information from many different sources to synthesize an answer. And, like, while you're doing that, you have to have the expertise to reason about: is this a useful source, is this not, should I include this, is this, like, completely off-topic, whatever. Like, that is kind of universal to most jobs or most, you know, scientific domains, kind of anything. And the cool thing with RL is that you don't necessarily need to know the whole process of how the person would do the research. You just have to know what the task is and what the outcome should be, and the model will just learn during training how to get from the problem to a good answer. So I think we took a pretty broad approach. I think that's one thing that if you work at a place like OpenAI, you can do what they would tell most startups not to do: just try and focus on a really broad set of users, get experts in loads of different domains, and try and see if you can get good at everything at once, which was the approach that we took. And then we also created, um, a lot of synthetic datasets and things like that. But the human data was definitely a really key part, um, for, you know, making this model successful.
- SGSarah Guo
Did, uh, any of the learned planning from the model across these domains surprise you, like in terms of the path to find the perfect handbag or the restaurant in Japan or the set of papers that was relevant?
- IFIsa Fulford
Yeah, I guess sometimes it will use search terms that I wouldn't necessarily have used or, you know, we didn't teach it to plan upfront, but sometimes we'll see it- it does end up making a plan upfront, um, before starting its research. Sometimes, um, the model will do smart things and try to get around restrictions you put on it. So you have to make sure that it's not hacking, you know, and trying to use a different search engine other than the search engine that you gave it or something like that. Like, it will do smart things that you have to make sure you're looking out for, um, you know, in case you want to not allow the model to do those things (laughs) .
- SGSarah Guo
Maybe we can actually use this as a, um... like, a moment to talk about some of the failure modes, like how did you think about some of the, uh, classic issues with agents, like maybe, you know, compounding error or distraction or even safety?
- IFIsa Fulford
Yeah, so I think with deep research, since it can't actually take actions, the risks
- 11:23 – 13:55
Failure modes of agents
- IFIsa Fulford
aren't the same class of typical agent safety problems you would think of, but I think the fact that the responses are much more comprehensive and take longer means that people will trust them more.
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
So I think maybe hallucinations are a bigger problem.
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
While this model... hallucinates less than any model that we've ever released, it's still possible for it to hallucinate, most times because it will infer something incorrectly from one of its sources. So that's part of the reason we have citations, 'cause it's very important that the user is able to check where the information came from, and if it's not correct, they can hopefully figure it out. Um, but yeah, that's definitely one of the biggest model limitations and something that we're always actively working to improve. In terms of, like, future agents, I think the ideal agent will be able to do research and take actions on your behalf, and so I think that's a much harder question that we need to address. And it's kind of at that point when capabilities and safety converge, where an agent is not... useful if you can't trust it to do a task in a way that doesn't have unintended side effects that you don't want. Like, if you ask it to do a task for you and then in the process it sends an embarrassing email-
- SGSarah Guo
(laughs)
- IFIsa Fulford
... or something like this, you know, that's not a successful completion of the task. So I think that is gonna be a much more interesting and difficult, like, safety area that we're starting to tackle.
- SGSarah Guo
You can tell me if you just don't have a projection here, but do you think people are gonna want explicit guardrails? Do you think you can learn a bunch of those characteristics in the model itself?
- IFIsa Fulford
If you've used Operator, I'm sure you have, you have to confirm every write action. I think to start with, that makes a lot of sense. You want to build trust, um, with users, and as the models become more capable, maybe you've seen it successfully do things a few times and you start to trust it more, and so maybe you allow it: okay, you don't have to ask me every time you send an email to these people. Like, that's fine. Um, but I do think that as these agents start to roll out, we will definitely want to have guardrails and confirmation, uh, just so, you know, while they're not the end state of capability, we still want to make sure we have, like, a good level of oversight. But I think that they will get so good that we'll just trust them to do things on our behalf.
- SGSarah Guo
What are some of the obvious ways you feel like Deep Research as a product is going to get better?
- IFIsa Fulford
Yeah, I think-
- SGSarah Guo
I mean, it's gonna extend into write actions, right? You just implied that at some point.
- IFIsa Fulford
Yes. I mean-
- SGSarah Guo
Yeah. (laughs)
- IFIsa Fulford
... I think maybe, you know, the ideal state would be to have a unified agent that can do all of these different things. Anything that you would delegate
- 13:55 – 18:32
The roadmap ahead for Deep Research
- IFIsa Fulford
to a coworker, it should be able to do.
- SGSarah Guo
How are we gonna make decisions about if it's like, "Sarah, you do this," versus, "Agent, please do this"?
- IFIsa Fulford
Yeah. I guess-
- SGSarah Guo
Or is it always just try the agent first? (laughs)
- IFIsa Fulford
Probably, I mean, I would try (laughs) the agent first if it was my work. It's kind of the pattern that every time the model, um, becomes more capable, the level of abstraction for the human becomes higher, if that makes sense. Like, the task you're asking it to do is just higher and higher level, but you're still initiating the task. So, you know, maybe a year ago I was asking it to write a function for me, and now I'm asking it to write a whole file, and maybe next year it will, you know, make a whole PR for me or something like that. So I still think we'll be in the driving seat. Uh, as to Deep Research, I think obvious next steps would also be to, um, have access to private data. Like, be able to do research over, you know, any internal documentation or GitHub, whatever it is.
- SGSarah Guo
There's a golden thread here-
- IFIsa Fulford
(laughs)
- SGSarah Guo
... because when we first met, you were working on retrieval.
- IFIsa Fulford
Yes. (laughs)
- SGSarah Guo
And I was like, "There cannot be only one person at this company-"
- IFIsa Fulford
(laughs)
- SGSarah Guo
"... working on retrieval." (laughs)
- IFIsa Fulford
Everything, uh, all roads lead back to retrieval. (laughs)
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
So I think that will be really cool, and then, um, eventually, you know, taking write actions or calling APIs. And then obviously there are just a lot of things that the model is not perfect at now that we just need to improve. But I think we have a really cool, like, working relationship with, um, the reinforcement learning team, so a lot of teams will contribute datasets to the big runs that they do. So we contribute datasets, and then as they train models, you know, with a ton of compute, it just becomes a better base model for us to continue training from. Um, so I just think the capabilities are compounding.
- SGSarah Guo
So this was not a low-key research preview, but a side project that turned into a very interesting, you know, internally pitched project. How do you think about, like, what is a product that OpenAI or at least you yourself want to work on independently versus, like, what belongs in the core research, um, path?
- IFIsa Fulford
A cool thing about OpenAI is that, um, even though the company is bigger, I think the culture of anyone being able to have an idea and, um, you know, prove it out and then push it to completion has still been maintained as the company has grown. For me personally, I'm always motivated to work on things that I will use myself. With Deep Research, for example, I, I do use it a lot, um, for, you know, looking up various things, travel recommendations. I think I'm probably a daily active user. It's fun when you get the dog food-
- SGSarah Guo
I think I'm a-
- IFIsa Fulford
... join account. (laughs)
- SGSarah Guo
... I think I'm a Delta now. (laughs)
- IFIsa Fulford
Oh, amazing. (laughs)
- SGSarah Guo
Yeah. I'm burning a lot of GPUs for this.
- IFIsa Fulford
(laughs)
- SGSarah Guo
Are there use cases where like, you know, you're the original expert, are there ways that you or Yash or, like, you've seen the user base use them that you encourage people to use Deep Research?
- IFIsa Fulford
I'm always interested to see people using it in domains that I have absolutely no expertise in. For example, in medical research: I've seen a lot of different scientists posting about how they've used Deep Research and how it helped them do something. To me that's the most interesting, because when we were working on it, I obviously had no way of judging whether an output is good or not. So seeing experts actually, like, ratify Deep Research responses is useful. An area that I was surprised to see people using the, the model in was, um, code search and, um, coding questions.
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
I think, like, use the latest package or latest version of whatever repo to help me write this file or something. For data analysis as well, that's also something, um, the model's already pretty good at and I think will just continue to get better at. Um, I think, you know, uploading a file or something like that and having it do some analysis for you or do some research and then create a report with, um, numerical analysis is pretty interesting.
- SGSarah Guo
I actually haven't tried this, and it's not a browsing task. Like, what makes the model particularly good at this, or what is it capable of? Is it really, like, being multi-step and then being able to do planning and understanding of the task and produce a report that's cohesive?
- IFIsa Fulford
Yeah. I think also the base model, or the model that we started fine-tuning from, o3, is just a very capable model. Um, it's trained on many different datasets, including a lot of coding, um, reasoning, and math tasks. So that inherited capability is pretty strong, and then when you add the browsing on top of that, it's still able to do that analysis, so I think those two together can be quite powerful.
- SGSarah Guo
Uh, before the podcast, we were just talking about, um, the idea of, like, learning taste or-
- 18:32 – 19:29
How do agents develop taste?
- SGSarah Guo
information ingestion preferences?
- IFIsa Fulford
Yeah. I think agent memory will definitely be... very important. It would be very annoying if every time you ask it to do a task, you have to repeat the same information: how you want it to do the task, everything about you-
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
... which currently, for Deep Research, you do have to do. And I think as the tasks get more complex and, um, you know... right now, it will take five to 30 minutes. You can imagine in the future, it might take hours or days to complete a task that you ask the model to do. You definitely want the model's research to be compounding. You don't want it to have to start fresh every time. So, I don't necessarily have a good answer, but I think it's something that will be very important.
- SGSarah Guo
There is a common understanding between many people at some of the leading labs that, like, the recipe to AGI is, I'd say, somewhat known.
- IFIsa Fulford
Mm-hmm.
- SGSarah Guo
Uh, or, you know, there's confidence on this, and, you know, the return of RL is very exciting for everyone.
- IFIsa Fulford
Yeah. (laughs)
- SGSarah Guo
The stance that I've heard from you and from others is both enthusiasm on like, "This seems to work. We're gonna
- 19:29 – 22:03
Experience and path to building a broadly capable agent
- SGSarah Guo
get real capability out of it. Um, it's quite data efficient, and it's gonna be a lot of work." Tell me a little bit about, like, the emotional experience of building Deep Research and if that changes your view at all.
- IFIsa Fulford
I agree with everything you said. I think it's so impressive to see how data efficient the algorithm is. I guess the data you train on is much higher quality, uh, and smaller, so actually curating that is an undertaking. And then making sure that the model has access to all the tools that a human would have access to, to do the work that they need to do. And then making sure that you represent tasks that people will find useful or do in their jobs, in a way that you can, um, you know, judge whether the model did a good job or not, is also hard. And there are so many other challenges for pre-training, where you have so much more data and, you know, have to do all of these different things... I think it's just a different challenge, and both are compounding. Like, you need a really good base model to be able to do RL. And then for our team, we just do more RL.
- SGSarah Guo
Mm-hmm. (laughs)
- IFIsa Fulford
(laughs) So, yeah, it's like all very compounding, but I do think, um, everybody does kind of see a pretty clear path to this, like, broadly capable agent.
- SGSarah Guo
Do you think there are, like, big blockers to progress of, like you said, um, maybe not exactly describing it as the next iteration of Deep Research, but just confidence that, you know, we're gonna have these unified agent capabilities, and it will feel like a coworker? What stands between us and that?
- IFIsa Fulford
There's a lot of really hard safety questions that we need to figure out. You know, we would never ship anything that we don't have, like, very high confidence is safe. And I think the stakes are way higher when it has access to your GitHub repositories and your passwords and your private data. So, I think that's a really big challenge. I guess, also, if you want the model to be able to do tasks that take many, many hours, finding efficient ways to manage context, kind of similar to the memory thing. But if you're doing a task for a really long time, you're gonna run out of context. So, what's an efficient way of, like, dealing with that and allowing the model to continue to do its thing? And then, yeah, just the task of making the data and making the tools. I mean, I've said this already a few times, but that's a lot of work.
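One common way to handle the "running out of context" problem mentioned here is to compact older steps into a summary while keeping recent steps verbatim. The sketch below is an illustrative strategy under stated assumptions (a character budget, a trivial stand-in for a summarization call), not a description of how Deep Research actually manages context:

```python
# Rolling context compaction: when the accumulated step history exceeds a
# budget, fold the oldest entries into one short summary line and keep only
# the most recent steps verbatim. The "summarizer" here is a toy stand-in
# (first clause of each step) for what would be an LLM summarization call.

def compact(history: list[str], budget: int, keep_recent: int = 3) -> list[str]:
    """Return history unchanged if under budget, else summary + recent steps."""
    if sum(len(h) for h in history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = "SUMMARY: " + "; ".join(step.split(".")[0] for step in old)
    return [summary] + recent

history = [f"Step {i}. Visited source {i} and extracted notes." for i in range(10)]
compacted = compact(history, budget=200)
print(len(compacted))  # 4: one summary line plus the 3 most recent steps
```

The trade-off is the same one the conversation points at: compaction keeps long tasks feasible, but any detail dropped from the summary is lost to later steps, which is why memory and context management are framed as open problems.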
- SGSarah Guo
I was just looking at my history of queries. My user request is like, I wanna see what things I asked of Deep Research versus other models-
- IFIsa Fulford
Mm-hmm.
- SGSarah Guo
... in particular, in my memory. But it has ranged from like, obviously, you know, if I'm trying to get up to speed on a market for a company I'm looking at or on a technical topic or travel planning.
- 22:03 – 25:55
Deep research vs. o3
- SGSarah Guo
That's a big one. Also, I- I have looked for things that are taste-related. So, I'll be like, "Okay, I, um, like, you know, this set of books for these reasons. I want you to, you know, actually just give me a long-form summary of a bunch of other things you think I should read and explain why." I realize I don't have a super clear mental model of, like, when Deep Research should be better than o3. What instinct can you give me here?
- IFIsa Fulford
Deep Research is very good when you have a very specific or well-defined query. So, maybe not a general overview of a topic, but you're looking for some specific information, and you think it would be supplemented by existing research online. Even if we also trained the base model on that information, I think having live access to it is quite useful.
- SGSarah Guo
So, if I have any instinct about, like, directing to retrieval or particular sources-
- IFIsa Fulford
Yeah.
- SGSarah Guo
... that focusing is useful.
- IFIsa Fulford
I think so.
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
And also, we trained it to have much longer outputs than I think, you know, normal models would. So, if you're looking for something very comprehensive, maybe sometimes too comprehensive for some tasks, I think Deep Research will be useful for those things.
- SGSarah Guo
Connect this for me to a Deep Research, like, fashion task.
- IFIsa Fulford
(laughs) I've used it to find new brands. So I'll say, "These are the kinds of brands I like. Please find new brands where I can find this specific coat that looks like this one," or something like that. And then it's very good at finding those, whereas the base model or the normal model will give you some brands, but they won't necessarily fit all of the constraints that I had given. Like, I want it to sell this, you know, fake fur coat that's this length, um, this season or something. It's not gonna be able to do that, 'cause it just won't have the up-to-date information and also just won't necessarily be able to, like, deal with all of the constraints in a query in one shot. o1 isn't browsing-
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
... as comprehensively. I'll use it to find things where I'm looking for a very specific thing that would take me hours to find. So, I'm looking for this very specific item or, you know, sweater that is probably available on RealReal or somewhere, but I can't find it. Or I'm looking for an Airbnb with, like, very specific constraints. So, I think those kinds of things Deep Research is good for, and then for more general, like, high-level things, you should use, like, normal search.
- SGSarah Guo
Yes. Well, I will admit, I have had some multi-year-
- IFIsa Fulford
(laughs)
- SGSarah Guo
... browsing/shopping tasks-
- IFIsa Fulford
(laughs)
- SGSarah Guo
... that I am now making a, um, cron job for Deep Research to support.
- IFIsa Fulford
(laughs) Good.
- SGSarah Guo
I wanna ask just one more experience question, which is, um, was there a particular, like, win or failure that surprised you in the training of, um, Deep Research?
- IFIsa Fulford
It really was one of those things where we thought that, you know, training on browsing tasks would work... we felt like we had good conviction in it, but actually the first time you train a model on a new dataset using this algorithm, seeing it actually working and playing with the model was pretty incredible, even though we thought it would work. So honestly, just that it worked so well was pretty surprising.
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
Even though we thought it would, if that makes sense. (laughs)
- SGSarah Guo
Yeah, yeah.
- IFIsa Fulford
Um-
- SGSarah Guo
It's the, it's the visceral experience of like, "Oh, the path is paved with strawberries or whatever." (laughs)
- IFIsa Fulford
Yeah. (laughs) Exactly. But then sometimes some of the things that it fails at are also surprising. Like sometimes it will make a mistake, or it will do such smart things and then make a mistake where I was just thinking, "Why are you doing that?" Like, "Stop." So I think there's definitely a lot of room for improvement, but yeah. We've been impressed with the model so far.
- SGSarah Guo
I'm used to all my technology tools being instantaneous.
- IFIsa Fulford
Yes.
- SGSarah Guo
Deep Research is not instantaneous. It's thinking and using tools. Can it be faster?
- IFIsa Fulford
Yeah, I do think there's a good middle ground in between, where sometimes you don't want it to do really deep research, but you want it to do more than a search,
- 25:55 – 27:56
Latency
- IFIsa Fulford
and something will fill that gap.
- SGSarah Guo
Okay. I don't know how to communicate this preference, but I want to, like, toggle at some point-
- IFIsa Fulford
I know.
- SGSarah Guo
... to be, like, as much work as, I mean, 'cause I would say this to a human, "I want you to do as good of a job you possibly can do in the next five minutes." (laughs)
- IFIsa Fulford
Yeah, see, that's something where I think it seems like a bad UX to actually make the user make that decision. The model should just be better at knowing how much time to think. I think we made a decision when training the model that we're just gonna go for max thinking time every time.
- SGSarah Guo
Mm-hmm.
- IFIsa Fulford
So I'm sure, I- we'll ask it a really simple query sometimes just to test, and then get quite frustrated that it's still thinking. So I do think that's also an area for improvement: knowing how long to think for. But yeah, I suspect with, um, our, with Deep Research, we'll always be focusing on the tasks that take the maximum length of time, and then I think, like, o3 or, you know, o-next will have a better in-between.
- SGSarah Guo
What is an example of a task you can imagine Deep Research spending a day on in the future?
- IFIsa Fulford
I mean-
- SGSarah Guo
There's some GPUs smoking over here. (laughs)
- IFIsa Fulford
Yeah. I think anything that would take... I mean, right now, in 30 minutes it can do what human experts rate as taking many hours. So I guess in an hour, it could do something that would take a human days. In a day, it could, you know, do something that would take a human weeks. Obviously there will be a lot of challenges to get it to scale like that, but I think, you know, you can imagine it doing a research project that would have taken weeks to complete, or, like, writing a thesis or something like that.
- SGSarah Guo
Okay. I'm gonna make our intern compete with it-
- IFIsa Fulford
Okay. (laughs)
- SGSarah Guo
... over the next couple of months then. Yeah.
- IFIsa Fulford
Sounds good. (laughs)
- SGSarah Guo
If you, um, were to project forward a year, which is a really long time in AI land, what is something that you think will surprise people that agents can do, and that will actually be released? So take the safety considerations into, into account.
- IFIsa Fulford
Yes. A general agent that could do a lot of the h- you
- 27:56 – 30:45
Predictions for agent capabilities
- IFIsa Fulford
know, help you do a lot of the tasks that you would do in a lot of different areas. Like, for me, I do a lot of coding. I'm hoping that there'll be an agent that is pretty-
- SGSarah Guo
(laughs)
- IFIsa Fulford
... pretty proficient at coding, and that I will just trust: I'll give it a task and it will hopefully make a PR or something. Or maybe I could ask the same agent to help me book a trip to Korea or something. Um, I hope that we'll get to a more unified, um, experience. But I also think that the rate at which these models are improving is gonna be pretty surprising to, to most people.
- SGSarah Guo
Why do you think a unified experience is important? Or why do you think that makes sense? 'Cause I think today it's, like, quite different to think about. O- obviously ChatGPT is-
- IFIsa Fulford
Mm-hmm.
- SGSarah Guo
... one experience that's very encompassing. But there are models that people use in different contexts like, you know, um, next line completion type models for coding-
- IFIsa Fulford
Yeah.
- SGSarah Guo
... that are, are just, feel like a very different setting.
- IFIsa Fulford
I think that you'll probably want both. Like, you'll probably want an experience where you can at some point override or interrupt the model and say, "Oh, no, I didn't mean that." Or you can take over and, like, start typing something.
- SGSarah Guo
Yeah.
- IFIsa Fulford
Especially in the short term, as the models are not as capable as humans in a lot of areas and are more capable in other areas. Um, so I think it will be a combination of, like, you asking the model to do something, but then, maybe to go with the coding example, you're also in your VS Code or Cursor or whatever it is, and it's been doing something for you, but you can also, like, actually type and, you know, write some of it yourself. So I think it will be a combination of those things, but I- I kind of want it to be something that is just like having a coworker on Slack, or, like, a remote coworker you can just ask to do things for you. Send them a Slack message and then they'll start doing it, and then you can, like, review their work or, you know, help at some point. But it seems like a pretty nice, like, general interface, and you don't have to think about which agent should I ask to do which task. Like, it should just be able to figure it out.
- SGSarah Guo
The mental model I have for this is my general ethos is actually, I love the people I work with. Um, I prefer to work with fewer people with less management overhead, all things considered. Because each person has more context and I have more understanding of them. And so, like, the universally useful agent is attractive for that reason.
- IFIsa Fulford
Yeah. And you only have to tell, tell it something once and it will remember and then it will have state on everything you're working on, things like that.
- SGSarah Guo
Awesome. Well, this has been a great conversation, Isa. Thanks for doing this and thank you for the product release.
- IFIsa Fulford
Thank you so much for having me and thank you for using Deep Research. (laughs)
- NANarrator
(instrumental music plays) Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 30:45