No PriorsNo Priors Ep. 41 | With Imbue Co-Founders Kanjun Qiu and Josh Albrecht
EVERY SPOKEN WORD
75 min read · 14,615 words
- 0:00 – 4:55
Introduction to Imbue
- SGSarah Guo
Imbue is a company developing AI agents that can reason and code. Today, Elad and I sit down with Kanjun Qiu and Josh Albrecht, co-founders of Imbue, to discuss training large foundation models for high-level reasoning, why agents require architectures different from large language models or language token prediction models, and how current computers are getting in the way of their users. Kanjun, Josh, welcome to No Priors.
- KQKanjun Qiu
Thank you.
- JAJosh Albrecht
Thanks.
- SGSarah Guo
So perhaps you can start by just telling us the story of how you guys know each other and where the idea for Imbue came from.
- KQKanjun Qiu
Josh and I met at a conference and then started a big house together, uh, it was a big house, 20% house, and also started this first company around the same time. I've always been really interested in agency, uh, and kind of like how do we enable humans to have more agency, and Josh has always been really interested in AI, uh, and so it kind of made sense. Uh, we, at that time, talked about like, "Oh, you know, someday we're gonna be able to have AI systems that give humans a lot more agency." Fast-forward to 2018 or so, uh, we were running an AI recruiting company called Sourceress, and that was actually kind of the first AI agent that we built. Um, it was, you know, not transformer models, uh, like, more old-school MLP, uh, but it was a system that recruiters used and kind of automatically got candidates in their inbox, and we learned a lot about, "Oh, if you have an autonomous system like this, like what do you actually need to make it work?" And around that time, some of our housemates were building GPT-3, and, uh, we were seeing, like, "Oh, scaling works. You know, if we just keep scaling, actually, you're gonna get pretty far with a lot of these language models." So our question at that time was, you know, "How far can we get with language models? Does this kind of self-supervised learning, which is working so well on language, work in other modalities as well?" So in early 2020, that's when we first started seeing self-supervised learning working across video and images and language, and we were like, "Huh, there's something really interesting here where maybe machines are learning the same kinds of representations or similar representations of what humans are learning, and maybe they can get to a point where they can actually do the types of things that humans are able to do." And that's when we first started Imbue, or start- started talking about Imbue.
- SGSarah Guo
You clearly know a bunch of people working at sort of large, uh, language model research labs well. When you looked at what they were doing, how did the focus come to be on agents in particular, and how is that different from a general language model?
- JAJosh Albrecht
Yeah, I- I think we've always been interested in agents, in not just, you know, recommender systems or classifiers or things like that, but in systems that are gonna go do real work for us, right? That are gonna actually be useful in the real world. Right now, you can ask some kind of chatbot something, and it'll give you back a response, but the burden is sort of on you to go do something with that to verify whether it's correct or not. I think the real promise of AI is if we can get systems that can actually act on our behalf and can accomplish goals and kind of do these larger things and sort of free us up to, uh, to focus on the things we're interested in.
- KQKanjun Qiu
Yeah, one thing that I think we often forget, because we're in it every day, is our computers are actually, like, they need to be micromanaged. The reason we're in front of our computers every day is because nothing really happens. Like, they can't make decisions on their own. Nothing really happens unless I'm in front of it, and I'm, like, in front of it doing all this really detailed stuff, uh, kind of like operating a factory machine with all of these, like, little knobs that are really specific. And there is a future where computers don't need to be micromanaged, that I can, like, give a computer an instruction, whether it's natural language or something, some other kind of instruction. Like, it can go off and understand what I'm trying to do and help me do it. The diff between this, where we are today, and that is kind of like the diff between the first calculator and where computers are today. Like, the first computer, very first digital computer was a room-sized calculator. All it did was calculate, uh, Fourier transforms and things like that. And, uh, I think that's kind of the potential for where AI can be given the current, like, where tech- where the technology is going, um, it's very possible.
- EGElad Gil
What do you think is, um... So when I look at technologies, there's- there's almost, like, three types, right? There's things that just are never gonna work, and maybe some aspects of Theranos were that, right? Like, there were questions whether the physics of Theranos would ever work (laughs) , um, as you miniaturize things sufficiently. There's things that can work immediately today, or with a little bit of work or engineering you can get there, and then there are things that are clearly gonna happen and they're at some point in the future. So for example, in the '90s, people defined out almost everything that cell phones would do, and then eventually it got there once you had better processors on phones and more bandwidth in terms of cellular networks. Like, you needed to build a bunch of stuff in terms of infrastructure, and it was clear what was gonna happen. What do you think is missing, if anything, technologically, to start to build real-world
- 4:55 – 10:23
The Spectrum of Agent Tasks
- EGElad Gil
sort of performant agents?
- KQKanjun Qiu
The way we think about agent tasks is, it's a spectrum of difficulty. So some agents are very possible today. Like, we- we see a lot of them. There are these conversational bots that take over some of the customer success workflows, and they'll failover to real customer success people if the agent doesn't know how to deal with it. Um, and those are actually, what we see inside companies, is that they actually have pretty complex reasoning workflows. Um, they're somewhat hardcoded, and so they're not general. They don't generalize to other company workflows, but, you know, we are already seeing agents. Um, and then maybe there are two spectra. There's like specific to general. So today, we have very specific agents, and over time, we can, you know... If they are better at reasoning and better at certain other things, um, interacting with your computer, then they become more general, be able to use the same agent and it'll learn something new. Um, and then...There's also a spectrum from, uh, like co-pilot to more autonomous. So today, we see a lot of co-pilots, uh, and the human in the loop, and over time, it becomes kind of incrementally more autonomous. And so I don't see it as being so binary, like, "Oh, there is a technology missing for agents," but rather that as capabilities improve, we're gonna see more and more of these use cases be eaten up by more general, more autonomous agents. There are a few categories of things that are in the way today, so I would say where we are today, we're kind of, um, in the era of, like, um, maybe lossy ethernet with no error correction or, (laughs) or like analog computer, something like that, uh, where, um, we have these models, and they don't work reliably. And that's, you know, when we talk to founders building agents, that's really the biggest thing. It's really hard to get these systems to work reliably and output exactly what I want it to output and, um, kind of do the right thing at every step all the time. 
And so the question is, okay, how do you get it to work more reliably? Well, there's, you know, p- a lot of why we work on reasoning, and when we say reasoning, um, kind of all of the things around, uh, getting tasks done in the world, like should... when does a system come back to you? How does it know it's not certain about its output? Um, can it kind of think through different action plans and figure out, "Okay, this plan is the better plan, and we should try going down this path first?" Reasoning is one big piece of improving reliability and second chunk of things is, like, all of this error correction, and, and I think, like, chain of thought, tree of thought, uh, these are error correction techniques. We have a lot of other techniques internally, and, uh, that also helps improve reliability. And so if we think about this problem as a reliability problem, then you can incrementally make a lot of progress on it.
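The "error correction" framing Kanjun describes can be sketched as a minimal generate-check-retry loop. Everything here is an invented stand-in for illustration (the `flaky_model` function, the JSON check), not Imbue's actual system; the point is only that a verifier outside the model is what turns an unreliable generator into a more reliable one.

```python
import json

def flaky_model(prompt: str, attempt: int) -> str:
    """Stand-in for an unreliable model: returns malformed JSON on the
    first attempt and valid JSON afterwards (purely illustrative)."""
    if attempt == 0:
        return "Sure! Here is the answer: {city: Paris}"  # not valid JSON
    return '{"city": "Paris"}'

def call_with_error_correction(prompt: str, max_attempts: int = 3) -> dict:
    """Retry the model until its output passes a verifier.

    The verifier (here: 'does it parse as JSON?') is the error-correction
    layer; chain-of-thought or tree-of-thought play a similar checking
    and branching role at the reasoning level rather than the format level.
    """
    for attempt in range(max_attempts):
        raw = flaky_model(prompt, attempt)
        try:
            return json.loads(raw)  # output check: must parse as JSON
        except json.JSONDecodeError:
            continue  # detected an error; try again
    raise RuntimeError("model never produced valid output")

result = call_with_error_correction("Capital of France? Answer as JSON.")
print(result)  # {'city': 'Paris'}
```

Seen this way, reliability becomes an engineering property of the loop (how good is the checker, how many retries) rather than a property of the model alone.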
- EGElad Gil
I, I loved your framework of sort of generalizability and, um, sort of that two-by-two that you had. If I look at a lot of the language models today, what I'm observing a lot of people doing is they basically start off prototyping something, say, on GPT-4, 'cause it's the most advanced model. They see if it works or not, and if it works, and they have any sort of scale, in some cases, they move to GPT-3.5, and sometimes, you know, they've thought about fine-tuning or not. But sometimes they'll move to an open source model, which works dramatically less well in some cases, but then they'll fine-tune it for a high volume use case, and it's all because of cost optimization. Basically, as you know, like, if you have a really big model, it costs a lot more for inference, in terms of compute, than, like, a smaller model. How do you think about that relative to generalizability? Because I guess if you make something really generalizable, my assumption, which may be incorrect, is it's more expensive, right? You'll need some forms of memory for it. You need some broader logical capabilities versus just saying, "I'm just gonna do the thing that's gonna, like, order flights really well," or whatever it may be in terms of agents, and so I'm sort of a little bit curious about that framework.
- JAJosh Albrecht
I think that's a... that's the right way of thinking about it. I think when Kanjun was saying, you know, the spectrum from, uh, more specialized to more generalizable, I think we're talking about the ability to solve more general problems, like the ability to do these problems that you've only seen once or twice. I think even as that ability goes up, we're still gonna see kind of, you know, a, a thing coming behind that, a force that takes each of those things. Like, maybe you start out by doing your plane booking with GPT-4, but eventually, you realize, like, "Oh, actually, like, this is so expensive and slow. Like, I just want the thing to be really good at it." But what you can do is you can apply these agents, and this is part of the reason why we're interested in agents that code. You can apply those agents to the original general system to have it go make a more specialized version of that. So it's kind of specializing the things that you're doing a lot, and you can look at, like, each of those things, like, "Okay, I'm making 10,000 calls to this. This is super expensive. Can I just write a piece of Python code that does this?" Right? As you have more general capabilities, you actually can use those more general capabilities to kind of do that specialization.
- KQKanjun Qiu
What we see with a lot of the agent builders today is that they'll use... Uh, you know, an agent workflow is complicated. It has lots of different pieces, and so they may use a specialized model for parts of it and a general model for other parts of it. And the way we think about it, it's quite pragmatic, uh, basically that as capabilities increase, uh, what we want is, like, minimal viable models for each capability. Um, and so a lot of the models are much smaller and very specific and, like, pretty specifically trained. In the personal computer kind of revolution, around that time, I think there was kind of, like, branching. So some people built super computers. They were like, "Oh, we're gonna, like, make a more powerful computer," and then other people built personal computers. And it turns out, like, personal computer's a much bigger market, and super computers, not that many people needed that much computing power. And I suspect we're gonna see something similar where a lot of use cases are gonna
- 10:23 – 14:08
Specialization and Generalization With Agents
- KQKanjun Qiu
be able to be addressed by something pretty pragmatic and relatively small. We're, like, not, we're definitely not pushing the bounds of what we can do with data today on small models, and so, you know, smaller things can work well.
- SGSarah Guo
I wanna go back to, um, what I think is, like, a really deep topic of discussion at Imbue in terms of how you define reasoning and, like, this being an area of differentiation in terms of your research efforts. Like, you know, we all have a bunch of friends at OpenAI and other labs working now publicly on multi-step reasoning and more process supervision, as you were describing. What makes you excited and confident that there needs to be a different approach versus just general language models in order to make the reasoning you need for agents to work, work?
- JAJosh Albrecht
I think there's a different process. Like, language models are great. They're really good, you know, predicting the next word. They're good at, you know, making, like, a very easy classifier. Th- they're good at all sorts of things, but there is... and there are obvious limits. Like, we know even in theoretical senses, like, they cannot learn to do multiplication in the general sense 'cause it literally doesn't fit in the context window, right? Like, multiplication, they can learn to do addition in a modular sense, and they can learn to do it actually almost perfectly if you train them in the proper way. But they're not learning the general algorithm for addition. Instead, if you want something to actually execute the general algorithm for addition, you need to have a thing that works in a different way, that has some sort of outer loop about, "What step should I take next?" Right? That's just a kind of, like, definitional thing. There has to be some other sort of wrapper. There has to be a different sort of outside process. Everyone at OpenAI and at Imbue and at Anthropic, we all, like, know how this works. I don't think anyone is proposing, like, it's just, you know, shove it all in the language model. You can get really far, but I think we're interested in, what is that other higher level system? How do we decide what is the right next step to take? When should I go collect more information? You know-... "Am I certain about this?" Like, all these kinds of other things. Those are, I think, the questions that are much more interesting, and I think there's actually a lot of work to be done there. I think we're still very early in, in the days of, like, creating these systems.
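Josh's addition example can be made concrete: the "general algorithm" is an outer loop that decides which step to take next, while the inner step is small and fixed-size. This toy sketch (not anything from the interview) separates the two, with the inner step standing in for what a model could plausibly learn and the loop supplying the generality that a single fixed-context forward pass lacks.

```python
def add_digits(a: int, b: int, carry: int) -> tuple[int, int]:
    """One primitive step: add two digits plus a carry.
    In the analogy, this small bounded operation is what a model can
    learn near-perfectly; the general algorithm lives in the loop below."""
    s = a + b + carry
    return s % 10, s // 10

def general_addition(x: str, y: str) -> str:
    """The 'outer loop': repeatedly pick the next digit pair to process
    until done. This generalizes to inputs of any length."""
    digits_x = [int(c) for c in reversed(x)]
    digits_y = [int(c) for c in reversed(y)]
    out, carry = [], 0
    for i in range(max(len(digits_x), len(digits_y))):
        a = digits_x[i] if i < len(digits_x) else 0
        b = digits_y[i] if i < len(digits_y) else 0
        d, carry = add_digits(a, b, carry)
        out.append(str(d))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))

print(general_addition("999", "1"))  # 1000
```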
- KQKanjun Qiu
Natural language is not a bad, um, medium for it. Code is also another example of a medium for it. Language is pretty compressed, uh, and so that's, that's helpful for dealing with these situations.
- EGElad Gil
Is that one of the reasons you all, um, decided to focus on code as one of the first types of agents that you have started with? Or could you explain more about the logic behind that?
- JAJosh Albrecht
Yeah. I think, for us, code is useful when we're thinking about reasoning. One way that we are- One way that we're sort of making, uh, you know, collectively reasoning agents today is founders are just hard coding the reasoning process of like, "Okay, if there's a customer support complaint about this thing, then I do this. If it's like this, then I do that." And so you have this, like, very special case version of a thing, right? And there's a, a spectrum between code and language, or more kind of general reasoning abilities, but it's a spectrum. It's not a binary thing, I think. And so you can have code, now that we have these language models, that kind of mixes the language models and the code layer, right, where it's like sometimes you're using a language model to decide what to do, sometimes you're using an if statement. And so it's more about, like, a fusing or a, like, melding of these two different things and being able to, like, be in the right place on that spectrum. And so code is actually, like, a, a really important part of this, and as you do things that you wanna do more robustly and you wanna do in a more repeatable way, then you wanna move it more towards code, right? And so to the extent that you've never seen this task before, maybe you should be doing it in this more kind of nebulous, intuitive sense, and then over time get better at it, critique it, and turn it more into code actually.
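The "fusing" of code and language models Josh describes can be sketched as a router that handles the cases it can specify exactly with plain if-statements and defers the nebulous remainder to a model call. The `model_classify` function here is a toy heuristic invented for illustration, not a real API; in practice it would be an LLM call.

```python
def model_classify(ticket: str) -> str:
    """Stand-in for a language-model call (a toy heuristic invented
    for illustration; in reality this would query a model)."""
    return "billing" if "charge" in ticket.lower() else "other"

def route_ticket(ticket: str) -> str:
    """A mix of code and 'model': deterministic branches handle the
    cases we can hardcode reliably; the model handles the rest.
    Moving logic from the model branch into the code branch is what
    makes behavior more repeatable over time."""
    text = ticket.lower()
    if "password reset" in text:        # hardcoded, fully reliable path
        return "send_reset_link"
    if "refund" in text:                # another explicit rule
        return "escalate_to_human"
    category = model_classify(ticket)   # fuzzy path: defer to the model
    return f"triage:{category}"

print(route_ticket("I need a password reset"))         # send_reset_link
print(route_ticket("Why was my card charged twice?"))  # triage:billing
```

The spectrum Josh mentions is then just a question of how much logic sits above versus below the model call, and tasks can migrate from the fuzzy branch to the hardcoded one as they become well understood.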
- KQKanjun Qiu
Yeah. And p- when we see founders and ourselves building these agents, um, and people shipping them into production, um, and us shipping them internally for ourselves, like, basically, the, the agent loop can be very complex and breaks down into different chunks, and we can, like, turn certain chunks into code. Um, and it really feels like programming in a lot of ways-
- EGElad Gil
(laughs)
- KQKanjun Qiu
... um, so there's something kind of interesting...
- SGSarah Guo
Can you talk a little bit
- 14:08 – 21:00
Code and Language in AI Agents
- SGSarah Guo
about just, um, where you begin in terms of, like, how to structure the research effort, like if there are certain tasks you work on, if you start by working on policy or reinforcement on certain tasks, or if there's data you wanna collect? Like, how do you start?
- KQKanjun Qiu
Yeah. So we have this idea we call serious use, where basically we should be building agents that we want to use every day. This is actually one of the (laughs) biggest blockers. Like, it's really hard to get agents we wanna use every day because of the reliability issues. And so, uh, a lot of what we work on is coding agents, but we also work on agents for other p- operational business processes. And that kind of helps drive, "Oh, okay, like, these parts of the agent loop are really complicated. Like, can we simplify them? Can we make them more reliable?" And in a lot o- in a lot of ways, it is an incremental kind of, uh, set of work that helps us get from, like, you know, 60% reliable to 70% to 80%, and that's what forces development of new techniques. It's not like, oh, magical, you know, we train a giant model and stick everything into it and then magically it works. Like, it does not work. (laughs) Um, it'll get better at random parts of the agent loop, but that's not what we want.
- SGSarah Guo
And is the premise here, like, you start with a serious use, smaller task in code, or something like a, um, like a recruiting communication automation task, or how do you choose?
- JAJosh Albrecht
Yeah. We pick tasks kind of depending on a, a bunch of different factors. One, like, how useful, how frequent, how possible is this going to be to do, right? How generally applicable is it? How much is it gonna help push the techniques that we wanna push forward?
- KQKanjun Qiu
Does it scale to more complex versions of a task?
- JAJosh Albrecht
Yeah. So we're purposely trying to pick, you know, some with some diversity. Like, we have, you know, one agent that will just go do a random to-do in your code base, and so this can be super, super general. It can take a really, really long time to do this, right? And then we have another s- on the op- opposite end of that spectrum, is we have a agent that will look at every single, you know, pull request and run, uh, linter against it and ask like, "Okay, are there any, like, type errors? Okay, how do I fix them? All right, great. Here's, like, a PR of what- me fixing the type errors for you." But very, very specific. But really, you can imagine how... You know, you can invoke the to-do agent to fix a specific type error, and you can expand the type error fixer to do unit tests and to do security flaws and to do renaming these variables, and they sort of meet in the middle as you kinda make these things both more capable. And so they're just different ways of kind of looking at the problem of, how do we make a useful coding agent?
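The narrow "type-error fixer" agent Josh describes is essentially a lint-fix-relint loop. This sketch uses an invented toy linter (it only flags tab indentation) and a trivial fixer in place of a real type checker and model, just to show the shape of the loop: only submit the change once the check comes back clean.

```python
def run_linter(source: str) -> list[int]:
    """Toy stand-in for a real linter or type checker: flags lines
    indented with a tab (returns 1-based line numbers)."""
    return [i for i, line in enumerate(source.splitlines(), 1)
            if line.startswith("\t")]

def propose_fix(source: str, bad_lines: list[int]) -> str:
    """Stand-in for the model proposing a patch; here it just converts
    a leading tab to four spaces on each flagged line."""
    lines = source.splitlines()
    for i in bad_lines:
        lines[i - 1] = lines[i - 1].replace("\t", "    ", 1)
    return "\n".join(lines)

def fixer_agent(source: str, max_rounds: int = 5) -> str:
    """The narrow agent loop: lint the change, ask for a fix, re-lint,
    and only 'open the PR' once the check passes."""
    for _ in range(max_rounds):
        issues = run_linter(source)
        if not issues:
            return source  # clean: ready to submit as a PR
        source = propose_fix(source, issues)
    raise RuntimeError("could not converge on a clean patch")

patch = fixer_agent("def f():\n\treturn 1\n")
print(run_linter(patch))  # []
```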
- KQKanjun Qiu
And Elad, to your point of kind of the specialized versus general dichotomy, um, one thing that is kinda interesting that we are seeing in agents is, like, agents can s- call sub-agents. So our to-do agent can figure out, like, "Oh, there's already a sub-agent for this thing, um, for this function you're trying to write, and, like, let me call that sub-agent 'cause it seems likely to, m- succeed, and then if I try it and it doesn't succeed, I'll do something else." And so, uh, you can kind of have this, like, more general reasoning layer and also a bunch of s- sub-agents where it... That, that general reasoning layer is actually very specific. It's a specific planner. Um, it's not that good at, like, browsing the web and things like that. But the system itself altogether is more general as a result.
- SGSarah Guo
How do you guys, um, do evaluation for both these, like, let's say, categories of agents that you're working with today from the, I assume, more, uh, closer to production grade to-do to broader coding agents?
- JAJosh Albrecht
Yeah. The evaluations, I think, are actually one of the most important parts and one of the places where we spend the most time and think about it kind of the most. There's, there's a lot of work in specifying exactly what you want from the to-do agent, for example, right? Like, how do you know... Like, it gives you back some code. Okay, is that good? There's sort of a spectrum, but, like, if it's faster it's better. If it gives you less code, that's better, but if there's bugs, that's not good. So you really need to take it and, like, break down, "What did I really want to happen here?" And I think when you start to break this down and you start to say, "Okay, there's some things that are kind of qualitative, like do I trust it? Did it come back with tests? Can I run this code immediately?" Like, the kind of feel of it. There's other things that are just for the code itself. There are different attributes. Is it in the same style? Does it have good variable names? Like, is it a minimal change, or did it change all sorts of stuff that it didn't really need to change? Each of those things are actually something that you can measure a little bit more easily than the overall task.... so you can make a, another kind of metric that's like, "Okay, how good are the variable names?" Or, all right, "How similar are they to..." You can break that down. You can kind of keep breaking it down until you get to a point where it's like, okay, I mean, you know, a regular language model, or even just a person looking at this, like, there's an objective answer. One of the reasons why we work on code is that there are objective answers to a lot of these questions, either the tests pass or they don't, either the function is correct or it isn't. Those kind of things are much easier to evaluate, and so we're starting... 
A lot more of our tests are in that zone as we sort of build up eventually to the ones that are a little bit more qualitative, because the evaluation is so much harder there.
- KQKanjun Qiu
Uh, I think the whole, the strategy of breaking these things down, like, basically, the strategy is we take the output or the answer and we, like, ask a bunch of questions about the output, and then we evaluate those questions. And we also evaluate the, the output. And the interesting thing about that is, it scales pretty well to, like, non-code tasks. So for, like, our recruiting tasks, we can also do a very similar process. Um, I think part of why a lot of teams try to work on just math or code reasoning is because those are the easiest to evaluate and, like, the clearest answers. But, um, but just relying on, like, is the output correct or not, that loses a lot of information in the evaluation.
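The decomposition strategy both of them describe, asking many small questions about one output, can be sketched as a rubric evaluator. The attribute names and thresholds here are invented for illustration; the idea is just that each sub-check is easier to answer objectively than "is this patch good?", and the aggregate score loses less information than a single pass/fail.

```python
def eval_patch(patch: dict) -> dict:
    """Break 'is this a good change?' into smaller checks that each
    have a clearer answer, then aggregate into one score."""
    checks = {
        "tests_pass": patch["tests_pass"],                # objective signal
        "minimal_change": patch["lines_changed"] <= 50,   # proxy metric
        "has_tests": patch["test_lines_added"] > 0,       # proxy metric
    }
    score = sum(checks.values()) / len(checks)
    return {**checks, "score": score}

report = eval_patch({"tests_pass": True, "lines_changed": 12,
                     "test_lines_added": 8})
print(report["score"])  # 1.0
```

For non-code tasks, the same shape works by swapping the programmatic checks for questions a model or a person answers about the output.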
- SGSarah Guo
Yeah. I, I think it's likely to be a pretty rich space. I'm curious for your point of view. But we've looked at a lot of startups building, let's say, interesting AI, like, development tools, right? And, and one of the things that m- we've spent a bunch of time thinking about is, like, what makes for a good scalable eval loop, right? And that could be objective and easy to test, right? Like, does it compile to things that, as you said, might be richer in data? Like, how easy is it to check the functionality of something, right? Do you have to do static analysis? Is the performance better? Are there examples if you wanna focus on a particular problem, like, let's say, like, Python 2 to 3 upgrades or something, right? I think one of the things that's most attractive about this domain is there are lots of ways to evaluate, right? Even beyond the, the contributions to reasoning that, that you guys described, and it's just gonna be productive. Uh, I mean, maybe on that topic, like, do you, do you guys think of yourselves as a product company? Like, is it important to go get this, um, functionality in front of users or, or, or just focus on research and, and sort of how do you think about that sequencing?
- KQKanjun Qiu
Yeah. Of course, we're a product company. We're a company. Um, but I think looking at the history of computing, there is, like, a right time for technology. Uh, and today, I think what you see, what you both see is that it's pretty hard to make agents that work that, like, can be productionized and used all the time. And it's because, like, the technology is just not there yet. And we use reasoning as, like, a bucket term. We've described a little bit all of the nuances of what we're actually trying to do to get agents to work, and then we lump that all under the term reasoning because it's easier for
- 21:00 – 26:39
Evaluating AI Development Tools Efficiently
- KQKanjun Qiu
people to conceptualize. But the reality is, like, what we're trying to do is to make kind of a system, a set of tools and maybe frameworks that actually makes it so that we can build reliable agents really fast, really easily. So today, my, uh, like, writing agents feels like writing code in assembly, and that really limits the types of agents we can build and also limits the number of people who can build them. And kind of what we're, what we're going toward is, like, programming lang- programming languages that are a little bit more ergonomic, uh, where we can build agents much more easily, where they can work much better, and where a lot more people can build them. Whatever it is that we release, that's what we hope is go- it's going to enable. Um, and so that's why we kind of work on different parts of the stack. We work on the underlying models because there need to be, like, more specific underlying models that work, uh, for specific things, and that's what allows a lot of these capabilities, um, and agents to be more reliable. We also work on other, other pieces of it as well.
- SGSarah Guo
Maybe if we just project forward a little bit, like, what are you guys most excited about? You wanna be a, um, tools at different levels of the stack company. Uh, what are you imagining people build or what are you already seeing people build that you think is gonna be, let's say, useful a year from now and useful five years from now?
- JAJosh Albrecht
Yeah. I think a year from now, we're gonna start to see some of these use cases actually work that today you could, you can write these. Like, we have the capabilities. You can make some kind of agent to triage your email or to do scheduling or... Many of these workflows that, like, we really should... Like, why don't we have that today? That definitely can be done, right? Like, there's nothing stopping us. Uh, and I think five years from now, we're gonna have something where it's not just, you know, "Okay, we have a scheduling bot, we have this other thing," but we really have these more general, more robust systems where each of us can individually say, like, "I want a thing that does this. I wanna do this particular weird research workflow, and I want it to work like this," and blah, blah, blah, and just specify in language. I think one thing that our recruiter-
- KQKanjun Qiu
Personalized agents.
- JAJosh Albrecht
One, yeah, one thing that our re- recruiter mentioned yesterday and that I thought was kind of funny is he's been describing to candidates, like, "We're actually sort of a software dev tooling company." But the idea is that in the future everyone is going, like... As we make these things easier and easier to program, really everyone is gonna be a, like, sort of software engineer in that sense. Like, we'll be able to make our own agents, right? Well, just by sort of working in natural language and, like, describing what we want to do and how we want it to be done and interacting at that level. And so since we're gonna be working with these agents, we're kind of making... We, we're, like, trying to move towards that kind of tooling. And so I think the goal in five years is for people to be able to really specify some huge range of possible agents that, you know, that do exactly what they want, like, they can interact with their computer in whatever way they want.
- KQKanjun Qiu
I think specifically what she said is, "We're a software dev tooling company, but in the future everyone will be a software engineer, and so everyone will need dev, dev tools."
- JAJosh Albrecht
Yeah. (laughs)
- KQKanjun Qiu
Um, and we think of agents... Agents are a very technical term, uh, that's like the specific memory architecture of the computer. Um, but agents, what they enable is, they're like a natural language programming language. And so in the future, you know, it's, uh... You know, computer is... Programming computers today, a way to think about the problem is that it's really not very intuitive to get our computers to do what they, we want them to do. And computers have been becoming more and more intuitive over time, and the best tools are very intuitive. And so one day... You know, language is very intuitive to us. Vis- like, vision, kind of seeing, understanding things that way are very intuitive to us, and our computers will become much more intuitive so that people can make them do what they want. More people can.
- EGElad Gil
One, one, uh, major milestone, um, that you had recently was you announced a $200 million fundraise from, uh, Astera, NVIDIA, and a variety of other folks. How do you think about, um, what proportion of that will go to things like compute versus team, and how in general should AI companies think about, uh, the capital they raise and how to deploy it relative to different potential objectives and outcomes?
- JAJosh Albrecht
I mean, I think actually a significant fraction of that is going to go to compute. I think... I can't speak to other companies, how they should deploy it. But I think for us, given that our goal is to make agents, what, what we really want, actually, as a company is not to become a huge company. We don't want tens of thousands of people. We wanna make our product actually work so that we can make AI agents, so that we can have some huge impact and have a relatively small, close-knit team where the communication is much easier. It's really hard to communicate with 10,000 people. It's much easier to get 100 people in a room and know, you know, what the heck you wanna do and, and agree on things. And so I think we're trying to ideally leverage ourselves, and we're already starting to do that today. And then what that looks like is by spending a bunch on compute. Today, you know, we don't have AI agents that are running off and doing all sorts of things on their own, but we do have the beginnings of those. You know, we do have our internal, you know, hyper-parameter optimizer, for example, which saves us a ton of time. Instead of our researchers manually deciding like, "Oh, this learning rate, I should do this experiment," we just let go. We come back a- after the night and it's like, "Oh, great. Everything is optimized. This is really nice." Right? But that used a lot of compute. Like, we're using a huge amount of compute relative to each person.
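The hyper-parameter optimizer Josh mentions trades compute for researcher time. A minimal sketch of the idea, assuming nothing about Imbue's actual tooling: `train_and_eval` is a made-up stand-in for a real training run, and the loop is plain seeded random search, sampling, launching, and comparing with no human picking each experiment.

```python
import math
import random

def train_and_eval(learning_rate: float) -> float:
    """Stand-in for a real training run: returns a validation score
    from a made-up curve that peaks near lr = 1e-3."""
    return -(math.log10(learning_rate) + 3) ** 2

def overnight_search(n_trials: int = 20, seed: int = 0) -> float:
    """Minimal random-search hyper-parameter optimizer: sample a
    candidate, 'train', keep the best. Each trial costs compute
    instead of a researcher's attention."""
    rng = random.Random(seed)
    best_lr, best_score = None, float("-inf")
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)  # log-uniform sample over lr range
        score = train_and_eval(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr

best = overnight_search()
print(f"best lr ~ {best:.1e}")
```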
- KQKanjun Qiu
Yeah. We're, like, training state-of-the-art models with, like, 13 or 14 people. Um...
- JAJosh Albrecht
Most of us are not working on training the models or the infrastructure even.
- KQKanjun Qiu
Large, large models. And most of us are not working on that. Um, so the total team size is very small for what we're able to do, because of the way we think about our infrastructure. It's, like, a very agentic (laughs) kind of approach to infrastructure.
- SGSarah Guo
It's now sort of broadly viewed that there will not be a fully monolithic architecture for lots of useful models, and people have, like, mixture of experts and such. Given what you wanna do with agents, with, like, planning and reinforcement learning and more test-time compute, I think it's sort of a belief among the largest research labs that under 5,000 GPUs, under some, um, reasonable level, like, you cannot compete on state-of-the-art reasoning, at least as the core LLMs define it today. Obviously, that bar keeps moving. Um, does that number apply to you? Do you think the architecture is just very different?
- 26:39 – 32:48
Prioritizing GPU Usage
- JAJosh Albrecht
I, I mean, we actually have a lot of GPUs, so uh-
- SGSarah Guo
Yeah.
- JAJosh Albrecht
We have more than that. So that number may or may not apply, but we do have a lot of GPUs. We have enough compute to be able to train models that are as large as the largest models that have been trained to date.
- KQKanjun Qiu
So, we have a ton of compute. We can train these really large models, uh, and it may not be the best use of our time or resources actually, because, um, I think just as with, like, computers, things just get more efficient. And what we see is that things are getting more efficient in training. So, like, learning how to use data more effectively, uh, so that the models get much better performance with less data. Um, learning how to, uh, like, do training runs so that-
- JAJosh Albrecht
So things don't, you know, diverge. Like there's also this-
- KQKanjun Qiu
So things don't diverge, so we're not having to, like, rerun the same thing again. There's a bunch of, like, hyperparameters to set, and tooling to build around it, and monitoring and stuff like that, that just makes it more efficient to train these things. And then also, I think the data piece is just so big and so underexplored. We all know that data is the thing that matters, um, and I think a lot of the efficiency gains are gonna come from better data. And so that's actually quite a bit of what we work on.
- EGElad Gil
Could you tell us a little bit more about why you decided to focus on coding, and what are the types of systems you're really focused on building?
- JAJosh Albrecht
Yeah. So there's a bunch of different reasons for focusing on coding. One of them is that, like we talked about before, the evaluations are much easier to do and more objective. Another one is that coding is part of reasoning. Another one is that coding really helps us accelerate both our own work and the agents that we end up building. So, as we're making the tools for ourselves, we're already starting to see this kind of leverage from the systems that we've built, where, like, we can run this agent now, you know. I think within the next year, we'll probably not be hiring as many recruiting coordinators, because we're gonna do some of the scheduling with the agent that we built, right? But we also can do the same thing on the software engineering side. We're literally writing unit tests automatically right now. Okay? And that's just helping accelerate us, helping, you know, remove the bugs. It's additive. It's incremental. It's like, "Okay, we get a 5% gain here, a 10% gain there." But as we make more and more tools, those things compound. And I think over time, it's gonna be possible to make much more robust systems much more quickly. And we're using these coding agents to write the coding agents. I think this is kind of the recursive self-improvement thing that people have always been sort of worried about, or excited about, in AI. But I think what it really looks like in practice is not this scary thing where, you know, you leave your computer on overnight, and all of a sudden it's some superintelligent god the next day. Instead, it's this slow grind of making things a little bit better every day. But a 1% improvement every day over a year is huge. Right?
And so I think that's the kind of thing that we're really excited about with code: not only can we apply it to our own workflows, but also, as we start to actually get coding agents that can really write code, now we're in a very unbounded, very interesting space. Right now, the bottleneck for most companies is the ability to hire software engineers who can write really robust code, right? But if you can just turn compute into really good code, now this is a totally different world. Now there's none of this, like, "Oh, yeah, well, Imbue is so much smaller than this other company, blah, blah, blah." It's like, "No, no. We can write way more code than anyone else." Right? So I think this is a pretty interesting thing that, over time, we'd like to work towards. And so that's another reason for code as well.
- KQKanjun Qiu
There's also that code is really useful for action, so agents acting. Um, and today, you know, even the current models can do really simple things like write code for integrations, like API integrations, and that saves us a lot of time, since writing API integrations is super annoying. Um, I also think software is just dramatically under-written, because it's so hard to write code today. So, you know, as we said, in the future computers will be able to be programmed by regular people. What that means is we're gonna write way, way, way more software all the time. Um, and people will write software, but maybe not by having to write code; the agents write the code.
- JAJosh Albrecht
I think it's not only just more software, but also better software, right? Like, already we're having our agents look at our pull requests and, you know, fix the type errors, okay? But we can extend this to adding new unit tests, to fixing the existing unit tests, to looking for security flaws. I'm very excited about agents that can go out and help all sorts of organizations improve the quality of their code base. How can we simplify this? Refactor it. Fix security flaws. I think there'll just be a huge flourishing of much higher quality, better software as a result. Not just more software, but taking the existing software and making it so much better, which will make it so much nicer and more fun to interact with as programmers as well.
- KQKanjun Qiu
Also, uh, much more custom software. Uh, like something that we do some of is generating interfaces. And it's pretty interesting. Like, if I can have, like, a custom interface for whatever it is that I'm trying to do, it has exactly the right form fields, it's, like, kinda nice, and then I can cache that interface and, like, reuse it, um, so, you know, pragmatic, but pretty interesting.
- JAJosh Albrecht
Yeah. I mean, I did this over the weekend, actually, for Midjourney. I got really sick of typing out Midjourney prompts on my phone in Discord, 'cause you can't keep iterating on the prompts. So I just made, like, a little thing that interacts with it via the API. Well, my version of the API. But I think everyone will be able to do this. It didn't really take that much code. When we have agents that can write code, if someone else wants to use it in a different way, great: you can just ask the agent to do that, come back five minutes later, and you have your own perfect way of interacting with this. I think that's just gonna make our computers feel so much nicer to interact with.
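Josh's weekend hack isn't public, so as a hypothetical illustration of the "your own perfect way of interacting with this" idea, here is a minimal prompt-iteration helper of the kind he describes wanting. Every name in it is made up for the sketch; it just keeps a history of prompt versions so each tweak builds on the last instead of retyping the whole prompt.

```python
class PromptSession:
    """Minimal prompt-iteration helper (illustrative, not Imbue's tool):
    keeps a history of prompt versions so tweaks can be appended or undone."""

    def __init__(self, base: str):
        self.history = [base]

    @property
    def current(self) -> str:
        return self.history[-1]

    def append(self, fragment: str) -> str:
        """Add a fragment (e.g. a style modifier) to the current prompt."""
        self.history.append(f"{self.current}, {fragment}")
        return self.current

    def undo(self) -> str:
        """Drop the last tweak, if any, and return the restored prompt."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current

session = PromptSession("a lighthouse at dusk")
session.append("oil painting")
session.append("wide angle")
session.undo()
print(session.current)  # -> "a lighthouse at dusk, oil painting"
```

A real version would send `session.current` to an image-generation endpoint on each change, but the point of the anecdote survives in miniature: a few dozen lines of custom glue can replace a clunky generic interface.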
- SGSarah Guo
That, I think, is an inspiring note to end on. We're gonna have 25-person companies who can change the world. (upbeat music) We're gonna have more software, more custom software, and higher-quality software, um, for us all to use. So thanks so much for doing this, Josh and Kanjun.
- JAJosh Albrecht
Yeah. Thanks for having us.
- KQKanjun Qiu
Thank you, Sarah and Elad.
- JAJosh Albrecht
Yeah. Thank you.
- SGSarah Guo
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 32:48
Transcript of episode xvMOsYyAdKk