
No Priors Ep. 123 | With ReflectionAI Co-Founder and CEO Misha Laskin

Superintelligence, at least in an academic sense, has already been achieved. But Misha Laskin thinks that the next step toward artificial superintelligence, or ASI, should look both more user- and problem-focused. ReflectionAI co-founder and CEO Misha Laskin joins Sarah Guo to introduce Asimov, their new code comprehension agent built on reinforcement learning (RL). Misha talks about creating tools and designing AI agents based on customer needs, and how that influences eval development and the scope of the agent's memory. The two also discuss the challenges of scaling RL, the future of ASI, and the implications of Google's "non-acquisition" of Windsurf.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @MishaLaskin | @reflection_ai

Chapters:
00:00 – Misha Laskin Introduction
00:44 – Superintelligence vs. Super Intelligent Autonomous Systems
03:26 – Misha's Journey from Physics to AI
07:48 – Asimov Product Release
11:52 – What Differentiates Asimov from Other Agents
16:15 – Asimov's Eval Philosophy
21:52 – The Types of Queries Where Asimov Shines
24:35 – Designing a Team-Wide Memory for Asimov
28:38 – Leveraging Pre-Trained Models
32:47 – The Challenges of Solving Scaling in RL
37:21 – Training Agents in Copycat Software Environments
38:25 – When Will We See ASI?
44:27 – Thoughts on Windsurf's Non-Acquisition
48:10 – Exploring Non-RL Datasets
55:12 – Tackling Problems Beyond Engineering and Coding
57:54 – Where We're At in Deploying ASI in Different Fields
01:02:30 – Conclusion

Sarah Guo (host) · Misha Laskin (guest)
Jul 17, 2025 · 1h 2m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00 – 0:44

    Misha Laskin Introduction

    1. SG

      (music plays) Hi, listeners. Welcome back to No Priors. RL is back with a vengeance. And one of the most talent-dense new research labs has a product release, a new code comprehension agent. Reflection AI's co-founders, Misha Laskin and Yannis Antonoglou, worked together as leaders at Google DeepMind on groundbreaking projects like AlphaGo, AlphaZero, and Gemini. I talked to Misha about building universal superhuman agents, the trickiness of reward modeling, bringing all knowledge work tasks under the data distribution, how RL for language and robotics differs, the Windsurf non-acquisition, and the landscape from here. Misha, welcome. Thank you for doing this.

    2. ML

      Yeah. Thanks, Sarah, for having me.

  2. 0:44 – 3:26

    Superintelligence vs. Super Intelligent Autonomous Systems

    1. ML

    2. SG

      So it's been, um, about a wild, like, year and a half since you guys started the company. Is that about right?

    3. ML

      Roughly a year and a half. Maybe a bit less, but I'd say it's ballpark correct.

    4. SG

      Well, can you just start by describing... You said that the company's mission is to build superintelligent autonomous systems, and we've talked before about why this is the moment in time that makes it possible. What is different about that from building just superintelligence, which is now a sort of more popular, ambitious goal?

    5. ML

      At a high level, it's fairly synonymous, but there are maybe different ways of thinking about how to build superintelligence and what that might look like. On one end of the spectrum, there's an academic way to look at it. And to some extent, superintelligence in that sense has already been achieved. AlphaGo was a superintelligent system, and there were other systems built during that time that were superintelligent in narrow domains. You can go for the goal of building a very broad superintelligence by, you know, kind of locking yourself up in an industrial lab that is decoupled from product or customers, maxing out all the benchmarks that are out there, and building superintelligence that way. That is one approach. The other approach is to think about what superintelligence is more concretely, how it's going to be deployed, what it's actually going to look like in people's hands, and build backwards from there. I would say that approach is more about co-designing product and research together. The benefit of that approach is that you're optimizing for real problems. The con is that you have to be a lot more focused, right? Because your product defines the sort of capabilities that you want to draw out of the system, and you have to start out a lot more focused before expanding across other product categories and other capabilities. So there's a spectrum, from companies that build superintelligence in just a research lab and figure out what the product is once it's built, to companies that co-design product and research together to build very powerful systems in what I would call ASI-complete categories. You can pick something that is maybe too small of a category to draw out a superintelligence. But as long as you pick a category that is big enough to be ASI-complete, and this is our approach at Reflection, I think it makes a lot more sense to be focused and co-design those two things together, the product and the research.

    6. SG

      I want to come back

  3. 3:26 – 7:48

    Misha’s Journey from Physics to AI

    1. SG

      to, um, choice of initial problem in a minute. In terms of just having the intuition and the confidence to say, like, "We can go do this as a team. We're going to recruit great people and go build Reflection": you and your co-founder, Yannis, were working on Gemini together in key roles before, and previously you had been part of Pieter Abbeel's lab; Pieter is an amazing researcher as well. You described yourself to me as having, I believe the term you used was, somewhat muscled your way into AI and deep learning from originally a physics background. How did you decide to go work on this and end up in Pieter's lab?

    2. ML

      Yeah. As a kid, I became really interested in physics, theoretical physics. It was probably a byproduct of... I'm Russian, kind of Israeli-American, and moved around. When I landed in the States, it was in a desert in Washington State, learning a new language. So I had a lot of time on my hands. And my parents had the Feynman Lectures in their library, so I spent a lot of time just reading what was on the shelf, bumped into that, and got really interested in physics.

    3. SG

      How old were you?

    4. ML

      I was... So when my interest in physics started, that was probably around middle school. And it really became the thing I wanted to do in high school. The reason physics was so interesting was that it seemed like the science at the root of many of the things that became impactful, right? I was reading about the history of the transistor, and it was invented by a group of theoretical physicists. I was reading about how GPS works, and it turns out you need special relativity in order to accurately account for spatial coordinates using GPS. So I felt that physics was the root science to pursue. I went and studied it, got my PhD in it. At the same time, I started seeing deep learning take off, and really saw AlphaGo happen. And my sense was that I wanted to pursue the root science, but there is such a thing as the root science of our time. Physics as a field is very interesting, but it has crystallized a lot more than a new, dynamic field that was being born out of nothing. AI, to me, felt like it was going through the moment that physics went through maybe 100 years ago. When I did problem sets in physics, the most exciting stuff I was working on was basically the things people were discovering 100 years ago. Here, I saw it happening in front of my eyes, and I just decided that that was the science to bet on. In particular, it was AlphaGo that inspired me, because it was just unbelievable to me that you could train a neural network to have such immense reasoning capabilities, right? This thing was superintelligent within the realm of Go. So I decided that I needed to get myself into the best reinforcement learning lab I could, and Pieter's lab was that lab for me.

    5. SG

      And then you and Yannis were working specifically on RL at Gemini?

    6. ML

      That's right. So Yannis, my co-founder, was the overall RL lead for Gemini at the time, for 1 and 1.5. I was working very closely with him on his team. It was a really exciting time, because both of us went from being reinforcement learning researchers to training large language models at scale, and at the end of that project, once Gemini 1 and 1.5 landed, we saw what was to come. It became pretty clear to us that the next paradigm, and effectively the final paradigm we need in place before what people used to call AGI, or now that the goalposts have shifted, ASI, is reached, is just figuring out how to scale reinforcement learning on top of large language models. And the first instances of that have been happening, right, over the last year. I think we're still actually a lot earlier than people think, but there is a wedge in and things have started to work.

    7. SG

      Yeah, I definitely, uh, I definitely want to talk about what you think is solved and unsolved here. Um, the entire field has clearly gotten more focused on, um, deep reinforcement learning over the last 18 months.

  4. 7:48 – 11:52

    Asimov Product Release

    1. SG

      You have this, uh, huge product launch this week with Asimov, um, can you just sort of describe what it is?

    2. ML

      So Asimov is the best code research agent in the world. It's a comprehension agent, meaning that it's really designed to feel almost like a deep research for large code bases. The way a developer is supposed to feel interacting with it is effectively like they have a principal-level engineer who deeply understands their organization at their fingertips. So it's very different from the existing set of tools, which focus primarily on code generation. Every single coding tool has some code generation and some comprehension aspect, but we spent a lot of time with our customers trying to understand why coding tools... and this is enterprise-specific, so I think the world is different with startups. But within enterprises, when they're adopting coding tools and you see the impact this is having on their actual productivity, it's much lower than people expect. In fact, it's sometimes negative, sometimes negligible.

    3. SG

      Did you see the recent METR report on that?

    4. ML

      Yeah. The METR report was very close to what I've been hearing when talking to engineering leaders within larger organizations. And it's not just enterprises; it's growth-stage startups too, any kind of engineering organization with a sufficiently complex code base and a sufficiently large team that no one engineer can hold the entire code base in their head. Reflection is one of those places as well. We use our product actively, because training large language models is complex, and there are the large language model codebases and the product codebase, and knowledge is scattered across engineers. It's not just in the codebase; it exists in your chats and project management tools and other places where knowledge lives. So what we're effectively building towards is an omniscient oracle for organizations: you can go in, ask any question at any level of complexity, and it will provide you an answer at the level of what that principal-level engineer would have given you. Or, in the future, as the product expands to other categories, what the person who's most embedded in the organization understands. And of course, once you have that solved, it begets much more reliable agents that act for you as well. But I think the world today is focused on, I would say, 80% action and 20% understanding, so 80% code generation, 20% comprehension. The actual problem is exactly the opposite. When you look at what an engineer does in an organization, 80% of their time is spent trying to comprehend complex systems and collaborating with teammates. And what is collaboration? It's usually someone asking someone else a question about a system that they don't know. So that, I think, is the problem at the heart of what would prevent a superintelligence from actually working within an organization: this understanding, and being able to ingest from a lot of sources of information and from the team. Once you have that, the action part becomes, I don't want to say trivial, but a lot easier. To me it seems like really 20% of the problem is teaching these agents how to act, and it's more or less solved.
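      To make the shape of that concrete, here is a minimal sketch of a comprehension "oracle" that answers a question by pulling context from several knowledge sources, not just the code base. Everything here is an illustrative assumption (the toy keyword retriever, the source names, the prompt assembly), not Reflection's actual implementation:

```python
# Toy multi-source comprehension sketch: rank snippets from several knowledge
# sources, then assemble cross-source context for a long-context model.
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str   # e.g. "codebase", "slack", "jira" (hypothetical names)
    text: str
    score: float  # relevance score from the retriever

def retrieve(query: str, sources: dict[str, list[str]], top_k: int = 3) -> list[Snippet]:
    """Toy retriever: rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    hits = [
        Snippet(name, doc, len(terms & set(doc.lower().split())))
        for name, docs in sources.items()
        for doc in docs
    ]
    return sorted(hits, key=lambda s: s.score, reverse=True)[:top_k]

def answer(query: str, sources: dict[str, list[str]]) -> str:
    """Assemble a prompt from cross-source context; a real system would send
    this to a long-context model rather than returning it."""
    context = "\n".join(f"[{s.source}] {s.text}" for s in retrieve(query, sources))
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    sources = {
        "codebase": ["def deploy_env_job(): ...  # launches a batch training job"],
        "slack": ["FYI: environment jobs on our team means the batch jobs"],
        "jira": ["INFRA-42: environment jobs slowed 5x after the scheduler change"],
    }
    print(answer("why are environment jobs slow", sources))
```

      A real system would hand the assembled context to a long-context model instead of printing it; the point is only that the answer is synthesized across the codebase, chat, and ticket sources Misha lists.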

    5. SG

      That definitely squares with both my understanding of engineering and my experience with coding agents personally, right? If you think about the context load time of just trying to understand a new system, or code anyone else has written, or code your agent has written: in the end it's, you know, a very stupid implementation mistake that, if you had reasoned through it with context of the system, you never would have made. Or a works-in-my-environment type problem. And so I think that very much mirrors my intuitive understanding of engineering

  5. 11:52 – 16:15

    What Differentiates Asimov from Other Agents

    1. SG

      here. That's great as a problem formulation. What makes Asimov different in terms of the ability to understand better, versus just generating code?

    2. ML

      There are a few things. This is kind of where, and why, it is so important to co-design research and product. Because as a researcher you'd go in and say, "The answer is entirely in the agent design or the model or something like this." And as a product person you would say, "Well, it's in these product differentiators," like being able to draw not just from your code base but from knowledge that lives in other sources of information, or being able to learn from the engineering team to offload their tribal knowledge. So, right, an engineer can go in and teach Asimov: "Hey, when we say environment jobs on our team, we mean this specific thing," meaning, say, Google Batch jobs. Now when another engineer asks a question about environment jobs in the future, the system just knows what they're talking about. A lot of knowledge is stored in engineers' heads. And I think you need both of these things. You need to understand your customer really closely and develop a differentiated product almost independently, right, of the models that are powering it. But then you also need to innovate on the research, in terms of agent design and model training, to actually drive the capabilities that you want to see out of the system. And this becomes an evaluation problem, which is basically at the heart of any frontier lab as well. This is, I think, the least spoken about part of what frontier labs do, but possibly the most important: figuring out how they evaluate. What makes Claude magically feel better at code than another model out there? They did something right in their evaluations. So when you look at this problem specifically, there are different capabilities that you need to train. And what we do is post-train models; we really focus on post-training today. One of these capabilities is long-context reasoning. When I say long-context reasoning, I actually mean small models with very long contexts that are able to go into giant code bases, suck up as much information as they can, and reason over and output the relevant stuff. So it's almost like neural retrieval. Then there are capabilities like tool use and multi-hop reasoning. This is more for when you have your agent and it's designed with some tools. There are two ways of training agentic models. One is in a very general way, where you just train it on thousands of environments and make it the most general agent possible. That is almost like the pre-training of agents, and that's what a frontier lab does. There's a new release, Kimi K2, and that's kind of what that model does. And that's definitely part of it; it gives you a nice general base to start from. But then, to drive a capability depth-wise: if you really want this reasoner that has search tools, the ability to call these long-context reasoning models, and other tools it might want to interact with, like, "When do I read from Jira? When do I read from another tool?", that is kind of a reasoning problem.
If you train with those specific tools in mind, that's typically what people refer to when they say tool use. They actually train for a specific set of tools and really drive the capabilities for those tools. So these are the kinds of research problems you need to solve in order to build the overall system that's the best in the world. It's not any one thing; it's all these things combined. An example of a system trained for a specific set of tools that comes to mind is the Grok 4 release: they showed a plot of their general model next to the model that was trained with a tool to climb on Humanity's Last Exam, and there was some big, noticeable difference between the two. Now, that's great, but the downside is: does Humanity's Last Exam actually matter in any meaningful way for an end user? I would argue there's some weak correlation, but the answer is most likely no. So you have to build the tools and train for the things that users actually want. I think that there's sort of no

  6. 16:15 – 21:52

    Asimov’s Eval Philosophy

    1. ML

      way around that.
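      As a concrete illustration of the agent design being described, here is a minimal sketch of a fixed-tool-set agent loop, where a policy picks which tool to call at each hop. The tool names and the stubbed policy are hypothetical; in the real setting, the policy would be a model post-trained against exactly these tools:

```python
# Minimal fixed-tool-set agent loop: the policy chooses a tool at each hop,
# executes it, and appends the observation to the running transcript.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search_code": lambda q: f"grep hits for '{q}' in the repo",
    "read_jira": lambda q: f"Jira tickets mentioning '{q}'",
    "long_context_reader": lambda q: f"summary of large files relevant to '{q}'",
}

def policy(query: str, transcript: list[str]) -> tuple[str, str]:
    """Stub for the trained model: picks the next tool and its argument.
    Here, a fixed multi-hop script; a real policy is a fine-tuned LLM."""
    hops = ["search_code", "read_jira", "long_context_reader"]
    return hops[len(transcript)], query

def run_agent(query: str, max_hops: int = 3) -> list[str]:
    transcript: list[str] = []
    for _ in range(max_hops):
        tool, arg = policy(query, transcript)
        observation = TOOLS[tool](arg)  # execute the chosen tool
        transcript.append(f"{tool} -> {observation}")
    return transcript

for step in run_agent("flaky test in checkout service"):
    print(step)
```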

    2. SG

      What can you share about how you evaluate either, like, technically or, um, philosophically that, uh, makes Asimov's performance great?

    3. ML

      This is sort of why it makes sense to do something like this as a startup. The only advantage that you'll ever have as a startup over a big incumbent, especially when there are such talented teams out there, is focus and velocity against the thing that you're focused on. Now, if you want to be playing in what is arguably the biggest category in AI, which is coding, then you need to have the talent as well to do it. But what do you do if you don't have the billions of dollars to pre-train models? The only way we can win, I think, is by being very focused. Here's how I would describe what it looks like to work on a big model within an incumbent lab: you are one of, like, hundreds of evals. When you look at the model card for, let's say, the o1 paper that came out, I think, last year, the distribution of what most people worked on in that paper was evals. So you're one of many people doing all sorts of evals. Spreading yourself in that sense, you get something that's general, but it's spread fairly thin. As a startup, and a startup with a very focused product that's not too diffuse and is pretty opinionated about what it's building... In the startup lore, Paul Graham would tell you to go talk to customers: half the time build product, half the time talk to customers. I think in the AI age, it's: develop your evals based on what customers are saying and what they're doing. So you have to work with your customers to look at what prompts they're trying to solve, what general questions they're trying to unlock. There are very specific pain points that we've identified, onboarding being one of them. In a big company, it takes months to onboard an engineer. So how do you develop evals that accelerate the onboarding of an engineer from months to, hopefully, just a couple of weeks, now that all the questions they had, they can just ask Asimov and be able to onboard much faster? So I think there's no silver bullet other than coupling to the information coming from customers, and then being very scientific in the evals that you develop across them. You have these customer needs, let's say onboarding and a bunch of others, and then you have your system capabilities: well, what do you need in order to provide a good experience there? This customer is being onboarded onto a giant code base; it might be a code base that on its own is 100 million tokens or something. Well, then you need to figure out some way to reason over that giant code base, so you have a long-context reasoning capability. Or you look at your agent and ask: what's preventing it from satisfying this query from a user? And so you work backwards and reverse-engineer from what a user is asking for to the capabilities you want to drive in your system.
But the important part, I think, is to be able to tweak every part of the system, from the product features to the agent design to the model training, in order to build the best overall system. And if you're capped in which parts you can change, if you can only change the product and agent design, then you're actually pretty limited in what you can do, because you're at the mercy of what these general third-party models can do.
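      One way to picture that eval loop: each eval case starts from a query customers actually asked and is scored against the facts a good answer must cover, tagged by the system capability it probes. This is a hedged sketch; the grading here is a crude keyword check standing in for an LLM judge, and all the cases and answers are invented:

```python
# Sketch: turning customer queries into a scored, capability-tagged eval suite.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str                # a query customers actually asked
    required_facts: list[str]  # things a principal engineer would mention
    capability: str            # which system capability the case probes

def grade(answer: str, case: EvalCase) -> float:
    """Fraction of required facts the answer covers (crude judge stand-in)."""
    hits = sum(fact.lower() in answer.lower() for fact in case.required_facts)
    return hits / len(case.required_facts)

suite = [
    EvalCase(
        prompt="Why are my environment jobs 5x slower than usual?",
        required_facts=["race condition", "two pull requests"],
        capability="multi-source reasoning",
    ),
    EvalCase(
        prompt="Where is auth token refresh handled?",
        required_facts=["token_refresh", "middleware"],
        capability="long-context retrieval",
    ),
]

answers = {
    suite[0].prompt: "Two pull requests landed together and created a race condition.",
    suite[1].prompt: "Token refresh lives in token_refresh() inside the auth middleware.",
}
for case in suite:
    print(case.capability, "->", round(grade(answers[case.prompt], case), 2))
```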

    4. SG

      What I'm hearing from you is also that there's some trade-off in having to serve all different kinds of users and optimizing across those different evals, because each of the teams thinking about a particular use case or audience at a more general organization, for example, is less likely to have the ability to work through the entire pipeline, from training to product, to win their use case.

    5. ML

      So the thing that was extremely satisfying about working on Gemini is that you're driving research at the frontier, and there's something very gratifying about that. The downside was that you were so far removed from product that it was kind of a broken telephone game: there were four different people that information flowed through before the model got into a customer's hands. That coupling was very loose, and I think it's very true that just because a company might have the best model on some general set of academic benchmarks doesn't actually mean they have the best product. What we're seeing is that when things really fit together, it's usually because there's a tight coupling between a product and a model; it's a whole system, not just the model alone. Obviously, the first big example of that was ChatGPT, right? ChatGPT is an incredible product that was coupled with the model, and the model was post-trained for the prompts that were coming in from ChatGPT users. When I saw the first coding blog post that ChatGPT produced for me, that was just insane. That was an insane, magical moment, and they post-trained specifically for that. And I think there's another example of that happening right now with Claude Code, that kind of tight model-to-product coupling. So I really think it's important to be able to do both at a great degree of excellence.

  7. 21:52 – 24:35

    The Types of Queries Where Asimov Shines

    1. ML

    2. SG

      What is an example, as you guys open up the wait list, that you want users to try, where it should just be obvious that the answers are better than other coding agents?

    3. ML

      I think the kinds of queries it tends to be better at are what we would call semantic queries. An example of a query where this is not the best system to use is file-level: if you're looking at a file and there's a specific thing in that file and you're just trying to get a quick answer, you don't really need the hammer of a deep-research-like experience. You don't need to wait tens of seconds or a minute or two to get that answered, because that should just be delivered snappily. But if you don't exactly know what you're looking for, and you don't know the function name, or... These are the hard problems engineers are usually in. Like, there's a flaky test. You know the test is flaky, but that's where your knowledge stops, right? That's when you usually go to Slack and ask some engineers, "This test is flaky. What's going on? Does anyone know?" The way we've used it: when you're training these models, there's a lot of infrastructure work that goes into it, and it fails in interesting ways all the time. So you ask things like, "My jobs are running slowly, five times more slowly than usual. Why is that?" That's a vague query that would be very hard to answer with existing systems, especially since the knowledge around it might live not just in the code base. In the example I just brought up, when our environment jobs were slowing down, it turned out that two different teams, the infrastructure and research teams, had submitted pull requests that passed the tests. It wasn't that they were wrong, but together they conflicted in a way that caused, effectively, a race condition, and slowed everyone's jobs down. These are the kinds of bugs where you have two or three engineers spending a few days trying to solve one of them. So I think these semantic queries tend to be the place where a product like this shines, in the same way that you think about what kind of query you'd ask ChatGPT when it just needs to use the browser tool: a quick factual thing, where you wouldn't invoke the deep research experience. But when you want it to compile a lot of information around some more nebulous query, that's where people seem to find a lot of value with deep research. So I think a similar kind of mindset holds

  8. 24:35 – 28:38

    Designing a Team-Wide Memory for Asimov

    1. ML

      here.

    2. SG

      One thing I would do, you know, working on a new system with a principal engineer next to me, is just have them explain the entire system, right?

    3. ML

      Yeah.

    4. SG

      Um, because I want to have that context where I can't even tell the agent what to do. And so I'm curious, from a product perspective: the way you have memory for agents, or even for teams, is an increasingly popular idea. There are lots of ideas about how to do it. I think there are not many examples of, like, collaborative memory in production in a useful way yet, but I'm sure it is coming. Have you guys designed it in a form I can understand, too?

    5. ML

      Yes, that's... So it's, this is actually one of the more fun things to, I think, work on in product today, and I think it's one of the more fun kind of features to work on at the company, is, um, how do you design a team-wide memory?

    6. SG

      Mm-hmm.

    7. ML

      Because there are all sorts of details around, well, who can edit the memory? Um, who can view different parts of the memory? Uh, how do you, you know, how do you maintain a kind of repository of, of this memory for people to edit and view?

    8. SG

      You have to have a concept of authority, right?

    9. ML

      Right.

    10. SG

      People are gonna say things that are wrong.

    11. ML

      The way it's worked with the customers we've started working with is, they typically want to start off with a group of trusted, senior, staff-level-plus engineers who are the gatekeepers, which is a very common notion, I think. You have permissions, right, and an ownership structure in code bases. They're the ones who populate the memory first, and then the scope expands. And I think it works. It's actually a much more complex feature to build, because it touches on org-wide permissions. There are some parts of the code where a certain engineer should be able to edit the memory but other engineers shouldn't. So it actually starts looking like a new way of versioning code, effectively, right? It's kind of a GitHub-plus-plus, because you're not versioning the code, you're versioning the meta-knowledge around it that helps language models understand it better. So that is definitely something we built, but I think it's a thing to iterate on a lot until you get the right design here, because you're effectively building a new Git from scratch.

    12. SG

      Yeah, it's interesting. And you're trying to design some sort of permissions into it, whereas, you know, the dominant system today in actual version control is, at best, pull request review, right? Like, you just-

    13. ML

      Right.

    14. SG

      ... you try. (laughs) And-

    15. ML

      Yeah.

    16. SG

      ... like, there's somebody in the organization, um, with the ability to review, makes a determination as to whether or not Misha should be able to make this change or not, actually based on the content.

    17. ML

      And I think actually it's going to look not too dissimilar from that, right? Where if you want to change the agent's team-wide memory, then it probably is gonna look something like a pull request, where the person who really understands that system approves it, or edits it, or something like this. I don't think it's gonna look too dissimilar.

    18. SG

      That's quite different from traditional role-based, group-hierarchical access control, which is quite static, right? And it makes sense to me that it would look perhaps a little more Git-like, in that the person who knows what part of the code base you're creating or editing knowledge about is gonna evolve over time, as the code base evolves and the team does as well.

    19. ML

      Yeah, exactly. But I think this is also how... It was very common at Google, and I think other places as well, for different parts of the code base to have owners. So there are these ownership files, which we have as well. And basically, if you're on the ownership file, then the review has to go through you, or it has to be approved by at least one of the members of the ownership file. As people move around teams and so forth, the ownership files themselves get updated. So I think a pretty similar structure is probably going to hold here, but it's a lot more nuanced than building an individual memory, which is just personal to you and lives on your computer in your, you know, AGENTS.md file or something.
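      Here is a minimal sketch of what that ownership-gated, team-wide memory could look like, with an OWNERS-style map and pull-request-style approval before an edit lands. All names and the data model are illustrative assumptions, not the actual Asimov feature:

```python
# Sketch of a team-wide memory gated by ownership files: an edit only merges
# once an owner of the relevant path has approved it (pull-request style).
from dataclasses import dataclass

# Map from code-base path prefix to its owners (OWNERS-file style).
OWNERS = {
    "infra/": {"alice", "bob"},
    "training/": {"misha"},
}

@dataclass
class MemoryEdit:
    path: str                     # code-base area the knowledge attaches to
    note: str                     # the tribal knowledge being recorded
    author: str
    approved_by: str | None = None

def owners_for(path: str) -> set[str]:
    """Union of owners over every matching path prefix."""
    return set().union(*(o for p, o in OWNERS.items() if path.startswith(p)))

def try_merge(edit: MemoryEdit, memory: dict[str, list[str]]) -> bool:
    """Merge the edit only if an owner of that path approved it."""
    if edit.approved_by not in owners_for(edit.path):
        return False
    memory.setdefault(edit.path, []).append(edit.note)
    return True

memory: dict[str, list[str]] = {}
edit = MemoryEdit("infra/jobs.py", "'environment jobs' = batch training jobs",
                  author="carol", approved_by="alice")
print(try_merge(edit, memory), memory)
```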

  9. 28:38 – 32:47

    Leveraging Pre-Trained Models

    1. ML

    2. SG

      Uh, okay if we zoom out and place Reflection overall in context a little bit and talk about the larger environment?

    3. ML

      Sounds good. Yeah.

    4. SG

      You know, coding as a root problem in this era of AI research is a somewhat commonly held belief, right? I think a criticism of companies that went after pre-training focused on coding was that, in reality, you needed language, you needed a lot of the capabilities, who can say exactly which, but the reasoning capabilities that could be elicited from large pre-trained models to do code anyway, and so you had to do all of the work without the general use. Is it specifically the availability of more capable, open-source pre-trained models that made you feel like you could go after superintelligent autonomous systems in coding without spending the pre-training dollars upfront as a new lab? Or help me think about that logic a little bit more.

    5. ML

      I think that's roughly correct for why you can get into the game in the short term. A bet we made when we were starting the company a year and a half ago was that there were pretty decent open-weight models out there, and that pre-training was starting to more or less converge on a known paradigm. There's a known big dataset, the internet. Yes, there are going to be some algorithmic innovations, but you're basically extracting signal from an extremely noisy dataset, and we felt there's only so much signal one would be able to extract without getting into just absurd dollars for scaling it, in terms of what you're trying to get out of it. So what we thought would happen is that there'd be decent open-weight models. I think the quality of the open-weight frontier has surprised me. The models are actually better than I thought they would be.

    6. SG

      Hmm.

    7. ML

      And we thought that you can just focus. We're in this brief period in history right now where the RL flops are still manageable. You can really have a best-in-class product if you're focused, and yes, you still need a decent amount of GPUs, but from a flops perspective, it's nowhere near where pre-training is.

    8. SG

      Like two orders of magnitude off.

    9. ML

      Exactly, right? So you can get into it and build out both the product and a research arm. Our thought was that this was the time when you could actually start a generational frontier lab that does not need to be coupled to a big cloud provider, because if you do it right, you'll actually be able to generate sufficient revenues to not have to be acquired or find some strange deal where the cloud provider kind of owns you. That was the model of a lot of what frontier labs looked like pre-LLMs. I think we're already starting to see this field-wide, independently of Reflection, right? When you look at how fast Anthropic's revenue is growing, they're in this spot where it's a massive revenue-generating business that's growing at an unprecedented rate. But that was very much the ethos: we can come in, we don't need to pre-train, you can get by with two orders of magnitude less compute, and really get something out there that's really good. Roughly speaking, you won't need the amount of compute that a frontier lab needs today, as long as you're focused, but you'll still need, you know, only an order of magnitude less. So I think the capitalization requirements are still high; there's no way of avoiding that. And asymptotically, they're probably the same. But asymptotically, the idea is that at that point you just have a generational business that can raise capital

  10. 32:47 – 37:21

    The Challenges of Solving Scaling in RL

    1. ML

      off of that.

    2. SG

      I guess part of my read at this point in time, and maybe it was always true, but especially now, is that your actual capabilities in terms of understanding what evals to go after and how to design reward models matter a lot; there's perhaps less understanding and more dispersion in the field in post-training strategies versus, as you said, more maturity in pre-training right now. Because if it were a simple question of scaling RL on language models, people would be doing it more aggressively right now, right? So maybe that's a good question for you: how would you describe the challenges in solving scaling here? Why are we as a field only able to put a much smaller amount of compute to work here and still get best-in-class results, versus pre-training-scale GPUs right now?

    3. ML

      I'd say there are two categories that one would think things fall into. One is more around limitations of the problem's structure, and the other is, well, maybe the structure is fine, but you need algorithmic advances to really drive the next frontier forward. There's, I'd say, some mixture of both, but the biggest weight I'd put is on the problem structure. The thing that I led for Gemini was reward models; I built out the reward models that were used to post-train Gemini 1 and 1.5. And my thought is that if you have a reward that accurately describes the outcome of any arbitrary task you throw at it, then that's it.

    4. SG

      (laughs)

    5. ML

      You know, at that point it's just algorithmic advances, but even the very simple RL methods we have today will be able to get a lot out of this. They'll only be bound by their exploration abilities. So that's the only thing, right? But today we are certainly not in the world where we have clean rewards for every task we could imagine, and so as a field we have to make various shortcuts and compromises. You'll have things like LLM-as-judge with different rubrics. That works to some extent, but a noisy or stochastic reward inevitably gets hacked. So you need a lot of these, and there's only so much you can extract out of them. Then you have sources that do have ground-truth rewards, but there are not many of them, and so you have to hope that by optimizing against those, you'll get some generalization effects. So I think the fundamental problem is the reward problem. You can either go in and say, "All I'm gonna focus on is rewards," or you can say, "I'm going to take things as they are and just be more creative in the methods that leverage the rewards I have today." Basically every synthetic generation pipeline is an example of the latter. So it's a messy problem, but I think we're fundamentally in a reward-bound world. I don't think there's gonna be a breakthrough where all of a sudden we go from not having rewards for everything to having them, because the reward problem in itself is, at the time I called it AGI-complete, and now I'd say it's ASI-complete. By the time you have a neural network that can accurately verify any outcome, that is probably a superintelligence. And so then it goes back, again, to evaluations. If you're training your reward models on something, what are you evaluating against? What are the tasks you want it to be good at? So that's how I think about it. I think it's a fundamentally reward-bound field. And then there's also algorithmic progress: the RL methods we have today are quite bad, I would say, at exploration and credit assignment. The fundamental algorithms are: take the things that work and make them happen more frequently, and take the things that don't work and make them happen less frequently.
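      To make the two reward regimes concrete, here is a small sketch contrasting a clean ground-truth verifier (like unit tests) with a noisy rubric-based judge of the LLM-as-judge flavor. The rubric and the noise model are invented for illustration; the point is that the stochastic judge is the kind of reward that eventually gets hacked:

```python
# Sketch: ground-truth reward vs. noisy rubric-based judge reward.
import random

def ground_truth_reward(code_passes_tests: bool) -> float:
    """Clean, verifiable reward: exists only for some tasks."""
    return 1.0 if code_passes_tests else 0.0

RUBRIC = ["cites the file", "explains the root cause", "proposes a fix"]

def judge_reward(answer: str, noise: float = 0.1) -> float:
    """Noisy rubric reward: partial credit per rubric item plus judge noise.
    Stochastic rewards like this are the ones that get reward-hacked."""
    covered = sum(item.split()[-1] in answer.lower() for item in RUBRIC)
    return covered / len(RUBRIC) + random.uniform(-noise, noise)

print(ground_truth_reward(True))
print(round(judge_reward("we found the root cause in this file and a fix"), 2))
```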

    6. SG

      (laughs)

    7. ML

      But they don't discern at all, along your reasoning chain, which part of the reasoning was correct and which part was incorrect. That's why you get these reasoning chains that are kind of garden-path, meandering: they'll explore all sorts of things that are completely unnecessary and don't look at all like the structured thinking a person would have. That's how the algorithm works. There's no credit assignment step at any atomic level. So that would fall into the more algorithmic-progress bottlenecks.
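      A toy illustration of that credit-assignment gap: a REINFORCE-style update spreads one sequence-level reward uniformly over every step, so a garden-path detour is reinforced exactly as much as the step that found the answer. This is a deliberately simplified sketch, not any lab's training code:

```python
# Sketch of the credit-assignment gap in vanilla policy-gradient updates:
# one scalar reward for the whole sequence is applied to every step alike.

def reinforce_credit(steps: list[str], final_reward: float) -> list[tuple[str, float]]:
    # "Make what worked more frequent": one scalar spread over all steps.
    return [(step, final_reward) for step in steps]

chain = [
    "read scheduler config",     # useful
    "inspect unrelated module",  # garden-path detour
    "spot conflicting PRs",      # the actual insight
    "report race condition",     # answer
]
for step, credit in reinforce_credit(chain, final_reward=1.0):
    print(f"{credit:+.1f}  {step}")
# Every step gets +1.0: no credit assignment at any atomic level.
```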

  11. 37:21 – 38:25

    Training Agents in Copycat Software Environments

    1. ML

    2. SG

      Can I ask you for a few, uh, like, hot takes quickly?

    3. ML

      Yeah, let's go for it.

    4. SG

      What do you think of all of these efforts, either in-house with, you know, labs and vendors or young companies just creating software environments that look like popular software to train agents in, right? Copies of Airbnb or Amazon or Salesforce or Excel?

    5. ML

      Personally, and maybe the take is not very hot, I'm very bullish on it, because how else are you going to... Maybe the hot take is that there's no such thing as generalization. There's just bringing the test distribution into the training distribution.

    6. SG

      Okay, that is an aggressive take.

    7. ML

      (laughs)

    8. SG

      Wow. Yeah. (laughs)

    9. ML

      So as long as your training distribution looks something like what you would actually want to evaluate on, then users will experience it as generalization. I do think there is some generalization that happens in these models, but we probably, as users, overestimate it because we don't actually see how they were made. If you saw, "Oh, this synthetic environment was actually very similar to the thing I was asking about," it would make sense why the model would be good

  12. 38:25 – 44:27

    When Will We See ASI?

    1. ML

      at that.

    2. SG

      Maybe six months ago, I think you said, "I think it's possible we have my definition of ASI in a couple years." Do you still believe that's true?

    3. ML

      I still do believe that's true. Where I think we'll be a couple years from now is that there will be definitive superintelligence in some meaningful categories of work. For example, when I say coding, I don't mean all of coding, but there will be a superintelligence within some meaningful slivers of coding that is driving, I would say, immense progress in the companies that can benefit from it.

    4. SG

      Mm-hmm.

    5. ML

      And the reason I would say the problem of ASI will have been solved by then is because at that point, it's just a matter of operationalizing what you know. It just so happened that these particular categories... Like, you might have a superintelligent front-end developer because there's so much data distribution for that on the internet, and it's easier to make synthetic data for it. But at that point, you have the recipe, and it's just a matter of making economic decisions: is it worth sinking in X amount of dollars to get the data in this category, to get something close to superintelligence there? An example of that is what happened with reinforcement learning before language models. Effectively, the blueprint for building superintelligent systems was developed. It happened with the Atari games and AlphaGo; then OpenAI Five and AlphaStar were near-superintelligent systems, and if OpenAI and DeepMind had sunk more compute into them, they would have definitely become superintelligent. It's just that at that point, it didn't really make sense. Economically, why would you do that?

    6. SG

      Then this is a definitional issue, because I was gonna ask: help me understand your view of... One of the big criticisms of RL overall has been lack of generalization; that's been just kind of a general question for this direction. I have friends at every large research lab, and tell me if you hear something of a different tenor or just believe differently, who believe we're going to have systems that are much more capable than humans in many types of knowledge work, but they believe less in generalization. And so, in a resigned way, they're also, as you're saying, like, "I guess we're just gonna bring all of it under distribution one way or another."

    7. ML

      Yeah.

    8. SG

      But that means it's a little bit different from my view, which is that at some point you just have enough capability that the rest, the rest of the useful capability, you get for free, right?

    9. ML

      I think I kind of have a similar viewpoint to the people you describe. I think the generalization capabilities of these things have been weaker. First of all, it's mind-blowing that any of this exists at all. So we went from-

    10. SG

      (laughs)

    11. ML

      ... fundamental existential crises in generalization. The field of reinforcement learning before language models was: we have these systems, and we can make them amazing at very narrow tasks, and we have absolutely no answer for generalization, like zero. And we went from that to things that feel like they're generalizing. They're certainly generalizing much better than anything we had before, but it's likely because the training distributions are so broad. So the way I think about it is more about the output as a user: is the system superintelligent in some meaningful categories of work? And then, from a research perspective, is it obvious how to make it general for anything you might care about? At that point, again, it's just a matter of economics. Maybe there are some categories where collecting the data is so expensive and the return on investment is so low that it's effectively just better to have craftspeople than-

    12. SG

      Yeah.

    13. ML

      ... like, superintelligent AIs. So I think we're moving into this kind of jagged world, jagged superintelligence, where you have a handful of these superintelligences for categories that matter, maybe subsumed into one model at some point. But at first, there will probably be a few companies that have that kind of product-model coupling, superintelligent in different categories. An example of starting to see the first glimpses of superintelligence, but in a way that hasn't really transferred to anything meaningful yet: we have these superintelligent test takers now. AIME, the AIME benchmark, is completely saturated. On Codeforces and other competitive coding environments, the models are almost the best in the world, and within the year will probably be just the best in the world. So we have the best competitive coding agents. Then you go into a company and you ask them, "Have these things been helpful?" And they say-

    14. SG

      "It's uneven," yeah.

    15. ML

      Yeah. (laughs) Right? In the parts of work that are really meaningful, that's where you want to see these things driving a meaningful increase in GDP. And the only way you'll see that is if you go into a company and there's a universal understanding that, "Yeah, my engineers, every single one of them, are double-digit percentage points more productive." That's the kind of thing that, if it starts happening across every field, you'll see double-digit increases in GDP. So this kind of benchmark-maxing, and it's a bit different than benchmark-maxing used to be, because now it's benchmark-maxing that's weakly correlated to customer outcomes, still looks very similar to taking a board game, training an RL agent on it, getting a landmark result in superintelligence, and then making a claim that superintelligence is solved. I think the reality is that deployment is half the problem. Which goes back to evaluating on customer problems and building product together with the

  13. 44:27 – 48:10

    Thoughts on Windsurf’s Non-Acquisition

    1. ML

      models.

    2. SG

      So, you must have seen the news of the Windsurf non-acquisition, first into OpenAI, and then the non-acquisition into Google DeepMind. What do you make of it?

    3. ML

      We're seeing this verticalization basically happen across categories that are material to frontier intelligence. One could argue that the first verticalized category was actually search, through ChatGPT; that's the place where OpenAI verticalized first. And coding has obviously emerged as another frontier-level category that, right, could... Like, all these companies have aspirations of-

    4. SG

      ASI.

    5. ML

      Yeah, ASI, and of basically being trillion-dollar companies or more. I don't think it's really the economics that are the driving factor; it's more that if you want to sustain frontier research, that's what you have to become. So coding has clearly become one of these categories where verticalization is extremely important. And I think there are two sides of the story: one on the frontier lab side, and the other on the product side, like a startup that builds product but does not have its intelligence in-house. On the frontier lab side, this is exactly what Yannis and I noticed when we were working on Gemini: your model is so far away from the product that oftentimes, even though you have the best model, it does not at all mean you have the best product. There's a reason why startups are the places where adoption of coding tools took off, rather than the frontier labs. So there's a verticalization happening there, and some are gonna do it successfully and some are not. We're already starting to see that, with Claude Code really being an example of a successful verticalization. I don't think it's guaranteed that a big lab can buy its way to the end user, because the fundamental problems of your research team being far away from your product team will still be true, and the company having 100 different focus areas will still be true. So I don't think acquiring an asset will change that fundamentally. But it does underscore the importance of verticalization. And then from the startup side, I think it actually puts companies in these critical-path categories, like search and coding, in a pretty existential place if they can't build their own frontier models. Not all frontier labs will be able to verticalize correctly, but some will, maybe one will, and that's gonna be enough, I think, to take the thunder out from a company that's built a great user experience on top of someone else's model. And I think some of those dynamics are probably starting to play out as well. There are some question marks around: if you're in this critical-path category and you don't have your own intelligence, how do you compete when your competitor can basically subsidize their product a lot more than you can? Because as a startup building on top of these things, to grow quickly you're subsidizing the margin that an Anthropic or Gemini or whatever is making. And Google and Anthropic and OpenAI can subsidize their products a lot more than you can. So I think that companies that don't own their intelligence, or are not deeply integrated into a customer in some way that makes them hard to remove, find themselves in this pretty existential place as it becomes clear to the frontier labs that this is a category they need to verticalize around.

  14. 48:10 – 55:12

    Exploring Non-RL Datasets

    1. ML

    2. SG

      I work with a few robotics companies, and so much of my lens on RL comes from that. And I think it is far less clear in robotics that RL will be a dominant part of the training, versus imitation learning. You'll actually appreciate this: imitation from humans using tools, right? Because we run this... I'm gonna describe an idea that is nuts, but I think it's just funny. We run this grant program twice a year for amazing people using ML in different fields, and it's called Embed. One of the ideas I had as a joke recently was, "Well, you should just record everything." Right? Not just the code base, obviously, but your Slack and all your documentation and all your conversations, because you are a software engineering team. And I'm 100% sure that if you ship something into production to an end customer that has real issues at any scale, I can take that data set and sell it to a friend who's a researcher at a lab working on this stuff. So you have some floor value that is millions of dollars for your couple-person company. And as a bonus, maybe the software company works, right?

    3. ML

      Mm-hmm.

    4. SG

      O- obviously this is, like, very noisy and I'm, I'm mostly joking. But I'm, I'm curious how you think about, uh, like, exploring non-RL data sets that could be useful to you here.

    5. ML

      If that company existed, right, we would, uh, we would definitely pay for their data. (laughs)

    6. SG

      There we go. See, it's not an idiotic idea.

    7. ML

      (laughs)

    8. SG

      (laughs)

    9. ML

      Yeah. It's, uh, yeah, especially if there's diversity. Um, I think that would be...

    10. SG

      I could sell the whole set.

    11. ML

      Yeah. (laughs) So, is the question around, um, how do you leverage alternative sources of data?

    12. SG

      Yeah. The, the question is, um, I, I think there is, like, uh, I, I don't wanna, like, over-analogize to robotics, right? But i- within robotics, you have learning from world models, you have learning from sim, you have learning from embodied data, uh, of different types, right? Um, and imitation as well as RL. I, I think it's, like, much less clear that, uh, you can use RL for a lot of robotics today, especially some of the harder, like, manipulation problems. And I'm curious, just given, you know, your team has this enormous strength in RL as, like, a starting premise, how you look at other types of data to create the, uh, you know, uh, coding agent experiences you want.

    13. ML

      So, I was actually, um, a robotics researcher in, like, reinforcement learning. Like, Pieter Abbeel's lab is a robotics lab. And it was, you know, it was a mixture. Like, Pieter's lab was always around, uh, the intelligence problem, with robotics as being a domain where you study it. And one of the, you know, the reasons I came to lead reward models for Gemini was because that's the question I was studying with robotics. That was, you know, we had these RL algorithms for getting robots to do some very narrow tasks like moving blocks and, um, you know, various kind of narrow tasks in simulation. And the question was, well, how do we get generalized, um, yeah, manipulators and, um, you know, just how- how do we build this all into one system? And it seemed like the rewards were a bottleneck. So, a lot of what I was studying before, uh, you know, getting into language models was how do we design reward functions or models for, um, for robotics or, you know, for 3D video games like Minecraft or something like this that have, I think, similar challenges scientifically. The challenge is that, if you think that language model rewards are hackable, uh, vision language model rewards or, you know, like, other sensory signal rewards are infinitely more hackable.

    14. SG

      Mm-hmm.

    15. ML

      They're much more short-lived than, uh, than li- than rewards... Like, you can think of, right, language as just a compressed representation of the world that we kind of magically have to start with. Um, whereas if you're processing pixels or a sensorimotor signal, um, this is raw signal that has a lot more noise in it. And so, if you train a neural network that is sort of trying to detect whether this thing was manipulated correctly or this thing was, you know, moved correctly, then that thing is just infinitely more hackable than anything you have in language models. So, the same problems be- blow up and become much larger. Uh, and so that's actually why I changed to, uh, language models, because I felt that this was a fundamental problem. But, you know, we now have these confounding factors of these noisy signals coming in. I think that, at least in a generalizable way, that's why it's really hard to get, um, reinforcement learning to work, um, with robotics. Um, the one place where it really does work well is when you have a clean reward signal, which, uh, happens to be in these, like, locomotion-like scenarios.
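
To make that failure mode concrete, here is a minimal sketch of proxy-reward hacking, assuming a toy 2-D manipulation task and a fixed random scorer standing in for a learned vision reward model (everything here is illustrative, not from Reflection's or DeepMind's stacks): blindly maximizing the proxy drives its score up while true task success stays at zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unobserved) objective: the block must land within 1 cm of the goal.
GOAL = np.array([0.5, 0.5])

def true_success(final_pos):
    return float(np.linalg.norm(final_pos - GOAL) < 0.01)

# Proxy "reward model": a fixed random projection plus a linear scorer over
# noisy sensory-style features, standing in for a trained network.
P = rng.normal(size=(16, 2))
W = rng.normal(size=16)

def proxy_reward(final_pos):
    return float(W @ np.tanh(P @ final_pos))

# Naive policy improvement: search for whatever maximizes the proxy.
best_pos, best_proxy = None, -np.inf
for _ in range(5000):
    candidate = rng.uniform(-1.0, 1.0, size=2)  # a candidate final position
    r = proxy_reward(candidate)
    if r > best_proxy:
        best_proxy, best_pos = r, candidate

print(f"proxy reward achieved: {best_proxy:.2f}")              # high
print(f"true success:          {true_success(best_pos):.0f}")  # almost surely 0
```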

    16. SG

      Mm-hmm.

    17. ML

      So, there's a lot of work on, like, building very s- robust sim-to-real locomotion pipelines. And it's because, um, locomotion is just your body. Like, you don't have to manipulate the world around you. And so you can actually build reward signals that are like, oh, you know, your quadruped is moving at this velocity without damaging its body, kind of thing. Maybe it's a bit of a roundabout answer to the question, but the point is that I think these two fields are very different in the data distribution tha- that they support. And the kind of imitation learning data for language models is, of course, the internet, right? It's, of course, you know, people have gathered all this data on, you know, how we write and so forth. And so, aside from that, when we're generating synthetic data, um, the only scalable path is really reinforcement learning. The other thing that I'll say here is that when you're collecting data for robotics, um, you can do it in, like, this teleop way. Like, the things that we're trying to train robots to do are very intuitive for humans as well. I mean, actually more intuitive for humans, right?
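
The clean locomotion reward he is gesturing at can be written down directly. A minimal sketch, assuming a velocity-tracking term plus penalties for energy use and non-foot body contact; the term names and coefficients are hypothetical, not any specific lab's recipe:

```python
import numpy as np

def locomotion_reward(forward_vel, target_vel, joint_torques, body_contact_forces):
    """Shaped reward of the kind used in sim-to-real quadruped pipelines.

    forward_vel, target_vel: scalars in m/s.
    joint_torques: per-motor torques (penalized to keep gaits efficient).
    body_contact_forces: contact forces anywhere other than the feet
                         (penalized: "without damaging its body").
    """
    # Track the commanded velocity: peaks at target_vel, falls off smoothly.
    velocity_term = np.exp(-4.0 * (forward_vel - target_vel) ** 2)
    # Penalize energy expenditure.
    energy_term = -1e-3 * float(np.sum(np.square(joint_torques)))
    # Penalize non-foot contact with the world or with itself.
    damage_term = -1e-2 * float(np.sum(np.abs(body_contact_forces)))
    return velocity_term + energy_term + damage_term

# Example: a quadruped tracking 1.0 m/s with modest torques and no bad contact.
print(locomotion_reward(0.95, 1.0, np.array([2.0, -1.5, 0.8, 1.1]), np.zeros(4)))
```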

    18. SG

      Mm-hmm.

    19. ML

      People are master manipulators. So, you can have a lot of, kind of teleop-like, um, data collection. The things that we want language models to do, um, are sort of, you know, at a level where it's really hard to collect data of, you know, the chain of thought process that goes on in, like, a human's head, um, when they're trying to solve some task. And that's kind of the da- data that you need. And so for that reason, I think language models favor this more, like, synthetic data, RL-like approach where, um, well, it's easier for us to, like, verify whether the thing was done or not than it is to actually generate all that data from a person specifically.
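
That asymmetry, where verifying an outcome is cheaper than demonstrating the reasoning behind it, is what makes verifier-style rewards practical for code. A minimal sketch of such an outcome reward, with a hypothetical toy task (`add`) and hand-written tests standing in for a real verification harness:

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# Outcome checks for a hypothetical toy task ("write add(a, b)"). Only the
# checks are authored by a person -- no chain of thought is ever collected.
TESTS = textwrap.dedent("""\
    from solution import add
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
""")

def verify(candidate_code: str) -> float:
    """Binary outcome reward: 1.0 if the candidate passes the tests, else 0.0."""
    with tempfile.TemporaryDirectory() as d:
        root = pathlib.Path(d)
        (root / "solution.py").write_text(candidate_code)
        (root / "check.py").write_text(TESTS)
        proc = subprocess.run([sys.executable, "check.py"], cwd=root,
                              capture_output=True, timeout=10)
        return 1.0 if proc.returncode == 0 else 0.0

# In an RL loop this would score sampled completions; here, two by hand.
print(verify("def add(a, b):\n    return a + b"))  # 1.0
print(verify("def add(a, b):\n    return a - b"))  # 0.0
```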

    20. SG

      Maybe we just need, like, a, like, a network interface-

    21. ML

      Yeah.

    22. SG

      ... to get the chain of thought.

    23. ML

      Yeah.

    24. SG

      (laughs)

    25. ML

      Maybe. I mean, that's kind of, uh... Actually, when Yannis and I were starting the company, we were, we were thinking about, well, what, like... you know, maybe we just, like, yeah, somehow, like, have people speak into a microphone as they're doing tasks, um, in order to capture that.

    26. SG

      Just stream it.

    27. ML

      Yeah. And it seemed, um, you know, logistically very hard to pull off.

  15. 55:12 – 57:54

    Tackling Problems Beyond Engineering and Coding

    1. ML

    2. SG

      Um, okay. One, uh, one final, uh, uh, question about, um, sort of Reflection's path from here. At what point do you... This is a decision you get to make in the future, but at what point do you try to look at other problems beyond engineering and coding? Um, uh, like, do you, do you feel like there's a level of sufficient depth where you should just go attack different domains?

    3. ML

      The thing that makes coding as, um, a category special is that it's not, um, it's not synonymous with software engineering. It's just kind of how we think about the market today. The reason code is special is, if you believe that, uh, the way a language model will interact with almost any piece of software is through function calls and therefore code, then if you build very capable reasoners, um, coding reasoners, that, you know, are sort of purpose-built for an organization, so you've solved the kind of long context, "How do I reason over a bunch of disparate sources of information," problem, and it can act on pieces of software through code, then you've kind of built a system, like the, the technology, that, um, will generalize at least operationally across other categories of work. And so the way I think about it is more, first, just build, not trying to get, you know, too ahead of yourself, kind of just first build the most depthwise comprehension system for, um, software engineers. Uh, this will naturally induce more reliable coding agents, um, right? You can plug that in as an MCP to your favorite IDE or, um, coding agent, um, you know, or use, you know, one of our own, um, right? You can kind of plug that into whatever surface area, uh, makes sense for the customer and then sort of naturally start seeing where, um, you're getting pulled from there. And the reason I think this will work is because this is kind of what we're already seeing, right, in the, um, you know, how do you make the system useful for product managers, um, or, uh, technical support people? Um, and then, you know, I think moving on to things like sales or something like this. But, um, there are already places where, uh, you know, customers are pulling us in, in different directions. It's just kind of a matter of whether you engage on that today or not. And I think that the risk that a startup has is that, you know, you, you see a lot of shiny areas where you can go, and you start kind of going diffuse before you've really, um, nailed, um, a category. So, I think it's really important to be focused and not diffuse in the short term, and that if you kind of build the right... as we, we kind of think about it, a contextual core for an organization, in this case an engineering organization, then you can naturally start expanding that into adjacent areas of work in that enterprise.
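
The premise that models act on software "through function calls and therefore code" can be made concrete with a small dispatch loop: the model emits a structured call, and a harness executes it against ordinary functions. A minimal sketch; the tool names and JSON shape here are hypothetical, not Asimov's actual interface:

```python
import json

# A tool registry: each entry wraps a piece of software behind an ordinary
# function. The tool names here are hypothetical, chosen for illustration.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def write_file(path: str, content: str) -> str:
    with open(path, "w") as f:
        f.write(content)
    return "ok"

TOOLS = {"read_file": read_file, "write_file": write_file}

def dispatch(tool_call_json: str):
    """Execute one model-emitted function call against the registry."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

# A model that can emit this JSON can, in principle, operate any software
# wrapped this way; a hand-written call stands in for a model output here.
call = json.dumps({"name": "read_file", "arguments": {"path": __file__}})
print(dispatch(call)[:60])
```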

  16. 57:54 – 1:02:30

    Where We’re At in Deploying ASI in Different Fields

    1. ML

    2. SG

      Okay, last question, Misha. Where would you characterize us as like being on the path toward deployment of these capabilities in, in different fields?

    3. ML

      I think we're a lot earlier than most people think. Uh, that this is going to be one of those areas where the technological building blocks, um, outpace their deployment. And so, yeah, within the next couple of years, uh, the blueprint roughly for, you know, how to build ASIs will have been set more or less. Like, uh, maybe there's still some, um, efficiency breakthroughs that need to happen. Um, but more or less there'll be a blueprint for how do you build a superintelligence in a particular category. Actually going in and deploying it and, and building it for, you know, specific categories of work, there are gonna be a lot of product and kind of research innovations specific to those categories, um, that will probably make this a multi-decade thing. Um, so I don't think that it's a couple of years from now and, uh, GDP starts growing 10%, um, you know, year over year globally. Uh, I think w- we're actually going to get there, uh, but it's going to be a kind of multi-decade, uh, endeavor. I tend to kind of, um, see a lot of patterns, uh, now in kind of real-world deployment with, uh, reinforcement learning, um, research as it worked, again, before li- large language models. Um, and before large language models, it used to be kind of, you pick an environment, like you pick Go, you pick, um, StarCraft, you pick something else, and you go and try to solve it with, you know, some combination of imitation learning and reinforcement learning. And when you look at all those projects, these were basically things that were called strikes within, within DeepMind. Um, and each strike, uh, within and outside of DeepMind was a bit of a snowflake. Like, the reinforcement learning methods and environment setup for Go were, at a high level, conceptually similar, but at the detailed implementation level very different from StarCraft, very different from, um, Dota. And so I think that that's sort of... W- we're going into every big category having a different environment, right, and different kinds of agents with different tools. And that means that you'll need to... You'll have, like, general base models that you can start with, but you'll need to post-train things in specific ways for those categories. And we're starting to see that already in the sense that the model that powers OpenAI's Codex is not the O series of models. It's a model called Codex, which was post-trained for that environment. The deep research models, like, that's a specific environment. Um, they're also post-trained for that environment, and I think we'll basically see more and more of that: any category that has a sufficiently large business around it, um, that requires an int- an intelligence core to power it, there will be all sorts of interesting design decisions at the research and product level of how do you ac- actually gain the most performance out of this particular category. So, uh, I think we'll kind of see a lot more kind of depth-first, uh, players emerge over the coming decade or so.

    4. SG

      I'm making a bet on it. And I also think that, like, part of, to, to your point about choosing, like, the problem for the era: we don't get to choose at Conviction a problem for 100 years. We do get to choose for, like, this decade or so, right? And, and I, you know, if you actually believe it's gonna be a very long-term endeavor to get to the sort of productivity and abundance you described, but we are going to get there, then, you know, the other thing you think about is, like, the path to supporting the cost for bringing anything under distribution during a particular period, right? And so I'd say, like, you know, we've already backed companies in, in some of these areas, but, like, let's say in life sciences or materials science, like, it is more expensive to collect, you know, the types of data you might need. And that might be a, a longer endeavor or one that you have to figure out how to fund, right? Or in robotics. And so, um, I think it's a really interesting timing question for, like, any of these really big categories. But I believe coding is this era.

    5. ML

      I think coding is this era as well. Um, this one I think will take longer than, um, people thought as well, because, again, in the enterprise there are organizational problems, just much different, uh, from, um, the benchmarks that we have today. But I think it will be one of the faster ones. So, I don't think that that's kind of a decade out. That's, uh, that's within the next, um, you know, say dozen, dozens of months kind of thing. So, uh, I think the, the next sort of gene- generational companies in, in coding, um, are definitely being built

  17. 1:02:30 – 1:02:54

    Conclusion

    1. ML

      today.

    2. SG

      Well, congratulations on the release, Misha. Thanks.

    3. ML

      Yeah. Thank you, Sarah.

    4. NA

      (instrumental music)

    5. SG

      Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
