No Priors Ep. 123 | With ReflectionAI Co-Founder and CEO Misha Laskin


No Priors · Jul 17, 2025 · 1h 2m

Sarah Guo (host), Misha Laskin (guest), Narrator

- Difference between abstract superintelligence and deployable superintelligent autonomous systems
- ReflectionAI’s strategy: co-designing product and research around code comprehension (Asimov)
- Limitations of current coding tools and the primacy of comprehension over generation
- Reinforcement learning on top of LLMs, reward modeling challenges, and evaluation design
- Verticalization, ownership of intelligence, and competitive dynamics with frontier labs
- Team-wide memory, organizational knowledge capture, and permissions/authority design
- Roadmap and timelines toward ASI and jagged, domain-specific superintelligences


ReflectionAI Bets on RL to Build Code-Comprehension Superintelligence

ReflectionAI co-founder and CEO Misha Laskin explains his vision for building superintelligent autonomous systems by tightly co-designing research and product, starting with coding inside complex organizations.

Their new product, Asimov, is positioned as a “principal engineer in a box” focused on deep code comprehension and organizational knowledge, not just code generation, to tackle the true 80% of engineering work: understanding systems.

Laskin argues that reinforcement learning (RL) on top of strong base language models is the final paradigm needed for ASI, but that the field is fundamentally bottlenecked by reward modeling and realistic evaluation, not raw compute.

He predicts we’ll see superintelligent performance in meaningful slices of knowledge work within a few years, with deployment and domain-specific post-training becoming a multi-decade process featuring highly vertical, depth-focused players.

Key Takeaways

Design for deployment, not just benchmarks.

Laskin contrasts a pure research lab approach (maxing academic benchmarks) with ReflectionAI’s approach of co-designing product and research around real workflows, arguing that coupling to customer problems and usage-driven evals is essential for meaningful impact.


Code comprehension is the real productivity bottleneck, not code generation.

He notes that engineers spend roughly 80% of their time understanding complex systems and collaborating, yet current tools are optimized ~80% for generation and ~20% for comprehension—often producing negligible or even negative productivity gains in large organizations.


Treat organizational knowledge as a first-class input to AI agents.

Asimov ingests not only code but also chats, project tools, and curated team memory, aiming to act as an ‘omniscient oracle’ that captures tribal knowledge and allows engineers to teach the system domain-specific meanings and conventions.


Reward modeling is the fundamental bottleneck for RL progress.

Laskin argues the field is “reward-bound”: if we had accurate reward models for arbitrary tasks, existing RL methods plus scaling would suffice; instead we rely on noisy LLM-as-judge setups and scarce ground-truth rewards, limiting how far RL can be pushed today.
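The LLM-as-judge setup Laskin describes can be made concrete with a minimal runnable sketch. Here the judge is a toy stand-in function (a keyword check) rather than an actual language-model call, and the names `judge_score` and `reward_for` are hypothetical; in a real RL post-training loop, the judge would be a strong LLM prompted to grade each sampled answer, and its noise directly bounds how far RL can be pushed.

```python
# Sketch of an "LLM-as-judge" reward signal for RL post-training.
# judge_score is a toy stand-in: a real setup would prompt a strong LLM
# to grade the answer against the task on a 0-1 scale.

def judge_score(task: str, answer: str) -> float:
    """Return a scalar grade for one sampled answer (toy keyword check)."""
    return 1.0 if "sorted" in answer.lower() else 0.0

def reward_for(task: str, answers: list[str]) -> list[float]:
    """Score a batch of policy samples; these scalars become the RL reward.
    Any systematic judge error here is optimized against by the policy,
    which is the 'reward-bound' problem described above."""
    return [judge_score(task, a) for a in answers]

task = "Return the list [3, 1, 2] in ascending order."
samples = ["The sorted list is [1, 2, 3].", "It is [3, 1, 2]."]
rewards = reward_for(task, samples)
print(rewards)  # scalars fed back into the policy update
```

The fragility is visible even in this toy: the keyword judge rewards any answer mentioning "sorted", correct or not, which is exactly the kind of noisy proxy signal that limits RL today.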


Narrow superintelligence will appear first in ‘ASI-complete’ categories like coding.

He predicts that within a few years we’ll see superhuman agents in significant slices of coding, creating a blueprint for ASI in other domains; extending that blueprint then becomes mostly an economic and data-collection question rather than a conceptual one.


Verticalization and owning the ‘intelligence core’ are existential for some startups.

As frontier labs move to vertically integrate major categories like search and coding, startups that don’t control their own models—or aren’t deeply embedded and hard to rip out—may face pricing pressure and strategic risk from subsidized, integrated offerings.


Expect a ‘jagged’ landscape of domain-specific superintelligences.

Rather than a single uniformly general ASI, Laskin anticipates many depth-first, post-trained systems specialized for particular environments and toolsets (e.g., …)


Notable Quotes

Maybe the hot take is that there's no such thing as generalization. There's just bringing the test distribution into train.

Misha Laskin

The actual problem is exactly the opposite [of where tools focus]. When you look at what an engineer does, 80% of their time they're spending trying to comprehend complex systems and collaborating with teammates.

Misha Laskin

We're in this brief period in history where the RL flops are still manageable... you can really have a best-in-class product if you're focused, without pre-training your own gigantic model.

Misha Laskin

It's a fundamentally reward-bound field. By the time you have a neural network that can accurately verify any outcome, that is probably a superintelligence.

Misha Laskin

I think we're a lot earlier than most people think. The technological building blocks will outpace their deployment... getting to broad productivity and GDP impact will be a multi-decade endeavor.

Misha Laskin

Questions Answered in This Episode

If generalization is mostly ‘train equals test distribution,’ what does that imply for how we should design synthetic environments and data for future ASI systems?



How can large enterprises practically implement team-wide memory and knowledge curation without creating unmanageable governance and permission complexity?



What concrete metrics or evals should engineering leaders track to determine whether a code comprehension agent like Asimov is truly improving onboarding speed and productivity?



Given that reward modeling is the bottleneck, what promising approaches (beyond LLM-as-judge) might materially improve reward quality over the next few years?



How should startups in ‘critical path’ categories like coding or search think about the trade-offs between building their own models versus relying on frontier labs’ APIs as verticalization accelerates?


Transcript Preview

Sarah Guo

(music plays) Hi, listeners. Welcome back to No Priors. RL is back with a vengeance. And one of the most talent-dense new research labs has a product release, a new code comprehension agent. Reflection AI's co-founders, Misha Laskin and Ioannis Antonoglou, worked together as leaders at Google DeepMind on groundbreaking projects like AlphaGo, AlphaZero, and Gemini. I talked to Misha about building universal superhuman agents, the trickiness of reward modeling, bringing all knowledge work tasks into the data distribution, the Windsurf non-acquisition, and the landscape from here. Misha, welcome. Thank you for doing this.

Misha Laskin

Yeah. Thanks, Sarah, for having me.

Sarah Guo

So it's been, um, a- about a wild, like, year and a half since you guys started the company. Is that about right?

Misha Laskin

Roughly a year and a half. Maybe a bit less, but I'd say it's ballpark correct.

Sarah Guo

Well, can you just start by describing... Y- you said that the company's mission is to build superintelligent autonomous systems and we've talked before about why, like, this is the moment in time that's possible. What is different about that from building just superintelligence, which is now a sort of more popular ambitious goal?

Misha Laskin

At a high level, it's fairly synonymous, uh, but maybe there are different ways of thinking about how to build superintelligence and what that might look like. I think on one spectrum, there's an academic way to look at it, uh, which is, uh... And to some sense, uh, to some extent, um, superintelligence in that sense has already been achieved. So, uh, like AlphaGo was a superintelligent system, and there were other systems during that time that were built that were superintelligent in narrow domains. And I think you can go for the goal of building a very broad superintelligence by, you know, kind of locking yourself up in an academic... Or, uh, uh, it's not really an academic, but kind of an industrial lab with, um... that is sort of kind of decoupled from, uh, product or customers, and kind of max out all the benchmarks that are out there, uh, and build superintelligence that way. I think that is, that is one approach. Um, I think the other approach is to kind of think about what is superintelligence more concretely, how is it going to be deployed, what is it actually going to look like in people's hands, and build backwards from there. So I would kind of say that that approach is more kind of co-designing product and research together. Now, the kind of benefits of that approach is that you're kind of, uh, ma- uh, you're optimizing for real problems. The cons to it is that you have to be a lot more focused, right? Because your, your product kind of defines the sort of capabilities that you want to dr- draw out of the system. And you have to start out a lot more focused before expanding, um, across, you know, other product categories and other capabilities. 
So, I would say that on the spectrum of companies that are kind of superintelligence, um, in just a research lab, and then figure out what the product is, you know, once it's built, as opposed to co-designing product and research together to build very powerful systems, uh, in what I would call kind of, um, ASI-complete categories. You can pick something that is, uh, maybe too small of a category to draw out a superintelligence. As long as you pick a category that I would say is kind of big enough to be ASI-complete, um, I think... And, and this is kind of our approach at Reflection, is it makes a lot more sense to be focused and co-design those two things together, the product and the research.
