No Priors

No Priors Ep. 123 | With ReflectionAI Co-Founder and CEO Misha Laskin

Superintelligence, at least in an academic sense, has already been achieved. But Misha Laskin thinks that the next step toward artificial superintelligence, or ASI, should look more user- and problem-focused. ReflectionAI co-founder and CEO Misha Laskin joins Sarah Guo to introduce Asimov, their new code-comprehension agent built on reinforcement learning (RL). Misha talks about creating tools and designing AI agents around customer needs, and how that influences eval development and the scope of the agent's memory. The two also discuss the challenges of scaling RL, the future of ASI, and the implications of Google's "non-acquisition" of Windsurf.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @MishaLaskin | @reflection_ai

Chapters:
00:00 – Misha Laskin Introduction
00:44 – Superintelligence vs. Super Intelligent Autonomous Systems
03:26 – Misha's Journey from Physics to AI
07:48 – Asimov Product Release
11:52 – What Differentiates Asimov from Other Agents
16:15 – Asimov's Eval Philosophy
21:52 – The Types of Queries Where Asimov Shines
24:35 – Designing a Team-Wide Memory for Asimov
28:38 – Leveraging Pre-Trained Models
32:47 – The Challenges of Solving Scaling in RL
37:21 – Training Agents in Copycat Software Environments
38:25 – When Will We See ASI?
44:27 – Thoughts on Windsurf's Non-Acquisition
48:10 – Exploring Non-RL Datasets
55:12 – Tackling Problems Beyond Engineering and Coding
57:54 – Where We're At in Deploying ASI in Different Fields
01:02:30 – Conclusion

Sarah Guo (host) · Misha Laskin (guest)
Jul 16, 2025 · 1h 2m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

ReflectionAI Bets on RL to Build Code-Comprehension Superintelligence

  1. ReflectionAI co-founder and CEO Misha Laskin explains his vision for building superintelligent autonomous systems by tightly co-designing research and product, starting with coding inside complex organizations.
  2. Their new product, Asimov, is positioned as a “principal engineer in a box” focused on deep code comprehension and organizational knowledge, not just code generation, to tackle the true 80% of engineering work: understanding systems.
  3. Laskin argues that reinforcement learning (RL) on top of strong base language models is the final paradigm needed for ASI, but that the field is fundamentally bottlenecked by reward modeling and realistic evaluation, not raw compute.
  4. He predicts we’ll see superintelligent performance in meaningful slices of knowledge work within a few years, with deployment and domain-specific post-training becoming a multi-decade process featuring highly vertical, depth-focused players.

IDEAS WORTH REMEMBERING

5 ideas

Design for deployment, not just benchmarks.

Laskin contrasts a pure research lab approach (maxing academic benchmarks) with ReflectionAI’s approach of co-designing product and research around real workflows, arguing that coupling to customer problems and usage-driven evals is essential for meaningful impact.

Code comprehension is the real productivity bottleneck, not code generation.

He notes that engineers spend roughly 80% of their time understanding complex systems and collaborating, yet current tools are optimized ~80% for generation and ~20% for comprehension—often producing negligible or even negative productivity gains in large organizations.

Treat organizational knowledge as a first-class input to AI agents.

Asimov ingests not only code but also chats, project tools, and curated team memory, aiming to act as an ‘omniscient oracle’ that captures tribal knowledge and allows engineers to teach the system domain-specific meanings and conventions.

Reward modeling is the fundamental bottleneck for RL progress.

Laskin argues the field is “reward-bound”: if we had accurate reward models for arbitrary tasks, existing RL methods plus scaling would suffice; instead we rely on noisy LLM-as-judge setups and scarce ground-truth rewards, limiting how far RL can be pushed today.
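The "noisy LLM-as-judge" setup Laskin refers to can be made concrete with a minimal sketch: a judge model scores an agent's rollout against a rubric, and that proxy score stands in for a ground-truth reward in the RL update. All names below (`Trajectory`, `judge_score`, `batch_rewards`) are illustrative, not from any real library, and the keyword heuristic is a deliberately crude stand-in for a real judge prompt.

```python
# Hedged sketch of the LLM-as-judge reward pattern: a proxy scorer
# replaces ground-truth verification when training agents with RL.
from dataclasses import dataclass

@dataclass
class Trajectory:
    task: str    # what the agent was asked to do
    output: str  # what the agent produced

def judge_score(traj: Trajectory) -> float:
    """Stand-in for an LLM judge: returns a reward in [0, 1].

    In practice this would prompt a strong model with a grading rubric
    and parse its verdict; here a trivial keyword check keeps the sketch
    runnable and highlights how noisy such proxy rewards can be.
    """
    key_term = traj.task.split()[-1].lower()
    return 1.0 if key_term in traj.output.lower() else 0.0

def batch_rewards(trajectories: list[Trajectory]) -> list[float]:
    """Rewards for a batch of rollouts, as fed to a policy-gradient step."""
    return [judge_score(t) for t in trajectories]

rollouts = [
    Trajectory("summarize this function", "The function sorts a list."),
    Trajectory("summarize this function", "Unrelated text."),
]
print(batch_rewards(rollouts))  # noisy proxy rewards, not ground truth
```

The "reward-bound" point is visible even in this toy: the RL machinery consuming `batch_rewards` can scale arbitrarily, but the policy can only get as good as the judge's ability to distinguish correct outputs from plausible-looking wrong ones.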

Narrow superintelligence will appear first in ‘ASI-complete’ categories like coding.

He predicts that within a few years we’ll see superhuman agents in significant slices of coding, creating a blueprint for ASI in other domains; extending that blueprint then becomes mostly an economic and data-collection question rather than a conceptual one.

WORDS WORTH SAVING

5 quotes

Maybe the hot take is that there's no such thing as generalization. There's just bringing the test distribution into train.

Misha Laskin

The actual problem is exactly the opposite [of where tools focus]. When you look at what an engineer does, 80% of their time they're spending trying to comprehend complex systems and collaborating with teammates.

Misha Laskin

We're in this brief period in history where the RL flops are still manageable... you can really have a best-in-class product if you're focused, without pre-training your own gigantic model.

Misha Laskin

It's a fundamentally reward-bound field. By the time you have a neural network that can accurately verify any outcome, that is probably a superintelligence.

Misha Laskin

I think we're a lot earlier than most people think. The technological building blocks will outpace their deployment... getting to broad productivity and GDP impact will be a multi-decade endeavor.

Misha Laskin

TOPICS

  - Difference between abstract superintelligence and deployable superintelligent autonomous systems
  - ReflectionAI's strategy: co-designing product and research around code comprehension (Asimov)
  - Limitations of current coding tools and the primacy of comprehension over generation
  - Reinforcement learning on top of LLMs, reward modeling challenges, and evaluation design
  - Verticalization, ownership of intelligence, and competitive dynamics with frontier labs
  - Team-wide memory, organizational knowledge capture, and permissions/authority design
  - Roadmap and timelines toward ASI and jagged, domain-specific superintelligences

High quality AI-generated summary created from speaker-labeled transcript.
