No Priors Ep. 123 | With ReflectionAI Co-Founder and CEO Misha Laskin
At a glance
WHAT IT’S REALLY ABOUT
ReflectionAI Bets on RL to Build Code-Comprehension Superintelligence
- ReflectionAI co-founder and CEO Misha Laskin explains his vision for building superintelligent autonomous systems by tightly co-designing research and product, starting with coding inside complex organizations.
- Their new product, Asimov, is positioned as a “principal engineer in a box” focused on deep code comprehension and organizational knowledge, not just code generation, to tackle the true 80% of engineering work: understanding systems.
- Laskin argues that reinforcement learning (RL) on top of strong base language models is the final paradigm needed for ASI, but that the field is fundamentally bottlenecked by reward modeling and realistic evaluation, not raw compute.
- He predicts we’ll see superintelligent performance in meaningful slices of knowledge work within a few years, with deployment and domain-specific post-training becoming a multi-decade process featuring highly vertical, depth-focused players.
IDEAS WORTH REMEMBERING
Design for deployment, not just benchmarks.
Laskin contrasts a pure research lab approach (maxing academic benchmarks) with ReflectionAI’s approach of co-designing product and research around real workflows, arguing that coupling to customer problems and usage-driven evals is essential for meaningful impact.
Code comprehension is the real productivity bottleneck, not code generation.
He notes that engineers spend roughly 80% of their time understanding complex systems and collaborating, yet current tools are optimized ~80% for generation and ~20% for comprehension—often producing negligible or even negative productivity gains in large organizations.
Treat organizational knowledge as a first-class input to AI agents.
Asimov ingests not only code but also chats, project tools, and curated team memory, aiming to act as an ‘omniscient oracle’ that captures tribal knowledge and allows engineers to teach the system domain-specific meanings and conventions.
Reward modeling is the fundamental bottleneck for RL progress.
Laskin argues the field is “reward-bound”: if we had accurate reward models for arbitrary tasks, existing RL methods plus scaling would suffice; instead we rely on noisy LLM-as-judge setups and scarce ground-truth rewards, limiting how far RL can be pushed today.
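The "LLM-as-judge" setup Laskin refers to can be sketched as follows. This is a minimal illustration, not ReflectionAI's implementation: a policy's output is scored by a judge model rather than by ground truth, and the judge's verdict is mapped to a scalar RL reward. The `stub_judge` here checks against a tiny answer table purely for demonstration; a real LLM judge would be noisy, and that noise is exactly the bottleneck described above.

```python
def judge_reward(task: str, answer: str, judge) -> float:
    """Map a judge's verdict on (task, answer) to a scalar RL reward.

    In practice `judge` would be an LLM call returning a graded verdict;
    its accuracy caps how hard the policy can be optimized against it.
    """
    verdict = judge(task, answer)  # e.g. "correct" / "incorrect"
    return 1.0 if verdict == "correct" else 0.0


# Stub judge for illustration: looks answers up in a ground-truth table.
# Tasks with such tables are the "scarce ground-truth rewards" case;
# everything else falls back to a noisy learned judge.
GROUND_TRUTH = {"2+2": "4"}


def stub_judge(task: str, answer: str) -> str:
    return "correct" if GROUND_TRUTH.get(task) == answer else "incorrect"


reward = judge_reward("2+2", "4", stub_judge)  # 1.0
```

The point of the sketch is the interface, not the stub: once the judge is a neural network scoring arbitrary outcomes accurately, as Laskin notes, you are most of the way to the thing you were trying to build.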
Narrow superintelligence will appear first in ‘ASI-complete’ categories like coding.
He predicts that within a few years we’ll see superhuman agents in significant slices of coding, creating a blueprint for ASI in other domains; extending that blueprint then becomes mostly an economic and data-collection question rather than a conceptual one.
WORDS WORTH SAVING
Maybe the hot take is that there's no such thing as generalization. There's just bringing the test distribution into train.
— Misha Laskin
The actual problem is exactly the opposite [of where tools focus]. When you look at what an engineer does, 80% of their time they're spending trying to comprehend complex systems and collaborating with teammates.
— Misha Laskin
We're in this brief period in history where the RL flops are still manageable... you can really have a best-in-class product if you're focused, without pre-training your own gigantic model.
— Misha Laskin
It's a fundamentally reward-bound field. By the time you have a neural network that can accurately verify any outcome, that is probably a superintelligence.
— Misha Laskin
I think we're a lot earlier than most people think. The technological building blocks will outpace their deployment... getting to broad productivity and GDP impact will be a multi-decade endeavor.
— Misha Laskin