No Priors — Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI
Sarah Guo and Andrej Karpathy map AI's loopy era: agents, claws, autoresearch, robotics, and education.
In this episode of No Priors, Sarah Guo talks with Andrej Karpathy about code agents, AutoResearch, and what he calls the "loopy era" of AI. Karpathy describes a recent workflow shift where he rarely types code and instead coordinates multiple coding agents in parallel, making human "token throughput" and instruction quality the new bottlenecks.
At a glance
WHAT IT’S REALLY ABOUT
Karpathy maps AI’s loopy era: agents, claws, autoresearch, robotics, education
- Karpathy describes a recent workflow shift where he rarely types code and instead coordinates multiple coding agents in parallel, making human “token throughput” and instruction quality the new bottlenecks.
- He frames “claws” as persistent, looping agent systems with memory and tool access, illustrating their power via a WhatsApp-controlled home automation setup that discovers devices, reverse engineers APIs, and orchestrates household actions.
- AutoResearch is presented as removing the researcher from the loop by defining objectives, metrics, and boundaries so agents can run experiments autonomously, including meta-optimization where models could eventually improve the very “Program.md” that defines the research process.
- He argues current models remain “jagged,” excelling in verifiable, RL-optimized domains (e.g., code/tests) while stagnating in softer domains (e.g., humor/nuance), motivating both better evaluation scaffolds and eventual model “speciation” into specialized intelligences.
- The conversation connects these trends to labor-market shifts (digital work changes first; Jevons paradox may expand software demand), open-vs-closed ecosystem dynamics (open source trailing by months but covering most use cases), and a robotics timeline where atoms lag bits and the key opportunity is the sensor/actuator interface layer.
IDEAS WORTH REMEMBERING
7 ideas
Engineering leverage is shifting from typing speed to orchestration skill.
Karpathy reports moving from mostly hand-coding to mostly delegating, where the key competency becomes decomposing work into parallelizable “macro actions,” writing effective instructions, and reviewing outputs at the right fidelity.
Maximizing output now looks like maximizing token throughput, not CPU/GPU utilization.
He likens unused agent quota to idle GPUs in a PhD lab: if an agent is running, the human should queue the next task or spin up another agent, making the person the primary bottleneck.
Persistent “claws” are a UX re-architecture: fewer apps, more intent-driven APIs.
His Dobby home claw replaces multiple vendor apps by discovering local devices, finding/deriving endpoints, and exposing a single natural-language control surface, suggesting software may refactor toward agent-consumable APIs over human-first UIs.
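The "claw" architecture described above can be pictured as a small persistent loop: messages come in, an intent is routed to a tool, and the result is appended to memory. The sketch below is purely illustrative (the `Claw` class, keyword-based routing, and the sample tools are invented for this example, not Karpathy's Dobby code); a real claw would use an LLM for tool selection and persist memory across sessions.

```python
# Hypothetical sketch of a "claw": a persistent loop that routes natural-language
# intents to tool calls and keeps a running memory. Keyword dispatch stands in
# for LLM-driven tool choice.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Claw:
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)  # running log of turns

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def handle(self, message: str) -> str:
        """One turn of the loop: pick a tool by intent, run it, remember the result."""
        for name, fn in self.tools.items():
            if name in message.lower():
                result = fn(message)
                self.memory.append(f"{message} -> {result}")
                return result
        return "no matching tool"

claw = Claw()
claw.register("lights", lambda msg: "lights toggled")
claw.register("thermostat", lambda msg: "set to 21C")

print(claw.handle("turn off the lights"))
print(claw.handle("set the thermostat"))
print(len(claw.memory))
```

The key property is that the loop, tools, and memory outlive any single request, which is what distinguishes a claw from a one-shot chat completion.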
AutoResearch works best where evaluation is cheap, objective, and automatable.
He emphasizes kernels/perf work and model training loops as ideal because correctness and improvement can be verified via tests or metrics, while domains without clear evaluators resist full autonomy.
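The pattern he describes — autonomy works when a cheap, objective evaluator can verify each change — can be sketched as a propose/evaluate/keep loop. Everything below is a stand-in: the quadratic `evaluate` function plays the role of a test suite or validation metric, and `autoresearch` is a toy hill-climber, not anything from the episode.

```python
# Hypothetical AutoResearch loop: an agent runs unattended only because each
# candidate change is scored by a cheap, automatable metric. Here the
# "experiment" is a hyperparameter perturbation and the "metric" a known loss.
import random

def evaluate(params: dict[str, float]) -> float:
    """Cheap, objective evaluator: lower is better (stand-in for val loss)."""
    return (params["lr"] - 0.01) ** 2 + (params["wd"] - 0.1) ** 2

def autoresearch(steps: int = 200, seed: int = 0) -> tuple[dict[str, float], float]:
    rng = random.Random(seed)
    best = {"lr": 0.5, "wd": 0.5}
    best_score = evaluate(best)
    for _ in range(steps):
        cand = dict(best)
        key = rng.choice(list(cand))        # propose: perturb one parameter
        cand[key] += rng.gauss(0, 0.05)
        score = evaluate(cand)
        if score < best_score:              # keep only verified improvements
            best, best_score = cand, score
    return best, best_score

best, score = autoresearch()
print(best, score)
```

Swap `evaluate` for a human-judgment task (is this joke funny?) and the loop stalls, which is exactly the verifiability boundary the conversation draws.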
Jaggedness persists because labs optimize what they can verify.
He argues RL pipelines strongly improve tasks with clear rewards (tests, benchmarks) but leave softer capabilities under-optimized, producing systems that can “move mountains” in coding yet still default to stale, low-diversity jokes.
Research organizations may become tunable codebases (and eventually self-tuning).
Program.md serves as an explicit “org spec” for autonomous work; he and Guo discuss competitions over better Program.md designs and the likelihood of meta-optimization where models rewrite the instructions that govern research.
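The "org spec" idea can be pictured as a Markdown file along these lines. This fragment is invented for illustration; the episode does not spell out what an actual Program.md contains.

```markdown
# Program.md — hypothetical org spec for an autonomous research run

## Objective
Reduce validation loss on the target benchmark.

## Roles
- Proposer: drafts one experiment per cycle (diff + rationale).
- Runner: executes the experiment within the compute budget.
- Reviewer: accepts a change only if the metric improves on held-out data.

## Boundaries
- No network access outside the artifact cache.
- Max compute budget per experiment; abort on divergence.
```

Meta-optimization, in this framing, means letting the models edit this file itself and keeping the versions that produce better downstream results.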
Open-source and closed frontier can form a healthy power balance—if pluralism persists.
Karpathy expects the current pattern—closed frontier ahead, open models trailing by months but covering broad needs—to continue, arguing it mitigates systemic risk from centralized intelligence while still funding expensive frontier progress.
WORDS WORTH SAVING
5 quotes
I don't think I've typed, like, a line of code probably since December, basically.
— Andrej Karpathy
Now it's not about FLOPs, it's about tokens. What is your token throughput, and what token throughput do you command?
— Andrej Karpathy
I simultaneously feel like I'm talking to an extremely brilliant PhD student... and a 10-year-old.
— Andrej Karpathy
A research organization is a set of Markdown files that describe all the roles and how the whole thing connects.
— Andrej Karpathy
In a certain sense, these apps... shouldn't even exist... shouldn't it just be APIs, and shouldn't agents be just using it directly?
— Andrej Karpathy
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
What specific practices make you effective at reviewing agent-generated changes when you're coordinating 5–10 parallel repos (tests, diffs, invariants, risk tiers)?
In your Dobby home claw, what security boundaries did you enforce (sandboxing, network isolation, secrets handling), and what scared you enough to avoid email/calendar access?
If jaggedness is driven by verifiability, what new evaluation signals would you add to train better “nuance” behaviors like asking clarifying questions at the right time?
What would a concrete AutoResearch@home protocol look like for untrusted contributors—how do you safely execute arbitrary commits, prevent exfiltration, and handle compute fraud?
Where did AutoResearch find improvements in Nanochat that surprised you most, and what does that imply about how much “researcher intuition” is actually leaving performance on the table?