No Priors: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI
CHAPTERS
From typing code to “manifesting” with agents: the December capability jump
Karpathy describes a dramatic shift in his workflow since December: he rarely types code and instead delegates to coding agents, often in parallel. The new bottleneck is no longer typing or even compute, but the human’s ability to direct and orchestrate agents effectively.
What still limits capability: skill issues, jaggedness, and the edges not working
They explore what constrains progress when agents feel extremely powerful yet inconsistent. Karpathy emphasizes that many failures feel like instruction/coordination problems, but also points to fundamental model “jaggedness” where competence varies wildly by domain.
Mastery of coding agents: macro-actions, multi-agent orchestration, and review strategy
Karpathy lays out what expert-level use looks like: managing many concurrent agents across repos and tasks, operating at higher-level “macro actions,” and selectively reviewing based on risk. The goal is to develop intuition for decomposition and non-interfering parallel workstreams.
Second-order effects of natural-language coding: apps collapse into APIs
Natural language interfaces and agentic tool-use may make many traditional apps obsolete. Karpathy argues that users don’t want to learn bespoke UIs; instead, software should expose APIs and let agents compose workflows across tools on the user’s behalf.
Claws in the real world: persistent, looping agents and Karpathy’s ‘Dobby’ home system
Karpathy explains “claws” as persistent agents that operate semi-autonomously with memory and looping behavior. He shares his Dobby home-automation claw, which discovered devices on his LAN, reverse-engineered their APIs, unified their control, and even sends vision-based security notifications via WhatsApp.
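The defining feature of a claw, as described above, is the persistent observe–remember–act loop rather than any particular integration. A minimal sketch of that loop is below; the device discovery, the notification channel, and every name in it are illustrative stand-ins, not anything from Karpathy's actual Dobby system.

```python
import json
import time

MEMORY_FILE = "claw_memory.json"

def discover_devices():
    # Stand-in for a real LAN scan (e.g. mDNS/SSDP probing in a real claw).
    return {"camera-1": "192.168.1.20", "thermostat": "192.168.1.31"}

def notify(message):
    # Stand-in for a messaging integration such as WhatsApp.
    print(f"[notify] {message}")

def load_memory():
    # Persistence is what separates a claw from a one-shot agent call.
    try:
        with open(MEMORY_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"known_devices": {}}

def save_memory(memory):
    with open(MEMORY_FILE, "w") as f:
        json.dump(memory, f)

def claw_tick(memory):
    """One iteration of the loop: observe, compare against memory, act."""
    devices = discover_devices()
    for name, addr in devices.items():
        if name not in memory["known_devices"]:
            notify(f"New device discovered: {name} at {addr}")
            memory["known_devices"][name] = addr
    return memory

def run(n_iterations=3):
    # A real claw would loop indefinitely; bounded here for illustration.
    memory = load_memory()
    for _ in range(n_iterations):
        memory = claw_tick(memory)
        save_memory(memory)
        time.sleep(1)
```

The structure is what matters: each tick re-observes the world, diffs it against durable memory, and only acts on the delta, which is why the agent can run unattended for long stretches.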
Why AutoResearch: removing the human bottleneck from research loops
AutoResearch is framed as the next step in leverage: define an objective, constraints, and metrics, then let autonomous loops run without requiring continual human prompting. Karpathy sees it as a prototype for recursive self-improvement workflows in LLM R&D.
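The loop described above — fix an objective, constraints, and a metric, then let the system run without further prompting — can be sketched in a few lines. This is a toy stand-in (tuning one hyperparameter against a synthetic loss), not Karpathy's AutoResearch implementation; the structure of the loop, not the task, is the point.

```python
import random

def run_experiment(weight_decay):
    # Stand-in for a real training run that returns a validation metric.
    # Synthetic loss with its minimum near weight_decay = 0.1.
    return (weight_decay - 0.1) ** 2 + random.gauss(0, 1e-4)

def autoresearch(n_trials=50, seed=0):
    random.seed(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Propose: in a real system an LLM would read prior results and
        # suggest the next experiment; here we sample within constraints.
        candidate = random.uniform(0.0, 0.5)
        loss = run_experiment(candidate)   # Execute
        if loss < best_loss:               # Evaluate against the metric
            best_config, best_loss = candidate, loss
    return best_config, best_loss
```

The human defines `run_experiment` (the objective and metric) and the constraint range once; everything inside the loop runs without them, which is exactly the bottleneck removal being claimed.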
AutoResearch results: surprising gains from autonomous hyperparameter discovery
Using NanoChat as a training playground, Karpathy reports AutoResearch found improvements he missed after decades of experience. The system surfaced interacting hyperparameters (e.g., weight decay details, Adam betas) and demonstrated how automation can exceed manual tuning even in familiar regimes.
Meta-optimization: Program.md as “research org code” and evolving the instructions themselves
They discuss the next recursive layer: the research organization is effectively a set of Markdown instructions (roles, processes, heuristics). If different Program.md files yield different results, then optimizing these instructions becomes a new competitive frontier.
Relevant skills in the AI era: metric design, verifiability, and operating at the right abstraction
Karpathy offers a grounded view on what’s valuable: tasks with crisp metrics are easiest to automate and accelerate. For softer domains, humans still matter for intent, evaluation, and steering—especially because models are trained/optimized primarily where rewards are measurable.
Model speciation vs monoculture: specialization, efficiency pressures, and the difficulty of “touching weights”
They debate whether a single general model makes sense given persistent jaggedness. Karpathy expects more “speciation” (specialized models for niches) but notes that the science and tooling for safe fine-tuning and continual learning without capability loss remain immature.
Collaboration surfaces for humans + untrusted compute: AutoResearch as a verification-first swarm
Karpathy explores how to scale AutoResearch via parallel workers, including an untrusted internet pool. Because candidate improvements can be cheap to verify but expensive to discover, he sketches systems resembling blockchain dynamics—commits as units, proof-of-work as experiments, and leaderboards as rewards.
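The asymmetry driving this design is that verifying a candidate improvement is cheap while discovering one is expensive, so a coordinator can accept work from untrusted workers by re-running the evaluation itself. A hedged sketch of that acceptance logic follows; all names and the toy benchmark are illustrative, not from any actual system.

```python
def evaluate(candidate):
    # Stand-in for the real (cheap-to-run) benchmark; here the "score"
    # of a proposed config is just the sum of its values.
    return sum(candidate)

def verify(candidate, claimed_score, tolerance=1e-6):
    # Trust nothing a worker reports: re-run the evaluation locally.
    actual = evaluate(candidate)
    return abs(actual - claimed_score) <= tolerance, actual

def accept_submissions(submissions):
    """Take (worker, candidate, claimed_score) tuples from untrusted
    workers; admit only verified results and rank workers by best score."""
    leaderboard = {}
    for worker, candidate, claimed in submissions:
        ok, actual = verify(candidate, claimed)
        if ok:
            leaderboard[worker] = max(leaderboard.get(worker, 0.0), actual)
    return leaderboard
```

As in the blockchain analogy, the expensive search plays the role of proof-of-work and the leaderboard the role of the reward, while the cheap verification step keeps dishonest submissions out.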
Jobs market data and the “digital vs physical” divide: where change hits first
Reviewing BLS job data, Karpathy focuses on which roles manipulate digital information versus atoms. He expects faster upheaval in digital work (bits move cheaply) and slower transformation in physical jobs, while uncertainty remains due to demand elasticity (Jevons paradox).
Open vs closed models: a healthy lag, ecosystem power balance, and risks of centralization
Karpathy argues that open-source models play a structurally important role, much as Linux does in operating systems, even if they trail the frontier by months. He prefers an ecosystem where frontier capability exists but a broad open platform provides access, competition, and reduced centralization risk.
Autonomous robotics and the physical interface: sensors, actuators, and information markets
Drawing from self-driving, Karpathy expects robotics to lag digital agents due to cost and messiness, but sees massive long-term opportunity. He highlights the coming importance of interfaces—sensors that feed AI and actuators that let it test hypotheses in the world—plus emerging “information markets” for real-world data collection.
MicroGPT and agentic education: teach the agent, not the audience
Karpathy explains MicroGPT as his attempt to compress LLM training to its algorithmic essence (~200 lines), stripping away efficiency-driven complexity. He argues education is shifting: instead of writing extensive human docs, creators should produce agent-readable guidance so assistants can personalize explanations and curricula.