Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI

No Priors · Mar 20, 2026 · 1h 6m

Sarah Guo (host), Andrej Karpathy (guest)

- Agentic coding as macro-actions and parallel workstreams
- Token throughput as the new limiting resource
- “Claws” (persistent looping agents) + memory + WhatsApp-like portals
- AutoResearch: autonomous experiment loops with objective metrics
- Jagged intelligence and verifiability/RL optimization limits
- Model speciation vs monoculture; weight-tuning vs context prompting
- Open-source catching up to frontier; centralization risk
- Jobs data: digital vs physical work; Jevons paradox in software
- Robotics: atoms are harder; interface between digital and physical
- MicroGPT and agent-mediated education/documentation

In this episode of No Priors, Sarah Guo talks with Andrej Karpathy about AI’s “loopy era”: coding agents, claws, AutoResearch, robotics, and education.

Karpathy maps AI’s loopy era: agents, claws, autoresearch, robotics, education

Karpathy describes a recent workflow shift where he rarely types code and instead coordinates multiple coding agents in parallel, making human “token throughput” and instruction quality the new bottlenecks.

He frames “claws” as persistent, looping agent systems with memory and tool access, illustrating their power via a WhatsApp-controlled home automation setup that discovers devices, reverse engineers APIs, and orchestrates household actions.

AutoResearch is presented as removing the researcher from the loop by defining objectives, metrics, and boundaries so agents can run experiments autonomously, including meta-optimization where models could eventually improve the very “Program.md” that defines the research process.
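The AutoResearch idea described above — fix an objective metric and a budget, then let an agent propose and keep only verifiably better experiments — can be sketched as a simple loop. This is an illustrative toy, not Karpathy’s actual system: the `evaluate`, `propose`, and `autoresearch_loop` names and the hill-climbing dispatch are assumptions, and a real loop would train models and score a held-out metric rather than minimize a toy function.

```python
import random

def evaluate(config):
    """Stand-in objective: a toy quadratic with a known optimum at 0.3.
    In a real AutoResearch loop this would run an experiment (e.g. a
    training run) and return a measured, verifiable metric."""
    x = config["knob"]
    return (x - 0.3) ** 2

def propose(config):
    """Stand-in for an agent proposing the next experiment.
    Here: a random tweak; in practice, an LLM-generated change."""
    candidate = dict(config)
    candidate["knob"] += random.uniform(-0.05, 0.05)
    return candidate

def autoresearch_loop(config, budget=200):
    """Objective + boundary (fixed budget) + automatic accept/reject:
    the metric, not a human researcher, decides what is kept."""
    best_score = evaluate(config)
    for _ in range(budget):
        candidate = propose(config)
        score = evaluate(candidate)
        if score < best_score:  # keep only verifiable improvements
            config, best_score = candidate, score
    return config, best_score

random.seed(0)
cfg, score = autoresearch_loop({"knob": 0.0})
print(cfg, score)  # knob drifts toward the optimum near 0.3
```

The meta-optimization point then follows naturally: since `propose` is itself defined by instructions (a "Program.md"-style spec), it could become another tunable config the same loop improves.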

He argues current models remain “jagged,” excelling in verifiable, RL-optimized domains (e.g., code/tests) while stagnating in softer domains (e.g., humor/nuance), motivating both better evaluation scaffolds and eventual model “speciation” into specialized intelligences.

The conversation connects these trends to labor-market shifts (digital work changes first; Jevons paradox may expand software demand), open-vs-closed ecosystem dynamics (open source trailing by months but covering most use cases), and a robotics timeline where atoms lag bits and the key opportunity is the sensor/actuator interface layer.

Key Takeaways

Engineering leverage is shifting from typing speed to orchestration skill.

Karpathy reports moving from mostly hand-coding to mostly delegating, where the key competency becomes decomposing work into parallelizable “macro actions,” writing effective instructions, and reviewing outputs at the right fidelity.

Maximizing output now looks like maximizing token throughput, not CPU/GPU utilization.

He likens unused agent quota to idle GPUs in a PhD lab: if an agent is running, the human should queue the next task or spin up another agent, making the person the primary bottleneck.

Persistent “claws” are a UX re-architecture: fewer apps, more intent-driven APIs.

His Dobby home claw replaces multiple vendor apps by discovering local devices, finding/deriving endpoints, and exposing a single natural-language control surface, suggesting software may refactor toward agent-consumable APIs over human-first UIs.
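The claw pattern described here — a persistent loop with memory and a tool registry behind a single natural-language surface — can be sketched minimally. This is not Karpathy’s actual Dobby setup: the tool names, the `claw_memory.json` path, and the keyword dispatch are invented for illustration; a real claw would use an LLM to parse intent and choose tools, and would talk to real device APIs.

```python
import json
from pathlib import Path

MEMORY = Path("claw_memory.json")  # persistent memory across sessions

def load_memory():
    return json.loads(MEMORY.read_text()) if MEMORY.exists() else {"devices": {}}

def save_memory(mem):
    MEMORY.write_text(json.dumps(mem))

# Tool registry: each tool is a plain function the agent can invoke.
def discover_devices(mem, _arg):
    # Stand-in for scanning the local network and deriving endpoints.
    mem["devices"]["lamp"] = {"endpoint": "http://lamp.local/api"}
    return "found: lamp"

def toggle(mem, name):
    dev = mem["devices"].get(name)
    return f"toggled {name} via {dev['endpoint']}" if dev else f"unknown device {name}"

TOOLS = {"discover": discover_devices, "toggle": toggle}

def handle_message(text):
    """One turn of the loop: parse intent, call a tool, persist memory.
    A real claw would route this through an LLM instead of keyword dispatch."""
    mem = load_memory()
    verb, _, arg = text.partition(" ")
    result = TOOLS.get(verb, lambda m, a: f"no tool for {verb!r}")(mem, arg)
    save_memory(mem)
    return result

print(handle_message("discover "))
print(handle_message("toggle lamp"))
```

Because memory persists between turns, the second message can use what the first one learned — which is the core of the "fewer apps, one intent-driven surface" argument.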

AutoResearch works best where evaluation is cheap, objective, and automatable.

He emphasizes kernels/perf work and model training loops as ideal because correctness and improvement can be verified via tests or metrics, while domains without clear evaluators resist full autonomy.

Jaggedness persists because labs optimize what they can verify.

He argues RL pipelines strongly improve tasks with clear rewards (tests, benchmarks) but leave softer capabilities under-optimized, producing systems that can “move mountains” in coding yet still default to stale, low-diversity jokes.

Research organizations may become tunable codebases (and eventually self-tuning).

He describes a research organization as a set of Markdown files — a “Program.md” and role specs — that define the whole process; because that spec is itself text, models could eventually tune it, making the organization self-improving.

Open-source and closed frontier can form a healthy power balance—if pluralism persists.

Karpathy expects the current pattern—closed frontier ahead, open models trailing by months but covering broad needs—to continue, arguing it mitigates systemic risk from centralized intelligence while still funding expensive frontier progress.

Notable Quotes

I don't think I've typed, like, a line of code probably since December, basically.

Andrej Karpathy

Now it's not about FLOPs, it's about tokens. What is your token throughput, and what token throughput do you command?

Andrej Karpathy

I simultaneously feel like I'm talking to an extremely brilliant PhD student... and a 10-year-old.

Andrej Karpathy

A research organization is a set of Markdown files that describe all the roles and how the whole thing connects.

Andrej Karpathy

In a certain sense, these apps... shouldn't even exist... shouldn't it just be APIs, and shouldn't agents be just using it directly?

Andrej Karpathy

Questions Answered in This Episode

What specific practices make you effective at reviewing agent-generated changes when you’re coordinating 5–10 parallel repos (tests, diffs, invariants, risk tiers)?

Karpathy describes a recent workflow shift where he rarely types code and instead coordinates multiple coding agents in parallel, making human “token throughput” and instruction quality the new bottlenecks.

In your Dobby home claw, what security boundaries did you enforce (sandboxing, network isolation, secrets handling), and what scared you enough to avoid email/calendar access?

He frames “claws” as persistent, looping agent systems with memory and tool access, illustrating their power via a WhatsApp-controlled home automation setup that discovers devices, reverse engineers APIs, and orchestrates household actions.

If jaggedness is driven by verifiability, what new evaluation signals would you add to train better “nuance” behaviors like asking clarifying questions at the right time?

AutoResearch is presented as removing the researcher from the loop by defining objectives, metrics, and boundaries so agents can run experiments autonomously, including meta-optimization where models could eventually improve the very “Program.md” that defines the research process.

What would a concrete AutoResearch@home protocol look like for untrusted contributors—how do you safely execute arbitrary commits, prevent exfiltration, and handle compute fraud?

He argues current models remain “jagged,” excelling in verifiable, RL-optimized domains (e.g., code/tests) while stagnating in softer domains (e.g., humor/nuance), motivating both better evaluation scaffolds and eventual model “speciation” into specialized intelligences.

Where did AutoResearch find improvements in Nanochat that surprised you most, and what does that imply about how much “researcher intuition” is actually leaving performance on the table?

The conversation connects these trends to labor-market shifts (digital work changes first; Jevons paradox may expand software demand), open-vs-closed ecosystem dynamics (open source trailing by months but covering most use cases), and a robotics timeline where atoms lag bits and the key opportunity is the sensor/actuator interface layer.

Transcript Preview

Sarah Guo

Code's not even the right verb anymore, right?

Andrej Karpathy

[laughs] Yeah.

Sarah Guo

But I have to, um, express my will to my agents for-

Andrej Karpathy

Manifest

Sarah Guo

... sixteen hours a day. Manifest.

Andrej Karpathy

How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions. But-

Sarah Guo

[laughs]

Andrej Karpathy

... there-- I mean, this is why it gets to the psychosis, is that this is, like, infinite and everything is skill issue.

Sarah Guo

[upbeat music] Hi, listeners. Welcome back to No Priors. Today, I'm here with Andrej Karpathy, and we have a wide-ranging conversation for you about code agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents can reach out into the real world, and education in this next age. Welcome, Andrej. Andrej, thanks for doing this.

Andrej Karpathy

Yeah, thank you for having me. [laughs]

Sarah Guo

Uh, so it's been a very exciting couple of months in AI.

Andrej Karpathy

Uh, yeah, [laughs] you could say that.

Sarah Guo

I remember, um, walking into the office at some point, and you were, like, really locked in, and I was asking what you were up to, and you're like, "I just... I have to code for sixteen hours a day," or code's not even the right verb anymore, right?

Andrej Karpathy

[laughs] Yeah.

Sarah Guo

But I have to, um, express my will to my agents for-

Andrej Karpathy

Manifest

Sarah Guo

... sixteen hours a day. Manifest. Um, because, like, there's been a jump in capability.

Andrej Karpathy

Yeah.

Sarah Guo

Uh, what's happening? Tell me about your experience.

Andrej Karpathy

Yeah, I kinda feel like I was just in this perpetual... I still am often, uh, in this state of AI psychosis just, like, all the time, uh, because there was a huge unlock in what you can achieve as a person, as an individual, right? Because you were bottlenecked by, you know, your typing speed and so on. But now with these agents, it really... I would say in December is when it really just... something flipped, where I kinda went from eighty-twenty of like, you know, uh, to, like, twenty-eighty of writing code by myself versus just delegating to agents. And I don't even think it's twenty-eighty by now. I think it's a lot more than that. I don't think I've typed, like, a line of code probably since December, basically, [laughs] um, which is, like, an extremely large, uh, change. Um, I was talking to, like, for example, I was talking about it to, for example, my parents and so on, and I don't think, like, a normal person actually realizes that this happened or how dramatic it was. Like, literally, like, if you just find a random software engineer or something like that at their, at their desk and what they're doing, like, their default workflow of, you know, building software is completely different as of basically December. Uh, so I'm just, like, in this state of psychosis of trying to figure out, like, what's possible, uh, trying to push it to the limit. How is it-- how can I have not just a single session of, you know, um, Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that, uh, appropriately? And then how can I use these claws? What are these claws? Uh, [laughs] and, uh, so there's, like, a lot of new things. I wanna be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it. And I see lots of people on Twitter doing all kinds of things, and they all sound like really good ideas, and I need to be at the forefront, or I feel extremely nervous. 
And so I guess I'm just in this psychosis of, like, what's possible? Like, because it's unexplored fundamentally.
