
Andrej Karpathy on Dwarkesh Patel: Why Agents Take a Decade

Why pre-training and gradient descent produce ghosts rather than animals: Karpathy maps the missing cognitive pieces that make "the decade of agents" the honest frame.

Andrej Karpathy (guest) · Dwarkesh Patel (host)
Oct 17, 2025 · 2h 26m · Watch on YouTube ↗

FREQUENTLY ASKED QUESTIONS

Direct answers grounded in the episode transcript. Tap any timestamp to verify against the source.

  1. Why does Karpathy say AI agents will take a decade?

    Karpathy thinks usable AI agents need years of reliability work, not a one-year hype cycle. He frames an agent as something close to an employee or intern, the kind of role you would hand to Claude, Codex, or similar systems. Today, he says, those systems are impressive but still do not work well enough: they lack sufficient intelligence, multimodality, computer use, continual learning, and memory. You cannot just tell them something and trust that they will remember it. His decade estimate is not a precise forecast from a formula. It comes from his 15 years in AI, watching predictions succeed or fail, and his sense that these problems are tractable but difficult. The point is calibration: agents will improve and be wonderful, but the missing cognitive pieces take sustained engineering and research.

    1:52 in transcript
  2. What does Karpathy mean by building ghosts instead of animals?

    Karpathy uses ghosts to describe digital imitators, not evolved animal minds. His contrast is with the Sutton-style vision of building animal-like intelligence from scratch. Animals are produced by evolution, which bakes in a huge amount of hardware before learning begins. His example is a zebra that can run and follow its mother within minutes of birth, which he says is not reinforcement learning. Current LLMs are built through a different process: imitation of humans and the data humans placed on the internet. That makes them fully digital, spirit-like entities that mimic humans rather than organisms with evolved bodies and instincts. He does not reject animal-like systems as a goal. He says it would be wonderful if a single algorithm could learn everything, but the practical path today is building useful ghost-like systems and possibly making them more animal-like over time.

    9:02 in transcript
  3. Why does Karpathy call reinforcement learning sucking supervision through a straw?

    Karpathy's straw metaphor means outcome rewards carry too little credit information. In his math-problem example, a model makes many long solution attempts and only checks the final answer at the end. Reinforcement learning then upweights every token in the attempts that reached the right answer, even the parts of the reasoning that wandered down wrong alleys. He calls this noisy and high variance because a single reward number is broadcast across the whole trajectory (the sketch after this list illustrates the mechanics). A person would not learn that way: after finding a solution, a person would review which steps were useful, which were mistakes, and what should change next time. Karpathy says current LLMs lack that richer review process, although he has seen papers trying to add reflect-and-review ideas. RL is useful because it can outperform imitation, but he thinks it is still a crude method.

    41:35 in transcript
  4. Why did self-driving take so long in Karpathy's march of nines?

    Karpathy's march of nines explains why good demos are far from reliable products. From Tesla's self-driving effort, he learned that a system working 90% of the time has only the first nine. The second, third, fourth, and fifth nines each take about the same amount of hard iteration, because every new level of reliability exposes fresh pockets of reality that need patching (see the worked arithmetic after this list). He pushes back on calling self-driving finished: demos go back to the 1980s, he rode in a nearly perfect Waymo around Palo Alto in 2014, and yet scaled self-driving still has economics, operations, and human-in-the-loop issues. He applies the same intuition to AI agents and production software: in domains where failure is costly, a slick demo is encouraging but does not mean the product is ready.

    1:45:01 in transcript
  5. What is Karpathy's Starfleet Academy idea for Eureka?

    Karpathy wants Eureka to build technical learning ramps before true AI tutors are ready. He says he could help a frontier AI lab, but education is where he can add more unique value because he wants humans to be better off rather than sidelined by AI. He describes Eureka as trying to build a Starfleet Academy: an elite, up-to-date institution for technical knowledge. The ideal future experience is a real tutor, not just an LLM prompt. His Korean tutor could quickly infer what he knew, probe his world model, and give material at the exact difficulty he needed. He says current AI is not capable enough for that full experience, so the near-term product is a very strong AI course built around LLM101N, nanochat, TAs, and carefully engineered ramps to knowledge.

    1:58:24 in transcript
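To make the straw metaphor from question 3 concrete, here is a minimal sketch of outcome-only policy-gradient credit assignment. This is not code from the episode; the function name, the PyTorch framing, and the baseline value are illustrative assumptions. The point it shows: one scalar reward, computed only from the final answer, is applied uniformly to every token of the attempt.

```python
# Minimal sketch of outcome-only credit assignment (REINFORCE-style).
# Illustrative only: names and framing are assumptions, not code
# discussed in the episode.
import torch

def outcome_reward_loss(token_logprobs: torch.Tensor,
                        final_answer_correct: bool,
                        baseline: float = 0.5) -> torch.Tensor:
    """token_logprobs: shape (T,), log-probs of each generated token
    in one solution attempt."""
    reward = 1.0 if final_answer_correct else 0.0
    # One scalar judgment of the entire attempt...
    advantage = reward - baseline
    # ...broadcast uniformly to every token: a correct attempt has all
    # its steps upweighted, including the wrong alleys. This uniform
    # broadcast is the "straw": many tokens, one number of supervision.
    return -(advantage * token_logprobs).sum()
```

The review process Karpathy says is missing would instead assign a different credit value to each step of the attempt, rather than one scalar to all of them.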
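The march of nines in question 4 has simple arithmetic behind it. The sketch below uses the standard convention that each added nine cuts the failure rate by 10x; the log formula is that convention, not a formula Karpathy states in the episode.

```python
# The arithmetic behind the "march of nines": each additional nine of
# reliability cuts the failure rate by a factor of 10.
import math

def nines(success_rate: float) -> float:
    """0.9 -> 1 nine, 0.99 -> 2 nines, 0.999 -> 3 nines, ..."""
    return -math.log10(1.0 - success_rate)

for rate in (0.9, 0.99, 0.999, 0.9999, 0.99999):
    failures_per_1000 = (1.0 - rate) * 1000  # expected failures in 1,000 runs
    print(f"{rate:.5f}: {nines(rate):.0f} nine(s), "
          f"~{failures_per_1000:g} failures per 1,000 runs")
```

If each nine takes roughly the same amount of work, as Karpathy reports from Tesla, then going from a 90% demo to a 99.999% product is not one step but four more, each comparable to the first.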

Answers are AI-generated from the transcript and may contain errors. Tap a question to verify against the source.
