Dwarkesh Podcast

Andrej Karpathy on the Dwarkesh Podcast: Why Agents Take a Decade

Why pre-training and gradient descent produce ghosts rather than agents: Karpathy maps the biological gaps that make the decade of agents the honest frame.

Andrej Karpathy (guest) · Dwarkesh Patel (host)
Oct 17, 2025 · 2h 26m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–30:33

    AGI is still a decade away

    1. AK

      Reinforcement learning is terrible. (laughs)

    2. DP

      (laughs)

    3. AK

      It just so happens that everything that we had before it is much worse. (laughs)

    4. DP

      (laughs)

    5. AK

      I'm actually optimistic. I think this will work. I think it's tractable. I'm only sounding pessimistic because when I go on my Twitter timeline-

    6. DP

      (laughs)

    7. AK

      ... I see all this stuff that makes no sense to me. A lot of it is, I think, honestly just, uh, fundraising. We're not actually building animals. We're building ghosts. These are like sort of ethereal spirit entities because they're fully digital and they're kind of like mimicking humans, and it's a different kind of intelligence. It's business as usual because we're in an intelligence explosion already and have been for decades. Everything is gradually being automated, has been for hundreds of years. Don't write blog posts, don't do slides, don't do any of that.

    8. DP

      (laughs)

    9. AK

      Like, build the code, arrange it, get it to work. It's the only way to go, otherwise you're missing knowledge. If you have a perfect AI tutor, maybe you can get extremely far. The geniuses of today are barely scratching the surface of what a human mind can do, I think.

    10. DP

      Today, I'm speaking with Andrej Karpathy. Andrej, why do you say that this will be the decade of agents and not the year of agents?

    11. AK

      Mm-hmm. Uh, well, first of all, uh, thank you for, uh, having me here. I'm, uh, excited to be here. So the quote that you just mentioned, "It's the decade of agents," that's actually a reaction to an existing, preexisting quote, I should say, where I think a lot of th- some of the labs... I'm not actually sure who said this, but they were alluding to this being the year of agents-

    12. DP

      Hmm.

    13. AK

      ... uh, with respect to LLMs and, uh, how they were gonna evolve. And I think, um, I was triggered by that-

    14. DP

      (laughs)

    15. AK

      ... because I feel like there's some over-predictions going on in the industry.

    16. DP

      Yeah.

    17. AK

      And, uh, in my mind, this is really a lot more accurately described as the decade of agents.

    18. DP

      Yeah.

    19. AK

      And we have some very early agents that are actually like extremely impressive and that I use daily. Uh, you know, Claude and Codex and so on. But I still feel like there's, uh, so much work to be done. And so I think my, like my reaction is like, we'll be working with these things for a decade. They're gonna get better, uh, and, uh, it's gonna be wonderful. But I think I was just reacting to the timelines, I suppose, of the, of the, uh, implication.

    20. DP

      And w- what do you think it will take a decade to accomplish?

    21. AK

      Yeah.

    22. DP

      What are the bottlenecks?

    23. AK

      Well, um, actually make it work.

    24. DP

      Mm-hmm.

    25. AK

      So in my mind, I mean, when you're talking about an agent, I guess, or what the labs have in mind and what maybe I have in mind as well, is it's, uh, you should think of it almost like an employee or like an intern that you would-

    26. DP

      Yeah.

    27. AK

      ... hire to work with you. Uh, so for example, you work with some employees here.

    28. DP

      Yeah.

    29. AK

      Um, when would you prefer to have an agent like Claude or Codex, uh, do that work?

    30. DP

      Yeah.

  2. 30:33–40:53

    LLM cognitive deficits

    1. DP

      You tweeted that coding models were actually of very little help to you in assembling this repository, and I'm curious why that was.

    2. AK

      Yeah. Uh, so the repository, I guess I built it over a period of a bit more than a month, and I would say there are, like, three major classes of how people interact with code right now. Some people completely reject all of LLMs, and they are just, uh, writing from scratch... I think this is probably not the right thing to do anymore. Um, the intermediate part, which is where I am, is you still write a lot of things from scratch, but you use, uh, the autocomplete, uh, that's basically, uh, available now from these models. So, when you start writing out a little piece of it, it will autocomplete for you and you can just tab through-

    3. DP

      Yeah.

    4. AK

      ... and most of the time, it's correct. Sometimes it's not and you edit it. But you're still very much the, um, sort of architect of what you're writing. And then there's the, you know, vibe coding. Uh, you know, "Hi, please implement this or that. Uh, you know, enter." And then let the model do it.

    5. DP

      Yeah.

    6. AK

      And that's the agents. Um, I do feel like the agents work in very specific settings, and I would use them in specific settings. But again, these are all tools available to you, and you have to, like, learn what they, what they're good at-

    7. DP

      Right.

    8. AK

      ... and what they're not good at, and when to use them. So, the agents are actually pretty good, for example, if you're doing boilerplate stuff.

    9. DP

      Yeah.

    10. AK

      Boilerplate code that's like just cop- you know, just copy-paste stuff.

    11. DP

      Yeah.

    12. AK

      They're very good at that. They're very good at stuff that occurs very often in the internet, um, because there's lots of examples of it in the training sets of these models. Um, so, so there's, like, features of things that, where the models will do very well. I would say NanoChat is not an example of this, because, uh, it's a fairly unique repository. There's not that much code, I think, in the way that I've structured it. And, um, and it's not boilerplate code. It's, like, actually, like, intellectually intense code almost.

    13. DP

      Mm-hmm.

    14. AK

      And everything has to be very precisely arranged. And the models were always trying to... They kept trying to... I mean, they have so many cognitive deficits, right?

    15. DP

      Mm-hmm.

    16. AK

      So, one example, they keep trying to... They keep misunderstanding the code, um, because they, they have too much memory from all the typical ways of doing things-

    17. DP

      Mm-hmm.

    18. AK

      ... on the internet that I just wasn't adopting. Uh, so the models, for example... I mean, I don't know if I wanna get into the full details, but they keep, they keep, um, they keep thinking I'm writing normal code and I'm not (laughs) .

    19. DP

      May- maybe one example. I think it's-

    20. AK

      Maybe one example is-

    21. DP

      ... quite interesting.

    22. AK

      Uh, so the way to synchronize... So, you have eight GPUs-

    23. DP

      Yeah.

    24. AK

      ... that are all doing forward-backwards. The way to synchronize gradients between them is to use a distributed data parallel container of PyTorch, which automatically does all the... As you're doing the backward, it will start communicating-

    25. DP

      Yeah.

    26. AK

      ... and synchronizing the gradients. I didn't use DDP because I didn't want to use it, because it's not necessary. So, I threw it out, and I basically wrote my own synchronization routine that's inside the step of the optimizer. And so the models were trying to get me to use the DDP container-

    27. DP

      (laughs) Yeah.

    28. AK

      ... and they were very concerned about... Okay, this gets way too technical, but I wasn't using that container because I don't need it, and I have a custom implementation of-

    29. DP

      Yeah.

    30. AK

      ... something like it.
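The exchange above describes replacing PyTorch's DistributedDataParallel wrapper with a hand-rolled gradient synchronization inside the optimizer step. Below is a minimal pure-Python sketch of that idea, not nanochat's actual code: each of N workers holds its own gradients, and before the update they are averaged across workers, which is the all-reduce that DDP would otherwise perform during the backward pass.

```python
# Sketch (not nanochat's implementation): instead of wrapping the model in
# PyTorch's DistributedDataParallel, average each parameter's gradient
# across all workers yourself, right before the optimizer applies its update.

def all_reduce_mean(per_worker_grads):
    """Average gradients elementwise across workers (simulates dist.all_reduce)."""
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    return [
        sum(g[i] for g in per_worker_grads) / n_workers
        for i in range(n_params)
    ]

def sgd_step_with_sync(params, per_worker_grads, lr=0.1):
    """An optimizer step that synchronizes gradients internally,
    rather than relying on a DDP container to do it during backward."""
    mean_grads = all_reduce_mean(per_worker_grads)  # the manual "DDP" part
    return [p - lr * g for p, g in zip(params, mean_grads)]

# 8 "GPUs", each with its own gradients for 2 parameters
grads = [[float(w), 2.0] for w in range(8)]  # param 0 grads 0..7 (mean 3.5)
params = [1.0, 1.0]
new_params = sgd_step_with_sync(params, grads, lr=0.1)
print(new_params)  # ≈ [0.65, 0.8]
```

In real multi-GPU code the averaging line would be a `torch.distributed.all_reduce` call per parameter tensor; the point of doing it inside the step is that you control exactly when and how synchronization happens.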

  3. 40:53–50:26

    RL is terrible

    1. DP

      Let's talk about RL a bit.

    2. AK

      Mm-hmm.

    3. DP

      Uh, you've tweeted some very interesting things about this. Um, conceptually, how should we think about the way that humans are able to build a rich world model just from interacting with our environment, and in ways that seem almost irrespective of the final reward at the end of the episode?

    4. AK

      Mm-hmm.

    5. DP

      If somebody, you know, starts a business, and at the end of 10 years she finds out whether the business succeeded or failed, we say that she's earned a bunch of wisdom and experience.

    6. AK

      Mm-hmm. Yeah.

    7. DP

      But it's not because, like, the log probs of every single thing that happened over the last 10 years-

    8. AK

      Yeah.

    9. DP

      ... are upweighted or downweighted. Something much more deliberate and, uh, rich is happening.

    10. AK

      Yeah.

    11. DP

      How... What is the ML analogy a- and how does that compare to what we're doing with LLMs right now?

    12. AK

      Yeah, maybe the way I would put it is, uh, humans don't use reinforcement learning is maybe what I've- (laughs)

    13. DP

      Hmm.

    14. AK

      ... is how I'd say it. I, I think they do something different, which is, yeah, you experience... So reinforcement learning is a lot worse than I think the average person thinks. (laughs)

    15. DP

      (laughs) Reinforcement learning is terrible. (laughs)

    16. AK

      (laughs)

    17. DP

      (laughs)

    18. AK

      It just so happens that, uh, everything that we had before it is much worse. (laughs)

    19. DP

      (laughs)

    20. AK

      Uh, because previously we were just imitating people, so it has all these issues. Um, so in reinforcement learning, say you're working with, uh, you're solving a math problem, because it's very simple. You're given a math problem and you're trying to find the solution. Um, now in reinforcement learning, you will try, uh, lots of things in parallel first. So, uh, you're given a problem. You try hundreds of different attempts. And these attempts can be complex, right? They can be like, "Oh, let me try this. Let me try that. This didn't work. That didn't work," et cetera. And then maybe you get an answer. And now you check the back of the book and you see, okay, the correct answer is this. And then you can see that, okay, this one, this one, and that one got the correct answer, but these other 97 of them didn't. So literally what reinforcement learning does is it goes to the ones that worked really well, and every single thing you did along the way-

    21. DP

      Yeah.

    22. AK

      ... every single token gets upweighted of, like, do more of this. The problem with that is, uh, I mean, people will say that, uh, your estimator has high variance, but what... I mean, it's just noisy. It's noisy. (laughs) Uh, so basically, it kind of almost assumes that every single little piece of the solution that you made that arrived at the right answer was the correct thing to do, which is not true. Like, you may have gone down the wrong alleys, uh, until you arrived at the right solution. Every single one of those incorrect things you did, as long as you got to the correct solution, will be upweighted as do more of this. It's terrible.

    23. DP

      ... yeah.

    24. AK

      It's noise. You've done all this work only to find a single... at the end you get a single number of like, oh, you did correct. And - and based on that, you weigh that entire trajectory as like upweight or downweight. And so you're... the way I like to put it is, you're sucking supervision through a straw. Uh, because you've done all this work, that could be minutes of rollout, and you're - you're like sucking the bits of supervision of the final reward signal-

    25. DP

      (laughs)

    26. AK

      ... through a straw, and you're like putting it... you're like... (laughs) you're basically like, um... yeah, you're broadcasting that across the entire trajectory and using that to upweight or downweight that trajectory. It's just stupid and crazy.

    27. DP

      Uh-

    28. AK

      A human would never do this. Number one, a human would never do hundreds of rollouts, right?

    29. DP

      Right.

    30. AK

      Uh, number two, when a person sort of finds a solution, they will have a pretty complicated process of review, of like, "Okay, I think these parts that I did well. These parts I did not do that well. I should probably do this or that." And they think through things. There's nothing in current LLMs that does this. There's no equivalent of it. Um, but I do see papers popping out that are trying to do this, because it's obvious to everyone in the field.
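What Karpathy objects to here, a single scalar outcome broadcast over every token of the trajectory, is essentially vanilla REINFORCE-style outcome reward. The toy simulation below (not any lab's training code; the "trajectory" and "usefulness" flags are invented for illustration) makes the failure concrete: steps that contributed nothing still get upweighted whenever the rollout happens to end correctly.

```python
import random

# Toy illustration of outcome-based credit assignment: one scalar reward
# at the end of a rollout is broadcast over every step, so wrong turns
# inside a winning trajectory get "do more of this" anyway.

random.seed(0)

def rollout(n_steps=5):
    """A fake trajectory: each step is (token, was_this_step_actually_useful)."""
    return [(f"tok{i}", random.random() < 0.5) for i in range(n_steps)]

def outcome_reward(traj):
    """Reward 1 only if the final step was useful (checking the back of the book)."""
    return 1.0 if traj[-1][1] else 0.0

credit = {}       # per-token credit, broadcast from the single outcome
mislabeled = 0    # useless steps that still got upweighted
for _ in range(100):
    traj = rollout()
    r = outcome_reward(traj)
    for tok, useful in traj:
        credit[tok] = credit.get(tok, 0.0) + r  # same scalar for every step
        if r > 0 and not useful:
            mislabeled += 1

print(mislabeled)  # many useless steps were rewarded anyway
```

This is the "sucking supervision through a straw" picture: minutes of rollout compressed into one bit, then smeared uniformly back across the whole trajectory.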

  4. 50:26–1:07:13

    How do humans learn?

    2. DP

      So I guess, like, I, I, I see a very, um, not easy, but, like, I, I can conceptualize how you would ha- be able to train on synthetic examples.

    3. AK

      Mm-hmm.

    4. DP

      Or synthetic problems that you have made for yourself.

    5. AK

      Mm-hmm.

    6. DP

      But there seems to be another thing humans do, maybe sleep is this, maybe daydreaming is this.

    7. AK

      Mm-hmm.

    8. DP

      Which is not necessarily come up with fake problems, but just, like, reflect.

    9. AK

      Yeah.

    10. DP

      And I'm not sure what the ML analogy is for, you know, daydreaming or sleeping.

    11. AK

      Mm-hmm. Yeah.

    12. DP

      But just, like, just reflecting, where I haven't come up with any problem.

    13. AK

      Yeah, yeah.

    14. DP

      I mean, obviously, the very basic analogy would just be, like, fine-tuning on reflection bits. But I feel like in practice, that probably wouldn't work that well.

    15. AK

      Yeah.

    16. DP

      So I don't know if you have some take on what, what the analogy of, like, th- this thing is.

    17. AK

      Yeah, I do think that, that we're missing some aspects there. So as an example, uh, when you're reading a book-

    18. DP

      Yeah.

    19. AK

      ... um, I almost feel like currently, when LLMs are reading a book, uh, what that means is we stretch out the sequence of text and the model is predicting the next token.

    20. DP

      Yeah.

    21. AK

      And it's getting some knowledge from that. Uh, that's not really what humans do, right?

    22. DP

      Yeah.

    23. AK

      So when you're reading a book, I almost don't even feel like the book is like exposition I'm supposed to be attending to and training on.

    24. DP

      Mm-hmm.

    25. AK

      The book is a, is a set of prompts for me to do-

    26. DP

      Mm-hmm.

    27. AK

      ... synthetic data generation, or for you to get into a book club and talk about it with your friends.

    28. DP

      Yeah.

    29. AK

      And it's by manipulating that information that you actually gain that knowledge.

    30. DP

      Yeah.
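Karpathy's contrast, a book as next-token targets versus a book as a set of prompts for synthetic data generation, can be sketched as follows. The stand-in `fake_generate` is a trivial placeholder for an LLM call (hypothetical, not a real API), and the three reflection prompts are invented examples of the "book club" manipulations he describes.

```python
# Sketch of the two views of reading a passage (with a trivial stand-in
# "model", not a real LLM): pretraining consumes the text directly as
# next-token prediction targets; a reflection-style pipeline would treat
# the text as a prompt for generating synthetic discussion to train on.

def next_token_examples(passage):
    """Pretraining view: every prefix -> next word is a training example."""
    words = passage.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

def reflection_examples(passage, generate):
    """'Book club' view: the passage seeds synthetic prompts; the model's
    own manipulations of the material become the training data."""
    prompts = [
        f"Summarize in your own words: {passage}",
        f"What does this passage assume? {passage}",
        f"Argue against this passage: {passage}",
    ]
    return [(p, generate(p)) for p in prompts]

# Hypothetical stand-in for an LLM call; a real pipeline would query a model.
fake_generate = lambda prompt: f"<model response to: {prompt[:20]}...>"

passage = "Reinforcement learning sucks supervision through a straw"
print(len(next_token_examples(passage)))                  # one example per next word
print(len(reflection_examples(passage, fake_generate)))   # one per synthetic prompt
```

The point of the second view is that the learning signal comes from manipulating the material, not from passively predicting it.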

Episode duration: 2:26:07

Transcript of episode lXUZvyajciY