No Priors

No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla

Andrej Karpathy joins Sarah and Elad in this week's episode of No Priors. Andrej, a founding team member of OpenAI and former leader of Tesla Autopilot, needs no introduction. In this episode, Andrej discusses the evolution of self-driving cars, comparing Tesla's and Waymo's approaches, and the technical challenges ahead. They also cover Tesla's Optimus humanoid robot, the bottlenecks of AI development today, and how AI capabilities could be further integrated with human cognition. Andrej shares more about his new mission, Eureka Labs, his insights into AI-driven education, and what young people should study to prepare for the reality ahead.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Karpathy

Show Notes:
0:00 Introduction
0:33 Evolution of self-driving cars
2:23 The Tesla vs. Waymo approach to self-driving
6:32 Training Optimus with automotive models
10:26 Reasoning behind the humanoid form factor
13:22 Existing challenges in robotics
16:12 Bottlenecks of AI progress
20:27 Parallels between human cognition and AI models
22:12 Merging human cognition with AI capabilities
27:10 Building high performance small models
30:33 Andrej's current work in AI-enabled education
36:17 How AI-driven education reshapes knowledge networks and status
41:26 Eureka Labs
42:25 What young people should study to prepare for the future

Sarah Guo (host) · Elad Gil (host) · Andrej Karpathy (guest)
Sep 5, 2024 · 44m

EVERY SPOKEN WORD

  1. 0:00–0:33

    Introduction

    1. SG

      Hi, listeners. Welcome back to No Priors. Today, we're hanging out with Andrej Karpathy who needs no introduction. Andrej is a renowned researcher, beloved AI educator and cuber, an early team member from OpenAI, the lead for Autopilot at Tesla, and now working on AI for education. We'll talk to him about the state of research, his new company, and what we can expect from AI.

    2. EG

      Thanks a lot for joining us today.

    3. AK

      Yep.

    4. EG

      It's great to have you here.

    5. AK

      Thank you. Happy to be here.

  2. 0:33–2:23

    Evolution of self-driving cars

    1. AK

    2. SG

      You led Autopilot at Tesla and now, like, we actually have fully self-driving cars, passenger vehicles on the road. How do you read that in terms of where we are in the capability set, how quickly we should see increased capability or pervasive passenger vehicles?

    3. AK

      Uh, yes. I spent maybe five years in the self-driving space. I think it's a fascinating space and, um, basically what's happening in the field right now is... Well, um, I do also think that I f- I draw a lot of, like, analogies, I would say, to AGI from self-driving, and maybe that's just because I'm familiar with it. But I kind of feel like we've reached AGI a little bit in self-driving-

    4. SG

      Mm-hmm.

    5. AK

      ... uh, because there are systems today that you can basically take around, and as a paying customer can take around here. So, Waymo in San Francisco here is, of course, very common. Probably you've taken Waymo. I've taken it a bunch and it's amazing and it can drive you all over the place and you're paying for it as a product.

    6. SG

      Mm-hmm.

    7. AK

      What's interesting with Waymo is the first time I took Waymo was actually a decade ago almost exactly, 2014 or so, and it was a friend of mine who worked there and he gave me a demo. And it drove me around the block 10 years ago and it was basically a perfect drive 10 years ago.

    8. SG

      Mm-hmm.

    9. AK

      And it took 10 years to go from, like, a demo that I had to a product I can pay for that's in a city scale and it's expanding, et cetera.

    10. EG

      How much of that do you think was regulatory versus technology? Like, when do you think the technology was ready? Is it at this end? It's-

    11. AK

      I think it's technology. You're just not seeing it in a single demo drive of 30 minutes.

    12. EG

      Yeah, yeah.

    13. AK

      You're not running into all the stuff that they had to do with, deal with for-

    14. EG

      Sure.

    15. AK

      ... a decade. And so demo and product, there's a massive gap there. And I think a lot of it also regulatory, et cetera. Uh, but I do think that we've sort of, like, achieved AGI in the self-driving space in, in that sense a little bit. And yet, I think there's... What's n- really fascinating about it is the globalization hasn't happened at all.

    16. EG

      Mm-hmm.

    17. AK

      So you have a demo and you can take it in the South, but, like, the world hasn't changed yet, and that's gonna take a long time.

    18. EG

      Mm-hmm.

    19. AK

      And so going from a demo to an actual globalization of it, I think there's a big gap there. That's how it's related, I would say, to AGI because I suspect similar... It will look in a similar way for AGI when we sort of get it. And then staying for a minute in the self-driving space. Uh, I think

  3. 2:23–6:32

    The Tesla vs. Waymo approach to self-driving

    1. AK

      people think that Waymo is ahead of Tesla. I think personally Tesla is ahead of Waymo, and I know it doesn't look like that, uh, but I'm still very, uh, bullish on Tesla and its self-driving program. I think that Tesla has a software problem, and I think Waymo has a hardware problem is the way I put it, and I think software problems are much easier. Tesla has deployment of all these cars on Earth, uh, like at scale.

    2. SG

      Mm-hmm.

    3. AK

      And I think, uh, Waymo needs to get there. And so, uh, the moment Tesla sort of, like, gets to the point where they can actually deploy this and it actually works, I think it's gonna be, you know, uh, really incredible. Uh, the latest builds I just drove yesterday, I mean, it's just driving me all over the place now. They've made, like, really good improvements, uh, I would say very recently.

    4. EG

      Yeah, I've been using it a lot recently and it actually-

    5. AK

      Yeah.

    6. EG

      ... works quite well, so yeah.

    7. AK

      It was, it did some miraculous, uh, driving for me yesterday-

    8. EG

      Mm-hmm.

    9. AK

      ... so I'm very impressed with what the team is doing. And so I still think that Tesla mostly has a software problem, Waymo mostly hardware problem, and so I think Tesla... Uh, Waymo looks like it's winning kind of right now-

    10. EG

      Mm-hmm.

    11. AK

      ... but I think when we look in 10 years and who's actually at scale and where most of the revenue's coming from, I still think they're, uh, they're ahead in that sense.

    12. EG

      How far away do you think we are from the software problem turning the corner in terms of getting to some equivalence? Because obviously to your point, if you look at a Waymo car, it has a lot of very expensive LiDAR and other sort of sensors built into the car so it can do what it does that sort of helps support the software system. And so if you could just use cameras, which is the Tesla approach, then you effectively get rid of enormous cost/complexity and you can do it in an... in many different types of cars. When do you think that transition happens?

    13. AK

      I mean, in- in this... In the next few years, I mean, I'm hoping, you know, like something like that. But actually, what's really interesting about that is I'm not sure that people are appreciating that Tesla actually does use a lot of expensive sensors. They just do it at training time.

    14. EG

      Mm-hmm.

    15. AK

      So there are a bunch of cars that drive around with LiDARs.

    16. EG

      Mm-hmm.

    17. AK

      They do a bunch of stuff that, like, doesn't scale and they have extra sensors, et cetera, and they do mapping and all this stuff. You're doing it at training time and then you're distilling that into a test time package that gets deployed to the cars and is vision only. And it's like an arbitrage on, like, sensors, uh, and, like, expense.

    18. SG

      Yeah.

    19. AK

      And so I think it's actually kind of a brilliant strategy that I don't think is fully appreciated, and I think it's gonna work out well because the pixels have the information and I think, uh, the network will be capable of doing that. And yes, at training time, I- I think these sensors are really useful, but I don't think they're as useful at test time, and I think you can... You know.
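
What Andrej describes here is essentially knowledge distillation across sensor modalities: expensive sensors supply labels at training time, and a camera-only model is fit to them for deployment. A minimal sketch with an entirely hypothetical 1-D setup (this is not Tesla's pipeline): a LiDAR-style depth reading acts as the pseudo-label, and a vision-only "student" is trained to reproduce it from pixels alone.

```python
import random

random.seed(0)

# Toy world: true depth is a linear function of a "pixel" feature.
def true_depth(pixel):
    return 2.0 * pixel + 1.0

# Training fleet: frames carry both a camera pixel and a (noisy)
# LiDAR depth reading. The LiDAR exists only at training time.
train_frames = [(x, true_depth(x) + random.gauss(0, 0.01))
                for x in [i / 100 for i in range(100)]]

# Vision-only "student": fit depth = w * pixel + b by SGD, using only
# the camera input and the LiDAR pseudo-label as the target.
w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    for pixel, lidar_depth in train_frames:
        err = (w * pixel + b) - lidar_depth
        w -= lr * err * pixel
        b -= lr * err

# At test time the student runs from pixels alone: no LiDAR on the car.
print(round(w, 2), round(b, 2))   # close to the true (2.0, 1.0)
```

The "arbitrage" is that the sensor cost is paid once, on the training fleet, while every deployed car carries only cameras.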

    20. EG

      It seems like the one other thing or transition that's happened is basically a move from a lot of, um, sort of, uh, edge case design heuristics associated with it versus end-to-end deep learning, and that's one other shift that's happened recently. Do you wanna talk a little bit about that and sort of what that (whimsical music)

    21. AK

      Yeah, I think that was always, like, the plan from the start, I would say, at Tesla is I was talking about how the neural net can, like, eat through the stack. Because when I joined there was a ton of C++ code and now there's much, much less C++ code in the test time package that runs in the car because, uh, there's still a ton of stuff in the... in the backend, uh, that we're now talking about. The neural net kind of, like, eats, uh... eats through the system. So first, it just does, like, detection on the image level, then it does multiple images, gives you a prediction, then multiple images over time give you a prediction, and you're discarding C++ code, and eventually you're just giving the steering commands. And so I think Tesla's kind of eating through the stack. My understanding is that current Waymos are actually, like, not that, but that they've tried but they ended up, like, not doing that is my current understanding but I'm not sure because they don't talk about it. But I do fundamentally believe in this approach, um, and I think, um, that's the last piece to fall if, if you wanna think about it that way. And I do suspect that the end-to-end systems for Tesla in, like, say 10 years, it is just a neural net. I mean, the, the videos stream into a neural net and commands come out. You have to sort of build to... build up... build up to it incrementally and, uh, do it piece by piece. And even all the intermediate predictions and all these things that we've done, I don't think they've actually, like, misled the development. I think they're part of it, uh, because, um... There's a lot of solid reasons for this. So f- so actually like end-to-end driving when you're just imitating humans and so on, uh, you have very few bits of supervision to train a massive neural net. And it's too, too few bits of signal to train so many billions of parameters.
And so these intermediate representations and so on help you develop the features and the detectors for everything and then it makes a much easier problem for the, um, end-to-end part of it. And so I suspect, although I don't know because I'm not part of the team, but there's a ton of pre-training happening so that you can do the fine-tuning for end-to-end. And so basically I feel like it was necessary to eat through it incrementally and that's what Tesla has done, and I think it's the right approach and it looks like it's working, so I'm really looking forward to-
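
The staged recipe Andrej suggests (pre-train features on dense intermediate predictions, then fine-tune end-to-end on sparse command labels) can be sketched with toy one-parameter models. Everything here is hypothetical; it illustrates the training schedule, not the real networks.

```python
import random

random.seed(1)

# Hypothetical toy: a one-parameter "backbone" u maps a pixel to a
# feature; a one-parameter "head" v maps the feature to a steering
# command. Dense auxiliary labels (think per-pixel depth) are
# plentiful; end-to-end steering labels are scarce.
depth = lambda x: 3.0 * x   # dense auxiliary target
steer = lambda x: 1.5 * x   # sparse end-to-end target

# Stage 1: pre-train the backbone on the plentiful auxiliary task.
u, lr = 0.0, 0.1
for _ in range(500):
    x = random.random()
    u -= lr * (u * x - depth(x)) * x

# Stage 2: freeze the backbone and fine-tune only the small head,
# end-to-end, on a handful of labeled drives.
v = 0.0
for x in [0.2, 0.5, 0.9]:   # just three labeled examples
    for _ in range(200):
        feat = u * x
        v -= lr * (v * feat - steer(x)) * feat

print(round(u, 2), round(v, 2))   # backbone ≈ 3.0, head ≈ 0.5
```

The few bits of imitation signal only have to pin down the small head, because the backbone's features were already developed by the dense auxiliary supervision.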

    22. SG

      If you had started end-to-end you wouldn't have had the data anyway. That makes sense, yeah.

  4. 6:32–10:26

    Training Optimus with automotive models

    1. SG

      So, uh, you worked on the, um, Tesla humanoid robot before you left. Uh, I have so many questions but one is like starting here, what transfers?

    2. AK

      Basically everything transfers and I don't think people appreciate it.

    3. SG

      Okay.

    4. AK

      Um... (clears throat)

    5. SG

      That's a big claim. It s-

    6. AK

      I think cars-

    7. SG

      It's like a very different problem.

    8. AK

      ... are basically robots when you actually, uh, look at it.

    9. SG

      Okay.

    10. AK

      Um, cars are robots, and Tesla, I don't think is a car company, I think this is misleading, it's a robotics company.

    11. SG

      Mm-hmm.

    12. AK

      Robotics at scale company because I would say at scale is also like a whole separate variable. They're not building a single thing, they're building the machine that builds the thing, which is a whole separate thing. Um, and so I think robotics at scale, um, company, uh, is what Tesla is, and I think, uh, in terms of the transfer from cars to r- uh, to humanoids it was not, not that much work at all. And in fact like the early versions of Optimus, um, the ro- robot, uh, it thought it was a car.

    13. SG

      (laughs)

    14. AK

      Like because it had the exact same computer, it had the exact same cameras. It was really funny because we were running the car networks on the robot but it's walking around the office and so on.

    15. SG

      Oh, amazing.

    16. AK

      And like it's trying to like recognize drivable space but it's all just walking space now I suppose, but it actually kind of like generalized a little bit and there's some fine-tuning necessary and so on. But it thought it was driving but it's actually like moving through an environment.

    17. SG

      Is a reasonable way to think of this as like actually it's a robot, many things transfer, but you're just missing, for example, actuation and action data?

    18. AK

      Yeah, you definitely miss some components but... And the other part I would say is like, like so much transfers, like the speed with which Optimus was started I think to me was very impressive because the moment Elon said, "We're doing this," uh, just people just showed up with all the right tools and all... The stuff just showed up so quickly and all these CAD models and all the supply chain stuff and I just felt like wow, there's so much in-house expertise for building robotics at Tesla. And like it's all the same tools and they're just like, okay, they're being reconfigured from a car, like a transformer, the movie, they're just being reconfigured and reshuffled but it's like the same thing. And you need all the same components, you need to think about all the sa- same kinds of stuff both in the hardware side, on the scale stuff, and also on the brains. And so for the brains there was also a ton of transfer, not just of the specific networks but also all of the approach and the labeling team and how it all coordinates and the approaches people are taking. I just think there's a ton of transfer.

    19. EG

      What do you think are the first application areas for humanoid robotics or human form stuff?

    20. AK

      I think a lot of people have this vision of it like doing your laundry, et cetera. I think that will come late. I don't think B2C should be the right start point because I don't think we can have a robot like crush grandma is how I-

    21. EG

      Yeah, yeah.

    22. AK

      ... put it sort of. I think it's like too much, uh, legal liability. (laughs) It's just like I don't think-

    23. SG

      But like a very dorky hug. (laughs)

    24. AK

      I mean, it's just gonna fall over or something like that, you know, like these things are not perfect yet and they require some amount of work. So I think the best customer is yourself first and I think probably Tesla's gonna do this. Uh, I'm very bullish on Tesla if, if people can't tell. Uh, the first customer is yourself and you incubate it in the factory and so on, doing maybe a lot of material handling, et cetera. This way you don't have to create contracts working with third parties, it's all really heavy, there's lawyers involved, like et cetera. You incubate it then you go I think B2B second. Uh, and you go to other companies that have massive warehouses where you can do material handling, we're gonna do all this stuff. Contracts get drafted up, fences get put around, all this kind of stuff. And then once you're incubated in multiple companies I think that's when you start to go into the B2C applications. I do think we'll see B2C, uh, robots also, um, like Unitree and so on are starting to come out with robots that I really want.

    25. EG

      Yeah.

    26. SG

      I got one.

    27. AK

      You did?

    28. SG

      Yeah.

    29. AK

      Okay.

    30. SG

      Yeah, the G1.

  5. 10:26–13:22

    Reasoning behind the humanoid form factor

    1. SG

      thesis for a second? Because the, the, the simplest version of this is like the world is built for humans and you build one set of hardware. The right thing to do is build a model that can do an increasing set of tasks in this set of hardware. I think there's a, like another camp that believes like well, like humans are not optimal for any given task, right? You can make them stronger or bigger or smaller or whatever and why shouldn't we do superhuman things, like how do you think about this?

    2. AK

      I think people are maybe un- underappreciating the complexity of any fixed cost that goes into any single platform. I think there's a large fixed cost you're paying for any single platform and so I think it makes a lot of sense to centralize that and have a single platform that can do all the things. I would say the humanoid aspect is also very appealing because people can tele-operate it very easily and so it's a data collection thing that is extremely helpful, uh, because people will be able to obviously very easily tele-operate it. I think that's usually overlooked. There's of course the aspect you mentioned which is like th- world is designed for humans, et cetera, so I think that's also important. I mean, I think we'll have some variations on the humanoid, uh, platform but I think, uh, there is a large fixed cost to any platform. And then I would say also one last dimension of it is you benefit a, a ton from like the transfer learning between the different tasks. And in AI you really want the single neural net that is multitasking, doing lots of things, that's where you're getting all the intelligence and the capability from. And that's also hap- mm, that's also why language models are so interesting is 'cause you have a single, uh, regime like a text, um, domain multitasking all these different problems and they're all sharing knowledge between each other and it's all coupled in a single neural net. And I think you want that kind of a platform and... Uh, you know, you want all the data you collect for leaf picking to benefit all the other tasks. Um, if you're building a special purpose thing for any one thing, you're not gonna benefit from a lot of the transferring between all the other tasks, if that makes sense.

    3. SG

      Yeah, I think there's one, um, argument of like, uh, it seems... I mean, the G1 is like 30 grand, right? But it seems hard to build very capable humanoid robot under a certain BOM, and like if you wanted to, you know, put an arm on wheels that can do things, like as a, like y- maybe there are cheaper approaches to a general platform at the beginning. Does that make sense to you?

    4. AK

      Uh, cheaper approaches to a general platform. I see.

    5. SG

      From a hardware perspective. Yeah.

    6. AK

      Uh, yeah, I think that makes sense. Yeah, you put a wheel on it instead of feet, et cetera. I do feel like, I wonder if it's taking it down like a local minimum a little bit. I just feel like pick a platform, uh, make it perfect is like the long term, uh, pretty good bet. And then the other thing, of course, is like I just think it will be kind of like familiar to people, and I think people will understand that maybe when I talk to it. And I feel like the cog, the psychological aspect also of it I think favors possibly the human platform unless people are like scared of it (laughs) and would actually prefer a platform that is more abstract, like some... But then I don't know if, if the, if it's some-

    7. SG

      We've had sci-fi for so long.

    8. AK

      ... some kind of like an eight-wheeled monster doing stuff, then I don't know if that's like more appealing or less appealing.

    9. EG

      Well, and it's kind of like, it's interesting that thing with the other, um, form factor for the Unitree is a dog, right? And it's almost a more friendlier, familiar-

    10. AK

      Yeah, but then people watch Black Mirror and suddenly the dog flips to like a scary thing.

    11. EG

      ... thing it is. Yeah, yeah.

    12. AK

      So it's hard to think through. Uh, I just think psychologically it will be easy for people to understand what's happening and...

    13. EG

      What

  6. 13:22–16:12

    Existing challenges in robotics

    1. EG

      do you think is missing in terms of, um, technological milestones for progress relative to substantiating this future?

    2. AK

      Uh, for robotics specifically?

    3. EG

      For robotics, yeah, or the, the humanoid robot or anything else human form.

    4. AK

      Yeah, I don't know that I have like a really good window into it. I do think that it is kind of interesting that like in a humanoid form factor for, for example, for the lower body, uh, I don't know that you wanna do imitation learning from like demonstration, because for lower body it's all a lot of like inverted pendulum control and stuff like that. It's for the upper body that you need a lot of like tele-operation and, uh, data collection and end-to-end and et cetera. And so I think like everything becomes like very hybrid in that sense, and I don't know how those systems interact.

    5. EG

      When I, when I talk to people working in the field, a lot of what they focus on is like actuation and, you know, manipulation and sort of digital manipulation and things like that.

    6. AK

      Yeah, I do expect in the beginning it's a lot of like tele-operation for, uh, getting stuff off the ground and then imitating it and getting something that works 95% of the time and then talking about human to robot ratios and gradually having people who are supervisors of robots instead of doing the task directly and all this kind of stuff is gonna happen over time and pretty gradually. I don't know that there's like any, uh, individual impediments that I'm like really familiar with. I just think it's a lot of grunt, grunt work. A lot of like the tools are available and transformers are this beautiful like blob of tissue you can just get-

    7. EG

      Yeah.

    8. AK

      ... just, just arbitrary tasks.

    9. EG

      Mm-hmm.

    10. AK

      And you just need the data, you need to put it in the right form, you need to train it, you need to experiment with it, you need to deploy it, iterate on it. There's just a lot of grunt work. I don't know that I have a single individual thing that is like holding us back technically.

    11. SG

      Where are we in the state of large blob research?

    12. AK

      (laughs) Large blob research? (laughs)

    13. SG

      Yeah.

    14. AK

      We're in a really good state. Uh, so I think, um, I'm not sure if it's fully appreciated, but like the transformer is like much more amazing. It's not just like an, it's not just another neural net, it's like an amazing neural net, extremely general. Uh, so for example, when people talk about the scaling laws in neural networks, the scaling laws are actually a, um, uh, to a large extent a, a property of the transformer. Before the transformer, people were playing with LSTMs and stacking them, et cetera. You don't actually get like clean scaling laws and this thing doesn't actually train and doesn't actually work. It's the transformer that was the first thing that actually just kind of like scales, um, and you get scaling laws and everything makes sense. So it's this like general purpose training computer. I think of it as kind of a computer-

    15. SG

      Mm-hmm.

    16. AK

      ... but it's like a differentiable computer. And you can just give it inputs and outputs and building this off it and you can train with back propagation and it actually kind of like arranges itself into a thing that does the task. And so I think it's actually kind of like a magical thing that we've stumbled on in the algorithm space, and I think there's a few individual innovations that went into it. So you have the residual connections, that was a piece that existed. You have the layer normalizations, uh, that needs to slot in. You have the attention block, you have the lack of these like, um, uh, saturating non-linearities like tanhs and so on, those are not present in the transformer 'cause they kill gradient signals. So there's a few like, there's four or five innovations that all existed and were put together into this transformer, and that's what Google did with their paper. And this thing actually trains. Uh, and suddenly you get scaling laws and suddenly you have like this piece of tissue that just trains-

    17. SG

      Mm-hmm.

    18. AK

      ... to a very large extent. And so I, it was a major unlock.
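
The ingredients Andrej lists (attention, residual connections, layer normalization, non-saturating non-linearities) fit in a few dozen lines. Here is a toy single-head, pre-norm block in pure Python, with all weight matrices fixed to the identity purely for brevity; a real transformer learns those projections and adds a causal mask.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def layer_norm(x, eps=1e-5):
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def attention(seq):
    # Single head; Q, K, V projections fixed to the identity for brevity.
    d = len(seq[0])
    out = []
    for q in seq:
        scores = softmax([dot(q, k) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[i] for w, v in zip(scores, seq))
                    for i in range(d)])
    return out

def mlp(x):
    # Position-wise feed-forward with a non-saturating ReLU
    # (weights again identity, purely for illustration).
    return [max(0.0, v) for v in x]

def block(seq):
    # Pre-norm block: x + Attn(LN(x)), then x + MLP(LN(x)).
    normed = [layer_norm(x) for x in seq]
    attended = attention(normed)
    seq = [[a + b for a, b in zip(x, y)] for x, y in zip(seq, attended)]
    seq = [[a + b for a, b in zip(x, mlp(layer_norm(x)))] for x in seq]
    return seq

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = block(tokens)
print(len(out), len(out[0]))   # shape preserved: 3 tokens of dim 2
```

Because each block maps a sequence to a sequence of the same shape, blocks stack arbitrarily deep, and the residual paths keep gradients flowing: that is the "general purpose trainable computer" property.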

    19. SG

      You feel

  7. 16:12–20:27

    Bottlenecks of AI progress

    1. SG

      like we are not near the limit of that unlock, right? Because I think there is a discussion of, of course like the data wall and how expensive another generation of scale would be. Like, how do you think about that?

    2. AK

      Uh, that's where we start to get into... Or like, I don't think that the neural network architecture is like holding us back fundamentally anymore. It's like not the bottleneck, whereas I think in the previous, in, before transformer it was the bottleneck.

    3. SG

      Right.

    4. AK

      But now it's not the bottleneck.

    5. SG

      But now the inputs are.

    6. AK

      So now we're talking a lot more about-

    7. SG

      Yeah.

    8. AK

      ... where's the loss function, where's the data set? We're talking a lot more about those and those have become the bottlenecks almost. Um, it's not the general piece of tissue that reconfigures based on whatever you want it to be. And so that's where I think a lot of the activity has moved and that's why a lot of the companies and so on who are applying this technology, like they're not thinking about the transformer much, they're not thinking about the architecture. You know, the LLaMA release, uh, like the, the transformer hasn't changed that much. Uh, you know, we've added RoPE, the rotary positional encodings. Um, that's like the major change. Everything else doesn't really matter too much, it's like plus 3% on the small few things. Uh, but really it's like RoPE is the only thing that's slotted in and that's the transformer as it, as it has changed over the last five years or something. So there hasn't been that much innovation on that. Everyone just takes it for granted, let's train it, et cetera. And then everyone's just innovating on that data set mostly and the loss function.

    9. SG

      Mm-hmm.

    10. AK

      ... details. Uh, so that's where all the activity has gone to.
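
RoPE, the rotary positional encoding mentioned above (used in LLaMA), rotates each consecutive pair of query/key dimensions by an angle proportional to the token's position, so attention scores end up depending only on relative offsets between tokens. A small pure-Python sketch of that property:

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive (even, odd) dimension pairs of x by angles
    that grow with the token position, as in rotary embeddings."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.2]

# The score between rotated q and k depends only on the *relative*
# offset between their positions, not the absolute positions:
s1 = dot(rope(q, 5), rope(k, 2))       # positions 5 and 2 (offset 3)
s2 = dot(rope(q, 105), rope(k, 102))   # positions 105 and 102 (offset 3)
print(abs(s1 - s2) < 1e-9)  # True
```

That relative-position property is why RoPE could "slot in" without otherwise disturbing the architecture.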

    11. SG

      Right. But what about the argument, like in that domain-

    12. AK

      Mm-hmm.

    13. SG

      ... that, that was easier when we were taking internet data-

    14. AK

      Uh-huh.

    15. SG

      ... and we're out of internet data-

    16. AK

      Uh-huh.

    17. SG

      ... and so the questions are really around like synthetic data-

    18. AK

      Yeah.

    19. SG

      ... or more expensive data collection?

    20. AK

      Yeah, yeah. So I think that's a good point. So that's where a lot of the activity is now in LLMs. So the internet data is like not the data you want for your transformer. It's like a nearest neighbor that actually gets you really far, surprisingly. (laughs) But the internet data is a bunch of internet web pages, right? It's just like what you want is the inner thought monologue of your brain.

    21. SG

      Yep.

    22. AK

      That's the idea.

    23. SG

      The trajectories in your brain.

    24. AK

      The trajectories in your brain as you're doing problem solving, if we had a billion of that, like AGI is here-

    25. SG

      Yeah.

    26. AK

      ... roughly speaking, I mean, to a very large extent, uh, and we just don't have that. So where a lot of activity is now I think is with the internet data that actually gets you like really close because it just so happens that internet has enough of reasoning traces in it and a bunch of knowledge, and the transformer just makes it work okay. (laughs) So I think a lot of activity now is around, um, refactoring the dataset into these inner monologue, uh, formats. And I think there's a ton of synthetic data generation that's helpful for that. So what's interesting about that also is like the extent to which the current models are helping us create the next generation of models, and so it's kind of like, you know, the staircase of improvement.
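
A hypothetical sketch of that refactoring step: bare question/answer pairs get rewritten into a reasoning-trace format. In practice the trace would be synthesized by a current model and filtered (the "staircase of improvement"); here it is hand-written for one arithmetic example.

```python
# Refactoring bare Q/A pairs into the inner-monologue training format.
raw_pairs = [("What is 17 * 6?", "102")]

def synthesize_trace(question, answer):
    # Stand-in for an LLM call that writes out intermediate steps.
    return "17 * 6 = 17 * (5 + 1) = 85 + 17 = 102"

def to_monologue_example(question, answer):
    return (f"Question: {question}\n"
            f"Thought: {synthesize_trace(question, answer)}\n"
            f"Answer: {answer}")

dataset = [to_monologue_example(q, a) for q, a in raw_pairs]
print(dataset[0])
```

The field names (`Question`/`Thought`/`Answer`) are illustrative, not any lab's actual schema; the point is that the training text now contains the intermediate reasoning, not just the endpoint.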

    27. EG

      Ho- ho- how much do you think, uh, synthetic data is... Or how far does that get us, right? Because-

    28. AK

      Yeah.

    29. EG

      ... to your point on each data, how each model helps you train the subsequent model better, or at least create tools for it, data labeling, whatever, maybe part of it is synthetic data. How important do you think the synthetic data piece is? Because when I talk to people, they say-

    30. AK

      It's incredibly important, yeah. (laughs)

  8. 20:27–22:12

    Parallels between human cognition and AI models

    1. AK

      (laughs)

    2. SG

      What do you think we are learning now about human cognition from this research?

    3. AK

      I don't know if we're learning much more.

    4. SG

      One could argue that like figuring out these shape of reasoning traces we want-

    5. AK

      Oh.

    6. SG

      ... for example, is, um, instructive to actually understanding how the brain works.

    7. AK

      I would be careful with those analogies, but in general I do think that it's, um, it's a very different kind of thing. But I do think that, uh, there are some analogies you can draw. So as an example, uh, I think transformers are actually better than the human brain in a bunch of ways. I think they're actually a lot more efficient system, and the reason they don't work as good as the human brain is mostly a data issue, roughly speaking-

    8. SG

      Mm-hmm.

    9. AK

      ... as the first order of approximation I would say. And actually like as an example, like transformer memorizing sequences is so much better than humans. Like if you give it a sequence and you do a single forward backward pass in that sequence, then if you give it the first few elements, it will complete the rest of the sequence. It memorized that sequence, and it's so good at it. If you gave a human a single presentation of a sequence, there's no way that you can m- m- remember that. And so the transformer is actually... I do think there's a good chance that the gradient, um, based optimization, the forward backward update that we do all the time for training neural nets, is actually more efficient than the brain in some ways, and these models are better, they're just not, um, yet ready to shine. But in a bunch of cognitive sort of aspects, I think they might come out-

    10. SG

      With the right inputs they will be better.

    11. EG

      That's, that's generically true of computers though for all sorts of applications, right?

    12. AK

      Yeah. And I think human-

    13. EG

      Including memory, to your point.

    14. AK

      Yeah, exactly. And I think human brains just have a lot of constraints. You know, the working memory is very small. I think transformers have a lot, lot bigger working memory and will, this will continue to be the case. Uh, they are much more efficient learners. Uh, the human brains function under all kinds of constraints. Uh, it's not obvious that the human brain uses back propagation, right? It's not obvious how that would work. It's a very stochastic sort of dynamic system. It has all these, uh, constraints it works under, so ambient conditions, et cetera. So I, I do think that what we have is actually potentially better than the brain, um, and it's just not there yet.
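
The single-presentation memorization point above can be illustrated, very loosely, with a toy linear associative memory over one-hot tokens. To be clear, this is not a transformer; but with zero-initialized weights and squared loss, one gradient step per bigram is proportional to exactly this outer-product update, and one pass suffices to replay the sequence.

```python
# One-shot sequence memorization with a linear associative memory.
seq = [3, 1, 4, 5, 9, 2, 6, 8]   # digits chosen distinct so the
vocab = 10                       # bigram memory is unambiguous

# W[i][j] accumulates evidence that token i is followed by token j.
W = [[0.0] * vocab for _ in range(vocab)]

# Single pass over the sequence: one update per adjacent pair.
for prev, nxt in zip(seq, seq[1:]):
    W[prev][nxt] += 1.0

# Recall: given the first token, greedily follow the strongest
# learned transition to regenerate the rest.
out = [seq[0]]
for _ in range(len(seq) - 1):
    row = W[out[-1]]
    out.append(max(range(vocab), key=lambda j: row[j]))

print(out == seq)   # True
```

A human shown this sequence once generally cannot replay it; the model stores it perfectly after a single pass, which is the asymmetry Andrej is pointing at.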

    15. EG

      How do you think

  9. 22:1227:10

    Merging human cognition with AI capabilities

    1. EG

      about, um, human augmentation with different AI systems over time? Do you think that's a likely direction? Do you think that's unlikely?

    2. AK

      Augmentation?

    3. EG

      Augmentation of people with AI models.

    4. AK

      Oh. Oh, of course. I mean, but in, in what sense maybe.

    5. EG

      (laughs)

    6. AK

      I think in general, absolutely.

    7. EG

      'Cause I mean, there's the abstract version of it you're using as a tool. That's the external version.

    8. AK

      Mm-hmm.

    9. EG

      There's the, you know, the merger scenario-

    10. SG

      Oh (laughs) .

    11. EG

      ... that, you know, a lot of people end up talking about.

    12. AK

      Yeah, yeah.

    13. EG

      Yeah.

    14. AK

      I mean, we're already kind of merging. Uh, the thing is like there's a, you know, there's the IO bottleneck.

    15. EG

      Sure.

    16. AK

      Uh, but for the most part, you know, at your fingertips if you have any of these models, you already-

    17. EG

      Yeah, but that's a little bit different because I mean people have been making that argument for I think 40, 50 years where, uh, technological tools are just extension of human capabilities, right?

    18. AK

      Mm-hmm. Yeah, the computer is the bicycle for the human mind.

    19. EG

      Correct, yeah, exactly.

    20. AK

      Et cetera.

    21. EG

      So, um-

    22. AK

      Just an extension of that.

    23. EG

      ... but there's a subset of the AI community that thinks that, for example, the way that we subsume some potential conflict with future AI or something else would be through some form of...

    24. AK

      Uh, yeah, like the Neuralink pitch, et cetera.

    25. EG

      Exactly, yeah.

    26. AK

      Um, yeah, I don't, I don't know what this merger looks like, uh, yet. But I can definitely see that you want to decrease the IO to tool use.

    27. EG

      Yeah.

    28. AK

      And I see this as kind of like an exocortex we're building on top of our neocortex, right? And it's just the next layer and, uh, it just turns out to be in the cloud, et cetera. But it is the next layer of the brain.

    29. EG

      Yeah, Accelerando, a book from the early 2000s has a version of this where basically everything is ins- is substantiated in a set of goggles that are computationally attached to your brain that you wear. And then if you lose them, you almost feel like you're losing a part of your persona or memory.

    30. AK

      I think that's very likely, yeah. And today the phone is already almost that. And I think it's gonna get worse. When you put your techno stuff away from you-

  10. 27:1030:33

    Building high performance small models

    1. EG

      get to in some sense? Either in parameter size or however you wanna think about it. And so I'm a little bit curious about your view.

    2. AK

      Yeah.

    3. EG

      'Cause you, you've thought a lot about both, uh, distillation, small models, you know.

    4. AK

      Yeah, yep. I think it can be surprisingly small. And I do think that the current models are wasting a ton of capacity remembering stuff that doesn't matter. Like they remember SHA hashes, they remember like the ancient-

    5. SG

      'Cause the data set is not curated, uh, the best it could be.

    6. AK

      Yeah, exactly.

    7. SG

      Yeah.

    8. AK

      Like, and I think this will go away, and I think we just need to get to the cognitive core. And I think the cognitive core can be extremely small, and is just this thing that thinks. And if it needs to look up information, it knows how to use different tools.
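
The "cognitive core plus tools" split Karpathy describes can be sketched in a toy form. Everything below is hypothetical illustration, not anything from the episode: a lookup table stands in for a retrieval tool, and a dictionary stands in for the small core's own knowledge.

```python
# Toy sketch of a "cognitive core": a small model that answers from what it
# keeps in-weights and falls back to a tool for facts it hasn't memorized.
# (All names and the lookup table are hypothetical stand-ins.)

# Stand-in "tool": plays the role of web search / retrieval.
LOOKUP = {"capital of France": "Paris"}

def cognitive_core(question, core_knowledge):
    """Answer from the core if possible, otherwise call the lookup tool."""
    if question in core_knowledge:          # the "thinking" core handles it
        return core_knowledge[question]
    return LOOKUP.get(question, "unknown")  # tool call for everything else

print(cognitive_core("capital of France", {}))         # tool path
print(cognitive_core("2 + 2", {"2 + 2": "4"}))         # core path
```

The point of the sketch is the division of labor: factual recall lives outside the model, so the core itself can stay small.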

    9. SG

      Is that like three billion parameters? Is that 20 billion parameters?

    10. AK

      I think even a billion-

    11. SG

      Okay.

    12. AK

      ... a billion suffices. We'll probably get to that point and, uh, the models can be very, very small. And I think the reason they can be very small is fundamentally I think just like distillation works. It may be like the only thing I would say. Distillation works like surprisingly well. Uh, distillation is where you get a really big model or a huge amount of compute or something like that, um, supervising a very small model.

    13. EG

      Mm-hmm.

    14. AK

      And, uh, you can actually, um, stuff a lot of capability into a very small model.
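
The distillation setup described here, a large teacher supervising a much smaller student, can be sketched numerically. This is a hypothetical NumPy toy (a full-rank linear "teacher" distilled into a rank-2 "student" via temperature-softened cross-entropy), not anything from the episode:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def xent(p, q):
    # Cross-entropy of student probs q against teacher probs p.
    return -(p * np.log(q + 1e-12)).sum(axis=-1).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))

# "Teacher": a big (here, full-rank) fixed map producing logits over 4 classes.
W_teacher = rng.normal(size=(8, 4))
T = 2.0  # temperature: softened teacher probabilities carry more signal
teacher = softmax(X @ W_teacher, T)

# "Student": a much smaller model, a rank-2 factorization (8 -> 2 -> 4).
A = rng.normal(size=(8, 2)) * 0.1
B = rng.normal(size=(2, 4)) * 0.1

loss_before = xent(teacher, softmax(X @ A @ B, T))
lr = 0.5
for _ in range(500):
    H = X @ A
    student = softmax(H @ B, T)
    g = (student - teacher) / len(X)  # gradient direction at the logits (up to the 1/T factor)
    A, B = A - lr * (X.T @ (g @ B.T)), B - lr * (H.T @ g)

loss_after = xent(teacher, softmax(X @ A @ B, T))
print(round(loss_before, 3), round(loss_after, 3))
```

The student has far fewer parameters than the teacher yet moves steadily toward the teacher's output distribution, which is the "stuff a lot of capability into a very small model" effect in miniature.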

    15. EG

      Is, is there some sort of like, uh, mathematical representation of that or some information theoretical, like-

    16. AK

      Mm-hmm.

    17. EG

      ... formulation of that? 'Cause it almost feels like you should be able to w-

    18. EG

      ... calculate that now in terms of what's the-

    19. AK

      Maybe, maybe like one way to think about it is like, you know, we go back to like the internet data set, which is what we're working with. The internet is like .001% cognition and like 99.9% of like information is like, you know-

    20. EG

      Garbage, yeah. (laughs)

    21. AK

      And I think most of it is not, uh, useful to the thinking part and this like (?) .

    22. EG

      Yeah, yeah. I- I guess maybe another way to frame the question is like is there a math- mathematical representation of cognitive capability relative to model size? Or how do you capture cognition in terms of, you know, here's the min or max relative to what you're trying to accomplish? And ma- maybe there's no good way to represent that. So I think maybe a billion parameters gets you sort of like a good cognitive core.

    23. AK

      Mm. I think probably right. I think even one billion is too much. I don't know, but we'll see.

    24. EG

      It's very exciting given if you think about, uh, well, you know, it's a question of like on an edge device versus on the cloud, but- And also just raw cost of using the model and everything. Yeah, it's very exciting. Right. But at less than a billion parameters, I have my exo-cortic- cortex on a local device as well.

    25. AK

      Yeah, and then probably it's not a single model, right? Like it's interesting to me to think about how this will actually play out, um, because I do think you want to benefit from parallelization. You don't want to have a syn- a sequential process. You want to have a parallel process. And I think companies, to some extent, are also kind of, um, a parallelization of work, but there's this hierarchy in the company because that's one way to handle, you know, the information processing and the reductions that need to happen within an organization. So I think we'll probably end up with, uh, companies of LLMs. I think it's not unlikely to me that you have models of different capabilities specialized to various, uh, unique domains. Maybe there's a programmer, etc. And it will actually start to resemble companies to a very large extent. So, you know, the programmer and the pro- program manager and, you know, similar kinds of roles of LLMs working in parallel and coming together and orchestrating computation on your behalf. So maybe it's not correct to think about it as a single model. It's more like a swarm, a swarm of LLMs.
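
The "company of LLMs" pattern, specialist workers fanned out in parallel with a manager doing the reduction, can be sketched with plain functions standing in for models. All roles and names below are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for small specialist models (hypothetical roles).
def programmer(task):
    return f"programmer: draft for '{task}'"

def reviewer(task):
    return f"reviewer: notes on '{task}'"

def run_swarm(task, workers):
    # Fan out: specialists work on the task in parallel, not sequentially.
    with ThreadPoolExecutor() as pool:
        reports = list(pool.map(lambda w: w(task), workers))
    # Reduce: a "program manager" merges the parallel reports into one plan.
    return f"plan for '{task}':\n" + "\n".join(reports)

print(run_swarm("add caching", [programmer, reviewer]))
```

With real model calls behind each worker, the same fan-out/reduce shape gives the parallelism-plus-hierarchy structure described above.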

    26. EG

      Yeah, I was about to say, it's like an ecosystem. It's like a biological ecosystem where you have-

    27. AK

      Yeah.

    28. EG

      ... specialized roles and niches and then-

    29. AK

      And I think it will start to resemble that.

    30. EG

      You have automatic escalation to other parts of the swarm-

  11. 30:3336:17

    Andrej’s current work in AI-enabled education

    1. EG

      You left OpenAI, you're working on education, uh, you've always been an educator. Like why, why do this?

    2. AK

      I would start with, I've always been an educator and I love learning and I love teaching. And, uh, so it's kind of just like a space that I've been very passionate about for a long time. And then the other thing is, I think one macro picture that's kind of driving me is, I think there's a lot of activity in like AI and, um, I think most of it is to kind of like replace or displace people, I would say. It's in the theme of like sidelining people. But, uh, I'm always, uh, more interested in anything that kind of empowers people. And I feel like I'm kind of, on a high level, like team human, and I'm interested in things that AI can do to empower people. And I don't want the future where people are kind of, um, on the side of automation. I want people to be in a very empowered state and I want them to be amazing, even much more amazing than today. And then the other aspect that I find very interesting is like, how far can a person go if they have the perfect tutor for all the subjects? And I think people could go really far if they had the perfect curriculum for anything. And I think we see that with, um, you know, if some rich people maybe have, um, tutors, they do actually go really far. Um, and so I think we can approach that with AI or even, like, surpass it.

    3. EG

      Th- there's very clear literature on that actually from the '80s, right? Where one-on-one tutoring I think, um, helps people get one standard deviation better than- Two, Bloom. Is it two? Yeah. Yeah. It's, it's the Bloom stuff. Yeah, exactly. Yeah. There's a lot of really interesting, uh, precedents on that. How do you actually view that as substantiating through the lens of AI? Or what's the first types of products that will really help with that or... You know, 'cause there's books like The Diamond Age where they talk about The Young Ladies Illustrated Primer and all that kind of stuff.

    4. AK

      Yeah. So I would say I'm definitely inspired by aspects of, of it. So like in practice what, uh, what I'm doing is trying to currently build a single course and I want it to be just like the course you would go to if you want to learn AI. I think the problem, uh, basically is like I've already taught courses, like I taught CS231n at Stanford and that was the first deep learning class and it was pretty successful. But the question is like how do you actually like really scale these classes? Like how do you make it so that your target audience is maybe like eight billion people on Earth? And they're all speaking different languages and they're all different capability levels, etc. So y- and a single teacher doesn't scale to that audience. And so the question is how do you use AI to sort of like do the scaling of a really good teacher? And so the way I'm thinking about it is the teacher is kind of doing a lot of the course creation and the curriculum because currently at the current, uh, AI capability, I, I don't think the models are good enough to create a good course. Uh, but I think they're good to become the front end to the student and, uh, interpret the course to them. And so basically the teacher doesn't go to the people and the teacher is not the front end anymore. The teacher is on the back end designing materials in the course and the AI is the front end and it can speak all the different languages and it kind of like takes you through the course.
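
The split described here, a human teacher authoring fixed material on the back end and an AI front end presenting it in the student's language, can be sketched as a thin prompt wrapper. The function names, the prompt wording, and the stub LLM below are all hypothetical, not Eureka Labs' actual design:

```python
# Hypothetical sketch: the teacher authors `material`; the model only
# presents it, translated and adapted, without adding content of its own.

def ai_frontend(material, language, background, llm):
    prompt = (
        "Present the course material below faithfully.\n"
        f"Respond in {language}; the student's background is: {background}.\n"
        "Do not introduce claims that are not in the material.\n\n"
        f"MATERIAL:\n{material}"
    )
    return llm(prompt)

# Stub standing in for a real chat-model client, so the sketch runs as-is.
stub_llm = lambda prompt: f"(model reply to {len(prompt)}-char prompt)"
print(ai_frontend("Backprop applies the chain rule.", "Spanish",
                  "basic calculus", stub_llm))
```

The design choice mirrors the conversation: the model is the interface, not the author, so course quality stays under the teacher's control while language and presentation scale with the model.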

    5. EG

      Should I think of that as like, like the TA type experience or is that not a good analogy here?

    6. AK

      That is like one way I'm thinking about it, is it's an AI TA. I'm mostly thinking of it as like this front end to the student, and it's the thing that's actually interfacing with the student and, uh, taking them through the course. And I think that's tractable today, uh, and it just doesn't exist, and I think it can be made really good. And then over time, as the capability increases, you would potentially, uh, refactor the setup, uh, in various ways. I like to find things where... like, understanding the AI capability today and having a good model of it. I think a lot of companies maybe don't, um, don't quite understand intuitively where the capability is today, and then they end up kind of like building things that are kind of like too ahead of what's, what's available, or maybe not ambitious enough. And so I think, uh- I do think that this is kind of a sweet spot of what's possible and also really interesting and exciting, so...

    7. SG

      I wanna go back to something you said that I think is very inspiring, especially coming from, like, your background and understanding of where exactly we are in research, which is essentially, like, we do not know what the limits of human performance from a learning perspective are, given much better tooling. And I think there's like a very easy analogy to, we just had the Olympics like a month ago, right?

    8. AK

      Mm-hmm.

    9. SG

      And you know, for a runner, the very best mile time, or pick any sport, today is much better than it was like 10 years ago, putting aside performance-enhancing drugs.

    10. AK

      Yeah.

    11. SG

      Just because, like, you start training earlier, you have a very different program. We have much better scientific understanding. We have technique, we have here. The fact that you believe, like, we can get much better, further as humans-

    12. AK

      Mm-hmm.

    13. SG

      ... if we're starting with-

    14. AK

      100%.

    15. SG

      ... like the tooling and the curriculum is amazing.

    16. AK

      Yeah, I think we haven't even scratched, like, what's possible at all. So I think there's like two dimensions basically to it. Number one is the globalization dimension of, like, I want everyone to have really good education, but the other one is, like, how far can a single person go?

    17. SG

      Yeah.

    18. AK

      And I think both of those are very interesting and exciting.

    19. EG

      Usually when people talk about one-on-one learning, they talk about the adaptive aspect of it.

    20. AK

      Mm-hmm.

    21. EG

      Where you're challenging person at the level that they're at.

    22. AK

      Yeah.

    23. EG

      Do you think you can do that with AI today or is that something for the future and it's more today it's about reach and multiple languages and (overlapping)

    24. AK

      I think the low-hanging fruit is things like, for example, different languages, super low-hanging fruit. I think the current models are actually really good at translation basically and can target the material and trans- translate it, like, on the spot.

    25. EG

      Mm-hmm.

    26. AK

      So I think a lot of things are low hanging fruit. This adaptability to a person's background, I think is like not at the low hanging fruit, but I don't think it's like too high up or too much away. But that is something you definitely want because not everyone is coming in with a d- with, um, with the same background. And also what's really helpful is like if you're familiar with some other disciplines in the past, then it's really useful to make analogies to the things you know.

    27. SG

      Mm-hmm.

    28. AK

      And that's extremely powerful in education. So that's definitely a dimension you want to take advantage of, but I think that starts to get to the point where it's like not obvious and needs some work. I think like the easy version of it is not too far where you can imagine just prompting the model. It's like, "Oh hey, I know physics or I know this," and you probably get something. But I guess what I'm talking about is something that actually works, not something that like you can demo and works sometimes.

    29. EG

      Right.

    30. AK

      So I just mean like it actually really works and per- in the way a person would.

  12. 36:1741:26

    How AI-driven education reshapes knowledge networks and status

    1. EG

      about earlier, which I think is really interesting is sort of lineages. That happens in the research community where you come from certain labs and everybody gossips about being from each other's labs. I think a very high proportion of Nobel laureates actually used to work in a former Nobel laureate's lab. So there's some propagation of, I don't know if it's culture or knowledge or branding or what. In an AI education centric world-

    2. AK

      Mm-hmm.

    3. EG

      ... how do you maintain lineage or does it not matter? Or how do you think about those aspects of propagation of network and knowledge?

    4. AK

      I don't actually wanna live in a world where lineage, like, matters too much, right? So I'm hoping that AI can help you destroy that structure a little bit. It, it feels like kind of gatekeeping by some finite, uh, scarce resource, which is like, oh, there's a finite number of people who have this lineage, et cetera. So I feel like it's a little bit of that aspect, so I'm hoping it can destroy that.

    5. SG

      It's definitely one piece, like actual learning, one piece pedigree, right?

    6. AK

      Yeah.

    7. EG

      Uh, well it's also the aggregation of, uh, it's a cluster effect, right? It's like why is all of the, or much of the AI community in the Bay Area?

    8. AK

      Mm. Yeah.

    9. EG

      Or why is most of the fintech community in New York?

    10. AK

      Yeah.

    11. EG

      And so I think a lot of it is also just you're clustering really smart people with common interests and beliefs, and then they kind of propagate from that common core and then they share knowledge in an interesting way.

    12. AK

      Right.

    13. EG

      You've gotta agree a lot of that behavior has shifted online to some extent, particularly for younger people.

    14. AK

      I think one aspect of it is kind of like the educational aspect where like if you're part of a community today, you're getting a ton of education and apprenticeship, et cetera, which is extremely helpful and gets you to a kind of empowered state in that area. I think the other piece of it is like the cultural aspect of what you're motivated by and what you wanna work on. What does the culture prize and what do they put on a pedestal and what do they kind of like worship, basically? Uh, so in the academic world, for example, it's the H-index.

    15. EG

      Yeah.

    16. AK

      Everyone cares about the H-index, uh, the amount of papers you publish, et cetera. And I was part of that community and I saw that. And I feel like now I've come to different places and there's different idols in all the different communities, and I think that has a massive impact on what people are motivated by and where they get their social status and what actually matters to them. I also was, I think, part of different communities, like growing up in Slovakia, also a very different environment, uh, growing, being in Canada, also a very different environment.

    17. SG

      What mattered there?

    18. AK

      So...

    19. EG

      Hockey.

    20. AK

      (clears throat) Sorry.

    21. SG

      Thank you. (laughs)

    22. EG

      Oh, hockey. (laughs)

    23. AK

      Yeah, hockey. (laughs) I would say as an example, I would say in Canada, um, I was at the University of Toronto in Toronto. Uh, I don't think it's a very entrepreneurial-pilled, uh, environment. It doesn't even occur to you that you should be starting companies. I mean, it's not something that people are doing. You don't know friends who are doing it.

    24. SG

      Yeah.

    25. AK

      You don't know that you're supposed to be looking up to it. People aren't like reading books about all the founders and talking about them.

    26. SG

      Yeah.

    27. AK

      It's just not a thing you aspire to or care about.

    28. SG

      Yeah.

    29. AK

      And, uh, what everyone is talking about, oh, is where are you getting your internship? Where are you gonna work afterwards? And it's just accepted that there's a bunch of se- there's a fixed set of companies that you're supposed to pick from and just align yourself with one of them and that's like what you look up to or something like that. So these cultural aspects are extremely strong and maybe actually the dominant variable, because I almost feel like today already the, the education aspects I think are the easier one. Like, a ton of stuff is already available, et cetera. So I think mostly it's a cultural aspect that you're part of.

    30. SG

      So on this point, like one thing you and I were talking about a few weeks ago is, and, and I think you also posted online about this, um, there's a difference between learning and entertainment.

  13. 41:2642:25

    Eureka Labs

    1. SG

      who was the audience for the first course?

    2. AK

      The audience for the first course, um, I'm mostly thinking of this as like an undergrad level course. Uh, so if you're doing undergrad in a technical area, I think that would be kind of the ideal audience. I do think that what we're seeing now is we have this like antiquated concept of education where you go through school and then you graduate and go to work, right? Obviously this will totally break down, especially in a society that's turning over so quickly. The people are gonna come back to school a lot more frequently as the technology changes very, very quickly. So, it is kind of like undergrad level, but I would say like anyone at that level at any age, uh, is kind of like in scope. I think it will be very diverse in age, as an example, but I think it is mostly like, uh, people who are technical and mostly want to, mostly actually want to uh, understand it, uh, to, uh, you know, a good amount, um, technically.

    3. SG

      When can they take the course?

    4. AK

      I was hoping it would be late this year. I do have a lot of distractions that are piling on, but I think, uh, probably early next year is kind of like the timeline. Yeah, I'm trying to make it very, very good. Um, and, uh, yeah, it just takes time to, uh, to get there, so...

    5. EG

      I have one last question actually that's

  14. 42:2544:16

    What young people study to prepare for the future

    1. EG

      sort of related to that. If you have little kids today, what do you think they should study in order to have a useful future?

    2. AK

      There's a correct answer in my mind, and the correct answer is mostly like, um, I would say like math, physics, CS kind of disciplines. And the reason I say that is because I think it helps, um, uh, for just thinking skills. It's just like the best thinking skill core, uh, is- is my opinion. And of course, I have a specific background, et cetera, so I would- I would think this, but- but that's just my view on it. I think like me taking physics classes and all these other classes just like shaped the way I think, and I think it's very useful for problem solving in general, et cetera. And so if we're in this world where pre-AGI this is gonna be useful, post-AGI you still want empowered humans who can function in any arbitrary capacity.

    3. EG

      Mm-hmm.

    4. AK

      And so I just think that this is just the correct answer for people and what they should be doing and- and taking.

    5. EG

      Mm-hmm.

    6. AK

      And it- it's either useful or it's good.

    7. EG

      Uh-huh.

    8. AK

      And so I just think it's the right answer. And I think a lot of the other stuff you can tack on a bit later, but the critical period where people have a lot of time and they have a lot of- of, uh, kind of like attention and- and time, uh, I think should be mostly spent on doing these kinds of, uh, symbol manipulation heavy tasks and workloads, not memory heavy tasks and workloads.

    9. EG

      Yeah, I did a- a math degree and I felt like there was a- a new groove being carved into my brain (laughs) when I was doing that.

    10. SG

      And it's a harder groove to carve later.

    11. AK

      And I would of course put it in a bunch of other stuff as well, like I'm not, uh, opposed to all the other disciplines, et cetera. I think it's actually beautiful to have a large diversity o- of things, but I do think 80% of it should be something like this.

    12. SG

      Well, and we're not efficient memorizers compared to our tools. Thank you for doing this.

    13. EG

      Yeah, this was amazing.

    14. SG

      This was so much fun.

    15. AK

      Yes, it was great to be here. (laughs)

    16. SG

      Find us on Twitter at No Priors Pod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Episode duration: 44:16


Transcript of episode hM_h0UA7upI
