Lex Fridman Podcast

Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Andrej Karpathy is a legendary AI researcher, engineer, and educator. He's the former director of AI at Tesla, a founding member of OpenAI, and an educator at Stanford.

Please support this podcast by checking out our sponsors:
- Eight Sleep: https://www.eightsleep.com/lex to get special savings
- BetterHelp: https://betterhelp.com/lex to get 10% off
- Fundrise: https://fundrise.com/lex
- Athletic Greens: https://athleticgreens.com/lex to get 1 month of fish oil

EPISODE LINKS:
Andrej's Twitter: http://twitter.com/karpathy
Andrej's YouTube: http://youtube.com/c/AndrejKarpathy
Andrej's Website: http://karpathy.ai
Andrej's Google Scholar: http://scholar.google.com/citations?user=l8WuQJgAAAAJ
Books mentioned:
The Vital Question: https://amzn.to/3q0vN6q
Life Ascending: https://amzn.to/3wKIsOE
The Selfish Gene: https://amzn.to/3TCo63s
Contact: https://amzn.to/3W3y5Au
The Cell: https://amzn.to/3W5f6pa

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
0:58 - Neural networks
6:01 - Biology
11:32 - Aliens
21:43 - Universe
33:34 - Transformers
41:50 - Language models
52:01 - Bots
58:21 - Google's LaMDA
1:05:44 - Software 2.0
1:16:44 - Human annotation
1:18:41 - Camera vision
1:23:46 - Tesla's Data Engine
1:27:56 - Tesla Vision
1:34:26 - Elon Musk
1:39:33 - Autonomous driving
1:44:28 - Leaving Tesla
1:49:55 - Tesla's Optimus
1:59:01 - ImageNet
2:01:40 - Data
2:11:31 - Day in the life
2:24:47 - Best IDE
2:31:53 - arXiv
2:36:23 - Advice for beginners
2:45:40 - Artificial general intelligence
2:59:00 - Movies
3:04:53 - Future of human civilization
3:09:13 - Book recommendations
3:15:21 - Advice for young people
3:17:12 - Future of machine learning
3:24:00 - Meaning of life

SOCIAL:
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Reddit: https://reddit.com/r/lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Andrej Karpathy (guest), Lex Fridman (host)
Oct 29, 2022 · 3h 28m

EVERY SPOKEN WORD

  1. 0:00 - 0:58

    Introduction

    1. AK

      ... think it's possible that physics has exploits and we should be trying to find them, uh, arranging some kind of a crazy quantum mechanical system that somehow gives you buffer overflow, uh, somehow gives you a rounding error in the floating point. Synthetic intelligences are kind of like the next stage of development. And I don't know where it leads to. Like, at some point, I suspect the universe is some kind of a puzzle. These synthetic AIs will uncover that puzzle and solve it.

    2. LF

      The following is a conversation with Andrej Karpathy, previously the director of AI at Tesla, and before that, at OpenAI and Stanford. He is one of the greatest scientist-engineers and educators in the history of artificial intelligence. This is the Lex Fridman podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy.

  2. 0:58 - 6:01

    Neural networks

    1. LF

      What is a neural network and why does it seem to, uh, do s- such a surprisingly good job of learning?

    2. AK

      What is a neural network? It's a mathematical abstraction of the brain. I would say that's how it was originally developed. At the end of the day, it's a mathematical expression and it's a fairly simple mathematical expression when you get down to it. It's basically a sequence of, uh, matrix multiplies, which are really dot products mathematically and, uh, some non-linearities thrown in. And so it's a very simple mathematical expression and it's got knobs in it.

    3. LF

      Many knobs.

    4. AK

      Many knobs. And these knobs are loosely related to basically the synapses in your brain. They're trainable, they're modifiable. And so the idea is, like, we need to find the setting of the knobs that makes the neural net, uh, do whatever you want it to do, like classify images and so on. And so there's not too much mystery I would say in it. Like, um, I think that basically you don't want to endow it with too much meaning with respect to the brain and, uh, how it works. Uh, it's really just a complicated mathematical expression with knobs and those knobs need a proper setting, uh, for it to do something, uh, desirable.
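The description above ("a sequence of matrix multiplies ... and some non-linearities," with knobs) can be sketched in a few lines of plain Python. This is only an illustrative toy, not anything from the conversation: the function names (`matvec`, `relu`, `forward`), the two-layer shape, and the weight values are all assumptions chosen to keep the arithmetic easy to follow.

```python
def matvec(W, x):
    # A matrix multiply is one dot product per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # A simple non-linearity: negative values are clipped to zero.
    return [max(0.0, a) for a in v]

def forward(params, x):
    # Two layers: matrix multiply, non-linearity, matrix multiply.
    W1, b1, W2, b2 = params
    hidden = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return [a + b for a, b in zip(matvec(W2, hidden), b2)]

# The "knobs": hypothetical weight and bias values picked by hand.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 2.0]]
b2 = [0.1]
params = (W1, b1, W2, b2)

y = forward(params, [2.0, 1.0])
print(y)
```

Training amounts to searching for values of `W1`, `b1`, `W2`, and `b2` that make `forward` produce the outputs you want; the sketch only shows the forward computation.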

    5. LF

      Yeah, but poetry is just a collection of letters with spaces-

    6. AK

      (laughs) .

    7. LF

      ... but it can make us feel a certain way.

    8. AK

      Yeah.

    9. LF

      And in that same way when you get a large number of knobs together, whe- whether it's in a, inside the brain or inside a computer, they seem to, they seem to surprise us with the, with their power.

    10. AK

      Yeah. I think that's fair. So basically, uh, I'm underselling it by a lot because-

    11. LF

      Yes (laughs) .

    12. AK

      ... you definitely do get very surprising emergent behaviors out of these neural nets when they're large enough and trained on complicated enough problems. Like say, for example, the next, uh, word prediction in a massive data set from the internet. And, uh, then these neural nets take on, uh, pretty surprising magical properties. Yeah, I think it's kind of interesting how much you can get out of even very simple mathematical formalism.

    13. LF

      When your brain right now is talking, is it doing next word prediction?

    14. AK

      Um-

    15. LF

      Or is it doing something more interesting?

    16. AK

      Well, it's definitely some kind of a generative model that's GPT-like and prompted by you. Um-

    17. LF

      Yes.

    18. AK

      ... so you're giving me a prompt and (laughs) -

    19. LF

      And-

    20. AK

      ... I'm kind of, like, responding to it in a generative way.

    21. LF

      ... and by yourself perhaps a little bit? Like, a- are you adding extra prompts from your own memory inside your head?

    22. AK

      Mm-hmm.

    23. LF

      Or no?

    24. AK

      Well, it definitely feels like you're referencing some kind of a declarative structure of, like, memory and so on. And then, uh, you're putting that together with your prompt and giving away some answers.

    25. LF

      Like, how much of what you just said has been said by you before?

    26. AK

      Uh, nothing basically, right?

    27. LF

      No, but if you actually look at all the words you've ever said in your life and you do a search, you probably said a lot of the same words in the same order before.

    28. AK

      Yeah. Could be. I mean, I'm using phrases that are common, et cetera, but I'm remixing it into a pretty, uh, sort of unique sentence at the end of the day. But you're right, definitely, there's, like, a ton of remixing.

    29. LF

      Why... You didn't... You (laughs) ... It's like Magnus Carlsen said, uh, "I'm, I'm rated 2,900 whatever, which is pretty decent."

    30. AK

      (laughs) .

  3. 6:01 - 11:32

    Biology

    1. LF

      impressive thing is biology do- doing to you that computers are not yet? That gap.

    2. AK

      I would say I'm definitely on- I'm much more hesitant with the analogies to the brain than I think you would see, potentially, in the field. Um, and I kind of feel like certainly the way neural networks started is everything stemmed from inspiration of the br- by the brain but at the end of the day, the artifacts that you get after training, uh, they are arrived at by a very different optimization process than the optimization process that gave rise to the brain. And so I think, uh, I kind of think of it as a very complicated alien artifact. Um, it's something different.

    3. LF

      The brain?

    4. AK

      I'm sorry, the, uh, the neural nets that we're training.

    5. LF

      Okay.

    6. AK

      They are a complicated, uh, alien artifact. Uh, I do not make analogies to the brain because I think the optimization process that gave rise to it is very different from the brain. So there was no multi-agent self-play kind of, uh, setup, uh, and evolution.

    7. LF

      (laughs) .

    8. AK

      It was an optimization that is basically a, what amounts to a-

    9. LF

      Yeah.

    10. AK

      ... compression objective on a massive amount of data.

    11. LF

      Okay. So artificial neural networks are doing compression and biological neural networks are not-

    12. AK

      Are trying to survive and they're-

    13. LF

      ... are not really doing any, they're-

    14. AK

      (laughs) .

    15. LF

      ... they're an agent in a multi-agent, self-play system that's been running for a very, very long time.

    16. AK

      Yes. That said, evolution has found that it is very useful to, to predict and have a predictive model in the brain. And so I think our brain utilizes something that looks like that as, as a part of it but it has a lot more, you know, gadgets and gizmos and, uh, value functions and ancient nuclei that are all trying to, like, make you survive, and reproduce, and everything else.

    17. LF

      And the whole thing through embryogenesis is built from a single cell. I mean it's just, the code is inside the DNA.

    18. AK

      Mm-hmm.

    19. LF

      And it just builds it up, like the entire organism with arms.

    20. AK

      It's totally crazy.

    21. LF

      And a head and legs.

    22. AK

      Yes.

    23. LF

      And, like, it does it pretty well.

    24. AK

      It should not be possible.

    25. LF

      So there's some learning going on, there's some, there's some, there's some kind of computation going through that building process. I mean, I, I, I don't know where, if you were just to look at the entirety of history of life on Earth, where do you think is the most interesting invention? Is it the origin of life itself? Is it just jumping to eukaryotes? Is it mammals? Is it humans themselves, homo sapiens?

    26. AK

      Mm-hmm.

    27. LF

      The, the, the origin of intelligence or highly complex intelligence? Or, what do, what, or is it all just a continuation of the same kind of process?

    28. AK

      Hmm. Certainly, I would say it's an extremely remarkable story that I'm only, like, briefly learning about recently. Uh, all the way from, um, actually, like, you almost have to start at the formation of Earth and all of its conditions and the entire solar system and how everything is arranged with Jupiter, and Moon, and the habitable zone, and everything. And then you have an active Earth that's turning over material.

    29. LF

      Mm-hmm.

    30. AK

      And, um, and then you start with abiogenesis and everything. And so it's all, like, a pretty remarkable story. I'm not sure that I can pick, like, a u- single unique piece of it, uh, that I find most interesting. Um, I guess for me as an artificial intelligence researcher, it's probably the last piece. We have lots of animals that, uh, you know, are, are not building technological society but we do. And, um, it seems to have happened very quickly, it seems to have happened very recently, and, uh, something very interesting happened there that I don't fully understand. I almost understand everything else, kind of, I think, intuitively, uh, but I don't understand exactly that part and how quick it was.

  4. 11:32 - 21:43

    Aliens

    1. LF

      Gotta ask you, how many intelligent alien civilizations do you think are out there and, uh, is their intelligence different or similar to ours?

    2. AK

      Yeah. I've been preoccupied with this question quite a bit recently. Uh, basically, the firm- Fermi paradox and just thinking through. And, and the reason actually that I am very interested in, uh, the origin of life is fundamentally trying to understand how common it is that there are technological societies out there, uh, um, in space. And the more I study it, the more I s- I think that, um... Uh, there should be quite a few, quite a lot.

    3. LF

      Why haven't we heard from them? 'Cause I, I agree with you. It feels like I just don't see why what we did here on earth is so difficult to do.

    4. AK

      Yeah, and especially when you get into the details of it. I used to think origin of life was very, um, it, it was this magical rare event but then you read books, like for example, Nick Lane, um, uh, The Vital Question, uh, Life Ascending, et cetera, and he really gets in and he really makes you believe that this is not that rare.

    5. LF

      Basic chemistry.

    6. AK

      You have an active earth and you have your alkaline vents and you have lots of alkaline waters, uh, mixing with the ocean and you have your proton gradients and you have the little porous pockets of these alkaline vents that concentrate chemistry. And, um, basically as he steps through all of these little pieces you start to understand that actually this is not that crazy. You could see this happen on other systems. Um, and he really takes you from just a geology to primitive life and he makes it feel like it's actually pretty plausible. And also, like, uh, the origin of life, um, didn't, uh, was actually fairly fast after formation of earth. Um, if I remember correctly, just a few hundred million years or something like that after basically when it was possible, life actually arose. And so that makes me feel like that is not the constraint, uh, that is not the limiting variable and that life should actually be fairly common. Um, and then, you know, where the drop-offs are is, is very, um, is very interesting to think about. I c- currently think that there's no major drop-offs basically.

    7. LF

      Yeah.

    8. AK

      And so there should be quite a lot of life. And basically, what it, uh, where that brings me to then is the only way to reconcile the fact that we haven't found anyone and so on is that, um, we just can't, we can't see them, we can't observe them.

    9. LF

      Just a quick brief comment. Nick Lane and a lot of biologists I talk to, they really seem to think that the jump from bacteria to more complex organisms is the hardest jump.

    10. AK

      Mm-hmm. The eukaryotic life, basically.

    11. LF

      Y- yeah. Which I don't... I get it, they're much more knowledgeable, uh, than me about, like, the intricacies of biology but that seems, like, crazy 'cause how much, how many single cell organisms are there, like, and how much time you have. Surely, it's not that difficult.

    12. AK

      Yeah.

    13. LF

      Like and, and a billion years is not even that long of a time, really. Just all these bacteria under constrained resources, battling it out. I'm sure they can invent more complex... Like, I don't understand-

    14. AK

      (laughs) .

    15. LF

      It's like how to move from a hello world program to, like, uh, like invent a function or something like that. I don't-

    16. AK

      Yeah.

    17. LF

      (laughs) So I don't... Yeah, so I'm with you. I just feel like I don't see any... If the origin of life, that would be my intuition, that's the hardest thing but if that's not the hardest thing 'cause it happened so quickly, then it's gotta be everywhere. And yeah, maybe we're just too dumb to see it.

    18. AK

      Well, it's just, uh, we don't have really good mechanisms for seeing this life. I mean, uh, by what... How, um... So I'm not an expert, just to preface this, but just from what I think about it-

    19. LF

      On aliens? (laughs)

    20. AK

      (laughs) .

    21. LF

      Who's... I wanna meet an expert on alien intelligence and how they communicate.

    22. AK

      I'm very suspicious of our ability to, to find these intelligences out there and to find these, or it's like, uh, radio waves for example, are, are terrible. Uh, their power drops off as basically one over R squared. Uh, so I remember reading that our current radio waves would not be, uh, the ones that we, we are broadcasting, would not be, uh, measurable by our devices today, only like, was it like one-tenth of a light year away? Like not even... Basically, a tiny distance because, uh, you'd really need, like, a targeted transmission of massive power directed somewhere for this to be picked up on long, long distances. And so I just think that our ability to measure is, um, is not amazing. I think there's probably other civilizations out there. And then the big question is why don't they build Von Neumann probes and why don't they interstellar travel across the entire galaxy? And my current answer is, it's probably interstellar travel is, like, really hard. Uh, you have the interstellar medium if you wanna move at close to the speed of light, you're going to be encountering bullets along the way, uh, because even, like, tiny hydrogen atoms and little particles of dust are basically have im- like massive kinetic energy at those speeds. And so basically you need some kind of shielding. You need, you have all the cosmic radiation. Uh, it's just like brutal out there. It's really hard. And so my thinking is maybe interstellar travel is just extremely hard and you have to be very slow.
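The "one over R squared" falloff mentioned above can be made concrete with a short calculation. This is a rough sketch under stated assumptions: the transmitter is treated as radiating equally in all directions, and the 1 MW power figure is a hypothetical placeholder, not a number from the conversation. No claim is made here about what any real receiver could detect.

```python
import math

def flux_w_per_m2(power_watts, distance_m):
    # An isotropic source spreads its power over a sphere of area 4*pi*r^2,
    # so the received flux falls off as 1 / r^2.
    return power_watts / (4.0 * math.pi * distance_m ** 2)

LIGHT_YEAR_M = 9.4607e15  # metres in one light year
POWER_W = 1.0e6           # hypothetical 1 MW broadcast transmitter

near = flux_w_per_m2(POWER_W, 0.1 * LIGHT_YEAR_M)
far = flux_w_per_m2(POWER_W, 1.0 * LIGHT_YEAR_M)

# Ten times the distance means one hundredth of the flux.
ratio = near / far
print(near, far, ratio)
```

A targeted transmission changes the picture because the same power is concentrated into a narrow beam instead of a full sphere, which is the point made above about needing "a targeted transmission of massive power."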

    23. LF

      Like billions of years to build hard? It feels like, uh, it feels like we're not a billion years away from doing that.

    24. AK

      It just might be that it's very, you have to go very slowly, potentially, as an example, through space. Um-

    25. LF

      Right. As opposed to close to the speed of light.

    26. AK

      Yeah. So I'm suspicious basically of our ability to measure life and I'm suspicious of, uh, the ability to, um, just permeate all of space in the galaxy or across galaxies. And that's the only way that I can c- I can currently see a way around it.

    27. LF

      Yeah. It's kinda mind-blowing to think that there's trillions of intelligent alien civilizations out there kinda slowly traveling through space.

    28. AK

      Mm-hmm. Maybe.

    29. LF

      To meet each other. And some of them meet, some of them go to war, some of them collaborate.

    30. AK

      Mm-hmm.

  5. 21:43 - 33:34

    Universe

    1. AK

    2. LF

      So you, uh, famously tweeted, "It looks like if you bombard Earth with photons for a while, you can emit a roadster." So if like in Hitchhiker's Guide to the Galaxy we would summarize the story of Earth. So in, in that book it's mostly harmless. Uh, what do you think is the, all the possible stories, like a paragraph long or a sentence long, that Earth could be summarized as a- once it's done its computation?

    3. AK

      Mm.

    4. LF

      So, like, all the possible full... if Earth is a book, right?

    5. AK

      Yeah.

    6. LF

      Uh, it could... probably there has to be an ending.

    7. AK

      Mm.

    8. LF

      I mean, there's going to be an end to Earth and w- it could end in all kinds of ways. It can end soon, it can end later.

    9. AK

      Yeah.

    10. LF

      What do you think are the possible s- stories?

    11. AK

      Well, definitely there seems to be... yeah, you're sort of... it's pretty incredible that these self-replicating systems will basically arise from the dynamics-

    12. LF

      Mm-hmm.

    13. AK

      ... and then they perpetuate themselves and become more complex and eventually become conscious and build a society. And I kind of feel like in some sense it's kind of like a deterministic wave, uh, that, you know, that kind of just like happens on any, you know, any sufficiently well-arranged system like Earth. And so I kind of feel like there's a certain sense of inevitability in it. Um, and it's really beautiful.

    14. LF

      And it ends somehow, right? So it's a, it's a chemically a diverse environment where complex dynamical systems can, uh, evolve-

    15. AK

      Yeah.

    16. LF

      ... and become more, more, further and further complex but then there's a certain, um, what is it? There's certain terminating conditions. (laughs)

    17. AK

      Yeah. I don't know what the terminating conditions are but definitely there's a trend line of something and we're part of that story and, like, where does that... where does it go? So, you know, we're famously described often as a biological boot loader for AIs.

    18. LF

      Mm-hmm.

    19. AK

      And that's because humans, I mean, you know, we're an incredible, uh, biological system and we're capable of computation and, uh, you know, a- and love and so on. Um, but, uh, we're extremely inefficient as well. Like, we're talking to each other through audio. It's just kind of embarrassing honestly that we're manipulating, like, seven symbols, uh, serially, w- we're using vocal cords. It's all happening over, like, multiple seconds.

    20. LF

      Yeah.

    21. AK

      It's just, like, kind of embarrassing when you step down to the, um, uh, frequencies at which com- computers operate or are able to c- operate on. And so basically it does seem like, um, synthetic intelligences are kind of like the next stage of development. And, um, I don't know where it leads to, like, at some point I suspect, uh, the universe is some kind of a puzzle and, uh, these, uh, synthetic AIs will uncover that puzzle and, um, solve it.

    22. LF

      ... and then what happens after, right? Like what, 'cause if you just like fast-forward earth many billions of years, it's like, uh, it's- it's quiet and then it's like t- turmoil, you see like city lights and stuff like that.

    23. AK

      Yeah.

    24. LF

      And then what happens at, like at the end? Like is it like a , or is it-

    25. AK

      Yeah.

    26. LF

      ... like a calming? Is it explosion? Is it like earth like open- like a giant... 'cause you said, uh, emit roadsters.

    27. AK

      Yeah.

    28. LF

      Like will it start emitting like a, like a giant number of-

    29. AK

      Yeah.

    30. LF

      ... like satellites and systems on fire?

  6. 33:34 - 41:50

    Transformers

    1. LF

      very meta.

    2. AK

      (laughs)

    3. LF

      Looking back, what is the most beautiful or surprising idea in deep learning or AI in general that you've come across? You've seen this field explode, uh, and grow in interesting ways. Just what- what cool ideas, like- like what made- made you sit back and go, "Hmm." Small, big or small?

    4. AK

      Well, the one that I've been thinking about recently the most probably is the- the transformer architecture. Um, so basically, uh, neural networks have, uh, a lot of architectures that were trendy, have come and gone for different, uh, in- sensory modalities like for vision, audio, text. You would process them with different looking neural nets. And recently we've seen the- this con- this convergence towards one architecture, the transformer. And, uh, you can feed it video or you can feed it, you know, images, or speech, or text and it just gobbles it up. And it's kind of like a, bit of a general-purpose, uh, computer that is also trainable and very efficient to run on our hardware. And so, uh, this paper came out in 2016 [2017], I wanna say, um-

    5. LF

      Attention Is All You Need.

    6. AK

      Attention Is All You Need.

    7. LF

      You criticized the paper title in retrospect that it wasn't, um, it didn't foresee the bigness of the impact-

    8. AK

      Yeah.

    9. LF

      ... that it was going to have.

    10. AK

      Yeah. I'm not sure if the authors were aware of the impact that that paper would go on to have. Probably they weren't. Uh, but I think they were aware of some of the motivations and design decisions behind the transformer and they chose not to, I think, uh, expand on it in that way in the paper. And so I think they had an idea that- that there was more, um, than just the surface of just like, oh, we're just doing translation and here's a better architecture. You're not just doing translation. This is like a really cool differentiable, optimizable, efficient computer that you've proposed. And maybe they didn't have all of that foresight but I think it's really interesting.

    11. LF

      Isn't it funny, sorry to interrupt, that that title is memeable, that they went, for such a profound idea, they went with a, I don't think anyone used that kind of title before, right?

    12. AK

      Attention Is All You Need? Yeah.

    13. LF

      Yeah.

    14. AK

      It's like a meme or something simply, yeah.

    15. LF

      Yeah. Isn't that funny, that one? Like, uh, maybe if it was a more serious title-

    16. AK

      Yeah.

    17. LF

      ... it wouldn't have the impact.

    18. AK

      Honestly, I-

    19. LF

      (laughs)

    20. AK

      Yeah, there is an element of me that honestly agrees with you and prefers it this way.

    21. LF

      Yes. (laughs)

    22. AK

      Uh, (laughs) if it was too grand, it would overpromise and then underdeliver potentially.

    23. LF

      Yeah.

    24. AK

      So you want to just, uh, meme your way to greatness. (laughs)

    25. LF

      (laughs) That should be a T-shirt. So you- you tweeted that, "Transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously expressive, in the forward pass, optimizable via bra- prop- back-propagation and gradient descent, and efficient, high parallelism compute graph." Can you discuss some of the- those details? Expressive, optimizable, efficient-

    26. AK

      Yeah. Um-

    27. LF

      ... from memory or- or in general. Whatever comes to your heart.

    28. AK

      You want to have a general-purpose computer that you can train on arbitrary problems, uh, like say the task of next word prediction or detecting if there's a cat in the image or something like that. And you want to train this computer so you want to set its- its weights. And I think there's a number of design criteria that sort of overlap in the transformer simultaneously that made it very successful. And I think the authors were kind of, um, deliberately trying to, uh, make this really... uh, powerful architecture. And, um, so in a, basically it's very powerful in the forward pass because it's able to express, um, very, uh, general co- uh, computation as a sort of something that looks like message passing. Uh, you have nodes and they all store vectors. And, uh, these nodes get to basically look at each other and it's, uh, each other's vectors, and they get to communicate and basically, nodes get to broadcast, "Hey, I'm looking for certain things." And then other nodes get to broadcast, "Hey, these are the things I have." Those are the keys and the values.

    29. LF

      So it's not just attention?

    30. AK

      Yeah, exactly. Transformer is much more than just the attention component. It's got many pieces, architectural, that went into it. The residual connection of the way it's arranged, there's a multi-layer perceptron in there, the way it's, uh, stacked and so on. Um, but basically there's a message-passing scheme where nodes get to look at each other, decide what's interesting, and then update each other. And, uh, so I think the, um, when you get to the details of it, I think it's a very expressive function. Uh, so it can express lots of different types of algorithms in a forward pass. Not only that but the way it's designed with the residual connections, layer normalizations, the softmax attention and everything, it's also optimizable. This is a really big deal because there's lots of computers that are powerful that you can't optimize, um, or they are not easy to optimize using the techniques that we have, which is back-propagation and gradient descent. These are first-order methods, very simple optimizers really. And so, um, you also need it to be optimizable. Um, and then lastly, you want it to run efficiently on our hardware. Our hardware i- is a massive throughput machine like GPUs. Uh, they prefer lots of parallelism, so you don't want to do lots of sequential operations, you want to do a lot of operations in parallel. And the transformer is designed with that in mind as well. And so it's designed, uh, for our hardware and it's designed to both be very expressive in a forward pass but also very optimizable in the backward pass.
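The message-passing picture described above (nodes broadcasting what they are looking for, what they have, and then updating each other) corresponds roughly to a single self-attention step with a residual connection. The following is a minimal sketch in plain Python, not the full transformer: it assumes a single head, omits layer normalization, the multi-layer perceptron, and any masking, and uses identity projection matrices purely for illustration. The names `self_attention`, `Wq`, `Wk`, and `Wv` are chosen for this example.

```python
import math

def matmul(A, B):
    # Plain nested-list matrix multiply: rows of A against columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    Q = matmul(X, Wq)  # queries: "what I'm looking for"
    K = matmul(X, Wk)  # keys: "what I have"
    V = matmul(X, Wv)  # values: "what I will communicate"
    d = len(Q[0])
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)  # every node scores every other node
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    update = matmul(weights, V)  # weighted sum of the values
    # Residual connection: each node adds the update to its own vector.
    out = [[x + u for x, u in zip(x_row, u_row)] for x_row, u_row in zip(X, update)]
    return out, weights

# Three nodes, each storing a 2-dimensional vector (toy values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, for illustration only
out, weights = self_attention(X, I2, I2, I2)
print(weights[0])
```

Every row of `scores` is computed independently of the others, which is what makes the step a natural fit for parallel hardware; the nested loops here run one after another only because this is plain Python.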

  7. 41:50 - 52:01

    Language models

    1. LF

      What do you think about one flavor of it which is language models? Have you been surprised? Uh, has your sort of imagination been captivated by, you- you mentioned GPT and all the bigger and bigger and bigger language models, and, uh, what are the limits o- of those models do you think? So just for the task of natural language.

    2. AK

      Basically, the way GPT is trained, right, is you just download a massive amount of, uh, text data from the internet and you try to predict the next, uh, word in the sequence roughly speaking. Uh, you're predicting little word chunks, uh, but, uh, roughly speaking, that's it. Um, and what's been really interesting to watch is... uh, basically, it's a language model. Language models have actually existed for a very long time. Um, there's papers on language modeling from 2003, even earlier.

    3. LF

      Can you explain, in that case, what a language model is?

    4. AK

      Uh, yeah. So language model, just, uh, basically, the rough idea is, um, just predicting the next word in a sequence-

    5. LF

      Mm-hmm.

    6. AK

      ... roughly speaking. Uh, so there's a paper from, for example, uh, Bengio, uh, and the team from 2003, where for the first time, they were using a, uh, neural network to take, say, like three or five words and predict the, um, next word. And they're doing this on much smaller data sets. And the neural net is not a transformer, it's a multi-layer perceptron. But it, uh, but it's the first time that a neural network has been applied in that setting. But even before neural networks, there were, um, language models, except they were u- uh, using, um, n-gram models. So n-gram models are just, uh, count-based models. So, um, if you try to, if you start to take two words and predict a third one, uh, you'll just count up how many times you've seen any, uh, two-word combinations and what came next.

    7. LF

      Mm-hmm.

    8. AK

      And, uh, what you predict as coming next is just what you've seen the most of in the training set. And so, uh, language modeling has been around for a long time. Neural networks have done language modeling for a long time. So really what's, uh, new, or interesting, or exciting is just realizing that when you scale it up, uh, with a powerful enough neural net, a transformer, you have all these emergent properties where, uh, basically what happens is if you have a large enough data set of text, you are, in the task of predicting the next, uh, word, you are multitasking a huge amount of different kinds of problems. You are multitasking understanding of, you know, chemistry, physics, human nature. Lots of things are sort of clustered in that objective. It's a very simple objective but actually, you have to understand a lot about the world to pr- to make that prediction.
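The count-based n-gram idea described above (take two words, count what came next, predict the most frequent follower) fits in a few lines. This is a toy sketch: the corpus is made up for illustration, and the names `train_trigram` and `predict_next` are not from any particular library.

```python
from collections import Counter, defaultdict

def train_trigram(tokens):
    # For every pair of consecutive words, count which word followed it.
    counts = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    # Predict whatever followed (a, b) most often in the training text.
    followers = counts.get((a, b))
    if not followers:
        return None  # this two-word context never appeared in training
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat sat on the sofa and the cat ran".split()
model = train_trigram(corpus)

guess = predict_next(model, "the", "cat")
print(guess)
```

The `None` branch shows the main weakness of pure counting: a context that never appeared in the training set gives the model nothing to go on, which is one reason neural language models generalize better.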

    9. LF

      You just said the U-word, understanding. Uh, w-

    10. AK

      (laughs) .

    11. LF

      ... are you, uh, in terms of chemistry and physics and so on, w- w- what do you feel like it's doing? Is it searching for the right context, uh, in, in, like what-

    12. AK

      Yeah.

    13. LF

      ... what is it, what is the actual process happening here?

    14. AK

      Yeah, so basically, it gets 1,000 words and it's trying to predict the 1,001st. And, uh, in order to do that very, very well over the entire data set available on the internet, you actually have to basically kind of understand, uh, the context of, of what's going on in there.

    15. LF

      Yeah.

    16. AK

      Um, and, uh, it's a sufficiently hard problem that you, uh, if you have a powerful enough computer, like a transformer, you end up with, uh, interesting solutions. (laughs) And, uh, you can ask it, uh, to, uh, do all kinds of, uh, things and, um, it i- it shows a lot of, uh, emergent properties like in-context learning. That was the big deal with GPT and the original paper when they published it, is that you can just sort of, uh, prompt it in various ways and ask it to do various things and it will just kind of complete the sentence. But in the process of just completing the sentence, it's actually solving all kinds of really, uh, interesting problems that we care about.

    17. LF

      Do you think it's doing something like understanding?

    18. AK

      Um, yeah.

    19. LF

      Like when we use the word understanding for us humans.

    20. AK

      I think it's doing some understanding. It ha- in its way, it's- it understands, I think, a lot about the world and it has to in order to predict the next word in a sequence.

    21. LF

      So it's trained on the data from the internet. Uh, what do you think about this, this approach in terms of data sets, of using data from the internet? Do you think the internet has enough structured data to teach AI about human civilization?

    22. AK

      Yeah, so I think the internet has a huge amount of data. I'm not sure if it's a complete enough set. I don't know that, uh, text is enough for having a sufficiently powerful AGI as an outcome. Um-

    23. LF

      Of course, there is audio, and video, and images-

    24. AK

      Yeah.

    25. LF

      ... and all that kind of stuff.

    26. AK

      Yeah, so text by itself, I'm a little bit suspicious about. There's a ton of things we don't put in text, in writing, uh, just because they're obvious to us about how the world works and the physics of it and that things fall. We don't put that stuff in text because why would you? We share that understanding. And so text is a communication medium between humans. And it's not a, uh, all-encompassing medium of knowledge about the world but as you pointed out, we do have video, and we have images, and we have audio. And so I think that, uh, that definitely helps a lot. But we haven't trained models, uh, sufficiently, uh, across both, across all those modalities yet. Uh, so I think that's what a lot of people are interested in.

    27. LF

      But I wonder if that shared understanding of c- like what we might call common sense has to be learned, inferred in order to complete the sentence correctly. So maybe the fact that it's implied on the internet, the model's gonna have to learn that. Not by reading about it, by inferring it in the representation. So, like, common sense, just like we, I don't think we learn common sense, like nobody says, tells us explicitly, we just figure it all out-

    28. AK

      Mm-hmm.

    29. LF

      ... by interacting with the world.

    30. AK

      Right.

  8. 52:01 - 58:21

    Bots

    1. AK

      today.

    2. LF

      Do you worry about bots on the internet? Given, given these ideas, given how exciting they are, do you worry about bots on Twitter being not the, the stupid bots that we see now or the crypto bots, but the bots that might be out there actually that we don't see, that they're interacting in interesting ways? So this kind of system feels like it should be able to pass the, "I'm not a robot" click button, whatever.

    3. AK

      Hmm.

    4. LF

      Um, which do you actually understand how that test works? I don't quite...

    5. AK

      (laughs)

    6. LF

      Like, uh, there's, there's a, there's a checkbox or whatever that you click.

    7. AK

      Yeah.

    8. LF

      It's presumably tracking...

    9. AK

      Oh, I see.

    10. LF

      ... s- like, mouse movement and the timing and so on.

    11. AK

      Yeah.

    12. LF

      So e- exactly this kind of system we're talking about should be able to pass that. So w- yeah, what do you feel about, um, bots that are language models plus have some interactability and are able to tweet and reply and so on, do you worry about that world?

    13. AK

      Uh, yeah, I think it's always been a bit of an arms race, uh, between sort of the attack and the defense. Uh, so the attack will get stronger but the defense will get stronger as well, uh, our ability to detect that.

    14. LF

      How do you defend? How do you detect? How do you know that your Karpathy account on Twitter is, is human? How would you approach that? Like if people were to claim, you know, uh... How would you defend yourself in the court of law that I'm a human?

    15. AK

      Um...

    16. LF

      This account is human.

    17. AK

      Yeah, at some point, I think, uh-

    18. LF

      (laughs)

    19. AK

      ... it might be... I think the society, uh, society will evolve a little bit, like, we might start signing, digitally signing, uh, some of our correspondence or, you know, things that we create. Uh, right now, it's not necessary but maybe in the future it might be. I do think that we are going towards a world where we share, we share the digital space with, uh, AIs.

    20. LF

      Synthetic beings.

    21. AK

      Yeah. And, uh, they will get much better and they will share our digital realm and they'll eventually share our physical realm as well. It's much harder. Uh, but that's kind of, like, the world we're going towards. And most of them will be benign and helpful and some of them will be malicious and it's going to be an arms race trying to detect them.

    22. LF

      So, I mean, the worst isn't the AIs, the worst is the AIs pretending to be human.

    23. AK

      Mm-hmm.

    24. LF

      So it might... I don't know if it's always malicious. There's o- obviously a lot of malicious applications but...

    25. AK

      Yeah.

    26. LF

      ... it could also be... You know, if I was an AI, I would try very hard to pretend to be human because-

    27. AK

      Yeah.

    28. LF

      ... we're in a human world.

    29. AK

      Yeah.

    30. LF

      I wo- I wouldn't get any respect as an AI.

  9. 58:21 - 1:05:44

    Google's LaMDA

    1. AK

    2. LF

      There was a Google engineer that claimed that, uh, LaMDA was sentient. Do you think there's any inkling of truth to what he felt? And more importantly, to me at least, do you think language models will achieve sentience or the illusion of sentience soon-ish? Ish.

    3. AK

      Yeah. To me it's a little bit of a canary in a coal mine kind of moment, honestly, a little bit. Uh, because, uh, so this engineer spoke to like a chatbot at Google-

    4. LF

      Mm-hmm.

    5. AK

      ... and, uh, became convinced that, uh, this bot is sentient.

    6. LF

      Yeah, asked it some existential philosophical questions.

    7. AK

      Right, and it gave like reasonable answers and looked real and, uh, and so on. So to me it's a, uh... He was, he was, uh, he wasn't sufficiently trying to stress the system, I think, and, uh, exposing the truth of it as it is today.

    8. LF

      Mm-hmm.

    9. AK

      Um, but, uh, I think this will be increasingly harder over time. Uh, so, uh, yeah, I think more and more people will basically, uh, become, um... Yeah, I think more and more, uh, there will be more people like that over time as- as this gets better.

    10. LF

      Like form an emotional connection to-

    11. AK

      Yeah.

    12. LF

      ... to- to an AI chat bot?

    13. AK

      Yeah, perfectly plausible in my mind. I think these AIs are actually quite good at human c- human connection, human emotion. A ton of text on the internet is about humans and connection and love and so on. So I think they have a very good understanding in some, in some sense of- of how people speak to each other about this.

    14. LF

      Mm-hmm.

    15. AK

      And, um, they're very capable of creating a lot of that kind of text. The, um, there's a lot of like sci-fi from the '50s and '60s that imagined AIs in a very different way. They are calculating, cold Vulcan-like machines. That's not what we are getting today. We're getting pretty emotional AIs (laughs) that actually, uh, are very competent and capable of generating, you know, plausible sounding text with respect to all these topics.

    16. LF

      See, I'm really hopeful about AI systems that are like companions that help you grow, develop as a human being, uh, help you maximize long-term happiness. But I'm also very worried about AI systems that figure out from the internet that humans get attracted to drama and so these will just be like shit talking AIs.

    17. AK

      (laughs)

    18. LF

      They're just constantly, "Did you hear?" Like they'll do gossip. They'll do, uh, they'll try to plant seeds of suspicion to like other humans that you love and trust and, uh, just kind of mess with people, uh, and you know, 'cause- 'cause that's going to get a lot of attention. So drama, maximize drama-

    19. AK

      Yeah.

    20. LF

      ... on the path to maximizing, uh, engagement and us humans will feed into that machine.

    21. AK

      Yeah.

    22. LF

      And get, it'll be a giant drama shit storm of-

    23. AK

      Yeah.

    24. LF

      ... (laughs) ... so I'm worried about that. So it's the objective function really defines, uh, the way that human civilization progresses with AIs in it.

    25. AK

      Yeah. I think right now, at least, today, they are not sort of, it's not correct to really think of them as goal-seeking agents that want to do something.

    26. LF

      Mm-hmm.

    27. AK

      They have no long-term memory or anything. They, it's literally, a good approximation of it is you get 1,000 words and you're trying to predict the 1,001st, and then you continue feeding it in. And you are free to prompt it in whatever way you want. So in text. So you say, "Okay. Uh, you are a psychologist, and you are very good, and you love humans. And, uh, here's a conversation between you and another human. Human, colon, something."

    28. LF

      Mm-hmm.

    29. AK

      "You something." And then it just continues the pattern. And suddenly you're having a conversation with a fake psychologist who's like trying to help you.

    30. LF

      Mm-hmm.
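The "predict the 1,001st word, then continue feeding it in" loop and the psychologist prompt described above can be sketched as follows. This is a minimal illustration under stated assumptions: `generate`, `toy_model`, and the `CANNED` lookup table are hypothetical stand-ins, and a real language model would return a learned prediction rather than a scripted one.

```python
def generate(predict_next_token, prompt_tokens, max_new_tokens, context_size=1000):
    """Autoregressive loop: predict one token, append it, feed the result back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tokens[-context_size:]  # the model only sees a fixed window
        next_token = predict_next_token(context)
        if next_token is None:  # the stand-in model has nothing more to say
            break
        tokens.append(next_token)
    return tokens

# Hypothetical stand-in for a language model: a scripted lookup on the last token.
CANNED = {"You:": "Tell", "Tell": "me", "me": "more."}

def toy_model(context):
    return CANNED.get(context[-1])

# The "program" is just text that sets up a pattern for the model to continue.
prompt = (
    "You are a psychologist and you love humans . "
    "Here is a conversation between you and another human . "
    "Human: I feel stuck . You:"
).split()

completion = generate(toy_model, prompt, max_new_tokens=10)
print(" ".join(completion[len(prompt):]))
```

Note that the only state carried between steps is the token list itself, which matches the point that these systems have no long-term memory beyond what is in the context window.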

  10. 1:05:44 - 1:16:44

    Software 2.0

    1. AK

    2. LF

      So you've spoken a lot about the idea of Software 2.0.

    3. AK

      Mm-hmm.

    4. LF

      Um, all good ideas become like cliches so quickly, like the terms-

    5. AK

      Yeah. (laughs)

    6. LF

      ... it's, it's, it's kind of hilarious. Um, it's like I think Eminem once said that, like, if he gets annoyed by a song he's written very quickly, that means it's gonna be a big hit-

    7. AK

      Mm-hmm.

    8. LF

      ... 'cause it's, it's too catchy.

    9. AK

      Mm-hmm.

    10. LF

      But, uh, can you describe this idea and how your thinking about it has evolved over the months and years since, since you coined it?

    11. AK

      Yeah. Yeah, so I had a blog post on Software 2.0, I think, several years ago now. Um, and the reason I wrote that post is because I kept, I kind of saw something remarkable happening in, like software development and how a lot of code was being transitioned to be written not in sort of like C++ and so on, but it's written in the weights of a neural net.

    12. LF

      Mm-hmm.

    13. AK

      Basically just saying that neural nets are taking over software, the realm of software-

    14. LF

      Yeah.

    15. AK

      ... and, um, taking more and more and more tasks. And at the time, I think not many people understood, uh, this, uh, deeply enough that this is a big deal, this is a big transition. Uh, neural networks were seen as one of multiple classification algorithms you might use for your data set problem on Kaggle. Like, this is not that. This is a change in how we program computers. And, uh, I saw neural nets as, uh, this is going to take over. Uh, the way we program computers is going to change. It's not going to be people writing, uh, software in C++ or something like that and directly programming the software. It's going to be accumulating, uh, training sets and data sets and crafting these objectives by which we train these neural nets. And at some point there's going to be a compilation process from the data sets, and the objective, and the architecture specification into the binary which is really just, uh, the neural net, uh, you know, weights and the forward pass of the neural net. And then you can deploy that binary. And so I was talking about that sort of transition and, uh, that's what the post is about. And I saw this sort of play out in a lot of, uh, fields, uh, you know, autopil- autopilot being one of them but also just a simple image classification. People thought originally, you know, in the '80s and so on, that they would write the algorithm for detecting a dog in an image. And they had all these ideas about how the brain does it, and first, we detect corners, and then we detect lines, and then we stitched them up. And they were, like, really going at it. They were, like, thinking about how they're gonna write the algorithm and this is not the way you build it. (laughs) Um, and there was a smooth transition where, okay, uh, first we thought we were gonna build everything. Then we were building the features, uh, so like, HOG features and things like that, uh, that detect these little statistical patterns from image patches, and then there was a little bit of, uh, learning on top of it, like, a support vector machine or binary classifier, uh, for cat versus dog in images on top of the features. So we wrote the features but we trained the last layer, sort of the- the classifier. And then people are like, "Actually, let's not even design the features because we can't. Honestly, we're not very good at it. So let's also learn the features." And then you end up with basically a convolutional neural net where you're learning most of it, you're just specifying the architecture and, uh, the architecture has tons of the fill-in-the-blanks, which is all the knobs, and you let the optimization write most of it. And so this transition is happening across the industry everywhere. And, uh, suddenly we end up with a ton of code that is written in neural net weights. And I was just pointing out that the analogy is actually pretty strong and we have a lot of developer environments for Software 1.0, like we have, uh, IDEs, um, how you work with code, how you debug code, how do you- how do you run code, uh, how do you maintain code? We have GitHub. So I was trying to make those analogies in the neural net, like what is the GitHub of Software 2.0? Turns out that something that looks like Hugging Face right now.
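The intermediate stage described above, "we wrote the features but we trained the last layer", can be sketched as follows. This is a toy illustration, not HOG or a support vector machine: the two hand-written features, the perceptron used as the trained classifier, and the four-pixel "patches" are all assumptions made up for the example.

```python
def handwritten_features(patch):
    """Human-designed features: edginess, mean brightness, and a constant bias term."""
    edginess = sum(abs(a - b) for a, b in zip(patch, patch[1:]))
    mean = sum(patch) / len(patch)
    return [edginess, mean, 1.0]

def predict(weights, patch):
    """Trained last layer: a linear classifier on top of the fixed features."""
    score = sum(w * f for w, f in zip(weights, handwritten_features(patch)))
    return 1 if score > 0 else 0

def train_last_layer(dataset, epochs=20):
    """Perceptron updates: only the classifier weights are learned, never the features."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for patch, label in dataset:
            error = label - predict(weights, patch)
            if error != 0:
                features = handwritten_features(patch)
                weights = [w + error * f for w, f in zip(weights, features)]
    return weights

# Made-up data: label 1 = striped patch, label 0 = flat patch.
dataset = [
    ([0.1, 0.1, 0.1, 0.1], 0), ([0.0, 1.0, 0.0, 1.0], 1),
    ([0.9, 0.9, 0.9, 0.9], 0), ([1.0, 0.0, 1.0, 0.0], 1),
    ([0.5, 0.5, 0.6, 0.5], 0), ([0.1, 0.9, 0.2, 0.8], 1),
    ([0.2, 0.3, 0.2, 0.2], 0), ([0.9, 0.2, 0.8, 0.1], 1),
]
weights = train_last_layer(dataset)
print(predict(weights, [0.0, 0.8, 0.1, 0.9]))  # an unseen striped patch
print(predict(weights, [0.4, 0.4, 0.4, 0.4]))  # an unseen flat patch
```

In the fully learned version he goes on to describe, `handwritten_features` would itself be replaced by trainable layers, leaving only the architecture to be specified by hand.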

    16. LF

      (laughs)

    17. AK

      Uh... You know? And so I think some people took it seriously and built cool companies and, uh, many people originally attacked the post. It actually was not well-received when I wrote it.

    18. LF

      Mm-hmm.

    19. AK

      And I think maybe it has something to do with the title but the post was not well-received and I think more people sort of have been coming around to it over time.

    20. LF

      Yeah. So y- you were the director of AI at Tesla where I think this idea was really implemented at scale which is how you have engineering teams doing Software 2.0. So can you sort of linger on that idea of... I think we're in the really early stages of everything you just said which is like GitHub IDEs, like how- how do we build engineering teams that- that work in Software 2.0 systems? And, uh, the an- the data collection and the data annotation which is all part of that Software 2.0, like, what do you think is the task of programming in Software 2.0? Is it debugging in the space of hyperparameters or is it also debugging in the space of data?

    21. AK

      Yeah. The way by which you program the computer and influence its algorithm is not by writing the commands yourself, you're changing mostly the data set, uh, you're changing the, um, loss functions of, like, what the neural net is trying to do, how it's trying to predict things. But, yeah, basically, the data sets and the architecture, so the neural net. And, um, um, so in the case of the autopilot, a lot of the data sets had to do with, for example, detection of objects, and lane line markings, and traffic lights, and so on. So you accumulate massive data sets of here's an example, here's the desired label, and then, uh, here's roughly how the archite- here's roughly what the algorithm should look like and that's a convolutional neural net. So the specification of the architecture is like a hint as to what the algorithm should roughly look like, and then the fill-in-the-blanks, uh, process of optimization is, uh, is the training process. And then you take your neural net that was trained, it gives all the right answers on your data set, and you deploy it.
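The point that you program the system by changing the data set rather than the commands can be sketched with a deliberately tiny model. Everything here is an assumption for illustration: the two-parameter "architecture", the squared-error loss, and the two toy data sets stand in for the convolutional networks and labeled driving data discussed in the conversation.

```python
def forward(weights, x):
    """Architecture: a line with two blanks (slope, offset) for the optimizer to fill in."""
    slope, offset = weights
    return slope * x + offset

def train(dataset, steps=2000, learning_rate=0.05):
    """'Compile' a data set plus a squared-error objective into weights via gradient descent."""
    slope, offset = 0.0, 0.0
    n = len(dataset)
    for _ in range(steps):
        grad_slope = grad_offset = 0.0
        for x, target in dataset:
            error = forward((slope, offset), x) - target
            grad_slope += 2 * error * x / n
            grad_offset += 2 * error / n
        slope -= learning_rate * grad_slope
        offset -= learning_rate * grad_offset
    return slope, offset

# Same architecture, same training code, two different data sets (example, desired label).
doubling = [(0, 0), (1, 2), (2, 4), (3, 6)]
flip_and_shift = [(0, 1), (1, 0), (2, -1), (3, -2)]

weights_a = train(doubling)
weights_b = train(flip_and_shift)
print(round(forward(weights_a, 10), 3))
print(round(forward(weights_b, 10), 3))
```

The deployed behavior differs between the two runs even though no line of code changed, only the examples and their desired labels.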

    22. LF

      So there's, uh, in that case, perhaps, in all machine learning cases, there's a lot of tasks. So is coming up formulating a task like a- for a multi-headed neural network, is formulating a task part of the programming?

    23. AK

      Yeah. Very much so.

    24. LF

      How do you break down a problem-

    25. AK

      Yeah.

    26. LF

      ... into a set of tasks?

    27. AK

      Yeah. Um, on a high level I would say if you look at the software running in- in the autopilot, I gave a number of talks on this topic, I would say originally a lot of it was written in Software 1.0. There's, uh, imagine lots of C++, uh, right? And then, uh, gradually there was a tiny neural net that was, for example, predicting given a single image is there, like, a traffic light or not or is there a lane line marking or not? And this neural net didn't have, uh, too much to do in the s- in the scope of the software. It was making tiny predictions on individual little images and then the rest of the system stitched it up. So, okay, we're actually- we don't have just a single camera, we have eight cameras. We actually have eight cameras over time. And so what do you do with these predictions? How do you put them together? How do you do the fusion of all that information and how do you act on it? All of that was written by humans, um, in C++. And then we decided, okay, we don't actually want, uh, uh, to do all of that fusion in, uh, C++ code because we're actually not good enough to write that algorithm.

    28. LF

      Mm-hmm.

    29. AK

      We want the neural nets to write the algorithm and we want to port, uh, all of that software into the 2.0 stack. And so then we actually had neural nets that now take all the eight camera images simultaneously and make predictions for all of that. Uh, so, um, and- and it's- and actually they don't make predictions in the- in the space of images, they now make predictions directly in 3D.

    30. LF

      Mm-hmm.

Episode duration: 3:28:47


Transcript of episode cdiD-9MMpb0
