
Jim Keller: The Future of Computing, AI, Life, and Consciousness | Lex Fridman Podcast #162

Jim Keller is a legendary microprocessor engineer, previously at AMD, Apple, Tesla, Intel, and now Tenstorrent.

Please support this podcast by checking out our sponsors:
- Athletic Greens: https://athleticgreens.com/lex and use code LEX to get 1 month of fish oil
- Brooklinen: https://brooklinen.com and use code LEX to get $25 off + free shipping
- ExpressVPN: https://expressvpn.com/lexpod and use code LexPod to get 3 months free
- Belcampo: https://belcampo.com/lex and use code LEX to get 20% off first order

EPISODE LINKS:
Jim's Twitter: https://twitter.com/jimkxa
Jim's Wiki: https://en.wikipedia.org/wiki/Jim_Keller_(engineer)
Tenstorrent: https://www.tenstorrent.com/

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
1:33 - Good design is both science and engineering
7:33 - Javascript
11:40 - RISC vs CISC
15:39 - What makes a great processor?
17:09 - Intel vs ARM
18:58 - Steve Jobs and Apple
21:36 - Elon Musk and Steve Jobs
27:21 - Father
31:03 - Perfection
37:18 - Modular design
42:52 - Moore's law
49:50 - Hardware for deep learning
56:44 - Making neural networks fast at scale
1:04:22 - Andrej Karpathy and Chris Lattner
1:08:36 - How GPUs work
1:12:43 - Tesla Autopilot, NVIDIA, and Mobileye
1:17:23 - Andrej Karpathy and Software 2.0
1:23:43 - Tesla Dojo
1:26:20 - Neural networks will understand physics better than humans
1:28:33 - Re-engineering the human brain
1:33:26 - Infinite fun and the Culture Series by Iain Banks
1:35:20 - Neuralink
1:40:43 - Dreams
1:44:37 - Ideas
1:54:49 - Aliens
1:59:46 - Jordan Peterson
2:04:44 - Viruses
2:07:52 - WallStreetBets and Robinhood
2:15:55 - Advice for young people
2:17:45 - Human condition
2:20:14 - Fear is a cage
2:25:04 - Love
2:31:27 - Regrets

SOCIAL:
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Reddit: https://reddit.com/r/lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host) · Jim Keller (guest)
Feb 18, 2021 · 2h 39m

EVERY SPOKEN WORD

  1. 0:00 - 1:33

    Introduction

    1. LF

      The following is a conversation with Jim Keller. His second time on the podcast. Jim is a legendary microprocessor architect and is widely seen as one of the greatest engineering minds of the computing age. In a peculiar twist of space time in our simulation, Jim is also a brother-in-law of Jordan Peterson. We talk about this and about computing, artificial intelligence, consciousness, and life. Quick mention of our sponsors: Athletic Greens all-in-one nutrition drink, Brooklinen Sheets, ExpressVPN, and Belcampo grass-fed meat. Click the sponsor links to get a discount and to support this podcast. As a side note, let me say that Jim is someone who, on a personal level, inspired me to be myself. There was something in his words, on and off the mic, or perhaps that he even paid attention to me at all, that almost told me, "You're all right, kid." A kind of pat on the back that can make the difference between a mind that flourishes and a mind that is broken down by the cynicism of the world. So, I guess that's just my brief few words of thank you to Jim, and in general, gratitude for the people who have given me a chance on this podcast, in my work, and in life. If you enjoy this thing, subscribe on YouTube, review it on Apple Podcasts, follow on Spotify, support on our Patreon, or connect with me on Twitter @lexfridman. And now, here's my conversation with Jim Keller.

  2. 1:33 - 7:33

    Good design is both science and engineering

    1. LF

      What's the value and effectiveness of theory versus engineering, this dichotomy, in, uh, building good software or s- hardware systems?

    2. JK

      Well, it's... Good design is both. I guess that's pretty obvious. By engineering, do you mean, you know, reduction to practice of known methods? And then science is the pursuit of discovering things that people don't understand or solving unknown problems.

    3. LF

      Definitions are interesting here, but I was thinking more in theory, constructing models that kind of generalize about how things work.

    4. JK

      Mm-hmm.

    5. LF

      Engineering is, uh, like actually building stuff. The pragmatic, like-

    6. JK

      Mm-hmm.

    7. LF

      ... okay, we have these nice models, but how do we actually get things to work? Maybe economics is a nice example. Like economists have all these models of how the economy works and how different policies will have an effect, but then there's the actual, okay, let's call it engineering of like-

    8. JK

      Yeah.

    9. LF

      ... actually deploying the policies.

    10. JK

      So, computer design is almost all engineering and reduction to practice of known methods. Now, because of the complexity of the computers we build, you know, you- you could think you're, well, we'll just go write some code and then we'll verify it and then we'll put it together, and then you find out that the combination of all that stuff is complicated, and then you have to be inventive to figure out how to do it. Right? So that's- that's definitely ha- happens a lot. And then every so often some big idea happens, but it might be one person.

    11. LF

      And that idea is in what? In the space of engineering or is it a to- in the space of ideas?

    12. JK

      Well, I'll give you an example. So, one of the limits of computer performance is branch prediction. So... And there's- there's a whole bunch of ideas about how good you could predict a branch. And people said there's a limit to it, that it's an asymptotic curve, and somebody came up with a better way to do branch prediction. It was a lot better. And he published a paper on it, and every computer in the world now uses it. And it was one idea. So the- the engineers who build branch prediction hardware were happy to drop the one kind of training array and put in another one.

    13. LF

      Mm-hmm.

    14. JK

      So, it was- it was a real idea.

    15. LF

      And branch prediction is- is one of the key problems underlying all of sort of the lowest level of software. It boils down to branch prediction.

    16. JK

      Boils down to uncertainty. Computers are limited by... You know, single thread computer's limited by two things. The- the predictability of the path of the branches and the predictability of the locality of- of data. So, we have predictors that now predict both of those pretty well.
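
To make the predictor idea concrete: below is a minimal sketch of the classic two-bit saturating-counter branch predictor, the textbook scheme that more sophisticated predictors build on. This is illustrative only; the conversation doesn't name the specific published scheme, and the table size and indexing here are assumptions.

```python
# A minimal two-bit saturating-counter branch predictor (textbook scheme).
# Each branch's address indexes a table of counters in [0, 3]:
# 0-1 predict "not taken", 2-3 predict "taken".

TABLE_SIZE = 1024  # assumed size, purely illustrative

class TwoBitPredictor:
    def __init__(self):
        self.counters = [1] * TABLE_SIZE  # start weakly "not taken"

    def predict(self, pc: int) -> bool:
        return self.counters[pc % TABLE_SIZE] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % TABLE_SIZE
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

p = TwoBitPredictor()
hits = 0
for taken in [True] * 99 + [False]:  # a typical loop branch: taken 99x, then exit
    hits += p.predict(0x400) == taken
    p.update(0x400, taken)
print(f"{hits}/100 correct")  # 98/100 — wrong only at warm-up and loop exit
```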

    17. LF

      Yeah.

    18. JK

      So, memory is, you know, a couple hundred cycles away. Local cache is a couple cycles away. When you're executing fast, virtually all the data has to be in the local cache. So, a simple program says, you know, add one to every element in an array. It's really easy to see what the stream of data will be.

    19. LF

      Mm-hmm.

    20. JK

      But you might have a more complicated program that's, you know, says get a- get an element of this array, look at something, make a decision, go get another element, it's kind of random. And you can think, that's really unpredictable. And then you make this big predictor that looks at this kind of pattern and you realize, well, if you get this data and this data, then you probably want that one. And if you get this one and this one and this one, you probably want that one.
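
A rough sketch of the two access patterns being contrasted: a streaming loop whose address sequence a prefetcher can anticipate, versus a data-dependent walk where each load determines the next address. (Pure Python won't show the cache effects themselves; the point is the shape of the address stream.)

```python
import random

N = 1 << 20
data = [1] * N

# Predictable: "add one to every element in an array".
# Addresses go 0, 1, 2, ... — a stride a hardware prefetcher can see coming.
for i in range(N):
    data[i] += 1

# Unpredictable: "get an element, make a decision, go get another".
# The next address depends on the data just loaded, so nothing simple can
# anticipate it — unless a big pattern-based predictor learns the sequence.
next_idx = list(range(N))
random.shuffle(next_idx)
i = total = 0
for _ in range(100_000):
    total += data[i]
    i = next_idx[i]  # next address is decided by loaded data
```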

    21. LF

      And is that theory or is that engineering? Like the paper that was written, was it, uh-

    22. JK

      Well, that was prob-

    23. LF

      ... asymptotic kinda- kinda discussion or is it more like here's a hack that works well?

    24. JK

      Um, it's a little bit of both. Like there's information theory in it, I think somewhere.

    25. LF

      Okay. So it's-

    26. JK

      Yeah.

    27. LF

      ... it's actually trying to prove some kind of stuff.

    28. JK

      Yeah. But once- once you know the method, implementing it is an engineering problem. Now, there's a flip side of this which is in a big design team, what percentage of people think their- their- their, uh, their- their- their plan or their life's work is engineering versus design... inventing things? So, lots of companies will reward you for filing patents.

    29. LF

      Yes.

    30. JK

      Some... Many big companies get stuck because to get promoted you have to come up with something new. And then what happens is everybody's trying to do some random new thing, 99% of which doesn't matter, and the basics get neglected. And... Or they get to... There's a dichotomy. They think like the cell library and the basic CAD tools, you know, or basic, you know, software validation methods, that's simple stuff, you know. They want to work on the exciting stuff. And then they- they- they spend lots of time trying to figure out how to patent something. And that's mostly useless.

  3. 7:33 - 11:40

    Javascript

    1. JK

    2. LF

      I don't know if you know Brendan Eich, he wrote JavaScript in 10 days.

    3. JK

      Uh-huh.

    4. LF

      And that's an interesting story. It makes me wonder... And it was, you know, famously for many years considered to be a pretty crappy programming language.

    5. JK

      Mm-hmm.

    6. LF

      It still is perhaps. It's been improving sort of consistently. But the interesting thing about that guy is, you know, he doesn't get any awards. (laughs)

    7. JK

      (laughs)

    8. LF

      You don't get a Nobel Prize or a Fields Medal or-

    9. JK

      Uh-huh.

    10. LF

      ... uh, he might not even-

    11. JK

      For inventing a- a crappy piece of, you know, software code that-

    12. LF

      That- well, that is currently the number one programming language in the world and runs, now is cons- i- increasingly running the backend of the internet, the frontend of the internet.

    13. JK

      Well, does he, does he know why everybody uses it? Like, that would be an interesting thing. Was it the right thing at the right time? 'Cause like when stuff like JavaScript came out, like there was a move from, you know, writing C programs and C++ to, let's call it what they call managed code frameworks.

    14. LF

      Mm-hmm.

    15. JK

      Where you write simple code, it might be interpreted, it has lots of libraries, productivity is high and you don't have to be an expert. So, you know, Java was supposed to solve all the world's problems. It was complicated. JavaScript came out, you know, after a bunch of other scripting languages. I'm not an expert on it but-

    16. LF

      Yeah.

    17. JK

      ... but was it the right thing at the right time?

    18. LF

      The right thing at the right-

    19. JK

      Or was there something, you know, clever? 'Cause he wasn't the only one.

    20. LF

      There's a few elements. One is-

    21. JK

      And maybe if he figured out what it was-

    22. LF

      No, I think-

    23. JK

      ... then he'd get a prize. (laughs)

    24. LF

      (laughs)

    25. JK

      Like that's-

    26. LF

      Constructive theory. (laughs)

    27. JK

      Yeah. You know, maybe this problem is he hasn't defined this, or he just needs a good promoter.

    28. LF

      (laughs) Well, I think there's a bunch of blog posts written about it, which is like worse is better. Which is like doing the crappy thing fast, just like hacking together the thing that answers some of the needs-

    29. JK

      Mm-hmm.

    30. LF

      ... and then iterating over time, listening to developers, like listening to people who actually use the thing.

  4. 11:40 - 15:39

    RISC vs CISC

    1. JK

    2. LF

      Well, I mean, isn't- isn't that also the story of RISC versus CISC? I mean, is that simplicity? There's something about simplicity that, uh, in this evolutionary process is valued.

    3. JK

      Yeah.

    4. LF

      If it's simple, it's, uh, gets... It spreads faster, it seems like.

    5. JK

      Yeah.

    6. LF

      Or is that not always true?

    7. JK

      That's not always true. Yeah. It could be simple is good, but too simple is bad.

    8. LF

      So why did RISC win, you think, so far?

    9. JK

      Did RISC win?

    10. LF

      (laughs)

    11. JK

      We don't know.

    12. LF

      In the long arc of history, maybe not. (laughs)

    13. JK

      We, we don't know.

    14. LF

      So, who, who's gonna win? What, what's RISC, what's CISC, and who's gonna win in that space in these instruction sets?

    15. JK

      Well, A-, AI software's gonna win, but there'll be little computers that run little programs like normal all over the place. But, but we're, we're going through another transformation, so.

    16. LF

      B- but y- you think instruction sets underneath it all will change?

    17. JK

      Yeah, they evolve slowly. They, they don't matter very much.

    18. LF

      They don't matter very much, okay.

    19. JK

      Yeah. I mean, the, the limits of performance are, you know, predictability of instructions and data. I mean, that's the big thing. And then the usability of it is some, you know, quality of design, quality of tools, availability. Like, right now, x86 is proprietary with Intel and AMD, but they can change it any way they want independently.

    20. LF

      Mm-hmm.

    21. JK

      Right? Arm is proprietary to Arm, and they won't let anybody else change it. So, it's like a sole point. And RISC-V is open source, so anybody can change it, which is super cool. But that also might mean it gets changed in too many random ways that there's no common sub- subset of it that people can use.

    22. LF

      Do you like open or do you like closed? Like, if you were to bet all your money on one or the other, RISC-V versus A?

    23. JK

      No idea.

    24. LF

      It's case dependent?

    25. JK

      Well, x86, oddly enough, when Intel first started developing it, they licensed it to, like, seven people. So, it was the open architecture.

    26. LF

      (laughs)

    27. JK

      And then they moved faster than others and also bought one or two of them. But there was seven different people making x86. 'Cause at the time, there was 6502 and Z80s and, you know, 8086. And you could argue everybody thought Z80 was the better instruction set, but that was propriety to, proprietary to one place. Oh, and the 6800.

    28. LF

      So-

    29. JK

      There was, like, five differe- four or five different microprocessors. Intel went open, got the market share 'cause people felt like they had multiple sources from it, and then over time, it narrowed down to two players.

    30. LF

      So, why ... You as a historian, uh, wh- (laughs) why did Intel win for so long with, uh, with their processors? I mean, I-

  5. 15:39 - 17:09

    What makes a great processor?

    1. JK

      And al-

    2. LF

      What, what makes a great processor in that? What, you know?

    3. JK

      Oh, if you just look at its performance versus everybody else, it's, you know, the size of it, the, you know, usability of it.

    4. LF

      So, it's not specific, some kind of element that makes it beautiful, it's just, like, literally just raw performance. Is that how you think about processors, as just, like, raw performance?

    5. JK

      Of course. (laughs) It's like a horse race.

    6. LF

      So-

    7. JK

      The fastest one wins. Now-

    8. LF

      You don't care how. (laughs) Just as long as it wins.

    9. JK

      Well, well, there's the, the fastest in an environment, like-

    10. LF

      Right.

    11. JK

      ... you know, for years, you made the fastest one you could, and then people started to have power limits. So, then you made the fastest at the right power point.

    12. LF

      Yeah.

    13. JK

      And then, and then when we started doing multiprocessors, like, if you could scale your processors more than the other guy, you could be 10% faster on, like, a single thread, but you have more threads. So, there's lots of variability. And then Arm really explored, like, you know, they have the A series and the R series and the M series, like a family of processors for all these different design points, from, like, unbelievably small and simple. And so then when you're doing the design, it's sort of like this big palette of CPUs.

    14. LF

      Mm-hmm.

    15. JK

      Like, they're the only ones with a credible, you know, top-to-bottom palette, and ...

    16. LF

      Wh- wh- what do you mean a credible, uh, top-to-bottom palette?

    17. JK

      Well, there's people who make microcontrollers that are small, but they don't have a fast one. There's people who make fast processors, but don't have a litt- a medium one or a small one.

    18. LF

      Is that hard to do, that full palette? That's, that seems like a ...

    19. JK

      Yeah, it's a lot of different

  6. 17:09 - 18:58

    Intel vs ARM

    1. JK

      ...

    2. LF

      So, what's the difference between, uh, the Arm folks and Intel in terms of the way they're approaching this problem?

    3. JK

      Well, Intel, almost all their processor designs were, you know, very custom, high-end, you know, for the last 15, 20 years.

    4. LF

      For the fastest horse possible.

    5. JK

      Yeah.

    6. LF

      (laughs) In one horse race.

    7. JK

      And ... Yeah, and they, they, architecturally, they're really good, but the company itself was fairly insular to what's going on in the industry with CAD tools and stuff. And there's this debate about custom design versus synthesis, and how do you approach that? I, I'd say Intel was slow on the getting to synthesize processors. Arm came in from the bottom, and they generated IP, which went to all kinds of customers, so they had very little say in how the customer implemented their IP. So, Arm is super friendly to the synthesis IP environment, whereas Intel said, "We're gonna make this great client chip or server chip with our own CAD tools, with our own process, with our own, you know, other supporting IP, and everything only works with our stuff."

    8. LF

      So, was that, um... Is Arm winning the mobile platform space in terms of process?

    9. JK

      Of course, yeah.

    10. LF

      And so, i- in that, in... What you're describing is why they're winning.

    11. JK

      Well, they had lots of people doing lots of different experiments, so they controlled the processor architecture and IP, but they let people put in lots of different chips. And there was a lot of variability in what happened there. Whereas Intel, when they made their mobile, their foray into mobile, they had one team doing one part, right? So, it wasn't 10 experiments. And then their mindset was PC mindset, Microsoft software mindset, and that brought a whole bunch of things along that, uh, the mobile world and the embedded world don't do.

    12. LF

      Do you think it was possible for Intel to pivot hard and win the mobile market?

  7. 18:58 - 21:36

    Steve Jobs and Apple

    1. LF

    2. JK

      Sure.

    3. LF

      That's a hell of a difficult thing to do, right? For a huge company to just pivot. I mean, it's so interesting to... 'Cause we'll talk about your current work. It's like, it's clear that PCs were dominating for several decades, like desktop computers, and then mobile, it's unclear.

    4. JK

      It's a, it's a leadership question. Like Ap- like Apple under Steve Jobs, when he came back, they pivoted multiple times.

    5. LF

      Yeah.

    6. JK

      You know, they built iPads and iTunes and phones and tablets and great Macs. Like, like, who knew computers could be made out of aluminum? Nobody knew that. But they're great. It's super fun.

    7. LF

      That was Steve?

    8. JK

      Yeah, Steve Jobs. Like, they pivoted multiple times, and, uh, you know, the old Intel, they, they did that multiple times. They made DRAMs and processors and processes and...

    9. LF

      I gotta ask this. What was it like working with Steve Jobs?

    10. JK

      I didn't work with him.

    11. LF

      Did you interact with him?

    12. JK

      Twice.

    13. LF

      (laughs)

    14. JK

      I said hi to him twice in the cafeteria.

    15. LF

      What did he say? Hi?

    16. JK

      He said, "Hey, fellas."

    17. LF

      (laughs)

    18. JK

      He was friendly.

    19. LF

      (laughs)

    20. JK

      He was wandering around, uh, with somebody. He couldn't find a table 'cause the cafeteria was, was packed, and I gave him my table. But I worked for Mike Culbert, who talked to... Like, Mike, Mike was the unofficial CTO of Apple and a brilliant guy, and he worked for Steve for 25 years, maybe more. And he talked to Steve multiple times a day, and he was one of the people who could put up with Steve's, let's say, brilliance and intensity. And, and Steve really liked him, and Steve trusted Mike to translate the shit he thought up into engineering products that worked. And then Mike ran a group called Platform Architecture, and I was in that group. So, many times, I'd be sitting with Mike, and the phone would ring, and it'd be Steve. And Mike would hold the phone like this 'cause Steve would be yelling about something or other.

    21. LF

      Yeah, and then he would translate.

    22. JK

      And he'd translate, and then he would say, "Steve wants us to do this." So...

    23. LF

      Was Steve a good engineer or no?

    24. JK

      I don't know. He was a great idea guy.

    25. LF

      Idea person.

    26. JK

      And he was a really good selector for talent.

    27. LF

      Yeah. That seems to be-

    28. JK

      So, again-

    29. LF

      ... one of the key elements of leadership, right?

    30. JK

      And then he was a really good first principles guy. Like, like, somebody would say something couldn't be done, and he would just think, "That's obviously wrong," right? But, you know, maybe it's hard to do. Maybe it's expensive to do. Maybe we need different people. You know, there's like a whole bunch of... You know, like, if you want to do something hard, you know, maybe it takes time. Maybe you have to iterate. There's a whole bunch of things y- you could think about. But saying it can't be done is stupid.

  8. 21:36 - 27:21

    Elon Musk and Steve Jobs

    1. JK

    2. LF

      How would you compare... So, it seems like Elon Musk is more engineering centric, but is also... I think he considers himself a designer, too. He has a design mind.

    3. JK

      Yeah.

    4. LF

      Steve Jobs feels like he is much more idea space, design space versus engineering.

    5. JK

      Yeah.

    6. LF

      Just make it happen. Like, the world should be this way. Just figure it out.

    7. JK

      But, but he used computers. You know, he had computer people talk to him all the time. Like, Mike was a really good computer guy. He knew what computers could do.

    8. LF

      Computer meaning computer hardware, like low-level stuff?

    9. JK

      Yeah, hardware, software, all the pieces.

    10. LF

      The whole thing.

    11. JK

      And then he would, you know, have an idea about what could we do with this next that was grounded in reality. It wasn't like he was, you know, just finger painting on the wall and wishing somebody would interpret it. Like, so he had this interesting connection because, you know, he wasn't a computer architect or a designer, but he had an intuition from the computers we had to what could happen. And...

    12. LF

      It's interesting you say intuition, because it seems like he was pissing off a lot of engineers in his intuition about what can and can't be done. Those, the, like, the... What is it? All these stories about, like, floppy disks and all that kind of stuff like that.

    13. JK

      Yeah, so in, in Steve's the first round, like, he'd go into a lab and look at what's going on and hate it and, and, uh, fire people or, or ask somebody in the elevator what they're doing for Apple and, you know, not be happy. When he came back, my impression was, is he surrounded himself with a relatively small group of people.

    14. LF

      Yes.

    15. JK

      And didn't really interact outside of that as much. And then the joke was you'd see like a little, somebody moving a prototype through the, the quad with a, with a black blanket over it.

    16. LF

      (laughs)

    17. JK

      And that was 'cause it was secret, you know, partly from Steve 'cause they didn't want Steve to see it until it was ready.

    18. LF

      Yeah, the dynamic with Jony Ive and Steve is interesting. It's like you don't wanna... He ruins as many ideas as he generates.

    19. JK

      Yeah. Yeah.

    20. LF

      It's a dangerous kind of line to walk. I, I-

    21. JK

      And if you have a lot of ideas, like... Like, Gordon Bell was famous for ideas, right? And it wasn't that the percentage of good ideas was way higher than anybody else.

    22. LF

      (laughs)

    23. JK

      It was he had so many ideas, and, and he was also good at talking to people about it and getting the filters right and, you know, seeing through stuff. Whereas Elon was like, "Hey, I want to build rockets." So, Steve would hire a bunch of rocket guys, and Elon would go read rocket manuals.

    24. LF

      So, Elon is a better engineer, in a sense. Like, or, like, more, uh... like, a love and passion for the manuals. (laughs)

    25. JK

      Yeah. And the details-

    26. LF

      The details.

    27. JK

      ... and the data and the other stuff.

    28. LF

      The craftsmanship too, right? Well, I guess Steve had craftsmanship too, but of a different kind.

    29. JK

      Yeah.

    30. LF

      What do you make of the, just to stay in there for just a little longer, what do you make of, like, the anger and the passion and all of that? The, the firing and the mood swings and the madness, the im-, you know, being emotional and all of that, that's Steve? And I, I guess Elon too. So, what, is that a, is that a bug or a feature?

  9. 27:21 - 31:03

    Father

    1. LF

      Well, you, you're, you're probably looking for somebody's approval.

    2. JK

      Mm.

    3. LF

      Uh, i- i- even still.

    4. JK

      Yeah, maybe. I should think about that.

    5. LF

      Maybe somebody who's no longer with us kind of thing.

    6. JK

      Mm.

    7. LF

      I don't know.

    8. JK

      I used to call up my dad and tell him what I was doing. He was, he was very excited about engineering and stuff.

    9. LF

      You got his approval?

    10. JK

      Uh, yeah, a lot. I was lucky. Like, he, he decided I was smart and unusual as a kid, and that was okay, when I was really young. So when I d- like did poorly in school, I was dyslexic, I didn't read until I was third or fourth grade and they didn't care. My parents were like, "Oh, he'll be fine." So-

    11. LF

      Cool.

    12. JK

      ... I was lucky. That was cool.

    13. LF

      Is he still with us?

    14. JK

      No.

    15. LF

      You miss him?

    16. JK

      Mm-hmm. Sure, yeah. He had Parkinson's and then cancer. His last 10 years were tough. And it killed him. Killing a man like that's hard.

    17. LF

      The mind?

    18. JK

      Well, it was pretty good. Um, Parkinson's caused the slow dementia. And, uh, the c- the chemotherapy, I think, accelerated it. But it was like hallucinogenic dementia. So he was clever and funny and interesting and was, it was pretty unusual.

    19. LF

      Do you remember conversations, uh-

    20. JK

      Oh, yeah, of course.

    21. LF

      ... from that time? Like what, do you have fond memories of the guy?

    22. JK

      Yeah. Oh, yeah.

    23. LF

      Anything come to mind?

    24. JK

      Uh, a friend told me one time I could draw a computer on the whiteboard faster than anybody he'd ever met. And I said, "You should meet my dad." Like, when I was a kid, he'd come home and say, "I was driving by this bridge, and I was thinking about it." And he'd pull out a piece of paper and he'd draw the whole bridge.

    25. LF

      (laughs)

    26. JK

      He was a mechanical engineer.

    27. LF

      Yeah.

    28. JK

      And he would just draw the whole thing and then he would tell me about it and then tell me how he would've changed it. And he had this, you know, idea that he could understand and conceive anything. And I, I just grew up with that, so that was natural. So if, you know, like, when I interview people, I ask them to draw a picture of something they did on a whiteboard.

    29. LF

      Mm-hmm.

    30. JK

      And it's really interesting. Like, some people draw a little box, you know, and then they'll say, "And then this talks to this, and..."

  10. 31:03 - 37:18

    Perfection

    1. JK

    2. LF

      (laughs) Do you think the perfect is the enemy of the good in hardware and software engineering? It's like, we were talking about JavaScript a little bit, and the messiness of the 10-day building process.

    3. JK

      Yeah, it's- that's, you know, creative tension, right?

    4. LF

      Hmm.

    5. JK

      Th- the, so creative tension is when you have two different ideas that you can't do both, right?

    6. LF

      Right.

    7. JK

      And the- and, but the fact that you want to do both causes you to go try to solve that problem. That's the creative part. So, if you're building computers, like some people say, "We have this schedule, and anything that doesn't fit in the schedule, we can't do." Right? And so they, they throw out the perfect because they have a schedule. I hate that.

    8. LF

      (laughs)

    9. JK

      Right? Then there's other people that say, "We need to get this perfectly right and no matter what." You know, more people, more money, right? And there's a really clear idea about what you want, and some people are really good at articulating it, right? So, so let's call that the perfect, yeah.

    10. LF

      Yeah.

    11. JK

      All right, but that's also terrible because then you never ship anything and you never hit any goals. So, now you have the, now you have your framework.

    12. LF

      Yes.

    13. JK

      You can't throw out stuff because you can't get it done today because maybe you'll get it done tomorrow with the next project, right? You can't... Y- so you have to... I worked with a guy that I really liked working with, but he over filters his ideas.

    14. LF

      Over filters?

    15. JK

      He'd start thinking about something, and as soon as he figured out what was wrong with it, he'd throw it out.

    16. LF

      Hmm.

    17. JK

      And then I start thinking about it, and like, you know, you come up with an idea, and then you find out what's wrong with it, and then you le- give it a little time to set because sometimes, you know, you figure out how to tweak it, or maybe that idea helps some other idea. So, idea generation is really funny. So, you have to give your ideas space, like spaciousness of mind is key, but you also have to execute programs and get shit done. And then it turns out computer engineering is fun because it takes, you know, 100 people to build a computer, 200 or 300, whatever the number is. And people are so variable about, you know, temperament and, you know, skill sets and stuff that in a, in a big organization, you find the, the people who love the perfect ideas and the people that want to get stuff done yesterday, and people like that- that come up with ideas, and people who like to, let's say, shoot down ideas. And it takes the whole... It takes a large group of people.

    18. LF

      So, some are good at generating ideas, some are good at filtering ideas, and then all th- in that, uh, giant mess, you're somehow... I guess the goal is for that giant mess of people to, uh, find the perfect path through the-

    19. JK

      Mm-hmm.

    20. LF

      ... the tension, the creative tension. But like, how do you know when... You said there's some people good at articulating what perfect looks like, what a good design is.

    21. JK

      Mm-hmm.

    22. LF

      Like, if you're sitting in a, in a room, and, uh, you have a set of ideas about like how to design, uh, a better processor. How do you know this is, this is something special here, this is a good idea, let's try this?

    23. JK

      So, have you ever brainstormed an idea with a couple of people that were really smart? And you kind of go into it, and you, you don't quite understand it, and you're working on it, and then you start, you know, talking about it, putting it on the whiteboard. Maybe it takes days or weeks, and then your brains start to kind of synchronize. It's really weird.

    24. LF

      (laughs) What's the-

    25. JK

      And like you start to see what each other is thinking.

    26. LF

      Yeah.

    27. JK

      And, and it starts to work. Like, you can see it work. Like, my talent in computer design is I can, I can see how computers work in my head, like really well. And I know other people can do that too. And when you're working with people that can do that, like, it, it is kind of a, an amazing experience. And then... And every once in a while, you, you get to that place, and then you find the flaw in it, which is kind of funny because you, you can, you can fool yourself in, but-

    28. LF

      The two of you kind of drifted a- a long, uh-

    29. JK

      Yeah, yeah, you got these-

    30. LF

      ... into a direction that was useless. (laughs)

  11. 37:18 - 42:52

    Modular design

    1. LF

      What, uh, computing hardware or, um, just any kind, even software design are you, uh, do you find beautiful? From your own work, from o- o- o- other people's work that you're just, uh... We were just talking about the, the battleground of flaws and mistakes and errors, but things that were just beautifully done. Is there something that pops to mind?

    2. JK

      Well, when things are beautifully done, usually there's a well-thought-out set of abstraction layers. Like-

    3. LF

      So, the whole thing works qu- like, in unison nicely.

    4. JK

      Yes. And, and when I, when I say abstraction layer, that means two different components, when they work together, they work independently. They don't have to know what the other one is doing.

    5. LF

      Hmm. So, that decoupling.

    6. JK

      Yeah. So, the, the famous one was, uh, the network stack. Like, there's a seven-layer network stack.

    7. LF

      Yep.

    8. JK

      You know, data transport and protocol and all the layers. And the innovation was, when they really got that right, 'cause networks before that didn't define those very well, the layers could innovate independently, and occasionally the layer boundary would, would, you know, the interface would be upgraded. And that, that let, you know, the, the design space breathe.

    9. LF

      Mm-hmm.

    10. JK

      And pe- you could do something new in layer seven without having to worry about how layer four worked.

    11. LF

      Right.

    12. JK

      And so good design does that. And you see it in processor designs. When we did, um, the Zen design at AMD, we made several components very modular. And, you know, my insistence at the top was I wanted all the interfaces defined before we wrote the RTL for the pieces. One of the verification leads said, "If we do this right, I can test the pieces so well independently, when we put it together, we won't find all these interaction bugs 'cause the floating point knows how the cache works." And I was a little skeptical, but he was mostly right, that the, the modularity of the design greatly improved the quality.
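
The Zen anecdote — interfaces pinned down before the pieces are written, so each piece can be verified alone — has a direct software analogue. Here is a minimal sketch; the names (CacheInterface, FPU, FakeCache) are invented for illustration, not AMD's actual design:

```python
from typing import Protocol

# Define the interface first, before any implementation exists.
class CacheInterface(Protocol):
    def read(self, addr: int) -> int: ...
    def write(self, addr: int, value: int) -> None: ...

# The floating-point unit is written and tested against the interface,
# never against a concrete cache, so it can't grow hidden dependencies
# on "how the cache works".
class FPU:
    def __init__(self, cache: CacheInterface):
        self.cache = cache

    def fused_add(self, a_addr: int, b_addr: int, out: int) -> None:
        self.cache.write(out, self.cache.read(a_addr) + self.cache.read(b_addr))

# For unit tests, a trivial stand-in satisfies the same interface.
class FakeCache:
    def __init__(self):
        self.mem = {}
    def read(self, addr):
        return self.mem.get(addr, 0)
    def write(self, addr, value):
        self.mem[addr] = value

fpu = FPU(FakeCache())
fpu.cache.write(0, 2); fpu.cache.write(1, 3)
fpu.fused_add(0, 1, 2)
assert fpu.cache.read(2) == 5  # the piece is testable in isolation
```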

    13. LF

      Is that universally true in general? Would you say about good designs, the modularity is, uh, like usually modular?

    14. JK

      Well, we talked about this before. Humans are only so smart, like, like-

    15. LF

      (laughs)

    16. JK

      ... and we're not getting any smarter, right? But the complexity of things is going up.

    17. LF

      Yeah.

    18. JK

      So, you know, a, a beautiful design can't be bigger than the person doing it. It's just, you know, their piece of it. Like, the odds of you doing a really beautiful design of something that's way too hard for you is low, right? If it's way too simple for you, it's not that interesting. It's like, "Well, anybody could do that." But when you get the right match of your, your expertise and, you know, mental power to the right design size, that's cool, but that's not big enough to make a meaningful impact on the world. So now, you have to have some framework to design the pieces-

    19. LF

      Yes.

    20. JK

      ... so that the whole thing is big and harmonious, but, you know, when you put it together, it's, you know, it's sufficiently interesting to, to be used and, you know. So that's what a good, beautiful design is.

    21. LF

      Matching the limits of that human cognitive capacity to, uh, to the modular you can create and creating a nice interface between those modules. And thereby, do you think there's a limit to the kind of beautiful complex systems we can build with this kind of modular design? It's like, uh, you know, if, if we build increasingly more complicated... You can think of, like, the internet. Okay, let's scale it down.

    22. JK

      Well-

    23. LF

      Like, you can think of, like, social network, like Twitter-

    24. JK

      Mm-hmm.

    25. LF

      ... as one computing system.

    26. JK

      Mm-hmm.

    27. LF

      And, but those are little modules.

    28. JK

      Yeah.

    29. LF

      Right? That's-

    30. JK

      But it's built on, it's built on so many components nobody at Twitter even understands.

  12. 42:52 - 49:50

    Moore's law

    1. JK

    2. LF

      Well, let's go... Let's talk about Moore's Law a little bit. It's, uh-

    3. JK

      Mm-hmm.

    4. LF

      Uh, the broad view of Moore's Law was just exponential improvement of, uh, computing capability. Uh, like, OpenAI, for example, recently, uh, published this kind of... papers looking at the exponential improvement in the training efficiency of neural networks.

    5. JK

      Mm-hmm.

    6. LF

      For, like, ImageNet and all that kind of stuff, we just got better on this... And this is purely software side-

    7. JK

      Mm-hmm.

    8. LF

      ... just figuring out better tricks and algorithms for training neural networks, and that seems to be improving, uh, significantly faster than the Moore's Law prediction, you know?

    9. JK

      Mm-hmm.

    10. LF

      So that's in the software space. Like, what do you think... If Moore's Law continues or if the general version of Moore's Law continues, do you think that comes mostly from the hardware, from the software, some mix of the two? Some interesting totally, uh... So not, not the reduction of the size of the transistor kind of thing, but more in the, uh, uh, in the totally interesting kinds of innovations-

    11. JK

      Mm-hmm.

    12. LF

      ... in the hardware space, all that kind of stuff?

    13. JK

      Well, there's, like, a half a dozen things going on in that graph. So one is, there's initial innovations that had a lot of headroom to be exploited. So, you know, the efficiency of the networks has improved dramatically. And then the decomposability of those and the, the, the use go... You know, they started running on one computer, then multiple computers, then multiple GPUs, and then arrays of GPUs, and they're up to thousands. And at some point... So, so it's sort of like they were consume- they were going from, like, a single computer application to a thousand computer application. So that's not really a Moore's Law thing, that's an independent vector. How many computers can I put on this problem?

    14. LF

      Yeah.

    15. JK

      'Cause the computers themselves are getting better on, like, a Moore's Law rate, but their ability to go from 1 to 10 to 100 to 1,000-

    16. LF

      Yeah.

    17. JK

      ... you know, was something. And then multiplied by, you know, the amount of computes it took to resolve like AlexNet to ResNet to transformers. It's, it's been quite, you know, steady improvements.

    18. LF

      But those are like S-curves, aren't they?

    19. JK

      Yeah.

    20. LF

      That's exactly the kind of-

    21. JK

      Yeah.

    22. LF

      ... S-curves that are underlying Moore's Law from the very beginning.

    23. JK

      Yeah, so-

    24. LF

      So what, what's the biggest... What's the most, uh, productive, uh, rich source of S-curves in the, in the future do you think? Is this hardware? Is it software? Or is it's-

    25. JK

      So hardware is gonna move along relatively slowly, like, you know, double performance every two years.

    26. LF

      (laughs)

    27. JK

      There, there's still-

    28. LF

      I like how you call that slow.

    29. JK

      Yeah, it's the slow version. The snail's pace of Moore's Law. Maybe we should, we should, uh-

    30. LF

      (laughs)

  13. 49:50 - 56:44

    Hardware for deep learning

    1. JK

    2. LF

      But, but speaking about this, uh...

    3. JK

      Yeah.

    4. LF

      ... uh, this walk along the path of innovation towards, uh, the dumb things being smarter than humans, you are now-

    5. JK

      Mm-hmm.

    6. LF

      ... the CTO of (laughs) of, uh, Tenstorrent.

    7. JK

      Mm-hmm.

    8. LF

      Two- as of two months ago. They, uh, build hardware for deep learning.

    9. JK

      Mm-hmm.

    10. LF

      Uh, how do you build scalable and efficient deep learning? This is such a fascinating space.

    11. JK

      Yeah, yeah. So it's interesting. So, um, up until recently, I thought there was two kinds of computers. There are serial computers that run like C programs, and then there's parallel computers. So the way I think about it is, you know, parallel computers y- have given parallelism. Like, GPUs are great because you have a million pixels.

    12. LF

      Mm-hmm.

    13. JK

      And modern GPUs run a program on every pixel. They call it a shader program, right? So, or, like, finite element analysis. You, you build something, you know, you make this into little tiny chunks, you give each chunk to a computer, so you're given all these chunks, you have parallelism like that. But most C programs, you write this linear narrative, and you have to make it go fast. To make it go fast, you predict all the branches, all the data fetches, and you run that more in parallel, but that's found parallelism.

    14. LF

      Mm-hmm.

    15. JK

      Um, AI is... I'm still trying to decide how fundamental this is. It's a given parallelism problem.

    16. LF

      Mm-hmm.

    17. JK

      But the way people describe the neural networks and then how they write them in PyTorch, it makes graphs.

    18. LF

      Yeah. That might be fundamentally different than the GPU kind of-

    19. JK

      Parallelism? Yeah, it might be. Because the, when you run the GPU program on all the pixels, you're running like, you know, depends, you know, this group of pixels, say it's background blue and it runs a really simple program. This pixel is, you know, some patch of your face, so you have some really interesting shader program to give you impression of translucency, but the pixels themselves don't talk to each other. There's no graph, right? So you, you do the image and then you do the next image and you do the next image and you run eight million pixels, eight million programs every time, and modern GPUs have like 6,000-

    20. LF

      Mm-hmm.

    21. JK

      ... thread engines in 'em. So, you know, they got eight million pixels. Each one runs a program on, you know, 10 or 20 pixels, and that's how, uh, th- that's how they work. There's no graph.

    22. LF

      But you think graph might be a totally, uh, new way to think about hardware?

    23. JK

      So, Raja Koduri and I have been having this good conversation about given versus found parallelism, and then the kind of walk as we got more transistors, like, you know, computers way back when did stuff on scalar data, then we did it on vector data, famous vector machines. Now we're making computers that operate on matrices, right? And then the, the ca- the category we, we said that was next was spatial. Like, imagine you have so much data that, you know, you want to do the compute on this data, and then when it's done, it says send the result of this pile of data, run some software on that.

    24. LF

      Mm-hmm.

    25. JK

      And it's better to, to think about it spatially than to move all the data to a central processor and do all the work.
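
The scalar-to-vector-to-matrix walk in miniature, with NumPy standing in for the hardware's vector and matrix units; each step performs the same arithmetic but hands the machine a bigger, more regular unit of work:

```python
import numpy as np

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

# Scalar era: one multiply-add per step.
s = 0.0
for x, y in zip(a[0], b[0]):
    s += x * y

# Vector era: one operation sweeps a whole row (a dot product).
s_vec = a[0] @ b[0]

# Matrix era: one operation is an entire matrix multiply.
c = a @ b

assert np.isclose(s, s_vec) and np.isclose(c[0, 0], a[0] @ b[:, 0])
```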

    26. LF

      So spatially, you mean moving in the space of data as opposed to moving the data?

    27. JK

      Yeah. You know, you have a, you have a petabyte data space spread across some huge array of computers, and when you do a computation somewhere, you send the result of that computation or maybe a pointer to the next program to some other piece of data and do it. But I think the... a better word might be graph, and all the AI neural networks are graphs. Do some computations, send the result here, do another computation, do a data transformation, do a merging, do a pooling, do another computation.
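
As code, "do some computations, send the result here" is a dataflow graph. The toy executor below is a conceptual sketch of graph execution in general, not Tenstorrent's scheduler; the node names and ops are invented:

```python
import numpy as np

# Each node: a compute function plus the names of its input nodes.
graph = {
    "x":      (lambda: np.ones((4, 4)), []),
    "w":      (lambda: np.eye(4) * 2.0, []),
    "matmul": (lambda x, w: x @ w,          ["x", "w"]),
    "relu":   (lambda m: np.maximum(m, 0),  ["matmul"]),
    "pool":   (lambda r: r.mean(axis=0),    ["relu"]),
}

def run(graph, out):
    """Execute nodes on demand; each result 'flows' to whoever needs it."""
    cache = {}
    def value(name):
        if name not in cache:
            fn, deps = graph[name]
            cache[name] = fn(*[value(d) for d in deps])
        return cache[name]
    return value(out)

print(run(graph, "pool"))  # [2. 2. 2. 2.]
```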

    28. LF

      Is it possible to compress and say how we make this thing efficient, this whole process efficient? There's different...

    29. JK

      So first, uh, the fundamental elements in the graphs are things like matrix multiplies, convolutions-

    30. LF

      Okay.

  14. 56:44 - 1:04:22

    Making neural networks fast at scale

    1. JK

      working on now.

    2. LF

      So, uh, the... I think it's called the Grayskull processor-

    3. JK

      Mm-hmm.

    4. LF

      ... uh, introduced last year. It's, uh, you know, there's a bunch of measures of performance, we were talking about-

    5. JK

      Mm-hmm.

    6. LF

      ... horses.

    7. JK

      Mm-hmm.

    8. LF

      It, uh, does 368 trillion operations per second, and seems to out- outperform NVIDIA's Tesla T4 system.

    9. JK

      Mm-hmm.

    10. LF

      So these are just numbers.

    11. JK

      Mm-hmm.

    12. LF

      What do they actually mean in real world perform- like what are the metrics for you that you're chasing? In, in your horse race, like what do you care about?

    13. JK

      Well, first the... So the, the native language of, you know, people who write AI network programs is PyTorch now, PyTorch, TensorFlow, there's a couple others. So-

    14. LF

      Do you think PyTorch has won over TensorFlow, or is it just a-

    15. JK

      I'm not an expert on that.

    16. LF

      Okay.

    17. JK

      I, I know many people who have switched from TensorFlow to PyTorch.

    18. LF

      Yeah.

    19. JK

      And there's technical reasons for it, and openness-

    20. LF

      I use both. Both are still awesome.

    21. JK

      Both are still awesome.

    22. LF

      But the deepest love is for PyTorch currently.

    23. JK

      Yeah. There, there's more love for that. And that, that may change. So the first thing is when they write their programs, can the hardware execute it pretty much as it was written?

    24. LF

      Mm-hmm.

    25. JK

      Right? So PyTorch turns into a graph, we have a graph compiler that makes that graph, then, like, it fractions the graph down, so if you have big matrix multiply, we turn it into right size chunks to run on the processing elements. It hooks all the graph up, it lays out all the data. There's a couple of mid-level in- representations of it that are also simulatable so that if you're writing the code you can see how it's gonna go through the machine, which is pretty cool. And then at the bottom it schedules kernels, like m- math, data manipulation, data movement kernels, which do this stuff. So we don't have to run, write a little program to do matrix multiply, 'cause we have a big matrix multiplier. Like there's no SIMD program for that. But, uh, there is scheduling for that, right? So the, the... One of the goals is, if you write a piece of PyTorch code that looks pretty reasonable, you should be able to compile it and run it on the hardware without having to tweak it and, and do all kinds of crazy things to get performance.
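
A toy version of the "fractions the graph down" step: one big matrix multiply split into tile-sized sub-multiplies that could each be handed to a processing element. The tile size is an arbitrary assumption, and the real compiler's partitioning and scheduling are far more involved; this is just the shape of the idea:

```python
import numpy as np

TILE = 64  # assumed processing-element-friendly chunk size

def tiled_matmul(A, B, tile=TILE):
    """Compute A @ B as a grid of tile-sized sub-multiplies ("kernels")."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each of these is a right-sized chunk a single processing
                # element could run; a scheduler orders them all.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(tiled_matmul(A, B), A @ B)
```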

    26. LF

      There's not a lot of intermediate steps.

    27. JK

      Right.

    28. LF

      It's running directly as written.

    29. JK

      Like on a GPU, if you write a large matrix multiply naively, you'll get 5 to 10% of the peak performance of the GPU.

    30. LF

      Hmm.
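
The "5 to 10% of peak" point, in exaggerated miniature: the same matrix multiply written naively versus routed through a tuned kernel. Python overstates the gap wildly, but the shape of the problem is the same — code "as written" leaves most of the machine idle unless something reorganizes the work:

```python
import time
import numpy as np

n = 128
A, B = np.random.rand(n, n), np.random.rand(n, n)

def naive_matmul(A, B):
    """One scalar multiply-add at a time, exactly as the math is written."""
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

t0 = time.perf_counter(); C1 = naive_matmul(A, B)
t1 = time.perf_counter(); C2 = A @ B  # tuned BLAS kernel underneath
t2 = time.perf_counter()
assert np.allclose(C1, C2)
print(f"naive: {t1 - t0:.2f}s   tuned: {t2 - t1:.5f}s")
```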

  15. 1:04:22 - 1:08:36

    Andrej Karpathy and Chris Lattner

    1. LF

      Good. I love the idea of you inside a room with, uh, Karpathy, Andrej Karpathy and Chris Lattner.

    2. JK

      Mm-hmm.

    3. LF

      Uh, v- very, um, very interesting, very brilliant people, very out of the box thinkers-

    4. JK

      Mm-hmm.

    5. LF

      ... but also, like, first principles thinkers.

    6. JK

      Well, they both get stuff done. They not only get their own projects done, they, they talk about it clearly, they educate large numbers of people, and they've created platforms for other people to go do their stuff on.

    7. LF

      Yeah, the, the clear thinking that's able to be communicated-

    8. JK

      Yeah.

    9. LF

      ... is kind of i- impressive.

    10. JK

      It's kind of remarkable, the... Yeah, I'm a fan.

    11. LF

      Well, l- l- let me ask, 'cause, um, I, I talk to Chris actually a lot these days.

    12. JK

      Mm-hmm.

    13. LF

      He's been, uh... One, one of the c- just to give him a shout out in, in this-

    14. JK

      Mm-hmm.

    15. LF

      He's been so supportive as a human being. So everybody's quite different, like great engineers are different, but he's been, like, sensitive to the human element-

    16. JK

      Mm-hmm.

    17. LF

      ... in a way that's been fascinating. Like, he was one of the early people on, on this stupid podcast that I do to say, like-

    18. JK

      Yeah.

    19. LF

      ... "Don't quit this thing."

    20. JK

      Mm-hmm.

    21. LF

      And also, "Talk to whoever the hell you want to talk to."

    22. JK

      Mm-hmm.

    23. LF

      That kind of, from a legit engineer, to get, like, props-

    24. JK

      Mm-hmm.

    25. LF

      ... and be like, "You can do this."

    26. JK

      Mm-hmm.

    27. LF

      That was, I mean, that's what-

    28. JK

      That's good.

    29. LF

      ... a good leader does, right? Is they just kinda-

    30. JK

      Mm-hmm.

  16. 1:08:36 - 1:12:43

    How GPUs work

    1. JK

    2. LF

      On the, either the TPU or maybe the NVIDIA GPU side, how does Tenstorrent, you think, or the ideas underlying it... It doesn't have to be Tenstorrent, just this kind of graph-focused, uh, graph-centric hardware, deep learning-centric hardware beat NVIDIA's? Do, do you think it's possible for it to basically overtake NVIDIA?

    3. JK

      Sure.

    4. LF

      What's, what's that process look like? What's that, uh, journey look like, you think?

    5. JK

      Well, GPUs were built to run shader programs on millions of pixels, not to run graphs.

    6. LF

      Yes.

    7. JK

      So there's a hypothesis that says the way the graphs, you know, are built, it's going to be really, uh, inefficient to compute them on that. And then the, the primitive is not a simple program, it's matrix multiply, convolution. And then the data manipulations are, are fairly extensive about... Like how do you do a fast transpose with a program? I don't know if you've ever written a transpose program. They're ugly and slow, but in hardware you can do really well. Like, I'll give you an example. So when GPU accelerators first started doing triangles, like if you have a triangle which maps onto a set of pixels.

    8. LF

      Mm-hmm.

    9. JK

      So you build... It's very easy, straightforward to build a hardware engine that'll find all those pixels.

    10. LF

      Mm-hmm.

    11. JK

      And it's kind of weird because you walk along the triangle till you get to the edge, and then you have to go back down to the next row and walk along, and then you have to decide on the edge if the line of the triangle is like half on the pixel.

    12. LF

      Mm-hmm.

    13. JK

      What's the pixel color? Because it's half of this pixel and half the next one. That's called rasterization.

    14. LF

      Y- 'Cause... Y- you're saying that can be done in, uh, in hardware?

    15. JK

      No, I'm just... That's an example of an operation that, as a software program, is really bad. I've written a program that did rasterization. The hardware that does it has actually less code than the software program that does it, and it's way faster, right? So there are certain times when the abstraction you have, rasterize a triangle-

    16. LF

      Mm-hmm.

    17. JK

      ... you know, execute a graph, you know, components of a graph, the, the right thing to do in the hardware-software boundary is for the hardware to naturally do it.
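
For concreteness, here is a minimal software rasterizer of the kind being described: test every pixel center in the triangle's bounding box against the three edges and emit the covered ones. The half-covered-pixel question raised above is exactly the sample-at-the-center decision below; real hardware does much finer coverage and blending. A sketch, not production code:

```python
def rasterize(v0, v1, v2):
    """Yield integer pixel coordinates covered by a 2D triangle (edge-function test)."""
    xs = [v0[0], v1[0], v2[0]]
    ys = [v0[1], v1[1], v2[1]]

    def edge(a, b, p):  # signed area: which side of edge a->b is point p on?
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    if edge(v0, v1, v2) == 0:  # degenerate triangle covers nothing
        return
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Inside if all three edge tests agree with the triangle's winding.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield x, y

pixels = list(rasterize((0, 0), (8, 0), (0, 8)))
print(len(pixels), "pixels covered")
```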

    18. LF

      So the GPU is really optimized for the rasterization of triangles? (laughs)

    19. JK

      Well, no, that's just... Well, like in a modern, you know... That's a small piece of modern GPUs.

    20. LF

      Mm-hmm.

    21. JK

      What they did is... That... They still rasterize triangles when you're running the game, but for the most part, most of the computation area of the GPU is running shader programs, but they're single threaded programs on pixels, not graphs.

    22. LF

      I have to be honest and say I don't actually know the, the math behind shader, uh, sh- shading and lighting and all that kind of stuff. I don't know what...

    23. JK

      They look like little simple floating point programs, or complicated ones. You can have 8,000 instructions in a shader program.

    24. LF

      But I, I don't have a good intuition why it could be parallelized so easily.

    25. JK

      No, it's because you have eight million pixels in every single... So when you have a light, right?

    26. LF

      Yeah.

    27. JK

      That comes down, the angle... You know, the amount of light... Like, like say this is a line of pixels across this table, right? The amount of light on each pixel is subtly different, right?

    28. LF

      And each pixel is responsible for figuring out what

    29. JK

      ... figuring out. So that pixel says, "I'm this pixel. I know the angle of the light, I know the occlusion, I know the color I am."

    30. LF

      Mm-hmm.
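
The per-pixel independence is the whole trick: the same small program runs at every pixel, and none of them reads its neighbors, which is why millions of copies can run at once. A sketch of the simplest version of that light calculation (diffuse/Lambertian shading; all the per-pixel inputs are random placeholders):

```python
import numpy as np

H, W = 720, 1280
# Per-pixel inputs: surface normal, base color, occlusion — placeholders here.
normals = np.random.rand(H, W, 3) - 0.5
normals /= np.linalg.norm(normals, axis=2, keepdims=True)
albedo = np.random.rand(H, W, 3)
occlusion = np.random.rand(H, W, 1)
light_dir = np.array([0.0, 0.7071, 0.7071])  # one light shared by the frame

# "I'm this pixel. I know the angle of the light, I know the occlusion,
# I know the color I am." Every pixel computes its own answer independently.
ndotl = np.clip(normals @ light_dir, 0.0, None)[..., None]
image = albedo * ndotl * occlusion  # shape (H, W, 3)
```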

  17. 1:12:43 - 1:17:23

    Tesla Autopilot, NVIDIA, and Mobileye

    1. JK

      And, and NVIDIA invested for years in CUDA, first for HPC and then they got lucky with the AI trend.

    2. LF

      But do you think they're going to essentially not be able to hardcore pivot out of their...

    3. JK

      We'll see. That's always interesting. How often do big companies hardcore pivot? Occasionally.

    4. LF

      How much do you know about the NVIDIA folks?

    5. JK

      Some.

    6. LF

      Some?

    7. JK

      Yeah.

    8. LF

      Well, it's, um, I'm, I'm curious as well, who's ultimately... Is, uh-

    9. JK

      Oh, they've, they've innovated several times, but they've also worked really hard on mobile, they worked really hard on radios, you know. You know, they're fundamentally a GPU company.

    10. LF

      Well, they tried to pivot. There's an in- interesting little, uh, game and play in autonomous vehicles, right, with, or, uh, semi-autonomous, like playing with Tesla and so on and seeing that's a, dipping a toe into that kind of pivot.

    11. JK

      They came out with this platform, which was interesting technically.

    12. LF

      Yeah.

    13. JK

      But it was like a 3,000-watt, you know, it was 1,000-watt, three- $3,000, you know, GPU platform.

    14. LF

      I don't know if it's interesting technically. It's interesting philosophically. I, I, technically, I don't know if it's the execution, the craftsmanship was there. I'm not sure. But I, I didn't get a sense-

    15. JK

      I think they were repurposing GPUs for an automotive solution.

    16. LF

      Right, it's not a real pivot.

    17. JK

      They didn't, they didn't build a ground-up solution.

    18. LF

      Right.

    19. JK

      Like the, the, like the chips inside Tesla are pretty cheap, like Mobileye's been doing this. They're, they're doing the classic work from the simplest thing.

    20. LF

      Yeah.

    21. JK

      You know, they were building 40-, uh, square-millimeter chips. And NVIDIA, their solution had two 800-square-millimeter chips and two 200-square-millimeter chips, and you know, like boatloads of really expensive DRAMs. And, and, you know, it's a really different approach.

    22. LF

      And-

    23. JK

      So Mobileye fit the, let's say, automotive cost and form factor, and then they added features as it was economically viable and NVIDIA said, "Take the biggest thing, and l- we're gonna go make it work," you know. And, and that's also influenced, like Waymo, there's a whole bunch of autonomous startups where they have a 5,000-watt server in their trunk.

    24. LF

      Mm-hmm.

    25. JK

      Right? And, but that's, that's 'cause they think, "Well, 5,000 watts and, you know, $10,000 is okay, 'cause it's replacing a driver." Elon's approach was, "That board has to be cheap enough to put it in every single Tesla, whether they turn on a- autonomous driving or not." Which, and Mobileye was like, "We need to fit in the BOM and, you know, cost structure that car companies do," so they may sell you a GPS for 1,500 bucks, but the BOM for that's like $25.

    26. LF

      Well, and, uh, for Mobileye, it seems like neural networks were not first-class citizens, like the computation. They didn't start out as a-

    27. JK

      Yeah, it was a CV problem, you know, they-

    28. LF

      Yeah. And-

    29. JK

      ... they did classic CV and found stoplights and lines, and they were really good at it.

    30. LF

      Yeah, and they never, I mean, I don't know what's happening now, but they never fully pivoted. I mean, it's like, it's the NVIDIA thing. And then, as opposed to-

  18. 1:17:23 - 1:23:43

    Andrej Karpathy and Software 2.0

    1. LF

      Well, the one really important thing is also what they're doing well is how to iterate that quickly, which means like it's not just about one-time deployment, one building. It's constantly iterating the network and trying to automate as many steps as possible, right?

    2. JK

      Yeah.

    3. LF

      And that's actually the principles of the Software 2.0, like you mentioned with Andrej, is, uh, it's not just... I mean, I don't know what the actual, his description of Software 2.0 is, if it's just high-level philosophical or there are specifics, but the interesting thing about what that actually looks like in the real world is that, uh, what I think Andrej calls a data engine. It's like, it's the iterative improvement of the thing.

    4. JK

      Mm-hmm. Yeah.

    5. LF

      You have a neural network that, uh, does stuff, fails on a bunch of things, and learns from it over and over and over. So you're constantly discovering edge cases.

    6. JK

      Mm-hmm.

    7. LF

      So it's very much about, uh, like data engineering, like figuring out... It's, it's, it's kind of what you were talking about with Tenstorrent, is you have the data landscape, and you have to walk along that data landscape in a way that, uh, that's constantly improving the, the, the neural network, and that, that feels like that's the central piece that they can't solve.

    8. JK

      Yeah, so, and there's two pieces of it. Like, you, you find edge cases that don't work, and then you define something that goes get your data for that.

    9. LF

      Mm-hmm.

    10. JK

      But then the other constraint is whether you have to label it or not. Like the, the, the amazing thing about like the GPT-3 stuff is it's unsupervised.
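
A schematic of the "data engine" loop described above. Every function here is a stub invented for illustration — none of this is a real Tesla or Autopilot API; the shape of the loop is the point:

```python
# Placeholder stubs standing in for real machinery.
def deploy(model): pass                               # run the network in the field
def mine_failures(model): return ["hard case"]        # find inputs where it fails
def collect_similar(cases): return cases * 10         # go get more data like that
def needs_labels(data): return True                   # the labeling constraint
def label(data): return [(x, "label") for x in data]  # expensive: humans in the loop
def retrain(model, data): return model + len(data)

def data_engine(model, rounds=3):
    for _ in range(rounds):
        deploy(model)
        edge_cases = mine_failures(model)   # "you find edge cases that don't work"
        data = collect_similar(edge_cases)  # "...something that goes get your data for that"
        if needs_labels(data):              # unsupervised data (à la GPT-3) skips this
            data = label(data)
        model = retrain(model, data)        # fold it back in, redeploy, repeat
    return model

data_engine(model=0)
```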

Episode duration: 2:39:14
