Lex Fridman Podcast

David Patterson: Computer Architecture and Data Storage | Lex Fridman Podcast #104

David Patterson is a Turing Award winner and professor of computer science at Berkeley. He is known for pioneering contributions to RISC processor architecture used by 99% of new chips today and for co-creating RAID storage. The impact that these two lines of research and development have had on our world is immeasurable. He is also one of the great educators of computer science in the world. His book with John Hennessy, "Computer Architecture: A Quantitative Approach," is how I first learned about and was humbled by the inner workings of machines at the lowest level.

OUTLINE:
0:00 - Introduction
3:28 - How have computers changed?
4:22 - What's inside a computer?
10:02 - Layers of abstraction
13:05 - RISC vs CISC computer architectures
28:18 - Designing a good instruction set is an art
31:46 - Measures of performance
36:02 - RISC instruction set
39:39 - RISC-V open standard instruction set architecture
51:12 - Why do ARM implementations vary?
52:57 - Simple is beautiful in instruction set design
58:09 - How machine learning changed computers
1:08:18 - Machine learning benchmarks
1:16:30 - Quantum computing
1:19:41 - Moore's law
1:28:22 - RAID data storage
1:36:53 - Teaching
1:40:59 - Wrestling
1:45:26 - Meaning of life

Lex Fridman (host), David Patterson (guest)
Jun 27, 2020 · 1h 49m

EVERY SPOKEN WORD

  1. 0:00–3:28

    Introduction

    1. LF

      The following is a conversation with David Patterson, Turing Award winner and professor of computer science at Berkeley. He's known for pioneering contributions to RISC processor architecture used by 99% of new chips today, and for co-creating RAID storage. The impact that these two lines of research and development have had on our world is immeasurable. He's also one of the great educators of computer science in the world. His book with John Hennessy is how I first learned about and was humbled by the inner workings of machines at the lowest level. Quick summary of the ads. Two sponsors, The Jordan Harbinger Show and Cash App. Please consider supporting the podcast by going to jordanharbinger.com/lex and downloading Cash App and using code LexPodcast. Click on the links, buy the stuff. It's the best way to support this podcast, and in general, the journey I'm on in my research and startup. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter @LexFridman, spelled without the E, just F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now and never any ads in the middle that could break the flow of the conversation. This episode is supported by The Jordan Harbinger Show. Go to jordanharbinger.com/lex. It's how he knows I sent you. On that page, there's links to subscribe to it on Apple Podcasts, Spotify, and everywhere else. I've been binging on his podcast. It's amazing. Jordan is a great human being. He gets the best out of his guests, dives deep, calls them out when it's needed, and makes the whole thing fun to listen to. He's interviewed Kobe Bryant, Mark Cuban, Neil deGrasse Tyson, Garry Kasparov, and many more. I recently listened to his conversation with Frank Abagnale, author of Catch Me If You Can and one of the world's most famous con men. Perfect podcast length and topic for a recent long-distance run that I did. Again, go to jordanharbinger.com/lex to give him my love and to support this podcast. Uh, subscribe also on Apple Podcasts, Spotify, and everywhere else. This show is presented by Cash App, the greatest sponsor of this podcast ever and the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context of the history of money is fascinating. I recommend The Ascent of Money as a great book on this history. Also, the audiobook is amazing. Debits and credits on ledgers started around 30,000 years ago, the US dollar was created over 200 years ago, and the first decentralized cryptocurrency was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to and just might redefine the nature of money. So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with David Patterson.

  2. 3:28–4:22

    How have computers changed?

    1. LF

      Let's start with the big historical question. How have computers changed in the past 50 years at both the fundamental architectural level and in general in your eyes?

    2. DP

      Well, the biggest thing that happened was the invention of the microprocessor. So, computers that used to fill up several rooms could fit inside your cell phone. And, uh, not only a- and not only did (laughs) they get smaller, they got a lot faster. So they're a m- million times faster than they were, uh, 50 years ago, and they're much cheaper and they're ubiquitous. Uh, you know, it, th- I don't, there's 7.8 billion people on this planet. Probably half of them have cell phones by now, which is, uh, remarkable.

    3. LF

      There's probably more microprocessors than there are people.

    4. DP

      Sure. I don't know what the ratio is, but I'm sure it's above one. Uh, maybe it's 10 to one or some number like that.

  3. 4:22–10:02

    What's inside a computer?

    1. LF

      What is a microprocessor?

    2. DP

      So, uh, a way to say what a microprocessor is, is to tell you what's inside a computer. So, a computer forever has classically had five pieces. There's input and output, which kind of naturally as you'd expect is... Input is like speech or typing and output is displays. Um, there's a memory. And like the name sounds, it- it remembers things. Uh, so it's, uh, integrated circuits whose job is you put information in and when you ask for it, it comes back out. That's memory. And then the third part is the processor, uh, where the term microprocessor comes from. And that has two pieces as well, and that is the control, which is kind of the brain of the processor, and the, um, the, what's called the arithmetic unit is kind of the brawn of the computer. So if you think of the, as a human body, the arithmetic unit, the thing that does the number crunching is the, is the body and the control is the brain. So those five pieces, input, output, memory, uh, arithmetic unit, and control are, have been in computers since the very dawn, and the- the last two are considered the processor. So a microprocessor simply means a processor that fits on a microchip, and that was invented about, you know, 40 years ago, uh, was the first microprocessor.

    3. LF

      It's interesting that you refer to the arithmetic unit as, uh, like you connect it to the h- the- the body, and the control is the brain. So I guess, I never thought of it that way. That's a, that's a nice way to think of it, because the microprocessor does computation, it processes information, and most of what it does is basic arithme- arithmetic operations. What, what are the operations, by the way?

    4. DP

      It's a lot like a calculator, you know. So there are, um, add instructions, uh, subtract instructions, multiply and divide. And, uh, kind of the brilliance of the invention of the, of the micro- of the computer or the processor is that it performs very trivial operations, but it just performs billions of them per second. And, uh, what we're capable of doing is writing software that can take these very trivial instructions and have them create tasks that can do things better than human beings can do today.
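
      To make the scale concrete, here is a minimal C sketch (the operation names in the comments are generic, not any particular chip's mnemonics): a loop that sums an array compiles down to little more than loads, adds, compares, and conditional branches, and a modern processor executes billions of such steps per second.

        #include <stdio.h>

        /* Sum an array: each iteration is only a handful of trivial
           instructions: a load, an add, an increment, a compare-and-branch. */
        long sum(const int *a, int n) {
            long total = 0;
            for (int i = 0; i < n; i++) {  /* compare + conditional branch */
                total += a[i];             /* load + add */
            }
            return total;
        }

        int main(void) {
            int data[5] = {1, 2, 3, 4, 5};
            printf("%ld\n", sum(data, 5)); /* prints 15 */
            return 0;
        }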

    5. LF

      Just looking back, uh, through your career, did you anticipate the kind of, how good we would be able to get at doing these small basic operations? Like what, uh, th- like how many surprises along the way where you just kinda sat back and sh- said, "Uh, wow, that, I didn't expect it to go this fast, this good?"

    6. DP

      Well, the, the fundamental driving force is, uh, what's called Moore's Law, which was named after, uh, Gordon Moore, who's a Berkeley alumnus (laughs) . And he made this observation very early in what are called semiconductors, and semiconductors are these ideas you can build these very simple switches and you can put them on these microchips. And he made this observation over 50 years ago. He looked at a few years and said, "I think what's going to happen is the number of these little switches, called transistors, is going to double every year for the next decade." And he said this in 1965. And in 1975, he said, "Well, maybe it's gonna double every two years." And that, what other people since named that Moore's Law, guided the industry. And when Gordon Moore (laughs) made that prediction, he, he, um, wrote a paper back in, I think, in the, in the '70s and said, not only did s- this gonna happen, he wrote what would be the implications of that. And in this article from 1965, he sh- he shows ideas like, uh, computers being in cars and computers being in, uh, uh, something that you would buy in the grocery store and stuff like that. So he kind of not only called his shot, he called the implications of it. So if you were in com- in the computing field and if you believed Moore's prediction, he kind of said what the, what would be happening in the future. So, uh, so it's not kind of, um... It's, at one sense, this is what was predicted. And you could imagine, it was easy to believe that Moore's Law was gonna continue, and so this would be the implications. Uh, on the other side, there are these kind of shocking events, uh, in your life. Like I remember, uh, uh, driving in Marin, across the bay f- in San Francisco, and seeing a, a bulletin board at a local, uh, civic center, and it had a URL on it.

    7. LF

      (laughs)

    8. DP

      (laughs) And it was like, eh, for all, for all the, for, for the people at the time, these first URLs, and that's the, you know, the www-dot stuff, uh, with the HTTP, people thought it looked like alien, uh, uh, alien, uh, writing, right? They'd, you'd see these advertisements and commercials on bulletin boards that had this alien writing on it. So for the laypeople, it's like, "What the hell is going on here?" And for those people in the industry, it's, "Oh my God (laughs) , uh, this stuff is getting so popular, it's actually leaking out of our nerdy world and into the real world." Uh, so that, I mean, there is events like that. I think another one was, I remember with the, when the early days of the personal computer, when we started seeing advertisements in magazines for personal computers. Like it's so popular that it's made the newspapers. So at, at one hand s- you know, Gordon Moore predicted it, and you kind of expected it to happen. But when it really hit and you saw it affecting society, it was, uh, it was, uh, s- shocking.

  4. 10:02–13:05

    Layers of abstraction

    2. LF

      So maybe, uh, taking a step back and looking at both the engineering and philosophical perspective, what, what do you th- see as the layers of abstraction in a computer? Do you see a computer as a, a set of layers of abstractions?

    3. DP

      Yeah, and I think that's one of the things that, uh, computer science, um, uh, fundamentals, is the, these things are really complicated, and the way we cope with, uh, complicated software and complicated hardware is these layers of abstraction. And that simply means that we, uh, you know, suspend disbelief and pretend, uh, that the only thing you know is that layer, and you don't know anything about the layer below it, and that's the way we can make very complicated things. And, uh, uh, probably it started with hardware, that that's the way it was done. Uh, but it's been ex- proven extremely useful. And, you know, I would think that in a modern computer today, there might be 10 or 20 layers of abstraction. And they're all trying to kind of enforce this contract is, all you know is this interface. There's a set of, um, commands that you are allowed to use, and you stick to those commands, and we will faithfully execute that. And it's like peeling the layers of, uh, of an onion. You get down, there's a new set of layers and so forth. So for, uh, people who wanna study computer science, the exciting part about it is you can, uh, keep peeling those layers. You, you, you take your first course, and you might learn to program in Python. And then you can take a follow-on course and you can get it down to a lower level language like C. And, you know, y- and you can go and then you can, if you want to, you can start getting into the hardware layers. And you keep getting down all the way to that transistor that I talked about, that, uh, Gordon Moore predicted. And you can understand all those layers all the way up to the highest level application software. So it's, um, it's a very, um, kind of magnetic field. Uh, if you're interested, you can go into any depth and keep going. In particular, what's happening right now, or it's happened, uh, in software the last 20 years and recently in hardware, there's getting to be open source versions of all of these things. And so what open source means is, what the, the engineer, the programmer designs is not secret, uh, belonging to a company. It's out there on the World Wide Web, so you can see it. So you can look at, uh, for lots of pieces of software that you use, you can see exactly what the programmer does if you wanna g- get involved. That used to stop at the hardware. Recently, (smacks lips) there's been an effort to make, uh, open source hardware and those interfaces open, so you can see that. So instead of before, you had to stop at the hardware, you can now start going layer by layer below that and see what's inside there. So it's, it's a remarkable time that the interested individual can really see in great depth what's really going on in the computers that power everything, uh, that we see around us.
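
      As a tiny sketch of that peeling, the same one-line computation can be viewed at several layers (the Python line and the instruction names in the comments are illustrative stand-ins, not actual compiler output):

        /* Layer N   (Python):        c = a + b
           Layer N-1 (C, below):      c = a + b;
           Layer N-2 (instructions):  load a; load b; add; store c
           Below that: gates, and finally Gordon Moore's transistors. */
        int add_two(int a, int b) {
            int c = a + b;   /* one C statement, a few machine instructions */
            return c;
        }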

  5. 13:05–28:18

    RISC vs CISC computer architectures

    2. LF

      Are you thinking also, when you say open source at the hardware level, is this going to the design architecture instruction set level-

    3. DP

      Mm-hmm.

    4. LF

      ... or is it going to literally the, the, you know, the manufacture of the, of the actual hardware, of the actual chips, whether that's ASICs specialized in a particular domain or the general?

    5. DP

      Yeah. So let, let, let's talk about that a little bit. So when you get down to the bottom layer of, uh, software, the way software talks to hardware is in a vocabulary. And what we call that vocabulary, we call that, uh... The words of that vocabulary are called instructions (clears throat) . And the technical term, uh, for the vocabulary is instruction set. So those instructions are like what we talked about earlier, they can be instructions like add, subtract, and multiply, divide. There's instructions to put data into memory, which is called a store instruction, and to get data back, which is called a load instruction. And those simple instructions, uh, go back to the very dawn of computing; in, you know, in 1950, the- the commercial computers had these instructions. So that's the instruction set that we're talking about. So up until I'd say 10 years ago, these instruction sets were all proprietary. So a very popular one is owned by Intel, uh, the one that's in the cloud and in all the PCs in the world, Intel owns that instruction set. It's referred to as the x86. There have been a sequence of ones; the first number was called 8086. And since then, there's been a lot of numbers, but they all end in 86. So there's been that, uh, kind of family of instruction sets.
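
      A toy sketch of "instruction set as vocabulary" (the five-word vocabulary and the encoding below are invented for illustration; real instruction sets are encoded as binary numbers): a program is a sequence of words from the vocabulary, and the processor's job is to execute them one by one.

        #include <stdio.h>

        /* A made-up five-word vocabulary: add, subtract, load, store, halt. */
        enum op { ADD, SUB, LOAD, STORE, HALT };
        struct insn { enum op op; int a, b, dst; };

        int main(void) {
            int mem[8] = {0}, reg[4] = {0};
            struct insn program[] = {
                { LOAD,  0, 0, 0 },   /* reg0 = mem[0]      */
                { ADD,   0, 1, 2 },   /* reg2 = reg0 + reg1 */
                { STORE, 2, 0, 1 },   /* mem[1] = reg2      */
                { HALT,  0, 0, 0 },
            };
            mem[0] = 7; reg[1] = 35;
            for (int pc = 0; program[pc].op != HALT; pc++) {
                struct insn i = program[pc];
                switch (i.op) {
                case ADD:   reg[i.dst] = reg[i.a] + reg[i.b]; break;
                case SUB:   reg[i.dst] = reg[i.a] - reg[i.b]; break;
                case LOAD:  reg[i.dst] = mem[i.a];            break;
                case STORE: mem[i.dst] = reg[i.a];            break;
                default:    break;
                }
            }
            printf("mem[1] = %d\n", mem[1]);  /* prints 42 */
            return 0;
        }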

    6. LF

      And that's proprietary.

    7. DP

      That's proprietary. The other one, uh, that's very popular is from ARM. That kind of powers all the ph- all the cell phones in the world, all the iPads in the world, and a lot of, uh, things that are so-called Internet of Things devices. Uh, ARM, and that one is also proprietary. ARM will license it to people for a fee, but they own that. So the new idea that got started at Berkeley, uh, kind of unintentionally 10 years ago, is, uh... Early in my career, we pioneered a way to do these vocabularies, instruction sets, that was very controversial at the time. At the time, in the 1980s, conventional wisdom was these, uh, vocabularies, instruction sets, should have, you know, powerful instructions. So polysyllabic kind of words, you can think of that. And, and so the... Instead of just add, subtract, and multiply, they would have polynomial divide or sort a list. And the hope was, of those powerful vocabularies, that'd make it easier for software. So we thought that didn't make sense for microprocessors. There were people at Berkeley and Stanford and IBM who argued the opposite, and we, what we called that was a reduced instruction set computer, and the abbreviation was RISC. And typical for computer people, we used the abbreviations to start pronouncing it, so RISC was the thing. So we said, for microprocessors, which, with Gordon Moore's law, were changing really fast, we think it's better to have a pretty simple, um, s- set of instructions, reduced set of instructions, uh, that that would be a better way to build microprocessors, just they're gonna be changing so fast due to Moore's law. And then we'll just use standard, um, uh, software to, uh, generate more of those simple instructions. And, uh, one of the pieces of software that's in the software stack, going between these layers of abstractions, is called the compiler, and it basically translates. It's a translator between levels. We said the translator will handle that. So the technical question was, um, well, since there are these reduced instructions, you have to k- execute more of them. Yeah, that's right. But maybe you could execute them faster. Yeah, that's right. They're simpler, so they go faster, but you have to do more of them. So what's, how, what's that trade-off look like? And it ended up that we ended up executing maybe 50% more instructions, maybe a third more instructions, but they ran four times faster. So, so these controversial RISC ideas proved to be maybe factors of three or four better. Um, yeah.
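
      A hedged illustration of that trade-off (the one-instruction "block copy" named in the comment is hypothetical; real CISC instructions such as x86's string moves differ in detail): a CISC machine might copy a region of memory with one powerful instruction, while a RISC compiler emits a short loop of simple instructions and relies on each one running faster.

        #include <stddef.h>

        /* CISC style (hypothetical): one powerful instruction does it all,
           e.g.  COPYBLOCK dst, src, n
           RISC style: the compiler generates a loop; each iteration is
           roughly a load, a store, an increment, and a compare-and-branch. */
        void copy_block(char *dst, const char *src, size_t n) {
            for (size_t i = 0; i < n; i++) {
                dst[i] = src[i];   /* load + store */
            }
        }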

    8. LF

      I love that this idea was controversial and re- almost kind of like rebellious. So that's in the context of what was more conventional, the complex instruction set computing. So how would you pronounce that?

    9. DP

      CISC.

    10. LF

      CISC.

    11. DP

      Right.

    12. LF

      Versus RISC.

    13. DP

      So RISC versus CISC. And, uh, and believe it or not, this sounds, uh, very, very, uh, you know, who cares about this, right? It was, it was violently debated at several conferences, as like, what's the right way to go, is, is... And people thought RISC was, you know, was, uh, de-evolution, we're, we're, we're gonna make software worse by making those instructions simpler, and there were fierce debates of, uh, at several conferences in the 1980s. And then later in the '80s, they kind of settled, uh, to these benefits.

    14. LF

      It's not completely intuitive to me why RISC has, for the most part, won. Uh-

    15. DP

      Yeah. So-

    16. LF

      So it-

    17. DP

      ... why did that happen?

    18. LF

      Yeah, yeah. And maybe I can sort of say a bunch of dumb things that could lay the land for (laughs) further commentary. So, uh, to me, and this is, uh, this is kind of interesting thing, if you look at C++ versus C.... with modern compilers, you really could write faster code with C++. So, uh, relying on the compiler to reduce your complicated code into something simple and fast. So to me, comparing RISC, uh, maybe this is a dumb question, but why is it that focusing the definition and design of the instruction set on very few simple instructions in the long run provide faster execution, versus coming up with, like you said, uh, a ton of complicated instructions that over time, you know, years, maybe decades, you come up with compilers that can reduce those into simple, uh, instructions for you?

    19. DP

      Yeah, so, uh, let's try and split that into two pieces. So, if the compiler can do that for you, if the compiler can take, you know, a complicated program and produce simpler instructions, uh, then the programmer doesn't care, right? The programmer, I mean, uh, you know, uh, I don't care just how- how fast is the computer I'm using and how much does it cost? And so, what we, uh, what happened kind of in the software industry is right around before the 1980s, critical pieces of software were still written not in languages like C or C++. They were written in what's called Assembly language, where there's this kind of humans writing exactly at the instructions at the level that any- that a computer can understand. So, they were writing add, subtract, multiply, you know, instructions. It's very tedious. But the belief was to write this lowest level of software that, that people use, which are called operating systems, they had to be written in Assembly language because these high level languages were just too inefficient. They were too slow, or the- the programs would be too big. Uh, so that changed with, uh, a famous operating system called UNIX, which is kind of the- the grandfather of all the operating systems today. So, UNIX demonstrated that you could write, uh, something as complicated as an operating system in a language like C. So once that was true, then that meant we could hide the instruction set from the programmer. And so that meant then it didn't really matter, uh, the programmer didn't have to write lots of these simple instructions. That was up to the compiler. And so that was part of our arguments for RISC, is if you were still writing in Assembly language, there's maybe a better case for CISC instructions. But if the compiler can do that, it's gonna be, um, you know, that's done once. The com- computer translates it once, and then every time you run the program, it runs at this, uh, this potentially simpler instructions. And so that- that was the debate, right? Is, um, because... And people would acknowledge that these simpler instructions could lead to a faster computer. You can think of monosyllabic instructions. You could say them f-... You know, if you think of reading, you could probably read them faster or say them faster than long instructions, the same thing. That analogy works pretty well for hardware. And as long as you didn't have to read a lot more of those instructions, you could win. So that's- that's kind of, that's the basic idea for RISC.

    20. LF

      But it's interesting that the... In that discussion of UNIX and C, that there's only one step of, uh, levels of abstraction from the code that's really the closest to the machine to the code that's written by a human. Look, it's, um-

    21. DP

      Mm-hmm.

    22. LF

      ... at least to me again, perhaps a dumb intuition, but it feels like there might have been more layers, sort of different kinds of humans stacked on top of each other.

    23. DP

      Well, um, so what's true and not true about what you said is, several of the layers of software, like... Uh, so the- the-... If you... Here, two layers would be... Suppose we just talk about two layers. That would be the operating system, like you get from- from Microsoft or from Apple, like iOS or, uh, the Windows operating system, and let's say applications that run on top of it, like Word or Excel. So, both the operating system could be written in C, and the application could be written in C.

    24. LF

      Yeah.

    25. DP

      So, but you could construct those two layers, and the applications absolutely do call upon the operating system. And the- the change was that both of them could be written in higher level languages.

    26. LF

      Yes.

    27. DP

      So, it's one step of a translation, but you can still build many layers of abstraction of software on top of that.

    28. LF

      Yeah.

    29. DP

      And that's how- how things are done today. So, uh, still today, many of the layers that you'll- you'll de- deal with, you may deal with debuggers, you may deal with linkers, um, there's libraries. Many of those today will be written in C++, say, uh, even though that language is, uh, pretty ancient. And even the Python interpreter is probably written in C or C++. So, lots of layers there are probably written in these, uh, some o- old-fashioned efficient languages that still take one step to produce, um, these, um, instructions, produce RISC instructions. But they're composed... Each layer of software invokes one another through these interfaces, and you can get 10 layers of software that way.

    30. LF

      So, in general, so RISC was developed here at Berkeley?

  6. 28:18–31:46

    Designing a good instruction set is an art

    1. LF

      So, going back to the, the time of designing RISC: w- when you design an instruction set architecture, do you think like a programmer, do you think like a microprocessor engineer, do you think like an artist, a philosopher? Do you think in software and hardware? I mean, is it art?

    2. DP

      How you do that-

    3. LF

      Is it science?

    4. DP

      Yeah. I'd say, I think designing a good instruction set is an art, and I think you're trying to, uh, balance, um, the, the simplicity and speed of execution with how well, easy it will be for compilers to use it, right? You're trying to create an instruction set that everything in there can be used by compilers. Uh, there's not things that are missing, uh, that'll make it difficult for the program to run. Uh, they run efficiently, but you want it to be easy to build as well. So, it's that kind of inter... So, you're thinking, I'd say you're thinking hardware, trying to find a hardware-software compromise that'll work well. And, and it's, uh, you know, it's, you know, it's a matter of taste, right? It's, it's kind of fun to build instruction sets. It's not that hard to build an instruction set, but to build one that, uh, catches on and people use, you know, you have to be, you know, uh, fortunate to be in the right place in the right time or, uh, have a design that people really like.

    5. LF

      Are you using metrics? So is it, is it, uh, quantifiable? Because you kind of have to an- anticipate the kind of programs that people will write-

    6. DP

      Yeah.

    7. LF

      ... ahead of time. So is that... Can you use numbers? Can you use metrics? Can you quantify something ahead of time? Or is this, again, that's the art part where you're kind of anticipating-

    8. DP

      No, it's, um, a big, a big change. Kind of what happened, I, I think from Hennessy's and my perspective in the 1980s, what happened was going from kind of really, um, you know, taste and hunches to quantifiable. And in fact, he and I wrote a textbook at the end of the 1980s called Computer Architecture: A Quantitative Approach.

    9. LF

      I heard of that.

    10. DP

      And-

    11. LF

      (laughs)

    12. DP

      ... and it's, it's the thing... It, it had a pretty big bang impact in the field, because we went from textbooks that kind of listed, "So here's what this computer does, and here's the pros and cons, and here's what this computer does and pros and cons," to something where there were formulas and equations where you could measure things. So, specifically for instruction sets, um, what-... um, we do, and some other fields do, is we agree upon a set of programs, which we call benchmarks, and, um, a, a suite of programs, and then you develop both the hardware and the compiler, and you get numbers on how well your, uh, your computer does, given its instruction set and how well you implemented it in your microprocessor and how good your compilers are. In c- in computer architecture, we, you know, using professor's terms, we grade on a curve rather than grade on an absolute scale. So when you say, "You know, this, these programs run this fast," well, that's kind of interesting, but how do you know it's better? Well, you compare it to other computers of the same time. So the best way we know how to make, turn it into a kind of more science and experimental and quantitative is to compare yourself to other computers of the same era that have the same access to the same kind of technology on commonly agreed benchmark programs.
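
      A toy version of that methodology, assuming nothing beyond the standard C library (real suites such as SPEC use large, commonly agreed programs, and the absolute numbers only mean something when compared against other machines of the same era):

        #include <stdio.h>
        #include <time.h>

        /* Two stand-in "benchmarks"; real suites use substantial programs. */
        static unsigned long bench_sum(void) {
            unsigned long s = 0;
            for (unsigned long i = 0; i < 100000000UL; i++) s += i;
            return s;
        }
        static unsigned long bench_mix(void) {
            unsigned long x = 1;
            for (unsigned long i = 0; i < 100000000UL; i++)
                x = x * 6364136223846793005UL + 1442695040888963407UL;
            return x;
        }

        int main(void) {
            unsigned long (*suite[])(void) = { bench_sum, bench_mix };
            for (int i = 0; i < 2; i++) {
                clock_t t0 = clock();
                unsigned long r = suite[i]();   /* run one benchmark */
                double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
                printf("benchmark %d: %.3f s (checksum %lu)\n", i, secs, r);
            }
            return 0;
        }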

  7. 31:46–36:02

    Measures of performance

    1. LF

      So maybe to toss up two possible directions we can go, one is, what are the different trade-offs in designing architectures? We've been already talking about CISC and RISC, but maybe a little bit more, uh, detail in terms of specific features that you were thinking about. And the other side is, what are the metrics that you're thinking about when looking at these trade-offs?

    2. DP

      Yeah. Well, let, let's talk about the metrics. So, th- during these debates, we actually had kind of a hard time explaining, convincing people the ideas, and partly, we didn't have a, a formula to explain it. And a few years into it, we hit upon the formula that helped explain what was going on. And, um, I think if we can do this, see how it works orally to do this. (laughs)

    3. LF

      (laughs)

    4. DP

      So, uh-

    5. LF

      With grades.

    6. DP

      ... the, let's see if I can do a, a formula orally. Let's see. So the, uh... so fundamentally, uh, the way you measure performance is, how long does it take a program to run? Uh, program, if you have 10 programs, and typically, these benchmarks were suites, 'cause you'd wanna have 10 programs so they could represent lots of different applications. So for these 10 programs, how long did it take to run? Well, now, when you're trying to explain why it took so long, you could factor how long it takes a program to run into three factors. Uh, o- uh, one of... the first one is, how many instructions did it take to execute? So that's the, that's the, what we've been talking about, you know, the instructions. How many did it take? All right. The next question is, how long did each instruction take to run o- on average? So you multiply the number of instructions times how long it took to run, and that gets you the time. Okay, so that's... but now, let's look at this, uh, metric of how long did it take the instruction to run. Well, turns out, the way we build computers today is they all have a clock. And you've seen this when you, if you buy a microprocessor, it'll say, "3.1 gigahertz or 2.5 gigahertz," and more gigahertz is good. Well, what that is, is the speed of the clock. So 2.5 gigahertz turns out to be 0.4 billionths of a second, or 0.4 nanoseconds, per clock cycle. So that's the clock cycle time. But there's another factor, which is, what's the average number of clock cycles it takes per instruction? So it's number of instructions, average number of clock cycles, and the clock cycle time. So in these RISC/CISC debates, they would concentrate on, but RISC ne- needs to take more instructions. And we'd argue, well, maybe the clock cycle is faster, but what the real big difference was, was the number of clock cycles per instruction.

    7. LF

      Per instruction, fascinating. What about the mess of, the beautiful mess of parallelism in the whole picture?

    8. DP

      Parallelism, which has to do with, say, how many instructions could execute in parallel and things like that, you could think of that as affecting the clock cycles per instruction 'cause it's the average clock cycles per instruction. So when you're running a program, if it, if it took 100 billion instructions, and on average, it took two clock cycles per instruction, and they were four nanoseconds, you could multiply that out and see how long it took to run. And there's all kinds of tricks to try and reduce the number of clock cycles per instruction. Um, but what turned out that the way they would do these complex instructions is they would actually build what we would call an interpreter, a, a very simple hardware interpreter. But it turned out that for the CISC instructions, if you had to use one of those interpreters, it would be like ten clock cycles per instruction, where the RISC instructions could be two. So there'd be this factor of five advantage in clock cycles per instruction. We have to execute, say, 25 or 50% more instructions. So that's where the win would come. And then you could make an argument whether the clock cycle times are the same or not. But pointing out that we could divide, uh, the benchmark results, time per program, into three factors, and the biggest difference between RISC and CISC was the clock cycles per... you execute a few more instructions, but the clock cycles per instruction is much less. And that was what this debate was... once we made that argument, then people said, "Oh, okay, I g- I get it." And so we went from, uh, it was outrageously controversial in, you know, 1982, that maybe probably by 1984 or so, people said, "Oh, yeah, technically, they've got a good argument."
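
      In the notation that the quantitative approach made standard, the three factors multiply; plugging in the illustrative numbers from this exchange (a worked sketch, not measured data):

        \text{Time per program} = \text{Instructions} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \text{Clock cycle time}

        100 \times 10^{9}\ \text{instructions} \times 2\ \tfrac{\text{cycles}}{\text{instruction}} \times 4\,\text{ns} = 800\,\text{s}

        \frac{\text{Time}_{\text{CISC}}}{\text{Time}_{\text{RISC}}} = \frac{I \times 10 \times t}{1.25\,I \times 2 \times t} = 4

      So even with 25% more instructions, a five-fold advantage in clock cycles per instruction nets about a four-fold speedup, the "factors of three or four" mentioned earlier.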

  8. 36:02–39:39

    RISC instruction set

    2. LF

      What, what are the instructions in the RISC instruction set just to get an intuition?

    3. DP

      Okay. (laughs) In 1995, I was asked by Scientific American to predict the future of, uh, what the microprocessor of the future would look like. So I, and I, uh, and I'd seen these predictions, and usually people predict something outrageous just to be entertaining, right?

    4. LF

      (laughs)

    5. DP

      And so my prediction for 2020 was, you know, things are gonna be pretty much... they're gonna look very familiar (laughs) to what they are, and they are. And, and if you were to read the article, you know, the things I said are pretty much true. The instructions that, that have been around forever are kind of the same.

    6. LF

      And that's the outrageous prediction, actually.

    7. DP

      Yeah.

    8. LF

      Given how fast computers have been growing, you would think-

    9. DP

      Well, a- and, you know, Moore's law was gonna go on, we thought, for 25 more years. Uh, you know, who knows? But kind of the surprising thing... in fact, you know, Hennessy and I, you know, won the, the ACM A.M. Turing Award for both the RISC instruction set contributions and for that textbook I mentioned. But, you know, we are surprised that here we are 35, w-... uh, 40 years later, after we did our work. And the c- the conventional wisdom of the best way to do instruction sets is still those RISC instruction sets that look very similar to what we, uh, did in the 1980s. So those, uh, surprisingly, there hasn't been some radical new idea even though we have, you know, a million times as many transistors as we had, uh, back then. Uh, that's pr-

    10. LF

      But what are the basic instructions and how do they change over the years? So are we talking about addition, subtraction? These are the arithmetic...

    11. DP

      Okay. The specific... So the, the, to get... So the things that are in a calculator y- are in a computer. So any of the buttons that are in the calculator are in the computer.

    12. LF

      (laughs)

    13. DP

      So the-

    14. LF

      Nice way to put it.

    15. DP

      ... the buttons. So if you m- There's a memory function key, and like I said, those turn into... putting something in memory is called a store, bringing something back is called a load.

    16. LF

      Uh, just as a, a quick tangent. When you say memory, what does memory mean?

    17. DP

      Uh, well, I told you there were five pieces of a computer. And if, if you remember in a calculator, there's a memory key, so y- you wanna have an intermediate calculation and bring it back later, so you, you'd hit the memory plus key, M+ maybe, and it would put that into memory, and then you'd hit an RM, like recall memory, and it'd bring it back on the display, so you don't have to type it. You don't have to write it down and bring it back again. So that's exactly what memory is, that you can put things into it as temporary storage and bring it back when you need it later. Uh, so that's memory and loads and stores. But the big thing, the difference between a computer and a calculator is that the computer can make decisions. And, and amazingly, the decisions are as simple as, is this value less than zero or is this value bigger than that value? So there's, uh... And those instructions, which are called conditional branch instructions, is what gives computers all their power. Uh, if you were... In the early days of computing, before what's called the general purpose microprocessor, people would write these instructions, uh, kind of in hardware, and, but it couldn't make decisions. It would just... It would do the same thing over and over again. Uh, with the power of having branch instructions, they can look at things and make decisions automatically, and they can make these decisions, you know, billions of times per second. And amazingly enough, we can get, you know, thanks to advances in machine learning, we can, we can create programs that can do something smarter than human beings can do. But if you go down to that very basic level, the instructions are the keys on the calculator, plus the ability to make decisions, as these conditional branch instructions are called.
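
      A minimal C sketch of that point (the comment's compare-and-branch is generic, not a specific mnemonic): every if statement and every loop bottoms out in a comparison plus a conditional branch.

        /* "Is this value less than zero?" -- the difference between a
           computer and a calculator comes down to compares and branches. */
        int absolute_value(int x) {
            if (x < 0) {        /* compare x with zero; branch on the result */
                return -x;
            }
            return x;
        }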

    18. LF

      And a- And all decisions fundamentally can be reduced down to these-

    19. DP

      Yeah.

    20. LF

      ... uh, branch instructions?

    21. DP

      Yeah. Uh, so in, in fact... And so,

  9. 39:39–51:12

    RISC-V open standard instruction set architecture

    1. DP

      you know, going way back in the stack, back to, you know, we did four RISC projects at Berkeley i- in the 1980s, did a couple at Stanford in the 1980s. Uh, in 2010, we decided we wanted to do, uh, a new instruction set, learning from the mistakes of those, uh, RISC architectures in 1980s, and that was done here at Berkeley almost exactly 10 years ago. And the, the people who did it, I participated, but, uh, other... Krste Asanovic and others, uh, drove it. They called it RISC-V to honor those RISC, uh, the four RISC projects of the 1980s.

    2. LF

      So what does RISC-V involve?

    3. DP

      So RISC-V is a- another, uh, instruction set vocabulary. It's, uh, learned from the mistakes of the past, but it still has... If you look at the ver- there's a core set of instructions that's very similar to the simplest architectures from the 1980s. And the big difference about RISC-V is it's open. So I talked earlier about proprietary versus, uh, open ins- um, kind of soft-

    4. LF

      Open source.

    5. DP

      ... software. So this is an instruction set, so it's a vocabulary, it's not, it's not hardware. But by having an open instruction set, we can have open source implementations, open source processors that people can use.

    6. LF

      Where do you see that going? So it's, it's really exciting possibilities. But just like in the Scientific American article, if you were to predict 10, 20, 30 years from now, that kind of ability to utilize open source instruction set architectures like RISC-V, what kind of possibilities might that unlock?

    7. DP

      Yeah. And so just to make it clear, because this is confusing. Uh, the specification of RISC-V is something that's like in a textbook. There's books about it. So that's what, that's ki- defining an interface. Uh, there's also the way you build hardware is you write it in languages that are kind of like C, but they're specialized for hardware that gets translated into hardware. And so these implementations of this specification are what are the open source. So they're written in something that's called Verilog or VHDL, but it's put up on the web, just like, uh, the, you can see the C++ code for, uh, Linux on the web. So that's... The open instruction set enables open, uh, source implementations of RISC-V.

    8. LF

      So you can literally build a processor using this instruction set?

    9. DP

      Yep. People are. Uh, people are. So what happened to us, the story was, this was developed here for our use, to do our research, and, uh, we made it, we licensed it under the Berkeley Software Distribution License, like a lot of things get licensed here, so other academics could use it. They wouldn't be afraid to use it. And then, uh, about, uh, 2014, we started getting complaints. We were using it in our research and in our courses, and we got complaints from people in industry: "Why did you change your instruction set, uh, between the fall and the spring, uh, semester?"

    10. LF

      (laughs)

    11. DP

      And, well, we get complaints from industry all the time. "Why the hell do you care what we do with our instruction set?" And then when we talked to them, we found out there's this thirst for this idea of an open instruction set architecture, and they had been looking for one. They stumbled upon ours at Berkeley, thought, "Boy, this looks great. We should use this one." And so once we realized there was this need for an open instruction set architecture, we thought, "That's a great idea." And then we started supporting it and tried to make it happen. So this was, um, you know, kinda we accidentally stumbled into this, uh, into this need and our timing was good. And, uh, so it's really taking off. Uh, there's, uh- uh, you know, universities are good at starting things, but they're not good at sustaining things. So like Linux has a Linux Foundation, there's a RISC-V Foundation that we started. There's an annual conference, and the first one was done, I think, January of 2015, and it, you know, it had 50 people at it. And, uh, and th- this one last December had, I don't know, 1,700 people at it. And, uh, there are companies excited all over the world. So if predicting into the future, you know, if we were doing 25 years, I would predict that RISC-V will be, you know, possibly the most popular instruction set architecture out there, because it's a pretty good instruction set architecture and it's open and free, and there's no reason, uh, lots of people shouldn't use it. Uh, and there's benefits. Just like Linux, uh, is so popular today compared to 20 years ago, uh, I- I- and, you know, the fact that you can get access to it for free, you can modify it, you can improve it, for all those same arguments, uh, people collaborate to make it a better system for everybody to use, and that works in software. And I expect the same thing will happen in hardware.

    12. LF

      So if you look at ARM, Intel, MIPS, if you look at just the lay of the land, and what do you think... uh, just for me, because I'm not, um, familiar with how difficult this kind of transition would be, uh, how many challenges this kind of transition would entail, do you see... Let me ask my dumb question another way. (laughs)

    13. DP

      No, that's, uh, I know where you're headed. (laughs) Well, there's a bunch.

    14. LF

      Yeah.

    15. DP

      I think the thing you point out, there's, there's these propri- very popular proprietary instruction sets, the x86 and-

    16. LF

      And so how do we move to RISC-V potentially in, in sort of the span of five, 10, 20 years, a kind of unification, uh, given that the devices, the kind of way we use devices, IoT, mobile devices, and, and the, and the-

    17. DP

      Mm-hmm.

    18. LF

      ... cloud just keeps changing?

    19. DP

      Well, part of it, a big piece of it is, um, the software stack. And, uh, what right now, looking forward, there seem to be three important markets. There's, uh, the cloud, uh, and the cloud is simply, uh, companies like Alibaba and Amazon and Google, um, Microsoft, having these giant data centers with tens of thousands of servers and maybe a hun- maybe a hundred of these data centers all over the world. And that's what the cloud is. So the computer that dominates the cloud is the x86 ins- uh, instruction set. So the c- the instructions or the voc- Instruction sets used in the cloud are the x86 almost, almost a hundred percent of that today is x86. The other big thing are cell phones, uh, and laptops. Uh, those are the big things today. I mean, the PC i- is also dominated by the x86 instruction set, but those sales are dwindling. You know, uh, there's maybe, uh, 200 million PCs a year, and there's m- I'd say one and a half billion phones a year. Uh, there's numbers like that. So for the phones, that's dominated by ARM. And now, uh, and a reason, uh, that, uh, I talked about the software stacks... And then l- the third category is internet of things, which is basically embedded devices, things in your cars, in your microwaves, everywhere. So what's different about those three categories is for the cloud, uh, the software that runs in the cloud is determined by these companies, Alibaba, Amazon, Google, Microsoft. So th- they control that software stack. For the cell phones, there's both for Android and Apple, the software they supply, but both of them have marketplaces where anybody in the world, uh, can build software. And that software is translated or, you know, compiled down and shipped in the vocabulary of ARM. So that's the, what's referred to as binary compatible, because the actual, it's the instructions are, uh, turned into numbers, binary numbers, and shipped around the world. So-

    20. LF

      And, and so, so just a quick interruption. So ARM, what is ARM? Is, uh, ARM is an instruction, like a RISC-based-

    21. DP

      Yeah, it's a RISC-based instruction set.

    22. LF

      ... instruction set.

    23. DP

      It's a proprietary one. ARM stands for, uh, Advanced RISC Machine; A-R-M is also the name of the company. So it's a proprietary RISC architecture. So, uh, and it's been around, um, for a while and it's, you know, surely the most popular instruction set in the world right now. Every year, billions of chips are using, uh, the ARM design in this post-PC era.

    24. LF

      Is wha- was it one of the early adopters of the RISC idea?

    25. DP

      Yeah. The first ARM goes back, I don't know, '86 or so. So Berkeley and Stanford did their work in the early '80s. The ARM guys, uh, needed an instruction set, and they read our papers, and it heavily influenced them. Uh, so getting back to my story, what about internet of things? Well, software is not shipped in internet of things. It's the, the, uh, the, uh, embedded device people control that software stack. So, uh, the opportunities for RISC-V, everybody thinks, is in the in- internet of things, embedded things, because there's no dominant player like there is in the cloud or the, uh, smartphones. And, you know, it, uh, doesn't have a lot of licenses associated with it, and you can enhance the instruction set if you want. And people, uh, who have looked at instruction sets think it's a very good instruction set. So it appears to be very popular there. It's possible that, um, in the cloud, those companies control their software stacks, so it's possible that they would, um, decide to use RISC-V if we're talking about 10 and 20 years in the future. Uh, the one that'll be harder would be the cell phones, since people ship software in the ARM instruction set; you'd think that would be the more difficult one. But if, if RISC-V really catches on, you know, in p- in a period of a decade, you can imagine that's changing over too.

    26. LF

      Do you have a sense why RISC-V or ARM has dominated? You mentioned these three categories. Why has, why did ARM dominate? Why does it dominate the mobile device space?

    27. DP

      Mm-hmm.

    28. LF

      And maybe, like my, uh, naive intuition is that there's some aspects of power efficiency that are important-

    29. DP

      Yeah.

    30. LF

      ... that somehow come along with RISC.

  10. 51:12–52:57

    Why do ARM implementations vary?

    1. LF

      the other aspect of this is, if we look at Apple, Qualcomm, Samsung, Huawei, all use the ARM architecture. And yet, the performance of the systems varies. I mean, I, I don't know whose opinion you take on, but you know, uh, Apple, for some reason, seems to perform better in terms of these-

    2. DP

      Right.

    3. LF

      ... implementations, these architectures. So, where does the magic enter the picture?

    4. DP

      Oh, how's that happen? Yeah, so, what ARM pioneered was a new business model is they said, "Well, here's our proprietary instruction set, and we'll give you two ways to do it. Either we'll give you one of these implementations written in things like C, called Verilog, and you can just use ours. You, well, you have to pay money for that. Uh, not only pay, we'll give you the, you know, we'll license you to do that, or you could design your own." And so, we're talking about numbers like tens of millions of dollars to have the right to design your own, since the instruction set belongs to them. So Apple got one of those, uh, the right to build their own. Most of the other people who build, like, Android phones just get one of the designs from ARM to do it themselves.

    5. LF

      Right.

    6. DP

      So Apple developed a really good, uh, microprocessor design team. They, uh, you know, acquired a, a very good team that had, uh, was, uh, building other microprocessors and brought them into the company to build their designs. So the instruction sets are the same, the specifications are the same, but their hardware design is much more efficient than I think everybody else's. Uh, and that's given Apple, uh, an advantage in the marketplace, in that, uh, the iPhones tend to be the f- faster than most everybody else's phones that are there.

  11. 52:57–58:09

    Simple is beautiful in instruction set design

    1. LF

      It'd be nice to be able to, to jump around and kind of explore different, you know, little sides of this. But let me ask one sort of romanticized question. What to you is the most beautiful aspect or idea of RISC instruction set or instruction sets for this-

    2. DP

      Yeah.

    3. LF

      ... uh, work that you've done?

    4. DP

      Well, I think, uh, you know, I, I, I'm, you know, I, I was always attracted to the idea of, you know, small is beautiful, right? Is that, uh, the temptation in engineering, it's kind of easy to make things more complicated. (laughs) It's more difficult, surprisingly, to come up with a simple, elegant solution. And I think, uh, th- there's a bunch of small features of, of RISC in general where, you know, you can see examples of keeping it simpler just makes it more elegant. Specifically in RISC-V, which, you know, I was kind of the mentor in the program, but it was really driven by Krste Asanovic and two grad students, Andrew Waterman and Yunsup Lee, is they hit upon this idea of having, uh, a nice, simple subset of instructions, like 40-ish instructions, that all software, uh, the software stack for RISC-V, can run just on those 40 instructions. And then they provide optional features that could accelerate, uh, the performance, instructions that if you needed them, could be very helpful, but you don't need to have them. And that, that's really a new idea. So RISC-V has, right now, maybe five optional subsets that you can pull in. But the software runs without them. If you just wanna build just the, the core 40 instructions, that's fine. You can do that. So this is fantastic educationally: you can explain computers, and you only have to explain 40 instructions and not thousands of them. Also, if you invent some wild and crazy new technology like, uh, you know, biological computing, you'd like a nice, simple instruction set, and with RISC-V, if you implement those core instructions, you can run, you know, really interesting programs on top of that. So this idea of a core set of instructions that the software stack runs on, and then optional features that, if you turn them on, the compilers will use but you don't have to, I think is a powerful idea. Uh, what's happened in the past, for the proprietary instruction sets, is when they add new instructions, it becomes a required piece, uh, and so all microprocessors in the future have to use those instructions. So it's kind of like, uh, as, for a lot of people, as they get older, they gain weight, right?

    5. LF

      (laughs)

    6. DP

      (laughs) Is that... Weight and age are correlated. And so you can see these instruction sets get, getting bigger and bigger as they get older. So RISC-V, uh, you know, lets you (laughs) be as slim as you were as a teenager, and you only have to add these, uh, extra features if you're really gonna use them, rather than every, uh, you have no choice and you have to keep growing with the instruction set.
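
      As a sketch of how software can run without an optional subset (assuming only base ALU and branch instructions are available; this shift-and-add routine is the classic textbook fallback, not actual RISC-V compiler output): a library can synthesize multiply out of adds and shifts, while a chip that implements the multiply extension would do the same thing in a single instruction.

        #include <stdint.h>

        /* Multiply without a hardware multiply instruction: shift-and-add,
           using only the kind of base operations a minimal core provides. */
        uint32_t soft_multiply(uint32_t a, uint32_t b) {
            uint32_t result = 0;
            while (b != 0) {
                if (b & 1)        /* low bit set? add the shifted a */
                    result += a;
                a <<= 1;          /* shift a up ... */
                b >>= 1;          /* ... and step b to the next bit */
            }
            return result;
        }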

    7. LF

      I don't know if the analogy holds out, but that's a beautiful notion, that (laughs) , uh, that there's, it's almost like a nudge towards, here's the simple core that's the essential...

    8. DP

      Yeah. And I think the surprising thing is still if we, if we brought back, uh, you know, the pioneers from the 1950s and showed them the instruction set architectures, they'd understand it. They'd, they'd say, "Wow, that doesn't look that different. Well, you know, I'm surprised." (laughs)

    9. LF

      (laughs)

    10. DP

      And it's, uh, there's... And maybe something, you know, to talk about philosophical things, I mean, there may be something powerful about those, you know, 40 or 50 instructions, that all you need is, uh, these commands, these instructions that we talked about, and that is sufficient to build, uh, to bring about, you know, artificial intelligence. And so it's a remarkable... Surprising to me that as complicated, uh, as it is to, to build these things, uh, you know, microprocessors where the line widths are narrower than the wavelength of light, you know, this amazing technology, at some fundamental level, the commands that software executes are really pretty straightforward and haven't changed that much in, in decades, uh, which is, uh, what a surprising outcome.

    11. LF

      So underlying all computation, all Turing machines, all artificial intelligence systems perhaps might be a very simple instruction set like a, like a RISC-V or it's-

    12. DP

      Yeah. I mean, I, I... That's kind of what I said. I was interested to see... I had another, uh, more senior faculty colleague, and he, he had written something in Scientific American, you know, predicting 25 years in the future, and his came due about when I was a young professor, and he said, "Yep, I checked it." And so I was, I was interested to see how that was gonna turn out (laughs) -

    13. LF

      (laughs)

    14. DP

... for me. And it's pretty im- uh, it held up, uh, pretty well. But yeah, so there's probably... there must be something fundamental about, uh, those instructions, that we're capable of, uh, creating, you know, intelligence, uh, from pretty primitive operations and just doing them really fast. (laughs)

  12. 58:09 - 1:08:18

    How machine learning changed computers

    1. LF

      Y- you kinda mentioned the d- a different maybe radical computational medium, like biological, and there's other ideas. So th- there's a lot of space in ASICs that's domain-specific-

    2. DP

      Mm-hmm.

    3. LF

      ... and then there could be quantum computers. And would... So we can think of all those different mediums and types of computation. What's the connection between swapping out different hardware systems and the instruction set? Do you see those-

    4. DP

      Right.

    5. LF

      ... as disjoint or are they fundamentally coupled?

    6. DP

Yeah. So what's... So kind of, if we go back to the history, um, you know, when Moore's Law is in full effect and you're getting twice as many transistors every couple of years, kind of the challenge for computer designers is, how can we take advantage of that? How can we turn those transistors into better computers, uh, faster, typically? And so there was an era, I guess in the '80s and '90s, where computers were doubling performance every 18 months. And if you weren't around then, what would happen is, uh, you had your computer, um, and your friend's computer, which was like a year, a year and a half newer, and it was much faster than your computer. And he or she could get their work done much faster than you, 'cause theirs was newer. So people took their computers, perfectly good computers, and threw them away to buy a newer computer, because the computer one or two years later was so much faster. So that's what the world was like in the '80s and '90s. Well, with the slowing down of Moore's Law, that's no longer true, right? Now, with... they're not desk-side computers, with the laptops, I only get a new laptop when it breaks, right? "Oh, damn, the display broke, I gotta buy a new computer." But before, you would throw them away because they were just so sluggish compared to the latest computers. So that's, uh, a huge change in what's gone on. But since this lasted for decades, kind of programmers, and maybe all of society, got used to computers getting faster regularly. We now believe, those of us who are in computer design, it's called computer architecture, that the path forward, instead, is to add accelerators that only work well for certain applications. Um, since Moore's Law is slowing down, we don't think general purpose computers are gonna get a lot faster. So the Intel processors of the world haven't been getting a lot faster. They've been, um, barely improving, like a few percent a year. It used to be doubling every 18 months, and now it's doubling every 20 years, which is just shocking. So, to be able to deliver on what Moore's Law used to do, we think what's gonna happen, what is happening right now, is people adding accelerators to their microprocessors that only work well for some domains. And by sheer coincidence, at the same time that this is happening has been this revolution in artificial intelligence called machine learning. So, um, as I'm sure your other, uh, guests have said, you know, AI had these two competing schools of thought: that we could figure out artificial intelligence by just writing the rules top down, or that that was wrong, you had to look at data and infer what the rules are, the machine learning. And what's happened in the last... decade or eight years is machine learning has won. And it turns out that with machine learning, the hardware you build for machine learning is pretty much multiply. The matrix multiply is a key feature for the way machine learning is done. So, uh, that's a godsend for computer designers. We know how to make matrix multiply run really fast. So general purpose microprocessors are slowing down; we're adding accelerators for machine learning that fundamentally are doing matrix multiplies much more efficiently than general purpose computers have done. So, we have to come up with a new way to accelerate things.
The danger of only accelerating one application is, how important is that application? Turns out machine learning gets used for all kinds of things. So serendipitously, uh, we found something to accelerate that's widely applicable. Uh, and we don't even... We're in the middle of this revolution of machine learning. We're not sure what the limits of machine learning are. So, this has been, uh, kind of a godsend. If you're gonna be able to deliver on improved performance, as long as people are moving their programs to be embracing more machine learning, we know how to give them more performance, even as Moore's Law is slowing down.
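To see why matrix multiply is such a convenient target, here is the kernel in plain Python. This is a generic textbook sketch, not any particular accelerator's design: the point is the regular triple loop, roughly n^3 multiply-adds over n^2 data, which is exactly the shape of work hardware can parallelize.

```python
# Plain triple-loop matrix multiply, the kernel at the heart of neural networks.
# Roughly n^3 multiply-accumulates over n^2 data with a completely regular
# access pattern: the kind of computation an accelerator can do far more
# efficiently than a general purpose core.

def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]  # one multiply-accumulate
            C[i][j] = acc
    return C

# A neural-network layer is, roughly, activations = inputs x weights,
# so speeding up this loop nest speeds up most of the model.
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```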

    7. LF

And counterintuitively, the machine learning mechanism, you can say, is domain specific. But because it's leveraging data, it actually could be very broad in terms of the domains it could be applied in.

    8. DP

      Yeah, that's exactly right.

    9. LF

Sort of, it's almost, uh... people sometimes talk about the idea of Software 2.0. We're almost taking another step up in the abstraction layer in designing machine learning systems, because now you're programming in a space of data, in the space of hyperparameters. It's changing fundamentally the nature of programming. And so the specialized devices that, uh, accelerate the performance, especially of neural-network-based machine learning systems, might become the new general.

    10. DP

      Yeah.

    11. LF

      The-

    12. DP

Yes. So the thing that's interesting to point out: these are not tied together. The enthusiasm about machine learning, about creating programs driven from data, that we should figure out the answers from data rather than kind of top down, which is classically the way most programming is done and the way artificial intelligence used to be done, that's a movement that's going on at the same time. Coincidentally... And the first word in machine learning is machines, right? So that's going to increase the demand for computing. Because instead of programmers being smart, writing those things down, we're gonna instead use computers to examine a lot of data to kind of create the programs. That's the idea. And remarkably, this gets used for all kinds of things very successfully: the image recognition, the language translation, the game playing, and, you know, it gets into, uh, pieces of the software stack like databases and stuff like that. We're not quite sure how general purpose it is, but that's going on independent of this hardware stuff. What's happening on the hardware side is Moore's Law is slowing down right when we need a lot more cycles.

    13. LF

      Mm-hmm.

    14. DP

It's failing us right when we need it, 'cause there's gonna be a greater increase in computing. And then this idea that we're gonna do so-called domain specific: here's a domain where your greatest fear is you'll make this one thing work, and that'll help, you know, 5% of the people in the world. Well, this looks like it's a very general purpose thing. So the timing is fortuitous: if we can keep building hardware that will accelerate machine learning, the neural networks, the timing will be right that that neural network revolution will transform, you know, software, the so-called Software 2.0. And the software of the future will be very different from the software of the past. And just as with our microprocessors, even though we're still gonna have those same basic RISC instructions to run, uh, big pieces of the software stack like user interfaces and stuff like that, we can accelerate the kind of small piece that's computationally intensive. It's not lots of lines of code, but it takes a lot of cycles to run that code, so that's gonna be the accelerator piece. That's what makes this, from a computer designer's perspective, a really interesting decade. Uh, what Hennessy and I talked about in the title of our Turing Award speech is A New Golden Age. We see this as a very exciting decade, uh, much like when we were assistant professors and the RISC stuff was going on. That was a very exciting time; it was where we were changing what was going on. We see this happening again: tremendous opportunities for people, because we're fundamentally changing how software is built and how we're running it.

    15. LF

So at which layer of the abstraction do you think most of the acceleration might be happening, if you look in the next 10 years? Sort of, Google is working on a lot of exciting stuff with the TPU; that's closer to the hardware. There could be optimizations around, uh, closer to the instruction set. There could be optimization at the compiler level. It could be even at the higher-level software stack.

    16. DP

Yeah, it's gonna be... Uh, I mean, uh, if you think about the old RISC/CISC debate, it was both. It was software/hardware. It was the-

    17. LF

      Mm-hmm.

    18. DP

... compilers improving as well as the architecture improving. And that's likely to be the way things are now. With machine learning, they're using, uh, domain specific languages. The languages like, uh, TensorFlow and PyTorch are very popular with the machine learning people. Those are... raising the level of abstraction. It's easier for people to write machine learning in these, uh, domain specific languages like, uh, PyTorch and, uh, TensorFlow.
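As a generic illustration of what these DSLs buy you (an example of mine, not from the conversation), a layer in PyTorch is a couple of lines; the loop nests and the mapping onto whatever accelerator is available are entirely the framework's problem:

```python
# A neural-network layer in PyTorch: the user writes math, not hardware.
import torch

x = torch.randn(32, 784)           # a batch of 32 flattened inputs
layer = torch.nn.Linear(784, 128)  # weights + bias: a matrix multiply underneath
y = torch.relu(layer(x))           # the framework decides how and where this runs
print(y.shape)                     # torch.Size([32, 128])
```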

    19. LF

So that's where most of the optimization might be happening?

    20. DP

Yeah. And so there'll be both the compiler piece and the hardware piece underneath it. So, kind of the fatal flaw for hardware people is to create really great hardware but not have brought along the compilers. And what we're seeing right now in the marketplace, because of this enthusiasm around hardware for machine learning, is, you know, probably billions of dollars invested in startup companies. We're seeing startup companies go belly up because they focused on the hardware but didn't bring the software stack along.

  13. 1:08:18 - 1:16:30

    Machine learning benchmarks

    1. DP

Uh, we talked about benchmarks earlier. So I participated, uh... machine learning didn't really have a set of benchmarks. Uh, I think just two years ago they didn't have a set of benchmarks, and we've created something called MLPerf, which is a machine learning benchmark suite. And pretty much the companies who didn't invest in the software stack couldn't run MLPerf very well, and the ones who did invest in the software stack did. And we're seeing, you know, kind of like in computer architecture, this is what happens: you have these arguments about RISC versus CISC, people spend billions of dollars in the marketplace to see who wins. It's not a perfect comparison, but it kind of sorts things out, and we're seeing companies go out of business. And then companies like, um... there's a company in Israel called Habana. They came up with machine learning accelerators; they had good MLPerf scores. Uh, Intel had acquired a company earlier called Nervana a couple years ago. They didn't reveal their MLPerf scores, which was suspicious. But a month ago, uh, Intel announced that they're canceling the Nervana product line, and they've bought Habana for two billion dollars, and Intel's gonna be shipping Habana chips, which have hardware and software and run the MLPerf programs pretty well, and that's gonna be their product line in the future.

    2. LF

Brilliant. So maybe just to linger briefly on MLPerf. Uh, I love metrics, I love standards that everyone can gather around. Uh, what are some interesting aspects of that, uh, portfolio of metrics?

    3. DP

Well, one of the interesting metrics is... you know, I was involved in the start. Uh, Peter Mattson is leading the effort from Google. Google got it off the ground, but we had to reach out to competitors and say, "There's no benchmarks here. We think this is bad for the field. It'll be much better if we look at examples." Like in the RISC days, there was an effort where the people in the RISC community got together, competitors who were building RISC microprocessors got together, to agree on a set of benchmarks that were called SPEC, and that was good for the industry. Rather, before, the different RISC architectures were arguing, "Well, you can believe my performance numbers, but those other guys are liars." And that didn't do any good. So we agreed on a set of benchmarks, and then we could figure out who was faster between the various RISC architectures, even if it was only a little bit faster. But that grew the market, rather than, you know, people being afraid to buy anything. So we argued the same thing would happen with MLPerf. You know, companies like NVIDIA were, you know, maybe worried that it was some kind of trap, but eventually, uh, we all got together to create a set of benchmarks and, uh, do the right thing, right? And we agree on the results, and so we can see whether TPUs or GPUs or CPUs are really faster and how much faster they are. And I think, from an engineer's perspective, as long as the results are fair, you can live with it. Okay, you know, you kind of tip your hat to your colleagues at another institution: "Boy, they did a better job than us." What you hate is if it's false, right? They're making claims and it's just marketing bullshit, and, you know, that's affecting sales. So, from an engineer's perspective, as long as it's a fair comparison, if we don't come in first place, that's too bad, but it's fair. So we wanted to create that environment for MLPerf, and so now there are 10 universities and 50 companies involved. So pretty much MLPerf is the way you measure machine learning, uh, performance, um, and it didn't exist even two years ago.
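The mechanics of what a shared benchmark buys you can be shown in miniature. The sketch below is not the real MLPerf harness, just the underlying idea: everyone runs the same workload under the same rules and reports the same agreed-upon metric, so results are comparable.

```python
# Toy benchmark harness: same workload, same rules, same metric for everyone.
# This is NOT MLPerf's actual code, only an illustration of the principle.
import time
import statistics

def benchmark(fn, workload, runs=5):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(workload)                     # every submitter runs the identical task
        times.append(time.perf_counter() - start)
    return statistics.median(times)      # agreed-upon summary statistic

workload = list(range(100_000))
print(f"median time: {benchmark(lambda w: sorted(w), workload):.4f} s")
```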

    4. LF

      One of the cool things that I enjoy about the internet, it has a few downsides, but one of the nice things is, um, people can see through BS a little better with the presence-

    5. DP

      Yes.

    6. LF

... of these kinds of metrics. So it's really nice; uh, companies like Google and Facebook and Twitter, now the cool thing to do is to put your engineers forward and to actually show off how well you do on these metrics. There's not sort of, um...

    7. DP

      Well-

    8. LF

      There's less of a desire to do marketing, uh, less so, in my, in my sort of naive viewpoint.

    9. DP

No, I think... well, I was trying to understand what's changed from the '80s in this era. I think, uh, because of things like social networking, Twitter and stuff like that, if you put up, you know, uh, bullshit stuff, right? That's just, you know, purposely misleading, you know, you can get a violent reaction in social media pointing out the flaws in your arguments, right? And so from a marketing perspective, you have to be careful today-

    10. LF

      Yes.

    11. DP

... in a way that you didn't have to be careful before. There'll be people who put out the flaws... You can get the word out about the flaws in what you're saying much more easily today than in the past. It used to be easier to get away with it. And the other thing that's been happening, in terms of showing off engineers, is just, uh, on the software side, people have largely embraced open source software.

    12. LF

      Yes.

    13. DP

Twenty years ago, it was a dirty word at Microsoft, and today Microsoft is one of the big proponents of open source software. That's kind of the standard way most software gets built, which really shows off your engineers, because if you look at the source code, you can see who's making the commits, who's making the improvements, who the engineers are at all these companies who are, you know, really, uh, great programmers and engineers making really solid contributions, which enhances their reputations and the reputation of the companies.

    14. LF

... though that's, of course, not everywhere. Like in the space that I work more in, these autonomous vehicles, there the machinery of hype and marketing is still very strong, and there's less willingness to be open in this kind of open source way-

    15. DP

      Yeah.

    16. LF

... and sort of benchmarks. So, uh, MLPerf represents that the machine learning world is much better at being open source, about holding itself to standards. The number of benchmarks in terms of the different computer vision-

    17. DP

      Mm-hmm.

    18. LF

      ... natural language processing, uh, tasks is incredible.

    19. DP

Yeah. You know, historically, it wasn't always that way. Um, I had a graduate student working with me, David Martin. In some fields, benchmarking has been around forever: uh, computer architecture, uh, databases, uh, maybe operating systems; benchmarks are, uh, the way you measure progress. But, uh, he was working with me and then started working with Jitendra Malik, in the computer vision space, who, I guess, you've interviewed, Jitendra.

    20. LF

      Yes. Mm-hmm.

    21. DP

And, uh, Dave Martin told me, "They don't have benchmarks." (laughs) Everybody has their own vision algorithm, and the way is, "Here's my image, look at how well I do," and everybody had their own image. So, David Martin, uh, back when he did his dissertation, figured out a way to do benchmarks. He had a bunch of graduate students, uh, identify images, and then ran benchmarks to see which algorithms ran well. And that was, as far as I know, kind of the first time people did benchmarks in computer vision, which predated all of, you know, the things that eventually led to ImageNet and stuff like that. But then, you know, the vision community got religion, and then once we got as far as ImageNet, then, uh, that let, uh, the guys in Toronto, um, be able to win the ImageNet competition, and then, you know, that changed the whole world.

    22. LF

      It's a scary step actually, because, uh, when you enter the world of benchmarks, you actually have to be good to participate as opposed to, uh...

    23. DP

      Yeah, you can just, you just believe you're the best in the world.

    24. LF

      (laughs)

    25. DP

Uh, and I think the people... I think they weren't purposely misleading. I think if you don't have benchmarks, I mean, how do you know? You know, your intuition is kind of like the way we used to do computer architecture: your intuition is that this is the right instruction set to do this job. "I believe, in my experience, my hunch is that's true." We had to make things more quantitative, uh, to make progress. And so I just don't know how, you know, in fields that don't have benchmarks, I don't understand how they figure out how they're making progress.

  14. 1:16:30 - 1:19:41

    Quantum computing

    1. LF

      We're kind of in the, uh, vacuum tube days of quantum computing. What are your thoughts in this wholly different kind of-

    2. DP

      Yeah.

    3. LF

      ... space of architectures?

    4. DP

Uh, you know, I actually (laughs)... you know, quantum computing, uh, the idea's been around for a while, and I actually thought, "Well, I sure hope, uh, I retire before I have to start teaching this." (laughs)

    5. LF

      (laughs)

    6. DP

Uh, I'd say, uh, because I give these talks about the slowing of Moore's Law and, um, you know, when we need to, uh, change by, uh, doing domain-specific accelerators, a common question is, "What about quantum computing?" The reason that comes up: it's in the news all the time. So, I think the hard thing to keep in mind is, quantum computing is not right around the corner. Uh, there have been two national reports, one by the National Academy of Engineering and the other by the Computing Consortium, where they did a frank assessment of quantum computing, and, uh, both of those reports said, you know, as far as we can tell, before you get error-corrected quantum computing, it's a decade away. So, I think of it like nuclear fusion, right? Uh, there have been people who've been excited about nuclear fusion a long time. If we ever get nuclear fusion, it's gonna be fantastic for the world. I'm glad people are working on it. But, you know, it's not right around the corner. Um, those two reports, to me, say it'll probably be 2030 before quantum computing is, uh, something that could happen. And when it does happen, you know, this is gonna be big science stuff. This is, uh, you know, microkelvin, almost-absolute-zero things, that if they vibrate, if a truck goes by, uh, it won't work, right?

    7. LF

      Mm-hmm.

    8. DP

So this'll be in-data-center stuff. We're not gonna have a quantum cellphone. And it's probably a 2030 kind of thing, so I'm happy that people are working on it. But just, you know, with all the news about it, it's hard not to think that it's right around the corner. Uh, and that's why we need to do something, as Moore's Law is slowing down, to keep computing getting better for this next decade. And, you know, we shouldn't be betting on quantum computing, um... or expecting quantum computing to deliver in the next few years. It's probably further off, you know. I'd be happy to be wrong. It'd be great if quantum computing is gonna be commercially viable, but it will be a set of applications. It's not general purpose computation, so it's gonna do some amazing things, but there'll be a lot of things that probably, uh, you know, the old-fashioned computers are gonna keep doing better for quite a while.

    9. LF

      And there'll be a teenager 50 years from now watching this video saying, "Look how silly David Patterson was, uh, saying when-"

    10. DP

      No, I said, I said 2030 (laughs) .

    11. LF

      Okay.

    12. DP

      I didn't say-

    13. LF

      Sorry.

    14. DP

      ... I didn't say never (laughs) .

    15. LF

We're not gonna have quantum cellphones, so he's gonna be watching it on a phone-

    16. DP

      Well, I, I mean, I, I, I think-

    17. LF

      (laughs)

    18. DP

... this is such a, you know... given that we've had Moore's Law, I just, uh, feel comfortable trying to do projects that are thinking about the next decade. Uh, I admire people who are trying to do things that are 30 years out, but it's such a fast-moving field, I just don't know how to... I'm not good enough to figure out what the problems are gonna be in 30 years. Uh, you know, 10 years is hard enough for me.

  15. 1:19:41 - 1:28:22

    Moore's law

    1. LF

      So maybe if, if it's possible to untangle your intuition a little bit, I spoke with Jim Keller, I don't know if you're familiar with Jim.

    2. DP

      Mm-hmm.

    3. LF

      And he, he, hi- he is trying to sort of, uh, be a little bit rebellious and to try to think that, uh s-

    4. DP

      Yes. He, he quotes me as being-

    5. LF

      ... wrong. (laughs)

    6. DP

      Yeah, so this is a-

    7. LF

      What are your th- w- what are they really? For the rec-

    8. DP

      (laughs)

    9. LF

For the record, (laughs) Jim, uh, talks about... he has an intuition that Moore's Law is not in fact dead yet, and that it may continue for some time to come. What are your thoughts about Jim's ideas in this space?

    10. DP

      Yeah, this is just m- this is just marketing. So, what Gordon Moore said-

    11. LF

      (laughs)

    12. DP

... is a quantitative prediction. We can check the facts, right? Which is doubling the number of transistors every two years. So, we can look back at Intel for the last several years and ask: let's look at DRAM chips, um, six years ago. That would be three two-year periods. So then, do our DRAM chips have eight times as many transistors as they did six years ago? We can look at Intel microprocessors, you know, from six years ago. If Moore's Law is continuing, they should have eight times as many transistors as six years ago. The answer in both those cases is no. The problem has been that, because Moore's Law was kind of genuinely embraced by the semiconductor industry, they would make investments in, uh, semiconductor equipment to make Moore's Law come true; semiconductors improving and Moore's Law, in many people's minds, are the same thing.
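Since Moore's Law is a quantitative claim, the check Patterson describes is simple arithmetic: doubling every two years means growth by a factor of 2^(years/2), so six years should mean 8x the transistors.

```python
# Moore's Law as a checkable prediction: transistor count doubles every 2 years.

def moore_prediction(count_then, years):
    return count_then * 2 ** (years / 2)

# Six years is three doubling periods, hence an 8x prediction:
print(moore_prediction(1.0, 6))   # 8.0

# Hypothetical numbers for illustration (not real chip data): a part with
# 2 billion transistors six years ago "should" have 16 billion today.
print(moore_prediction(2e9, 6))   # 16000000000.0
```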

    13. LF

      Hm.

    14. DP

So when I say, and I'm factually correct, that Moore's Law no longer holds, we are not doubling transistors every, uh, two years. The downside for a company like Intel is people think that means it's stopped.

    15. LF

      Right.

    16. DP

That technology is no longer improving. And so Jim is trying to, uh, counteract the impression that semiconductors are frozen in 2019 and are never gonna get better. So, I never said that. (laughs)

    17. LF

      Okay.

    18. DP

      All I said was Moore's Law is no more, and I'm-

    19. LF

      Strictly looking at the number of transistors, they could, at-

    20. DP

That, 'cause that's what Moore's Law is. Uh, there's been this, uh, aura associated with Moore's Law that they've enjoyed for 50 years: "Look at the field we're in. We're doubling transistors every two years. What an amazing field," which is an amazing thing that they were able to pull off. But even as Gordon Moore said, you know, no exponential can last forever. It lasted for 50 years, which is amazing, and this has a huge impact on the industry, because of these changes that we've been talking about. So, he claims, because he's trying to react... he claims, you know, Patterson says, "Moore's Law is no more," and, look at it, it's still going. And, uh, TSMC, uh, they'll say it's, uh, no longer. But (laughs) there's quantitative evidence that Moore's Law is not continuing.

    21. LF

      Yes.

    22. DP

So, what I say now to try and... okay, I understand the perception problem when I say Moore's Law has stopped. Okay, so now I say Moore's Law is slowing down and-

    23. LF

      (laughs)

    24. DP

... I think Jim... which is another way of saying (laughs), if it's predicting doubling every two years and I say it's slowing down, then, um, that's another way of saying it doesn't hold anymore. And I think Jim wouldn't disagree that it's slowing down, because that sounds like things are still getting better, just not as fast, which is another way of saying Moore's Law isn't working anymore.

    25. LF

It's still good for marketing. But, uh, what's your... you're not... you don't like expanding the definition of Moore's Law? Sort of, uh, naturally...

    26. DP

Well, you know, as an educator, you know, (laughs) it's just like modern politics. Does everybody get their own facts? (laughs) Or do we have... you know, Moore's Law was a crisp, you know, a, more-

    27. LF

      Yes.

    28. DP

It was... Carver Mead looked at his, uh, Moore's observations, drawing on a log-log scale a straight line, and that's what the definition of Moore's Law is. There's this other... what Intel did for a while, interestingly, uh, before Jim joined them: they said, "Oh, no, Moore's Law isn't really doubling transistors every two years. Moore's Law is the cost of the individual transistor going down, uh, cutting in half, uh, every two years." Now, that's not what he said, but they reinterpreted it, because they believed that the cost of transistors was continuing to drop even if they couldn't get twice the number of transistors on chips.
