Stanford Online | Stanford CS153 Frontier Systems | Jensen Huang from NVIDIA on the Compute Behind Intelligence
- SPSpeaker
I would like to welcome back Preacher Huang. [audience applauding] We have been now in a, locked in a global race way faster than NASCAR racing, and it's partly your fault. Jensen's been the preacher that's given us all the power we need, all the energy, and some more to have what I think has been the craziest 12 months of my life, certainly for many of you, and we're just getting started. Um, the energy with which you approach every single thing you do, including the class last year, and then all, every time I've had the chance to hang out with you, you've given so much time to the students, to the founders. Thank you. Should we jump right in?
- JHJensen Huang
Yeah, let's go.
- SPSpeaker
All right. We're gonna go rapid fire. What is co-design, and why is it so important?
- JHJensen Huang
Uh, I'll, I'll answer that in a second.
- SPSpeaker
Yes, please.
- JHJensen Huang
Um, but, but, uh, this is a great time to be in computer science, and obviously the reason is because computing is being reinvented for the first time as dramatically as, as it is for the first time really in about 60-plus years. Uh, the computer that we know of, that you all use in our computing model, our mental model of a computer, the architecture of a computer, how you write the program, run the program, how you think about even taking computers to market, what it's used for, for 64 years it has been largely the same since the s- IBM System 360. In fact, my first architecture book for learning about computer architecture was the System 360's manual. And so, so a lot has changed, um, as we went from PCs to internet and mobile and cloud and all those things, but the fact of the matter is the computing model, the fundamental part of computer science has largely remained the same until now. You know, for the first time, uh, the way you write the software, how you process the neural network versus the software, and what the applications can do has now dram-dramatically changed. Everything is fundamentally different. At the highest level, you know, one simple way to think about it is, is, um, uh, computing as we knew it before was largely pre-recorded. It's content that we pr- we pr-pre-recorded, images, videos, you know, software that we re- largely pre-recorded. But now everything is generated, and the nice thing about generating everything in real time is that it could be contextually consistent, con-con-contextually relevant to what, what it is that you're dealing with. And of course, um, it can respond, uh, to your intention, not just explicitly, uh, to the things that you instruct. And, and so, so the computer, the computer is, is, um, uh, fundamentally different in that way. 
Now the question is what does that mean, uh, at every single layer of the stack, from, from, uh, how the computer, how the software is now developed, the methodology of it, how you organize your company to be able to develop software of today completely changed. And so the methodology, the tools we use, uh, the approach that we think about software coding, uh, completely changed. Uh, how we run the software, neural network versus compiled binaries, um, very, very different. And so what does that mean to the computer system, the network, the storage? Um, what does that mean to the software stack and the cloud services that sit on top of that? And of course, you know, everything about the applications. What did it open up? And, uh, somebody just, uh, somebody just came and said, uh, this piece of software we just opened up called Alpamayo, and, uh, I've been working on self-driving cars now for about 13 years. And, and, um, I, and the, uh, and the, and the days of robotaxis are gonna be literally everywhere. You know, everything that moves will be robotic. And, and that's an example of an application that, that, uh, we wouldn't consider doing until deep learning and artificial intelligence came along. That was such a big unlock, um, that, you know, I said, "Hey, aha, uh, this, all of these problems that we wanted to solve in the past that we need a computer vision for, uh, really, really, uh, are now fundamentally unlocked." And so, so it's how you think about every single stage of that. What is, you know, what is a software engineer? How do you organize the company? Uh, what is a computer for the age of AI? How do you architect that? All the way to what you can use it for, and therefore, uh, therefore, um, uh, where you would deploy it. Um, all of that has fundamentally changed. And, and, uh, for me, the journey really started about 15 years ago. And, uh, uh, I had the benefit of, of seeing some early works in, in the area. 
And, uh, as all Stanford students do, uh, you break the problem down, you reason about it from first principles, and you come to the conclusion literally everything has changed. And so here you are, you know, computer scientist students, uh, this is really the first generation of AI becoming, uh, useful. And, um, where we a couple years ago was in the generative part of AI, uh, and, and as you guys know, uh, generative AI not only made it, made it cool for us to do image generation and text summarization and translation and whatnot, but generative al- generative al- AI also enabled us to think. And so when I saw generative AI, uh, you know, when o- other people saw was that it was able to generate images, and I, I, and I surely would appreciated that as well. Uh, but the fact that you can generate thoughts i-in the form of images, but you can generate thoughts, uh, you can also reason with it. And the ability for AI to think after GPT, uh, was, was very, very obvious. Now the question is, is, uh, how would you train, how would you fine-tune an AI, uh, to be able to reason step by step by step, and how would you teach it how to do so at, at fairly large scale in a kinda semi-supervised way? And so those are kind of the engineering problems you had to solve. Uh, but the moment you see GPT, you say, "Aha, uh, thinking is just around the corner." And thinking is generating tokens that you consume internally, and, uh, generating tokens that you consume externally, uh, would be called tool use. And so the idea that, that after GPT happened two years ago, that we would be at this moment was fairly easy to predict. Now, of course, un-un, you know, un-unbelievable amount of technology was invented and a lot of m-amazing people did amazing work, but you could almost see that moment here. And so here we are, you now have agentic systems, and so now the question is, is what's next? And what happens in a world where a computer is not, uh, not responsive to what you ask it to do? 
It's not, it's not b- on demand. You know, today's computing is really on-demand computing. The word on demand was actually gener- created i-in our generation to talk about how you think about using computers. Uh, time-sharing computers that you would use on demand became cloud computers. And cloud computing, of course, is on demand, but, uh, in your new world of agentic system, uh, these con- the computers are now continuously running. And so what happens in a world where the computers are continuously running? Uh, what happens to cloud services? What happen to your personal computer? What happens to, you know, all of these different sy-systems? Now there's a great ch- great opportunity again to rethink all of that. And so, so what I, w- you know, as kinda my, my, my introduction to everything about, um, computer science has changed and, and everything about every field of science has changed because of the things that we've changed and, and so it's a g- a good time to go to school.
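[Editor's sketch, not part of the talk.] The distinction drawn above, tokens consumed internally as thinking versus tokens consumed externally as tool use, can be illustrated with a toy loop. Everything in it is hypothetical: `scripted_model` is a hard-coded stand-in for a language model, `power` is an invented tool, and the step kinds `think`, `tool`, and `answer` are illustrative labels, not any real framework's API.

```python
from typing import Callable, Dict, List, Tuple

# A step is (kind, text). "think" text stays inside the agent's own context,
# "tool" text is sent to an external tool, and "answer" ends the loop.
Step = Tuple[str, str]


def scripted_model(context: List[str]) -> Step:
    """Hypothetical stand-in for a language model: picks the next step from context."""
    if not any(item.startswith("thought:") for item in context):
        return ("think", "Ten doublings means 2 to the power 10; use the power tool.")
    results = [item for item in context if item.startswith("tool_result:")]
    if not results:
        return ("tool", "power 2 10")
    value = results[-1].split(":", 1)[1].strip()
    return ("answer", "Ten doublings is a " + value + "x speedup.")


def power(arg: str) -> str:
    """Invented example tool: parses 'base exponent' and returns base ** exponent."""
    base, exponent = arg.split()
    return str(float(base) ** float(exponent))


def run_agent(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Run the loop until the model answers; return the answer and the full context."""
    context = ["question: " + question]
    for _ in range(max_steps):
        kind, text = model(context)
        if kind == "think":
            # Consumed internally: appended to context, never shown to a tool.
            context.append("thought: " + text)
        elif kind == "tool":
            # Consumed externally: the first word names the tool, the rest is its input.
            name, _, arg = text.partition(" ")
            context.append("tool_result: " + tools[name](arg))
        elif kind == "answer":
            return text, context
        else:
            raise ValueError("unknown step kind: " + kind)
    raise RuntimeError("no answer within max_steps")


answer, trace = run_agent(scripted_model, {"power": power}, "How much is ten doublings?")
print(answer)
for line in trace:
    print(line)
```

The only design point the sketch is meant to show is the routing: the same generated text either feeds back into the agent's context or leaves it to drive a tool, depending on its kind.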
- SPSpeaker
Okay.
- JHJensen Huang
That's it. What was your question?
- SPSpeaker
You know what? Uh, I'm just gonna turn it over to the kids.
- JHJensen Huang
Oh, oh, co-design, co-design, co-design.
- SPSpeaker
No, l-let, let, let's just go into the st- the students have questions. They've all been asking questions in Discord. They're all voting on each other's questions.
- JHJensen Huang
Co-co-design's really interesting, but it's not, it... Co-design's super interesting. And, and basically, co-design says, it said back in the old days, we abstracted computing so that, so that, um, uh, the people who designed microprocessors designed microprocessors, people who, uh, worked on compilers worked on compilers, and people who worked on languages worked on languages, and so on and so forth. You guys know that. And we actually had different fields. Um, but the problem, and in fact, this happened at Stanford, uh, what's the, what's the beauty of RISC? What was the beauty of the work that John Hennessy did? Um, it, it, the beauty of it is that you ought to think about compilers and microprocessor architectures harmoniously co-designed, because otherwise you could end up creating a microprocessor that's super, super tight and, you know, everything is, is maximally optimized, but unfortunately, it's hard to compile. It's, it's difficult, it, not compilable. And so they created a, a simpler instruction set that exposed simplicity to compilers so that compilers could do a better job generating code. And it turns out a simpler machine co-designed with a compiler creates better performance than two systems that were optimized individually. That's, you know, k- that's very Stanford, okay? And this is a, this is part of your heritage a-as all of your, and, and, uh, John Hennessy's, you know, trail of amazing work that's left behind. And so you take, you take that and you think about, well, what happens in, in the post world of general purpose computing? Why is it that every problem in, in computer science would be solvable by a general purpose instrument? At some level, you know, you could say, well, if you had a general purpose instrument, you prefer that. 
However, there are some extreme problems, whether it's computer graphics back in the old days, or molecular dynamics, or quantum chemistry, or, you know, f- you know, fluid dynamics and large multi-scale, meso-scale multi-physics problems, or deep learning. These problems are so computationally intense, why would you use a general purpose computer to go do that? And so there, y- the big insight is what if you understood the algorithms, understood the computer systems, understood the, you know, if you will, the compilers, the frameworks, and understood the architecture of chips, and you were optimizing all of it at the same time. And so the, the facts, here are the facts. This is what happens when you do it, what I just described. NVIDIA is probably the first computer systems company that's extreme co-design, meaning we, we literally co-design across all of that, and including CPUs, GPUs, networking, and switches, and every- and storage. And so the question is, what do you get? Well, Moore's Law back in the old days, you guys all know about that. Moore's Law was about, uh, 2X every 18 months, so call it 10X every five years. Okay, so 10X every five years is 100X every 10 years. And that's, that was in the good old days of Moore's Law, and for all the computer, computer scientists in the room, you know, you know that Moore's Law was underpinned by a concept called Dennard scaling, and Dennard scaling ran out of steam, um, several years ago, probably about a decade ago, in fact. And we, we kept squeezing it, we kept squeezing it, but, uh, over the course of last 10 years, if you just allowed microprocessors to continue to scale and you just don't touch the software and just benefited from the speed up of semiconductors and m-microprocessor design, you, at best case, you would've gotten 100X, but probably because Dennard scaling slowed down and Moore's Law largely ended, you know, you probably got something along the lines of 10X over the course of 10 years. 
Well, in the case of NVIDIA and co-design, we got 1 million X over 10 years. 1 million X. And so somewhere between 100,000 X and 1 million X, okay? So there, when you're talking about numbers that big, it really doesn't matter. And so 1 million X over 10 years, it got-- We, we were able to get scaling and computation scale so large so fast that AI researchers say, "Why don't we just take all of the internet? Why even worry about what data to go curate and what cr- what data to create? Let's just take all of the world's data and just give it to the computer." And that's really the big breakthrough. When you're able to, to, to do something so insanely fast, you know, for example, if you were able to travel at the speed of light, uh, where we choose to live is, doesn't matter. Uh, if you were able to go from New York to California in 10 minutes, uh, you know, our freedom, everything about society would change, right? And so if you're able to do computing a million times faster, everything about computing s- computing changed, and that's really the big breakthrough. Because of co-design, because of the way NVIDIA approached it, we accelerated computing by so, so far that it created all this infinite abundance opportunity for everybody to, to think about the future. And so anyways, here we are.
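[Editor's sketch, not part of the talk.] The scaling figures quoted in this answer can be checked with a few lines of arithmetic. The sketch assumes clean exponential growth, which is an idealization; the function names are made up for illustration, and the 10X and 1 million X inputs are simply the round numbers from the answer, not measured data.

```python
import math


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Speedup after elapsed_months if performance doubles every doubling_months."""
    return 2.0 ** (elapsed_months / doubling_months)


def implied_doubling_months(total_factor: float, elapsed_months: float) -> float:
    """Doubling period, in months, that would produce total_factor over elapsed_months."""
    return elapsed_months * math.log(2.0) / math.log(total_factor)


# "2X every 18 months" compounded over five and ten years.
five_year = growth_factor(18, 60)    # close to the quoted "10X every five years"
ten_year = growth_factor(18, 120)    # close to the quoted "100X every 10 years"

# Doubling periods implied by the two ten-year totals mentioned in the answer.
slow_doubling = implied_doubling_months(10, 120)             # 10X over ten years
codesign_doubling = implied_doubling_months(1_000_000, 120)  # 1 million X over ten years

print(f"2X per 18 months over 5 years:  {five_year:.1f}x")
print(f"2X per 18 months over 10 years: {ten_year:.1f}x")
print(f"10X over 10 years implies doubling every {slow_doubling:.1f} months")
print(f"1,000,000X over 10 years implies doubling every {codesign_doubling:.1f} months")
```

Under these assumptions, a million-fold gain over a decade corresponds to a doubling roughly every six months, versus roughly every three years for the 10X case.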
- SPSpeaker
Cool. I have a bunch of follow-up questions, but I'm not gonna ask them.
- JHJensen Huang
One, that one word led to that.
- SPSpeaker
GPT-10, ladies and gentlemen.
- JHJensen Huang
That's what it's like to work at NVIDIA. You give me one word, and you get ranted at for about half an hour. Because I got too much to, too much to share with you.
- SPSpeaker
The question is w- how should education evolve in response to the industry as it's changing?
- JHJensen Huang
Yeah, and, and that's a really excellent question, and, and I think the answer clearly is, uh, AI should be part of your curriculum, not just in learning about AI, but using AI for the curriculum. The, the problem with, with textbooks, as you know, it takes an enormous amount of effort to do. And when I was taking classes at Stanford, Professor Hennessy was still writing his textbook. It was all handwritten out, and, and each week, it seemed like he was writing a chapter. I don't even know how he writes a chapter a week, but every week, he was writing about a chapter. And, and then over time, all of those notes turned into a textbook, into the first edition, and that must have taken several years. And so I, I think, I think, um, it, it's not, it's not possible for universities, for, you know, pre-recorded textbooks to keep up with information and knowledge that's being generated in re-real time by AI. And so I think the future probably has to be some union of the two, and, and I, I don't know about you guys, but I, I can't learn anymore without AI. And so not only do I have the AI read the papers, um, but I also ha- Once it reads the papers, I might ask it to go read, you know, a whole bunch of the other papers that are associated with it, and then now it becomes a super researcher, and then I can, I can-- First, I ask it to summarize, um, I ask it some basic questions, and then after that, you interact with that paper as if it's a researcher that's dedicated to you. And so most people don't realize that. You know, I think a lot of people still think that you, you summarize a document. But in the process of summarizing the document, that AI learned a lot. And so, and I, I, um, uh, I think that in the future, I, I do hope that, that curriculums are, are tightly integrated. Um, I, I will say, in defense of the textbooks, though, I will say that the first principles don't change. 
You know, in the final analysis, uh, uh, Mead & Conway is still a solid, a, a, a fundamental methodology as, as before. It, it, it is true that the scaling process that led to, um, constant, constant current density, um, constant, uh, power density, all, all of that, all of those design optimizations associated with modern semiconductor design, you know, the, the-- We've, we've kind of exhausted all of that. None of that is ISO anything anymore, and so, um, but it's still good to know where we came from, you know? And so I, I would still encourage the, to appreciate the first principles and, and you know, while, while I was going to, to Stanford, I was als-already working at AMD and, um, and I was designing microprocessors at the time, and it was still, it was still really good to, to see simultaneously, um, how, how we design things in practice versus, uh, the first principle methods associated with learning about, eventually, uh, how to design these things. And, and, um, I, I, I really enjoyed having, you know, feet on both sides of it, and I, I ended up learning a lot more. And so what that means is when you're using AI, which is real world, it's contextually relevant now, um, it's, it's contemporary, and meanwhile, you have first principle knowledge that you're learning at the same time. You're kind of getting the same thing that I experienced. There-
- SPSpeaker
The question is op- what are your thoughts on open source? How do we, how does open source stay at the frontier?
- JHJensen Huang
Yeah, there's really the question of closed source versus closed proprietary software versus open source. There's a question of my intentions with open source, and so I'll start with my intentions of open source. Um, first of all, uh, NVIDIA uses more Anthropic and OpenAI tokens than just about anybody. Uh, and, and the reason for that is obviously we do a lot of coding, we do a lot of design, and 100% of our engineers are now agentic-agentically supported. And so, so I want them to be working with agents using the latest generation tools and re- and remon- uh, modernize how NVIDIA does work altogether, okay? So number one, if you can use, uh, OpenAI and Anthropic, I would highly recommend you use it, and the reason for that is because it's useful. It works really well. It's getting better all the time, and it's, you know, as a, as you know, large language models is the technology inside, but Claude is a product cl- and Claude Code is a whole harness around it, and that harness is getting better all the time, and the model's getting better all the time. It's not un-- It's not likely that anybody open source go to GitHub, download something that's gonna work nearly as well, okay? So, so I, I highly recommend and, and we do, um, use off-the-shelf frontier AI models. The question is why is it there that we're advancing and working so hard on open, on, on open models? The reason for that is because language models are very important because they represent the, the codification of our intelligence and, and, uh, we wanna automate ourselves especially is a very important part. But you, you, you need to know that, know that AI is about learning the representation, the meaning, the structure of information. And so the question is, where is information? Well, we're living in information right now as we speak. The reason why there's structure is the reason why every day you show up, it's kinda largely the same, otherwise it'd be like practically white noise. 
And so the fact that biological systems and physical systems have structure, and from that structure, I must be able to learn higher level representation. And if I can learn the representation, then I could manipulate it. Then I can... Does that make sense? And so just because I can learn the representation of, of language, I can then generate it, I can manipulate it, you know, I could put it to use. And so I wanna do the same thing for chemicals and, and proteins and genes and, uh, physics and physical systems, robotics, for example. And so notice the way you represent all of those things are fundamentally different because the structure is different and the dimensionality is different. How you train it is fundamentally different, right? Because you don't have a whole bunch of internet corpus of human language on it. So you, you gotta come up with new, new strategies for all of that stuff. And so we decided that we would dedicate ourselves in some fundamental pillars of-- And because we have the company, the company has the talent and the scale, we have the ability to put the first piece of artifact out in the world, data, model, how to train it, in several different domains. And so some of the domains I care very much about, uh, one of them is called, of course, Nemotron's language, and I'll come back to that in a second why it is that we're doing it. And then second is BioNeMo, that's for biology. And, and, uh, um, we have, uh, Alpamayo, somebody mentioned it earlier, for autonomous vehicles. Uh, basically artificial intelligence, uh, navigation. And then, and then, um, uh, we have, uh, GR00T, which is, uh, a humanoid articulation robotics, gener- artificial general robotics. Uh, and, and, and then we have, uh, climate science, you know, basically meso-scale multiphysics. Okay? And so all of these different area, these different domains, uh, we decided that, that we should go and pioneer it. 
And the reason for that is because otherwise, the scientists in these different domains, they simply won't have the scale and the technology necessary to go build that foundation model. And so we decided that we would do that. Okay? So that-- And, and as a result of doing that, we activated healthcare, we activated life sciences, we act- We're working with every single self-driving car company in the world, doesn't matter which one it is. You know, there's NVIDIA in there somewhere. And so we're, we, we enabled that entire ecosystem to really flourish and, and we're working with robotics right now and, you know, so on and so forth. Okay? Without us making that first effort and building a foundation model, it's hard to activate the whole industry downstream. And so it's about, really about expanding AI and, and, uh, democratizing this capability. The, the reason why we do language models is because two reasons. One, there are too many, too many societies where the scale of their language is not big enough for somebody else to decide to make it a high priority. They'll understand Sweden, Swedish, but making Swedish a top priority is not, not, not likely because the country is, is big, but not so big. Uh, Chinese, of course, well taken care of. Uh, Indian, certain dialects very well taken care of, but as you know, you have like 230 others. And so there are too many others. Now, unless you deeply care, it's never gonna be great. And human intelligence, no matter the size of your population, uh, you-- somebody should care. And so we created a, a large language model that's near frontier, Nemotron is close to frontier, and we l- we make everything available so that if somebody wants to then fine-tune it into whatever language of their choice, they got no trouble doing that. Okay? And so-- And then the second reason is very important is because we want to also take these language models and fuse it with the domain-specific models because of human priors. 
So for example, Alpamayo is a language model fused with a, a world model. And so on the one hand, it's really designed to detect cars and roads and things like that. But on the other hand, we also believe that if the AI model, if Alpamayo, the self-driving car model, can reason, reason like a human, and it could reason with human priors, then, uh, the number, the amount of experiences it needs to have before it could be an extremely good and safe driving car is dramatically reduced. The amount of training data is reduced, and we've proven that. Alpamayo is probably one of the most effective self-driving car systems in the world, and it's really only experienced, you know, a few million miles, not billions of miles. And so that kinda te- the system actually works. Okay? So anyways, I just gave you a long-winded answer for... I broke it all down. You can't just ask a simple question.
- SPSpeaker
Well, what we talked about-
- JHJensen Huang
But open models is really important. And then, and one, one more thing, okay? If there's nothin'-- That wasn't enough. One more thing. Uh, if you want, if you care to have AI be safe and secure, it has to be open. And the reason for that is you can't defend against a black box, and you can't secure a black box. And you can't put a black box of some cap- incredible capability into your system with it completely, completely opaque. Now, of course, there's a lot of different ways you could solve the opaqueness. For example, you could say, "Before it do- does anything, you have to reason about it to me step by step. Before you do anything at all, you have to come up with a plan, you have to reason about it step by step," but you could always lie. And so, so the ability for the, the, the nice thing about transparent systems is that then, you know, we, everybody gets to interrogate it. Uh, if you have a transparent system, then researchers get to use it. If you have a transparent system, uh, open system, then the way you defend against super agentic systems in the future for cybersecurity is obviously not to go into a battle of who gets the better one. You know, you come up with some model, uh, model 7.0, and the only way I combat against that, I'm completely vulnerable until I come back with a 8.0, and then you gotta come back with a 9.0, and we just go back, back and forth driving each other nuts. And obviously that's not, that, that's obviously not the smartest way to do it. The smartest way to do it is you're gonna, you're gonna create these incredible cybersecurity systems and, or you're gonna... These cybersecurity threats, and what we're gonna do is we're gonna have millions, billions, swarms of cheap AIs, and we're gonna systematically surround it. And so it's kinda, you know, if you will, a giant dome. So for example, Nemotron Nano is being used for cybersecurity. 
And so all these cybersecurity firms take Nemo- Nemotron Nano because it's so fast and so, so cost-effective, you can train it to detect cybers- cyberattacks and then just deploy trillions of them.
- SPSpeaker
Yeah. Um, on, on the topic of open scaling, you know, we, we hung out in January and we-
- JHJensen Huang
I feel like, you know that one scene in Thor? Do you remember he was just hanging and he kept rotating in that direction?
- SPSpeaker
It's zero gravity. Here at AI Coachella, we got no gravity. [laughs]
- JHJensen Huang
You know, in Thor: Ragnarok. Do you guys remember that?
- SPSpeaker
We can, we can move a little bit back so you can do this.
- JHJensen Huang
No, that's all right.
- SPSpeaker
Okay. All right.
- JHJensen Huang
You guys don't watch movies?
Episode duration: 1:08:23
Transcript of episode tsQB0n0YV3k