EVERY SPOKEN WORD
- Andrew Ng
What I want to do today is give an overview of this class. I'm actually curious before we get started: this is now September 2025. How many of you just started at Stanford? Raise your hand. Wow. Cool. Awesome. So the rest of you, pay attention to who just raised their hands, do say hi to them, and help welcome all of the people who just joined Stanford, in whatever program. CS230 is a class that we offer in the flipped-classroom format. What that means is that instead of listening to me or Kian, my co-instructor, who you'll meet next week, lecture at you for an hour, an hour and twenty minutes, or whatever, we actually ask you to watch a lot of the video lectures online, so that we can use the precious in-classroom time for much richer, deeper discussions. So both today and for the entire quarter, I would really warmly welcome anyone raising questions; just raise your hand. Oh, and even though the registrar has scheduled a longer session, we'll usually use only up to an hour and twenty minutes for this course. It turns out that a lot of Stanford students were watching the lectures online anyway. So rather than delivering the same lecture year after year, we put a lot more effort into producing very high-quality lecture videos, highly edited for offline watching, and we ask you to watch those online, which people were doing anyway. Then we spend the classroom time doing the things that make sense for us to get together in person for. Because you haven't watched any of the lectures yet, or I assume most of you have not, today we may have a slightly shorter session to introduce the class, talk over logistics, and so on. Right?
As many of you know, deep learning is one of the latest, hottest technologies in computer science and AI. If we look at, say, our PhD admissions or even master's admissions, a very large fraction of all the students who come to Stanford or apply to come to Stanford want to work on AI, right? I'm not sure I'm allowed to say the numbers, but they are extremely high, as you can imagine. And so my goal, and my co-instructor Kian's goal, our collective goal, is through this quarter to help you get to, or near, the state of the art in deep learning, and make sure that all of you walk away from this class highly skilled at applying it. It turns out that a lot of progress in AI over the last ten, fifteen years was made by scaling, and one of the reasons deep learning was so successful is that it's good at absorbing a lot of data. So if I draw a figure where the x-axis is the amount of data we have for a problem, then with more traditional machine learning algorithms, say logistic regression or decision trees, the older generations of machine learning algorithms, as you gave them more and more data, their performance, or accuracy, would plateau, right? Take speech recognition. With older generations of algorithms, even as you fed them more and more data, hundreds, thousands, and then tens of thousands of hours of speech data, the accuracy would often plateau. It was as if the older algorithms didn't know what to do with all the data we now have. But what we started to find about ten, fifteen years ago was that if you train a small neural network, also known as a small deep learning model, its performance would kind of keep getting better and better.
And if you train a medium-sized one, and then a very large neural network, the performance just keeps getting better and better. I think the reason deep learning has dominated the AI scene for the last ten, fifteen years is that there is a recipe for training very large neural networks, into which we can shove a lot of data, that results in exceptional performance. We started to see this, frankly, because of some Stanford research papers about fifteen years ago, when we did the first early work on using CUDA programming and GPUs to scale deep learning. Oh, by the way, one fun fact: my first GPU machine used to train neural networks with CUDA, which was a controversial thing at the time, was built by a Stanford undergrad in his dorm room, right? His name was Ian Goodfellow. That compute server, built in a Stanford undergrad dorm room, allowed us at Stanford to lay the early foundations of using CUDA, at that time a modern language for programming GPUs, to train large neural networks. And then, obviously, that influenced a lot of people and helped the scaling up of deep learning take off. I tell that story because sometimes the work that some of you do in a dorm room, or in graduate student housing, or in a lab at Stanford can, looking back over some number of years, have a huge impact. And maybe in this class, some of you will do work as impactful as that someday as well. So what we started to find was that as you train larger and larger neural networks, they could soak up lots of data and drive exceptional performance. Then there was a research paper out of Baidu showing that as you scale up neural networks, as you scale up deep learning algorithms, the performance gains are actually quite predictable.
So you can forecast: if you buy this many GPUs and throw this much compute and this much data at the problem, what will the performance be? Later, OpenAI popularized the idea with a really influential paper on scaling laws, and that predictability of how deep learning improves then drove a lot of the investment in data centers and in building very large AI models with lots of data. So let's see. In terms of where this class sits: in computer science, all of us build on each other's work, right? A lot of the way computer science and AI have made progress is that we build on top of ideas that are in turn built on top of other ideas, and so on. So let me give you a little map to show you where deep learning sits. Machine learning is built on top of computer science, so I think it's actually helpful to learn CS fundamentals. Even though I use, and I suspect the vast majority of you have used, AI-assisted coding tools like Claude Code or Gemini CLI or OpenAI Codex or Cursor or Windsurf or whatever, I find that people who know CS fundamentals, who really understand how computers work rather than going, "I'm gonna vibe code this," get things to work much better. On top of CS fundamentals, there's a set of machine learning skills: how do you build algorithms that can learn from data? And deep learning is a special type of machine learning, really the most effective type, as far as I can tell, in which we train neural networks: certain types of algorithms that learn from large amounts of data.
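The Baidu and OpenAI scaling results described above are usually summarized as power laws: loss falls roughly as a constant times a negative power of the data or model size. As a minimal sketch, assuming that simple form (published scaling-law papers fit richer variants, the helper names here are illustrative, and the numbers are synthetic rather than taken from any paper), you can fit the exponent on a log-log scale and extrapolate:

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size); returns (a, b).

    Assumes loss ~ a * size**(-b), a simplified scaling-law form.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Extrapolate the fitted power law to a new (larger) size."""
    return a * size ** (-b)


if __name__ == "__main__":
    # Synthetic "measurements" generated from loss = 5 * N^-0.1 (hypothetical)
    sizes = [1e6, 1e7, 1e8]
    losses = [5 * n ** -0.1 for n in sizes]
    a, b = fit_power_law(sizes, losses)
    print(f"a={a:.3f}, b={b:.3f}, forecast at 1e9: {predict_loss(a, b, 1e9):.4f}")
```

On real measurements the points will not sit exactly on a line, and extrapolating far beyond the measured range is where such forecasts are least reliable.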
So far, in the last ten minutes, you've probably heard me use the words deep learning and neural networks, and I think today those two terms are almost interchangeable. Some purists will insist on technical differences, but for all practical purposes they mean the same thing. What happened was that the term neural networks had been around for decades, but around ten, fifteen years ago, a number of us realized that deep learning was just a much better brand. So even though neural networks had been around for decades, starting about ten, fifteen years ago it was deep learning that took off, because who doesn't want learning that is really deep, right? [laughing] It's just a good brand. So you'll hear me use those terms interchangeably. Deep learning algorithms, neural networks, give us a way to take advantage of more and more compute capacity, so we can build very large AI models with a lot of parameters that soak up the large amounts of data available to us, to become more intelligent, make better and better predictions, or generate more and more accurate outputs. Ever since I was a teenager, my mom's been trying to convince me to stop mumbling.
- Speaker
[laughing]
- Andrew Ng
But now, many years later, I still struggle with that. So I'll try my best. Please wave at me or let me know if my voice starts to drift lower again. Yeah. All right. I think my mother would be very happy that I practice like this now. All right. So: CS fundamentals, machine learning, deep learning, and then the recent generative AI revolution. Sorry, bad handwriting. Generative AI, which is mostly built on a specific type of neural network called a transformer, is in turn built on top of deep learning, right? You'll actually learn what the transformer architecture is later this quarter. I assume all of you are regularly prompting LLMs, large language models, to help you get work done. What I find is that while I use LLMs all the time, for a lot of applications just prompting an LLM doesn't cut it. There are a lot of things I cannot get to work just by prompting. So I'll often have to go one layer deeper, down to the deep learning layer of abstraction, and fiddle with deep learning algorithms in order to get certain things to work. So what this class does is try to make sure that you are near-expert in deep learning by the end of the quarter. But we'll also dip a little into machine learning concepts: we'll talk a lot about objective functions and tips and tricks for optimizing parameters efficiently. And we'll also reach up a little to cover some gen AI; in particular, we'll talk about what a transformer network is. Through this quarter, Kian and I will also chat a little about the job landscape as it relates to gen AI and deep learning, and how deep learning is enabling certain types of gen AI applications. Okay?
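Since the transformer keeps coming up, here is a rough, framework-free sketch of its core operation, scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, where each query takes a weighted average of the value vectors, weighted by how well it matches each key. It is an illustration only: a real transformer adds learned projections, multiple heads, masking, and much more, and the function names here are made up for this example.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors.

    queries: list of d-dim vectors; keys: list of d-dim vectors;
    values: list of vectors, one per key. Returns one output vector per query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # The query matches the first key better, so the output leans toward the first value
    print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]]))
```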
I have more to say about this, but let me just pause for a second and see if you have any questions. Yeah, go for it.
- Speaker
Would you say machine learning and data [inaudible]
- Andrew Ng
Oh, good question. Right. Congratulations, you've won the prize for the first question asked in CS230, 2025. So well done. [laughing] Yeah. So, is machine learning a prerequisite for this course? Not really. A few common entry points to AI at Stanford are CS129, CS229, and CS230. If you don't know any machine learning, this course may seem like it's going a little fast in the first two or three weeks, but some people do pick things up quickly that way. So, a few courses you may hear about. CS129 is a relatively easy entry point into machine learning; it takes a longer time to go through the core concepts. What's an optimization objective? How do you implement gradient descent? What is logistic regression? What's a very basic neural network? So it's the most applied and easiest of this list. CS229, which I'm also involved in co-teaching, is much more mathematical and theoretical: very high pace, very intense. It's less applied than 129 and 230, but it goes over a lot more of the theory and the math derivations behind machine learning algorithms. For example, if you ever want to learn how to do calculus not with real numbers but with matrices and vectors, that's a bunch of mildly complex math that's worth knowing, and CS229 goes over it. CS230, this class, is relatively applied, and it focuses just on deep learning. There are a lot of machine learning algorithms out there, right?
Supervised learning, unsupervised learning: many of these algorithms are very useful, and they're all worth learning about. But of all the machine learning algorithms out there, the one category that has taken off the most is deep learning, and this class focuses just on that. The other algorithms are also worth learning about, though. So CS129 is the easiest on-ramp, but if you've done either 229 or 230, I would probably skip 129 at that point. If you're just getting started, there are multiple on-ramps. Yeah.
- Speaker
Can we take 229 and 230 together, or do you think it's redundant?
- Andrew Ng
Oh, yeah. Can you take 229 and 230 together? Yeah, thank you. Prize for the second question. [laughs] Well, all right, if you really want to clap, sure, go for it. [laughs] [clapping] Yes, you can take CS230 and CS229 together. We designed the two curricula to have relatively little overlap; there's only a very small amount of overlap between them. All right, we won't clap for every single question, but yeah, go ahead. [laughs]
- Speaker
Are you gonna cover more recent deep learning algorithms, like the ones used in recent LLM developments, like Piper or LLaMA?
- Andrew Ng
Yeah. So, will we cover the recent LLM developments? We will touch on the transformer neural network, but not the latest LLM variations in this course. It turns out that when you go to get a job, assuming you go and work in industry, the number of people training LLMs is actually very small, right? Some of those jobs tend to be incredibly well paid, so we hear about very high salaries in the news media. But the vast majority of application builders end up sometimes working at the gen AI level, not that often training a transformer from scratch, but often using deep learning tools as well. One example: many of my teams have trained transformer foundation models from scratch, relatively small ones, say, in startups. But one thing we do quite a lot is take a pre-trained transformer network and then engineer our own data to further fine-tune it, right? Sorry if I'm using words you may not totally understand yet; you'll know what pre-training and fine-tuning are by the end of this quarter. Those are things we actually do day to day, and they're important for getting a bunch of products to work. So in this course you'll gain the foundations needed to do that type of work. The one thing we don't do is talk a lot about how to train the largest cutting-edge transformer networks. I think that is a very important skill set, but it's a relatively niche one, for which some people are getting paid really, really well. The number of people doing that in the world is actually small, whereas the number of people building applications with this set of skills is very large.
- Speaker
Do you know of any courses that do cover the-
- Andrew Ng
Do I know any courses covering that? I think Percy Liang was thinking about doing something, but I don't remember what he's doing this quarter. A few people are thinking about doing something like that. Yeah. Go ahead.
- Speaker
This is just a question about the course itself. Do you know what portion of the course is gonna involve mathematical analysis versus coding?
- Andrew Ng
So, just repeating the question for the mic and for the home viewers: what proportion is coding versus mathematical analysis? This course is relatively math-light. Sorry, maybe that was too strong a statement, but this course is very practical. I remember, many years back, speaking with a mathematician. We were just chatting, and he was talking about his career and why he chose to be a mathematician, and I still remember he had stars in his eyes when he told me he chose his career path because he felt his role was to pursue truth and beauty in the universe. In this course, I'm not gonna do any truth and beauty stuff, right?
- Speaker
[laughing]
- Andrew Ng
So, truth and beauty is good. But you'll find that I want to take a very practical approach, talking about how to build applications and build software that works. Yeah, cool. Anything else? Great. Cool. Awesome. All right. Thank you for all the questions. Please keep them coming, and feel free to interrupt me or Kian throughout this quarter with questions. Love it. So, just to flesh this out a little more, this is what I see in terms of teams building practical applications. I'm excited about applications because with improving machine learning algorithms, deep learning algorithms, and gen AI algorithms, there are a lot of applications you can build now that were impossible, or really inaccessible to almost anyone, even a few years ago. I find that prompting gen AI works really well for a lot of text-based applications. There's work on multimodal LLMs, large multimodal models, making inroads into vision and audio, but gen AI algorithms, especially transformer networks trained to output text, like ChatGPT, Claude, Gemini, and so on, are really fantastic for text-based applications, right? And I find myself regularly working with deep learning algorithms directly when I'm working with audio data, image and video data, and also a lot of structured data. Ah, sorry, my handwriting's awful. Right. So structured data refers to large tables of numbers, right? Like giant Excel or Google Sheets spreadsheets. That's structured data. Unstructured data refers to text, audio, images, maybe video. And because a lot of gen AI, large language models like ChatGPT, grew up as text-in, text-out machines, they are remarkable for a lot of text-processing applications.
But for other types of data, I often end up dipping down to use various deep learning algorithms directly. It also turns out that for text-based data, you can usually go quite far with prompting alone, so a lot of applications are built by prompting LLMs. But I've been on quite a few teams where, after fiddling with the prompts for a month, you just can't improve performance any further by tuning the prompts. Or, another good problem to have: gen AI tools are relatively inexpensive when you're prototyping, right? At a few dollars per million tokens, you can do a lot. But sometimes you're lucky enough to be on a product that hits product-market fit, and a lot of users want to use it. Multiple times I've been on teams where we basically did not care about our large language model bill, right? Twenty dollars a month or a hundred dollars a month was fine. But as more and more users start using the product, then, to your team's positive surprise, your AI bill really starts to skyrocket. At some point, you look at how much you're paying for the large language models, and to bend the cost curve back down, a lot of deep learning techniques become very relevant. I'm thinking of some of our bills that were really breathtaking. I don't wanna say the numbers, but they were definitely more than we wanted to pay. As much as we love the companies providing LLMs, the bills we were paying them were significantly larger than I wanted. Knowing how to use deep learning to fine-tune smaller models was really the critical skill set that bent the cost curve back down and made the whole thing affordable enough to keep providing the service, right? So, yeah, right.
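The bill growth described above is just multiplication: tokens per request, times requests, times users, times the per-token price. A back-of-the-envelope sketch, where every number is a hypothetical placeholder chosen to match "a few dollars per million tokens" (real providers typically price input and output tokens separately, and the function name is illustrative):

```python
def monthly_llm_cost(users, requests_per_user_per_day, tokens_per_request,
                     price_per_million_tokens, days=30):
    """Rough monthly LLM bill in dollars, assuming a single flat per-token price."""
    tokens = users * requests_per_user_per_day * tokens_per_request * days
    return tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    # Hypothetical usage: 10 requests/user/day, 2,000 tokens each, $3 per million tokens
    for users in (100, 100_000):
        print(users, "users:", f"${monthly_llm_cost(users, 10, 2000, 3.0):,.0f}/month")
```

With these placeholder numbers, 100 users cost $180 a month, and 100,000 users at the same usage cost $180,000. Because the cost is linear in users, this is the kind of growth that can make a smaller, fine-tuned model worth considering.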
So those are the skill sets I hope you get from this course. Great. Let's see. All right. Actually, sorry, let me use this one. To give a quick overview, the online materials for this course are broken down into five modules. The first one is on the basics of neural networks, NN, and deep learning, DL, right? You'll learn how to build a neural network, or deep learning algorithm, from scratch in Python. I find that if you use frameworks like TensorFlow or PyTorch, they hide a lot of the details. So we actually work through how to build a basic neural network, a basic deep learning algorithm, just in raw Python, so you really understand it. The second module, the second mini-course, will be on how to improve and tune your neural networks. You may have heard that when you're training a neural network, there are a lot of parameters, and also what we call hyperparameters. Hyperparameters are parameters that control the parameters, right? The weights are the parameters; the hyperparameters are things like the learning rate or the size of your neural network. There are actually a lot of hyperparameters we end up tuning, and we'll try to give you a sense of which are the most important ones, along with practical skills for tuning them. If you look at my PhD students, I think every PhD student I know who became great at some point wound up at 2 AM tuning hyperparameters, right? And I still have very clear recollections of being in the office at two, three AM, fiddling with parameters, trying to get things to work.
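In the spirit of the first module, here is a minimal sketch of a two-layer neural network written in plain Python, with no NumPy or framework, trained by full-batch gradient descent on XOR. The architecture (tanh hidden layer, sigmoid output, binary cross-entropy loss) and the function name `train_xor` are assumptions for illustration, not the course's assignment code. It also shows the split described above: the weights and biases are parameters, while `hidden_size`, `learning_rate`, and `epochs` are hyperparameters.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_xor(hidden_size=4, learning_rate=0.5, epochs=5000, seed=0):
    """Train a 2-layer network on XOR; return (initial_loss, final_loss, predict)."""
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    # Parameters (learned): hidden-layer weights/biases and output weights/bias
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(W1[i][0] * x[0] + W1[i][1] * x[1] + b1[i]) for i in range(hidden_size)]
        p = sigmoid(sum(w2[i] * h[i] for i in range(hidden_size)) + b2)
        return h, p

    def loss():
        # Mean binary cross-entropy over the four examples
        total = 0.0
        for x, y in data:
            p = min(max(forward(x)[1], 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(data)

    initial = loss()
    m = len(data)
    for _ in range(epochs):
        gW1 = [[0.0, 0.0] for _ in range(hidden_size)]
        gb1 = [0.0] * hidden_size
        gw2 = [0.0] * hidden_size
        gb2 = 0.0
        for x, y in data:
            h, p = forward(x)
            dz2 = p - y  # dLoss/d(output pre-activation) for sigmoid + cross-entropy
            gb2 += dz2
            for i in range(hidden_size):
                gw2[i] += dz2 * h[i]
                dz1 = dz2 * w2[i] * (1 - h[i] ** 2)  # backprop through tanh
                gb1[i] += dz1
                gW1[i][0] += dz1 * x[0]
                gW1[i][1] += dz1 * x[1]
        # Gradient descent step using gradients averaged over the batch
        b2 -= learning_rate * gb2 / m
        for i in range(hidden_size):
            w2[i] -= learning_rate * gw2[i] / m
            b1[i] -= learning_rate * gb1[i] / m
            W1[i][0] -= learning_rate * gW1[i][0] / m
            W1[i][1] -= learning_rate * gW1[i][1] / m

    return initial, loss(), lambda x: forward(x)[1]


if __name__ == "__main__":
    start, end, predict = train_xor()
    print(f"loss: {start:.3f} -> {end:.3f}")
    for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
        print(x, round(predict(x), 3))
```

Rerunning `train_xor` with a different `learning_rate` or `hidden_size` is a tiny version of hyperparameter tuning; whether a given setting fully solves XOR can depend on the random seed.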
And it turns out that your skill at tuning hyperparameters really makes a difference. There were some evenings when, frankly, my skill at tuning hyperparameters made the difference between going home to sleep at 3 AM versus 7 AM. Maybe don't do what I did. I'm not encouraging this kind of behavior; it's just my personal experience. But your practical skill at how quickly you can figure out the recipe to get these neural networks to train really makes a big difference. Related to that, one thing we talk a lot about in this course is strategies for building machine learning projects. Say you build a complex system; this is one example we'll go through later this quarter. Say you want to build a camera that recognizes your face and your friends' faces and unlocks a door, for security, safety, whatever. I've worked on systems like that; Kian has worked on systems like that. These are complex systems with multiple components. There's a camera. Do you clean up the image? Do you crop the face? How do you register the face? How do you compare faces? How do you decide whether to take another picture before you unlock the door, or whether someone is trying to fool it by holding up a picture printed on a piece of paper? There are actually a lot of decisions. What I find is that the biggest difference between a team that knows how to drive a project like this forward and get it done in days rather than weeks, or weeks rather than many months, is the ability to drive a disciplined development process. It turns out that when you have a complex system, less experienced teams will often pick things almost at random to work on, right?
They'll read one research paper and say, "Oh, I read in the research paper that we should get more data," or some newspaper said AI needs lots of data, so let's go spend six months collecting more data. It turns out that a lot of the time, collecting more data does not help your application, but sometimes it's a huge help. So given your application, how do you decide? Should you spend more time collecting data? Maybe you should buy more GPUs. I definitely know people who read in the news that a lot of GPUs are helpful, right? I've literally met fairly senior business leaders who spent a very large amount of money buying GPUs, and there are some funny stories. I go talk to them and say, "What are you doing with these GPUs?" There was one meeting I was in where a very large family-run business had bought a lot of GPUs, and the CTO pointed to his nephew, who was a current college undergrad, and said, "Oh, my nephew knows AI. I'm giving him this very large budget in GPUs, and I think he'll do AI for me," right? So I think knowing how to make these decisions, and not just buying into the hype you read about in the newspapers, is really important. One thing I hope to do in this course is share with you what driving a disciplined development process looks like, because this is one of the things that really makes a 10X difference in the speed with which you can get something to work. I've literally seen teams take six months or ten months pursuing an approach where experienced engineers would go in and say, "You know what? I could have told you six months ago that spending all this time collecting data was not gonna get your application where you wanted it to go," right?
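One common diagnostic for the "should we collect more data?" question, widely taught in deep learning courses though not spelled out in this part of the lecture, is to compare training error, dev-set error, and a human-level baseline. A minimal sketch, assuming human-level error is a reasonable proxy for the best achievable error; the function name and messages are illustrative, and real projects usually combine this with looking at individual errors:

```python
def diagnose(human_error, train_error, dev_error):
    """Rough bias/variance check, using human-level error as a proxy for the best achievable error."""
    avoidable_bias = train_error - human_error  # gap on data the model was trained on
    variance = dev_error - train_error          # extra gap on data it has not seen
    if variance > avoidable_bias:
        return "high variance: more data or regularization is likely to help"
    return "high bias: try a bigger model or longer training; more data alone is unlikely to help"


if __name__ == "__main__":
    # Hypothetical error rates (fractions), not measurements from any real system
    print(diagnose(human_error=0.01, train_error=0.02, dev_error=0.10))
    print(diagnose(human_error=0.01, train_error=0.08, dev_error=0.10))
```

In the second case, six months of data collection would mostly shrink a gap that is already small, which is the kind of mistake the lecture warns about.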
So how do you examine an application and run the diagnostics to figure out what the productive things to do are? Where should you spend your time? In fact, I'm excited about doing some simulation exercises in this classroom with you later this quarter, where I'll ask you, in this scenario, what would you do? And we'll see if you can make decisions in a more systematic way, right? All right. Then in course four, we'll talk about convolutional networks, which are very useful for computer vision applications. ConvNets are specialized models used mostly for vision applications, dealing with images. And finally, we'll talk about sequence models. Sequences could be time series or sequences of text, like words. There we'll start touching on the transformer network that powers a lot of the gen AI revolution. Okay? Through learning all of these things, I hope you all gain a large tool chest that'll enable you to tackle an almost bewildering range of applications. One of the things I've most enjoyed as an AI person is that when you work on AI, there are so many other teams that have data and could use our help, right? So I feel like, as an AI person, I somehow bizarrely had the right to play in building autonomous helicopters, or helping companies, I don't know, place more relevant ads. Maybe not the most inspiring thing I've worked on, but certainly very lucrative for some companies, right? Or improving web search rankings, or improving safety by getting rid of the negative, toxic results you don't want search to return. Or improving e-commerce retailing, or improving speech recognition, or helping ships be more fuel efficient, right?
All of these are real examples. Or fighting fraud, which is actually really exciting. Financial fraud is obviously a bad thing, but when you're fighting it, sometimes you wake up in the morning and your team has alerted you that there's a new scam, and then you just have to go fight it and build algorithms in real time. And you know that every hour you're slower to formulate a response, more money leaks out of the financial system. So it's kind of awful that there's financial fraud, but it's actually one of the most exhilarating things I've worked on. So, anyway, somehow when you have this deep learning skill set, you have a bewildering right to play, an ability to play, in a huge range of applications that could use your help, right? And I think on campus too, there are so many departments, in the sciences, engineering, humanities, business, that have interesting data, where your skill set will let you, if you choose, collaborate with them on interesting projects. For example, bizarrely, some of my PhD students are working on climate science. It's like, what do I know about climate science, right? I wish I knew more, but using machine learning tools, we can actually work on climate modeling and geoengineering and just play in all of these important, hopefully important and interesting, places. So I hope that you have that skill set as well by the end of this quarter. Yeah.
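For the convolutional networks mentioned as part of course four, the core building block is a small filter slid across an image. A minimal sketch in plain Python, assuming stride 1 and no padding ("valid" mode); like most deep learning frameworks, it actually computes cross-correlation, which is what they call convolution. The function name is illustrative, and the example is the classic vertical-edge case: a 6x6 image that is bright on the left and dark on the right, filtered with a 3x3 vertical-edge kernel.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (deep-learning-style convolution), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]  # bright left half, dark right half
    kernel = [[1, 0, -1] for _ in range(3)]            # responds to left-to-right brightness drops
    for row in conv2d(image, kernel):
        print(row)  # large values mark the vertical edge in the middle
```

In a ConvNet, the kernel values are learned parameters rather than hand-picked numbers, and many kernels are applied in parallel.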
- Speaker
How do you know if you have enough data for a neural network, versus when a problem is better suited to other machine learning algorithms?
- Andrew Ng
Yeah. So how do you know if you have enough data for a neural network? It turns out to be really difficult to know. If it's an application that others have worked on, or that you've worked on before, then you may have a sense. For example, because I've worked on face recognition, I kind of have a sense: if you wanna train a face recognition system from scratch, having something like fifty thousand images, fifty thousand unique faces, is pretty good, right? So if you've worked on it, or if you read the research literature on something people have worked on, that gives you a gut sense for how much data could be enough to get you started. But for greenfield, brand-new projects that no one in the world has worked on before, if you can't find comparable parallel projects, it can be really hard to tell. For example, if someone has invented a new medical device and no one has collected this type of blood-specimen data before, it's really hard to tell. In that case, the most common advice is: get a little bit of data and just try training a model. The degree to which your initial model does or doesn't work will help you hone your perspective on how much data may be needed. And you may be surprised. Maybe a hundred data points is all you need, right? Sometimes we've been surprised by that. And sometimes we've also worked on applications where, a hundred billion data points later, we're still trying to get a lot more data. I find it really difficult to tell. Yeah. Good question, though. Anything else? Oh, this is a good time for me to pause and take questions, because... so, what I'm going to do is, right.
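The "get a little bit of data and just try training a model" advice is often turned into a learning curve: train on growing subsets and watch how dev-set error changes. If the error is still dropping steeply at the largest size you have, more data will probably help; if it has flattened, probably less so. The sketch below is a hypothetical toy, with synthetic 1-D data and a threshold classifier standing in for a real model; all helper names are illustrative and only meant to show the procedure.

```python
import random


def learning_curve(train_fn, error_fn, train_set, dev_set, sizes):
    """Train on the first n examples for each n in sizes; return [(n, dev_error), ...]."""
    curve = []
    for n in sizes:
        model = train_fn(train_set[:n])
        curve.append((n, error_fn(model, dev_set)))
    return curve


def fit_threshold(examples):
    # Hypothetical toy model: a threshold halfway between the two class means
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    if not pos or not neg:
        return 0.0
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def error_rate(threshold, examples):
    # Fraction of examples where "x > threshold" disagrees with the label
    wrong = sum(1 for x, y in examples if (x > threshold) != (y == 1))
    return wrong / len(examples)


def make_data(n, rng):
    # Synthetic data: class 1 centered at +1, class 0 at -1, unit-variance noise
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(1.0 if y else -1.0, 1.0), y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    train, dev = make_data(1000, rng), make_data(500, rng)
    for n, err in learning_curve(fit_threshold, error_rate, train, dev, [10, 100, 1000]):
        print(f"{n:>5} examples -> dev error {err:.3f}")
```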
What I'm going to do after this is, um, switch tracks and talk a little bit about exciting trends in AI, um, uh, kind of recent trends in AI that I'm excited about and how I view the AI landscape. But so this is actually a good break point to see if people have other questions before I talk about some trends in AI. Anything else? Yeah, please.
- SPSpeaker
When you were differentiating between gen AI and deep learning and those different tasks, what, what were like the distinguishing features?
- ANAndrew Ng
Yeah.
- SPSpeaker
Was it just transformer versus non-transformer? How would you describe the difference?
- ANAndrew Ng
So I guess, um, let's see. All of these-- many of these terms are blurry and kind of fuzz a little bit into each other. But, um, when I refer to generative AI, um, generative AI is this, uh, body of work that, um, generates text and sometimes also images, sometimes also audio, um, using deep learning algorithms in certain ways. So I think gen AI refers to this body of work, um, with most of the center of gravity on generating text, you know, maybe also images. And the text generation algorithms have been mostly implemented using transformer neural networks, uh, trained on large amounts of data, you know, scraped off the internet and elsewhere. So, so when I refer to gen AI, that's, uh, I guess, uh, that's one particular application of deep learning models that has given us large language models like ChatGPT and Claude and Gemini and, uh, Meta Llama and, and so on. Does that make sense? Yeah. Cool. All right. Anything else? Cool. All right. So let me... Could, could we go to the slides, please? One of the nice things about this class is Kian and I can occasionally, uh, just share a view of things we're seeing in the broader world. Hey, wh- wh- while we're doing that, I'm actually curious, how many of you use a, um, specialized AI-assisted coding tool like Claude Code, Cursor, Codex, Windsurf? Awesome. Almost everyone, but not everyone. Interesting. Oh, interesting. Oh, I thought... Interesting. Okay, cool. So, you know, one of the most exciting things that's happened in programming is AI-assisted coding, and I feel like, um, I personally hope I never ever have to go back to coding by hand, right? Like-
- SPSpeaker
[laughing]
- ANAndrew Ng
... it's just, um... It's, it's actually interesting. Uh, I often work in coffee shops on the weekend, and a few weekends ago, I was sitting in a coffee shop, and sitting next to me was someone that was, you know, coding by hand, and it looked so strange. I just asked him, "What are you doing?"
- SPSpeaker
[laughing]
- ANAndrew Ng
In, in, in a, in a, in a respectful way, you know. Uh, and, and it turns out that, that, that, that they're, they're doing homework from some other university that required them to code by hand. Um, but one of the things I find exciting is that, um, uh, individual programmer productivity is much higher than it ever used to be, right? And maybe I wanna share with you just, just one, one, one thing, what, what, what I see. So I find that in the software work that I do, I, I maybe categorize it into two buckets. One is building quick and dirty prototypes to see if something works, and then sometimes I, you know, write production-grade, enterprise-grade, robust, reliable software that has to scale, right? Um, and I find that where AI-assisted coding has made the biggest difference is building the quick and dirty prototypes. Um, whereas I think actually literally one of my collaborators, um, used one of the agentic coders that I named, but I won't say which one, but literally this morning he sent me a Slack message saying, "Sorry, you know, this agentic coder had a database migration error, and we just wiped out all of the database records." Uh, for-fortunately for a test application with like five users, uh, but, but it did happen. So I find that... Oh. Oh, right, thank you. So I find that, um, my use of the agentic coders, you know, for the production-grade software is more careful. Whereas for building quick and dirty prototypes, it kind of, um... So long as you're not shipping software in an irrespon-- in an irresponsible way, um, quick and dirty prototypes have a lot fewer dependencies. You usually don't need to integrate with legacy data infrastructure. And then, you know... I'm, I'm gonna say something that I, that, that feels like something I'm not supposed to say, but I'll, but I'll say it anyway, which is, um, uh, I find that, um, when I am running code, I, I, I find that people often worry about, um, safety and reliability of software or security of software.
So one thing I often say to my teams is, um, if you are building a prototype that only runs on your own laptop and doesn't use any sensitive information, so there's no risk to sensitive information, then unless you are planning to, you know, maliciously hack into your own laptop, right, the security requirements can be lower. And so I find that, um, when building quick and dirty prototypes, um, having a sandbox environment that lets you operate within it quickly means that you can just make a lot of decisions faster without worrying as much about scalability or security or reliability. So long as the sandbox environment means this stuff isn't gonna get out there or leak information or create a security loophole. So that, that, that's part of what lets us move much faster. And so I find that, um, because of the speed of prototyping, so long as you can do so in a responsible way, um, to pursue innovative ideas, my teams will increasingly, you know, try twenty things and see what sticks. And, um, because I... And, and I know that a lot of teams are lamenting that many proofs of concept never make it into production, right? You try something, and it doesn't work, and I know some teams are feeling angst about that. I actually have a different view. I think if the cost of a proof of concept is low enough, then who cares if you have to do twenty of them? And that's the price of finding the one or two things that work really well. So one thing you'll hear about in this course is both when, when, when you're building a machine learning application, you usually don't know what's gonna happen, um, and there's a specific reason for that. The reason is the output of a machine learning algorithm, it depends both on the code you write as well as on the data you're training on. And while you control the code a hundred percent, you don't really know usually what's really in the data, in the weird and wonderful data that the world has given you.
So for example, I've worked on speech recognition a lot in multiple companies in multiple contexts, and even now, when I work on speech recognition, I'm still, you know, sometimes a little bit surprised that, oh, this data has people of a certain accent more than I realized, or people somehow speak faster or, boy, there's a lot of background noise when people use it in a car, right? So even though I've worked on speech recognition multiple times... Oh, and actually one recent example. In one application, I was actually surprised by the number of background speakers. They talk to us. They come and talk to a different person. Then they talk to us, and then we get confused, right? So, so I find that the data that the world gives us is often weird and wonderful. And so it is only by building a system that you then start to discover these things in the data that lets you make progress. Um, and with a lot of software applications as well, separate from machine learning applications, a lot of what I end up having to discover is what do users actually want, right? So again, I control my code a hundred percent. I can write whatever code I want. I control that. But you don't get to control how your users will react to your system. And I find that our ability to build quick and dirty prototypes rapidly, um, both to discover what's in the data and also to take to the users to see if they like it, that allows us to drive faster feedback loops than was ever possible before to then help us build more and more valuable software, right? Um, and I think, um, you know, I, I, I know that the mantra move fast and break things, right, got a bad rep because it broke things. Um, and I find that, um, some teams took away from this that we should not move fast, but I think that's a mistake. So what I usually tell my teams is move fast and be responsible.
And, um, despite all the hype about, you know, AI extinction risk, AI kills all, like all that somewhat bizarre hype, in my opinion, I find that when teams move really fast, we can then implement things, test it out in a responsible way, and much more quickly identify problems and fix them. So I find that a lot of the most responsible teams I know, um, teams able to really get stuff to work really quickly, uh, they tend to be some of the fastest moving teams. So it's that speed of execution that lets you finally implement it, figure out what's in your data, figure out what users want, and that's the best way to figure out what could actually go wrong, um, to then make sure things don't actually go wrong. Right? And somewhat related to that is, um, AI coding assistance. Um, and I, I, I assume, you know, almost everyone or everyone in this class knows how to code. Um, uh, if you, if you are-- well, if you haven't learned to code yet, you probably may not wanna take this class yet. Um, but I find that, uh, there have been people, including very senior, right, business leaders advising others to not learn to code on the grounds that AI will automate it. And I, I wanna share this with you, not because I think you need to learn to code, but because I hope you help me spread the word, right? Go to all of your friends in other departments and tell them: this advice to not learn to code, I think we'll look back on it as some of the worst career advice ever given. Um, and, and the reason is when coding becomes easier, more people should do it rather than fewer. So when humanity went from punch cards to keyboard and terminal, that made coding easier, and so more people learned to code. Um, when we went from assembly language to modern, well, at that time, modern programming languages, that made coding easier. More people learned to code. I, I went back.
I actually found these papers on, um, when, uh, these articles on when COBOL, very old school programming language, right, was invented. And there were actually people that said, "Oh, wow, now we have the COBOL programming language. Coding is so easy. Who needs programmers anymore?" Right? And, and obviously the opposite happened. Um, we went from text editors to IDEs, and then, you know, AI-assisted coding. As coding becomes easier, people should code a lot more. Um, a lot more people should learn to code. And the other thing I'm seeing is, I, I just-- I want to address something that's on people's minds. I know that, um, uh, unemployment of recent computer science graduates has ticked up to higher than it's been, you know, compared to, I think, the last decade. Um, and so I know that to people learning CS, that has caused some consternation. So I'm gonna share my view on that. So, um, it turns out that what I see in Silicon Valley and beyond Silicon Valley is we just can't find enough people with these skills, right? I know many businesses that would-- Well, I know large businesses that would love to hire a thousand people with skills in gen AI and deep learning and machine learning, but they're now struggling to find people with these skills. Um, conversely, there are still, you know, universities with curricula that have not changed since-- have not changed much, right, for the last, like, I don't know, since two thousand and two, before the rise of gen AI. And so I do see that many new CS grads, not from Stanford, but from, you know, around the country, are struggling with finding jobs because, unfortunately, that older non-AI-enabled skill set is, you know, not as much in demand, right? And maybe just for myself, uh, today, I will not hire someone-- I, I will not hire a software engineer that doesn't know how to use AI to help them with coding. It's just, it just doesn't make sense.
Same reason why I just won't hire someone that uses a punch card instead of keyboard and terminal, right? And, and I think when the world evolved from punch card to keyboard and terminal, people still had punch card jobs for a while, but eventually the punch card jobs just went away. It just doesn't make sense anymore. So today there are still coding by hand jobs around, maybe some, some specialties, some very low-level coding where AI is not very good. It turns out AI is not very good at some types of GPU programming. You know, there are some niches where coding by hand actually still makes sense. But for building applications, um, you know... Uh, so I, I, I actually-- I, I remember just, just a few months ago now, many months ago now, where I interviewed two engineers back to back. One, um, had not yet graduated from college, but was highly on top of gen AI coding. So, you know, spoke with that candidate, knew how to use AI, built prototypes, gets along quickly. Right after that, I also interviewed someone with ten years of experience as a full-stack engineer, but whose skill set was exactly the same as the two thousand and two skill set. Had not tried out any AI-assisted coding, really good skills, full-stack engineer with ten years of experience. And it was actually really clear to me. I picked the fresh college grad, well, he had not-- he, he was about to graduate, over someone with ten years of experience, right? So I think, um, uh, making sure you master these skills is really important. And what I'm seeing, um, is there is a very large gap that businesses are having a hard time filling, um, for people with these skills. Um, but the demand for the two thousand and twenty-two skill set, software engineering, full-stack engineering skill set, that is not there, right? Um, so, so I think, um... And then in terms of, uh, AI-assisted coding, um, I find that CS fundamentals really are important.
So in, in addition, so I know I, I, I hired someone, a fresh college grad over someone with ten years experience. True story. There's actually one other part to this story, which is with, with, with respect to all of you about to graduate from college, the best programmers I know are also not fresh college grads. No disrespect intended to fresh college grads, someone about to graduate from Stanford. Uh, the best programmers I know are really on top of AI-assisted coding and additionally deeply understand computer science fundamentals, right? So it turns out-- Maybe I'll illustrate this with a quick story. When I was teaching an online course, um, my team wanted to generate background pictures like this, you know, just for decoration. So when I was working on this, um, this is a course, Generative AI for Everyone. I was working with a collaborator, Tommy Nelson, that understood art history. And so my collaborator, um, knew the language of art. He knew the artistic genre inspiration, the palette. So he could prompt Midjourney AI image generation with the language of art, and so he could generate beautiful pictures like these. In contrast, I don't know art history. I wish I did, right? And so all I could do was go to Midjourney or AI image generation and type, "Please make pretty pictures of robots for me." And I could never get the control that my collaborator Tommy could to generate pictures like these, which is why we used all of his pictures and none of mine, right? And I'm seeing the same thing in computer science. Um, one of the most important skills for the future is to understand how computers work and understand how gen AI and deep learning and machine learning work so that you can use the language of AI, use the language of these tools to tell a computer exactly what you want, so the computer can do it for you.
And there is actually a huge difference in performance between, you know, someone that's learned to just prompt an LLM without understanding how computers or how AI really works, versus people that can look at the problem, analyze it, and then with AI-assisted coding, you know, tell a computer how to take the next steps. Uh, which is why I think that, um, CS fundamentals is very valuable. CS fundamentals, machine learning fundamentals, deep learning fundamentals. We-- My-- I and my teams, we use that knowledge, like, every day, right, in, in making pretty consequential decisions, right? So, so I hope that, uh, you get that from this class and the many other classes at Stanford as well. All right. Um, all right. I think I might leave the rest... There, there's more I could say on trends in AI, but I find that, um, uh, AI-assisted... Oh, but, but one thing I hope you do, really, you know, go to all your friends in all the departments across campus and encourage them to be a builder. Because the other thing I'm seeing is, um, clearly for s- computer science professionals, you know, use AI-assisted coding, know CS fundamentals, build cool stuff. But for other disciplines as well, um, that is not computer science and not AI, I'm finding that, um, you know, the education professional or the climate scientist or the mechanical engineer, the ones that know how to build software, um, are just more productive and get a lot more done. And the barrier to entry to AI, to coding is the lowest it's ever been, you know, in our lives. And so this is a good time, frankly, for... I, I, I wish every single Stanford student, right, would learn to build software with AI assistance. Uh, so I hope you go help your friends across campus, um, to master those skills as well. Okay? Uh, yeah. Uh, g- Any, any other questions? Yes, go ahead.
- SPSpeaker
So what do you say about, like, the trend of someone knowing that they're familiar with [door closing] but also they're familiar with AI? I, I heard a lot of, like, a lot of the industry folks talking about how they would rather hire someone with ten years of experience who knows just training AI as is, like just using coding [coughing] rather than a fresh undergrad that does have, like, that deep understanding of deep learning, machine learning, and, like, training AI assets on their own just because they have more experience building. Do you see that also?
- ANAndrew Ng
Yeah. Uh, so let me rank productivity, right, and I'll give you four levels. I think, um, and again, and I say this with a lot of respect for individuals, so if I talk about productivity, it's not with any disrespect or any lack of affection for anyone or for their work, right? But I think, you know, least productive are people with no experience and don't know AI, right? One step [laughing] on top of that is people with less experience, but on, on, uh, uh... Sorry. One step on top of that is, um, uh, people with, say, a decade of experience, but that don't know AI. Um, on top of that, I would rather take a fresh college grad that does know AI. But then even more productive is someone with, you know, a decade of experience and also really on top of AI. So I think between the two factors, really understanding AI is very important, but experience is also important. And so the best developers I know, um, who just ship code in a way no one could have, I think, even two, three years ago, are very experienced developers. They're also very on top of how to use the latest AI technologies. Yeah. Oh, please. Oh, oh, sorry. And just one other thing about the job market. I find that a lot of employers have not yet figured out how to hire appropriately. This has contributed... Frankly, a lot of employers, you know, if a company has no one that knows GenAI, how do they even know how to interview appropriately? So that is a problem that we need to solve as well. Go ahead.
- SPSpeaker
What's the value in, like, taking, like, classes that are super in the weeds programming and, like, like how, like, for example, I'm thinking of classes like CS107, 111, where we're not encouraged to use tools, and we're, like, encouraged... Is that to build the fundamentals? And do you think that's necessary?
- ANAndrew Ng
Yeah. Boy. So CS107, CS111 are great. Uh, so, so do take them if you're considering them. I find that the fundamentals are important. Um, uh, how to put it? Yeah, I'm not sure what else to say other than that. Um, I think, I think, honestly, uh, Stanford, we're known for, uh, CS department, we're known for really... Um, um, um, I know I'm biased, but I want to say, like, probably the best entry-level CS programming classes of any university in the world. I'm probably biased, so maybe I shouldn't say that. But I think, uh, they're excellent courses if you want to learn the fundamentals in a really solid way. And I know the instructors are routinely thinking about how to update the curriculum for the realities of GenAI. So I, I think they do an excellent job with that mix. Yeah. Yes, please.
- SPSpeaker
How would you define someone that really knows GenAI compared with a person just who pulled out the cursor and typed a prompt?
Episode duration: 1:00:16
Transcript of episode _NLHFoVNlbg
