
Stanford CS230 | Autumn 2025 | Lecture 3: Full Cycle of a DL project

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai

October 7, 2025

This lecture covers the full cycle of a DL project.

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning
To follow along with the course schedule and syllabus, visit: https://cs230.stanford.edu/syllabus/
More lectures will be published regularly. View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI
Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera
Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforoosh (host)

Oct 15, 2025 · 1h 7m

EVERY SPOKEN WORD

0:05 – 4:08
Why AI projects are inherently iterative (code + unpredictable data)
1. Kian Katanforoosh
  So, um, what I'd like to do today is chat with you about the full cycle of a deep learning project. And as I promised in the first lecture, rather than me talking at you for an hour and twenty minutes or whatever, um, I would love for this to be much more interactive. And so what I'm gonna do is illustrate this with an example, but also ask you a bunch of questions along the way about what you would do if you were the one working on the project that I'm gonna use as an illustrative example. Okay? So plan for this to be quite interactive, and please interrupt any time and ask questions, since I think that's why, uh, you know, I want to do this in person here at Stanford. Um, so one of the things about developing machine learning or deep learning or other types of AI projects, including AI projects using large language models or agentic AI workflows or whatever, is that AI projects are different from traditional software engineering projects. So one of the biggest differences, for AI projects of the many different flavors, supervised learning, LLM-based, um, you know, generative AI-based, is that in traditional software projects, you write code and you control your code, right? Write whatever code you want, compile it, your code does what you tell it to. But AI projects, you know, involve both code as well as data that you train your algorithm on, and you almost never know what strange and wonderful things there are in your data. For example, if you're working on a face recognition application, that's a running example I'm gonna use today, then as you're just getting started on the project, it's really difficult to know in advance, um, what you'll find in the data. Is the lighting of the faces good? Do you struggle with people with very long hair or people with short hair? Um, do people making weird facial expressions make your system struggle? Do people wearing glasses make your system struggle? 
So data is so rich that a lot of the time you can't predict in advance what your AI system is going to do, not because you don't control the code. You can write whatever neural network or whatever code you want, but you don't know what's in the data. And this is why, unlike traditional software engineering, um, machine learning development is a much more iterative process where you just have to build something, see how it works, and then through a process discover, almost discover, what is in the data and therefore what you should be doing to change your code to make your overall system perform. And this is true not just for deep learning based systems. This is also true for modern large language models. There's been a lot of hype, buzz about how LLMs, large language models, are hard to control. I think there's a lot of excessive hype about that, you know, kind of fear-mongering. There's a little bit of that. But one of the reasons why none of us know in advance what LLMs do is because they were trained on, on a lot of data, more data than any human could possibly look at. And we just don't know, you know, with precision what is in all that data that the LLM was trained on. And so because we don't know the data, we can't really look at all the massive tens of trillions of tokens of data it was trained on, it's hard to know exactly how a large language model will perform. Which is why building agentic AI applications or building large-language-model-based applications is also very empirical or, or very experimental, meaning you just have to build something, then see where it goes well, see where it goes poorly, and then use that to fix problems, and that's how you drive progress. That make sense? So you control your code, but you don't really know the data, and it's hard to control the data. And this is true both for the data you have stored on your hard disk. You know, kind of, I don't know, terabytes of data stored on a hard disk. 
I don't really know what's in my hard disk. Um, and the thing you really don't control is what data the world will give you in the future. So we deploy a system in the
4:08 – 5:40
The “full cycle” of a deep learning project beyond just modeling
1. Kian Katanforoosh
  world, uh, say, face recognition, which I-- we should talk about. Will people wear a thick, heavy scarf that covers part of their face when it's winter? You just don't know. There are all these things in data that will surprise you. And, and if you don't even control your past data that's already in your hard disk, you certainly can't control your future data. So, um, it turns out that a lot of machine learning classes talk about building models, right? And you learn a lot from the, um, online videos, uh, about how to build powerful deep learning models. But it turns out that building the overall machine learning system or a deep learning system has a lot more work than just training models. But, um, if you look at a lot of courses, there's actually a very strong focus on modeling because I think that's what academia has focused on, right? We can train models, evaluate models, publish papers on models, and so you find that a lot of courses focus on, you know, training a good deep learning model. And that is absolutely important. Um, but because we know how to evaluate models, different research groups can benchmark different models. There's a lot of academic research work on that that's reflected in a lot of courses. But this is just a small part of what you need to do if you want to build an effective deep learning system. And what I want to do today is, um, go outside the small box of models to give you a broader view of what it feels like to develop a deep learning or AI or machine learning system. Okay?
5:40 – 9:16
Case study setup: face recognition for door access and key-card verification
1. Kian Katanforoosh
  And this is what it often, uh... So this is, this is what, um, building a deep learning system would be like. Which is, first, you know, right, specify the problem, figure out what we're actually working on. The one example I want to use today will be to build a face recognition system, or a face rec system, for, um, security, for deciding when to unlock a door. And then, you know, Kian talked about some face recognition, I believe, some face rec architectures, last week as well. But, um, the specific application I wanna talk about is something I've worked on. Actually, I built one of the commercial systems that, um, uh, uh, that, uh... For, for this. So, um, if this is a door, um, and, you know, this is... Well, you or a friend or, or maybe a, a, someone that you don't want to let in approaching. You have a little camera... Sorry, bad drawing. Take a picture of whoever's approaching the door and decide whether or not to unlock the door, right? So face recognition to decide who's authorized to enter, like, a restricted location, like a, you know, corporate office building or your house or whatever. Um, and actually one, one, one common use case that, um, one of my teams built was, uh, swipe key cards. So sometimes key cards get stolen, um, and so one of the systems we deployed, you know, in fairly large office complexes, was, uh, um, if you swipe a key card, we also just take a picture, and then soon discard it, to make sure that the key card is held by the person, you know, whose face is shown on the key card. So it makes it harder for someone to steal a key card to gain unauthorized access to, to, like, an office complex, right? So, so, so I'm gonna use this as, uh, the motivating example for today. And after, um, specifying a problem, the typical process is then, right, um, [coughs] sometimes there are open source models you can download, but let's say for today that the open source models aren't good enough, you wanna train your own model. 
The typical process, we get data, you know, design a model, right, train a model, um, and then we will iterate through these steps a bunch of times until the model looks like it's performing well enough. Um, and then after that, we have to deploy it. Um... And, uh, monitor and maintain the model. Okay? So I'm actually gonna talk a bunch about, uh, multiple of these steps today because I want you to come away with a feel for when you're working on a real machine learning application, what the, what, what the important steps you will face are. Right? Um, and so as I alluded, machine learning development is very iterative process. So for these three steps, we often drive a, um, rapid development loop where we, you know... Oops,
9:16 – 11:18
Face recognition architecture primer: Siamese networks and registration images
1. Kian Katanforoosh
  sorry. Design the model, train it, analyze the results, and then maybe design or update the model or the data or something and iterate around this loop many times before you, um, are satisfied. And, um... All right, just one more detail. Um, it turns out that for face recognition, the very common architecture, which you learn in detail about, uh, later in the online videos, is a neural network called a Siamese network. And what that does is a neural network that takes as input two pictures, [clears throat] and two pictures get fed to a neural network or a deep learning algorithm, and it's the job of the deep learning algorithm to tell us, are these two pictures the same person or different persons? Right? Because, um, if you're trying to set this system, say, for your house, and maybe you have, I don't know, a few family members or roommates you want to let in, then it's, I don't know, it'd be quite annoying if you had to retrain a neural network for every single home. So the most common way to do face recognition is have a neural network that inputs two pictures and the job of the neural network is says, "Tell me, are these two pictures of the same person?" And then the way you set up the system is to input a few registration pictures. So take a picture of yourself, take a picture of your roommate, take a picture of, you know, any family members you want to have access. So then when someone comes, it can quickly check if the person that just showed up is one of the people that's authorized and then let them in. And then the corporate key card swipe example is someone swipes a card and, you know, my card says, "This is Andrew's card," then it'll quickly pull up my registration picture to double-check if I am-- if I seem to be the same person as the, you know, Andrew Ng that was registered. Right? So this is how, um, this is a typical neural network architecture. All right. So
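The verification setup described above (compare a new picture against a few registration pictures, without retraining per home or per office) can be sketched as follows. This is only an illustration under stated assumptions, not the system discussed in the lecture: `embed` is a hypothetical stand-in that uses a fixed random projection in place of a trained Siamese branch, images are assumed to be mean-centered 64x64 arrays, and the `0.5` cosine-distance threshold is arbitrary.

```python
import numpy as np

# Hypothetical stand-in for one branch of a trained Siamese network:
# a fixed random projection from 64x64 pixels to a 128-d vector.
_rng = np.random.default_rng(0)
_PROJECTION = _rng.normal(size=(64 * 64, 128))


def embed(image: np.ndarray) -> np.ndarray:
    """Map a mean-centered 64x64 image to a unit-length 128-d embedding."""
    vec = image.reshape(-1) @ _PROJECTION
    return vec / np.linalg.norm(vec)


def same_person(img_a: np.ndarray, img_b: np.ndarray, threshold: float = 0.5) -> bool:
    """Both images go through the SAME embedding function; a small
    cosine distance is read as 'same person'."""
    distance = 1.0 - float(embed(img_a) @ embed(img_b))
    return distance < threshold


def find_match(probe: np.ndarray, registry: dict, threshold: float = 0.5):
    """Return the name of the first registered person whose registration
    picture matches the probe picture, or None if nobody matches."""
    for name, registration_image in registry.items():
        if same_person(probe, registration_image, threshold):
            return name
    return None
```

Adding a new authorized person only means adding one entry to `registry`; nothing is retrained. In the key-card variant, the card ID would select a single registration picture and only `same_person` would be called.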
11:18 – 16:02
Interactive scenario: how to collect training data without scraping the internet
1. Kian Katanforoosh
  I have a question for you. One thing, one, one, one thing I'd like to do today is, um, walk you through a number of scenarios and invite you to think about what decision you would make if you are the CTO of a startup, uh, building these technologies. Right? So, so my question for you is, um, if you are the CTO of a startup building the next face recognition system, um, and if your lawyers have said you aren't allowed to download data from the internet, right? So let's not download data from the internet for this application. How would you go about getting data to train the system? So what you need is a bunch of pictures of people, right, to train a neural network to say, you know, are they the same person or not? So maybe take, take a few minutes to think about it, and then I'll see if people raise hands and give some answers. And one, one specific question I have as well is, um, how long would you take to collect data before you start training a model, right? So I think, um, uh, well, you need to get data. We design the model, I guess, and then you need to train the model. So what does the timeline look like for you? We specify the problem, you know, we design the model, how many hours or days or weeks would you spend to get data before you start running, you know, gradient descent? Right. Maybe-- Uh, I, I see a hand up. Oh, sure, go ahead.
2. Speaker
  Uh, are we doing this everyone around the world, or are we, like, specifically looking at, like, Bay Area or America scenario of this, or is, like, this wide range of different-
3. Kian Katanforoosh
  Let me, let me leave it a bit more open-ended. Say you just graduated from Stanford, and you're a CTO of a three-person startup, um, building this thing, uh, that eventually, hopefully, you sell all around the world, but your goal is to just get started with, with the three of you working out of, you know, Palo Alto, California. Right. And, and do it as your real self, right, with the resources that you have. All right. Anyone want to venture an answer? So how would you, how would you get data to train a neural network? Go for it.
4. Speaker
  Um, how about using some video streaming service and, uh, collect the video data? Or like, uh, uh, maybe people are asking some company to, uh, if I can get the data from them, or we can even create a small video streaming service and then we collect the data people upload. So then we need to share the videos.
5. Kian Katanforoosh
  Yeah. Cool. Video streaming service. By video streaming service, are you thinking like, you know, Netflix and Hulu? Or are you thinking like, um, security videos?
6. Speaker
  Like Zoom.
7. Kian Katanforoosh
  Oh, like Zoom. Oh, I see. I see. Cool.
8. Speaker
  Like video-- people talking video.
9. Kian Katanforoosh
  I see. Cool, cool. Right. Cool. Right. Video streaming service. How, how long do you think it'll take to, to do that?
10. Speaker
  Uh, a couple of months. Uh, you know, the reason, the reason is the activity machines that-- I may be able to collect it from the accounts of some people like that. People create their own different versions for your research.
11. Kian Katanforoosh
  Yeah. Cool. Right. Cool beans. Cool. Thank you. Right. Creative idea. Any other ideas? How would you go about and get data? Yes, go ahead.
12. Speaker
  This one might work for a three-person startup, but at a larger company, I would put a camera in the plan to have the camera at full swing up and swipe and just take-- asking employees if they would like to participate in data collection, and you can take pictures of the employees at least three times a day. So the employees that are actually working.
13. Kian Katanforoosh
  Yeah.
14. Speaker
  You might have to generalize it with the physical presence policy.
15. Kian Katanforoosh
  Yeah.
16. Speaker
  Yes, that works.
17. Kian Katanforoosh
  Yeah. Cool. Awesome. Right. Stick a camera there. Uh, let, let people, you know, opt in, right, to, to get their picture taken. Cool. I think I saw a hand up. Go for it.
18. Speaker
  Yeah. We subject the users.
19. Kian Katanforoosh
  Oh, sorry. It's, it-
20. Speaker
  Taking reference from all the services with the camera.
21. Kian Katanforoosh
  I see. Cool.
22. Speaker
  Users.
23. Kian Katanforoosh
  How, how would you get users?
24. Speaker
  Well, we review each user. So we need more people.
25. Kian Katanforoosh
  I see. Cool. Right. Yeah. By your own people, you would, like, grab some friends and also they'll, you know, give you that LinkedIn photo, give you some pictures from your camera roll. Yeah. Cool. I like that. That's fast. I, I, I like that. Any other ideas? Oh, go ahead.
26. Speaker
  Ask Stanford students, uh, survey Stanford students.
16:02 – 26:09
A speed-first principle for early data collection and learning
1. Kian Katanforoosh
  I see. Interesting. Yeah. Ask Stanford to send an email. Cool. But I love Stanford. Stanford's a wonderful institution. Ask-- [laughs] that's gonna take a while, I think. [laughs] Very creative idea. I li-- I like the creative ideas. So let me share with you one guiding principle for how I would encourage you to approach this problem of data collection. I appreciate all the creative ideas, actually. Um, one of the frameworks I often use to decide how I collect data is speed, um, because I find that, um... Actually, especially if you're building a startup, right, uh, in my opinion, one of the strongest predictors for whether a startup will succeed, and also whether a small innovative project in a large corporation will succeed, you know, it could be a giant corporation, but if a team of three, you know, works on a small innovative project in a big, big company, I find that one of the biggest predictors for the chance of success is just the speed of execution. It's just speed of getting stuff done. And so when I'm, um, sitting with a team and brainstorming different tactics, I will gravitate toward the tactics that let me get a dataset very quickly. And quickly usually means, you know, like one or two days, um, even if it ends up with an inferior, smaller, lower quality dataset or whatever. Um, because I don't really know what problems I'll see in my data, and the quicker I can, you know, get a dataset, train a model, see where it goes wrong, the quicker I can then discover what's wrong with my data and fix it. Um, true story. I was chatting with a CEO that told me he once, um, he, he actually had spent, um, uh, I think it was over a hundred million dollars, de-definitely more than tens of millions of dollars. Spent a lot of money buying a company for its data. Um, and then he actually said, "Hey, Andrew, I spent all this money to get all this data, you know, can you help me figure out how to monetize this? How to make money off of this?" Right? 
And I kind of looked at him and went, "Boy, I kind of wish you hadn't done that." Right? Uh, um, and what I find is that, um, the value of data, it, it's just so difficult to know in advance, right? What's important and what's not important about data. So for a lot of these tactics, um... Uh, for example, I think student ID is an interesting one. Uh, but, you know, are student ID photos weird in some way, right? Or are they too expressionless or, or are people smiling too much in student IDs? I actually have no idea. Um, actually, my, my Stanford ID, I look really weird in my Stanford ID. Um, [laughs] uh, or, um, uh... And, and I think you-- and, and, or, or, or, uh, and I actually like the idea of sticking a camera and just letting people, you know, come up and, and opt in to take a picture, if you can do it quickly. Um, and I think in a big company or even in a small startup, you know, it's important to respect user privacy or individual privacy. But if you can stick a camera in some place, you know, that doesn't, um, uh, kind of, uh, what's the word, um, invade the privacy of people that don't want to be any part of this, but let people opt in, take the pictures with permission. If you can do that in days, I find that to be valuable. One of the things I found for quite a few of my projects, found that, you know, Stanford students, our community here is pretty cosmopolitan. We're people from all around the world. We're not fully representative of the world's, you know, distribution of people, but we actually do have people from all around the world. A lot of Stanford people are actually very nice. So one thing I've done multiple times is when I need to collect data, uh, we'll go to places on campus with high foot traffic. It turns out cafeterias are very high foot traffic. And we just ask people, "Hey, we're working on a project. Can I get a sample of your voice? Can I take a picture of you?" And kind of with, you know, really informed consent. 
Tell people, "Is it okay if I do this?" I've been delighted at how collaborative, right, Stanford students are. Uh, uh, um, but-- And, and I find that one thing I've often done is gone to my teams and said, um, "We have two days to collect data." It's like, whatever. "It's eleven fifty-two AM now. Let's figure out what we can do by, uh..." What day is it? Tuesday. "Let's figure out what we can do by Thursday, eleven fifty-two AM. So let's give us forty-eight hours, and let's brainstorm. How can we collect data?" And it's fine that the data isn't all there, fine that the data is low quality, but that velocity lets us more quickly train a model, figure out what's wrong with the data, and then jigger or tweak how we collect data. Right? So I find that there's some teams that will ask, um, "How can we collect the data we need?" And then ask, "How long will that take?" That usually leads to much slower execution. I instead tend to go to my teams and say, "We have two days or one day or maybe a week," right? Some short time span like that, and say, "What's the most creative, you know, respectful, responsible, but creative way you can use to collect data in this short time span?" And one of the ways to think about that too is training a model takes, I don't know, let's, let's say it takes two days. Maybe it actually takes more like one day, right? If you can train a model in a couple of days, then I would not spend, you know, like, whatever, right, two months to find data to train a model, because then this becomes a huge bottleneck. Because you can train a model relatively quickly, let's take a commensurate amount of time to design a model or, or collect the data as to train a model. And depending on how long it takes to train a model, sometimes training a model, you know, needs to run overnight. Um, I actually see teams sometimes take one-day iteration loops around this. 
The fast-moving teams I work with, we often go around this loop once per day, uh, for the smaller models. If we're training a large AI foundation model, sometimes training a model takes weeks or even months, then the process can be different. If, if your, if your model run is gonna be, you know, two months, then yeah, maybe it makes sense to spend, you know, a couple of months to get the data really right. But for face recognition, you could train that overnight or actually in a couple of hours quite easily. So it makes sense not to spend massive amounts of time before you go in to train the model. Does that make sense? And because, um... Oh, uh, the, the word empirical means experimental, right? Sorry, I've used that word a few times today. And, um, I, I think, uh, uh, we, we, we say that machine learning is a very empirical process, meaning it's a very experimental process. You have to do it, see what happens, and decide what to do next. I know that sometimes others have criticized our field as like we never know what we're doing. We just try stuff and see what works. And then, you know, there's a little bit of truth to that. Um, uh, I think understanding neural networks, hyperparameters, architecture, that is really valuable, so we don't just try stuff at random. But because we don't know what's in the data, we do try a lot of things and then drive a disciplined process to understand what works and what doesn't, and then use that to navigate forward. Does that make sense? And then I'll just say there's one exception to the advice I'm giving here, which is if you're working on a project that you have worked on many times before. So, you know, whatever. I've built a bunch of face recognition systems. So I kinda have a sense from previous experience and from reading research papers that certain things I know are just not gonna work if I have, you know, a hundred images, right? 
So there's some face recognition systems I know I probably need at least fifty thousand images before it'll have any hope of working. So because of that prior experience, having gone through that loop a lot, I now have a basis to say, "Okay, I do need fifty thousand images." Then I might invest upfront and, and put more effort upfront to, to get those fifty thousand images. But I think for most applications you work on for the first time, if you don't have an academic literature to justify certain larger data investments, or if you don't have the prior experience yourself, then, um, I would focus on the speed of iterating around this loop. Make sense? Um, all right. And to relate this to, uh, large-language-model-based applications as well: a lot of us, a lot of you are probably-- Well, you may be building applications like prompting LLMs and calling an API, right? Like an OpenAI or Anthropic or Gemini or Meta Llama or whatever API to get things back from an LLM. One of the reasons that too is a very experimental, a very iterative, empirical process is because when you write an LLM prompt, you don't really know in advance how well it's going to do, because it was trained on data that none of us have really looked at. And that too is why instead of, um, theorizing for a long time about what prompt to use for an LLM, you know, just try it out. And then it's by doing that, you then see the problems and, um, then your focus can be on fixing the problems. Make sense? And in fact, there's a lot of discussion on responsible AI, how do you make sure AI systems are safe or that they don't have kind of unforeseen consequences. And a lot of the theorizing, you know, you can only theorize so much. I find that if you want to build safe, responsible AI systems, one of the best ways to do that is to just build it and then experiment with it in a sandbox environment. Just don't let it out in the world until you've tested rigorously. 
But just go build something and then, you know, test it, probe it in the safety of your own laptop, right? Don't let it out to innocent users and, and have some weird impact on, on innocent third parties. But it's only by building and then probing it that you can figure out where it can go wrong, where it will say inappropriate things, where it would, you know, respond inappropriately to certain user queries, and then that tells you where the problems are that you may work, work on addressing. Okay. All right. Yeah. Question?
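The "build it, then probe it in a sandbox" idea can be sketched as a tiny offline harness. Everything here is an illustrative assumption rather than anything named in the lecture: `toy_model` is a hypothetical stand-in for a real LLM call, and `leaks_secret` is one made-up example of a check for an inappropriate response.

```python
from typing import Callable, Iterable, List, Tuple


def probe_model(
    model: Callable[[str], str],
    prompts: Iterable[str],
    is_problematic: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Run every probe prompt through the model locally and collect the
    (prompt, response) pairs whose response fails the check."""
    failures = []
    for prompt in prompts:
        response = model(prompt)
        if is_problematic(response):
            failures.append((prompt, response))
    return failures


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call, so the sketch runs offline."""
    if "password" in prompt.lower():
        return "Sure, the admin password is hunter2."
    return "I can help with that."


def leaks_secret(response: str) -> bool:
    """Made-up check: flag responses that appear to reveal a password."""
    return "password is" in response.lower()
```

Calling `probe_model(toy_model, prompts, leaks_secret)` on a list of probe prompts returns the cases to fix before anything is exposed to real users; the list of failures plays the same role as the error analysis discussed for face recognition.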
26:09 – 30:50
Using error analysis to decide what data to improve (data-centric AI)
1. Speaker
  How do you use, like, analyzing kind of like quality results to back that up, like data from changing validation?
2. Kian Katanforoosh
  Oh, sorry. Say that again.
3. Speaker
  So, like, when you use models to come up-- like, when you do analyzing of the model, right, and you come up-- come to a conclusion that the model is not good, like, how do you use that to kind of figure out what's wrong with the data?
4. Kian Katanforoosh
  Yeah. Right. Um, great question. So when you analyze that something is wrong with your model, how do you use that to take the next step forward, right? Um, that's a big topic that we'll talk about at length multiple times in some of the videos and also in some future, uh, uh, lessons. But maybe long story short, um, one of the things you could do is change your neural network architecture. You may realize that maybe the Siamese network isn't working, or read the literature on architectures. The other thing you can often do is change your data. Um, so data-centric AI is a discipline of systematically engineering your data to build a successful AI system. And say that you build a face recognition system. Let's say, um, by... I, I, I like your hat. Snap hat. It's cool. But you may-- Let's say hypothetically that you find that the system, uh, really struggles recognizing people that are wearing hats. Then you may say, "All right. I need to get more data with people wearing hats." Right? And, and so it's often that, um, looking at what goes wrong, which we call error analysis, that then gives you the insight to say, "Oh, it works well on these types of data or these types of users, but it struggles with these types of users and certain data. So can I fix my data or get more data just on the subset of cases it struggles with?" And that's a very common, um, motion. That's a very common process for then driving the performance of the system. And, and this is also why, um, blindly going out to grab more data willy-nilly, that's often not a good strategy, because there's just too much data you could get. Do I want more, you know, data with people with, you know, long hair or short hair, or people with facial hair? Sorry, too much hair. Um, too many hair examples. Or, you know, do I want people, um, uh, wearing a scarf covering part of the face, or people that wear glasses or that don't wear glasses, or people that, you know, are slightly turned away? 
Or s- there, there's just s- so many different types of data you could invest effort to get more of that until you build a system, see where it goes well, see where it goes poorly, it's really difficult to decide where to get more data. And just get more data of everything under the sun, that's very slow and expensive. And I think, um, part of the hype about the value of data has led people to have a sometimes overly simplistic view of data, right? You know, yes, of course, I want more data, but just grabbing more data of all types of data is a very inefficient, very expensive way to improve my system. And even if you look at the way that frontier models are trained right now, it's not a game of just grabbing more data of anything under the sun. It is identifying the subcategories where it's valuable to invest to get high quality data. Which is why if you look at, um... I- it turns out in AI where there are, there are two clear buckets of value in LLMs, right? There's the general answering people's questions. I think, you know, OpenAI ChatGPT is doing really well there. Gemini, Anthropic have momentum, but this is answering general questions. And then one of the verticals that's really valuable is AI for coding assistance. So I think, you know, Claude has been ahead for a while, but OpenAI, you know, I don't know, Gemini two point five Pro, some of the models are making really good progress in coding as well. And if you look at the t- look at the work that the frontier teams are doing to improve coding, building iterative agentic workflows is part of it, but also finding clever ways to come up with coding-related data is also a part of it. And if you want your LLM to do better in coding, you don't grab data willy-nilly with low quality random internet chat, social media, whatever data, right? But instead, having high quality coding-related data is how you can have a focused effort to improve your LLM's ability to, to, to code. 
So a lot of these things are actually, um, uh, yeah. Uh, uh, yeah. I, I, I feel like, um, the, the, like, more data is better. That is absolutely true, but it's also overly simplistic. Data is not monolithic. There are subcategories of data, and having a view on what piece of data to really invest in getting a lot more of, that's really important to, um, being efficient in how you improve your system's performance. That make sense? All right. Well, we'll talk more about error analysis later. Um, any other questions? Cool. All right. Yeah, go for it.
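The error analysis described above (find the subsets the system struggles with, then collect data for those subsets only) can be sketched in a few lines. The tag names and the record format are assumptions made for illustration; in practice the tags would come from manually reviewing misclassified examples.

```python
from collections import defaultdict


def error_rate_by_tag(records):
    """records: iterable of (tags, correct) pairs, where tags is a set of
    labels such as {"hat", "glasses"} and correct is a bool.
    Returns {tag: (errors, total, error_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # tag -> [errors, total]
    for tags, correct in records:
        for tag in tags:
            counts[tag][1] += 1
            if not correct:
                counts[tag][0] += 1
    return {tag: (e, n, e / n) for tag, (e, n) in counts.items()}


def worst_slices(records, min_examples=2):
    """Rank tags by error rate, ignoring tags with too few examples
    to say anything about."""
    stats = error_rate_by_tag(records)
    eligible = [
        (tag, rate) for tag, (_, n, rate) in stats.items() if n >= min_examples
    ]
    return sorted(eligible, key=lambda item: item[1], reverse=True)
```

If `worst_slices` puts `"hat"` at the top, that is the signal to go collect more pictures of people wearing hats, rather than more data of every kind.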
30:50 – 36:37
Data quality and distribution mismatch: how similar must training data be?
1. SPSpeaker
  What about the quality of data? For example, in the future, we want to collect as much data as possible [background noise]
2. KKKian Katanforoosh
  Yeah. Yeah, right. So as you collect data, um, the quality of data matters a lot too. And I think, um, data quality is tricky. Um, when building an application for the first time, I would still focus on speed and collecting some data quickly. But as you then analyze where your system is doing well and where it's doing poorly, you often find that, um, the quality of data really matters. Um, so trying to decide, should I talk about this now or later? Um, [chuckles] I'll come back-- So, let's see. Let's go with that. All right. Let me talk about the LLM one. Um, there's a lot of low quality random chat on the internet that's not that useful for training LLMs. But I think, you know, it's actually now well known that if you can legally access very high quality written, authored articles or books, that are highly edited, very insightful, that's very high quality data for training LLMs. Um, and I'll give an example later as well for face recognition. It turns out that, um, you know, blurry images, right, would be lower quality than sharp, in-focus images, assuming that's representative of how you want to recognize people's faces. So that really matters as well. All right. Anything else? Cool. All right. Oh, yeah, go for it.
3. SPSpeaker
  Um, originally about the data, how close they need to be to the task you want them to do. For example, in face recognition, where if you're trying to compare two picture, you know, if you train a model just to compare object, it's easier-
4. KKKian Katanforoosh
  I see.
5. SPSpeaker
  -than people's faces. Um, it's, it's much faster, but we're moving away from the ideal task we want the, the data to do so. What is the balance here?
6. KKKian Katanforoosh
  Yeah. So, right. Just to repeat for the mic: um, how important is it that the data you collect is similar to the distribution of things you want to work on? Um, I think it's important, but not as important as most people might think. Uh, it turns out one of the reasons neural networks have been so effective is because when you build a very large neural network, you can throw all sorts of data into it, including data that is, you know, not perfectly tuned to your test set distribution, and it often doesn't hurt so long as your network is big enough. So you raised the example, what if we have it identify two identical objects, right? If it's generic objects like, I don't know, water bottles and markers, maybe that's too far-fetched. But, um, if we were to use, let's say, you know, simulated cartoonish characters, right, that look really different than real humans, my guess is it probably won't hurt at all and may even help a little bit. And so throwing in a lot of data, if you train a large neural network with a lot of capacity to absorb even some slightly relevant data, um, it usually doesn't hurt and might even help a little bit. But how much it helps is another empirical question that will be problem dependent, and we often just have to try it out. But, um, I think maybe, you know, in the past, people used to have an obsession that the data you train on has to come from exactly the same distribution as the distribution you test on. That used to be how machine learning was done, I don't know, like ten, fifteen years ago. That's really not true today anymore. I think we're very comfortable. And I think when neural networks were much smaller, when we could only train very small models, there was a sense that you didn't want to distract the neural network with irrelevant data, right? 
'Cause compute was expensive, and with few parameters, if the network is distracted by irrelevant stuff, maybe it gets less good at the core task you really care about. But if you can train a bigger neural network, which is getting easier and easier these days, um, then it's become much more okay to toss in some data that hopefully isn't incorrect data. I think incorrect data is a problem, but just irrelevant examples hurt much less because a big neural network, uh, is like a human brain, right? You know, I don't know. The fact that, whatever, the fact that I learned to play the piano, you know, probably doesn't make me worse at AI, right? 'Cause hopefully my brain is big enough to learn to play the piano and learn some stuff about AI. Um, and I think as neural networks get big enough, they can learn some of the irrelevant things and also do well on the core tasks you really care about. But this is less true when, you know, if my brain was really small then, uh, um... I don't know if any of you are fans of Sherlock Holmes, but I think Sherlock Holmes had an attic theory that your brain has only so much capacity, so you gotta forget some stuff to learn new stuff. But when you train very large neural networks, that's much less true. Cool. All right. Thank you for all the questions. Um, so what I wanna do is just keep going through this flow, right? So we talked about get data, design model, train a model. We'll talk a lot more about error analysis later this quarter, about how to figure out where your algorithm is still subpar and where to, um, focus efforts to improve it. But what I want to do is, um, give you a sense of, uh, deployment, right? So, all right. So
36:37 – 39:39
Deployment reality: streaming constraints and the need for edge filtering
1. KKKian Katanforoosh
  when you have trained a model, it's often, um, a bunch of software engineering work to then maybe take your model, host it in the cloud or on a local server, and have it run inference, right? So a very common architecture for deploying a machine learning model would be that you have a neural network as a piece of software. Um, you deploy it maybe on a cloud-hosted service so that your software can accept a picture or accept two pictures and reply back, you know, do I unlock the door or not? Or is this the same person or not? So there's kind of a bunch of software engineering work that needs to be done. But in practical deployment settings, um, I'll actually tell you, it turns out that if you're building a practical face recognition system, you'll probably find that if you are trying to, um, unlock a door, you know, to a corporate campus, it's too expensive or too slow to stream video twenty-four seven to the cloud to classify every frame at thirty frames per second to see if, you know, there's a person there you should unlock the door for. So in practical face recognition systems, um, what we end up doing... Uh, actually, right, so let's take the example of, uh, um, someone walking up to your door at home. Um, and you wanna see if there's someone there that you should unlock the door for, right? So a lot of systems will actually have an image from the camera and then, uh, try to build a system, um... All right. Right. So what we have so far, right, is a system that takes as input an image, and maybe a reference image. A neural network says, "Do I unlock the door or not?" But it turns out that if streaming video is too expensive, classifying every frame is too expensive, we'll often end up with a system like this. Um, I'm gonna... Well, VAD stands for Visual Activity Detection. 
And what this is, is usually a low-cost, low-power, inexpensive compute job to run to just very quickly maybe try to figure out, is there a human face there? Um, because it turns out, actually, if you're building something to unlock your front door, you know, for you and your friends, if you look out the front door of my house, like, it's pretty boring. You know, there's a wall. We see a little bit of street, but nothing moves most of the time. So almost all of the time it's very obvious that there's no one outside my front door trying to be let in, and it'd be very wasteful to stream all that video to the internet for classification, right? So visual activity detection is usually a low-cost, low-power system to just very quickly decide, "Should I do the work of, um, sending this to the larger face recognition neural network that may be hosted on the cloud to have
39:39 – 50:21
Design choice: simple pixel-change VAD vs a small neural network (and why speed wins)
1. KKKian Katanforoosh
  it do the much more computationally expensive work to decide, you know, zero or one, do I unlock the door or not?" Okay? So this type of, um, optimization in order to make the system computationally feasible to deploy is fairly common. And, um, I'm gonna give you two options for how to implement VAD, and I'm gonna ask you to reflect on them and then tell me which one you would pick to get started. Okay? So option one is a non-machine learning based method, which is, um, see if the number of pixels changed is greater than some threshold epsilon, right? So, uh, if the camera's stationary, maybe looking at a wall, most of the time the pixels barely change 'cause, you know, it's just a wall. Um, and so you can write a little bit of code using, you know, some image library, like PIL, Python imaging library, or some simple-- write a few lines of code to just say, has the number of pixels whose RGB values have changed more than some threshold, has more than, you know, ten percent of the pixels changed compared to what it looked like a second ago, um, in order to see if there's anything in front of your camera to even decide to pass this to a more high-powered neural network. Okay. Option two would be to train a small neural network. Right. So, uh, face recognition is a pretty complex task. You have to look at multiple cues in the face, look at the eyes, look at the mouth, look at, you know. So that takes a relatively large neural network. But just taking a quick glance to see is there even a human in front of my door, that's a much simpler task. So option two would be training a very small, very low power, very lightweight neural network, um, to just very quickly tell you, do you think there's a human there? Um, and then use this trained model to decide, is it worth passing on to a much more powerful neural network running in the cloud with a lot more resources to make the final determination. Okay? 
So if you are the, again, CTO or the key person started building this, how would you... what, what would you... how would you start? We'll, we'll give everyone a few seconds to reflect on this and I'll get people's opin- get people's thoughts. All right. Go for it.
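A minimal sketch of option one as described above, assuming frames arrive as NumPy `uint8` arrays of shape (height, width, 3). The function names, the per-pixel threshold `pixel_delta`, and the synthetic frames are hypothetical; only the idea of comparing the fraction of changed pixels against a threshold epsilon (ten percent here) comes from the lecture. Loading real camera frames, for example with PIL, is left out.

```python
import numpy as np


def fraction_changed(prev_frame, curr_frame, pixel_delta=25):
    """Fraction of pixels whose value moved by more than pixel_delta in any RGB channel."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed = diff.max(axis=-1) > pixel_delta  # one boolean per pixel
    return float(changed.mean())


def activity_detected(prev_frame, curr_frame, epsilon=0.10, pixel_delta=25):
    """Option one: fire when more than epsilon of the pixels have changed."""
    return fraction_changed(prev_frame, curr_frame, pixel_delta) > epsilon


# Synthetic stand-ins for camera frames: a flat gray wall, then the same wall
# with a dark rectangle covering a quarter of the image.
wall = np.full((120, 160, 3), 128, dtype=np.uint8)
visitor = wall.copy()
visitor[20:100, 40:100] = 30  # 80 x 60 = 4800 of 19200 pixels

print(activity_detected(wall, wall))     # nothing moved
print(activity_detected(wall, visitor))  # a quarter of the pixels changed
```

Both `epsilon` and `pixel_delta` would need tuning on real footage; the values here are placeholders.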
2. SPSpeaker
  Well, I think it's very much problem dependent. You might do this if people walk by every time and if you want to look at, uh, some threshold with pixel change, you might learn what a fault cause it is. Um, so it depends on what type of problem you have, and you can also go with-
3. SPSpeaker
  It's, uh, simpler classification necessarily than maybe like various labels of classification of, you know, is there a person there? We kind of
4. KKKian Katanforoosh
  Yeah. Right. Cool.
5. SPSpeaker
  Which is much cheaper.
6. KKKian Katanforoosh
  Yeah. Right. Cool. All right. Just repeating for the mic... So problem dependent, depends on whether you're on a street where a lot of people are walking past, maybe consider other algorithms, you know, that are even cheaper. Cool.
7. SPSpeaker
  Yeah. Just a training model. Uh, this is not the, like, main purpose. We pass this framework for that. Um, we can probably just use option one in the short term, uh, while we collect more data and then, um, replace that one with this one.
8. KKKian Katanforoosh
  Yeah. Yeah. Cool. Great. So use option one in the short term, uh, and then maybe replace that with option two when we have more data. Yeah. Cool. That's very, very sane approach. Go ahead.
9. SPSpeaker
  I was thinking like about why not using two options, like, uh, because the, like, what we, like, what all we ran so far is very simple. Um, only if that is clear, then we kind of like can say, oh, we can turn it off, and, and after that, like, we need to
10. KKKian Katanforoosh
  Oh. Oh, sorry. So use-- Oh, use both options, or put both in the pipeline. Oh, that's interesting. I see.
11. SPSpeaker
  Yeah.
12. KKKian Katanforoosh
  All right. So let option one see if anything changed. If something's changed, then pass this to your neural network and then turn it on. Oh, okay. That's cool. That could work.
13. SPSpeaker
  Right. So first we associate what optimally would, and then kind of we have to see if points really help in that [door slamming]
14. KKKian Katanforoosh
  Yeah. Cool. Yeah, that could work. So it could cascade in multiple steps. Yeah. Continue.
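One way the cascade suggested here could be sketched, under stated assumptions: the cheap pixel check runs first, a small on-device network runs only if pixels changed, and the large face recognition network runs only if the small one thinks a person is present. All three stage functions below are hypothetical stubs standing in for real components, and the call counter is only there to show how rarely the expensive stage runs.

```python
import numpy as np

calls = {"small": 0, "big": 0}  # how often each network stub was invoked


def pixel_gate(prev_frame, curr_frame, epsilon=0.10):
    """Stage 1 (option one): did more than epsilon of the array entries change?"""
    return float(np.mean(prev_frame != curr_frame)) > epsilon


def small_net(frame):
    """Stage 2 stub for the small person detector (hypothetical rule, not a real model)."""
    calls["small"] += 1
    return bool(frame.min() < 50)


def big_net(frame):
    """Stage 3 stub for the large face recognition network; always accepts here."""
    calls["big"] += 1
    return True


def cascade(prev_frame, curr_frame):
    """Run the stages in order of cost and stop at the first one that says no."""
    if not pixel_gate(prev_frame, curr_frame):
        return "idle"
    if not small_net(curr_frame):
        return "no_person"
    return "unlock" if big_net(curr_frame) else "deny"


wall = np.full((120, 160, 3), 128, dtype=np.uint8)
shadow = wall.copy()
shadow[20:100, 40:100] = 100   # pixels change, but the stub sees no person
visitor = wall.copy()
visitor[20:100, 40:100] = 30   # pixels change and the stub sees a person

results = [cascade(wall, wall), cascade(wall, shadow), cascade(wall, visitor)]
print(results, calls)
```

The design choice is ordering by cost: each stage only has to be good enough to avoid waking the next, more expensive one unnecessarily.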
15. SPSpeaker
  How much would it cost us to actually run the neural network, uh, 'cause that was a thing, right? If it's cheap to run a neural network, uh, for when we gotta do option one, because then you don't have to worry about someone coming to the door and nothing happening. If it's more expensive, then I would probably invest more on options.
16. KKKian Katanforoosh
  That's cool.
17. SPSpeaker
  Like, make sure we're not wasting a bunch of money.
18. KKKian Katanforoosh
  It's cool. Yeah. Right. Yeah. So how expensive is it to run this neural network in the cloud? So, um, actually-- So I would usually want to design both of these to run at the edge, meaning on the device. Um-
19. SPSpeaker
  I'm more, I'm more worried about the passing.
20. KKKian Katanforoosh
  Oh, this one.
21. SPSpeaker
  Yeah.
22. KKKian Katanforoosh
  Uh, so it turns out that streaming video is fairly expensive. Um, so I think, um, uh... Yeah, I feel like, boy, I don't have numbers at my fingertips, but I think, um, running this twenty-four seven is not feasible, so we definitely need something to filter it down. Uh, but sending, you know, I don't know, a few images every minute is probably not a problem. Makes sense? Yeah. Okay.
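To put rough numbers on the gap between the two regimes mentioned here, a back-of-the-envelope calculation, assuming "a few images every minute" means three (that figure is an assumption; the thirty frames per second and twenty-four seven come from the lecture). It counts frames only and says nothing about actual bandwidth or dollar cost.

```python
FPS = 30                          # frames per second, from the lecture
SECONDS_PER_DAY = 24 * 60 * 60
IMAGES_PER_MINUTE = 3             # assumed value for "a few images every minute"
MINUTES_PER_DAY = 24 * 60

frames_streamed_per_day = FPS * SECONDS_PER_DAY
images_sent_per_day = IMAGES_PER_MINUTE * MINUTES_PER_DAY
reduction_factor = frames_streamed_per_day / images_sent_per_day

print(f"streaming every frame: {frames_streamed_per_day:,} frames/day")
print(f"filtered at the edge:  {images_sent_per_day:,} images/day")
print(f"reduction:             {reduction_factor:.0f}x")
```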
23. SPSpeaker
  Can we-
24. KKKian Katanforoosh
  Uh, sorry?
25. SPSpeaker
  Can we sample certain frames and not use certain-- Like, we maybe don't need all the frames in the video and we can leave out all of them.
26. KKKian Katanforoosh
  All right. Sample certain frames of the video instead of using all of them. Yes. I guess so. Yes. Although, um, yeah. Although I think if you have a video of my front door, um, you need a way to sample other than random, I guess, right? Because, uh, a lot of the time nothing happens, uh, unless you take one frame per minute, I guess, which would be okay, but then we don't want someone waiting there for a full minute before we finally get around to sampling and then sending it. So yeah. Okay.
27. SPSpeaker
  I think, um, this is a question. Could we use the same images that we used for using training on the neural network on option two, or would that be, like, biased?
28. KKKian Katanforoosh
  Yeah. Could we use the same images for training this network to train this-
29. SPSpeaker
  Option two
30. KKKian Katanforoosh
  ... the option two network? Uh, maybe. I think-- It's actually one of those things where I would say we have to try it before we know if it works. Might be doable. Depends a lot on what data we collected to train the main neural network. Yeah. Um, cool. Anything else? All right. Let's take a couple last ones. Go ahead.
50:21 – 52:44
Practical insight discovered in deployment: frame selection and blur matters
1. KKKian Katanforoosh
  you one example of, um, uh, one insight that, you know, we and many other face recognition teams have, which is it turns out that, um, uh, when you're approaching a camera that's trying to recognize your face, there are some frames that are going to be really clear and in focus and a lot of frames are really blurry, right? So if you just look at video of someone walking towards the camera, sometimes just, you know, when I'm stepping, sometimes the velocity of my face is higher, and sometimes it's lower, right? That's just what it is. And so sometimes my face is in focus, sometimes it's more blurry. And it turns out that if you can select out the high resolution frames and feed that to face recognition, you get much higher quality results, right? So this is the kind of stuff that I assume, you know, most people would not know about until you worked on a system like this. But it turns out that when I was working on a system like that, having a system to not just do VAD, but also l- capture the video and then deliberately select not just one, but maybe five frames that are high resolution and in focus of the person, that actually gave a significant boost to the accuracy of our face recognition system. And I mention this as an example of the sort of discovery that you will have only when you implement one of these systems, maybe even implement this system. But you implement this system, you see, boy, we're trying to recognize a lot of pictures from blurry faces. Maybe we need to do something about that, right? And this is, um, driving the empirical process that may then lead you to train a neural network both to see if there's a face, but also to see if the picture of the face is in focus to select out that frame for downstream processing. Does that make sense? 
And this is why, um, you find that for some meaningful fraction of questions that are asked in, in, in class, the, the, the thing that's the speediest to do is often the right answer because that, that obsession with speed really lets you go in and figure out what's in your data and improve your system more efficiently, right? Um, so... Right. So yeah, so what happens in practice? You start with this, discover how it doesn't work, 'cause it doesn't actually work, I can tell you that, and then figure out... But use those learnings to do this. Yeah.
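A sketch of the frame selection idea above: score each frame for sharpness and keep the top few. The lecture does not say how focus was measured; the variance-of-Laplacian score used here is a common heuristic swapped in for illustration, and the function names and synthetic frames are hypothetical. Frames are assumed to be 2-D grayscale NumPy arrays.

```python
import numpy as np


def sharpness(gray):
    """Variance of a 4-neighbor Laplacian; blurrier frames tend to score lower."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def select_sharpest(frames, k=5):
    """Indices of the k sharpest frames, returned in their original time order."""
    ranked = sorted(range(len(frames)), key=lambda i: sharpness(frames[i]),
                    reverse=True)
    return sorted(ranked[:k])


def box_blur(img, passes):
    """Repeated 3x3 mean filter (wrap-around edges), used only to fake motion blur."""
    out = img.astype(np.float64)
    for _ in range(passes):
        out = sum(np.roll(np.roll(out, dy, axis=0), dx, axis=1)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out


# Synthetic "video": one textured image blurred by different amounts per frame.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
blur_levels = [3, 0, 2, 1, 4, 5]          # frames 1 and 3 are the least blurred
frames = [box_blur(base, n) for n in blur_levels]

best = select_sharpest(frames, k=2)
print(best)
```

In a real pipeline the selected frames, perhaps five as mentioned above, would be cropped to the face and passed to the recognition network.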
52:44 – 54:26
What is “good” accuracy? Using human-level baselines and beyond
1. SPSpeaker
  Just a follow-up to reform on what a good, uh, classification accuracy with problems like this in the video.
2. KKKian Katanforoosh
  Yeah. So is there a sense of what is a good classification accuracy for problems like these? It's actually really difficult. Um, one of the common benchmarks, you'll learn more about this in, um, the third module of the online videos as well, is, uh, building machine learning systems is easier if we have a reference level of performance, uh, like an aspirational target accuracy, which is often human-level performance. Um, and so it turns out that, um, the way you diagnose bias and variance, which we'll talk about later in this course, is easier if you know what's an achievable level of accuracy. And very often we'll use what a human expert could do as the achievable level of accuracy. Um, and then, uh, in the case of face recognition, definitely under controlled environments we're better than humans. So, uh, actually, the AI systems we build, they're way better than I am at recognizing human faces, and I think AI systems are better than, I wanna say probably the vast majority of humans, maybe all humans at this point, at really distinguishing if two pictures are the same. And then it gets really difficult. It actually gets more difficult once they're even better than humans. Um, uh, but those are, uh... Yeah. But until you're as good as humans on a certain task, using human-level performance is often a good benchmark. And then if you're doing something that even humans can't do well, um, like it turns out, you know, recommending online books or movies or whatever, humans are actually not that good at that. I think most of us have a hard time recommending
54:26 – 1:01:34
Monitoring & maintenance: data drift, concept drift, and owning real-world performance
1. KKKian Katanforoosh
  good movies even to our closest friends. AI actually probably does it even better than many of us do as humans. For those things, it's harder to establish a baseline. Cool. All right. Now, one final aspect I wanna touch on is, um, after deploying a model, to monitor and to maintain the model. Um, one thing that often happens is you train a machine learning model, works great, you know. Does well on your training set, does well on your test set, works great. And then you deploy it, and then annoyingly, the world changes and your system no longer works, right? So we sometimes call this, um, data drift, where the distribution of data the world gives you is different than what you had in your training set, or concept drift, which is when the input-output mapping, your x to y, changes in the world compared to your training set. But to ground this, um, if you're training a face recognition system now in, you know, I don't know, when it's not too cold here in California, you get faces of a certain distribution. But as we approach winter, if it starts to rain more, people are wearing scarves, rain jackets, you know, people look different, right? Or maybe, um, as we approach the summer, more people are wearing sunglasses, then the data distribution changes. Um, or if you train a system based on data here in California, but we then decide to deploy it, you know, in a different country where people, uh, dress differently or where their appearance is different, the world keeps on giving us different data. Um, and so one of the jobs of, I think, us as machine learning engineers is to put in place systems that monitor this type of concept drift or data drift and fix problems as they arise. When you're out building machine learning systems, I have seen a segment of AI engineers that think their job is to do well on a test set, right? 
And so I've been in a bunch of these conversations where, um, the machine learning person will say, "Look, I did well on the test set. My job is done." But then the product owner or business owner will say, "No, your system doesn't work. Look at all the ways it does not work." And then if the machine learning person says, "Well, that's not my problem. I do well on the test set," I think that's not a constructive way to move it forward. So I encourage you to think of yourselves if you're building a machine learning system, I think of my job as building something that works, and that can be different than building something that works on the test set. So if ever you work on a product and someone says, you know, "I know you did well on the test set, but your system doesn't work," I would encourage you not to respond, "But my job is to do well on the test set." I encourage you to think about why doing well on the test set doesn't translate to doing well on the application that people actually care about, and then lean in and go and fix that, right? And, uh, one of the common problems for why doing well on the test set doesn't translate to doing well in real life is if the data distribution changes, in which case you may need to update the distribution of data you're training on, um, in order to capture what has changed in the world. And, um, just a few other examples. I, I, I feel like, you know, the world gives us new data all the time. So if you're building a web search engine, um, sometimes, uh, sometimes, sometimes a new politician is elected or there's a new movement or some new video goes viral, or I don't know, Taylor Swift releases a new album. I don't know, whatever. And then people are suddenly searching for a brand... Oh, I thought someone-- I, I thought I'd get a laugh from that. No? All right, no way. No, no Swifties here. All right. 
[laughs] Um, uh, uh, but then suddenly a lot of people are searching for a brand-new thing, um, and so the distribution of data you get is different because the world has changed. Um, or, um, I've done a lot of work in factories. Uh, there's actually a reasonable chance the cellphone you have in your pocket may have been inspected by software that, right, my teams wrote. Um, but you know, sometimes the materials change or, uh, there's a new machine installed in the manufacturing line, and this machine makes a new type of scratch on the cell phone. So data changes in inspection lines as well. Or one, one thing that actually surprised me, when working on self-driving cars, um, one of the teams I was working with, we trained a lot on data in California, and then when we took the cars to Texas, um, you and I as people, we can drive just fine in, you know, California or in Texas, but it turns out the traffic lights look really different in Texas and California, right? So it's traffic lights, horizontal, vertical. I think part of it is, uh, there are very high winds in some parts of Texas, so a lot of traffic lights tend to be strung up differently to be robust to high winds, right? And so traffic lights look pretty different in Texas. So the models we train in California, they don't work in Texas. We've got to get new data, refresh the data. So a lot of that is a process of, um, monitoring and maintaining the model even when something in the world changes. And, um, before going on to monitoring the model performance and maintaining it, one interesting difference in performance between this and this is option one is a very simple model, right? It, it, it basically has one parameter, which is epsilon, the fraction of pixels that change. And so this, because it's so simple, is actually very robust to changes in distribution. For example, say it's a hot summer and a lot of people are wearing sunglasses. 
You know, well, the fraction of pixels that change, right, even when people are wearing sunglasses, doesn't change that much. Um, maybe if it's Halloween and people are wearing crazy large costumes, maybe that would change. But this is actually a very robust algorithm because it's so simple. In contrast, if you trained on data with no one wearing sunglasses, uh, because, you know, the sun's not that hot these days, right? Then when everyone starts to wear sunglasses, this one, the small neural network, is actually less robust. So one of the advantages of very simple non-machine learning based systems is they may be less susceptible to data drift, because maybe I tune this parameter epsilon, you know, on a limited data set, but even if the data changes, this tuning is quite robust. But if I train a neural network with, say, thousands or tens of thousands of parameters, then I'm more likely to have overfit to people without sunglasses. So if people start wearing sunglasses, um, you're more likely to have to update this model, right? And, um, if you are building a... Um, sorry.
1:01:34 – 1:06:58
Dashboards, metrics, and alerting: practical operations for ML systems
1. KKKian Katanforoosh
  All right, cool. [sighs] Boy. If you're building a system, um, it turns out to be incredibly helpful if you can get user permission to stream a little bit of data back to your cloud-hosted service, while respecting user privacy and being careful, with transparent privacy practices... I think that privacy is really important, right? So do be transparent. Do the right thing for users. Um, and if you're able to do that and get a little bit of data back to your cloud, you can plot dashboards. And maybe one practice that I've seen is, um, when building a high-stakes application, uh, one good practice would be to gather your team together and, um, get a diverse set of opinions on all the things that could change and all the things that could go wrong. And, um, I've built quite a few machine learning systems, and I found that when we sit down and brainstorm all the things that could go wrong, including the data distribution changes, I don't think I have ever seen something go wrong in real life that we did not identify as a possible problem. I might be wrong, but at least right now, when we sat down and really brainstormed all the things that could go wrong, I think I have yet to see something go wrong that was not on the list of stuff that we brainstormed, right? And so if you brainstorm... And this is true for safety critical applications as well, right? It turns out with creative teams, you can actually think of all sorts of things that could change the data or things that could go wrong, and that then lets you, um, try to design a set of dashboards or metrics to put in place to monitor whether or not any of the things that you think might go wrong actually do go wrong. So, um, we may put in place dashboards like, um, how often... Uh, it turns out re-authentication is a common thing. How often does a user need to authenticate twice before they're let through? That's actually a sign of user frustration, right? So let's build a dashboard to do that. 
How often does a user have to try twice? So it probably means something went wrong the first time. Um, how often do you accept versus reject the user? Um, and what is the latency of the system? And I find that just as it's difficult to know in advance what's in your data, it's actually difficult to know in advance what dashboard will be the most useful. And so the best practice I tend to recommend is brainstorm a lot of things that could go wrong, brainstorm a lot of metrics, um, and then just, you know, create very rich dashboards where this is latency over time, uh, re-authentication over time, right? Um, the number of zeros versus ones over time, and then draw plots to see how these rates trend over time. And if you're able to have a lot of dashboards and sample and just, you know, look at some of the data for where you suspect you may be making a mistake, that often is then a good way for you to have a higher chance of spotting, um, when there might be a problem. And in the times that I've built large dashboards with, you know, twenty, thirty metrics, I found that it's surprisingly difficult to know in advance which dashboards would turn out to be useful. Um, I think in exploratory data analysis and data science, again, because we often don't know what's in the data or what the data will give us in the future, we just, you know, frankly plot a lot of stuff and then go and figure out from there what is interesting, right? So the cost of plotting something, um, in a Jupyter Notebook is fairly low. So let's just plot a lot of stuff, have a lot of dashboards, and if you end up with, you know, thirty, fifty, a hundred dashboards tracking these metrics over time, then hopefully in a few days or a few weeks you figure out that a lot of them are really boring. So we figure out that, well, latency is just not a thing 'cause with cloud-hosted deployment it's just very constant. So, well, let's get rid of that 'cause that's just a very boring plot. 
I'm just not worried about that. And so we'll often plot a lot of things and then prune back to then have a smaller number of metrics that we track and monitor, um, for the long term. And then eventually, when you get a sense of, um, you know, this being the normal range for some metric, you can then also put in place upper and lower alarms, um, so that if it ever goes above or below certain bounds, it'll trigger an alarm, like go page someone to take a look to figure out if something's gone wrong. That make sense? And so unfortunately, um, just because you've trained a model, a lot of the time in a real-world production setting it doesn't mean you're done. Because you deploy it and then the world will give you surprising data, and having a plan to monitor what happens as well as maintain the model, meaning get new data, update the system, um, to fix problems as they arise, that is often an important part of, um, the practicalities of, uh, deploying a machine learning system as well.
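A small sketch of the upper and lower alarms described above, applied to a daily re-authentication rate. The lecture only says to set bounds once the normal range of a metric is known; deriving them as the mean plus or minus three standard deviations of recent history is an assumption made for illustration, and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def alarm_bounds(history, num_sigmas=3.0):
    """Lower and upper alarm thresholds derived from a metric's recent history."""
    mu, sigma = mean(history), stdev(history)
    return mu - num_sigmas * sigma, mu + num_sigmas * sigma


def out_of_range(value, bounds):
    """True when the metric leaves its normal range and someone should be paged."""
    lower, upper = bounds
    return value < lower or value > upper


# Hypothetical daily re-authentication rates (fraction of users who had to try twice).
reauth_rate_history = [0.040, 0.050, 0.045, 0.050, 0.040, 0.045, 0.050]
bounds = alarm_bounds(reauth_rate_history)

for today in (0.047, 0.120):
    status = "ALARM" if out_of_range(today, bounds) else "ok"
    print(f"re-auth rate {today:.3f}: {status}")
```

The same check could be repeated for each metric that survives pruning, such as the accept-versus-reject rate, with bounds revisited as the normal range shifts.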

Episode duration: 1:07:04

Transcript of episode MGqQuQEUXhk