Stanford Online

Stanford CS230 | Autumn 2025 | Lecture 10: What’s Going On Inside My Model?

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai

December 2, 2025

This lecture covers what's happening inside your model and provides a class wrap-up.

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning
Please follow along with the course schedule and syllabus: https://cs230.stanford.edu/syllabus/
View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI
Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera
Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforoosh (host)
Dec 15, 2025 | 1h 46m

EVERY SPOKEN WORD

  1. KK

    Welcome to lecture nine already. I hope everybody had a good, uh, uh, fall break. Um, today we're going to talk about neural networks, both convolutional neural networks and transformers. Um, and we're gonna unpack it to see what's going on inside. Um, this lecture used to be called, um, Neural Network Interpretability, but I've broadened the scope because there is a section now where we talk more about frontier models, um, and the interpretability or visualization methods have not quite been figured out for most models that you play with out there. So think about this one as research areas, what we know from convolutions, and what we're trying to figure out for, uh, frontier models. Uh, we're gonna start, uh, with a very packed agenda, with a case study, uh, where I'm going to ask you a question, um, and let you brainstorm a little bit, uh, all together, um, on how you would, you know, try to understand what's happening inside of a frontier model. Um, in the second section, we're gonna look at the example of convolutions specifically and try to interpret everything possible about a convolution, meaning we're going to look at input-output relationship, we're going to look at a specific neuron inside and try to interpret it. We're going to look also at, uh, specific feature maps and try to understand what they do. I will present many methods to do that. Those methods are real, and they've been used for convolutions. Um, but again, they're not the methods that you might see frontier labs use for today's language or, uh, vision, uh, uh, uh, large, large models. Uh, however, they're going to bring you the skills, uh, that will allow you to understand the methods for frontier models as researchers are trying to figure them out. Um, the second half of the lecture is going to focus more on the modern representation analysis. Uh, we're gonna talk about scaling laws, capability benchmarking, data diagnostics, and then I, I end on, on a few closing remarks. 
Okay, are we ready for this one? Lots of visualizations in this lecture. So first, um, [clears throat] um, question for you all. Um, let's say the case study is you are a model trainer, um, and you're, you know, working on a two hundred billion parameters model, um, at a frontier lab. And overnight, you know, a new checkpoint passes a training sanity check, but a few issues arise. Things like, you know, uh, model is getting worse on reasoning benchmarks, um, some safety evals are failing, um, and there is a weird spike in, let's say, latency for tool use when you actually use this model for an agentic workflow. Your VP is wondering what's happening, and they ask, "What is going on?" Um, and what are you going to look at first? So what I want you to discuss, um, for a minute or so, think about it first, then I'll open up, um, is what are the type of evidences that you would look-- want to inspect before even, you know, touching the code or retraining the, the model? What are the things that you wanna look at? Jump in. There's no single answer, so I wanna know everything you're gonna look at. Okay. So, um, error analysis. So look, look at... You said, "I will look at the reasoning benchmarks and find the examples where the model is failing, specifically try to find patterns in order to pinpoint what the issue might be." And then same thing on the safety, um, evals, where you wanna see what type of safety issues are arising. Is it everywhere? Is it specific to something? Yeah, I agree. Error analysis in general. What else? Remember, you're, you're the model trainer, so you're training this model. You're, you're supposed to be watching certain things when you're training. What can be interesting? Yeah.

  2. SP

    Um, by training sanity checks, do you mean like the loss and like tool two and all of those, right? If those passes-

  3. KK

    Yeah. Let's say not necessarily passing, but those are great examples. So you're, you're mentioning... Yeah. As you're the model trainer, you would be watching the training loss. And you wanna see... What, what, what are you going to look for in that training loss?

  4. SP

    Convergence.

  5. KK

    Okay, convergence. You probably want to make sure that it's smooth. You don't want big spikes. Um, how about the validation loss? What, what is your expectation on the validation loss?

  6. SP

    [muffled]

  7. KK

    Yeah, should probably follow the same curve as the training loss, but is likely slightly higher because you're probably performing slightly less well on the validation set than on the training set. If you're seeing spikes, it might, it might lead to cert-- it might mean there are some issues. Um, what else are you looking at? Yeah.
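
A minimal sketch of the loss-curve check discussed above (not from the lecture itself). The function names, the window size, the spike ratio, and the loss values are all assumptions made up for illustration: it flags steps where the loss jumps well above its recent median and computes the gap between validation and training loss.

```python
from statistics import median


def find_loss_spikes(losses, window=5, ratio=1.5):
    """Return indices where the loss exceeds `ratio` times the median
    of the previous `window` values (a simple, hypothetical heuristic)."""
    spikes = []
    for i in range(window, len(losses)):
        baseline = median(losses[i - window:i])
        if losses[i] > ratio * baseline:
            spikes.append(i)
    return spikes


def generalization_gap(train_losses, val_losses):
    """Per-step difference between validation and training loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]


# Made-up loss values with one injected spike at step 6.
train = [2.0, 1.8, 1.6, 1.5, 1.4, 1.3, 3.9, 1.25, 1.2]
val = [2.1, 1.9, 1.7, 1.6, 1.5, 1.4, 4.2, 1.4, 1.35]

print("train spikes at steps:", find_loss_spikes(train))
print("val spikes at steps:", find_loss_spikes(val))
print("val - train gap:", [round(g, 2) for g in generalization_gap(train, val)])
```

A median baseline is used here so that a single spike does not distort the reference level for the steps that follow it.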

  8. SP

    Take a look at this round of training data to see how it's-

  9. KK

    So this batch, you mean?

  10. SP

    Yeah.

  11. KK

    Yeah, yeah. So you're looking at this round of training data. Maybe the last round of data that we trained on. There were some issues in that data. Maybe that data was, uh, you know, uh, probably, you know, poisoned or biased toward a certain category of data that we're failing on. You're, you're totally right. Yeah. Yeah. Maybe that specific checkpoint is doing poorly compared to the previous checkpoint, and so you have pinpoint-- you pinpoint where the issue arose, you know, during the, the training. What else are you looking at? Yeah.

  12. SP

    Yeah. I'm worried about the overnight thing. It's moved, uh, pretty fast. I wonder if this could be a line of a hardware issue.

  13. KK

    Okay. Because it's overnight, and it seemed everything was good up to yesterday, and now there's an issue, maybe you're saying there is a hardware issue. Yeah. We could check actually. Is-

  14. SP

    Latency.

  15. KK

    Yeah. Latency has been, uh, pointed out, so maybe the, the hardware has failed. Yeah, you're right. What else? So w-we-we-- A lot of the answers are global answers. You're looking at the model in general. You're not looking at specific portions of the model. What would you look at if you were to inspect, um, the model more precisely from the inside? And this one's a language model, so you can, you can think about the fact that it's a language model. Yeah.

  16. SP

    I mean, something I would do is just, like, recently examine different checkpoints.

  17. KK

    Mm-hmm.

  18. SP

    Like, look at what's, what has happened, like, over the past, like, a few checkpoints before this one to see if the model was performing, if it was getting to perform better or lose it, um, to see, like, if there was a reason we're gonna see the problem.

  19. KK

    Yeah, you're right. Like, you wanna look at different checkpoints and see where did we fail and might be able to trace back to that moment and figure out what the issue was. So for example, maybe your initialization, um, was actually pretty good, and the first checkpoints were doing well, but suddenly, at some point, uh, the model saturated in a certain way. Maybe you're seeing exploding gradients or vanishing gradients in certain, uh, moments and you wanna pinpoint that. Yeah. What, what else? We're, we're adding so many methods right now, but I, I wanna hear what else you have for language models. What other things can you visualize for language models that might, might mean something's going wrong? Yeah.

  20. SP

    If you really wanna go deep into it, you can actually plot the attention.

  21. KK

    The attention maps.

  22. SP

    Attention maps to see.

  23. KK

    Yeah, yeah. F-fair enough. You, you've learned about transformers, uh, in the online videos. Um, the attention maps, which are representative of, you know, the relationship between different tokens, they might not make sense to you. You might actually be plotting certain attention maps and be like, "This token has nothing to do with that one," but the model seems to think it has. And you might be able to, um, identify certain issues with the attention maps. What else beyond the attention maps? What, what... Yeah.
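
A minimal sketch of what an attention map is, assuming standard scaled dot-product attention without a causal mask (not from the lecture itself). The token list and the query/key vectors are toy values invented for illustration; in a real model they would come from a specific layer and head.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Attention weights softmax(Q K^T / sqrt(d)), one row per query token."""
    d = len(keys[0])
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


# Toy tokens and 2-dimensional query/key vectors (hypothetical values).
tokens = ["the", "cat", "sat"]
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = attention_map(q, k)
for tok, row in zip(tokens, weights):
    # Each row shows how strongly this token attends to every token.
    print(tok, [round(w, 2) for w in row])
```

Each row sums to 1, so a row can be read as a distribution over the tokens that the query token attends to; an unexpectedly large weight between unrelated tokens is the kind of thing one would look for.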

  24. SP

    I, I haven't done it myself, but, uh, run the sensitivity analysis. I hope to see where or what parameters of the model that shows sensitivity, sensitivity.

  25. KK

    So you mean tell me more about the sensitivity analysis. What would you fix, for example, and what would you change?

  26. SP

    Um, run with the parameters like you see. Right now, you're doing a parameter search, and then you can say, "Well, I need to do sensitivity analysis to realize what has happened." Um, but, but how the parameters have captured, um, or have reacted to the outputs.

  27. KK

    Yeah. Okay. Yeah, but I like the idea of sensitivity analysis. You might fix-- You might, uh, try to figure out which hyperparameter went wrong. Is there something wrong with our optimizer? Is our learning rate schedule poorly tuned? Um, uh, maybe scaling laws. You know, w-we know that we can play with compute, we can play with data, we can play with model size, and one of those might be going wrong. Maybe an analysis would allow us to identify, uh, the model is fine, it just needs to be trained longer. Or, uh, the model is actually too small for the amount of data we're giving it. You know, that type of stuff would come with, um, either doing a sensitivity analysis or, uh, comparing what we're doing to the scaling laws that we know from other models. Uh, we're gonna look into that. Okay. Any other ideas?
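
A rough sketch of the kind of compute/data/model-size sanity check mentioned above (not from the lecture itself). It assumes the commonly used approximation that training a dense transformer costs about 6 FLOPs per parameter per token; the token count is a hypothetical number, and for a mixture-of-experts model the active parameter count would be the relevant one.

```python
def training_flops(n_params, n_tokens):
    """Approximate training compute, assuming C ~ 6 * N * D
    (N = parameters, D = training tokens)."""
    return 6 * n_params * n_tokens


def tokens_per_parameter(n_params, n_tokens):
    """Ratio of training tokens to parameters, a quick data-vs-size check."""
    return n_tokens / n_params


n_params = 200e9  # model size from the case study
n_tokens = 4e12   # hypothetical training-set size, made up for illustration

print(f"approx. training FLOPs: {training_flops(n_params, n_tokens):.2e}")
print(f"tokens per parameter: {tokens_per_parameter(n_params, n_tokens):.1f}")
```

Comparing the tokens-per-parameter ratio against whatever reference ratio a team trusts from its own scaling experiments gives a first hint as to whether the model is undertrained or too small for its data.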

  28. SP

    I have a question. So if these two hundred billion parameters, would these be overparameterized? Like, probably these parameters are overparameterized.

  29. KK

    Might be. So you're, you're saying you-- I gave you two hundred billion parameters, which is a very large model, even as of today. Uh, it might be overparameterized. That's a good question because it depends on what it's been trained on, how much data we're feeding it, how much compute. It's all relative to each other. But yeah, it's a large model, so I would expect a lot of compute and a lot of data along with it. Um, you know, in fact, a lot of these models might be built as a mixture of experts. You-you've, you-you-you've heard about mixture of experts. One thing that could happen is that some of the experts are failing, and you might be inspecting if experts are in fact failing or the routing module is, um, always selecting the same expert because it's just, you know, found an expert that is really good and generalized, and the other experts are not being used. That might be another issue as well, uh, that might, you know, be related to the model capacity. Because if the model is not using all its experts, it's probably not actually operating as a two hundred billion parameter model. It's operating as a smaller model. Yeah. Okay. So, you know, generally, this is to motivate the lecture. We're gonna look into all of these together today. All right. And we start with convolutions, uh, because they're very visual. For the convolutional part, we're gonna go super deep, um, but then for the frontier models, I'm just gonna get broader and, and give you the areas of research. So the answer to the question I asked typically would fall under four buckets, every solution that we looked into together. One is training and scaling. So people are looking at loss curves, at, you know, um, things like gradients, uh, learning rates, mixture of experts, routing, scaling laws. We're, we're gonna talk about all these. Um, the second, uh, category is representation and internal aspect of the model. You mentioned attention, um, heads and maps, uh, embeddings.
Nobody mentioned embeddings, but you could actually visualize embeddings and see does it make sense to you. Are these em-- you know, tokens close to each other as you would expect, meaning the, the model's mental understanding of language is correct. Um, and then neuron-level behaviors, although that's really hard with a large model, um, and nobody has quite figured it out yet. Um, and then the other category might be data and distribution. Maybe, you know, the actual, um, uh, benchmark that we're looking at has been contaminated, meaning, you know, the model is just not-- Either it's doing too well on that benchmark, or it doesn't mean anything, or it's doing poorly for a certain reason because the data distribution used in the test set is completely different from the training or validation set. Um, and then, you know, it might be failing at different levels. You can run benchmarks on the language model. You can run benchmarks on the agentic workflow that is using that language model. And because you want the language model to be used in agentic workflow, those are two levels that you need to inspect. So for example, when a, a frontier lab says, um, "Our model is doing really well for tool use," what they mean is the language model has been tested on upstream tasks in a workflow, and it's actually good at tool use against their benchmarks. So those are different levels of capability analysis. Okay, so let's talk about, uh, convolutions. We're gonna dive deep inside convolutions, and then we'll go back up and look at frontier models. Okay? So, um, first case study, uh, for convolutions. Let's say that you have built an animal classifier for a zoo, and they are very reluctant to use your model, um, without any human supervising because they don't understand the decision-making process of the model. How can you alleviate their concerns? 
How can you give them intuition about the decision-making process of the model so that they feel like, "Ah, the model's doing m-things that feel natural and, and human"? So let's say, just to simplify, let's say you have a, a convolution neural network, and there's a softmax layer, and it's supposed to identify animals. So the number of classes are many animals. Yeah. If you were to write a quick Python code to give them some intuition, how would you do it? Yeah.

  30. SP

    I think like first with a softmax, like how we're gonna eventually get, like, the end result and sort of like probability of like what each animal is. And then the next thing I'll explain is how the, the CNN, uh, each layer of our CNN is getting higher, more of in-depth, I guess, features of the image that we are put-- of the animal we're showing.
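
A minimal sketch of the softmax step described in this answer (not from the lecture itself). The class names and logits are invented for illustration; in practice the logits would be the output of the CNN's last layer for one image.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, k=3):
    """Return the k most probable (class, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Hypothetical classes and logits for a single image.
class_names = ["lion", "tiger", "zebra", "giraffe"]
logits = [2.0, 1.0, 0.1, -1.0]

for name, prob in top_k(logits, class_names):
    print(f"{name}: {prob:.2f}")
```

Showing the top few probabilities rather than a single label lets a non-technical user see how confident the model is and which animals it confuses.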

Episode duration: 1:46:53


Transcript of episode Ozb1AR_F5MU
