
Stanford CS230 | Autumn 2025 | Lecture 10: What’s Going On Inside My Model?

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai

December 2, 2025

This lecture covers what's happening inside your model and provides a class wrap-up.

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning
Please follow along with the course schedule and syllabus: https://cs230.stanford.edu/syllabus/
View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI
Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera
Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforoosh, host

Dec 15, 2025 · 1h 46m

EVERY SPOKEN WORD

80 min read · 16,410 words

0:05 – 2:37
Lecture roadmap: from CNN interpretability to frontier-model diagnostics
1. KKKian Katanforoosh
  Welcome to lecture nine already. I hope everybody had a good, uh, uh, fall break. Um, today we're going to talk about neural networks, both convolutional neural networks and transformers. Um, and we're gonna unpack it to see what's going on inside. Um, this lecture used to be called, um, Neural Network Interpretability, but I've broadened the scope because there is a section now where we talk more about frontier models, um, and the interpretability or visualization methods have not quite been figured out for most models that you play with out there. So think about this one as research areas, what we know from convolutions, and what we're trying to figure out for, uh, frontier models. Uh, we're gonna start, uh, with a very packed agenda, with a case study, uh, where I'm going to ask you a question, um, and let you brainstorm a little bit, uh, all together, um, on how you would, you know, try to understand what's happening inside of a frontier model. Um, in the second section, we're gonna look at the example of convolutions specifically and try to interpret everything possible about a convolution, meaning we're going to look at input-output relationship, we're going to look at a specific neuron inside and try to interpret it. We're going to look also at, uh, specific feature maps and try to understand what they do. I will present many methods to do that. Those methods are real, and they've been used for convolutions. Um, but again, they're not the methods that you might see frontier labs use for today's language or, uh, vision, uh, uh, uh, large, large models. Uh, however, they're going to bring you the skills, uh, that will allow you to understand the methods for frontier models as researchers are trying to figure them out. Um, the second half of the lecture is going to focus more on the modern representation analysis. Uh, we're gonna talk about scaling laws, capability benchmarking, data diagnostics, and then I, I end on, on a few closing remarks. 
Okay, are we ready for this one? Lots of visualizations in this lecture. So first, um, [clears throat] um, question for you all. Um, let's
2:37 – 7:01
Frontier-lab case study: sudden regression in a 200B-parameter checkpoint
1. KKKian Katanforoosh
  say the case study is you are a model trainer, um, and you're, you know, working on a two hundred billion parameters model, um, at a frontier lab. And overnight, you know, a new checkpoint passes a training sanity check, but a few issues arise. Things like, you know, uh, model is getting worse on reasoning benchmarks, um, some safety evals are failing, um, and there is a weird spike in, let's say, latency for tool use when you actually use this model for an agentic workflow. Your VP is wondering what's happening, and they ask, "What is going on?" Um, and what are you going to look at first? So what I want you to discuss, um, for a minute or so, think about it first, then I'll open up, um, is what are the type of evidences that you would look-- want to inspect before even, you know, touching the code or retraining the, the model? What are the things that you wanna look at? Jump in. There's no single answer, so I wanna know everything you're gonna look at. Okay. So, um, error analysis. So look, look at... You said, "I will look at the reasoning benchmarks and find the examples where the model is failing, specifically try to find patterns in order to pinpoint what the issue might be." And then same thing on the safety, um, evals, where you wanna see what type of safety issues are arising. Is it everywhere? Is it specific to something? Yeah, I agree. Error analysis in general. What else? Remember, you're, you're the model trainer, so you're training this model. You're, you're supposed to be watching certain things when you're training. What can be interesting? Yeah.
2. SPSpeaker
  Um, by training sanity checks, do you mean like the loss and like tool two and all of those, right? If those passes-
3. KKKian Katanforoosh
  Yeah. Let's say not necessarily passing, but those are great examples. So you're, you're mentioning... Yeah. As you're the model trainer, you would be watching the training loss. And you wanna see... What, what, what are you going to look for in that training loss?
4. SPSpeaker
  Convergence.
5. KKKian Katanforoosh
  Okay, convergence. You probably want to make sure that it's smooth. You don't want big spikes. Um, how about the validation loss? What, what is your expectation on the validation loss?
6. SPSpeaker
  [muffled]
7. KKKian Katanforoosh
  Yeah, should probably follow the same curve as the training loss, but is likely slightly higher because you're probably performing slightly less well on the validation set than on the training set. If you're seeing spikes, it might, it might lead to cert-- it might mean there are some issues. Um, what else are you looking at? Yeah.
8. SPSpeaker
  Take a look at this round of training data to see how it's-
9. KKKian Katanforoosh
  So this batch, you mean?
10. SPSpeaker
  Yeah.
11. KKKian Katanforoosh
  Yeah, yeah. So you're looking at this round of training data. Maybe the last round of data that we trained on, there was some issue in that data. Maybe that data was, uh, you know, uh, probably, you know, poisoned or biased toward a certain category of data that we're failing on. You're, you're totally right. Yeah. Yeah. Maybe that specific checkpoint is doing poorly compared to the previous checkpoint, and so you have pinpoint-- you pinpoint where the issue arose, you know, during the, the training. What else are you looking at? Yeah.
12. SPSpeaker
  Yeah. I'm worried about the overnight thing. It's moved, uh, pretty fast. I wonder if this could be a line of a hardware issue.
13. KKKian Katanforoosh
  Okay. Because it's overnight, and it seemed everything was good up to yesterday, and now there's an issue, maybe you're saying there is a hardware issue. Yeah. We could check actually. Is-
14. SPSpeaker
  Latency.
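The loss-curve checks raised in this segment (a smooth training loss, a validation loss that tracks it from slightly above, no big spikes) can be sketched in a few lines of Python. This is a minimal illustration, not a frontier-lab tool: the function names, the window size, the spike ratio, and the loss values are all made up for the example.

```python
from statistics import median


def find_loss_spikes(losses, window=5, ratio=1.5):
    """Return step indices whose loss exceeds `ratio` times the median
    of the previous `window` steps (a crude spike detector)."""
    spikes = []
    for i in range(window, len(losses)):
        baseline = median(losses[i - window:i])
        if losses[i] > ratio * baseline:
            spikes.append(i)
    return spikes


def mean_generalization_gap(train_losses, val_losses):
    """Average of (validation loss - training loss) over logged steps.
    A small positive value matches the expectation stated in the lecture."""
    gaps = [v - t for t, v in zip(train_losses, val_losses)]
    return sum(gaps) / len(gaps)


# Made-up curves for illustration: step 7 contains an artificial spike.
train = [2.0, 1.8, 1.6, 1.5, 1.4, 1.35, 1.3, 3.9, 1.28, 1.25]
val = [2.1, 1.9, 1.7, 1.6, 1.5, 1.45, 1.4, 4.0, 1.38, 1.35]

print("spike steps:", find_loss_spikes(train))
print("mean val - train gap:", round(mean_generalization_gap(train, val), 3))
```

A flagged step is only a pointer: it tells you which checkpoint and which batch of data to inspect next, which is where the students' suggestions about data and checkpoints come in.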
7:01 – 12:28
What to probe “inside” an LLM: checkpoints, attention, sensitivity, MoE routing
1. KKKian Katanforoosh
  Yeah. Latency has been, uh, pointed out, so maybe the, the hardware has failed. Yeah, you're right. What else? So w-we-we-- A lot of the answers are global answers. You're looking at the model in general. You're not looking at specific portions of the model. What would you look at if you were to inspect, um, the model more precisely from the inside? And this one's a language model, so you can, you can think about the fact that it's a language model. Yeah.
2. SPSpeaker
  I mean, something I would do is just, like, recently examine different checkpoints.
3. KKKian Katanforoosh
  Mm-hmm.
4. SPSpeaker
  Like, look at what's, what has happened, like, over the past, like, a few checkpoints before this one to see if the model was performing, if it was getting to perform better or lose it, um, to see, like, if there was a reason we're gonna see the problem.
5. KKKian Katanforoosh
  Yeah, you're right. Like, you wanna look at different checkpoints and see where did we fail and might be able to trace back to that moment and figure out what the issue was. So for example, maybe your initialization, um, was actually pretty good, and the first checkpoints were doing well, but suddenly, at some point, uh, the model saturated in a certain way. Maybe you're seeing exploding gradients or vanishing gradients in certain, uh, moments and you wanna pinpoint that. Yeah. What, what else? We're, we're adding so many methods right now, but I, I wanna hear what else you have for language models. What other things can you visualize for language models that might, might mean something's going wrong? Yeah.
6. SPSpeaker
  If you really wanna go deep into it, you can actually plot the attention.
7. KKKian Katanforoosh
  The attention maps.
8. SPSpeaker
  Attention maps to see.
9. KKKian Katanforoosh
  Yeah, yeah. F-fair enough. You, you've learned about transformers, uh, in the online videos. Um, the attention maps, which are representative of, you know, the relationship between different tokens, they might not make sense to you. You might actually be plotting certain attention maps and be like, "This token has nothing to do with that one," but the model seems to think it has. And you might be able to, um, identify certain issues with the attention maps. What else beyond the attention maps? What, what... Yeah.
10. SPSpeaker
  I, I haven't done it myself, but, uh, run the sensitivity analysis. I hope to see where or what parameters of the model that shows sensitivity, sensitivity.
11. KKKian Katanforoosh
  So you mean tell me more about the sensitivity analysis. What would you fix, for example, and what would you change?
12. SPSpeaker
  Um, run with the parameters like you see. Right now, you're doing a parameter search, and then you can say, "Well, I need to do sensitivity analysis to realize what has happened." Um, but, but how the parameters have captured, um, or have reacted to the outputs.
13. KKKian Katanforoosh
  Yeah. Okay. Yeah, but I like the idea of sensitivity analysis. You might fix-- You might, uh, try to figure out which hyperparameter went wrong. Is there something wrong with our optimizer? Is our learning rate schedule poorly tuned? Um, uh, maybe scaling laws. You know, w-we know that we can play with compute, we can play with data, we can play with model size, and one of those might be going wrong. Maybe an analysis would allow us to identify, uh, the model is fine, it just needs to be trained longer. Or, uh, the model is actually too small for the amount of data we're giving it. You know, that type of stuff would come with, um, either doing a sensitivity analysis or, uh, comparing what we're doing to the scaling laws that we know from other models. Uh, we're gonna look into that. Okay. Any other ideas?
14. SPSpeaker
  I have a question. So if these two hundred billion parameters, would these be overparameterized? Like, probably these parameters are overparameterized.
15. KKKian Katanforoosh
  Might be. So you're, you're saying you-- I gave you two hundred billion parameters, which is a very large model, even as of today. Uh, it might be overparameterized. That's a good question because it depends on what it's been trained on, how much data we're feeding it, how much compute. It's all relative to each other. But yeah, it's a large model, so I would expect a lot of compute and a lot of data along with it. Um, you know, in fact, a lot of these models might be built as a mixture of experts. You-you've, you-you-you've heard about mixture of experts. One thing that could happen is that some of the experts are failing, and you might be inspecting if experts are in fact failing or the routing module is, um, always selecting the same expert because it's just, you know, found an expert that is really good and generalized, and the other experts are not being used. That might be another issue as well, uh, that might, you know, be related to the model capacity. Because if the model is not using all its experts, it's probably not actually operating as a two hundred billion parameter model. It's operating as a smaller model. Yeah.
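The routing concern described above (the router always picking the same expert while the others sit idle) can be checked from logged routing decisions. The sketch below assumes you have a flat list of expert indices, one per routed token. The function names, the 5% idle threshold, and the sample data are hypothetical.

```python
import math
from collections import Counter


def expert_load(assignments, num_experts):
    """Fraction of routed tokens that went to each expert."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def normalized_entropy(load):
    """Entropy of the load distribution divided by log(num_experts):
    1.0 means perfectly balanced routing, 0.0 means one expert gets everything."""
    h = -sum(p * math.log(p) for p in load if p > 0)
    return h / math.log(len(load))


def idle_experts(load, threshold=0.05):
    """Experts receiving less than `threshold` of the tokens."""
    return [e for e, p in enumerate(load) if p < threshold]


# Made-up routing logs for a 4-expert layer.
balanced = [0, 1, 2, 3] * 25
collapsed = [0] * 97 + [1, 2, 3]

for name, log in (("balanced", balanced), ("collapsed", collapsed)):
    load = expert_load(log, num_experts=4)
    print(name, [round(p, 2) for p in load],
          "entropy:", round(normalized_entropy(load), 3),
          "idle:", idle_experts(load))
```

A normalized entropy near zero is the numerical version of the point made in the lecture: the model is effectively operating with fewer parameters than its nominal size.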
12:28 – 15:02
Four buckets of evidence: training/scaling, representations, data, and multi-level evals
1. KKKian Katanforoosh
  Okay. So, you know, generally, this is to motivate the lecture. We're gonna look into all of these together today. All right. And we start with convolutions, uh, because they're very visual. For the convolutional part, we're gonna go super deep, um, but then for the frontier models, I'm just gonna get broader and, and give you the areas of research. So the answer to the question I asked typically would fall under four buckets, every solution that we looked into together. One is training and scaling. So people are looking at loss curves, at, you know, um, things like gradients, uh, learning rates, mixture of experts, routing, scaling laws. We're, we're gonna talk about all these. Um, the second, uh, category is representation and internal aspect of the model. You mentioned attention, um, heads and maps, uh, embeddings. Nobody mentioned embeddings, but you could actually visualize embeddings and see does it make sense to you. Are these em-- you know, tokens close to each other as you would expect, meaning the, the model's mental understanding of language is correct. Um, and then neuron-level behaviors, although that's really hard with a large model, um, and nobody has quite figured it out yet. Um, and then the other category might be data and distribution. Maybe, you know, the actual, um, uh, benchmark that we're looking at has been contaminated, meaning, you know, the model is just not-- Either it's doing too well on that benchmark, or it doesn't mean anything, or it's doing poorly for a certain reason because the data distribution used in the test set is completely different from the training or validation set. Um, and then, you know, it might be failing at different levels. You can run benchmarks on the language model. You can run benchmarks on the agentic workflow that is using that language model. And because you want the language model to be used in agentic workflow, those are two levels that you need to inspect. 
So for example, when a, a frontier lab says, um, "Our model is doing really well for tool use," what they mean is the language model has been tested on upstream tasks in a workflow, and it's actually good at tool use against their benchmarks. So those are different levels of capability analysis. Okay, so let's talk about, uh, convolutions. We're gonna dive deep inside convolutions, and then we'll go back up and look at frontier models. Okay? So, um, first case study, uh, for convolutions. Let's say that you have built an animal classifier for
15:02 – 19:55
Zoo CNN case study: building trust with input–output explanations
1. KKKian Katanforoosh
  a zoo, and they are very reluctant to use your model, um, without any human supervising because they don't understand the decision-making process of the model. How can you alleviate their concerns? How can you give them intuition about the decision-making process of the model so that they feel like, "Ah, the model's doing m-things that feel natural and, and human"? So let's say, just to simplify, let's say you have a, a convolution neural network, and there's a softmax layer, and it's supposed to identify animals. So the number of classes are many animals. Yeah. If you were to write a quick Python code to give them some intuition, how would you do it? Yeah.
2. SPSpeaker
  I think like first with a softmax, like how we're gonna eventually get, like, the end result and sort of like probability of like what each animal is. And then the next thing I'll explain is how the, the CNN, uh, each layer of our CNN is getting higher, more of in-depth, I guess, features of the image that we are put-- of the animal we're showing.
3. KKKian Katanforoosh
  Okay.
4. SPSpeaker
  And then maybe if I have time, I can try and, like, do some sort of like, um, use different images to try and figure out what layers in CNNs are getting good or bad.
5. KKKian Katanforoosh
  Okay.
6. SPSpeaker
  But that would, that would all be necessary.
7. KKKian Katanforoosh
  Good. Good. Good. So just to recap, um, the zoo is not AI native, so you have to explain certain things. You're gonna tell them what softmax is, so we're gonna have a probability for each animal classes. That's how it works. Um, and on top of that, you also mentioned you might talk about convolutions and say, "Here are how features are identified. Here's how a filter scans through the image, and, you know, we're expecting this to learn." So you are gonna educate them first. That's totally right. The second thing you mentioned is maybe you run, um, a dataset search, so you can try to build their intuition by showing them animal pictures and showing that the model's doing well. And yeah, I agree, those are good approaches. We're gonna see how to do a proper dataset search large scale. Uh, but what else can you do, uh, that's gonna give a little bit more confidence? 'Cause this is explanation, but it's not proof that the model is looking at the right place systematically. Yeah.
8. SPSpeaker
  Give space to one of the models and demonstrate those legs and the hood or something like that and show all the small pictures that help with-
9. KKKian Katanforoosh
  So you say y-y-- I-ideally, you would give them intuition at a fil-- at a, at a filter level. Like this filter, we know it's responsible for finding the legs of an animal. That's what you're saying. So h-how would you do that?
10. SPSpeaker
  Uh, maybe, I don't know. Is it not as simple as just like creating the, um, like the others, like-
11. KKKian Katanforoosh
  You know, good, you know, intuition. You're asking is it as simple as just printing out the weights of-- that are identified or the feature map that results of that filter? Um, unfortunately not, usually, because that might be true for the first layer. But as you get deeper, things mix up so much that, you know, if you were just to print the, uh, filter, it wouldn't make, um, any sense pretty much. Um, but there are other methods that we're gonna see. So your intuition is right. We're, we're gonna try to give them that. Something simpler, input-output relationship. How would you show that, um, that the output is actually related to the right portions of the input for this dog, for example? Yeah.
12. SPSpeaker
  Like recall [inaudible] information [inaudible]
13. KKKian Katanforoosh
  Yeah, confusion matrix across a lot of data, you would find true positives, false, et cetera. Yeah, correct. Something else?
14. SPSpeaker
  Yeah. Can you have a kind of like a
15. KKKian Katanforoosh
  Mm-hmm
16. SPSpeaker
  ... [inaudible] layer of the network and then show like instances where you make like different layers. It shows how, you know, from the first layer it's taking a little bit of a [inaudible]
17. KKKian Katanforoosh
  So similar to what he said with, um, you wanna give them intuition from the inner workings of the network, and you're saying, "How about we mask the latter parts of the network, and we treat every intermediary layer as an output and analyze if the output makes sense?" Yeah, we're gonna look at that actually. Um, yeah, those are more advanced. What I was looking for is even more, more basic, much more basic. It's, uh, if you wanna show the, uh, relationship between an input and an output of a CNN, uh, or, you know, any vision model,
19:55 – 24:52
Saliency maps and integrated gradients: pixel attribution for CNN decisions
1. KKKian Katanforoosh
  um, you might take the score of the dog, uh, in the output layer. Okay? And what is exactly this quantity? What, what, what is the intuition, your intuition for what this quantity means? If you take the derivative of the score of, uh, an animal class with respect to X, with X being the input image. Yeah. What, no.
2. SPSpeaker
  How the score of dog varies with [inaudible]
3. KKKian Katanforoosh
  Yeah, how does the score of dog change when you move pixels around? Which is what you want, right? You wanna be able to... If you can do that, you would indicate that which are the pixels of the image, that if we change them, it changes the score of dog. If you can print that, then you would be able to show this is where the model is looking at when it's predicting a dog, right? So yeah, if you actually calculate this derivative, um, you would get something like that, where some of the pixels are gonna be brighter, meaning their gradient is higher, and some of the pixels are gonna be darker, meaning we moved that pixel, and it didn't modify the score of dog at all. That's a very quick way to look at which pixels in the input were relevant for the score of dog. Uh, now, why should we select the score of dog pre-softmax versus post-softmax? It's usually a very common mis- misconception. Yeah.
4. SPSpeaker
  Uh, it's pre-softmax [inaudible] scaled [inaudible]
5. KKKian Katanforoosh
  So what's the issue with the scaled version?
6. SPSpeaker
  Uh, it's more representative of the score.
7. KKKian Katanforoosh
  Class of dog, yeah. So, uh, what, what you said is, uh, the, the post-softmax score is not only dependent on dog, it's also dependent on all the other scores. So you could actually, um, take a pixel, move it, and it happens to modify the score of a panda because there's a panda in the background or something, and it would influence what you're trying to show. Uh, but you're only looking at dog. You just want the score of dog to be influenced. So that's why in this method called saliency maps, we use the pre-softmax scores that is only representative of the class at hand that you're analyzing. Okay? So you could do that, and actually if you were to, i- in the past, not anymore, but you could use that for a quick segmentation just sanity check, because the pixels that are brighter, the gradients that are brighter are representatives of the pixels that should be overlaid on the dog. And in fact, if you do the saliency maps and you realize that the pixels that are bright when you compute that gradient are all over the place, it's probably that the model is not even looking at the right place. It's just getting lucky. Okay? So this first method, saliency maps, now have that in your toolkit. Very easy to implement, right? You just write a Python script. You, you perform the gradients calculation. You print it. It's a matrix of pixels that are brighter or darker, and you're done. Um, one of the main, uh, issues with saliency maps is that it's looking at just the pixel level, um, which doesn't make too much sense if you wanna interpret semantically where the model is looking at. You know, the model will never see a cat or a dog with one [chuckles] pixel being different than the rest. It would be too discontinuous. So instead, there's another method, I'm not gonna go into the detail, but I linked the paper, which is way more common, called integrated gradients. 
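The pre-softmax versus post-softmax point can be made concrete with a toy example. The sketch below is not a CNN: it assumes a hypothetical linear "model" over four pixels with two classes, and it estimates the gradient by finite differences rather than backpropagation. All names and weights are invented for illustration. Pixel 3 only feeds the panda score, so it should not matter for dog.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-in for a CNN: one linear layer, 4 "pixels" -> 2 class scores.
WEIGHTS = {
    "dog": [2.0, 1.0, 0.0, 0.0],
    "panda": [0.0, 0.0, 0.0, 3.0],  # only pixel 3 matters for panda
}
CLASSES = list(WEIGHTS)


def scores(pixels):
    """Pre-softmax class scores."""
    return [sum(w * p for w, p in zip(WEIGHTS[c], pixels)) for c in CLASSES]


def saliency(pixels, target, post_softmax=False, eps=1e-5):
    """|d output[target] / d pixel| for every pixel, by central differences."""
    idx = CLASSES.index(target)

    def output(x):
        s = scores(x)
        return softmax(s)[idx] if post_softmax else s[idx]

    grads = []
    for i in range(len(pixels)):
        up = list(pixels)
        up[i] += eps
        down = list(pixels)
        down[i] -= eps
        grads.append(abs(output(up) - output(down)) / (2 * eps))
    return grads


image = [0.5, 0.5, 0.5, 0.5]
pre = saliency(image, "dog")
post = saliency(image, "dog", post_softmax=True)
print("pre-softmax :", [round(g, 3) for g in pre])
print("post-softmax:", [round(g, 3) for g in post])
```

With the pre-softmax score, the panda pixel gets zero saliency for dog. With the post-softmax probability, the same pixel becomes the brightest one, because raising the panda score lowers the dog probability. That is the contamination the lecture warns about.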
  Um, and integrated gradients, uh, the idea is that instead of doing that directly by taking the DS of dog over DX, we're gonna take an image of the animal, and we're gonna, uh, generate, you know, sort of many pictures that are, uh, taking, uh, dark, completely black zeros, um, a pixel all the way to the animal, um, the final image. And then we're going to look at the path of gradients across all of this update, and it's going to be way more interpretive. I'm not gonna go into the details, but integrated gradients is just an extension of saliency maps that happens to use a different formula with an integration and is way more common. Um, if you look at it practically, this is an example from the medical field. Here is, uh, an image of a retina. And, um, if you perform the integrated gradients, you would see that, uh, the original, um, image, uh, the, the no- the, the annotation for the les- lesions are exactly where the model is looking at when it's giving you a probability that there is a lesion. Okay? So second method called integrated gradients. Let's push it a little further.
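The lecture skips the details of integrated gradients, so the following is only a rough sketch of the idea as described: interpolate from an all-black baseline to the real image, average the gradients along that path, and scale by the difference between image and baseline. The toy `dog_score` function, its weights, the midpoint approximation of the integral, and the finite-difference gradient are all assumptions made for this example.

```python
import math

# Hypothetical weights of a toy scoring function; the third pixel is ignored.
W = [2.0, 1.0, 0.0]


def dog_score(pixels):
    """Toy stand-in for a network's dog score (saturates for large inputs)."""
    return math.tanh(sum(w * p for w, p in zip(W, pixels)))


def gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx_i for every input i."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, image, baseline=None, steps=50):
    """Average the gradient along the straight path baseline -> image,
    then multiply by (image - baseline), pixel by pixel."""
    if baseline is None:
        baseline = [0.0] * len(image)  # all-black image
    total = [0.0] * len(image)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each path segment
        point = [b + alpha * (x - b) for x, b in zip(image, baseline)]
        for i, g in enumerate(gradient(f, point)):
            total[i] += g
    return [(x - b) * t / steps for x, b, t in zip(image, baseline, total)]


image = [1.0, 1.0, 1.0]
attributions = integrated_gradients(dog_score, image)
print("attributions:", [round(a, 3) for a in attributions])
print("sum of attributions:", round(sum(attributions), 3))
print("score(image) - score(baseline):",
      round(dog_score(image) - dog_score([0.0] * 3), 3))
```

A useful sanity check is that the attributions add up, approximately, to the change in score between the baseline and the image, which is why the last two printed numbers should agree.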
24:52 – 28:27
Occlusion sensitivity: masking patches to test what regions matter
1. KKKian Katanforoosh
  Um, the next case study is, uh, that, you know, you, you, you, you, you wanna now, uh, tell them a little more about the decision process, um, of the model, uh, with, um... I guess, I guess, let me, let me rephrase. The saliency maps looked at the pixel level. Um, what, what you can do in order to give a better intuition, uh, which was mentioned earlier, is another approach, uh, called occlusion sensitivity, which is, um, actually way more, uh, intuitive and simple, where you could actually, uh, you know, take the dog image and paste it into the CNN, and you would get a score of a dog. You could also overlay a dark square, so zero out or mask partially the input image and give it to the same CNN and track the modifications on the score of dog that you're tracking. If you actually do that, you can plot a probability map of how is the score of dog changing as I move the dark square through the image. So let's do it together. I'm gonna say that, you know, th- this one, when you, when you actually put the dark square on the top left of the image, the score of dog is unchanged. It's still very high. Now you, uh, move, uh, the square a little bit to the right, and you see that, uh, it's still very high, the score of dog. You do it again, still very high. Now, uh, the square is partially occluding the face of the dog, and you should see, uh, the score of dog drop if the model is in fact looking at the dog. And you perform that many times, so you scan, uh, through the image with your dark square, and you plot what we call, um, you know, the, uh, the probability map of the true class for different positions of the gray square. Does that make sense? So pretty simple. Computationally expensive, though. Just have to rerun [laughs] the image so many times through the model. Uh, here are practical examples, um, to look at. 
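The scanning procedure just described can be sketched as follows. This is an illustration under stated assumptions: `toy_dog_score` is a made-up stand-in for the CNN (it just measures brightness in the centre of a 6x6 image), and the patch size, stride, and fill value are arbitrary choices.

```python
def occlusion_map(image, score_fn, patch=2, stride=2, fill=0.0):
    """Slide a square mask over the image and record the model's score
    for each mask position. One forward pass per position."""
    height, width = len(image), len(image[0])
    heatmap = []
    for top in range(0, height - patch + 1, stride):
        row = []
        for left in range(0, width - patch + 1, stride):
            occluded = [list(r) for r in image]  # copy, keep the original intact
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    occluded[y][x] = fill
            row.append(score_fn(occluded))
        heatmap.append(row)
    return heatmap


def toy_dog_score(image):
    """Hypothetical stand-in for the CNN: the 'dog' lives in the central
    2x2 block, and the score is that block's mean brightness."""
    centre = [image[y][x] for y in (2, 3) for x in (2, 3)]
    return sum(centre) / len(centre)


image = [[1.0] * 6 for _ in range(6)]
heatmap = occlusion_map(image, toy_dog_score)
for row in heatmap:
    print(row)
```

The score stays at 1.0 everywhere except when the mask covers the centre, where it drops to 0.0. The nested loops also show where the cost comes from: the model is rerun once for every mask position.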
The first one, uh, the true label is a Pomeranian, cute dog, and you see that the model is, um, failing to recognize the true class when the square is overlapping with sort of the center of the face. Which makes sense, because here the true class that we're tracking is not dog, it's Pomeranian, and I could see how if you occlude the face, it's hard to get the breed of the dog. The second example, the true label is a, a car wheel. Sorry, I hadn't shown you that. Uh, the true label is a car wheel, and you can see that when the square is on the wheel, um, it is, um, it is in fact dropping in terms of the true class probability. Um, and then finally, the Afghan Hound. What's interesting about that third example is the probability is dropping when the square is on the dog, but it's also increasing, uh, when the square is on the face of the human on the left. Which means that if you actually occlude the, the face of the human, the model thinks even more that the true class is in fact an Afghan Hound. You're just removing additional unnecessary information for it to discover the true class. So this model seems to be doing well, right? It seems to be looking in the right place. We call that, uh, occlusion sensitivity. Pretty simple. Another tool with saliency map and integrated gradients in
28:27 – 36:57
Real-time localization via CAM/Grad-CAM: fixing the interpretability bottleneck
1. KKKian Katanforoosh
  your toolkit. Let's push it slightly further. Um, here, uh, here we're, we're given, um, you know, along with the classification output, we-- the zoo wants a real-time visualization of the model's decision process, and you have one day to show that, and we're talking about convolutions again. What do you do? So the important part is to know, uh, since, uh, the, the methods that we've seen so far are sort of post, uh, methods where you analyze the output or you show something. Here, we're looking at a so- sort of ideally a, a module that we could plug in our network that would constantly give us the decision-making process of the network, or at least where it's looking at. How would you do this? This is our network, by the way. We're taking an input, we're adding zero padding, and we have a series of Conv, ReLU, max pool blocks. And then at the end, we flatten. We have a triple fully connected layer, a softmax, and we get our probability output for classification. So ju- just first question, where do you think is the weakness of this network when it comes to interpreting where the model is looking at on a picture? A part of this architecture is very-- makes interpreting, uh, the network way harder. Yeah.
2. SPSpeaker
  The fully connected layer at the end.
3. KKKian Katanforoosh
  Why?
4. SPSpeaker
  Because, um, if you look at all the, like, pixels at once, I guess if you wanna call the pixels in the last convolutional layer, uh, and there's three of them, so it's like three layers of abstraction. So what you get is [inaudible]
5. KKKian Katanforoosh
  Yeah, totally right. The fully connected layers, you're looking at all the pixels at the same time, you're mixing everything, and you're doing it three times. So by the end of those three layers, the information has been s- mixed together, essentially. You do not find the localized information that you had pre that with the max pools and the conv, conv layers. So how could you change that layer in order to avoid that? How would you modify your network if you wanted to retain maybe the, the, the performance of the model, but not lose that, uh, localized information?
6. SPSpeaker
  Um, I guess, could you do one layer instead of three overlay, but it might be... I don't know actually how many times one layer [inaudible]
7. KKKian Katanforoosh
  Yeah, good idea. How, how could we-- Instead of doing three, can we do one? Can we, you know, still have a layer that makes the, uh, you know, makes the interpretation easy? Actually, there's another trick, uh, which we're gonna see, but it's similar to what you desc- de- describe. Let's say we convert this network into, uh, something where the, you know, the flattening of the pixels and the fully connected layers are converted into a single global average pooling layer and a fully connected layer. So here, we reduce from three to one, the fully connected layers. We still need our fully connected layers and our softmax because it's a classification task, and we want a good decision engine at the end. Uh, but we converted-- we, we added a global average pooling. So let me explain why, why this is, uh, better. So the last conv block essentially is giving us a volume. For the sake of simplicity, let's say that volume is a four by four, uh, with six channels. Okay? And I color-coded them for simplicity. So each of these channels is a feature map that is resulting from a filter being scanned through the previous input, right? Everybody's clear on that? So global average pooling is gonna take each of these channels, feature maps, and is going to average them in a single number. So if you take the orange matrix and you average it, it gives you one of four point seven. You do the same thing with the green one, the blue one, all six of them, and you get a volume, or call it a vector of size six. One, one, six. So why is that interesting? Because we actually did not lose the localized information. Um, we did not mix things up. We just assigned a single number to a feature map that we retained. So the localized information is still there on the previous volume. And now we can treat that as a vector that goes through a decision engine or a fully connected layer that ultimately goes through our softmax and give us the probabilities. 
So this architecture is easier to trace back to localized information in the input space because you can actually look at, let's say, one of the score of dog, and you can look at the weights of each of these edges that tell you how much has the feature map from the volume before contributed to that score. So in other words, if I had to summarize, let's say the feature map looks like this. So this feature map is very high, has somehow activated heavily in some portion of the input image. The others similarly have activated to other things. You're taking the weights from your fully connected layer. By the way, you have to retrain that layer. You have to just train that layer, that last one. And then you sum all of them, and it gives you what we call a class activation map for the class that you're visualizing. So you're overlaying those last six feature maps, and you're weighing them with the weights of the last fully connected layer. You're not losing information. You're not mixing three fully connected layers that are impossible to trace back to the input space. Okay. So if you now give it an input image, and you overlay the class activation map for the score of dog, which you can do for other classes. You can do the same thing for the class of cat. Look at the different weights, the feature maps. And maybe for the class of cat, the, the weights will, will certainly be different, so the contribution of each feature map will be different. Yeah, and this is what you get. This is called class activation map, um, and it's from folks over at Berkeley. And so here's a video that describes it. Uh, it runs really quickly. You can think of it as a slight modification to a, a vision network that can allow you to unpack what's happening inside and what's the decision process. There is also, um, sort of an improvement to the CAM or class activation map algorithm called Grad-CAM, which enhances that method. Okay. Any questions on class activation maps? 
So we're getting to know convolutions a little better. Yeah.
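To make the class activation map computation concrete, here is a minimal NumPy sketch. It assumes the four by four by six volume from the example above; the feature-map values, the three-class weight matrix, and the name `class_activation_map` are all hypothetical stand-ins, not values from the lecture slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of the last conv block: 6 feature maps of size 4x4.
feature_maps = rng.random((6, 4, 4))

# Hypothetical weights of the single fully connected layer: 3 classes x 6 channels.
fc_weights = rng.standard_normal((3, 6))

# Global average pooling: one number per feature map, giving a vector of size 6.
pooled = feature_maps.mean(axis=(1, 2))

# Pre-softmax class scores computed from the pooled vector (bias omitted).
scores = fc_weights @ pooled


def class_activation_map(class_index):
    """Weighted sum of the feature maps using that class's FC weights (4x4 map)."""
    return np.tensordot(fc_weights[class_index], feature_maps, axes=1)


cam_dog = class_activation_map(0)  # class 0 plays the role of "dog" here
print(cam_dog.shape)
```

One property worth noticing in this sketch: averaging the class activation map over its spatial positions gives back the class score, which is why the map can be read as a spatial breakdown of that score. In practice the map would be upsampled to the input resolution before being overlaid on the image.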
8. SPSpeaker
  I have a question. Like, in the video, it seems like it highlights unrelated things all the time. Like, it creates the [inaudible]
9. KKKian Katanforoosh
  Yeah. So you were saying in the video, it seems like the model sometimes is looking at, uh, meaningless things. Yeah, I mean, it's not surprising, frankly. This is the previous generation of models. Um, and on top of that, you're looking at a video. The model is a classification network, so it might look sometimes at things that are not even labeled, and so it has to find the closest one. It might not make sense at all. That's why you build that type of a module to visualize and understand, like, the network's actually not working that well. No. Yeah. But maybe on the main objects that you actually want for your task, let's say the zoo wants to do very well with cats and dogs, you can verify that when a cat is moving even at fast speed, the model's [chuckles] quickly looking at it, you know. Super. Uh, let's do a couple more methods, uh, because it's gonna build our intuition, um, for frontier models. Um, so now the zoo trusts you. It trusts that the model correctly locates
36:57 – 42:56
Querying what the model “thinks”: class model visualization via gradient ascent
1. KKKian Katanforoosh
  animals, but they get sor-sort of scared, and they wonder if the model understand what a dog is. Like, does it understand actually what a dog is, or is it just, like, pattern matching random things? Um, you know, how could you, um, take this ConvNet and sort of query what the dog-- what the model thinks a dog is? How would you do that? Do you-- How could you ask the model what's your best representation of a dog?
2. SPSpeaker
  It'd be like trying to reverse engineer, or get a, the image of that maximize the probability of the dog.
3. KKKian Katanforoosh
  Okay. Yeah. You did say two thing. So get an image, so a forged image, that maximizes the probability of dog. Yeah. Let's do that, actually. And then on your second point, um, um, about, uh, reverse engineering, we're gonna look at the method there as well. But yeah, I agree. You could, um... So how would you concretely do that? Like, what would you maximize?
4. SPSpeaker
  The softmax output.
5. KKKian Katanforoosh
  Okay. Actually, what we said earlier, but I, I, I think you came right after that, the... Is we would not take the softmax output because the softmax output is dependent on other classes. You divide by the sum of the exponentials of other classes. And so you could actually maximize the softmax output by not maximizing the class you want, but by, uh, minimizing the other classes, which is different than what you want. So, um, here is what we'll do. We'll define the loss function where we take the pre-softmax score of dog, so the thing right before the softmax, which is only dependent on that specific class, and we might also regularize it to make sure it looks natural. The reason we want the regularization term is because pixels need to be between zero and two fifty-five, roughly. And so you don't want to run an optimization, uh, a problem where pixels can have values that go all over the place. It's just not gonna look good to the human eye. Um, and so we're gonna do that. We're gonna run a gradient ascent algorithm, so similar to what we've seen in some of the previous classes, um, where we are gonna update the pixels of an input image, a completely random input image, until we can maximize the loss function we defined. Um, so we forward propagate the random image, we compute the objective, we back propagate, um, all the way back to the pixels, and then we update the pixels to maximize that objective. And we do that many times until, uh, we end up with something that might look like this. So let's say we take the score of a Dalmatian. Um, you know, here, um, researchers, um, and, and, uh, showed, and this is work for Jason-- from Jason Yosinski, um, shows that you can start seeing the model. If you ask the model what is a Dalmatian, it will tell you it's something with black dots on a white background, roughly. So actually might not understand fully what a dog is, but it-- that's what it thinks it is. 
So we just unpacked it a little bit and queried, uh, queried that. A-another interesting one is if you look at, um, the goose, so here, the, the, the top left, um, label. A goose for the model is many of them. What does that mean? It means probably the model has seen a bunch of geese all the time together and has rarely seen a single one. And maybe the labeled data was labeling that as goose when it was geese, and so the model actually doesn't understand that it's a single one. It thinks that all of them are the label. Does that make sense? Okay. Super. So that's called class model visualization. You can actually-- Oh, sorry, I wasn't showing what I was talking about. Or no, I was showing. The, um, uh, the, uh, the, the way to improve those visualizations is just to change some of the regularization methods. So the researchers have shown that you can actually, um, add more color by regularizing better, so it looks better to the human eye, and then it becomes easier to query the model for a variety of classes, just to make sure that it understood those classes. And so same with flamingos. Um, actually, the label flamingo to the model feels like many flamingos. Just something you can observe. Any questions on, uh, class model visualization? Nothing s-super new, just another, um, you know, tool in your kit. It turns out you can apply the same, uh, type of method as class model visualization, but instead of doing it at the class level, uh, you do it in, um, an intermediary activation. So you could actually do the same exercise, sort of what you were saying earlier with the masking of the later layer and just looking inside the network. You could pick an activation in the network, create an objective function with the regularization and say, "Hey, show me the input picture that maximizes this activation." And that should tell you what is the input that maximizes that activation the most, right? Okay.
So that's class model visualization, which can also be applied with gradient ascent anywhere inside the network at any neuron.
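The gradient ascent loop described above can be sketched on a toy problem. This is only an illustration under strong assumptions: the "network" is a single hypothetical linear map `w_dog` producing the pre-softmax dog score, so the gradient can be written by hand, and the regularizer is a plain L2 penalty. With a real ConvNet, the gradient with respect to the pixels would come from backpropagation instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the network: pre-softmax "dog" score = w_dog . x
w_dog = rng.standard_normal(64)
lam = 0.5   # strength of the L2 regularizer (assumed value)
lr = 0.1    # gradient ascent step size (assumed value)


def objective(x):
    # Pre-softmax score minus an L2 penalty that keeps pixel values bounded.
    return w_dog @ x - lam * np.sum(x ** 2)


def gradient(x):
    # Hand-derived gradient of the objective with respect to the input "pixels".
    return w_dog - 2.0 * lam * x


x = 0.01 * rng.standard_normal(64)   # start from a random input image
start_value = objective(x)

for _ in range(200):
    x = x + lr * gradient(x)         # ascent: step along the gradient

print(start_value, objective(x))
```

For this toy objective the loop converges to `w_dog / (2 * lam)`: the input ends up looking like the pattern the score responds to, with the regularizer controlling its magnitude. The same loop applies to an intermediate activation by swapping the score for that activation.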
42:56 – 48:45
Dataset search for interpretability: top activating examples per filter and receptive fields
1. KKKian Katanforoosh
  And that already gives you some sort of a, a method to look at the neuron level and say, "Hey, what's the input that maximizes-- the fake input that theoretically you could generate that maximizes that activation?" Um, the next method is actually the most commonly used, um, today because it's so simple and intuitive. It's a dataset search. So what you could actually do is to pick a filter. You, you, you just pick one filter, you pick its feature map among, you know... I guess you pick one feature map at some point in the network, such as at, at, at, after this max pooling layer. And, um, so let's say you have two hundred and fifty-six filters in that convolution layer. So you have two hundred and fifty-six feature maps of size five by five. Um, and you, um, find across all your data, your validation set, the top five image that, uh, maximize this feature map. Yeah. So you just track the activations in that feature map. You find the highest activation across all your data, and you find the top im-- the top five images, and you can do that, you know. Again, you would say this seems that the filter, um, that produced that feature map has learned to detect shirts. Because if you find the top five images that activated that feature map the most, that filter the most, uh, it's all images of shirts. If you were to find something like this, you would say it seems that the filter has learned to detect edges. And you could do that across every feature map to interpret your filters. Okay. Simple dataset search that can allow you to interpret if a, if a filter is reacting to something meaningful. So if you look at these pictures that I printed at the bottom of the slide, they're all cropped. So why are they cropped? They don't look like images from the dataset. The, the image is probably bigger than that, right?
2. SPSpeaker
  The body.
3. KKKian Katanforoosh
  Hmm?
4. SPSpeaker
  Because of the body.
5. KKKian Katanforoosh
  The, the what?
6. SPSpeaker
  The body. Like, like you have to find the feature of the body [inaudible] the body.
7. KKKian Katanforoosh
  So, so we took-- we-- So we took the input image, we send it through the Conv ReLU blocks, and then we pick a feature map in the fifth block, let's say. What is this feature map looking at? Is it looking at the whole image or no? Sorry. What is the... So we pick a feature map, and in that feature map, we find the activation that's the highest. So let's say the activation is the row number five, column number three. If you pick that activation, does it have access to the entire input image or not? Mm?
8. SPSpeaker
  Uh, not necessarily, 'cause it's more focused on, like, the, like, what is actually, like... It's closer to what it's actually predicting, right? So it's like at some-- Like, your concerns are actually getting to your right answer. So that's why you're basically seeing where the, like, where the, uh, shape itself is right now, necessarily.
9. KKKian Katanforoosh
  Yeah. So you don't necessarily have access to the entire image. The best way to visualize it is at the first layer. Let's say on the first layer, you take the input image, you take a filter, and you run it through. Well, that activation in the feature map of the first layer is only going to see what the filter sees, right? When you go deeper in the network, do you see more or less on average of the image? A single activation, does it have access to more or less parts of the image? I'm saying more. Yeah, it's more. Le-let's look at it. So here's a picture, sixty-four by sixty-four by three, let's say. Um, we have a conv network, five layers. Um, and after five layers, uh, the last conv has thirteen filters and leads to a thirteen by thirteen... Sorry. It has two hundred and fifty-six filters that leads to a thir-thirteen by thirteen feature maps. Um, if I look at this feature map, let's say that's the most activated, um, I trace back to the input space, it will have access to, uh, this part of the image, let's say. Now, if the-- there was another Conv ReLU block, uh, and I was looking at a feature map, it would have even more abstraction of multiple portions of the square. So it would actually see slightly more from the input image. Does that make sense? So, you know, that, that's, that's, that's how you would think about it. And it makes sense because at the end of it, the last output has access to the entire image, right? 'Cause all these things are, are adding up, um, to the prediction. Okay. So the deeper the activation, the more it sees from the image, and that's why the images were cropped. They were just cropped based on tracing back what that activation was looking at in the input image, which you can do very simply, um, computationally. Okay.
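Both ideas in this section, the dataset search and the growing receptive field, can be sketched in a few lines of NumPy. The activations below are random placeholders, and the layer list of (kernel size, stride) pairs is a made-up stack rather than the network on the slide; the helper names `top_k_images` and `receptive_field` are hypothetical.

```python
import numpy as np


def top_k_images(feature_maps, k=5):
    """Indices of the k images whose feature map has the highest peak activation.

    feature_maps: array of shape (num_images, H, W), the activations of ONE
    filter collected over a dataset.
    """
    peaks = feature_maps.reshape(len(feature_maps), -1).max(axis=1)
    return np.argsort(peaks)[::-1][:k]


def receptive_field(layers):
    """Width (in input pixels) seen by one activation after the given layers.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    """
    size, jump = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * jump   # each layer widens the view
        jump *= stride                # strides make later layers widen it faster
    return size


rng = np.random.default_rng(0)
acts = rng.random((100, 5, 5))        # placeholder activations for 100 images
acts[42, 2, 3] = 10.0                 # plant one very strong activation
best = top_k_images(acts)

rf_shallow = receptive_field([(3, 1)])
rf_deep = receptive_field([(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)])
print(best, rf_shallow, rf_deep)
```

In this sketch a single first-layer activation sees 3 input pixels across, while one after the fifth layer sees 18, which is the sense in which deeper activations lead to larger crops.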
48:45 – 1:05:27
Reverse engineering CNN activations with deconvolution (transposed convolution) and unpooling
1. KKKian Katanforoosh
  So now we're gonna look at our last method for convs, which has to do with reverse engineering a conv, and then we'll move to the frontier models. Um, so remember this slide from when we introduced, uh, generative adversarial networks, GANs? Um, I didn't talk too much about the generator architecture. I just said it was a neural network. But I did mention that something's weird about that network, which is that the input is way smaller than its output. The input is a vector Z, the output is an image of more dimensions. Um, it is very common that such a network needs to upsample and thus would use a deconvolution. Sometimes in the literature, you're gonna see mentions of deconvolutions as an upsampling network, where the input is smaller than the output. Um, sometimes those are called transposed convolutions, but we're gonna talk about why. Another example where you might run into upsampling is when you have an encoder, decoder type network, such as for segmentation. So let's say you're given an image of a cellular-- a s- a set of cells like this, and then you wanna label, segment the pixels that belong to a cell just to find the different cells. Uh, typically, you would use a, a set of convolutions that reduces the volume in height and width, and then you'll get information encoded in a dense format that you will then upsample because your output should be of the size of the input minus the number of channels. But at least every pixel should have its own class. And so typically, that would be a set of convolutions followed by deconvolutions. So you downsample, you upsample. Um, why am I talking about deconvolutions? Because we're going to try to reverse engineer, uh, conv networks by adding a module, a deconvolutional module that will take a specific activation, and we'll reverse engineer the trace to verify what was the reason this activation was high. Okay?
This idea is key, not only for convs, but for any network you'll think about, uh, uh, in the future when you work on it, if you wanna actually reverse engineer, uh, a spe-- the, the reason a specific neuron has been active. So let's take the example of a 1D convolution. This is the most, uh, uh, basic example, just for the, the sake of understanding the math. Um, and I give you an input X, which has, um, some padding with two zeros at the top, two zeros at the bottom, and then X one through X eight values, and I send that input to a 1D convolution, which has one single filter of size four with a stride of two. What's the size of my output Y? Remember the formula. What?
2. SPSpeaker
  Five.
3. KKKian Katanforoosh
  Five. Correct. Yes. I assume you, uh, you did this, um, this. So, you know, you took the-- an X, you applied the formula, and you, you floored it. You didn't forget to add one, um, and then you ended up with five. Is that it? Yeah. Correct. Super. So we have our output of five, and let's say we define our filter of size four, W one, W two, W three, W four. Um, it's actually easy to see that the conv 1D can be written as a system of equations, where Y one equals W one times zero plus W two times zero, because of the padding, plus W three times X one, plus W two times X-- uh, W four times X two. Right? You're just overlaying the filter on top of the first four indices of the inputs, and you're doing a dot product. Same thing with the second one, third one, all the way to the fifth one, and that's your system of equations that describes this conv 1D. Now, because it's a system of equations, you could actually write it as a matrix multiplication. So you could say the conv 1D is literally just a weight matrix that we multiply by the input. So the input and the output sizes we know, twelve by one and five by one. So the weight matrix is necessarily a five by twelve weight matrix. So we just rewrote the conv 1D as a single weight matrix that you multiply by the input, you get the output. If you were to, you know, draw this matrix, um, this is what it would look like. So it would be a matrix with values all along the diagonal, and the rest are zeros. So that's our conclusion. Conv 1D can be rewritten as a matrix vector multiplication. Everybody follows? So if you can write it as a matrix multiplication, remember, we're talking about reverse engineering, then I could say, um, deconv. It's possible. It's possible to reverse engineer that, and in fact, I'm gonna make a very, uh, big assumption that is not always true. Sometimes it's true, but for practical reasons, we're in deep learning, right? It's an engineering field.
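The rewrite of the 1D convolution as a matrix multiplication can be checked numerically. This is a small sketch with made-up numbers: the filter values 1 through 4 and the inputs X one through X eight set to 1 through 8 are placeholders, and `conv1d_matrix` is a hypothetical helper name.

```python
import numpy as np


def conv1d_matrix(w, input_len, stride):
    """Build the weight matrix W such that W @ x equals the strided 1D conv."""
    k = len(w)
    out_len = (input_len - k) // stride + 1   # floor, then add one
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        # Row i holds the filter, shifted right by `stride` for each new row.
        W[i, i * stride : i * stride + k] = w
    return W


w = np.array([1.0, 2.0, 3.0, 4.0])            # placeholder W one .. W four
x = np.concatenate([np.zeros(2), np.arange(1.0, 9.0), np.zeros(2)])  # length 12

W = conv1d_matrix(w, len(x), stride=2)
y_matrix = W @ x

# Reference: slide the filter over the input and take dot products.
y_sliding = np.array([w @ x[i : i + 4] for i in range(0, len(x) - 3, 2)])

print(W.shape)      # five by twelve
print(y_matrix)
```

The first output is W three times X one plus W four times X two, because the first two filter weights land on the padding. Since the stride is two, the non-zero band in `W` moves two columns per row rather than sitting exactly on the diagonal.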
Um, we're gonna assume that W is invertible, um, and so you can find a matrix H that is equal to the inverse of W, um, such that X equals HY, so that you're able to in-- reverse engineer, uh, the signal. Uh, I'm gonna make a second assumption, um, in the sizes I printed, which is even bigger, is that W is not only invertible, it-it's also orthogonal, meaning that its inverse is, um, its transpose. It happens that it's sometimes true, and in fact, if you think about an edge detector, so let's say our filter is minus one, zero, zero, one, uh, zero, zero, zero, one. It's, it's a, it's an e- an edge detector, so I-I have one too many zeros, but it's an edge detector, um, and it's actually, um, uh, you know, invertible, and its inverse is its transpose. Um, and so that simplifies our reverse engineering because we have a conv one D, and we know that we can write it as a matrix vector multiplication, and we can transpose that matrix to reverse it. And maybe it's not always true, but it's true enough for it to work in deep learning, pretty much. You're gonna do that so many times, right? Um, that's why in the literature, oftentimes deconvolution are called transpose convolutions. I give you the one D example. The two D example is similar, it's just more complicated. There's more math, things mix up, but, you know, sim-same idea. Um, now, there is a trick that makes it simpler to code the deconvolution, um, and to see that trick, um, I just drew, um, X equals W transpose Y. So you have your X, which is a vector of size twelve, although there is two padding at the top or at the bottom. Um, and then I transpose the W matrix that I was showing you, um, and then I multiply that by my vector Y of size five. Um, it's ac-- this is actually a transpose convolution with stride two, is equivalent to something slightly different, which is a subpixel convolution of stride one half. It's a mathematical trick. 
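The equivalence between multiplying by W transpose and running a stride-one convolution with a flipped filter over a zero-inserted, padded input can be verified on the same toy setup. The filter and Y values are placeholders. In this example W is five by twelve, so it cannot have a true inverse; the transpose is used in the approximate sense described above.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0, 4.0])   # placeholder filter W one .. W four
k, stride, input_len = len(w), 2, 12
out_len = (input_len - k) // stride + 1

# Weight matrix of the forward strided convolution (five by twelve).
W = np.zeros((out_len, input_len))
for i in range(out_len):
    W[i, i * stride : i * stride + k] = w

y = np.array([1.0, -2.0, 0.5, 3.0, 4.0])   # placeholder output vector of size five

# Route 1: transposed convolution written as a matrix product.
x_transpose = W.T @ y

# Route 2: "subpixel" version.
z = np.zeros(stride * (len(y) - 1) + 1)
z[::stride] = y                      # insert zeros in between the values of y
zp = np.pad(z, k - 1)                # pad with k - 1 zeros on each side
flipped = w[::-1]                    # flip the filter
x_subpixel = np.array(
    [flipped @ zp[j : j + k] for j in range(len(zp) - k + 1)]  # stride one
)

print(np.allclose(x_transpose, x_subpixel))
```

Both routes give the same vector of size twelve, so the transposed convolution can reuse ordinary convolution code once the input has been zero-inserted and padded and the filter has been flipped.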
You can do it at home, but you would see that, uh, these two operations are equivalent, meaning you can actually, um, flip the filter. So you see in the right side, uh, in the left side of the screen, the filters are flipped. So if you look at the first row of my matrix, it's not W one through W four, it's W four through W one. So I flipped the filter, and I scanned it all the way through the diagonal. I also used another trick, which is my Y vector, I inserted zeros in between the values. It's called subpixel. So I inserted zeros, and I also added some padding. So a couple of tricks, but no, n-no need to, to remember it by heart. If, if there's anything you can remember, it's that implementing a deconvolution in the subpixel version I was describing is, is similar to a convolution. But what you do is you create a subpixel version of the input by adding zeros in between the values and padding it. You flip the filters, and you divide the stride by two. And that's what a deconvolution is. So if you have a convolution on your, on your n- on your neural network, and you wanna reverse engineer it, you take the filters, you flip them, you create a subpixel version of the input, you divide the stride by two, and you run the process. You will have reversed that convolution. The reason we're doing that is because we're just rewriting the convolution as another convolution, but the hyperparameters are different. But it's easy to code, right? You're just reusing the same code, pretty much. So anyway, let's get back to our example here. We have, um, an image of a dog. We run it through a conv net, um, and we pick at some point in that conv net a feature map. We pick one feature map only among the two hundred and fifty-six possible feature maps right here, and we're gonna look at the max activation of that feature map. So we find the max, let's say it's this one. So row two, column three is the maximum number of that feature map. We-- What does it mean? 
It means the filter that led to that feature map, when it looked at its input, it was maximal in that location. It's maximally shi-- uh, activated in that location. We zero out every other, um, entries of this matrix, and then we reverse the network. So we max pooled, we unpool. We ReLUed, we do the reverse. We do a deconv instead of the conv, which is a transpose convolution, subpixel version, flip, flip the filter, divide the stride by two. And we do that how many times? Three times, because we had three blocks. And then we should be able to reconstruct what this activation was maximally activated for, and we get the cropped version as we learned, the cropped part of the image that this activation was looking at, and exactly the pixels that maximize its value. Does that make sense? It's pretty complex, but it's important to know these methods because you might run into something similar in the future, or be asked to sort of interpret certain feature maps, certain activation maps, et cetera. Um, okay. So some additional details that we'll cover is what is unpool and why do we do some ReLU in there. So very simply, um, let's say I take the max pooling layer, I max pool this, filter size two by two, stride of two. Um, if you wanted to unpool this, how would you do it? Are pooling layers, max pooling layers, um, invertible? Mm? No. Why? Yeah.
4. SPSpeaker
  [unintelligible]
5. KKKian Katanforoosh
  Yeah, exactly. It's not invertible because if you pick, you're right, the, the six here on the top left, you can tell that the six was in one of these four, but you don't know where it was. So you can't invert. And it's very important to know where it was, right? Um, so it's not invertible, but you could actually use a trick to make it invertible, um, which is passing what we call switches. So during the forward propagation, you look at all your pooling, your max pooling, and you remember with the binary matrix, so very lightweight matrix, where the pooling happened, where the max pooling happened. And then when you're doing the unpooling, you remember those switches, so you keep them in memory, and you pass them back, and that should tell you where the value came from. Yeah. So that's what we mean by unpooling. Okay? So I go back to my previous map. The only thing I need to change to be able to reverse engineer my network is to pass the switches and the filters, by the way, because the deconv is just the flip version of the filter with subpixel and stride divided by two. And so I do that. So you can see, you can, you can literally invert your network here and trace back from one activation to the input space. Um, and then for ReLU is a little odd. I'm not gonna spend too much time on it because it's more empirical than nothing. But ReLU forward is essentially zeroing out every value that is negative during the forward path. Technically, a ReLU backward is impossible unless you have also the switches. If you have the switches, um, you could actually, um, you know, pass linearly back whatever was kept because it's the identity function. Uh, but, but actually that would kill your, your, uh, positive signal coming back. So instead, you just reuse ReLU, uh, basically. You, you re-use a ReLU because you want to start from the activation that is the highest on your feature map and to keep passing the positive signal back to the input space. 
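Here is a minimal sketch of max pooling with switches and the matching unpooling step, assuming a two by two window with stride two and an input whose height and width are divisible by the window size. The four by four input values and the function names are made up for illustration.

```python
import numpy as np


def max_pool_with_switches(a, size=2):
    """Max pool with stride == size, recording where each max came from."""
    h, w = a.shape
    pooled = np.zeros((h // size, w // size))
    switches = np.zeros_like(a, dtype=bool)   # lightweight binary matrix
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = a[i : i + size, j : j + size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i // size, j // size] = window[r, c]
            switches[i + r, j + c] = True
    return pooled, switches


def unpool(pooled, switches, size=2):
    """Put each pooled value back at its recorded location, zeros elsewhere."""
    upsampled = np.kron(pooled, np.ones((size, size)))
    return upsampled * switches


a = np.array([
    [1.0, 6.0, 2.0, 3.0],
    [4.0, 5.0, 7.0, 0.0],
    [3.0, 1.0, 2.0, 8.0],
    [0.0, 9.0, 4.0, 4.0],
])
pooled, switches = max_pool_with_switches(a)
restored = unpool(pooled, switches)
print(pooled)
print(restored)
```

Without the switches, the six in the top-left window could have come from any of four positions. With them, the unpooled map places it back at row zero, column one, and leaves the positions that were not the max at zero.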
Don't worry too much about it. It's just that ReLU is just passed as a ReLU during the reconstruction process, not as a proper ReLU backward. Okay. So here we go. We send our dog through the network. Uh, we look at a specific, uh, max pool output. We take the feature map, we find the activation that is the highest in that feature map. We zero out all the rest. We reverse engineer our network. We find the cropped part of this dog with the pixels that led to that specific feature map finding. We are interpreting the filter that led that feature map. And you can do that anywhere across, um, across the network. But of course, if you're earlier, the crop is gonna be even smaller. If you're later, generally bigger. Okay. So you learned
1:05:27 – 1:12:57
Putting CNN interpretability together: Zeiler/Fergus and Yosinski visualization toolbox
1. KKKian Katanforoosh
  deconvs slash transpose convs. Now let's look at some practical, uh, visualizations from, uh, Matthew Zeiler and Rob Fergus. Uh, these, these are great researchers in the space of visualizations. Been making so much progress. Um, so they trained a network, okay. They looked at results on a validation set of fifty thousand images. And, um, so what you're seeing is the first layer, a-and specifically the patches. What the patches are, they're the top nine, uh, strongest activations per filter. Okay? So for each filter in the first layer, they look at the top nine strongest activations, and they re-remember the data point that was leading to that. So that's the dataset search method that we saw together. What are the nine images that led to that maximum, um, that, that, that feature map activating the most? Print them. Those are the patches for each filter. If you do that, you can already, um, interpret some of the filters, uh, by seeing that, oh, this one reacts to edges that are diagonal, or this one reacts to edges that are straight, for example. On the bottom right, uh, we actually print the filters raw. And of course, because it's the first layer, it is interpretable. So if in fact you have an edge detector, you should see when you print that matrix that it looks like an edge detector. That doesn't work for layers beyond one. So now let's l- go a little deeper. Now we're going layer two, and we're looking at the deconvs. So what are they doing? Um, [clears throat] they're essentially, um, they're essentially looking at the top one strongest activation per feature in the second layer. So the second layer has two hundred fifty-six feature maps. They're presented here.
You pick one feature map, you look across all these fifty thousand validation images, you find the maximum, uh, feature map, you take the specific portion of that feature map, the specific entry that's maximally activated, you zero out the rest, you do your deconv, you pass the switches, blah, blah, blah, and you get the cropped part of the image that represents why it was activated, and this is printed all over here. And you can see-- you can start interpreting that by doing the top one, or you can do the top nine. And if you actually do the top nine, um, you would start seeing that in fact, a certain filter have very clear purposes. Some filters detect circles, some filters detect odd, um, you know, shapes. Yeah. If you keep, you know, doing that in layer three, you would start seeing, with the deconv method, that the filters are capturing more complex information. Remember in the first lecture we did together, I said that the deeper you go in the network, the more the information, um, um, adds up and you get more complex features later? This is a proof of that, pretty much. Now, if you go to layer three and you pro-- you, you, you can do, you can do all of it. You can do the top nine patches, where if the nine patches look very similar, you, you know, you can probably safely say that this, uh, filter was responsible for this type of, uh, shape or color or, um, um, you know, a salient feature and, uh, and then you can do the deconv as well, um, essentially. Let's watch together a very short video of Jason Yosinski-
2. SPSpeaker
  Recent advances in neural networks have enabled-
3. KKKian Katanforoosh
  That shows a little bit of everything we've learned together
4. SPSpeaker
  ... they can recognize school buses and zebras, and can tell the difference between Maltese terriers and Yorkshire terriers. We now know what it takes to train these neural networks well, but we don't know so much about how they're actually computing their final answers. We developed this interactive deep visualization toolbox to shine light into these black boxes, showing what happens inside of neural nets. In the top left corner, we show the input to the network, which can be a still image or video from a webcam. These black squares in the middle show the activations on a single layer of a network, in this case, the popular deep neural network called AlexNet running in Caffe. By interacting with the network, we can see what some of the neurons are doing. For example, on this first layer, the unit in the center responds strongly to light-to-dark edges. Its neighbor, one neuron over, responds to edges in the opposite direction: dark to light. Using optimization, we can synthetically produce images that light up each neuron on this layer to see what each neuron is looking for. We can scroll through every layer-
5. KKKian Katanforoosh
  We've seen that method
6. SPSpeaker
  ... in the network to see what it does-
7. KKKian Katanforoosh
  Class activation
8. SPSpeaker
  ... including convolution, pooling, and normalization layers. We can switch back and forth between showing the actual activations and showing images synthesized to produce high activation.
9. KKKian Katanforoosh
  This is a class model visualization method.
10. SPSpeaker
  By the time we get to the fifth convolutional layer, the features being computed represent abstract concepts. For example, this neuron seems to respond to faces. We can further investigate this neuron by showing a few different types of information. First, we can artificially create optimized images using new regularization techniques that are described in our paper.
11. KKKian Katanforoosh
  That's the class model visualization
12. SPSpeaker
  These synthetic images show that this neuron fires in response to a face and shoulders. We can also plot the images from the training set that activate this neuron most-
13. KKKian Katanforoosh
  That's the dataset search
14. SPSpeaker
  ... as well as pixels from those images most responsible for the high activations-
15. KKKian Katanforoosh
  And that's the deconv
16. SPSpeaker
  ... which can be viewed with a deconvolution technique. This feature responds to multiple faces in different locations, and by looking at the deconv, we can see that it would respond more strongly if we had even darker eyes and rosier lips. We can also confirm that it cares about the head and shoulders, but ignores the arms and torso. We can even see that it fires to some extent for cat faces. Using backprop or deconv, we can see that this unit depends most strongly on a couple units in the previous layer, conv four, and on about a dozen or so in conv three.
17. KKKian Katanforoosh
  So because of deconv, you can trace back the entire layers before and where the-
18. SPSpeaker
  From the top nine images-
19. KKKian Katanforoosh
  Okay. I'm, I'm gonna leave it to you, but you get the idea. The, these researchers built a toolkit that essentially reproduces some of the methods we've seen together, although we've seen more methods than what's in the toolkit. And so your, your, your kit is now able to answer many questions about convolution, such as, "Hey, what part of the input is responsible for the output?" We now know that we can use occlusion sensitivity or class activation maps. Um, what is the role of a neuron filter layer? We have many methods that can allow us to do that. Can we check what the network focuses on given the input image? We have methods to do that. And how does the neural network see our world? We have the gradient ascent class model visualization method that allow us to maximize, um, an input image with respect-- Sorry, find the input image that maximizes a certain activation. Super. So that was the first part, um, and then we're gonna move toward frontier ideas. Um, any
1:12:57 – 1:18:33
From CNNs to transformers: attention patterns and embedding-space sanity checks
1. KKKian Katanforoosh
  questions on CNNs? Do you feel like you have a better idea of how to look inside a CNN? Good. So let me start by comparing CNNs to more, uh, you know, modern frontier networks. Um, the core distinction is going to be that CNNs, uh, deal with localized information. They visualize edges, textures, and shapes, whereas in modern, call it LLMs, um, we visualize relationships and meanings between concepts or between tokens. And this is because transformers are based on attention. And that started with the Attention Is All You Need paper, uh, which essentially explained why attention on its own is highly performant and can allow us to model very complex relationships. So, uh, you know, by the way, this is just the, the, the, the first figure of the Attention Is All You Need paper, which you should all be able to read and understand by now in the class. Um, and transformers really represent language, um, using two very simple ideas that are, uh, visualizable. They, they're-- We can interpret them to a certain extent. The first one is the attention, um, patterns. Um, you know, attention looks at the relationship between tokens. So you s- look at a specific token, which can be a word, a subword, or a syllable. You know, I'm gonna simplify by saying it's a word, and its relationship with other words in the training set, and that's the attention, um, that the transformer is looking at. Um, each attention head learns, uh, different patterns, so it can learn things like linking pronouns to nouns, or tracking structures, or enforcing a certain ordering. And then, um, I really like this, uh, visualization, which is from, um, Jesse Vig in twenty nineteen. Um, and this visualization essentially, um, shows you there's a very nice blog post that he wrote, um, with a few figures where you can see, um, he, he presents how, um, attention can be visualized in simple ways. What is the connection between a fixed token with the surrounding tokens, let's say.
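As a rough illustration of what an attention pattern is numerically, the sketch below computes scaled dot-product attention weights for one head. Everything here is a placeholder: the five tokens are made up, and the query and key matrices are random, whereas in a trained transformer they come from learned projections of the token embeddings.

```python
import numpy as np


def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)) per row."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)


tokens = ["the", "dog", "chased", "its", "tail"]   # hypothetical sentence
rng = np.random.default_rng(0)
Q = rng.standard_normal((len(tokens), 8))          # placeholder queries
K = rng.standard_normal((len(tokens), 8))          # placeholder keys

A = attention_weights(Q, K)

# Row for "its": how strongly that token attends to every token in the sentence.
its_row = A[tokens.index("its")]
for token, weight in zip(tokens, its_row):
    print(f"{token:>7s}  {weight:.3f}")
```

Each row of `A` sums to one, and attention visualizations of the kind mentioned here typically draw one such row as lines of varying thickness from a fixed token to the tokens around it. With a trained head that links pronouns to nouns, the row for "its" would be expected to put most of its weight on "dog"; with the random placeholders above it will not.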
Um, so this is essentially the transformer analog to the CNN saliency maps that we looked at, pretty much. Uh, the other, uh, things that transformers or more modern language model use is embeddings. You know, during the pre-training phase, you're also learning embeddings. Um, you are ready to read the BERT paper, in fact, now with the baggage you have from the class. Um, and, um, what's interesting about embeddings, and I printed a picture here from, um, Garg in two thousand twenty-one. I also encourage you to see that, uh, sm-- short blog post where he uses a visualization method called t-SNE. It's a dimension reduction method. We're not gonna present it in the class, but it's taught actually a lot in the-- in, in biotech and healthcare. It's used very extensively, uh, for those of you who work with the Stanford Hospital. Um, and it allows you to visualize embeddings. And embeddings are sort of how the language model perceives our words. So you would expect tokens that should have similar semantic meanings to be next to each other in that space, or tokens that have nothing to do with each other to be far away from each other in distance. And that can be a way to sanity check that your model actually is learning meaningful, um, representations. All right? So together, attention and embeddings are what let large language models track relationship and meanings. And, um, and you can, uh, visualize your embeddings with dimension reduction tool. You can ev- uh, uh, visualize attention relationships as well. Um, unfortunately, the modern transformers are so complicated that even the cutting-edge research is only able to interpret those relationships with two-layer transformers, pretty much. The best you find out there is probably Anthropic's work, so I linked two papers. 
The first one is called "A Mathematical Framework for Transformer Circuits," which is essentially, um, explaining how the different components within a transformer interact with each other, uh, and they introduce the concept of a circuit. And then the second one is a follow-up to that paper called "In-context Learning and Induction Heads." Induction heads are probably the best tool we have to sort of visualize what's happening inside a transformer. It's pretty complex. You should be able to understand it by now, uh, but you'd have to spend, um, quite some time to go deeper into it. I just will link them. We're not gonna talk about it for the sake of time. Uh, let's get to some fun stuff: training and scaling diagnostics. So how do labs, frontier labs, check if a model is training well?
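The embedding sanity check described above (similar tokens should sit close together, unrelated tokens far apart) can be sketched without any plotting, using cosine similarity. The three-dimensional vectors below are invented for illustration and `nearest` is a hypothetical helper name; with a real model you would read the vectors from its embedding table and optionally project them with a tool like t-SNE.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, embeddings):
    """Return the other word whose embedding is most similar to `word`."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))

# Made-up toy embeddings: animals along one axis, vehicles along another.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

for word in toy_embeddings:
    print(word, "->", nearest(word, toy_embeddings))
```

If the nearest neighbors of common words look semantically unrelated, that is a hint the representations are not meaningful yet.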
1:18:33 – 1:46:53
Frontier model diagnostics: telemetry, scaling laws, benchmarks, safety, and data health
1. KKKian Katanforoosh
  We've talked about it in the first case study, but one, uh, very natural way is to look at our loss curves. You can look at the training loss, at the validation loss, and make sure that they follow sort of a smooth trajectory, and if it's not smooth, there's probably something that went wrong. You've probably trained your own network where some loss functions look very funky. I remember back in the days, there were even blogs where people would post their ugliest loss functions, and there was a lot on there. Um, you might find sudden jumps on the loss. That means, uh, maybe the batch that has been processed has been corrupted. Uh, maybe you're doing extremely well on it when you should actually not be-- you know, do that well, and it might raise a flag. Um, you might, you know, find bugs in your code because of that. You might find gradients that are exploding, gradients that are vanishing. Um, all of that you could visualize at the loss function level. Now, note that loss functions can be run at a global level or, you know, on a specific subset of your data. We're gonna talk about it in data diagnostics. Um, the other things that are interesting to track also, sometimes referred to in the community as, uh, training telemetry, is, um, to watch and track your gradient norms, um, uh, look at your learning rate schedule, um, or even look at hardware efficiency metrics to feel if you've underutilized compute, which we talked about again in the first part. So imagine that if you're working at a major frontier lab, you probably have a dashboard that tracks your different loss functions for different subsets of the data, your checkpoints, all of that, you know. Uh, you would have all of that. Unfortunately, very few of these are published because they're IP. 
You know, they can't really-- They don't want to give it out, uh, because it will, um, leak essentially information about their architecture, about what's going well, what's not going well, et cetera. And that's why you find very little, um, information on these. The one thing that you do find some charts on that are really helpful is scaling laws. So scaling laws, which we've talked about in a previous lecture, um, is essentially trying to understand the relationship between our model performance and some, um, other, um, call it hyperparameters, such as the model capacity, so the size of the model, the amount of compute that is being used, or the dataset size. Uh, DeepMind has done amazing work, um, I think it was in two thousand twenty-two, a couple of years ago, uh, with Chinchilla. Um, this chart is, you know, borrowed from the Chinchilla paper, where essentially what I want you to look at here is, uh, they're comparing the Chinchilla model, the green star, to other models, including GPT-3, which came out, you know, a little before. And what they're showing is that, um, the scaling law is actually slightly different than what OpenAI thought. And they analyzed GPT-3, and they said GPT-3 is actually not performing well enough for the size that it is at. And it was found that GPT-3 was not trained for long enough. They essentially, uh, explained that if you kept training GPT-3 for longer, um, you would have had way better performance, and it was not about the model size. In fact, the model was underutilized. So they plotted these lines. So the dotted line is what probably we thought in two thousand twenty, twenty-one, the scaling law, the power law would be, and they showed that it's actually not exactly that. Where, um, essentially the idea is they plot the full line here, and they say, "This is our analysis of the scaling law. If your star is above that line, it's probably that your model should be trained longer." 
Where if it's on the line, it's respecting the scaling laws that they're finding. And that's what's interesting about this Chinchilla paper. Yeah.
2. SPSpeaker
  So this is after OpenAI's scaling laws paper got published or after-
3. KKKian Katanforoosh
  So that's after the GPT-3 paper. Chinchilla came in twenty twenty-two saying, "You should have trained GPT-3 longer. You would have done better." And here's the Chinchilla, uh, model that has, uh, fewer parameters than GPT-3, so seventy billion versus one seventy-five billion, um, and that is performing better.
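To put numbers on "trained longer," here is a back-of-the-envelope sketch. The parameter counts are the ones quoted above; the training-token counts (about 300 billion for GPT-3, about 1.4 trillion for Chinchilla) are not from the lecture and are taken from the respective papers as commonly cited. The `6 * N * D` FLOPs formula is a widely used approximation, not an exact accounting.

```python
def tokens_per_param(n_params, n_tokens):
    """How many training tokens the model saw per parameter."""
    return n_tokens / n_params

def approx_train_flops(n_params, n_tokens):
    """Common rule of thumb: training compute is roughly 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

# (parameters, training tokens); token counts are assumptions from the papers.
models = {
    "GPT-3": (175e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

for name, (n, d) in models.items():
    ratio = tokens_per_param(n, d)
    flops = approx_train_flops(n, d)
    print(f"{name}: {ratio:.1f} tokens/param, ~{flops:.2e} training FLOPs")
```

Under these assumptions the smaller model saw roughly an order of magnitude more tokens per parameter, which is the sense in which the larger one was undertrained for its size.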
4. SPSpeaker
  And that's only based on performance?
5. KKKian Katanforoosh
  Yes. Yeah. To go deeper, I also pulled a few charts, uh, that show you the power laws between, um, you know, the loss function. So on the vertical axis, you have the test loss, okay? That shows essentially your performance on the test set. And then on the horizontal axis, on the X-axis, you have compute, dataset size, and parameters. So how are those scaling laws established? Essentially, they fix two of them, and they vary the third one, and they see if things are respected. Meaning, let's say you're keeping the same compute, the same, um, uh, dataset size, but you're, uh, training a model that is twice as big in log, uh, in logarithms. Uh, what does it tell you about the performance, essentially? Are those scaling laws respected? And so what's nice now is that we have a precedent for scaling laws. So if you were actually training, um, such big models, you would, uh, compare to the scaling laws that are available out there. Remember, another reason these are important is because model training runs are so expensive. It wasn't shared publicly, but we estimate that, uh, GPT-5 is probably in the hundreds of millions, you know. And so you wanna know, should I train that model twice as long or not? Because that's a big financial decision, right? And the scaling laws allow you to determine, should I invest in compute? Should I invest in, uh, uh, growing my dataset, so finding more data, or should I invest in model capacity, making my model bigger? Okay. So yeah, together these form sort of a health dashboard for the model. 
  So that's training and scaling diagnostic, loss functions, you know, et cetera, um, health dashboard, scaling laws, all of that are things that researchers might use to get a broad sense of where to invest more in terms of, uh, improving their model. The other one is something we've already seen to a certain extent, is how models-- how labs evaluate model capabilities and safety, um, and they might do it with benchmarks. So capability benchmarks might be evaluating the model in, you know, tasks like reasoning or coding or math or multilingual tasks, et cetera. It might also be comparing checkpoints that help you understand how your model is improving over time, depending on what you're feeding it or depending on some, um, hyperparameters that you're tweaking. Um, and also, uh, error clusters. So just to tell you a little more about error clusters, um, if you actually use benchmarks across a wide variety of tasks, you might see that the model checkpoint number five is actually doing very poorly at reasoning. Let's see why is that. So here are some examples, um, from a competitive math benchmark, um, twenty twenty-five AIME, um, published by OpenAI on GPT-5, and actually the bottom one is from today. This morning, Mistral announced their, um, their third-generation model. So I thought I'd pull it to show you how real-time these things are, and just published today. You know, and they're comparing across reasoning and math and et cetera, um, against benchmarks. Now, the risk is, uh, are these benchmarks contaminated? You know, how can a benchmark be contaminated? Yeah.
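A power law of the form L = a * C^(-b) is a straight line in log-log space, so one simple way to check whether runs "respect" a scaling law is a least-squares line fit on the logs. The sketch below uses synthetic loss values generated from a known law (a = 10, b = 0.05), which are made up for illustration; `fit_power_law` is a hypothetical helper, not a method from any of the papers mentioned.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    slope = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly)) / sum(
        (u - mean_x) ** 2 for u in lx
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic (compute, test loss) pairs from a made-up law, not real measurements.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
predicted = a * 1e22 ** -b  # extrapolate one order of magnitude further
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e22 FLOPs: {predicted:.3f}")
```

In practice the measured points are noisy, and the interesting signal is a new run landing well above the fitted line.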
6. SPSpeaker
  It was in the training data.
7. KKKian Katanforoosh
  Yeah, if it was in the training data. The problem is these models are trained on so much data online, you don't know. Maybe it was trained on a blog post where someone actually was presenting a benchmark and describing what the benchmark was about. Maybe it was trained on GitHub and there was a text file in a very shady part of the GitHub that was listing some of the test set, uh, information. All of those might contaminate benchmarks. How would you, um, identify that a benchmark has been contaminated? Test set has been contaminated.
8. SPSpeaker
  Isn't that kind of what-
9. SPSpeaker
  Like Llama Four kind of happened to it. Like every-- They showed really good results on benchmarks, but then when the model came out and people actually tried it on the similar tasks, it wasn't performing well. So it's like with testing it on synthesized questions and similar stuff.
10. KKKian Katanforoosh
  Yeah. Llama Four, you brought up, just to repeat, um, looked good on benchmarks, looked poor in practice after the community tested it. The general consensus, I mean, my opinion is I actually don't look too much at the benchmarks when a foundation model provider publishes them. Or in other words, I would look at them in relative value between models rather than absolute value. And you'll wait for the community to test it on agentic workflows, on their tasks, and then people will sort of get a taste for if it works or not and on what it works. So it took some time, for example, for the community to realize how good Claude was at coding. It's like it wasn't clear from-- It was clear from the benchmark, but others were also clearly good. But, you know, over time, people felt like, "Oh, wow, it's actually really good at coding." Yeah. Yeah, you had a question or no?
11. SPSpeaker
  No.
12. KKKian Katanforoosh
  Okay. Um, cool. So, oh yeah, contamination. So, you know, um, how to detect if a test set has been, uh, you know, exposed. A few methods. Um, some, you know, researchers might do a search within the data set. So let's say you have a training set and you have a held-out test set, and you actually look for n-grams. So you take, uh, sequences of tokens of size seven, eight, and you search through the training set and you find that, uh, the same n-grams are also found in the test set. There's a chance some of it has been contaminated. You can do it, uh, also, uh, with hashes or with embeddings. Actually, maybe the test set has been contaminated but not word for word, maybe semantically. And so you might do the same thing with an embedding and run a search and say that, "Oh, wow, this specific example from the test set, um, is found in the training set semantically." Very similar stuff. So you might actually run analysis to figure out if it's contaminated. And if you find that your test set has been exposed or you have a reason to think it's been exposed, what do you do? You would usually, uh, fix the test set. The test set is smaller. You would remove all those examples that you think are, um, exposed, and you would replace them with brand-new ones that are completely held offline in a folder that is separate, not available online, et cetera. Yeah. Uh, then there's the problem of safety evaluations. Uh, not gonna spend too much time on it, but safety, um, is, uh, important to foundation model providers. Uh, they stress-test their model under many adversarial attacks, jailbreaking, social engineering, misuse. Um, they also assess harmful, um, content generation, hallucination, privacy leakage, et cetera, et cetera. Um, and then they also look at how the foundation model behaves in an agentic workflow, as I was saying earlier. So not only evaluating it, uh, sort of one shot, but evaluating it in a workflow. 
Here are some examples of an actually very nice joint collaboration between OpenAI and Anthropic, uh, from last year, where they worked together to assess the safety of their models and they tried to jailbreak the model, to socially engineer, and they published some, um, findings on password protection or phrase protection. I linked it and I would encourage you to quickly look at it. They wrote prompts to try to extract a password from a model, um, and see if the model was good at not, um, you know, leaking the password, for example. So these dashboards, um, essentially inform the go, no-go decision for, uh, you know, releasing or for determining what the RLHF will be based on. So if you're actually going to do supervised fine-tuning or, uh, reinforcement learning from human feedback, it's expensive, and you wanna do it on the stuff that's failing. So if you identify exactly which evals are failing, you will then use that information to focus the RLHF on that specific problem and save a lot of money and human time. Uh, lastly, let me see if there is anything else I wanted to mention here. Um, yeah. Yeah. Uh, let's talk about the data diagnostic, and we'll end on that. So, uh, data diagnostics. This is probably the last area that, uh, frontier labs are very focused on. So how labs detect data issues. Um, there are many things you can do, but I really like distribution checks. So I pulled this chart from a paper called the Pile from twenty twenty, uh, where the Pile is a very large dataset, eight hundred gigabytes, that is made from diverse texts. Um, and they kept the data domain. So they explain using that figure what the dataset is made of. So the dataset might be made of, um, information from FreeLaw, or it might be made of Wikipedia, Stack Exchange, GitHub with coding. And so you have different domains within that dataset. 
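The n-gram search for contamination described above can be sketched in a few lines. This is a minimal version under stated assumptions: whitespace tokenization with lowercasing stands in for a real tokenizer, the documents are invented, and `contaminated_examples` is a hypothetical helper name. At real scale the training n-grams would be hashed or indexed rather than held in one Python set.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-token windows (empty if the text is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contaminated_examples(train_docs, test_examples, n=8):
    """Indices of test examples sharing at least one n-gram with the training docs."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc.lower().split(), n)
    flagged = []
    for idx, example in enumerate(test_examples):
        if ngrams(example.lower().split(), n) & train_grams:
            flagged.append(idx)
    return flagged

# Made-up data: the first training doc quotes the first test question.
train_docs = [
    "a blog post quoting the benchmark: what is the sum of the first ten prime numbers answer below",
    "unrelated text about cooking pasta with tomato sauce and fresh basil leaves tonight",
]
test_examples = [
    "What is the sum of the first ten prime numbers",
    "Compute the derivative of x squared with respect to x please",
]

flagged = contaminated_examples(train_docs, test_examples, n=8)
print("possibly contaminated test indices:", flagged)
```

Exact n-gram overlap only catches word-for-word leaks; paraphrased leaks need the embedding-based search mentioned above.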
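A distribution check over the data domains described above can start as a simple count. In this sketch the domain names come from the lecture, but the document counts and the 20 percent threshold are invented for illustration, and both function names are hypothetical.

```python
from collections import Counter

def domain_proportions(domain_labels):
    """Share of documents per domain, given one domain label per document."""
    counts = Counter(domain_labels)
    total = sum(counts.values())
    return {domain: count / total for domain, count in counts.items()}

def underrepresented(proportions, min_share):
    """Domains whose share falls below `min_share`, sorted by name."""
    return sorted(d for d, p in proportions.items() if p < min_share)

# Made-up labels for ten documents; real corpora have millions.
labels = ["Wikipedia"] * 5 + ["GitHub"] * 3 + ["StackExchange"] + ["FreeLaw"]

proportions = domain_proportions(labels)
print(proportions)
print("below 20% share:", underrepresented(proportions, min_share=0.2))
```

Running the same count per training shard or per time window gives a first view of whether the domain mix is drifting.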
And in fact, when you train a model on that, you can plot the loss function across the entire dataset, or you can plot the loss function across different domains within that dataset, which give you more intuition for, um, where it might be failing or working. And so you might wanna track domain proportions in your dataset. And domain proportions also matter because it is observed that if, um, certain, uh, domains are underrepresented in the data, the performance of the model on that domain is likely going to drop in comparison to another domain. So all things are not equal. You know, you-- if you actually have so much, you know... Re-remember with the speech recognition example where I said you have too many zeros and too few ones, and so the model just doesn't learn the ones. So. Um, yeah. This is also a problem with online learning. So imagine, you know, those frontier model, they're learning live, right? Like, oftentimes they're just being fed data constantly, and maybe the batch from the last month had very little coding data. And so the last portion of the training, the distribution of the coding domain, um, or the frequency was lower than other domains. And so sometimes you might see a drop in performance for a specific domain if you're not careful. That can be fixed with sampling, like smart sampling. You remember what we saw in reinforcement learning with the experience replay, where we actually kept experiences, and we put them in a replay memory, and then we sampled from that. Those are the types of method, sampling methods that allow a model provider to make sure they keep the frequency of data domain the same at different portions of the training. Yeah. Uh, yeah. Token statistics, just to mention it a little bit. Um, you know, you, uh, you wanna count the frequency changes for key, uh, tokens, which I was mentioning. So let's say, you know, um, a math token is underrepresented, um, that will be a problem. 
If, you know, the derivative symbol is underrepresented, that might actually lead to way worse performance for that specific task where you ask the model to make derivatives. Um, and so labs oftentimes monitor, uh, token drift or distribution or the frequency per token, and they use sampling to fix it. And finally, the contamination checks, which we have, um, talked about earlier. I also give you concrete examples. Not gonna go through all of them, but, you know, these are examples of token distribution drift reports, uh, tokenizer statistics, um, issues that I raise here, um, or some anomalies on, um, uh, you know, corrupt data detection. So if I read one of the examples for you, um, let's say, um, let's pick this one maybe. Um, you know, non-English tokens increased from twelve percent to nineteen percent after a new web crawl, which might mean that the domain of that specific language is increasing relative to others, and that might lead to an increase in performance or a drop in performance for a different language. As simple as that. Okay. To summarize everything, uh, what are examples of things that frontier labs monitor? Global training loss, validation loss, both global and domain-specific on the subset of the data. Uh, scaling curve alignment, you know, comparing your loss, test loss to your compute, to your dataset, or to your model capacity. Um, oh, we didn't talk too much about router, but mixture of experts. You know, imagine you have a-- A lot of the models that are top models right now are mixtures of experts, meaning that in your transformer, uh, block, um, for the multi-layer perceptron, there's actually multiple experts that are being trained in parallel, and there's a router that will route that batch of data to the right experts, to top one, top two, top three experts. And it's very common for the router to fail, or to always exploit the same mixture of experts. 
You need a mechanism to detect when this happens. And so you might have in your health dashboard some sort of a router information or whether the mixture of experts are all used as much as each other. Sometimes certain experts are gonna be so specialized that they're never gonna be used almost, and you wanna avoid that. And you might do some load balancing to avoid that. Um, you know, gradient norms, learning rates, um, checkpoint to checkpoint eval benchmark, token distribution, tokenizer statistic. We, we, we covered all of that at a high level. And as I said earlier, uh, frontier labs rarely publish, uh, those dashboards because it's IP and because it can, uh, leak certain deep information about their IP and how their models are trained. Um, and so you usually learn about these things year after. Yeah, you might, you might learn about these things on a model that last, that, that came out three years ago or four years ago, and they're okay now with sharing it. Yeah. Um, it's pretty common. Okay. Um, so closing remarks. Any questions first? Or, yeah.
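One way a dashboard could surface the router problem described above is to log which expert each token was sent to and summarize the load. The sketch below assumes top-1 routing and uses normalized entropy as the summary number; both the metric choice and the toy assignments are illustrative assumptions, not something stated in the lecture.

```python
import math
from collections import Counter

def expert_load(assignments, num_experts):
    """Fraction of tokens routed to each expert (top-1 routing assumed)."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]

def normalized_entropy(load):
    """About 1.0 when experts are used equally, about 0.0 when one expert takes everything."""
    entropy = -sum(p * math.log(p) for p in load if p > 0)
    return entropy / math.log(len(load))

def dead_experts(load):
    """Experts that received no tokens at all."""
    return [e for e, p in enumerate(load) if p == 0]

# Made-up routing log for 8 tokens over 4 experts: the router has mostly collapsed.
assignments = [0, 0, 0, 0, 0, 0, 1, 0]
load = expert_load(assignments, num_experts=4)

print("load per expert:", load)
print("normalized entropy:", round(normalized_entropy(load), 3))
print("unused experts:", dead_experts(load))
```

A falling entropy or a growing list of unused experts over training steps would be the cue to look at load balancing.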
13. SPSpeaker
  Um, do you-- So for example, like, Claude is training, like, coding models, right? So do they care more about, like, tokens that are shown in, like, in code in general, or are they, like, are they more worried about, like, having that diversity of tokens? Like
14. KKKian Katanforoosh
  Yeah. So you're asking if Anthropic is training Claude code, do they care mostly about coding data, or do they also add other data?
15. SPSpeaker
  Yeah.
16. KKKian Katanforoosh
  Yeah, it's a, it's a tough question. I don't have the exact answer. It's been shown that certain domains might improve the performance of other domains. So I imagine that in coding, you know, if you have math data, maybe math data actually helps the performance of coding, especially for functional programming, let's say coding languages that are more mathematical. Um, but I could clearly see that if they were training Claude code on web crawl and everything, it would not perform well because you would have so much crap data that is not relevant for what you're trying to get the model to do. And so I think there is a balance between those things.
17. SPSpeaker
  So is it safe to say, like, you would want to include, like, you would, like, you, you would wanna include neighboring domains as well?
18. KKKian Katanforoosh
  I think you could run experiments. So would you include neighboring domains? If I were training, um, Claude code, um, and I had a lot of money to do that, I would probably, uh, you know, maybe you would, you, you start with the Python language, and you get as much, and there's so much data on Python language, so you probably have enough already there. But then you want a model that scales to other programming languages. Well, probably Python is useful for C++. C++ is useful for Java. And then functional programming, if you're going to Elixir, um, you know, Scala, things like that, they're helping each other probably to a certain extent, uh, but you will need to have representation of those. I, I could see that... I'm, I'm, I'm pretty sure, and I don't work at Anthropic, so I don't know, but I'm pretty sure, let's say a language like Rust increases in popularity, which is the case, right? Um, and then in the data distribution, you start seeing more frequency of those tokens from Rust. Does that affect the performance on other languages? Probably, yes. That's my guess. And how do you track it? This is all what we talked about. Like, you know. Yeah.
19. SPSpeaker
  If you train on most of the, like, publicly available data or using synthetic data, would you think that would be able to increase performance?
20. KKKian Katanforoosh
  Yeah. Question is, um, you know, let's say we, we trained on everything online and now, uh, real data, and now what about synthetic data? Should we use it a lot? Should we use it, uh, uh, strategically? What's the future of that? Um, so, uh, depends of are you talking about general purpose models or not, specialized models. In general, uh, it is a good idea to do data augmentation, to use synthetic data. Although, I would always watch the, um, the token frequency, meaning, you know, you can't-- 'cause synthetic data is way cheaper. And so if you actually generate so much synthetic data of a certain data domain, and then it impedes on the rest and lowers the performance on the rest because the model just is always trained on that synthetic data, then that's a problem. In practice, I think also the returns of synthetic data might be plateauing at some point. Um, the recent news, I guess, and if you look at, uh, um, at the DeepMind paper, um, it's probably that we're lacking high quality data more than we're lacking synthetic data for most domains right now. Uh, but who knows if it's gonna be the case. You know, some other people would say what we're actually lacking is letting these, uh, agents play in RL environments in the wild and generate their own synthetic data, uh, or real data, but part of a game. Um, you know, nobody has quite the answer. I would just say the trend has gone from... Actually, you should look at the, a paper from Epoch AI. Maybe you've, you've, you've seen that already. But Epoch AI has a really nice research report which says by, I forgot the exact numbers, but it's in there. By two thousand and twenty, um, uh, two thousand twenty-five, the world would have exhaus- the, the frontier labs would have exhausted low quality data available online by two thousand-- in text. By two thousand twenty-seven, low quality data in audio, image, and video would have been exhausted. By two thousand thirty, high quality data would also have been exhausted. 
  And at that point, it's like, what's next, you know? Um, probably by that time, data is not gonna be the bottleneck anymore, uh, and it's gonna be more about model architecture, yeah, potentially.
21. SPSpeaker
  Are we producing more data than we're using to train models?
22. KKKian Katanforoosh
  Are we producing more data than we're using? Uh, probably yes, but it doesn't mean the models are not plateauing. The data that, you know, you go and you code in Python, your Python code is gonna be already online somewhere, most likely, or ninety-nine percent of it. So the model is actually not learning that much from it. It's just more data, not higher quality data, and that's why I think the plateau is there. Like, maybe the best radiologist in the world is producing a research paper that is so unique that it's high quality by definition of what the models feel is high quality today. But, you know, how much of that can we expect? Uh, yeah.
23. SPSpeaker
  Is there a risk that the model keeps training on data that informs the
24. KKKian Katanforoosh
  Yeah, it's risky. Like, i- is it risky for the model to... Like, the model, let's say, is online learning, so it's learning from new data being produced by everyone. Um, is that gonna, uh, risk the model performance to drop, um, essentially? That's what you're asking? Or-
25. SPSpeaker
  Like, the data produced by everyone, at the end of the day, is also data produced by AI because a lot of this data is produced-
26. KKKian Katanforoosh
  Yeah, yeah. In that case, yeah.
27. SPSpeaker
  Or a thing of itself.
28. KKKian Katanforoosh
  Yeah, the data produced is also coming out of AI, yeah, for sure. More today than before. Like, coding data today is increasingly, um, you know, uh... It is increasingly generated, and so it's just fed back. So just long story short, it's not that interesting for training. Super. So closing remarks, uh, and reminder on what's next. Um, so by the way, I hope you feel after this lecture that you have a better understanding of the techniques that you can use in order to look inside a model, look outside a model, both for CNNs and for frontier models. Um, again, it's just a two-hour lecture. We don't have time to go as deep as I would like in each of these domains. Um, it's my last lecture this quarter, um, and so thank you for participating. I, um, enjoyed spending time with you all. Um, I hope you spend time on your projects. Projects can be very delightful in CS230. Over the years, I've seen people use their projects to get a job, to start a company, to make friends. And so I don't think you will regret putting time and effort into your projects, even if we don't have too much time left. Um, those are the last, um, you know, milestones, or deliverables, for the class. I hope you enjoyed the class. We're always looking for, uh, feedback, and so, um, I'm eager to hear from you all. Thank you. [audience applauding]

Episode duration: 1:46:53

Transcript of episode Ozb1AR_F5MU
