Stanford Online

Stanford CS230 | Autumn 2025 | Lecture 10: What’s Going On Inside My Model?

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai

December 2, 2025

This lecture covers what's happening inside your model and provides a class wrap-up.

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning
Please follow along with the course schedule and syllabus: https://cs230.stanford.edu/syllabus/
View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI
Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera
Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforoosh, host
Dec 14, 2025 | 1h 46m

At a glance

WHAT IT’S REALLY ABOUT

Interpreting CNNs and diagnosing frontier model training and behavior issues

  1. The lecture opens with a frontier-lab case study and organizes debugging evidence into four buckets: training/scaling telemetry, internal representations, data/distribution issues, and capability/safety evaluation results.
  2. For CNNs, it presents several ways to connect inputs to outputs (saliency maps, integrated gradients, and occlusion sensitivity) to verify whether predictions rely on the right image regions.
  3. It shows how an architectural tweak enables built-in localization via Class Activation Maps (CAM): the deep fully connected stack is replaced with global average pooling plus a final linear layer. Grad-CAM extends the idea with gradients and does not require this architectural change.
  4. It covers methods to probe what the network has learned, including activation maximization (for a class or for an individual neuron) and dataset search for top-activating examples, then extends to deconvolutional “reverse engineering” with unpooling switches.
  5. For frontier transformers, the lecture contrasts CNN locality with attention/embedding-based meaning, notes current interpretability limits, and emphasizes modern diagnostics: scaling laws, benchmark contamination checks, safety evals, and data distribution/token drift monitoring.

IDEAS WORTH REMEMBERING

5 ideas

Start frontier-model investigations with structured evidence, not ad-hoc guesses.

The lecture recommends gathering signals across training/scaling telemetry (loss, gradients, LR), internal representation probes (attention/embeddings), data/distribution checks, and eval/agentic workflow regressions to quickly narrow root causes.

Use pre-softmax logits for attribution-style interpretability in classifiers.

For saliency and activation maximization, post-softmax probabilities entangle all classes; pre-softmax class scores isolate the class of interest and avoid misleading attributions caused by changes in competing classes.
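
The difference between the two gradients can be seen on a toy model. The following is a minimal NumPy sketch, assuming a hypothetical linear classifier (`W`, `b`, and the input `x` are random placeholders, not anything from the lecture). For a linear model the gradient of the class-`c` logit is simply row `c` of `W`, while the gradient of the softmax probability mixes in the weights of every other class.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def logit_saliency(W, c):
    # For logits = W @ x + b, d(logit_c)/dx is row c of W:
    # it depends only on the class of interest.
    return W[c].copy()

def prob_saliency(W, x, b, c):
    # d(p_c)/dx = p_c * (W[c] - sum_k p_k * W[k]):
    # every competing class contributes through the p-weighted average.
    p = softmax(W @ x + b)
    return p[c] * (W[c] - p @ W)

# Hypothetical toy classifier: 3 classes, 5 input features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

g_logit = logit_saliency(W, 0)      # isolates class 0
g_prob = prob_saliency(W, x, b, 0)  # entangled with classes 1 and 2
print(g_logit)
print(g_prob)
```

In a deep network the same entanglement appears through the softmax Jacobian, which is why the pre-softmax score is usually the quantity that gets differentiated.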

Occlusion sensitivity provides an intuitive, model-agnostic “where is it looking?” test.

By sliding a masking patch across the image and tracking the change in the target-class score, you obtain a heatmap of the regions critical to the prediction. The cost is one forward pass per patch position, which becomes expensive for large images or small strides.
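
The sliding-patch procedure can be sketched as follows. This is a minimal NumPy illustration, assuming a single-channel image and a hypothetical `toy_score` function standing in for a real model's target-class score; the patch size, stride, and fill value are illustrative choices.

```python
import numpy as np

def occlusion_map(score_fn, image, patch=4, stride=4, fill=0.0):
    # Heatmap entry (i, j) is the drop in score when the patch at
    # grid position (i, j) is masked. Large drop = important region.
    H, W = image.shape
    base = score_fn(image)
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            r, c = i * stride, j * stride
            occluded[r:r + patch, c:c + patch] = fill
            heat[i, j] = base - score_fn(occluded)  # one forward pass each
    return heat

def toy_score(img):
    # Hypothetical stand-in for a classifier's class score:
    # it only "looks at" the top-left 4x4 corner of the image.
    return img[:4, :4].sum()

image = np.ones((8, 8))
heat = occlusion_map(toy_score, image)
print(heat)
```

With a real network, `score_fn` would wrap a forward pass and return the pre-softmax score of the target class.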

CAM makes localization easier by preserving spatial structure until the end.

Replacing multiple fully connected layers with global average pooling + a final linear layer enables a class activation map formed by a weighted sum of last-layer feature maps, giving a real-time, interpretable localization signal (often improved by Grad-CAM).
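
The weighted sum can be written out in a few lines. This is a minimal NumPy sketch, assuming random placeholder feature maps and a bias-free final linear layer (the shapes and the names `features` and `weights` are illustrative). It also shows why the map is faithful to the prediction: averaging the class activation map over space recovers that class's logit.

```python
import numpy as np

def class_activation_map(features, weights, c):
    # features: (K, H, W) last conv-layer feature maps
    # weights:  (C, K) final linear layer applied after global average pooling
    # CAM for class c = sum_k weights[c, k] * features[k]
    return np.einsum("k,khw->hw", weights[c], features)

# Hypothetical shapes: 4 feature maps of size 7x7, 3 classes.
rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))
weights = rng.normal(size=(3, 4))

# Forward pass of the CAM-style head: global average pooling, then linear.
pooled = features.mean(axis=(1, 2))  # shape (K,)
logits = weights @ pooled            # shape (C,)

cam = class_activation_map(features, weights, c=1)
print(cam.shape)               # spatial map, same H x W as the feature maps
print(cam.mean(), logits[1])   # spatial mean of the CAM matches the logit
```

In practice the map is upsampled to the input resolution and overlaid on the image.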

Activation maximization reveals what a class or neuron ‘wants to see,’ but needs regularization.

Gradient ascent on pixels can generate synthetic prototypes (e.g., a Dalmatian rendered as black dots on white). Regularization keeps the image within natural pixel ranges, so the visualization stays interpretable instead of collapsing into noisy artifacts.
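
A toy version of the ascent loop is sketched below. It assumes a hypothetical linear class score `w · x`, an L2 penalty as the regularizer, and clipping to the [0, 1] pixel range; the lecture summary does not specify which regularizer is used, so these are illustrative choices only.

```python
import numpy as np

def activation_maximization(w, steps=200, lr=0.1, l2=0.5, seed=0):
    # Maximize  w . x - l2 * ||x||^2  over the input x by gradient ascent.
    # The L2 term discourages extreme pixel values; clipping keeps
    # every pixel inside the valid [0, 1] range after each step.
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.4, 0.6, size=w.shape)  # start from a gray-ish image
    for _ in range(steps):
        grad = w - 2.0 * l2 * x              # gradient of the objective
        x = np.clip(x + lr * grad, 0.0, 1.0)
    return x

# Hypothetical class weights over 4 "pixels".
w = np.array([1.0, -1.0, 0.5, 0.0])
x_star = activation_maximization(w)
print(x_star)
```

For a real network the gradient comes from backpropagating the pre-softmax class score to the input, and the pixels the class responds to positively end up bright while the rest stay dark.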

WORDS WORTH SAVING

5 quotes

Your VP is wondering what's happening, and they ask, "What is going on?"

Kian Katanforoosh

If you do the saliency maps and you realize that the pixels that are bright when you compute that gradient are all over the place, it's probably that the model is not even looking at the right place. It's just getting lucky.

Kian Katanforoosh

Unfortunately, the modern transformers are so complicated that even the cutting-edge research is only able to interpret those relationships with two-layer transformers, pretty much.

Kian Katanforoosh

The general consensus, I mean, my opinion is I, I actually don't look too much at the benchmarks when a foundation model provider publishes them.

Kian Katanforoosh

Frontier labs rarely publish, uh, those dashboards because it's IP and because it can, uh, leak certain deep information about their IP and how their models are trained.

Kian Katanforoosh

Frontier-model debugging evidence buckets
Saliency maps vs integrated gradients
Occlusion sensitivity heatmaps
CAM/Grad-CAM with global average pooling
Activation maximization (class and neuron visualization)
Dataset search for top activations and receptive fields
Deconvolution / transposed convolution and unpooling switches
Transformer attention visualization and embedding plots (t-SNE)
Transformer circuits and induction heads (Anthropic)
Training telemetry and scaling laws (Chinchilla)
Capability benchmarks, contamination, and community validation
Safety evaluations and agentic workflow evals
Data diagnostics: domain proportions, token drift, sampling strategies
Mixture-of-Experts routing and load balancing

High quality AI-generated summary created from speaker-labeled transcript.
