Stanford Online

Stanford CS230 | Autumn 2025 | Lecture 10: What’s Going On Inside My Model?

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai

December 2, 2025

This lecture covers what's happening inside your model and provides a class wrap-up.

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning

Please follow along with the course schedule and syllabus: https://cs230.stanford.edu/syllabus/

View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI; Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera; Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforoosh (host)
Dec 15, 2025 | 1h 46m

CHAPTERS

  1. Lecture roadmap: from CNN interpretability to frontier-model diagnostics

    Kian frames the session as a broadened take on “interpretability,” spanning classic, well-understood CNN visualization methods and the much less-settled toolkit for frontier transformers. He previews a two-part structure: deep CNN interpretability methods first, then modern representation/training analysis topics like scaling laws, benchmarks, and data diagnostics.

  2. Frontier-lab case study: a new checkpoint regresses—what evidence do you inspect first?

    A hypothetical 200B-parameter model passes basic training sanity checks but regresses on reasoning, fails some safety evals, and shows a tool-use latency spike. The class brainstorms what to inspect before retraining or changing code, emphasizing systematic “first evidence” triage.

  3. Turning brainstorm into a framework: four diagnostic buckets for frontier models

    Kian organizes the troubleshooting ideas into four categories used in practice: training/scaling signals, internal representations, data/distribution issues, and capability evaluation across levels (model vs agentic workflow). This becomes a mental checklist for “what’s going on” investigations.

  4. Zoo CNN interpretability case study: building trust in an animal classifier

    A zoo wants confidence that the CNN’s decisions are reasonable, not arbitrary. The discussion covers how to communicate model outputs (softmax probabilities) and how feature hierarchies in CNN layers progressively capture more complex visual patterns.
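
    A minimal sketch of turning a classifier's raw scores into the softmax probabilities a zoo might report. The class names and logit values are hypothetical; subtracting the largest logit before exponentiating is a standard stability trick that leaves the probabilities unchanged.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector of logits."""
    shifted = logits - np.max(logits)  # same result, but avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical pre-softmax scores for one zoo photo.
classes = ["lion", "zebra", "giraffe"]
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

# Report classes from most to least likely.
for name, p in sorted(zip(classes, probs), key=lambda pair: -pair[1]):
    print(f"{name}: {p:.1%}")
```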

  5. Saliency maps: gradient of class score w.r.t. input pixels (and why pre-softmax matters)

    Kian introduces saliency maps as a fast way to see which pixels most influence a class score. He emphasizes using the pre-softmax logit for the target class to avoid confounds from other classes in the softmax denominator: the probability can rise simply because competing scores fall, even when the target score does not change.
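
    A minimal sketch, assuming a hypothetical tiny two-layer ReLU network on a 4x4 "image" in place of a real CNN, so the gradient can be written out by hand. `logit_grad` backpropagates the pre-softmax score s_c to the pixels; `softmax_prob_grad` shows why the probability is a noisier target, since its gradient subtracts a probability-weighted mix of every class's gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, HIDDEN, CLASSES = 4, 4, 8, 3

# Hypothetical tiny ReLU network standing in for a CNN (weights are random).
W1 = rng.normal(size=(HIDDEN, H * W))
b1 = rng.normal(size=HIDDEN)
W2 = rng.normal(size=(CLASSES, HIDDEN))
b2 = rng.normal(size=CLASSES)


def logits(x):
    """Pre-softmax class scores s for a flattened image x."""
    return W2 @ np.maximum(0.0, W1 @ x + b1) + b2


def logit_grad(x, c):
    """Gradient of the pre-softmax score s_c with respect to every input pixel."""
    relu_mask = (W1 @ x + b1 > 0).astype(float)  # ReLU derivative
    return W1.T @ (relu_mask * W2[c])  # chain rule back to the pixels


def softmax_prob_grad(x, c):
    """Gradient of the softmax probability p_c, for comparison."""
    s = logits(x)
    p = np.exp(s - s.max())
    p /= p.sum()
    all_grads = np.stack([logit_grad(x, k) for k in range(CLASSES)])
    # dp_c/dx = p_c * (ds_c/dx - sum_k p_k ds_k/dx): every other class leaks in.
    return p[c] * (all_grads[c] - p @ all_grads)


def saliency_map(x, c):
    """Absolute logit gradient, reshaped to the image grid."""
    return np.abs(logit_grad(x, c)).reshape(H, W)


image = rng.random(H * W)
target = int(np.argmax(logits(image)))
sal = saliency_map(image, target)
print(np.round(sal, 2))
```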

  6. Integrated gradients: a more robust attribution than raw saliency

    Integrated gradients are presented as an extension that accumulates gradients along a straight-line path from a baseline (e.g., a black image) to the real input, then scales them by the difference between input and baseline. A medical retina example illustrates how attributions can align with lesions, improving interpretability for domain users.
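
    A minimal sketch of integrated gradients, assuming a hypothetical single sigmoid unit over six "pixels" instead of a retina CNN so the gradient is analytic. The path integral is approximated with a midpoint Riemann sum over `steps` points; the zero vector plays the role of the black-image baseline. A useful check is completeness: the attributions should sum to F(x) - F(baseline).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical differentiable "model": one sigmoid unit over 6 input pixels.
w = np.array([1.5, -2.0, 0.5, 0.0, 3.0, -0.5])
b = -0.25


def model(x):
    return sigmoid(w @ x + b)


def model_grad(x):
    s = model(x)
    return s * (1.0 - s) * w  # d sigmoid(w.x + b) / dx


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of the path integral from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


x = np.array([0.9, 0.1, 0.4, 0.7, 0.8, 0.2])
baseline = np.zeros_like(x)  # "black image" baseline
attr = integrated_gradients(x, baseline)
print(np.round(attr, 4))
print("sum of attributions:", attr.sum())
print("F(x) - F(baseline): ", model(x) - model(baseline))
```

    Pixel 3 receives zero attribution because the model ignores it (its weight is 0), even though its value is non-zero.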

  7. Occlusion sensitivity: masking regions to see how class probability changes

    Occlusion sensitivity provides an intuitive, region-based test: slide a dark square across the image and track how the true-class score changes. The resulting heatmap reveals which regions are necessary (score drops) or distracting (score increases) for a prediction.
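
    A minimal sketch of occlusion sensitivity on a 6x6 grayscale array, assuming a hypothetical scoring function in place of a CNN's true-class score. The heatmap stores `base score - occluded score`, so positive values mark necessary regions (score drops) and negative values mark distracting ones (score increases). `patch`, `stride`, and `fill` are illustrative parameters.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=2, stride=1, fill=0.0):
    """Slide a patch x patch square of value `fill` over the image and record
    how much the class score drops (positive) or rises (negative)."""
    h, w = image.shape
    base = score_fn(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            r, c = i * stride, j * stride
            occluded[r:r + patch, c:c + patch] = fill
            heat[i, j] = base - score_fn(occluded)
    return heat


# Hypothetical "class score": rewards bright pixels in the top-left corner
# and is penalized by a distracting bright pixel at the bottom-right.
def toy_score(img):
    return img[:3, :3].sum() - 2.0 * img[5, 5]


image = np.zeros((6, 6))
image[:3, :3] = 1.0  # evidence the score relies on
image[5, 5] = 1.0  # distractor
heat = occlusion_map(image, toy_score)
print(np.round(heat, 1))
```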

  8. From post-hoc to “real-time”: Class Activation Maps (CAM) via GAP, and Grad-CAM

    To support continuous visualization, Kian explains modifying CNN architectures to preserve localization by replacing the stack of fully connected layers with global average pooling (GAP) followed by a single linear classifier. A CAM is the weighted sum of the final feature maps, using the target class's classifier weights. Grad-CAM is noted as a common improvement; it derives the channel weights from gradients of the class score, so it does not require the GAP architecture.
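
    A minimal sketch of CAM for one image, assuming random hypothetical final feature maps and classifier weights in place of a trained network. The GAP head averages each of the K maps to one number and a single linear layer scores the classes; applying the same class weights at every spatial location gives the CAM. In practice the map is then upsampled to the input resolution and overlaid on the image.

```python
import numpy as np

rng = np.random.default_rng(1)
K, H, W, CLASSES = 4, 7, 7, 3

# Hypothetical final-conv-layer feature maps (K channels, H x W) after ReLU.
features = np.maximum(0.0, rng.normal(size=(K, H, W)))
# Hypothetical weights of the single linear layer that follows GAP.
fc_weights = rng.normal(size=(CLASSES, K))
fc_bias = rng.normal(size=CLASSES)

# GAP head: average each feature map to one number, then a linear layer.
gap = features.mean(axis=(1, 2))  # shape (K,)
class_logits = fc_weights @ gap + fc_bias  # shape (CLASSES,)


def class_activation_map(c):
    """CAM_c(x, y) = sum_k w[c, k] * f_k(x, y)."""
    return np.tensordot(fc_weights[c], features, axes=1)  # shape (H, W)


c = int(np.argmax(class_logits))
cam = class_activation_map(c)
# Averaging and the weighted sum commute, so the CAM decomposes the logit
# spatially: its mean plus the bias reproduces the class score.
print(np.isclose(cam.mean() + fc_bias[c], class_logits[c]))
```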

  9. Class model visualization: synthesizing the model’s “ideal dog” via gradient ascent

    To probe what the model thinks a class looks like, the lecture uses gradient ascent on input pixels to maximize a class logit (with regularization for naturalness). Examples (Dalmatian spots, “goose as many geese”) reveal dataset biases and what cues the model has internalized.
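
    A minimal sketch of class-model visualization by gradient ascent, assuming a hypothetical linear classifier with random weights `W` and an L2 penalty `lam` as the regularizer. With a deep network only the gradient computation changes; the update rule is the same. For this toy objective the optimum is W[c] / (2 * lam), so the synthesized input is the class template the model has learned, which is the intuition behind using the technique to expose learned cues.

```python
import numpy as np

rng = np.random.default_rng(2)
PIXELS, CLASSES = 16, 3
# Hypothetical linear classifier: logit_c(x) = W[c] @ x.
W = rng.normal(size=(CLASSES, PIXELS))


def class_model_visualization(c, lam=0.1, lr=0.5, steps=200):
    """Gradient ascent on the input to maximize logit_c(x) - lam * ||x||^2."""
    x = np.zeros(PIXELS)  # start from a blank image
    for _ in range(steps):
        grad = W[c] - 2.0 * lam * x  # gradient of the regularized objective
        x = x + lr * grad  # ascend, not descend
    return x


x_star = class_model_visualization(c=0)
print(np.round(x_star.reshape(4, 4), 2))
```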

  10. Dataset search for neuron/filter meaning: top-activating examples and receptive fields

    A practical interpretability workhorse is to find which real validation images maximally activate a chosen filter/feature map. Kian explains why visualizations are cropped: deeper activations have larger receptive fields, so you trace back the spatial region an activation can “see.”
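
    A minimal sketch of tracing an activation back to the input region it can see, treating one spatial axis at a time (square kernels behave the same on both axes). The `Layer` description and the VGG-style stack are hypothetical examples. At each layer the receptive field grows by (kernel - 1) x jump and the jump is multiplied by the stride; `crop_box` returns the pixel interval to crop for a given top-activating unit.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    kernel: int
    stride: int
    padding: int


def receptive_field(layers):
    """Return (size, jump, start) for one output unit, in input pixels.
    size: pixels seen along the axis; jump: input distance between adjacent
    units; start: where unit 0's field begins (negative if padding is used)."""
    size, jump, start = 1, 1, 0
    for layer in layers:
        start -= layer.padding * jump
        size += (layer.kernel - 1) * jump
        jump *= layer.stride
    return size, jump, start


def crop_box(index, layers, image_size):
    """Input interval [lo, hi) along one axis that output unit `index` can see."""
    size, jump, start = receptive_field(layers)
    lo = start + index * jump
    return max(0, lo), min(image_size, lo + size)


# Hypothetical VGG-style stack: two 3x3 convs and a 2x2 max-pool, twice.
stack = [Layer(3, 1, 1), Layer(3, 1, 1), Layer(2, 2, 0),
         Layer(3, 1, 1), Layer(3, 1, 1), Layer(2, 2, 0)]
print(receptive_field(stack[:2]))
print(receptive_field(stack))
print(crop_box(5, stack, image_size=64))
```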

  11. Reverse engineering CNN activations with deconvolution (transpose conv) + unpool switches

    The lecture shows how a convolution can be written as a matrix multiplication and motivates “deconvolution” as approximately reversing that operation, typically by multiplying by the transpose of the weight matrix (a transposed convolution). For full CNN inversion-style visualization, you keep max-pool switches for unpooling and use transpose convolutions to reconstruct what caused a peak activation.
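
    A minimal 1-D sketch of these pieces, assuming a hypothetical 3-tap kernel and 6-sample input for readability. `conv1d_matrix` builds the matrix C so that C @ x equals a "valid" cross-correlation (what deep-learning libraries call convolution); multiplying by C.T is the transposed convolution used to project activations back toward pixel space. The max-pool helpers record the argmax "switches" and reuse them to unpool.

```python
import numpy as np


def conv1d_matrix(kernel, n):
    """(n - k + 1) x n matrix C such that C @ x is a valid 1-D cross-correlation."""
    k = len(kernel)
    C = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        C[i, i:i + k] = kernel
    return C


kernel = np.array([1.0, -1.0, 2.0])
x = np.arange(6, dtype=float)
C = conv1d_matrix(kernel, len(x))
y = C @ x  # forward convolution
assert np.allclose(y, np.correlate(x, kernel, mode="valid"))

# Transposed convolution: maps the 4 outputs back onto the 6 input positions.
# It does not invert C in general; it only projects back to input space.
x_back = C.T @ y


def maxpool_with_switches(a, size=2):
    """1-D max-pool that also records where the max sat in each window."""
    windows = a[: len(a) // size * size].reshape(-1, size)
    return windows.max(axis=1), windows.argmax(axis=1)


def unpool(pooled, switches, size=2):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    out = np.zeros(len(pooled) * size)
    out[np.arange(len(pooled)) * size + switches] = pooled
    return out


pooled, sw = maxpool_with_switches(np.array([3.0, 1.0, 0.0, 5.0, 2.0, 2.0]))
print(pooled, sw, unpool(pooled, sw))
```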

  12. Putting it together: Zeiler & Fergus + Yosinski toolkits show feature complexity by depth

    Classic visualization results demonstrate that early layers learn edges and simple textures, while deeper layers capture higher-level parts and concepts (e.g., faces). The Yosinski interactive demo ties together optimization-based synthesis, dataset top-activations, and deconv-style reconstructions for neuron understanding.

  13. From CNNs to transformers: attention patterns and embedding geometry (and limits today)

    Kian contrasts CNN locality with transformer representations based on token relationships and meaning. Attention maps and embedding visualizations (e.g., t-SNE) provide partial interpretability, but full internal understanding remains hard for large modern models; Anthropic’s “circuits” and induction heads are referenced as leading research directions.
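
    A minimal sketch of the attention map a single head produces, softmax(Q K^T / sqrt(d)), using random hypothetical embeddings `X` and projection matrices `Wq`, `Wk`; in practice you would read Q and K from a specific layer and head of a trained model. No causal mask is applied, which matches an encoder rather than a decoder-only LLM.

```python
import numpy as np


def attention_weights(Q, K):
    """Row-wise softmax(Q @ K.T / sqrt(d)); row i shows where query token i attends."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(3)
tokens = ["the", "zebra", "ate", "grass"]
d_model, d_head = 8, 4
X = rng.normal(size=(len(tokens), d_model))  # hypothetical token embeddings
Wq = rng.normal(size=(d_model, d_head))  # hypothetical query projection
Wk = rng.normal(size=(d_model, d_head))  # hypothetical key projection
A = attention_weights(X @ Wq, X @ Wk)

# Print the attention map as a token-by-token table.
print("       " + " ".join(f"{t:>6}" for t in tokens))
for tok, row in zip(tokens, A):
    print(f"{tok:>6} " + " ".join(f"{w:6.2f}" for w in row))
```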

  14. Frontier-model health monitoring: loss curves, telemetry, and scaling laws (Chinchilla lesson)

    The lecture shifts to “training and scaling diagnostics,” focusing on monitoring loss behavior, gradients, learning rates, and compute utilization. Scaling laws are highlighted as a decision tool for whether to invest in more compute, more data, or larger models—illustrated by DeepMind’s Chinchilla critique that GPT-3 was undertrained.
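
    A back-of-the-envelope sketch of the Chinchilla lesson, assuming two widely used approximations rather than the paper's fitted scaling law: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and a compute-optimal ratio of roughly 20 tokens per parameter. GPT-3 was trained with about 175B parameters on about 300B tokens.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6  # common approximation: C ~ 6 * N * D
TOKENS_PER_PARAM = 20  # rough Chinchilla-style rule of thumb, not an exact law


def compute_optimal(compute_flops):
    """Split a compute budget into parameters N and tokens D with D = 20 * N."""
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params


def tokens_per_param(n_params, n_tokens):
    return n_tokens / n_params


gpt3_ratio = tokens_per_param(175e9, 300e9)
# Same compute as GPT-3, reallocated with the 20:1 heuristic.
budget = FLOPS_PER_PARAM_TOKEN * 175e9 * 300e9
n, d = compute_optimal(budget)
print(f"GPT-3 tokens/param: {gpt3_ratio:.1f}")
print(f"Same budget at ~20 tokens/param: {n / 1e9:.0f}B params, {d / 1e9:.0f}B tokens")
```

    Under these assumptions, GPT-3's budget points to a model of roughly 51B parameters trained on roughly 1T tokens, which is the shape of the "undertrained" argument.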

  15. Capability & safety evaluation: benchmarks, contamination detection, and agentic workflows

    Kian discusses how labs assess model capabilities (reasoning, coding, math) and safety (jailbreaks, leakage, harmful content), while warning that benchmarks can be contaminated. He outlines contamination detection via n-gram matching, hashes, and embedding similarity, and stresses evaluation beyond one-shot tasks—into tool-use and full agent workflows.
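
    A minimal sketch of the n-gram and hash parts of contamination detection: index hashed word n-grams from training documents, then score each benchmark item by the fraction of its n-grams found in the index. The n-gram length (8), the truncated SHA-1 digests, and the example texts are hypothetical choices; embedding similarity would need an embedding model and is not shown.

```python
import hashlib
import re


def ngrams(text, n=8):
    """Set of lowercased word n-grams; punctuation is dropped so light edits still match."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def hashed_ngrams(text, n=8):
    """Short SHA-1 digests of the n-grams, so a large training index stays compact."""
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in ngrams(text, n)}


def build_index(training_docs, n=8):
    """Union of hashed n-grams over all training documents."""
    index = set()
    for doc in training_docs:
        index |= hashed_ngrams(doc, n)
    return index


def contamination_score(item, index, n=8):
    """Fraction of a benchmark item's n-grams that also occur in the training index."""
    item_hashes = hashed_ngrams(item, n)
    if not item_hashes:
        return 0.0
    return len(item_hashes & index) / len(item_hashes)


# Hypothetical training documents and benchmark items.
training_docs = [
    "Question: A train leaves the station at 3 pm traveling 60 miles per hour. "
    "How far has it gone by 5 pm? Answer: 120 miles.",
    "Zebras graze on the savanna in the early morning light.",
]
index = build_index(training_docs)
leaked = "A train leaves the station at 3 pm traveling 60 miles per hour. How far has it gone by 5 pm?"
clean = "A cyclist rides at 15 kilometers per hour for two hours along a river path. How far does she travel?"
print(contamination_score(leaked, index), contamination_score(clean, index))
```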

  16. Data diagnostics: domain proportions, token drift, sampling strategies, and MoE routing health

    The final technical section focuses on dataset composition (e.g., The Pile), domain-specific losses, and monitoring token statistics for drift. Kian notes mixture-of-experts risks (router overusing a subset of experts) and emphasizes sampling/load-balancing tactics to keep training stable and capabilities broad.
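
    A minimal sketch of two of these monitors, with hypothetical tokens, vocabulary, and router outputs. Token drift is measured as the Jensen-Shannon divergence between a reference window and a new batch. For routing health, it reports the share of tokens each expert receives under top-1 routing, plus a load-balancing term in the style of the Switch Transformer auxiliary loss (number of experts times the sum of share x mean router probability), which equals 1.0 under uniform routing and grows as the router collapses onto few experts. This particular loss is an illustrative choice, not necessarily the one described in the lecture.

```python
from collections import Counter

import numpy as np


def token_distribution(tokens, vocab, eps=1e-6):
    """Smoothed unigram frequencies over a fixed vocabulary (eps avoids log(0))."""
    counts = Counter(tokens)
    p = np.array([counts[t] for t in vocab], dtype=float) + eps
    return p / p.sum()


def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats: symmetric and bounded by ln 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def expert_load_report(router_probs):
    """router_probs: (tokens, experts) router softmax outputs, top-1 routing assumed."""
    n_tokens, n_experts = router_probs.shape
    chosen = router_probs.argmax(axis=1)
    f = np.bincount(chosen, minlength=n_experts) / n_tokens  # share of tokens per expert
    P = router_probs.mean(axis=0)  # mean router probability per expert
    aux_loss = n_experts * float(np.sum(f * P))  # Switch-Transformer-style balance term
    return f, aux_loss


# Token drift: a code-heavy reference window vs. a prose-heavy new batch.
vocab = ["def", "return", "the", "and", "import"]
reference = ["def", "return", "import", "def", "return", "the"]
current = ["the", "and", "the", "and", "the", "def"]
p_ref = token_distribution(reference, vocab)
p_cur = token_distribution(current, vocab)
drift = js_divergence(p_ref, p_cur)

# Routing health: balanced vs. collapsed top-1 routers, 8 tokens and 4 experts.
n_experts = 4
balanced = np.full((8, n_experts), 0.1)
balanced[np.arange(8), np.arange(8) % n_experts] = 0.7
collapsed = np.full((8, n_experts), 0.1)
collapsed[:, 0] = 0.7
f_bal, aux_bal = expert_load_report(balanced)
f_col, aux_col = expert_load_report(collapsed)
print(f"drift={drift:.3f}  balanced aux={aux_bal:.2f}  collapsed aux={aux_col:.2f}")
```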

  17. Closing Q&A: synthetic data, data exhaustion, and feedback loops; course wrap-up

    In Q&A, Kian discusses tradeoffs in domain-focused training (e.g., coding models), how neighboring domains can help, and the limits/risks of synthetic data at scale. He also mentions concerns about data exhaustion and models training on AI-generated outputs, then closes with project encouragement and end-of-quarter remarks.
