Skip to content
Stanford OnlineStanford Online

Stanford CS230 | Autumn 2025 | Lecture 4: Adversarial Robustness and Generative Models

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai October 14, 2025 This lecture covers adversarial robustness and generative models. To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning To follow along with the course schedule and syllabus, visit: https://cs230.stanford.edu/syllabus/ More lectures will be published regularly. View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X Andrew Ng Founder of DeepLearning.AI Adjunct Professor, Stanford University’s Computer Science Department Kian Katanforoosh CEO and Founder of Workera Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforooshhost
Oct 21, 20251h 47mWatch on YouTube ↗

CHAPTERS

  1. Lecture roadmap: adversarial robustness + generative modeling (GANs and diffusion)

    Kian Katanforoosh opens the lecture by framing two modern, highly visual topics: adversarial robustness (attacks/defenses) and generative modeling. He previews that the second half will focus on GANs and diffusion models, clarifying that “adversarial” in GANs is unrelated to adversarial attacks.

  2. Attack landscape: three waves—adversarial examples, data poisoning/backdoors, prompt injection

    The class brainstorms real attack examples, then the instructor organizes the field into three historical waves. The discussion highlights how new agentic pipelines (tools, RAG, context) increase the number of attack surfaces beyond raw inputs.

  3. Forging targeted inputs: optimizing pixels to make a model predict “iguana”

    Students are guided to formulate adversarial example creation as an optimization problem over the input, not the network weights. The key shift is computing gradients with respect to pixels to directly search for an input that forces a target label.

  4. Why the optimized ‘iguana’ doesn’t look like an iguana: high-dimensional image space intuition

    The instructor explains that the set of all possible pixel configurations is astronomically larger than the manifold of natural images. Optimization can land in regions the model labels confidently (e.g., “iguana”) even though the image is meaningless to humans.

  5. Stealthier adversarial examples: keep the image looking like a cat but predict iguana

    The attack is made realistic by adding a constraint that the adversarial image stays close to a chosen clean image (e.g., a cat). This leads to imperceptible or small perturbations that preserve human perception while flipping the model’s label.

  6. Real-world demonstrations: adversarial misclassification and adversarial patches

    Several concrete examples show how small changes can cause big prediction shifts in deployed vision systems. The lecture also introduces adversarial patches that generalize across people, highlighting creativity in attack loss design (e.g., printability constraints).

  7. White-box vs black-box attacks and transferability

    The instructor defines attacker knowledge settings and explains how black-box attackers can still succeed. Attackers may train a surrogate model and exploit feature/common-gradient similarities across architectures, while defenders can rate-limit queries.

  8. Why neural nets are vulnerable: linearity + high-dimensional accumulation (logistic regression example)

    The lecture addresses the misconception that nonlinearity causes adversarial sensitivity. Using a logistic regression derivation, Katanforoosh shows how small, aligned perturbations across many dimensions compound into large logit shifts—explaining vulnerability at image-scale dimensionality.

  9. Fast Gradient Sign Method (FGSM): one-shot adversarial example generation

    FGSM is introduced as a practical, efficient way to craft adversarial inputs without iterative optimization. It perturbs every pixel in the direction that increases loss, scaled by a small epsilon to keep the change visually subtle.

  10. Defenses: sanitization, adversarial training, red teaming, and alignment methods

    Students propose defenses; the instructor consolidates them into a toolbox used in practice. Emphasis is placed on adversarial training as a dominant approach, plus operational practices like red teaming and post-training alignment.

  11. Backdoor/data poisoning attacks: trigger patches and mislabeled training examples

    The lecture explains how attackers can implant a ‘trigger’ during dataset creation/scraping so that specific patterns systematically flip predictions at deployment. This is framed as a major risk for web-scale training and model sharing ecosystems.

  12. Prompt injection and jailbreaks: direct vs indirect attacks in LLM applications

    Katanforoosh describes how prompt templates can be overridden by malicious instructions, and how agents using web/RAG tools introduce indirect prompt injection via retrieved content. Classic jailbreak-style examples illustrate how attackers reframe instructions to bypass refusals.

  13. Generative modeling overview: use cases and goal of matching data distributions

    The lecture transitions to generative AI, contrasting discriminative vs generative objectives. Use cases span text-to-image, super-resolution, inpainting, privacy-preserving synthetic data, multimodal systems, and code/audio/video generation—framed as learning the data distribution.

  14. GANs: generator–discriminator game, losses, and training stability tricks

    GAN training is presented as a two-network minimax setup: a generator maps random noise to images while a discriminator distinguishes real vs fake. The instructor derives typical discriminator/generator losses and explains why GANs are hard to train, including the saturating-loss problem and the non-saturating fix.

  15. GAN limitations and why diffusion won: mode collapse, evaluation issues, and latent arithmetic

    The lecture highlights GAN mode collapse, where the generator learns a narrow subset that fools the discriminator, and notes the broader challenge of evaluating generative quality (“vibes” vs metrics). It also introduces interesting GAN latent-space linearity (vector arithmetic) used for controllable edits in art workflows.

  16. Diffusion models: forward noising, training to predict noise, sampling via iterative denoising

    Diffusion is introduced as a single-model, stable alternative that avoids adversarial games. Training is self-supervised: images are progressively noised (forward process), and the model learns to predict the added noise so denoising can be reversed; inference starts from noise and iteratively denoises into an image.

  17. Latent diffusion + conditioning + video generation (Sora/Veo intuition)

    To reduce compute, diffusion is performed in a learned latent space via an autoencoder, then decoded back to pixels. The instructor explains conditioning (e.g., text prompts) to steer generation, and extends the idea to video by modeling spatiotemporal “cubes/tokens” that enforce frame-to-frame consistency.

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.