Dawn Song: Adversarial Machine Learning and Computer Security | Lex Fridman Podcast #95

Dawn Song is a professor of computer science at UC Berkeley with research interests in security, most recently with a focus on the intersection between computer security and machine learning.

Support this podcast by signing up with these sponsors:
- Cash App - use code "LexPodcast" and download:
- Cash App (App Store): https://apple.co/2sPrUHe
- Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:
Dawn's Twitter: https://twitter.com/dawnsongtweets
Dawn's Website: https://people.eecs.berkeley.edu/~dawnsong/
Oasis Labs: https://www.oasislabs.com
Oasis Labs Twitter: https://twitter.com/OasisLabs

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
1:53 - Will software always have security vulnerabilities?
9:06 - Humans are the weakest link in security
16:50 - Adversarial machine learning
51:27 - Adversarial attacks on Tesla Autopilot and self-driving cars
57:33 - Privacy attacks
1:05:47 - Ownership of data
1:22:13 - Blockchain and cryptocurrency
1:32:13 - Program synthesis
1:44:57 - A journey from physics to computer science
1:56:03 - US and China
1:58:19 - Transformative moment
2:00:02 - Meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host), Dawn Song (guest)
May 12, 2020 · 2h 12m

CHAPTERS

  1. 0:00 – 6:03

    Security vulnerabilities are inevitable: why formal verification helps (but doesn’t solve everything)

    Lex and Dawn open with the foundational question of whether software will always have vulnerabilities. Dawn explains the broad, evolving nature of attacks (memory safety, side channels) and how formal methods can prove certain properties while still leaving other attack surfaces open.

    • Why “no vulnerabilities” is hard to define because attacks evolve
    • Classic memory-safety exploits (buffer overflows, control-flow hijack)
    • Side channels and other non-traditional vulnerability classes
    • Formal verification: what it can prove and where it falls short
  2. 6:03 – 8:57

    Static vs dynamic techniques: verification, testing, fuzzing, and what guarantees mean

    The discussion drills into what it means to verify a program and whether you can do so by inspecting code alone. Dawn contrasts static verification with approaches that execute code (testing, fuzzing, model checking) and explains why static methods are needed for broader guarantees.

    • Static analysis/verification vs running the program
    • Why runtime testing only covers limited scenarios
    • How formal verification scales to real systems (kernels, compilers, filesystems)
    • Security assurance as proving absence of certain classes of attacks
  3. 8:57 – 12:31

    Humans as the weakest link: social engineering, phishing, deepfakes, and AI defenses

    Dawn argues attacks are moving “up the stack” from systems to people, making humans the primary vulnerability. They discuss phishing, fake news, and deepfakes, and how AI might help defend users rather than only hardening software.

    • Attack trends shifting from technical exploits to manipulating humans
    • Phishing and real-world examples of successful social engineering
    • Deepfakes/fake news as security threats targeting perception
    • Why “patching humans” is fundamentally difficult
  4. 12:31 – 16:47

    Chatbots as personal security agents: detecting scams and issuing challenges

    Dawn describes a vision of NLP-based assistants that monitor conversations and intervene when patterns look suspicious. The bot could issue challenge-response prompts to verify identities and even engage attackers to gather intelligence.

    • NLP/chatbot systems to monitor conversations for scam signals
    • Challenge-response to test identity claims (e.g., “relative in trouble” scams)
    • Potential for bots to interact with attackers to learn more
    • Deployment models: platform-provided vs user-controlled agents
  5. 16:47 – 20:48

    Adversarial machine learning fundamentals: inference-time perturbations and training-time poisoning

    The conversation shifts to adversarial ML, defining the attacker’s goal as forcing wrong model decisions. Dawn distinguishes inference-time adversarial examples from training-time poisoning/backdoors, including targeted attacks that produce specific attacker-chosen outputs.

    • Threat model: fooling ML systems into incorrect decisions
    • Inference-time adversarial examples (small, often imperceptible perturbations)
    • Targeted vs untargeted misclassification
    • Training-time data poisoning and backdoor triggers
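
The inference-time case can be illustrated with a minimal sketch. It uses the fast gradient sign method (FGSM), a standard technique from the literature that is swapped in here for illustration and is not confirmed as the method discussed in the episode. The logistic model, its weights, the input, and `epsilon` are all made-up values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic model (hypothetical)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Shift each feature by +/- epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # For cross-entropy loss on a logistic model, dLoss/dx = (p - y) * w.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Illustrative values only.
weights, bias = [2.0, -3.0, 1.0], 0.0
x = [0.3, 0.1, 0.1]
x_adv = fgsm_perturb(weights, bias, x, true_label=1, epsilon=0.1)

print(predict(weights, bias, x) > 0.5)      # clean input is classified as class 1
print(predict(weights, bias, x_adv) > 0.5)  # perturbed input is not
```

This is the untargeted variant: it only pushes the input away from the true label. A targeted attack would instead step toward a label chosen by the attacker.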
  6. 20:48 – 27:00

    Backdoor attacks in practice: facial recognition, ‘trigger’ objects, and stealthy poisoning

    Dawn explains how a small number of poisoned training points can implant a backdoor in a model. Using facial recognition, they discuss attacks where a trigger (like specific glasses) causes the system to label anyone wearing them as a chosen identity—tested even in the physical world.

    • Only a small fraction of poisoned samples can be sufficient
    • Stealth: model works normally except on attacker-triggered inputs
    • Physical triggers (glasses) that generalize across many faces
    • Difficulty of detecting poisoned data by human review or model inspection
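
The data-preparation side of such a backdoor can be sketched roughly as follows. This is a toy illustration under stated assumptions: the square corner patch stands in for a physical trigger such as glasses, and `stamp_trigger`, `poison_dataset`, the 5% fraction, and the label `7` are hypothetical. No model is trained here.

```python
import random

def stamp_trigger(image, trigger_value=1.0, size=2):
    """Return a copy of a 2D image with a small square patch in the bottom-right corner."""
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = trigger_value
    return stamped

def poison_dataset(dataset, target_label, fraction, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * fraction)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            poisoned.append((stamp_trigger(image), target_label))
        else:
            poisoned.append((image, label))  # untouched samples keep their labels
    return poisoned, chosen

# 100 blank 4x4 "images" with labels 0..2 (made-up data).
clean = [([[0.0] * 4 for _ in range(4)], i % 3) for i in range(100)]
poisoned, chosen = poison_dataset(clean, target_label=7, fraction=0.05)
print(len(chosen))  # number of poisoned samples
```

Because the remaining samples are unchanged, a model trained on `poisoned` would be expected to behave normally on clean inputs, which is what makes this kind of poisoning hard to spot.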
  7. 27:00 – 35:58

    Physical-world adversarial attacks: stop signs, robustness across angles/distances, and real constraints

    They discuss Dawn’s work on robust physical attacks—modifying traffic signs so vision systems misclassify them under varying viewpoints. Dawn details the added constraints of the physical world: limited perturbation locations, camera capture effects, printing artifacts, and environmental variation.

    • Why physical attacks are harder than digital ones (but still feasible)
    • Robustness requirements: angle, distance, lighting, motion
    • Constraints: perturbations must be on-object; sensor and printer pipelines matter
    • End-to-end process: optimize perturbations with physical constraints in the loss
  8. 35:58 – 40:46

    What adversarial examples reveal about deep learning—and why defenses are still patchy

    Lex asks what adversarial examples imply about neural networks; Dawn argues they show how limited our understanding of these models still is and that the learned representations are not yet the right ones. They note the field has many more attacks than defenses, and common defenses (like adversarial training) provide limited, non-general protection.

    • Adversarial examples as evidence we’re early in robust ML
    • Models learning spurious features vs human-meaningful structure
    • Defense landscape: many attacks, fewer robust general defenses
    • Adversarial training as partial mitigation, not a full solution
  9. 40:46 – 47:46

    Consistency-based defenses: spatial checks for segmentation, extending to audio/video

    Dawn presents a more promising defense direction: leveraging natural task constraints via consistency checks. In semantic segmentation, spatial consistency across overlapping patches can detect adversarial manipulation; similar temporal consistency ideas extend to speech/audio and potentially video.

    • Semantic segmentation vulnerabilities (arbitrary attacker-chosen patterns)
    • Spatial consistency: overlapping patches should agree on labels
    • Why random patching makes consistent attacks harder
    • Temporal consistency for audio and combined spatiotemporal checks for video
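
The spatial check can be sketched as an agreement score between two overlapping patches that are segmented independently. Everything below is an illustrative assumption: the `(top, left, height, width)` patch format, the helper names, and the two toy segmenters, which only exercise the check and are not real models or real attacks.

```python
def crop(image, top, left, height, width):
    return [row[left:left + width] for row in image[top:top + height]]

def overlap_agreement(segment, image, patch_a, patch_b):
    """Fraction of overlapping pixels given the same label in both patches."""
    (ta, la, ha, wa), (tb, lb, hb, wb) = patch_a, patch_b
    labels_a = segment(crop(image, ta, la, ha, wa))
    labels_b = segment(crop(image, tb, lb, hb, wb))
    top, bottom = max(ta, tb), min(ta + ha, tb + hb)
    left, right = max(la, lb), min(la + wa, lb + wb)
    if top >= bottom or left >= right:
        raise ValueError("patches do not overlap")
    agree = total = 0
    for r in range(top, bottom):
        for c in range(left, right):
            total += 1
            agree += labels_a[r - ta][c - la] == labels_b[r - tb][c - lb]
    return agree / total

def stable_segment(patch):
    """Toy segmenter: a fixed per-pixel threshold, independent of context."""
    return [[int(v > 0.5) for v in row] for row in patch]

def context_segment(patch):
    """Toy segmenter whose labels depend on the whole patch (threshold at patch mean)."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return [[int(v > mean) for v in row] for row in patch]

image = [[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] for _ in range(4)]
patch_a, patch_b = (0, 0, 4, 4), (0, 2, 4, 4)  # overlap on columns 2 and 3
print(overlap_agreement(stable_segment, image, patch_a, patch_b))   # 1.0
print(overlap_agreement(context_segment, image, patch_a, patch_b))  # 0.0
```

In a detector built on this idea, patches would be sampled at random and a low agreement score would flag the input as suspicious. The threshold for "low" is a design choice not specified in the source.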
  10. 47:46 – 51:24

    Real-world black-box attacks: model stealing and adversarial examples against cloud APIs

    Dawn pushes back on the idea that attacks are only lab demos by describing effective attacks on production services. They discuss stealing models via API queries (e.g., translation systems), training an imitation model, and transferring adversarial examples back to the original service.

    • Black-box threat: attacker only queries an API
    • Model extraction (imitation) with comparable performance
    • Transferability of adversarial examples from imitation to target
    • Examples in NLP (translation) and vision cloud APIs
  11. 51:24 – 57:33

    Autonomous driving risk: feasibility vs deployment, and multimodal sensor fusion as defense

    Lex probes whether attacks could realistically endanger systems like Tesla Autopilot. Dawn distinguishes feasibility from likelihood, argues misbehavior already happens even without attackers, and emphasizes multimodal sensing and cross-sensor consistency checks to raise attack difficulty.

    • Feasibility vs practicality of attacks in the wild
    • Natural generalization failures already cause dangerous mistakes
    • Targeted vs non-targeted misbehavior: both matter to safety
    • Defense direction: multimodal sensors (camera/radar/lidar/etc.) and consistency checks
  12. 57:33 – 1:01:52

    Privacy as confidentiality: how models leak training data and what attackers can do

    The conversation pivots from integrity to confidentiality, framing privacy attacks in ML as extracting information about training data. Dawn outlines white-box and query-only attack models, including membership inference and broader training-data reconstruction risks.

    • Security triad framing: integrity vs confidentiality (privacy)
    • Why high-capacity models can memorize sensitive training details
    • White-box attacks using access to parameters
    • Query-only attacks that infer training data without model internals
  13. 1:01:52 – 1:05:46

    Extracting secrets from language models—and defending with differential privacy

    Dawn describes work showing that attackers can extract SSNs and credit card numbers from a language model trained on sensitive emails using only queries. She then explains how differentially private training adds noise during optimization, which blocks such extraction attacks while keeping utility close to that of the non-private model.

    • Case study: language model trained on Enron emails
    • Query-based extraction of SSNs/credit card numbers
    • Differential privacy: noise/perturbations added during gradient-based training
    • Practical outcome: similar utility with significantly stronger privacy and failed attacks
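
One common way to add noise during gradient-based training is the DP-SGD recipe: clip each example's gradient, sum, add Gaussian noise, then average. The sketch below assumes that recipe, which the summary does not confirm as the exact method used. The function names, `max_norm`, and `noise_multiplier` are illustrative, and the sketch does no privacy accounting.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

# Made-up per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
update = private_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping bounds how much any single training record can move the model, and the noise masks what influence remains. This is the intuition for why memorized secrets become harder to extract.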
  14. 1:05:46 – 1:32:12

    Data ownership and the ‘responsible data economy’: incentives, control, and new infrastructure

    Lex and Dawn discuss the societal question of who should own data and how ownership parallels property rights in economic development. Dawn argues clear ownership enables consent-based usage and better utility/privacy tradeoffs; they connect this to blockchain-based auditing and privacy-preserving computation (Oasis Labs).

    • Property rights analogy: ownership + enforcement as an economic driver
    • Data increasingly defines identity and value in the digital world
    • User choice models: free-with-data vs paid private services
    • Oasis Labs vision: confidential transactions/smart contracts, zero-knowledge proofs, secure computing, and immutable policy logs
  15. 1:32:12 – 1:44:54

    Program synthesis as an AGI ‘playground’: learning to write code and measuring progress

    The discussion turns to program synthesis—computers generating code from specifications or data—framed as a core path toward more general intelligence. Dawn outlines where it works today (narrow domains like NL-to-SQL) and identifies key metrics: program/task complexity, generalization, and adaptation to new tasks.

    • Program synthesis: teaching computers to write programs
    • Real-world traction in constrained domains (NL → SQL, IFTTT)
    • Progress metrics: complexity, generalization, and task adaptation
    • Learning recursion and its role in stronger generalization
  16. 1:44:54 – 2:12:36

    From physics to computer science, US–China collaboration, and the meaning of life

    Dawn reflects on her transition from studying physics in China to computer science in the US, contrasting elegance in physics with human-made complexity in CS and the joy of quickly realizing ideas. The conversation then touches on academic openness across borders and closes with a philosophical exchange on meaning, creation, and growth.

    • Why she switched from physics (Cornell) to CS (CMU) and what she kept from physics training
    • Cultural transition: earlier-era differences in access, language, and information
    • Science as borderless via open publication and code sharing
    • Meaning of life: self-defined purpose, creation, growth, and the value (and trap) of the question
