Lex Fridman Podcast
Dawn Song: Adversarial Machine Learning and Computer Security | Lex Fridman Podcast #95
CHAPTERS
- 0:00 – 6:03
Security vulnerabilities are inevitable: why formal verification helps (but doesn’t solve everything)
Lex and Dawn open with the foundational question of whether software will always have vulnerabilities. Dawn explains the broad, evolving nature of attacks (memory safety, side channels) and how formal methods can prove certain properties while still leaving other attack surfaces open.
  - Why “no vulnerabilities” is hard to define when attacks keep evolving
  - Classic memory-safety exploits (buffer overflows, control-flow hijacking)
  - Side channels and other non-traditional vulnerability classes
  - Formal verification: what it can prove and where it falls short
- 6:03 – 8:57
Static vs dynamic techniques: verification, testing, fuzzing, and what guarantees mean
The discussion drills into what it means to verify a program and whether you can do so by inspecting code alone. Dawn contrasts static verification with approaches that execute code (testing, fuzzing, model checking) and explains why static methods are needed for broader guarantees.
  - Static analysis/verification vs. running the program
  - Why runtime testing covers only limited scenarios
  - How formal verification scales to real systems (kernels, compilers, file systems)
  - Security assurance as proving the absence of certain classes of attacks
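The gap between running a program and verifying it can be illustrated with a tiny random fuzzer. This is a minimal sketch; `check_header`, `fuzz`, and the planted bug are hypothetical, chosen so that only inputs starting with the two bytes `0xDE 0xAD` reach the faulty path. A random input of two or more bytes hits that prefix with probability 1/65,536, so a fuzzing run that reports no crashes says little about the inputs it never tried, whereas a static proof covers every input.

```python
import random

def check_header(data: bytes) -> int:
    """Hypothetical parser with a planted bug on one rare path."""
    if data[:2] == b"\xde\xad":
        # Bug: reads a fixed offset without a length check. In Python this
        # raises IndexError; in C it would be an out-of-bounds read.
        return data[10]
    return len(data)

def fuzz(target, trials: int, seed: int = 0):
    """Feed random 1-8 byte inputs to target and collect the ones that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 9)))
        try:
            target(data)
        except IndexError:
            crashes.append(data)
    return crashes
```

Whether a given run finds the bug depends on luck, which is exactly the limitation of dynamic testing discussed in this chapter.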
- 8:57 – 12:31
Humans as the weakest link: social engineering, phishing, deepfakes, and AI defenses
Dawn argues attacks are moving 'up the stack' from systems to people, making humans the primary vulnerability. They discuss phishing, fake news, and deepfakes, and how AI might help defend users rather than only hardening software.
  - Attack trends shifting from technical exploits to manipulating humans
  - Phishing and real-world examples of successful social engineering
  - Deepfakes and fake news as security threats that target perception
  - Why “patching humans” is fundamentally difficult
- 12:31 – 16:47
Chatbots as personal security agents: detecting scams and issuing challenges
Dawn describes a vision of NLP-based assistants that monitor conversations and intervene when patterns look suspicious. The bot could issue challenge-response prompts to verify identities and even engage attackers to gather intelligence.
  - NLP/chatbot systems that monitor conversations for scam signals
  - Challenge-response questions to test identity claims (e.g., “relative in trouble” scams)
  - Potential for bots to engage attackers and learn more about them
  - Deployment models: platform-provided vs. user-controlled agents
- 16:47 – 20:48
Adversarial machine learning fundamentals: inference-time perturbations and training-time poisoning
The conversation shifts to adversarial ML, defining the attacker’s goal as forcing wrong model decisions. Dawn distinguishes inference-time adversarial examples from training-time poisoning/backdoors, including targeted attacks that produce specific attacker-chosen outputs.
  - Threat model: fooling ML systems into making incorrect decisions
  - Inference-time adversarial examples (small, often imperceptible perturbations)
  - Targeted vs. untargeted misclassification
  - Training-time data poisoning and backdoor triggers
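To make inference-time perturbations and the targeted/untargeted distinction concrete, here is a minimal sketch on a hypothetical three-class linear model with made-up weights. For a linear model the gradient of the margin `score[target] - score[current]` with respect to the input is simply `W[target] - W[current]`, so one step of size `eps` along its sign is the fast-gradient-sign idea specialized to this toy; attacks on deep networks obtain the gradient by backpropagation instead. The `untargeted_step` rule (aim at the strongest competing class) is one simple heuristic, not the only way to define an untargeted attack.

```python
def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def scores(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, x):
    s = scores(W, x)
    return max(range(len(s)), key=s.__getitem__)

def targeted_step(W, x, target, eps):
    """One signed-gradient step that raises score[target] - score[current].

    Each feature moves by at most eps (an L-infinity budget)."""
    current = predict(W, x)
    grad = [wt - wc for wt, wc in zip(W[target], W[current])]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def untargeted_step(W, x, eps):
    """Heuristic: push toward whichever other class currently scores highest."""
    s = scores(W, x)
    current = predict(W, x)
    runner_up = max((c for c in range(len(s)) if c != current), key=s.__getitem__)
    return targeted_step(W, x, runner_up, eps)

W = [[1.0, -1.0, 0.5, 0.0],   # hypothetical weights for classes 0, 1, 2
     [-0.5, 1.0, 0.0, 0.5],
     [0.0, 0.0, -1.0, 1.0]]
x = [1.0, 0.0, 0.5, 0.2]

print(predict(W, x))                            # 0
print(predict(W, targeted_step(W, x, 2, 0.4)))  # 2
```

With a smaller budget such as `eps=0.1` the same step leaves the prediction unchanged, which shows how the attack trades perturbation size against success.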
- 20:48 – 27:00
Backdoor attacks in practice: facial recognition, ‘trigger’ objects, and stealthy poisoning
Dawn explains how a small number of poisoned training points can implant a backdoor in a model. Using facial recognition as the example, they discuss an attack in which a trigger (such as a specific pair of glasses) causes the system to label anyone wearing it as an attacker-chosen identity, an attack that was also tested in the physical world.
  - A small fraction of poisoned samples can be sufficient
  - Stealth: the model behaves normally except on attacker-triggered inputs
  - Physical triggers (glasses) that work across many different faces
  - Difficulty of detecting poisoned data through human review or model inspection
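The poisoning step itself is simple to express. This sketch assumes images are flat lists of pixel intensities and uses a fixed set of pixel positions as a stand-in for a physical trigger such as the glasses; the function names, the 2% poisoning rate, and the data are all hypothetical, and training a model on the result is not shown. The intended effect is that a model trained on `poisoned` sees the trigger consistently paired with `target_label`, while the clean majority of samples preserves normal behavior.

```python
import random

def stamp_trigger(image, trigger_pixels, value=1.0):
    """Return a copy of the image with the trigger pattern written in."""
    stamped = list(image)
    for i in trigger_pixels:
        stamped[i] = value
    return stamped

def poison_dataset(dataset, target_label, trigger_pixels, fraction, seed=0):
    """Stamp the trigger onto a random `fraction` of samples and relabel them."""
    rng = random.Random(seed)
    n_poison = max(1, int(len(dataset) * fraction))
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = [
        (stamp_trigger(img, trigger_pixels), target_label) if i in chosen else (img, label)
        for i, (img, label) in enumerate(dataset)
    ]
    return poisoned, chosen

# Hypothetical clean data: 200 all-zero 4x4 "images" spread over 5 identities.
clean = [([0.0] * 16, i % 5) for i in range(200)]
TRIGGER = [0, 1, 2, 3]  # top row of pixels, standing in for the glasses
poisoned, chosen = poison_dataset(clean, target_label=4, trigger_pixels=TRIGGER, fraction=0.02)
```

Only 4 of the 200 samples change, which is why spotting them by inspecting the dataset is hard.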
- 27:00 – 35:58
Physical-world adversarial attacks: stop signs, robustness across angles/distances, and real constraints
They discuss Dawn’s work on robust physical attacks—modifying traffic signs so vision systems misclassify them under varying viewpoints. Dawn details the added constraints of the physical world: limited perturbation locations, camera capture effects, printing artifacts, and environmental variation.
  - Why physical attacks are harder than digital ones (but still feasible)
  - Robustness requirements: angle, distance, lighting, motion
  - Constraints: perturbations must lie on the object itself; the camera and printer pipelines matter
  - End-to-end process: optimize perturbations with physical constraints built into the loss
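Building physical variation into the optimization can be sketched in the spirit of Expectation over Transformation: average the gradient over randomly sampled transformations, and update only the pixels a sticker may cover. Everything here is a hypothetical toy: a linear "stop sign" score over six pixels, brightness scaling plus sensor noise as the only transformations, and clipping to [0, 1] as the only printability constraint. The work discussed in the episode accounts for more effects, such as viewpoint, distance, camera capture, and printing artifacts.

```python
import random

W = [0.5, 0.5, 0.5, 0.5, -1.0, -1.0]   # hypothetical linear "stop sign" detector
B = -0.2
SIGN = [0.6, 0.6, 0.6, 0.6, 0.1, 0.1]  # clean image; score > 0 means "stop sign"
MASK = [0, 0, 0, 0, 1, 1]              # only the last two pixels may carry a sticker

def score(x):
    return sum(w * xi for w, xi in zip(W, x)) + B

def camera_view(x, rng):
    """Hypothetical physical pipeline: random brightness scale plus sensor noise."""
    s = rng.uniform(0.7, 1.3)
    return [s * xi + rng.gauss(0.0, 0.02) for xi in x]

def eot_attack(x, steps=100, lr=0.05, samples=16, seed=0):
    rng = random.Random(seed)
    adv = list(x)
    for _ in range(steps):
        # d score(s*z + noise) / dz = s * W, averaged over sampled scales s.
        mean_s = sum(rng.uniform(0.7, 1.3) for _ in range(samples)) / samples
        for i, allowed in enumerate(MASK):
            if allowed:
                # Descend on the "stop" score, then clip so the sticker stays printable.
                adv[i] = min(1.0, max(0.0, adv[i] - lr * mean_s * W[i]))
    return adv

def fooling_rate(x, trials=1000, seed=1):
    """Fraction of simulated camera views classified as "not a stop sign"."""
    rng = random.Random(seed)
    return sum(score(camera_view(x, rng)) < 0 for _ in range(trials)) / trials
```

Because this toy model is linear, the sampled scales only rescale the gradient; with a deep network each sampled transformation changes the gradient direction, which is what makes the averaging matter.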
- 35:58 – 40:46
What adversarial examples reveal about deep learning—and why defenses are still patchy
Lex asks what adversarial examples imply about neural networks. Dawn argues they show how limited our understanding still is and that models have not yet learned the ‘right’ representations. They note that the field has far more attacks than defenses, and that common defenses such as adversarial training offer only limited protection that does not generalize.
  - Adversarial examples as evidence that robust ML is still at an early stage
  - Models learning spurious features rather than human-meaningful structure
  - Defense landscape: many attacks, few robust general defenses
  - Adversarial training as a partial mitigation, not a full solution
- 40:46 – 47:46
Consistency-based defenses: spatial checks for segmentation, extending to audio/video
Dawn presents a more promising defense direction: leveraging natural task constraints via consistency checks. In semantic segmentation, spatial consistency across overlapping patches can detect adversarial manipulation; similar temporal consistency ideas extend to speech/audio and potentially video.
  - Semantic segmentation vulnerabilities (attackers can force arbitrary, attacker-chosen output patterns)
  - Spatial consistency: overlapping patches should agree on the labels in their shared region
  - Why randomly chosen patches make it hard for an attack to stay consistent
  - Temporal consistency for audio and combined spatiotemporal checks for video
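A minimal sketch of the spatial-consistency check, simplified to a one-dimensional "image" so it stays short; the approach discussed in the episode works on 2D image patches. `segment` stands for any model that labels every pixel of a patch. The two segmenters are hypothetical: `pixelwise` labels each pixel independently, while `context_sensitive` merely simulates a model whose per-pixel output flips depending on what else is in view, as an attacked model's output might.

```python
import random

def overlap_agreement(segment, image, patch, trials=100, seed=0):
    """Average label agreement on the overlap of random overlapping patch pairs."""
    rng = random.Random(seed)
    n = len(image)
    total = 0.0
    for _ in range(trials):
        a = rng.randrange(0, n - patch + 1)
        # Pick a second window that overlaps [a, a + patch) by at least one pixel.
        b = rng.randint(max(0, a - patch + 1), min(n - patch, a + patch - 1))
        la = segment(image[a:a + patch])
        lb = segment(image[b:b + patch])
        start, end = max(a, b), min(a, b) + patch
        same = sum(la[i - a] == lb[i - b] for i in range(start, end))
        total += same / (end - start)
    return total / trials

def pixelwise(patch):
    return [1 if v > 0.5 else 0 for v in patch]

def context_sensitive(patch):
    # Simulated inconsistent model: labels flip when the patch is mostly bright.
    flip = sum(patch) > 0.5 * len(patch)
    return [(1 if v > 0.5 else 0) ^ flip for v in patch]

image = [0.9] * 10 + [0.1] * 10
print(overlap_agreement(pixelwise, image, patch=6))  # 1.0
```

A detector built on this would flag inputs whose agreement falls below a threshold calibrated on benign data. Because patch positions are drawn at random, an attacker cannot easily tune one perturbation to stay consistent for every possible pair.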
- 47:46 – 51:24
Real-world black-box attacks: model stealing and adversarial examples against cloud APIs
Dawn pushes back on the idea that attacks are only lab demos by describing effective attacks on production services. They discuss stealing models via API queries (e.g., translation systems), training an imitation model, and transferring adversarial examples back to the original service.
  - Black-box threat model: the attacker can only query an API
  - Model extraction: training an imitation model with comparable performance
  - Transferability of adversarial examples from the imitation model to the target
  - Examples in NLP (translation) and cloud vision APIs
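The extract-then-transfer pipeline can be sketched end to end on a toy problem. The "cloud API" here is a hypothetical two-feature linear classifier rather than a translation or vision service, the imitation model is a plain perceptron, and all names and numbers are illustrative assumptions. The attacker never reads the secret weights: they only call `victim_api`, fit an imitation to its answers, craft a perturbation against the imitation, and check whether it transfers.

```python
import random

_SECRET_W, _SECRET_B = (2.0, -1.0), 0.3  # hidden inside the hypothetical service

def victim_api(x):
    """Hypothetical black-box classifier: callers see only the returned label."""
    return int(_SECRET_W[0] * x[0] + _SECRET_W[1] * x[1] + _SECRET_B > 0)

def linear_label(w, b, x):
    return int(w[0] * x[0] + w[1] * x[1] + b > 0)

def steal(api, n_queries=2000, epochs=50, seed=0):
    """Query the API on random inputs, then fit a perceptron to its answers."""
    rng = random.Random(seed)
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_queries)]
    ys = [api(x) for x in xs]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - linear_label(w, b, x)  # +1, 0, or -1
            if err:
                w[0] += err * x[0]
                w[1] += err * x[1]
                b += err
    return w, b

def agreement(w, b, api, n=2000, seed=1):
    """Fraction of fresh random inputs on which imitation and victim agree."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return sum(linear_label(w, b, p) == api(p) for p in pts) / n

def transfer_attack(w, b, x, eps):
    """Signed step against the imitation's decision; the victim is never consulted."""
    step = -eps if linear_label(w, b, x) else eps
    return tuple(xi + step * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w))

w_hat, b_hat = steal(victim_api)
adv = transfer_attack(w_hat, b_hat, (0.5, 0.5), eps=0.4)
```

The final check, `victim_api(adv)`, is the transfer test: a perturbation computed only from the stolen model is submitted to the original service.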
- 51:24 – 57:33
Autonomous driving risk: feasibility vs deployment, and multimodal sensor fusion as defense
Lex probes whether attacks could realistically endanger systems like Tesla Autopilot. Dawn distinguishes feasibility from likelihood, argues misbehavior already happens even without attackers, and emphasizes multimodal sensing and cross-sensor consistency checks to raise attack difficulty.
  - Feasibility vs. practicality of attacks in the wild
  - Natural generalization failures already cause dangerous mistakes
  - Targeted vs. non-targeted misbehavior: both matter for safety
  - Defense direction: multimodal sensors (camera, radar, lidar, etc.) and cross-sensor consistency checks
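Cross-sensor consistency can be illustrated with a deliberately simple rule: take a majority vote on whether an obstacle is present, then flag any sensor whose range estimate strays from the median. The `Reading` type, the sensor names, and the 2 m tolerance are hypothetical; real sensor fusion is probabilistic and far more involved. The point is that an attacker who fools only the camera now produces a visible disagreement instead of a silent error.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Reading:
    sensor: str
    obstacle: bool
    distance_m: Optional[float] = None  # None when no obstacle is reported

def cross_check(readings: List[Reading], tol_m: float = 2.0) -> Tuple[bool, List[str]]:
    """Majority-vote obstacle presence and list the sensors that disagree."""
    consensus = 2 * sum(r.obstacle for r in readings) > len(readings)
    conflicts = [r.sensor for r in readings if r.obstacle != consensus]
    if consensus:
        ranged = [r for r in readings if r.obstacle and r.distance_m is not None]
        if ranged:
            mid = statistics.median([r.distance_m for r in ranged])
            conflicts += [r.sensor for r in ranged if abs(r.distance_m - mid) > tol_m]
    return consensus, conflicts

# A camera fooled into missing the obstacle is outvoted and flagged.
print(cross_check([Reading("camera", False),
                   Reading("lidar", True, 20.0),
                   Reading("radar", True, 21.0)]))  # (True, ['camera'])
```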
- 57:33 – 1:01:52
Privacy as confidentiality: how models leak training data and what attackers can do
The conversation pivots from integrity to confidentiality, framing privacy attacks in ML as extracting information about training data. Dawn outlines white-box and query-only attack models, including membership inference and broader training-data reconstruction risks.
  - Security-triad framing: integrity vs. confidentiality (privacy)
  - Why high-capacity models can memorize sensitive training details
  - White-box attacks that use access to model parameters
  - Query-only attacks that infer training data without access to model internals
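The simplest query-only membership test thresholds the model's confidence, since overfit, high-capacity models tend to be more confident on records they were trained on. This sketch uses a hypothetical one-dimensional "model" that has memorized its training points outright, so the gap is exaggerated. Practical attacks (for example, ones built on shadow models) are more elaborate, and the 0.9 threshold is an arbitrary assumption.

```python
import math

def make_overfit_model(train):
    """Hypothetical high-capacity model: confidence decays with distance to the
    nearest memorized training point and equals 1.0 on the points themselves."""
    def confidence(x):
        d = min(abs(x - t) for t in train)
        return math.exp(-10.0 * d)
    return confidence

def membership_guess(confidence, x, threshold=0.9):
    """Query-only attack: guess "member" when the model is unusually confident."""
    return confidence(x) >= threshold

members = [0.1, 0.35, 0.6, 0.85]     # the (secret) training set
non_members = [0.2, 0.5, 0.75, 0.95]
model = make_overfit_model(members)
```

The attacker only calls `model`; the membership signal comes entirely from the confidence gap between seen and unseen records.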
- 1:01:52 – 1:05:46
Extracting secrets from language models—and defending with differential privacy
Dawn describes work showing that attackers using only queries can extract SSNs and credit card numbers from a language model trained on sensitive emails. She then explains how differentially private training adds noise during optimization, which prevents such extraction attacks while largely preserving utility.
  - Case study: a language model trained on the Enron email dataset
  - Query-based extraction of SSNs and credit card numbers
  - Differential privacy: noise added during gradient-based training
  - Practical outcome: similar utility, much stronger privacy, and extraction attacks that no longer succeed
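One widely used way to add noise during gradient-based training is DP-SGD: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, then average. The episode does not spell out the exact mechanism, so treat this as a generic sketch of DP-SGD for squared-error linear regression. `clip_norm` and `noise_multiplier` are the usual knobs; the privacy accounting that turns them into an (ε, δ) guarantee is not shown.

```python
import math
import random

def clip(g, c):
    """Scale gradient g down so its L2 norm is at most c."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [v * scale for v in g]

def dp_sgd_step(w, batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step for the loss (w . x - y)^2 on a batch of (x, y) pairs."""
    total = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        g = clip([2 * err * xi for xi in x], clip_norm)  # bound each example's influence
        total = [t + gi for t, gi in zip(total, g)]
    # Gaussian noise calibrated to the clipping norm hides any single example.
    noisy = [t + rng.gauss(0.0, noise_multiplier * clip_norm) for t in total]
    return [wi - lr * n / len(batch) for wi, n in zip(w, noisy)]
```

Clipping bounds how much any single record (such as one email containing an SSN) can move the weights, and the noise masks what remains; that combination is what makes memorized secrets hard to extract.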
- 1:05:46 – 1:32:12
Data ownership and the ‘responsible data economy’: incentives, control, and new infrastructure
Lex and Dawn discuss the societal question of who should own data and how ownership parallels property rights in economic development. Dawn argues clear ownership enables consent-based usage and better utility/privacy tradeoffs; they connect this to blockchain-based auditing and privacy-preserving computation (Oasis Labs).
  - Property-rights analogy: ownership plus enforcement as a driver of economic development
  - Data increasingly defines identity and value in the digital world
  - User choice models: free services paid for with data vs. paid private services
  - Oasis Labs vision: confidential transactions and smart contracts, zero-knowledge proofs, secure computing, and immutable policy logs
- 1:32:12 – 1:44:54
Program synthesis as an AGI ‘playground’: learning to write code and measuring progress
The discussion turns to program synthesis—computers generating code from specifications or data—framed as a core path toward more general intelligence. Dawn outlines where it works today (narrow domains like NL-to-SQL) and identifies key metrics: program/task complexity, generalization, and adaptation to new tasks.
  - Program synthesis: teaching computers to write programs
  - Real-world traction in constrained domains (NL → SQL, IFTTT)
  - Progress metrics: complexity, generalization, and task adaptation
  - Learning recursion and its role in stronger generalization
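To ground what "generating code from data" means, here is a minimal enumerative programming-by-example synthesizer over a made-up arithmetic DSL (the variable `x`, the constants 1 and 2, and `+`, `-`, `*`). It is a classical search baseline, not the learned approach discussed in the episode; neural program synthesis aims to guide or replace this kind of blind enumeration, which grows explosively with program depth.

```python
from itertools import product

LEAVES = [("x", lambda x: x), ("1", lambda x: 1), ("2", lambda x: 2)]
OPS = [("+", lambda a, b: a + b), ("-", lambda a, b: a - b), ("*", lambda a, b: a * b)]

def combine(op_fn, fa, fb):
    """Build a new program from two subprograms (avoids late-binding closures)."""
    return lambda x: op_fn(fa(x), fb(x))

def synthesize(examples, max_depth=2):
    """Return the source of the first enumerated expression that matches
    every (input, output) example, or None if none exists up to max_depth."""
    programs = list(LEAVES)
    for depth in range(max_depth + 1):
        for src, fn in programs:
            if all(fn(i) == o for i, o in examples):
                return src
        if depth < max_depth:
            programs = programs + [
                (f"({sa} {op} {sb})", combine(op_fn, fa, fb))
                for op, op_fn in OPS
                for (sa, fa), (sb, fb) in product(programs, repeat=2)
            ]
    return None

print(synthesize([(2, 4), (3, 9)]))  # (x * x)
```

Measured against the metrics above, this search handles only tiny programs, cannot express recursion, and does not adapt across tasks.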
- 1:44:54 – 2:12:36
From physics to computer science, US–China collaboration, and the meaning of life
Dawn reflects on her transition from studying physics in China to computer science in the US, contrasting elegance in physics with human-made complexity in CS and the joy of quickly realizing ideas. The conversation then touches on academic openness across borders and closes with a philosophical exchange on meaning, creation, and growth.
  - Why she switched from physics (Cornell) to CS (CMU) and what she kept from her physics training
  - Cultural transition: differences in language and in access to information in an earlier era
  - Science as borderless through open publication and code sharing
  - Meaning of life: self-defined purpose, creation, growth, and the value (and trap) of asking the question