Dawn Song: Adversarial Machine Learning and Computer Security | Lex Fridman Podcast #95
At a glance
WHAT IT’S REALLY ABOUT
Dawn Song on hacking AI: vulnerabilities, defenses, and data ownership
- Lex Fridman and Dawn Song explore computer security across classic software bugs, human-focused social engineering, and emerging attacks on machine learning systems.
- They discuss adversarial machine learning in depth: how models can be fooled at inference and training time, including physical-world attacks on stop signs, backdoored facial recognition, and black-box attacks on real services like Google Translate.
- Song explains parallel privacy risks, showing how trained models can leak sensitive training data and how techniques like differential privacy and confidential computation can mitigate this.
- The conversation broadens to data ownership, blockchain-based responsible data economies, program synthesis as a path toward intelligent machines, and philosophical reflections on meaning, creativity, and scientific collaboration.
IDEAS WORTH REMEMBERING
5 ideas
Security vulnerabilities are unavoidable, but their impact can be reduced.
Formal verification and program analysis can prove specific properties (like memory safety) for real systems such as kernels and crypto libraries, yet the vast and evolving space of attack types means no complex real-world system can be guaranteed 100% secure.
The security weak point is increasingly human, not just code.
As systems harden, attackers shift “up the stack” to exploit people through phishing, social engineering, fake news, and deepfakes; AI-powered chatbots could act as user-side guardians that monitor conversations, challenge suspicious claims, and even interrogate attackers.
Machine learning models can be systematically fooled and backdoored.
Adversarial examples with tiny input perturbations can force misclassification, including robust physical attacks (e.g., perturbed stop signs) and training-time backdoors where a few poisoned samples cause models to behave normally except on special triggers like particular glasses frames.
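The "tiny perturbation" idea can be made concrete with the fast gradient sign method (FGSM) on a toy linear classifier — a minimal sketch, not any specific attack from the episode; the weights and input here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)                  # toy linear classifier weights
x = rng.normal(size=100)                  # a clean input
y = np.sign(w @ x)                        # its current predicted label

# FGSM-style step: move every feature by eps in the direction that
# reduces the margin y * (w @ x). For this linear model the input-
# gradient of the margin is y * w, so the attack direction is
# sign(-y * w). Pick eps just large enough to cross the boundary.
eps = (abs(w @ x) + 1e-3) / np.abs(w).sum()
x_adv = x + eps * np.sign(-y * w)

print(eps)                          # the per-feature budget stays tiny
print(np.sign(w @ x_adv) == y)      # prediction flips: prints False
```

Because the change is spread across all features, each individual feature moves by only `eps`, which is why such perturbations can be imperceptible while still flipping the prediction.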
Defenses work best when they exploit natural structure and redundancy.
Checks like spatial consistency in images (overlapping patches should yield similar segmentations) and temporal consistency in audio/video make life hard for attackers, and combining multiple sensors or modalities (vision, LIDAR, radar) further raises the bar for successful attacks.
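The multi-sensor point can be sketched as a simple majority vote — an illustrative toy with hypothetical label strings, not a real fusion pipeline:

```python
from collections import Counter

def fused_prediction(votes):
    """Majority vote across independent sensor channels. An attacker
    who fools only one modality (say, the camera) still loses the
    vote to the untouched LIDAR and radar channels."""
    return Counter(votes).most_common(1)[0][0]

# Camera fooled by an adversarial sticker; LIDAR and radar still
# perceive the true geometry of the sign.
votes = ["speed_limit", "stop_sign", "stop_sign"]
print(fused_prediction(votes))   # prints "stop_sign"
```

The same logic explains why consistency checks help: to succeed, the attacker must now craft one perturbation that simultaneously fools several loosely coupled views of the scene.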
Trained models can leak private training data unless designed otherwise.
Even without model internals, attackers can query language models trained on sensitive emails and recover actual Social Security or credit card numbers; training with differential privacy adds controlled noise in learning so models retain utility while sharply reducing such leakage.
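The noise-for-privacy trade-off is easiest to see in the simplest differentially private primitive, the Laplace mechanism on a count query — the episode's context is noise added during training (e.g. DP-SGD), but the principle is the same; the salary data below is synthetic:

```python
import numpy as np

def dp_count(values, predicate, epsilon, rng):
    """Laplace mechanism: a count query has sensitivity 1 (adding or
    removing one person changes the count by at most 1), so adding
    Laplace(1/epsilon) noise makes this query epsilon-differentially
    private."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(0)
salaries = rng.normal(100_000, 20_000, size=10_000)   # synthetic data

# "How many people earn over 120k?" released with epsilon = 0.5.
noisy = dp_count(salaries, lambda s: s > 120_000, epsilon=0.5, rng=rng)
exact = int((salaries > 120_000).sum())
print(exact, round(noisy, 1))   # noisy answer is close but not exact
```

The noise scale grows as epsilon shrinks, so a stronger privacy guarantee costs more accuracy — the same utility/leakage dial Song describes for differentially private training.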
WORDS WORTH SAVING
5 quotes
Security is job security.
— Dawn Song
We are still at a very early stage of really developing robust and generalizable machine learning methods.
— Dawn Song
It’s almost impossible to say that a real world system is 100% no security vulnerabilities.
— Dawn Song
The weakest link of the system is oftentimes humans themselves.
— Dawn Song
Once we teach computers to write software—to write programs—then I guess computers will be eating the world by transitivity.
— Dawn Song
High quality AI-generated summary created from speaker-labeled transcript.