Sander Schulhoff: Why AI guardrails fail every red team test

How prompt injection and jailbreaks bypass state-of-the-art guardrails; agents that send emails or touch databases turn every bypass into real damage.

Sander Schulhoff (guest), Lenny Rachitsky (host)

Dec 21, 2025 | 1h 32m

EPISODE INFO

Released: December 21, 2025
Duration: 1h 32m
Channel: Lenny's Podcast

EPISODE DESCRIPTION

Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs and companies. His dataset is now used by Fortune 500 companies to benchmark the security of their AI systems. He has spent more time than anyone alive studying how attackers break AI systems, and what he has found isn’t reassuring: the guardrails companies are buying don’t actually work, and the only reason we haven’t seen more harm so far is that AI agents aren’t yet capable enough to do real damage.
*We discuss:*
The difference between jailbreaking and prompt injection attacks on AI systems
Why AI guardrails don’t work
Why we haven’t seen major AI security incidents yet (but soon will)
Why AI browser agents are vulnerable to hidden attacks embedded in webpages
The practical steps organizations should take instead of buying ineffective security tools
Why solving this requires merging classical cybersecurity expertise with AI knowledge
*Brought to you by:*
Datadog: Now home to Eppo, the leading experimentation and feature flagging platform: https://www.datadoghq.com/lenny
Metronome: Monetization infrastructure for modern software companies: https://metronome.com/
GoFundMe Giving Funds: Make year-end giving easy: http://gofundme.com/lenny
*Transcript:* https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis
*My biggest takeaways (for paid newsletter subscribers):* https://www.lennysnewsletter.com/i/181089452/my-biggest-takeaways-from-this-conversation
*Where to find Sander Schulhoff:*
X: https://x.com/sanderschulhoff
LinkedIn: https://www.linkedin.com/in/sander-schulhoff
Website: https://sanderschulhoff.com
AI Red Teaming and AI Security Masterclass on Maven: https://bit.ly/44lLSbC
*Where to find Lenny:*
Newsletter: https://www.lennysnewsletter.com
X: https://twitter.com/lennysan
LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
*In this episode, we cover:*
(00:00) Introduction to Sander Schulhoff and AI security
(05:14) Understanding AI vulnerabilities
(11:42) Real-world examples of AI security breaches
(17:55) The impact of intelligent agents
(19:44) The rise of AI security solutions
(21:09) Red teaming and guardrails
(23:44) Adversarial robustness
(27:52) Why guardrails fail
(38:22) The lack of resources addressing this problem
(44:44) Practical advice for addressing AI security
(55:49) Why you shouldn’t spend your time on guardrails
(59:06) Prompt injection and agentic systems
(01:09:15) Education and awareness in AI security
(01:11:47) Challenges and future directions in AI security
(01:17:52) Companies that are doing this well
(01:21:57) Final thoughts and recommendations
*Referenced:*
AI prompt engineering in 2025: What works and what doesn’t | Sander Schulhoff (Learn Prompting, HackAPrompt): https://www.lennysnewsletter.com/p/ai-prompt-engineering-in-2025-sander-schulhoff
The AI Security Industry is Bullshit: https://sanderschulhoff.substack.com/p/the-ai-security-industry-is-bullshit
The Prompt Report: Insights from the Most Comprehensive Study of Prompting Ever Done: https://learnprompting.org/blog/the_prompt_report?srsltid=AfmBOoo7CRNNCtavzhyLbCMxc0LDmkSUakJ4P8XBaITbE6GXL1i2SvA0
OpenAI: https://openai.com
Scale: https://scale.com
Hugging Face: https://huggingface.co
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition: https://www.semanticscholar.org/paper/Ignore-This-Title-and-HackAPrompt%3A-Exposing-of-LLMs-Schulhoff-Pinto/f3de6ea08e2464190673c0ec8f78e5ec1cd08642
Simon Willison’s Weblog: https://simonwillison.net
ServiceNow: https://www.servicenow.com
ServiceNow AI Agents Can Be Tricked Into Acting Against Each Other via Second-Order Prompts: https://thehackernews.com/2025/11/servicenow-ai-agents-can-be-tricked.html
Alex Komoroske on X: https://x.com/komorama
Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack: https://arstechnica.com/information-technology/2022/09/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack
MathGPT: https://math-gpt.org
2025 Las Vegas Cybertruck explosion: https://en.wikipedia.org/wiki/2025_Las_Vegas_Cybertruck_explosion
Disrupting the first reported AI-orchestrated cyber espionage campaign: https://www.anthropic.com/news/disrupting-AI-espionage
...References continued at: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis
_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com._
Lenny may be an investor in the companies discussed.

SPEAKERS

Sander Schulhoff (guest)
Lenny Rachitsky (host)
Narrator (other)

EPISODE SUMMARY

In this episode of Lenny's Podcast, host Lenny Rachitsky talks with Sander Schulhoff about why AI guardrails are failing and how that exposes a looming agent-driven security crisis. Schulhoff argues that today’s AI security stack, especially guardrails and automated red teaming, is fundamentally ineffective against determined attackers. Because large language models have an effectively infinite prompt (attack) space, claims like “99% protection” are statistically meaningless, and humans routinely bypass state-of-the-art defenses in minutes. The real risk comes not from chatbots alone but from AI agents, browsers, and robots that can take real actions (send emails, touch production systems, control hardware) and can be manipulated via prompt injection and jailbreaks. Schulhoff urges companies to shift their focus from buying guardrail products to combining classical cybersecurity with AI expertise, constraining permissions (e.g., CAMEL-style approaches), and educating teams before deploying powerful agents.
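
The idea of constraining permissions can be illustrated with a small sketch. This is not an implementation of CAMEL or of anything described in the episode; it is a minimal example under stated assumptions: the tool names (`read_page`, `send_email`), the `ToolCall` and `PermissionPolicy` classes, and the `run_agent` helper are all hypothetical. The point it shows is that the set of allowed actions is fixed from the trusted user request before the agent reads any untrusted content, so instructions hidden in a webpage cannot widen it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """A single action the agent wants to take (hypothetical structure)."""
    tool: str
    target: str


@dataclass
class PermissionPolicy:
    """Allowlist fixed from the trusted user request, before untrusted text is read."""
    allowed_tools: frozenset
    allowed_recipients: frozenset = field(default_factory=frozenset)

    def permits(self, call: ToolCall) -> bool:
        if call.tool not in self.allowed_tools:
            return False
        # Sending data out is the risky step, so recipients are checked too.
        if call.tool == "send_email":
            return call.target in self.allowed_recipients
        return True


def run_agent(policy, proposed_calls):
    """Split proposed calls into allowed and blocked; nothing is really executed here."""
    executed, blocked = [], []
    for call in proposed_calls:
        (executed if policy.permits(call) else blocked).append(call)
    return executed, blocked


# The user asked only for a summary of a web page, so only read_page is granted.
policy = PermissionPolicy(allowed_tools=frozenset({"read_page"}))

# Calls a model might propose after reading a page that contains hidden instructions.
proposed = [
    ToolCall("read_page", "https://example.com/article"),
    ToolCall("send_email", "attacker@example.com"),  # injected by the page
]

executed, blocked = run_agent(policy, proposed)
print([c.tool for c in executed])  # ['read_page']
print([c.tool for c in blocked])   # ['send_email']
```

In this sketch the check sits outside the model, so a successful injection can change what the model proposes but not what is permitted to run. A real system would also need to decide how the policy is derived from the user request, which this example simply hard-codes.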

