Stanford Online

Stanford CS153 Frontier Systems | Mati Staniszewski from ElevenLabs on The Future of Voice Systems

For more information about Stanford's online Artificial Intelligence programs, visit: https://stanford.io/ai

To follow along with the course schedule and syllabus, visit: https://cs153.stanford.edu/

In week two of CS153 ("AI Coachella"), Anjney Midha interviews Mati Staniszewski, founder and CEO of ElevenLabs, tracing the company’s origins from an early Discord text-to-speech bot to a fast-growing frontier audio and speech platform. Mati explains ElevenLabs’ initial focus on AI dubbing, inspired by Poland’s single-voice film narration; the shift to prioritizing emotional, natural-sounding text-to-speech for creators; and the evolution from cascaded pipelines (transcription, translation/LLM, and speech generation) toward real-time voice agents. They discuss the tradeoffs between cascaded and fused multimodal systems, efforts to detect and convey emotion, safety and the limits of voice authentication, on-device model deployment, and collaboration with teams like Sesame. They also cover business lessons: combining product-led growth (PLG) with enterprise deployment, team structure, pricing based on customer value, and growth to over $430M in revenue with ~450 employees.

Guest Speaker: Mati Staniszewski is the CEO and co-founder of ElevenLabs, the AI voice/audio platform. Born in 1995 in a town outside Warsaw, Poland, he attended Copernicus Bilingual High School in Warsaw before earning a degree in mathematics from Imperial College London. While at Imperial, he organized Mathscon, a UK student-led mathematics conference. His earlier career included roles at Opera Software, BlackRock (where he worked in the Portfolio Analytics Group and helped launch the Aladdin Wealth platform), and Palantir Technologies (as a Deployment Strategist managing large-scale public- and private-sector implementations). In 2022, he co-founded ElevenLabs with his high school friend Piotr Dabkowski. He has raised hundreds of millions from investors, including Sequoia, Andreessen Horowitz, and Salesforce Ventures, with the company valued at $11 billion as of February 2026. He joined the board of Klarna in 2025 and was named to Forbes 30 Under 30 Europe in 2024 and TIME's 100 Most Influential People in AI in 2025.

Follow the playlist: https://youtube.com/playlist?list=PLoROMvodv4rN447WKQ5oz_YdYbS74M5IA&si=DOJ5amlyRdyMJBhG

Anjney Midha (host), Mati Staniszewski (guest)
May 4, 2026 · 1h 6m

CHAPTERS

  1. CS153 kickoff: ElevenLabs’ origin story and the Discord bot that “blew up”

    Anjney introduces Mati Staniszewski (founder/CEO of ElevenLabs) and recounts discovering ElevenLabs as a viral text-to-speech Discord bot. Mati shares an early cultural detail: the company initially ran internally on Discord to stay close to creators and move fast.

  2. Gaming/creator ecosystems as a leading indicator for AI use cases

    They discuss how gaming communities often incubate future mainstream product patterns. Mati explains why creator/developer proximity and fast feedback loops shaped ElevenLabs’ product-led growth and roadmap.

  3. The original problem: fixing bad dubbing and unlocking global content access

    Mati traces ElevenLabs’ starting obsession to Poland’s “single narrator” dubbing style and the broader pain of low-quality localization. The founding vision becomes universal access to content in any language with preserved performance, emotion, and identity.

  4. Anatomy of AI dubbing: the cascaded pipeline and why it was hard in 2022

    Mati lays out the end-to-end dubbing system: transcription/speaker handling, translation, and voice generation—each needing to be strong to produce good results. In 2022, translation and overall pipeline quality weren’t there yet, leading to a strategic narrowing of scope.

  5. Product pivot: focus on text-to-speech as the “common denominator”

    User conversations revealed immediate demand for voiceover fixes, narration, audiobook creation, and script-to-audio workflows. ElevenLabs prioritized solving high-quality, emotional TTS in English first, rather than building the full transcription/translation stack immediately.

  6. Early research strategy: open source inspiration, papers, and pragmatic compute

    Mati describes how early progress came from combining open-source models and academic ideas not yet widely applied to audio. Tortoise (James Betker) is highlighted as a breakthrough baseline, while ElevenLabs focused on speed, stability, and longer-form generation.

  7. IP strategy and speed: why they skipped patents

    They reflect on deciding not to patent early innovations due to cost, rapid iteration, and limited strategic value. Mati frames IP defense as less useful than out-innovating, while acknowledging the nuisance of patent trolls.

  8. Capability timeline: 2022 TTS → 2023 voice cloning/marketplace → 2024 localization/dubbing → 2025 real-time agents

    Mati offers a year-by-year map of major audio breakthroughs and how ElevenLabs expanded scope. The arc moves from expressive TTS, to scalable multilingual narration and voice cloning, to high-quality localization, and finally to low-latency interactive voice agents.

  9. Cascaded vs fused/omni models: tradeoffs in emotional understanding, reliability, and latency

    They dig into why current voice systems often lose emotion/intent by converting audio to plain text, and what it takes to preserve paralinguistic cues. Mati frames the core architectural decision—cascaded components vs fused models—through three constraints: expressivity, reliability, and latency.

  10. Data and control: labeling emotion and enabling “director-style” steerability

    Mati explains that emotion-aware agents require labeled data and controllable generation. He describes building labeling pipelines and releasing agent features that detect emotion, pass it to the LLM, and render responses with appropriate tone; later, voice generation becomes steerable like a director guiding an actor.

  11. Collaboration culture at the frontier: ElevenLabs, Sesame, and the open ecosystem

    Anjney highlights Mati’s collaborative approach—publicly acknowledging peers and sharing insights—even amid competition. The discussion connects ecosystem collaboration (including open-sourcing models like Sesame’s CSM) to accelerating frontier progress beyond VC-defined categories.

  12. Scaling the business: revenue growth, team structure, and forward-deployed engineering

    Mati shares striking business metrics and the organizational model behind execution. He attributes predictability to forward-deployed teams that deliver value inside iconic enterprises, plus small autonomous teams that move quickly and accept being wrong.

  13. Pricing, safety, and security: value-based packaging, watermarking, and the end of voice authentication

    Mati outlines value-based pricing principles and addresses voice replication risks. He describes layered safety: traceability, fraud/scam detection, watermarking/detection tools, and guidance that voice should not be used as a secure authentication factor.

  14. Sovereign scale and geopolitics: Ukraine deployment, China distillation risk, and open vs closed trends

    Mati describes working with Ukraine’s digital-government ecosystem to add voice access to citizen services during wartime, and discusses strategic alignment with Western allies. The conversation then turns to distillation attacks, China’s competitive model ecosystem, and tensions between openness, IP norms, and safety standards.

  15. Five-year vision: leading audio foundation models, platform-as-infrastructure, and on-device TTS

    Mati projects ElevenLabs’ future as both a research leader in conversational audio and a platform enabling businesses/builders to deploy, monitor, and govern voice systems. He also previews on-device progress—useful for constrained TTS—while emphasizing that full interactive reliability remains a cloud/platform challenge.
