Stanford Online

Stanford CS153 Frontier Systems | Mati Staniszewski from ElevenLabs on The Future of Voice Systems

For more information about Stanford's online Artificial Intelligence programs, visit: https://stanford.io/ai

To follow along with the course schedule and syllabus, visit: https://cs153.stanford.edu/

In week two of CS153 ("AI Coachella"), Anjney Midha interviews Mati Staniszewski, founder and CEO of ElevenLabs, tracing the company’s origins from an early Discord text-to-speech bot to a fast-growing frontier audio and speech platform. Mati explains several shifts in the company's focus. ElevenLabs started on AI dubbing, inspired by Poland’s single-voice film narration. It then prioritized emotional, natural-sounding text-to-speech for creators, and moved from cascaded pipelines (transcription, translation/LLM, and speech generation) toward real-time voice agents. They discuss tradeoffs between cascaded and fused multimodal systems, efforts to detect and convey emotion, the limits of safety and voice authentication, on-device model deployment, and collaboration with teams like Sesame. They also cover business lessons: combining PLG with enterprise deployment, team structure, pricing based on customer value, and growth to over $430M in revenue with ~450 employees.

Guest Speaker: Mati Staniszewski is the CEO and co-founder of ElevenLabs, the AI voice/audio platform. Born in 1995 in a town outside Warsaw, Poland, he attended Copernicus Bilingual High School in Warsaw before earning a degree in mathematics from Imperial College London. While at Imperial, he organized Mathscon, a UK student-led mathematics conference. His earlier career included roles at Opera Software, BlackRock (where he worked in the Portfolio Analytics Group and helped launch the Aladdin Wealth platform), and Palantir Technologies (as a Deployment Strategist managing large-scale public- and private-sector implementations). In 2022, he co-founded ElevenLabs with his high school friend Piotr Dabkowski.
He has raised hundreds of millions of dollars from investors including Sequoia, Andreessen Horowitz, and Salesforce Ventures, and the company was valued at $11 billion as of February 2026. He joined the board of Klarna in 2025 and was named to Forbes 30 Under 30 Europe in 2024 and TIME's 100 Most Influential People in AI in 2025.

Follow the playlist: https://youtube.com/playlist?list=PLoROMvodv4rN447WKQ5oz_YdYbS74M5IA&si=DOJ5amlyRdyMJBhG

Anjney Midha (host) · Mati Staniszewski (guest)
May 4, 2026 · 1h 6m

EVERY SPOKEN WORD

  1. AM

    Welcome to week two of CS153, also known as AI Coachella. We are super lucky to be kicking off this week with Mati. Mati is the founder and CEO of ElevenLabs. How many people here have heard of ElevenLabs? All right, so pretty much everybody. Mati and I go back a ways. About three years ago, I think, when I was still running platform at Discord, um, a friend said, "You know, Anj, there's a, a little bot, like a, a text-to-speech bot on Discord that's blowing up. Um, you should check it out." Um, and, you know, we had a lot going on at the time at Discord, and so I, I actually didn't, and I should have. And then a month later, somebody pinged me again a- and said, "You really should check out this bot." And I, I checked it out. It was called ElevenLabs, and it was quite an extraordinary bot. It, it, it was a, um, a Discord bot that allowed you to generate audio clips with just a text prompt. Uh, and within 24 hours, I'd asked one of our mutual friends, Nat Friedman, to introduce us. Mati was gracious enough to explain what they were working on. I had-- You let me come on as an angel investor, so thank you. Um, and since then, Mati has gone on to build one of the most, um, the fastest growing, uh, one of the most widely used, and I would say trusted brands and services in frontier audio and speech. Um, so thank you for joining us, Mati.

  2. MS

    Thank you.

  3. AM

    Thank you so much.

  4. MS

    Thank you so much. Good mor- good morning, everyone. It was also a crazy thing. Anj, Anj, uh, when we met for the first time, it was me and my co-founder, Piotr, uh, we both came from Google and Palantir before that. So we were trying to, like, redo the company setup from scratch of, like, what not to do, and we tried to, like, go against some of the lessons from those days. Um, so we were allergic to meetings. We were allergic to, um, to, like, any email-based communication internally. But we also want- wanted to not do any of the internal communication the standard way. So when we started, we actually ran the company on Discord.

  5. AM

    I did not know that.

  6. MS

    So in that conversation, you were helping us, A, on the text-to-speech. And we were trying to, like, figure out: is that the right play for us, to base the whole company on Discord? We swapped to Slack-

  7. AM

    I know. Sad times

  8. MS

    ... uh, which, which was-

  9. AM

    I'm aware

  10. MS

    ... which was, uh, easier for Freddie. But that was, uh, that was an interesting few, few, few first months of trying to build all the bots on Discord to, like, make it easy and quick for us.

  11. AM

    Th-th-this was a bit of a theme we talked about last year, too, which is that often gaming ends up being this petri dish for innovation. Some of the hardest infra, product, design experience problems that are solved in gaming then become sort of, uh, um, leading indicators for the rest of the world. And the stuff you were doing, and a bunch of other, our friends were doing on Discord at the time, have ended up becoming indicative of, of, y-you know, value in AI a few years later. Is that-- Do you feel like that's an, a true assessment, or, uh, am I overfitting?

  12. MS

    Yeah. No, I think the, the, the true part there is that we were following the Midjourney model at the time, of how they built that community piece on Discord. And for us at ElevenLabs, when we started, we knew that we wanted to fix two things. We wanted to fix the research and foundational models around audio and voice, and then build product around that to bring that AI into more of an applied AI setting and fix the problems that our customers are facing. We started with a very PLG-driven motion, so working on product-led growth with a lot of the creators and developers in the space. And we thought that the best way to do it was to close the loop as tightly as possible with the people who are using those tools. And, like, Discord at the time, and generally keeping access open to a lot of those creators and developers, was the best way for us to learn: is it good enough? Is the quality finally there to serve the needs they have? What are the use cases we might not predict that people might want to build, so we can bring that, uh, quicker and then free? Um, and that's still a big tissue today across our work: we want to work with the community to find ways for them to contribute back to product development. And of course, that ranges from using the data on how you use the models to refine them, all the way through to contributing directly. In our case, we've created a voice marketplace where people can contribute their voice to be used by others. So that community aspect was very important, and I feel it's always true that the technology adopted by the community will show you use cases-

  13. AM

    Right

  14. MS

    ... that might, like, diffuse to the rest of the world six, 12, 18 months later. So being, like, close is, is super valuable. But more so, more so than not in that early days, just, uh, I think you need to be, like, extremely problem obsessed. What is the problem that they are having? And, and the variation of what you think the problem is to what the customer actually thinks is a problem is slightly, is slightly different.

  15. AM

    Okay. Let, let's actually stop. Uh, take, take a beat there. Can you take us back in time? What was the problem you guys were obsessed with when you started Eleven? What is it today? How might it evolve? Give people a bit of an ElevenLabs 101. How did, how did we get here?

  16. MS

    Cool. Cool. The whole chronology. I'll give you a zoom-in on that first day and then accelerate over the last few years. But when we started-- So I'm from Poland, my co-founder is from Poland. A very peculiar thing that happens in Poland is that if you watch a foreign movie in Polish, all the voices, whether that's a male voice or a female voice, get narrated by one single voice. So you have one voice reading every character. As you can imagine, a pretty, pretty terrible experience. And you would think that with modern technology, this is a problem that would have been fixed, and no, it's still the case. So most of the content is delivered this way. So that was the-

  17. AM

    Wh- Whose voice was this? Who did the-

  18. MS

    They have five characters. There's like five of those voices, usually monotone, male, deep, old voices. Uh, uh, and the, the-- It's also crazy because the part of the thing is they are kind of encouraged to deliver the, the movie in a flat delivery, so-

  19. AM

    Ah

  20. MS

    ... the audience can interpret the emotions for themselves, uh, which is like another-

  21. AM

    Wow

  22. MS

    ... another, another [chuckles] level.

  23. AM

    They're expecting a lot from the audience.

  24. MS

    They do expect a lot. Um, so if you ask any Polish person, they will, like, recount how... Like, not a good experience, that is. And when you learn English, you finally get to experience everything in the original, and that's an extremely positive one. So that was, like, the first piece and inspiration for us. We knew the future would be different. The future will be one where you can access all types of content in any language, uh, with that incredible tonality, incredible emotions. So, so, uh, so we left Google, we left Palantir at the time, and, um-

  25. AM

    Were you guys both, both in the Bay at the time?

  26. MS

    We are both, uh, between Warsaw and London. So-

  27. AM

    So-

  28. MS

    At the time when we started, we started in, in, in London.

  29. AM

    Right.

  30. MS

    Then moved to Warsaw for a little bit, then moved back to London.

Episode duration: 1:06:25

Transcript of episode vfF011ko89o
