Stanford Online: Stanford CS153 Frontier Systems | Mati Staniszewski from ElevenLabs on The Future of Voice Systems
- AMAnjney Midha
Welcome to week two of CS153, also known as AI Coachella. We are super lucky to be kicking off this week with Mati. Mati is the founder and CEO of ElevenLabs. How many people here have heard of ElevenLabs? All right, so pretty much everybody. Mati and I go back a ways. About three years ago, I think, when I was still running platform at Discord, a friend said, "You know, Anj, there's a little text-to-speech bot on Discord that's blowing up. You should check it out." We had a lot going on at the time at Discord, and so I actually didn't, and I should have. Then a month later, somebody pinged me again and said, "You really should check out this bot." And I checked it out. It was called ElevenLabs, and it was quite an extraordinary bot. It was a Discord bot that allowed you to generate audio clips with just a text prompt. Within 24 hours, I'd asked one of our mutual friends, Nat Friedman, to introduce us. Mati was gracious enough to explain what they were working on. You let me come on as an angel investor, so thank you. Since then, Mati has gone on to build one of the fastest growing, one of the most widely used, and I would say trusted brands and services in frontier audio and speech. So thank you for joining us, Mati.
- MSMati Staniszewski
Thank you.
- AMAnjney Midha
Thank you so much.
- MSMati Staniszewski
Thank you so much. Good morning, everyone. It was also a crazy thing. Anj, when we met for the first time, it was me and my co-founder, Piotr. We both came from Google and Palantir before that. So we were trying to redo the company setup from scratch around what not to do, and we tried to go against some of the lessons from those days. So we were allergic to meetings. We were allergic to any email-based communication internally. But we also wanted to not do any of the internal communication the standard way. So when we started, we actually ran the company on Discord.
- AMAnjney Midha
I did not know that.
- MSMati Staniszewski
So in that conversation, you were helping us, A, on the text-to-speech. And we were trying to figure out, is that the right play for us to base all the company on Discord? We swapped to Slack-
- AMAnjney Midha
I know. Sad times
- MSMati Staniszewski
... uh, which, which was-
- AMAnjney Midha
I'm aware
- MSMati Staniszewski
... which was, uh, easier for Freddie. But that was an interesting first few months of trying to build all the bots on Discord to make it easy and quick for us.
- AMAnjney Midha
This was a bit of a theme we talked about last year, too, which is that often gaming ends up being this petri dish for innovation. Some of the hardest infra, product, and design experience problems that are solved in gaming then become sort of leading indicators for the rest of the world. And the stuff you were doing, and a bunch of our other friends were doing on Discord at the time, has ended up becoming indicative of value in AI a few years later. Do you feel like that's a true assessment, or am I overfitting?
- MSMati Staniszewski
Yeah. No, I think the true part there is that we were following the Midjourney model at the time, how they've built that community piece on Discord. And for us at ElevenLabs, when we started, we knew that we wanted to fix two things. We wanted to fix the research and foundational models around audio and voice, and then build product around that to bring that AI into more of an applied AI setting and fix the problems that our customers are facing. We started on a very PLG-driven motion, so working on product-led growth with a lot of the creators in the space, developers in the space. And we thought that the best way to do it is to close the loop as tightly as possible with the people that are using those tools. And Discord at the time, and generally keeping access open to a lot of those creators and developers, was the best way for us to learn: is it good enough? Is the quality finally there to serve the needs they have? What are the use cases we might not predict that people might want to build, so we can bring that quicker and then free? And that's still a big tissue today of our work across that: we want to work with the community to find ways for them to contribute back to the product development. And of course, that's whether it's using the models to refine based on the data of how you use the model, all the way through to how you can contribute. In our case, we've created a voice marketplace where people can contribute their voice to be used by others. So that community aspect was very important, and it's always true where I feel the technology adopted by the community will show you use cases-
- AMAnjney Midha
Right
- MSMati Staniszewski
... that might diffuse to the rest of the world six, 12, 18 months later. So being close is super valuable. But more so than not, in those early days, I think you need to be extremely problem obsessed. What is the problem that they are having? And what you think the problem is and what the customer actually thinks is a problem are slightly different.
- AMAnjney Midha
Okay. Let's actually stop. Take a beat there. Can you take us back in time? What was the problem you guys were obsessed with when you started Eleven? What is it today? How might it evolve? Give people a bit of an ElevenLabs 101. How do we get here?
- MSMati Staniszewski
Cool. Cool. The whole chronology. I'll give you a zoom-in into that first day and then accelerate over the last few years. But when we started-- So I'm from Poland, my co-founder is from Poland. A very peculiar thing that happens in Poland is that if you watch a foreign movie in Polish, all the voices, whether that's a male voice or a female voice, get narrated with one single character. So you have one voice reading every character. As you can imagine, a pretty terrible experience. And you would think that with modern technology, this is a problem that would have been fixed, and no, it's still the case. So most of the content is delivered this way. So that was the-
- AMAnjney Midha
Whose voice was this? Who did the-
- MSMati Staniszewski
They have five characters. There's like five of those voices, usually monotone, male, deep, old voices. It's also crazy because part of the thing is they are kind of encouraged to deliver the movie in a flat delivery, so-
- AMAnjney Midha
Ah
- MSMati Staniszewski
... the audience can interpret the emotions for themselves, uh, which is like another-
- AMAnjney Midha
Wow
- MSMati Staniszewski
... another, another [chuckles] level.
- AMAnjney Midha
They're expecting a lot from the audience.
- MSMati Staniszewski
They do expect a lot. So any Polish person, if you ask them, they will recount how, like, not good an experience that is. And when you learn English, you finally get to learn everything in the original, and that's an extremely positive one. So that was the first piece and inspiration for us. We know the future is different. The future will be where you can access all types of content in any language, with that incredible tonality, incredible emotions. So we left Google, we left Palantir at the time, and, um-
- AMAnjney Midha
Were you guys both, both in the Bay at the time?
- MSMati Staniszewski
We are both, uh, between Warsaw and London. So-
- AMAnjney Midha
So-
- MSMati Staniszewski
At the time when we started, we started in London.
- AMAnjney Midha
Right.
- MSMati Staniszewski
Then moved to Warsaw for a little bit, then moved back to London.
Episode duration: 1:06:25
Transcript of episode vfF011ko89o