No Priors

No Priors Ep. 143 | With ElevenLabs Co-Founder Mati Staniszewski

Imagine learning chess from a grandmaster, or negotiation tactics from an expert FBI hostage negotiator. ElevenLabs' voice AI technology is making that possible. Sarah Guo sits down with Mati Staniszewski, co-founder of ElevenLabs, to explore how the three-year-old company is transforming how humans interact with technology through voice. Mati talks about the technical challenges of building foundational audio models, the strategic thinking behind conducting research and deploying products in tandem, and why voice is the ultimate interface for everything from computers to robots to immersive media. They also discuss how the coming revolution of AI personal tutors will shift agentic AI from reactive to proactive support, break down language barriers globally, and even provide the framework for agentic government services.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @elevenlabsio | @matiii

Chapters:
00:00 – Mati Staniszewski Introduction
00:46 – ElevenLabs: Growth and Scale
02:46 – Voice Technology and Applications
06:52 – Research and Product Development
12:36 – Voice Quality and Customer Preferences
17:54 – Agent Platform and Use Cases
23:21 – Choosing the Right Technology Partner
26:43 – The Role of Foundation Models
29:58 – Open Source Models and Future Trends
32:37 – Research and Development Focus
36:53 – Future of AI Companions and Education
41:37 – Conclusion

Sarah Guo (host) · Mati Staniszewski (guest) · Elad Gil (host)
Dec 11, 2025 · 41m

EVERY SPOKEN WORD

  1. 0:00 – 0:46

    Mati Staniszewski Introduction

    1. SG

      (music plays) Hi, listeners. Welcome back to No Priors. Today, I'm here with Mati Staniszewski, the co-founder and CEO of ElevenLabs, which was founded to change the way we interact with each other and with computers through voice. Over three short years, they've skyrocketed to more than 300 million in run rate. Mati and I talk about the future of voice, education, customer experience, and other applications of voice, as well as how to build a multi-segment business, from self-serve to enterprise, as a combined research and product company. Welcome, Mati.

    2. MS

      Sarah, thanks for having me.

    3. SG

      And thank you for doing this at seven in the morning.

    4. MS

      My pleasure, and thank you for doing this at seven in the morning. It's great we finally got to do this together.

  2. 0:46 – 2:46

    ElevenLabs: Growth and Scale

    2. SG

      Uh, I think a lot of our listeners will have used or played with Eleven at some point, but for everybody else, can you just reintroduce the company?

    3. MS

      Definitely. At ElevenLabs, we are solving how humans and technology interact, and how you can create seamlessly with that technology. What this means in practice is we build foundational audio models, models to help you create speech that sounds human, understand speech in a much better way, or orchestrate all those components to make them interactive, and then we build products on top of those foundational models. We have our creative product, which is a platform for helping you with narrations for audiobooks, with voiceovers for ads or movies, or dubs of those movies into other languages. And our agents platform product, which is effectively an offering to help you elevate customer experience, build an agent for personal AI, education, new ways of immersive media. But all of this is underpinned by that mission of solving how we can interact with technology on our terms, in a better way.

    4. SG

      You started the company in 2022?

    5. MS

      That's right.

    6. SG

      And you've had amazing, like, rocket ship growth since then. I'm sure it's felt up and down different ways. I want to ask you about that. Uh, can you give a sense of what the scale of the company is today?

    7. MS

      So we've grown to 350 people globally. We started from Europe. We started as a remote company, and are still remote-first, but have hubs around the world, with London being the biggest, New York second biggest, then Warsaw, San Francisco, now Tokyo, and one in Brazil. We are at 300 million in ARR, which is roughly 50/50 between self-serve, so a lot of subscriptions and creators using our creative platform, and then approaching 50 percent on the enterprise side using our agents platform. That's the classic sales-led side. We serve more than five million monthly actives on the creative side of the work. And then on the enterprise side, we have a few thousand customers, from Fortune 500s to some of the fastest-growing AI startups.

  3. 2:46 – 6:52

    Voice Technology and Applications

    2. SG

      I think this is such a... You're an amazing founder, but I also think it's such an interesting company, because it is very unintuitive to, I think, many people, and investors in particular. I don't know if you faced this at the beginning, but we were both there in 2022. There's a class of companies that allow creation in some way, when we look at your first business, beyond the research itself. And I would put Eleven and Midjourney and Suno and HeyGen in this category, and I think there's this overall sense of, like, who really wants to do this? What was your initial read of how many people want to make voices, or what made you believe that was going to be much broader? Like, if I look at dubbing, for example, it's not a huge market.

    3. MS

      I think the first piece was, as you mentioned, that it's very tricky to do both the product and the research.

    4. SG

      Mm-hmm.

    5. MS

      I'm in a lucky position in that my co-founder and I have known each other for 15 years. I think he's the smartest person I know.

    6. SG

      Mm-hmm.

    7. MS

      And he has been able to drive a lot of that research work, to create the foundation to then elevate that experience. Both of us are from Poland originally, and the original belief came from Poland. It's a very peculiar thing, but if you watch a foreign movie in Polish, all the voices, whether it's a male voice or a female voice, are narrated by one single voice. So you have, like, a flat delivery-

    8. SG

      (laughs)

    9. MS

      ... for everything in a movie. And-

    10. SG

      A terrible experience.

    11. MS

      It is a terrible experience, and it's still... If you grow up with it, as soon as you learn English, you switch out and you don't want to watch content in this way. And it's crazy that it still happens today, for the majority of content. Combining that: I worked at Palantir, my co-founder worked at Google. We knew that that would change in the future, and that all the information would be available globally.

    12. SG

      Mm-hmm.

    13. MS

      And then as we started digging further, we realized-

    14. SG

      Sorry, in every language, in a high-quality way.

    15. MS

      Exactly.

    16. SG

      That was kind of the starting point.

    17. MS

      And the big thing was, instead of having it just translated, could you have the original voice, original emotions, original intonation carried across?

    18. SG

      Mm-hmm.

    19. MS

      So, like, imagine having this podcast, but people could switch it over to Spanish and still hear Sarah, still hear Mati: the same voice, the same delivery. Which is kind of exactly what we did with Lex back when he interviewed Narendra Modi, where you could immerse yourself in that story a lot better.

    20. SG

      Mm-hmm.

    21. MS

      So that was the original insight. And we then started digging further, which is that just so much of the technology we interact with will change. Whether this is how you create: it's still relatively tricky to bring voice alive. You need to go through the expensive process of hiring a voice talent, having a studio space, having expensive tooling to then actually adjust it. The tooling isn't intuitive enough to do this. So all that creation process will and should change, to make it easier for new people with keenness to bring that to life. Then a lot of the technology didn't make it possible for you to recreate a specific voice, or to create it in that high-quality way. And then, of course, as we dived in further and shifted away from the static piece, the whole interactive piece is still crazy in the way it functions. Most of us have seen this technological evolution over the last decades, but you still spend most of your time on a keyboard, you look at a screen, and that interface feels broken. It should be that you can communicate with your devices through speech, the most natural interface there is, one that started when humanity started, and we realized we want to solve that. And I think now, fast-forward from 2022, many people carry that belief too, that voice is the interface of the future. As you think about the devices around us, whether it's smartphones, computers, or robots, speech will be one of the key interfaces. But in 2022 it wasn't, and as we thought about the market for the creative side, or for that interactive side, it was very clear it would be a huge, huge

  4. 6:52 – 12:36

    Research and Product Development

    1. MS

      one.

    2. SG

      So even when you think about just the research part of your business, and then you have products for at least two different markets, and then you have this larger mission... A lot has changed in the last five or ten years, but it used to be a very strongly held traditional belief that one must do one thing well in a startup, and there's no other path. You're treating this like an interaction company, a platform company. How did you think about sequencing the research and the product effort?

    3. MS

      Mm-hmm.

    4. SG

      Does that make sense? Or like thinking about new markets?

    5. MS

      Mm-hmm.

    6. SG

      And maybe wrapped up in that question too is, well, where are we on quality in voice? Because I would sort of claim that if the models are not good enough for certain use cases at all, it kind of doesn't make sense to do product.

    7. MS

      And I think that's right. It's almost exactly like when we started. Originally what we did was try to use existing models that were in the market and optimize them. Our first use case was actually a combination of narration and dubbing, and then that creative side. And we realized pretty quickly that the existing models just produced such robotic, poor speech that people didn't want to listen to it. That's where my co-founder's genius came in, where he was able to assemble the team, and do a lot of the research himself, to create a new generation of that work. But, to your question, the way we are organized internally, and how we think about sequencing a lot of that, was looking at a first problem, and then creating effectively a lab around that problem, which is a combination of mighty researchers, engineers, and operators going after that problem.

    8. SG

      Mm-hmm.

    9. MS

      And the first problem was the problem of voice: how can we recreate the voice? And like you say, you need that research expertise to do it well. So we started with effectively a voice lab, which had that mission of, can we narrate the work in a better way? There was a combination of roughly five people doing that work. We sequenced the research first, then built a simple layer on top of it to allow people to use that work, and then expanded from there with a holistic suite for creating a full audiobook, and then a full movie narration, a movie dub. And then we moved to the next problem, which was the realization that, okay, we have solved the voice, great for making content sound human.

    10. SG

      Mm-hmm.

    11. MS

      That was the first problem. For that to be useful, for us to interact with the technology, you need to solve how you bring knowledge on demand into it.

    12. SG

      Mm.

    13. MS

      So we effectively then started the second team, which was a second lab, an agent lab effectively-

    14. SG

      Mm-hmm.

    15. MS

      ... which was a team that would combine researchers, engineers, and operators once more, and which would try to fix: okay, we have text-to-speech, how do we now combine this with LLMs and speech-to-text and orchestrate all those components together, while integrating with other systems to make it easier? And then similarly, you expand from looking just at the voice layer into how those systems work together. And here too, you need the research expertise to do that in a low-latency, efficient, accurate way. But at the same time, there is a product layer that starts forming, because it's not only the orchestration that matters. It's also the integrations, how you link up to legacy systems, how you build functions around it, or how you deploy that in production and test, monitor, and evaluate over time.
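      The orchestration Mati describes (speech-to-text feeding an LLM, whose reply is synthesized back to speech) can be sketched as one conversational turn. Everything below is a hypothetical stand-in: the function names and signatures are invented for illustration, and this is not the ElevenLabs API.

      ```python
      # Sketch of a voice-agent turn: STT -> LLM -> TTS.
      # All three stage functions are placeholders, not real APIs.

      def speech_to_text(audio: bytes) -> str:
          # Placeholder: a real system would stream audio to an STT model.
          return audio.decode("utf-8")

      def llm_reply(transcript: str) -> str:
          # Placeholder: a real system would call an LLM, possibly with
          # tools that integrate legacy systems (orders, CRM lookups, etc.).
          return f"You said: {transcript}"

      def text_to_speech(text: str) -> bytes:
          # Placeholder: a real system would synthesize expressive audio.
          return text.encode("utf-8")

      def handle_turn(audio_in: bytes) -> bytes:
          """One conversational turn: transcribe, think, speak."""
          transcript = speech_to_text(audio_in)
          reply = llm_reply(transcript)
          return text_to_speech(reply)
      ```

      In production these stages would be pipelined and streamed rather than run sequentially, since the low latency Mati mentions comes from overlapping transcription, generation, and synthesis.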

    16. SG

      Do you feel like you were creating new use cases when you built the tools? Did people know that they wanted to do this already? Because one argument I remember hearing was like, "Ah, enterprises don't know what to do with voice, how many people really want to do it," and then you're serving essentially the creator/publisher side of your business.

    17. MS

      Right.

    18. SG

      Yeah.

    19. MS

      It's definitely a combination of initiatives that we believe will happen in the world, and then a response to a lot of that.

    20. SG

      Yeah.

    21. MS

      As I think back, of course the internal voice lab or agents lab then kickstarted so many of the other labs in response to problems. We started a music lab because people wanted to create music with ElevenLabs, so it's a fully licensed model, for people who wanted to create speech but also add music in a simple way.

    22. SG

      Mm-hmm.

    23. MS

      We wanted to deliver that. And then, of course, that came together through, how do we combine music, audio, and sounds? We are now integrating partner models from image and video into that suite: how could you combine all of that in one? And all of that was in response to the market telling us, "Hey, we would love this." And then you will have completely different use cases even in that space, let's say dubbing. Dubbing is a use case where we didn't feel there was a big push, but we knew that in the ideal world of the future, you would be able to have content delivered naturally across languages, still carrying the original delivery. And I still think this market will actually be immense, because it's not only going to be the static delivery in movies, but if you travel around the world and want to communicate-

    24. SG

      In real time. Yeah.

    25. MS

      ... in real time, like the full Babel Fish idea from The Hitchhiker's Guide to the Galaxy. This will happen. It will be, like, the biggest... the whole breaking down of language barriers, the-

    26. EG

      Mm-hmm.

    27. MS

      ... the barriers to communication, to creation, all of that will break, and that will be the foundational real-time dubbing concept. So I'm super excited about that part. And similarly on the agent side, there are some obvious things that customers we work with, or partners, will want to integrate, which is, "we want integrations with XYZ systems." But then there are other parts that might not be as easy to predict. As you interact with technology, you of course want to understand what's happening, but you also want to understand how things are being said, and bring that into the fold, which is something we try to prioritize on our side. So then when people actually interact with the technology, they realize, "Oh, the expressive thing is actually so much more enjoyable and beneficial and helpful."

  5. 12:36 – 17:54

    Voice Quality and Customer Preferences

    2. SG

      So I want to ask you a question about this, which relates to quality. You know, I work with a series of companies where we're selling a product, and the buyers are generally not machine learning scientists, right?

    3. MS

      Right. Right.

    4. SG

      And even the scientific community does not have the full suite of evals and benchmarks-

    5. MS

      Right.

    6. SG

      ... to understand every domain well. This is a well-known problem. But I imagine for a lot of your customers, it's not like they know how to choose a good voice. So how do you deal with that problem? Is it like, "Hey, I make a clone, and that sounds like me and I believe it, and I'm gonna try all of these different options"? Or are you actually teaching people to do evals?

    7. MS

      It's a great question, because I think there are two big problems. One is, how do you benchmark the general audio space, where, like you say, it's so dependent on the specific voice? Let alone, if you are tuning it for interactivity, then it's even more tricky.

    8. SG

      Mm-hmm.

    9. MS

      And then the second piece, which is, as you are working on a specific use case, how do you select a voice? So I'll take the second front first, which is, we have a voice sommelier, effectively-

    10. SG

      Mm-hmm. (laughs)

    11. MS

      ... with us. We work with enterprises, and we deploy that person to work with them and help them navigate. That person is like a voice coach, and has an incredible voice themselves. And now we have a team under that person that will partner with you to find the right voice for your brand.

    12. SG

      Yeah. And now you have, like, the celebrity marketplace and such. Yeah. Mm-hmm.

    13. MS

      And now we have a celebrity marketplace to help you even get iconic talent in there, like Sir Michael Caine. That piece was important, because of course the voice will depend on the use case you are trying to build, the language; all of that will have an impact on what's the right voice for your customer base. So we effectively have a voice person helping those companies. And some companies will be very opinionated on what they want. So they will sometimes select it themselves, sometimes give us a brief of, "Hey, we want a voice that sounds professional, neutral, calming." We recently had a company, one of the (laughs) biggest European companies, that gave us a very original brief: they wanted as robotic a voice as possible.

    14. SG

      Okay.

    15. MS

      Which was counterintuitive.

    16. SG

      Yeah.

    17. MS

      Um, but for example-

    18. SG

      So you're like, "We can't do that anymore." (laughs)

    19. MS

      (laughs) Almost. But we were trying to work backwards from, "How do we do that?"

    20. SG

      (laughs)

    21. MS

      But I think we got a good result. And recently we had companies in Japan and Korea where they wanted to serve different voices depending on the customer that's calling in.

    22. SG

      Mm-hmm.

    23. MS

      They have an older population and a younger population. For the younger one, they wanted one of the famous voices in the market that's very excitable and happy. And for the older one, they wanted a calm, slow-speaking one. We help a lot with that. So that's on the voice piece. And I do think it's going to be a big and important one.

    24. SG

      So like a personalized choice, and then it can even be dynamic per customer?

    25. MS

      Yes.

    26. SG

      Okay.

    27. MS

      Exactly.

    28. SG

      Yeah.

    29. MS

      Exactly. And then maybe in the future it's, like, going to be, like, a fully... Depending on your interaction, you will-

    30. SG

      Yeah.

  6. 17:54 – 23:21

    Agent Platform and Use Cases

    2. SG

      Can you talk about what's happening on the agent platform side? What is challenging for businesses or even creators that are trying to build agents, and what are maybe the surprising or high-traction use cases? I think everybody's kind of aware of the idea of agent-based customer support, but I imagine you're doing many things beyond that.

    3. MS

      Yeah, so exactly, customer support is probably the one that's kicking off the quickest, and that's the one where we see it overtaking so many use cases, whether it's our work with Cisco or Twilio or TELUS Digital. All of them are elevating that to a high extent. I think the second exciting piece within that domain is the shift from effectively reactive customer support, "I have a problem, I'm reaching out to customer support"-

    4. SG

      Mm-hmm.

    5. MS

      ... into more of a proactive part of the customer experience. So to make it explicit, we work with the biggest e-commerce shop in India, Meesho, where they started on the customer support side with "I want a refund," "I wanna see the tracking of the package," and moved to actually having an agent be a front part of the experience. So if you go to the website-

    6. SG

      Mm-hmm.

    7. MS

      ... you have the widget, you can engage it by voice, and you can ask it, "Hey, can you help me navigate to item X, item Y?" Or, "Can you explain to me what's the right gift for this time of year?" And then it will actually help you: based on your questions and what is on offer, it will show you those items, navigate to the right parts of the site, maybe go all the way through checkout. And I think this will be a phenomenal thing, elevating the full experience, where it's more of an assistant across the whole journey. We kicked off our work with Square, which enables other businesses to do that work, with exactly the same pattern. It started with voice ordering; now, how can this be part of the full discovery experience too, where you get items shown to you, and you get a lot more explanation? I think that will be a phenomenal piece, effectively from the beginning to the end. So that's one category. The second one is the wider shift from static to immersive media, where there are just so many incredible stories and IP that today exist in effectively one mode of delivery, and now you'll be able to interact with that content in a completely new way. I think one of the incredible use cases was working with Epic Games. We worked with them on bringing the voice of Darth Vader into Fortnite, where millions of players could interact with Darth Vader live in the game, a full Darth Vader experience in a new way. And I think this will be a theme, whether it's talking to a book, or talking to a character that you like; the whole space is shifting. And then the one that I'm most excited about for the world is going to be education, where you'll be able to have effectively a personal tutor on your headphones.
      You could actually study something in an amazing way. I'll give you two quick examples. One is, we recently worked with chess.com. I'm a huge fan of chess. Uh-

    8. SG

      Mm-hmm. Me too.

    9. MS

      ... I'm a huge chess fan. Okay, great. So you can learn chess, but you can have Hikaru Nakamura or Magnus Carlsen be your teacher, uh-

    10. SG

      Cool.

    11. MS

      ... which is amazing. Or even the Botez sisters; there's a whole plethora of different players that engaged with that, which I think is great. And then maybe a last one, which is MasterClass, who we worked with to shift from, you can of course have the content and go through it step by step-

    12. SG

      Mm-hmm.

    13. MS

      But you can also have an interactive experience. And the best example of that was working with Chris Voss-

    14. SG

      Mm-hmm.

    15. MS

      ... the FBI negotiator, one of the top negotiators-

    16. SG

      Mm-hmm.

    17. MS

      ... who has a MasterClass lesson, but then you can actually call him and-

    18. SG

      Ah.

    19. MS

      ... have a practice negotiation-

    20. SG

      Okay.

    21. MS

      ... which is crazy.

    22. SG

      Yeah, gotta get that hostage out. We'll definitely try it.

    23. MS

      (laughs)

    24. SG

      Yeah. Um-

    25. MS

      Can I add one more? One last one, which combines all of them together, which I realized just recently, and which was crazy. Recently I went to Ukraine, where we are working with the Ministry of Digital Transformation, where they are effectively creating the first agentic government.

    26. SG

      Mm-hmm.

    27. MS

      And the crazy thing is they have all of those use cases-

    28. SG

      Agentic government, sorry.

    29. MS

      Agentic government.

    30. SG

      Okay.

  7. 23:21 – 26:43

    Choosing the Right Technology Partner

    1. SG

      Um, can I ask you a business model question here? Because-

    2. MS

      Of course.

    3. SG

      ... looking at the strategic landscape... Actually, I have many questions here. One of the observations I'd have is, if I look at one of these rich voice-and-action agent experiences, there are a lot of Fortune 500, Global 2000 leaders who listen to the pod, and I think a lot of them are gonna buy the idea of, "I want this amazing, automatic, real-time, available 24/7, every-language experience for my customer, that's consistent and high-quality." The ways I might get there include working with a Palantir or a large consulting firm, working with Eleven or a platform technology company, or an OpenAI, right? Let's talk about that. Or working with a more use-case-oriented company like Sierra.

    4. MS

      Mm-hmm.

    5. SG

      Right? How do you think about how people are making that decision, or how they should make that decision?

    6. MS

      So my past is also at Palantir.

    7. SG

      Right.

    8. MS

      So I started exactly from that side, and we do blend a lot of forward-deployed engineering inside the company too. As I think about our offering and customers making that choice: if you are looking for just one pointed solution, and only that one, then likely we aren't the best choice. If you are looking to deploy across a plethora of different experiences, so be it customer support, but then you also want internal training.

    9. SG

      Mm-hmm.

    10. MS

      Then you might want to elevate your sales side and actually increase the top line with new experiences of how you engage customers, beyond that reactive piece.

    11. SG

      Mm-hmm.

    12. MS

      Then it's a great platform to build on. And then, as we engage with customers, we effectively combine that platform work with our engineering resources to help those companies deploy on it. Or, which we also see increasingly in Fortune 500s and Global 2000s, they will want to build parts of it themselves, because they already have a lot of investment in the platform, while engaging us on some of the nuance, and combine those. And I think our model, and the way it's different from a lot of the use-case-specific ones, is that our platform is relatively open, where you can use pieces of the platform and not all of them.

    13. SG

      Mm-hmm.

    14. MS

      For those different use cases. Palantir, of course, or some of the consulting companies, will have a lot more resources to go on the wider digital transformation journey. In our case, it's very specifically conversational agents.

    15. SG

      Mm-hmm.

    16. MS

      If you are looking for a new interface with customers, that's the best way. And companies like Sierra are phenomenal, of course, in how they are thinking about the specific pointed use case. And then maybe the other piece is, as we think about our work, it depends on what you are optimizing for. We have a lot of international partners. If you have a wider geographic user base, great, that's what we optimize for. Our voices, our languages, our support for integrations internationally are just so much broader, which is frequently a piece that you will look into. Depending on your exact scope, this will be a big factor. But I would summarize that if you are looking for a solution across a set of different use cases, and you want our engineering help to deploy it, then we are the right solution, and probably the best solution.

  8. 26:43 – 29:58

    The Role of Foundation Models

    2. SG

      I want to talk a little bit about OpenAI and the LLM foundation model companies. One of the reasons Elad and I call this podcast No Priors is because people are making a lot of assumptions all the time about how the market is gonna work, and lo and behold, many of those assumptions end up being nonsense. And you have to very much decide your own narrative at this point in time. I think, correct me if I'm wrong, in 2022 and '23, you probably heard a lot of people say, "Google can do this and OpenAI can do this." Why do you get to persist working on voice as a general capability anyway? What's the answer?

    3. MS

      That also adds another element to a couple of the previous questions, whether it's the agents work or the creative work. To deliver the value in that work, you need a very strong product layer. You need integrations, you need to help people deploy the work, which is the most common piece. But our superpower and our focus for a long time was building the foundational models to actually make that experience seamless. And as I think about the other companies in the market, they will optimize for a lot of other things, and that will be the differentiator in our case: we will make the whole experience, especially with voice, seamless, human, and controllable in a much better way.

    4. SG

      And so fundamentally, you would argue that the labs just aren't going to focus on this, and haven't.

    5. MS

      Exactly.

    6. SG

      (laughs)

    7. MS

      So I think, for most of those companies, and that's the thing about the long term, it's going to be incredible research and an incredible product that meets customers where they are and works backwards from there. I don't think the labs will focus on building that product layer that's so important.

    8. SG

      Mm-hmm.

    9. MS

      But I think, you know, part of the question you're asking is how, and why, they haven't done even the research part-

    10. SG

      Mm-hmm.

    11. MS

      ... to the quality that we've been able to here. I'm also biased, but we are happily beating them on benchmarks with text-to-speech, speech-to-text, and the orchestration mechanisms. And credit to my co-founder and the team that they've been able to do it; it's just mighty researchers continuing their work. But I think the main part that's different in the audio space is that you don't need scale as much as you need the architectural breakthroughs, the model breakthroughs, to really-

    12. SG

      Uh-huh.

    13. MS

      ... to really make a dent, and we've been able to do that a couple of times. And I think the number of people doesn't matter, but which people you have does. We think there are maybe 50 to 100 researchers in the audio space that could do it. We think we have probably 10 of them in the company, some of the best ones. And this obsession of having those people work across the stack, giving them the full focus of the company, bringing their work to production, and seeing how users interact back was so important. So that's how we've been able to create models better than some of the top companies out there. But the truth is, why they weren't able to do it is also an interesting question. We don't know.

    14. SG

      (laughs)

    15. MS

      They have such incredible talent there too.

  9. 29:58–32:37

    Open Source Models and Future Trends

    1. MS

    2. SG

      How do you think, at the same time, about open source models?

    3. MS

      Anyone you ask in the company, I think, will say the same, and the second narrative we think about is that, in the long term, models will commoditize, or the differences between them will be negligible.

    4. SG

      Mm-hmm.

    5. MS

      For some use cases, they will still matter.

    6. SG

      Mm-hmm.

    7. MS

      For most use cases, in the long term, they won't.

    8. SG

      Mm-hmm.

    9. MS

      And, um...

    10. SG

      And they'll be broadly available and

    11. MS

      ... they will be broadly available.

    12. SG

      Yeah.

    13. MS

      Exactly.

    14. SG

      Mm-hmm.

    15. MS

      And we don't know when that is, whether it's two years, three years, four years, but it's going to happen at some stage. Then of course you'll have a fine-tuning layer that will matter a lot on top of those models. But the base models, I think, will get pretty good. And that's why, for us, the product piece is so important, from the company perspective, but also from the value perspective.

    16. SG

      Mm-hmm.

    17. MS

      Because if you have a model, that's great, but to actually connect your business logic and knowledge, to be able to have the right interface for creating an ad for your work or a completely new material, that's a very different exercise. But open source models are getting... If I split it into two: on the more async content narration side, open source is great, commercial models are great, and the differences are getting smaller on out-of-the-box quality. What most of the models haven't figured out, and I think we have, is how to make them controllable.

    18. SG

      Mm-hmm.

    19. MS

      So that's kind of the narration piece. Then there's the whole interaction piece of how you orchestrate the components together, whether that's a cascaded speech-to-text, LLM, text-to-speech approach, or whether in the future it's a fused approach where you train them together. I think this is good for customer support or customer experience. But it's still a ways away from conversation like we have-
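The cascaded approach described here can be sketched as a simple three-stage loop. This is a minimal illustration, not ElevenLabs code: all three stages are stand-in stubs, and the function names are invented for the example.

```python
# Sketch of a cascaded voice-agent turn: speech-to-text -> LLM -> text-to-speech.
# Each stage is a stub; in a real system each would call a model or API.

def speech_to_text(audio: bytes) -> str:
    """Stub STT: pretend the audio decodes to a fixed utterance."""
    return "what is the weather today"

def llm_respond(transcript: str) -> str:
    """Stub LLM: a trivial canned response keyed on the transcript."""
    if "weather" in transcript:
        return "It looks sunny today."
    return "Could you repeat that?"

def text_to_speech(text: str) -> bytes:
    """Stub TTS: encode the reply text as stand-in audio bytes."""
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    """One conversational turn through the cascade. Because each stage is a
    separate, inspectable step, you can log the transcript, act between the
    LLM and TTS stages, and swap any component independently."""
    transcript = speech_to_text(audio)
    reply = llm_respond(transcript)
    return text_to_speech(reply)
```

A fused (speech-to-speech) model would collapse these stages into one, trading that per-stage visibility for lower latency and more expressive output.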

    20. SG

      Interactive, yeah.

    21. MS

      ... and passing that Turing test. So I think this is still at least a year away; I don't know, let's see, maybe within a year. And then you will have real-time dubbing, a variation on real-time translated conversation, and I think that's maybe two years, within two years away.

    22. SG

      You know, a very uncomfortable belief that I feel comfortable having, but that I think is uncommon in the market right now, is that most advantages in technology could last you a year, or they could last you ten, but they're not infinitely defensible. And if you think about that from a model quality perspective or a product perspective, they allow you to serve the customer better and build momentum and build scale for some period of time. And actually, that's really powerful over time, right?

    23. MS

      For sure.

    24. SG

      But it's not a clean forever answer. And so I think that makes, I don't know, business people and investors uncomfortable.

    25. MS

      And, I mean, it's very true as well.

    26. SG

      And CEOs. (laughs)

  10. 32:37–36:53

    Research and Development Focus

    1. SG

    2. MS

      (laughs) The way we think about it, research is a head start.

    3. SG

      Okay.

    4. MS

      This means we can give an advantage to the customer earlier, and it's six, 12 months of advantage. It's also a way for us to build the right product layer so you get the best of that research. Frequently, we do that in parallel, so the moment the research is out there, you have the product, because we know our initiatives and we know what the right product is. So we have research and product in parallel, and that extends the head start. But the thing that will really give long-term value is the ecosystem you create around it, whether that's the brand and distribution, the collection of voices, the collection of integrations you can build, the workflows you can build. That's the way we sequence it in our minds: research, product, ecosystem. And research, all it is, is a head start in being able to pull the future a little bit closer.

    5. SG

      I think that's a really powerful insight, especially if, you know, the research team and the rest of the company believe that internally as well.

    6. MS

      It's...

    7. SG

      Yeah.

    8. MS

      I think the piece that was interesting for us, and I think this is the big question for all companies that do research and product, is: do you wait for research, or do you make a product change? Or even for companies that aren't research-product companies: do you wait for someone else to do the research? Because the timeline for that isn't clear.

    9. SG

      (laughs)

    10. MS

      Is it three months, six months, 12 months? You don't know exactly when it will land, which is the hard choice: do I invest in the product layer, or do I just wait for the research? So in our case, we tell all the product teams about the research initiatives, so we can parallelize that work. But we don't hold them back: if a product team thinks we should deliver value to the customer by doing something different, they can. And the rough rule of thumb is three months. If we think the research will take longer than three months, we will probably build it; if it's less than that, we probably won't.

    11. SG

      Can you talk about some of the research that you're doing now, and how you think about the cadence of delivery and what's worth working on?

    12. MS

      We now have a number of different initiatives across the audio space, and there are roughly two big buckets, relating to the creative and agent sides. On the creative side: alongside the text-to-speech models that are controllable, we added a speech-to-text model that transcribes with high accuracy, including across low-resource languages, covering almost 100 languages. Then we created a music model, a fully licensed music model. And as you think about the future, it's how those models will also interact with the visual space. So a lot of effort goes into how you can get the best of audio and then potentially combine that with existing video you have, to really have the best delivery. Then on the agent side, it's of course how you optimize real-time speech-to-text and real-time text-to-speech. We just released our speech-to-text model, Scribe V2, which is under 150 milliseconds at 93.5% accuracy across the top 30 languages on FLEURS. And it's only the top 30 here because we serve so many others, but most people don't. So it's beating all the models on benchmarks. But as you think about the future, it's also the orchestration piece of how you bring speech-to-text, LLM, and text-to-speech together. We will be releasing over the next couple of months a new orchestration mechanism that will lower the end-to-end latency, we think in a great way. But the second thing, which is what is so hard, is that it will not only allow you to combine those pieces, but also add the emotional context of the conversation.
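To see why the orchestration mechanism matters for latency, here is a back-of-the-envelope budget for one cascaded turn. Only the roughly 150 ms speech-to-text figure comes from the episode (Scribe V2); the LLM and TTS numbers are illustrative assumptions.

```python
# Rough end-to-end latency budget for a cascaded voice agent.
# The speech_to_text figure is the Scribe V2 number quoted in the episode;
# the other two are illustrative placeholders, not measured values.

STAGE_LATENCY_MS = {
    "speech_to_text": 150,   # quoted: Scribe V2 under 150 ms
    "llm_first_token": 300,  # assumed
    "text_to_speech": 200,   # assumed time to first audio
}

def end_to_end_latency(stages: dict) -> int:
    """Total turn latency if the stages run strictly in sequence."""
    return sum(stages.values())

total = end_to_end_latency(STAGE_LATENCY_MS)  # serial sum of all stages
```

The orchestration win comes from overlapping stages, for example streaming the LLM's first tokens into TTS before the full reply is generated, so the effective latency can fall well below this serial sum.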

    13. SG

      Mm-hmm.

    14. MS

      So the model can actually respond, we think, in a more expressive and better way. And in the future, something we're investing in is, in parallel, a more fused speech-to-speech approach as well. And of course, depending on the use case: if you are an enterprise with a reliability-focused use case, the cascaded approach is the approach for the next year or two.

    15. SG

      Has more structure, yeah.

    16. MS

      More structure. You have more visibility into each of the steps. It's reliable. You can call tools. If you want something more expressive and can tolerate hallucination, speech-to-speech might be the choice, and maybe over time you'll see one win out over the other depending on the industry. But that's a huge investment on our side, which is the foundation of the whole platform, and the main part that we are continually investing in: a plethora of different models that combine the best of audio with the best of the other modalities.
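The tool-calling point is worth making concrete: in a cascade, the LLM stage emits text, so a tool request can be detected and executed before any speech is synthesized. The trigger format and tool registry below are invented for illustration.

```python
# Why tool calling is natural in a cascaded pipeline: the LLM step produces
# inspectable text, so a tool request can be intercepted and executed before
# the text-to-speech stage runs. The "CALL name(arg)" convention is made up
# for this sketch.

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def run_with_tools(llm_output: str) -> str:
    """If the LLM emitted a tool call like 'CALL get_weather(Paris)',
    execute it and return the tool result; otherwise pass the text through
    unchanged to the TTS stage."""
    if llm_output.startswith("CALL "):
        call = llm_output[len("CALL "):]
        name, arg = call.rstrip(")").split("(", 1)
        return TOOLS[name](arg)
    return llm_output
```

A fused speech-to-speech model has no such inspectable text step in the middle, which is one reason the cascade is expected to keep dominating reliability-focused, tool-heavy use cases.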

    17. SG

      I want to take,

  11. 36:53–41:37

    Future of AI Companions and Education

    1. SG

      uh, our last few minutes and ask you a few questions about just the future that I think you'll have a really good point of view on, given you think about voice and audio all the time. What do you think of AI companions?

    2. MS

      I think they will be a big thing and exist in a big way. Not something I'm personally excited about, or something that we spend much time on. But I think the whole line of what's an assistant, a companion, a character that you enjoy as part of an experience will kind of blur and blend-

    3. SG

      Mm-hmm.

    4. MS

      ... to a large extent.

    5. SG

      They're going to be very common, but you're not, like, enthusiastic personally about it.

    6. MS

      I'm more excited about the Jarvis version of that, or-

    7. SG

      Mm-hmm.

    8. MS

      ... more of: I have a super assistant, a super copilot, kind of-

    9. SG

      Versus the social version.

    10. MS

      Versus the social version that's like-

    11. SG

      Yeah.

    12. MS

      I think it just would be such an incredible unlock. And it also blends into that personal life context. I would love to start the day with someone that understands me, tells me what's relevant to me, opens the blinds and then-

    13. SG

      (laughs)

    14. MS

      ... tells me about the weather and the sunshine. And plays music straight away.

    15. SG

      It's gonna happen.

    16. MS

      It's gonna happen.

    17. SG

      Yeah.

    18. MS

      That, I'm excited for. I think the companion use cases we mentioned, solving loneliness and that part, I think that's one way. Maybe there are different ways of engaging people back. I do think there will be an interesting future, even thinking about education, where you will have superpowers learning from AI tutors. But on the flip side, and that's my personal take, you will have a good percent of education time spent with AI tutors, but then an explicit percent of time spent without any technology, human-to-human-

    19. SG

      Mm-hmm.

    20. MS

      ... so you can, you can kind of learn that part too.

    21. SG

      Yeah, I think this is the correct model. Both in terms of emotional guidance and coaching and, you know, guardrails, right? As well as peer-to-peer learning.

    22. MS

      Yeah. Yeah, exactly.

    23. SG

      What do you think about dictation, or what happens in terms of how we control technology that isn't necessarily personified as well?

    24. MS

      Uh-

    25. SG

      Or does it just all become personified?

    26. MS

      I think not all personified. Some things, you know, communicating with an oven at home, will probably stay pretty static and-

    27. SG

      Or code, I might just...

    28. MS

      Yeah, exactly. You probably don't need that much additional emotional input there.

    29. SG

      Yeah.

    30. MS

      But I think there will be huge parts where, in a way, what I hope will happen is that you will have the ability to stay more immersed in real life, with the devices going back into the pocket, back into some version of an attached element, assuming that's the right setting. And that kind of acts on your behalf. And in many ways, take dictation: as Karpathy says, it's the decade of agents, so let's call it a decade. Then you'll have a decade of robots. If you are interacting with robots, of course voice will be the input and the output as one of the key interfaces, so you will need that dictation as a huge part. But similarly-

Episode duration: 41:38


Transcript of episode WHWAYiY_RnQ
