No Priors Ep. 143 | With ElevenLabs Co-Founder Mati Staniszewski
Sarah Guo and Mati Staniszewski: ElevenLabs Bets Big On Voice As The Future Interface.
In this episode of No Priors, Sarah Guo interviews ElevenLabs co-founder and CEO Mati Staniszewski about building a foundational audio company that powers both creative voice generation and real-time conversational agents.
At a glance
WHAT IT’S REALLY ABOUT
ElevenLabs Bets Big On Voice As The Future Interface
- Sarah Guo interviews ElevenLabs co-founder and CEO Mati Staniszewski about building a foundational audio company that powers both creative voice generation and real-time conversational agents. He explains how ElevenLabs grew to $300M ARR and 350 employees by pairing deep audio research with productized platforms for creators and enterprises. The discussion covers use cases like dubbing, customer support, immersive media, education, and even "agentic government," as well as how they sequence research, navigate competition from big labs and open source, and think about model commoditization. They close by exploring how voice will reshape interfaces, education, and personal assistants over the next decade.
IDEAS WORTH REMEMBERING
7 ideas
Pair deep research with focused product layers to unlock real value.
ElevenLabs structures work into ‘labs’—small cross‑functional teams that first solve a core technical problem (e.g., voice, agents, music) and then quickly ship product layers on top, so research advances are immediately usable by customers.
Voice is evolving into the primary interface for many technologies.
Staniszewski argues that keyboards and screens are anachronistic for many tasks; natural speech will increasingly mediate interactions with phones, computers, robots, and services, requiring high‑quality, expressive, low‑latency audio models.
Personalization and voice selection are as important as raw model quality.
Enterprises often don’t know how to ‘choose’ a voice, so ElevenLabs supports them with internal ‘voice sommeliers’ and dynamic voice routing based on user segments, proving that tone, accent, and emotion heavily drive user perception.
Conversational agents are moving from reactive support to proactive experiences.
Customers like Meesho use agents not just for refunds and tracking, but as front‑door shopping assistants that help discover products and guide checkout, pointing to a shift from narrow support bots to full journey companions.
Model superiority is a head start, not a permanent moat.
ElevenLabs expects base models to commoditize within a few years; they view research as a 6–12 month advantage that lets them build better products and, more importantly, an ecosystem of voices, integrations, and workflows that compound over time.
Specialization in audio architectures can beat larger generalist labs.
Despite competition from OpenAI and Google, ElevenLabs claims to outperform them on key audio benchmarks by focusing on architectural breakthroughs rather than sheer scale, and by concentrating some of the world’s top audio researchers under one roof.
Education and government are poised for transformation via voice agents.
From chess.com tutors in the voices of top grandmasters to Ukraine’s push toward an ‘agentic government’ with voice‑driven citizen services, the same audio and agent infrastructure can reshape how people learn, access benefits, and interact with institutions.
WORDS WORTH SAVING
5 quotes
We at ElevenLabs are solving how humans and technology interact, how you can create seamlessly with that technology.
— Mati Staniszewski
Instead of having it just translated, could you have the original voice, original emotions, original intonation carried across?
— Mati Staniszewski
If you are looking for a solution across a set of different use cases that you want our engineering help and deploy that, then we are the right solution and probably the best solution.
— Mati Staniszewski
Research is a head start. All it is, is a head start in being able to accelerate the future a little bit closer.
— Mati Staniszewski
I think learning with AI with voice, where it’s on your headphone or in a speaker, is just going to be such a big thing where you have your own teacher on demand who understands you very personally.
— Mati Staniszewski
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
How should enterprises decide when to adopt voice agents across multiple touchpoints versus starting with a single narrow use case like customer support?
What new metrics or evaluation frameworks are needed to rigorously assess voice quality, emotion, and user trust beyond standard audio benchmarks?
As base models commoditize, what will ultimately differentiate leading AI voice platforms: data, distribution, ecosystem, or something else?
How can education systems integrate AI voice tutors responsibly without undermining critical human‑to‑human social and emotional development?
What governance and ethical guardrails are necessary when governments deploy ‘agentic’ voice interfaces for citizen services at national scale?