The $11B Bet That Voice Will Replace Everything | Mati Staniszewski x Nikhil Kamath | WTF Online

ElevenLabs is the AI company that makes machines sound human. They dubbed the Lex Fridman–Modi podcast into Hindi. They built an AI Gordon Ramsay that teaches you how to cook while you cook. They automated 60,000 customer support calls for Meesho. The company is worth $11 billion and is racing OpenAI to own the future of how humans talk to machines. The CEO is a 29-year-old from Poland named Mati Staniszewski, who started it because every foreign film in Poland had one man dubbing every character. I sat down with him in Davos. We got into why headphones will matter more than phones, and why the real opportunity for young entrepreneurs isn't building AI models but going deep into one boring domain (automotive, healthcare, e-commerce) and deploying voice agents better than anyone else. Then it took a turn. We ended up talking about why social media is fundamentally broken, why no foreign algorithm should define the mood of India's youth, and why nobody has built an AI-native social product yet. We decided to try.
00:00 Introduction
06:35 Voice as the next tech interface
13:24 Competing with OpenAI and big labs
20:12 Preserving emotion in dubbed content
27:39 Building profitable voice businesses today
35:09 AI valuations and global opportunity
42:29 Geopolitics reshaping trust in tech platforms
49:51 Designing a new social media platform
56:40 Incentivising authenticity over negativity

#nikhilkamath
Co-founder of Zerodha and Gruhas
Host of 'WTF is' & 'People By WTF' Podcast
Twitter: https://x.com/nikhilkamathcio/
Instagram: https://www.instagram.com/nikhilkamathcio/
LinkedIn: https://www.linkedin.com/in/nikhilkamathcio?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=ios_app
Facebook: https://www.facebook.com/nikhilkamathcio/

#matistaniszewski
LinkedIn: https://uk.linkedin.com/in/matiii
Twitter: https://x.com/matiii

Watch 'WTF is' Podcast on Spotify: https://tinyurl.com/4nsm4ezn
Watch 'People by WTF' Podcast on Spotify: https://tinyurl.com/yme92c59
Watch 'WTF Online' on Spotify: https://tinyurl.com/4tjua4th

#WTFiswithnikhilkamath #PeopleByWTF #WTFOnline

Nikhil Kamath (host), Mati Staniszewski (guest)

Mar 11, 2026 · 59m

CHAPTERS

ElevenLabs in India, and why hardware (Nothing) might be the voice wedge
Nikhil and Mati open with team presence in India and segue into hardware as the missing link for mainstream voice. They discuss Nothing’s positioning and how AI-native audio devices could make real-time translation and always-on assistance feel natural.
The “meeting companion” vision: voice agents, pendants, and opt-in transcription
Mati describes internal experiments using voice agents to capture event/meeting feedback and action items. The conversation explores what an ideal wearable note-taker could look like—assuming explicit consent and strong privacy norms.
Voice as the next interface: what must improve for a Jarvis-like experience
Mati outlines why voice is a natural interaction layer and what it will take to move computing “into the background.” He breaks the transition into three requirements: human-level interaction quality, knowledge/memory access, and the right device form factor.
Which devices win: headphones, glasses, pendants—and even silent speech
They debate the most likely winners among AI hardware formats. Mati leans toward behind-the-ear audio devices, and mentions emerging “silent speech” approaches that infer speech from mouth movements to interact privately.
Speculating on the Jony Ive + Sam Altman device, and why OpenAI voice matters
Nikhil and Mati speculate about what OpenAI and Jony Ive might build, focusing on behavior change vs. adoption friction. They also note OpenAI’s increased emphasis on voice, raising the competitive bar for specialists like ElevenLabs.
Competing with big labs: ElevenLabs’ moat (research + product) and platform split
Nikhil raises the fear of platform providers moving up the stack; Mati responds with ElevenLabs’ strategy: owning core audio research and shipping products that create durable customer value. He explains their two major lines: creative tools and enterprise/agent platforms.
Creator workflows and dubbing at scale: fixing audio, then localizing across languages
Mati explains practical creator use cases: patching missing lines with voice reconstruction and dubbing full podcasts into multiple languages. They discuss the Lex Fridman–PM Modi dubbing workflow and how cost differs between fully automated dubbing and human-reviewed, high-stakes translations.
Preserving emotion in dubbed content—and the next step: lip reanimation
Nikhil raises the core problem with dubbing: it often loses emotion and sounds robotic. Mati explains how ElevenLabs attempts to preserve intonation and context, and why differences in sentence structure between languages create lip-sync mismatches, which is what drives the need for lip animation/reanimation.
Voice agents that make money today: domain-first businesses in lagging industries
Asked what entrepreneurs can build now, Mati recommends pairing voice tech with deep domain expertise—especially in “non-innovating” industries like automotive and healthcare. The key is deployment, integrations, compliance, and replicability beyond a single client.
Real-world deployments: Meesho scale and the ‘AI concierge’ future of commerce
Mati shares ElevenLabs’ work helping Meesho automate high-volume multilingual customer calls. They then zoom out to a broader vision: voice-driven shopping where an assistant helps discovery, comparison, and ordering—turning e-commerce into a guided conversation.
Education, loneliness, and how voice changes what humans should learn
Nikhil and Mati explore voice agents as companions (including loneliness/intimacy) and as personalized tutors. They debate how always-available knowledge changes education—shifting value from memorization toward learning-to-learn and social development.
AI valuations and global innovation: bubble talk, Europe/India opportunity, and risks
Nikhil questions whether AI valuations outpace revenue fundamentals. Mati argues this cycle is different due to clearer value and revenue, while acknowledging frothy pockets (e.g., GPU-layer providers) and emphasizing that global teams outside the Valley can build category leaders.
Geopolitics, data residency, and trust: will platforms fragment by country?
Nikhil predicts a more multipolar tech world where countries demand local data, local platforms, and reduced dependence on global incumbents like WhatsApp. Mati partially disagrees, arguing foundational models and some infrastructures will remain concentrated, while fine-tuning and product layers localize—making trust and network effects central.
Designing a new social network: authenticity, verification, voice-first interaction, and incentives
Nikhil shares plans for an India-first social platform that reduces algorithmic outrage and foreign influence over youth culture. Mati proposes voice-enabled “companion” consumption (summaries, deeper questions), multilingual access, strong human verification, and carefully designed incentives that don’t reward knee-jerk negativity.
