Nikhil Kamath
The $11B Bet That Voice Will Replace Everything | Mati Staniszewski x Nikhil Kamath | WTF Online
CHAPTERS
ElevenLabs in India, and why hardware (Nothing) might be the voice wedge
Nikhil and Mati open with ElevenLabs’ team presence in India and segue into hardware as the missing link for mainstream voice. They discuss Nothing’s positioning and how AI-native audio devices could make real-time translation and always-on assistance feel natural.
The “meeting companion” vision: voice agents, pendants, and opt-in transcription
Mati describes internal experiments using voice agents to capture event/meeting feedback and action items. The conversation explores what an ideal wearable note-taker could look like—assuming explicit consent and strong privacy norms.
Voice as the next interface: what must improve for a Jarvis-like experience
Mati outlines why voice is a natural interaction layer and what it will take to move computing “into the background.” He breaks the transition into three requirements: human-level interaction quality, knowledge/memory access, and the right device form factor.
Which devices win: headphones, glasses, pendants—and even silent speech
They debate the most likely winners among AI hardware formats. Mati leans toward behind-the-ear audio devices, and mentions emerging “silent speech” approaches that infer speech from mouth movements to interact privately.
Speculating on the Jony Ive + Sam Altman device, and why OpenAI voice matters
Nikhil and Mati speculate about what OpenAI and Jony Ive might build, focusing on behavior change vs. adoption friction. They also note OpenAI’s increased emphasis on voice, raising the competitive bar for specialists like ElevenLabs.
Competing with big labs: ElevenLabs’ moat (research + product) and platform split
Nikhil raises the fear of platform providers moving up the stack; Mati responds with ElevenLabs’ strategy: owning core audio research and shipping products that create durable customer value. He explains their two major lines: creative tools and enterprise/agent platforms.
Creator workflows and dubbing at scale: fixing audio, then localizing across languages
Mati explains practical creator use cases: patching missing lines with voice reconstruction and dubbing full podcasts into multiple languages. They discuss the Lex Fridman–PM Modi dubbing workflow and how cost differs between fully automated dubbing and human-reviewed, high-stakes translations.
Preserving emotion in dubbed content—and the next step: lip reanimation
Nikhil challenges the core problem: dubbing often loses emotion and sounds robotic. Mati explains how ElevenLabs attempts to preserve intonation and context, and why sentence structure differences create lip-sync mismatches—leading to the need for lip animation/reanimation.
Voice agents that make money today: domain-first businesses in lagging industries
Asked what entrepreneurs can build now, Mati recommends pairing voice tech with deep domain expertise—especially in “non-innovating” industries like automotive and healthcare. The key is deployment, integrations, compliance, and replicability beyond a single client.
Real-world deployments: Meesho scale and the ‘AI concierge’ future of commerce
Mati shares ElevenLabs’ work helping Meesho automate high-volume multilingual customer calls. They then zoom out to a broader vision: voice-driven shopping where an assistant helps discovery, comparison, and ordering—turning e-commerce into a guided conversation.
Education, loneliness, and how voice changes what humans should learn
Nikhil and Mati explore voice agents as companions (including loneliness/intimacy) and as personalized tutors. They debate how always-available knowledge changes education—shifting value from memorization toward learning-to-learn and social development.
AI valuations and global innovation: bubble talk, Europe/India opportunity, and risks
Nikhil questions whether AI valuations outpace revenue fundamentals. Mati argues this cycle is different due to clearer value and revenue, while acknowledging frothy pockets (e.g., GPU-layer providers) and emphasizing that global teams outside the Valley can build category leaders.
Geopolitics, data residency, and trust: will platforms fragment by country?
Nikhil predicts a more multipolar tech world where countries demand local data, local platforms, and reduced dependence on global incumbents like WhatsApp. Mati partially disagrees, arguing that foundational models and some infrastructure will remain concentrated while fine-tuning and product layers localize, which makes trust and network effects central.
Designing a new social network: authenticity, verification, voice-first interaction, and incentives
Nikhil shares plans for an India-first social platform that reduces algorithmic outrage and foreign influence over youth culture. Mati proposes voice-enabled “companion” consumption (summaries, deeper questions), multilingual access, strong human verification, and carefully designed incentives that don’t reward knee-jerk negativity.