The $11B Bet That Voice Will Replace Everything | Mati Staniszewski x Nikhil Kamath | WTF Online

Nikhil Kamath | Mar 11, 2026 | 59m

Nikhil Kamath (host), Mati Staniszewski (guest)

Voice as the next tech interface
Three prerequisites: quality, knowledge, form factor
Headphones vs glasses vs pendants; ambient computing
ElevenLabs: creative tools vs enterprise agents
Emotion-preserving dubbing and localization workflows
OpenAI/platform risk and defensibility beyond models
Geopolitics, data residency, trust, and new social media design

In this episode of WTF Online, Nikhil Kamath and Mati Staniszewski explore why voice may become the next interface and how to build on it.

Why voice is the next interface—and how to build on it

Voice will likely become a dominant interface once speech feels human-level—interruptible, emotional, low-latency, and intelligent—so devices can “fade into the background.”

ElevenLabs positions itself as both a research company (foundational audio models) and a product company (creative tools and enterprise voice agents), reducing dependence on LLM platform providers.

High-quality localization/dubbing is shifting from “robotic translations” toward preserving emotion and intent, with the next step being automated lip re-animation to match translated speech.

Near-term profitable opportunities are in domain-specific voice agents for traditional industries (healthcare, automotive, e-commerce, financial services), where integration and workflow deployment are the real moat.

The conversation broadens into geopolitics, trust, and social media: Kamath argues global platforms may fragment, while Staniszewski emphasizes network effects and trust, and both explore how voice/AI could enable a healthier, authenticity-first social product.

Key Takeaways

Voice wins when it matches human conversational dynamics.

Staniszewski argues adoption hinges on speech that’s fast, interruptible, emotionally accurate, and paired with strong “assistant intelligence,” otherwise people won’t tolerate voice agents everywhere.

Great voice UX requires “knowledge plumbing,” not just a good TTS model.

He highlights integrations (CRM, enterprise systems), memory, and deployment channels (WhatsApp, phone numbers, devices) as essential to make voice agents truly useful in real workflows.

Form factor is still the unsolved bottleneck for ubiquitous voice.

They discuss headphones (especially behind-the-ear), pendants, wristbands, and glasses; the technology may exist, but adoption depends on comfortable, always-available hardware with battery, sensors, and social acceptability.

Localization value comes from preserving emotion—and soon, fixing lips.

ElevenLabs’ dubbing approach aims to retain intonation and contextual emotion across languages, but mismatched sentence structure breaks visual timing; lip animation/re-animation is framed as the next major layer.

Entrepreneurs can build big businesses by pairing voice with “boring” domain expertise.

Staniszewski recommends targeting lagging industries (healthcare, automotive) and starting with deployable wedges like customer support, then expanding into richer in-product experiences once trust and distribution are established.

Defensibility in AI apps often lives in deployment, trust, and evaluation loops.

In response to fears that OpenAI or big labs will copy applications, he emphasizes moats like telephony integration, monitoring/evals, domain data, and ongoing operational performance—not just the underlying model.

A healthier social platform may require verification plus different emotional incentives.

Kamath wants a non-rage-driven, less algorithmically manipulative social product; Staniszewski suggests verification/real-human trust and voice-first interaction patterns (summaries, spoken comments, auto-translation) while acknowledging incentives are the hardest design problem.

Notable Quotes

To make this possible, there's at least three things that need to happen: ... voice quality and interaction ... knowledge access ... and the form factor.

Mati Staniszewski

Dubbing is... a small market... the interactive use case is the biggest market... that will shift everything.

Nikhil Kamath / Mati Staniszewski

An hour podcast will be few dollars... but... we did employ a wider set of people that would actually check every translation.

Mati Staniszewski

I believe social media is broken today... an app which is not governed by an algorithm... triggering hate... I don't think is conducive in the long term.

Nikhil Kamath

If you could have a platform that inspires curiosity and learning... and doesn't lead to just two sides screaming at each other—[that would be] incredible.

Mati Staniszewski

Questions Answered in This Episode

On the “three prerequisites,” what are the current biggest bottlenecks: latency, emotional prosody, or interruption handling—and what milestones mark “human-level” voice?

If behind-the-ear headphones are your preferred form factor, what specific sensors/on-device capabilities are required for the always-on agent use case without privacy backlash?

In dubbing, how do you technically represent and transfer “emotion” across languages—prosody tokens, reference audio conditioning, or conversation-level context windows?

Lip re-animation sounds adjacent to video generation—will ElevenLabs build it, partner, or acquire, and what makes that workflow hard to commoditize?

For entrepreneurs building vertical voice agents, what’s the minimum viable deployment wedge you’ve seen work fastest in healthcare or financial services?

Transcript Preview

Nikhil Kamath

[upbeat music]

Mati Staniszewski

How many times have you-- this is your f- uh, how many times-

Nikhil Kamath

Fifth. Fifth time.

Mati Staniszewski

Fifth time.

Nikhil Kamath

Yeah. But never for podcast. Always for my fintech company.

Mati Staniszewski

Of course, yeah.

Nikhil Kamath

Are you in India?

Mati Staniszewski

We are.

Nikhil Kamath

Yeah?

Mati Staniszewski

We have, uh, I think it's between ten and fifteen. I think it's fourteen people now.

Nikhil Kamath

Nice.

Mati Staniszewski

If, uh, if we count the new people that are joining or joined.

Nikhil Kamath

And, and which city?

Mati Staniszewski

Mostly Bengaluru and few, uh, close to Mumbai.

Nikhil Kamath

I'm from Bengaluru.

Mati Staniszewski

Oh, are you, do you-- are you based there or you are part-

Nikhil Kamath

Partly based there.

Mati Staniszewski

Where is the other part?

Nikhil Kamath

Uh, Mumbai and Goa.

Mati Staniszewski

Okay. [laughs]

Nikhil Kamath

[laughs] What do you think the opportunity is? What can one build in voice that can be a big profitable business tomorrow? You're based out of which city now, Mati?

Mati Staniszewski

From London.

Nikhil Kamath

Right, from London. Do you know Carl, the Nothing guy?

Mati Staniszewski

Yeah. Yeah, yeah, yeah. Of course.

Nikhil Kamath

Yeah, yeah, yeah. We're investors of his.

Mati Staniszewski

Oh, no way.

Nikhil Kamath

Yeah.

Mati Staniszewski

I'm a tiny investor too.

Nikhil Kamath

Really?

Mati Staniszewski

Yeah. What-

Nikhil Kamath

What do you think?

Mati Staniszewski

Well, I l- you know, it's so hard to innovate in hardware.

Nikhil Kamath

Yeah.

Mati Staniszewski

And Carl is one of the very few that is actually doing both innovating-

Nikhil Kamath

Mm-hmm

Mati Staniszewski

... and got to a good scale-

Nikhil Kamath

Yeah

Mati Staniszewski

... um, which is so hard.

Nikhil Kamath

Yeah.

Mati Staniszewski

Um, and I think he, he's, like, thinking around design and how there will be a combination of AI native devices and maybe they all take slightly different forms.

Nikhil Kamath

Yeah.

Mati Staniszewski

I think there's a, a great chance for, for Nothing. What do you think?

Nikhil Kamath

Uh, so we engaged recently actually. We got on their cap table. I like Carl. I like Carl a lot. I feel like it's a new guy who's trying something new. Uh, the sales look great. Uh, I think they went from five hundred mil last year to about nine hundred mil this year. Uh, but it's such a tough space, right? Hardware.

Mati Staniszewski

So tough. But it's, you know, I, I'm also excited.

Nikhil Kamath

Mm.

Mati Staniszewski

You know, maybe that's a little bit biased.

Nikhil Kamath

Mm.

Mati Staniszewski

But he, he can lead with voice in many ways.

Nikhil Kamath

Yeah.

Mati Staniszewski

You know, the, the Nothing headphones I think are, are great.

Nikhil Kamath

Yeah, yeah, yeah.

Mati Staniszewski

And they are actually one of the first ones-

Nikhil Kamath

Yeah

Mati Staniszewski

... to start integrating, like, AI-assisted part of the-

Nikhil Kamath

Yeah

Mati Staniszewski

... of the experience, so you could speak with the headphones, you could personalize music-

Nikhil Kamath

That's not out yet, right?

Mati Staniszewski

They're testing a few things in alpha.

Nikhil Kamath

Mm. Yeah.

Mati Staniszewski

But if they, if they, um, you know, if they move quickly, I think this could be, like, the first truly AI native device that's kind of with you, around you.

Nikhil Kamath

Yeah.

Mati Staniszewski

Um, and I mean, that opens so many opportunities. I'm frequently speaking with Carl of, uh-

Nikhil Kamath

Yeah

Mati Staniszewski

... um, how, like, in the ideal future, you could potentially speak any language-

Nikhil Kamath

Mm-hmm

Mati Staniszewski

... and it automatically gets real-time translated to any other language around you, so we can communicate and, and travel, travel around.