$6.6B AI CEO: How to Make Your First $10,000 with AI

Silicon Valley Girl · Oct 4, 2025 · 43 min

Mati Staniszewski (guest), Marina Mogilko (host)

- Voice as the next AI interface
- Voice agents for customer support and sales conversion
- How to deploy agents: knowledge base, workflows, integrations
- Voice marketplace royalties and passive income mechanics
- Voice cloning quality nuances (context/conditioning, mixing)
- Deepfakes, authentication, watermarking, and trust models
- Jobs at risk, adapting via AI + domain expertise
- Top recommended AI tools (Claude, Black Forest Labs, Lovable/v0/Replit)
- $10k/month opportunity: SMB voice-agent deployment services
- Languages in an era of real-time translation

In this episode of Silicon Valley Girl, host Marina Mogilko interviews ElevenLabs CEO Mati Staniszewski about voice agents, monetization, and deepfake safeguards.

ElevenLabs CEO on voice agents, monetization, and deepfake safeguards

Mati argues voice is becoming a primary interface to AI because it carries emotion, context, and usability that text cannot, and businesses are rapidly adopting voice agents for support, sales, and product navigation.

He explains how companies can deploy low-latency voice agents by connecting speech, LLM reasoning, and business workflows (calendar booking, knowledge bases, handoffs), often for hundreds of dollars per month plus telephony integrations like Twilio.

On the creator side, ElevenLabs’ voice marketplace lets users authenticate, clone their voice with ~30 minutes of recording, and earn royalties when others use it—about $5M paid to the community so far, with earnings skewed toward distinctive voices/accents.

They also discuss risks: impersonation will happen, so the future needs an “assume AI by default” mindset plus a three-layer verification model (device authenticity, authenticated/watermarked AI, and default distrust when unverified), alongside job shifts where AI replaces those who don’t use AI.

Key Takeaways

Voice will be a dominant interface for AI interactions.

Mati emphasizes voice transmits more information than text—emotion, inflection, imperfections—making it both a richer input signal and a more natural, “pleasurable” output for users.

Voice agents already convert, especially in self-serve tiers.

ElevenLabs uses its own agents to accelerate inbound leads; the agent can convert directly for business-tier self-serve plans, while enterprise still requires KYC and human-assisted processes.

Deployment is less about coding and more about business logic.

The platform abstracts the hard orchestration (speech + LLM + TTS latency), but success depends on mapping your knowledge base, defining workflows (“if X then trigger Y”), and integrating systems like calendars, checkout links, or CRM.
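
The "if X then trigger Y" business logic described above can be sketched as a simple intent-to-action router. This is a hypothetical illustration, not the ElevenLabs API: the function name `route_intent` and the action strings are invented for clarity, standing in for real calendar, knowledge-base, and CRM integrations.

```python
# Hypothetical sketch of the "if X then trigger Y" workflow mapping:
# the platform handles speech + LLM + TTS; the deployer supplies
# rules that map detected caller intents to business actions.
# All names here are illustrative, not a real API.

def route_intent(intent: str, details: dict) -> str:
    """Map a caller's detected intent to a business action."""
    if intent == "book_appointment":
        # e.g. create an event via the business's calendar integration
        return f"calendar.create({details.get('slot', 'next available')})"
    if intent == "pricing_question":
        # answer from the knowledge base the business uploaded
        return "knowledge_base.lookup('pricing')"
    if intent == "complex_issue":
        # hand off to a human, as complex/enterprise cases still require
        return "handoff.to_human()"
    # default: ask a clarifying question rather than guess
    return "agent.clarify()"

print(route_intent("book_appointment", {"slot": "Tue 10:00"}))
```

The point of the sketch is that the deployer's work is defining these branches and wiring each one to an existing system (calendar, checkout link, CRM), not building the voice stack itself.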

A practical SMB setup can be cost-effective to start.

For smaller businesses, Mati suggests initial costs in the “hundreds of dollars per month” range, with telephony brought via integrations such as Twilio and existing phone numbers.

The voice marketplace can generate real royalties, but outcomes are power-law distributed.

Creators can authenticate and share a cloned voice to ElevenLabs’ marketplace and earn when it’s used. ...

Voice cloning quality issues often come from context mismatch and audio mixing.

A cloned voice is effectively an average over training audio; a specific scene may have different emotion/intonation than that average. ...

Deepfakes will be inevitable—trust systems must adapt.

Mati recommends assuming voices can be perfectly cloned and building multi-layer defenses: (1) verify “humanness” via device identity signals, (2) watermark authenticated AI, and (3) default to distrusting anything unverified (assume AI by default).
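
The three-layer model above can be expressed as a small decision function. This is an illustrative sketch of the logic as described in the episode, not a real authentication protocol; the function and parameter names are invented.

```python
# Illustrative sketch of the three-layer trust model:
# (1) verify "humanness" via device identity signals,
# (2) accept AI that carries a valid authentication watermark,
# (3) default to distrust ("assume AI by default") otherwise.

def classify_caller(device_verified: bool, ai_watermark_valid: bool) -> str:
    if device_verified:
        return "trusted_human"      # layer 1: device authenticity
    if ai_watermark_valid:
        return "authenticated_ai"   # layer 2: watermarked, disclosed AI
    return "untrusted"              # layer 3: unverified, assume AI

print(classify_caller(device_verified=False, ai_watermark_valid=False))
```

Note the ordering: the default branch is distrust, which is the inversion of today's "assume human unless proven otherwise" norm.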

Jobs aren’t just replaced by AI—they’re replaced by people using AI.

He predicts repetitive, recipe-based tasks (basic customer support, simple scheduling/refunds) will automate first, while complex cases elevate the value of human expertise—especially when paired with AI tools.

A near-term $10k/month opportunity is ‘agent deployment’ for local businesses.

There’s a gap between agent infrastructure and real-world implementation. ...

Language learning may shift from necessity to personal development.

With translation devices improving, the utilitarian need to learn languages may decline, but cultural learning and self-enrichment remain; native speech may still be smoother than mediated translation due to latency/turn-taking constraints.

Notable Quotes

Voice will be the... one of the key interfaces to the technology around us.

Mati Staniszewski

You don't have to be the coder. You just need to...

Mati Staniszewski

I think it's going to happen.

Mati Staniszewski

By default, it's AI, and you assume it's AI.

Mati Staniszewski

All the people that will be replaced will be replaced by people that use AI.

Mati Staniszewski

Questions Answered in This Episode

On ElevenLabs’ own inbound pipeline, what conversion-rate lift did you see from the voice agent versus the previous “wait for a human” baseline (and how do you measure net-new vs cannibalized leads)?

For a small business (like a local dentist), what’s the minimum ‘business logic’ needed to launch a usable scheduling agent without harming customer experience?

What are the biggest failure modes you see in real deployments—hallucinated policies, wrong bookings, or integrations breaking—and how do you recommend monitoring them?

You mentioned multilingual selling “in my voice.” How do you handle cultural tone, formality, and compliance differences across languages beyond just translation?

In the marketplace, what specifically makes a voice ‘unique’ enough to earn—accent, timbre, prosody, brand-ability—and how should creators position their listing to stand out?

Transcript Preview

Mati Staniszewski

we paid about $5 million to the entire community.

Marina Mogilko

Meet Mati, CEO and co-founder of ElevenLabs, a company that has grown into a $6.6 billion leader in the Voice AI space, shaping how we talk, work, and even earn money. They've created an entire voice marketplace. Now, anyone can clone their voice and earn passive income. Can you name some opportunities that you see that can make people a decent amount of money so they can make a living, like $10k a month, something that's immediate?

Mati Staniszewski

Business, and you just want to make good money, I would try to take those voice agents and go to, let's say, local doctor's office and-

Marina Mogilko

ElevenLabs built the world's most realistic voice tech. The question is: Can they control what happens next?

Mati Staniszewski

Most of those companies just don't know this is possible. You don't have to be the coder, you just need to- [beep]

Marina Mogilko

If my voice is authorized to use my credit card to buy anything, and then somebody just uses the resemblance of it-

Mati Staniszewski

I think it's, it's, it's going to happen, but, uh-

Marina Mogilko

Hey, guys, welcome to Silicon Valley Girl. We have one of the guests today whose product I've been using for a while now, so I'm gonna ask a lot of technical questions [chuckles] as well. But please welcome Mati from ElevenLabs.

Mati Staniszewski

Well-

Marina Mogilko

Thank you so much.

Mati Staniszewski

Thank you so much, Marina. Great to see you again, and thanks for, thanks for having me.

Marina Mogilko

Yeah, thank you. I feel like you're one of the pioneers of this AI industry, because when I ask people, like, what apps they're using, when I'm talking about apps that I'm using, I always mention ElevenLabs, 'cause it's been a lifesaver. I wanted to start with a question, um, about the role of voice in AI. So what it feels to me is that 2023, you know, we started adopting ChatGPT, it was all text, and then these voice capabilities became more and more powerful. It understands what I'm saying now, it understands my accent. If I mispronounce something, it still gets me. Do you feel like we're moving into the era where voice is our main tool to interact with AI?

Mati Staniszewski

I mean, 100%. I do think that voice will be the... one of the key interfaces to the technology around us, and, um, and that shift is happening. Like you said, it's like a few years back, you wouldn't even dream of this being possible, and now I think it's, it's becoming a reality, where it, it allows you to transfer so much information, more than the text. You can, you can get the emotionality, the inflection pattern, the imperfections reflected in the voice, which of course makes it easier for the, um, if it's an input, for the, uh, for the technology to understand a lot more about the, the setup that you... or what you are trying to achieve. And then if you hear it back as well, I think it's a lot better and more pleasurable, um-
