
$6.6B AI CEO: How to Make Your First $10,000 with AI
Mati Staniszewski (guest), Marina Mogilko (host)
ElevenLabs CEO on voice agents, monetization, and deepfake safeguards
In this episode of Silicon Valley Girl, host Marina Mogilko talks with ElevenLabs CEO Mati Staniszewski. Mati argues voice is becoming a primary interface to AI because it carries emotion, context, and usability that text cannot, and businesses are rapidly adopting voice agents for support, sales, and product navigation.
He explains how companies can deploy low-latency voice agents by connecting speech, LLM reasoning, and business workflows (calendar booking, knowledge bases, handoffs), often for hundreds of dollars per month plus telephony integrations like Twilio.
On the creator side, ElevenLabs’ voice marketplace lets users authenticate, clone their voice with ~30 minutes of recording, and earn royalties when others use it—about $5M paid to the community so far, with earnings skewed toward distinctive voices/accents.
They also discuss risks: impersonation will happen, so the future needs an “assume AI by default” mindset plus a three-layer verification model (device authenticity, authenticated/watermarked AI, and default distrust when unverified), alongside job shifts where AI replaces those who don’t use AI.
Key Takeaways
Voice will be a dominant interface for AI interactions.
Mati emphasizes voice transmits more information than text—emotion, inflection, imperfections—making it both a richer input signal and a more natural, “pleasurable” output for users.
Voice agents already convert, especially in self-serve tiers.
ElevenLabs uses its own agents to accelerate inbound leads; the agent can convert directly for business-tier self-serve plans, while enterprise still requires KYC and human-assisted processes.
Deployment is less about coding and more about business logic.
The platform abstracts the hard orchestration (speech + LLM + TTS latency), but success depends on mapping your knowledge base, defining workflows (“if X then trigger Y”), and integrating systems like calendars, checkout links, or CRM.
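The "if X then trigger Y" mapping above can be sketched as a simple intent router. This is an illustrative sketch only: the function and intent names are hypothetical, not ElevenLabs' actual API, and a real deployment would sit behind the platform's speech and LLM layers.

```python
# Hypothetical sketch of the "if X then trigger Y" business logic a voice-agent
# platform might invoke after its speech + LLM layers classify a caller's intent.
# All names here are illustrative assumptions, not a real API.

def route_intent(intent: str, details: dict) -> str:
    """Map a recognized caller intent to a business workflow."""
    if intent == "book_appointment":
        # e.g. create an event through a calendar integration
        return f"Booked {details['service']} on {details['date']}"
    if intent == "purchase":
        # e.g. send a checkout link to the caller
        return f"Sent checkout link for {details['item']}"
    # anything unrecognized falls back to a human handoff
    return "Transferring you to a team member"
```

The point of the sketch is that the hard part is not the code but deciding which intents exist, what data each workflow needs, and when to hand off to a human.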
A practical SMB setup can be cost-effective to start.
For smaller businesses, Mati suggests initial costs in the “hundreds of dollars per month” range, with telephony brought via integrations such as Twilio and existing phone numbers.
The voice marketplace can generate real royalties, but outcomes are power-law distributed.
Creators can authenticate and share a cloned voice to ElevenLabs’ marketplace and earn when it’s used. ...
Voice cloning quality issues often come from context mismatch and audio mixing.
A cloned voice is effectively an average over training audio; a specific scene may have different emotion/intonation than that average. ...
Deepfakes will be inevitable—trust systems must adapt.
Mati recommends assuming voices can be perfectly cloned and building multi-layer defenses: (1) verify “humanness” via device identity signals, (2) watermark authenticated AI, and (3) default to distrusting anything unverified (assume AI by default).
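The three-layer model can be sketched as a single trust decision. This is a conceptual illustration of the layering Mati describes, not a real verification protocol; the boolean checks stand in for device attestation and AI watermark detection, which are unsolved engineering problems in their own right.

```python
# Illustrative sketch of the three-layer trust model:
# (1) verify humanness via device identity signals,
# (2) accept disclosed, watermarked AI as authenticated,
# (3) default to distrust for anything unverified ("assume AI by default").
# The inputs are placeholder booleans, not real attestation checks.

def classify_audio(has_device_attestation: bool, has_ai_watermark: bool) -> str:
    if has_device_attestation:
        return "verified human"      # layer 1: device-level proof of humanness
    if has_ai_watermark:
        return "authenticated AI"    # layer 2: disclosed, watermarked AI
    return "assume AI (untrusted)"   # layer 3: unverified defaults to distrust
```

The key design choice is the ordering: anything that fails both positive checks is treated as AI, inverting today's default of trusting a familiar-sounding voice.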
Jobs aren’t just replaced by AI—they’re replaced by people using AI.
He predicts repetitive, recipe-based tasks (basic customer support, simple scheduling/refunds) will automate first, while complex cases elevate the value of human expertise—especially when paired with AI tools.
A near-term $10k/month opportunity is ‘agent deployment’ for local businesses.
There’s a gap between agent infrastructure and real-world implementation. ...
Language learning may shift from necessity to personal development.
With translation devices improving, the utilitarian need to learn languages may decline, but cultural learning and self-enrichment remain; native speech may still be smoother than mediated translation due to latency/turn-taking constraints.
Notable Quotes
“Voice will be the... one of the key interfaces to the technology around us.”
— Mati Staniszewski
“You don't have to be the coder. You just need to...”
— Mati Staniszewski
“I think it's going to happen.”
— Mati Staniszewski
“By default, it's AI, and you assume it's AI.”
— Mati Staniszewski
“All the people that will be replaced will be replaced by people that use AI.”
— Mati Staniszewski
Questions Answered in This Episode
On ElevenLabs’ own inbound pipeline, what conversion-rate lift did you see from the voice agent versus the previous “wait for a human” baseline (and how do you measure net-new vs cannibalized leads)?
For a small business (like a local dentist), what’s the minimum ‘business logic’ needed to launch a usable scheduling agent without harming customer experience?
What are the biggest failure modes you see in real deployments—hallucinated policies, wrong bookings, or integrations breaking—and how do you recommend monitoring them?
You mentioned multilingual selling “in my voice.” How do you handle cultural tone, formality, and compliance differences across languages beyond just translation?
In the marketplace, what specifically makes a voice ‘unique’ enough to earn—accent, timbre, prosody, brand-ability—and how should creators position their listing to stand out?
Transcript Preview
we paid about $5 million to the entire community.
Meet Mati, CEO and co-founder of ElevenLabs, a company that has grown into a $6.6 billion leader in the Voice AI space, shaping how we talk, work, and even earn money. They've created an entire voice marketplace. Now, anyone can clone their voice and earn passive income. Can you name some opportunities that you see that can make people a decent amount of money so they can make a living, like $10k a month, something that's immediate?
Business, and you just want to make good money, I would try to take those voice agents and go to, let's say, local doctor's office and-
ElevenLabs built the world's most realistic voice tech. The question is: Can they control what happens next?
Most of those companies just don't know this is possible. You don't have to be the coder, you just need to- [beep]
If my voice is authorized to use my credit card to buy anything, and then somebody just uses the resemblance of it-
I think it's, it's, it's going to happen, but, uh-
Hey, guys, welcome to Silicon Valley Girl. We have one of the guests today whose product I've been using for a while now, so I'm gonna ask a lot of technical questions [chuckles] as well. But please welcome Mati from ElevenLabs.
Well-
Thank you so much.
Thank you so much, Marina. Great to see you again, and thanks for, thanks for having me.
Yeah, thank you. I feel like you're one of the pioneers of this AI industry, because when I ask people, like, what apps they're using, when I'm talking about apps that I'm using, I always mention ElevenLabs, 'cause it's been a lifesaver. I wanted to start with a question, um, about the role of voice in AI. So what it feels to me is that 2023, you know, we started adopting ChatGPT, it was all text, and then these voice capabilities became more and more powerful. It understands what I'm saying now, it understands my accent. If I mispronounce something, it still gets me. Do you feel like we're moving into the era where voice is our main tool to interact with AI?
I mean, 100%. I do think that voice will be the... one of the key interfaces to the technology around us, and, um, and that shift is happening. Like you said, it's like a few years back, you wouldn't even dream of this being possible, and now I think it's, it's becoming a reality, where it, it allows you to transfer so much information, more than the text. You can, you can get the emotionality, the inflection pattern, the imperfections reflected in the voice, which of course makes it easier for the, um, if it's an input, for the, uh, for the technology to understand a lot more about the, the setup that you... or what you are trying to achieve. And then if you hear it back as well, I think it's a lot better and more pleasurable, um-