$6.6B AI CEO: How to Make Your First $10,000 with AI
CHAPTERS
- 0:00 – 1:23
Why voice is becoming the default interface for AI
Marina and Mati open by framing the shift from text-first AI (ChatGPT era) to voice-first interactions. Mati argues voice carries more information than text—emotion, intent, and nuance—making it both a richer input and a more natural output.
- •The shift to voice as a primary AI interface is already underway
- •Voice conveys emotion, inflection, and “imperfections” that improve understanding
- •Audio responses can feel more human and pleasurable than text
- •Accents and mispronunciations are increasingly well-handled by modern systems
- 1:23 – 2:28
Voice agents for support and sales: real business use cases
Mati describes how companies replace IVR and basic call handling with voice agents that understand callers and respond quickly. He also explains how voice agents can assist across the user journey—from product guidance to inbound/outbound lead handling—and can sometimes convert directly on self-serve tiers.
- •Replacing legacy IVR with conversational voice agents
- •Voice agents across the full customer journey, not just support
- •Using agents for inbound/outbound qualification and routing
- •ElevenLabs uses agents to explain pricing and accelerate pipeline
- •Direct conversion is possible for self-serve plans; enterprise deals still require KYC (know-your-customer) checks
- 2:28 – 5:22
Measuring impact: conversion lift and net-new lead capture
Marina presses on whether AI voice sales materially improves conversions. Mati notes they measured it, but the bigger win was capturing leads that would otherwise wait days/weeks—or never convert—because agents can respond instantly at scale.
- •AI voice agents reduce response time to near-zero
- •Biggest benefit can be net-new leads rather than pure conversion-rate lift
- •Agents help prospects self-qualify or move faster to the right human
- •Scales customer conversations without increasing headcount
- 5:22 – 7:14
Picking a domain that fits: the .online sponsorship segment
A sponsored interlude explains why .com domains are often unavailable and positions .online as a flexible alternative used by millions of businesses. Marina highlights discoverability, examples of notable .online sites, and a limited-time coupon offer.
- •.com scarcity leads to awkward brand names
- •Positioning .online as widely adopted and memorable
- •Claiming a domain that matches brand/product naming
- •Promotional pricing and coupon call-to-action
- 7:14 – 12:00
How to set up an ElevenLabs voice agent: platform + business logic
Mati outlines the practical setup: create an account, use ElevenLabs’ agentic platform (which abstracts away the speech, LLM, and TTS orchestration), then add the business knowledge and workflows. He gives examples like appointment scheduling, call routing, and embedding agents into a website experience.
- •Agentic platform handles low-latency speech + LLM + TTS orchestration
- •Businesses must supply knowledge base, FAQs, and desired behaviors
- •Workflow logic: triggers, actions, and function calls (e.g., scheduling)
- •Omnichannel deployment: phone and web embed
- •Integrations (e.g., Twilio/telephony) enable using existing numbers
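The "triggers, actions, and function calls" idea above can be sketched in a few lines. This is a minimal illustration, not the ElevenLabs API: the `Calendar` class, the function names `book_appointment` and `route_to_human`, and the shape of the `call` dictionary are all hypothetical stand-ins for whatever the platform and the business's booking system actually expose.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Calendar:
    """Hypothetical in-memory calendar standing in for a real booking system."""
    booked: Dict[str, str] = field(default_factory=dict)  # slot -> caller name

    def book(self, slot: str, name: str) -> Dict[str, Any]:
        # Refuse double bookings so the agent can offer another time.
        if slot in self.booked:
            return {"ok": False, "reason": "slot_taken"}
        self.booked[slot] = name
        return {"ok": True, "slot": slot, "name": name}


Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def make_tools(calendar: Calendar) -> Dict[str, Handler]:
    """Map function names (assumed, for illustration) to business actions."""
    return {
        "book_appointment": lambda args: calendar.book(args["slot"], args["name"]),
        "route_to_human": lambda args: {
            "ok": True,
            "routed_to": args.get("team", "front_desk"),
        },
    }


def handle_function_call(tools: Dict[str, Handler], call: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one function call requested by the agent during a conversation."""
    handler = tools.get(call["name"])
    if handler is None:
        return {"ok": False, "reason": "unknown_function"}
    return handler(call.get("arguments", {}))


if __name__ == "__main__":
    tools = make_tools(Calendar())
    request = {
        "name": "book_appointment",
        "arguments": {"slot": "2025-01-06T10:00", "name": "Ana"},
    }
    print(handle_function_call(tools, request))  # first booking succeeds
    print(handle_function_call(tools, request))  # same slot is now taken
```

The point of the sketch is the split Mati describes: the platform handles the conversation, while the business supplies a small set of named actions and the data behind them.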
- 12:00 – 15:18
Multilingual sales and language learning: using your own voice across languages
Marina explores using her voice to sell courses in multiple languages and reduce anxiety for non-native speakers. Mati confirms multilingual support and suggests an adjacent use case: AI-powered speaking practice and personalized tutoring that feels less judgmental than a human call.
- •Agents can speak multiple languages while retaining the creator’s voice identity
- •Checkout via links/email follow-up or guided web flow
- •Lower friction for non-native speakers who avoid phone calls
- •Potential for “practice with Marina” conversational language training
- •Voice agents as scalable, personalized onboarding and coaching
- 15:18 – 17:37
Voice marketplace: earning money by licensing your cloned voice
Mati explains ElevenLabs’ voice marketplace: users record ~30+ minutes, complete authentication, and can publish their voice under defined terms. Creators earn royalties when others use their voice; the company has paid millions back to the community and is expanding language coverage.
- •Record ~30+ minutes of audio, authenticate, then optionally list the voice on the marketplace
- •Creators set time period/conditions for sharing
- •Royalties paid when the voice is used across the ecosystem
- •Scale: ~10,000 shared voices; payouts growing from $2M to ~$5M+
- •Uniqueness (accent/style) and community promotion improve earnings potential
- 17:37 – 21:37
Voice cloning quality issues: matching a specific scene and audio mix
Marina describes a real creator workflow, patching a line in a video while traveling, and the difficulty of matching the exact tone of a particular scene. Mati explains why clones reflect an “average” voice profile and previews future conditioning features. In the meantime, he suggests regenerating takes or training a clone from a shorter, scene-specific sample.
- •Clones often represent an averaged voice rather than scene-specific emotion/intonation
- •Mismatch can come from both prosody and background audio mixing
- •Future direction: conditioning on surrounding seconds for better continuity
- •Current workaround: regenerate multiple takes
- •Counterintuitive tip: train from a short scene snippet to match that moment better
- 21:37 – 25:10
The future of voice AI: personal agents and hyper-personalized voices
Mati predicts most people will have an authenticated voice clone and eventually a personal voice agent that can act on their behalf. He also describes personalization at scale: businesses adapting voice style to caller demographics and users selecting preferred “service voices,” with examples from Korea/Japan and travel/navigation apps.
- •Personal authenticated voice + voice agent becomes common
- •Voice authentication will need to evolve beyond “sounds like you”
- •Businesses will tailor pace/emotion/accent to audience segments
- •Users may choose preferred voices for services (banks, maps, travel)
- •Viral/brand-defining voices can become product differentiators
- 25:10 – 27:35
Deepfakes and trust: the three-layer safeguard model
Marina raises impersonation risks, especially if voice could authorize payments. Mati argues we must assume perfect cloning will be possible and outlines a three-layer approach: prove human/device identity, watermark authenticated AI, and treat anything else as untrusted AI by default.
- •Assume bad actors and permissive tools will exist; plan for “perfect clones”
- •Layer 1: device-based identity signals to verify the caller/source
- •Layer 2: watermarking/metadata for authenticated AI (with quality tradeoffs)
- •Layer 3: treat content as AI-generated by default unless it is proven human or permissioned
- •Open question: how much AI-modified content still counts as “human delivery”
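One way to read the three layers is as a simple decision rule. The sketch below is an illustration only: the `AudioSignals` fields and the `Trust` labels are hypothetical, and the precedence (a valid AI watermark wins over a verified device) is an assumption, since the chapter leaves open how AI-modified speech from a verified person should count.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    VERIFIED_HUMAN = "verified_human"
    AUTHENTICATED_AI = "authenticated_ai"
    UNTRUSTED_AI = "untrusted_ai"


@dataclass(frozen=True)
class AudioSignals:
    """Hypothetical signals a receiving system might have about incoming audio."""
    device_identity_verified: bool  # Layer 1: device-based identity check passed
    valid_ai_watermark: bool        # Layer 2: watermark/metadata of authenticated AI


def classify(signals: AudioSignals) -> Trust:
    # Assumption: a valid watermark is positive evidence of AI generation,
    # so it takes precedence even when the device is verified.
    if signals.valid_ai_watermark:
        return Trust.AUTHENTICATED_AI
    if signals.device_identity_verified:
        return Trust.VERIFIED_HUMAN
    # Layer 3: nothing proven, so default to treating the audio as untrusted AI.
    return Trust.UNTRUSTED_AI


if __name__ == "__main__":
    for device_ok in (True, False):
        for watermark_ok in (True, False):
            result = classify(AudioSignals(device_ok, watermark_ok))
            print(device_ok, watermark_ok, result.value)
```

The important property is the last branch: sounding like a particular person never raises the trust level on its own.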
- 27:35 – 31:23
What keeps an AI founder up at night: research race, safety, and labor impact
Mati balances optimism about a once-in-a-generation platform shift with the pressure of staying ahead in research. He cites responsibility for safeguards and concern about job disruption, while describing ecosystem approaches (like the marketplace) to share upside more broadly.
- •AI as a historic shift (possibly bigger than the internet)
- •Pressure to keep leading in audio research against major labs
- •Safety and misuse prevention as a core responsibility
- •Job displacement concerns and desire for inclusive transition mechanisms
- •Hiring and culture-setting are critical in a fast-moving space
- 31:23 – 33:50
Jobs at risk and how to adapt: “replaced by people who use AI”
Mati argues the key divide won’t be AI vs. humans, but AI-augmented workers vs. those who don’t adopt the tools. He expects routine, recipe-based tasks to be automated first, while complex, regulated, or expert-judgment work remains higher-value, at least initially.
- •Core message: learn AI tools or risk being outcompeted
- •Customer support: routine tasks (scheduling/refunds) automate first
- •Human experts remain essential for complex, edge-case, regulated issues
- •Domain expertise + AI becomes a high-demand combination
- •Creative output accelerates; opportunity grows but requires faster iteration
- 33:50 – 36:49
Top 3 AI tools Mati recommends (besides ElevenLabs)
Mati names tools he finds especially useful across creation and building: Black Forest Labs for image generation, Anthropic’s Claude for reasoning and engineering-like help, and Lovable (plus similar tools like v0/Replit) for rapid prototyping. He emphasizes how these tools let non-engineers build demos and get closer to product implementation.
- •Black Forest Labs for high-realism image generation
- •Claude for strong reasoning and “another level engineer” assistance
- •Lovable (and v0/Replit) for rapid app/prototype building
- •AI lowers the barrier for go-to-market teams to prototype solutions
- •Prototyping tightens feedback loops with customers
- 36:49 – 41:53
The $10k/month opportunity: deploying voice agents for SMBs (no coding required)
Asked for immediate, practical ways to earn money, Mati points to a deployment gap: tools exist, but small businesses don’t know how to implement them. He suggests selling setup and operation of appointment-booking voice agents to dentists, doctors, and mechanics—businesses where missed calls equal lost revenue.
- •Big gap between voice-agent infrastructure and real-world deployment
- •Target SMBs that miss calls and lose bookings (dentists, clinics, mechanics)
- •Offer appointment scheduling, FAQ handling, and lead capture
- •You don’t need to code—focus on outreach, setup, and domain tailoring
- •A few clients can add up to thousands or even tens of thousands of dollars in monthly revenue
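The arithmetic behind "a few clients" is easy to check. The sketch below works backward from the $10,000 monthly target in the chapter title; the per-client monthly fees are purely hypothetical placeholders, not prices mentioned in the conversation.

```python
import math


def clients_needed(monthly_target: float, monthly_fee_per_client: float) -> int:
    """Smallest number of clients whose combined fees reach the monthly target."""
    if monthly_fee_per_client <= 0:
        raise ValueError("monthly_fee_per_client must be positive")
    return math.ceil(monthly_target / monthly_fee_per_client)


if __name__ == "__main__":
    target = 10_000  # from the chapter title
    for fee in (500, 1_500, 3_000):  # hypothetical monthly fees per client
        print(f"${fee}/month per client -> {clients_needed(target, fee)} clients")
```

Under these assumed fees the answer ranges from 4 to 20 clients, which is why the fee a given niche will bear matters more than the tooling.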
- 41:53 – 43:58
Startup advice + the origin of ElevenLabs, and whether language learning survives translation tech
Mati advises founders to obsess over a real, burning user problem, validate demand early, and choose co-founders and early hires carefully. He recounts starting from a dubbing pain point (flat Polish voiceover) and then pivoting based on creator feedback. They close by debating language learning’s future as translation becomes ubiquitous: it may shift from a necessity to a matter of self-development and cultural connection.
- •Start with deep user understanding and validate pain intensity
- •An early prototype plus user conversations revealed a better initial product (line repair/voiceover)
- •Creators and audiobook authors became key early adopters due to cost barriers
- •Co-founder alignment and early team quality set culture and trajectory
- •Language learning may shift from necessity to hobby/self-growth, though native fluency still has advantages