a16z
ElevenLabs CEO: Why Voice is the Next AI Interface
Jennifer Li and Mati Staniszewski on voice AI, teams, enterprise, and licensing.
In this episode of a16z, featuring Mati Staniszewski and Jennifer Li, "ElevenLabs CEO: Why Voice is the Next AI Interface" covers voice AI, teams, enterprise, and licensing. ElevenLabs sustains fast, high-quality shipping by running many small, high-ownership product teams alongside a strong research core led by its cofounder.
At a glance
WHAT IT’S REALLY ABOUT
ElevenLabs CEO on voice AI, teams, enterprise, and licensing
- ElevenLabs sustains fast, high-quality shipping by running many small, high-ownership product teams alongside a strong research core led by its cofounder.
- The company balances research ambition against product pragmatism by shipping product “gap fixes” when research breakthroughs are unlikely within roughly three months.
- A remote-first talent strategy (with regional hubs) helps ElevenLabs hire exceptional and non-traditional candidates globally, while maintaining culture through in-person immersion where needed.
- To reduce creative-industry resistance, ElevenLabs built creator-aligned economics via a Voice Marketplace and pursued fully licensed music generation through lengthy label negotiations.
- As ElevenLabs moves from creator-first PLG to enterprise adoption, it is building orchestration, integrations, reliability, and governance—while adapting culture and incentives for long sales cycles.
IDEAS WORTH REMEMBERING
5 ideas
Use small, independent teams to keep shipping speed high.
ElevenLabs runs ~20 product teams of 5–10 people with high autonomy and ownership, accepting some duplication as the cost of moving quickly.
Set a time threshold for when product should patch what research can’t.
They avoided “UI sliders” in favor of solving problems at the model level, but adopted a rule of thumb: if research will take >3 months, ship a product workaround now.
Remote-first can be a competitive advantage if paired with intentional hubs.
They hired globally (including non-traditional backgrounds) and later added hubs (London/Warsaw/SF) to help new or early-career hires absorb context and culture.
Flattening hierarchy works, but requires strong cross-team ‘leads’ and focus control.
ElevenLabs removed titles and kept few leadership layers; to avoid distraction from radical transparency, they limit broad Slack exposure so teams maintain attention.
Creator-aligned monetization can turn AI skepticism into participation.
The Voice Marketplace lets users create/share voices and earn revenue; they report ~10,000 voices and $10M paid back to the community, reframing AI as opportunity.
WORDS WORTH SAVING
5 quotes
So we launched Voice Marketplace, where you, you could create your voice and then, uh, share it. And when the voice is shared, you earn money in the return. Today, we have almost ten thousand voices. We paid ten million dollars back to the people in the community.
— Mati Staniszewski
We don't want to do any sliders, any toggles. We don't want to become same as previous generation of, of the editing suites. So instead, let's solve it on the research level, where it will know based on the voice exactly how it should speak with the speed.
— Mati Staniszewski
I think ElevenLabs wouldn't have existed if we weren't starting from Europe.
— Mati Staniszewski
So we removed titles a year ago, and then, um-- and it's, it's going well. It still works.
— Mati Staniszewski
In some ways, the, the quota, the commissions are a effectively a lagging indicator of strategy.
— Mati Staniszewski
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
How exactly do you estimate whether a research initiative will take more or less than the “three-month” threshold, and what signals have proven most reliable?
What were the key contract terms or structural elements that finally made the 18-month label negotiations workable for licensed music generation?
In the Voice Marketplace, what safeguards or verification steps prevent abuse (e.g., unauthorized voice uploads) while still keeping onboarding easy for creators?
Which parts of the enterprise voice-agent stack (telephony, evaluation, monitoring, versioning, compliance) have been the biggest bottlenecks to moving from demo to production?
How does the no-titles policy impact performance management, promotion, and retention—especially for senior hires who expect clear leveling?
Chapter Breakdown
ElevenLabs’ expanding audio stack: from voices to agents and licensed music
Mati opens with how ElevenLabs has broadened beyond text-to-speech into voice-agent orchestration and a fully licensed music model. The framing sets up the company’s thesis that audio—especially voice—is becoming a primary AI interface across many use cases.
Origin story and the “Eleven” motif: early constraints and early momentum
Mati reflects on the early days and the company’s fondness for the number eleven, using it as a light way to illustrate growth. He contrasts the tiny early infrastructure footprint with today’s scale to highlight execution speed and rapid organizational evolution.
Research foundation: building expressive, context-aware voice models
He credits cofounder Piotr and the research team for the core breakthroughs—capturing context, emotion, intonation, and speaker characteristics. This R&D base later enabled expansion to STT, music, and other audio domains.
Shipping fast with small, high-ownership teams (and the trade-offs)
ElevenLabs organizes into many small, independent teams that can ship end-to-end, maximizing ownership and throughput. Mati acknowledges the cost—duplication and uneven pace—but argues the speed and accountability benefits dominate.
Balancing research vs. product: when to ‘ship the hack’ vs. wait for the breakthrough
Mati describes a practical rule for deciding whether to rely on research progress or product workarounds, illustrated by a “speech speed” feature debate. The takeaway is to avoid being blocked by uncertain research timelines while still aiming for elegant model-level solutions.
Remote-first to hub-first hybrid: meeting global talent where they are
ElevenLabs began remote-first to access specialized talent globally, then added hubs once headcount and onboarding complexity increased. The model is flexible: early-career hires benefit from hubs, while experienced remote workers can stay distributed.
Cultural contrasts and unconventional hiring signals
Mati contrasts US and European work cultures, noting Europe’s quieter “work talk” norms but strong pockets of highly driven talent. He also emphasizes non-traditional hiring—valuing demonstrated craft (e.g., open-source work) over standard credentials.
Flat structure and no titles: speeding impact while managing attention and coordination
ElevenLabs removed titles and kept leadership layers thin to reinforce merit, mobility, and team-level ownership. Mati also describes operational realities: “leads” must coordinate across teams, and transparency can backfire if it distracts people from priorities.
Creative industries warming to AI: relationship-building over disruption narratives
The conversation turns to how creative professionals shifted from early resistance to increasing adoption. Mati argues the best results come from deep engagement with creative workflows and showing concrete examples to overcome knee-jerk “AI is bad” reactions.
Voice Marketplace: scaling diversity of voices while paying creators
Mati explains the marketplace strategy: enabling creators to upload and share voices, earn revenue, and help ElevenLabs cover the long tail of accents, languages, and styles. He shares growth metrics and an anecdote showing demand can emerge unexpectedly across markets.
Licensing music with labels: the 18-month negotiation and ‘forcing functions’
Mati outlines how ElevenLabs approached licensed music generation by partnering with major label groups and aligning on protections and rights. He highlights the need for deadlines/forcing functions, repeated resets, and extensive education to reach agreement.
Hiring for complex, high-stakes domains: risk-tolerant counsel and expert bridging
In unfamiliar areas like legal and licensing, ElevenLabs blended experienced operators with specialist consultants who spoke the industry’s language. Mati describes mis-hires—especially overly risk-averse profiles—and the value of counsel who can propose pragmatic lines, not just enumerate risks.
From creator-first PLG to enterprise: building orchestration, reliability, and GTM muscle
ElevenLabs began with creator adoption but saw early enterprise inbound, forcing a shift toward sales-led execution and production-grade platforms. Mati details how enterprise needs drove orchestration (STT + LLM + TTS), integrations (telephony), and the hard work of evaluation, monitoring, and compliance.
Scaling execution: alpha vs. stable releases, pre- vs. post-PMF teams, and incentive alignment
Mati describes mechanisms to preserve speed while serving enterprise: clear alpha labeling with customer choice, and internal separation of pre-PMF (ship fast) vs post-PMF (stability) teams. He closes with a CEO lesson on scaling: incentives shape behavior, so comp plans must match strategy—illustrated by refusing to license models to a competitor even when commissions would encourage it.