CHAPTERS
David AI’s mission and context: YC S24 to $25M Series A
Diana Hu introduces David AI founders Tomer Cohen and Ben Wiley and frames the discussion around their rapid traction after YC and a newly announced $25M Series A. The conversation sets up why audio data is emerging as critical infrastructure for the “voice era” of AI.
What David AI does: an audio data research company for conversational speech
Tomer explains that David AI specializes in collecting and developing speech datasets—specifically conversational audio. Their emphasis is on real conversations across languages, dialects, accents, and contexts, designed to support next-generation speech models.
Why high-quality speech data is hard: no “Common Crawl for audio”
Ben outlines structural problems with sourcing training-grade audio from the internet. Much online audio is mono, with all speakers mixed into a single track, while cutting-edge end-to-end speech architectures require cleanly separated channels and tolerate almost no cross-talk or bleed.
Origin story: from a friendship at Scale AI to a bet on multimodal and voice AI
The founders met at Scale AI and wanted to build something together, drawn to multimodal AI and voice as a key interface for real-world AI. They applied to YC with an idea, got accepted, left their jobs, and moved to San Francisco to start building immediately.
Customer discovery unlock: a robotics company’s audio-data bottleneck
Early outreach to YC companies revealed a surprising signal: a humanoid robotics company needed the most help not with hardware, but with audio data for voice. That insight helped David AI commit to audio as a wedge into a much larger set of use cases.
Contrarian focus: go deep on one modality instead of going horizontal
Diana highlights that specializing in audio can look contrarian given established data players. Tomer explains their conviction: voice is foundational for non-keyboard interfaces (robots, wearables, games, avatars), and depth in a vertical can create a defensible, repeatable product.
First product prototype: a weekend phone-calling app to collect clean conversations
Ben describes their initial build: a phone-based calling app created over a weekend to get friends and family to generate early conversational datasets. That prototype validated the collection approach and evolved into a global platform supporting both scripted and unscripted conversations.
From $1K pilot to six-figure and seven-figure contracts
Tomer walks through the early sales progression: a $1,000 first contract with the robotics customer, learning-driven iteration, and then a six-figure deal with a major AI lab by the end of the YC batch. Soon after, they closed seven-figure contracts and expanded to large tech customers with massive audio needs.
Go-to-market flywheel: “we’re not selling”—labs evaluate usefulness of datasets
They describe a product-led sales motion in which the growing catalog of datasets reduces friction over time. As David AI collects more data and improves its quality, adoption gets easier for labs: customers can judge a dataset’s utility directly rather than negotiating a bespoke services engagement.
Operating model: an audio “data research lab,” not a labeling services shop
Tomer explains their differentiation: they form an internal point of view on where speech models are going, run R&D to validate data shapes, then scale winning datasets for broad adoption. This contrasts with traditional professional-services labeling, where data is collected to custom specs and owned by the customer.
Why David AI matters in the ecosystem: “picks and shovels” for voice agents
Diana connects David AI’s work to the boom in voice agent startups serving vertical industries. Tomer emphasizes the dependency chain: voice apps depend on models, and models depend on data—audio data is an underappreciated layer enabling the visible application boom.
What’s next: build the research function and scale collection 10x (then 10x again)
Looking ahead, Tomer highlights two parallel priorities: strengthening audio research to anticipate model roadmaps, and scaling data collection operations dramatically. The company sees a large opportunity and is focused on team growth to capture it.
Hiring plan: researchers, engineers, and operators to execute the vision
They close with specific hiring needs aligned to their strategy. Researchers will drive the forward-looking data roadmap, while engineers and operators will build and run the systems needed to collect and deliver datasets at scale.