YC Root Access

David AI: Powering the Voice Era of AI

Tomer Cohen and Ben Wiley launched David AI just days before the Y Combinator deadline, submitting their application at midnight and hoping it counted. A year later, their company is one of the market leaders in voice training data for AI, having just closed a $25 million Series A. The two met while working at Scale AI, where they bonded over the belief that the next big leap for AI would be moving beyond screens and into real-world interactions powered by voice. That idea became David AI, a company that collects, produces, and refines massive volumes of audio data for training voice models. So far, they've built a library of 100,000 hours of audio in over 15 languages, complete with rich metadata such as accents and dialects. YC Partner Diana Hu recently sat down with the David AI founders to talk about how they got here, their founding story, and the kind of company they are building.

Learn more about David AI at https://www.withdavid.ai. Apply to Y Combinator: https://ycombinator.com/apply

Chapters:

  00:00 - Introduction
  00:12 - What is David AI?
  00:31 - Challenges in Audio Data
  01:11 - Origin Story of David AI
  01:46 - Building the First Product
  04:12 - Early Success and Growth
  05:24 - Business Model and Approach
  07:40 - Future Plans and Hiring

Diana Hu (host) · Tomer Cohen (guest) · Ben Wiley (guest)
May 28, 2025 · 9 min · Watch on YouTube ↗

CHAPTERS

  1. David AI’s mission and context: YC S24 to $25M Series A

    Diana Hu introduces David AI founders Tomer Cohen and Ben Wiley and frames the discussion around their rapid traction after YC and a newly announced $25M Series A. The conversation sets up why audio data is emerging as critical infrastructure for the “voice era” of AI.

  2. What David AI does: an audio data research company for conversational speech

    Tomer explains that David AI specializes in collecting and developing speech datasets—specifically conversational audio. Their emphasis is on real conversations across languages, dialects, accents, and contexts, designed to support next-generation speech models.

  3. Why high-quality speech data is hard: no “Common Crawl for audio”

    Ben outlines structural problems with sourcing training-grade audio from the internet. Much online audio is mono/single-track, while cutting-edge end-to-end speech architectures require cleanly separated channels with extremely low tolerance for cross-talk or bleed.

  4. Origin story: from Scale friendships to a bet on multimodal and voice AI

    The founders met at Scale AI and wanted to build something together, drawn to multimodal AI and voice as a key interface for real-world AI. They applied to YC with an idea, got accepted, left their jobs, and moved to San Francisco to start building immediately.

  5. Customer discovery unlock: a robotics company’s audio-data bottleneck

    Early outreach to YC companies revealed a surprising signal: a humanoid robotics company needed the most help not with hardware, but with audio data for voice. That insight helped David AI commit to audio as a wedge into a much larger set of use cases.

  6. Contrarian focus: go deep on one modality instead of going horizontal

    Diana highlights that specializing in audio can look contrarian given established data players. Tomer explains their conviction: voice is foundational for non-keyboard interfaces (robots, wearables, games, avatars), and depth in a vertical can create a defensible, repeatable product.

  7. First product prototype: a weekend phone-calling app to collect clean conversations

    Ben describes their initial build: a phone-based calling app created over a weekend to get friends and family to generate early conversational datasets. That prototype validated the collection approach and evolved into a global platform supporting both scripted and unscripted conversations.

  8. From $1K pilot to six-figure and seven-figure contracts

    Tomer walks through the early sales progression: a $1,000 first contract with the robotics customer, iteration driven by what they learned from that pilot, and then a six-figure deal with a major AI lab by the end of the YC batch. Soon after, they closed seven-figure contracts and expanded to large tech customers with massive audio needs.

  9. Go-to-market flywheel: “we’re not selling”—labs evaluate usefulness of datasets

    They describe a product-led sales motion where accumulated datasets reduce friction over time. As they collect more and improve quality, it becomes easier for labs to adopt—customers can judge utility directly rather than negotiating a bespoke services engagement.

  10. Operating model: an audio “data research lab,” not a labeling services shop

    Tomer explains their differentiation: they form an internal point of view on where speech models are going, run R&D to validate data shapes, then scale winning datasets for broad adoption. This contrasts with traditional professional-services labeling, where data is collected to custom specs and owned by the customer.

  11. Why David AI matters in the ecosystem: “picks and shovels” for voice agents

    Diana connects David AI’s work to the boom in voice agent startups serving vertical industries. Tomer emphasizes the dependency chain: voice apps depend on models, and models depend on data—audio data is an underappreciated layer enabling the visible application boom.

  12. What’s next: build the research function and scale collection 10x (then 10x again)

    Looking ahead, Tomer highlights two parallel priorities: strengthening audio research to anticipate model roadmaps, and scaling data collection operations dramatically. The company sees a large opportunity and is focused on team growth to capture it.

  13. Hiring plan: researchers, engineers, and operators to execute the vision

    They close with specific hiring needs aligned to their strategy. Researchers will drive the forward-looking data roadmap, while engineers and operators will build and run the systems needed to collect and deliver datasets at scale.
