David AI: Powering the Voice Era of AI

Tomer Cohen and Ben Wiley launched David AI just days before the Y Combinator deadline, submitting their application at midnight and hoping it counted. A year later, their company is one of the market leaders for voice training data in AI, having just closed a $25 million Series A.

They met while working at Scale AI, where they bonded over the belief that the next big leap for AI would be moving beyond screens, into real-world interactions powered by voice. That idea became David AI, a company that collects, produces, and refines massive volumes of audio data for training voice models. So far, they've built a library of 100,000 hours of audio in over 15 languages, complete with rich metadata like accents and dialects.

YC Partner Diana Hu recently sat down with the David AI founders to talk about how they got here, their founding story, and the kind of company they are building.

Learn more about David AI at https://www.withdavid.ai.
Apply to Y Combinator: https://ycombinator.com/apply

Chapters
00:00 - Introduction
00:12 - What is David AI?
00:31 - Challenges in Audio Data
01:11 - Origin Story of David AI
01:46 - Building the First Product
04:12 - Early Success and Growth
05:24 - Business Model and Approach
07:40 - Future Plans and Hiring

Diana Hu (host), Tomer Cohen (guest), Ben Wiley (guest)

May 28, 2025 · 9 min

WHAT IT’S REALLY ABOUT

David AI builds high-quality conversational speech datasets for next-gen voice models

David AI focuses narrowly on speech—especially multilingual, multi-accent conversational audio—because high-performing voice models depend on specialized data that is scarce online.
The founders argue audio is uniquely difficult since there is no “common crawl” equivalent and most internet audio is mono, while modern end-to-end speech architectures need clean, separated channels.
The company originated from customer discovery with YC startups, where a humanoid robotics team’s urgent need for voice data revealed a broader market opportunity.
David AI evolved from a weekend-built phone-calling prototype into a global platform collecting scripted and unscripted conversations, enabling rapid growth from $1K pilots to six- and seven-figure contracts.
Their differentiated approach is a research-driven data product model—develop internally validated datasets first, then scale and offer them broadly—rather than bespoke professional-services labeling work.

IDEAS WORTH REMEMBERING

5 ideas

Audio data has a structural supply problem compared to text.

They claim there’s no audio “common crawl,” and what exists online is often unusable for modern training needs, making first-party collection a core advantage.

Separated-at-source audio is a key technical moat for conversational datasets.

Off-the-shelf source separation wasn’t good enough because end-to-end speech models tolerate very little channel bleed, so they collect multi-speaker audio correctly at capture time.
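To make "channel bleed" concrete, here is a minimal sketch of one way to estimate how much of one speaker's channel leaks into the other. It is an illustration only, not David AI's pipeline: the function names `bleed_db` and `synth_two_speaker` are hypothetical, the signals are synthetic noise rather than speech, and the estimate assumes the leak arrives with zero delay and that the two speakers are otherwise uncorrelated. Real recordings would likely need a lag-aware or frequency-dependent measure.

```python
import math
import random


def bleed_db(source, other):
    """Estimate how strongly `source` leaks into `other`, in dB (zero-lag only)."""
    energy = sum(s * s for s in source)
    if energy == 0.0:
        raise ValueError("source channel is silent")
    # Least-squares gain g that best explains `other` as g * source.
    gain = sum(s * o for s, o in zip(source, other)) / energy
    if gain == 0.0:
        return float("-inf")
    return 20.0 * math.log10(abs(gain))


def synth_two_speaker(n, leak, seed=0):
    """Build two synthetic channels; channel B picks up `leak` times channel A."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b_clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [bc + leak * x for bc, x in zip(b_clean, a)]
    return a, b


if __name__ == "__main__":
    # A 10% amplitude leak should land near -20 dB.
    a, b = synth_two_speaker(50_000, leak=0.1)
    print(f"estimated bleed A->B: {bleed_db(a, b):.1f} dB")
```

A check like this only measures bleed after the fact. The point made in the summary is that the founders found it more reliable to avoid bleed at capture time than to remove it afterwards.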

A narrow vertical focus can beat broad data-platform assumptions.

Despite expectations that incumbents dominate “data for AI,” they argue going deep on one modality (voice) lets them solve the hardest edge cases and create repeatable products.

Customer discovery can reveal unexpectedly large markets.

A robotics customer needing voice data became the “aha” that voice is foundational across robots, wearables, games, and avatars—not just call centers.

Research-led data productization is an alternative to bespoke labeling services.

Instead of fulfilling one-off customer specs where the buyer owns the dataset, they run internal R&D to identify valuable dataset shapes, validate them, then scale and sell broadly.

WORDS WORTH SAVING

5 quotes

There’s no real, like, common crawl for audio.

— Ben Wiley

These models have very, very low tolerance for any sort of bleed between channels.

— Ben Wiley

The only way to get high quality data was to collect it separated at the source.

— Ben Wiley

We believe that the best way to build this kind of company is to pick a vertical and go really, really deep.

— Tomer Cohen

Voice AI apps are only as good as the models underneath them, and the models are only as good as the data underneath them.

— Tomer Cohen

Conversational speech data (multi-language/dialect/accent)
Lack of high-quality public audio corpora
Mono vs multi-channel separated-at-source recording
Data quality requirements and channel bleed intolerance
Vertical specialization vs horizontal data platforms
From prototype calling app to global collection platform
Research-led dataset productization and scaling

High quality AI-generated summary created from speaker-labeled transcript.
