Lex Fridman Podcast

Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151

Dan Kokotov is VP of Engineering at Rev.ai, an automatic speech recognition company. Please support this podcast by checking out our sponsors:
- Athletic Greens: https://athleticgreens.com/lex and use code LEX to get 1 month of fish oil
- Blinkist: https://blinkist.com/lex and use code LEX to get 25% off premium
- Business Wars: https://wondery.com/business-wars/
- Cash App: https://cash.app/ and use code LexPodcast to get $10

EPISODE LINKS:
Rev: https://www.rev.com
Rev.ai: https://rev.ai

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
3:23 - Dune
6:39 - Rev
12:39 - Translation
19:28 - Gig economy
28:08 - Automatic speech recognition
38:58 - Create products that people love
47:08 - The future of podcasts at Spotify
1:08:46 - Book recommendations
1:10:08 - Stories of our dystopian future
1:13:50 - Movies about Stalin and Hitler
1:19:05 - Interviewing Putin
1:25:02 - Meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host), Dan Kokotov (guest)
Jan 4, 2021, 1h 28m

At a glance

WHAT IT’S REALLY ABOUT

Human-AI Partnership Transforms Speech Into Text, Work, And Understanding

Lex Fridman speaks with Dan Kokotov, VP of Engineering at Rev, about how Rev combines human freelancers and automatic speech recognition (ASR) to deliver highly accurate captions and transcripts. They cover Rev’s origins as a streamlined alternative to freelancer marketplaces like Upwork, its expansion into Rev.ai as a developer platform, and the technical and data advantages that drive its speech-to-text models. The conversation branches into the economics and ethics of the so-called gig economy, product design, API and platform ecosystems, and how fully searchable audio could reshape podcasts, meetings, and information access. Along the way they explore management vs. engineering, dystopian sci-fi (Dune, Brave New World, Brazil), platform censorship, and the broader meaning of creativity and human connection.

IDEAS WORTH REMEMBERING

5 ideas

Standardizing well-defined services enables a better marketplace than generic freelance platforms.

Rev focused on tightly scoped, repeatable work like transcription and translation, hiding freelancer selection and complexity from customers and pricing by simple units (minutes, pages). This made the experience far smoother than browsing and vetting individuals on platforms like Upwork.

High-quality labeled data from real customer work is Rev’s core ASR advantage.

Because Rev’s business is literally paying humans to create accurate transcripts, it accrues massive, well-labeled speech datasets—and even detailed edit logs—forming a powerful flywheel for training and improving its speech recognition models.

Human editors plus ASR drafts dramatically boost speed and quality over either alone.

Revvers typically start from machine-generated transcripts and correct them using specialized tools; on clean audio this can bring turnaround close to 2–3x real-time, while still surpassing raw ASR quality and handling difficult accents, noise, and domain-specific language.

Democratized, remote, on-your-own-time work can be meaningful, not merely ‘gig’ labor.

Rev’s freelancers include parents at home, students, socially anxious people, and former high-powered professionals—all using flexible, location-independent work. Many actively enjoy learning from the audio they transcribe, highlighting how such platforms can expand work options when done thoughtfully.

Making all spoken content searchable could transform how we learn and reference information.

If podcasts, meetings, and lectures were universally transcribed and indexed, people could quickly jump to exact moments, embed precise clips, and perform text-like search across speech—vastly improving discovery, research, and reuse of audio content.
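
As a rough illustration of the searchable-audio idea, here is a minimal Python sketch that runs a text search over timestamped transcript segments and returns the moments to jump to. The `Segment` shape, the function names, and the placeholder sample data are all assumptions made for this example; nothing here reflects Rev's actual data format or the Rev.ai API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment (hypothetical shape, not Rev's format)."""
    start: float   # start time in seconds from the beginning of the audio
    speaker: str
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once the time passes one hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def search_transcript(segments: list[Segment], query: str) -> list[tuple[str, str, str]]:
    """Case-insensitive substring search; returns (timestamp, speaker, text)."""
    needle = query.casefold()
    return [
        (format_timestamp(seg.start), seg.speaker, seg.text)
        for seg in segments
        if needle in seg.text.casefold()
    ]


# Placeholder segments for illustration only; these are not real quotes.
SAMPLE_SEGMENTS = [
    Segment(203.0, "Speaker A", "Placeholder text that mentions Dune."),
    Segment(1688.0, "Speaker B", "Placeholder text about speech recognition."),
    Segment(4126.0, "Speaker A", "Placeholder text about book recommendations."),
]

for timestamp, speaker, text in search_transcript(SAMPLE_SEGMENTS, "speech"):
    print(f"[{timestamp}] {speaker}: {text}")
```

A real system would likely use word-level timestamps and a proper search index rather than a linear substring scan, but the principle is the same: once speech is text with timings attached, finding a moment becomes an ordinary text query.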

WORDS WORTH SAVING

5 quotes

Our business model is being paid to annotate the data.

Dan Kokotov

You don’t need to worry about the details of how the sausage is made.

Dan Kokotov

There’s something about podcasts, like the audio, that makes it difficult to jump to a spot and look for that particular information.

Lex Fridman

You have to be like a creator and have that creator’s love for your own thing.

Dan Kokotov

If you’re not a religious person, the meaning of life is in contributing to this weird thing we call humanity—creating things and bringing up kids who are unique and different.

Dan Kokotov

Origins and business model of Rev (rev.com) and rev.ai
Human–AI collaboration in transcription and captioning workflows
Technical landscape and challenges of automatic speech recognition (ASR)
The ‘gig economy’, remote work, and life as a Rev freelancer (Revver)
Product design, APIs, and frustrations with major tech platforms
Podcasts, searchability of speech, and the future of audio content
Leadership, management vs. programming, and broader philosophical themes

High-quality AI-generated summary created from a speaker-labeled transcript.
