Lex Fridman Podcast
Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151
At a glance
WHAT IT’S REALLY ABOUT
Human-AI Partnership Transforms Speech Into Text, Work, And Understanding
- Lex Fridman speaks with Dan Kokotov, VP of Engineering at Rev, about how Rev combines human freelancers and advanced automatic speech recognition (ASR) to deliver highly accurate captions and transcripts. They cover Rev's origins as a streamlined alternative to freelancer marketplaces like Upwork, its expansion into a developer platform with rev.ai, and the technical and data advantages behind its speech-to-text models. The conversation branches into the economics and ethics of the so-called gig economy, product design, API and platform ecosystems, and how fully searchable audio could reshape podcasts, meetings, and information access. Along the way they discuss management vs. engineering, dystopian sci-fi (Dune, Brave New World, Brazil), platform censorship, and the broader meaning of creativity and human connection.
IDEAS WORTH REMEMBERING
5 ideas
Standardizing well-defined services enables a better marketplace than generic freelance platforms.
Rev focused on tightly scoped, repeatable work like transcription and translation, hiding freelancer selection and complexity from customers and pricing by simple units (minutes, pages). This made the experience far smoother than browsing and vetting individuals on platforms like Upwork.
High-quality labeled data from real customer work is Rev’s core ASR advantage.
Because Rev’s business is literally paying humans to create accurate transcripts, it accrues massive, well-labeled speech datasets—and even detailed edit logs—forming a powerful flywheel for training and improving its speech recognition models.
Human editors plus ASR drafts dramatically boost speed and quality over either alone.
Revvers typically start from machine-generated transcripts and correct them using specialized tools; on clean audio this can bring turnaround close to 2–3x real-time, while still surpassing raw ASR quality and handling difficult accents, noise, and domain-specific language.
Democratized, remote, on-your-own-time work can be meaningful, not merely ‘gig’ labor.
Rev’s freelancers include parents at home, students, socially anxious people, and former high-powered professionals—all using flexible, location-independent work. Many actively enjoy learning from the audio they transcribe, highlighting how such platforms can expand work options when done thoughtfully.
Making all spoken content searchable could transform how we learn and reference information.
If podcasts, meetings, and lectures were universally transcribed and indexed, people could quickly jump to exact moments, embed precise clips, and perform text-like search across speech—vastly improving discovery, research, and reuse of audio content.
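The searchable-audio idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Rev or any podcast platform implements search: the `Segment` type, the `search_transcript` and `format_timestamp` helpers, and the sample timestamps are all hypothetical. It assumes a transcript is already available as timestamped text segments and shows how a plain text query maps back to the moment in the audio where the words were spoken.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # where the segment begins in the audio
    text: str             # transcribed words for this segment


def search_transcript(segments, query):
    """Return (start_seconds, text) for each segment containing the query (case-insensitive)."""
    needle = query.lower()
    return [(s.start_seconds, s.text) for s in segments if needle in s.text.lower()]


def format_timestamp(seconds):
    """Render a time offset in seconds as H:MM:SS, suitable for a jump-to link."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical sample data: the timestamps are made up for illustration.
segments = [
    Segment(12.0, "Welcome to the show."),
    Segment(95.5, "Our business model is being paid to annotate the data."),
    Segment(3725.0, "Searchable audio would change how we use podcasts."),
]

hits = search_transcript(segments, "annotate")
for start, text in hits:
    print(format_timestamp(start), text)
```

A real system would likely use word-level timestamps and a proper text index rather than a linear scan, but the mapping from matched text back to an audio offset is the core of the idea.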
WORDS WORTH SAVING
5 quotes
Our business model is being paid to annotate the data.
— Dan Kokotov
You don’t need to worry about the details of how the sausage is made.
— Dan Kokotov
There’s something about podcasts, like the audio, that makes it difficult to jump to a spot and look for that particular information.
— Lex Fridman
You have to be like a creator and have that creator’s love for your own thing.
— Dan Kokotov
If you’re not a religious person, the meaning of life is in contributing to this weird thing we call humanity—creating things and bringing up kids who are unique and different.
— Dan Kokotov
High quality AI-generated summary created from speaker-labeled transcript.