Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151
CHAPTERS
- 0:00 – 3:01
Why Lex loves Rev: transcripts, captions, and sponsor independence
Lex introduces Dan Kokotov and explains why Rev’s transcription/captioning workflow matters for making podcasts searchable and referenceable. He also clarifies sponsor boundaries and why curiosity—not money—drives guest selection.
- •Rev as a human+AI captioning/transcription company
- •Lex’s plan to publish transcripts/captions for past and future episodes
- •Quick sponsor mention and explicit stance on editorial independence
- •Examples of “frictionless” tools Lex loves (Premiere, RX, automation tools)
- 3:01 – 6:33
Dune, David Lynch, and the philosophy of pressure and progress
Dan names Dune as the greatest sci‑fi novel and unpacks why its world-building and mood matter. The conversation turns philosophical: comfort can lead to stagnation, and pressure/suffering can force civilizations to evolve.
- •Dan’s favorite sci‑fi: Dune (and sequels)
- •Skepticism vs excitement about new film adaptations; praise for Lynch’s mood capture
- •God Emperor of Dune: Leto’s tyrannical “purpose” and long-term strategy
- •Idea that hardship can drive expansion, renewal, and progress
- 6:33 – 12:39
Rev’s origin story: fixing the Upwork-style marketplace experience
Lex describes the pain of hiring transcription on freelance marketplaces and why Rev felt radically simpler. Dan explains Rev’s founding motivation: remove decision friction for customers and wasted effort for freelancers by standardizing services and hiding complexity.
- •Upwork-like marketplaces create regret and overhead for both sides
- •Rev’s simplified model: upload files → receive results quickly
- •Standardization enables reliable pricing, turnaround, and quality control
- •“Don’t worry how the sausage is made” product philosophy
- 12:39 – 14:45
Why translation first, and how Rev narrowed its focus to speech services
Dan explains why Rev initially chose translation: it was measurable, standardized, and easy to validate. He describes how the company’s focus shifted toward speech-to-text and related language services, including subtitles.
- •Criteria for early verticals: standardized, computable effort, few options
- •Early translation niche: immigration/birth certificate documents
- •Evolution toward speech services (transcription, captions, subtitles)
- •Language direction constraints (e.g., English → Russian subtitles)
- 14:45 – 19:37
The human side of translation: poetry, word choice, and cross-language nuance
Lex shares ambitions around interviewing Russian speakers and building a real-time translated podcast experience. Dan draws from his father’s poetry translation work to highlight why preserving emotion and structure across English and Russian is so hard—especially for machines.
- •Lex’s interest in real-time interpreted interviews (including Putin someday)
- •Dan’s father translating poetry; Louise Glück and Nobel-related attention
- •Structural differences: inflection, gender, syntax, vocabulary mismatch
- •Why machine translation is difficult between very different language families
- 19:37 – 20:26
Scaling a two-sided marketplace: balancing customers and ‘Revvers’
Dan frames Rev as a two-sided marketplace where growth is constrained by matching demand with a reliable workforce. He describes the constant challenge: too few Revvers hurts customers, too many hurts workers—so the system must continually rebalance.
- •Rev’s core scaling problem: supply-demand balance in a marketplace
- •Definition of Revvers (Rev’s freelancer community)
- •Customer experience risks: delays, inconsistent availability
- •Worker experience risks: not enough jobs, churn, motivation loss
- 20:26 – 25:10
Rethinking the ‘gig economy’: autonomy, remote work, and Revver demographics
Lex asks about distributed labor platforms like Mechanical Turk and what’s interesting about the gig economy. Dan prefers a framing centered on autonomy and work-from-home flexibility, then outlines who Revvers are and why they choose this work.
- •Dan dislikes the “gig” label—argues it diminishes skilled work
- •Core promise: flexible work on your own time, without geography limits
- •Revver demographics: mostly US-based due to English transcription needs
- •Wide range of backgrounds: parents, students, professionals, people avoiding offices
- 25:10 – 28:06
Inside the transcription workflow: the editor, ASR drafts, and real productivity
Dan explains what Revvers actually do day-to-day: choose jobs, open a specialized editor, start from an ASR draft (when audio quality allows), and correct errors. They discuss how hard real-time transcription is and what realistic speed looks like.
- •Revver interface: job selection + subject/length preferences
- •Custom transcription editor designed for speed and accuracy
- •Human workflow often begins with ASR draft, sometimes from scratch
- •Typical pace: transcribing takes ~2–3x the audio duration for good-quality audio
- 28:06 – 34:59
Automatic Speech Recognition (ASR): word error rate, domains, and beating Big Tech
Dan defines ASR and explains how Rev benchmarks performance using word error rate (WER). He contrasts open-ended conversational audio with command-based assistant domains, and discusses competing with Google/Amazon/Microsoft on Rev’s target audio mix.
- •ASR = audio-to-words machine learning problem
- •WER explained: insertions, deletions, substitutions
- •Rev’s cited WER (~14%) vs strong human performance (~2–3%)
- •Domain differences: voice assistant commands vs unstructured conversation
- •Competitive benchmarking against major cloud providers
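As a rough illustration of the WER metric mentioned above, here is a minimal sketch that counts the fewest word-level insertions, deletions, and substitutions needed to turn a reference transcript into an ASR hypothesis, then divides by the number of reference words. The function name and the simple lowercase/whitespace tokenization are assumptions made for this example; they are not Rev's benchmarking method, and real evaluations usually normalize punctuation and numbers first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration only. Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps"
    hypothesis = "the quick brown box jumps over"
    # One substitution (fox -> box) plus one insertion (over), over 5 reference words.
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%.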
- 34:59 – 38:59
Rev’s data flywheel and the rev.ai platform vision
Lex notes the “magical” business loop: Rev gets paid to generate labeled data that improves its models. Dan outlines how rev.ai productizes the engine for developers (like AWS building blocks), and hints at richer training signals from Revver edit behavior.
- •Data advantage: aligned incentives to collect high-quality labeled speech data
- •Potential future signal: using the sequence and timing of human edits
- •Product evolution: human-only → consumer ASR editor → ASR-assisted humans → rev.ai API
- •Platform mindset: enable others to build novel apps on accurate transcripts
- 38:59 – 47:19
Building products people love: frictionless UX, documentation, and creator-led leadership
A rant-turned-lesson: Lex and Dan compare great products to painful ones (Mechanical Turk, some APIs) and discuss why big companies stop listening. They connect craftsmanship to leadership styles (Apple/Steve Jobs, Tesla/Elon Musk) and the danger of optimizing only metrics.
- •Frictionless UX as a competitive moat (one-click, simple workflows)
- •Why some platforms stagnate: metrics-first thinking and organizational distance
- •Importance of small teams and obsessive review culture (Apple examples)
- •Leadership that stays close to details and users’ pain points
- 47:19 – 57:37
The future of podcasts: searchable audio, transcripts as infrastructure, and Spotify exclusivity
Lex argues transcripts would transform podcast discovery and referencing the way text transforms notes. They discuss standards (RSS transcript extensions), ecosystem cooperation, and the trade-offs of platform exclusivity highlighted by Spotify’s Joe Rogan deal.
- •Podcasts are hard to search, quote, and revisit without transcripts
- •Transcripts enable indexing, clipping, embedding, and better journalism
- •RSS as an open ecosystem; transcript specs (e.g., Buzzsprout proposal)
- •Spotify exclusivity trade-off: investment/support vs loss of openness
- 57:37 – 1:04:06
Content moderation and free speech: outrage incentives vs healthy conversation
The discussion shifts to the platform responsibility problem: how to handle misinformation and extremist content without becoming an untrusted arbiter. They debate censorship, the engagement incentives of outrage, and the idea of optimizing for long-term user well-being.
- •Difficulty of drawing lines (“no Nazis” quickly becomes shades of gray)
- •Skepticism that removing content alone restores ‘sanity’
- •Outrage drives engagement; business metrics can conflict with social health
- •Alternative goal: long-term happiness/health of users vs short-term clicks
- 1:04:06 – 1:08:47
From engineer to manager: losing the coding dopamine and learning human motivation
Dan describes why he fell in love with programming (creation) and what changes when leading teams instead of writing code. They discuss the challenge of understanding individual motivations and adapting feedback styles to different personalities.
- •Creation as the core joy of programming
- •Management work is diffuse—impact is indirect and harder to measure
- •Different people require different feedback; “manage by exception”
- •Recommended management book: First, Break All the Rules
- 1:08:47 – 1:13:50
Book and film recommendations: Brave New World, Brazil, and the banality of bureaucracy
Dan recommends Brave New World and discusses its prescience about social stratification and sorting. They move into dystopian films—especially Brazil—and reflect on how incompetence and bureaucracy can create a uniquely realistic form of evil.
- •Brave New World vs 1984: different dystopian mechanisms
- •Genetic/socioeconomic sorting and modern echoes (marriage, social media)
- •Brazil as dystopia of ‘authoritarian incompetence’
- •Bureaucracy as a slow, banal path to catastrophic outcomes (Chernobyl as example)
- 1:13:50 – 1:18:26
Stalin, Hitler, and why some historical evils feel more ‘possible’ today
Lex and Dan debate portrayals of dictators in film, including The Death of Stalin and Tarantino’s stylization. Lex argues Stalin’s bureaucratic machinery is a more instructive warning for the modern world than the caricature of Hitler, then connects to contemporary concerns about state overreach.
- •The Death of Stalin: comedy that may be uncomfortably close to reality
- •Critique of stylized WWII portrayals; desire for more serious Stalin narratives
- •Stalin’s system-level cruelty as a recurring pattern societies can repeat
- •Parallels to modern authoritarian risks and information control
- 1:18:26 – 1:25:02
Interviewing Putin and the role of extreme empathy in hard conversations
Lex explains his approach to interviewing controversial figures: absorb context first, write questions late, and attempt genuine empathy. Dan compares it to an ‘extreme empath’ detective mindset—understanding how people justify harmful actions without excusing them.
- •Lex’s preparation strategy: learn the person deeply before drafting questions
- •Putin framed through “mob morality” and human psychology
- •Empathy as a tool for clarity, not endorsement
- •Podcasts as a medium for demystifying people through long-form context
- 1:25:02 – 1:28:46
Meaning of life: creation, family, and contributing to humanity’s ongoing organism
Lex asks the “zoomed-out” question about purpose. Dan answers that meaning comes from contributing—through creating things, raising kids, and participating in humanity’s evolving story—especially from a non-religious perspective.
- •Meaning as contribution to humanity’s larger evolving system
- •Joy in creating (software, ideas) and in raising children
- •Creation impulse as a defining human trait (religious and secular framing)
- •Life’s purpose grounded in living, making, and building