Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151

Lex Fridman Podcast | Jan 4, 2021 | 1h 28m

Lex Fridman (host), Dan Kokotov (guest), Narrator

Origins and business model of Rev (rev.com) and rev.ai

Human–AI collaboration in transcription and captioning workflows

Technical landscape and challenges of automatic speech recognition (ASR)

The ‘gig economy’, remote work, and life as a Rev freelancer (Revver)

Product design, APIs, and frustrations with major tech platforms

Podcasts, searchability of speech, and the future of audio content

Leadership, management vs. programming, and broader philosophical themes

Human-AI Partnership Transforms Speech Into Text, Work, And Understanding

Lex Fridman speaks with Dan Kokotov, VP of Engineering at Rev, about how Rev combines human freelancers and advanced automatic speech recognition (ASR) to deliver highly accurate captions and transcripts. They cover Rev’s origins as a streamlined alternative to freelancer marketplaces like Upwork, its evolution into rev.ai as a developer platform, and the technical and data advantages that drive their speech-to-text models. The conversation branches into the economics and ethics of the so‑called gig economy, product design, API and platform ecosystems, and how fully searchable audio could reshape podcasts, meetings, and information access. Along the way they explore management vs. engineering, dystopian sci‑fi (Dune, Brave New World, Brazil), platform censorship, and the broader meaning of creativity and human connection.

Key Takeaways

Standardizing well-defined services enables a better marketplace than generic freelance platforms.

Rev focused on tightly scoped, repeatable work like transcription and translation, hiding freelancer selection and complexity from customers and pricing by simple units (minutes, pages). ...

High-quality labeled data from real customer work is Rev’s core ASR advantage.

Because Rev’s business is literally paying humans to create accurate transcripts, it accrues massive, well-labeled speech datasets—and even detailed edit logs—forming a powerful flywheel for training and improving its speech recognition models.

Human editors plus ASR drafts dramatically boost speed and quality over either alone.

Revvers typically start from machine-generated transcripts and correct them using specialized tools; on clean audio this can bring turnaround close to 2–3x real-time, while still surpassing raw ASR quality and handling difficult accents, noise, and domain-specific language.

Democratized, remote, on-your-own-time work can be meaningful, not merely ‘gig’ labor.

Rev’s freelancers include parents at home, students, socially anxious people, and former high-powered professionals—all using flexible, location-independent work. ...

Making all spoken content searchable could transform how we learn and reference information.

If podcasts, meetings, and lectures were universally transcribed and indexed, people could quickly jump to exact moments, embed precise clips, and perform text-like search across speech—vastly improving discovery, research, and reuse of audio content.

Over-reliance on metrics can blind platforms to user experience and long-term trust.

They criticize companies that chase engagement or watch time while ignoring obvious user pain points (e. ...

Great leadership and products often come from obsessive attention to detail and user love.

Lex and Dan contrast small, visionary-led teams (e. ...
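The searchable-audio takeaway above (jumping to exact moments once speech is transcribed and indexed) can be illustrated with a minimal sketch. It assumes a transcript is already available as timestamped segments. The `Segment` type, the `search` and `fmt` helpers, and the timestamps in the sample data are all hypothetical and chosen for illustration; they are not Rev's data format or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the start of the audio (hypothetical values below)
    speaker: str
    text: str


def fmt(seconds: float) -> str:
    """Format a second offset as H:MM:SS for display or deep-linking."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"


def search(segments: list[Segment], query: str) -> list[tuple[str, str, str]]:
    """Case-insensitive substring search over transcript segments.

    Returns (timestamp, speaker, text) for every matching segment, in order.
    """
    needle = query.lower()
    return [
        (fmt(seg.start), seg.speaker, seg.text)
        for seg in segments
        if needle in seg.text.lower()
    ]


# Sample data: the quotes come from this page, the timestamps are invented.
segments = [
    Segment(0.0, "Lex Fridman", "The following is a conversation with Dan Kokotov."),
    Segment(754.0, "Dan Kokotov", "Our business model is being paid to annotate the data."),
    Segment(3725.5, "Dan Kokotov",
            "You don't need to worry about the details of how the sausage is made."),
]

for timestamp, speaker, text in search(segments, "annotate"):
    print(f"[{timestamp}] {speaker}: {text}")
```

A plain substring scan is enough to show the idea; a real system would more likely use an inverted index or a full-text search engine so that queries stay fast across many hours of audio.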

Notable Quotes

Our business model is being paid to annotate the data.

Dan Kokotov

You don’t need to worry about the details of how the sausage is made.

Dan Kokotov

There’s something about podcasts, like the audio, that makes it difficult to jump to a spot and look for that particular information.

Lex Fridman

You have to be like a creator and have that creator’s love for your own thing.

Dan Kokotov

If you’re not a religious person, the meaning of life is in contributing to this weird thing we call humanity—creating things and bringing up kids who are unique and different.

Dan Kokotov

Questions Answered in This Episode

How might universal, highly accurate speech-to-text change journalism, research, and education over the next decade?

What ethical responsibilities do companies like Rev have when their labeled data and models can be used to monitor or analyze conversations at scale?

Could the Rev model of human–AI collaboration be extended to other domains beyond language, such as coding assistance or legal document review?

How should platforms balance free speech with the desire to reduce harm and ‘increase the health of conversations’ without sliding into over-censorship?

What can individual creators and engineers do to resist over-optimization on short-term metrics and instead build products people genuinely love?

Transcript Preview

Lex Fridman

The following is a conversation with Dan Kokotov, VP of engineering at rev.ai, which is, by many metrics, the best speech-to-text AI engine in the world. Rev in general is a company that does captioning and transcription of audio by humans and by AI. I've been using their services for a couple of years now and, and planning to use Rev to have both captions and transcripts to some of the previous and future episodes of this podcast to make it easier for people to read through the conversation or reference various parts of the episode, since that's something that quite a few people requested. I'll probably do a separate video on that with links on the podcast website so people can provide suggestions and improvements there. Quick mention of our sponsors. Athletic Greens all-in-one nutrition drink, Blinkist app that summarizes books, Business Wars podcast, and Cash App. So the choice is health, wisdom, or money. Choose wisely, my friends. And if you wish, click the sponsor links below to get a discount and to support this podcast. As a side note, let me say that I reached out to Dan and the Rev team for conversation because I've been using and genuinely loving their service, and really curious about how it works. I previously talked to the head of Adobe Research for the same reason. For me, there's a bunch of products, usually it's software, that comes along and just makes my life way easier. Examples are Adobe Premiere for video editing, iZotope RX for cleaning up audio, AutoHotKey on Windows for automating keyboard or mouse tasks, Emacs as an IDE for everything, including the universe itself. I can keep on going, but you get the idea. I just like talking to people who create things I'm a big fan of. That said, after doing this conversation, the folks at rev.ai offered to sponsor this podcast in the coming months. This conversation is not sponsored by the guest. 
It probably goes without saying, but I should say it anyway, that you cannot buy your way onto this podcast. I don't know why you would want to. I wanted to bring this up, uh, to make a specific point that no sponsor will ever influence what I do on this podcast, or to the best of my ability influence what I think. I wasn't really thinking about this, uh, for example, when I interviewed Jack Dorsey, who was the CEO of Square that happens to be sponsoring this podcast, but I should really make it explicit. I will never take money for bringing a guest on. Every guest on this podcast is someone I genuinely am curious to talk to or just genuinely love something they've created. As I sometimes get criticized for, I'm just a fan of people, and that's who I talk to. As I also talk about way too much, money is really never a consideration. In general, no amount of money can buy my integrity. That's true for this podcast, and that's true for anything else I do. If you enjoy this thing, subscribe on YouTube, review it on Apple Podcast, follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. And now, here's my conversation with Dan Kokotov. You mentioned science fiction on the phone, so let's go with the ridiculous first. What's the greatest sci-fi novel of all time in your view? And maybe what ideas do you, do you find philosophically fascinating about it?
