
Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151
Lex Fridman (host), Dan Kokotov (guest), Narrator
In this episode of the Lex Fridman Podcast (#151), Lex Fridman and Dan Kokotov discuss speech recognition with AI and humans.
Human-AI Partnership Transforms Speech Into Text, Work, And Understanding
Lex Fridman speaks with Dan Kokotov, VP of Engineering at Rev, about how Rev combines human freelancers and advanced automatic speech recognition (ASR) to deliver highly accurate captions and transcripts. They cover Rev’s origins as a streamlined alternative to freelancer marketplaces like Upwork, its evolution into rev.ai as a developer platform, and the technical and data advantages that drive their speech-to-text models. The conversation branches into the economics and ethics of the so‑called gig economy, product design, API and platform ecosystems, and how fully searchable audio could reshape podcasts, meetings, and information access. Along the way they explore management vs. engineering, dystopian sci‑fi (Dune, Brave New World, Brazil), platform censorship, and the broader meaning of creativity and human connection.
Key Takeaways
Standardizing well-defined services enables a better marketplace than generic freelance platforms.
Rev focused on tightly scoped, repeatable work like transcription and translation, hiding freelancer selection and complexity from customers and pricing by simple units (minutes, pages). ...
High-quality labeled data from real customer work is Rev’s core ASR advantage.
Because Rev’s business is literally paying humans to create accurate transcripts, it accrues massive, well-labeled speech datasets—and even detailed edit logs—forming a powerful flywheel for training and improving its speech recognition models.
Human editors plus ASR drafts dramatically boost speed and quality over either alone.
Revvers typically start from machine-generated transcripts and correct them using specialized tools; on clean audio this can bring turnaround close to 2–3x real-time, while still surpassing raw ASR quality and handling difficult accents, noise, and domain-specific language.
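One common way to quantify the gap between a machine draft and a human-corrected transcript is word error rate (WER): the word-level edit distance divided by the number of words in the reference. The episode summary does not say which metric Rev uses, so the sketch below is only an illustration. The function name `word_error_rate` is hypothetical, and the "draft" sentence is a made-up mis-transcription of a quote from this episode.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the draft
                curr[j - 1] + 1,     # extra word in the draft
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Human-corrected text (treated as the reference) vs. an invented ASR draft.
human = "our business model is being paid to annotate the data"
draft = "our business model is being paid to anticipate data"
wer = word_error_rate(human, draft)
print(f"WER: {wer:.2f}")
```

Here the draft has one substituted word and one dropped word against a ten-word reference, so the score is 2/10. A lower score means the editor has less to fix.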
Democratized, remote, on-your-own-time work can be meaningful, not merely ‘gig’ labor.
Rev’s freelancers include parents at home, students, socially anxious people, and former high-powered professionals—all using flexible, location-independent work. ...
Making all spoken content searchable could transform how we learn and reference information.
If podcasts, meetings, and lectures were universally transcribed and indexed, people could quickly jump to exact moments, embed precise clips, and perform text-like search across speech—vastly improving discovery, research, and reuse of audio content.
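As a rough sketch of what "jumping to exact moments" could look like, the example below searches a timestamped transcript and prints where each match occurs. It assumes the transcript is already available as segments with start times. The `Segment` type, the `search` and `format_timestamp` helpers, and the timestamps are all hypothetical and do not reflect any particular transcription API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # offset of the segment from the start of the audio
    speaker: str
    text: str


def search(segments: list[Segment], query: str) -> list[Segment]:
    """Return segments whose text contains the query, ignoring case."""
    needle = query.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Illustrative data: the quotes are from this episode, the timestamps are invented.
segments = [
    Segment(95.0, "Dan Kokotov", "Our business model is being paid to annotate the data."),
    Segment(750.0, "Dan Kokotov", "You don't need to worry about the details of how the sausage is made."),
]

hits = search(segments, "sausage")
for seg in hits:
    print(f"[{format_timestamp(seg.start_seconds)}] {seg.speaker}: {seg.text}")
```

A plain substring scan is enough for a single episode; searching across many episodes would more likely use an inverted index or a full-text search engine.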
Over-reliance on metrics can blind platforms to user experience and long-term trust.
They criticize companies that chase engagement or watch time while ignoring obvious user pain points (e. ...
Great leadership and products often come from obsessive attention to detail and user love.
Lex and Dan contrast small, visionary-led teams (e. ...
Notable Quotes
“Our business model is being paid to annotate the data.”
— Dan Kokotov
“You don’t need to worry about the details of how the sausage is made.”
— Dan Kokotov
“There’s something about podcasts, like the audio, that makes it difficult to jump to a spot and look for that particular information.”
— Lex Fridman
“You have to be like a creator and have that creator’s love for your own thing.”
— Dan Kokotov
“If you’re not a religious person, the meaning of life is in contributing to this weird thing we call humanity—creating things and bringing up kids who are unique and different.”
— Dan Kokotov
Questions Answered in This Episode
How might universal, highly accurate speech-to-text change journalism, research, and education over the next decade?
What ethical responsibilities do companies like Rev have when their labeled data and models can be used to monitor or analyze conversations at scale?
Could the Rev model of human–AI collaboration be extended to other domains beyond language, such as coding assistance or legal document review?
How should platforms balance free speech with the desire to reduce harm and ‘increase the health of conversations’ without sliding into over-censorship?
What can individual creators and engineers do to resist over-optimization on short-term metrics and instead build products people genuinely love?
Transcript Preview
The following is a conversation with Dan Kokotov, VP of engineering at rev.ai, which is, by many metrics, the best speech-to-text AI engine in the world. Rev in general is a company that does captioning and transcription of audio by humans and by AI. I've been using their services for a couple of years now and, and planning to use Rev to have both captions and transcripts to some of the previous and future episodes of this podcast to make it easier for people to read through the conversation or reference various parts of the episode, since that's something that quite a few people requested. I'll probably do a separate video on that with links on the podcast website so people can provide suggestions and improvements there. Quick mention of our sponsors. Athletic Greens all-in-one nutrition drink, Blinkist app that summarizes books, Business Wars podcast, and Cash App. So the choice is health, wisdom, or money. Choose wisely, my friends. And if you wish, click the sponsor links below to get a discount and to support this podcast. As a side note, let me say that I reached out to Dan and the Rev team for conversation because I've been using and genuinely loving their service, and really curious about how it works. I previously talked to the head of Adobe Research for the same reason. For me, there's a bunch of products, usually it's software, that comes along and just makes my life way easier. Examples are Adobe Premiere for video editing, iZotope RX for cleaning up audio, AutoHotKey on Windows for automating keyboard or mouse tasks, Emacs as an IDE for everything, including the universe itself. I can keep on going, but you get the idea. I just like talking to people who create things I'm a big fan of. That said, after doing this conversation, the folks at rev.ai offered to sponsor this podcast in the coming months. This conversation is not sponsored by the guest. 
It probably goes without saying, but I should say it anyway, that you cannot buy your way onto this podcast. I don't know why you would want to. I wanted to bring this up, uh, to make a specific point that no sponsor will ever influence what I do on this podcast, or to the best of my ability influence what I think. I wasn't really thinking about this, uh, for example, when I interviewed Jack Dorsey, who was the CEO of Square that happens to be sponsoring this podcast, but I should really make it explicit. I will never take money for bringing a guest on. Every guest on this podcast is someone I genuinely am curious to talk to or just genuinely love something they've created. As I sometimes get criticized for, I'm just a fan of people, and that's who I talk to. As I also talk about way too much, money is really never a consideration. In general, no amount of money can buy my integrity. That's true for this podcast, and that's true for anything else I do. If you enjoy this thing, subscribe on YouTube, review it on Apple Podcast, follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. And now, here's my conversation with Dan Kokotov. You mentioned science fiction on the phone, so let's go with the ridiculous first. What's the greatest sci-fi novel of all time in your view? And maybe what ideas do you, do you find philosophically fascinating about it?