Lex Fridman Podcast
Kempf & Kunhya on Lex Fridman: How FFmpeg runs 90% of video
FFmpeg infers the codec from content rather than from the file format or extension, and its assembly code now underlies YouTube, Netflix, and ninety percent of internet video.
At a glance
WHAT IT’S REALLY ABOUT
How FFmpeg and VLC power video, open source, and codecs worldwide
- The episode explains the end-to-end media pipeline—input I/O, container demuxing, codec decoding, filtering, and rendering—and why real-world files and streams require extreme robustness to broken or mislabeled inputs.
- It demystifies codecs vs. containers and the core ideas of modern compression (spatial/temporal prediction, transforms, quantization, entropy coding), emphasizing the human-perception goals and the huge compute cost of better compression.
- It frames FFmpeg as the ubiquitous open-source toolbox and “language” for multimedia processing that underpins most internet video workflows, while VLC is a widely deployed player and platform built to survive damaged and hostile media.
- The speakers argue that peak performance for global-scale decoding still requires massive handwritten SIMD assembly, citing the AV1 decoder dav1d (VideoLAN) as an extreme example where “every cycle matters.”
- The discussion covers open-source governance and sustainability (licensing, relicensing, forks, maintainer burnout) plus modern tensions with AI-generated security reporting, corporate expectations, and the security/telemetry stance of VLC.
IDEAS WORTH REMEMBERING
5 ideas
Playback is a multi-stage pipeline, and each stage can fail independently.
VLC/FFmpeg must fetch bytes (file/HTTP/DVD/UDP), demux container tracks, decode codecs, then render via GPU/audio devices; real-world media often violates specs, so resilient probing and error handling are core design requirements.
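To make "resilient probing" concrete, here is a minimal sketch of identifying a container from its leading bytes instead of trusting the file name. It is an illustration only, not FFmpeg's or VLC's actual probing logic, which is considerably more elaborate. The function name `sniff_container` and the returned labels are hypothetical; the byte signatures are the well-known ones for each format.

```python
def sniff_container(head: bytes) -> str:
    """Guess a container format from the first bytes of a file or stream.

    Hypothetical helper for illustration; real probers check many more
    formats and tolerate junk before the first valid header.
    """
    # Matroska/WebM files begin with the EBML header ID.
    if head.startswith(b"\x1a\x45\xdf\xa3"):
        return "matroska/webm"
    # ISO BMFF (MP4/MOV): 4-byte box size, then the box type "ftyp".
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "mp4/mov"
    # Ogg pages start with the capture pattern "OggS".
    if head.startswith(b"OggS"):
        return "ogg"
    # Flash Video starts with the ASCII signature "FLV".
    if head.startswith(b"FLV"):
        return "flv"
    # RIFF containers carry a form type at offset 8.
    if len(head) >= 12 and head[:4] == b"RIFF":
        if head[8:12] == b"AVI ":
            return "avi"
        if head[8:12] == b"WAVE":
            return "wav"
    # MPEG-TS: 188-byte packets, each starting with sync byte 0x47.
    if len(head) >= 377 and all(head[i] == 0x47 for i in (0, 188, 376)):
        return "mpegts"
    return "unknown"


# A file named "movie.avi" whose bytes are actually Matroska is still
# identified by content, not by its extension.
mislabeled = b"\x1a\x45\xdf\xa3" + b"\x00" * 16
print(sniff_container(mislabeled))
```

Checking several packets for MPEG-TS, rather than one byte, reflects the general idea that a prober should gather enough evidence before committing to a format.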
Codecs remove redundancy aggressively—often 100–1000×—by exploiting human perception.
Modern codecs shift from RGB to YUV, subsample chroma, predict spatially/temporally, and quantize in the frequency domain; the “best” loss is what humans least notice, not what preserves data perfectly.
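The first two steps named above, the RGB-to-YUV shift and chroma subsampling, can be sketched in a few lines. This assumes full-range BT.601 (JPEG-style) coefficients and 4:2:0 subsampling by averaging each 2×2 block; real codecs support other matrices, ranges, and chroma siting. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 Y'CbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def to_yuv420(rgb_rows):
    """Split an image into a full-size luma plane and two quarter-size
    chroma planes. rgb_rows is a list of rows of (r, g, b) tuples;
    width and height are assumed to be even."""
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb_rows]
    h, w = len(ycc), len(ycc[0])
    y_plane = [[px[0] for px in row] for row in ycc]

    def subsample(ch):
        # Average each 2x2 block of one chroma channel into one sample.
        return [
            [
                round(sum(ycc[y + dy][x + dx][ch]
                          for dy in (0, 1) for dx in (0, 1)) / 4)
                for x in range(0, w, 2)
            ]
            for y in range(0, h, 2)
        ]

    return y_plane, subsample(1), subsample(2)


image = [[(255, 255, 255)] * 4 for _ in range(4)]  # 4x4 white test image
y_plane, cb_plane, cr_plane = to_yuv420(image)
samples = sum(len(row) for plane in (y_plane, cb_plane, cr_plane) for row in plane)
print(samples, "samples instead of", 4 * 4 * 3)
```

Subsampling alone halves the sample count (1.5 samples per pixel instead of 3) before any prediction, transform, or quantization is applied; the much larger ratios quoted above come from those later stages.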
Better compression usually costs dramatically more CPU to encode.
Newer standards (e.g., AV1, upcoming AV2, VVC) add many tools and search options; they can cut bitrate ~30% per generation, but encoding can become an order of magnitude (or more) more expensive.
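As rough arithmetic on the "~30% per generation" figure: if each generation really saved about 30% at equal quality, the savings would compound multiplicatively. The sketch below assumes a constant 30% saving and a hypothetical 8000 kbps starting bitrate; it says nothing about the encoding cost, which the episode describes as growing much faster.

```python
def bitrate_after(generations, start_kbps, saving_per_gen=0.30):
    """Bitrate after a number of codec generations, assuming each one
    cuts the bitrate by the same fraction (an idealized assumption)."""
    return start_kbps * (1 - saving_per_gen) ** generations


for n in range(4):
    # 8000 kbps is a made-up starting point for illustration.
    print(n, round(bitrate_after(n, 8000)))
```

Under these assumptions, three generations bring the bitrate to about a third of the original (0.7³ ≈ 0.34), not to 10% as a naive "3 × 30%" reading would suggest.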
FFmpeg succeeds because it’s a universal, composable toolbox—not just a library.
Its CLI and filters let individuals and trillion-dollar companies build complex media pipelines quickly, often via long generated command lines, while libraries (libavcodec/format/filter) embed into countless apps and devices.
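A small sketch of what "generated command lines" can look like in practice: a program assembles the `ffmpeg` argument list and hands it to the CLI. The function name, file names, and default settings are hypothetical choices for illustration; the flags are standard FFmpeg options, and `libx264` is only available if the local FFmpeg build includes it.

```python
import shlex


def build_transcode_cmd(src, dst, height=720, crf=23):
    """Build an ffmpeg argument list that scales video to the given
    height and encodes H.264 video with AAC audio."""
    return [
        "ffmpeg", "-y",                   # overwrite the output if it exists
        "-i", src,                        # input file or URL
        "-vf", f"scale=-2:{height}",      # keep aspect ratio, even width
        "-c:v", "libx264", "-crf", str(crf), "-preset", "medium",
        "-c:a", "aac", "-b:a", "128k",
        dst,
    ]


cmd = build_transcode_cmd("input.mkv", "output.mp4", height=480)
print(shlex.join(cmd))  # printable form; nothing is executed here
```

Passing the list to `subprocess.run(cmd, check=True)` would actually run the transcode; building a list rather than a shell string avoids quoting problems with unusual file names.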
Handwritten SIMD assembly still wins—often by multiples, not percents.
The speakers argue compilers/autovectorization and intrinsics routinely miss optimizations achievable with manual control of registers, instructions, pipelines, and even custom calling conventions, which matters at the scale of billions of decodes.
WORDS WORTH SAVING
5 quotes
It doesn't matter. The important is, is your code good? Is your code great? Is your technology great? We care about excellent code. We don't care who you are. Like maybe you're a dog. I don't care, right? I don't care where you come from. I need to look at your code.
— Jean-Baptiste Kempf
Everything we've just said in the past couple of minutes, every sentence is someone's lifetime's work. There are books about every sentence. So the level of complexity, in many cases, is inordinate.
— Kieran Kunhya
Yes, this is true, I refused dozens of millions of dollars, yes, several time. Yes, I could be a multimillionaire and be somewhere on the beach. Um, but I did not do it because I thought it was not moral, and it was not the right thing to do.
— Jean-Baptiste Kempf
Like if we had to compromise our software, we would shut it down. This is clear.
— Jean-Baptiste Kempf
Don't regret anything. No, it's because regrets are a tax on your mind, right? So learn from your mistakes, but don't regret.
— Jean-Baptiste Kempf
High quality AI-generated summary created from speaker-labeled transcript.