How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries

How I AI · Nov 17, 2025 · 47m

Claire Vo (host), Tim McAleer (guest)

Documentary shooting ratios and asset scale
Automated logging and database metadata
Guardrails: embedded metadata + web scraping
From scripts to a REST API service
Video understanding: frame sampling + audio transcription
Vector embeddings and semantic/reverse search
Custom app form factors: iOS capture, macOS OCR utility
AI skepticism in film and nonfiction ethics
Prompting workflow: “resume work” prompts and restarting

In this episode of How I AI, host Claire Vo talks with Tim McAleer about how AI automates documentary logging, search, and archival research workflows.

AI automates documentary logging, search, and field archival research workflows

Tim McAleer (Florentine Films/Ken Burns) describes documentary post-production as a media-management problem: hundreds of hours of footage and tens of thousands of photos that historically required tedious manual logging.

He demos how early one-off scripts evolved into a production REST API that extracts file specs and embedded metadata, scrapes the web for source truth, generates accurate descriptions, and processes video via frame sampling plus Whisper transcription.

He then layers in vector embeddings (CLIP for images + text embeddings for descriptions) to enable semantic discovery and reverse-image search within a project’s archive.

Finally, he shows two hyper-specific “vibe-coded” tools—an iOS app for field capture of photo fronts/backs with embedded EXIF metadata, and a macOS OCR cropping app for historical documents—highlighting AI’s biggest near-term value as workflow tooling, not content generation.

Key Takeaways

Documentary workflows are dominated by asset management toil.

Nonfiction productions can involve tens of thousands of stills and hundreds of hours of footage; organizing, describing, and fact-checking assets becomes a core bottleneck that AI can relieve.

Accuracy improves when AI is constrained by trusted metadata.

Instead of relying on generic vision descriptions (which can hallucinate), McAleer appends embedded metadata (e.g., EXIF/IPTC captions, dates, and credits) and scraped source pages to the prompt, so generated descriptions stay grounded in verifiable facts.
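The constraint described above can be sketched as a prompt builder that folds trusted metadata into the request instead of letting the model free-describe. This is a minimal illustration; the field names ("Headline", "Credit") are IPTC-style examples, not necessarily the fields Florentine Films uses.

```python
def build_description_prompt(metadata: dict) -> str:
    """Fold verified embedded metadata into the prompt so the model
    describes what is documented rather than guessing."""
    facts = "\n".join(f"- {k}: {v}" for k, v in metadata.items() if v)
    return (
        "Describe this archival photo for a documentary database.\n"
        "Use ONLY the verified facts below; do not invent names, "
        "dates, or locations.\n"
        f"Verified facts:\n{facts}"
    )

prompt = build_description_prompt(
    {"Headline": "Brooklyn Bridge under construction",
     "Date": "1881",
     "Credit": "Library of Congress"}
)
```

The key design point is that facts the pipeline has already verified travel into the generation step, so hallucinated specifics have nowhere to enter.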

The real leverage is turning ad-hoc scripts into shared infrastructure.

He starts with a single Python script, then scales into a REST API that teammates can call from any database tool—standardizing a multi-step pipeline (specs → copy/rename → parse metadata → scrape URL → generate description).
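The multi-step pipeline above can be sketched as composable steps that could later sit behind a REST endpoint. Every step body here is a stand-in; the real service would extract specs with media tools, parse actual EXIF/IPTC, scrape the source URL, and call a language model.

```python
# Hypothetical "auto log" pipeline: each step enriches one asset record.
def extract_specs(record):
    record["specs"] = {"width": 4000, "height": 3000}  # stand-in values
    return record

def copy_and_rename(record):
    record["filename"] = f"{record['project']}_{record['asset_id']}.jpg"
    return record

def parse_embedded_metadata(record):
    record["metadata"] = {"credit": "unknown"}  # stand-in EXIF/IPTC parse
    return record

def scrape_source(record):
    record["source_facts"] = f"scraped from {record['source_url']}"
    return record

def generate_description(record):
    record["description"] = f"Asset {record['asset_id']} ({record['source_facts']})"
    return record

PIPELINE = [extract_specs, copy_and_rename, parse_embedded_metadata,
            scrape_source, generate_description]

def auto_log(record):
    """Run one asset through every step, in order."""
    for step in PIPELINE:
        record = step(record)
    return record

logged = auto_log({"project": "demo", "asset_id": "0001",
                   "source_url": "https://example.org/photo"})
```

Structuring the steps as a list makes the "script → shared service" move cheap: the same `auto_log` function can be called from a CLI today and wrapped in an HTTP handler tomorrow.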

Video logging becomes tractable with sampling + a two-model approach.

To control cost, he samples frames at ~5-second intervals, captions those with a cheaper model, transcribes audio with Whisper, then sends the combined “video events” to a reasoning model to infer what’s happening.
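The two-model approach above can be sketched as building a merged timeline of "video events": frame captions sampled at ~5-second intervals interleaved with speech segments, which a reasoning model would then read as one document. Captioning and transcription are stubbed; real output would come from a cheap vision model and Whisper-style segment timestamps.

```python
FRAME_INTERVAL_S = 5.0  # the ~5 s sampling interval mentioned in the episode

def sample_timestamps(duration_s, interval=FRAME_INTERVAL_S):
    """Timestamps at which to grab frames from the video."""
    t, out = 0.0, []
    while t < duration_s:
        out.append(t)
        t += interval
    return out

def build_video_events(duration_s, caption_fn, transcript_segments):
    """Interleave frame captions and speech into one time-ordered log."""
    events = [{"t": t, "kind": "frame", "text": caption_fn(t)}
              for t in sample_timestamps(duration_s)]
    events += [{"t": seg["start"], "kind": "speech", "text": seg["text"]}
               for seg in transcript_segments]
    return sorted(events, key=lambda e: e["t"])

events = build_video_events(
    12.0,
    caption_fn=lambda t: f"frame at {t:.0f}s",              # cheap-model stub
    transcript_segments=[{"start": 3.0, "text": "hello"}],  # Whisper-style stub
)
```

Cost control falls out of the structure: the expensive reasoning model sees only the compact event log, never raw frames or audio.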

Embeddings unlock discovery that keyword search can’t match.

By fusing image embeddings (CLIP from thumbnails) with text embeddings (from descriptions), the archive supports semantic search and “Find Similar” reverse-image lookup—useful for editors seeking a consistent visual ‘vibe.’
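The fusion idea can be sketched with tiny hand-made vectors: normalize an image embedding (in practice from CLIP) and a text embedding (from the description), join them into one vector, and rank by cosine similarity. Fusion-by-concatenation is an assumption here, not necessarily the production scheme.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(image_vec, text_vec):
    """Join normalized image and text embeddings into one search vector."""
    return normalize(image_vec) + normalize(text_vec)

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy archive: 2-D stand-ins for real CLIP/text embeddings.
archive = {
    "bridge_photo": fuse([1.0, 0.0], [0.9, 0.1]),
    "portrait":     fuse([0.0, 1.0], [0.1, 0.9]),
}

def find_similar(query_vec, k=1):
    """'Find Similar' / semantic search: rank archive by cosine similarity."""
    ranked = sorted(archive.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

best = find_similar(fuse([0.95, 0.05], [0.8, 0.2]))
```

Because both modalities live in the same search vector, a query can match on visual similarity, on description wording, or on a blend of the two.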

Embedding metadata into the file itself reduces downstream friction.

His iOS app ‘Flip Flop’ writes transcriptions/captions into EXIF and enforces structured filenames, so context travels with the image even outside the database and eliminates “messy camera roll” problems.
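The structured-filename half of that idea can be sketched as a naming function that encodes context into the name itself, so it survives outside any database. The field order and separator are assumptions; the real app additionally writes captions/transcriptions into EXIF, which would need an EXIF library (e.g., piexif) and is omitted here.

```python
import re

def structured_name(project: str, subject: str, side: str, seq: int) -> str:
    """Build a filename that carries project, subject, front/back side,
    and sequence number — context that travels with the image."""
    assert side in ("front", "back")
    slug = lambda s: re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
    return f"{slug(project)}_{slug(subject)}_{side}_{seq:04d}.jpg"

name = structured_name("Civil War", "Gettysburg portrait", "back", 7)
```

Enforcing the convention at capture time is what prevents the "messy camera roll" problem: no file ever exists under an uninformative name.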

Small, purpose-built utilities beat generic tools for niche tasks.

‘OCR Party’ focuses on cropping and OCR-ing only the relevant region of messy historical documents (with an optional non-AI OCR mode), producing fast, searchable text for researchers while leaving final verification to fact-checking.
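The crop-then-OCR flow can be sketched as clamping a user-drawn rectangle to the page and handing only that region to whichever engine is selected. Both engine calls are stubs; a real macOS build might use Apple's Vision framework for the non-AI path and a vision model for the AI path — those are assumptions, not details from the episode.

```python
def clamp_crop(rect, page_w, page_h):
    """Keep a user-drawn crop rectangle (x, y, w, h) inside the page."""
    x, y, w, h = rect
    x, y = max(0, x), max(0, y)
    return (x, y, min(w, page_w - x), min(h, page_h - y))

def ocr_region(rect, page_w, page_h, engine="non-ai"):
    """OCR only the relevant region, via the chosen engine (stubbed)."""
    region = clamp_crop(rect, page_w, page_h)
    engines = {
        "non-ai": lambda r: f"plain OCR of {r}",  # classic OCR pass stub
        "ai":     lambda r: f"model OCR of {r}",  # vision-model pass stub
    }
    return engines[engine](region)

text = ocr_region((-10, 20, 5000, 300), page_w=2480, page_h=3508)
```

Restricting OCR to the cropped region is what makes messy historical documents tractable: margins, stains, and unrelated text never reach the recognizer, and final verification stays with human fact-checkers.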

Notable Quotes

Post-production is like a technical mess of media management.

Tim McAleer

My goal was to automate this. For years, this has been manual data entry.

Tim McAleer

We want everything going into our database to be true and verifiable information.

Tim McAleer

You’ve now freed them up to just look more, right?

Tim McAleer

No one was gonna make me this app.

Tim McAleer

Questions Answered in This Episode

In your ‘Auto Log’ pipeline, how do you validate scraped web facts (and citations) before they become “truth” in the database?

What error modes did you see most often early on—wrong locations/years, misidentified people, or something else—and how did you design guardrails to reduce them?

Why fuse CLIP image embeddings with text embeddings instead of relying on one modality—what types of queries improve most with fusion?

For video: how did you land on 5-second sampling, and what kinds of footage break with that interval (fast action, montage, B-roll, etc.)?

How do editors/researchers interact with semantic search in practice—do they type natural-language queries, use “find similar,” or browse clusters?

Transcript Preview

Claire Vo

How did you think about what problems there were to solve in AI relative to your job and the people that you work with, and why did you start where you started?

Tim McAleer

Post-production is like a technical mess of media management. You have many different file types. You have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. So it ends up being hundreds of hours of footage, tens of thousands of photos. The data management piece, when you're dealing with all that different stuff, is the mess that I have used AI to tackle. My goal was to automate this. For years, this has been manual data entry.

Claire Vo

Automate away toil. That's what we wanna do.

Tim McAleer

No one was gonna make me this app, and so the ability to make an extremely specific app that makes a workflow on my team and my company easier, it's been an unbelievable moment.

Claire Vo

[upbeat music] Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we have Tim McAleer, a producer at Ken Burns' Florentine Films, who's responsible for the technology and processes that bring these amazing films to life. Instead of focusing on how AI can create creative for these films, we're actually gonna talk about how Tim uses AI to build software products that make his post-production and research team's lives a lot better. If you're working with images, video, sound, or just a lot of data, this episode is a great one for you. Let's get to it.

This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed, and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account, and you've got a system that helps you spend smarter, move faster, and scale with confidence. One in three startups in the US already runs on Brex. You can, too, at brex.com/howiai.

Tim, welcome to How I AI. I'm excited to have you here.

Tim McAleer

Thank you for having me.

Claire Vo

What I love about what we're gonna talk about today is you work in a very interesting and creative industry, putting out amazing content, and we're gonna talk a little bit about how AI is impacting the creation side of things. But you've actually used AI to smooth out some of the challenges you've had on the production and post-production side of things. So I'm curious, how did you think about what problems there were to solve in AI relative to your job and the people that you work with, and why did you start where you started?
