How I AI: How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries
At a glance
WHAT IT’S REALLY ABOUT
AI automates documentary logging, search, and field archival research workflows
- Tim McAleer (Florentine Films/Ken Burns) describes documentary post-production as a media-management problem: hundreds of hours of footage and tens of thousands of photos that historically required tedious manual logging.
- He demos how early one-off scripts evolved into a production REST API that extracts file specs and embedded metadata, scrapes the web for source truth, generates accurate descriptions, and processes video via frame sampling plus Whisper transcription.
- He then layers in vector embeddings (CLIP for images + text embeddings for descriptions) to enable semantic discovery and reverse-image search within a project’s archive.
- Finally, he shows two hyper-specific “vibe-coded” tools—an iOS app for field capture of photo fronts/backs with embedded EXIF metadata, and a macOS OCR cropping app for historical documents—highlighting AI’s biggest near-term value as workflow tooling, not content generation.
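The field-capture app pairs each photograph's front with its back so archival notes stay attached to the image. The talk doesn't specify how the app links the two sides; a minimal sketch, assuming a hypothetical filename convention (`<item>_front.jpg` / `<item>_back.jpg`), could look like this:

```python
from pathlib import Path

def pair_fronts_and_backs(filenames):
    """Group captured photo files into front/back pairs.

    Assumes a hypothetical naming convention like 'box3_017_front.jpg'
    and 'box3_017_back.jpg' -- the app's actual scheme (it embeds
    linkage in EXIF metadata) is not shown in the talk.
    """
    pairs = {}
    for name in filenames:
        stem = Path(name).stem               # e.g. 'box3_017_front'
        base, _, side = stem.rpartition("_")  # split off the side suffix
        if side in ("front", "back"):
            pairs.setdefault(base, {})[side] = name
    return pairs

photos = ["box3_017_front.jpg", "box3_017_back.jpg", "box3_018_front.jpg"]
print(pair_fronts_and_backs(photos))
```

In the real app the pairing travels inside EXIF fields rather than filenames, which survives renaming downstream.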
IDEAS WORTH REMEMBERING
5 ideas

Documentary workflows are dominated by asset management toil.
Nonfiction productions can involve tens of thousands of stills and hundreds of hours of footage; organizing, describing, and fact-checking assets becomes a core bottleneck that AI can relieve.
Accuracy improves when AI is constrained by trusted metadata.
Instead of relying on generic vision descriptions (which can hallucinate), McAleer appends embedded metadata (e.g., Library of Congress fields) and eventually web-scraped source info so the model anchors outputs to verifiable facts.
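The anchoring idea above is just prompt construction: paste the trusted fields into the request and forbid the model from guessing beyond them. A minimal sketch (field names are illustrative, not the actual Library of Congress schema):

```python
def build_description_prompt(trusted_metadata: dict) -> str:
    """Build a vision-model prompt pinned to verified metadata.

    The instruction wording and field names are illustrative; the point
    is that names, dates, and places come from trusted sources, not
    from the model's guesses.
    """
    facts = "\n".join(f"- {k}: {v}" for k, v in trusted_metadata.items())
    return (
        "Describe this archival photograph for a research database.\n"
        "Use ONLY the verified facts below for names, dates, and places; "
        "do not invent details that are not visible or listed.\n\n"
        f"Verified metadata:\n{facts}"
    )

prompt = build_description_prompt({
    "Title": "Crowd at Union Station",
    "Date": "1943",
    "Source": "Library of Congress",
})
print(prompt)
```

The same prompt skeleton works whether the facts come from embedded EXIF/IPTC fields or from a scraped source page.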
The real leverage is turning ad-hoc scripts into shared infrastructure.
He starts with a single Python script, then scales into a REST API that teammates can call from any database tool—standardizing a multi-step pipeline (specs → copy/rename → parse metadata → scrape URL → generate description).
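The multi-step pipeline behind that API can be sketched as a chain of small functions, one per stage. Every stage below is a stub standing in for the real implementation; the function names, fields, and placeholder values are assumptions, not the production API's:

```python
# Each stage takes an asset record, enriches it, and passes it on.
# All bodies are stubs; the real versions read files, rename on disk,
# parse embedded metadata, and fetch the source web page.

def extract_specs(asset):
    asset["specs"] = {"width": 4000, "height": 3000}  # placeholder values
    return asset

def copy_and_rename(asset):
    asset["archive_name"] = f"PROJ_{asset['id']:05d}.tif"  # hypothetical scheme
    return asset

def parse_embedded_metadata(asset):
    asset["metadata"] = {"source": asset.get("source_url", "unknown")}
    return asset

def scrape_source_page(asset):
    asset["source_facts"] = {"title": "(scraped title)"}  # stub for an HTTP fetch
    return asset

def generate_description(asset):
    asset["description"] = f"{asset['source_facts']['title']} ({asset['archive_name']})"
    return asset

PIPELINE = [extract_specs, copy_and_rename, parse_embedded_metadata,
            scrape_source_page, generate_description]

def process(asset):
    for step in PIPELINE:
        asset = step(asset)
    return asset

result = process({"id": 17, "source_url": "https://example.org/item/17"})
```

Wrapping `process` behind a REST endpoint is what lets teammates trigger the whole chain from any database tool instead of running a personal script.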
Video logging becomes tractable with sampling + a two-model approach.
To control cost, he samples frames at ~5-second intervals, captions those with a cheaper model, transcribes audio with Whisper, then sends the combined “video events” to a reasoning model to infer what’s happening.
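The sampling-plus-merge step above can be sketched in a few lines: pick frame timestamps at a fixed interval, then interleave frame captions with transcript segments into one time-ordered event stream for the reasoning model. The input shapes are assumptions, not the speaker's exact format:

```python
def sample_timestamps(duration_s: float, interval_s: float = 5.0):
    """Timestamps (in seconds) at which to grab frames for captioning."""
    t, out = 0.0, []
    while t < duration_s:
        out.append(t)
        t += interval_s
    return out

def build_video_events(frame_captions, transcript_segments):
    """Merge per-frame captions and Whisper-style transcript segments
    into one time-ordered event list for a reasoning model.

    frame_captions: list of (timestamp, caption) pairs
    transcript_segments: dicts with 'start' and 'text' keys (assumed shape)
    """
    events = [{"t": t, "type": "frame", "text": c} for t, c in frame_captions]
    events += [{"t": seg["start"], "type": "speech", "text": seg["text"]}
               for seg in transcript_segments]
    return sorted(events, key=lambda e: e["t"])

events = build_video_events(
    [(0.0, "wide shot of a river"), (5.0, "close-up of a steamboat")],
    [{"start": 2.0, "text": "The river defined the town."}],
)
```

Captioning only one frame every five seconds with a cheap model, and reserving the expensive reasoning model for the merged event list, is what keeps per-hour logging costs manageable.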
Embeddings unlock discovery that keyword search can’t match.
By fusing image embeddings (CLIP from thumbnails) with text embeddings (from descriptions), the archive supports semantic search and “Find Similar” reverse-image lookup—useful for editors seeking a consistent visual ‘vibe.’
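One simple way to fuse the two modalities is to concatenate L2-normalized image and text vectors and rank by cosine similarity; the talk doesn't specify the exact fusion scheme, so treat this as an illustrative sketch:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def fuse(image_emb, text_emb):
    """Concatenate L2-normalized image and text embeddings so both
    modalities contribute comparably to similarity (one simple fusion
    choice, assumed for illustration)."""
    return normalize(image_emb) + normalize(text_emb)

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_similar(query, archive, top_k=3):
    """'Find Similar': rank archive assets by cosine similarity of fused vectors."""
    ranked = sorted(archive.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

A text query only needs a text embedding on one side to search the same index, which is how semantic search and reverse-image "Find Similar" can share one archive.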
WORDS WORTH SAVING
5 quotes

Post-production is like a technical mess of media management.
— Tim McAleer
My goal was to automate this. For years, this has been manual data entry.
— Tim McAleer
We want everything going into our database to be true and verifiable information.
— Tim McAleer
You’ve now freed them up to just look more, right?
— Tim McAleer
No one was gonna make me this app.
— Tim McAleer