How I AI
How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries
CHAPTERS
Tim McAleer’s mission: automate documentary “toil” in post-production
Tim explains that nonfiction post-production is dominated by messy media management—images, archival footage, interviews, transcripts—and that AI is most immediately valuable as a tooling layer rather than a content generator. The goal: eliminate manual logging and data entry so teams can spend time on real research and storytelling.
Why documentaries create massive asset libraries (shooting ratios at scale)
Claire and Tim quantify the scale of documentary inputs versus final runtime using the industry concept of a “shooting ratio”: the amount of material captured relative to the finished length, so a 100:1 ratio means 100 hours of footage for every hour on screen. Tim shares concrete numbers from the Muhammad Ali series to illustrate how quickly assets balloon into tens of thousands of items.
From manual databases to an AI-assisted logging pipeline
Tim shows the kind of production database used on films (stills, footage, music, descriptions, tags, provenance). Historically, researchers typed all of this in by hand; Tim’s objective is to have AI populate these fields automatically while preserving factual accuracy.
Early experiments: one-off scripts + AI image description (and the hallucination problem)
Tim recreates the initial workflow: sending a single JPEG to an AI vision model to get a description. The output is useful but too generic for editorial needs, and it can hallucinate specifics like location and year.
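A minimal sketch of that first experiment, assuming the OpenAI Python SDK; the model name, file path, and prompt wording are illustrative, not Tim’s actual script:

```python
# One-off experiment: describe a single JPEG with a vision model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("archival_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this archival photograph for a research database."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

With no other context, the model has nothing to anchor on, which is exactly where the generic descriptions and invented locations and years come from.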
Adding guardrails: extract embedded metadata and enrich prompts for accuracy
To reduce guessing, Tim updates the script to scrape file metadata (e.g., Library of Congress fields) and append it to the prompt. The model then relies on trusted facts (location, year, photographer) rather than inventing details.
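One way to sketch that guardrail, assuming EXIF-style metadata readable with Pillow (real Library of Congress records may live in IPTC/XMP fields instead, so the field source is an assumption):

```python
# Guardrail step: extract embedded metadata and fold it into the prompt so
# the model anchors on verified facts instead of guessing.
from PIL import Image
from PIL.ExifTags import TAGS

def extract_metadata(path: str) -> dict:
    exif = Image.open(path).getexif()
    return {TAGS.get(tag_id, tag_id): str(value) for tag_id, value in exif.items()}

def build_prompt(metadata: dict) -> str:
    known_facts = "\n".join(f"- {key}: {value}" for key, value in metadata.items())
    return (
        "Describe this photograph for a research database.\n"
        "Use ONLY the following verified metadata for specifics such as "
        "location, year, and photographer; do not invent details:\n"
        + known_facts
    )
```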
Scaling up: from a single script to a full REST API for the whole team
Tim explains how the idea evolved into a server-based REST API that the whole team can send any asset through for a series of processing jobs. The API orchestrates steps like copying to a server, renaming, parsing metadata, scraping sources, and generating final descriptions.
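A hedged sketch of that shape using FastAPI; the framework, endpoint path, step names, and stubbed helpers are all assumptions, and the point is only the ordered pipeline behind one team-facing endpoint:

```python
# One endpoint kicks off an ordered pipeline of processing jobs for an asset.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AssetJob(BaseModel):
    source_path: str

# Each step would do real work (file copy, renaming, metadata parsing,
# source scraping, AI description); stubbed here to show the orchestration.
def copy_to_server(state: dict) -> dict: ...
def rename_with_convention(state: dict) -> dict: ...
def parse_embedded_metadata(state: dict) -> dict: ...
def scrape_source_record(state: dict) -> dict: ...
def generate_description(state: dict) -> dict: ...

PIPELINE = [copy_to_server, rename_with_convention, parse_embedded_metadata,
            scrape_source_record, generate_description]

@app.post("/assets/process")
def process_asset(job: AssetJob):
    state = {"source_path": job.source_path}
    for step in PIPELINE:
        state = step(state) or state  # each step enriches the shared state
    return state
```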
Video understanding: frame sampling + Whisper transcription + a reasoning pass
For video, Tim extracts frames at intervals (e.g., every 5 seconds to control cost) and captions them with a cheaper model, then combines frame captions with time-aligned audio transcripts. A larger reasoning model ingests the packaged “video events” to produce a coherent description of what happens.
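A sketch of the plumbing under stated assumptions (ffmpeg on the PATH, the open-source openai-whisper package, and invented helper names); captioning each frame would reuse the vision call shown earlier:

```python
# Sample frames at a fixed interval, transcribe the audio, and merge both
# into one chronological stream of "video events" for a reasoning model.
import subprocess
import whisper  # pip install openai-whisper

FRAME_INTERVAL = 5  # seconds; a larger interval means fewer frames and lower cost

def extract_frames(video_path: str, out_dir: str) -> None:
    subprocess.run(
        ["ffmpeg", "-i", video_path,
         "-vf", f"fps=1/{FRAME_INTERVAL}",  # one frame every 5 seconds
         f"{out_dir}/frame_%04d.jpg"],
        check=True,
    )

def transcribe(video_path: str) -> list[dict]:
    model = whisper.load_model("base")
    return model.transcribe(video_path)["segments"]  # segments carry timestamps

def build_events(frame_captions: list[tuple[float, str]],
                 segments: list[dict]) -> list[dict]:
    # Interleave frame captions and speech segments by timestamp so the
    # reasoning model sees one coherent timeline.
    events = [{"t": t, "type": "frame", "text": c} for t, c in frame_captions]
    events += [{"t": s["start"], "type": "speech", "text": s["text"]}
               for s in segments]
    return sorted(events, key=lambda e: e["t"])
```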
Semantic discovery: CLIP + text embeddings fused for better search
Once assets have reliable descriptions, Tim generates vector embeddings for both thumbnails (CLIP) and text (OpenAI embeddings), then fuses them. This enables semantic search and “find similar” workflows that beat exact keyword matching.
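A sketch of one plausible fusion, assuming OpenAI’s open-source CLIP package for thumbnails and the hosted embeddings API for text; concatenating the two L2-normalized vectors is an assumption, since the episode doesn’t specify the exact fusion method:

```python
# Embed thumbnails with CLIP and descriptions with a text-embedding model,
# then fuse the vectors for semantic and "find similar" search.
import numpy as np
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from openai import OpenAI

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
oai = OpenAI()

def image_vector(thumb_path: str) -> np.ndarray:
    image = preprocess(Image.open(thumb_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        v = model.encode_image(image).cpu().numpy().ravel()
    return v / np.linalg.norm(v)

def text_vector(description: str) -> np.ndarray:
    emb = oai.embeddings.create(model="text-embedding-3-small",
                                input=description)
    v = np.array(emb.data[0].embedding)
    return v / np.linalg.norm(v)

def fused_vector(thumb_path: str, description: str) -> np.ndarray:
    # Concatenation keeps both signals; cosine similarity over the fused
    # vector then rewards matches in either the visual or textual space.
    return np.concatenate([image_vector(thumb_path), text_vector(description)])
```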
Real payoff: freeing researchers to do more (and better) content discovery
Tim argues the best outcome isn’t just speed—it’s reallocating expert time from clerical logging back into research. With less copy/paste and labeling, teams can turn over more “rocks,” collect more assets, and improve final selections.
Field research app demo: “Flip Flop” for capturing front/back archival photos
Tim demos an iOS app built for archive visits where researchers photograph fronts and backs of items. The app creates folder-based collections, pairs front/back images, applies structured filenames, transcribes text from the back, and embeds it into image metadata.
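A toy illustration of the pairing-and-naming idea; the scheme below is an assumption, since the app’s real conventions aren’t shown:

```python
# Alternating captures become NNNN_front / NNNN_back pairs inside a
# per-visit collection folder, zero-padded so files sort cleanly.
from pathlib import Path

def paired_name(collection: str, index: int, side: str) -> str:
    # e.g. "ali_archive/0042_back.jpg"
    return f"{collection}/{index:04d}_{side}.jpg"

def pair_captures(paths: list[Path]) -> list[tuple[Path, Path]]:
    # Captures arrive in shooting order: front, back, front, back, ...
    return list(zip(paths[0::2], paths[1::2]))
```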
Embedding metadata into the file itself + the hidden win of structured naming
Beyond storing data in a database, Flip Flop writes AI-derived info into the image’s EXIF metadata so it travels with the file anywhere. Tim highlights that consistent, sortable filenames are a major workflow unlock—turning chaos into import-ready structure.
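A minimal sketch of the embed step using the piexif library; targeting the EXIF ImageDescription tag is an assumption, as the app might write IPTC or XMP fields instead:

```python
# Write the AI-derived text into the JPEG itself so it travels with the file.
import piexif

def embed_description(jpeg_path: str, description: str) -> None:
    exif_dict = piexif.load(jpeg_path)
    exif_dict["0th"][piexif.ImageIFD.ImageDescription] = description.encode("utf-8")
    piexif.insert(piexif.dump(exif_dict), jpeg_path)  # rewrites EXIF in place
```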
Targeted document extraction: “OCR Party” for partial OCR on difficult scans
Tim introduces a macOS menu bar app that lets users crop a portion of a historical document (e.g., a specific newspaper article) and OCR only that region. It supports both local macOS Vision OCR and an AI-based OCR option for better results on degraded text.
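Conceptually, partial OCR is just “crop first, recognize second.” A stand-in sketch with pytesseract; the real app uses macOS Vision plus an AI-OCR option, so the engine here is purely illustrative:

```python
# OCR only the user-selected region of a difficult scan.
from PIL import Image
import pytesseract  # requires the tesseract binary installed

def ocr_region(scan_path: str, box: tuple[int, int, int, int]) -> str:
    # box = (left, upper, right, lower) in pixels, i.e. the dragged selection
    region = Image.open(scan_path).crop(box)
    return pytesseract.image_to_string(region)
```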
Choosing the right form factor: API service vs iOS app vs desktop utility
Claire and Tim zoom out to emphasize that AI enables building hyper-specific tools in the right place for the workflow. Tim’s three examples span backend automation, mobile capture in the field, and lightweight desktop utilities—each purpose-built and low-friction.
Learning mindset: vibe coding parallels with creative software mastery
Tim describes learning AI coding tools the same way creatives learn Photoshop or Premiere: iterative experimentation, tutorials, forums, and a focus on what’s possible. The tooling feels more like a creative medium than traditional programming.
AI in film: practical tooling today, cautious creation tomorrow + prompting resets
Tim shares a nuanced view: generative video is fun and improving, but nonfiction must avoid deceptive “fake archival” content. He closes with a pragmatic prompting technique—when stuck, ask the model for a “resume work” summary prompt and restart in a fresh thread.