How I AI
How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries
CHAPTERS
Tim McAleer’s mission: automate documentary “toil” in post-production
Tim explains that nonfiction post-production is dominated by messy media management—images, archival footage, interviews, transcripts—and that AI is most immediately valuable as a tooling layer rather than a content generator. The goal: eliminate manual logging and data entry so teams can spend time on real research and storytelling.
Why documentaries create massive asset libraries (shooting ratios at scale)
Claire and Tim quantify the scale of documentary inputs versus final runtime using the industry concept of a “shooting ratio”: the amount of material captured relative to the finished length, so a 100:1 ratio means 100 hours of footage for every hour on screen. Tim shares concrete numbers from the Muhammad Ali series to illustrate how quickly assets balloon into tens of thousands of items.
From manual databases to an AI-assisted logging pipeline
Tim shows the kind of production database used on films (stills, footage, music, descriptions, tags, provenance). Historically, researchers typed all of this in by hand; Tim’s objective is to have AI populate these fields automatically while preserving factual accuracy.
Early experiments: one-off scripts + AI image description (and the hallucination problem)
Tim recreates the initial workflow: sending a single JPEG to an AI vision model to get a description. The output is useful but too generic for editorial needs, and it can hallucinate specifics like location and year.
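A minimal sketch of that first experiment, assuming the OpenAI Python SDK; the model name, file path, and prompt wording are illustrative, not Tim’s actual script:

```python
# One-off experiment: describe a single JPEG with a vision model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("archival_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this archival photograph for a research database."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

With no other context, the model has nothing to anchor on, which is exactly where the generic descriptions and invented locations and years come from.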
Adding guardrails: extract embedded metadata and enrich prompts for accuracy
To reduce guessing, Tim updates the script to scrape file metadata (e.g., Library of Congress fields) and append it to the prompt. The model then relies on trusted facts (location, year, photographer) rather than inventing details.
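One way to sketch that guardrail, assuming EXIF-style metadata readable with Pillow (real Library of Congress records may live in IPTC/XMP fields instead, so the field source is an assumption):

```python
# Guardrail step: extract embedded metadata and fold it into the prompt so
# the model anchors on verified facts instead of guessing.
from PIL import Image
from PIL.ExifTags import TAGS

def extract_metadata(path: str) -> dict:
    exif = Image.open(path).getexif()
    return {TAGS.get(tag_id, tag_id): str(value) for tag_id, value in exif.items()}

def build_prompt(metadata: dict) -> str:
    known_facts = "\n".join(f"- {key}: {value}" for key, value in metadata.items())
    return (
        "Describe this photograph for a research database.\n"
        "Use ONLY the following verified metadata for specifics such as "
        "location, year, and photographer; do not invent details:\n"
        + known_facts
    )
```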
Scaling up: from a single script to a full REST API for the whole team
Tim explains how the idea evolved into a server-based REST API that the whole team can send any asset through for a series of processing jobs. The API orchestrates steps like copying to a server, renaming, parsing metadata, scraping sources, and generating final descriptions.
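A hedged sketch of that shape using FastAPI; the framework, endpoint path, step names, and stubbed helpers are all assumptions, and the point is only the ordered pipeline behind one team-facing endpoint:

```python
# One endpoint kicks off an ordered pipeline of processing jobs for an asset.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AssetJob(BaseModel):
    source_path: str

# Each step would do real work (file copy, renaming, metadata parsing,
# source scraping, AI description); stubbed here to show the orchestration.
def copy_to_server(state: dict) -> dict: ...
def rename_with_convention(state: dict) -> dict: ...
def parse_embedded_metadata(state: dict) -> dict: ...
def scrape_source_record(state: dict) -> dict: ...
def generate_description(state: dict) -> dict: ...

PIPELINE = [copy_to_server, rename_with_convention, parse_embedded_metadata,
            scrape_source_record, generate_description]

@app.post("/assets/process")
def process_asset(job: AssetJob):
    state = {"source_path": job.source_path}
    for step in PIPELINE:
        state = step(state) or state  # each step enriches the shared state
    return state
```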
Video understanding: frame sampling + Whisper transcription + a reasoning pass
For video, Tim extracts frames at intervals (e.g., every 5 seconds to control cost) and captions them with a cheaper model, then combines frame captions with time-aligned audio transcripts. A larger reasoning model ingests the packaged “video events” to produce a coherent description of what happens.
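A sketch of the plumbing under stated assumptions (ffmpeg on the PATH, the open-source openai-whisper package, and invented helper names); captioning each frame would reuse the vision call shown earlier:

```python
# Sample frames at a fixed interval, transcribe the audio, and merge both
# into one chronological stream of "video events" for a reasoning model.
import subprocess
import whisper  # pip install openai-whisper

FRAME_INTERVAL = 5  # seconds; a larger interval means fewer frames and lower cost

def extract_frames(video_path: str, out_dir: str) -> None:
    subprocess.run(
        ["ffmpeg", "-i", video_path,
         "-vf", f"fps=1/{FRAME_INTERVAL}",  # one frame every 5 seconds
         f"{out_dir}/frame_%04d.jpg"],
        check=True,
    )

def transcribe(video_path: str) -> list[dict]:
    model = whisper.load_model("base")
    return model.transcribe(video_path)["segments"]  # segments carry timestamps

def build_events(frame_captions: list[tuple[float, str]],
                 segments: list[dict]) -> list[dict]:
    # Interleave frame captions and speech segments by timestamp so the
    # reasoning model sees one coherent timeline.
    events = [{"t": t, "type": "frame", "text": c} for t, c in frame_captions]
    events += [{"t": s["start"], "type": "speech", "text": s["text"]}
               for s in segments]
    return sorted(events, key=lambda e: e["t"])
```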
Semantic discovery: CLIP + text embeddings fused for better search
Once assets have reliable descriptions, Tim generates vector embeddings for both thumbnails (CLIP) and text (OpenAI embeddings), then fuses them. This enables semantic search and “find similar” workflows that beat exact keyword matching.
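A sketch of one plausible fusion, assuming OpenAI’s open-source CLIP package for thumbnails and the hosted embeddings API for text; concatenating the two L2-normalized vectors is an assumption, since the episode doesn’t specify the exact fusion method:

```python
# Embed thumbnails with CLIP and descriptions with a text-embedding model,
# then fuse the vectors for semantic and "find similar" search.
import numpy as np
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from openai import OpenAI

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
oai = OpenAI()

def image_vector(thumb_path: str) -> np.ndarray:
    image = preprocess(Image.open(thumb_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        v = model.encode_image(image).cpu().numpy().ravel()
    return v / np.linalg.norm(v)

def text_vector(description: str) -> np.ndarray:
    emb = oai.embeddings.create(model="text-embedding-3-small",
                                input=description)
    v = np.array(emb.data[0].embedding)
    return v / np.linalg.norm(v)

def fused_vector(thumb_path: str, description: str) -> np.ndarray:
    # Concatenation keeps both signals; cosine similarity over the fused
    # vector then rewards matches in either the visual or textual space.
    return np.concatenate([image_vector(thumb_path), text_vector(description)])
```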
Real payoff: freeing researchers to do more (and better) content discovery
Tim argues the best outcome isn’t just speed—it’s reallocating expert time from clerical logging back into research. With less copy/paste and labeling, teams can turn over more “rocks,” collect more assets, and improve final selections.
Field research app demo: “Flip Flop” for capturing front/back archival photos
Tim demos an iOS app built for archive visits where researchers photograph fronts and backs of items. The app creates folder-based collections, pairs front/back images, applies structured filenames, transcribes text from the back, and embeds it into image metadata.
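A toy illustration of the pairing-and-naming idea; the scheme below is an assumption, since the app’s real conventions aren’t shown:

```python
# Alternating captures become NNNN_front / NNNN_back pairs inside a
# per-visit collection folder, zero-padded so files sort cleanly.
from pathlib import Path

def paired_name(collection: str, index: int, side: str) -> str:
    # e.g. "ali_archive/0042_back.jpg"
    return f"{collection}/{index:04d}_{side}.jpg"

def pair_captures(paths: list[Path]) -> list[tuple[Path, Path]]:
    # Captures arrive in shooting order: front, back, front, back, ...
    return list(zip(paths[0::2], paths[1::2]))
```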
Embedding metadata into the file itself + the hidden win of structured naming
Beyond storing data in a database, Flip Flop writes AI-derived info into the image’s EXIF metadata so it travels with the file anywhere. Tim highlights that consistent, sortable filenames are a major workflow unlock—turning chaos into import-ready structure.
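A minimal sketch of the embed step using the piexif library; targeting the EXIF ImageDescription tag is an assumption, as the app might write IPTC or XMP fields instead:

```python
# Write the AI-derived text into the JPEG itself so it travels with the file.
import piexif

def embed_description(jpeg_path: str, description: str) -> None:
    exif_dict = piexif.load(jpeg_path)
    exif_dict["0th"][piexif.ImageIFD.ImageDescription] = description.encode("utf-8")
    piexif.insert(piexif.dump(exif_dict), jpeg_path)  # rewrites EXIF in place
```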
Targeted document extraction: “OCR Party” for partial OCR on difficult scans
Tim introduces a macOS menu bar app that lets users crop a portion of a historical document (e.g., a specific newspaper article) and OCR only that region. It supports both local macOS Vision OCR and an AI-based OCR option for better results on degraded text.
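Conceptually, partial OCR is just “crop first, recognize second.” A stand-in sketch with pytesseract; the real app uses macOS Vision plus an AI-OCR option, so the engine here is purely illustrative:

```python
# OCR only the user-selected region of a difficult scan.
from PIL import Image
import pytesseract  # requires the tesseract binary installed

def ocr_region(scan_path: str, box: tuple[int, int, int, int]) -> str:
    # box = (left, upper, right, lower) in pixels, i.e. the dragged selection
    region = Image.open(scan_path).crop(box)
    return pytesseract.image_to_string(region)
```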
Choosing the right form factor: API service vs iOS app vs desktop utility
Claire and Tim zoom out to emphasize that AI enables building hyper-specific tools in the right place for the workflow. Tim’s three examples span backend automation, mobile capture in the field, and lightweight desktop utilities—each purpose-built and low-friction.
Learning mindset: vibe coding parallels with creative software mastery
Tim describes learning AI coding tools the same way creatives learn Photoshop or Premiere: iterative experimentation, tutorials, forums, and a focus on what’s possible. The tooling feels more like a creative medium than traditional programming.
AI in film: practical tooling today, cautious creation tomorrow + prompting resets
Tim shares a nuanced view: generative video is fun and improving, but nonfiction must avoid deceptive “fake archival” content. He closes with a pragmatic prompting technique—when stuck, ask the model for a “resume work” summary prompt and restart in a fresh thread.