How I AI

How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries

Tim McAleer is a producer at Ken Burns's Florentine Films who is responsible for the technology and processes that power their documentary production. Rather than using AI to generate creative content, Tim has built custom AI-powered tools that automate the most tedious parts of documentary filmmaking: organizing and extracting metadata from tens of thousands of archival images, videos, and audio files. In this episode, Tim demonstrates how he's transformed post-production workflows using AI to make vast archives of historical material actually usable and searchable.

*What you'll learn:*

1. How Tim built an AI system that automatically extracts and embeds metadata into archival images and footage
2. The custom iOS app he created that transforms chaotic archival research into structured, searchable data
3. How AI-powered OCR is making previously illegible historical documents accessible
4. Why Tim uses different AI models for different tasks (Claude for coding, OpenAI for images, Whisper for audio)
5. How vector embeddings enable semantic search across massive documentary archives
6. A practical approach to building custom AI tools that solve specific workflow problems
7. Why AI is most valuable for automating tedious tasks rather than replacing creative work

*Brought to you by:*

• Brex, the intelligent finance platform built for founders: https://brex.com/howiai

*Where to find Tim McAleer:*

• Website: https://timmcaleer.com/
• LinkedIn: https://www.linkedin.com/in/timmcaleer/

*Where to find Claire Vo:*

• ChatPRD: https://www.chatprd.ai/
• Website: https://clairevo.com/
• LinkedIn: https://www.linkedin.com/in/clairevo/
• X: https://x.com/clairevo

*In this episode, we cover:*

(00:00) Introduction to Tim McAleer
(02:23) The scale of media management in documentary filmmaking
(04:16) Building a database system for archival assets
(06:02) Early experiments with AI image description
(08:59) Adding metadata extraction to improve accuracy
(12:54) Scaling from single scripts to a complete REST API
(15:16) Processing video with frame sampling and audio transcription
(19:10) Implementing vector embeddings for semantic search
(21:22) How AI frees up researchers to focus on content discovery
(24:21) Demo of “Flip Flop” iOS app for field research
(29:33) How structured file naming improves workflow efficiency
(32:20) “OCR Party” app for processing historical documents
(34:56) The versatility of different app form factors for specific workflows
(40:34) Learning approach and parallels with creative software
(42:00) Perspectives on AI in the film industry
(44:05) Prompting techniques and troubleshooting AI workflows

*Tools referenced:*

• Claude: https://claude.ai/
• ChatGPT: https://chat.openai.com/
• OpenAI Vision API: https://platform.openai.com/docs/guides/vision
• Whisper: https://github.com/openai/whisper
• Cursor: https://cursor.sh/
• Superwhisper: https://superwhisper.com/
• CLIP: https://github.com/openai/CLIP
• Gemini: https://deepmind.google/technologies/gemini/

*Other references:*

• Florentine Films: https://www.florentinefilms.com/
• Ken Burns: https://www.pbs.org/kenburns/
• Muhammad Ali documentary: https://www.pbs.org/kenburns/muhammad-ali/
• The American Revolution series: https://www.pbs.org/kenburns/the-american-revolution/
• Archival Producers Alliance: https://www.archivalproducersalliance.com/genai-guidelines
• Exif metadata standard: https://en.wikipedia.org/wiki/Exif
• Library of Congress: https://www.loc.gov/

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo, host · Tim McAleer, guest
Nov 17, 2025 · 47m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–2:23

    Introduction to Tim McAleer

    1. CV

      How did you think about what problems there were to solve in AI relative to your job and the people that you work with, and why did you start where you started?

    2. TM

      Post-production is like a technical mess of media management. You have many different file types. You have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. So it ends up being hundreds of hours of footage, tens of thousands of photos. The data management piece, when you're dealing with all that different stuff, is the mess that I have used AI to tackle. My goal was to automate this. For years, this has been manual data entry.

    3. CV

      Automate away toil. That's what we wanna do.

    4. TM

      No one was gonna make me this app, and so the ability to make an extremely specific app that makes a workflow on my team and my company easier, it's been an unbelievable moment.

    5. CV

      [upbeat music] Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we have Tim McAleer, a producer at Ken Burns' Florentine Films, who's responsible for the technology and processes that bring these amazing films to life. Instead of focusing on how AI can create creative for these films, we're actually gonna talk about how Tim uses AI to build software products that make his post-production and research team's lives a lot better. If you're working with images, video, sound, or just a lot of data, this episode is a great one for you. Let's get to it. This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed, and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account, and you've got a system that helps you spend smarter, move faster, and scale with confidence. One in three startups in the US already runs on Brex. You can, too, at brex.com/howiai.

  2. 2:23–4:16

    The scale of media management in documentary filmmaking

    1. CV

      Tim, welcome to How I AI. I'm excited to have you here.

    2. TM

      Thank you for having me.

    3. CV

      What I love about what we're gonna talk about today is you work in a very interesting and creative industry, putting out amazing content, and we're gonna talk a little bit about how AI is impacting the creation side of things. But you've actually used AI to smooth out some of the challenges you've had on the production and post-production side of things. So I'm curious, how did you think about what problems there were to solve in AI relative to your job and the people that you work with, and why did you start where you started?

    4. TM

      Yeah. Uh, I think most of the flashiest use cases of AI in, uh, creation or media and entertainment right now are often in, like, generating full video content or images or whatever it is. But post-production specifically is, like, a technical mess of media management. Especially in nonfiction, you have, like, many different file types, right? And you have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. And so, like, the data management piece, when you're dealing with all that different stuff, is the mess that I have used AI to tackle, and I think that the sort of, like, AI as a tool versus AI for generation is even more immediately applicable in our field at the moment.

    5. CV

      Well, and I have a very, you know, very simple, humble little podcast, but even for us, we create a lot of research and-

    6. TM

      Yeah

    7. CV

      ... and longer content, and we're editing it down. I'm just curious, with documentaries and nonfiction work, what do you think the ratio is of media captured, researched, and archived to actually publish? 'Cause that will maybe give us a sense of how much of this you have to grapple with to get a good, good piece of content

  3. 4:16–6:02

    Building a database system for archival assets

    1. CV

      on the end.

    2. TM

      We have a thing in our industry called a shooting ratio, and so you can imagine in, like, a fiction series or, you know, like a sitcom on air, I don't quite know what those shooting ratios would be, but you're working with a script, and so you're gonna have a slightly lower ratio. In documentary, it can get quite high. Like, I can tell you that we made a series about Muhammad Ali a few years ago. It was an eight-hour show. We gathered 20,000 still images in the database of just stills. I think it was over 100 hours of footage, 'cause he had a lot of fights and that kind of thing, news, news footage. And then we also filmed, I wanna say, like, 35 interviews for the piece. So it ends up being, like, hundreds of hours of footage, tens of thousands of photos, and that's just, like... That's one example of a, you know, a particularly famous individual, but that tends to be what it looks like for our shows.

    3. CV

      So that's what you have to manage, make searchable, make usable by the entire production team.

    4. TM

      Yes.

    5. CV

      And you got inspired by ChatGPT and some of these early AI tools to do some of that. So you wanna hop in and show us what, you know, the first use case is?

    6. TM

      Absolutely. So I'm gonna start by kind of just showing you the, like, end result, uh, at, before I go right to, like, how I got here. So on any film that we work on, we end up having some kind of database, right? So this is a database where you can see the still images we've gathered. You can see there's a footage section, a music section, anything that might go into the film, and all the kind of stuff you might expect to see, right? Descriptions, tags, a date on the thing, where we got it from. Um, some more technical details also gonna appear over here. In any event, my goal was to automate this. For years, this has been manual data entry, and so I remember vividly... I'm gonna jump into Cursor now,

  4. 6:02–8:59

    Early experiments with AI image description

    1. TM

      but I do remember, like, when I first started doing this, it was ChatGPT. I remember ChatGPT-... added image upload, and it was this insane day for us. I was, like, in the office with my colleague, Clark, and we were just, like, throwing images at it and seeing kind of the quality of the output. Like, it was this, an aha moment where it was like, "Oh, my God, this thing can see," and how could we harness this text generation, right? To, to use it for our database entry. So I'm gonna simulate that, like, the starting point, and then we'll jump to where we're at today. But essentially, what it looked like at the beginning was we would throw something into GPT, and we would say, like: "Hey, can you describe this?" And it would hallucinate a little bit, but it was so tempting to figure out a way to harness that, that I started essentially, like, writing little Python scripts with ChatGPT, and at that time, it was like VS Code on one monitor and GPT on another. And I'm gonna-- All right, I'm just gonna go ahead and demo what that kinda looked like. I'm gonna speak my prompts, if that's okay.

    2. CV

      Oh, go ahead.

    3. TM

      I, I use this tool called Super Whisper, uh, because it kinda cleans up my off-the-cuff dictation. So I have a, an image here of a nice street in somewhere America, maybe mid-20th century. We're gonna see what kind of description we get from AI. All right. Uh, write me a script that submits the JPEG at the root of this workspace to OpenAI for a description. I want just a general visual description of what we can see in the image. Uh, any API credentials you need are in a text file at the root of the folder. And what we can see here is that, like, everything I just said got funneled through, uh, this app called Super Whisper. So it got funneled through a prompt that itself is cleaning up my, like, messy vibe coding. I think it's clean enough, so we're gonna go ahead and submit it.
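[Editor's note: a minimal sketch of the kind of script Tim dictates here. The model name, the message shape, and the `street.jpg` filename are assumptions based on OpenAI's vision chat format, not his actual code; the API call itself is left as a comment so the sketch stays self-contained.]

```python
import base64

def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    # Package the JPEG and the prompt into OpenAI's vision chat format.
    # The model name is an assumption; any vision-capable model would do.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

# Usage (the actual call reads the key from the credentials text file):
#   payload = build_vision_request(
#       open("street.jpg", "rb").read(),
#       "Give a general visual description of what we can see in the image.")
#   client.chat.completions.create(**payload)
```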

    4. CV

And I see you're using Claude Sonnet 4.5. Is that by choice or by default, or-

    5. TM

      Yeah. That is because I'm on a podcast right now, to be honest. [chuckles] Like, I think this is a very easy task for AI. I could keep it on auto for this, right?

    6. CV

      Yeah.

    7. TM

      I will say, I switch between various Claude models, depending upon the, like, difficulty.

    8. CV

      Mm-hmm.

    9. TM

      And I do try and be cheap and stay on auto if I know that I'm asking for easy stuff, you know?

    10. CV

      Okay, so you're just-- You're, you're giving us a little bit of quality control here.

    11. TM

      Yeah, I don't want it to mess up.

    12. CV

      Okay.

    13. TM

      We're live on air, you know?

    14. CV

      Yeah.

    15. TM

      [chuckles] All right. So it's telling me that I need to install some requirements. My guess is I have those requirements. It's got to submit image script. Let's see what it did. Here we go. It's running, submitting this image to OpenAI for analysis. What kind of, what kind of description will we get? There we go. "This image depicts a small rural main street from what appears to be the mid-20th century." We had guessed that. "There are a series of wooden storefronts, each with signs indicating there are local

  5. 8:59–12:54

    Adding metadata extraction to improve accuracy

    1. TM

      businesses." Okay, so this is great, and this is kind of what we were getting in those early days of GPT image upload. But the problem here is, like, you're making a film, you wanna know what rural main street, what town are we in? What is the exact year? And you can't really just go with this kind of generic description. So a lot of times, we happen to know that images come with embedded metadata. And, you know, if you're using your iPhone camera today, you know that maybe there's some metadata, like GPS data, that kind of stuff, but archival images will often come with whatever notes people have scribbled onto them over time. And so I'm gonna now... I'm gonna, I'm gonna iterate on this one time and say: I want you to add a step to this script. I wanna scrape any available metadata from the file first and append that to the prompt. The goal here is that we are using any available metadata as, like, a source of truth for what this image actually is and not just guessing.
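[Editor's note: the metadata-as-guardrails step Tim describes reduces to appending known facts to the description prompt. A minimal sketch, with the scraping itself (e.g. via exiftool or Pillow) stubbed out and the field names purely illustrative:]

```python
def build_grounded_prompt(base_prompt: str, metadata: dict) -> str:
    # Append scraped metadata so the model treats it as a source of
    # truth for the description instead of guessing from pixels alone.
    facts = [f"- {key}: {value}"
             for key, value in sorted(metadata.items()) if value]
    if not facts:
        return base_prompt
    return (
        base_prompt
        + "\n\nThe following embedded metadata is known to be true about "
        + "this image. Rely on it rather than guessing:\n"
        + "\n".join(facts)
    )

# e.g. build_grounded_prompt(
#     "Describe this image.",
#     {"Source": "Library of Congress", "Photographer": "Russell Lee",
#      "Location": "Cascade, Idaho", "Date": "1941"})
```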

    2. CV

      And so just repeating that while this is running, what you're saying is-

    3. TM

      Yeah

    4. CV

      ... for this particular use case, you're working with a set of archival photos from sources that have embedded, uh, probably additional layers of metadata into it-

    5. TM

      Mm

    6. CV

      ... that you can read, that give more information, which is different than, you know, scanning something or taking something off your, off your phone, which I think we're gonna look at a bit later.

    7. TM

      Yeah.

    8. CV

      And so you're trying to harness the structured metadata off this file.

    9. TM

      Mm-hmm.

    10. CV

      Which, if you go back to the tab that shows the, the, the image, we can't see-

    11. TM

      Yeah

    12. CV

      ... with our, with our human eyes.

    13. TM

      Yeah, of course.

    14. CV

      But our, our agent friends can read with its robot brain. Um, and you're using that, that information to then upgrade this script that is gonna do all this AI analysis for you.

    15. TM

      That's exactly right. And so in this case, it's gonna be embedded metadata. I, you know, I happen to know this is an image from Library of Congress. There's gonna be some metadata on it, but it could also be something on the web. Like, where this eventually goes to is, like, okay, I know there's, there's a website with information, may not be in the file, but, hey, how about you go and scrape the web? Gather anything you can know about this, because ultimately, like, this is a journalistic endeavor. We-- The, the, these shows get fact-checked. We want everything going into our database to be, you know, true and verifiable information. All right, so let's see how it did when it added that metadata check. So let's see. We can see in the console it did a little bit of a scrape. It looks messy as hell, but somewhere in here, we can see stuff like, yeah, archival information. And it's now gonna use that. And what we've generally found is that when you add those guardrails, when you give it information you know to be true about the image, it, it relies on that so much more than just what it can see. Like, you know, AI really wants to perform for us. It really [chuckles] wants to do a good job. And so when you give it the tools and information to kind of write a better description, it's gonna, it's gonna be able to get there.

    16. CV

      And I wanna call out some things. So we talked about using the Anthropic Claude models in particular for the actual coding of the script.

    17. TM

      Yeah.

    18. CV

      But you're relying on the OpenAI AI models for the image analysis. Why OpenAI versus any other models, like, stick with the one that you love, or, um, it was the, the first one that did a good job for you, or do you feel like it's-

    19. TM

      Yeah

    20. CV

      ... particularly good at image analysis? I'm curious why you select those different models for different use cases.

    21. TM

      Yeah, it's mostly that it's the first one. Like, they were the first one who had a-- they had a vision preview on their API. They did it before Claude, and, like-... I had built up enough of an infrastructure using that API call that-

    22. CV

      Yeah

    23. TM

      ... it was like the switching costs were too much, you know?

    24. CV

      Yep.

    25. TM

      All right, so let's see what we got this time.

    26. CV

      It's much more detailed.

    27. TM

      It is. It's much more detailed. So the image shows a street scene on the main street of Cascade, Idaho. There we go. We know where it is now. "Captured in 1941 by photographer Russell Lee." We've got photo credits.

  6. 12:54–15:16

    Scaling from single scripts to a complete REST API

    1. TM

      All right, so this is a great example of, like, you add the guardrails, and you're gonna get more detail, but you're also just gonna get facts, right? Before... I don't know if it's still up here somewhere. Yeah, before, it was a small, rural main street, now it is the main street of Cascade, Idaho. And so we can imagine this getting duplicated in various ways, right? This image has embedded metadata. Maybe it's a website that we're going and gathering it from. But effectively, like, this is where it all started. It started with a single Python script that I was running on my computer, and I was like, "This is awesome." My database software is, like, it's advanced enough to call external scripts. You can kind of use any database to do this, you know, Airtable, whatever, but you just need something that has an API-

    2. CV

      Mm-hmm

    3. TM

      ... and that can call an external script or web hook or something. So this is where we started, and now I'm gonna switch my screen share to a remote machine, like a little Mac Mini that I have running in my office, and what this... You know, it's hard to- at, at this moment, it's a more complex Cursor workspace you can see. Uh, maybe I'll bop into the rules. Basically, what this is, is a REST API, so that every image file, video file, music file, anything that ends up in that database that we looked at at the beginning, pings off of this REST API for all kinds of different, like, metadata tasks. If I, if I pop into the Jobs folder here for a second, you can- we could zero in on, like, basically what we were just doing but the current iteration of it. So I call it Auto Log, 'cause the process of writing this in for years, the, the, the manual data entry is called logging, so it's not the cleverest name, but it, you know, it fits. And, you know, you got a five-step process here. Basically, first, we're gonna gather the info, meaning like file specs. You know, how big the image is. Is it a JPEG? Is it a TIFF? We're gonna copy the file to our server. We're gonna name it our ID number. We're going to parse it for metadata. Is there any metadata? If there is, great, but either way, we're gonna look for more information on the web in this step four here, scrape URL. And then once we know everything we could possibly know about that image, we're gonna generate a description for it. And when you imagine how this might work for video, well, like, video is itself, it's just 24 images in a second, plus some audio, and so basically, this just gets scaled up to deal with video files, too.
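[Editor's note: the five-step Auto Log job Tim walks through reads roughly like this sketch. The helpers are simplified stand-ins; the real service sits behind a REST API and calls out to metadata parsers, a web scraper, and a description model, all omitted or stubbed here.]

```python
from pathlib import Path

def auto_log(path: Path, asset_id: str, fetch_source_page=None) -> dict:
    job = {"id": asset_id, "steps": [], "metadata": {}}
    # 1. Gather info: file specs such as size and format (JPEG? TIFF?).
    job["specs"] = {"bytes": path.stat().st_size,
                    "format": path.suffix.lstrip(".").upper()}
    job["steps"].append("gather_info")
    # 2. Copy to the server, renamed to the asset's ID number
    #    (the destination path is hypothetical; the copy is omitted here).
    job["server_name"] = f"{asset_id}{path.suffix}"
    job["steps"].append("copy_and_rename")
    # 3. Parse embedded metadata, if any (parser stubbed out).
    job["steps"].append("parse_metadata")
    # 4. Scrape the source web page for more context, when one is known.
    if fetch_source_page is not None:
        job["metadata"].update(fetch_source_page())
    job["steps"].append("scrape_url")
    # 5. Generate the description from everything gathered (model call omitted).
    job["steps"].append("generate_description")
    return job
```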

  7. 15:16–19:10

    Processing video with frame sampling and audio transcription

    1. CV

      Are you using the same model for video files? Are you taking them, extracting the stills, and putting them through OpenAI, or are you using a different model?

    2. TM

I use a different model for... So the video files require, like, two levels. Most video, like, AI models out there, seem to do, uh, basically some version of frame sampling, so it could be extremely expensive if you were sending all 24 images every second to an API, right? So I pull at five-second intervals, 'cause I'm cheap. Some others maybe pull in a smarter way, maybe at, like, lighting changes or something like that. Like, there's different ways of thinking about the frame sampling. So for the frame captions themselves, I will use a cheap model. I'll use, like, a Nano, GPT-5 Nano. But then for the... And I can go in and show you a prompt here, which maybe illustrates this. I have frame prompts, which basically ask for just, like, a prompt of an individual still image extracted from video. But then I have a larger parent prompt. You can see that my prompts have gotten slightly more sophisticated- [chuckles]

    3. CV

      ... over time.

    4. TM

      Um, basically, what this does is it sends every single frame that we've extracted from a video file, it extends it- anything, like, any of the audio we've transcribed from that video file, it packages it up into this elaborate prompt, and it sends it to a reasoning model.

    5. CV

      Mm.

    6. TM

      And the purpose of that is to say, like, "These are all the video events that we have observed in this video. Here is, like, a massive text file of data. Tell me what you think is happening in the video."

    7. CV

      Got it.

    8. TM

      Yeah.

    9. CV

      Yeah. I, I, you know, maybe, maybe tip from one of our how- other How I AI guests, but I've found that the Gemini, um-

    10. TM

      Yeah

    11. CV

      ... the Gemini models are quite good with video. It's actually what we use to do our podcast raw recording to, uh, both highlight stills and a blog post that I put out. I process them through the, the Gemini models and have had a lot of success with those ones.

    12. TM

      And it just pulls out, like, the stills that might be, um-

    13. CV

      It just... It automatically pulls interesting stills.

    14. TM

      Mm.

    15. CV

      It actually gives me interesting stills, plus five seconds, or, like, plus five seconds, plus minus five, or minus five seconds, because sometimes the guest and I are looking ridiculous, so-

    16. TM

      Yeah, yeah, yeah, of course.

    17. CV

      [chuckles] Back face. So tip to anybody out there with video who hasn't tried the Gemini models, I, I find those particularly good for this use case.

    18. TM

      You might have just, you know, added something to our-

    19. CV

      There you go

    20. TM

      ... little roadmap here. [chuckles]

    21. CV

      Well, um, and so- and then I'm curious about the audio side of things. So I kinda, you know, I've-

    22. TM

      Mm

    23. CV

      ... I've played with the Gemini models for video. The still makes tons of sense to me. Tell us a little bit about the audio side of things.

    24. TM

      So the audio is also... I, and now I feel like I'm an OpenAI shill. Everything I'm using is OpenAI, and I think, except for the coding, which is interesting, but I think it's just habit. I use Whisper for audio. So, like, Whisper's an incredible open source model for speech-to-text detection. Even the, like, medium-sized model does a pretty good job. And what I do is... And I can pop back into the database software maybe to, like, illustrate this. What I do is I extract... You can see, like, frames pulled every five seconds.

    25. CV

      Mm-hmm.

    26. TM

      And there's a caption associated with each frame, and then there's... This is a shot of an alligator in a swamp, so he doesn't have a- any audio. He wasn't talking.

    27. CV

      [chuckles]

    28. TM

      But I basically pull audio at five-second increments, so that when we send those, like, video events up to the reasoning model, we are sending a full transcript, but we're sending it, like, kinda like pegged to the moment in the video that it happened.... if that makes sense.

    29. CV

      Yep.

    30. TM

      So the transcription is all happening, you know, on my back end over here. Um, everything, like, I think I could probably open up the console and see, like... There we go. Like, someone just sent a, a job through not that long ago. Like, I can kind of come in here and see what my colleagues are doing as they ping my API all day long.
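[Editor's note: the sampling-and-pegging logic described above can be sketched like this. Frame extraction would shell out to ffmpeg (e.g. `ffmpeg -i clip.mov -vf fps=1/5 frame_%04d.jpg`) and transcription to Whisper; both are omitted, and the event shape is illustrative rather than Tim's actual format.]

```python
def sample_times(duration_s: float, interval_s: float = 5.0) -> list:
    # One sampled frame every interval_s seconds ("'cause I'm cheap").
    return [t * interval_s for t in range(int(duration_s // interval_s) + 1)]

def build_video_events(frame_captions, transcript_segments, interval_s=5.0):
    # Peg each transcribed segment (start_time, text) to the frame window
    # it falls in, yielding the "video events" sent to the reasoning model.
    events = []
    for i, caption in enumerate(frame_captions):
        start, end = i * interval_s, (i + 1) * interval_s
        speech = " ".join(text for t0, text in transcript_segments
                          if start <= t0 < end)
        events.append({"t": start, "caption": caption, "speech": speech})
    return events
```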

  8. 19:10–21:22

    Implementing vector embeddings for semantic search

    1. CV

      Great. And so you're pairing a snapshot image every five seconds from a video, the five-second transcript of the audio, speech-to-text via Whisper-

    2. TM

      Mm-hmm

    3. CV

      ... metadata, if you have it, parsing that all together, and then getting a very robust description and analysis of the content that you have available in, back in this tool that you're using to archive, log, manage all, all your assets.

    4. TM

      Yeah. And like I said, that tool could be kind of agnostic. Like, you could do it in a Google Sheet, if that's, you know, if that's what you like.

    5. CV

      Yeah.

    6. TM

      But, um, I like this. We've been using it for a while. Everything we just talked about is how we kind of get to like metadata that we can read, right? Like, generative metadata that is, A, we know it's accurate because it's kind of been put on these guardrails by our metadata extraction steps, and then also it, it provides this, like, nice visual for us. We can see what this thing is at a glance. But the next step of this, now that you have this, like, API running in the background, is you can generate something that maybe I can't read, but the AI can read pretty well-

    7. CV

      Mm-hmm

    8. TM

      ... which is vector embeddings. So I'll jump back to stills for this, 'cause I think it's a maybe an easier illustration of it. Every asset in our database gets put through m- two modes of embedding. So we'll send the thumbnail through and run it against an open source model. I use CLIP for this, and I'll generate an embedding off of that, and then we'll send the description through. Um, I use, uh, again, an OpenAI text model for this, um, and get an embedding for that, and then we'll fuse them. And the purposes of that is that so now we have, like, the ability to discover things semantically. Like, prior to this, and I think in a lot of film production today, you're working with exact text search, you know?

    9. CV

      Yeah.

    10. TM

      Like, if that description says dog, but, you know, somebody wrote in puppy, you're not finding that image. And so this has been, like, kind of the most exciting part of it, not necessarily where I knew it was going when it started.

    11. CV

      Yeah.

    12. TM

      Like, I was just excited to generate a description, right? But now the ability to discover semantically is, I think, you know, the most, the most, uh, robust part of the system.
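[Editor's note: the two-mode embedding and semantic lookup can be sketched with plain cosine math. The 50/50 fusion weight is an assumption (Tim doesn't say how he fuses the CLIP and text vectors), and a real index would hold actual CLIP/OpenAI embeddings rather than these toy 2-D vectors.]

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(image_vec, text_vec, image_weight=0.5):
    # Weighted average of the L2-normalised CLIP (image) and text vectors.
    a, b = _normalize(image_vec), _normalize(text_vec)
    return [image_weight * x + (1 - image_weight) * y for x, y in zip(a, b)]

def find_similar(query_vec, index, top_k=5):
    # Rank assets by cosine similarity -- so "dog" surfaces "puppy" too.
    def cosine(u, v):
        return sum(x * y for x, y in zip(_normalize(u), _normalize(v)))
    return sorted(index, key=lambda a: cosine(query_vec, a["vec"]),
                  reverse=True)[:top_k]
```

The same `find_similar` call, seeded with an existing asset's fused vector instead of a query, is effectively the reverse-image-search button demoed later in the episode.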

  9. 21:22–24:21

    How AI frees up researchers to focus on content discovery

    1. CV

      So what I love about this... I mean, a, a couple things is, one, you've really pushed every step of the way. You know, you could have stopped at, like, we got good descriptions, or we got, like, the structured metadata out-

    2. TM

      Yeah

    3. CV

      ... and now I have a script that runs it. You could have stopped at images only, but you took it to video, and video and audio. You could have stopped at structured data only, but you went to embeddings to get semantic search. So I love just the breadth of applicability of the AI in this process. But what I probably love more is I doubt this was anybody's favorite part of their job. Like, I doubt it was anybody's-

    4. TM

      [chuckles]

    5. CV

      ... favorite part of their job to be like, "Mm, I'm gonna go read some Library of Congress metadata."

    6. TM

      It used to be my job, so I can tell you firsthand, not my favorite part. And it's also, like, I think the, the best argument I have for all the work I've done creating this system is that, like, the same people who used to write this data were the ones who are responsible for doing the research.

    7. CV

      Yeah.

    8. TM

      So you've now freed them up to just look more, right?

    9. CV

      Yeah.

    10. TM

      Like, maybe now we could gather 25,000 still images for the Muhammad Ali project, 'cause you have that much more time. You're not just, like, copy and pasting stuff off a website to put it in this form, you know?

    11. CV

      Well, and you probably get to select from this big archive of data better assets to use in your content because they're more discoverable, because you have more confidence in the source and the content of, of that data. So I bet it uplevels, at the end of the day, the quality at, at the end-

    12. TM

      Yeah

    13. CV

      ... um, because you have just much more data to work off of.

    14. TM

      100%. I mean, like, a real quick example of that, too, is, like, I'm gonna use Abe Lincoln here, which is maybe not the best- [chuckles]

    15. CV

      [chuckles]

    16. TM

      ... use, use of this image. But embeddings enable us to find things in ways we never would've thought to find them before. So, like, I have a button down here where when I click it, what it basically is gonna do is a reverse image search within our own collection.

    17. CV

      Mm-hmm.

    18. TM

      So if I li- if I'm an editor, and I like an image, and this is gonna take a while, 'cause I'm not on site, but if I like an image, I can click the Find Similar button, and it's just gonna go and find every image that kind of has that vibe.

    19. CV

      Yeah.

    20. TM

      You can see here we have a duplicate of this one.

    21. CV

      Just-

    22. TM

      But then there you go. It recognized the man, and it started pulling in other portraits.

    23. CV

      [upbeat music] This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed, and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account, and you've got a system that helps you spend smarter, move faster, and scale with confidence. One in three startups in the US already runs on Brex. You can, too, at brex.com/howiai.

  10. 24:21–29:33

    Demo of “Flip Flop” iOS app for field research

    1. CV

      I love this. Okay, so this is more of your archival and footage data-

    2. TM

      Yeah

    3. CV

      ... but you capture a lot of stuff in the field, where people are not sitting in front of Cursor or their desktop and looking through these assets, and I know that you use some vibe coding and a creative approach to get more information about those assets. Could you walk us through that?

    4. TM

      Yeah. So the next use case is an app that I developed for archival research in the field. So I think that we, we really pride ourselves on, like, turning over every rock, on, on not just relying on what's digitized and available online, and going and visiting physical archives. And so, um, the process of visiting a physical archive-... is basically you have a bunch of folders, um, that you pull ahead of time. You arrive there, and your goal is just to snap, like, low-resolution iPhone snaps of everything you can possibly get. And so you're snapping the front of the image, and you're snapping the back of the image. Because the back is typically where there's gonna be, like, a scrawled description or maybe like, a, an accession number, an ID number, that the archive has added themselves. And so this process used to look like you show up at the archive, you take iPhone snaps for two days, you get back to the office, you have the messiest camera roll you've ever had. You cannot actually pair your fronts to your backs because it just got out- somehow it got out of order along the way. And so the goal was basically to make that process, like, a little better. So I, I vibe coded this iOS app to deal with this problem, and I, I, I tend to just, like, speak in screens. Like, the way... Maybe it's because I'm a visual person, like, the way I deal with it is I just think like: Okay, I see a screen that does this and a screen that does this. I, I imagine a button that does this. And the purpose of this was basically, like, I want people to be able to create collections for each folder they're capturing. I want them to be able to snap a front and a back, um, like a, the, the flip side of the image, uh, so that they can easily associate those, so the file names associate them. And I wanna immediately transcribe any information on the back and embed it into the original image. So now I have this app called Flip Flop.
I ask ChatGPT at the end of my dog walk to generate some kind of specs doc or requirement doc. It pretty much does it in one go. If you chat with it for 30 minutes, you know, you can get a lot done. [chuckles] Uh, and then I fed this PRD to Claude Code, and it, uh, this one, it, like, it, it didn't build it in one shot, but it certainly built the UI in one shot. And so I guess maybe we should just jump into, like, the actual app?

    5. CV

      Yeah, let's do it.

    6. TM

      So Flip Flop, which is my cute little name for it, is, uh, basically designed to capture those fronts and backs that I was talking about. So you have three screens here. You've got a collection screen, where you're gonna create your folders. You've got a capture screen, where you're gonna take your images. And I'll just quickly highlight this part, which is where you kind of have your AI processing options. So I allow people to define a separate prompt for what I call the flip side of the image, the front, and the flop side of the image, the back. And so in this example, I'm gonna show you some photos of my dog now, and, uh, the flop side of the image is gonna have some text on it. So our prompts here are really just designed to get a decent caption from the image and to transcribe any text that we see on the back end. So let's create a new collection. We're gonna call it How I AI. That- that's good enough. There's also an option here to add more context. You know, the AI loves context, and so maybe if you're... You know, you can imagine if you're digitizing an entire collection of, you know, someone's personal letters or someone's, uh, portrait photographs, you would add that kind of thing here. But for now, we're just gonna create a collection, tap into that collection, and capture. So here we go.

    7. CV

      Puppy.

    8. TM

      It's a screen share within a screen share. We're gonna not care about the glare too much. I'm gonna capture the front side of this image of my dog, Tony's, third birthday. I now have the option to add notes, if that's what I wanna do, or I could just add a flop side of the image right here. And when I complete that, it will have, 'cause it's lightning fast, already sent it up to OpenAI for a description and embedded it-- and this is the really crucial thing, 'cause you just saw the first system I had-- embedded it in the image metadata itself. So the flop details have the transcription, "Tony's third birthday," and all of that will show up in the, what we call Exif metadata, which is just the image metadata standard.

    9. CV

      Got it, and just for people that this may have passed by-

    10. TM

      Yeah

    11. CV

      ... instead of simply generating kind of the text description and storing that in a database relative to the original image you took, you actually now have this structured metadata on the image file itself, which again, like, what a pain-

    12. TM

      Oh, a giant-

    13. CV

      -to do-

    14. TM

      ... a giant pain. Yeah.

    15. CV

      To, a pain to do manually. And so now anytime anybody uses one of these images, even if they don't have, um, access to this app, that image is embedded with that metadata.

    16. TM

      A

  11. 29:33–32:20

    How structured file naming improves workflow efficiency

    1. TM

      hundred percent. So you could pull this onto any computer or any app, anything that can read underlying metadata, and it's gonna be able to see that this was Tony's third birthday. And that, so that's structured metadata in the sense that we've now structured the actual information about the image, but the other thing that's really crucial, honestly, is that we've structured the files themselves, right? So you can see they're getting named in a particular way.

    2. CV

      Mm-hmm.

    3. TM

      And so we've moved from, like, camera roll mess to, like, files that are gonna sort in your, in your computer, that you're gonna be able to import cleanly. You're gonna be able to distinguish easily what's the front of the image, what's the back of the image, and that has, I think, been the other unlock. Like, I had two colleagues out in the field a couple weeks ago, and they came back with 1,400 images, and I don't think that's only because they were able to use Flip Flop to capture it, but I think Flip Flop is certainly making the process easier since they've gotten back.
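The structured-naming idea, a collection slug plus a zero-padded index plus a front/back marker so pairs sort together, can be sketched like this. The exact scheme here is a guess for illustration, not Flip Flop's actual convention:

```python
def asset_filename(collection, index, side, ext="jpg"):
    """Build a sortable filename: collection slug, zero-padded index,
    and a side marker ("flip" = front, "flop" = back)."""
    slug = collection.lower().replace(" ", "-")
    return f"{slug}_{index:04d}_{side}.{ext}"

# Two captures from a collection named "How I AI": fronts and backs
# land next to each other and stay in capture order when sorted.
names = [
    asset_filename("How I AI", i, side)
    for i in range(1, 3)
    for side in ("flip", "flop")
]
print(names)
assert names == sorted(names)
```

The zero-padding is what makes the files "sort in your computer": without it, capture 10 would sort before capture 2 in any lexicographic file browser.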

    4. CV

      The, the thing that I wanna call out for folks, maybe a general takeaway here, is these AI models are so good with files-

    5. TM

      Yeah

    6. CV

      ... and code can do a lot of stuff with files, and a lot of the people we talk to, um, you know, markdown, mark- markdown is the file type du jour these days, which is, you know, just like a, a specially formatted text document. But if you start to look at other file types and really understand what can be put in a particular file type, you can actually discover some pretty interesting things you can do with a combination of AI and coding to make those files much more useful for your use case. So this is one of these takeaways where I'm like, "I haven't thought about-... like what can be embedded in an image file or what can be embedded in a video file, and even just having, you know, ChatGPT or one of your general models say, "Hey, I'm working with an image. How can I load it up with as much context and specificity as possible? What's available to me?" And then using that as a jumping-off point for what you do is a pretty interesting use case of AI.

    7. TM

      I didn't even know, like, I'm very familiar with stills' underlying metadata fields, but I didn't really know what was available in audio or what was available in, in, in video files. And y- I just sort of, I go into Cursor and I ask, right? Like, now we have a music workflow, which we're not gonna look at, but, like, where we embed artist, album, kind of like licensing data into any music we consider for a film, and I didn't know that there was a m- metadata field we could just store that in, but of course there is, you know?

    8. CV

      Yeah.

    9. TM

      Uh, somebody thought of this a long time ago.

    10. CV

      Yep. Amazing. Okay, we have one last use case, which, um, Mom, if you're listening, I think you're gonna like this one. My mom's a genealogist-

    11. TM

      Oh, okay!

    12. CV

      ... so, uh, I think she's gonna like this, this use case. But let's show it first, and then I'll call out Mama where I think [chuckles] you can use

  12. 32:20–34:56

    “OCR Party” app for processing historical documents

    1. CV

      it.

    2. TM

      Okay. All right, so you can imagine in our films we work with a lot of documents, and we're not always interested in the entire document. Sometimes, like, we just want to transcribe maybe part of it, maybe, um, we wanna translate and transcribe part of it. Like, take this newspaper document, for instance. Like, maybe the Arkansas State News is the article we're interested in. That's the transcript we want to be searchable. That's what our editor might wanna consider for the film. We can't just, like, put this in Adobe Acrobat and OCR the whole thing. It's, like, it's not gonna work, and even more than that, like, the quality of the image would not work with most OCR engines, you know? So AI is really good at OCR of old documents. It's really good at handwriting. It's pretty good at translation, too. So I built... And we're not gonna get into the building necessarily, but this is, this is one of the few, like, Xcode builds I had to do. So this is a Swift build, a little Mac menu bar app. It's called OCR Party, uh, which stems from the fact that we're just OCR-ing part of the image. You gotta have fun with these things.

    3. CV

      [chuckles]

    4. TM

      And let's see. We're gonna open up that newspaper in OCR Party. We're gonna get, like, a little preview window. So let's say actually what we want is, "Coolidge Seeks Peace in the World." Let's zoom in a little bit. Let's open up our cropping tool. This little thing down here is basically a choice between macOS Vision and, uh, an AI API call.

    5. CV

      Mm.

    6. TM

      And the purpose of that is because sometimes people don't... Sometimes people don't trust AI, you might have heard.

    7. CV

      [chuckles]

    8. TM

      And so I, I built that in as an option, essentially. I would, I would think the AI option gets used more, but nevertheless, now you're gonna select just the part of this article you care about or this paper that you care about, and you can see there's, like, a crease in the paper. There's a weird black mark here, but you can imagine we submit this for OCR. Now we have just that text that we pulled. We're also calling out for our editors, like, where on the page they're gonna be able to find it if they wanna sort of zoom in on it, crop to that particular article. And I can't exactly remember what text we were looking at, but it certainly completed those sentences where there was a black marker.
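The crop step Tim demos, turning a selection drawn on the preview window into a region of the underlying scan before it goes to macOS Vision or an AI API, comes down to mapping normalized coordinates to pixels. A minimal sketch, with the function name and coordinate convention assumed for illustration:

```python
def selection_to_pixels(selection, image_width, image_height):
    """Convert a normalized (0-1) crop selection into pixel coordinates.
    `selection` is (left, top, width, height) as fractions of the preview."""
    left, top, width, height = selection
    x0 = round(left * image_width)
    y0 = round(top * image_height)
    x1 = round((left + width) * image_width)
    y1 = round((top + height) * image_height)
    return (x0, y0, x1, y1)

# A selection covering one article on a 3000x4000 scan of a newspaper page.
box = selection_to_pixels((0.25, 0.10, 0.50, 0.25), 3000, 4000)
print(box)  # (750, 400, 2250, 1400)
```

The resulting pixel box is also what lets the app tell editors "where on the page" the text was found, so they can crop to that article later.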

    9. CV

      Yep.

    10. TM

      Right? So AI was able to kind of infer, to the best of our ability, what that sentence might have said. And, you know, if this ends up in a film, I could guarantee it would get fact-checked later, but for the purposes of gathering documents, thousands of documents, this ability to kind of, like, precisely OCR is, is- it's been a nice little

  13. 34:56–40:34

    The versatility of different app form factors for specific workflows

    1. TM

      unlock for us.

    2. CV

      One thing I also want to make sure people take away from this episode is, we've seen basically three form factors of apps.

    3. TM

      Yeah.

    4. CV

      So yes, they've all used AI, but you've been able to swap between sort of like a Python API service that gets called by another software application or database, a, um, iOS app that, you know, you can run on your phone, and then, like, a little desktop toolbar widget. And what I like- what I love about this moment in AI with, uh, with regards to software engineering is, like, if you have basic software engineering practices and then you know enough to be dangerous, like, yeah, you can, you can vibe code, uh, and, you know, a Swift, Swift app to run on, on your local desktop.

    5. TM

      Just a hyper-specific app.

    6. CV

      Yeah, just-

    7. TM

      You know, like, no one was gonna make me this app.

    8. CV

      No.

    9. TM

      And so the ability to make, like, an extremely specific app that makes a workflow, you know, on my team and my company easier, it's been, it's been an unbelievable moment. [chuckles]

    10. CV

      Yeah, I, I would say the TAM for this app is, like, you. [chuckles]

    11. TM

      Yeah. Yeah, yeah, yeah. I mean, I think I could sell it to, like, two colleagues. [chuckles]

    12. CV

      Well, and then my mom, so what I was gonna tell you-

    13. TM

      Oh, good

    14. CV

      ... is my mom, um, is a genealogist for, uh, the [chuckles] Daughters of the American Revolution, of which I am one. Uh, fun fact on Claire.

    15. TM

      Oh, no way!

    16. CV

      And she does the lineage tracing, and do you know how many times she screenshots something and is like, "Can you read this cursive?" Like, "What in the world-

    17. TM

      Yeah

    18. CV

      ... is this name?"

    19. TM

      Yeah.

    20. CV

      And it's like, you know, one name and a big, a big image, and so I do think AI is... And I'm like, "Yeah, I'm gonna drop this in a ChatGPT, and I'll tell you what I think it says."

    21. TM

      Mm.

    22. CV

      And I think its ability to read handwriting, um, old typefaces, kind of understand the nuances of, of spelling and things like that, are just really, really interesting for these sort of, um, research use cases.

    23. TM

      Yeah. We didn't look at a handwritten doc here, but that is definitely something happening, uh, at our company, like, the ability to read letters that we could not read before, and also just other languages, right? And then we immediately have that text to... You have letters written in some kind of cursive scrawl from the 17th century that is now translated to English and made legible for you.

    24. CV

      ... amazing! Well, we've seen three great use cases. I am sure you are the hero on the team, [chuckles]

    25. TM

      [chuckles]

    26. CV

      uh, for this kind of stuff, 'cause I can imagine, again-

    27. TM

      I think people might be tired of hearing me talk about AI-

    28. CV

      No

    29. TM

      ... but thank you.

    30. CV

      Yeah, but I mean, this is, this is hard stuff. It's tedious work to do. It cr- you know, requires a lot of time, a lot of detail orientation, and I'm sure people love using this information-

  14. 40:34–42:00

    Learning approach and parallels with creative software

    1. CV

      I'm curious, you know, what's your point of view of AI, particularly in the film world? What are you excited about, and where do you think these kind of concerns are really warranted? And then where do you think the most practical applications are?

    2. TM

      I think today, it's, like, sort of where we started at the top. The practical applications are more in, like, tooling than they are in creation, but I do think that, like, the creation's gonna get there. Like, today, I play with, I play with all the generative video models. Like, how can I not?

    3. CV

      Mm.

    4. TM

      They're, they're super fun. Um, they are not, like, at professional-grade quality yet. Like, the amount of time you spend throwing tokens at even the highest-end video models, you're not gonna be able to match your shots that well. You're not gonna be able to match the footage you shot yourself that well, and so I don't think they're there yet, but let... Like, I'll be honest, they're gonna get there. I think that, like, they are still exciting to me, but I would separate a couple things. Like, in the nonfiction world, I think it- I think people should be careful. Like, I think-

    5. CV

      Yeah

    6. TM

      ... we should not be generating archival footage. We should not be trying to fool our viewers into thinking that there was video in 1750, you know? And I think that that's the part that's, like, a little scary.

    7. CV

      Mm.

    8. TM

      And then, of course, there's the displ- like, job displacement aspect of things.

    9. CV

      Yeah.

    10. TM

      I think people are scared. If you film stuff for a living, you're definitely scared that, like, y- that you're gonna be able to just, like, use text to generate that same-

    11. CV

      Yeah

    12. TM

      ... video you used to shoot. So I don't know how to, like... I don't think anybody has, like,

  15. 42:00–44:05

    Perspectives on AI in the film industry

    1. TM

      good answers to that part of it.

    2. CV

      Mm-hmm.

    3. TM

      But my approach has certainly just been, like, jump in and learn the tools. Like, they are, they are gonna be here-

    4. CV

      Yeah

    5. TM

      ... whether we want them to be or not, and, uh, I think that they have a lot of practical benefits today that are less scary.

    6. CV

      Yeah. The best advice I can give to people, and I have, I have, uh... Of all the spaces, and I'll say this honestly, the one where I have the most job-displacement concern is video generation for-

    7. TM

      Yeah

    8. CV

      ... non, um, non-archival, non-documentary-

    9. TM

      Commercial

    10. CV

      ... cases, but commercial use cases.

    11. TM

      Yeah.

    12. CV

      Um, you just, you just see how it could be very applicable, and the best advice that I can give to people in this moment is, the more you learn the tools, the better off you will be. Whether or not, you know, whether or not you love where the tools are taking us as an industry or as a culture, kn- knowledge is power, and so the more you learn and understand, one, you can identify opportunities where it does add value, even in your creative process, and two, you're gonna be differentiated in the market from a job perspective-

    13. TM

      Mm

    14. CV

      ... because you're gonna have a more robust sense of what's available in your industry, and I think that stands for people in your industry. I think it stands for people in my industry, in technology. So I just say, there is no harm in learning this stuff.

    15. TM

      Yeah.... Absolutely. I also think that, like, there's a place in the process for it, which allows you, like, a place to learn without thinking it needs to end up in the final product, right? Like, you can use video models for storyboarding all day. You can maybe prove whether or not that shoot is worth spending that money on. Now you've learned how to use the video models a little bit, and, you know, you haven't necessarily displaced anyone, but you've, like, made your production a little bit more efficient, a little smarter. Maybe you've shot better footage as a result of it, you know?

    16. CV

      Yes, but we're not, we're not generating fake archival footage of, like, Genghis Khan. [chuckles]

    17. TM

      We're not, we are not doing that.

    18. CV

      [chuckles]

    19. TM

      Uh, definitely not doing that. And, like, PBS, which is where most of our films end up, has a lot of guidelines around that, and I think that's a good thing, but it's the other stuff. It's commercial, it's visual effects. Like, a lot of that stuff's gonna get easier, um, and so it's, it's coming one way or another.

  16. 44:05–47:36

    Prompting techniques and troubleshooting AI workflows

    1. CV

      Great. Well, last question, have to ask you, when, you know, you're on your dog walk with ChatGPT doing voice mode, and it's not listening to you or not giving you what you want, what is your personal prompting technique? Especially 'cause you use voice.

    2. TM

      Yeah.

    3. CV

      Like, I'm willing to type things to AI. I don't know if I'd be willing [chuckles] to say them. So what's, what's your technique here?

    4. TM

      It definitely is different when you have to say it out loud. Um, I am, I am super nice to the AI. I, like, can vividly remember the one time I was mean to it. I'm nice to the... I don't know where this is going. I'm gonna be nice to all the models.

    5. CV

      [chuckles]

    6. TM

      What I do is, like, for lack of a better way of describing it, I just start over. Like, I will-- I know that a lot of these things have ways of, like, consolidating the context window now and sort of summarizing, but I will ask for what I call, like, a resume work prompt. I'll be like: "This isn't working. I want to resume work later with another AI dev. Can you give me a prompt with everything they'll need to know?" And typically, what you'll find is that that prompt shows you where it was off, you know? Like, in its summarization of what it was doing, I'll be like, "Oh, see, like, I wasn't asking for that. That's, that's why we were not communicating." And then I'll take that resume work prompt, I'll prune it a little bit, pop it into another chat, and then, you know, you'll find that you wish you hadn't beat your head against the wall with the previous chat for 20 minutes.

    7. CV

      You know, I am also Team Be Polite to Your AI, but then again, like, you hurt the one you love the most, and I've, I've found myself occasionally getting testy. And you know when I stopped being mean to AI is when reasoning really started to show, and I could see it reasoning how upset I was. It was like-

    8. TM

      Oh, it'd be like, "The user is mad at me right now." [chuckles]

    9. CV

      ... "The user is really frustrated with me right now. I need to totally rethink my [chuckles] approach." Go sweet, sweet baby AI, I'm sorry.

    10. TM

      Yeah.

    11. CV

      I apologize.

    12. TM

      Yeah.

    13. CV

      I'm not that mad at you. Okay, so create a, you know, g- return to progress prompt-

    14. TM

      Yeah

    15. CV

      ... really get the summary. Take that to understand if there was some misunderstanding, improve that, and then just start fresh. That's great.

    16. TM

      Yeah.

    17. CV

      Well, Tim, this has been super fun. So much for me to learn. I have tons of ideas, even just for my day-to-day life, about how I could use... I have kids, so I probably have 30,000 images.

    18. TM

      Let me know if your mom wants the OCR Party. I'm happy to-

    19. CV

      I, I will. She'll love it. Okay, Mom, I have gotten you your first vibe-coded app-

    20. TM

      [chuckles]

    21. CV

      ... direct from the podcast source. Tim, where can we find you, and how can we be helpful?

    22. TM

      Yeah. Uh, I'm not that active on social, to be honest, but I am on LinkedIn. You can find me on there. I have a website that is itself a fun vibe-code project, so you can find me at timmcaleer.com. I have a little chatbot there, GP-Tim. You can go chat with him, learn a little bit more about me and my work. Uh, and then other than that, I would say tune in to Florentine Films' upcoming production. We have a series about the American Revolution coming out in November, so on your local PBS station.

    23. CV

      My kids are obsessed with the American Revolution, so everybody watching-

    24. TM

      Sounds like it's in the family. [chuckles]

    25. CV

      It... Yeah, we will, we will be, uh, big fans. Tim, this has been great. Thank you so much, and thanks for joining How I AI.

    26. TM

      Thank you for having me. [upbeat music]

    27. CV

      Thanks so much for watching. If you enjoyed the show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time! [upbeat music]

Episode duration: 47:36


Transcript of episode 9ngbZwA_h00
