How I AI
How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries
EVERY SPOKEN WORD
45 min read · 9,486 words
- 0:00 – 2:23
Introduction to Tim McAleer
- CVClaire Vo
How did you think about what problems there were to solve in AI relative to your job and the people that you work with, and why did you start where you started?
- TMTim McAleer
Post-production is like a technical mess of media management. You have many different file types. You have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. So it ends up being hundreds of hours of footage, tens of thousands of photos. The data management piece, when you're dealing with all that different stuff, is the mess that I have used AI to tackle. My goal was to automate this. For years, this has been manual data entry.
- CVClaire Vo
Automate away toil. That's what we wanna do.
- TMTim McAleer
No one was gonna make me this app, and so the ability to make an extremely specific app that makes a workflow on my team and my company easier, it's been an unbelievable moment.
- CVClaire Vo
[upbeat music] Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we have Tim McAleer, a producer at Ken Burns' Florentine Films, who's responsible for the technology and processes that bring these amazing films to life. Instead of focusing on how AI can create creative for these films, we're actually gonna talk about how Tim uses AI to build software products that make his post-production and research team's lives a lot better. If you're working with images, video, sound, or just a lot of data, this episode is a great one for you. Let's get to it. This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed, and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account, and you've got a system that helps you spend smarter, move faster, and scale with confidence. One in three startups in the US already runs on Brex. You can, too, at brex.com/howiai.
- 2:23 – 4:16
The scale of media management in documentary filmmaking
- CVClaire Vo
Tim, welcome to How I AI. I'm excited to have you here.
- TMTim McAleer
Thank you for having me.
- CVClaire Vo
What I love about what we're gonna talk about today is you work in a very interesting and creative industry, putting out amazing content, and we're gonna talk a little bit about how AI is impacting the creation side of things. But you've actually used AI to smooth out some of the challenges you've had on the production and post-production side of things. So I'm curious, how did you think about what problems there were to solve in AI relative to your job and the people that you work with, and why did you start where you started?
- TMTim McAleer
Yeah. Uh, I think most of the flashiest use cases of AI in, uh, creation or media and entertainment right now are often in, like, generating full video content or images or whatever it is. But post-production specifically is, like, a technical mess of media management. Especially in nonfiction, you have, like, many different file types, right? And you have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. And so, like, the data management piece, when you're dealing with all that different stuff, is the mess that I have used AI to tackle, and I think that the sort of, like, AI as a tool versus AI for generation is even more immediately applicable in our field at the moment.
- CVClaire Vo
Well, and I have a very, you know, very simple, humble little podcast, but even for us, we create a lot of research and-
- TMTim McAleer
Yeah
- CVClaire Vo
... and longer content, and we're editing it down. I'm just curious, with documentaries and nonfiction work, what do you think the ratio is of media captured, researched, and archived to actually publish? 'Cause that will maybe give us a sense of how much of this you have to grapple with to get a good, good piece of content
- 4:16 – 6:02
Building a database system for archival assets
- CVClaire Vo
on the end.
- TMTim McAleer
We have a thing in our industry called a shooting ratio, and so you can imagine in, like, a fiction series or, you know, like a sitcom on air, I don't quite know what those shooting ratios would be, but you're working with a script, and so you're gonna have a slightly lower ratio. In documentary, it can get quite high. Like, I can tell you that we made a series about Muhammad Ali a few years ago. It was an eight-hour show. We gathered 20,000 still images in the database of just stills. I think it was over 100 hours of footage, 'cause he had a lot of fights and that kind of thing, news, news footage. And then we also filmed, I wanna say, like, 35 interviews for the piece. So it ends up being, like, hundreds of hours of footage, tens of thousands of photos, and that's just, like... That's one example of a, you know, a particularly famous individual, but that tends to be what it looks like for our shows.
- CVClaire Vo
So that's what you have to manage, make searchable, make usable by the entire production team.
- TMTim McAleer
Yes.
- CVClaire Vo
And you got inspired by ChatGPT and some of these early AI tools to do some of that. So you wanna hop in and show us what, you know, the first use case is?
- TMTim McAleer
Absolutely. So I'm gonna start by kind of just showing you the, like, end result, uh, before I go right to, like, how I got here. So on any film that we work on, we end up having some kind of database, right? So this is a database where you can see the still images we've gathered. You can see there's a footage section, a music section, anything that might go into the film, and all the kind of stuff you might expect to see, right? Descriptions, tags, a date on the thing, where we got it from. Um, some more technical details are also gonna appear over here. In any event, my goal was to automate this. For years, this has been manual data entry, and so I remember vividly... I'm gonna jump into Cursor now,
- 6:02 – 8:59
Early experiments with AI image description
- TMTim McAleer
but I do remember, like, when I first started doing this, it was ChatGPT. I remember when ChatGPT added image upload, and it was this insane day for us. I was, like, in the office with my colleague, Clark, and we were just, like, throwing images at it and seeing kind of the quality of the output. Like, it was this aha moment where it was like, "Oh, my God, this thing can see," and how could we harness this text generation, right, to use it for our database entry? So I'm gonna simulate that, like, the starting point, and then we'll jump to where we're at today. But essentially, what it looked like at the beginning was we would throw something into GPT, and we would say, like: "Hey, can you describe this?" And it would hallucinate a little bit, but it was so tempting to figure out a way to harness that, that I started essentially, like, writing little Python scripts with ChatGPT, and at that time, it was like VS Code on one monitor and GPT on another. All right, I'm just gonna go ahead and demo what that kinda looked like. I'm gonna speak my prompts, if that's okay.
- CVClaire Vo
Oh, go ahead.
- TMTim McAleer
I, I use this tool called Super Whisper, uh, because it kinda cleans up my off-the-cuff dictation. So I have a, an image here of a nice street in somewhere America, maybe mid-20th century. We're gonna see what kind of description we get from AI. All right. Uh, write me a script that submits the JPEG at the root of this workspace to OpenAI for a description. I want just a general visual description of what we can see in the image. Uh, any API credentials you need are in a text file at the root of the folder. And what we can see here is that, like, everything I just said got funneled through, uh, this app called Super Whisper. So it got funneled through a prompt that itself is cleaning up my, like, messy vibe coding. I think it's clean enough, so we're gonna go ahead and submit it.
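A minimal sketch of the kind of script being dictated here, assuming the OpenAI Python SDK; the model name, prompt text, and function names are illustrative, not his actual script:

```python
import base64
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Base64-encode a local JPEG as a data URL the vision API accepts."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/jpeg;base64,{encoded}"


def describe_image(path: str, client) -> str:
    """Ask a vision-capable model for a general visual description.

    `client` is assumed to be an openai.OpenAI() instance, with the API key
    read from a credentials file or the environment."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Give a general visual description of this image."},
                {"type": "image_url",
                 "image_url": {"url": image_to_data_url(path)}},
            ],
        }],
    )
    return response.choices[0].message.content
```

Usage would be something like `describe_image("main_street.jpg", OpenAI())`, printing the model's caption.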
- CVClaire Vo
And I see you're using Claude 4.5 Sonnet. Is that by choice or by default, or-
- TMTim McAleer
Yeah. That is because I'm on a podcast right now, to be honest. [chuckles] Like, I think this is a very easy task for AI. I could keep it on auto for this, right?
- CVClaire Vo
Yeah.
- TMTim McAleer
I will say, I switch between various Claude models, depending upon the, like, difficulty.
- CVClaire Vo
Mm-hmm.
- TMTim McAleer
And I do try and be cheap and stay on auto if I know that I'm asking for easy stuff, you know?
- CVClaire Vo
Okay, so you're just-- You're, you're giving us a little bit of quality control here.
- TMTim McAleer
Yeah, I don't want it to mess up.
- CVClaire Vo
Okay.
- TMTim McAleer
We're live on air, you know?
- CVClaire Vo
Yeah.
- TMTim McAleer
[chuckles] All right. So it's telling me that I need to install some requirements. My guess is I have those requirements. It's got a submit-image script. Let's see what it did. Here we go. It's running, submitting this image to OpenAI for analysis. What kind of description will we get? There we go. "This image depicts a small rural main street from what appears to be the mid-20th century." We had guessed that. "There are a series of wooden storefronts, each with signs indicating there are local
- 8:59 – 12:54
Adding metadata extraction to improve accuracy
- TMTim McAleer
businesses." Okay, so this is great, and this is kind of what we were getting in those early days of GPT image upload. But the problem here is, like, you're making a film, you wanna know what rural main street, what town are we in? What is the exact year? And you can't really just go with this kind of generic description. So a lot of times, we happen to know that images come with embedded metadata. And, you know, if you're using your iPhone camera today, you know that maybe there's some metadata, like GPS data, that kind of stuff, but archival images will often come with whatever notes people have scribbled onto them over time. And so I'm gonna now... I'm gonna, I'm gonna iterate on this one time and say: I want you to add a step to this script. I wanna scrape any available metadata from the file first and append that to the prompt. The goal here is that we are using any available metadata as, like, a source of truth for what this image actually is and not just guessing.
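That metadata step might look something like this, assuming Pillow for Exif reading; archival files often carry other metadata blocks (IPTC, XMP) that this sketch ignores, and the function names are hypothetical:

```python
from PIL import Image, ExifTags


def scrape_exif(path: str) -> dict:
    """Read whatever Exif tags are embedded in the file, keyed by tag name."""
    exif = Image.open(path).getexif()
    return {ExifTags.TAGS.get(tag, str(tag)): value for tag, value in exif.items()}


def build_prompt(metadata: dict) -> str:
    """Append known-true metadata to the description prompt as a source of
    truth, so the model relies on facts instead of guessing."""
    lines = "\n".join(f"{key}: {value}" for key, value in sorted(metadata.items()))
    return (
        "Describe this image. Treat the following embedded metadata as "
        "ground truth rather than guessing:\n" + lines
    )
```

The resulting prompt string would then be sent alongside the image, as in the earlier script.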
- CVClaire Vo
And so just repeating that while this is running, what you're saying is-
- TMTim McAleer
Yeah
- CVClaire Vo
... for this particular use case, you're working with a set of archival photos from sources that have embedded, uh, probably additional layers of metadata into it-
- TMTim McAleer
Mm
- CVClaire Vo
... that you can read, that give more information, which is different than, you know, scanning something or taking something off your, off your phone, which I think we're gonna look at a bit later.
- TMTim McAleer
Yeah.
- CVClaire Vo
And so you're trying to harness the structured metadata off this file.
- TMTim McAleer
Mm-hmm.
- CVClaire Vo
Which, if you go back to the tab that shows the, the, the image, we can't see-
- TMTim McAleer
Yeah
- CVClaire Vo
... with our, with our human eyes.
- TMTim McAleer
Yeah, of course.
- CVClaire Vo
But our, our agent friends can read with its robot brain. Um, and you're using that, that information to then upgrade this script that is gonna do all this AI analysis for you.
- TMTim McAleer
That's exactly right. And so in this case, it's gonna be embedded metadata. I, you know, I happen to know this is an image from Library of Congress. There's gonna be some metadata on it, but it could also be something on the web. Like, where this eventually goes to is, like, okay, I know there's, there's a website with information, may not be in the file, but, hey, how about you go and scrape the web? Gather anything you can know about this, because ultimately, like, this is a journalistic endeavor. We-- The, the, these shows get fact-checked. We want everything going into our database to be, you know, true and verifiable information. All right, so let's see how it did when it added that metadata check. So let's see. We can see in the console it did a little bit of a scrape. It looks messy as hell, but somewhere in here, we can see stuff like, yeah, archival information. And it's now gonna use that. And what we've generally found is that when you add those guardrails, when you give it information you know to be true about the image, it, it relies on that so much more than just what it can see. Like, you know, AI really wants to perform for us. It really [chuckles] wants to do a good job. And so when you give it the tools and information to kind of write a better description, it's gonna, it's gonna be able to get there.
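A web-scrape step like the one he describes could be sketched with the standard library's HTML parser; real archive pages all differ, so the fields pulled here are purely illustrative:

```python
from html.parser import HTMLParser


class ArchivePageScraper(HTMLParser):
    """Pull the page <title> and any <meta name="date"> value from archive
    page HTML. The specific tags scraped are illustrative; a real scraper
    would be tailored to each archive's page layout."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.facts = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") == "date":
            self.facts["date"] = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.facts["title"] = data.strip()


def scrape_page(html: str) -> dict:
    """Return whatever verifiable facts we could pull from the page."""
    scraper = ArchivePageScraper()
    scraper.feed(html)
    return scraper.facts
```

The returned dict would be appended to the description prompt the same way embedded metadata is.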
- CVClaire Vo
And I wanna call out some things. So we talked about using the Anthropic Claude models in particular for the actual coding of the script.
- TMTim McAleer
Yeah.
- CVClaire Vo
But you're relying on the OpenAI AI models for the image analysis. Why OpenAI versus any other models, like, stick with the one that you love, or, um, it was the, the first one that did a good job for you, or do you feel like it's-
- TMTim McAleer
Yeah
- CVClaire Vo
... particularly good at image analysis? I'm curious why you select those different models for different use cases.
- TMTim McAleer
Yeah, it's mostly that it's the first one. Like, they were the first one who had a-- they had a vision preview on their API. They did it before Claude, and, like-... I had built up enough of an infrastructure using that API call that-
- CVClaire Vo
Yeah
- TMTim McAleer
... it was like the switching costs were too much, you know?
- CVClaire Vo
Yep.
- TMTim McAleer
All right, so let's see what we got this time.
- CVClaire Vo
It's much more detailed.
- TMTim McAleer
It is. It's much more detailed. So the image shows a street scene on the main street of Cascade, Idaho. There we go. We know where it is now. "Captured in 1941 by photographer Russell Lee." We've got photo credits.
- 12:54 – 15:16
Scaling from single scripts to a complete REST API
- TMTim McAleer
All right, so this is a great example of, like, you add the guardrails, and you're gonna get more detail, but you're also just gonna get facts, right? Before... I don't know if it's still up here somewhere. Yeah, before, it was a small, rural main street, now it is the main street of Cascade, Idaho. And so we can imagine this getting duplicated in various ways, right? This image has embedded metadata. Maybe it's a website that we're going and gathering it from. But effectively, like, this is where it all started. It started with a single Python script that I was running on my computer, and I was like, "This is awesome." My database software is, like, it's advanced enough to call external scripts. You can kind of use any database to do this, you know, Airtable, whatever, but you just need something that has an API-
- CVClaire Vo
Mm-hmm
- TMTim McAleer
... and that can call an external script or web hook or something. So this is where we started, and now I'm gonna switch my screen share to a remote machine, like a little Mac Mini that I have running in my office. You know, at this moment, it's a more complex Cursor workspace you can see. Uh, maybe I'll bop into the rules. Basically, what this is, is a REST API, so that every image file, video file, music file, anything that ends up in that database that we looked at at the beginning, pings off of this REST API for all kinds of different, like, metadata tasks. If I pop into the Jobs folder here for a second, we could zero in on, like, basically what we were just doing, but the current iteration of it. So I call it Auto Log, 'cause the process of writing this in for years, the manual data entry, is called logging, so it's not the cleverest name, but, you know, it fits. And, you know, you got a five-step process here. Basically, first, we're gonna gather the info, meaning, like, file specs. You know, how big the image is. Is it a JPEG? Is it a TIFF? We're gonna copy the file to our server. We're gonna name it our ID number. We're going to parse it for metadata. Is there any metadata? If there is, great, but either way, we're gonna look for more information on the web in this step four here, scrape URL. And then once we know everything we could possibly know about that image, we're gonna generate a description for it. And when you imagine how this might work for video, well, video is just 24 images in a second, plus some audio, and so basically, this just gets scaled up to deal with video files, too.
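The five-step Auto Log job he walks through could be sketched as a pipeline over a record dict; steps three through five are stubbed here, and all names are hypothetical:

```python
import shutil
from pathlib import Path


def gather_info(record: dict) -> dict:
    """Step 1: file specs (format and size, e.g. JPEG vs. TIFF)."""
    src = Path(record["source"])
    record["format"] = src.suffix.lstrip(".").upper()
    record["size_bytes"] = src.stat().st_size
    return record


def copy_to_server(record: dict, server_dir: str) -> dict:
    """Step 2: copy the file to the server, renamed to the asset's ID number."""
    dest = Path(server_dir) / f"{record['id']}{Path(record['source']).suffix}"
    shutil.copy2(record["source"], dest)
    record["server_path"] = str(dest)
    return record


def parse_metadata(record: dict) -> dict:
    """Step 3: pull embedded metadata, if any (stubbed in this sketch)."""
    record.setdefault("metadata", {})
    return record


def scrape_url(record: dict) -> dict:
    """Step 4: look for more information on the web (stubbed in this sketch)."""
    record.setdefault("web_facts", {})
    return record


def generate_description(record: dict) -> dict:
    """Step 5: with everything known, ask a model for a description (stubbed)."""
    record["description"] = f"{record['format']} asset {record['id']}"
    return record


def auto_log(source: str, asset_id: str, server_dir: str) -> dict:
    """Run the five steps in order and return the finished record."""
    record = {"source": source, "id": asset_id}
    record = gather_info(record)
    record = copy_to_server(record, server_dir)
    record = parse_metadata(record)
    record = scrape_url(record)
    return generate_description(record)
```

A REST endpoint handling a database ping would call `auto_log` per asset and write the record back.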
- 15:16 – 19:10
Processing video with frame sampling and audio transcription
- CVClaire Vo
Are you using the same model for video files? Are you taking them, extracting the stills, and putting them through OpenAI, or are you using a different model?
- TMTim McAleer
I use a different model for... So the video files require, like, two levels. Most video AI models out there seem to do, uh, basically some version of frame sampling, so it could be extremely expensive if you were sending all 24 images every second to an API, right? So I pull at five-second intervals, 'cause I'm cheap. Some others maybe pull in a smarter way, maybe at, like, lighting changes or something like that. Like, there's different ways of thinking about the frame sampling. So for the frame captions themselves, I will use a cheap model. I'll use, like, GPT-5 Nano. But then for the... And I can go in and show you a prompt here, which maybe illustrates this. I have frame prompts, which basically ask for just, like, a caption of an individual still image extracted from video. But then I have a larger parent prompt. You can see that my prompts have gotten slightly more sophisticated- [chuckles]
- CVClaire Vo
... over time.
- TMTim McAleer
Um, basically, what this does is it takes every single frame that we've extracted from a video file, plus any of the audio we've transcribed from that video file, packages it up into this elaborate prompt, and sends it to a reasoning model.
- CVClaire Vo
Mm.
- TMTim McAleer
And the purpose of that is to say, like, "These are all the video events that we have observed in this video. Here is, like, a massive text file of data. Tell me what you think is happening in the video."
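Packaging the sampled frames and transcript chunks into one chronological prompt for a reasoning model might look like this; the event format and header text are assumptions, not his actual prompt:

```python
def build_video_event_log(frame_captions, audio_chunks):
    """Merge per-frame captions and timed transcript chunks into one
    chronological event log to hand a reasoning model.

    frame_captions: list of (seconds, caption) pairs, sampled every ~5 s
    audio_chunks:   list of (seconds, transcript text) pairs
    """
    events = [(t, f"[frame @ {t:.0f}s] {text}") for t, text in frame_captions]
    events += [(t, f"[audio @ {t:.0f}s] {text}") for t, text in audio_chunks]
    events.sort(key=lambda event: event[0])  # interleave by timestamp
    header = ("These are all the video events we observed. "
              "Tell me what you think is happening in the video.\n")
    return header + "\n".join(line for _, line in events)
```

The resulting text block is what gets sent up to the reasoning model as the "parent" prompt.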
- CVClaire Vo
Got it.
- TMTim McAleer
Yeah.
- CVClaire Vo
Yeah. I, I, you know, maybe, maybe tip from one of our how- other How I AI guests, but I've found that the Gemini, um-
- TMTim McAleer
Yeah
- CVClaire Vo
... the Gemini models are quite good with video. It's actually what we use to do our podcast raw recording to, uh, both highlight stills and a blog post that I put out. I process them through the, the Gemini models and have had a lot of success with those ones.
- TMTim McAleer
And it just pulls out, like, the stills that might be, um-
- CVClaire Vo
It just... It automatically pulls interesting stills.
- TMTim McAleer
Mm.
- CVClaire Vo
It actually gives me interesting stills, plus or minus five seconds, because sometimes the guest and I are looking ridiculous, so-
- TMTim McAleer
Yeah, yeah, yeah, of course.
- CVClaire Vo
[chuckles] Back face. So tip to anybody out there with video who hasn't tried the Gemini models, I, I find those particularly good for this use case.
- TMTim McAleer
You might have just, you know, added something to our-
- CVClaire Vo
There you go
- TMTim McAleer
... little roadmap here. [chuckles]
- CVClaire Vo
Well, um, and so- and then I'm curious about the audio side of things. So I kinda, you know, I've-
- TMTim McAleer
Mm
- CVClaire Vo
... I've played with the Gemini models for video. The still makes tons of sense to me. Tell us a little bit about the audio side of things.
- TMTim McAleer
So the audio is also... I, and now I feel like I'm an OpenAI shill. Everything I'm using is OpenAI, and I think, except for the coding, which is interesting, but I think it's just habit. I use Whisper for audio. So, like, Whisper's an incredible open source model for speech-to-text detection. Even the, like, medium-sized model does a pretty good job. And what I do is... And I can pop back into the database software maybe to, like, illustrate this. What I do is I extract... You can see, like, frames pulled every five seconds.
- CVClaire Vo
Mm-hmm.
- TMTim McAleer
And there's a caption associated with each frame, and then there's... This is a shot of an alligator in a swamp, so he doesn't have a- any audio. He wasn't talking.
- CVClaire Vo
[chuckles]
- TMTim McAleer
But I basically pull audio at five-second increments, so that when we send those, like, video events up to the reasoning model, we are sending a full transcript, but we're sending it, like, kinda pegged to the moment in the video that it happened... if that makes sense.
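Pegging Whisper output to five-second increments could work roughly like this; the segment shape matches what the open-source openai-whisper package returns from `transcribe()`, and the bucketing scheme is an assumption:

```python
def bucket_transcript(segments, interval=5.0):
    """Group Whisper segments into fixed five-second buckets keyed by start
    time, so each transcript chunk can be pegged to the matching sampled frame.

    segments: dicts like {"start": 3.2, "end": 4.8, "text": "..."}, the shape
    openai-whisper returns in transcribe(path)["segments"].
    """
    buckets = {}
    for seg in segments:
        key = int(seg["start"] // interval) * interval
        buckets.setdefault(key, []).append(seg["text"].strip())
    return {k: " ".join(v) for k, v in sorted(buckets.items())}


# Usage (requires the whisper package and downloaded model weights):
# import whisper
# result = whisper.load_model("medium").transcribe("clip.mov")
# chunks = bucket_transcript(result["segments"])
```

Each bucket's text can then be paired with the frame caption at the same timestamp when building the video event log.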
- CVClaire Vo
Yep.
- TMTim McAleer
So the transcription is all happening, you know, on my back end over here. Um, everything, like, I think I could probably open up the console and see, like... There we go. Like, someone just sent a, a job through not that long ago. Like, I can kind of come in here and see what my colleagues are doing as they ping my API all day long.
- 19:10 – 21:22
Implementing vector embeddings for semantic search
- CVClaire Vo
Great. And so you're pairing a snapshot image every five seconds from a video, the five-second transcript of the audio, speech-to-text via Whisper-
- TMTim McAleer
Mm-hmm
- CVClaire Vo
... metadata, if you have it, parsing that all together, and then getting a very robust description and analysis of the content that you have available in, back in this tool that you're using to archive, log, manage all, all your assets.
- TMTim McAleer
Yeah. And like I said, that tool could be kind of agnostic. Like, you could do it in a Google Sheet, if that's, you know, if that's what you like.
- CVClaire Vo
Yeah.
- TMTim McAleer
But, um, I like this. We've been using it for a while. Everything we just talked about is how we kind of get to like metadata that we can read, right? Like, generative metadata that is, A, we know it's accurate because it's kind of been put on these guardrails by our metadata extraction steps, and then also it, it provides this, like, nice visual for us. We can see what this thing is at a glance. But the next step of this, now that you have this, like, API running in the background, is you can generate something that maybe I can't read, but the AI can read pretty well-
- CVClaire Vo
Mm-hmm
- TMTim McAleer
... which is vector embeddings. So I'll jump back to stills for this, 'cause I think it's maybe an easier illustration of it. Every asset in our database gets put through two modes of embedding. So we'll send the thumbnail through and run it against an open source model. I use CLIP for this, and I'll generate an embedding off of that, and then we'll send the description through. Um, I use, again, an OpenAI text model for this, and get an embedding for that, and then we'll fuse them. And the purpose of that is that now we have, like, the ability to discover things semantically. Like, prior to this, and I think in a lot of film production today, you're working with exact text search, you know?
- CVClaire Vo
Yeah.
- TMTim McAleer
Like, if that description says dog, but, you know, somebody wrote in puppy, you're not finding that image. And so this has been, like, kind of the most exciting part of it, not necessarily where I knew it was going when it started.
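One plausible way to fuse the two embeddings (his exact recipe isn't specified): L2-normalize each vector, weight it, and concatenate. A minimal pure-Python sketch:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so image and text parts are comparable."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def fuse(image_vec, text_vec, image_weight=0.5):
    """Fuse a CLIP image embedding with a text-description embedding by
    weighted concatenation of the normalized vectors. The weighting and
    concatenation scheme is one plausible choice, not his exact recipe."""
    image_part = [image_weight * x for x in l2_normalize(image_vec)]
    text_part = [(1.0 - image_weight) * x for x in l2_normalize(text_vec)]
    return image_part + text_part


def cosine(a, b):
    """Similarity measure used for semantic search over the fused vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))
```

Semantic search then becomes "embed the query, rank assets by cosine similarity," which is how "puppy" can surface an image whose description only says "dog."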
- CVClaire Vo
Yeah.
- TMTim McAleer
Like, I was just excited to generate a description, right? But now the ability to discover semantically is, I think, you know, the most, the most, uh, robust part of the system.
- 21:22 – 24:21
How AI frees up researchers to focus on content discovery
- CVClaire Vo
So what I love about this... I mean, a, a couple things is, one, you've really pushed every step of the way. You know, you could have stopped at, like, we got good descriptions, or we got, like, the structured metadata out-
- TMTim McAleer
Yeah
- CVClaire Vo
... and now I have a script that runs it. You could have stopped at images only, but you took it to video, and video and audio. You could have stopped at structured data only, but you went to embeddings to get semantic search. So I love just the breadth of applicability of the AI in this process. But what I probably love more is I doubt this was anybody's favorite part of their job. Like, I doubt it was anybody's-
- TMTim McAleer
[chuckles]
- CVClaire Vo
... favorite part of their job to be like, "Mm, I'm gonna go read some Library of Congress metadata."
- TMTim McAleer
It used to be my job, so I can tell you firsthand, not my favorite part. And it's also, like, I think the, the best argument I have for all the work I've done creating this system is that, like, the same people who used to write this data were the ones who are responsible for doing the research.
- CVClaire Vo
Yeah.
- TMTim McAleer
So you've now freed them up to just look more, right?
- CVClaire Vo
Yeah.
- TMTim McAleer
Like, maybe now we could gather 25,000 still images for the Muhammad Ali project, 'cause you have that much more time. You're not just, like, copy and pasting stuff off a website to put it in this form, you know?
- CVClaire Vo
Well, and you probably get to select from this big archive of data better assets to use in your content because they're more discoverable, because you have more confidence in the source and the content of, of that data. So I bet it uplevels, at the end of the day, the quality at, at the end-
- TMTim McAleer
Yeah
- CVClaire Vo
... um, because you have just much more data to work off of.
- TMTim McAleer
100%. I mean, like, a real quick example of that, too, is, like, I'm gonna use Abe Lincoln here, which is maybe not the best- [chuckles]
- CVClaire Vo
[chuckles]
- TMTim McAleer
... use, use of this image. But embeddings enable us to find things in ways we never would've thought to find them before. So, like, I have a button down here where when I click it, what it basically is gonna do is a reverse image search within our own collection.
- CVClaire Vo
Mm-hmm.
- TMTim McAleer
So if I li- if I'm an editor, and I like an image, and this is gonna take a while, 'cause I'm not on site, but if I like an image, I can click the Find Similar button, and it's just gonna go and find every image that kind of has that vibe.
- CVClaire Vo
Yeah.
- TMTim McAleer
You can see here we have a duplicate of this one.
- CVClaire Vo
Just-
- TMTim McAleer
But then there you go. It recognized the man, and it started pulling in other portraits.
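The Find Similar button reduces to a nearest-neighbor lookup by cosine similarity over stored embeddings; the `library` dict here is a stand-in for the database's vector column, and the names are illustrative:

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def find_similar(query_embedding, library, top_k=5):
    """Reverse image search within the collection: rank every stored asset's
    fused embedding by similarity to the query image's embedding."""
    ranked = sorted(library.items(),
                    key=lambda item: cosine(query_embedding, item[1]),
                    reverse=True)
    return [asset_id for asset_id, _ in ranked[:top_k]]
```

At collection scale you would swap the linear scan for a vector index, but the ranking logic is the same.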
- CVClaire Vo
[upbeat music] This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed, and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account, and you've got a system that helps you spend smarter, move faster, and scale with confidence. One in three startups in the US already runs on Brex. You can, too, at brex.com/howiai.
- 24:21 – 29:33
Demo of “Flip Flop” iOS app for field research
- CVClaire Vo
I love this. Okay, so this is more of your archival and footage data-
- TMTim McAleer
Yeah
- CVClaire Vo
... but you capture a lot of stuff in the field, where people are not sitting in front of Cursor or their desktop and looking through these assets, and I know that you use some vibe coding and a creative approach to get more information about those assets. Could you walk us through that?
- TMTim McAleer
Yeah. So the next use case is an app that I developed for archival research in the field. So I think that we really pride ourselves on, like, turning over every rock, on not just relying on what's digitized and available online, and going and visiting physical archives. And so, um, the process of visiting a physical archive-... is basically you have a bunch of folders, um, that you pull ahead of time. You arrive there, and your goal is just to snap, like, low-resolution iPhone snaps of everything you can possibly get. And so you're snapping the front of the image, and you're snapping the back of the image. Because the back is typically where there's gonna be, like, a scrawled description or maybe, like, an accession number, an ID number, that the archive has added themselves. And so this process used to look like you show up at the archive, you take iPhone snaps for two days, you get back to the office, you have the messiest camera roll you've ever had. You cannot actually pair your fronts to your backs because somehow it got out of order along the way. And so the goal was basically to make that process, like, a little better. So I vibe coded this iOS app to deal with this problem, and I tend to just, like, speak in screens. Like, maybe it's because I'm a visual person; the way I deal with it is I just think like: Okay, I see a screen that does this and a screen that does this. I imagine a button that does this. And the purpose of this was basically, like, I want people to be able to create collections for each folder they're capturing. I want them to be able to snap a front and a back, like, the flip side of the image, so that they can easily associate those, so the file names associate them. And I wanna immediately transcribe any information on the back and embed it into the original image. So now I have this app called Flip Flop.
I asked ChatGPT at the end of my dog walk to generate some kind of specs doc or requirements doc. It pretty much does it in one go. If you chat with it for 30 minutes, you know, you can get a lot done. [chuckles] Uh, and then I fed this PRD to Claude Code, and, uh, this one, it didn't build it in one shot, but it certainly built the UI in one shot. And so I guess maybe we should just jump into, like, the actual app?
- CVClaire Vo
Yeah, let's do it.
- TMTim McAleer
So Flip Flop, which is my cute little name for it, is, uh, basically designed to capture those fronts and backs that I was talking about. So you have three screens here. You've got a collection screen, where you're gonna create your folders. You've got a capture screen, where you're gonna take your images. And I'll just quickly highlight this part, which is where you kind of have your AI processing options. So I allow people to define a separate prompt for what I call the flip side of the image, the front, and the flop side of the image, the back. And so in this example, I'm gonna show you some photos of my dog now, and, uh, the flop side of the image is gonna have some text on it. So our prompts here are really just designed to get a decent caption from the image and to transcribe any text that we see on the back end. So let's create a new collection. We're gonna call it How I AI. That- that's good enough. There's also an option here to add more context. You know, the AI loves context, and so maybe if you're... You know, you can imagine if you're digitizing an entire collection of, you know, someone's personal letters or someone's, uh, portrait photographs, you would add that kind of thing here. But for now, we're just gonna create a collection, tap into that collection, and capture. So here we go.
- CVClaire Vo
Puppy.
- TMTim McAleer
It's a screen share within a screen share. We're gonna not care about the glare too much. I'm gonna capture the front side of this image of my dog, Tony's, third birthday. I now have the option to add notes, if that's what I wanna do, or I could just add a flop side of the image right here. And when I complete that, it will have, 'cause it's lightning fast, already sent it up to OpenAI for a description and embedded it-- and this is the really crucial thing, 'cause you just saw the first system I had-- embedded it in the image metadata itself. So the flop details have the transcription, "Tony's third birthday," and all of that will show up in the, what we call Exif metadata, which is just the image metadata standard.
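The Exif embedding Tim describes here is straightforward to do in code. A minimal sketch, assuming Pillow is installed; the function names and the choice of the standard `ImageDescription` tag are my illustration, not necessarily how Flip Flop does it:

```python
# Embed an AI-generated caption/transcription into a JPEG's Exif metadata,
# so the text travels with the image file itself rather than a database.
from io import BytesIO

from PIL import Image

IMAGE_DESCRIPTION = 0x010E  # standard Exif tag for a free-text description


def embed_caption(jpeg_bytes: bytes, caption: str) -> bytes:
    """Return a copy of the JPEG with the caption written into its Exif."""
    img = Image.open(BytesIO(jpeg_bytes))
    exif = img.getexif()
    exif[IMAGE_DESCRIPTION] = caption
    out = BytesIO()
    img.save(out, format="JPEG", exif=exif)
    return out.getvalue()


def read_caption(jpeg_bytes: bytes) -> str:
    """Read the description back out, as any Exif-aware tool would."""
    return Image.open(BytesIO(jpeg_bytes)).getexif().get(IMAGE_DESCRIPTION, "")
```

Anything that reads Exif will surface the description, which is exactly what makes the metadata portable across apps.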
- CVClaire Vo
Got it, and just for people that may have passed that by-
- TMTim McAleer
Yeah
- CVClaire Vo
... instead of simply generating kind of the text description and storing that in a database relative to the original image you took, you actually now have this structured metadata on the image file itself, which again, like, what a pain-
- TMTim McAleer
Oh, a giant-
- CVClaire Vo
... to do-
- TMTim McAleer
... a giant pain. Yeah.
- CVClaire Vo
To, a pain to do manually. And so now anytime anybody uses one of these images, even if they don't have, um, access to this app, that image is embedded with that metadata.
- TMTim McAleer
A
- 29:33 – 32:20
How structured file naming improves workflow efficiency
- TMTim McAleer
hundred percent. So you could pull this onto any computer or any app, anything that can read underlying metadata, and it's gonna be able to see that this was Tony's third birthday. And that, so that's structured metadata in the sense that we've now structured the actual information about the image, but the other thing that's really crucial, honestly, is that we've structured the files themselves, right? So you can see they're getting named in a particular way.
- CVClaire Vo
Mm-hmm.
- TMTim McAleer
And so we've moved from, like, camera roll mess to, like, files that are gonna sort in your, in your computer, that you're gonna be able to import cleanly. You're gonna be able to distinguish easily what's the front of the image, what's the back of the image, and that has, I think, been the other unlock. Like, I had two colleagues out in the field a couple weeks ago, and they came back with 1,400 images, and I don't think that's only because they were able to use Flip Flop to capture it, but I think Flip Flop is certainly making the process easier since they've gotten back.
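The structured naming Tim mentions can be as simple as a zero-padded stem per capture plus a front/back suffix, so pairs sort adjacently on any computer. A small sketch; this exact scheme is my illustration, not Flip Flop's actual convention:

```python
def pair_filenames(collection: str, index: int) -> tuple[str, str]:
    """Zero-padded stem keeps files in capture order; the suffix pairs
    each front with its back so they never drift apart in a file browser."""
    stem = f"{collection}_{index:04d}"
    return f"{stem}_front.jpg", f"{stem}_back.jpg"
```

For example, `pair_filenames("HowIAI", 7)` yields `HowIAI_0007_front.jpg` and `HowIAI_0007_back.jpg`, which sort next to each other and import cleanly.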
- CVClaire Vo
The, the thing that I wanna call out for folks, maybe a general takeaway here, is these AI models are so good with files-
- TMTim McAleer
Yeah
- CVClaire Vo
... and code can do a lot of stuff with files, and a lot of the people we talk to, um, you know, markdown is the file type du jour these days, which is, you know, just, like, a specially formatted text document. But if you start to look at other file types and really understand what can be put in a particular file type, you can actually discover some pretty interesting things you can do with a combination of AI and coding to make those files much more useful for your use case. So this is one of these takeaways where I'm like, "I haven't thought about, like, what can be embedded in an image file or what can be embedded in a video file." And even just going to, you know, ChatGPT or one of your general models and saying, "Hey, I'm working with an image. How can I load it up with as much context and specificity as possible? What's available to me?" and then using that as a jumping-off point for what you do is a pretty interesting use case of AI.
- TMTim McAleer
I didn't even know... Like, I'm very familiar with stills' underlying metadata fields, but I didn't really know what was available in audio or what was available in video files. And I just sort of, I go into Cursor and I ask, right? Like, now we have a music workflow, which we're not gonna look at, but, like, where we embed artist, album, kind of like licensing data into any music we consider for a film, and I didn't know that there was a metadata field we could just store that in, but of course there is, you know?
- CVClaire Vo
Yeah.
- TMTim McAleer
Uh, somebody thought of this a long time ago.
- CVClaire Vo
Yep. Amazing. Okay, we have one last use case, which, um, Mom, if you're listening, I think you're gonna like this one. My mom's a genealogist-
- TMTim McAleer
Oh, okay!
- CVClaire Vo
... so, uh, I think she's gonna like this, this use case. But let's show it first, and then I'll call out Mama where I think [chuckles] you can use
- 32:20 – 34:56
“OCR Party” app for processing historical documents
- CVClaire Vo
it.
- TMTim McAleer
Okay. All right, so you can imagine in our films we work with a lot of documents, and we're not always interested in the entire document. Sometimes, like, we just want to transcribe maybe part of it, maybe, um, we wanna translate and transcribe part of it. Like, take this newspaper document, for instance. Like, maybe the Arkansas State News is the article we're interested in. That's the transcript we want to be searchable. That's what our editor might wanna consider for the film. We can't just, like, put this in Adobe Acrobat and OCR the whole thing. It's, like, it's not gonna work, and even more than that, like, the quality of the image would not work with most OCR engines, you know? So AI is really good at OCR of old documents. It's really good at handwriting. It's pretty good at translation, too. So I built... And we're not gonna get into the building necessarily, but this is, this is one of the few, like, Xcode builds I had to do. So this is a Swift build, a little Mac menu bar app. It's called OCR Party, uh, which stems from the fact that we're just OCR-ing part of the image. You gotta have fun with these things.
- CVClaire Vo
[chuckles]
- TMTim McAleer
And let's see. We're gonna open up that newspaper in OCR Party. We're gonna get, like, a little preview window. So let's say actually what we want is, "Coolidge Seeks Peace in the World." Let's zoom in a little bit. Let's open up our cropping tool. This little thing down here is basically a choice between macOS Vision and, uh, an AI API call.
- CVClaire Vo
Mm.
- TMTim McAleer
And the purpose of that is because sometimes people don't... Sometimes people don't trust AI, you might have heard.
- CVClaire Vo
[chuckles]
- TMTim McAleer
And so I, I built that in as an option, essentially. I would, I would think the AI option gets used more, but nevertheless, now you're gonna select just the part of this article you care about or this paper that you care about, and you can see there's, like, a crease in the paper. There's a weird black mark here, but you can imagine we submit this for OCR. Now we have just that text that we pulled. We're also calling out for our editors, like, where on the page they're gonna be able to find it if they wanna sort of zoom in on it, crop to that particular article. And I can't exactly remember what text we were looking at, but it certainly completed those sentences where there was a black marker.
- CVClaire Vo
Yep.
- TMTim McAleer
Right? So AI was able to kind of infer, to the best of our ability, what that sentence might have said. And, you know, if this ends up in a film, I could guarantee it would get fact-checked later, but for the purposes of gathering documents, thousands of documents, this ability to kind of, like, precisely OCR is, is- it's been a nice little
- 34:56 – 40:34
The versatility of different app form factors for specific workflows
- TMTim McAleer
unlock for us.
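The crop-then-OCR flow Tim just demoed can be sketched roughly like this, assuming Pillow for the crop and an OpenAI-style vision call for the transcription. The prompt wording, model name, and crop box here are my placeholders, not OCR Party's actual values:

```python
# Crop just the article you care about out of a scanned page, then build a
# vision-model request that asks for a transcription of only that region.
from base64 import b64encode
from io import BytesIO

from PIL import Image


def crop_region(jpeg_bytes: bytes, box: tuple[int, int, int, int]) -> bytes:
    """Crop (left, upper, right, lower) out of a scanned page image."""
    region = Image.open(BytesIO(jpeg_bytes)).crop(box)
    out = BytesIO()
    region.save(out, format="JPEG")
    return out.getvalue()


def ocr_request_payload(region_jpeg: bytes) -> dict:
    """Build a chat-completions-style payload with the crop as a data URI."""
    data_uri = "data:image/jpeg;base64," + b64encode(region_jpeg).decode()
    return {
        "model": "gpt-4o",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe the text in this newspaper clipping. "
                         "If a crease or mark obscures a word, infer it and "
                         "flag the inference in [brackets]."},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
    }
```

Asking the model to bracket inferred words is one way to make the "it completed the sentence under the black mark" behavior visible, so a fact-checker knows exactly what to verify later.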
- CVClaire Vo
One thing I also want to make sure people take away from this episode is, we've seen basically three form factors of apps.
- TMTim McAleer
Yeah.
- CVClaire Vo
So yes, they've all used AI, but you've been able to swap between sort of, like, a Python API service that gets called by another software application or database, an, um, iOS app that, you know, you can run on your phone, and then, like, a little desktop toolbar widget. And what I like- what I love about this moment in AI with, uh, with regards to software engineering is, like, if you have basic software engineering practices and then you know enough to be dangerous, like, yeah, you can vibe code, uh, you know, a Swift app to run on, on your local desktop.
- TMTim McAleer
Just a hyper-specific app.
- CVClaire Vo
Yeah, just-
- TMTim McAleer
You know, like, no one was gonna make me this app.
- CVClaire Vo
No.
- TMTim McAleer
And so the ability to make, like, an extremely specific app that makes a workflow, you know, on my team and my company easier, it's been, it's been an unbelievable moment. [chuckles]
- CVClaire Vo
Yeah, I, I would say the TAM for this app is, like, you. [chuckles]
- TMTim McAleer
Yeah. Yeah, yeah, yeah. I mean, I think I could sell it to, like, two colleagues. [chuckles]
- CVClaire Vo
Well, and then my mom, so what I was gonna tell you-
- TMTim McAleer
Oh, good
- CVClaire Vo
... is my mom, um, is a genealogist for, uh, the [chuckles] Daughters of the American Revolution, of which I am one. Uh, fun fact on Claire.
- TMTim McAleer
Oh, no way!
- CVClaire Vo
And she does the lineage tracing, and do you know how many times she screenshots something and is like, "Can you read this cursive?" Like, "What in the world-
- TMTim McAleer
Yeah
- CVClaire Vo
... is this name?"
- TMTim McAleer
Yeah.
- CVClaire Vo
And it's like, you know, one name and a big, a big image, and so I do think AI is... And I'm like, "Yeah, I'm gonna drop this in a ChatGPT, and I'll tell you what I think it says."
- TMTim McAleer
Mm.
- CVClaire Vo
And I think its ability to read handwriting, um, old typefaces, kind of understand the nuances of, of spelling and things like that, is just really, really interesting for these sort of, um, research use cases.
- TMTim McAleer
Yeah. We didn't look at a handwritten doc here, but that is definitely something happening, uh, at our company, like, the ability to read letters that we could not read before, and also just other languages, right? And then we immediately have that text to... You have letters written in some kind of cursive scrawl from the 17th century that is now translated to English and made legible for you.
- CVClaire Vo
... amazing! Well, we've seen three great use cases. I am sure you are the hero on the team, [chuckles]
- TMTim McAleer
[chuckles]
- CVClaire Vo
uh, for this kind of stuff, 'cause I can imagine, again-
- TMTim McAleer
I think people might be tired of hearing me talk about AI-
- CVClaire Vo
No
- TMTim McAleer
... but thank you.
- CVClaire Vo
Yeah, but I mean, this is, this is hard stuff. It's tedious work to do. It, you know, requires a lot of time, a lot of detail orientation, and I'm sure people love using this information-
- 40:34 – 42:00
Learning approach and parallels with creative software
- CVClaire Vo
I'm curious, you know, what's your point of view of AI, particularly in the film world? What are you excited about, and where do you think these kind of concerns are really warranted? And then where do you think the most practical applications are?
- TMTim McAleer
I think today, it's, like, sort of where we started at the top. The practical applications are more in, like, tooling than they are in creation, but I do think that, like, the creation's gonna get there. Like, today, I play with, I play with all the generative video models. Like, how can I not?
- CVClaire Vo
Mm.
- TMTim McAleer
They're, they're super fun. Um, they are not, like, at professional-grade quality yet. Like, the amount of time you spend throwing tokens at even the highest-end video models, you're not gonna be able to match your shots that well. You're not gonna be able to match the footage you shot yourself that well, and so I don't think they're there yet, but let... Like, I'll be honest, they're gonna get there. I think that, like, they are still exciting to me, but I would separate a couple things. Like, in the nonfiction world, I think it- I think people should be careful. Like, I think-
- CVClaire Vo
Yeah
- TMTim McAleer
... we should not be generating archival footage. We should not be trying to fool our viewers into thinking that there was video in 1750, you know? And I think that that's the part that's, like, a little scary.
- CVClaire Vo
Mm.
- TMTim McAleer
And then, of course, there's the displ- like, job displacement aspect of things.
- CVClaire Vo
Yeah.
- TMTim McAleer
I think people are scared. If you film stuff for a living, you're definitely scared that, like, you're gonna be able to just, like, use text to generate that same-
- CVClaire Vo
Yeah
- TMTim McAleer
... video you used to shoot. So I don't know how to, like... I don't think anybody has, like,
- 42:00 – 44:05
Perspectives on AI in the film industry
- TMTim McAleer
good answers to that part of it.
- CVClaire Vo
Mm-hmm.
- TMTim McAleer
But my approach has certainly just been, like, jump in and learn the tools. Like, they are, they are gonna be here-
- CVClaire Vo
Yeah
- TMTim McAleer
... whether we want them to be or not, and, uh, I think that they have a lot of practical benefits today that are less scary.
- CVClaire Vo
Yeah. The best advice I can give to people, and I have, uh... Of all the spaces, and I'll say this honestly, the space where I have the most job displacement concern, it's in video generation for-
- TMTim McAleer
Yeah
- CVClaire Vo
... non, um, non-archival, non-documentary-
- TMTim McAleer
Commercial
- CVClaire Vo
... cases, but commercial use cases.
- TMTim McAleer
Yeah.
- CVClaire Vo
Um, you just, you just see how it could be very applicable, and the best advice that I can give to people in this moment is, the more you learn the tools, the better off you will be. Whether or not, you know, whether or not you love where the tools are taking us as an industry or as a culture, kn- knowledge is power, and so the more you learn and understand, one, you can identify opportunities where it does add value, even in your creative process, and two, you're gonna be differentiated in the market from a job perspective-
- TMTim McAleer
Mm
- CVClaire Vo
... because you're gonna have a more robust sense of what's available in your industry, and I think that stands for people in your industry. I think it stands for people in my industry, in technology. So I just say, there is no harm in learning this stuff.
- TMTim McAleer
Yeah.... Absolutely. I also think that, like, there's a place in the process for it, which allows you, like, a place to learn without thinking it needs to end up in the final product, right? Like, you can use video models for storyboarding all day. You can maybe prove whether or not that shoot is worth spending that money on. Now you've learned how to use the video models a little bit, and, you know, you haven't necessarily displaced anyone, but you've, like, made your production a little bit more efficient, a little smarter. Maybe you've shot better footage as a result of it, you know?
- CVClaire Vo
Yes, but we're not, we're not generating fake archival footage of, like, Genghis Khan. [chuckles]
- TMTim McAleer
We're not, we are not doing that.
- CVClaire Vo
[chuckles]
- TMTim McAleer
Uh, definitely not doing that. And, like, PBS, which is where most of our films end up, has a lot of guidelines around that, and I think that's a good thing, but it's the other stuff. It's commercial, it's visual effects. Like, a lot of that stuff's gonna get easier, um, and so it's, it's coming one way or another.
- 44:05 – 47:36
Prompting techniques and troubleshooting AI workflows
- CVClaire Vo
Great. Well, last question, have to ask you, when, you know, you're on your dog walk with ChatGPT doing voice mode, and it's not listening to you or not giving you what you want, what is your personal prompting technique? Especially 'cause you use voice.
- TMTim McAleer
Yeah.
- CVClaire Vo
Like, I'm willing to type things to AI. I don't know if I'd be willing [chuckles] to say them. So what's, what's your technique here?
- TMTim McAleer
It definitely is different when you have to say it out loud. Um, I am, I am super nice to the AI. I, like, can vividly remember the one time I was mean to it. I'm nice to the... I don't know where this is going. I'm gonna be nice to all the models.
- CVClaire Vo
[chuckles]
- TMTim McAleer
What I do is, like, for lack of a better way of describing it, I just start over. Like, I will-- I know that a lot of these things have ways of, like, consolidating the context window now and sort of summarizing, but I will ask for what I call, like, a resume work prompt. I'll be like: "This isn't working. I want to resume work later with another AI dev. Can you give me a prompt with everything they'll need to know?" And typically, what you'll find is that that prompt shows you where it was off, you know? Like, in its summarization of what it was doing, I'll be like, "Oh, see, like, I wasn't asking for that. That's, that's why we were not communicating." And then I'll take that resume work prompt, I'll prune it a little bit, pop it into another chat, and then, you know, you'll find that you wish you hadn't beat your head against the wall with the previous chat for 20 minutes.
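Tim's resume-work prompt can be captured as a reusable template. A sketch; the wording is my paraphrase of what he describes, not a verbatim prompt:

```python
def resume_work_prompt(problem: str = "This isn't working.") -> str:
    """Build a handoff prompt that makes the model summarize its own state,
    so you can spot where it misunderstood you before starting a fresh chat."""
    return (
        f"{problem} I want to resume work later with another AI dev. "
        "Write a prompt with everything they'll need to know: the goal, "
        "what we've tried so far, what's currently broken, and the exact "
        "state of the code. Be specific enough that they can pick up "
        "without reading this chat history."
    )
```

Reading the model's summary back is the diagnostic step: if its restatement of the goal is off, that's where the miscommunication was. Prune the summary and paste it into a new chat.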
- CVClaire Vo
You know, I am also Team Be Polite to Your AI, but then again, like, you hurt the one you love the most, and I've, I've found myself occasionally getting testy. And you know when I stopped being mean to AI is when reasoning really started to show, and I could see it reasoning how upset I was. It was like-
- TMTim McAleer
Oh, it'd be like, "The user is mad at me right now." [chuckles]
- CVClaire Vo
... "The user is really frustrated with me right now. I need to totally rethink my [chuckles] approach." Go sweet, sweet baby AI, I'm sorry.
- TMTim McAleer
Yeah.
- CVClaire Vo
I apologize.
- TMTim McAleer
Yeah.
- CVClaire Vo
I'm not that mad at you. Okay, so: create a, you know, return-to-progress prompt-
- TMTim McAleer
Yeah
- CVClaire Vo
... really get the summary. Take that to understand if there was some misunderstanding, improve that, and then just start fresh. That's great.
- TMTim McAleer
Yeah.
- CVClaire Vo
Well, Tim, this has been super fun. So much for me to learn. I have tons of ideas, even just for my day-to-day life, about how I could use... I have kids, so I probably have 30,000 images.
- TMTim McAleer
Let me know if your mom wants the OCR Party. I'm happy to-
- CVClaire Vo
I, I will. She'll love it. Okay, Mom, I have gotten you your first vibe-coded app-
- TMTim McAleer
[chuckles]
- CVClaire Vo
... direct from the podcast source. Tim, where can we find you, and how can we be helpful?
- TMTim McAleer
Yeah. Uh, I'm not that active on social, to be honest, but I am on LinkedIn. You can find me on there. I have a website that is itself a fun vibe-code project, so you can find me at timmcaleer.com. I have a little chatbot there, GPTim. You can go chat with him, learn a little bit more about me and my work. Uh, and then other than that, I would say tune in to Florentine Films' upcoming production. We have a series about the American Revolution coming out in November on your local PBS station.
- CVClaire Vo
My kids are obsessed with the American Revolution, so everybody watching-
- TMTim McAleer
Sounds like it's in the family. [chuckles]
- CVClaire Vo
It... Yeah, we will, we will be, uh, big fans. Tim, this has been great. Thank you so much, and thanks for joining How I AI.
- TMTim McAleer
Thank you for having me. [upbeat music]
- CVClaire Vo
Thanks so much for watching. If you enjoyed the show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time! [upbeat music]
Episode duration: 47:36
Transcript of episode 9ngbZwA_h00