EVERY SPOKEN WORD
15 min read · 3,355 words
- 0:00 – 1:38
Getting started with Google Flow and Gemini Omni
- Claire Vo
Today, I am doing a very strange episode where I'm gonna create a video avatar of myself, and in about 15 minutes, get to a full minute-long video starring none other than your favorite podcast host, Claire Vo. Let's get to it. This episode is brought to you by Merge. Building an AI product is one thing. The hard part is everything around it, connecting to the tools your team and customers rely on, letting agents take action with the right permissions, and keeping everything reliable and cost-efficient once you're in production. Most teams end up piecing that together themselves. So instead of building the products you actually care about, you get pulled into integrations, permissions, routing, and all the infrastructure underneath. Merge is the infrastructure layer for production AI. It connects to thousands of tools, gives agents secure ways to act inside them, and optimizes model routing and spend without you building or owning any of it. OpenAI, Dropbox, and Ramp already use Merge to move fast and build AI right. Visit merge.dev/howiai to start building for free. This episode of "How I AI" is going to be an adventure because, I'm gonna be honest, I'm not 100% sure this is going to work. I'm gonna return to a product I covered very briefly a couple weeks ago, called Google Flow, and the new Gemini Omni video generation model, and I'm gonna try really hard to create an AI avatar of myself that we can animate, or I guess cinematically create, using AI.
- 1:38 – 2:55
The avatar creation process: scanning and photo capture
- Claire Vo
So this is Google Flow, and one of the features of Google Flow and the Omni model is you are supposed to be able to create an avatar of yourself. Now, we tried this the day it came out. It did not work, but we're gonna give it another college try and see if we can get a full-featured avatar of myself that then we can go and build consistent character videos off of. So I'm gonna select up here. I'm gonna create an avatar. We're gonna click Get Started. I'm gonna scan this QR code. I have my phone here. I've done this before, so hopefully it'll be fast. Okay, I'm gonna put the mic away just for one second. Okay, I'm gonna allow access to my camera, and we're just gonna take some photos. Okay. Ready? Start. 17, 81, 49, 20, 25, 22. Okay, now it's having me turn my head, so I turned my head that way, gave me a check mark. Turn my head the other way. It's giving me a check mark, and, and it says we're done. Now, it said we were done last time we tried this, so we're gonna see. It's gonna take a couple minutes, and then we will come back and see if I can actually use this avatar of myself.
- 2:55 – 6:59
Using Flow to brainstorm a hype video storyboard
- Claire Vo
Okay, so look at this beauty. Um, [laughs] there's this fisheye lens version of me that is now an avatar, so I supposedly can use this, and let's use it to create a hype video for the "How I AI" podcast. So I'm gonna go in here and say, "Help me create a storyboard for a hype video for the How I AI podcast. I already have a character named me we can reference. Help me come up with the few scenes that would make this great. This is a podcast by Claire about the best ways to use AI at work and in life!" Okay, so what I love about Flow, or what is pitched to me about Flow, is that it's not just a video generation tool, it's actually a whole creative suite. And so i- ideally, it's gonna be able to help me not only animate or, um, g- video generate this avatar of myself, it's also gonna help me actually brainstorm what this overall video could be. And I'm, you know, I'm creative, but I'm not video creative, so I'm excited to see what it looks like. So, "How do you imagine Claire? Is she in a modern studio or perhaps a bright, airy home office? Should it feel high-tech and sleek or more grounded and lifestyle-focused? And are we going for high-energy and fast-paced, and thoughtful, inspiring?" So I'm gonna say she is in a dark home office, dark green walls, with books about AI and fun posters lying around. This should be more authentic lifestyle version, but it's high-tech and about coding. Have a hacker vibe to it. Okay. Wow, a bunch of typos, but we'll see, we'll see what [laughs] this does. And what I love about these video models and these new tools, again, usually here on "How I AI," we talk about coding, we talk about website generation, we talk about PRDs and work product, but what I really appreciate about these new generative AI models, in particular these multimodal ones, image and video, is it unlocks for me an ability to generate, create something that I would have never been able to do before. 
So I would've never been able to, um, solo produce a hype video for my podcast. [laughs] I would have a hard time brainstorming it. I wouldn't know how to frame it. I wouldn't know how to block it. But now I have this AI producer here that can help me with this effort. So let's see what the frames are. It's about seven frames. Um, it's gonna be an extreme close-up of me typing on a mechanical keyboard, totally on brand. Um, then there's gonna be a wide shot of the office, then it's gonna reveal me in my ergonomic chair. Spoiler alert, I am not actually in an ergonomic chair. I'm going to spin around. That's gonna be funny, and it's gonna give me a digital heads-up display, which is also ridiculous, but let's let it happen. Then it's gonna do a very, what I'm presuming to be, a very cheesy AI montage, uh, a lifestyle moment, [laughs] a call to action. Gonna hit you with the, the podcast, uh, microphone, and then it's gonna say, "How I AI." Um, if this looks good, I'm gonna say, "This is great. Generate the storyboard. I already have the character @ me." Um, and so I'm gonna send that. [laughs] We're gonna see what it comes up with. I've noticed that it has a hard time referencing the me character in some early tests, so let's see what it comes up with. I'm presuming it's gonna take a couple minutes, so we will take a mini break, and then come back to see what it looks like. Okay, it looks like it's generating a,
- 6:59 – 8:41
Generating the first video scene with the avatar
- Claire Vo
a grid for the storyboard. It can't use the avatar, so I think it's gonna do it without the character reference. It'll be really interesting to see what it comes up with. But then as soon as it's ready, I'm gonna go ahead and generate at least a couple of these storyboard scenes one by one, and we can see how well it does with my avatar. Oh, I mean, this is delightful. Look at this glowy mechanical keyboard. Look at how I am hacking on three keyboards. I'm gonna make a little eyes at you with my, my fake glasses, my very trendy glasses. There's going to be me dragging and dropping a file that probably says, like, AI.md. I'm gonna smile, and then I'm gonna speak into the podcast. This looks great. So what I think I'm gonna do is I'm gonna paste in this first frame of the video that the agent came up with, and instead of saying Claire, I'm just gonna @ mention in this avatar that it gave me, um, so that we can see if it generates this video with me as the character. And so I think I've replaced my name here. Um, I've given details on camera, on lighting, on everything. I press Enter. Let's see what it creates with my avatar. I have no idea what we're gonna get into, and hopefully it won't be terrifying. Okay, I'm already nervous. What is surprising to me that I didn't actually expect is it does have my posters and my book's background here, I guess because they're behind me when I took the photo. It's taking advantage of that. And I'm gonna share my audio as well, and we're gonna see how this video worked.
- 8:41 – 9:32
Troubleshooting: accidentally generating images instead of videos
- Claire Vo
Okay, I got that wrong. I actually generated images instead of videos. Totally messed up. Did not click the right thing down here in the bottom right. I had image generation instead of video generation. So again, I'm gonna paste that, um, walkthrough of the scene here. I'm gonna replace my name with the me avatar. Uh, it's gonna have my fingers flying across that mechanical keyboard. It's going to be so cool. I'm gonna go ahead and press Send, and we're gonna see how long it takes to generate a video. Now, something you'll notice about every time you generate videos, it, it used to work like this in Veo 2, so I'm not... Veo 3 as well, so I'm not surprised they do this, is they're generating two versions of it. It's gonna take a couple minutes. The image took a couple seconds. These are probably gonna take a couple minutes, so I will come back, and hopefully we will have our first video
- 9:32 – 11:37
Generating all seven scenes for the complete video
- Claire Vo
with Claire's face in it. And while we're waiting, I'm gonna queue up one or two other scenes, and see if we can get ones going with my actual face in it, because some of these had, um, like, the back of my head as opposed to my face. And I think we wanna see what my face avatar looks like. So we'll pick frame three and see if we can get that going as well. Okay, the first video generated. Now we have blue nail polish. I still like it. Okay, let's see. [keyboard clicking] We were told AI would replace us. [laughs] That is quite spooky. Okay, we were told AI is going to replace us. Let's see if the video with me actually generates a callback to that. Um, so while that's generating, I'm gonna go ahead and make all of these. We're gonna stitch them together. It's going to be so awesome. So stick with us. We're gonna generate a bunch of videos, and we're gonna stitch it together into one long hype video. This episode is brought to you by Jira Product Discovery. AI has made individual PMs incredibly productive, but multiplayer mode is where it still breaks, getting everyone aligned on what should actually get built. Decisions live in a markdown file from last week, the roadmap's a spreadsheet no one's looking at. Jira Product Discovery is where teams actually decide what to build. Capture ideas, prioritize them as a team, and share a living roadmap everyone works from. It's powered by Atlassian's teamwork graph, so it can pull in customer feedback, what your team's shipped, plus your goals, and suggest what to build next. And when a decision is made, you can hand it off straight to Jira, so a developer or even an agent can pick it up and start building. Teams at Canva, Deliveroo, and Toast already use Jira Product Discovery. Join more than 25,000 teams at atlassian.com/howiai. Start building the right things together.
- 11:37 – 13:13
Reviewing the avatar videos
- Claire Vo
Okay, I have seven scenes generating, but while we are waiting for those to finish, I just cannot... Oh. [laughs] Sorry. Sorry. For you all that are listening and not watching, I just got, um, jump scared by the AI version of myself wearing glasses, um, turning around in a spinning chair. So let's take a look at both of these. [laughs] This one's pretty good. I'm spinning in a circle. Okay. Sorry. Back to those I need to describe this for. Um- So this is using an AI avatar of myself. The prompt was, I spin my ergonomic chair around to face the camera, I push my glasses, which I don't have, up to the bridge of my nose, and I say, "This is Claire. Um, I am Claire, and this is How I AI." Let's watch V1 of this video, which is actually a scream riot. I'm Claire, and this is How I AI. [laughs] Okay, it was actually pretty good. Um, what's really funny is I do have the... It has the NVIDIA Way in the background, which I don't have right here, but I do have upstairs, so I do believe the AI overlords are really paying attention. Um, I wanna make you laugh and look at the second version, where I spin in a circle twice. Pretty good. I'm Claire, and this is How I AI. This one got my, um, not curled hair a lot better, but I prefer the other video. It makes me look a little bit
- 13:13 – 14:32
Stitching the videos together in the browser-based editor
- Claire Vo
nicer. Okay, I'm gonna take one minute. I'm gonna stitch all these videos together in the form factor that Gemini told me I should, that Flow told me I should. We're gonna bring this hype video together. I'm gonna show it to you end to end, and then I'm going to conclude today's very strange episode of How I AI, where I use my avatar to create an end-to-end hype video for this podcast. Cool. So it actually seems I can show you a little bit of how we're gonna stitch this video together. So if you see here, once I click into any one video, I have a video editor timeline here that I can use right in the browser to stitch together all these videos. So I'm gonna go ahead and add these in the order that the original AI told me my hype video should go, and then we'll look at it end to end, and we'll see if we really like it. Okay, this took me about five minutes, but all I did was, um, stitch together my favorite versions of all these avatar-generated AI videos, um, scene by scene, about seven of them together, um, to show one end-to-end hype video. Again, this episode is probably gonna be sub 15 minutes. That includes recording my face as an avatar, figuring out what the heck is going on with this tool, building a storyboard, generating all the videos, and stitching them together here in this editor.
- 14:32 – 15:32
The complete How I AI hype video
- Claire Vo
And now, the worldwide debut of the How I AI hype video. I am going to show you. Who knows what we're about to get, but we're about to get it. Here we go. We were told AI would replace us. [laughs] Oh my God. I'm Claire, and this is How I AI. From automating the mundane to dreaming up the impossible. [laughs] It's about the tools that change the way we live and work. [laughs] Join me as we deconstruct the future, one prompt at a time. Subscribe to How I AI. How I AI, available now everywhere you... Available now everywhere you get your podcasts.
- 15:32 – 19:04
What worked and what didn’t
- Claire Vo
[laughs] Okay. Okay. I am actually obsessed with this. Let's talk about what I love and what I don't. What I love. This took zero time and effort, and it is ... I wouldn't say it's, like, 80% there, but is it 50% there? 100% yes. Am I gonna tweet this immediately? Absolutely. Did this take no effort? Z- basically no effort, no knowledge. Okay. So what did I like about this avatar experience? You know what? This is, like, kind of my face. It's not quite my face. I would say about 50% of the time it's my face, and 50% of the time it's, like, a uncanny version of, of my face. Some things I noticed from a character cons- consistency perspective, this gave me beautiful, long, wavy hair, which I have recently cut off 'cause I have a child. So you see there's, like, a location inconsistency. This background, um, has, has books and a, um, an hourglass. This background is a different color, and it has plants. It pulls in some things from my avatar. Like, it pulls in this poster that was in the background of when I took my photos, and it changes a little bit over time. And so, um, you can see the books on the shelf change. The lighting changes. As always, these video gen and image gen models are really early 2000s coded on what they think AI and impressive technology is. So I'm holding, like, a 24-inch iPad in this video, looking at a schematic of, it looks like a church. It's very confusing. Um, the heads-up display that shows up on my face when I'm looking at AI, I'm, I'm apparently, um, coding in, in Gemini a robot of some sort. So it's pretty hilarious. But even looking at this frame, I would say this is the one that felt like it looked most like my face. Like, I'll just try to look serious so you all can see. It, it's pretty good. It's got, it's even got my sun damage here, so good job, Gemini, not, um, smoothing out, smoothing out my face. And so I do think this is 90% there. 
Um, not 100% there, but it's really interesting even, um, seeing my face turn left and right how accurate it got on the side profiles of my faces. Now, this scene right here where I'm laughing, 100% uncanny valley. I look very strange, like I'm on some sort of, um, medication perhaps. And so I'm not sure it 100% has emotions really well. And some of the timing and hiccups you noticed while you were watching the video, I s- I spoke over myself, those sorts of things. But this scene right here- It's legitimately pretty good. I bet with some, um, consistent background prompting, with a little bit more effort, with some additional images going into this Omni model, I think I can make a hype video that would convince most of you, if not all of you. Now, do I think it's great at typography? Do I think it's great at graphics? No. This is kind of lame. This ending part is kind of lame. But again, we're talking s- probably 10 minutes top to bottom. So we're, we're talking, you know, probably 15 minutes from very beginning, knew nothing about this tool, to I have this one-minute video now I can share with you all. I'm pretty blown away,
- 19:04 – 20:33
Final thoughts
- Claire Vo
you guys. And so I'm gonna go spend a little bit more time with the Google Omni model. I'm gonna spend a little bit more time with Flow. This might be my new favorite hobby project. I'm kind of obsessed with it. I wanna hear if you all are willing to put your avatar in here, if you can actually get it to generate consistent characters, and what your experience is using these kind of incredible new video models. So I know this is a little bit of a different style of How I AI. We usually do coding. We usually do work stuff. This is a tool I did not know. This is a process I'm very unfamiliar with, and I really think I got an outcome that was much better than I expected with very little knowledge of the tool. So if that is not a How I AI success story, I'm not sure what is. I hope you enjoyed this very strange mini episode of How I AI. I cannot wait to see what you generate, and please share your examples in the comments. Thanks for joining. Thanks so much for watching. If you enjoyed the show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time.
Episode duration: 20:35
Transcript of episode UNZczH0gpHc
