How I AI
Using Veo 3 to create AI-generated music videos, like a Tiny Desk concert with Notorious B.I.G.
EVERY SPOKEN WORD
40 min read · 8,110 words
- 0:00 – 3:05
Introduction to Anish Acharya
- AAAnish Acharya
It's like the most creative satisfaction I've had in my whole life. So I generated all these clips in a pretty straightforward way. I used GPT-4o to help me with the prompts, said, "Hey, help me capture grunge 1990s Seattle inspired by some of these music videos." And then, as you can see, it gets progressively more like camcorder, grimy. So I generated all this stuff, and then I threw it together into a music video. All right, let's watch it. [upbeat music]
- CVClaire Vo
You get the patented Claire Vo raised hands reaction [chuckles] on this one. I cannot believe this is AI-generated. It's so high quality. It's so specific in aesthetic, in wardrobe, in emotion. You have inspired me. After this podcast, what music video am I gonna make? It's so much fun!
[upbeat music] Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today we have a fun and inspiring episode with Anish Acharya, general partner at Andreessen Horowitz and AI consumer investor. But we're not gonna talk about portfolio companies or the future of AI. No, we're going to use AI to build music videos, analyze our bookshelf, and help us plan our personal finances. Let's get to it.
To celebrate twenty-five thousand YouTube followers on How I AI, we're doing a giveaway. You can win a free year to my favorite AI products, including v0, Replit, Lovable, Bolt, Cursor, and of course, ChatPRD, by leaving a rating and review on your favorite podcast app and subscribing to YouTube. To enter, simply go to howiaipod.com/giveaway, read the rules, and leave us a review and subscribe. Enter by the end of August, and we will announce our winners in September. Thanks for listening.
This episode is brought to you by Notion. Notion is now your do-everything AI tool for work. With new AI meeting notes, enterprise search, and research mode, everyone on your team gets a note-taker, researcher, doc drafter, brainstormer. Your new AI team is here, right where your team already works. I've been a long-time Notion user, and have been using the new Notion AI features for the last few weeks. I can't imagine working without them. AI meeting notes are a game changer. The summaries are accurate, and extracting action items is super useful. For stand-ups, team meetings, one-on-ones, customer interviews, and, yes, podcast prep, Notion's AI meeting notes are now an essential part of my team's workflow.
The fastest-growing companies like OpenAI, Ramp, Vercel, and Cursor all use Notion to get more done. Try all of Notion's new AI features for free by signing up with your work email at notion.com/howiai.
- 3:05 – 6:00
How AI transforms creative constraints in music and video
- CVClaire Vo
Anish, I am so excited to have you here, and let me tell you why. It is because I have spent the majority of this podcast talking about enterprise B2B product management, how to manage your manager or manage yourself as a manager, or how to vibe code. That has been the topic of How I AI, and today we are just gonna have a little bit more fun. So why did you start to come to these AI projects that are a little less like work-related or technical and, and actually just a little bit more fun? How did you, how did you get here?
- AAAnish Acharya
Great. Well, I'm excited to have some fun today. I mean, I've been passionate about music forever. I think most of us are. I've been DJing and making music for thirty years, but music is very constrained, you know? There's only so many ways you can work with it. An example of that is if you look at a track that has all the instruments mixed down into a final MP3 or WAV file, there's no way to just extract the vocal or just extract the drums, so you're really limited by a set of choices that were made in the studio. And with AI, you can do all this crazy stuff, like disentangle a track into just the vocals and just the instrumentation. So what really got me excited at first was everything you could do with AI and audio, and then that, of course, fed into all of the new video models and video gen and lip sync and all the new technologies we're seeing. So it, it's just-- it's, like, the most, uh, creative satisfaction I've had in maybe my whole life.
- CVClaire Vo
Yeah, I agree with you. One of the things that I have so much fun with AI on is people are really worried that it takes away the most fun, most human, most creative parts of not just building things, but creating music, creating art, creating writing, and I, in fact, feel like it just gives me so much more tools, so much more breadth, so many more things I can play with and, and build. And so it really opens up this, like, creative artist side of me in a way that has been really hard to access as an adult, also with limited time.
- AAAnish Acharya
Yeah, no, and it's, it's actually a fun conversation we'll have over a glass of wine sometime. But if you look at music culture, music culture's kind of been defined by remix culture for the last forty years. You know, like, the mixtape was the first time that you could take the music and do something, you know, the cassette tape, and do something of your own with it. And then that, of course, evolved into, you know, hip-hop, which also sampled and which also had a lot of suspicion on it, but sampling was the foundation of hip-hop, and I think AI is just the next manifestation of sampling, and it'll be as important for music as hip-hop was.
- CVClaire Vo
Well, and we'll, we'll stop opining about AI and the arts, but the other thing that this remix culture makes me think about is kind of the next step that we've seen in the past couple of years, which is kind of audio and video remixing, this, like, TikTok memes, these dances, these things where you're taking a snippet of creativity, turning it into your own thing, and then releasing it to the world in a, in a new version. So I definitely think we're seeing this, not just on the audio side, but also on the video side, which brings us...
- 6:00 – 7:36
Creating an AI-generated Notorious B.I.G. Tiny Desk Concert
- CVClaire Vo
to your use case. So tell me what you built or what you created maybe, and I'm excited to walk through how you got it done.
- AAAnish Acharya
Amazing. Amazing. Great. Tiny Desk is the best. So if you haven't gotten into Tiny Desk, most people have seen it. It's just- it's so cool, it's so fun, and of course, you know, con- like, creativity loves constraints, and the constraints of Tiny Desk are incredible. Um, there's a really good one from Clipse that just dropped last week, and I mean, the-- anyway, there's an infinite number of them. It's a fun format. It's sort of like the unplugged format of the '90s. So I, I love Tiny Desk, and I got to thinking about all the artists I'd want to see on Tiny Desk, and, you know, of course, some of them are no longer able to be on Tiny Desk because they're not alive anymore. Um, so that got me thinking about how I could do a Notorious B.I.G., Christopher Wallace, Tiny Desk, and do we have the tools and technologies, and of course, can we do it in a way that's, you know, respectful, um, and not derivative? And I did it, and it seemed like it kind of worked. Maybe we can cut to it, um, so your audience can check it out, and the workflow is pretty simple.
- CVClaire Vo
We'll do a little clip of it, I think, and then we can work through how, how it got there.
- SPSpeaker
[singing] To all the ladies in the place with style and grace, allow me to lace these lyrical douches in your bushes. Uh, who rock grooves and make moves with all the mamis. The back of the club, sipping Moët is where you'll find me. Buzz. The back of the club, knocking hoes, my crew's behind me. A mad question asking, blunt passing, music blasting, but I just can't quit because one of these...
- CVClaire Vo
Okay, we love it.
- AAAnish Acharya
Okay.
- 7:36 – 9:27
Using GPT-4o to generate still images
- CVClaire Vo
It's great, and you made that.
- AAAnish Acharya
I did make it, yes.
- CVClaire Vo
Okay, show us how.
- AAAnish Acharya
And it took surprisingly little time. Yeah, so let me show you exactly how I made it. So I started with 4o. 4o is the best, uh, general-purpose multimodal model, um, in, in my opinion. I use it for everything. Um, and I just ask it to generate an image of... And we're going to do Kurt Cobain. That'll be fun today, from Nirvana, of course. That's from when I was in high school, um, playing a Tiny Desk concert. So let's see what it comes up with.
- CVClaire Vo
While this is loading, you know, you mentioned that 4, 4o is the best kind of multimodal, all-purpose model. I generally agree. You know, 4o Image Gen had this super viral moment a couple months ago when they, they released it. What do you feel like 4o Image Gen is particularly good at compared to some of the other image gen models?
- AAAnish Acharya
It's very good at prompt adherence, so you can do things, and I think that's because of the infrastructure underneath it. It's a different infrastructure from the diffusion-based models that preceded it. And BFL, Flux, a bunch of others do this now as well, and, and it's great, um, but I think it was just the most productive image model because you could manipulate it in such a fine-grained way.
- CVClaire Vo
Yep, and, uh, I, I remember the biggest improvement when the 4o Image Gen came out is that it could actually spell things and write letters out. That was-
- AAAnish Acharya
Yeah [chuckles]
- CVClaire Vo
... a magical moment. So I have to call out the NPR in the- in the top corner of this image is actually done correctly. Look, there's Kurt. There he is with his, uh-
- AAAnish Acharya
There he is
- CVClaire Vo
... cardigan.
- AAAnish Acharya
Okay, I'm gonna remove the guitar actually-
- CVClaire Vo
Right
- AAAnish Acharya
... so that it is a cappella, 'cause I think that might work a little bit better. But look, this is the vibe of Tiny Desk, you know?
- CVClaire Vo
Yep.
- AAAnish Acharya
It's as, as if you're seeing a photo from the '90s in the Tiny Desk studio. So I just, I love this, and I think that we, we've become so attuned to what's possible, we forget that this would be, you know, witchcraft three years ago. Witchcraft, right?
- 9:27 – 10:40
Using Hedra to animate still frame images
- CVClaire Vo
What is the purpose of this? Are you storyboarding? Are you creating an asset that's gonna go into another tool? Why start with an-
- AAAnish Acharya
Yeah
- CVClaire Vo
... image on this flow?
- AAAnish Acharya
So, so I'll talk through essentially what I'm gonna do. So there's this product called Hedra, which is the best way to, um... I think the best way to take a still frame and add custom audio to it. So create audi- a video that has, uh, sort of animated from the still frame and includes the audio with the right, um, lip sync. So, and there's a bunch of amazing tools to do this. Sync Labs is one of my absolute favorites as well. Um, but Hedra is nice because it actually generates the video, so it does the text-to-video, or the, the frame-to-video, and then it also adds the audio. So what we're gonna essentially do is take this frame, we're going to get the audio from YouTube, we're going to stem separate the audio so we get the audio track we want, and then we're gonna put them together in Hedra, and that's it.
- CVClaire Vo
This really is remix culture.
- AAAnish Acharya
It's amazing, isn't it?
- CVClaire Vo
It is amazing. Okay, so the, the asset that you really need to go into this video gen lip sync tool are two things. You need a still image, um, that can be used to generate the video, and then you need some sort of audio to sync this
- 10:40 – 11:30
Adding custom audio to video
- CVClaire Vo
to. So I know we're looking at this music example, but what other examples have you seen people use this kind of workflow for?
- AAAnish Acharya
I think we underestimated how useful it would be to add custom audio to video, and there's been a bunch of great... You know, one of the, the early examples was taking a speech that somebody was giving. I know Javier Milei did a really famous one, and essentially lip-syncing, changing the language to English and lip-syncing it. That went really viral a couple of years ago. So we've seen... And then, of course, you can imagine a character, a photo of a character that you generate, and then you want to animate them doing something and speaking at the same time. So, you know, stories are told this way, and these technologies make it really, really easy to do so.
- CVClaire Vo
Oh, we got him. Great! Okay, so now he, he's got bad posture, but we'll, we'll allow it.
- AAAnish Acharya
That's okay.
- CVClaire Vo
It's very grunge-esque.
- AAAnish Acharya
I think he always did.
- CVClaire Vo
Yeah.
- AAAnish Acharya
Exactly.
- 11:30 – 15:42
Using Adobe Audition to clip and sync audio
- AAAnish Acharya
Okay, so now we've got Kurt. Now, what I would do if I didn't actually have a... So T- Tiny Desk has got a really specific acoustic aesthetic, which is, it sounds like live instrumentation. So for the Biggie example, I actually found, uh, a Biggie cover band playing live in Brooklyn, and I pulled that down from YouTube, um, and then I extracted the actual vocals from The Notorious B.I.G. and laid them over. But in this case, Nirvana did, um, a really famous New York City Unplugged concert in '93. So there's video of them playing in the way that they would and audio in the way that they would, um, on Tiny Desk. So that is right here.
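[Editor's sketch] Laying an extracted vocal over a separate live backing track, as described for the Biggie example, comes down to summing the two signals sample by sample. The Python below is a minimal illustration of that idea only, not the tool Anish used. It assumes both tracks are already decoded to mono 16-bit samples at the same sample rate; the function name `mix_int16` and the gain values are hypothetical.

```python
import array


def mix_int16(vocals, backing, vocal_gain=1.0, backing_gain=0.8):
    """Mix two mono 16-bit sample sequences into one, sample by sample.

    The shorter track is padded with silence, and the sum is clipped to
    the signed 16-bit range so loud passages do not wrap around.
    """
    length = max(len(vocals), len(backing))
    mixed = array.array("h")  # "h" = signed 16-bit integers
    for i in range(length):
        v = vocals[i] if i < len(vocals) else 0
        b = backing[i] if i < len(backing) else 0
        sample = int(round(v * vocal_gain + b * backing_gain))
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

In practice the two recordings also need to be aligned in time and tempo before mixing, which this sketch does not attempt.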
- CVClaire Vo
Even in the same cardigan.
- AAAnish Acharya
Even in the same cardigan. Isn't that amazing?
- CVClaire Vo
Yep.
- AAAnish Acharya
Okay, so I used this nifty little tool called 4K Video Downloader, which is slightly sketchy, but that's okay.
- CVClaire Vo
I [chuckles] I love these little utilities that-
- AAAnish Acharya
Mm
- CVClaire Vo
... you just, you know, you Google, like, "How do I get audio out of YouTube?" And then you land on-
- AAAnish Acharya
Totally
- CVClaire Vo
... the scariest website possible, and you just cross your fingers that your computer won't go up in flames, and you download 4K Video Downloader. [laughing]
- AAAnish Acharya
Yes. My- yes, m- my data is definitely going somewhere sketchy as a result of this.
- CVClaire Vo
Yeah. So for the vibe coders that are listening, I have a request for startup, which is go [chuckles] go find all these, uh, slightly scary little utils and build me ones that are less sketchy looking. [laughing]
- AAAnish Acharya
100%. 100%, it's a great idea. Okay, so now we actually have this. So we've got the video. Yep. Now we're gonna open Adobe Audition. Okay? So this is, this is a tool that people who have been working in computer audio have been using for 30 years plus. It used to be called Cool Edit Pro. It's completely beloved, and it's very, very easy to use, which is why so many of us use it. It was, of course, acquired by Adobe many years ago. It's now called Audition. So I go to Audition, and I take this video, and I just drop it in. So here we actually have the audio from the video, which is really, really cool. I'm gonna zoom in, and I'm going to... See, the first few seconds of it are blank, so let's just cut that out because we don't want to hear that. Then we're gonna zoom out, and we're gonna take, I don't know, let's take 15 seconds, and you can kind of see the audio- the video in the bottom left corner there.
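[Editor's sketch] The trim done by hand in Audition here (skip the blank lead-in, keep roughly 15 seconds) can also be scripted. The Python below is a minimal sketch using only the standard-library `wave` module. It assumes the audio has already been exported from the video as an uncompressed WAV file; the function name `trim_wav` and the file names in the usage note are hypothetical.

```python
import wave


def trim_wav(src_path, dst_path, start_seconds, duration_seconds):
    """Copy a slice of an uncompressed WAV file into a new WAV file.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start so a too-large offset yields an empty clip
        # instead of an out-of-range error.
        start_frame = min(int(start_seconds * rate), src.getnframes())
        src.setpos(start_frame)
        frames = src.readframes(int(duration_seconds * rate))

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is corrected on close
        dst.writeframes(frames)

    return len(frames) // (params.sampwidth * params.nchannels)
```

For example, `trim_wav("unplugged.wav", "clip.wav", 3.0, 15.0)` would skip the first three seconds and keep the next fifteen.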
- CVClaire Vo
Oh, got it.
- AAAnish Acharya
Okay?
- CVClaire Vo
So it's combining the audio and video just so you know exactly what you're syncing up to.
- AAAnish Acharya
Exactly. Oh, so-
- CVClaire Vo
And I'm gonna pretend like you're doing 15 seconds because, uh, we're doing a very efficient podcast here. But one of the limitations I know, having used some of these audio and video gen tools, is you're getting small clips right now with what we're, we're working with, and so, you know, what I'm looking forward to is the day where I can have the, you know, hour-long, uh, Nirvana Unplugged Tiny Desk Concert.
- AAAnish Acharya
Totally.
- CVClaire Vo
But, you know, do you feel... Do you ever feel constrained by the kind of length of assets being generated or the, um, quality?
- AAAnish Acharya
I mean, sort of, but I, again, I think creativity breeds constraints.
- CVClaire Vo
Yeah.
- AAAnish Acharya
So to, not to over-rotate on hip-hop, but if you look at the reason that so many samples were used in hip-hop in creative ways in the '80s and '90s was the actual drum machines and samplers had very limited sampling time. So you could only sample a second of anything, so you couldn't really sample four bars, and that's why so many producers put tracks together that used these many one-second samples in surprising ways. And once we actually got the technology to sample for more time, we actually got less creativity, I would argue. So I, I sort of love the constraints that the technology gives us today.
- CVClaire Vo
Well, I also love my complaints. I'm like, "Isn't it annoying that you can't-"
- AAAnish Acharya
[laughing]
- CVClaire Vo
... revive Nirvana and overlay their audio and generate a completely fictional concert for longer than 15 seconds in probably under a 30-minute podcast live?"
- AAAnish Acharya
Totally.
- CVClaire Vo
Like, my complaints are so ridiculous because the idea of creating something like this even a year ago sounds so, as you said, impossible, um, that we get so spoiled once we get used to these tools.
- AAAnish Acharya
100% right. No, exactly. Like, I mean, this stuff, we would have called it witchcraft three years ago. It would've
- 15:42 – 16:36
How to use Demucs to extract vocals from any song
- AAAnish Acharya
been. Um, okay, now there's two things you can do with this. If we wanted to do an a cappella-only version, for example, we can use a technology called Demucs. So Demucs is this amazing technology that allows you to, um, to extract the vocals from any song. So here, I've forgotten what the actual command line is, so I just do this. I looked it up in Perplexity. What's the actual way to extract two tracks with Demucs? We do this, Demucs two stems vocals, and then let's go find the path. Okay, so this command is gonna take that audio file we saved of the first 15 seconds of this concert, and it's gonna extract the vocals from the instrumentation. So this will be Kurt Cobain singing Come as You Are a cappella, which as far as I know, has never happened, which is pretty cool.
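[Editor's sketch] The command spoken here corresponds to the Demucs command-line option `--two-stems=vocals`, which splits a track into the vocals and everything else. The Python below is a minimal sketch of wrapping that call. It assumes Demucs is installed and on the PATH; the helper names and the file `clip.wav` are hypothetical, and the output folder name (`htdemucs`) is an assumption that depends on the Demucs version and model in use.

```python
import subprocess
from pathlib import Path


def demucs_two_stems_command(audio_path, stem="vocals", out_dir="separated"):
    """Build the argument list for a two-stem Demucs separation."""
    return ["demucs", f"--two-stems={stem}", "-o", str(out_dir), str(audio_path)]


def expected_outputs(audio_path, stem="vocals", out_dir="separated", model="htdemucs"):
    """Guess where Demucs writes its files.

    Assumption: the default layout <out_dir>/<model>/<track>/<stem>.wav,
    with the remainder of the mix written as no_<stem>.wav.
    """
    base = Path(out_dir) / model / Path(audio_path).stem
    return base / f"{stem}.wav", base / f"no_{stem}.wav"


def separate(audio_path, stem="vocals", out_dir="separated"):
    """Run Demucs; raises CalledProcessError if the command fails."""
    subprocess.run(demucs_two_stems_command(audio_path, stem, out_dir), check=True)
    return expected_outputs(audio_path, stem, out_dir)
```

Calling `separate("clip.wav")` would run the separation and return the two paths where the a cappella and the instrumental are expected to land.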
- 16:36 – 19:40
Using Hedra to generate a Tiny Desk Concert featuring Kurt Cobain
- AAAnish Acharya
And then we simply come back here, and we say start frame, upload an image. Let's use this. Okay, that's our Kurt Cobain. Audio script, upload audio, and let's use actually the full audio with all the instruments. Add to video, and then we just say, "Man singing on Tiny Desk."
- CVClaire Vo
What I love about your prompting compared to other How I AI guests is every prompt has been sub six words. [chuckles] Six words. You're very simple in terms of describing what you want and, uh, get high-quality outputs there, so I don't know what that says about the, uh, prompt engineering industrial complex.
- AAAnish Acharya
[laughing]
- CVClaire Vo
But proof here-
- AAAnish Acharya
Totally
- CVClaire Vo
... that you can use simple prompts to get pretty cool stuff if the tool behind the scenes does, does the work for you.
- AAAnish Acharya
I think you've got to give the AI the space as well. You know, if you overly constrain it, it just really struggles to satisfy you, whereas if you give it less constraints, you know, sometimes it has unexpected results, but often they're unexpected, you know, delightful.
- CVClaire Vo
Well, that's what I've heard a lot from folks that come from the more creative backgrounds. Um, designers in particular tend to be less precise in their prompting because they want that exploration space that then they can narrow in on. And so I really think it, it also comes into play, your prompting technique can come into play based on kind of what profession or what background you're coming from. Engineers want, like, the most precise. They not only want the code to work, but they want the code to be written exactly how they would write it, and so they're very precise in their prompting. Um, where I've found designers and, and more creative folks, uh, building different kinds of assets really like that wide open space.
- AAAnish Acharya
Totally. Yes, exactly.
- CVClaire Vo
... and while we're, while we're waiting for this to load, um, it might be interesting, I'm just looking at some of the options at the bottom here. So you have different kind of models that you can use, including one that looks like that they specifically fine-tuned for this, different aspect ratios, orientation, length, probably based on the script. And then, you know, the prompt says, "Prompt your character with emotion and gesture." So I may- I am very curious if you put, like, angsty man singing versus cheerful man singing, if you'd get a different, a different version here, even if the audio and video were, were the same.
- AAAnish Acharya
It works really well. Absolutely. Yeah. No, this, this is such a useful storytelling product. It's, it's amazing, and when you combine it with other video gen models like Veo 3, you can start to tell real stories, you know?
- CVClaire Vo
Yeah. Okay, let's check it out.
- AAAnish Acharya
All right.
- SPSpeaker
Come as you are. As you were. This, I want you to be. As you-
- AAAnish Acharya
All right. Pretty cool.
- CVClaire Vo
It's very good. It's very good.
- AAAnish Acharya
Isn't that great? Very satisfying.
- CVClaire Vo
He even, he even manages his mic well, you know, pulls back on some of those more-
- AAAnish Acharya
Totally
- CVClaire Vo
... notes. That's incredible.
- 19:40 – 27:40
Creating a ’90s-style Nirvana music video with Veo 3
- CVClaire Vo
And so, you know, could you take this and take different clips of the video and sort of generate a string of these, um, these videos and maybe put them together in a, in a longer form version?
- AAAnish Acharya
A hundred percent. Yeah, I actually was inspired by this, so I put together a, a music video, a little mini music video, um, for a different Nirvana track. Can I show it to you right now?
- CVClaire Vo
Yes. Uh, we would love to see it.
- AAAnish Acharya
Okay. I used Veo 3 to generate the clips, and, um, and it, it turned out great, I think. Hold on one moment.
- CVClaire Vo
Yeah, and I think if you haven't tried Veo 3, it is pretty incredible. I mean, I can only generate, like, two and a half videos every day of three, [chuckles] you know, seven-second length or whatever. Um, I'm still, uh, capped on, on usage, but the quality is really good, the physics are really good. Uh, it's one of my favorite video models to, to play with right now. Just as a, just as a consumer, it's, it's kind of, um... It's, to me, my experience with that model has been- was very similar to my first experience with Midjourney, where just the breadth of things coming out of the model were so incredible to me. So highly recommend folks give that, that model a little spin.
- AAAnish Acharya
It's amazing.
- CVClaire Vo
Yeah.
- AAAnish Acharya
You've, uh, you've got to get on Gemini Ultra, Claire, so you have more generations.
- CVClaire Vo
We, we have a, uh, a household Gemini Ultra account. Um-
- AAAnish Acharya
Okay
- CVClaire Vo
... but my husband is the, is the video gen guy [chuckles] , so he, he's up there, and by the time I get to it, um, we, we've, we've burned through some tokens. But, you know, I read all the... I spend all the money on Cursor, so, you know.
- AAAnish Acharya
[chuckles] Fair. Fair, I know.
- CVClaire Vo
Yeah.
- AAAnish Acharya
My wife, for the first time this month, was like: "Babe, what is Cursor?"
- CVClaire Vo
[chuckles]
- AAAnish Acharya
I'm like, "Ugh, don't worry about it."
- CVClaire Vo
[chuckles] I know-
- AAAnish Acharya
All right
- CVClaire Vo
... all these, like, little secret AI tools popping up on the credit card.
[upbeat music] How I AI is now on Lenny's List, with my personal selection of the best AI engineering courses on Maven. You can spend months thinking and playing with AI before really integrating it into your workflow or shipping an actual AI feature. If you want to start building, then these hands-on Maven courses are for you. Learn directly from Aishwarya Naresh Reganti, MIT instructor and AI scientist at AWS, or Sander Schulhoff, who has authored research with OpenAI, Hugging Face, and Stanford. To pivot into an AI role or successfully lead your company's next AI initiative, visit maven.com/lenny to enroll now. Use code LennysList for a hundred dollars off. That's M-A-V-E-N.com/lenny to get ahead in the AI era and start building.
- AAAnish Acharya
So this is... These are all the videos that I generated in Google Flow. So I was trying to capture, like, a 1990s high school band, auditorium, you know, a little dystopian energy. So I generated all these clips in a pretty straightforward way. I used GPT-4o to help me with the prompts, 'cause as you can see, this is actually the beginning of my generations. This doesn't- this is, like, the complete wrong energy. You know, this- I don't know what this is, like, early '80s, you know, uh, synth pop or something. So then I went to GPT-4o and said, "Hey, help me capture, like, grunge 1990s Seattle, you know, inspired by some of these music videos." And then, as you can see, it gets progressively more, like, you know, camcorder and sort of grimy. So I generated all this stuff, and then I threw it together into a music video, um, and I put the music behind it, and I'll show it to you right now.
- CVClaire Vo
Amazing. So just restating this, 4o helping you refine your prompts to get the aesthetic right, the phrasing, the prompting right, give you some keywords. Veo to generate these, like, shorter clips, and then do you put it together in, like, Final Cut or something like that?
- AAAnish Acharya
I put it together in Kapwing.
- CVClaire Vo
Okay.
- AAAnish Acharya
Kapwing is so easy and so useful.
- CVClaire Vo
Yeah.
- AAAnish Acharya
Um, I highly recommend it.
- CVClaire Vo
I'm a TikTok girl, so I use CapCut.
- AAAnish Acharya
Yeah. Uh-
- CVClaire Vo
[chuckles]
- AAAnish Acharya
... gotta get, gotta get on Kapwing. All right, let's watch it. [upbeat music]
- 27:40 – 35:35
Building a book collection cataloging tool with Gemini Flash
- CVClaire Vo
Sold. All right, that was, [chuckles] that was just workflow number one. We're gonna go pretty fast through workflow number two, which I think is a little bit more of a practical, practical one, but still connected to, to the arts. So walk us through what your second, second workflow is.
- AAAnish Acharya
Cool. Yeah, so one of the things that I think is really under-hyped, under-appreciated, underused is all of the multimodal capabilities, and the model that does this really well actually is Flash, um, Gemini Flash. So it's just, it's great. It's one of the very few models that can do video analysis and ingestion. It can do all kinds of amazing things, and yet I don't see it being used out there a lot. I thought I would use it to create an app that would help me catalog my record collection, 'cause I've got, you know, like every DJ, I've got so many records, and it's such a pain to keep track of them and know which ones I had and which ones I didn't. So I did a very quick app on Friday that let me take a video of flipping through my record collection and then using Gemini to extract artist names, album names, photos. It's, it's really, really cool. So I thought today we could do something similar, except for books.
- CVClaire Vo
Ah, this is amazing, and, uh, we were talking before we started recording. This is gonna help me, 'cause over here, I have, like, 100 books and 100 records piled up on shelves, um, that have definitely not been cataloged. So I can't wait to see what this looks like.
- AAAnish Acharya
Perfect. I got you. Let's share. So here we are in Google AI Studio. So I'm sure folks are familiar with AI Studio, but if you're not, actually, I think it's the best product surface to interact with all the Gemini models, or one of the best anyway, um, because it doesn't have all of the kind of overhead and, and links and constraints that a lot of the other Gemini products have. This feels like somebody just took a blank piece of paper and brought the best manifestation of the Gemini models forward. So I really love AI Studio. It's my starting point for all of these things. And then in AI Studio, you can see here, you can, of course, chat. You can stream, um, with your phone or with your webcam. You can generate media, and you can build apps. This is a very good app builder, and this is the best way to build off-the-shelf apps, I think, that integrate with Google models. So here I've, I've typed, you know, "Create an app that takes a video of a person flipping through their book collection and extracts the author and title of every book shown." Then I give it a suggestion for how it could do it, which is, "You could do this by taking the video and first extracting the frames that show distinct books and then have a vision model analyze those frames to extract the information. Make sure you extract every book shown," say, "sequentially."
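[Editor's sketch] The first step suggested in the prompt, picking out the frames that show distinct books before handing them to a vision model, can be illustrated with a simple difference threshold. The Python below is a minimal sketch of that one step, not the code AI Studio generated. It assumes each frame has already been decoded and reduced to a flat list of grayscale values of equal length; the function names and the threshold value are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_distinct_frames(frames, threshold=20.0):
    """Return the indices of frames that differ enough from the last kept frame.

    The first frame is always kept. Comparing against the last *kept* frame,
    rather than the previous frame, stops slow drift from being ignored.
    """
    kept = []
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or mean_abs_diff(frame, last_kept) >= threshold:
            kept.append(index)
            last_kept = frame
    return kept
```

Only the kept frames would then be sent to the vision model for author and title extraction, in order, which keeps the number of model calls close to the number of books rather than the number of video frames.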
- CVClaire Vo
What I have to call out here is, you know, what's interesting is people know that these models exist, and they generally know some of the capabilities: vision, you know, text to speech or speech to text, all this stuff, but what's really hard for people to do, and I appreciate you showing us, is think of novel ways you can access the abilities of those models. I would have-- I actually, I thought you were gonna show us, like, you took a picture of it, and you cataloged it, but this idea of a video and then extracting the frames, I just haven't changed my mental model to match these multimodal models in order to take, you know, take advantage of things that can be more efficient, allow you to do things. And so I really think it's great that you're coming to this from: How can I solve this with audio? How can I solve this with video? How can I solve this with text? And knowing that the models can do kind of the hard work on the back end.
- AAAnish Acharya
Thanks. Yeah, look, I, I completely agree, and a video is just... Of course, it's so much more rich than image, and this is the way that we build- we bring a lot of the outside world online, I think. So I've been really inspired by video. I saw something on Twitter where somebody had set up a mini app that watched him shoot free throws and kept count.
- CVClaire Vo
Yeah.
- AAAnish Acharya
You know, you could-- I mean, there's just so many ways that this will be productive. I'm, I'm very passionate about AI for parents, and I've got kind of a neat video idea in there as well. So, um, to me, there, there's, like, the sort of skeuomorphic technologies, which is using the new technology with the old assumptions-
- CVClaire Vo
Yeah
- AAAnish Acharya
... and then there's the native ways to use it-
- CVClaire Vo
Yeah
- AAAnish Acharya
... and this feels like a very native way to use the models.
- CVClaire Vo
Well, to connect the two things that you said, the, you know, um, basketball shooting analysis in kids, my husband did upload every single one of our eight-year-old's basketball games to a video analysis to get, like, each k- [chuckles] each kid's stats.
- AAAnish Acharya
No way!
- CVClaire Vo
Shooting percentages, all the... They actually don't even keep score at this age, so he got it to, like, get the score. [chuckles]
- AAAnish Acharya
I love that.
- CVClaire Vo
Yeah, so-
- AAAnish Acharya
I totally love that.
- CVClaire Vo
Okay, so now we have-
- AAAnish Acharya
Okay
- CVClaire Vo
... an app.
- AAAnish Acharya
Yeah, so I'm gonna take a video here of me just flipping through my stack of books. [books thudding] Okay, I've taken the video.
- CVClaire Vo
Okay, and that took all of seven seconds, so-
- AAAnish Acharya
Yeah, exactly. Yeah. Now, the, you know, the one edge here that's kind of interesting-
- CVClaire Vo
Mm
- AAAnish Acharya
... is this is real- it's really easy to get something working, but if you want to publish an app that a lot of other people can use, it, it then becomes more work.
- CVClaire Vo
Yeah.
- AAAnish Acharya
So I probably-- it took me fifteen minutes to create this for my record collection-
- CVClaire Vo
Yep
- AAAnish Acharya
... at least create the working demo in Primitive, but then it took me half a day to get it live, um-
- 35:35 – 37:20
Using the Comet browser for personal finance analysis
- CVClaire Vo
Okay, this is great. Well, thank you so much for showing us these fun use cases. I have to call out, as we hop into our lightning round, one thing I noticed, which is you are using Comet.
- AAAnish Acharya
I am using Comet.
- CVClaire Vo
Tell me a little bit more about why, um, that new browser is your browser of choice, and what, what are you getting out of it?
- AAAnish Acharya
[exhales] Co- Comet is so good. I mean, I've been skeptical of the new browser thing because it just feels like the ways to improve the browser in the past have been very incremental. A- ambitious, but there just wasn't that much surface area for new browser features, and now with Comet from Perplexity, it can do a bunch of really incredible things. Um, t- the f- my favorite thing that it can do is what's called RPA, right? Which is where the models operate your browser on your behalf. So you've seen a bunch of examples of this, of like, "Hey, go find me a flight and pay for it," um, which is interesting. The way I've been using it is in my finances. So I'll go into Robinhood, and I'll say, "Hey, why don't you tell me how my portfolio is performing? Why don't you tell me where I could get stocks that have similar upside at a lower cost basis? What stock should I buy next? Are any of these meme st-" I mean, you can just go so deep... and look, I could probably figure that out by clicking around the website and downloading the data, but now I don't have to. So this assistant feature in Comet makes every website dramatically more useful, and it's, it's been a big unlock for me.
- CVClaire Vo
I love this whole episode because you've actually shown a couple use cases, including talking about pers- personal finances with Comet, that really are consumer use cases. Again, as I said at the beginning, we're doing a lot of, like, how do you work this inside of an enterprise? How do you write
- 37:20 – 41:23
How AI is transforming childhood learning and play
- CVClaire Vo
code with it? But I think the real, you know, underappreciated transformation is gonna con- come in consumer experience. I think we're so early. I mean, as somebody who does a podcast trying to educate people, I just realize we're so early on consumer adoption of AI. And so I, I have a question for you, which is, if you could get, you know, like, my mom or, you know, one of my friends that is less, you know, not in Silicon Valley, less in the middle of this, in a room and say, "You know, let me show you three things in 15 minutes that are just gonna totally change how you think about your life, um, or things that you never knew were possible," what would, what would be those things? What are the consumer-side things that you're excited about?
- AAAnish Acharya
So I have kids, and parenting's on my mind all the time, and the ways that my kids use models are amazing. Um, so for my four-year-old, ChatGPT reads her a bedtime story, but not just a bedtime story, one that where she can ask infinite questions, you know? "So what, what, what was the king's dragon's name? What color was it? Where did it come from? Did it have any kids?" You know, she's really into unicorns and alicorns. Like, "Tell me a story about an alicorn and a golden egg," and so she can just really interact with the bedtime story, and ChatGPT is far more patient and creative than we usually are. So that's one way, and look, she can't really use a computer otherwise, um, other than watching YouTube. And then for my son, he'll set up two figures, like Sandman and Spider-Man, and then he'll take a photo of them in, um, in ChatGPT or one of the other models and say, "Hey, who would win?" And then it'll do this whole, you know, "Oh, Sandman would win in these conditions and Spider-Man, but maybe Spider-Man does this." So they're just- they're able to kind of play with the technology instead of just being broadcast to from technology, which is really new. That's, like, the near-term stuff. I think in the longer term, you know, um, I think that the models can really help with a lot of social emotional learning. If you look at the classroom, a part of it, of course, is academics, but part of it is just teaching children to be, you know, good people for the world, and a lot of that comes in observing how they're sort of behaving and interacting, and we never had a technology that could do that. If your kid went to a great school, there might be a second teacher in the classroom focused on social emotional. So I, I think that's how AI shows up in the classroom. It's probably less like homework helpers and assignment generation and more observing the social dynamics in a classroom and helping, um, kids be better people.
- CVClaire Vo
Yeah, well, calling back to what we were saying earlier about trying to identify the AI-native way of doing things, I watch my children so much... I, I, I say that my children form my consumer AI thesis for [chuckles] for me-
- AAAnish Acharya
Love it.
- CVClaire Vo
-because the other day, my, um, six-year-old was playing Minecraft, and he wanted to know how to do a command, and he literally went to my purse, picked up my Meta AI glasses, put them on, and said, "Hey, Meta, how do I transport to the woodland mansion in Minecraft?" And I was like, "Wait, this is like... It, it's not type into ChatGPT. It's not-
- AAAnish Acharya
Wow
- CVClaire Vo
... even ask Alexa." He took this physical device and put it on his face-
- AAAnish Acharya
Amazing
- CVClaire Vo
-and asked this personal AI a question, and that just really opened my mind to, again, I think multimodal's gonna change. I think hardware is gonna have a real place to play here, and then this, like, AI native generation is gonna think about accessing information and building things in a totally, totally different, totally different way. So I am-
- AAAnish Acharya
I love that
- CVClaire Vo
... I'm with you on all of that.
- AAAnish Acharya
I love that. Yeah, and, and it's, it's interesting because we have been taught what computers can and can't do.
- CVClaire Vo
Mm-hmm.
- AAAnish Acharya
But they haven't been taught any of those things. So when I generate an image of, you know, a Harry Potter image for my son, I'm like: Wow, do you see how I just generated that? He's like, "Dad, of course, the computer can do that." So they just-
- CVClaire Vo
[chuckles]
- AAAnish Acharya
... assume that everything's possible, and now everything kind of is.
- CVClaire Vo
Oh, my gosh, we had it so- as I say, when I had to walk uphill both ways for my internet, like-
- AAAnish Acharya
[laughing] That's right.
- CVClaire Vo
[laughing] Okay.
- AAAnish Acharya
You and me both.
- 41:23 – 43:00
Tips for getting better results from AI tools
- CVClaire Vo
Well, we'll get you out of here.
- AAAnish Acharya
Mm.
- CVClaire Vo
One last question I have to ask. You have had such success with generating these complicated assets, um, but when AI is not listening to you, when it is giving you really poor results, what is your prompting technique to get it back on track?
- AAAnish Acharya
I mean, I don't know if it's a prompting technique, but it's a sort of... It's a, it's a mindset. Uh, two things. One is go with it. You know, like, let it take you to some strange, unexpected places, and you might be amazed at the results. I, I think the other is just reducing this sunk cost fallacy thing, where, you know, you create a GitHub branch, you try to do something really ambitious. It's just, like, falling over, over and over again. Just abandon the branch and start over 'cause you didn't actually do any work. You feel like you did work because it did work, but that's not you doing work, and, and I think being a lot more willing to abandon sort of approaches that aren't working is the sweet spot.
- CVClaire Vo
I completely agree. Well, thank you so much for showing us all these workflows. It was totally inspiring. I wanna get off this podcast so I can go play, so thank you for making my day, and I know everybody's gonna love the episode.
- AAAnish Acharya
Thank you, Claire. Super fun. [upbeat music]
- CVClaire Vo
Thanks so much for watching. If you enjoyed the show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time! [upbeat music]
Episode duration: 43:00
Transcript of episode x6EZyVxyRB4