How I AI: Using Veo 3 to create AI-generated music videos, like a Tiny Desk concert with Notorious B.I.G.
CHAPTERS
Why Anish uses AI for music: from DJ constraints to “creative satisfaction”
Claire Vo introduces Anish Acharya (a16z) and frames the episode as a fun, consumer-focused tour of AI workflows. Anish explains how AI removes longstanding audio constraints (like isolating vocals) and reignites remix culture in a new medium.
- AI makes it possible to disentangle mixed tracks (vocals vs. instrumentation)
- Anish’s background: decades of DJing and making music
- AI as the next evolution of sampling/remix culture
- Creative tooling as an antidote to limited time/skills as an adult
Tiny Desk as a format: using constraints to unlock creativity
They discuss why the Tiny Desk format works so well—tight constraints, recognizable setting, and intimate audio. Anish uses this as the conceptual template for resurrecting “impossible” performances in a respectful, non-derivative way.
- Tiny Desk’s “constraint-driven” creativity (like ’90s unplugged)
- Choosing artists who can’t perform anymore as a creative prompt
- Balancing novelty with respect and originality
- Video remix culture parallels (TikTok-style iteration)
Case study: building an AI Notorious B.I.G. Tiny Desk performance (overview)
Anish shares the finished Biggie-style Tiny Desk clip and outlines the overall workflow. The key idea: assemble a believable still frame and the right audio layers, then use a tool (Hedra, in this episode) to animate the frame and lip-sync it to the audio.
- Workflow ingredients: still image + target audio
- Sourcing live-band audio for Tiny Desk-like acoustics
- Layering extracted vocals over a live performance bed
- Keeping the workflow “surprisingly simple” end-to-end
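The layering step can be pictured as a simple sample-by-sample mix. The sketch below is illustrative only and is not the tool used in the episode: it assumes both tracks are already decoded to lists of float samples in the range -1.0 to 1.0 at the same sample rate, and the function name and gain values are hypothetical.

```python
def mix(vocals, bed, vocal_gain=1.0, bed_gain=0.6):
    """Layer a vocal track over an instrumental bed.

    Both inputs are sequences of float samples in [-1.0, 1.0].
    The shorter track is padded with silence; the result is hard-clipped
    so it stays inside the valid sample range.
    """
    length = max(len(vocals), len(bed))
    mixed = []
    for i in range(length):
        v = vocals[i] if i < len(vocals) else 0.0  # pad with silence
        b = bed[i] if i < len(bed) else 0.0
        sample = vocal_gain * v + bed_gain * b
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip
    return mixed
```

Lowering `bed_gain` keeps the isolated vocal in front of the live band; a real editor would typically also offer fades and alignment, which this sketch leaves out.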
Generating the hero still image with GPT-4o Image Gen
Anish demonstrates creating a Tiny Desk-style still frame (using Kurt Cobain as the live example). They highlight why 4o’s image generation is effective: strong prompt adherence and controllable edits.
- 4o Image Gen strengths: prompt adherence and fine-grained control
- Using the still image as the anchor asset for later animation
- Iterating quickly (e.g., removing guitar for better a cappella lip-sync)
- Noticing realism cues (Tiny Desk vibe, logos, era-accurate styling)
Animating a still image with Hedra: frame-to-video + custom audio lip-sync
They introduce Hedra as a practical tool that both generates video motion from a still and synchronizes uploaded audio. The chapter broadens into other applications like translating speeches and animating characters for storytelling.
- Hedra pipeline: still frame → generated motion → audio lip-sync
- Alternative tools mentioned (e.g., Sync Labs)
- Use cases: dubbing/translation, character narration, story scenes
- Prompting for emotion/gesture to influence performance
Sourcing and preparing audio: pulling from YouTube and trimming in Adobe Audition
Anish downloads a reference performance from YouTube and uses Adobe Audition to trim and align usable segments. They discuss current limitations (short clip lengths) and why constraints can actually improve creativity.
- Using a YouTube downloader to extract source audio/video
- Adobe Audition workflow: import video, view waveform, cut dead air, select segment
- Current generation limits (short clips) vs. future longer-form possibilities
- Constraints as a creativity driver (analogy to early hip-hop sampling limits)
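In the episode the trimming is done by hand in Adobe Audition. For readers who would rather script it, a minimal sketch of the same cut using ffmpeg (a swapped-in tool, not the one Anish demonstrates) might look like this. The helper names, file names, and timestamps are hypothetical, and ffmpeg is assumed to be installed and on the PATH.

```python
import subprocess


def build_trim_command(source, start_s, duration_s, output):
    """Build an ffmpeg command that extracts one audio segment as WAV."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-ss", str(start_s),      # seek to the segment start (seconds)
        "-t", str(duration_s),    # read this many seconds
        "-i", source,             # downloaded video or audio file
        "-vn",                    # drop the video stream, keep audio only
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM (WAV)
        output,
    ]


def trim(source, start_s, duration_s, output):
    """Run the trim; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_trim_command(source, start_s, duration_s, output), check=True)
```

Calling `trim("live_set.mp4", 12.5, 20, "segment.wav")` would write a 20-second audio-only clip starting 12.5 seconds into the source, which fits the short clip lengths discussed above.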
Extracting vocals with Demucs: turning any track into stems
Anish introduces Demucs to separate vocals from instrumentation via a simple command-line flow. This enables custom mashups like an a cappella Kurt vocal or isolating Biggie vocals for live-band overlays.
- Demucs for stem separation (e.g., the --two-stems=vocals option, which splits a track into vocals and everything else)
- Command-line approach supported by lookup tools (e.g., Perplexity)
- Creates never-before-heard variants (a cappella-style outputs)
- Enables higher-believability remixes by swapping vocal beds
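The command-line flow described above can be wrapped in a few lines of Python. This is a sketch under assumptions rather than the exact command from the episode: it uses Demucs's `--two-stems=vocals` and `-o` options, the helper names and `track.mp3` are hypothetical, and the output layout (a subfolder named after the model, here assumed to be the default `htdemucs`) may differ between Demucs versions.

```python
import subprocess
from pathlib import Path


def build_demucs_command(track, out_dir="separated"):
    """Command that splits a track into vocals and everything else."""
    return ["demucs", "--two-stems=vocals", "-o", out_dir, track]


def expected_stems(track, out_dir="separated", model="htdemucs"):
    """Where the two stems are expected to land (assumed default layout)."""
    base = Path(out_dir) / model / Path(track).stem
    return {
        "vocals": base / "vocals.wav",       # isolated vocal
        "backing": base / "no_vocals.wav",   # instrumentation only
    }


def separate(track, out_dir="separated"):
    """Run Demucs (must be installed) and return the expected stem paths."""
    subprocess.run(build_demucs_command(track, out_dir), check=True)
    return expected_stems(track, out_dir)
```

The `vocals` path is the file that would be layered over a live-band bed, while `backing` is the piece to discard or reuse for other mashups.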
Assembling the Tiny Desk clip in Hedra: minimal prompts, strong results
With the still frame and audio ready, Anish uploads both into Hedra and generates the performance clip. They discuss how short, simple prompts can outperform over-engineered prompting when the model is strong.
- Hedra inputs: start frame + uploaded audio → generated performance
- Ultra-simple prompts (e.g., “Man singing on Tiny Desk”)
- Prompting philosophy: leave space for the model to surprise you
- Designers vs. engineers: exploration prompts vs. precise control prompts
Creating a ’90s-style Nirvana music video with Veo 3 (and refining prompts with 4o)
Anish shows a multi-clip Veo 3 workflow to produce a gritty, camcorder-like grunge music video. He uses GPT-4o to diagnose “wrong energy” generations and iteratively steer toward the desired Seattle ’90s aesthetic.
- Veo 3 used for short cinematic clips; quality praised (physics, realism)
- 4o used as a “prompt coach” to converge on the right aesthetic keywords
- Iterative progression: clean/incorrect vibe → grimy camcorder grunge
- Editing/assembly in Kapwing (Claire mentions CapCut as an alternative)
Evaluating realism: what looks incredible vs what still breaks
Claire reacts strongly to the realism—wardrobe, emotion, sequencing—while noting telltale artifacts. They call out specific failure modes (duplication, odd props) that creators learn to spot and work around.
- High points: cohesive aesthetic, camera movement, emotional tone
- Artifacts: malformed cigarette pack, duplicate crowd jumpers/mirrored motion
- Practical expectation-setting: “almost perfect” outputs today
- Framing outputs as love letters/fan creations rather than replacements
Workflow #2: building a video-based book/record cataloger with Gemini Flash in AI Studio
Anish pivots to a practical multimodal app: video of flipping through a collection → extracted frames → recognized titles/authors. He argues video is a “native” interface for bringing the physical world online.
- Gemini 1.5 Flash for video ingestion + vision extraction
- Google AI Studio as a lightweight surface for Gemini experimentation
- Prompt strategy: extract distinct frames, read titles/authors sequentially, capture all items
- Idea generalizes from records to books and other household inventories
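The prompt strategy above can be sketched as a prompt plus a small parser for the model's reply. Everything here is an assumption for illustration: the prompt wording is not the one used in the episode, the JSON reply format is a convention chosen for this sketch, and the actual call to Gemini is left out.

```python
import json

# Hypothetical prompt following the strategy described above.
CATALOG_PROMPT = (
    "You are given a video of someone flipping through a collection. "
    "Extract each distinct frame that shows a new item, read the title and "
    "author in the order they appear, and capture every item. "
    'Reply with a JSON array of objects like {"title": "...", "author": "..."}.'
)


def parse_catalog(reply_text):
    """Turn the model's JSON reply into a de-duplicated, ordered catalog."""
    items = json.loads(reply_text)
    seen = set()
    catalog = []
    for item in items:
        title = str(item.get("title", "")).strip()
        author = str(item.get("author", "")).strip()
        key = (title.lower(), author.lower())
        # Skip entries with no title and items seen in more than one frame.
        if title and key not in seen:
            seen.add(key)
            catalog.append({"title": title, "author": author})
    return catalog
```

De-duplicating on a lowercased title and author pair is one simple way to handle the same cover appearing in several extracted frames; a production version would likely need fuzzier matching.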
Deploying personal software: from quick prototype to shareable Cloud Run app
They compare how fast it is to build a working demo (minutes) versus making it production/shareable (hours). The takeaway is the rise of “personal software”—people building one-off tools for their own lives.
- Prototype speed: ~15 minutes for a working demo
- Publishing overhead: deployment, reliability, usability, cost controls
- AI Studio app builder + ‘Deploy with Cloud Run’ for shareable links
- API credit costs and the tradeoff of sharing an app vs. keeping it personal
Comet browser for personal finance: AI agents operating websites (RPA)
In a lightning round, Anish explains why he uses Perplexity’s Comet browser—its assistant can operate web apps and summarize insights without manual clicking. He applies it to portfolio analysis inside Robinhood.
- RPA-style browsing: the model navigates sites on your behalf
- Finance queries: performance summaries, comparable opportunities, next buys
- Avoids downloading data or manual dashboard digging
- Broader claim: assistants make every website more useful
AI for kids: interactive stories, play, and social-emotional learning
They explore consumer AI adoption through parenting. Anish describes kids using AI as interactive collaborators (not passive media), and predicts classroom impact will extend beyond homework to social dynamics and SEL.
- Interactive bedtime stories with infinite Q&A
- Kids’ play: photo-based “who would win” scenarios (Spider-Man vs. Sandman)
- Future classroom use: observing social interaction, supporting SEL
- Claire’s example: Meta AI glasses as a new, hardware-native interface for kids
Getting better results: embrace surprises, reset often, avoid sunk-cost prompting
Anish closes with a mindset for when models fail: follow unexpected directions sometimes, but don’t get trapped iterating on a broken approach. Restarting is cheap, and abandoning “bad branches” is a feature of AI creativity.
- Go with surprising outputs; they can become new creative directions
- Avoid the sunk-cost fallacy: restart instead of endlessly patching prompts
- AI work can feel like effort even when you haven’t “built” anything durable
- Simple prompts + iteration often beat heavy prompt engineering