How I AI
Using Veo 3 to create AI-generated music videos, like a Tiny Desk concert with Notorious B.I.G.
At a glance
WHAT IT’S REALLY ABOUT
Building AI music videos, cataloging books, and automating personal workflows
- Anish Acharya demonstrates how today’s AI tools make once-impossible creative projects—like generating a Tiny Desk-style performance for a deceased artist—fast and accessible.
- He walks through a simple pipeline: generate a still image with GPT‑4o, pull and edit audio from YouTube, optionally separate stems (vocals vs. instrumentation), and lip-sync/animate with tools like Hedra (or alternatives like Sync Labs).
- He then shows how Veo 3 (via Google Flow) can generate short cinematic clips for a full music-video montage, with GPT‑4o assisting prompt iteration to lock in a specific aesthetic (e.g., 1990s Seattle grunge).
- In a second workflow, Anish highlights Gemini Flash’s underused multimodal video understanding by building a small app in Google AI Studio that catalogs books (or records) from a quick “flip-through” video. He closes with consumer AI unlocks such as Comet browser automation and AI in parenting and education.
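Because the video tools discussed here produce only short clips (the hosts mention roughly 7–15 seconds), a full music-video montage has to be planned as a sequence of short segments. The sketch below shows one way that planning could look. The function name `plan_clips` and the 8-second default are illustrative assumptions, not anything shown in the episode or part of Veo 3 or Flow.

```python
from typing import List, Tuple


def plan_clips(song_seconds: float,
               max_clip_seconds: float = 8.0) -> List[Tuple[float, float]]:
    """Split a song into consecutive (start, end) windows, one per generated clip.

    max_clip_seconds is an assumed per-clip limit; adjust it to whatever
    the video tool actually allows.
    """
    if song_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")

    clips: List[Tuple[float, float]] = []
    start = 0.0
    while start < song_seconds:
        # The final clip is shortened so it ends exactly with the song.
        end = min(start + max_clip_seconds, song_seconds)
        clips.append((start, end))
        start = end
    return clips
```

Each window can then get its own short prompt, and the resulting clips are cut together in an editor.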
IDEAS WORTH REMEMBERING
5 ideas
A compelling AI music video can be built from a few modular steps.
Anish’s workflow breaks into reusable parts—still image generation, audio acquisition/editing, optional vocal/instrument separation, then video animation + lip-sync—making the process approachable and repeatable.
Use GPT‑4o as a “prompt co-writer” to converge on a precise aesthetic.
He starts with off-target generations, then asks GPT‑4o for keywords and phrasing to hit “1990s Seattle grunge” and progressively refines until the visuals become camcorder-like and grimy.
Constraints (short clips, limited durations) can increase creativity.
Both hosts note current tool limits (e.g., clips of roughly 7–15 seconds), and Anish argues that such constraints resemble the sampling limits of early hip-hop, which led to more inventive recombination.
Minimal prompts often work better than over-specification.
Anish repeatedly uses very short prompts (e.g., “Man singing on Tiny Desk”), arguing that leaving room for the model can yield more natural and surprising outputs than tightly constrained instructions.
Audio manipulation is a major unlock for remix culture workflows.
He emphasizes stem separation (e.g., Demucs) and layering: using live cover-band instrumentation while overlaying extracted original vocals to approximate a live “Tiny Desk” feel.
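The layering step can be illustrated with a minimal sketch. It assumes the stems have already been separated (for example with Demucs) and decoded into lists of float samples in the range [-1.0, 1.0] at the same sample rate. The function `overlay_vocals` and its parameters are hypothetical names for illustration, not part of Demucs or any tool used in the episode.

```python
from typing import List


def overlay_vocals(instrumental: List[float],
                   vocals: List[float],
                   vocal_gain: float = 1.0,
                   offset: int = 0) -> List[float]:
    """Mix an extracted vocal stem over an instrumental track, sample by sample.

    offset is the number of samples to delay the vocals, which helps line
    them up with a live recording that starts at a different point.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")

    length = max(len(instrumental), offset + len(vocals))
    mixed = [0.0] * length

    # Start from the instrumental, then add the (scaled) vocals on top.
    for i, sample in enumerate(instrumental):
        mixed[i] = sample
    for i, sample in enumerate(vocals):
        mixed[offset + i] += vocal_gain * sample

    # Hard-clip to the valid range so the sum cannot overflow on export.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

In practice an audio editor does this mixing; the sketch only shows why lowering `vocal_gain` or shifting `offset` changes how the original vocals sit on top of the cover-band instrumentation.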
WORDS WORTH SAVING
5 quotes
It’s like the most creative satisfaction I’ve had in maybe my whole life.
— Anish Acharya
AI is just the next manifestation of sampling.
— Anish Acharya
We forget that this would be witchcraft three years ago.
— Anish Acharya
Something like this makes me almost want to cry… it always felt so inaccessible to get these amazing ideas… into a thing.
— Claire Vo
Abandon the branch and start over—because you didn’t actually do any work.
— Anish Acharya
High quality AI-generated summary created from speaker-labeled transcript.