No Priors Ep. 64 | With Suno CEO and Co-Founder Mikey Shulman
At a glance
WHAT IT’S REALLY ABOUT
Suno’s CEO on Democratizing Music Creation With AI-Generated Songs
- Suno CEO and co-founder Mikey Shulman discusses how Suno uses transformer-based AI models to generate complete songs—including lyrics, vocals, and instrumentation—from simple text prompts, with the goal of making music creation accessible to everyone. He explains why the team chose to focus on music rather than speech, emphasizing that quality is ultimately judged by human emotion and aesthetics, not standard AI benchmarks. The conversation covers Suno’s technical approach to tokenizing audio, emerging user behaviors around collaborative creation, and the potential impact on how people create, share, and experience music. Shulman predicts that AI tools will expand participation in music, accelerate cultural evolution in sound and song structure, and blur the line between creators and consumers.
IDEAS WORTH REMEMBERING
Aesthetics and emotional impact matter more than traditional AI benchmarks in music.
Unlike text models that optimize for test scores or factual accuracy, Suno evaluates success by how music feels to listeners, relying heavily on human listening, A/B tests, and taste rather than standardized metrics.
Transformers work well for audio, but smart tokenization is the real innovation lever.
Suno uses transformer architectures familiar from text AI, and focuses its R&D on turning continuous, high-sample-rate audio into discrete tokens in ways that preserve nuance and musicality.
Avoiding hard-coded music theory enables more novel and unexpected sounds.
Shulman emphasizes that they deliberately do not encode rules like “12 tones” or fixed instrument sets, instead letting the model learn structures implicitly via next-token prediction, opening space for new timbres and hybrids.
AI music tools can turn music creation into a mainstream, social activity.
Users aren’t just output-focused; they enjoy the creative process itself, co-writing lyrics, trading prompts, and effectively “jamming” with friends and the model, echoing the joy of live jam sessions.
The line between creator and consumer is likely to blur significantly.
Shulman expects future experiences where listening and modifying songs blend together, making “creation vs. consumption” a less meaningful distinction as people interact more actively with music.
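To make the tokenization idea above concrete (turning continuous audio into discrete tokens a transformer can predict one at a time), here is a minimal sketch using mu-law companding. This is a classic, much simpler quantization scheme swapped in purely for illustration; the summary does not describe Suno's actual tokenizer, and the function names, the 256-token vocabulary, and the 16 kHz sample rate are all assumptions.

```python
import math

MU = 255  # assumed vocabulary of 256 tokens (0..255); not a figure from the episode


def mu_law_encode(sample: float, mu: int = MU) -> int:
    """Map one audio sample in [-1.0, 1.0] to an integer token in [0, mu]."""
    sample = max(-1.0, min(1.0, sample))  # clip out-of-range input
    # Logarithmic compression keeps more resolution for quiet sounds.
    compressed = math.copysign(
        math.log1p(mu * abs(sample)) / math.log1p(mu), sample
    )
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(token: int, mu: int = MU) -> float:
    """Approximately invert mu_law_encode, returning a sample in [-1.0, 1.0]."""
    compressed = 2.0 * token / mu - 1.0
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed
    )


def tokenize(samples: list[float]) -> list[int]:
    """Turn a waveform into the kind of token sequence a model could be trained on."""
    return [mu_law_encode(s) for s in samples]


if __name__ == "__main__":
    # Hypothetical input: a few samples of a 440 Hz sine wave at 16 kHz.
    sample_rate = 16_000
    wave = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(8)]
    tokens = tokenize(wave)
    print(tokens)
    print([round(mu_law_decode(t), 3) for t in tokens])
```

One token per sample would give very long sequences at music-quality sample rates, which is one reason the choice of tokenization matters so much. Note also that nothing in this scheme encodes notes, instruments, or voices: the tokens describe only sound, in line with Shulman's point about not hard-coding music theory.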
WORDS WORTH SAVING
Speech just needs to be right… and the real creativity was happening in a totally different part of audio, which is music.
— Mikey Shulman
Aesthetics matter… you have to use your ears in order to evaluate things.
— Mikey Shulman
The model shouldn’t know about music theory… If I tell my model, ‘There are only 12 tones,’ my model will only know how to output 12 tones.
— Mikey Shulman
Like a video game, music is fun by yourself and maybe more fun in multiplayer mode.
— Mikey Shulman
The machine doesn’t know that there is even a concept of voice… it’s just all sound.
— Mikey Shulman
High quality AI-generated summary created from speaker-labeled transcript.