No Priors Ep. 64 | With Suno CEO and Co-Founder Mikey Shulman
CHAPTERS
- 0:00 – 1:03
Suno in 60 seconds: text prompts to full songs (with lyrics)
Sarah introduces Mikey Shulman and Suno’s goal of democratizing music creation. She plays an example song generated by Suno to set the bar for audio quality and show what “music generation” means in practice.
  - Suno generates complete songs from text prompts
  - The opening example highlights both the lyrics and the production quality
  - Company context: under two years old and only recently out of stealth
- 1:03 – 2:32
From childhood bands to Harvard physics PhD to AI startups
Mikey traces his path from early piano and playing in bands to pursuing physics, then pivoting into software and machine learning. He explains how practical engineering skill (not theoretical brilliance) shaped his trajectory.
  - Started piano at age four; played in bands but didn’t see himself as a top performer
  - PhD in physics focused on quantum computing, with heavy emphasis on low-temperature and microwave engineering
  - No rigid master plan; followed interests and opportunities
- 2:32 – 3:47
Kensho and the accidental on-ramp into machine learning
He describes joining Kensho as a software engineer at an opportune moment, just as ML roles were emerging. That timing let him build teams and products and, ultimately, see Kensho through its acquisition by S&P Global.
  - Joined Kensho very early; strong team chemistry was the draw
  - 2014-era ML hiring: a physics PhD was enough to land an ML engineering role
  - Built products and teams; Kensho was acquired by S&P Global in 2018
- 3:47 – 5:09
Why Bark came first: falling in love with audio via earnings-call transcription
Suno’s origin story begins with a practical enterprise audio task: transcribing earnings calls. That project revealed how underdeveloped audio AI was compared with text and images, which motivated broader exploration and the open-source release of Bark.
  - First serious audio work: automating painful, manual earnings-call transcription
  - Audio lagged behind text and images, a gap especially apparent since 2020
  - Bark was released as open source while the team was already aiming beyond speech
- 5:09 – 5:35
Choosing music over speech: creativity, emotion, and “how it makes you feel”
Mikey explains why Suno didn’t become a speech company despite conventional advice. Speech usually optimizes for correctness, while music optimizes for emotional impact, which is exactly the creative frontier the team wanted.
  - Many advised building a straightforward B2B speech company
  - Speech succeeds even if slightly robotic; music demands expressiveness
  - Music is judged by how it makes you feel, not by factual correctness
- 5:35 – 6:56
Technical approach: transformers + the hard problem of audio tokenization
Elad probes the modeling stack, and Mikey shares that Suno uses transformer-based models. The key innovation area is converting continuous audio into manageable discrete tokens suitable for next-token prediction and scaling.
  - Core architecture: transformers (benefiting from advances in the text community)
  - Scaling laws and infrastructure learnings transfer from LLMs
  - Big challenge: tokenizing continuous, high-sample-rate audio into discrete representations
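The episode does not describe Suno’s tokenizer, so the sketch below is only a toy illustration of the general idea of turning continuous audio into discrete tokens for next-token prediction. It uses 8-bit mu-law quantization (a classic companding scheme also used by sample-level models such as WaveNet), not anything confirmed for Suno, and the function names are hypothetical. Each sample becomes one of 256 integer tokens, which also makes the “high-rate” problem concrete: at 44.1 kHz, ten seconds of mono audio is already 441,000 tokens, which is why practical systems typically compress audio further (for example with learned neural codecs) before a transformer models it.

```python
import math

MU = 255  # 8-bit mu-law: 256 possible token ids (0..255)


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Map one audio sample in [-1, 1] to an integer token in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range samples
    # Companding: the log curve gives quiet (small-amplitude) samples finer resolution
    f = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] onto [0, mu] and round to the nearest integer
    return int(math.floor((f + 1) / 2 * mu + 0.5))


def mu_law_decode(token: int, mu: int = MU) -> float:
    """Approximate inverse of mu_law_encode (lossy: quantization error remains)."""
    y = 2 * token / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def tokenize(samples: list[float], mu: int = MU) -> list[int]:
    """Turn a waveform into a sequence of discrete tokens, one per sample."""
    return [mu_law_encode(s, mu) for s in samples]


if __name__ == "__main__":
    sample_rate = 44_100  # CD-quality sample rate (Hz)
    # First few samples of a 440 Hz sine at half amplitude
    wave = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(8)]
    tokens = tokenize(wave)
    print(tokens)
    print([round(mu_law_decode(t), 3) for t in tokens])
    # One token per sample means sequence length grows with the sample rate
    print(10 * sample_rate, "tokens for 10 s of mono audio")
```

The design choice here is deliberately simple: one token per sample keeps the mapping easy to follow but produces very long sequences, which is exactly the trade-off a compressed, learned tokenizer is meant to improve.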
- 6:56 – 10:05
Measuring quality when benchmarks fail: “aesthetics matter” and human listening
Quality assessment in generative music relies heavily on human ears rather than tidy metrics. Mikey describes a workflow built on extensive listening and A/B tests, and warns that optimizing for the wrong benchmark can miss what listeners actually value.
  - Audio benchmarks are immature, and metrics can mislead
  - “Aesthetics matter”: subjective listening is central to evaluation
  - Team-wide listening and A/B testing become a core workflow, and a surprisingly fun one
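The episode does not say how Suno analyzes its A/B listening tests. As one hedged illustration, a pairwise test can be summarized by the share of listeners preferring version A plus an exact two-sided sign (binomial) test against a 50/50 split. The vote labels (“A”, “B”, “tie”) and the helper names below are assumptions made for this sketch.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test against a 50/50 preference split (ties excluded)."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no decisive votes: no evidence either way
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5): chance of a split at least this lopsided
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double for a two-sided test, capped at 1


def summarize_ab(votes: list[str]) -> dict:
    """Summarize pairwise listening votes labelled "A", "B", or "tie" (hypothetical labels)."""
    wins_a = votes.count("A")
    wins_b = votes.count("B")
    decisive = wins_a + wins_b  # ties carry no preference and are dropped
    return {
        "win_rate_a": wins_a / decisive if decisive else 0.5,
        "p_value": sign_test_p_value(wins_a, wins_b),
        "ties": votes.count("tie"),
    }


if __name__ == "__main__":
    votes = ["A"] * 8 + ["B"] * 2 + ["tie"] * 3
    print(summarize_ab(votes))
```

For example, 8 votes for A and 2 for B gives a win rate of 0.8 but a p-value of about 0.11, a reminder that a small listening panel can look decisive without being statistically convincing.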
- 10:05 – 11:34
What’s hardest about AI music: modeling human emotion across cultures and contexts
Mikey contrasts easy-to-describe technical issues (bitrate, stereo) with the real difficulty: evoking emotion. Musical preference is culturally and demographically dependent, making “objective truth” and clean eval targets elusive.
  - Technical fidelity is solvable; emotional resonance is the frontier
  - Preferences vary by culture, age, and context
  - No music equivalent of standardized tests (e.g., the bar exam or the LSAT) for evaluating models
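As a reference point for the “bitrate, stereo” side of this comparison (a standard CD-audio figure, not a number from the episode or a statement about Suno’s output format), uncompressed CD-quality stereo audio works out as:

```latex
44{,}100\ \tfrac{\text{samples}}{\text{s}} \times 16\ \tfrac{\text{bits}}{\text{sample}} \times 2\ \text{channels} = 1{,}411{,}200\ \tfrac{\text{bits}}{\text{s}} \approx 1.41\ \text{Mbit/s}
```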
- 11:34 – 14:38
Business and product strategy: consumer-first, pricing early, and uncertain long-term models
Mikey frames Suno as a mass consumer product rather than an add-on for professional DAWs. He defends charging early to learn what users truly value, while acknowledging that generative AI pricing models may evolve beyond SaaS subscriptions.
  - Target audience is everyone, not just prosumers working in DAWs like Ableton, Logic, or Pro Tools
  - A free tier plus a subscription helps reveal willingness to pay and what users actually value
  - Long-term monetization may shift (subscriptions vs. ads vs. microtransactions or marketplaces)
- 14:38 – 18:40
Surprising user behaviors: creativity, sharing, and hacked “multiplayer” collaboration
Unexpected use cases center on people enjoying the creation process itself and inventing collaborative workflows. Mikey compares it to gaming—fun solo, even better with others—and notes the social and identity aspects of sharing songs.
  - Creation becomes an enjoyable “journey,” not just a painful path to a finished track
  - Users co-write lyrics, trade verses and choruses, and iterate together
  - Identity signaling emerges: users add their names to song titles when their songs trend
- 18:40 – 21:28
Live demo: generating a ‘No Priors’ song (phonk/reggae/Hawaiian R&B + sitar)
The hosts prompt Mikey to create a song live, choosing quirky genre blends and instruments. The generated track plays, sparking discussion about making it the show’s intro and imagining visuals for the synthetic “artist.”
  - Prompt engineering via genre hybridization and instrument constraints
  - Demonstrates rapid end-to-end generation (lyrics + music + vocals)
  - Highlights a playful, participatory workflow and shareable results
- 21:28 – 22:47
Creator vs. consumer ratio, and why that line may disappear
Sarah and Mikey explore how generative tools could reshape the asymmetry between creators and listeners. Mikey predicts new modes like “micro-sharing” (songs shared the way people share selfies) and a future where creation and consumption blend into one continuum.
  - Micro-sharing: songs made for small groups to capture moments
  - Generative tools push more users toward making and remixing
  - Future framing shifts from “create vs. consume” to “engage with music”
- 22:47 – 27:13
Where music goes next: faster cultural evolution, new structures, and ‘untiktokifying’ listening
Mikey offers a forward-looking view: more time and money spent on music, without replacing attachment to human artists. He argues AI can accelerate cultural change and unlock novelty not only in sounds but also in melodies, chords, and song structure.
  - Engagement with favorite artists likely deepens rather than declines
  - Just as DAWs did, AI may accelerate how quickly styles and culture evolve
  - Beyond new sounds: new chord changes and song structures, plus hope for longer-form listening
- 27:13 – 30:15
Showcase favorite + the ‘it’s all sound’ insight, then Suno hiring
Mikey plays a favorite example from Suno’s showcase and emphasizes how striking it is that vocals, music, and lyrics can be fully synthetic. The episode closes with a hiring plug and where to reach Suno.
  - The featured showcase track demonstrates convincing synthetic vocals and songwriting
  - The model doesn’t explicitly “know” what a voice is; it learns sound patterns end to end
  - Hiring in Cambridge, MA, and New York: careers@suno.com