No Priors

No Priors Ep. 64 | With Suno CEO and Co-Founder Mikey Shulman

Mikey Shulman, the CEO and co-founder of Suno, can see a future where the Venn diagram of music creators and consumers becomes one big circle. Suno, an AI music generation tool that aims to democratize music, has been making waves in the AI community since it came out of stealth last year. Users can make a complete song, lyrics included, just by entering a text prompt such as “koto boom bap lofi intricate beats.” You can hear it in action as Mikey, Sarah, and Elad create a song live in this episode. Elad, Sarah, and Mikey also talk about how the Suno team took its experience building a transcription tool and applied it to music generation, how the team evaluates aesthetics and taste when there is no standardized test you can give an AI music model, and why Mikey doesn’t think AI-generated music will affect people’s consumption of human-made music.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @MikeyShulman

Show Notes:
0:00 Mikey’s background
3:48 Bark and music generation
5:33 Architecture for music generation AI
6:57 Assessing music quality
8:20 Mikey’s music background as an asset
10:02 Challenges in generative music AI
11:30 Business model
14:38 Surprising use cases of Suno
18:43 Creating a song on Suno live
21:44 Ratio of creators to consumers
25:00 The digitization of music
27:20 Mikey’s favorite song on Suno
29:35 Suno is hiring

Sarah Guo (host), Mikey Shulman (guest), Elad Gil (host)
May 16, 2024 | 30 min

CHAPTERS

  1. 0:00 – 1:03

    Suno in 60 seconds: text prompts to full songs (with lyrics)

    Sarah introduces Mikey Shulman and Suno’s goal of democratizing music creation. She plays an example generation to set the bar for audio quality and establish what “music generation” means in practice.

    • Suno generates complete songs from text prompts
    • Early example highlights lyrics + production quality
    • Company context: under two years old, emerged from stealth recently
  2. 1:03 – 2:32

    From childhood bands to Harvard physics PhD to AI startups

    Mikey traces his path from early piano and playing in bands to pursuing physics, then pivoting into software and machine learning. He explains how practical engineering skill (not theoretical brilliance) shaped his trajectory.

    • Started piano at age four; played in bands but didn’t see himself as a top performer
    • PhD in physics focused on quantum computing, with heavy emphasis on low-temperature and microwave engineering
    • No rigid master plan; followed interests and opportunities
  3. 2:32 – 3:47

    Kensho and the accidental on-ramp into machine learning

    He describes joining Kensho as a software engineer at an opportune moment, just as ML roles were emerging. That timing let him build teams and products and, ultimately, go through Kensho’s acquisition by S&P Global.

    • Joined Kensho very early; strong team chemistry was the draw
    • 2014-era ML hiring: a physics PhD was enough to get hired as an ML engineer
    • Built products and teams; Kensho acquired by S&P Global in 2018
  4. 3:47 – 5:09

    Why Bark came first: falling in love with audio via earnings-call transcription

    Suno’s origin story begins with a practical enterprise audio task—transcribing earnings calls. That project revealed how underdeveloped audio AI was compared to text and images, motivating broader exploration and the Bark open-source release.

    • First serious audio work: automating painful manual earnings-call transcription
    • Audio lagged behind text/images (especially apparent since 2020)
    • Bark was released as open source even as the team was already aiming beyond speech
  5. 5:09 – 5:35

    Choosing music over speech: creativity, emotion, and “how it makes you feel”

    Mikey explains why Suno didn’t become a speech company despite conventional advice. Speech often optimizes for correctness, while music optimizes for emotional impact—exactly the creative frontier they wanted.

    • Many advised building a straightforward speech B2B company
    • Speech succeeds even if slightly robotic; music demands expressiveness
    • Music is evaluated by feeling, not factual correctness
  6. 5:35 – 6:56

    Technical approach: transformers + the hard problem of audio tokenization

    Elad probes the modeling stack, and Mikey shares that Suno uses transformer-based models. The key innovation area is converting continuous audio into manageable discrete tokens suitable for next-token prediction and scaling.

    • Core architecture: transformers (benefiting from text-community advances)
    • Scaling laws and infrastructure learnings transfer from LLMs
    • Big challenge: tokenizing continuous, high-rate audio into discrete representations
  7. 6:56 – 10:05

    Measuring quality when benchmarks fail: “aesthetics matter” and human listening

    Quality assessment in generative music relies heavily on human ears rather than tidy metrics. Mikey describes heavy use of listening, A/B tests, and the idea that optimizing for the wrong benchmark can miss what listeners actually value.

    • Audio benchmarks are immature; metrics can mislead
    • “Aesthetics matter”: subjective listening is central to evaluation
    • Team-wide listening and A/B testing become core workflow—and surprisingly fun
  8. 10:05 – 11:34

    What’s hardest about AI music: modeling human emotion across cultures and contexts

    Mikey contrasts easy-to-describe technical issues (bitrate, stereo) with the real difficulty: evoking emotion. Musical preference is culturally and demographically dependent, making “objective truth” and clean eval targets elusive.

    • Technical fidelity is solvable; emotional resonance is the frontier
    • Preferences vary by culture, age, and context
    • No analog to standardized tests (e.g., the bar exam or LSAT) for music models
  9. 11:34 – 14:38

    Business and product strategy: consumer-first, pricing early, and uncertain long-term models

    Mikey frames Suno as a mass consumer product rather than an add-on for professional DAWs. He defends charging early to learn what users truly value, while acknowledging that generative AI pricing models may evolve beyond SaaS subscriptions.

    • Target audience is everyone (not just prosumers using Ableton/Logic/Pro Tools)
    • Free tier + subscription helps discover willingness-to-pay and value drivers
    • Long-term monetization may shift (subscriptions vs ads vs microtransactions/marketplaces)
  10. 14:38 – 18:40

    Surprising user behaviors: creativity, sharing, and hacked “multiplayer” collaboration

    Unexpected use cases center on people enjoying the creation process itself and inventing collaborative workflows. Mikey compares it to gaming—fun solo, even better with others—and notes the social and identity aspects of sharing songs.

    • Creation becomes an enjoyable “journey,” not just a painful path to a finished track
    • Users co-write lyrics, trade verses/choruses, and iterate socially
    • Identity signaling emerges: users add their names to song titles when their songs start trending
  11. 18:40 – 21:28

    Live demo: generating a ‘No Priors’ song (phonk/reggae/Hawaiian R&B + sitar)

    The hosts prompt Mikey to create a song live, choosing quirky genre blends and instruments. The generated track plays, sparking discussion about making it the show’s intro and imagining visuals for the synthetic “artist.”

    • Prompt engineering via genre hybridization and instrument constraints
    • Demonstrates rapid end-to-end generation (lyrics + music + vocals)
    • Highlights playful, participatory workflow and shareable outcomes
  12. 21:28 – 22:47

    Creator vs consumer ratio—and why that line may disappear

    Sarah and Mikey explore how generative tools could reshape the asymmetry between creators and listeners. Mikey predicts new modes like “micro-sharing” (songs like selfies) and a future where creation and consumption blend into one continuum.

    • Micro-sharing: songs made for small groups to capture moments
    • Generative tools push more users toward making and remixing
    • Future framing shifts from “create vs consume” to “engage with music”
  13. 22:47 – 27:13

    Where music goes next: faster cultural evolution, new structures, and ‘untiktokifying’ listening

    Mikey offers a forward-looking view: more time and money spent on music, without replacing attachment to human artists. He argues AI can accelerate cultural change and unlock novelty not only in sounds but also in melodies, chords, and song structure.

    • Engagement with favorite artists likely deepens, not declines
    • Like DAWs, AI may accelerate how quickly styles and culture evolve
    • Beyond new sounds: new chord changes/structures; hope for longer-form listening
  14. 27:13 – 30:15

    Showcase favorite + the ‘it’s all sound’ insight, then Suno hiring

    Mikey plays a favorite example from Suno’s showcase and emphasizes how striking it is that vocals, music, and lyrics can be fully synthetic. The episode closes with a hiring plug and where to reach Suno.

    • Featured showcase track demonstrates convincing synthetic vocals and songwriting
    • The model doesn’t explicitly “know” what a voice is; it learns sound patterns end-to-end
    • Hiring in Cambridge MA and New York; careers@suno.com
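Chapter 6 (5:35 – 6:56) names tokenization, turning continuous, high-rate audio into discrete tokens a transformer can predict one at a time, as the hard part. Mikey does not describe Suno’s actual tokenizer in the episode, so the Python below is only a minimal sketch of the general idea using a swapped-in classic technique: 8-bit mu-law companding and quantization, as popularized by WaveNet. The function names, the mu = 255 setting, and the 44.1 kHz / 440 Hz test signal are illustrative assumptions, not anything from Suno. The sketch also shows why rate matters: one token per sample means 44,100 tokens for a single second of mono audio, which is one reason many recent audio models use learned neural codecs that emit far fewer tokens per second.

```python
import math

MU = 255  # 8-bit mu-law: 256 possible tokens per sample (assumed setting)


def mu_law_encode(samples, mu=MU):
    """Map float samples in [-1, 1] to integer tokens in {0, ..., mu}."""
    scale = math.log1p(mu)
    tokens = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range samples
        # Compand: logarithmic compression gives quiet sounds finer resolution.
        y = math.copysign(math.log1p(mu * abs(x)) / scale, x)
        # Quantize the companded value from [-1, 1] onto mu + 1 integer levels.
        tokens.append(int(round((y + 1) / 2 * mu)))
    return tokens


def mu_law_decode(tokens, mu=MU):
    """Invert mu_law_encode, up to quantization error."""
    samples = []
    for t in tokens:
        y = 2 * t / mu - 1  # back to the companded range [-1, 1]
        samples.append(math.copysign(((1 + mu) ** abs(y) - 1) / mu, y))
    return samples


if __name__ == "__main__":
    sample_rate = 44_100  # CD-quality rate, assumed for illustration
    # One second of a 440 Hz sine wave at half amplitude (hypothetical input).
    wave = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(sample_rate)]
    tokens = mu_law_encode(wave)
    print("tokens for one second of mono audio:", len(tokens))  # 44100
    recon = mu_law_decode(tokens)
    err = max(abs(a - b) for a, b in zip(wave, recon))
    print(f"max reconstruction error: {err:.4f}")
```

Sample-level tokens like these are cheap to compute and easy to invert, but they produce very long sequences. Learned codecs pack many samples into each token, shortening sequences at the cost of a trained encoder and decoder.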
