Y Combinator
Building The World's Best Image Diffusion Model
Episode Details
EPISODE INFO
- Released: September 19, 2024
- Duration: 55m
- Channel: Y Combinator
EPISODE DESCRIPTION
Suhail Doshi, a YC alumnus who previously founded Mixpanel and Mighty, has created a state-of-the-art (SOTA) AI image diffusion model with Playground. The app lets you talk to it as you would to a graphic designer and helps you create imagery and text for a wide variety of use cases. In this episode of Lightcone, Suhail sits down with the hosts to talk about his experience building Playground with his team and what it takes to make a SOTA model.
Try Playground: https://playground.com/design
Read Playground V3 Paper: https://arxiv.org/pdf/2409.10695
Chapters (Powered by https://bit.ly/chapterme-yc)
- 0:00 Intro
- 1:07 What is Playground?
- 1:47 What Garry was able to make using Playground
- 7:04 The focus on text accuracy
- 10:44 Building a marketplace for Playground
- 16:00 Prompts are like HTML for graphics
- 22:25 Creating new design professions
- 26:13 Using tailwinds of what is happening in language
- 30:06 Problems with aesthetics evals
- 32:42 The commercial applications
- 33:54 When the users you get are not the users you want
- 40:30 Reflections on going through YC twice
- 48:30 Running a research lab/startup hybrid vs a pure startup
- 53:35 What it takes to make a state-of-the-art model
- 55:09 Outro
SPEAKERS
- Suhail Doshi (guest)
- Garry Tan (host)
- Harj Taggar (host)
- Jared Friedman (host)
EPISODE SUMMARY
In this Y Combinator episode, featuring Suhail Doshi and Garry Tan, "Building The World's Best Image Diffusion Model" explores how Playground reinvents image generation as a true graphic design partner. The conversation centers on Playground v3, a state-of-the-art image diffusion model and design product optimized for real-world graphic design tasks rather than for use as an artistic toy. Founder Suhail Doshi explains how the team rebuilt the entire stack (architecture, captioning, UX, and marketplace) to improve text accuracy, prompt understanding, and designer-like interaction. Doshi and the hosts emphasize the shift away from raw model access and prompt engineering toward visual templates, natural-language edits, and a creator ecosystem. Alongside the technical details, Doshi shares strategic lessons on choosing users, pivoting from failed directions, and marrying research rigor with product usefulness.