Inside image generation’s Renaissance moment — the OpenAI Podcast Ep. 19

People are generating over 1.5 billion images a week in ChatGPT. In this episode, product lead Adele Li and researcher Kenji Hata share some of the new use cases and trends since the launch of Images 2.0. Together with host Andrew Mayne, they trace the progress from the early DALL-E days and dive into the latest capabilities, including better text rendering, photorealism, multilingual support, world knowledge, aspect ratios, and character consistency. They also explore what comes next as image generation models evolve into more capable creative assistants.

Chapters
00:36 How Adele and Kenji came to work on Images
02:27 Images 2.0 launch reception
05:25 Productivity use cases and 360° images
09:34 Viral trends, authenticity, and imperfection
10:51 Training breakthroughs and photorealism
14:06 Evals, prompting, and creative control
22:16 Creative agents and what comes next
22:27 Images + Codex
28:08 Prompt tips

Andrew Mayne (host), Adele Li (guest), Kenji Hata (guest)

May 14, 2026 · 29 min

WHAT IT’S REALLY ABOUT

Image Gen 2.0’s leap: photorealism, text, productivity, and creative agents

Image Gen 2.0 is positioned as a major capability jump driven by better text rendering, multilingual performance, and much stronger photorealism.
Adoption surged after launch: usage rose more than 50% in the first two weeks, and users now generate more than 1.5 billion images weekly. Global viral trends, such as crayon and MS Paint-style outputs, emerged alongside this growth.
The team emphasizes a shift from “fun” generations to productivity workflows, especially infographics, slide visuals, educational diagrams, and professional marketing assets in precise aspect ratios.
Development relied on iterative training plus post-training focused on “taste,” realism, and user resonance, supported by practical evals like object-grid accuracy and personal-photo realism tests.
Future direction centers on “creative agents” that understand user preferences and workflows, and on deeper integration with coding tools like Codex to go from design concepts to working apps.

IDEAS WORTH REMEMBERING


Image Gen 2.0’s headline upgrade is reliable text-in-image.

They highlight that words are now legible and meaningful, enabling infographics, slides, diagrams, and other “productive” visual communication that earlier models struggled to render cleanly.

Multilingual capability is a core product pillar, not an afterthought.

Adele frames multilingual performance as a deliberate focus, with early resonance observed across regions (e.g., Asia and Europe) as users prompt in diverse languages.

Photorealism improvements target trust and self-representation.

Feedback that prior models altered people's faces and bodies informed a push toward images that "feel like yourself," moving away from glossy, idealized outputs and toward more authentic photographic realism.

Aspect-ratio freedom unlocked new formats like panoramas and 360° scenes.

Support for arbitrary aspect ratios led to emergent user behavior (ultra-wide panoramas, “skinny bookmark” images), which then became a productized 360° viewing feature on web and mobile.

Viral ‘imperfect’ aesthetics reveal a demand for authenticity.

Trends like MS Paint, crayons, and scribbles show users actively seeking nostalgia and imperfection—ironically requiring a smarter model to execute intentional “jank” convincingly.

WORDS WORTH SAVING


If DALL-E was the Stone Ages, Image Gen 2.0 is the Renaissance.

— Adele Li

In the last two weeks since we launched the model, usage is up more than 50%. More than 1.5 billion images are generated every week on ChatGPT.

— Adele Li

It's funny 'cause it takes a lot of intelligence to actually create something that is imperfect.

— Adele Li

You know, as it's training, we take a checkpoint, and then like- we just sample from it, right? And just see, okay, how good is this thing? And I think, like, we just sampled a checkpoint, a model, uh, an image, and we looked at it and we're like, "All right, this is better than Images 1."

— Kenji Hata

The model's understanding of not only what to say, but how to present it, is a superpower.

— Adele Li

Topics: launch reception and usage growth; text rendering and layout fidelity; multilingual image generation; photorealism and identity preservation; panoramas and 360° image viewing; evals/benchmarks and post-training for "taste"; creative agents and Images + Codex workflows; prompting practices (open-ended goals + style grounding)

High-quality AI-generated summary created from a speaker-labeled transcript.
