Inside image generation’s Renaissance moment — the OpenAI Podcast Ep. 19

People are generating over 1.5 billion images a week in ChatGPT. In this episode, product lead Adele Li and researcher Kenji Hata share new use cases and trends that have emerged since the launch of Images 2.0. With host Andrew Mayne, they trace the progress from the early DALL·E days and dive into the latest capabilities, including better text rendering, photorealism, multilingual support, world knowledge, flexible aspect ratios, and character consistency. They also explore what comes next as image generation models evolve into more capable creative assistants.

Chapters
00:36 How Adele and Kenji came to work on Images
02:27 Images 2.0 launch reception
05:25 Productivity use cases and 360 images
09:34 Viral trends, authenticity, and imperfection
10:51 Training breakthroughs and photorealism
14:06 Evals, prompting, and creative control
22:16 Creative agents and what comes next
22:27 Images + Codex
28:08 Prompt tips

Andrew Mayne (host), Adele Li (guest), Kenji Hata (guest)

May 14, 2026 · 29 min

CHAPTERS

0:00 – 2:27
Meet the team behind Images 2.0: Adele’s product path and Kenji’s research journey
Andrew kicks off by introducing Images 2.0 and the guests: product lead Adele Li and researcher Kenji Hata. Adele explains her transition from investing and infrastructure work into product, while Kenji shares how he moved from an audio project to working full-time on image generation.
- Episode framing: why Images 2.0 is a major leap
- Adele’s background in private equity and venture, and her shift into product leadership
- How a changing market landscape shaped the Image Gen role inside ChatGPT
- Kenji’s progression from helping early on Images 1.0 to becoming a core researcher
2:27 – 3:16
Launch reception: usage surge, 1.5B images/week, and global viral trends
The conversation turns to how Images 2.0 has landed with users since launch. Adele cites rapid growth in usage and highlights viral trends across regions, emphasizing how quickly people can perceive quality gains in a visual medium.
- Usage up 50%+ since launch; ~1.5B images generated weekly
- Viral trends: color analysis and stickers in Asia; crayon and scribble styles in the US
- Users immediately recognize jumps in fidelity and aesthetics
- Wide range of uses, from playful memes to professional visuals
3:16 – 5:25
What changed in Images 2.0: text rendering, multilingual output, and stronger photorealism
Andrew asks what made the new capabilities feel like a ‘new paradigm.’ Adele outlines the main step-change targets: reliable text rendering, multilingual support, and photorealistic outputs that more faithfully preserve a person’s identity.
- Mandate: broaden what image generation can be used for across domains
- Improved text rendering: legible, meaningful words and layout
- Multilingual prompts and outputs that resonate globally
- Photorealism and better depiction of people (faces and bodies)
- Broader ‘world understanding’ to support trustworthy visuals
5:25 – 7:30
From fun to productivity: infographics, dense layouts, and the ‘100 objects’ consistency test
Kenji describes a shift toward productivity use cases, especially for text-heavy images like infographics. He explains an internal evaluation that measures binding and compositional accuracy by requesting large grids or lists of many objects, showing steady progress across model versions.
- Productivity use cases on the rise (infographics, text-first visuals)
- Text-in-image as an enabling capability for real work outputs
- Internal eval: generate 100 random objects and measure how many come out correct
- Progression: DALL·E 3 → Images 1/1.5 → Images 2.0, with steady gains in binding and capacity
7:30 – 9:33
360° images and aspect-ratio freedom: panoramas become a product feature
Adele explains how one model capability, generating images at any aspect ratio, sparked unexpected user behavior, including long panoramas and 360-style images. The team turned this emergent behavior into a viewing feature on web and mobile, and Andrew shares his ‘dogs playing poker’ experiment.
- The model can render images in arbitrary aspect ratios
- Users discovered ultra-wide panoramas and ‘skinny bookmark’ formats
- 360-style generations emerged and became a built-in viewing experience
- Unexpected delight: immersive scenes the team hadn’t anticipated
9:33 – 10:51
Viral ‘imperfect’ aesthetics: MS Paint, crayons, nostalgia, and authenticity
The group unpacks why people use a powerful generator to create intentionally low-fi images. Adele argues that producing ‘imperfect’ results can require intelligence, reflecting a broader user desire for authenticity, nostalgia, and playful self-expression.
- Trend: generating intentionally janky, lo-fi MS Paint-style images
- Themes: authenticity, imperfection, and nostalgia in AI aesthetics
- Self-expression: using AI to look good while staying playful or goofy
- The paradox: ‘imperfection’ can be harder to produce than polished outputs
10:51 – 13:11
Training ‘wow’ moments and the path to fast photorealism
Kenji describes sampling training checkpoints and the moment a checkpoint clearly surpassed Images 1. Andrew probes how the model improved photorealism while staying fast, leading Kenji to discuss lessons learned over successive iterations and efficiency work such as improving token efficiency.
- Checkpoint sampling as a practical ‘is it better yet?’ signal
- Photorealism jump: from glossy, idealized images to believable photography
- Speed improvements alongside quality gains (compared with the delays of early DALL·E)
- Engineering focus: token efficiency and lessons carried across releases
13:11 – 14:05
Post-training for taste: balancing world knowledge, realism, and what users find ‘beautiful’
Adele details the challenge of post-training not just for correctness but for aesthetic preference: what ‘looks good’ to users across many contexts. The team aimed to make Images 2.0 the strongest model aesthetically, across both professional and personal outputs.
- Post-training focus: taste, beauty, realism, and user resonance
- Balancing scientific and technical understanding with aesthetics
- Goal: strong creativity across very different output types
- Breadth of training and use cases as both a core difficulty and a differentiator
14:05 – 16:32
Evals that mattered: Adele’s ‘me-me-me’ personalization test and research photorealism staples
They compare evaluation styles: Adele’s personal photo-based eval checks identity consistency and contextual personalization, while Kenji references standard photorealism tests used by researchers. Andrew adds classic ‘hard’ prompts (watches, clocks, wine-glass fill levels) that reveal model weaknesses.
- Adele’s ‘me-me-me eval’: 100 personal photos used to test relevance and faces
- Testing whether ChatGPT understands personal context (family, preferences)
- Research defaults: standardized photorealism prompts (e.g., an orange juice jug)
- Classic compositional challenges: hands, watches, clocks, liquids
16:32 – 19:35
Prompting and creative control: vague requests, artist workflows, and using reference images
Adele explains that many users prompt vaguely (‘make it better/cuter’), so the model and harness must infer intent. They discuss how artists and designers often get superior results through precise creative language and judgment, and Adele recommends uploading inspiration to guide style.
- Most common prompts are underspecified; the system must infer intent
- Two prompting styles: ‘prompt engineers’ vs. artist/designer language
- Creative judgment and taste give more control as models understand more concepts
- Practical technique: upload references or inspiration for better stylistic grounding
19:35 – 22:16
Education and research impact: accurate textbook visuals, personalized learning, and better communication
Kenji shares examples from an internal educator alpha channel, including a professor reporting highly accurate textbook-style renderings. Adele highlights how Image Gen can compress complex ideas into understandable visuals and how it’s already changing internal communication with image-heavy slides.
- Educator alpha channel spans elementary through graduate-level needs
- Example: a biology professor validates the accuracy of graduate-level diagrams
- Personalized learning: tailoring visuals to language and preference
- Internal adoption: 50%+ of slides in internal presentations created with Image Gen
- Future direction: better composition and editability in the product
22:16 – 28:09
What’s next: creative agents, professional workflows, and the Images + Codex loop
Adele outlines a vision for ‘creative agents’ that learn preferences and act like specialized assistants (designer, architect, wedding planner). The discussion then connects image generation to Codex: using images to prototype websites/apps and letting coding agents implement designs, including sprite generation workflows.
- Next stage: a creative agent that collaborates and learns your taste
- Professional use cases: social media headers, real estate staging, thumbnails
- Images + Codex: rapid app and website prototyping from visual concepts
- Game assets: sprite sheets, world-building, and multi-image consistency
- Emergent capability: consistent characters and styles across comics and slides
28:09 – 29:22
Parting prompt tips: use Image Gen Thinking, stay open-ended, and specify clean styles
They close with actionable prompting guidance. Adele recommends using Image Gen Thinking/Pro for tool use (web, files) and advises open-ended prompts plus aesthetic grounding, while Kenji suggests being explicit about preferred style, such as minimalist, clean infographics, to avoid overly dense outputs.
- Try Image Gen Thinking/Pro for higher-quality composition via tools
- Prompting strategy: be open-ended and let the model explore and reason
- Add style grounding (aesthetic references) for more predictable results
- Be specific about preferences (e.g., minimalist, clean layouts)
