
Inside image generation’s Renaissance moment — the OpenAI Podcast Ep. 19

People are generating over 1.5 billion images a week in ChatGPT. In this episode, product lead Adele Li and researcher Kenji Hata share some of the new use cases and trends since the launch of Images 2.0. Together with host Andrew Mayne, they trace the progress from the early DALL-E days and dive into the latest capabilities, including better text rendering, photorealism, multilingual support, world knowledge, aspect ratios, and character consistency. They also explore what comes next as image generation models evolve into more capable creative assistants.

Chapters
00:36 How Adele and Kenji came to work on Images
02:27 Images 2.0 launch reception
05:25 Productivity use cases and 360 images
09:34 Viral trends, authenticity, and imperfection
10:51 Training breakthroughs and photorealism
14:06 Evals, prompting, and creative control
22:16 Creative agents and what comes next
22:27 Images + Codex
28:08 Prompt tips

Andrew Mayne (host) · Adele Li (guest) · Kenji Hata (guest)
May 14, 2026 · 29m

EVERY SPOKEN WORD

  1. 0:00–0:36

    Intro

    1. AM

      Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. On today's episode, we're talking about Images 2.0 with researcher Kenji Hata and product lead Adele Li. They'll discuss why the new model represents such a major leap forward, the evaluations that mattered most during development, and what people are creating with it now that it's widely available.

    2. AL

      [instrumental music] If DALL-E was the Stone Ages, Image Gen 2.0 is the Renaissance. It's not only great artistically and aesthetically, but it also incorporates, you know, science, art, architecture, all in one image.

    3. KH

      We looked at it and we're like, "All right, this is better than Images 1." [laughs]

  2. 0:36–2:27

    How Adele and Kenji came to work on Images

    1. AM

      A- Adele, tell me a little bit about how you became a product manager here.

    2. AL

      So I joined OpenAI a little over two years ago, and before OpenAI, I was an investor my entire career.

    3. AM

      Oh, wow.

    4. AL

      So I was in private equity and spent three years at Ve- Redpoint Ventures investing in AI and software companies. And when I first joined OpenAI, it was for a completely different role. I was thinking about, how do we build out our data and compute infrastructure? And over time, made my way over to the product side, and for the last six months have been working on Image Gen.

    5. AM

      It, it's, it's interesting how you saw yourself going from one role then finding yourself into this space here, which is kind of cool, you know, to think about the idea that you have this sort of, you know, ability to be useful in different ways.

    6. AL

      Absolutely, and I think the role of a product manager is just to do the job that needs to be done-

    7. AM

      Mm-hmm

    8. AL

      ... no matter what it is. And for Image Gen in particular, it's been really awesome to flex a lot of different muscles when it comes to, uh, building products, working with researchers like Kenji, but also thinking about, like, what is the gap in the market today that we wanna fill, and what is the opportunity that we wanna grasp here? It's not the same market that it was a year ago-

    9. AM

      Mm-hmm

    10. AL

      ... when we first released Image Gen 1.0. Now it's a very different landscape. There are multiple image generation makers out there. Um, and ChatGPT is a very different company and, uh, product itself too, and so, um, really thinking about the evolution of Image Gen and its role within ChatGPT has been really, really exciting to me.

    11. AM

      Kenji, how did you end up working on Images?

    12. KH

      Uh, actually, like, when I first started at OpenAI, I also started about two years ago.

    13. AM

      Mm-hmm.

    14. KH

      Um, I was working on, like, some random audio project initially. Just-

    15. AM

      Mm-hmm

    16. KH

      ... this was my first project, and then at the time, I just found my way just working on helping them work on, uh, Images 1.0, the-

    17. AM

      Mm-hmm

    18. KH

      ... prior to the launch. Um, and so gradually I moved more and more onto the project, and then I became full-time on it, basically.

  3. 2:27–5:25

    Images 2.0 launch reception

    1. AM

      What has the reception been like right now for the model?

    2. AL

      In the last two weeks since we launched the model, usage is up more than 50%. More than 1.5 billion images are generated every week on ChatGPT.

    3. AM

      Wow.

    4. AL

      And we've seen viral trends emerge across the world, um, all the way from trends in Asia for color analysis and stickers to US, where crayon and scribble are going viral. Um, but also a lot of people exploring emergent use cases, and I think it shows the dynamic range of the model but also how people are able to visually grasp the advancement of the model almost immediately. I think the visual, uh, communication and reaction that we've seen from our users, for them to say, "Hey, this is the best, highest fidelity, highest quality and aesthetic model that we've seen," has been really awesome.

    5. AM

      This felt like a really big shift, almost worthy of maybe not even being, uh, uh, Images 2, but almost, like, just a new paradigm because just the capabilities are through the roof. What made that possible?

    6. AL

      When we started working on this project, I think we sat down and we discussed, what is the step change of capability and use cases that we wanted to build towards? Um, and we believe that image generation has the ability to do so much more than-

    7. AM

      Mm-hmm

    8. AL

      ... it, what it does today. You could distill every single output, uh, or visual content that you see today into an image, and so that was the mandate that we sought out to improve. Um, and f- with this 2.0 model, we've improved on various different dimensions. Um, one is text rendering.

    9. AM

      Mm-hmm.

    10. AL

      The ability for text on a page is so much better fidelity. The language and words actually make sense, and they're actual words. Um, the second of all is multilingual.

    11. AM

      Mm-hmm.

    12. AL

      So we've really focused on making this model work in various different languages, and we're already seeing that people across the world in Asia and Europe are really resonating with these advancements. Um, the third is photorealism. I think we really saw a lot of feedback from our previous models that, uh, the output wasn't very realistic or altered their face or their bodies, and so one of our mandates was, how do we actually make the image feel like more like yourself? And so all the things that you think that the model knows, it does because it has imbued the knowledge of the world into, um, its conscience and is able to visually communicate that back to you as a user. And so putting that all together, I think we really get a state-of-the-art image generation model that is the best aesthetic model out there on the market right now, um, that really represents a new paradigm for image generation, um, which is a huge part of, I think, AI progress at large, uh, that, that we have an opportunity to work on here.

    13. KH

      We often listen back, uh, listen to feedback on social media too.

    14. AM

      Mm-hmm.

    15. KH

      So we kind of just take all these things and basically are just aware of it and try to make sure that they're mitigated or completely fixed in some cases in, in the next iteration.

  4. 5:25–9:34

    Productivity use cases and 360 images

    1. AM

      What kind of use cases are you seeing? What are you seeing people do with this now?

    2. KH

      I think one that's particularly close to, like, the research team as a general is, like, infographics, text. Um, I think text in images is, like, so much better nowadays.

    3. AM

      Mm-hmm.

    4. KH

      So, um, I think it just opens up a lot more productive use cases, and at, from, like, the research side, we th- kind of think, you know, image generation used to al- always be about fun and maybe, like, unproductive things.

    5. AM

      Mm-hmm.

    6. KH

      But now we're really seeing steps forward into productivity and, uh, image generation for any type of use case that you can imagine it for.

    7. AM

      So you mentioned text. I remember the early models. Uh, no disrespect to chimpanzees, but getting it to spell, like, OpenAI even looked like a chimp did it. And then now I'm looking at pages of text and finely detailed stuff, and I know that as models get smarter, variable binding, the ability to put things next to each other, improves, but this was just a big improvement.

    8. KH

      Yeah. But I don't think it's, like, completely unexpected.

    9. AM

      Mm.

    10. KH

      I think you, you see a lot of growth in between-

    11. AM

      Mm

    12. KH

      ... uh, well, first you see between DALL-E 3 and, you know, GPT Images 1.

    13. AM

      Mm-hmm.

    14. KH

      There was, um, if you ask for a grid of random objects, you, you go from maybe, like, five to eight in DALL-E 3 to maybe around 16 in Images 1, and then with 1.5, we went to about 25 to 36-

    15. AM

      Mm-hmm

    16. KH

      ... um, consistently. And I think now we could probably do over 100, I think.

    17. AM

      Wow.

    18. KH

      This is, like, a test that we might do internally is just, um, we just ask ChatGPT gen- give me a list of 100 random objects, right?

    19. AM

      Yeah.

    20. KH

      And then we just send that to our image generator and s- see how, how many are correct. And usually, you know, it'll get almost all 100 correct. Uh, and that's... But you see the, the constant growth-

    21. AM

      Mm-hmm

    22. KH

      ... over time. Um, so I don't think it's, like, completely unexpected. It's just a steady pace.

    23. AM

      That was a test I used to use for, like, the really old models back with, like, Ada, Babbage, and Curie, like, list 100 science fiction books.

    24. KH

      Yeah.

    25. AM

      And then some of them would get, by the time it got to, like, 22, would just start repeating stuff-

    26. KH

      Yeah, yeah

    27. AM

      ... as it was, the model reached the end of it. So we've seen stuff too, like 360, 360-degree panoramas. How did that happen?

    28. AL

      Yeah, that really came from the emerging capability of the model, which is the ability to render images in any aspect ratio.

    29. AM

      Mm-hmm.

    30. AL

      We discovered that people were generating really long, amazing panoramics, you know, skinny bookmarks as well, and one of the cool capabilities with the model is that not only were you able to generate images in this panoramic aspect ratio, but you'd also render images in the style of 360.

  5. 9:34–10:51

    Viral trends, authenticity, and imperfection

    1. AM

      It's funny, too, how one of the things that was trending was taking popular images or photos of people and then having the model make, like, kinda janky-looking Microsoft Paint versions of them.

    2. AL

      Yes. Yeah.

    3. AM

      And did you think that was something you would see, was that people are gonna use this [laughs] incredibly capable tool to then go make, you know, these silly looking things?

    4. AL

      Yeah. It's funny 'cause it takes a lot of intelligence to actually create something that is imperfect.

    5. AM

      That's what I tell people all the time.

    6. AL

      Yeah. And it's definitely very interesting in the viral trends that we're seeing online right now. Um, one thing that I think people are really striving for is authenticity-

    7. AM

      Mm

    8. AL

      ... imperfection, nostalgia. We're seeing that in the MS Paint prompt-

    9. AM

      Mm

    10. AL

      ... crayons, um, all different kinds of generations that people are creating, and that really feels like the theme of consumers, is they wanna interact with AI in a very authentic, imperfect way. They wanna show their imperfections and use AI to help make them look good, but also show a more fun and goofy side of themselves, and I think that's self-expression via AI is something that we're really excited about. And, you know, I think it's really part of our mission as a company to make it easier for people to learn more and distribute that intelligence, but also, um, letting them express a version of themselves that maybe wasn't possible before.

  6. 10:51–14:06

    Training breakthroughs and photorealism

    1. AM

      Kenji, was there a moment with this model where you're saying to yourself, "Wow, I think, think this is ready to go"?

    2. KH

      You know, as it's training, we take a checkpoint, and then, like-

    3. AM

      Mm

    4. KH

      ... we just sample from it, right? And just see, okay, how good is this thing? And I think, like, we just sampled a checkpoint, a model, uh, an image, and we looked at it and we're like, "All right, this is better than Images 1." [laughs]

    5. AM

      [laughs]

    6. KH

      Like, we were just like, "Okay."

    7. AM

      I remember watching the iteration of one of the early versions of DALL-E-

    8. KH

      Yeah

    9. AM

      ... and how at first it was sort of the wispy, sort of weird, sort of the tendril sort of thing, and talking to one of the researchers like, "Is, is that gonna go away?" He's like, "I think two, probably two runs away from that." And then [fingers snap] just like that. The ability to predict that was amazing to me, and all of a sudden everything got crisp and clear.

    10. KH

      Yeah.

    11. AM

      And then also, like, looking at, y- you know, years ago I'd played with, like, you know, GANs and, like, doing those things. You'd, you have to squint and say, "I think it's a pickup truck," or something like that.

    12. KH

      Yeah.

    13. AM

      So it's interesting what you see as you say, "Okay, this just all of a sudden got much better." And-

    14. KH

      Yeah. I mean, it was just very obvious. You just, you just take the early checkpoint, you just sample an image from it, and then you just sample an image from, uh, you know, Images 1, and you just look at the two, and you're just... There's just, there's-

    15. AM

      Yeah. Why do I like this garbage? This is-

    16. KH

      I forgot what the image was. It might have just been, like, a picture of, like, a woman at a sea- on the seaside, like-

    17. AM

      Yeah

    18. KH

      ... you know, overlooking a seaside. We just looked at it and we're like, "All right." There's, like, no, no question.

    19. AM

      Yeah, that was the big, the big-

    20. KH

      [laughs]

    21. AM

      ... the big jump was the photorealism of going from something that looked, that was more of a, a glossy, idealized magazine cover to something that looked like a really good photograph. So help me understand, like besides just more compute, how did this happen? How did you get a model that's much better and also that doesn't take an hour to generate an image? The times are still... I, I remember in the DALL-E days-

    22. KH

      Mm-hmm

    23. AM

      ... like we would literally have to, you know, "Tell, tell us what you want," and then an hour later it'd be on Instagram, to now these things are in ChatGPT, and it's faster. How is it getting both more intelligent and you're maintaining the same speeds?

    24. KH

      I think we learned a lot, uh, in each release, like between 1 and 1.5, now 2.

    25. AM

      Mm-hmm.

    26. KH

      And so we take each, each of the learnings that we've made, and we've, you know... Like for example, speed, right?

    27. AM

      Mm-hmm.

    28. KH

      Um, you know, one of the things is like, "Oh, can we make the model more token efficient?" Or-

    29. AM

      Mm-hmm

    30. KH

      ... or, or something like that. And, uh, you know, we did a lot of work to make it, to make it pu- produce very good images with less tokens.

  7. 14:06–22:16

    Evals, prompting, and creative control

    1. AM

      Do you have any personal favorite benchmark tests you like to do, things you say, "I wanna see it make an image of this"?

    2. AL

      I have an eval that I call the "me, me, me" eval.

    3. AM

      Okay. [laughs]

    4. AL

      It's essentially 100 photos of myself-

    5. AM

      [laughs]

    6. AL

      [laughs] ... and my friends and my family. Um, and I put everyone in goofy positions. I have about a card or a birthday, um, for every single person. Um, and I think it's a really great eval in the sense that, uh, you only know the people around your, you know, faces the best.

    7. KH

      Mm-hmm.

    8. AL

      Um, you also want to create funny things with the model and thi- do things that are relevant.

    9. KH

      Mm-hmm.

    10. AL

      And so one thing for me as the product manager, um, that I'm testing is not only is the raw capability of the model really great-

    11. KH

      Mm-hmm

    12. AL

      ... but also does ChatGPT understand what I want in that context?

    13. KH

      Mm-hmm.

    14. AL

      You know, ChatGPT remembers, you know, that I have a brother, that I have a mom and dad, um, and what they like to do. And so does the model accurately know how to insert pieces of personalization in the moments that matter in the images? These are things that I'm testing for.

    15. AM

      How about you?

    16. KH

      Besides the grid one I mentioned earlier, that's probably the one I've used the most. For a while, I think Divya and I were doing a lot about photorealism. [laughs]

    17. AM

      Yeah.

    18. KH

      We were trying real hard to push on that. Um, uh, just basically, I know Divya's favorite one was, like a woman holding an or- a jug of orange juice. I don't know if you've seen that.

    19. AL

      Yeah. [laughs]

    20. KH

      [laughs] There's like so many images of a woman holding a jug of orange juice. Um-

    21. AL

      Well, actually feel like the researchers had a more standard set of images like, than they like to-

    22. KH

      I think so too

    23. AL

      ... to lead on.

    24. AM

      Yeah, and you get like the standard, can it do somebody writing with their left hand and a watch on their right hand and a clock-

    25. AL

      The clock

    26. AM

      ... showing this. I think the big, the big leap of the image is like probably 1 or 1.5 was like a half-full glass of wine.

    27. AL

      Hmm.

    28. KH

      Or the wine glass full to the rim?

    29. AM

      Yeah.

    30. KH

      Yeah, yeah, yeah.

  8. 22:16–22:27

    Creative agents and what comes next

    1. AM

      How do you see the progression of this? This is great, but typically any time I talk to somebody at OpenAI about what they're working on, they're like, "Yeah, this is good, but..."

    2. AL

      I think we're still super early in exploring all the different use

  9. 22:27–28:08

    Images + Codex

    1. AL

      cases that people are really trying to push the model with. Um, and so one of the things that we're really excited about, um, is what is that next, um, stage for Image Gen, um, which is to create the creative agent.

    2. AM

      Mm.

    3. AL

      Ultimately, the agent that can work alongside you, be your creative assistant, um, and really understand how you work, what your preferences are, what is the output that you wanna get to, um, and build the product and model ecosystem that helps users kind of have a personal interior designer, personal architect, um, personal, you know, wedding planner, et cetera, all in one Image Gen.

    4. AM

      I'll tell you another thing that was kind of amazing was like, um, I write books, and so, like, every now and I have a book come out, I've gotta change my social media headers. And I just went, and I said, "Oh, find my book cover and write, you know, create a, a po- you know, create appropriate size social media header that I can put on X or Facebook or whatever." Like, let's see. First shot. First shot. Right aspect ratio, everything.

    5. KH

      We basically did that from the start or trained the models-

    6. AM

      Mm

    7. KH

      ... to be good at that from the start. I remember, like, I worked on the initial de-risks of-

    8. AM

      Mm

    9. KH

      ... of ev- basically it could do any aspect ratio that you ask.

    10. AM

      Yeah.

    11. AL

      Yeah, you can now, um, really just easily specify the outcome that you want.

    12. AM

      Yeah.

    13. AL

      Like in the case of yourself, you're like, "I want promotional material."

    14. AM

      Yeah.

    15. AL

      "I don't have an idea. I didn't specify exactly what I wanted." But the model was able to do the research and then give it to you in the style and aspect ratio that was relevant to you, and that's super powerful. We're already seeing this. Um, you know, you're, you're an author. I've talked to real estate agents who are using Image Gen to help them create listings for their apartments or stage their listings.

    16. AM

      Mm-hmm.

    17. AL

      Um, YouTube creators have talked to me about using Image Gen for their thumbnails and-

    18. AM

      Mm

    19. AL

      ... promotional content. I've talked to top artists who wanna use Image Gen to connect with their fans, and I think the ability for all different kinds of professions to start to use Image Gen to help them with visual creation is super powerful, especially if you're working in a visual and a creative industry.

    20. AM

      Mm-hmm.

    21. AL

      Image Gen is such a hack in your professional toolkit. I think it has to be a part of everyone's everyday workflow in the future.

    22. AM

      This does feel like the... I think it feels like the first time where anything I can reasonably come up with, it does a pretty good job of it.

    23. AL

      We think it's a new paradigm for-

    24. AM

      Yeah

    25. AL

      ... image generation altogether. Like, if, you know, we said this in the launch video, if DALL-E was the Stone Ages, Image Gen 2.0 is the Renaissance.

    26. AM

      Yeah.

    27. AL

      Um, and I think that is so true because the model, it's not only great artistically and aesthetically, but it also incorporates, you know, science, art-

    28. AM

      Mm

    29. AL

      ... architecture, all in one image together, and I think that composition, um, and knowledge that the model has just means that the outputs are so much more trustworthy, um, are more powerful-

    30. AM

      Mm-hmm

  10. 28:08–29:21

    Prompt tips

    1. AL

      with Image Gen.

    2. AM

      Any, any parting prompt tips for people?

    3. AL

      Well, one of the things I would suggest people try is Image Gen Thinking.

    4. AM

      Okay.

    5. AL

      So if you navigate to the Thinking or Pro models-

    6. AM

      Mm-hmm

    7. AL

      ... we have a more powerful version of Image Gen in that experience, and in that model, uh, you actually are able to search the web, analyze files, um, leverage tools under the hood, um, which then yields a better quality and higher composition photo. And the suggestion that I have for prompting that experience is be open-ended.

    8. AM

      Mm-hmm.

    9. AL

      I think the model will go and do the exploration itself to understand and try to reason, um, and find information that matters. And I also think giving it a sense of an aesthetic is also super helpful. Um, using, grounding that in a style has been really, um, fruitful for a great result.

    10. AM

      Good one. Good one.

    11. KH

      I think just being very particular about the style or, like, what you like in general. Like, for me, I like minimalist infographics.

    12. AM

      Mm-hmm.

    13. KH

      Sometimes I think the model can be a little dense.

    14. AM

      Mm.

    15. KH

      And so I just... Maybe I'm just a simplistic kinda guy.

    16. AM

      [laughs]

    17. KH

      So I just like very th- very clean, a very clean look, so I like that.

    18. AM

      Adele, Kenji, thank you very much.

Episode duration: 29:22


Transcript of episode bH2nP-aCFjk
