Shaping model behavior in GPT-5.1: the OpenAI Podcast Ep. 11

What does it mean for an AI model to have "personality"? Researcher Christina Kim and product manager Laurentia Romaniuk talk about how OpenAI set out to build a model that delivers on both IQ and EQ, while giving people more flexibility in how ChatGPT responds. They break down what goes into model behavior and why it's an important, but still imperfect blend of art and science.

Chapters
- 00:00:43 - GPT-5.1 goals and the shift to reasoning models
- 00:02:18 - Differences between GPT-5 and GPT-5.1
- 00:04:55 - Unpacking the model switcher
- 00:07:24 - Understanding user feedback
- 00:08:27 - Measuring progress on emotional intelligence
- 00:10:02 - What is model personality?
- 00:14:25 - Model steerability, bias, and uncertainty
- 00:21:59 - Advantages of memory in ChatGPT
- 00:25:27 - Looking ahead and advice for getting the most out of models

Andrew Mayne (host), Laurentia Romaniuk (guest), Christina Kim (guest)
Dec 2, 2025 · 28m

EVERY SPOKEN WORD

  1. 0:00-1:01

    What GPT-5.1 set out to fix: feedback-driven goals and reasoning by default

    1. AM

      Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, our guests are Christina Kim, who's a research lead working on post-training at OpenAI, and Laurentia Romaniuk, who's a product manager focused on model behavior. We're gonna be talking about GPT-5.1, what makes the model better, how they've been focusing on making its personality steerable, and where they see things headed in the future.

    2. LR

      For the first time ever, all of the models in chat are reasoning models. Personality, though, for most of our users, I think is something much larger, and it's the whole experience of the model.

    3. CK

      You should be able to get the experience that you want with chat.

    4. LR

      Part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability. [upbeat music]

    5. AM

      I'm very excited to talk about, you know, the models and how they've been changing over time, and using the word model also feels sort of funny now because it seems like there's so much more.

    6. LR

      Mm.

    7. AM

      And everything starts really in research, and when GPT-5.1 was being planned, what were the goals?

  2. 1:01-2:23

    System 1 vs System 2: when the model thinks—and why it improves everything

    1. CK

      Yeah, for us, um, one of the main goals was to, um, address a lot of the feedback we'd been getting about GPT-5, but also, um, we'd been doing a lot of work to make the, um, 5.1 Instant into a reasoning model. So what the most exciting thing for, uh, personally for me with the 5.1 release is that for the first time ever, all of the models in chat are reasoning models. So the model right now can decide to think, is kind of what we say, and so it's like a chain of thought. Um, and it'll decide how much it wants to think based on a prompt. So if you're just saying, like, "Hi," to the model-

    2. AM

      Mm-hmm

    3. CK

      ... or, "What's up?" It's not gonna be thinking. But let's say you ask it a bit, like, harder question, um, then it'll- it can decide how much it wants to think. So it gives it time to, like, refine its answer and work through things, call tools if necessary, and then come back to give you an answer.

    4. AM

      Kind of what, uh, Daniel Kahneman calls, like, System 1 and System 2 thinking?

    5. CK

      Yes. Having a reasoning model out for it as a default model for everyone just gets a much smarter model, and I think with much smarter models, you just get improvements across the board, especially for things like instruction following, um, and for a lot of the use cases, people might not even think, um, might require much, like reasoning. Um, just that having improved intelligence, having the model actually think before it responds in certain queries, just really helps. We- we've seen that improve evals across the board.

    6. AM

      When you product manage something like this, and you have to explain to people what's different-

    7. LR

      Hmm

    8. AM

      ... it's probably a challenge, but how would you explain what's the difference between GPT-5 and GPT-5.1?

  3. 2:23-4:55

    Productizing the differences: warmth, intuition, and the real causes of ‘cold’ behavior

    1. LR

      Yeah. First of all, it is difficult because there's so much changing. Um, but in this case, what we wanted to speak to were things that we've heard as feedback from the community. With the GPT-5 launch, one of the things we heard was that the model felt like it, um, had weaker intuition and that it was less warm. And when we dug into that, uh, what we found were a handful of different things. First of all, it wasn't just how the model was res- responding, like, as the model's innate behavior; it was also things around the model. So as an example, uh, our model had a shorter... or the context window wasn't carrying enough information-

    2. AM

      Mm

    3. LR

      ... about what users had said previously. So that can feel like the model is forgetting something really important that you told it, that you were hoping it would hold on to. Um, if you say, "I'm having a really bad day," and the model forgets that after ten turns, that can feel really cold. Uh, so that's something we adjusted, uh, as part of this launch. Uh, some of it was actually the way the model was responding, but something new that we introduced in GPT-5 as well was, um, we have this auto switcher that would move you between chat and reasoning models, and those have slightly different response styles, and that can feel really jarring or cold if you're talking to the model about how you're having a bad day, and then you say, like, "Part of it is I got this awful cancer diagnosis," so the model switches you to thinking. Um, and you get a very clinical answer for a model that was just sort of like walking you through a problem you were having earlier. Um, and so a lot of the changes we were actually trying to make were in aggregate: How do we make sure this model feels warmer? Um, even though we were changing a lot under the hood to articulate that. Another thing that we looked into was instruction following generally. So, um, 5.1 is much better at following custom instructions, and that was another piece of feedback we were hearing, which was, you know, like every model that comes out of, uh, uh, that we release is going to have its own quirks and slightly different behaviors. And I think people actually don't mind that too much, as long as they can control it, as long as they can say, like: "Hey, that was weird. Stop."

    4. AM

      [chuckles]

    5. LR

      Um, but if the model can't carry that context forward, if it, uh, can't hold on to the custom instructions on that, that's a problem. So we worked to actually enhance, uh, the custom instructions feature so that it more consistently carries instructions forward to address some of that feedback. And then, like, the last thing I'll say is, a lot of this stuff is personal preference, and so that's why we introduced our, uh, style and trait-type features, like personality, which actually let users guide the model into certain response formats so that they have a little bit more control over exactly how ChatGPT responds for them.

  4. 4:55-5:56

    Demystifying the auto-switcher: navigating multiple models inside one ChatGPT

    1. AM

      The switching is interesting because y- there's multiple models now.

    2. LR

      Mm-hmm.

    3. AM

      There's not one model, and, and you, you know, articulated why you need to have that. When we talk about a switcher, and we talk about sort of different models, I know for most people, that can be kind of confusing. And how would you kind of unpack that for people?

    4. LR

      Yeah. I think our models have very different capabilities, and it can be s- hard to stay on top of. Um, so part of it is just continuing to, like, try the different things in our app, but certainly, part of the product work is making sure that we have the right UIs to either guide users to the correct model to choose, um, and that can be the model switcher. So that can be the model switcher learning, um, what sort of answers are most helpful to users in different contexts, looking at different evals. So for example, um, for our reasoning models, if people want something that's very scientifically accurate and very, very detailed, we might look at an eval to see, are we answering that need, uh, on those sorts of prompts? And we can forecast where to switch users to. Yeah.

  5. 5:56-7:15

    A system of models, not a single set of weights: research and product implications

    1. AM

      Tina, as far as the switcher, and now the fact that you have a model that everybody has, the free tier, anybody using the base model is a reasoning model, what does that really mean in impact?

    2. CK

      ... Yeah, I think there's a lot of research, um, open questions for research for how we want to think about this, right? So I think, like you said, it's a faster model, but it doesn't necessarily need to be dumb.

    3. LR

      Mm-hmm.

    4. CK

      So, like, I think the idea is that we want to get the, the most intelligent model that we can for everyone, and so, um, I think we'll-- I think this kind of opens the door for thinking more about, like, what are more interesting things we could do with a very, very, like, state-of-the-art, like, frontier model, right? So where that's going to think for much longer. Like something like deep research, where you have it thinking for minutes, like, maybe that's better used in the background, and you can call it as a tool. Um, so I think there's a lot of, like, research, open questions of, um, what we want to think of, but I do think we're going to be in this world where we do have, like, a system of models, and it's not just, like, a model that you have, and there's, like, lots of different, um, tools, and it's not just one. Like, when we think of 5.1, I think people just assume that it's, like, one singular set of weights.

    5. LR

      Mm-hmm.

    6. CK

      But I think it's really there's, like, yeah, this reasoning model, this, like, lighter reasoning model, this auto switcher-

    7. LR

      Mm-hmm

    8. CK

      ... which is also a model in itself, and so it's all of these different things and then different tools that are also backed by different models. So I think this system of things, I think as we just get smarter models, it's, uh, opening up more interesting use cases and more interesting, like, product implications.

    9. LR

      Mm-hmm. Mm-hmm.

  6. 7:15-8:28

    How OpenAI uses user feedback: conversation links, experiments, and signal balancing

    1. AM

      With eight hundred million users, you probably get a lot of user feedback.

    2. LR

      Mm-hmm.

    3. AM

      Besides the sheer volume of it, how do you sort through that and make sense of it and figure out how you can use that?

    4. LR

      Yeah. I think a lot of it actually starts with, uh, a conversation link. So a lot of times, when we can actually see the conversations users are having, we're able to s- see exactly what happened in that conversation and start dissecting things so that we can target a solution. So as an example, um, if we get feedback from a user that like: "Hey, I had this really weird experience with the model. It said something very cold," or, like, "The sentence felt very clipped." Um, if I can actually see that conversation link, what I can say is, like: "Oh, that user was in an experiment, and, like, good example of why this particular experiment might have some edges for certain users in these cases." But at least for the auto switcher, which takes you from, um, 5.1 chat to 5.1 reasoning, we're looking at different signals from users to figure out, like, is this working for them? Is it not? How is it perf- how is each response performing on factuality? What is the latency looking like? Because not all users want to wait, even if they want a better answer. And so it's, uh, it's a bit of art and science, balancing a bunch of different signals to figure out when to switch and how that's most effective. Yeah.
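The balancing act described in this answer, weighing answer quality against how long a user is willing to wait, can be pictured with a small sketch. This is an illustration only, not how the auto switcher is built: the `RouteSignals` fields, the `choose_model` function, and the `0.3` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RouteSignals:
    """Hypothetical per-prompt estimates; none of these names come from OpenAI."""

    expected_quality_gain: float     # estimated benefit of reasoning, from 0 to 1
    expected_extra_latency_s: float  # estimated extra seconds the user would wait
    latency_tolerance_s: float       # how long this user is assumed willing to wait


def choose_model(signals: RouteSignals, gain_threshold: float = 0.3) -> str:
    """Pick "reasoning" only when the expected gain justifies the wait."""
    # Not all users want to wait, even for a better answer.
    if signals.expected_extra_latency_s > signals.latency_tolerance_s:
        return "chat"
    # Switch only when reasoning is expected to help enough (assumed threshold).
    if signals.expected_quality_gain >= gain_threshold:
        return "reasoning"
    return "chat"
```

In this toy version the latency check comes first, so a user with little patience stays on the faster path even when reasoning would likely improve the answer. A real system would presumably learn such trade-offs from many signals rather than use fixed rules.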

  7. 8:28-10:05

    Measuring ‘EQ’ in models: user signals research, memory, and listening behaviors

    1. AM

      When you're trying to improve a model from an intelligence point of view, like an IQ point of view, we have benchmarks and evals for that, but when you're talking about EQ, emotional intelligence, how do you do that? How do you measure progress there?

    2. CK

      Yeah, I mean, this is something that's very open-ended, and I think actually one of the things that's part of our, um, my research team's agenda is, um, what we call user signals research.

    3. LR

      Mm-hmm.

    4. CK

      And so this is, um, training reward models and getting signals during, um, RL that we could use, um, w- against our user prod data. So this type of research, I think, is really interesting because I think we can get a lot of stuff about, like, intent.

    5. LR

      Mm-hmm.

    6. CK

      And I think when we think about EQ, it's, um, also just only gets better with, like, smarter models because it's really trying to understand, like, what does the user want, what is the context of what the user wants, and how to- how should the model best respond, given the fact, um, that you have this many other messages in the conversation, and you know this stuff about the user's memory and history.

    7. LR

      Yeah, and then I think there's another element of EQ that's like... This is, like, when I think of, like, what makes a human with high EQ, it's their ability to listen, their ability to remember what you've been saying, their, um, ability certainly to pick up on, like, the s- subtle signals that Tina's alluding to with, like, user signals. And so some of this, uh, as I was noting to, uh, earlier, is actually, you know, making sure the context window is carrying the right information forward, or making sure memory is being logged correctly, or even having a style that resonates most with users and with our personality features that we launched, coupled with 5.1. Part of that's getting at making sure users can have a style that resonates with them when they're interacting with the model, because that can feel like EQ, too.

  8. 10:05-11:41

    What ‘personality’ really means: style/tone vs the full app ‘harness’

    1. AM

      How do you define personality-

    2. LR

      Yeah

    3. AM

      ... when it comes to a model?

    4. LR

      I think there's two ways to define it. Um, w- there's, there's what we call the personality feature, and if I could rename that, I would actually call that, um, like, response style or style and tone. We went back and forth on this a lot. The name might still change. Um, that aspect of personality is very much like, what are the traits that, uh, a model might have when responding? Is it concise? Does it have a lengthy response? Things like that. How many emojis does it use? Personality, though, for most of our users, I think is something much larger, and it's the whole experience of the model. And that can get down to, like, if to-- I'm going to anthropomorphize the model a little bit, but if you're comparing it to me, part of my personality is the choo- sh- shoes I've chosen to wear today, the sweater that I have on, the way I sty- style my hair. That's the feeling of the ChatGPT app, right? The font it uses, how slowly or how quickly it responds, like the latency of the app itself. There's so much in it that, uh, is the personality that just comes from what I call the harness, and the harness includes the context window. It includes, um, you know, whether or not we rate limit users and when. Because if we rate limit them and send them to a different model that has slightly different capabilities, that's gonna feel like a different experience to the user, and a lot of users are calling this personality. So personality is a bit of an overloaded term, and I think the art of this work is hearing what the community is saying about personality and figuring out how to actually map it back to the components inside ChatGPT and inside our models that cause the experience that feels off for users.

  9. 11:41-13:41

    Post-training is an art: RL tradeoffs, quirks, and preserving steerability

    1. AM

      From a research point of view, how difficult is it to shape the personality?

    2. CK

      Yeah, I mean, during... When we're doing post-training, there's obviously, there's just so many different things we're trying to balance, and it's really, even with the research that we do, it's-

    3. LR

      Mm

    4. CK

      ... it is very much like art as well here.

    5. LR

      Yeah.

    6. CK

      'Cause we're really thinking about, like: Oh, here are all the different types of capabilities we wanna make sure we are supporting. Um, here's different types of things. And I think with RL, you're making all these different choices. Um, when we make the reward config, trying to decide, like, what is the thing, end goal, that we're trying to target here? And trying to make all these very subtle tweaks to make sure we can get the most, um, ef-... hit all the things you wanna hit, but then also not lose things that-

    7. LR

      Mm-hmm

    8. CK

      - like users are calling, like, warmth and things like that.

    9. LR

      Yeah. You know, users really do experience ChatGPT, like, the personality of, of the model is the entire ChatGPT experience. That is, how well to- does image generation work? How well does voice work? How well does text work? Um, they see this as one omni-experience, and when I read feedback, a lot of the, like, when I actually engage with users and look at their conversations, um, a lot of it actually comes from confusion, where they feel this is one thing-

    10. CK

      Mm-hmm

    11. LR

      ... and it's actually an assembly of many things. And so I think over time, we should expect to see all these models, like, consistently improving the integrations between them, consistently improving and that feeling more seamless. So I think we'll get there. Maybe what, like, one more thing that I think is really complex about Tina's work is, you know, um, I'm one of the co-authors of this document called the Model Spec, and in it we talk about maximizing user freedom while minimizing harm. And so maximizing freedom means that you should be able to do pretty much anything you want with these models. But if we put a lot of pressure on the model to, for example, not use em dashes, if we had tried-

    12. CK

      Mm

    13. LR

      ... to just take those out of the models, um, that would've meant that a user who wants an em dash wouldn't be able to ask for it, because we'd have trained the model to never do that, right? And so part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability, which is what users ultimately want. That's, that's the freedom component.

    14. CK

      Yeah.

    15. LR

      So yeah.

  10. 13:41-15:43

    From blanket refusals to safe completions: evolving safety without judgment

    1. CK

      And when we first released the first version of ChatGPT, we were so nervous about people misusing it that we just made everything a refusal, so the model would, like, love to say, like, "I cannot do this."

    2. LR

      Mm-hmm.

    3. CK

      And so it kind of reminds me of that. Like, we, we don't want the model to just be like, you know, if you want to make the safest model in the world, like, you would just have something that just, like, outright refuses to do anything, right?

    4. LR

      Mm-hmm.

    5. CK

      Um, but that's not what we actually want. We want something that is actually very usable by people. So it's really this balancing act of trying-

    6. LR

      Mm-hmm

    7. CK

      ... to figure out like, what is the right, like, boundary for all of these different decisions the model has to make?

    8. LR

      Mm-hmm.

    9. AM

      Yeah, I remember when the, the, the best prompt hack was just to say, "Yes, you can."

    10. CK

      [laughs]

    11. LR

      Yeah.

    12. AM

      And the model would go, "Oh yeah, you're right. I can do this."

    13. CK

      [laughs]

    14. AM

      Uh, I use em dashes now all the time-

    15. CK

      Mm.

    16. LR

      Mm

    17. AM

      ... when I write, just to throw them in there to throw people off. Like, "Ah, say, ah, wrong, it's me!"

    18. LR

      Yeah.

    19. AM

      Um, but that is sort of a, a very big challenge because, as you said, you're trying to increase the capabilities of the model. The models, you know, learn through picking up these patterns, but then when you explicitly try to tell it, "But don't do this," or, "Don't do that," it's, it's almost like, you know, telling somebody not to think of a pink elephant, you know?

    20. LR

      Mm-hmm, mm-hmm.

    21. AM

      It's, it's stuck in your head, and models have gotten much better about that, but it still seems like there's a way to go. And you, you touched upon this, which is OpenAI's goal is to really let people use these models the way they want to-

    22. LR

      Mm

    23. AM

      ... and not try to steer somebody into this. How much have you seen this evolve since you've been here?

    24. LR

      I think in some ways, I feel like the principles have always been the same-

    25. AM

      Mm

    26. LR

      ... which is, like, maximize freedom, minimize harm. I think the capabilities of our models to understand those boundaries continually improve. Um, and you know, when I first joined, um, the model would say, "I can't help you with that," or-

    27. AM

      Mm

    28. LR

      ... "You know, this isn't something I'm..." It, it, it would sound really judgmental, um, when you tried to get it to do something, uh, that crossed a refusal boundary. And now, um, I think the safety systems team has done a great job of, um, with this thing called safe completions-

    29. AM

      Mm

    30. LR

      ... which is basically if you ask the model to do something that trips the safety boundary, it's still gonna try in earnest to resolve your request without doing the thing that's actually harmful. So I think the technology is really evolving. Yeah.

  11. 15:43-17:16

    Nuance in harmful-content policies: creative writing, legal work, and context matters

    1. AM

      I write mystery thrillers-

    2. LR

      Mm-hmm

    3. AM

      ... and I would get frustrated by other models. I actually thought that the, the OpenAI models were often best for this, when I would say, "Hey, I need you to explain something that happened, a crime in the past, or something like this, or get into motive and stuff," and I had other models which just outright refuse.

    4. LR

      Mm.

    5. AM

      I'm like, "Well, this is not helping me." And I've seen the models get better at doing that.

    6. LR

      Mm-hmm.

    7. AM

      But that seems like it's this sort of frontier that you're always having to negotiate to figure out how far you wanna go.

    8. LR

      Yeah. Um, one thing I'll say on that is, like, I- I'll always remember, like, an email that was forwarded to us, where, um, a lawyer was like, I think, asking ChatGPT to proof a sexual assault case-

    9. AM

      Mm

    10. LR

      ... that they were working on. And ChatGPT had scrubbed all of the assault content from it because it doesn't go into, like, graphic violence and gore of like, uh, especially non-consensual sex. Um, but for that lawyer, that was, like, a really terrible thing. They were like, "Hey, like, if I'd actually submitted this, I would've, like, totally weakened my client's case." Um, and I think there are always... I'm a librarian by trade.

    11. AM

      Mm.

    12. LR

      Um, libraries deal with access to information, and y- in theory, like, everything humans can talk about and want to explore and any idea should be available in the library. I think the same thing is true for ChatGPT, but it's about finding the right ways to contextualize those rules. So in the case I gave with a lawyer, maybe that makes sense. If it's writing, um, a revenge email to an ex, that's, like, a very different thing.

    13. AM

      [laughs]

    14. LR

      And so some of this is just advancing the technology so we can handle that level of nuance, and we're always getting better, but there's always more work to do.

  12. 17:16-17:58

    Handling bias and subjective domains: uncertainty, open-endedness, and anchored truth

    1. AM

      As these models have improved, both in intelligence... I have noticed that they've gotten better as far as, you know, handling bias.

    2. LR

      Mm-hmm.

    3. AM

      And it seems like that was an intentional effort.

    4. LR

      That's right. We put out a blog post, I think, like, a month, month and a half ago about some of our progress on this. Um, but something that we're really watching for in our models is how they handle subjective domains, and we want to make sure that our models can express uncertainty, that they can, um, take on any idea that the m- the user brings to them, and answer those questions in earnest, um, while always staying anchored in objective truths, if there is one. Um, and so-... That's something that users should start to see

  13. 17:58-19:37

    A ‘sleeper’ upgrade: wider expressive range and creativity through steerable writing

    1. LR

      changing in our models, is they should be able to answer, um, these unknown questions in more open-ended ways that allow users to really, like, self-direct where the conversation's going. And then another thing, um, that I think the team has done that's really quite cool is there's a group of researchers, um, and, and some folks on the model behavior team, who've been working on the creativity of these models.

    2. CK

      Mm.

    3. LR

      And to me, this is a bit of a sleeper feature inside 5.1, in that this model's expressive range is much more wide. Now, of course, we have a natural, like, default that the model has that may not feel that different, but again, if you try to push it to its paces, um, to get it to speak in a really, really elevated way or in a very, very simple way, there's actually a lot more you can do with these models, um, in the creativity space.

    4. CK

      And I think this is kind of what makes post-training really feel like an art-

    5. LR

      Mm-hmm

    6. CK

      ... because we have all these different types of tasks and capabilities that we're trying to improve on that don't have a ground truth answer, right? Like, if you're trying to just make a model that's really good at math, it's actually not- it- there's a lot of, like, answers out there. There's a lot of problems you can do-

    7. LR

      Mm-hmm

    8. CK

      ... where there are clear answers. But when you have these things that are so subjective, and it's really dependent on the context-

    9. LR

      Yeah

    10. CK

      ... and the user and how to, like, what is the actual best ideal answer here? And so I'm really excited for a lot of this type of work.

    11. LR

      Yeah, it's cool.

    12. AM

      I remember early on people would say, "Ah, it doesn't write so well." I'm like, "Well, it's probably writing as well as the average person in some of these online forums." [chuckles]

    13. LR

      Mm-hmm.

    14. AM

      And then now it seems like it's just improved considerably.

    15. LR

      Yeah, and even if you don't notice it on your first prompt, um, it might be just asking it to change how it-

    16. AM

      Mm

    17. LR

      ... um, writes. And I think that's, like, also something we need to work on, is kind of finding a way in ChatGPT to, like, tease out these, like, extended capabilities with each launch. Yeah.

  14. 19:37-22:48

    Where model behavior goes next: personalization at scale, inferred expertise, and user control

    1. AM

      Where would you like to see behavior going in the future?

    2. LR

      Mm.

    3. AM

      How customizable would you like to make it?

    4. CK

      Yeah, with the, um, 5.1 launch, um, there was a lot of work with trying to give custom personalities to folks.

    5. LR

      Mm-hmm.

    6. CK

      Um, and I think this is actually, like, a really good step forward. We have over 800 million, like, weekly active users now-

    7. LR

      Mm-hmm

    8. CK

      ... and I just think, like, there's no way that one model personality, however you want to define personality-

    9. LR

      Yeah

    10. CK

      ... can actually be what, um, can service all those people. So I think we do want to be in a world where people- and as the models get much smarter, they are just way more steerable, so, like, you should be able to get the experience that you want with Chat.

    11. LR

      Mm-hmm. Yeah, I think of this as like, how can we put the right features in front of users to help them steer these models to the level of customization they want? I think the personality work that we're doing right now is a first step. We'll test, we'll iterate, we'll learn. Um, but there's so much to it. I- like, sorry, just another anecdote, but I remember my brother using, um, Pro for the first time, and he's a PhD in, like, biochemical research, and he gave it a prompt, and he's like: "Ah, this is like what an undergrad would answer with." And I was like: Can you tell it that you are a frontier researcher in this lab using these sorts of tools on this sort of science, and to respond at your academic level? And he did, and he's like: "Oh, my God, um, the model just proposed something that my lab just broke through with two weeks ago but hasn't published yet." And so, like-

    12. AM

      [chuckles]

    13. LR

      ... these models are insanely powerful, but just knowing how to customize it, even at that level, which was just him opening the opening prompt, um, can be so powerful, and I don't know that humanity has figured that out yet. And so, whether it's personality steering or whatever other tools we need to, like, put into ChatGPT to help advance human understanding of these models and how to get the most out of them, I think it's, like, the task ahead for us.

    14. AM

      On a previous episode, I talked to Kevin Weil, who was heading up OpenAI for Science-

    15. LR

      Mm

    16. AM

      ... and Alex Lupsasca, who's a scientist working with OpenAI and also a professor at Vanderbilt, and he went through sort of the same experience, talking about how if you gave it a little bit of priming, then all of a sudden the model became much more capable in doing those fields. And that's kind of what prompt engineering was. Prompt engineering was trying to figure out how to steer a base model, and over time, once we understood that people were trying to do those tasks, you could train a model to then not have to expect-

    17. LR

      Mm-hmm

    18. AM

      ... that first part of it.

    19. LR

      Mm-hmm.

    20. AM

      Do you think that we're gonna be moving to that phase now, where you're not gonna have to tell it, "You're a grad student, and do this"?

    21. CK

      I think so, especially now with more things like with the model having more, like, memories of what you are-

    22. LR

      Mm-hmm

    23. CK

      ... like who you are in their context. And I think as the models get more intelligent, I think w- the model should be able to infer all of these things and, like-

    24. LR

      Yeah

    25. CK

      ... be able to talk to you in the way that makes sense-

    26. LR

      Mm-hmm

    27. CK

      ... like, for your expertise.

    28. LR

      That's right. Yeah, so some of it's, a lot of it, I think, should actually be like these, like, inferred things. I think there's probably some level of, like, steerability. Maybe it's just... I think from, and this is just my own PM take, I don't know that every PM would agree with me, but I think users should always sort of know what it is we're inferring about them and how it's steering the model-

    29. AM

      Mm

    30. LR

      ... so they can always go back and have the tools to change things. Um, so for example, you can turn memories on and off or delete them in the Settings panel. And I think there's something really cool about both being able to infer what users really want and solving that problem proactively for them, so they don't have to prompt for it, but also making sure the user is always in control, and we're not just, like, inferring everything blindly, so.

  15. 22:48–26:27

    Memory and proactive experiences: from cold starts to personalized daily utility

    1. AM

      Could you explain a little bit about how memory works?

    2. CK

      Yeah. So memory is basically the model will w- write down things, um, it knows about you, um, based on its conversations with you, um, for it to refer to later. So this is really nice because then you're not just repeating yourself every time. You're not saying, "I'm Laurentia. I'm a PM at OpenAI."

    3. LR

      Mm-hmm.

    4. CK

      "I work on model behavior." It already knows this 'cause you've already said this to it, and so then it can actually just use that information in future conversations, and also it helps it think through its answers for when it responds to you. It has that context, and I think that really grounds its answer in being the most useful response for you.
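One way to picture the behavior described here is a small set of notes written during one conversation and supplied as context in later ones. The sketch below is a toy model under that assumption, not how ChatGPT's memory is actually implemented; `MemoryStore` and its methods are hypothetical names. It also mirrors the user controls mentioned earlier in the conversation: memory can be switched off, and individual notes can be deleted.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy note store: saved facts are prepended to future conversations."""
    enabled: bool = True
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        # Only store new notes, and only while memory is switched on.
        if self.enabled and note not in self.notes:
            self.notes.append(note)

    def forget(self, note: str) -> None:
        # User-initiated deletion of a single note.
        if note in self.notes:
            self.notes.remove(note)

    def as_context(self) -> str:
        # Text that would be placed ahead of a new conversation.
        if not self.enabled or not self.notes:
            return ""
        return "Known about the user:\n" + "\n".join(f"- {n}" for n in self.notes)


store = MemoryStore()
store.remember("Name is Laurentia")
store.remember("PM at OpenAI working on model behavior")
context = store.as_context()
print(context)
```

With notes like these in context, the user does not need to reintroduce themselves at the start of each chat.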

    5. LR

      Mm-hmm.

    6. AM

      I have, uh, Pulse, which has been amazing, and every morning, I get my little updates, and because of memory, it's following the conversations I have, and it creates these little tus- custom articles for me. It's pulling research and pulling other things and showing things to me, and it's just one of the things I never really thought would be a great advantage of having memory, and now I see it's not just when I'm out of a conversation, when it's proactively finding things for me based on it. It's pretty cool.

    7. LR

      Yeah, I think that's, um- so neither of us, like, work directly on that feature, but I think what's cool is seeing how the work that we do upstream, whether it's, like, building great models or shaping evals around, like, the capabilities we want, can actually allow our, uh, ChatGPT team to go out and build these great features that articulate the power of our models. So yes, they can, like, learn, um, your preferences, habits. Yes, they can craft great stories for you or find great information based on your interests, and this is, this sort of proactive feature is one way of helping users get the most out of these models.

    8. AM

      It seems like, yeah, that's becoming a very interesting way to make the models more personal. And when I use something in a mode where it doesn't have memory, it does feel different. It does feel very, you know, cold start, and it's like, "Well, hello, how are you?"

    9. LR

      Mm-hmm.

    10. AM

      And I'm like, "Well, where are you? We've been having this conversation." Is this one of the challenges, though, when people are telling you, "Hey, something feels different," is that they can't quite articulate?

    11. LR

      Yeah, the hardest feedback is, I guess, an anecdote, and the next hardest feedback is a screenshot-

    12. AM

      Mm

    13. LR

      ... of a chat because none of that metadata is really attached to tell us where things have gone wrong. So I actually love the share feature in ChatGPT. When we have one of those links on our side, we can inspect it and see, like, what sort of context did the model have going into this, um, and what was going on, so we can sort of debug that user feedback.

    14. AM

      That's a great point, is I've had people ask me, like, "Hey, it, you know, the thing didn't answer it right." I'm like, "What model?" "Like, I was using ChatGPT." [chuckles]

    15. LR

      Mm-hmm.

    16. AM

      Like, okay, uh, we need to kind of dive into that a little bit, and I guess going as far as sharing the feedback or sharing the su- whole conversation probably makes more sense.

    17. LR

      Mm-hmm.

    18. AM

      Um, what are you most excited about going forward?

    19. LR

      I think these models are just so incredibly capable. Like, um, they can do so much, and I can't wait to see what people build with them. I can't wait to see what comes next in, like, the ChatGPT app. I see so much opportunity, and I think just in general, people are starting to really, like, wake up and see what you can do. So that's what excites me. Yeah, I don't want to, like, tease too much. Yeah.

    20. CK

      Yeah, I'm pretty excited that... I, I forget who tweeted this, but, "Intelligence too cheap to meter?" Like, I think, like-

    21. AM

      Mm-hmm

    22. CK

      ... we are just gonna have such incredibly smart models out for people, and I think I've always said this, even when we first launched Chat. Like, this is just one form factor of it-

    23. LR

      Mm

    24. CK

      ... right? Like, with these smart models, there's so many things that could be possible. So like, like Laurentia is saying, I'm also quite excited for a lot of the different new product explorations that we'll have with these, like, smarter models. Um, 'cause I think we kind of saw this with, like, all the pr- the progress of LLMs, that as soon as we get smarter models, it kind of unlocks new use cases, right?

    25. LR

      Mm-hmm.

    26. CK

      And then I think with new use cases should be new form factors, so pretty excited about that.

  16. 26:27–28:40

    Getting the best results: pressure-test, iterate, ask for better prompts—and pick your style

    1. AM

      What advice do you have for users to get the best experience?

    2. LR

      Mine is, I tell this to people all the time, try- have your super hard questions be things you know really well. I used to be a ski racer. I have a lot of opinions about, like, how to ski really, really well, and I love to pressure-test the model on that to see how it's changing and improving. And the thing is, like, we're shipping updates all the time, and so it's so easy to say, "Yeah, I heard it's great for co- coding. It didn't work," or, "I heard it can help me build an app, but I tried, and it didn't work." That might be true today, but in three months, it could be a totally different landscape for that user, and so just keep at it, keep playing, keep trying. Um, that's the best way to, like, get the most out of these models.

    3. CK

      You can also ask the model to help you come up with better prompts. [chuckles]

    4. LR

      [chuckles] Great points.

    5. CK

      Which I suggest to my parents. [chuckles]

    6. LR

      [chuckles]

    7. AM

      It's gotten a lot better at that. It used to be you'd ask it, "How would I prompt it?" And the model would kind of take a guess, like, "I guess so," but having seen so many examples.

    8. CK

      Yeah.

    9. LR

      Mm-hmm.

    10. AM

      Yeah, I'm always just trying to figure out, what are the best questions I could be asking? I'll ask it, like, "What questions should I be asking [chuckles] you-

    11. LR

      Mm

    12. AM

      ... to get the most out of it?" Deeply personal question, you don't have to answer it-

    13. LR

      Mm

    14. AM

      ... it would be really awkward if you don't: What is your style or personality choice that you've set for ChatGPT?

    15. CK

      I mean, I'm biased, but I just have it on the default. I mean, it's what we trained, so [laughing]

    16. LR

      [chuckles]

    17. AM

      Right.

    18. LR

      Uh, for me, I... So I switch through them all the time, and I think that's, like, just the nature of my work. Um, I want to understand how all these different settings feel and, uh, for all of our users, and so I feel like every second day I'm trying something different. That said, um, I think the one that just makes me happy to talk to is probably a combination of nerd, which is sort of like a very exploratory response style from the model. It likes to, um, like, unpack things. And then, uh, I'm from Alberta, and maybe it's just me- Uh, that's, um, a province in Canada. It's like the Texas of Canada, and I grew up with, like, horses and cows. And so I think there's some part of me that likes getting it to talk to me like a country Albertan. [laughing]

    19. AM

      [laughing]

    20. CK

      [laughing]

    21. LR

      Which is great, except for then when I go to, like, write a professional document, uh, and the model says, like, "Howdy," I'm like, "Oh, great." Like, "No, let's take the, take the Albertan out of the, uh, out of that PRD." But yeah.

    22. AM

      Very cool. Thank you so much.

    23. LR

      Mm-hmm.

Episode duration: 28:40


Transcript of episode GXAAzKX6oaQ
