Shaping model behavior in GPT-5.1 — the OpenAI Podcast Ep. 11

- Andrew Mayne
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, our guests are Christina Kim, who's a research lead working on post-training at OpenAI, and Laurentia Romaniuk, who's a product manager focused on model behavior. We're gonna be talking about GPT-5.1, what makes the model better, how they've been focusing on making its personality steerable, and where they see things headed in the future.
- Laurentia Romaniuk
For the first time ever, all of the models in chat are reasoning models. Personality, though, for most of our users, I think is something much larger, and it's the whole experience of the model.
- Christina Kim
You should be able to get the experience that you want with chat.
- Laurentia Romaniuk
Part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability. [upbeat music]
- Andrew Mayne
I'm very excited to talk about, you know, the models and how they've been changing over time, and using the word model also feels sort of funny now because it seems like there's so much more.
- Laurentia Romaniuk
Mm.
- Andrew Mayne
And everything starts really in research, and when GPT 5.1 was being planned, what were the goals?
- Christina Kim
Yeah, for us, um, one of the main goals was to, um, address a lot of the feedback we'd been getting about GPT-5, but also, um, we'd been doing a lot of work to make the, um, 5.1 Instant into a reasoning model. So the most exciting thing, uh, personally for me with the 5.1 release is that for the first time ever, all of the models in chat are reasoning models. So the model right now can decide to think, is kind of what we say, and so it's like a chain of thought. Um, and it'll decide how much it wants to think based on the prompt. So if you're just saying, like, "Hi," to the model-
- Andrew Mayne
Mm-hmm.
- Christina Kim
... or, "What's up?" It's not gonna be thinking. But let's say you ask it a bit, like, harder question, um, then it'll- it can decide how much it wants to think. So it gives it time to, like, refine its answer and work through things, call tools if necessary, and then come back to give you an answer.
- Andrew Mayne
Kind of what, uh, Daniel Kahneman calls, like, System 1 and System 2 thinking?
- Christina Kim
Yes. Having a reasoning model as the default model for everyone just means everyone gets a much smarter model, and I think with much smarter models, you just get improvements across the board, especially for things like instruction following, um, and for a lot of the use cases people might not even think, um, require much, like, reasoning. Um, just having improved intelligence, having the model actually think before it responds on certain queries, just really helps. We- we've seen that improve evals across the board.
- Andrew Mayne
When you product manage something like this, and you have to explain to people what's different-
- Laurentia Romaniuk
Hmm
- Andrew Mayne
... it's probably a challenge, but how would you explain what's the difference between GPT-5 and GPT-5.1?
- Laurentia Romaniuk
Yeah. First of all, it is difficult because there's so much changing. Um, but in this case, what we wanted to speak to were things that we've heard as feedback from the community. With the ChatGPT-5 launch, one of the things we heard was that the model felt like it, um, had weaker intuition and that it was less warm. And when we dug into that, uh, what we found were a handful of different things. First of all, it wasn't just how the model was res- responding, like, as the model's innate behavior; it was also things around the model. So as an example, uh, our model had a shorter... or the context window wasn't carrying enough information-
- Andrew Mayne
Mm
- Laurentia Romaniuk
... about what users had said previously. So that can feel like the model is forgetting something really important that you told it, that you were hoping it would hold on to. Um, if you say, "I'm having a really bad day," and the model forgets that after ten turns, that can feel really cold. Uh, so that's something we adjusted, uh, as part of this launch. Uh, some of it was actually the way the model was responding, but something new that we introduced in GPT-5 as well was, um, we have this auto switcher that would move you between chat and reasoning models, and those have slightly different response styles, and that can feel really jarring or cold if you're talking to the model about how you're having a bad day, and then you say, like, "Part of it is I got this awful cancer diagnosis," and the model switches you to thinking. Um, and you get a very clinical answer from a model that was just sort of, like, walking you through a problem you were having earlier. Um, and so a lot of the changes we were actually trying to make were in aggregate: How do we make sure this model feels warmer? Um, even though we were changing a lot under the hood to accomplish that. Another thing that we looked into was instruction following generally. So, um, 5.1 is much better at following custom instructions, and that was another piece of feedback we were hearing, which was, you know, like, every model that, uh, uh, that we release is going to have its own quirks and slightly different behaviors. And I think people actually don't mind that too much, as long as they can control it, as long as they can say, like: "Hey, that was weird. Stop."
- Andrew Mayne
[chuckles]
- Laurentia Romaniuk
Um, but if the model can't carry that context forward, if it, uh, can't hold on to the custom instructions on that, that's a problem. So we worked to actually enhance, uh, the custom instructions feature so that it more consistently carries instructions forward to address some of that feedback. And then, like, the last thing I'll say is, a lot of this stuff is personal preference, and so that's why we introduced our, uh, style and trait-type features, like personality, which actually let users guide the model into certain response formats so that they have a little bit more control over exactly how ChatGPT responds for them.
- Andrew Mayne
The switching is interesting because y- there's multiple models now.
- Laurentia Romaniuk
Mm-hmm.
- Andrew Mayne
There's not one model, and, and you, you know, articulated why you need to have that. When we talk about a switcher, and we talk about sort of different models, I know for most people, that can be kind of confusing. And how would you kind of unpack that for people?
- Laurentia Romaniuk
Yeah. I think our models have very different capabilities, and it can be s- hard to stay on top of. Um, so part of it is just continuing to, like, try the different things in our app, but certainly, part of the product work is making sure that we have the right UIs to either guide users to the correct model to choose, um, and that can be the model switcher. So that can be the model switcher learning, um, what sort of answers are most helpful to users in different contexts, looking at different evals. So for example, um, for our reasoning models, if people want something that's very scientifically accurate and very, very detailed, we might look at an eval to see, are we answering that need, uh, on those sorts of prompts? And we can forecast where to switch users to. Yeah.
- Andrew Mayne
Tina, as far as the switcher, and now the fact that the model everybody has, the free tier, anybody using the base model, is a reasoning model, what does that really mean in impact?
- Christina Kim
Yeah, I think there's a lot of research, um, open questions for research for how we want to think about this, right? So I think, like you said, it's a faster model, but it doesn't necessarily need to be dumb.
- Laurentia Romaniuk
Mm-hmm.
- Christina Kim
So, like, I think the idea is that we want to get the, the most intelligent model that we can for everyone, and so, um, I think we'll-- I think this kind of opens the door for thinking more about, like, what are more interesting things we could do with a very, very, like, state-of-the-art, like, frontier model, right? One that's going to think for much longer. Like something like deep research, where you have it thinking for minutes, like, maybe that's better used in the background, and you can call it as a tool. Um, so I think there's a lot of, like, research, open questions of, um, what we want to think of, but I do think we're going to be in this world where we do have, like, a system of models, and it's not just, like, a model that you have, and there's, like, lots of different, um, tools, and it's not just one. Like, when we think of 5.1, I think people just assume that it's, like, one singular set of weights.
- Laurentia Romaniuk
Mm-hmm.
- Christina Kim
But I think it's really there's, like, yeah, this reasoning model, this, like, lighter reasoning model, this auto switcher-
Episode duration: 28:40