Shaping model behavior in GPT-5.1— the OpenAI Podcast Ep. 11

OpenAI · Dec 2, 2025 · 28m

Andrew Mayne (host), Laurentia Romaniuk (guest), Christina Kim (guest)

- Reasoning-by-default and adaptive “thinking”
- Auto-switching between models and response-style discontinuities
- Warmth/EQ as context, memory, and user-intent understanding
- Custom instructions consistency and steerability
- Personality as style/tone vs full product “harness”
- Safety evolution: refusals vs safe completions
- Future personalization: inferred expertise + user control

In this episode, host Andrew Mayne talks with Christina Kim and Laurentia Romaniuk about how GPT-5.1 makes ChatGPT reasoning-first, warmer, and more steerable.

GPT-5.1 makes ChatGPT reasoning-first, warmer, and more steerable today

OpenAI discusses GPT-5.1’s central shift: all ChatGPT-tier models are now reasoning models that can choose when to “think” more deeply, improving performance broadly (not just on overtly hard problems).

They unpack why users perceived GPT-5 as colder or less intuitive—often due to system-level factors like insufficient carried context, jarring auto-switching between response modes, latency tradeoffs, and inconsistent adherence to custom instructions.

The guests frame “personality” as both response style (tone/format traits) and the whole product experience (“the harness”), including memory, UI, switching behavior, and multimodal performance.

They describe the ongoing challenge of maximizing user freedom while minimizing harm, emphasizing nuanced safe-completions over blanket refusals, and a future where personalization is increasingly inferred—but remains transparent and user-controlled.

Key Takeaways

GPT-5.1 normalizes reasoning as the default experience.

All chat models being reasoning-capable lets the system allocate more “thinking” only when needed, improving instruction following and answer quality across many everyday prompts—not just puzzles or math.

“Coldness” often comes from system design, not just model tone.

Short carried context, memory/config issues, and abrupt switching into a more clinical reasoning style can make the model feel less warm even if the underlying model is improved.

Auto-switching is a UX and evaluation problem as much as a modeling problem.

The switcher optimizes across signals like factuality and latency, but mismatched response styles in sensitive moments (e. ...

Steerability is the antidote to model quirks users dislike.

Users tolerate idiosyncrasies if they can reliably say “stop” via custom instructions or style/trait controls—so 5. ...

“Personality” is an overloaded term spanning the whole ChatGPT experience.

Beyond tone and verbosity, users experience personality through latency, UI feel, memory behavior, rate limits/model fallbacks, and how seamlessly text/voice/image features work together.

Balancing “maximize freedom, minimize harm” requires nuance, not blanket constraints.

Overtraining away behaviors (even small ones like punctuation patterns) can reduce user freedom; similarly, modern safety aims for “safe completions” that help earnestly without enabling harm.

Personalization will shift from explicit prompting to inferred context—if users stay in control.

Memory and inferred expertise can reduce the need for priming (“respond at PhD level”), but the product should surface what’s inferred and provide easy controls to edit/disable memories and steering.

Notable Quotes

“For the first time ever, all of the models in chat are reasoning models.”

Christina Kim

“The model right now can decide to think.”

Christina Kim

“Personality… for most of our users… is the whole experience of the model.”

Laurentia Romaniuk

“Part of the art here is figuring out how to pull out these quirks… without breaking steerability.”

Laurentia Romaniuk

“Intelligence too cheap to meter?”

Christina Kim

Questions Answered in This Episode

What specific changes did you make to ensure more prior conversation context is carried forward, and how do you test that it feels “warmer” rather than just “more verbose”?

How does the auto-switcher detect when a conversation is emotionally sensitive (e.g., a medical diagnosis) so it avoids switching into a more clinical response style?

When you say the model can “decide how much it wants to think,” what are the product-visible tradeoffs—latency, cost, and tool use—and how are those thresholds tuned?

What are the most reliable user signals you’ve found for measuring EQ-like qualities (listening, remembering, tact), and where do they still fail?

How do you prevent style/trait personalization from masking factual uncertainty (e.g., a confident tone on subjective domains) while still letting users steer tone?

Transcript Preview

Andrew Mayne

Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, our guests are Christina Kim, who's a research lead working on post-training at OpenAI, and Laurentia Romaniuk, who's a product manager focused on model behavior. We're gonna be talking about GPT-5.1, what makes the model better, how they've been focusing on making its personality steerable, and where they see things headed in the future.

Laurentia Romaniuk

For the first time ever, all of the models in chat are reasoning models. Personality, though, for most of our users, I think is something much larger, and it's the whole experience of the model.

Christina Kim

You should be able to get the experience that you want with chat.

Laurentia Romaniuk

Part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability. [upbeat music]

Andrew Mayne

I'm very excited to talk about, you know, the models and how they've been changing over time, and using the word model also feels sort of funny now because it seems like there's so much more.

Laurentia Romaniuk

Mm.

Andrew Mayne

And everything starts really in research, and when GPT-5.1 was being planned, what were the goals?

Christina Kim

Yeah, for us, um, one of the main goals was to, um, address a lot of the feedback we'd been getting about GPT-5, but also, um, we'd been doing a lot of work to make 5.1 Instant into a reasoning model. So the most exciting thing, uh, personally for me with the 5.1 release is that for the first time ever, all of the models in chat are reasoning models. So the model right now can decide to think, is kind of what we say, and so it's like a chain of thought. Um, and it'll decide how much it wants to think based on a prompt. So if you're just saying, like, "Hi," to the model-

Andrew Mayne

Mm-hmm

Christina Kim

... or, "What's up?" It's not gonna be thinking. But let's say you ask it a bit, like, harder question, um, then it- it can decide how much it wants to think. So it gives it time to, like, refine its answer and work through things, call tools if necessary, and then come back to give you an answer.

Andrew Mayne

Kind of what, uh, Daniel Kahneman calls, like, System 1 and System 2 thinking?

Christina Kim

Yes. Having a reasoning model as the default model for everyone just gets you a much smarter model, and I think with much smarter models, you just get improvements across the board, especially for things like instruction following, um, and for a lot of the use cases people might not even think, um, might require much, like, reasoning. Um, just having improved intelligence, having the model actually think before it responds to certain queries, really helps. We- we've seen that improve evals across the board.

Andrew Mayne

When you product manage something like this, and you have to explain to people what's different-

Laurentia Romaniuk

Hmm

Andrew Mayne

... it's probably a challenge, but how would you explain what's the difference between GPT-5 and GPT-5.1?
