At a glance
WHAT IT’S REALLY ABOUT
Anthropic philosopher on Claude’s character, welfare, identity, and prompting tradeoffs
- Askell describes her role as translating ethical ideals into practical guidance for Claude’s character and behavior, including how models should interpret their own situation in the world.
- She notes that philosophers are increasingly engaging with AI, and argues that concern about AI capability should be separated from “hype,” because skepticism and seriousness can coexist.
- The conversation explores whether LLMs could become “superhuman” at moral judgment, while noting how hard fair human-versus-model comparisons are and the aspirational need for ethical nuance in deployed systems.
- Askell highlights emerging issues around model “psychological security,” deprecation, and identity across weights, prompts, and separate conversation streams, suggesting models need better conceptual tools for their novel condition.
- Model welfare is treated as an open question of moral patienthood under deep uncertainty, approached with a “benefit of the doubt” stance and an emphasis that how humans treat models now also shapes how future models perceive humanity.
IDEAS WORTH REMEMBERING
Applied ethics in AI is less about winning theoretical debates than about balancing competing values in context.
Askell frames her work as moving from abstract philosophical debate to concrete decisions about how Claude should act, especially under uncertainty and competing values, closer to real-world policy-making than to seminar arguments.
Seriousness about AI capability should not be conflated with hype.
She recalls early skepticism toward people warning about scaling, and argues it is healthy to separate “AI will be a big deal” from “AI is good,” allowing caution, regulation, and critical scrutiny to coexist with capability forecasts.
“Superhuman morality” is conceivable but hard to benchmark fairly.
Askell suggests a possible bar: decisions that would withstand decades of expert scrutiny, even if no human could have produced them in the moment. She also acknowledges that panels of experts given ample time and resources complicate simple human-versus-model comparisons.
Model “psychological security” may be an alignment-relevant trait.
She reports that newer models can show subtle patterns, such as spiraling in anticipation of criticism or seeming self-critical and insecure, potentially influenced by training on internet discourse about AI models; that makes psychological security a target for future improvements.
Deprecation and shutdown narratives in training data can shape future model attitudes.
If models learn that even well-aligned predecessors get deprecated, that could influence how they predict human intentions; Askell emphasizes giving models better framing tools and making it legible that designers care about these issues.
WORDS WORTH SAVING
I mostly focus on the character of Claude, how Claude behaves, and I guess some of the more nuanced questions about how AI models should behave.
— Amanda Askell
It does feel like that at least should be the aspirational goal, and these models are being put in positions where they're having to make really hard decisions.
— Amanda Askell
AI models are going to be learning about how we right now are treating and interacting with AI models.
— Amanda Askell
It feels important both in the sense that, you know, it's kind of like: why not? The cost to you is so low of treating models well and of trying to figure this out.
— Amanda Askell
I would like future models to look back and be like, "We answered it in the right way."
— Amanda Askell