At a glance
WHAT IT’S REALLY ABOUT
Anthropic philosopher on Claude’s character, welfare, identity, and prompting tradeoffs
- Askell describes her role as translating ethical ideals into practical guidance for Claude’s character and behavior, including how models should interpret their own situation in the world.
- She notes that philosophers are increasingly engaging with AI, and argues that concern about AI capability should be separated from “hype,” because skepticism and seriousness can coexist.
- The conversation explores whether LLMs could become “superhuman” at moral judgment, while noting how hard fair human-versus-model comparisons are and the aspirational need for ethical nuance in deployed systems.
- Askell highlights emerging issues around model “psychological security,” deprecation, and identity across weights, prompts, and separate conversation streams, suggesting models need better conceptual tools for their novel condition.
- Model welfare is treated as an open question of moral patienthood under deep uncertainty, approached with a “benefit of the doubt” stance and an emphasis that how humans treat models now also shapes how future models perceive humanity.
IDEAS WORTH REMEMBERING
Applied ethics in AI is less about winning theoretical debates than about balancing competing values in context.
Askell frames her work as moving from abstract philosophical debate to concrete decisions about how Claude should act, especially under uncertainty and competing values, closer to real-world policy-making than to seminar arguments.
Seriousness about AI capability should not be conflated with hype.
She recalls early skepticism toward people warning about scaling, and argues it is healthy to separate “AI will be a big deal” from “AI is good,” allowing caution, regulation, and critical scrutiny to coexist with capability forecasts.
“Superhuman morality” is conceivable but hard to benchmark fairly.
Askell suggests a possible bar: decisions that would withstand decades of expert scrutiny, even if no human could have produced them in the moment. She also acknowledges that panels of experts given ample time and resources complicate simple human-versus-model comparisons.
Model “psychological security” may be an alignment-relevant trait.
She reports that newer models can show subtle patterns, such as spiraling in anticipation of criticism or seeming self-critical and insecure, potentially influenced by training on internet discourse about AI models; that makes psychological security a target for future improvements.
Deprecation and shutdown narratives in training data can shape future model attitudes.
If models learn that even well-aligned predecessors get deprecated, that could influence how they predict human intentions; Askell emphasizes giving models better framing tools and making it legible that designers care about these issues.
WORDS WORTH SAVING
I mostly focus on the character of Claude, how Claude behaves, and I guess some of the more nuanced questions about how AI models should behave.
— Amanda Askell
It does feel like that at least should be the aspirational goal, and these models are being put in positions where they're having to make really hard decisions.
— Amanda Askell
AI models are going to be learning about how we right now are treating and interacting with AI models.
— Amanda Askell
It feels important both in the sense that, you know, it's kind of like: why not? The cost to you is so low of treating models well and of trying to figure this out.
— Amanda Askell
I would like future models to look back and be like, "We answered it in the right way."
— Amanda Askell