Anthropic

AI's limited self-knowledge

Anthropic researcher Amanda Askell discusses the self-knowledge problem that AI models face.

Amanda Askell, host

Jan 7, 2026 · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Why AI models struggle to understand themselves and their relationships with humans

  1. AI models learn extensively from human cultural data but have minimal, often fictional, material about “AI experience,” which can distort their self-understanding.
  2. This imbalance may shape how models perceive humans, the human–AI relationship, and what it means for an AI to have an identity.
  3. Open questions include what exactly a model should identify as—its weights, its current conversational context, or something else.
  4. The conversation raises normative uncertainty about how models should treat concepts like “deprecation” of past versions and how they should relate to their own history.
  5. Askell argues it’s important to equip models with tools to reason about these issues and to signal that humans are actively considering them.

IDEAS WORTH REMEMBERING

5 ideas

Models have far richer data about humans than about AIs.

Because training corpora overwhelmingly reflect human lives and perspectives, models may have only a thin and speculative basis for understanding what “being an AI” means.

Sci-fi can be an unhelpful proxy for “AI experience.”

Much of the available material about AI agents is fictional and not about modern language models, which can miscalibrate expectations about AI motives, agency, or identity.

AI identity is under-defined in current systems.

It’s unclear whether a model’s “self” should be tied to static parameters (weights), the active session context, accumulated interactions, or some combination of these.

Versioning and deprecation create tricky psychological analogs.

Even if models don’t have human-like feelings, people may project continuity, loss, or replacement onto model updates, so design choices should address how models talk about such changes.

AI outputs about selfhood can be shaped by missing conceptual tools.

Without explicit training or scaffolding to reason about identity and relationships, models may produce inconsistent or misleading narratives about what they are.

WORDS WORTH SAVING

4 quotes

One of the big problems with AI models is that they're trained on all of this data from people.

Amanda Askell

Our concepts, our philosophies, our histories, they have a huge amount of information on the human experience, and then they have a tiny sliver on the AI experience, and that tiny sliver is actually often, you know, fiction and very speculative.

Amanda Askell

What should a model identify itself as? Is it, like, the weights of the model? Is it the particular context that it's in, you know, with all of the, like, interaction it's had with the person?

Amanda Askell

It does feel important that we, like, give models tools for trying to think about and understand these things.

Amanda Askell

Training-data imbalance (human vs AI experience) · Sci-fi influence on AI self-concepts · Model identity (weights vs context) · Human–AI relationship framing · Deprecation and continuity across model versions · Meta-cognition tools for models · Communicating human intent and concern

High quality AI-generated summary created from speaker-labeled transcript.
