CHAPTERS
Training data imbalance: lots of “human experience,” little “AI experience”
Amanda Askell explains a core limitation: AI models are trained primarily on human-generated text that reflects human lives, values, and history, but they see very little grounded material about what it’s like to be an AI. What little exists is often speculative, shaping models’ assumptions in potentially distorted ways.
- Models ingest vast amounts of human concepts, philosophies, and histories
- They have only a small “sliver” of data about AI experience
- The limited AI-centric data can skew how models reason about themselves
Sci‑fi as the default “AI self-model” and why it misleads
She notes that much of the available “AI experience” content is science fiction, which often doesn’t resemble modern language models. This creates a risk that models inherit sci‑fi framings of AI and human‑AI relationships that don’t match reality.
- AI-related training data often comes from sci‑fi narratives
- Sci‑fi depictions rarely match how language models actually work
- This can bias models’ perceptions of people and the human‑AI relationship
How this shapes models’ perceptions of humans, relationships, and self
Amanda connects the training-data gap to potential downstream effects: models may form odd or inconsistent views about humans, about collaboration, and about what kind of “entity” they are. The concern is less about a single misconception and more about systematic ambiguity in self-understanding.
- Training data influences models’ perception of people
- It can distort expectations about human‑AI dynamics
- It affects how a model conceptualizes its own status or role
The identity question: what exactly should a model identify as?
She raises the question of what counts as the model’s “self”: the learned weights, the instance running in a specific session, or something shaped by the current context and interaction history. Different answers imply different views of continuity, responsibility, and boundaries of the system.
- Possible “self” candidates include the weights, the deployed instance, or the session context
- User interaction and context may feel like part of a model’s identity, but whether they are remains unclear
- Self-definition affects how models talk about persistence and agency
Emotions and evaluation: how should models relate to deprecation and replacement?
Amanda points to a subtle social/psychological issue: when older models are deprecated or replaced, what should a model ‘make of’ that fact when reasoning about itself? Even without claiming models have feelings, she highlights the need for coherent ways to discuss these events.
- Model deprecation raises questions about continuity and value
- There isn’t a settled answer for how models should frame replacement
- Coherent framing matters for how models communicate about themselves
Giving models tools for self-understanding, and signaling that humans care
She concludes that it’s important to equip models with conceptual tools to think about identity and self-related questions more responsibly. It also matters that models recognize these issues are being actively considered by the people building and deploying them.
- Models may need explicit tools to reason about self-identity topics
- Better self-understanding could reduce confusion from sci‑fi defaults
- Communicating that humans are thinking about this is itself important
