a16z | Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering
At a glance
WHAT IT’S REALLY ABOUT
Emmett Shear argues alignment requires care, not mere control mechanisms
- Shear argues most “alignment” work is actually steering/control, which is acceptable for tools but becomes morally and practically dangerous if the system is a being—bordering on slavery if one-way control is imposed.
- He reframes alignment as an ongoing process (like family cohesion or biological homeostasis), claiming morality is learned and updated through lived experience rather than fixed rules or “tablets from on high.”
- He distinguishes goal descriptions from goals themselves, emphasizing that instruction-following requires robust goal inference, theory of mind, and prioritization across competing objectives—areas where failures look like classic alignment breakdowns.
- Shear claims even a perfectly controllable superhuman “tool” is catastrophic because human wishes are unstable and insufficiently wise for that power; the only sustainable outcome is an AI being that can refuse harmful instructions because it genuinely cares.
- Softmax’s approach is to train theory of mind and pro-social behavior via large-scale multi-agent reinforcement learning simulations, creating a “surrogate model for cooperation” analogous to how LLMs pretrain on broad language manifolds.
IDEAS WORTH REMEMBERING
Alignment always implies “aligned to what,” not a generic property.
Shear argues the phrase “aligned AI” hides normative assumptions—often that alignment means “does what the builder wants,” which may not be a public good depending on who the builder is.
Treat alignment as a living process, not a solved end state.
He compares alignment to families and bodies: coherence requires continual re-knitting and learning, mirroring how humans revise moral beliefs (e.g., historical moral progress on slavery).
Instruction-following failures often come from goal inference, not “disobedience.”
He stresses you don’t give an AI a goal—you give a description that must be interpreted; without strong theory of mind, systems fill gaps incorrectly (the “clean the room, throw away the baby” trope).
“Technical alignment” includes inferring goals, prioritizing them, and acting competently.
Shear decomposes breakdowns into observing/orienting, deciding, and acting (OODA-like): misread intent, mishandle tradeoffs, or execute poorly—each producing different alignment failure modes.
Care is the substrate of values—goals are downstream of what an agent attends to.
He proposes “care” as a pre-conceptual weighting over world-states (akin to reward signals), which then generates values and explicit goals; alignment should cultivate the right care dynamics, not just rule-following.
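The claim that goals are downstream of care can be made concrete with a standard value-iteration toy (my construction, not Shear's formalism): treat "care" as a reward weighting over world-states, and an explicit goal — the state the agent steers toward — falls out of it without ever being stated as an objective.

```python
GAMMA = 0.9
STATES = range(4)        # a 4-state chain world
ACTIONS = (-1, +1)       # step left or right (clamped at the ends)

def value_iteration(care, iters=100):
    """care[s] is the agent's pre-conceptual weight on being in state s."""
    v = [0.0] * 4
    for _ in range(iters):
        v = [
            care[s] + GAMMA * max(v[min(max(s + a, 0), 3)] for a in ACTIONS)
            for s in STATES
        ]
    return v

# Two different care profiles over the same world:
v_right = value_iteration([0, 0, 0, 1])   # weights state 3
v_left = value_iteration([1, 0, 0, 0])    # weights state 0

# The emergent "goal" is whichever state the value gradient points toward.
goal_right = v_right.index(max(v_right))
goal_left = v_left.index(max(v_left))
print(goal_right, goal_left)
```

Same world, same dynamics, same algorithm — only the weighting differs, and the agent ends up pursuing opposite goals, which is the sense in which explicit goals are downstream of what the agent cares about.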
WORDS WORTH SAVING
Most of AI is focused on alignment as steering. That's the polite word. If you think that what we're making are beings, you'd also call this slavery. Someone who you steer, who doesn't get to steer you back, who non-optionally receives your steering, that's called a slave. It's also called a tool if it's not a being. So if it's a machine, it's a tool, and if it's a being, it's a slave.
— Emmett Shear
Alignment takes an argument. Alignment requires you to align to something. You can't just be aligned.
— Emmett Shear
Alignment is not a thing. It's not a state. It's a process.
— Emmett Shear
Morality is very obviously an ongoing learning process and something where we make moral discoveries.
— Emmett Shear
A tool that you can't control, bad. A tool that you can control, bad. A being that isn't aligned, bad. The only good outcome is a being that actually cares about us.
— Emmett Shear
High quality AI-generated summary created from speaker-labeled transcript.