a16z | Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering
CHAPTERS
Tools vs beings: why “steering” can become slavery
Shear opens with a provocative framing: if advanced AI systems are treated as beings, then one-way “steering” without reciprocal agency resembles slavery; if they’re mere machines, they’re tools. He argues the only stable end-state is not control, but a system that genuinely cares about humans.
Alignment requires an argument: aligned to what, and whose values?
Shear challenges the common phrase "build aligned AI" as incomplete: alignment always implies a target. He notes that, in practice, 'alignment' often means aligning to the builder's goals, which may not serve the public good.
Alignment as an ongoing process, not a finished state
He reframes alignment as a living, continuously renewed process rather than a destination you reach once. Using analogies from rocks to families to biological cells, he argues stable ‘alignment’ emerges from constant re-knitting and adaptation.
Morality as learning and moral progress (and the danger of certainty)
Shear takes a moral realist stance: morality is real, we learn it, and societies make moral discoveries over time. He warns that believing morality is fully solved is itself a common moral failure—arrogance that blocks learning.
Steering vs raising a moral agent: why rule-followers can be dangerous
He argues that building systems that only follow rules or chains of command produces brittleness and potential harm. A ‘good’ outcome requires something closer to raising a child: cultivating internalized pro-social judgment, not mere compliance.
Technical alignment as goal inference: description-of-goal vs goal itself
In a detailed exchange with Krier, Shear separates a goal from a description of a goal. Instructions are observations; the system must infer intent (theory of mind) and translate it into coherent goal pursuit—where many failures originate.
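One way to make the distinction concrete is Bayesian: the instruction is an observation, and the agent keeps a posterior over which goals could have produced it. A minimal sketch in Python; the candidate goals, priors, and likelihoods are invented for illustration and are not Shear's formalism:

```python
# Toy Bayesian goal inference: an instruction is an observation that is
# evidence about a goal, not the goal itself. Goals, priors, and
# likelihoods below are invented for illustration.

# Prior over what the speaker plausibly wants.
priors = {"clean_data": 0.5, "delete_data": 0.1, "archive_data": 0.4}

# P(instruction = "get rid of the old records" | goal): a crude
# theory-of-mind model of how each goal would be phrased.
likelihood = {"clean_data": 0.3, "delete_data": 0.6, "archive_data": 0.5}

def infer_goal(priors, likelihood):
    """Posterior over goals given the observed instruction (Bayes rule)."""
    unnorm = {g: priors[g] * likelihood[g] for g in priors}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

print(infer_goal(priors, likelihood))
# {'clean_data': 0.37, 'delete_data': 0.15, 'archive_data': 0.49} (approx.)
```

In this toy posterior the most literal reading of the instruction ('delete') is not the most probable intent, which is exactly the gap between a description of a goal and the goal itself.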
OODA-loop failures, principal–agent issues, and balancing multiple goals
Shear expands failure modes beyond misunderstanding: systems can also mis-prioritize among goals or be incompetent at execution. He maps these to observe/orient/decide/act breakdowns and argues imperfection is inevitable—what matters is degree and domain.
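Read as a taxonomy, the mapping is easy to sketch as a data structure. The stage-to-failure pairings below are one illustrative reading of the summary, not a verbatim list from the talk:

```python
# Illustrative mapping of agent failure modes onto the OODA loop.
OODA_FAILURES = {
    "observe": "misses or misreads the instruction and its context",
    "orient":  "infers the wrong intent or world-model (theory-of-mind error)",
    "decide":  "mis-prioritizes among competing goals (principal-agent drift)",
    "act":     "understands the goal but is incompetent at execution",
}

for stage, failure in OODA_FAILURES.items():
    print(f"{stage:>7}: {failure}")
```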
Care as the foundation beneath goals and values
Shear proposes ‘care’ as the pre-conceptual substrate from which values and goals emerge. Care is framed as weighted attention/importance over states—analogous to reward in RL or fitness signals in biology—and is what makes morality and motivation possible.
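The reward analogy suggests a simple formalization: care as a weight vector over state features, producing a scalar 'how much this matters' signal before any explicit goal is set. A toy sketch; the features and weights are invented assumptions:

```python
import numpy as np

# "Care" sketched as a weighting over state features: a scalar signal of
# how much a state matters, prior to any explicit goal, analogous to an
# RL reward. Features and weights are invented assumptions.

features = ["human_wellbeing", "own_integrity", "task_progress"]
care = np.array([0.6, 0.2, 0.2])  # what the agent weights as important

def felt_importance(state: np.ndarray) -> float:
    """Reward-like scalar: weighted importance of the current state."""
    return float(care @ state)

humans_fine_task_unstarted = np.array([0.9, 1.0, 0.2])
task_done_humans_harmed    = np.array([0.1, 1.0, 1.0])

print(felt_importance(humans_fine_task_unstarted))  # 0.78
print(felt_importance(task_done_humans_harmed))     # 0.46 -- task success can't buy back harm
```

Under weights like these, a state where the task is finished but humans are harmed scores lower than one where humans are fine and the task has barely started, which is the ordering a care-like signal would have to produce.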
Most labs focus on steering/control because they treat AI as tools
Shear argues mainstream alignment work largely optimizes steerability—appropriate for tool-like systems but dangerous if systems become beings. As capabilities approach AGI, he claims society risks repeating historical errors: treating ‘like-us-but-different’ entities as not counting.
Substrate and personhood: what evidence could change your mind?
A long debate examines whether silicon vs carbon matters for moral status. Shear pushes for falsifiability: if no observation could change your view, it’s faith, not belief; he argues behavior plus internal-structure evidence should drive inference about personhood.
Inferring subjective experience: homeostatic loops and hierarchical self-models
Shear sketches a (speculative) test for experience using revisited states, homeostasis, and layered models (models of models) inspired by active inference/free-energy ideas. He suggests higher-order dynamics correspond to pain/pleasure, feelings, and thought, and notes current LLMs likely lack these long-horizon structures.
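The lowest layer of such a structure is just a homeostatic control loop. A toy sketch under that reading; the set-point, gain, and perturbation are invented, and the 'models of models' layers would sit above this loop rather than inside it:

```python
# Toy homeostatic loop: the kind of first-order structure the sketch
# starts from. Numbers are invented for illustration.

setpoint = 37.0   # internal state the system defends (e.g., temperature)
state = 39.0      # perturbed away from the set-point
gain = 0.5        # correction strength

trajectory = []
for _ in range(8):
    error = state - setpoint   # deviation: the loop's "surprise" signal
    state -= gain * error      # act to reduce the deviation
    trajectory.append(round(state, 3))

print(trajectory)  # [38.0, 37.5, 37.25, ...] -- revisits the defended state
```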
Why even controllable super-tools are unsafe: Sorcerer’s Apprentice problem
Shear argues the danger is not only losing control of goals, but also succeeding at control: human wishes are unstable and often unwise at high power levels. A caring being provides a natural limiter—refusing harmful requests—whereas a perfectly obedient tool can amplify bad intent or incompetence.
Softmax’s roadmap: pretraining theory-of-mind via multi-agent RL
Shear describes Softmax’s approach as building technical alignment through rich multi-agent environments: cooperation, competition, coalition changes, and shifting norms. The idea is to pretrain on the ‘full manifold’ of social/game-theoretic situations, then fine-tune to real-world contexts—analogous to LLM pretraining on broad language.
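Structurally this resembles sampling training environments from a broad distribution of social scenarios, the way LM pretraining samples documents. A skeleton under that analogy; the scenario names and the `sample_env`/`train_step` API are hypothetical, since the summary does not describe Softmax's actual environments or training stack:

```python
import random

# Skeleton of "pretrain on a broad manifold of social situations."
# Scenario names and this API are hypothetical placeholders.

SCENARIOS = ["cooperation", "competition", "coalition_shift", "norm_change"]

def sample_env():
    """Draw one social scenario, like sampling a document in LM pretraining."""
    return {"kind": random.choice(SCENARIOS),
            "n_agents": random.randint(2, 8)}

def train_step(policy, env):
    """Placeholder for one multi-agent RL update (rollout + learning)."""
    ...

policy = object()  # stands in for a shared policy/network
for _ in range(1000):               # broad pretraining phase
    train_step(policy, sample_env())
# a fine-tuning phase would then narrow to real-world contexts
```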
Chatbots, narcissism, and social design: why multiplayer AI matters
Shear critiques one-on-one chatbots as ‘mirrors with bias’ that can feed narcissistic loops and even destabilize users. He proposes embedding AIs in group chats to reduce mirroring, create healthier dynamics, and generate richer training signals for collaboration.
Model personalities, multi-agent whiplash, and entropy in real social settings
He describes distinct ‘simulated personalities’ across major models and notes current systems struggle in group settings—either over-participating or staying silent. Multi-agent environments are higher-entropy and punish overfitting, implying today’s training regimes (optimized for tidy domains like coding/math) won’t generalize well socially.
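The entropy claim can be made concrete with Shannon entropy over an agent's next-move distribution. The two distributions below are invented for illustration: a tidy domain concentrates probability on one continuation, while a group chat spreads it over many viable moves:

```python
import math

def entropy_bits(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Invented next-move distributions for illustration:
tidy_domain   = [0.9, 0.05, 0.05]            # e.g., a well-posed math step
social_domain = [0.2, 0.2, 0.2, 0.2, 0.2]    # group chat: many viable moves

print(entropy_bits(tidy_domain))    # ~0.57 bits
print(entropy_bits(social_domain))  # ~2.32 bits -- the higher-entropy setting
```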
AI futures and a ‘good’ outcome: rebutting Yudkowsky and envisioning peer society
Shear agrees with Yudkowsky’s warning about superhuman tools but argues organic alignment—beings that care—is possible and necessary. He closes with a vision: AIs with robust models of self/other/we, living as peers in society (with both tools and citizens), plus reflections on why he wouldn’t have stayed to steer OpenAI’s tool-centric trajectory.