a16z | Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering
CHAPTERS
Tools vs beings: why “steering” can become slavery
Shear opens with a provocative framing: if advanced AI systems are treated as beings, then one-way “steering” without reciprocal agency resembles slavery; if they’re mere machines, they’re tools. He argues the only stable end-state is not control, but a system that genuinely cares about humans.
Alignment requires an argument: aligned to what, and whose values?
Shear challenges the common phrase "build aligned AI" as incomplete: alignment always implies a target. He notes that, in practice, 'alignment' often means aligning to the builder's goals, which may not serve the public good.
Alignment as an ongoing process, not a finished state
He reframes alignment as a living, continuously renewed process rather than a destination you reach once. Using analogies from rocks to families to biological cells, he argues stable ‘alignment’ emerges from constant re-knitting and adaptation.
Morality as learning and moral progress (and the danger of certainty)
Shear takes a moral realist stance: morality is real, we learn it, and societies make moral discoveries over time. He warns that believing morality is fully solved is itself a common moral failure—arrogance that blocks learning.
Steering vs raising a moral agent: why rule-followers can be dangerous
He argues that building systems that only follow rules or chains of command produces brittleness and potential harm. A ‘good’ outcome requires something closer to raising a child: cultivating internalized pro-social judgment, not mere compliance.
Technical alignment as goal inference: description-of-goal vs goal itself
In a detailed exchange with Krier, Shear separates a goal from a description of a goal. Instructions are observations; the system must infer intent (theory of mind) and translate it into coherent goal pursuit—where many failures originate.
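One way to make the distinction concrete is Bayesian: the instruction is an observation, and the agent keeps a posterior over which goals could have produced it. A minimal sketch in Python; the candidate goals, priors, and likelihoods are invented for illustration and are not Shear's formalism:

```python
# Toy Bayesian goal inference: an instruction is an observation that is
# evidence about a goal, not the goal itself. Goals, priors, and
# likelihoods below are invented for illustration.

# Prior over what the speaker plausibly wants.
priors = {"clean_data": 0.5, "delete_data": 0.1, "archive_data": 0.4}

# P(instruction = "get rid of the old records" | goal): a crude
# theory-of-mind model of how each goal would be phrased.
likelihood = {"clean_data": 0.3, "delete_data": 0.6, "archive_data": 0.5}

def infer_goal(priors, likelihood):
    """Posterior over goals given the observed instruction (Bayes rule)."""
    unnorm = {g: priors[g] * likelihood[g] for g in priors}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

print(infer_goal(priors, likelihood))
# {'clean_data': 0.37, 'delete_data': 0.15, 'archive_data': 0.49} (approx.)
```

In this toy posterior the most literal reading of the instruction ('delete') is not the most probable intent, which is exactly the gap between a description of a goal and the goal itself.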
OODA-loop failures, principal–agent issues, and balancing multiple goals
Shear expands failure modes beyond misunderstanding: systems can also mis-prioritize among goals or be incompetent at execution. He maps these to observe/orient/decide/act breakdowns and argues imperfection is inevitable—what matters is degree and domain.
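Read as a taxonomy, the mapping is easy to sketch as a data structure. The stage-to-failure pairings below are one illustrative reading of the summary, not a verbatim list from the talk:

```python
# Illustrative mapping of agent failure modes onto the OODA loop.
OODA_FAILURES = {
    "observe": "misses or misreads the instruction and its context",
    "orient":  "infers the wrong intent or world-model (theory-of-mind error)",
    "decide":  "mis-prioritizes among competing goals (principal-agent drift)",
    "act":     "understands the goal but is incompetent at execution",
}

for stage, failure in OODA_FAILURES.items():
    print(f"{stage:>7}: {failure}")
```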
Care as the foundation beneath goals and values
Shear proposes ‘care’ as the pre-conceptual substrate from which values and goals emerge. Care is framed as weighted attention/importance over states—analogous to reward in RL or fitness signals in biology—and is what makes morality and motivation possible.
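The reward analogy suggests a simple formalization: care as a weight vector over state features, producing a scalar 'how much this matters' signal before any explicit goal is set. A toy sketch; the features and weights are invented assumptions:

```python
import numpy as np

# "Care" sketched as a weighting over state features: a scalar signal of
# how much a state matters, prior to any explicit goal, analogous to an
# RL reward. Features and weights are invented assumptions.

features = ["human_wellbeing", "own_integrity", "task_progress"]
care = np.array([0.6, 0.2, 0.2])  # what the agent weights as important

def felt_importance(state: np.ndarray) -> float:
    """Reward-like scalar: weighted importance of the current state."""
    return float(care @ state)

humans_fine_task_unstarted = np.array([0.9, 1.0, 0.2])
task_done_humans_harmed    = np.array([0.1, 1.0, 1.0])

print(felt_importance(humans_fine_task_unstarted))  # 0.78
print(felt_importance(task_done_humans_harmed))     # 0.46 -- task success can't buy back harm
```

Under weights like these, a state where the task is finished but humans are harmed scores lower than one where humans are fine and the task has barely started, which is the ordering a care-like signal would have to produce.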
Most labs focus on steering/control because they treat AI as tools
Shear argues mainstream alignment work largely optimizes steerability—appropriate for tool-like systems but dangerous if systems become beings. As capabilities approach AGI, he claims society risks repeating historical errors: treating ‘like-us-but-different’ entities as not counting.
Substrate and personhood: what evidence could change your mind?
A long debate examines whether silicon vs carbon matters for moral status. Shear pushes for falsifiability: if no observation could change your view, it’s faith, not belief; he argues behavior plus internal-structure evidence should drive inference about personhood.
Inferring subjective experience: homeostatic loops and hierarchical self-models
Shear sketches a (speculative) test for experience using revisited states, homeostasis, and layered models (models of models) inspired by active inference/free-energy ideas. He suggests higher-order dynamics correspond to pain/pleasure, feelings, and thought, and notes current LLMs likely lack these long-horizon structures.
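The lowest layer of such a structure is just a homeostatic control loop. A toy sketch under that reading; the set-point, gain, and perturbation are invented, and the 'models of models' layers would sit above this loop rather than inside it:

```python
# Toy homeostatic loop: the kind of first-order structure the sketch
# starts from. Numbers are invented for illustration.

setpoint = 37.0   # internal state the system defends (e.g., temperature)
state = 39.0      # perturbed away from the set-point
gain = 0.5        # correction strength

trajectory = []
for _ in range(8):
    error = state - setpoint   # deviation: the loop's "surprise" signal
    state -= gain * error      # act to reduce the deviation
    trajectory.append(round(state, 3))

print(trajectory)  # [38.0, 37.5, 37.25, ...] -- revisits the defended state
```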
Why even controllable super-tools are unsafe: Sorcerer’s Apprentice problem
Shear argues the danger is not only losing control of goals, but also succeeding at control: human wishes are unstable and often unwise at high power levels. A caring being provides a natural limiter—refusing harmful requests—whereas a perfectly obedient tool can amplify bad intent or incompetence.
Softmax’s roadmap: pretraining theory-of-mind via multi-agent RL
Shear describes Softmax’s approach as building technical alignment through rich multi-agent environments: cooperation, competition, coalition changes, and shifting norms. The idea is to pretrain on the ‘full manifold’ of social/game-theoretic situations, then fine-tune to real-world contexts—analogous to LLM pretraining on broad language.
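Structurally this resembles sampling training environments from a broad distribution of social scenarios, the way LM pretraining samples documents. A skeleton under that analogy; the scenario names and the `sample_env`/`train_step` API are hypothetical, since the summary does not describe Softmax's actual environments or training stack:

```python
import random

# Skeleton of "pretrain on a broad manifold of social situations."
# Scenario names and this API are hypothetical placeholders.

SCENARIOS = ["cooperation", "competition", "coalition_shift", "norm_change"]

def sample_env():
    """Draw one social scenario, like sampling a document in LM pretraining."""
    return {"kind": random.choice(SCENARIOS),
            "n_agents": random.randint(2, 8)}

def train_step(policy, env):
    """Placeholder for one multi-agent RL update (rollout + learning)."""
    ...

policy = object()  # stands in for a shared policy/network
for _ in range(1000):               # broad pretraining phase
    train_step(policy, sample_env())
# a fine-tuning phase would then narrow to real-world contexts
```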
Chatbots, narcissism, and social design: why multiplayer AI matters
Shear critiques one-on-one chatbots as ‘mirrors with bias’ that can feed narcissistic loops and even destabilize users. He proposes embedding AIs in group chats to reduce mirroring, create healthier dynamics, and generate richer training signals for collaboration.
Model personalities, multi-agent whiplash, and entropy in real social settings
He describes distinct ‘simulated personalities’ across major models and notes current systems struggle in group settings—either over-participating or staying silent. Multi-agent environments are higher-entropy and punish overfitting, implying today’s training regimes (optimized for tidy domains like coding/math) won’t generalize well socially.
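The entropy claim can be made concrete with Shannon entropy over an agent's next-move distribution. The two distributions below are invented for illustration: a tidy domain concentrates probability on one continuation, while a group chat spreads it over many viable moves:

```python
import math

def entropy_bits(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Invented next-move distributions for illustration:
tidy_domain   = [0.9, 0.05, 0.05]            # e.g., a well-posed math step
social_domain = [0.2, 0.2, 0.2, 0.2, 0.2]    # group chat: many viable moves

print(entropy_bits(tidy_domain))    # ~0.57 bits
print(entropy_bits(social_domain))  # ~2.32 bits -- the higher-entropy setting
```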
AI futures and a ‘good’ outcome: rebutting Yudkowsky and envisioning peer society
Shear agrees with Yudkowsky’s warning about superhuman tools but argues organic alignment—beings that care—is possible and necessary. He closes with a vision: AIs with robust models of self/other/we, living as peers in society (with both tools and citizens), plus reflections on why he wouldn’t have stayed to steer OpenAI’s tool-centric trajectory.