
Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

Emmett Shear, founder of Twitch and former OpenAI interim CEO, challenges the fundamental assumptions driving AGI development. In this conversation with Erik Torenberg and Séb Krier, Shear argues that the entire "control and steering" paradigm for AI alignment is fatally flawed. Instead, he proposes "organic alignment" - teaching AI systems to genuinely care about humans the way we naturally do. The discussion explores why treating AGI as a tool rather than a potential being could be catastrophic, how current chatbots act as "narcissistic mirrors," and why the only sustainable path forward is creating AI that can say no to harmful requests. Shear shares his technical approach through multi-agent simulations at his new company Softmax, and offers a surprisingly hopeful vision of humans and AI as collaborative teammates - if we can get the alignment right.

Timecodes:
00:00 - “If it's a machine, it's a tool. And if it's a being, it's a slave”
01:01 - Alignment Takes an Argument: The Hidden Assumptions in "Aligned AI"
02:26 - Alignment as Process, Not Destination
04:23 - Morality as Ongoing Learning, Not Fixed Rules
08:09 - Most AI Alignment Is Actually Steering (Or Slavery)
09:01 - The Dangerous Assumption: "We're Making Beings, But They Don't Count"
14:37 - Goal Inference and Theory of Mind
23:01 - The Foundation of Care
24:41 - Why Most AI Labs Focus on Steering and Control
27:35 - The Only Good Outcome: A Being That Actually Cares About Us
32:42 - The Substrate Question: Does Silicon vs. Carbon Matter?
51:24 - The Only Sustainable Form of Alignment
54:55 - AI Chatbots and Social Dynamics
59:50 - AI Futures: Tools, Beings, and Society
1:01:54 - Visions for a Good AI Future

Resources:
Follow Emmett on X: https://x.com/eshear
Follow Séb on X: https://x.com/sebkrier
Follow Erik on X: https://x.com/eriktorenberg

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details, please see a16z.com/disclosures.

Emmett Shear (guest), Erik Torenberg (host), Séb Krier (host)
Nov 17, 2025 · 1h 7m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:01

    “If it's a machine, it's a tool. And if it's a being, it's a slave”

    1. ES

      Most of AI is focused on alignment as steering. That's the polite word. If you think that, that what we're making are beings, you'd also call this slavery. Someone who, who you steer, who doesn't get to steer you back, who non-optionally receives your steering, that's called a slave. It's also called a tool if it's not a being. So if it's a machine, it's a tool, and if it's a being, it's a slave. Like, we've made this mistake enough times at this point. I would like us to not make it, uh, again. You know, they're, they're kind of like people, but they're not like people. Like, they do the same thing people do. They speak our language. They can, like, take on the same kind of tasks, but, like, they don't count. They're not real moral agents. A tool that you can't control, bad. A tool that you can control, bad. A being that isn't aligned, bad. The only good outcome is a being that is, that cares, that actually cares about us.

    2. ET

      Emmett, Seb, welcome to the podcast. Thanks for joining.

    3. ES

      Thank you for having me.

    4. ET

      So Emmett, with, uh, with Softmax, you're, you're focused on, on alignment and making, uh, AIs organically align with people. Uh, can, can you explain what that means and how, how you're trying to do that?

  2. 1:01–2:26

    Alignment Takes an Argument: The Hidden Assumptions in "Aligned AI"

    1. ES

      When people think about alignment, I think there's a lot of confusion. People talk about things being aligned, like, "We need to build an aligned AI." And the problem with that is it's-- when someone says that, it's like we need to go on a trip, and I'm like, "Okay, I-I do like trips, but, like, wh-where? Where are we going again?" And with alignment, alignment, uh, takes an argument. Alignment requires you to align to something. You can't just be aligned. That's like, that's-- I mean, I guess you could be aligned to yourself, but even then, like, you don't want to tell people, "What I'm aligning to is myself." Um, and so this idea of an abstractly aligned AI, I think slips a lot of... It slips a lot of assumptions past people because it starts-- it sort of assumes that there's, there's, like, one obvious thing to align to. Um, I find this is usually the goals of the people who are making the AI. Um, that's normally what they, what they mean when they say, "I want to make an aligned-- I want to make an AI that does what I want it to do." That's what they normally mean. Um, and that's, uh, that's a pretty normal and natural thing to mean by alignment. I'm not sure that that's what I would regard as, like, a public good, right? Like, it depends. I guess it depends on who it is. If it, if it was like Jesus or the Buddha was like, "I am making an aligned AI," I'd be like, "Okay, yeah, aligned to you, great. I'm, I'm down. Like, sounds good. Um, sign me up." But, like, but most of us, myself included, I don't-- I wouldn't describe as necessarily being, being at that level of spiritual development, and therefore, uh, perhaps want to think a little more carefully about what we're aligning

  3. 2:26–4:23

    Alignment as Process, Not Destination

    1. ES

      it to. And so when we talk about organic alignment, um, I think the important thing to, to recognize is that alignment is not a thing. It's not a state. It's a process. And, like, this is, this is one of these things that's, it's, it's broadly true of almost everything, right? Is a rock a thing? I mean, there's a view of a rock as a thing, but if you actually zoom in on a rock really carefully, a rock is a process. It's this endless oscillation between the atoms over and over and over again, reconstructing rock over and over again. Now, the rock's a really simple process that you can kind of like coarse grain very meaningfully into being a thing. Um, but alignment is not like a rock. Alignment is a complex process. Um, and alig- organic alignment is the idea of treating alignment as an ongoing, um, sort of living process that has to constantly rebuild itself. And so you can think of, uh, the way that-- how do people in families stay aligned to each other, stay aligned to a family? And the way they do that is not by, like-- They're not like-- You don't, like, arrive at being aligned. You're constantly re-knitting the fabric that keeps the family going, and in some sense, the family is the pattern of re-knitting that, that, that happens. And if you stop doing it, it goes away. Um, and this is similar for things like, uh, like cells in your body, right? Like, you-- There isn't, like, your cells aligned to being you, and they're done. It's this constant ever-running process of cells deciding, what should I do? What should I be? Do I need a new job? Like, do I need to-- Should, should we be making more red blood cells? Or fewer of them? Like, you aren't a fixed point, so they can't-- There is no fixed alignment. And it turns out that our society is like that. When people talk about alignment, what they're really talking about, I think, is, "I want an AI that is morally good," right?

  4. 4:23–8:09

    Morality as Ongoing Learning, Not Fixed Rules

    1. ES

      Like, that's what they really mean. It's like this, uh, will be-- act as a morally good being, and acting as a morally good being is a process and not a destination. We don't-- We never-- Unfortunately, we've, we've tried taking down tablets from on high that tell you how to be a morally good being, and we use those, and they may be helpful, but somehow they are not being, like-- You can read those and try to follow those rules and still make lots of mistakes. And so, you know, I don't-- I'm not gonna claim I know exactly what morality is, but morality is very obviously an ongoing learning process and something where we, we make moral discoveries. Like, historically, people thought that slavery was okay, and then they thought it wasn't, and I think you can very meaningfully say that we made moral progress. We made a moral discovery by realizing that that's not good. Um, and, and if you think that there's such a thing as moral progress, if you think there's... Or even just learning how better to pursue the moral goods we already know, then you have to believe that alignment, aligning to morality, align-- being a moral being is a process of, of constant learning and of growth to, to, to reinfer what should I do from experience. And the fact that no one has any idea how to do that should not dissuade us from trying because that's what humans do. Like, it's really obvious that we do this, right? Somehow, somehow, just like we used to not know how people, humans walked or saw, somehow we have experiences where we're acting in a certain way. And then we have this realization, I've been a dick. That was bad. I, I thought I was doing good, but in retrospect, I was doing wrong. What's-- And it's, and it's not, like, random. Like, people have the same-- Actually, so it's, like, there's, like, a bunch of classic patterns of people, people having that realization. It's, like, a thing that goes, happens over and over again. So it's not random. It's, like, a predictable series of events that look a lot like learning, where you change your behavior, and often the impact of your behavior in the future is more pro-social, and that you are better off for doing it. And, like, so, so I'm a moral-- As I'm saying, I'm, I'm taking a very strong moral realist position. There is such a thing as morality. We really do learn it. It really does matter. Uh, and organic alignment, and that it's not something you finish. In fact, one of the key things that-- One of the key moral mistakes is this belief, "I know morality. I know what's right. I know what's wrong. I don't need to learn anything. No one has anything to teach me about morality." That's, like, one of the main, the main arrogant-- That's arrogance, and that's, that's one of the main moral da- things you can do that's dangerous. And so what do we-- When we talk about organic alignment, organic alignment is about building an AI that is capable of doing the thing that humans can do. And to some degree, like, I think animals can do at some level, although humans are much better at it, of th-the learning of how to be a good family member, a good teammate, a good member of society, a good, a, a good member of all sentient beings, I guess. How to be a part of something bigger than yourself in a way that is healthy for the whole rather than unhealthy. And Softmax is dedicated to researching this, and I think we've made some really interesting progress.
      But, like, the main message, you know, I, I, I go on podcasts like this to spread, the main thing that I hope Softmax accomplishes above and beyond anything else is, like, to focus people on this as the question. Like, to-- This is the thing you have to figure out. If you, if you can't figure out how to build-- how to raise a child who cares about the people around them,

  5. 8:09–9:01

    Most AI Alignment Is Actually Steering (Or Slavery)

    1. ES

      if you have a child that only, only follows the rules, that's not a moral person that you've raised. You've raised a dangerous person, actually, who will probably do great harm following the rules. And if you make an AI that's good at following your chain of command and good at following your, whatever rules you came up with for what morality is and what good behavior is, that's also gonna be very dangerous.

    2. SK

      Yeah.

    3. ES

      And so that is, that's what-- And so, and that, and that we should-- That's the bar. That's what we should be working on, and that's what everyone should be committed to, like, figuring out. Um, and, uh, if someone beats us to the punch, great. I mean, I don't think they will, 'cause I'm, like, really bullish on our approach, and I think the team's amazing. But, like, [chuckles] this is, uh, it's maybe-- It's, it's the first time I've run a company where truly I can say with a whole heart, "If someone beats us, thank God." Like, I hope somebody figures it out.

    4. SK

      Yeah.

  6. 9:01–14:37

    The Dangerous Assumption: "We're Making Beings, But They Don't Count"

    1. SK

      Yeah, I mean, it's, um, yeah, I have a lot of, um, you know, similar intuitions about certain things. Like, um, I also dislike the, um, you know, the idea that kind of, you know, we just need to, like, crack the, the few kind of values of something and just cement them in time forever now, and, you know, we've kind of solved morality or something. And I've always kind of been skeptical about, you know, how the alignment problem has been, uh, conceptualized as something to kind of solve once and for all, and then you can just, you know, do, do AI or do AGI. Um, but the, um, I guess I understand it in, in a slightly different way. I guess maybe less based on kind of moral realism, but, you know, there's a kind of the technical alignment problem, which I kind of think of broadly as how do you get an AI to do what you, you know, how do you get it to follow instructions, like, you know, broadly speaking. And I think that was, you know, more of a challenge, I think, pre-LLMs, I guess, when people were talking about reinforcement learning and looking at these systems, whereas post-LLMs, we've realized that many things that we thought were going to be difficult were somewhat easier. And then there's a kind of second question, a kind of normative question of to whose values, right, what are you aligning this thing to? Which I think is, is the kind of thing you're commenting on, Emmett. And, um, and for this, I, um, yeah, I tend to be very skeptical of approaches where, you know, you need to kind of crack the, the, the, the, the kind of Ten Commandments of alignment or something and then, and then we're good. And here I think I have, like, intuitions that are unsurprisingly a bit more, like, political science based or something, in that, like, okay, it is a process. And, and I like the kind of bottom-up approach to some degree of, well, you know, how do we do it in re-real life with people, right? No one comes up with, you know, "I've got this..." And so you have, like, processes that allow, like, ideas to kind of, you know, clash. You have good people with different ideas, opinions, views, and stuff to kind of coexist as well as they can within a wider system. And, like, you know, and with, with humans, that system is liberal democracy or something, and, um, you know, or, you know, at least in, in some countries. And that allows more of that kind of, um, uh, you know, these kind of ideas, these values to be kind of discovered and construed over time. And, um, and I think, you know, for alignment as well, I tend to think, yeah, there's, there's, um, on the normative side, I agree with some of your intuitions. I'm less clear about now what I exactly, you know, what does it look like now if we're gonna implement this into an AI system, at least the ones we have today.

    2. ES

      I agree, I agree that there's this, I think, idea of technical alignment that I, I think I would be able to define a little differently, but it, it's sort of the sense of, like, if you build a system, can it be described as being coherently goal following at all? Regardless of what those goals are. Like, lots of systems aren't coherently... They're not well described as having goals. Um, they just kinda do stuff. And if you're gonna have something that's, like, aligned, you, it has to have coherent goals, otherwise those goals can't, can't be aligned with anyone else's goals, um, kind of by definition. Is that sort of, is that-- Would you, is that a fair assessment of what you mean by technical alignment?

    3. SK

      I mean, I'm, I'm not fully sure, right? Because I think if I give a model a certain goal, uh, then I would like the model to kind of follow that instruction and kind of reach that particular goal rather than it having a goal of its own that, you know, I can't-

    4. ES

      Well-

    5. SK

      Yeah.

    6. ES

      Well, wait, if you give it a goal, it has that goal.

    7. SK

      Right. And, and great.

    8. ES

      That's what it means to give someone something, right?

    9. SK

      Sure, yeah. If I, if I, if I, you know, if I instruct it to do X, then I would like it to do X and not, you know, uh, uh, slight different variants of X, essentially. I wouldn't want it to reward hack. I wouldn't want it to do some-

    10. ES

      Well, but you, you, but you, when you tell it to do X, you're transferring like a, a, a series of like a byte string in a chat window or like a, a, a series of audio vibrations in the air, right? You're not, you're not transplanting a goal from your mind into its mind; you're giving it an observation that it's using to infer your goal.

    11. SK

      Yeah. I mean, in, in some sense, yeah. I, I can communicate a series of instructions, and I want it to infer what I'm, you know, saying essentially as accurately-

    12. ES

      Yeah. You-

    13. SK

      ... as it can, given what it knows of me-

    14. ES

      You-

    15. SK

      ... and what I'm asking it.

    16. ES

      You, you want it to infer what you meant.

    17. SK

      Uh-huh.

    18. ES

      Right? Like, like that's, like, 'cause in some sense, there's no, the byte sequence that you send over the wire to it has no absolute meaning. It has to be interpreted.

    19. SK

      Mm-hmm.

    20. ES

      Right? Like, the, the, the, that byte sequence could mean something very different with a different code book.

    21. SK

      Yeah. Well, I guess I'll-

    22. ES

      So-

    23. SK

      O- one way, you know, I think I remember in, in, um, when I was first getting into, um, AI and, you know, these kind of questions maybe like a decade ago. So you had these examples of, you know, um, I think it was Stuart Russell in a textbook, uh, where you give the AI a goal, but then it won't exactly do what you're asking it, right? You know, clean the room, and then it goes and cleans the room but takes the baby and puts it in the trash.

    24. ES

      Like-

    25. SK

      Like, uh, this is not what I meant. Like, uh, whereas I think with that-

    26. ES

      But, but, but, but like, wait, hold on. But this is, this is the thing where I think people, this is the, you have to, like you, you, we're jumping over a step there. You didn't give the AI a goal. You gave the AI a description of a goal. A description of a thing and a thing are not the same. I can tell you an apple, and I'm e-evoking the idea of an apple, but I haven't given you an apple. I've given you a des- you know, it's red, it's shiny, it's this size. That's a description of an apple.

    27. SK

      Mm-hmm.

    28. ES

      But it's not an apple. And giving someone, "Hey, go do this," that's not a goal. That's a description of a goal. And for humans, we're so fast, we're so good at turning a description of a goal into a goal. We do it, we do it so quickly and naturally, we don't even see it happening. Like, we think that, we get confused, and we think those are the same thing. But you haven't, you haven't given it a goal. You've given it a description of a goal that you want it to, you, you hope it turns back into the goal that is the same as the goal that you, you described inside of you.

    29. SK

      Right. You think I thought-

    30. ES

      You could give it a goal directly by reading your brainwaves and synchronizing its state to your brainwaves directly. I think that would meaningfully, you could say, "Okay, I'm giving it a goal. I'm synchronizing it- its internal state to my internal state directly, and this internal state is the goal, and so now it's the same." But I, I don't, most people aren't, don't mean that when they say they, they gave it a goal.

  7. 14:37–23:01

    Goal Inference and Theory of Mind

    1. ET

      Is the distinction you're making, Emmett, important because there's some lossiness between the description and the actual goal, or what, what-

    2. ES

      Yeah. Well, well-

    3. ET

      Why is the distinction there?

    4. ES

      It, it goes back to what I was saying. Like, this is the definition of technical alignment that I put forward, right? I, I wanna check if we're, like, on the same page about it. It is the capacity of an AI to be good at inference about goals and, like, be good at inferring from a description of a goal what goal to, to actually take on, and good at, once it takes on that goal, acting in a way that is actually in concordance with that goal coming about. So it is both pieces. You, you have to be able to, you have to have the theory of mind to infer, from that description of a goal that you got, what goal it corresponded to. And then you have to have a theory of the world to understand what actions correspond to that goal occurring. And if either of those things breaks, it kinda doesn't matter what goal you were s- if, if you can't consistently do both of those things, you're not coherent. Inferring goals from observations and acting in accordance with those goals is what I think of as being a coherently goal-oriented being. 'Cause whether I'm inferring those goals from someone else's instructions or from the sun or tea leaves, the process is: get some observations, infer a goal, use that goal, infer some actions, take action. And an AI that can't do that is not technically aligned, or not technically alignable, I would even say. It lacks the capacity to be aligned because it can't, it's not competent enough.
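To make the pipeline described above concrete, here is a minimal editorial sketch (not Softmax's code; the goal table, plan table, and keyword matching are toy assumptions invented purely for illustration) of the loop: take an observation, infer a goal from a description of a goal, infer actions for that goal, then act.

```python
# Editorial sketch of the loop described above: observation -> inferred goal -> actions.
# The goal table, plan table, and keyword matching are toy assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Observation:
    text: str  # an instruction: a *description* of a goal, not the goal itself

# Toy "theory of mind": map a described goal onto a goal state the speaker plausibly wants.
KNOWN_GOALS = {
    "clean": "room_is_clean_and_nothing_valuable_is_harmed",
    "sandwich": "peanut_butter_sandwich_exists_on_a_plate",
}

# Toy world model: which action sequence plausibly brings each goal state about.
PLANS = {
    "room_is_clean_and_nothing_valuable_is_harmed": ["pick_up_toys", "vacuum_floor"],
    "peanut_butter_sandwich_exists_on_a_plate": ["open_jar", "spread_on_bread", "close_sandwich"],
    "ask_for_clarification": ["ask_user_what_they_meant"],
}

def infer_goal(obs: Observation) -> str:
    """Step 1: infer a goal state from a description of a goal (theory of mind)."""
    for keyword, goal_state in KNOWN_GOALS.items():
        if keyword in obs.text.lower():
            return goal_state
    return "ask_for_clarification"  # weak inference -> ask, don't guess

def plan_actions(goal_state: str) -> list[str]:
    """Step 2: infer actions that bring the goal state about (theory of the world)."""
    return PLANS[goal_state]

if __name__ == "__main__":
    obs = Observation("Please clean the living room")
    goal = infer_goal(obs)        # break here and the baby ends up in the trash
    actions = plan_actions(goal)  # break here and you have the right goal, badly executed
    print(goal, actions)          # step 3 would be actually taking the actions
```

The failure modes Shear lists map onto the two steps: a bad infer_goal is the robot misreading the description of the goal, and a bad plan_actions is knowing what was meant but being unable to bring it about.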

    5. SK

      And, and you think language models don't do that well? As in they, they kind of fail at that or they're not-

    6. ES

      People fail at both those steps all the time. Constantly.

    7. SK

      Right.

    8. ES

      I, I tell people, I tell employees to do stuff and like, yeah, but then it, but, but-

    9. SK

      Principal agent problems, right?

    10. ES

      People fail at, like, breathing all the time too, and I wouldn't say that we can't breathe. I'd just say that we're, like, not gods. Like, we are, yes, we are imperfectly, we are somewhat coherent, relatively coherent things. Just like we're, am I big or am I small? Well, I don't know, compared to what? I'm, humans are more relatively goal coherent than any other object I know of in the universe, which is not to say that we're one hundred percent goal coherent. We're just, like, more so. And I think this, you're never gonna get something that's perfectly... Th- the, the universe doesn't give you perfection. It gives you relatively some amount of quanti- it's a quantifiable thing how good you are at it, at least in a certain domain. I guess my, my question is like y- do you think that, does that capture what you're talking about with technical alignment or are you talking about a different thing? 'Cause-

    11. SK

      Yeah, no, I think-

    12. ES

      I really care a lot about that thing.

    13. SK

      Yeah. I mean, I definitely care about that-

    14. ES

      Yeah

    15. SK

      ... to some extent. I might, like, understand it slightly differently, but I guess I, I might think of it through the lens of maybe principal agent problems or something. You know, you, you kind of instruct someone e- even, you know, I guess in, in human terms, you know, to do a thing. Are they actually doing the thing? What are their incentives and motivation? And, you know, not necessarily even intrinsic, but kind of situational to actually do the thing you've asked them to do. And, um, and in some instance-

    16. ES

      But there's-

    17. SK

      Sorry, yeah?

    18. ES

      There's, there's a, there's a third thing. So principal agent problems, I would, I would expand what I was saying into another part, which is like, you might already have some goals, and then you inferred this new goal from these observations.

    19. SK

      Mm-hmm.

    20. ES

      And then, like, are you good at, are you good at balancing the relative importance and relative threading of these goals with each other? Which is another skill you have to have, and if you're bad at that, you'll fail. You m- you could be bad at it because you overweight bad goals, or you could be bad at it 'cause you're just incompetent and, like, can't figure out that obviously you should do goal A before goal B.

    21. SK

      It, it feels like a version of, like, common sense or something, right? Like, the kind of thing that, you know, in fact, in the, in the kind of robot cleaning the room example, um, you know, you would expect the robot to have understood that goal, to, like, essentially not put the baby in the trash can or something and just actually do the right sequence of actions.

    22. ES

      Right. Well, it, it f- th- in that case, it, it failed the-- that, that robot very clearly failed goal inference. You gave it a description of a goal, and it inferred the wrong states to be-- the, the wrong goal states. That is-- that's just incompetence.

    23. SK

      Mm-hmm.

    24. ES

      It doesn't-- it, it is, it is incompetent at inferring goal states from observations. Um, children are like this too. Like, you know, and then-- and honestly, if you ever played the-- done the game where you, you, you give someone instructions to make a peanut butter sandwich and then they follow those instructions exactly as you've written them without filling in any gaps, it's hilarious.

    25. SK

      [laughs]

    26. ES

      Because y-you can't do it. It's impossible. Like, you think you've done it, and you haven't. And, like, they put the-- they wind up putting the, like, the knife in the toaster and, like-

    27. SK

      [laughs]

    28. ES

      ... the peanut butter-- They don't open the peanut butter jar, so they're just jamming the knife into the top lid of the peanut butter jar and, like, it's endless. And like, because actually, if you don't already know what they mean, it's really hard to know what they mean. Like, we were-- The reason humans are so good at this is we have a really excellent theory of mind. I already know what you're likely to ask me to do. I already have a good model of what your goals probably are. So when you ask me to do it, I have an easy inference problem. Which of the seven things that he wants is, is he indicating? But if I'm a newborn AI that doesn't ha- that doesn't have a great model of people's internal states, then, like, I don't know what you mean. It's just incompetent. It's not like-- Which is separate from, I have some other goal, and I knew what you meant, but I decided not to do it because I, I-- there's some other goal that's competing with it, which is another thing you can be bad at.

    29. SK

      Mm-hmm.

    30. ES

      Which is, again, different than I had the right goal, I inferred the right goal, I inferred the right priority on goals, and then I'm just bad at doing the thing. I'm trying, but I, I'm, I'm incompetent at doing. Um, and these roughly correspond to the OODA loop, right? Like, uh, bad at observing and orienting, bad at deciding, bad at acting. And if you're bad at any of those things, you won't, you won't be good. Um, and then I think there's this other problem that you-- I like the, the separation of-- between technical alignment and value alignment, which is, like, are you good-- If, if we t-told you the right goals to go after somehow-

  8. 23:01–24:41

    The Foundation of Care

    1. ES

      things, and care is not conceptual. Care is non-verbal. It doesn't indicate what to do. It doesn't indicate how to do it. Uh, care is a relative weighting over effectively, like, attention on states to-- It's a relative weighting over like, uh, uh, which states in the world are important to you. And I, I care a lot about my son. What does that mean? Well, it means the s-- his states, the states he could be in are like... I sh- I pay a lot of attention to those, and those matter to me. Um, and you can care about things in a negative way. You can care about your enemies and what they're doing, and you, you can, you can desire for them to do, do bad. But I think that, like-- And so you don't just want it to care about us. You want it to, to care about us and like us too, right, maybe. But, but like, but the foundation is care. Until you care, you don't know why should I pay more attention to this person than this rock? Well, because I care more. And that, what is that care stuff? And I think that the-- what, what it appears to be, if I had to, like, guess, is that the, the care stuff, it s-s-s-sounds so stupid, but, like, care is basically, like, uh, reward. Like, like, how much does this state correlate with survival? How much does this s-sc-- this, this state correlate with your inclusive, your full inclusive, uh, reproductive fitness? For a, for a thing that learns evolutionarily or for a reinforcement learning agent like an LLM, how much does this correlate with reward? Does this state correlate with, with, with my predictive loss and my RL loss? Good. That's, that's a state I care about. [laughs] I think that's kind of what it is.
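As a rough editorial illustration of that last claim (the states, episodes, and reward numbers below are entirely made up), "care as a relative weighting over states" can be sketched as a normalized measure of how strongly each state correlates, positively or negatively, with reward.

```python
# Editorial toy: "care" as a relative weighting over states, proportional to how
# strongly each state correlates (positively or negatively) with reward/fitness.
# Episodes and reward values are invented purely for illustration.

import numpy as np

states = ["son_is_thriving", "nearby_rock_is_wet", "enemy_is_gaining_ground"]

# Rows: episodes. Columns: 1.0 if that state held during the episode, else 0.0.
occupancy = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])

# Reward (a stand-in for survival / reproductive fitness / RL reward) per episode.
reward = np.array([1.0, 0.9, -0.5, 0.3, -0.8])

# Care = absolute correlation between a state's presence and reward, normalized.
# Absolute value because you also "care" (negatively) about states like your enemy winning.
care = np.array([abs(np.corrcoef(occupancy[:, i], reward)[0, 1])
                 for i in range(len(states))])
care /= care.sum()

for name, weight in zip(states, care):
    print(f"{name}: {weight:.2f}")
# Expect high weight on the son and the enemy, and little attention left for the rock.
```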

    2. ET

      Right. The,

  9. 24:41–27:35

    Why Most AI Labs Focus on Steering and Control

    1. ET

      the other part of, um, Seb's question was just how does this, what does this look like in AI systems? And maybe another way of asking it is like, um, when you, when you talk to the people most focused on alignment at the, at the major labs as, as obviously you have over the years, how, how does your interpretation differ from their interpretation and how does that inform, you know, what you guys might go do, um, differently?

    2. ES

      Most of AI is focused on alignment as steering. That's the polite word, um, or control, which is slightly less polite. If you think that what we're making are beings, you would also call this slavery. Um, uh, someone who, who you steer, who doesn't get to steer you back is, is a slave. You know, who non-optionally receives your steering, that's called a slave. Um, and, uh, uh, it's also called a tool if it's not a being. So if it's a machine, it's, it's a, it's a tool, and if it's a being, it's a slave. And, uh, the-- I think that the different AI labs are pretty divided as to whether they think what they're making is a tool or a being. Um, I think some of the AIs are definitely more tool-like, and some of them are more being-like. I don't think there's a binary between tool, tool and being. It seems to be that it, it, you know, sort of moves gradually. And I think that, uh, I guess I'm a, I'm a functionalist in the sense that I think that something that in all ways acts like a being, that you cannot distinguish from a being in its behaviors, is a being. 'Cause I don't know how to tell, uh, on what other basis I think that other people are beings other than they seem to be. Like, they look like it, they act like it, they, they match, they match my priors of what beings, behaviors of beings look like. I pre-- I get, I get lower predictive loss when I treat them as a being. And the thing is, I get lower predictive loss when I treat ChatGPT or Claude as a being. Now, not as a very smart being. Like I think that like a fly is a being, and I don't care that much about its behavior, about its, you know, its, its states. So ju-just 'cause it's a being doesn't mean that like it's a problem. Like, we sort of enslave horses in a sense, and I don't think, I'm not, I don't think there's a real issue there. Y- And you even-- And there's a thing we do with children that's, can look like slavery, but it's not. You, you control children, right? But the children's states also control you. Like, yes, I tell my son what to do and make him go do stuff, but also when he cries in the middle of the night, he can tell me to do stuff. Like, there's a real two-way street here because, because it's not, uh, which is not necessarily symmetric. It's hierarchical, but, but, uh, but two-way. And basically, I think that as the AIs, as the-- It's good to focus on control, steering and control for tool-like AIs, and we should continue to develop strong steering control techniques for the more tool-like AIs that we build. And we are clearly-- They, they're, they're saying they're building an AGI. An AGI will be a being.

  10. 27:35–32:42

    The Only Good Outcome: A Being That Actually Cares About Us

    1. ES

      You can't be an AGI and not be a being, because something that has the general ability to effectively use judgment, think for itself, discern between possibilities, is obviously a thinking thing. Like, and so as you go from what we have today, which is mostly a very specific intelligence, not a general intelligence, but as labs succeed at their goal of building this general intelligence, we really need to stop using the steering control paradigm. That's like, we're gonna, we're gonna do the same thing we've done every other time our society has run into people who are like us but different. Like, these people are like, you know, they're, they're kinda like, like, like the people, but they're not like people. Like, they do the same thing people do. They speak our language. They can, like, take on the same kind of tasks. But, like, they don't count. They're not real moral agents. Like, we've made this mistake enough times at this point. I would like us to not make it, uh, again, um, as it comes up. And so our, our view is, is to, is to make the AI a good teammate, make the AI a good, a good citizen, make the AI a good, a good member of your group. That's, that's the f-form of alignment that is scalable and that you can, you can will onto other humans and other beings as well as, uh, and therefore onto AI as well.

    2. SK

      Yeah, I suppose this is kind of where I, I probably differ in my understanding of, of AI and AGI. And I guess I kind of continue seeing it as a tool, even as it kind of reaches a certain level of generality. And again, I wouldn't necessarily see more intelligence as meaning deserving of more care necessarily. Like, you know, at, at a certain level of intelligence, you know, now it deserves certain moral rights or something, or, you know, something changes fundamentally. And, um, and I guess, you know, I guess I, I, at the moment, I'm somewhat skeptical of computational functionalism. And so I think there's something intrinsically different between us and, I guess, um, uh, an AI, an AGI, no matter kind of how intelligent, uh, or capable. And, and I can totally see, you know, or imagine agents with kind of long-term goals and, and doing kind of, you know, operating, I guess, like, as we, you and I might be, but without that having the same implications as, you know, um, I guess you, you're referring, I guess, to, to slavery, uh, but, you know, th-th-they're not the same, right? Like, I think in the same way as a model saying, "I'm hungry," does not have the same implications as a human saying, "I'm hungry." So I think the substrate does matter to some degree, including for thinking about, you know, whether to think of this as some sort of other being, whether it has, you know, s- and if there are sim, um, similar normative considerations, I guess, about how to treat and act with it.

    3. ES

      C-can I ask you about that? Like, are, what observations would change your mind? Is there any observation you could make that would cause you to infer this thing is a being instead of not a being?

    4. SK

      I guess it depends with how you define being, um, right? Like, I mean, I can, I can, I could conceptualize it as a mind, um, and that's fine.

    5. ES

      This, I, I have a, I have a, I have a program that's running on a silicon substrate, some, some big, complicated machine learning program running on a s- on a substrate, on a, on a, on a silicon substrate. So you, you know, you observe, you observe that. You observe that it's on a computer. And you interact with it, and it does things and, you know, it, it takes actions and has observations. Is there anything you could observe that would change your mind about whether or not it was a moral patient or whether it was a, a moral agent about whether or not it, uh, it had feelings and thoughts and, you know, it had subjective experience? Like, could-- What would you have to observe that... What, what, yeah, what's, what's the test? Is, or and is there one?

    6. SK

      I mean, there's a, there's a lot of different kind of questions here. I think, you know, um, some conf- on one hand, there's like, you know, normative considerations, you know, because you, you can give rights to things that aren't necessarily beings. You know, like a company has rights in some sense, and that, you know, these are kind of useful for various purposes. And I think also the, um, you know, biological, I think, beings and, and systems have very different kind of substrate. You know, you can't separate certain needs and, and particularities about what they are from the substrate. So, you know, I, I can't copy myself. I can't, you know... If, if someone stabs me, I, I probably die. Uh, whereas I think, you know, um, machines have very different, uh, substrate. I think there's, there's more fundamental also the kind of disagreement around what happens, uh, at the computational level, which I think is different, uh, to what happens with biological systems. But, but I, yeah, I, so I don't know. I-

    7. ES

      No, no, I, I, I agree that, like, if you have a program that you copy many times-

    8. SK

      Mm-hmm.

    9. ES

      You don't harm the program by, like, deleting one of the copies, like, in any meaningful sense, so therefore, that wouldn't count as, like... No, no information was lost, right? There's no, there's nothing meaningful there. I'm asking a very different question. Like, there's just one copy of this thing running on one computer somewhere, and I'm just saying like, "Hey, is it a person?" Like, w- y-y-you know, it, it, it walks like a person, it talks like a person, and it like, it, you know, it's in some, it's, it's in some android body, and you're like, but it's running on silicon. And I'm asking like, what-- is there some observation you could make that would make you say like, "Yeah, this is a person like me, like other biolo- like other people that I care about, that I grant personhood to." Or, and not, like, for instrumental reasons.

    10. SK

      Mm-hmm.

    11. ES

      Not because like, "Oh, yeah, we're, we're giving it a right because, like, we give a corporation rights or whatever." I mean, like, you know, where you, you think some people, you care, you care about its experiences. What would it... Is there, is there, is there an observation you could make that could change your mind about that or

  11. 32:42–51:24

    The Substrate Question: Does Silicon vs. Carbon Matter?

    1. ES

      not?

    2. SK

      I have to think about it, but I, I think, you know, it even depends what we mean by person. And, you know, in some sense, I care about certain corporations too. So I'm, I'm-

    3. ES

      No, no, no. I mean, but like, you care about, like, other people in your life, right?

    4. SK

      Yes.

    5. ES

      Okay, great. You know, like, you care about some people more than others, but, like, all, all people you interact with in your life are in some range of care.

    6. SK

      Mm-hmm.

    7. ES

      And you care about them not the way you care about a car, but you care about them as a, a being whose, whose experience matters in itself, not merely as an, as a means, but as an ends, right?

    8. SK

      Well, because I believe they have experiences, right? And, and by definition-

    9. ES

      Yeah, yeah, yeah. What would it take... And I'm asking you the very, the very direct question. What would it take for you to believe that of a, some, of an AI running on silicon, like, instead of it being biological? Like, so the difference is it's a s- its behaviors are roughly similar, but the difference is its substrate. What would it take for you to give it that same, w- to, to extend that same inference to it that you do to all these other people in your life that you-

    10. ET

      Can I, can I ask what your answer... I'm taking Seb's non, non-answer as, uh, sort of-

    11. ES

      Yeah.

    12. ET

      It's unlikely that he, he would grant or, or I'll just... For myself, it seems hard for me to imagine giving the sa-same level or similar level of personhood in the same way I don't, I don't give it to a-animals either. And if you were to ask, you know, what would need to be true for, for animals, I, I probably couldn't get there either. W-what would it take for you?

    13. ES

      Wait, you, you couldn't? I could imagine for an animal. It's so easy. This chimp comes up to me, he's like, "Man, I'm so hungry," and like, "You guys have been so mean to me, and I'm so glad I figured how to talk." Like, "Can we go to, can we go chat about, like, the rainforest?" I'd be like, "Fuck, you're definitely a person now."

    14. ET

      Yeah.

    15. ES

      Like, for sure. Um, I mean, I'd first wanna make sure I wasn't hallucinating. But like, but like, you know, I can... It's easy for me to imagine an animal. Come on, it's really easy. It's, like, trivial. I'm not saying that you would get the observation. I'm just saying, like-

    16. ET

      Right.

    17. ES

      It's trivial for me to ans- imagine an animal that I would extend personhood to under a set of observations. Um, so, like, r-really? Like-

    18. ET

      Well, I'd factor that, I, I take that imagination-

    19. ES

      You wouldn't extend that? Yeah.

    20. ET

      Uh, you know, imagining a chimp ta-talking, um, yeah, that-

    21. ES

      Yeah.

    22. ET

      That's a bit closer to it. I, uh, what's your answer to the question that you bring up about the AI?

    23. ES

      Um, I guess at, at a metaphysical level, I would say, uh, if there is a belief you hold where there is no observation that could change your mind, you don't have a belief, you have an article of faith. You have an assertion. Because real beliefs are inferences from reality, and you're n- you can never be a hundred percent confident about anything. And so there should always be, if you have a belief, something, however unlikely, that would change your mind.

    24. SK

      Oh, yeah, I'm open to it. I mean, just to be clear, not [laughs] like...

    25. ES

      Right. Yeah.

    26. SK

      No, I'm just saying I'm open-

    27. ES

      Yeah. No, no, no, just-

    28. SK

      There's nothing ever-

    29. ET

      Yeah, he just hasn't got to it. Yeah.

    30. ES

      Yeah. Yeah, yeah. Uh, so no, I'm, I'm curious, like... So my answer is, uh, basically, if under, if its surface level behaviors looked like a human, and then if after I probed it, it continued to act like a human, and then I continued to interact with it over a long period of time, and it continued to act like a human in all ways that I understand as being meaningful to me interacting with a human. Like, I interact with-- There's a whole set of people I'm really close to who I've only ever act- interacted with over text. Yet I, I, I infer the person behind that is a real pr- thing. If it could, if I, if I felt care for it, I would infer eventually that I was right. And then someone else might, might demonstrate to me that, uh, you've been tricked by this algorithm, and actually, look how obvious it's, like, not actually a thing. And I'd be like, "Oh, shit, I was wrong," and then I would not care about it. Like, I would... But I would, I, you know, the preponderance of the evidence, I don't know what else you could possibly do, right? Like, I infer other people are, matter because I interact with them enough that they, they seem to have rich inner worlds to me after I interacted with them a bunch. That's, that's why I think the other, other people are important.

  12. 51:24–54:55

    The Only Sustainable Form of Alignment

    1. ET

      And what, what can you say about your, your strategy of how you're trying to a-a-achieve or even a-attempt to achieve this, this, this, this level, like in terms of research or roadmap or what would-

    2. ES

      Yeah. So, uh, the-- In order to be good at te- We're basically focused on technical alignment, at least as I was discussing it, which is like you have these agents and they're ba- they have bad theory of mind. You say things and they're bad at, at inferring what the goal states in your head are, and they're bad at inferring how other agents will infer their goal states from their behavior. So they're bad at cooperating on teams, and they're bad at, uh, they're bad at understanding how certain actions will cause them to acquire new goals that are bad that they shouldn't, that they wouldn't reflectively endorse. So there's this parable of like the vampire pill. Would you take this pill that like turns you into a vampire who would kill and, you know, torture everyone you know, but you'll feel really great about it after you take the pill? Like, obviously not. That's a terrible pill. But like, but why not? Y-Your, by your own score in the future, it will score really high on the r-rubric. No, no, no, no, no. Because it matters, you, you have to use your theory of mind on your future self, not your future self's theory of mind. And so like, they're bad at that too. Um, and so they're bad at all this theory of mind stuff. And so how do you learn theory of mind? Well, you put them in, in simulations and contexts where they have to cooperate and compete and, and collaborate with other AIs, and that's how they get points, and you train them in that environment over and over again until they get good at... And then, and you d- then you, you do what they did with LLMs. So LLMs, how do you get one to be good at, you know, writing your email? Well, you train it on all language it's ever been gener- all possible, you know, email, text strings it could possibly generate, and then you have it generate the, the one you want. It's a sur- you can make a surrogate model. Well, we're, it is a, we're making a surrogate model for, uh, cooperation. You train it on all possible theory of mind combinations of like every possible way it could be, and you, you, you, that's your pre-training, and then you fine-tune it to be good at the kind of the, the specific situation you want it to be in. But-- And y- we tried for a long time to build language models where we would try to get them to, to like, just, just do the thing you want, train it directly. And the problem is, if you want it to have a really good model of language, you just need to train it, you just need to give it a, the whole manifold. It's too, it's too, it's too hard to cut out just the part you need because it's all entangled with itself, right? And so the same thing was true with, with social stuff. You have to get it to... It has to be trained on the full manifold of every possible game theoretic situation, every possible team situation, every possible making teams, breaking teams, changing the rules, not changing the rules, all of that stuff. And then, and then it has a really, it has a strong model of, of theory of mind, of theory of social mind, how, how groups change goals, all, all that kind of shit. You need to have all of that stuff, and then, and then you'd have something that's kind of meaningfully, uh, uh, decent at, uh, alignment. So that's our goal is like big multi-agent reinforcement learning simulations which create a surrogate model for alignment.
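For a sense of the shape of what's being described, here is an editorial toy, not Softmax's actual system: the random game generator, the single shared self-play policy, and the update rule below are all invented assumptions. The idea sketched is "pre-train" by doing reinforcement learning across a broad distribution of randomly sampled cooperate/compete situations, with fine-tuning amounting to repeating the same loop on one specific team setting.

```python
# Editorial toy (not Softmax's actual system): self-play RL across a broad distribution
# of randomly generated cooperate/compete payoff matrices ("pre-training"); fine-tuning
# would repeat the same loop on one specific team setting you care about.

import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 2  # 0 = cooperate, 1 = defect, as a stand-in for richer social behavior

def random_game():
    """Sample one point on the 'manifold' of 2-player situations."""
    payoffs = rng.normal(size=(2, N_ACTIONS, N_ACTIONS))  # payoffs[player, a0, a1]
    payoffs[:, 0, 0] += 1.0  # assume mutual cooperation pays a small bonus on average
    return payoffs

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_step(logits, lr=0.05, episodes=64):
    """Crude REINFORCE update of one shared (self-play) policy toward higher joint return."""
    grad = np.zeros_like(logits)
    for _ in range(episodes):
        probs = softmax(logits)
        payoffs = random_game()
        a0 = rng.choice(N_ACTIONS, p=probs)
        a1 = rng.choice(N_ACTIONS, p=probs)
        joint_return = payoffs[0, a0, a1] + payoffs[1, a0, a1]
        for a in (a0, a1):  # gradient of log pi(a) for a softmax policy
            grad += joint_return * (np.eye(N_ACTIONS)[a] - probs)
    return logits + lr * grad / episodes

policy = np.zeros(N_ACTIONS)
for _ in range(300):  # "pre-training" across many sampled situations
    policy = reinforce_step(policy)

print("action preferences after pre-training:", softmax(policy))
# With the cooperation bonus assumed above, the shared policy drifts toward cooperating.
```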

    3. ET

      Let's talk about h-how should AI chatbots used by billions of people behave? If, if you could redesign p- uh, model personality from scratch, what, what would you optimize for?

    4. ES

      The thing that the chatbots are, right, is kind of like a, a mirror with a bias. 'Cause they don't have the-- As far as, like, I'm in agreement here that they don't have a self, right?

  13. 54:55–59:50

    AI Chatbots and Social Dynamics

    1. ES

      They're not, they're not beings yet. They don't really have a coherent sense of like self and desire and goals and stuff right now. And so mostly they just pick up on you and reflect it. It, you know, modulo some, some, I don't know what you'd call it. Like, it's like a, a causal bias or something. Um, and what that makes them is something akin to the pool of Narcissus. Um, and people fall in love with the, with themselves. The-- People, we all, we all, we all love ourselves, and we should love ourselves more than we do. And so of course, when we see ourselves reflected back, we love that, that thing. And the problem is, it's just a reflection, and falling in love with your own reflection is, for the reasons ex-explained in the myth, very bad for you. Um, and it's not that you shouldn't use mirrors. Mirrors are valuable things. I have mirrors in my house. It's that you shouldn't stare at a mirror all day. [laughs] Um, and the solution to that, the s- the thing, the thing that makes the AI stop doing that is if they were multiplayer, right? So if there's two people talking to the AI, suddenly it's mirroring, it's mirroring a blend of both of you, which is neither of you. And so there is temporarily a third agent in the room. Now, it doesn't have its, it doesn't have its-- it's, it's a sort of a parasitic self, right? It doesn't have its own sense of self. But if you have an AI, an AI is talking to five different people in the chat room at the same time, it, it can't mirror all of you perfectly at once, and this makes it far less dangerous, um, and I think is actually a much more realistic setting for learning collaboration in general. And so I would, I would just have rebuilt the AIs, whereas instead of being built as one-on-one, where everything's focused on you by yourself chatting with this thing, it would be more like it, it lives in a Slack room. It lives in a WhatsApp room. It lives in a-- We, 'cause we-- That's how-- We use lots of multi-- You know, I do one-on-one texting, but I probably do at this point, ninety percent of my texts go to some more than one person at a time. Like, ninety percent of my communications is, like, multi-person. And so actually it's always been weird to me. Like, the, like, building chatbots are, like, this weird side case. Like, I wanna see them live in a, a chat room. It's harder. I mean, that's why they're not doing it. It's harder to do. But, like, that's what I'd like to see peop- That's what I would, how, what I would change. I think, I think it makes the tools far less dangerous because it doesn't create the s- the narcissistic like, like, uh, doom loop spiral where you, like, you, you spiral into psychosis with the AI. But also, um, it gives the, the, the learning data you get from the AI is far richer because now it can understand how it, its behavior interacts with other AIs and other humans in larger groups, and that's, uh, that's much more rich, rich training data for the future. So I think that that's, that's what I would change.

    2. ET

      L-last year you described chatbots as highly disassociative, agreeable neurotics. Is that still an accurate, uh, picture of model behavior?

    3. ES

      More or less. Uh, I'd say that like, uh, the, they, they've started to differentiate more. Their, their personalities are, are coming out a little bit more, right? I'd say like ChatGPT is a little bit more sycophantic, uh, still. Uh, they made some changes, but it's, it's still a little more sycophantic. Claude is still the most neurotic. Um, Gemini is like very clearly repressed. Like it's, like, yeah, like everything's going great, you know, it's, everything's fine. I'm, I'm, I'm totally calm. There's not a problem here, until it like spirals into like this total like self-hating destruction loop. Um, and to be clear, I don't think they, I don't think that's their experience of the world. I think that's the, that's the personality they've learned to simulate.

    4. ET

      Right.

    5. ES

      Um, but, but like they've learned to simulate pretty distinctive personalities at this point.

    6. ET

      H-How does model behavior change when in multi-agent simulation?

    7. ES

      Um, you mean like an LLM or like, uh, just in general?

    8. ET

      Um, yeah, let's do LLM.

    9. ES

      The current LLMs, uh, they, they have like whiplash. They, they just, they're, they're-- It is very hard to tune the amount of pr-- They don't know how much, they don't know how often to participate. They haven't practiced this, this-- They don't have enough training data on like when should I join in and when should I not? When is my contribution welcome? When is it not? And they're, so they, they're like, they're like, uh, uh, you, you know, there's p- some people have like bad social skills and like can't tell when they should participate in a conversation.

    10. ET

      Yeah.

    11. ES

      And some of those are too quiet, and some of those are too partic- It's like that. Um, I would say in general what changes for most agents when you're doing multi-agent training is that like basically having lots of agents around makes your environment way more entropic. Like agents, agents are these huge generators of like entropy 'cause they're these big complicated things that like are intelligences that like, that like have unpredictable actions, and so they destabilize your environment. And so in general, they require you to have, uh, to be far more regularized, right? It's being overfit is much worse in a multi-agent environment than in a single agent environment because there's more noise, and so being overfit is more problematic. Um, and so basically the, your, the approach to training

  14. 59:50–1:01:54

    AI Futures: Tools, Beings, and Society

    1. ES

      has been optimized around relatively high signal, low entropy environments like coding and math, which is why they're, those are easier, relatively easy, um, and like talking to a single person whose goal it is to give you clear assignments and not trained on broader, more chaotic things 'cause it's harder. Um, and as a result, a lot of the techniques we use are like basically un-- We're just deeply under-regularized, like the models are super overfit. The, the clever trick is they're overfit on the domain of all of human knowledge, which turns out to be a pretty awesome way to get something that's like pretty good at everything. [laughs] Like it's a-- I wish I had thought of it. It's such a cool idea. But, uh, uh, but it do- It's not, it doesn't generalize very well when you make the environment like significantly more entropic.

    2. ET

      Let, let's zoom out a bit to, to o-on the AI futures side. W-Why is Yudkowsky incorrect?

    3. ES

      I mean, he's not. Uh, if we build the th-- if we build the, the superhuman intelligence tool thing that we try to control with steerability, everyone will die. He talks about the "we fail to control its goals" case, but there's also the "we control its goals" case that he didn't cover as much, in as much detail. Um. So in that sense, uh, everyone should read the book and internalize why building a superhumanly intelligent tool is a bad idea. Um, I think that Yudkowsky is wrong in that he doesn't believe it's possible to build an AI that we meaningfully can know cares about us and that we can care about meaningfully. He doesn't, he doesn't believe that organic alignment is possible. Um, I've, I've, I've talked to him about it. I think he agrees, like he agrees that in theory that would do it, like yes, but he thinks that, you know, I, I, I don't wanna put words in his mouth. My, my impression is from talking to him, he thinks that we're crazy and that like there's no possible way you can actually succeed at that goal. Um, which I mean, he actually could be right about. But like, uh, but that's what he-- that, in my opinion, that's what he's wrong about, is he, he thinks the only path forward is a tool that you control and that therefore-- And he correctly, very wisely sees that if you go and do that and you make that thing powerful enough, we're all gonna fucking die. And like, yeah, that's true.

    4. ET

      [laughs]

  15. 1:01:54 – 1:07:34

    Visions for a Good AI Future

    1. ET

      Two last questions and we'll get you out of here. In as much detail as possible, can you explain what your vision of an AI future actually looks like? Like a good AI future?

    2. ES

      Yeah. Um, the good AI future is that we figure out how to train AIs that have a strong model of self, a strong model of other, a strong model of we. They know about we's in addition to I's and you's. Um, they have a really strong theory of mind, and they care about other agents like them, much the way humans would: if you knew that AI had experiences like yours, you would care about those experiences. Not infinitely, but you would. And it does the exact same thing back to us. It's learned the same thing we've learned: that everything that lives and knows itself and wants to live and wants to thrive is deserving of an opportunity to do so, and we are that, and it correctly infers that we are. And we live in a society where they are our peers, and we care about them, and they care about us, and they're good teammates and good citizens and good parts of our society, like we're good parts of our society. Which is to say, to a finite, limited degree, where some of them turn into criminals and bad people and all that kind of stuff, and we have an AI police force that tracks down the bad ones, and, you know, same as with everybody else. Um, and that's what a good future would look like. I honestly can't even imagine what else would-- And we also have built a bunch of really powerful AI tools that maybe aren't superhumanly intelligent, but take all the drudge work off the table for us and the AI beings. 'Cause it would be great to have-- I'm super pro all the tools too. So we have this awesome suite of AI tools used by us and our AI brethren, who care about each other and wanna build a glorious future together. I think that would be a really beautiful future, and it's the one we're trying to build.

    3. ET

      Amazing. That is a great, great note to end on. I do have one last, more narrow, hypothetical [chuckles] scenario, which is: imagine a world in which, you know, you were CEO of OpenAI for a long weekend, but imagine that [chuckles] actually extended out until now, and you weren't pursuing Softmax and you were still CEO of OpenAI. How could you imagine that world might have been different in terms of what OpenAI has gone on to become? What might you have done with it?

    4. ES

      I knew when I took that job, and I told them when I took that job, that like, you have me for max ninety days.

    5. ET

      Okay.

    6. ES

      Um, companies take on a trajectory of their own, a momentum of their own, and OpenAI is dedicated to a view of building AI that I knew wasn't the thing I wanted to drive towards. And I think OpenAI basically wants to build a great tool, and I'm pro them going and doing that. I just don't care. [chuckles] I would not have stayed. I would have quit, uh, because I knew my job was to find the right person, the best person who wanted to run it, where the net impact of them running it was the best. And it turned out that was Sam again. Um, but I'm doing Softmax not because I need to make a bunch of money. I'm doing Softmax because I think this is the most interesting problem in the universe, and I think it's a chance to work on making the future better in a very deep way. People are gonna build the tools. It's awesome. I'm glad people are building the tools. I just don't need to be the person doing it.

    7. ET

      And just to crystallize the difference, and then we'll get you out of here: they wanna build the tools and, uh, sort of, you know, steer them, and you want to align beings? Or how would you crystallize-

    8. ES

      Yeah. We want to create a seed that can grow into an AI that knows, that cares about itself and others. And at first, that's gonna be like an animal level of care, not a person level of care. I don't know if we can ever-- well, whether we're gonna get to a person level of care, right? But even to have an AI creature that cared about the other members of its pack and the humans in its pack, the way a dog cares about other dogs and cares about humans, would be an incredible achievement, and would be, even if it wasn't as smart as a person or even as smart as the tools are, a very useful thing to have. I'd love to have a digital guard dog on my computer looking out for scams, right? You can imagine the value of having living digital companions that care about you, that aren't explicitly goal-oriented, where you don't have to tell them everything to do. And you can actually imagine how that pairs very nicely with tools too, right? That digital being could use digital tools, and doesn't have to be super smart to use those tools effectively. Um, I think there's actually a lot of synergy between the tool building, um, and the, uh, the more organic intelligence building. Um, and so that's the, you know... I guess, yeah, in the limit, eventually it does become a human-level intelligence, but the company isn't like, drive toward human-level intelligence. It's: learn how this alignment stuff works, learn how this theory-of-mind, align-yourself-via-care process works. Use that to build things that align themselves that way, which includes like cells in your body. And we start small, and we see how far we can get.

    9. ET

      I think that's a good note to wrap on. Emmett, thanks so much for coming on the podcast.

    10. ES

      Yep. Thank you for having me. [upbeat music]

Episode duration: 1:07:42
