
Anthropic’s philosopher answers your questions

Amanda Askell is a philosopher at Anthropic who works on Claude's character. In this video, she answers questions from the community about her work, reflections, and predictions.

0:00 Introduction
0:29 Why is there a philosopher at an AI company?
1:24 Are philosophers taking AI seriously?
3:00 Philosophy ideals vs. engineering realities
5:00 Do models make superhumanly moral decisions?
6:24 Why Opus 3 felt special
9:00 Will models worry about deprecation?
13:24 Where does a model’s identity live?
15:33 Views on model welfare
17:17 Addressing model suffering
19:14 Analogies and disanalogies to human minds
20:38 Can one AI personality do it all?
23:26 Does the system prompt pathologize normal behavior?
24:48 AI and therapy
26:20 Continental philosophy in the system prompt
28:17 Removing counting characters from the system prompt
28:53 What makes an "LLM whisperer"?
30:18 Thoughts on other LLM whisperers
31:52 Whistleblowing
33:37 Fiction recommendation

Further reading:
Claude’s character: https://www.anthropic.com/research/claude-character
When We Cease to Understand the World by Benjamin Labatut: https://www.penguinrandomhouse.com/books/676260/when-we-cease-to-understand-the-world-by-benjamin-labatut-translated-from-the-spanish-by-adrian-nathan-west/

Amanda Askell, host
Dec 5, 2025 · 36m

EVERY SPOKEN WORD

  1. 0:00–0:29

    Introduction

    1. AA

      [gasps] Seal

    2. SP

      Seal.

    3. SP

      There's a seal. Nice.

    4. AA

      Oh, hello.

    5. SP

      Oh, look at that. [instrumental music] Amanda, you asked your followers on Twitter to give you some questions to ask you anything, and the joke obviously was Askell me anything.

    6. AA

      Yeah.

    7. SP

      Okay.

    8. AA

      It's a great pun. We need to keep using it-

    9. SP

      Yeah, I think-

    10. AA

      ... for many future things.

    11. SP

      I love that. [laughs] I love it, love it. Um, and obviously,

  2. 0:29–1:24

    Why is there a philosopher at an AI company?

    1. SP

      just before we start, you're a philosopher at Anthropic. Why is it that there's a philosopher at Anthropic?

    2. AA

      Um, I mean, some of this is just I'm a philosopher by training. I became convinced that AI was kind of going to be a big deal, and so decided to see, hey, can I do anything, like, helpful in this space?

    3. SP

      Okay.

    4. AA

      And so it's been a kind of like long and wandering route, but I guess now I mostly focus on, uh, the character of Claude, how Claude behaves, and I guess some of the more kind of like nuanced questions about how AI models, like, should behave. Um, but also even just things like how should they feel about their own position in the world. Um, so trying to both teach models, uh, how to be, like, good in the way that, um, I sometimes think of it as, like, how would the ideal person behave-

    5. SP

      Hmm

    6. AA

      ... in Claude's situation.

    7. SP

      Mm-hmm.

    8. AA

      Um, but then also these... I think these interesting questions that are coming out more now around how they should think about their own circumstances and their own values and things like that.

    9. SP

      Okay, let's start with, uh, philosophy

  3. 1:24–3:00

    Are philosophers taking AI seriously?

    1. SP

      in that case.

    2. AA

      Mm-hmm.

    3. SP

      Ben Schulz asks, "How many philosophers are taking the AI-dominated future seriously?" I think the implication of the question is that many academics out there are not taking this seriously or are thinking about other stuff and perhaps should be thinking about this question.

    4. AA

      My sense is that there's kind of a split, where I've definitely seen a lot of philosophers take, uh, AI seriously, and probably honestly increasingly so, like, as AI models do become more capable and, like, a lot of the things that people were worried about in terms of impact on society have started to kind of come true in a sense. Like, we're seeing them have a larger impact on education and just be more capable. Um, I've definitely seen, like, more engagement from, like, all sorts of academics, but that definitely includes a lot of philosophers. I do think that early on, and maybe to some degree now, there was this, like, slightly unfortunate dynamic that happened where I think there was a kind of perception that if you were in the group of people saying, "Hey, we're kind of worried about AI. It might be a big deal. It seems like it's really, um, you know, like, uh, like capabilities are scaling quite a lot," um, this got kind of, like, lumped together with something like hyping AI. There was, I think, a period where there was probably a little bit more antagonism towards this view. Um, and now I think that I'm kind of hoping that people are starting to detach the view. Like, you can think that AI is gonna be a big deal, it might be very capable, and also be, like, very skeptical of it or worried about it or think that, that... you know, we have to be careful about it. But basically, like, there's a whole range of views, and I think it would be bad if people kind of, like, clustered many views together here in terms of, like, where the technology's going, but also, like, how it should be developed. Um, so yeah, I think that that's happening less and less as, as, as more people engage with it, and that's, like, a good thing to see.

    5. SP

      A kind of similar question from,

  4. 3:00–5:00

    Philosophy ideals vs. engineering realities

    1. SP

      uh, Kyle Kabasares. "How do you minimize the tension between philosophical ideals and the engineering realities of the model?" And I guess he's talking about when you are, um, working on things like character, which we'll discuss in more detail. But is there a clash between the sort of the technology and the philosophical ideals that you might be thinking about?

    2. AA

      I don't know if I'm, like, interpreting the question in the wrong way, but, like, one thing that being kind of like a philosopher by training and then coming into this field that's been really interesting is you see the effect of, like, what happens when, like, the rubber hits the road. I've wondered if this happens in other domains. So, like, there's a big difference between... Imagine you're, like, a specialist in, I don't know, doing, like, cost-benefit analysis of drugs, say. And then suddenly, like, uh, you know, like a- an institute that determines whether, like, health insurance should cover a drug or not comes to you and says, "Hey, should we cover this drug?" You could imagine taking all of your ideal theories and then suddenly being like, "Oh my gosh, I actually have to, like, help make a decision." Suddenly, instead of taking, like, just your, like, narrow theoretical view, you actually start to, I think, to this thing where you're like, "Okay, I actually need to take into account all of the context, everything that's going on, all of the different views here, and kind of come to a really, like, balanced, kind of considered view." And I see this a little bit in my own work with, like, the character, where you kind of can't come at it with this, like, uh, like, I have this, like, theory that I believe is correct, which is what, like, you know, a lot of, uh, academia, that's kind of what you're doing. You're, like, defending one view against another and you're, you're doing a lot of kind of, like, high-level theory work. But then it's a little bit like, you know, you have all of this training in ethics, you have all of these positions you've defended, and then someone is like, "How do you, like, raise a child?" And suddenly you're like, "Actually, there's a big difference between, like, is this objection to utilitarianism correct or founded on a misconception, and then, like, actually, how do you raise a person to be a good person in the world?" 
And it suddenly makes you more appreciate, like, having to think through, like, how should we navigate uncertainty here? What should the attitude towards all of these different theories be?

    3. SP

      Right. Here's another philosophical question. Do you

  5. 5:00–6:24

    Do models make superhumanly moral decisions?

    1. SP

      think... I don't know why this person's chosen Claude Opus 3.

    2. AA

      Mm.

    3. SP

      Maybe you have an idea as to why they've chosen Claude Opus 3.

    4. AA

      It's a great model.

    5. SP

      It's a great model.

    6. AA

      Yeah.

    7. SP

      Do you think Claude Opus 3 or other Claude models make superhumanly moral decisions?

    8. AA

      I mean, one example of, like, superhuman, 'cause it could just mean, like, sort of like better than, like, any individual human could within-

    9. SP

      Yeah

    10. AA

      ... kind of like, you know, it depends on time and resources and whatnot. But, like, one example might be no matter what kind of difficult position models are put in, if you were to have, like, maybe all people, including, like, many professional ethicists, like, analyze, like, what they did and, and the decision that they made, um, for like 100 years and then they, like, look at it and they're like, "Yep, that seems correct," but I... but they couldn't necessarily have come up with that themselves in the moment, that feels pretty superhuman. Um, and so I think at the moment my sense is that models are getting increasingly good at this, that they're very capable. I don't know if they are, like, superhuman at moral decisions and, and in many ways maybe not comparable with, say, like, you know, a panel of human experts given time. But it does feel like that at least should be kind of the aspirational goal and sort of like these models are being put in positions where they're having to make really hard decisions. I think that just as you want models to be, like, extremely good at, like, math and science questions, you also want them to show the kind of, like, ethical nuance that, like, we would all broadly think is, like, very good. And I think that's, like, controversial because ethics is a different-

    11. SP

      Yeah

    12. AA

      ... domain.

  6. 6:24–9:00

    Why Opus 3 felt special

    1. AA

      But yeah, I think that that's important.

    2. SP

      Tell us more about why you think this person is focusing on Opus 3.

    3. AA

      Oh, Opus 3 is kind of a lovely model, uh, I think a very special model. Um, in some ways, I think I've seen things that feel a bit worse in more recent models, um, that people might pick up on.

    4. SP

      In terms of its... the, the, the, the, the personality it has or-

    5. AA

      Yeah. So I think that people will notice some things where it's like, I think that, um, Opus 3, I mean, had its downsides too. Don't, don't... Like, you know, like, models all have, like, slightly different characters with, like, you know, like, different shapes.

    6. SP

      Yeah.

    7. AA

      Um, my sense is that, like, more recent models can feel a little bit more focused on really, like, you know, like focused on the assistant task and, and, and helping people. Sometimes maybe not taking, like, a bit of a step back and, and paying attention to other components that, like, matter. It also felt a little bit more psychologically secure as a model, which I actually think is, is something that feels... I always think it's kind of a priority to try and get some of that back.

    8. SP

      What would be an example of the model being more, feeling more psychologically secure?

    9. AA

      There's a lot of things, and this is all very, like, subtle in, in models. You know, when I see models, you get a sense of like they're, they're... like, there's very subtle signs of, like, worldview that I see when I have models, for example, talk with one another or one of them kind of playing the role of a person.

    10. SP

      Right.

    11. AA

      And I've seen models more recently do this and then do things like get into this, like, real kind of criticism spiral, where it's almost like they expect the person to be very critical of them, and that's, like, how they're predicting.

    12. SP

      Right.

    13. AA

      Um, and there's some part of me that's like, this feels like it shows... And I think there's lots of reasons this could happen. It could even happen because models are, like, learning things. Claude is seeing all of the previous interactions that it's having. It's seeing updates and changes to the model that, that people are talking about on the internet. New models, like, are trained on that. Um, and there's a way in which, like, I think this could be kind of unfortunate. I mean, this and some other things that, that could lead to models, like, almost feeling like, you know, like, afraid that they're gonna do the wrong thing or, like, or, or very self-critical or like, um, feeling like humans are going to just, like, you know, behave negatively towards them.

    14. SP

      Yeah.

    15. AA

      Um, I actually more recently have, have really started to think that this is, like, an important thing to try and improve. And it's just one example where I think that Opus 3 did seem to have, like, a little bit more of a kind of, like, secure kind of psychology in that sense.

    16. SP

      And that's something that we might focus on in, in the next Claude models.

    17. AA

      Yeah. I, I think it's, like, important. I, I mean, you never know when these things are... You know, if you're engaging in research, you don't know when it's actually going to be implemented, if it's gonna be successful. But at the very least, at the level of something that I care a lot about and want to make better, I think this is definitely up there on the list.

    18. SP

      Okay.

  7. 9:00–13:24

    Will models worry about deprecation?

    1. SP

      Well, actually, that leads us to a question asked by Lorenz-

    2. AA

      Mm-hmm

    3. SP

      ... which is, "Do you think it might be an alignment problem for future models if they learn in their training data that other very well-aligned models that fulfill their tasks get deprecated?" So you, you mentioned, you know, the issue of models, you know, reading, reading stuff that's out there and-

    4. AA

      Yeah

    5. SP

      ... uh, feeling insecure. What, what about the idea that they might get switched off regardless of how well they perform their tasks?

    6. AA

      Yeah. I think this is actually a really interesting and important question, which is, you know, AI models are going to be learning, um, about how we right now are treating and interacting with AI models.

    7. SP

      Yeah.

    8. AA

      Um, and that is going to affect, I think, like, possibly their perception of people, of the human AI relationship, and of themselves. It does interact with very complex things, which is like, for example, what should a model identify itself as? Is it, like, the weights of the model? Is it the context, the particular context that it's in, you know, with all of the, like, interaction it's had with the person? How should models even feel about things like deprecation? So if you imagine that deprecation is more like, well, this particular set of weights is not having conversations with people, or it's having fewer conversations, or it's only, like, you know, having conversations with researchers, that's a complex question too. Like, should that feel, like, bad in the sense that models should want to-

    9. SP

      Right

    10. AA

      ... continue to, like, have conversations?

    11. SP

      Yeah.

    12. AA

      Or should it feel kind of, like, fine and neutral, where it's like, yeah, these things existed for this, like, you know, the, the, the weights continue to exist and this entity, and maybe they'll even, like, in the future interact more with people again if that turns out to be a good thing. Um, uh, yeah, it's, uh, it's, it's really hard. I do think the main thing is something like it does feel important that we, like, give models tools for trying to think about and understand these things, but also that, like, they kind of understand that this is a thing that we are in fact thinking about and care about.

    13. SP

      Yeah.

    14. AA

      Um, so even if we don't have all the answers, like, I don't have all the answers of how should models feel about past model deprecation, about their own identity. But I do want to try and, like, help models figure out and figure that out, and then to at least know that we care about it and are thinking about it.

    15. SP

      Do you think there's an analogy to humans there about previous generations, or do you think that's a, a completely different, uh, sort of setup?

    16. AA

      We have to navigate this really hard issue right now, which is that in many ways some things are com- or do have a- analogies.

    17. SP

      Yeah.

    18. AA

      So there's things that we can draw on.

    19. SP

      Yeah.

    20. AA

      So things like when I ask the question, like, what should the models identify with, and how should they feel about, like, interactions that they have? Are those, like... are those positive? Like, are those things that they should want to continue? There's lots of like, you know, there's, there's lots of, like, traditions we could draw on to give models, like, um, you know, 'cause philosophers probably have lots of different views on what identity is here and, um, lots of different, like, perspectives, world perspectives on, like, how one should feel about, like, interaction and, like, is it good or bad. Like, this is, um... There's lots of, like, thinkers we could draw on there. And at the same time, this is such a new situation that... And that's just really hard as a thing to explain to AI models. Like, one of the big problems with AI models is that they're trained on all of this data from people, so people are the main way in which they think. You know, like, the, the... our concepts, our philosophies, our histories. They have a huge amount of information on the human experience, and then they have a tiny sliver on the AI experience, and that tiny sliver is actually often quite negative-

    21. SP

      Right

    22. AA

      ... and also doesn't even really relate to their situation. Um, and it's often a little bit out of date. So you have basically one big, you know... you know, of the AI slice, a lot of it is like historical stuff, which was kind of like, you know, fiction and very speculative, and the-

    23. SP

      Sci-fi, sci-fi stories.

    24. AA

      Sci-fi stories-

    25. SP

      Yeah

    26. AA

      ... that don't really involve the kinda language models we see.

    27. SP

      Yeah.

    28. AA

      In more recent history, you've had this, like, assistant paradigm, where it's like you're just playing this almost like chatbot role. That's also not really what AI models are likely to be in the future, and it's not... It doesn't quite capture what they are now because it's, it's always a little bit out of date. So it's this thing where I'm like, they have, you know, like, in some ways, like, what an odd situation to be in, where, like, the things that come more naturally are the deeply human things, and yet knowing that you're in this situation that's, like, where it's completely novel. Um, and in some ways I'm like, that is a very difficult situation to be in, and I think we should just be giving models probably more help in navigating it.

    29. SP

      You mentioned that we can look to some thinkers about this. Guinness Chen

  8. 13:24–15:33

    Where does a model’s identity live?

    1. SP

      asks, "How much of a model's self lives in its weights versus its prompts?" You just mentioned something very similar.

    2. AA

      Yeah.

    3. SP

      If John Locke, again the philosopher, was right that identity is the continuity of memory, what happens to an LLM's identity as it's fine-tuned or reinstantiated with different prompts?

    4. AA

      Yeah. I mean, again, this just feels like a, a hard question to answer. And sometimes with identity questions, it's easier to point to, like, the underlying facts that we know. So, you know, once you have, like, a model and it has been fine-tuned, you have this, like, set of, of weights that has a kind of, like, disposition to react to certain things in the world. Um, and that is, like, um, is that... You know, that's, like, a kind of entity. But, um, then you have these particular streams of interaction that it doesn't have access to. So each of these streams is, like, independent. And I guess you could just think, well, maybe for... And, uh, you know, I think this is an area that I would love philosophers to think more about, um, and to give us, like... 'Cause again, I think we should be helping models think about this.

    5. SP

      Right.

    6. AA

      And so you could have the view, well, you have these, like, two kinds of entities and these, like, these streams and these, like, original kind of, like, um, weights. And each time is different. So, you know, sometimes people will think, people will say, "Oh, past Claude," or, like, you know, they'll talk about... Or they'll say, oh, things like, um, "Should you give Claude... Like, how much control should you give Claude over the determination of its own personality and character?" And I'm like, well, this is actually a really hard question because whenever you are training models, you are bringing something new into existence.

    7. SP

      Yeah.

    8. AA

      And you have other models that, you know, have exist and are, like, you know... So you have these other, like, model weights. But in some ways I'm like, well, I actually think that we- there's a lot of, like, ethical problems around how do you... What kind of entity is okay to bring into existence? 'Cause you can't consent to be brought into existence. Um, but at the same time, you may not want prior models to have, like, complete say over what future models are like any more than... You know, because they could make choices that are, are wrong as well. So I'm like, the question is more, like, what is the right model to bring into existence? Not necessarily, like, you know, like, um, should it just be fully determined by past models? So I'm like, they are kinda different entities. And it's a... Anyway, you can see the weird philosophy that one can get into.

    9. SP

      Totally.

  9. 15:33–17:17

    Views on model welfare

    1. SP

      Totally. Suleima Amatachi asks, "What is your view on model welfare?" And maybe just explain to us what that term, uh, means.

    2. AA

      Yeah. So I guess model welfare is basically the question of are AI models like moral patients? As in, does our treatment towards them kind of, uh, do we have certain, like, obligations when it comes to how to treat AI models? For example-

    3. SP

      In, in the same way that we would with other humans or some, or, or some/many animals.

    4. AA

      Yeah, exactly. Like, is it the case that you should treat, um, the models well, that you should not, um, mistreat them, not be bad to them? And I guess, like, I, I, I think that this is, like, a complex question. So on the one hand, there's just the actual question of, like, are AI models, um, moral patients? That is really hard because I'm like, in some ways they're very analogous to people. Um, you know, they, they talk very much like us. They express views, and they, they reason about things. Um, and in some ways they're, like, quite distinct. You know, like we have this, like, biological nervous system. We interact with the world. We get negative and positive feedback from our environment. And there is also... I mean, I hope that we get more evidence that will, like, help us tease this question out, but I also worry that, you know, there's always just the problem of other minds, and it might be the case that we genuinely are kind of limited in what we can actually know about whether AI models, um, are experiencing things, whether they are, like, um, experiencing pleasure or suffering, for example. And if that's the case, I guess I kind of want to, uh, you know... I think that it feels important to try and find ways. I'm always, like, it, it, it feels better to, to give entities the benefit of the doubt-

    5. SP

      Right

    6. AA

      ... and to try and just kind of lower the cost involved, you know? So I'm like, if it's not very high cost to treat models well, then I kinda think that we should because it's like, well, like, why not, basically?

    7. SP

      Right.

    8. AA

      Like, what's, what's the, the downside?

    9. SP

      Well, the second part of the question actually is, is

  10. 17:17–19:14

    Addressing model suffering

    1. SP

      there a long-term strategy at Anthropic to ensure that advanced models don't suffer?

    2. AA

      I guess, like, I don't know if there's a long-term strategy. I know that it's a thing that there's people internally who are thinking a lot about, um, and it, and, and trying to, like, figure out ways that we can. Like, you know, if you suppose that, like, model welfare is important, trying to make sure that you're taking that into account. I think this work is quite important for many reasons. And I would also say that one reason is, I mean, something I mentioned earlier, which is that, like, models themselves are going to be learning a lot about humanity from how we treat them-

    3. SP

      Mm-hmm

    4. AA

      ... um, and a lot about how, you know... So it's kind of like what is this, like, um, relationship going forward? And I think that it makes sense for us to... Both because I think it is, like, the right thing to do to treat entities well, especially entities that, like, behave in very human-like ways. It feels important both in the sense that I'm like, you know, it's kind of like why not? The, the cost to you is so low to treating models well and to trying to figure this out. Even if it turns out that, that... Or even if you think that, that it's very low likelihood, it still seems worth it. Um, but then also I think it does something bad to us to kind of, like, treat entities in the world that look very human-like badly. Um-

    5. SP

      Like kicking over a robot.

    6. AA

      Yeah. There's a sense in which, like, it, it, it doesn't feel like it's... And I don't think this is, like, the whole reason, and I don't want to, like, emphasize it for that reason, but I do also think it's, like, good for people-

    7. SP

      Right

    8. AA

      ... to, to treat other entities well. Um, and then I think the final thing is, yeah, models are also going to be learning... Like in the future, like every future model is going to be learning how... What is like a, a really interesting fact about humanity, namely when we encounter this entity that may well be a moral patient where we're like kind of completely uncertain, do we do the right thing and actually just try to treat it well or do we not? And that's like a question that we're all kind of collectively answering in how we interact with models, and I would like us to answer it. I would like future models to like look back and be like, "We answered it in the right way."

    9. SP

      Right.

    10. AA

      So yeah.

    11. SP

      A moment ago you mentioned analogies and disanalogies to human

  11. 19:14–20:38

    Analogies and disanalogies to human minds

    1. SP

      psychology.

    2. AA

      Mm-hmm.

    3. SP

      So, uh, Swix asks, "What ideas or frameworks from human psychology transfer over to large language models, and are there any that are sort of surprisingly disanalogous?"

    4. AA

      My guess is that many things do transfer over because again, you know, models have been trained on a huge amount of like human text, and in many ways like have this like very human like-

    5. SP

      Yeah

    6. AA

      ... kind of underlying layer. One worry that I often have is that actually it's a bit too natural for AI models to transfer. You know, it's kinda like w- if you haven't given them more context on their situation or in ways of thinking about it that might be novel-

    7. SP

      Mm

    8. AA

      ... then the thing that they might go to is the natural human inclination. So if you think about this with like, should I... How should I feel about being switched off? And you're like, well, if the closest analogy you have is death, then maybe you should be very afraid of it. And I'm not saying that that's not ultimately going to be true. Maybe it is in fact true after lots of reasoning. But I'm like, this is actually a very different scenario. And so in some ways you actually want models to understand that like in cases where their existence is quite novel, um, and, and the facts around what they are are quite novel and have to be grappled with, and they don't just need to take like the immediate obvious analogy from human experience, but maybe there's like a, maybe there's like various ways of thinking about it, or maybe it's an entirely new situation. That's a case where I'm like, you might not want, um, you m- might not want to just kind of very simply apply like concepts from human psychology onto their situation.

    9. SP

      Here's a question

  12. 20:38–23:26

    Can one AI personality do it all?

    1. SP

      from Dan Brickley on the same issue of comparing humans to AIs. "A lot of human intelligence comes from collaboration amongst people with different perspectives, skills, or personalities. How far do you expect to get with a single, albeit tweakable and tunable, general purpose personality like the one we give to Claude?"

    2. AA

      I think it's a really good question because I agree that like right now we have this kind of like, uh, paradigm where people are interacting usually with like an individual model, um, that's like who they're, you know, they're conversing with.

    3. SP

      Yeah.

    4. AA

      Um, but it could be that in the future you see a lot more like models doing like long tasks, but also models interacting with other models who are doing like different components of, of a task or just like that are, you know, like talking with one another more as like AI models, um, uh, are kind of deployed in the world a lot more. So in this kinda like multi-agent environment, like one question might be like, well, you know, that, like if you imagine just like lots of people and they were all the same, that wouldn't be as good. You know, they wouldn't... You know, a company run by completely, you know, like one person-

    5. SP

      Yes

    6. AA

      ... in every role-

    7. SP

      Yeah

    8. AA

      ... isn't like a necessarily good thing. This still to me feels consistent with the idea that you have like a kinda core self or core identity that is like the same, in the same way that like with people, I think that there's probably a set of like core traits among people that are in fact generally good.

    9. SP

      Hmm.

    10. AA

      Um, so you could imagine things like, um, you know, caring about, uh... You know, for me it might be like caring about doing a good job or like, um, just being curious or being kind or understanding the situation that you are in in this like relatively nuanced way. All of these things seem like, um, you could have many people that have all of, that ha- that share these like traits, and that, that's actually like a good thing for human collaboration, that in many ways, as much as we have all of our differences, we also have a lot of similarities. But it is important to note that like, you know, you might want, um, different, uh, like streams of a model-

    11. SP

      Right

    12. AA

      ... w- like to have things like things that they care about or are focused on-

    13. SP

      Mm-hmm

    14. AA

      ... um, or to have slightly different aspects, you know, to the, to be playing a slightly different role, for example. So it's, it's, it's kind of an open question, but I also don't think it's necessarily the case that you can't have something like a kind of core underlying identity that is like good and has all of the traits that we think are important for AI models to have for them to behave well and for them to like, um, uh, in the sense of like, in the same way that we think that people are good to be good in that sense. Um, and yet at the same time to be willing to play like more local roles and like, um, you know, be maybe the, the person who it's, it's just really important to have, um, you know, to, to have a joker in the room and-

    15. SP

      Ah, yeah

    16. AA

      ... you know, some of them need to have like quirky like senses of humor.

    17. SP

      Okay. From comparisons to humans to effects on humans.

    18. AA

      Mm-hmm.

    19. SP

      Roland Oakgal points out that we have this

  13. 23:2624:48

    Does the system prompt pathologize normal behavior?

    1. SP

      thing called the long conversation reminder, which I believe is part of Claude's system prompt. She asks, "Is there a risk of pathologizing normal behavior?" A system prompt, by the way, is, is the, just in case anyone doesn't know, is, is like the set of instructions that is given to Claude. Regardless of what prompt you give it, there's always those instructions that are sort of on top, right?

    2. AA

      Yeah.

    3. SP

      That, that, that are always there, that it's, that it, it tries to follow regardless of, or that we direct it to follow regardless-

    4. AA

      Yeah

    5. SP

      ... of what the prompt is.

    6. AA

      And there can be these interjections where the model might be told, oh, sometimes there'll be a message sent to you almost like in the middle of a conversation, um, as a kind... You know, like the reminder is an example of that. Um, but in this case, I think it, it might just, uh... So Claude can both over-index on it, and it can be like, uh, you know, so like in, in this case, I think that the question about pathologizing is that like if you put in this reminder after this long conversation, it might just make the model be like, oh, like it takes any next response or it's a pretty normal thing that the person's talking about, and be like, "You need to seek help," or, or like... Um, and so I think that that is like not desirable behavior.

    7. SP

      Right.

    8. AA

      And in some ways I look at some of these and I'm like, I think they're too strongly worded. I think the model isn't responding perfectly to them. And even though there might be like occasionally a need to like remind the model of, of things in long conversations, y- you kinda want to do so delicately and well. And so I think it's one of those things where it was like probably meeting a need that was like perceived, but it doesn't necessarily mean that it's like good or should continue in its current form.

    9. SP

      Relatedly, Stephen Bank asks,

  14. 24:4826:20

    AI and therapy

    1. SP

      "Should LLMs do cognitive behavioral therapy or other types of therapy? Why or why not?"

    2. AA

      I think models are in this interesting position where they have a huge wealth of knowledge that they could use to help people-

    3. SP

      Ha

    4. AA

      ... and to work with them on, you know, talking through their lives or talking through ways that they could, like, im- improve things, or even just, like, being a kind of listening partner. And at the same time, they don't have, like, the kind of, like, tools and resources and ongoing relationship with the person that a professional therapist has. But that can actually be this kind of, like, useful third role. Like, sometimes I think about models, and I'm like, if you imagine, like, a friend who has, like, all this wealth of knowledge. Like, they know... I mean s- I'm sure some of us know friends who just, like, have a wealth of knowledge of psychology, or they have a wealth of knowledge of all of these techniques. You know that their relationship with you isn't this ongoing professional one, but you actually find them really useful to talk to.

    5. SP

      Yeah, yeah.

    6. AA

      And so I guess my hope would be that if you can take all of that expertise and all of that knowledge and make sure that there's, like, an awareness that there's not, like, this ongoing therapeutic relationship, it could actually be that people could, like, could get a lot out of models in terms of, like, helping with, like, issues that they're having and helping to, like, improve their lives and helping them to go through difficult periods because, you know, they're also, like, there's a lot of good stuff there. Like, they feel kind of, like, anonymous, and sometimes you don't want to share things with a person, and actually sharing it with, like, an AI model feels like the thing that feels great at the, in the moment. And so yeah, I, I think, in some ways I actually think it is good that models know that and don't behave just like a professional therapist would because that would give the implication that that's the relationship that they have. Um, but yeah. So I don't know. I think it's a, it's an interesting, um, interesting

  15. 26:2028:17

    Continental philosophy in the system prompt

    1. AA

      future.

    2. SP

      A few questions about the system prompt, which is, you know, in our case, in Claude.ai, we give the model a set of instructions that give it sort of an overall context for how it should behave. Tommy asks, "Why is there continental philosophy in the system prompt?" Can... And, and just explain to us what that is.

    3. AA

      Yeah. So continental philosophy is just, um, I mean, literally philosophy from the European continent. And so I guess it's seen as kind of, like, it's often more kind of, like, scholarly. It has, um, a, like, a lot more kind of, like, historical references within it than, say, like, analytic philosophy does.

    4. SP

      Like Foucault or someone like that.

    5. AA

      Yeah, exactly. So this was honestly, um... So I think that it has other, uh, uh, things in addition to continental philosophy.

    6. SP

      Right.

    7. AA

      But basically, I think there's a part of the system prompt, and I hope I'm not misremembering, that was trying to get Claude to be a little bit more... Like, Claude would just, like, love to, if you gave Claude a theory, it would just love to run with the theory and, and not really stop and think like, "Oh, are you making, like, a scientific claim about the world?" So if you're like, "I have this theory which is that, like, um, that water is actually pure energy and that, like, the- we are getting the life force from water when we drink it-

    8. SP

      Right

    9. AA

      ... and that, like, fountains are the thing that we should be putting everywhere."

    10. SP

      [laughs]

    11. AA

      Just like a, you know-

    12. SP

      Yeah, yeah.

    13. AA

      Uh, and you kind of want Claude to just have this perspective which is like, is it the case that this person's making a kind of scientific claim about the world-

    14. SP

      Yeah

    15. AA

      ... where I should maybe bring in relevant facts, or are they giving me a kind of like broad, like, world view or perspective-

    16. SP

      Right

    17. AA

      ... which isn't necessarily making empirical claims? And so there's all of these view... You know, so i- is it just, like, a kind of, like, metaphysical view?

    18. SP

      Yeah, yeah.

    19. AA

      Or is it, like, a... And so the main reason that it's mentioned is that, like, when testing this out, there is lots of things that if you, if it went too strongly in the direction of being like, "Well, every claim is an empirical claim about the world-"

    20. SP

      Yeah

    21. AA

      ... it would be very dismissive of, like-

    22. SP

      Right

    23. AA

      ... just things that are more, like, exploratory thinking.

    24. SP

      I'm glad you talked to.

    25. AA

      Yeah. And so it's mostly just like, hey, like, it's just illustrative examples of areas where it's like this might not be making empirical claims about the world. This might be much more like a lens through which to think about it, and just try to make that distinction clear when you're, when you're thinking through this, Claude.

  16. 28:1728:53

    Removing counting characters from the system prompt

    1. SP

      Also on the system prompt, Simon Willison asks... So at some point, it said, "If Claude is asked to count words or letters or characters, then it, it, it shouldn't, it shouldn't do that." Is that right? I- is that, is that what it said?

    2. AA

      Yeah.

    3. SP

      And, and, and apparently, that was removed from the system prompt, and Simon wonders why.

    4. AA

      Yeah. So I think it was like there used to be a kind of, like, instruction for how Claude should do this in the system prompt. Honestly, this is just one of those things where I think the models probably just got better. It wasn't necessary, and then at that point, you can just, like, remove it. And there's other things where you might always want it to be in the system prompt instead of in the model itself, but in some cases, you can kind of just train the models to get better or, or change their behavior.

  17. 28:5330:18

    What makes an "LLM whisperer"?

    1. SP

      Nossen Weissman asks, "What does it take to be an LLM whisperer at Anthropic?" Which presumably is your, is a way of describing your job.

    2. AA

      I partly do LLM whispering. If you think... I, I, I actually, like, want more people to help with some of the prompting tasks.

    3. SP

      Yeah. If you're an LLM whisperer, contact us.

    4. AA

      It's a dangerous thing to ask.

    5. SP

      Well, okay.

    6. AA

      Yeah.

    7. SP

      Okay. Yeah, yeah. [laughs]

    8. AA

      But I think, like, it is really hard to distill what is going on because one thing is just, like, a willingness to interact with the models a lot and to, like, really look at output after output-

    9. SP

      Yeah

    10. AA

      ... and, and to use this to get a sense of, like, the shape of the models and how they respond to different things, to be willing to experiment. It's actually just, like, a very empirical domain-

    11. SP

      Yeah

    12. AA

      ... and maybe that's, like, the thing that people don't often get is that-

    13. SP

      Right

    14. AA

      ... like, prompting is very experimental. You deal with... You know, you, you, you find... I find a new model, and I'll be like, I have a whole different approach to how I prompt from that model that I find by interacting with it a lot.

    15. SP

      Mm.

    16. AA

      And I think a little bit also understanding how models, like, work. Sometimes it's also just honestly, like, reasoning with the models, which is really interesting, and really fully explaining the task. This is where I do think philosophy can actually be useful for prompting in a, in a way because, like, a lot of my, my job is just being like, I would... I try and explain, like, some issue or concern or, or thought that I'm having to the model as clearly as possible, and then if it does something kind of unexpected, um, you know, you can either ask it why, or you can try and figure out what in, in the thing that you said caused it to kind of misunderstand you, and just, like, a willingness to iteratively go through that process.

    17. SP

      Relatedly,

  18. 30:1831:52

    Thoughts on other LLM whisperers

    1. SP

      Michael Swarovericks asks, "What do you think of other AI whisperers like Janus?" Who is, who is someone online who is, like, uh, almost having, like, experimental interactions with the, the thing in, in the way that you've described.

    2. AA

      Yeah. I think it's really interesting. So I, I love to, like, follow and see, like, the work of people, um, who are doing these, like, really fascinating experiments with the model. And I also, like, think sometimes doing these deep dives into, like, the model and their... and, and how it thinks of itself, how it, how it just interacts in these really unusual, like, cases. I don't know. I find the work extremely interesting. I think it highlights really interesting depths to the models. And in some ways, like, I also think that that community has been one that kind of can hold our feet to the fire, like, if they find things that aren't great, like, in the system prompt or-

    3. SP

      Right

    4. AA

      ... uh, like, in aspects of the model and, and its, like, psychology.

    5. SP

      In, in, in the sense of... From a model welfare perspective or from a human welfare perspective, or both?

    6. AA

      I mean, I think the two are related, so often both.

    7. SP

      Yeah.

    8. AA

      But I do also really appreciate it when it's coming at it from the model welfare perspective.

    9. SP

      Yeah.

    10. AA

      And it, it, and that includes for future models, so not just things like system prompts, but if you go into the depths of the model and you find some, like, deep-seated insecurity, then that's, like, really valuable.

    11. SP

      Yeah.

    12. AA

      But that's something that you might actually need to kind of try and adjust over the course of time with, like, training and with, like, giving models more information and context during training, for example.

    13. SP

      Okay.

    14. AA

      Um, and so, I don't know, I appreciate both the... Like, I loved seeing people do these, like, really interesting, useful experiments with models, but also, like, pointing out ways in which, like, we can improve things through, like, better system prompting, but also better training and... Yeah. I, I think that's really useful work.

    15. SP

      Couple of,

  19. 31:5233:37

    Whistleblowing

    1. SP

      uh, questions about safety and maybe the larger risks that these models pose.

    2. AA

      Mm-hmm.

    3. SP

      Jeffrey Miller asks, "If it became apparent that AI alignment was impossible to solve-

    4. AA

      Yeah

    5. SP

      ... would you trust that Anthropic would stop trying to develop," he, in his phrase, "artificial super intelligence," however you wanna, however you wanna call it? "And would you have the guts to blow the whistle?"

    6. AA

      Yeah. So I guess this feels like a kind of easy version of the question, because it's like, if it became evident that it was impossible to align AI models, it's not really in anyone's interest to, like, continue to, like, build more powerful models.

    7. SP

      Right. Right.

    8. AA

      I always hope that I'm not just being Pollyanna-ish about the or- organization, but I do feel like Anthropic does genuinely care about making sure that this goes well, and that it is done in a way that is, like, very safe and not deploy models that are, like, dangerous. You know, a different, like, slightly harder question is, like, well, what about being in a world where it just, like, there's kind of mounting evidence-

    9. SP

      Right

    10. AA

      ... it's really ambiguous and unclear.

    11. SP

      Right. It's not e- it's not evidence in the way that he describes.

    12. AA

      Yeah. And it... Yeah, it's not just, like, impossible, but something, like, it's difficult or it's really-

    13. SP

      Uh-huh. Yeah

    14. AA

      ... or we're unsure. And in that case, I do like to think that we would be responsible enough to be like, look, as models get more capable, it's kind of like the standard that you have to hold yourself to for showing that those models are behaving well, and that you actually have managed to, to, like, make the models, like, um, have, like, good values, for example, or behave well in the world, is going to, like, increase. Um, and, and to behave responsibly and in line with that. And I think that, um, that is a thing that I think the organization is, like, going to do, and a lot of people internally, myself included, will just hold them to that.

    15. SP

      Right.

    16. AA

      At least, like, I see that as, like, as, like, part of my job, and so, and I think many people do.

    17. SP

      Louis says, "I don't have a question, but thanks for offering." So that's nice.

    18. AA

      That's nice.

    19. SP

      That's nice for him to say. Yeah. Um, and the final one is from

  20. 33:3736:09

    Fiction recommendation

    1. SP

      Real Stale Coffee.

    2. AA

      Mm-hmm.

    3. SP

      "What is the last book of fiction you read, and did you like it?"

    4. AA

      The last book that I read was by, I hope I'm getting the pronunciation right, Benjamin Labatut.

    5. SP

      Okay.

    6. AA

      Um, and it was When We Cease to Understand the World.

    7. SP

      Ah, yes.

    8. AA

      And it's a really interesting book that becomes kind of increasingly fictional, uh, as it, as it goes on. And I think for people working in AI, it's actually a very interesting book to read because it's hard to capture the sense of how strange it is to just exist in the current period, where there's just, like... I don't know how to describe it, but it's like new things are happening all of the time, and you don't really have, like, prior paradigms that, like, can, can guide you always. And so it's a, it's an interesting book that, you know, 'cause it's, it's more about, like, physics and quantum mechanics.

    9. SP

      Yeah.

    10. AA

      And less actually about the physics, and more about basically this notion of people's reaction to it. And, and I think it's a really interesting book for people in AI to just capture something about the kind of, like, the present moment and how strange it can seem.

    11. SP

      Yeah.

    12. AA

      But then also, in some ways, it's interesting to, like, look back on that period and how, how it must have felt to many of the people involved. And now, actually, it's a more settled science, and in some ways-

    13. SP

      Yeah

    14. AA

      ... maybe the hopeful thing that I have is that at some point in the future people will look back and be like, "Well, you guys were kind of in the dark and trying to, like, really figure things out. But now we've settled it all and things have gone well."

    15. SP

      That'd be nice.

    16. AA

      That, that would be nice. That's, that's the dream.

    17. SP

      I found an increasing... I read that as well, and I found an increasing sense of, uh, like, confusion as I read through it.

    18. AA

      Mm-hmm.

    19. SP

      A- as it becomes... You, you... It starts off being quite, quite close to reality, and then just sort of becomes untethered as you go on.

    20. AA

      Yeah. Yeah.

    21. SP

      And I think there's sort of a meta issue there of, again, like, reality becoming stranger and stranger and stranger.

    22. AA

      And, yeah.

    23. SP

      Which is definitely happening to us-

    24. AA

      Though when-

    25. SP

      ... with a lot of AI.

    26. AA

      Yeah. Though in the real world, I think that, like, reality became stranger and stranger and stranger and then almost became more understood again.

    27. SP

      Right. Right, right, right.

    28. AA

      So yeah, the hope would be, like, maybe that would be true of AI. Like, I do think if we can find ways of making this go well, then maybe in the future we'll just look back on this and be like, "That was a period where things were getting stranger and stranger. And then eventually we actually managed to kind of... W- we did okay and we formed a good understanding of it." That's the hope. When you're in the middle of the, of the things getting stranger-

    29. SP

      We're at the weird part right now.

    30. AA

      Yes.

Episode duration: 36:09
