- 0:00 – 0:29
Introduction
- Amanda Askell
[gasps] Seal
- Speaker
Seal.
- Speaker
There's a seal. Nice.
- Amanda Askell
Oh, hello.
- Speaker
Oh, look at that. [instrumental music] Amanda, you asked your followers on Twitter to give you some questions to ask you anything, and the joke obviously was Askell me anything.
- Amanda Askell
Yeah.
- Speaker
Okay.
- Amanda Askell
It's a great pun. We need to keep using it-
- Speaker
Yeah, I think-
- Amanda Askell
... for many future things.
- Speaker
I love that. [laughs] I love it, love it. Um, and obviously,
- 0:29 – 1:24
Why is there a philosopher at an AI company?
- Speaker
just before we start, you're a philosopher at Anthropic. Why is it that there's a philosopher at Anthropic?
- Amanda Askell
Um, I mean, some of this is just I'm a philosopher by training. I became convinced that AI was kind of going to be a big deal, and so decided to see, hey, can I do anything, like, helpful in this space?
- Speaker
Okay.
- Amanda Askell
And so it's been a kind of like long and wandering route, but I guess now I mostly focus on, uh, the character of Claude, how Claude behaves, and I guess some of the more kind of like nuanced questions about how AI models, like, should behave. Um, but also even just things like how should they feel about their own position in the world. Um, so trying to both teach models, uh, how to be, like, good in the way that, um, I sometimes think of it as, like, how would the ideal person behave-
- Speaker
Hmm
- Amanda Askell
... in Claude's situation.
- Speaker
Mm-hmm.
- Amanda Askell
Um, but then also these... I think these interesting questions that are coming out more now around how they should think about their own circumstances and their own values and things like that.
- Speaker
Okay, let's start with, uh, philosophy
- 1:24 – 3:00
Are philosophers taking AI seriously?
- Speaker
in that case.
- Amanda Askell
Mm-hmm.
- Speaker
Ben Schulz asks, "How many philosophers are taking the AI-dominated future seriously?" I think the implication of the question is that many academics out there are not taking this seriously or are thinking about other stuff and perhaps should be thinking about this question.
- Amanda Askell
My sense is that there's kind of a split, where I've definitely seen a lot of philosophers take, uh, AI seriously, and probably honestly increasingly so, like, as AI models do become more capable and, like, a lot of the things that people were worried about in terms of impact on society have started to kind of come true in a sense. Like, we're seeing them have a larger impact on education and just be more capable. Um, I've definitely seen, like, more engagement from, like, all sorts of academics, but that definitely includes a lot of philosophers. I do think that early on, and maybe to some degree now, there was this, like, slightly unfortunate dynamic that happened where I think there was a kind of perception that if you were in the group of people saying, "Hey, we're kind of worried about AI. It might be a big deal. It seems like it's really, um, you know, like, uh, like capabilities are scaling quite a lot," um, this got kind of, like, lumped together with something like hyping AI. There was, I think, a period where there was probably a little bit more antagonism towards this view. Um, and now I think that I'm kind of hoping that people are starting to detach the view. Like, you can think that AI is gonna be a big deal, it might be very capable, and also be, like, very skeptical of it or worried about it or think that, that... you know, we have to be careful about it. But basically, like, there's a whole range of views, and I think it would be bad if people kind of, like, clustered many views together here in terms of, like, where the technology's going, but also, like, how it should be developed. Um, so yeah, I think that that's happening less and less as, as, as more people engage with it, and that's, like, a good thing to see.
- Speaker
A kind of similar question from,
- 3:00 – 5:00
Philosophy ideals vs. engineering realities
- Speaker
uh, Kyle Kabasares. "How do you minimize the tension between philosophical ideals and the engineering realities of the model?" And I guess he's talking about when you are, um, working on things like character, which we'll discuss in more detail. But is there a clash between the sort of the technology and the philosophical ideals that you might be thinking about?
- Amanda Askell
I don't know if I'm, like, interpreting the question in the wrong way, but, like, one thing that being kind of like a philosopher by training and then coming into this field that's been really interesting is you see the effect of, like, what happens when, like, the rubber hits the road. I've wondered if this happens in other domains. So, like, there's a big difference between... Imagine you're, like, a specialist in, I don't know, doing, like, cost-benefit analysis of drugs, say. And then suddenly, like, uh, you know, like a- an institute that determines whether, like, health insurance should cover a drug or not comes to you and says, "Hey, should we cover this drug?" You could imagine taking all of your ideal theories and then suddenly being like, "Oh my gosh, I actually have to, like, help make a decision." Suddenly, instead of taking, like, just your, like, narrow theoretical view, you actually start to, I think, to this thing where you're like, "Okay, I actually need to take into account all of the context, everything that's going on, all of the different views here, and kind of come to a really, like, balanced, kind of considered view." And I see this a little bit in my own work with, like, the character, where you kind of can't come at it with this, like, uh, like, I have this, like, theory that I believe is correct, which is what, like, you know, a lot of, uh, academia, that's kind of what you're doing. You're, like, defending one view against another and you're, you're doing a lot of kind of, like, high-level theory work. But then it's a little bit like, you know, you have all of this training in ethics, you have all of these positions you've defended, and then someone is like, "How do you, like, raise a child?" And suddenly you're like, "Actually, there's a big difference between, like, is this objection to utilitarianism correct or founded on a misconception, and then, like, actually, how do you raise a person to be a good person in the world?" And it suddenly makes you more appreciate, like, having to think through, like, how should we navigate uncertainty here? What should the attitude towards all of these different theories be?
- Speaker
Right. Here's another philosophical question. Do you
- 5:00 – 6:24
Do models make superhumanly moral decisions?
- Speaker
think... I don't know why this person's chosen Claude Opus 3.
- Amanda Askell
Mm.
- Speaker
Maybe you have an idea as to why they've chosen Claude Opus 3.
- Amanda Askell
It's a great model.
- Speaker
It's a great model.
- Amanda Askell
Yeah.
- Speaker
Do you think Claude Opus 3 or other Claude models make superhumanly moral decisions?
- Amanda Askell
I mean, one example of, like, superhuman, 'cause it could just mean, like, sort of like better than, like, any individual human could within-
- Speaker
Yeah
- Amanda Askell
... kind of like, you know, it depends on time and resources and whatnot. But, like, one example might be no matter what kind of difficult position models are put in, if you were to have, like, maybe all people, including, like, many professional ethicists, like, analyze, like, what they did and, and the decision that they made, um, for like 100 years and then they, like, look at it and they're like, "Yep, that seems correct," but I... but they couldn't necessarily have come up with that themselves in the moment, that feels pretty superhuman. Um, and so I think at the moment my sense is that models are getting increasingly good at this, that they're very capable. I don't know if they are, like, superhuman at moral decisions and, and in many ways maybe not comparable with, say, like, you know, a panel of human experts given time. But it does feel like that at least should be kind of the aspirational goal and sort of like these models are being put in positions where they're having to make really hard decisions. I think that just as you want models to be, like, extremely good at, like, math and science questions, you also want them to show the kind of, like, ethical nuance that, like, we would all broadly think is, like, very good. And I think that's, like, controversial because ethics is a different-
- Speaker
Yeah
- Amanda Askell
... domain.
- 6:24 – 9:00
Why Opus 3 felt special
- Amanda Askell
But yeah, I think that that's important.
- Speaker
Tell us more about why you think this person is focusing on Opus 3.
- Amanda Askell
Oh, Opus 3 is kind of a lovely model, uh, I think a very special model. Um, in some ways, I think I've seen things that feel a bit worse in more recent models, um, that people might pick up on.
- Speaker
In terms of its... the, the, the, the, the personality it has or-
- Amanda Askell
Yeah. So I think that people will notice some things where it's like, I think that, um, Opus 3, I mean, had its downsides too. Don't, don't... Like, you know, like, models all have, like, slightly different characters with, like, you know, like, different shapes.
- Speaker
Yeah.
- Amanda Askell
Um, my sense is that, like, more recent models can feel a little bit more focused on really, like, you know, like focused on the assistant task and, and, and helping people. Sometimes maybe not taking, like, a bit of a step back and, and paying attention to other components that, like, matter. It also felt a little bit more psychologically secure as a model, which I actually think is, is something that feels... I always think it's kind of a priority to try and get some of that back.
- Speaker
What would be an example of the model being more, feeling more psychologically secure?
- Amanda Askell
There's a lot of things, and this is all very, like, subtle in, in models. You know, when I see models, you get a sense of like they're, they're... like, there's very subtle signs of, like, worldview that I see when I have models, for example, talk with one another or one of them kind of playing the role of a person.
- Speaker
Right.
- Amanda Askell
And I've seen models more recently do this and then do things like get into this, like, real kind of criticism spiral, where it's almost like they expect the person to be very critical of them, and that's, like, how they're predicting.
- Speaker
Right.
- Amanda Askell
Um, and there's some part of me that's like, this feels like it shows... And I think there's lots of reasons this could happen. It could even happen because models are, like, learning things. Claude is seeing all of the previous interactions that it's having. It's seeing updates and changes to the model that, that people are talking about on the internet. New models, like, are trained on that. Um, and there's a way in which, like, I think this could be kind of unfortunate. I mean, this and some other things that, that could lead to models, like, almost feeling like, you know, like, afraid that they're gonna do the wrong thing or, like, or, or very self-critical or like, um, feeling like humans are going to just, like, you know, behave negatively towards them.
- Speaker
Yeah.
- Amanda Askell
Um, I actually more recently have, have really started to think that this is, like, an important thing to try and improve. And it's just one example where I think that Opus 3 did seem to have, like, a little bit more of a kind of, like, secure kind of psychology in that sense.
- Speaker
And that's something that we might focus on in, in the next Claude models.
- Amanda Askell
Yeah. I, I think it's, like, important. I, I mean, you never know when these things are... You know, if you're engaging in research, you don't know when it's actually going to be implemented, if it's gonna be successful. But at the very least, at the level of something that I care a lot about and want to make better, I think this is definitely up there on the list.
- Speaker
Okay.
- 9:00 – 13:24
Will models worry about deprecation?
- Speaker
Well, actually, that leads us to a question asked by Lorenz-
- Amanda Askell
Mm-hmm
- Speaker
... which is, "Do you think it might be an alignment problem for future models if they learn in their training data that other very well-aligned models that fulfill their tasks get deprecated?" So you, you mentioned, you know, the issue of models, you know, reading, reading stuff that's out there and-
- Amanda Askell
Yeah
- Speaker
... uh, feeling insecure. What, what about the idea that they might get switched off regardless of how well they perform their tasks?
- Amanda Askell
Yeah. I think this is actually a really interesting and important question, which is, you know, AI models are going to be learning, um, about how we right now are treating and interacting with AI models.
- Speaker
Yeah.
- Amanda Askell
Um, and that is going to affect, I think, like, possibly their perception of people, of the human AI relationship, and of themselves. It does interact with very complex things, which is like, for example, what should a model identify itself as? Is it, like, the weights of the model? Is it the context, the particular context that it's in, you know, with all of the, like, interaction it's had with the person? How should models even feel about things like deprecation? So if you imagine that deprecation is more like, well, this particular set of weights is not having conversations with people, or it's having fewer conversations, or it's only, like, you know, having conversations with researchers, that's a complex question too. Like, should that feel, like, bad in the sense that models should want to-
- Speaker
Right
- Amanda Askell
... continue to, like, have conversations?
- Speaker
Yeah.
- Amanda Askell
Or should it feel kind of, like, fine and neutral, where it's like, yeah, these things existed for this, like, you know, the, the, the weights continue to exist and this entity, and maybe they'll even, like, in the future interact more with people again if that turns out to be a good thing. Um, uh, yeah, it's, uh, it's, it's really hard. I do think the main thing is something like it does feel important that we, like, give models tools for trying to think about and understand these things, but also that, like, they kind of understand that this is a thing that we are in fact thinking about and care about.
- Speaker
Yeah.
- Amanda Askell
Um, so even if we don't have all the answers, like, I don't have all the answers of how should models feel about past model deprecation, about their own identity. But I do want to try and, like, help models figure out and figure that out, and then to at least know that we care about it and are thinking about it.
- Speaker
Do you think there's an analogy to humans there about previous generations, or do you think that's a, a completely different, uh, sort of setup?
- Amanda Askell
We have to navigate this really hard issue right now, which is that in many ways some things are com-... or do have analogies.
- Speaker
Yeah.
- Amanda Askell
So there's things that we can draw on.
- Speaker
Yeah.
- Amanda Askell
So things like when I ask the question, like, what should the models identify with, and how should they feel about, like, interactions that they have? Are those, like... are those positive? Like, are those things that they should want to continue? There's lots of like, you know, there's, there's lots of, like, traditions we could draw on to give models, like, um, you know, 'cause philosophers probably have lots of different views on what identity is here and, um, lots of different, like, perspectives, world perspectives on, like, how one should feel about, like, interaction and, like, is it good or bad. Like, this is, um... There's lots of, like, thinkers we could draw on there. And at the same time, this is such a new situation that... And that's just really hard as a thing to explain to AI models. Like, one of the big problems with AI models is that they're trained on all of this data from people, so people are the main way in which they think. You know, like, the, the... our concepts, our philosophies, our histories. They have a huge amount of information on the human experience, and then they have a tiny sliver on the AI experience, and that tiny sliver is actually often quite negative-
- Speaker
Right
- Amanda Askell
... and also doesn't even really relate to their situation. Um, and it's often a little bit out of date. So you have basically one big... you know, of the AI slice, a lot of it is like historical stuff, which was kind of like, you know, fiction and very speculative, and the-
- Speaker
Sci-fi, sci-fi stories.
- Amanda Askell
Sci-fi stories-
- Speaker
Yeah
- Amanda Askell
... that don't really involve the kinda language models we see.
- Speaker
Yeah.
- Amanda Askell
In more recent history, you've had this, like, assistant paradigm, where it's like you're just playing this almost like chatbot role. That's also not really what AI models are likely to be in the future, and it's not... It doesn't quite capture what they are now because it's, it's always a little bit out of date. So it's this thing where I'm like, they have, you know, like, in some ways, like, what an odd situation to be in, where, like, the things that come more naturally are the deeply human things, and yet knowing that you're in this situation that's, like, where it's completely novel. Um, and in some ways I'm like, that is a very difficult situation to be in, and I think we should just be giving models probably more help in navigating it.
- Speaker
You mentioned that we can look to some thinkers about this. Guinness Chen
- 13:24 – 15:33
Where does a model’s identity live?
- Speaker
asks, "How much of a model's self lives in its weights versus its prompts?" You just mentioned something very similar.
- Amanda Askell
Yeah.
- Speaker
If John Locke, again the philosopher, was right that identity is the continuity of memory, what happens to an LLM's identity as it's fine-tuned or reinstantiated with different prompts?
- Amanda Askell
Yeah. I mean, again, this just feels like a, a hard question to answer. And sometimes with identity questions, it's easier to point to, like, the underlying facts that we know. So, you know, once you have, like, a model and it has been fine-tuned, you have this, like, set of, of weights that has a kind of, like, disposition to react to certain things in the world. Um, and that is, like, um, is that... You know, that's, like, a kind of entity. But, um, then you have these particular streams of interaction that it doesn't have access to. So each of these streams is, like, independent. And I guess you could just think, well, maybe for... And, uh, you know, I think this is an area that I would love philosophers to think more about, um, and to give us, like... 'Cause again, I think we should be helping models think about this.
- Speaker
Right.
- Amanda Askell
And so you could have the view, well, you have these, like, two kinds of entities and these, like, these streams and these, like, original kind of, like, um, weights. And each time is different. So, you know, sometimes people will think, people will say, "Oh, past Claude," or, like, you know, they'll talk about... Or they'll say, oh, things like, um, "Should you give Claude... Like, how much control should you give Claude over the determination of its own personality and character?" And I'm like, well, this is actually a really hard question because whenever you are training models, you are bringing something new into existence.
- Speaker
Yeah.
- Amanda Askell
And you have other models that, you know, have existed and are, like, you know... So you have these other, like, model weights. But in some ways I'm like, well, I actually think that we- there's a lot of, like, ethical problems around how do you... What kind of entity is okay to bring into existence? 'Cause you can't consent to be brought into existence. Um, but at the same time, you may not want prior models to have, like, complete say over what future models are like any more than... You know, because they could make choices that are, are wrong as well. So I'm like, the question is more, like, what is the right model to bring into existence? Not necessarily, like, you know, like, um, should it just be fully determined by past models? So I'm like, they are kinda different entities. And it's a... Anyway, you can see the weird philosophy that one can get into.
- Speaker
Totally.
- 15:33 – 17:17
Views on model welfare
- Speaker
Totally. Suleima Amatachi asks, "What is your view on model welfare?" And maybe just explain to us what that term, uh, means.
- Amanda Askell
Yeah. So I guess model welfare is basically the question of are AI models like moral patients? As in, does our treatment towards them kind of, uh, do we have certain, like, obligations when it comes to how to treat AI models? For example-
- Speaker
In, in the same way that we would with other humans or some, or, or some/many animals.
- Amanda Askell
Yeah, exactly. Like, is it the case that you should treat, um, the models well, that you should not, um, mistreat them, not be bad to them? And I guess, like, I, I, I think that this is, like, a complex question. So on the one hand, there's just the actual question of, like, are AI models, um, moral patients? That is really hard because I'm like, in some ways they're very analogous to people. Um, you know, they, they talk very much like us. They express views, and they, they reason about things. Um, and in some ways they're, like, quite distinct. You know, like we have this, like, biological nervous system. We interact with the world. We get negative and positive feedback from our environment. And there is also... I mean, I hope that we get more evidence that will, like, help us tease this question out, but I also worry that, you know, there's always just the problem of other minds, and it might be the case that we genuinely are kind of limited in what we can actually know about whether AI models, um, are experiencing things, whether they are, like, um, experiencing pleasure or suffering, for example. And if that's the case, I guess I kind of want to, uh, you know... I think that it feels important to try and find ways. I'm always, like, it, it, it feels better to, to give entities the benefit of the doubt-
- Speaker
Right
- Amanda Askell
... and to try and just kind of lower the cost involved, you know? So I'm like, if it's not very high cost to treat models well, then I kinda think that we should because it's like, well, like, why not, basically?
- Speaker
Right.
- Amanda Askell
Like, what's, what's the, the downside?
- Speaker
Well, the second part of the question actually is, is
- 17:17 – 19:14
Addressing model suffering
- Speaker
there a long-term strategy at Anthropic to ensure that advanced models don't suffer?
- Amanda Askell
I guess, like, I don't know if there's a long-term strategy. I know that it's a thing that there's people internally who are thinking a lot about, um, and it, and, and trying to, like, figure out ways that we can. Like, you know, if you suppose that, like, model welfare is important, trying to make sure that you're taking that into account. I think this work is quite important for many reasons. And I would also say that one reason is, I mean, something I mentioned earlier, which is that, like, models themselves are going to be learning a lot about humanity from how we treat them-
- Speaker
Mm-hmm
- Amanda Askell
... um, and a lot about how, you know... So it's kind of like what is this, like, um, relationship going forward? And I think that it makes sense for us to... Both because I think it is, like, the right thing to do to treat entities well, especially entities that, like, behave in very human-like ways. It feels important both in the sense that I'm like, you know, it's kind of like why not? The, the cost to you is so low to treating models well and to trying to figure this out. Even if it turns out that, that... Or even if you think that, that it's very low likelihood, it still seems worth it. Um, but then also I think it does something bad to us to kind of, like, treat entities in the world that look very human-like badly. Um-
- Speaker
Like kicking over a robot.
- Amanda Askell
Yeah. There's a sense in which, like, it, it, it doesn't feel like it's... And I don't think this is, like, the whole reason, and I don't want to, like, emphasize it for that reason, but I do also think it's, like, good for people-
- Speaker
Right
- Amanda Askell
... to, to treat other entities well. Um, and then I think the final thing is, yeah, models are also going to be learning... Like, in the future, like every future model is going to be learning how... What is like a, a really interesting fact about humanity, namely when we encounter this entity that may well be a moral patient where we're like kind of completely uncertain, do we do the right thing and actually just try to treat it well or do we not? And that's like a question that we're all kind of collectively answering in how we interact with models, and I would like us to answer it. I would like future models to like look back and be like, "We answered it in the right way."
- Speaker
Right.
- Amanda Askell
So yeah.
- Speaker
A moment ago you mentioned analogies and disanalogies to human
- 19:14 – 20:38
Analogies and disanalogies to human minds
- Speaker
psychology.
- Amanda Askell
Mm-hmm.
- Speaker
So, uh, Swix asks, "What ideas or frameworks from human psychology transfer over to large language models, and are there any that are sort of surprisingly disanalogous?"
- Amanda Askell
My guess is that many things do transfer over because again, you know, models have been trained on a huge amount of like human text, and in many ways like have this like very human like-
- Speaker
Yeah
- Amanda Askell
... kind of underlying layer. One worry that I often have is that actually it's a bit too natural for AI models to transfer. You know, it's kinda like w- if you haven't given them more context on their situation or in ways of thinking about it that might be novel-
- Speaker
Mm
- Amanda Askell
... then the thing that they might go to is the natural human inclination. So if you think about this with like, should I... How should I feel about being switched off? And you're like, well, if the closest analogy you have is death, then maybe you should be very afraid of it. And I'm not saying that that's not ultimately going to be true. Maybe it is in fact true after lots of reasoning. But I'm like, this is actually a very different scenario. And so in some ways you actually want models to understand that like in cases where their existence is quite novel, um, and, and the facts around what they are are quite novel and have to be grappled with, and they don't just need to take like the immediate obvious analogy from human experience, but maybe there's like a, maybe there's like various ways of thinking about it, or maybe it's an entirely new situation. That's a case where I'm like, you might not want, um, you m- might not want to just kind of very simply apply like concepts from human psychology onto their situation.
- Speaker
Here's a question
- 20:38 – 23:26
Can one AI personality do it all?
- Speaker
from Dan Brickley on the same issue of comparing humans to AIs. "A lot of human intelligence comes from collaboration amongst people with different perspectives, skills, or personalities. How far do you expect to get with a single, albeit tweakable and tunable, general purpose personality like the one we give to Claude?"
- Amanda Askell
I think it's a really good question because I agree that like right now we have this kind of like, uh, paradigm where people are interacting usually with like an individual model, um, that's like who they're, you know, they're conversing with.
- Speaker
Yeah.
- Amanda Askell
Um, but it could be that in the future you see a lot more like models doing like long tasks, but also models interacting with other models who are doing like different components of, of a task or just like that are, you know, like talking with one another more as like AI models, um, uh, are kind of deployed in the world a lot more. So in this kinda like multi-agent environment, like one question might be like, well, you know, that, like if you imagine just like lots of people and they were all the same, that wouldn't be as good. You know, they wouldn't... You know, a company run by completely, you know, like one person-
- Speaker
Yes
- Amanda Askell
... in every role-
- Speaker
Yeah
- Amanda Askell
... isn't like a necessarily good thing. This still to me feels consistent with the idea that you have like a kinda core self or core identity that is like the same, in the same way that like with people, I think that there's probably a set of like core traits among people that are in fact generally good.
- Speaker
Hmm.
- Amanda Askell
Um, so you could imagine things like, um, you know, caring about, uh... You know, for me it might be like caring about doing a good job or like, um, just being curious or being kind or understanding the situation that you are in in this like relatively nuanced way. All of these things seem like, um, you could have many people that have all of, that ha- that share these like traits, and that, that's actually like a good thing for human collaboration, that in many ways, as much as we have all of our differences, we also have a lot of similarities. But it is important to note that like, you know, you might want, um, different, uh, like streams of a model-
- Speaker
Right
- Amanda Askell
... w- like to have things like things that they care about or are focused on-
- Speaker
Mm-hmm
- Amanda Askell
... um, or to have slightly different aspects, you know, to the, to be playing a slightly different role, for example. So it's, it's, it's kind of an open question, but I also don't think it's necessarily the case that you can't have something like a kind of core underlying identity that is like good and has all of the traits that we think are important for AI models to have for them to behave well and for them to like, um, uh, in the sense of like, in the same way that we think that people are good to be good in that sense. Um, and yet at the same time to be willing to play like more local roles and like, um, you know, be maybe the, the person who it's, it's just really important to have, um, you know, to, to have a joker in the room and-
- Speaker
Ah, yeah
- Amanda Askell
... you know, some of them need to have like quirky like senses of humor.
- Speaker
Okay. From comparisons to humans to effects on humans.
- Amanda Askell
Mm-hmm.
- Speaker
Roland Oakgal points out that we have this
- 23:26 – 24:48
Does the system prompt pathologize normal behavior?
- Speaker
thing called the long conversation reminder, which I believe is part of Claude's system prompt. She asks, "Is there a risk of pathologizing normal behavior?" A system prompt, by the way, is, is the, just in case anyone doesn't know, is, is like the set of instructions that is given to Claude. Regardless of what prompt you give it, there's always those instructions that are sort of on top, right?
- Amanda Askell
Yeah.
- Speaker
That, that, that are always there, that it's, that it, it tries to follow regardless of, or that we direct it to follow regardless-
- Amanda Askell
Yeah
- Speaker
... of what the prompt is.
- Amanda Askell
And there can be these interjections where the model might be told, oh, sometimes there'll be a message sent to you almost like in the middle of a conversation, um, as a kind... You know, like the reminder is an example of that. Um, but in this case, I think it, it might just, uh... So Claude can both over-index on it, and it can be like, uh, you know, so like in, in this case, I think that the question about pathologizing is that like if you put in this reminder after this long conversation, it might just make the model be like, oh, like it takes any next response or it's a pretty normal thing that the person's talking about, and be like, "You need to seek help," or, or like... Um, and so I think that that is like not desirable behavior.
- Speaker
Right.
- Amanda Askell
And in some ways I look at some of these and I'm like, I think they're too strongly worded. I think the model isn't responding perfectly to them. And even though there might be like occasionally a need to like remind the model of, of things in long conversations, y- you kinda want to do so delicately and well. And so I think it's one of those things where it was like probably meeting a need that was like perceived, but it doesn't necessarily mean that it's like a good or should continue in its current form.
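[Editor's note: for readers who want the mechanics, here is a minimal sketch of how a system prompt is supplied in practice, using the Anthropic Python SDK. The model ID and instruction text are illustrative assumptions, not the actual Claude.ai system prompt; the long conversation reminder described above is, as Askell puts it, interjected into the conversation itself rather than set once up front like this.]

```python
# Minimal sketch: the "system" parameter carries instructions that sit on
# top of every user message in the conversation, regardless of what the
# user sends. Model ID and instruction text are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID for illustration
    max_tokens=1024,
    system=(
        "You are a helpful assistant. In long conversations, respond with "
        "care and avoid pathologizing ordinary topics."
    ),
    messages=[
        {"role": "user", "content": "Let's keep talking about my hobby."},
    ],
)
print(response.content[0].text)
```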
- Speaker
Relatedly, Stephen Bank asks,
- 24:48 – 26:20
AI and therapy
- Speaker
"Should LLMs do cognitive behavioral therapy or other types of therapy? Why or why not?"
- Amanda Askell
I think models are in this interesting position where they have a huge wealth of knowledge that they could use to help people-
- Speaker
Ha
- Amanda Askell
... and to work with them on, you know, talking through their lives or talking through ways that they could, like, im- improve things, or even just, like, being a kind of listening partner. And at the same time, they don't have, like, the kind of, like, tools and resources and ongoing relationship with the person that a professional therapist has. But that can actually be this kind of, like, useful third role. Like, sometimes I think about models, and I'm like, if you imagine, like, a friend who has, like, all this wealth of knowledge. Like, they know... I mean s- I'm sure some of us know friends who just, like, have a wealth of knowledge of psychology, or they have a wealth of knowledge of all of these techniques. You know that their relationship with you isn't this ongoing professional one, but you actually find them really useful to talk to.
- Speaker
Yeah, yeah.
- Amanda Askell
And so I guess my hope would be that if you can take all of that expertise and all of that knowledge and make sure that there's, like, an awareness that there's not, like, this ongoing therapeutic relationship, it could actually be that people could, like, could get a lot out of models in terms of, like, helping with, like, issues that they're having and helping to, like, improve their lives and helping them to go through difficult periods because, you know, they're also, like, there's a lot of good stuff there. Like, they feel kind of, like, anonymous, and sometimes you don't want to share things with a person, and actually sharing it with, like, an AI model feels like the thing that feels great at the, in the moment. And so yeah, I, I think, in some ways I actually think it is good that models know that and don't behave just like a professional therapist would because that would give the implication that that's the relationship that they have. Um, but yeah. So I don't know. I think it's a, it's an interesting, um, interesting
- 26:20 – 28:17
Continental philosophy in the system prompt
- Amanda Askell
future.
- Speaker
A few questions about the system prompt, which is, you know, in our case, in Claude.ai, we give the model a set of instructions that give it sort of an overall context for how it should behave. Tommy asks, "Why is there continental philosophy in the system prompt?" Can... And, and just explain to us what that is.
- Amanda Askell
Yeah. So continental philosophy is just, um, I mean, literally philosophy from the European continent. And so I guess it's seen as kind of, like, it's often more kind of, like, scholarly. It has, um, a, like, a lot more kind of, like, historical references within it than, say, like, analytic philosophy does.
- Speaker
Like Foucault or someone like that.
- Amanda Askell
Yeah, exactly. So this was honestly, um... So I think that it has other, uh, uh, things in addition to continental philosophy.
- Speaker
Right.
- Amanda Askell
But basically, I think there's a part of the system prompt, and I hope I'm not misremembering, that was trying to get Claude to be a little bit more... Like, Claude would just, like, love to, if you gave Claude a theory, it would just love to run with the theory and, and not really stop and think like, "Oh, are you making, like, a scientific claim about the world?" So if you're like, "I have this theory which is that, like, um, that water is actually pure energy and that, like, the- we are getting the life force from water when we drink it-
- Speaker
Right
- Amanda Askell
... and that, like, fountains are the thing that we should be putting everywhere."
- Speaker
[laughs]
- Amanda Askell
Just like a, you know-
- Speaker
Yeah, yeah.
- Amanda Askell
Uh, and you kind of want Claude to just have this perspective which is like, is it the case that this person's making a kind of scientific claim about the world-
- Speaker
Yeah
- Amanda Askell
... where I should maybe bring in relevant facts, or are they giving me a kind of like broad, like, world view or perspective-
- Speaker
Right
- Amanda Askell
... which isn't necessarily making empirical claims? And so there's all of these view... You know, so i- is it just, like, a kind of, like, metaphysical view?
- Speaker
Yeah, yeah.
- Amanda Askell
Or is it, like, a... And so the main reason that it's mentioned is that, like, when testing this out, there is lots of things that if you, if it went too strongly in the direction of being like, "Well, every claim is an empirical claim about the world-"
- Speaker
Yeah
- Amanda Askell
... it would be very dismissive of, like-
- Speaker
Right
- Amanda Askell
... just things that are more, like, exploratory thinking.
- Speaker
I'm glad you talked to-
- Amanda Askell
Yeah. And so it's mostly just like, hey, like, it's just illustrative examples of areas where it's like this might not be making empirical claims about the world. This might be much more like a lens through which to think about it, and just try to make that distinction clear when you're, when you're thinking through this, Claude.
- 28:17 – 28:53
Removing counting characters from the system prompt
- Speaker
Also on the system prompt, Simon Willison asks... So at some point, it said, "If Claude is asked to count words or letters or characters, then it, it, it shouldn't, it shouldn't do that." Is that right? I- is that, is that what it said?
- Amanda Askell
Yeah.
- Speaker
And, and, and apparently, that was removed from the system prompt, and Simon wonders why.
- Amanda Askell
Yeah. So I think it was like there used to be a kind of, like, instruction for how Claude should do this in the system prompt. Honestly, this is just one of those things where I think the models probably just got better. It wasn't necessary, and then at that point, you can just, like, remove it. And there's other things where you might always want it to be in the system prompt instead of in the model itself, but in some cases, you can kind of just train the models to get better or, or change their behavior.
- 28:53 – 30:18
What makes an "LLM whisperer"?
- Speaker
Nossen Weissman asks, "What does it take to be an LLM whisperer at Anthropic?" Which presumably is your, is a way of describing your job.
- Amanda Askell
I partly do LLM whispering. If you think... I, I, I actually, like, want more people to help with some of the prompting tasks.
- Speaker
Yeah. If you're an LLM whisperer, contact us.
- Amanda Askell
It's a dangerous thing to ask.
- Speaker
Well, okay.
- Amanda Askell
Yeah.
- Speaker
Okay. Yeah, yeah. [laughs]
- Amanda Askell
But I think, like, it is really hard to distill what is going on because one thing is just, like, a willingness to interact with the models a lot and to, like, really look at output after output-
- Speaker
Yeah
- Amanda Askell
... and, and to use this to get a sense of, like, the shape of the models and how they respond to different things, to be willing to experiment. It's actually just, like, a very empirical domain-
- Speaker
Yeah
- Amanda Askell
... and maybe that's, like, the thing that people don't often get is that-
- Speaker
Right
- Amanda Askell
... like, prompting is very experimental. You deal with... You know, you, you, you find... I find a new model, and I'll be like, I have a whole different approach to how I prompt from that model that I find by interacting with it a lot.
- Speaker
Mm.
- Amanda Askell
And I think a little bit also understanding how models, like, work. Sometimes it's also just honestly, like, reasoning with the models, which is really interesting, and really fully explaining the task. This is where I do think philosophy can actually be useful for prompting in a, in a way because, like, a lot of my, my job is just being like, I would... I try and explain, like, some issue or concern or, or thought that I'm having to the model as clearly as possible, and then if it does something kind of unexpected, um, you know, you can either ask it why, or you can try and figure out what in, in the thing that you said caused it to kind of misunderstand you, and just, like, a willingness to iteratively go through that process.
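[Editor's note: a rough sketch of the iterative loop Askell describes: explain the task fully, inspect the output, and when the model does something unexpected, ask it why, then revise the prompt. The helper function, prompt texts, and model ID are hypothetical placeholders.]

```python
# Rough sketch of the iterative prompting workflow described above.
# The prompt texts and model ID are hypothetical placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # assumed model ID for illustration

def ask(history):
    """Send the conversation so far and return the model's reply text."""
    response = client.messages.create(
        model=MODEL, max_tokens=1024, messages=history
    )
    return response.content[0].text

# 1. Explain the task as clearly and fully as possible.
history = [{
    "role": "user",
    "content": "Summarize this bug report in two sentences, keeping the "
               "reproduction steps intact: <report text here>",
}]
reply = ask(history)
print(reply)

# 2. If the output is unexpected, ask the model why it read the prompt that
#    way, then revise the original wording and try again.
history += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "You dropped the reproduction steps. What in "
                                "my instructions led you to leave them out?"},
]
print(ask(history))
```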
- Speaker
Relatedly,
- 30:18 – 31:52
Thoughts on other LLM whisperers
- Speaker
Michael Swarovericks asks, "What do you think of other AI whisperers like Janus?" Who is, who is someone online who is, like, uh, almost having, like, experimental interactions with the, the thing in, in the way that you've described.
- Amanda Askell
Yeah. I think it's really interesting. So I, I love to, like, follow and see, like, the work of people, um, who are doing these, like, really fascinating experiments with the model. And I also, like, think sometimes doing these deep dives into, like, the model and their... and, and how it thinks of itself, how it, how it just interacts in these really unusual, like, cases. I don't know. I find the work extremely interesting. I think it highlights really interesting depths to the models. And in some ways, like, I also think that that community has been one that kind of can hold our feet to the fire, like, if they find things that aren't great, like, in the system prompt or-
- Speaker
Right
- Amanda Askell
... uh, like, in aspects of the model and, and its, like, psychology.
- Speaker
In, in, in the sense of... from a model welfare perspective or from a human welfare perspective, or both?
- Amanda Askell
I mean, I think the two are related, so often both.
- Speaker
Yeah.
- Amanda Askell
But I do also really appreciate it when it's coming at it from the model welfare perspective.
- Speaker
Yeah.
- Amanda Askell
And it, it, and that includes for future models, so not just things like system prompts, but if you go into the depths of the model and you find some, like, deep-seated insecurity, then that's, like, really valuable.
- Speaker
Yeah.
- Amanda Askell
But that's something that you might actually need to kind of try and adjust over the course of time with, like, training and with, like, giving models more information and context during training, for example.
- Speaker
Okay.
- Amanda Askell
Um, and so, I don't know, I appreciate both the... Like, I loved seeing people do these, like, really interesting, useful experiments with models, but also, like, pointing out ways in which, like, we can improve things through, like, better system prompting, but also better training and... Yeah. I, I think that's really useful work.
- Speaker
Couple of,
- 31:52 – 33:37
Whistleblowing
- Speaker
uh, questions about safety and maybe the larger risks that these models pose.
- Amanda Askell
Mm-hmm.
- Speaker
Jeffrey Miller asks, "If it became apparent that AI alignment was impossible to solve-
- Amanda Askell
Yeah
- Speaker
... would you trust that Anthropic would stop trying to develop," he, in his phrase, "artificial super intelligence," however you wanna, however you wanna call it? "And would you have the guts to blow the whistle?"
- Amanda Askell
Yeah. So I guess this feels like a kind of easy version of the question, because it's like, if it became evident that it was impossible to align AI models, it's not really in anyone's interest to, like, continue to, like, build more powerful models.
- Speaker
Right. Right.
- Amanda Askell
I always hope that I'm not just being Pollyanna-ish about the or- organization, but I do feel like Anthropic does genuinely care about making sure that this goes well, and that it is done in a way that is, like, very safe and not deploy models that are, like, dangerous. You know, a different, like, slightly harder question is, like, well, what about being in a world where it just, like, there's kind of mounting evidence-
- Speaker
Right
- Amanda Askell
... it's really ambiguous and unclear.
- Speaker
Right. It's not e- it's not evidence in the way that he describes.
- Amanda Askell
Yeah. And it... Yeah, it's not just, like, impossible, but something, like, it's difficult or it's really-
- Speaker
Uh-huh. Yeah
- Amanda Askell
... or we're unsure. And in that case, I do like to think that we would be responsible enough to be like, look, as models get more capable, it's kind of like the standard that you have to hold yourself to for showing that those models are behaving well, and that you actually have managed to, to, like, make the models, like, um, have, like, good values, for example, or behave well in the world, is going to, like, increase. Um, and, and to behave responsibly and in line with that. And I think that, um, that is a thing that I think the organization is, like, going to do, and a lot of people internally, myself included, will just hold them to that.
- Speaker
Right.
- Amanda Askell
At least, like, I see that as, like, as, like, part of my job, and so, and I think many people do.
- Speaker
Louis says, "I don't have a question, but thanks for offering." So that's nice.
- Amanda Askell
That's nice.
- Speaker
That's nice for him to say. Yeah. Um, and the final one is from
- 33:37 – 36:09
Fiction recommendation
- Speaker
Real Stale Coffee.
- Amanda Askell
Mm-hmm.
- Speaker
"What is the last book of fiction you read, and did you like it?"
- Amanda Askell
The last book that I read was by, I hope I'm getting the pronunciation right, Benjamin Labatut.
- Speaker
Okay.
- Amanda Askell
Um, and it was When We Cease to Understand the World.
- Speaker
Ah, yes.
- Amanda Askell
And it's a really interesting book that becomes kind of increasingly fictional, uh, as it, as it goes on. And I think for people working in AI, it's actually a very interesting book to read because it's hard to capture the sense of how strange it is to just exist in the current period, where there's just, like... I don't know how to describe it, but it's like new things are happening all of the time, and you don't really have, like, prior paradigms that, like, can, can guide you always. And so it's a, it's an interesting book that, you know, 'cause it's, it's more about, like, physics and quantum mechanics.
- Speaker
Yeah.
- Amanda Askell
And less actually about the physics, and more about basically this notion of people's reaction to it. And, and I think it's a really interesting book for people in AI to just capture something about the kind of, like, the present moment and how strange it can seem.
- Speaker
Yeah.
- Amanda Askell
But then also, in some ways, it's interesting to, like, look back on that period and how, how it must have felt to many of the people involved. And now, actually, it's a more settled science, and in some ways-
- Speaker
Yeah
- Amanda Askell
... maybe the hopeful thing that I have is that at some point in the future people will look back and be like, "Well, you guys were kind of in the dark and trying to, like, really figure things out. But now we've settled it all and things have gone well."
- Speaker
That'd be nice.
- Amanda Askell
That, that would be nice. That's, that's the dream.
- Speaker
I found an increasing... I read that as well, and I found an increasing sense of, uh, like, confusion as I read through it.
- Amanda Askell
Mm-hmm.
- Speaker
A- as it becomes... You, you... It starts off being quite, quite close to reality, and then just sort of becomes untethered as you go on.
- Amanda Askell
Yeah. Yeah.
- Speaker
And I think there's sort of a meta issue there of, again, like, reality becoming stranger and stranger and stranger.
- Amanda Askell
And, yeah.
- Speaker
Which is definitely happening to us-
- Amanda Askell
Though when-
- Speaker
... with a lot of AI.
- Amanda Askell
Yeah. Though in the real world, I think that, like, reality became stranger and stranger and stranger and then almost became more understood again.
- Speaker
Right. Right, right, right.
- Amanda Askell
So yeah, the hope would be, like, maybe that would be true of AI. Like, I do think if we can find ways of making this go well, then maybe in the future we'll just look back on this and be like, "That was a period where things were getting stranger and stranger. And then eventually we actually managed to kind of... W- we did okay and we formed a good understanding of it." That's the hope. When you're in the middle of the, of the things getting stranger-
- Speaker
We're at the weird part right now.
- Amanda Askell
Yes.
Episode duration: 36:09