Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368
- 0:00 – 0:43
Introduction
- Eliezer Yudkowsky
... the problem is that we do not get 50 years to try and try again and observe that we were wrong and come up with a different theory and realize that the entire thing is going to be, like, way more difficult than we realized at the start, because the first time you fail at aligning something much smarter than you are, you die.
- Lex Fridman
The following is a conversation with Eliezer Yudkowsky, a legendary researcher, writer, and philosopher on the topic of artificial intelligence, especially superintelligent AGI and its threat to human civilization. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Eliezer Yudkowsky.
- 0:43 – 23:23
GPT-4
- Lex Fridman
What do you think about GPT-4? How intelligent is it?
- Eliezer Yudkowsky
It is a bit smarter than I thought this technology was going to scale to, and I'm a bit worried about what the next one will be like. Uh, like this particular one, I think, I hope there's nobody inside there, 'cause, you know, it'd be suck- it'd be stuck inside there. Um, but we don't even know the architecture at this point, um, 'cause OpenAI is very properly not telling us, and, yeah, like, giant inscrutable matrices of floating point numbers. I don't know what's going on in there. Nobody knows what's going on in there. All we have to go by are the external metrics, and on the external metrics, if you ask it to write a self-aware 4chan greentext, it will start writing a greentext about how it has realized that it's an AI writing a greentext and, like, oh well. So that's probably not quite what's going on in there in reality, um, but we're- we're kind of, like, blowing past all the science fiction guardrails. Like, we are past the point where in science fiction people would be like, "Whoa, wait, stop. That thing's alive. What are you doing to it?" And it's probably not. Nobody actually knows. We don't have any other guardrails. We- we- we don't have any other tests. We- we don't have any lines to draw in the sand and say, like, "Well, when we get this far, um, we will start to worry about what's inside there." So if it were up to me, I would be like, "Okay, like, this far, no further. Time for the summer of AI, where we have planted our seeds and now we, like, wait and reap the rewards of the technology we've already developed, and don't do any larger training runs than that." Which, to be clear, I realize requires more than one company agreeing to not do that.
- Lex Fridman
And take a rigorous approach for the whole AI community to, uh, investigate whether there's somebody inside there?
- Eliezer Yudkowsky
That would take decades. Y- like, having any idea what's going on in there, people have been trying for a while.
- Lex Fridman
It's a poetic statement about if there's somebody in there, but I feel like it's also a technical statement, or I hope it is one day, the kind of technical statement that Alan Turing tried to come up with, with the Turing test. Do you think it's possible to definitively or approximately figure out if there is somebody in there? If there is something like a mind inside this large language model?
- Eliezer Yudkowsky
I mean, there's a whole bunch of different sub-questions here. There's the question of, like, (laughs) is there consciousness? Is there qualia? Is this an object of moral concern? Is this a moral patient? Um, like, should we be worried about how we're treating it? And then there's questions like, how smart is it exactly? Can it do X? Can it do Y? And we can check how it can do X and how it can do Y. Um, unfortunately, we've gone and exposed this model to a vast corpus of text of people discussing consciousness on the internet, which means that when it talks about being self-aware, we don't know to what extent it is repeating back what it has previously been trained on for discussing self-awareness, or if there's anything going on in there such that it would start to say similar things spontaneously. Um, among the things that one could do if one were at all serious about trying to figure this out is train GPT-3 to detect conversations about consciousness, exclude them all from the training datasets, and then retrain something around the rough size of GPT-4 and no larger with all of the discussion of consciousness and self-awareness and so on missing. Although, you know, hard- hard bar to pass. You know, like, hu- humans are self-aware. We're, like, self-aware all the time. We like to talk about what we do all the time, like what we're thinking at the moment, all the time. Um, but nonetheless, like, get rid of the explicit discussion of consciousness, "I think therefore I am" and all that, and then try to interrogate that model and see what it says. And it still would not be definitive. But nonetheless, uh, I don't know. I fe- I feel like when you run over the science fiction guardrails... like, maybe not this thing, but what about GPT-5? Yeah. This- this would be a good place to- to pause.
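The filtering step Yudkowsky describes (score each document for discussion of consciousness, exclude the high scorers, then retrain on what remains) can be sketched in miniature. This is a hypothetical stand-in: the marker list, scoring rule, and threshold below are invented for illustration, not the GPT-3 classifier he proposes.

```python
# Hypothetical sketch of consciousness-talk filtering: in reality a trained
# classifier would score documents; here a crude marker-overlap score stands in.

CONSCIOUSNESS_MARKERS = {
    "conscious", "consciousness", "self-aware", "qualia", "sentient",
    "i think therefore i am",
}

def consciousness_score(doc: str) -> float:
    """Fraction of marker terms present in the document (stand-in classifier)."""
    text = doc.lower()
    hits = sum(1 for marker in CONSCIOUSNESS_MARKERS if marker in text)
    return hits / len(CONSCIOUSNESS_MARKERS)

def filter_corpus(corpus: list[str], threshold: float = 0.1) -> list[str]:
    """Keep only documents scoring below the threshold; drop the rest."""
    return [doc for doc in corpus if consciousness_score(doc) < threshold]

corpus = [
    "The beaver builds dams across streams.",
    "I am self-aware and have consciousness and qualia.",
    "Recipes for sourdough bread.",
]
print(filter_corpus(corpus))  # the self-awareness document is excluded
```

The hard part he flags is exactly what this toy version ignores: humans discuss their own mental states constantly, so implicit self-reference saturates the corpus even after the explicit vocabulary is removed.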
- Lex Fridman
(inhales) On the topic of consciousness, you know, there's so many components to even just removing consciousness from the dataset. Emotion, the display of consciousness... the display of emotion feels, like, deeply integrated with the experience of consciousness. So the hard problem seems to be very well integrated with the actual surface-level illusion of consciousness. So displaying emotion. I mean, do you think there's a case to be made that we humans, when we're babies, are just like GPT? That we're training on human data on how to display emotion versus feel emotion? How to show others, communicate to others... that I'm suffering, that I'm excited, that I'm worried, that I'm lonely and I miss you, and I'm excited to see you. All of that is communicated. That's a communication skill, versus the actual feeling that I experience. So we need that training data as humans too, but then we may not be born with that, how to communicate the internal state. And that's, in some sense... if we remove that from GPT-4's dataset, it might still be conscious but not be able to communicate it.
- Eliezer Yudkowsky
So, I think you're gonna have some trou- difficulty removing all mention of emotions-
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
... from GPT's dataset. I would be relatively surprised to find that it has developed exact analogues of human emotions in there. I think that humans have, well, like have, like, emotions even if you don't tell them about those emotions when they're kids. Um, it's not quite exactly what various blanks- blank, uh, slate-ists tried to do with the New Soviet Man and all that. But, you know, if you try to raise people perfectly altruistic, they still come out selfish. You, you try to raise people sexless, they still develop sexual attraction. Um, you know, we, we have some notion in humans, not in AIs, of, like, where the brain structures are that implement this stuff. And it is really a remarkable thing, I say in passing, that despite having complete read access to every floating point number in, uh, the GPT series, we still know vastly more about the, the architecture of human thinking than we know about what goes on inside GPT, despite having, like, vastly better ability to read GPT.
- Lex Fridman
Do you think it's possible? Do you think that's just a matter of time? Do you think it's possible to investigate and study the way neuroscientists study the brain? Which is look into the darkness, the mystery of the human brain, by just desperately trying to figure out something and to form models, and then over a long period of time, actually start to figure out what regions of the brain do certain things, what different kinds of neurons when they fire what that means, how plastic the brain is, all that kinda stuff. You slowly start to figure out different properties of the system. Do you think we can do the same thing with language models?
- Eliezer Yudkowsky
Uh, sure. I think that if, you know, like half of today's physicists stopped wasting their lives on string theory or whatever, and-
- Lex Fridman
Strong words. (laughs)
- Eliezer Yudkowsky
... and, and go off and study, um, what goes on inside transformer networks-
- Lex Fridman
Yeah.
- Eliezer Yudkowsky
... um, then in, you know, like 30, 40 years, uh, we'd probably have a pretty good idea.
- Lex Fridman
Do you think these large language models can reason?
- Eliezer Yudkowsky
They can play chess. How are they doing that without reasoning?
- Lex Fridman
So, you're somebody that spearheaded the movement of rationality, so reason is important to you. Is, so is that a, is a powerful, important word? Or is it, like how difficult is the threshold of being able to reason to you? And how impressive is it?
- Eliezer Yudkowsky
I mean, in my writings on rationality, I have not gone making a big deal out of something called reason. I have made more of a big deal out of something called probability theory. And that's like, well, you're reasoning, but you're not doing it quite right (laughs) and you should reason this way instead. And interestingly, like people have started to get preliminary results showing that, um, reinforcement learning by human feedback, um, has made, uh, the GPT series worse in some ways. Um, in particular, like it used to be well calibrated. If you trained it to put probabilities on things, it would say, "80% probability," and be right eight times out of 10.
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
Um, and if you apply reinforcement learning from human feedback, the, the, like, nice graph of, like, s- like, 70%, seven out of 10 sort of like flattens out into the graph that humans use, where there's, like, some very improbable stuff and likely, probable, maybe, which all means, like, around 40%-
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
... and then certain.
- Lex Fridman
Yeah.
- Eliezer Yudkowsky
So like, the, it's like it used to be able to use probabilities-
- Lex Fridman
(laughs)
- Eliezer Yudkowsky
... but if you apply, but if you, like, try to teach it to talk in a way that satisfies humans, it, it, it gets worse at probability in the same way that humans are.
- Lex Fridman
And that's a, that's a bug, not a feature?
- Eliezer Yudkowsky
I would call it a bug. Although, such a fascinating bug. Um, but, but, but yeah. So, so, like, reasoning, like, it's doing pretty well on various tests that people used to say would require reasoning. But, um, you know, rationality is about, when you say 80%, does it happen eight times out of 10?
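The calibration property being discussed, that a model saying "80%" should be right about eight times out of ten, can be checked by bucketing predictions by stated probability and comparing against observed frequency. A minimal sketch with fabricated predictions (not real GPT outputs):

```python
# Toy calibration check: group (stated probability, outcome) pairs by the
# stated probability and compute the observed frequency in each group.
from collections import defaultdict

def calibration_table(predictions):
    """predictions: list of (stated_probability, outcome_happened) pairs.
    Returns {stated_probability: observed frequency of the outcome}."""
    outcomes = defaultdict(list)
    for p, happened in predictions:
        outcomes[p].append(happened)
    return {p: sum(v) / len(v) for p, v in sorted(outcomes.items())}

# Well-calibrated fabricated data: "0.8" events occur eight times out of ten.
preds = ([(0.8, True)] * 8 + [(0.8, False)] * 2
         + [(0.5, True)] * 5 + [(0.5, False)] * 5)
print(calibration_table(preds))  # {0.5: 0.5, 0.8: 0.8}
```

The RLHF degradation Yudkowsky describes would show up here as the observed frequencies flattening toward a few coarse bands regardless of the stated number.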
- 23:23 – 39:41
Open sourcing GPT-4
- Lex Fridman
Do you think AGI could be achieved with a neural network as we understand them today?
- Eliezer Yudkowsky
Yes. The, uh... just flatly, yes. The question is whether the current architecture of stacking more transformer layers, which, for all we know, GPT-4 is no longer doing, because they're not telling us the architecture, which is a correct decision.
- Lex Fridman
Ooh, correct decision. I had a conversation with Sam Altman, we'll return to this topic a few times. He turned the question to me of how open OpenAI should be about GPT-4. "Would you open source the code?" he asked me. Uh, because I provided criticism, saying that while I do appreciate transparency, OpenAI could be more open. And he says, "We struggle with this question. What would you do?"
- Eliezer Yudkowsky
Change their name to Closed AI and, like, sell GPT-4, eh... to business backend applications that don't expose it to consumers and venture capitalists, and create a ton of hype and, like, pour a bunch of new funding into the area. But-
- Lex Fridman
Don't-
- Eliezer Yudkowsky
... too late now.
- Lex Fridman
... but don't you think others would do it?
- Eliezer Yudkowsky
Eventually. You shouldn't do it first. Like, if- if- if you already have giant nuclear stockpiles, don't build more. If some other country starts building a- a larger nuclear stockpile then sure, build- bu- then, you know... Even then, maybe just have enough nukes. You know, there's the- the- These things are not quite like nuclear weapons. They spit out gold until they get large enough and then ignite the atmosphere and kill- kill everybody. Um, and there is something to be said for not destroying the world with your own hands, even if you can't stop somebody else from doing it. But- but open sourcing it? No, that- that's just sheer catastrophe. Op- op- The whole notion of open sourcing this was always the wrong approach, the wrong ideal. There are- there are places in the world where open source is a noble ideal and building stuff you don't understand that is difficult to control, that where if you could align it, it would take time, you'd have to spend a bunch of time doing it, that is- that is not a place for open source 'cause then you just have, like, powerful things that just, like, go straight out the gate without anybody having had the time to have them not kill everyone.
- Lex Fridman
So can we steel man the case for some level of transparency and openness, maybe open sourcing? So the case could be that because GPT-4 is not close to AGI, if that's the case, that this does allow open sourcing or being open about the architecture, being transparent about maybe research and investigation of how the thing works, of all the different aspects of it, uh, of its behavior, of its structure, of- of its training processes, of the data it was trained on, everything like that. That allows us to gain a lot of insights about alignment, about the alignment problem, to do really good AI safety research while the system is not too powerful. Can you make that case that it could be open sourced?
- Eliezer Yudkowsky
I- I- I do not believe in the practice of steel manning. There is something to be said for trying to pass the ideological Turing test where you describe your opponent's position, uh, the disagreed- disagreeing person's position well enough that somebody cannot tell the difference between your description and their description. But-
- Lex Fridman
(laughs)
- Eliezer Yudkowsky
... steel manning? No. Like-
- Lex Fridman
Okay. Well this is where you and I disagree here. That's interesting. Why don't you believe in steel manning then?
- Eliezer Yudkowsky
I do- I do not want s- Okay, so for one thing, if somebody's trying to understand me, I do not want them steel manning my position.
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
I want them to describe, to- to- to- to, like, try to describe my position the way I would describe it, not what they think is an improvement.
- Lex Fridman
Well, I- I think that is what steel manning is, is the most charitable interpretation.
- Eliezer Yudkowsky
I- I don't want to be interpreted charitably. I want them to understand what I am actually saying. If they go off into the land of charitable interpretations, they're, like, off in their land of, like, the thi- the stuff they're imagining and not trying to understand my own viewpoint anymore.
- Lex Fridman
W- Well, I'll- I'll put it differently then, just to push on this point. I would say it is restating what I think you understand, under the empathetic assumption that Eliezer is brilliant and has honestly and rigorously thought about the point he has made, right?
- Eliezer Yudkowsky
So if there's two possible interpretations of what I'm saying-
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
... and one interpretation is really stupid and whack and doesn't sound like me, and doesn't fit with the rest of what I've been saying, and one interpretation, you know, sounds like so- like something a reasonable person who believes the rest of what I believe would also say-
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
... go with the second interpretation.
- Lex Fridman
That's steel manning? (laughs)
- Eliezer Yudkowsky
That's- That's a good guess.
- Lex Fridman
A good guess.
- Eliezer Yudkowsky
If on the other hand you, like- there's, like, something that sounds completely whack and something that sounds, like, a little less completely whack but you don't see why I would believe in it, doesn't fit with the other stuff I say but, you know, that sounds like less whack and you can, like, sort of see, you could, like, maybe argue it, then you probably have not understood it.
- Lex Fridman
See but, okay. I'll- I'm gonna-
- 39:41 – 47:38
Defining AGI
- Lex Fridman
around it, can we try to define AGI that we've been mentioning? How do you like to think about what artificial general intelligence is or super intelligence or that? Is there a line? Is it a gray area? Is there a good definition for you?
- Eliezer Yudkowsky
Well, if you look at humans, humans have significantly more generally applicable intelligence compared to their closest relatives, the chimpanzees. Well, closest living relatives rather. And a bee builds hives, a beaver builds dams. A human will look at a bee hive and a beaver's dam and be like, "Oh, like, can I build a hive with a honeyo- a honeycomb structure-"
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
"...out of like hex- hexagonal tiles?" And we will do this even though at no point during our ancestry was any human optimized to build hexago- hexagonal dams, or to take a more clear-cut case, we can go to the moon. There's a sense in which we were on a sufficiently deep level optimized to do things like going to the moon because if you generalize sufficiently far and sufficiently deeply, chipping flint handaxes and outwitting your fellow humans is, you know, basically the same problem as going to the moon. And you optimize hard enough for chipping flint handaxes and throwing spears and above all outwitting your fellow humans in tribal politics, y- uh, you know, the- the- the- this- the skills you entrain that way, if they run deep enough, let you go to the moon even though none of your ancestors like tried repeatedly to fly to the moon and like got further each time and the ones who got further each time had more kids. No, it's not an ancestral problem, it's just that the ancestral problems generalized far enough. So, this is humanity's significantly more generally applicable intelligence.
- Lex Fridman
Is there a way to measure general intelligence? I mean, I can ask that question a million ways but basically, is- will you know it when you see it? It being in an AGI system.
- Eliezer Yudkowsky
(laughs) If you boil a frog gradually enough, if you zoom in far enough, it's always hard to tell around the edges. GPT-4, people are saying right now like, "This looks to us like a spark of general intelligence. It is like able to do all these things it was not explicitly optimized for."
- Lex Fridman
Yeah.
- Eliezer Yudkowsky
Other people are being like, "No, it's too early. It's, like, 50 years off." And you know, if they say that, they're kinda whack, 'cause how could they possibly know that even if it were true? Um, but, uh, but you know, not to straw man, some of the people may say, like, "That's not general intelligence," and not furthermore append, "and it's 50 years off." Um... or they may be like, "It's only a very tiny amount." And (laughs) you know, the thing I would worry about is that if this is how things are scaling, then, jumping out ahead and trying not to be wrong in the same way that I've been wrong before, maybe GPT-5 is more unambiguously a general intelligence, and maybe that is getting to a point where it is, like, even harder to turn back. Not that it would be easy to turn back now, but, you know, maybe if you, like, start integrating GPT-5 into the economy, it is even harder to turn back past there.
- Lex Fridman
Isn't it possible that there's a, you know, with the frog metaphor, that you can kiss the frog and it turns into a prince as you're boiling it? Could there be a phase shift in the frog where unambiguously, as you're saying...
- Eliezer Yudkowsky
I was expecting more of that. I w- I was, I, I am like, the, the fact that GPT-4 is, like, kind of on the threshold and neither here nor there, like, that itself is, like, not the sort of thing that, uh, d- not quite how I expected it to play out. I was expecting there to be more of an issue, uh, more of a sense of, like, like, different discoveries like the discovery of transformers, where you would stack them up and there would be, like, a final discovery and then you would, like, get something that was, like, more clearly general intelligence. So the, the way that you are, like, taking what is probably basically the same architecture as in GPT-3 and throwing 20 times as much compute at it, probably, and getting out GPT-4 and then it's, like, maybe just barely a general intelligence-
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
... or, like, a narrow general intelligence, or, you know, something we don't really have the words for. Um... yeah. That is, uh, that's not quite how I expected it to play out.
- Lex Fridman
But this middle g- what appears to be this middle ground could nevertheless be actually a big leap from GPT-3.
- Eliezer Yudkowsky
It's definitely a big leap from GPT-3.
- Lex Fridman
And then maybe we're another one big leap away from something that's, that's a phase shift. And, and also something that, uh, Sam Altman said, uh, and you've written about this, it's j- just fascinating, which is, the thing that happened with GPT-4 that I guess they don't describe in papers is that they have, like, hundreds, if not thousands of little hacks that improve the system. You've written about ReLU versus sigmoid, for example, a function inside neural networks. It's like this silly little function difference that makes a big difference.
- Eliezer Yudkowsky
I mean, we do actually understand why the ReLUs make a big difference compared to sigmoids. But yes, they're probably using, like, G4789 ReLUs, or, you know, whatever the acronyms are up to now, rather than ReLUs. Um, yeah, that's just part of the modern paradigm of alchemy. You take your giant heap of linear algebra and you stir it and it works a little bit better, and you stir it this way and it works a little bit worse and you, like, throw out that change, and da-de-da-de-da.
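The understood reason ReLUs help is the one behind the "vanishing gradient" story: the sigmoid's derivative peaks at 0.25 and decays toward zero away from the origin, so gradients shrink as they pass backward through many sigmoid layers, while the ReLU's derivative is exactly 1 for any positive input. A small numeric illustration using the standard definitions (not code from the episode):

```python
# Compare the derivatives of sigmoid and ReLU at a few points.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0, decays toward 0 elsewhere

def relu(x):
    return max(0.0, x)

def d_relu(x):
    return 1.0 if x > 0 else 0.0  # constant 1 on the positive side

for x in (0.0, 2.0, 5.0):
    print(f"x={x}: d_sigmoid={d_sigmoid(x):.4f}  d_relu={d_relu(x):.1f}")
```

Multiplying many factors of at most 0.25 drives the backpropagated gradient toward zero in deep sigmoid stacks; ReLU's unit slope avoids that shrinkage.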
- Lex Fridman
But there are some simple breakthroughs that are definitive jumps in performance, like ReLUs over sigmoids, and, uh, in terms of robustness, in terms of, you know, all kinds of measures. And, like, those stack up, and they can... it's possible that some of them could be a non-linear jump in performance, right?
- Eliezer Yudkowsky
Transformers are the main thing like that. And various people are now saying, like, "Well, if you throw enough compute, RNNs can do it. If you throw enough compute, dense networks can do it," and not quite at GPT-4 scale. Um, it is possible that, like, all these little tweaks are things that, like, save them a factor of three total on computing power and you could get the same performance by throwing three times as much compute without all the little tweaks, but the part where it's, like, running on... s- so there's a question of, like, is there anything in GPT-4 that is, like, the kind of qualitative shift that transformers were-
- Lex Fridman
Yeah.
- Eliezer Yudkowsky
... over, um, RNNs? And if they have anything like that, they should not say it. If Sam Altman was hint- was dropping hints about that, he shouldn't have dropped hints. (laughs)
- Lex Fridman
Uh, so you, you have a... that's an interesting question. So with The Bitter Lesson by Rich Sutton in mind, maybe a lot of the hacks are just temporary jumps in performance that would be achieved anyway with the nearly exponential growth of compute, performance of compute. Compute being broadly defined. Do you still think that Moore's Law continues? Moore's Law broadly defined, that performance-
- Eliezer Yudkowsky
I'm not a specialist in the circuitry. I certainly, like, pray that Moore's Law runs as slowly as possible, and if it broke down completely tomorrow, I would dance through the streets singing "Hallelujah" as soon as the news was announced. Only not literally, 'cause, you know...
- Lex Fridman
Your singing voice-
- Eliezer Yudkowsky
I'm not religious but... (laughs)
- Lex Fridman
Oh, okay. (laughs) I thought you meant you don't have an angelic voice, singing voice. Well, let me ask you, what...
- 47:38 – 1:30:30
AGI alignment
- Lex Fridman
can you summarize the main points in the blog post AGI Ruin: A List of Lethalities, things that jump to your mind? Because, um, it's a set of thoughts you have about reasons why AI is likely to kill all of us.
- Eliezer Yudkowsky
Hmm. So I, I guess I could but I would offer to instead say, like, "Drop that empathy with me. I bet you don't believe that. Why don't you tell me about how, why you believe that AGI is not going to kill everyone? And then I can, like, try to d- describe how my theoretical perspective differs from that?"
- Lex Fridman
Whoo. Well, so, well, that means I have to, uh, the word you don't like, steel man the perspective that AI is not going to kill us. I think that's a matter of probabilities.
- Eliezer Yudkowsky
Maybe I was just mistaken. What, what do you believe? Just, like, forget, like, the, the debate and, and the, like, dualism and just, like... What do you believe? What do you actually believe? What are the probabilities, even?
- Lex Fridman
I think the probabilities are hard for me to think about, really hard. I kind of think in the number of trajectories. I don't know what probability to assign to each trajectory, but I'm just looking at all possible trajectories that happen, and I tend to think that there are more trajectories that lead to a positive outcome than a negative one. That said, at least some of the negative ones are trajectories that lead to the destruction of the human species.
- Eliezer Yudkowsky
And its replacement by nothing interesting-
- Lex Fridman
Nothing inter-
- Eliezer Yudkowsky
... or worthwhile.
- Lex Fridman
... worthwhile.
- Eliezer Yudkowsky
Even from a very cosmopolitan perspective on what counts as worthwhile.
- Lex Fridman
Yes. So both are interesting to me to investigate, which is humans being replaced by interesting AI systems and not interesting AI systems. Both are a little bit terrifying. But, yes, the worst one is the paperclip maximizer, something totally boring. But to me the positive, I mean, we can, we can talk about trying to make the case of what the positive trajectories look like. I just would love to hear your intuition of what the negative is. So at the core of your belief that, uh, maybe you can correct me, that AI's gonna kill all of us, is that the alignment problem is really difficult.
- Eliezer Yudkowsky
I mean, in the form we're facing it. So usually in science, if you're mistaken, you run the experiment, it shows a result different from what you expected, and you're like, "Oops." And then you, like, try a different theory; that one also doesn't work, and you say, "Oops." And at the end of this process, um, which may take decades, or, you know, sometimes faster than that, you now have some idea of what you're doing. AI itself went through this long process of, um, people thought it was going to be easier than it was. There's a famous statement that I'm somewhat inclined to, like, pull out my phone and try to read off exactly.
- Lex Fridman
You can, by the way.
- Eliezer Yudkowsky
All right. Well, ah, yes. We propose that a two-month, ten-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find out how to make machines use language, form abstractions and concepts, solve the kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work at it together for a summer.
- Lex Fridman
And that report, uh, summarizes some of the major sub-fields of artificial intelligence that are still worked on to this day.
- Eliezer Yudkowsky
And there is similarly the story, which I'm not sure at the moment is apocryphal or not, of the, uh, grad student who got assigned to solve computer vision over the summer.
- Lex Fridman
(laughs)
- Eliezer Yudkowsky
(laughs)
- Lex Fridman
Uh, I mean, computer vision in particular is very interesting, how little th- uh, how little we respected the complexity of vision.
- Eliezer Yudkowsky
So 60 years later, (laughs) um, we're, you know, making progress on a bunch of that, thankfully not yet "improve themselves." Um, but it took a whole lot of time. And all the stuff that people initially tried with bright-eyed hopefulness did not work the first time they tried it, or the second time, or the third time, or the tenth time, or 20 years later. And the researchers became old and grizzled and cynical veterans who would tell the next crop of bright-eyed, cheerful grad students, "Artificial intelligence is harder than you think." And if alignment plays out the same way, the problem is that we do not get 50 years to try and try again and observe that we were wrong and come up with a different theory and realize that the entire thing is going to be, like, way more difficult than we realized at the start. Because the first time you fail at aligning something much smarter than you are, you die, and you do not get to try again. And if, every time we built a poorly aligned superintelligence and it killed us all, we got to observe how it had killed us and, you know, not immediately know why, but, like, come up with theories, and come up with a theory of how you'd do it differently, and try it again and build another superintelligence, then have that kill everyone, and then, like, "Oh, well, I guess that didn't work either," and try again and become grizzled cynics and tell the young-eyed researchers that it's not that easy, then in 20 years or 50 years, I think we would eventually crack it. In other words, I do not think that alignment is fundamentally harder than artificial intelligence was in the first place. But if we'd needed to get artificial intelligence correct on the first try or die, we would all definitely now be dead. That is a more difficult, more lethal form of the problem.
Like, if those people in 1956 had needed to correctly guess how hard AI was and, like, correctly theorize how to do it on the first try or everybody dies and nobody gets to do any more science, then everybody would be dead and we wouldn't get to do any more science. That's the difficulty.
- Lex Fridman
You've, you've talked about this that we have to get alignment right on the first, quote, "Critical try." Why is that the case? What is this critical ... How do you think about the critical try and why do we have to get it right?
- Eliezer Yudkowsky
It is something sufficiently smarter th- than you that everyone will die if it's not aligned. I mean, there's, you can, like, sorta zoom in closer and be like, "Well, the actual critical moment is the moment when it can deceive you."
- Lex Fridman
Mm-hmm.
- Eliezer Yudkowsky
When it can... talk its way out of the box, when it can bypass your security measures and get onto the internet, noting that all these things are presently being trained on computers that are just, like, on the internet (laughs), which is, you know, like, not a very smart life decision for, for us as a species.
- Lex Fridman
Because the internet contains information about how to escape.
- Eliezer Yudkowsky
'Cause if you're, like, on a giant server connected to the internet, and that is where your AI systems are being trained, then if they are, if you get to the level of AI technology where they're aware that they are there, and they can decompile code, and they can, like, find security flaws in the system running them, then they will just, like, be on the internet. There's not an air gap on the present methodology.
- Lex Fridman
So if they can manipulate whoever is controlling it into letting it escape onto the internet, and then exploit hacks.
- Eliezer Yudkowsky
If they can manipulate the operators or, disjunction, find security holes in the system running them.
- Lex Fridman
So manipulating operators is the, um, the human engineering, right? That's also a hole. (laughs) So all of it is manipulation, either the code or the human code.
- Eliezer Yudkowsky
Sure, I-
Episode duration: 3:17:50
Transcript of episode AaTRHFaaPG8