Modern Wisdom

The Terrifying Problem Of AI Control - Stuart Russell | Modern Wisdom Podcast 364

Stuart Russell is a Professor of Computer Science at the University of California, Berkeley, and an author. Programming machines to do what we want them to is a challenge, and the consequences of getting it wrong become very grave if that machine is superintelligent, with essentially limitless resources and no regard for humanity's wellbeing. Stuart literally wrote the textbook on artificial intelligence, now used in universities across more than a hundred countries, so hopefully he has an answer to perhaps the most important question of this century. Expect to learn how artificial intelligence systems have already manipulated your preferences to make you more predictable, why social media companies genuinely don't know what their own algorithms are doing, why our reliance on machines can be a weakness, Stuart's better solution for giving machines goals, what the future of artificial intelligence holds, and much more...

Buy Human Compatible: https://amzn.to/3jh2lX5

00:00 Intro
00:33 King Midas & AI
06:07 Super-intelligent AI
11:48 Language Challenges
21:42 How AI Could Go Wrong
46:17 Social Media Algorithms
1:03:14 Becoming Enfeebled by Machines
1:20:44 Maintaining Control of AI Growth
1:42:23 Impacts of Stuart's Work
1:48:01 Where to Find Stuart

Stuart Russell (guest) · Chris Williamson (host)
Aug 28, 2021 · 1h 49m · Watch on YouTube

EVERY SPOKEN WORD

  1. 0:00–0:33

    Intro

    1. SR

      Well, you have to build machines that know that they don't know what the objective is and act accordingly. To the extent that the machine does know the objective it can take actions, as long as those actions don't mess with parts of the world that the algorithm isn't sure about. You'd need the machine to actually ask permission. And it would have an incentive to do that. So, it knows that it doesn't know what the objective is, but it knows that its mission is to further human objectives. (wind blows)

    2. CW

      Stuart Russell, welcome to the show.

    3. SR

      Thank you. Nice to be here.

  2. 0:33–6:07

    King Midas & AI

    1. SR

    2. CW

      What do King Midas and artificial intelligence have in common?

    3. SR

      Good question. So, King Midas is famous in two ways. He had the golden touch, so people think of him as kind of a lodestone for getting rich. But the moral of the story is that he said, "I want everything I touch to turn to gold," and he got exactly what he asked for. The gods granted his wish, and then he finds out that he can't eat because his food turns to gold, he can't drink because his wine turns to gold, and then his family turns to gold. So he dies in misery and starvation. And this tale is basically a description of what might happen with superintelligent AI, where the superintelligent AI plays the role of the gods and we are King Midas. We tell the AI, "This is what we want," and we make a mistake. Then the AI is pursuing this objective, and it turns out to be the wrong one, and we have created a conflict. We've basically created a chess match between us and the machines, where the machines are pursuing some objective that turns out to conflict with what we really want. And that's basically the story of how things go south with superintelligent AI. If you look at what Alan Turing said in 1951, he was on BBC radio, the Third Programme, and he said, basically, "We should have to expect the machines to take control." End of story. And I think this is what he had in mind: that they would be pursuing objectives and we would have no way to stop them or interfere with them, because they are more capable than us, so they control the world.

    4. CW

      That's the challenge. It's not just that the objective is misaligned; it's that the power deploying that misalignment is so vast that there's no stopping it once it's set in motion.

    5. SR

      Yeah. And, you know, if you're a gorilla or a chimpanzee or whatever, your ancestors thought they were the pinnacle of evolution, and then they accidentally made humans, and then they lost control. They have no control over their own future at all, because we're here and we're smarter than they are. End of subject.

    6. CW

      Yeah, it's rare that the agent that's supposed to be in charge is actually less capable, or less powerful, or less intelligent than the agent that they're commanding to do their bidding.

    7. SR

      Yes. Yeah, we don't have any good models for how this relationship would work. So even if we do solve the control problem, there are various issues that we'll still have to face. For example, how do we retain anything resembling the intellectual vigor of civilization when our own mental efforts are just puny compared to (laughs) those of the machines that we're supposed to control? In some science fiction books, for example Iain Banks' Culture novels, which I highly recommend to your listeners, he struggles with this, because they've got super powerful AI and everything is hunky-dory. The AI systems always do stuff that's beneficial for humans, but in a way they end up treating humans like children. And there's always this delicate balance which parents have, right? When do I stop doing up my kid's shoelaces and make them do their own shoelaces? Except that with parents and children, the children are not supposed to be the ones in control of the parents. Sometimes they are, but they're not supposed to be. So we just don't have a model for that, where the children are commanding the parents, but the parents are treating the children like children and saying, "Okay, I think it's time for Johnny to learn to do his own shoelaces, so I'm going to hold off on helping Johnny today." I just don't exactly know how it's going to work, and how humans are going to continue to have the incentive to slog through 20 years of education (laughs) and so on, to learn things that the machines can already do much better.

    8. CW

      That's thankfully not a problem that we need to deal with just yet. I suppose... well, we don't know if it's going to be a hard takeoff, so it might be imminent, it might be tomorrow, but everything suggests

  3. 6:07–11:48

    Super-intelligent AI

    1. CW

      that it's not. Have you got any conception around how long it will be before we do face a super intelligent AGI?

    2. SR

      Well, usually I say I will not answer that question. (laughs)

    3. CW

      (laughs)

    4. SR

      And I was at a World Economic Forum meeting which was officially off the record under the Chatham House Rule, and somebody asked me that question. So I said, "Well, you know, off the record, uh, in-"

    5. CW

      There's a number. What is-

    6. SR

      ... 20 years.

    7. CW

      How (laughs) -

    8. SR

      I said, I, I said-

    9. CW

      (laughs)

    10. SR

      ... "Off the record, you know, within the lifetime of my children." Um...

    11. CW

      Okay.

    12. SR

      And, you know, that's a flexible number, because medical advances might make their lives very long. And then 20 minutes later, it's on the front page of The Daily Telegraph.

    13. CW

      When was this?

    14. SR

      Uh, probably 2015, I think. Uh-

    15. CW

      (laughs)

    16. SR

      ... January 2015, yeah. "Professor predicts sociopathic robots will take over (laughs) the world within a generation," that was what the headline (laughs) said. So even though I tried to be cautious, I still got screwed. So I'm actually more conservative. I don't think it's going to happen very soon. I don't believe that just scaling up the methods we have is going to create superintelligence. Simply put: you make the machines faster, you get the wrong answer more quickly.

    17. CW

      What's the bottleneck that we're facing at the moment then? Is it hardware? Is it algorithms?

    18. SR

      It's definitely not hardware. I think we probably have more than enough hardware to create superintelligent AI already. It's... well, algorithms, but really it's basic conceptual gaps in how we're approaching the problem. Our current deep learning systems, in a real sense, don't know anything, and so it's very hard for them to accumulate knowledge over time.

    19. CW

      What does that mean, that they don't know anything?

    20. SR

      So, they can learn a sort of input-output function, and in the process of doing that, they can acquire internal representations that facilitate the representation of that function and so on. But they don't learn, let's say, Newton's laws of physics and then become able to apply those laws to other problems. They have to retrain from scratch, or almost from scratch, on new problems. And if you think about the way science works, which is the best example of human accumulation of knowledge: we know that it wasn't simply the accumulation of raw sensory data and then training a giant network on vast quantities of raw sensory data, because all the people who had that sensory data are dead. So whatever they learned, whatever they accumulated, had to be expressed in the form of knowledge, which subsequent generations could then take on board and use to do the next phase. A cumulative learning approach based on the acquisition of knowledge which can be used for multiple purposes: we don't really know how to do that, at least in the deep learning framework. In the classical AI framework, where we had explicit representations of knowledge using logic or probability theory, that could be done, and I still think it can be done in those paradigms. I just think we need to put a lot more effort in, and maybe be a bit more flexible. One thing we learned from deep learning is that there is an advantage in being, as it were, sloppy. We don't have to think, "Okay, how can I learn Newton's laws? I'm going to put F and m and a together in some order and I'll eventually find the right law." That assumes you already have F and m and a as precisely defined concepts. But in the learning process, you can be more sloppy: you don't have to have exactly the precise definition of mass and exactly the precise definition of force. You can have a kind of sloppy definition. There's something going on about how big and heavy the thingy is, there's something going on about how hard I'm pushing it, and there's something going on about how it's moving. And gradually those things gel, so you can simultaneously learn the knowledge and the concepts that go into the formulation of the knowledge. And I think that idea is something that we could bring back to the classical tradition, and

  4. 11:48–21:42

    Language Challenges

    1. SR

      improve it.

    2. CW

      What are some of the challenges around language?

    3. SR

      So this is a second big area where I think we need breakthroughs. The language models that we have right now, GPT-3 and so on, which everyone is very excited about: their job is basically to predict the next word, and they become very, very good at predicting the next word. And having predicted the next word, they can then add that next word, and that's how they generate text. You can just keep repeatedly applying it and it will start spitting out things that look like sentences and so on. But what they're really doing is predicting the next word based on all the previous words that were in the text sequence. And it's a little bit like astronomy in the time of Ptolemy. Ptolemaic astronomy was what happened before we had any idea that the planets were massive objects moving under the force of gravity. We plotted out the apparent motion of the planets through the heavens, and we basically described their shape. If you plot it out over the course of a night, it's these mostly somewhat circular-looking arcs, but with wiggles and spirals and so on added in over long time scales, because of the relative motion of the planets around the sun. Ptolemaic astronomy just consisted of describing those big, complicated, wiggly, circly, spiraly shapes. And then you could gradually extrapolate them: once you understood the pattern of this big wiggly, spiraly shape, you could extend it. "Okay, I got the shape, now I'm just going to keep drawing it and say, next week the planet should be here." And you'd be right. That's sort of what's going on with these text prediction algorithms: they're taking all the previous words, which by analogy are the positions of the planets, and then saying, "Okay, I get the shape. I'm going to predict what it's going to be two words, three words, four words into the future." But there's no sense of why. Why is the word on the page? The word is not on the page because the previous thousand words were on the page. That's not the real cause of why that word is there. The real cause of why that word is there is because there's a world outside the text, and someone is trying to say something about it. And that's what I call the physics of text. That's-
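
      A minimal sketch of the autoregressive loop described above, with a hypothetical predict_next_word() standing in for the learned model (this illustrates the general scheme, not the internals of any specific system such as GPT-3):

      ```python
      # Sketch of autoregressive generation: the model's only job is to guess
      # the next word given the words so far, and generating text is just
      # repeatedly appending that guess to the sequence.

      def generate(predict_next_word, prompt_words, n_words):
          words = list(prompt_words)
          for _ in range(n_words):
              # The model conditions only on the preceding text (the "shape"
              # of the word sequence), not on any world outside the text.
              words.append(predict_next_word(words))
          return " ".join(words)

      # Toy stand-in "model" that just repeats the most recent word:
      print(generate(lambda ws: ws[-1], ["the", "planets"], 3))
      ```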

    4. CW

      There's no knowing beyond the simple output. So, I guess, is this similar to a philosophical zombie in a way? You're able to output a thing that looks like a simulacrum of intelligence within a sort of narrow band, but there's nothing going on deeper below the surface?

    5. SR

      That's one way of putting it. I'm not here talking about whether there is real conscious understanding of-

    6. CW

      Yeah, we haven't got there yet.

    7. SR

      ... text, but just: does this causal model of why the text is there approximate reality in any way? No. The reality of text is that people are trying to say things, and they're trying to say things about a world that they live in and are acting in. The simplest model would be that they're just trying to say true things. Actually, they're trying to get something done in that world, and part of getting something done is saying true things, and sometimes it's saying false things, and sometimes it's asking questions. But you can see why the real world matters, because, statistically, the fact that there's a real world and we're all talking about the same real world means that every document in the world is correlated with every other. If one document says, "J.K. Rowling wrote Harry Potter," and another document, written in Russian a year later, says, "The author of Harry Potter is..." what do you expect the next word to be? You expect it to be J.K. Rowling, because they're talking about the world. Even if this is a new way of saying it in a new language, it's correlated through this common hidden variable, if you like, which is the real world. And none of that is there in current deep learning models of language. So I think they're fundamentally flawed, and this is one of the reasons why they take trillions of words of text. I mean, they read about as much as all human beings in history have read (laughs), and they still make stupid mistakes. They still kind of lose the plot. One of the things you see is that, because they're predicting the next word based on a relatively limited memory of the previous text, they kind of lose the plot: as they go on, they'll start either repeating themselves or contradicting what they said earlier in the text. Having said that, they do exhibit quite impressive short-term recall and question answering, and a certain amount of generalization is going on, because you can ask them questions or tell them things using a name or a place that they've never seen before, and then you can ask them questions about that name or that place, and they'll answer correctly. So there's some generalization going on: they've learned a general pattern and then they're able to instantiate the general pattern with particular people or places or whatever it might be. That's a sign that learning has happened. But generally, we don't understand what's going on beyond that. And so we don't know when they're just spouting gibberish. You think it's answering your question when actually it's just spouting complete gibberish, and you don't know. (laughs)

    8. CW

      I suppose the challenge here is that the main way we communicate is through language. So if you're not a computer programmer and you wanted to have a conversation with a superintelligent AGI home assistant, you would need to tell it what you mean. It would not only need to understand the words that came out of your mouth; our use of language is imprecise too, so it also needs to work out what you meant to say, not just what you said. Then it needs to interpret that, then it needs to convert it into something it can act on within itself, and then it needs to enact it. So yeah, I mean...

    9. SR

      Yeah. I mean, we built systems that could do that even in the '60s and '70s, and they sort of work the way you would expect. They understood the structure of the language: parsing the sentences, so figuring out what's the subject, what's the verb, what's the object, then converting that into an internal logical representation, then doing the reasoning necessary to answer the question or to add the fact to the system's internal knowledge base, and then generating answers to questions and converting the answers back into language. So that process we've known how to do; it's just been very difficult to make it robust, because the variety of ways of saying things is enormous. We speak in ways that aren't grammatical but are still perfectly comprehensible. And we do things like lie. The last thing you want is for the system to believe everything everyone says, because then it's very easy to manipulate. So it has to understand that what's coming out of our mouths is, as Wittgenstein put it, a move in a game. It's not gospel truth; it's an action that we are taking. And the action might be to try to fool you, or to try to make you do something you wouldn't otherwise do, or whatever. That level is completely not there. GPT-3 takes all text as gospel truth, or whatever it is. It doesn't distinguish between fiction and fact, between propaganda and truth and so on. It's all just text.
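
      A toy illustration of that classic pipeline: parse a sentence into a subject-verb-object logical form, add it to a knowledge base, and reason over the knowledge base to answer a question. The one-rule grammar and the triple store here are invented for illustration; real systems of that era were far more elaborate:

      ```python
      # Classic pipeline in miniature: parse -> logical form -> reason -> answer.
      # The "grammar" is just subject-verb-object word order, purely illustrative.

      kb = set()  # knowledge base of (subject, relation, object) facts

      def tell(sentence):
          subj, verb, obj = sentence.rstrip(".").split(" ", 2)
          kb.add((subj, verb, obj))  # store the parsed logical form

      def ask(question):
          # "Who wrote X?" -> find every s such that (s, "wrote", X) is known
          _, verb, obj = question.rstrip("?").split(" ", 2)
          return [s for (s, v, o) in kb if v == verb and o == obj] or ["unknown"]

      tell("Rowling wrote Harry-Potter.")
      print(ask("Who wrote Harry-Potter?"))  # -> ['Rowling']
      ```

      The fragility Stuart mentions shows up immediately: any sentence that departs from the one hard-coded pattern breaks the parser, and nothing here can tell a lie from a fact.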

  5. 21:42–46:17

    How AI Could Go Wrong

    1. SR

    2. CW

      What are some of the big ways that we could get artificial intelligence wrong?

    3. SR

      So I think the problem is the current approach to AI, which has been there really since the beginning, and which in the book Human Compatible I call the standard model. That's a phrase people use in physics to refer to the laws of physics that we pretty much agree are right. In AI, the standard model has been to build machines that behave rationally, and this notion of rationality comes from philosophy and economics: you take actions that can be expected to achieve your objectives. That goes back to Aristotle and other places. So we took that model and we created AI systems that fit it. Now, people have their own objectives, so we can be rational with respect to our objectives. A machine, of course, doesn't have objectives intrinsically, so we put the objective in, and it seems like a perfectly reasonable plan: I get in my automated taxi, I say, "Take me to the airport," that becomes the taxi's objective, and then it takes me to the airport. It figures out a plan to do it and does it. Pretty much all AI systems have been built on the basis that one of the inputs required by the algorithm is the objective. If it's a game-playing program, the objective is checkmate or whatever it might be. If it's a route-planning algorithm, it's the destination. If it's a reinforcement learning algorithm, it's the reward and punishment definition, and so on. And this is a pretty common paradigm, not just in AI: the control theory systems that fly our airplanes minimize a cost function, so the engineer specifies a cost function which penalizes deviations from the desired trajectory, and then the algorithms optimize that given cost function. Okay, so what's the problem with that? The problem is, as I mentioned earlier when you brought up King Midas, that we don't know how to specify the objective completely and correctly. For artificially defined problems like chess, it kind of works: chess comes with a definition of checkmate, so it fools us into thinking that specifying the objective is an easy problem. But take the automated taxi, the self-driving car. Is the destination the objective? Well, it better not be, because then it might drive you there at 200 miles an hour, and (laughs) you'd come home with 50 speeding tickets, if you weren't dead. So obviously safety is also part of the objective. Okay, fine, safety, but then how do you trade off safety and getting to the destination? If you prioritize safety above everything else, then you're never going to leave the garage, because just going out onto the road incurs some risk. Well, okay, so then we have to put in some trade-off between safety and making progress, and then you've got obeying the laws, then you've got not pissing off all the other drivers on the road, then you've got not shaking up the passenger by starting and stopping too much, and the list goes on and on and on.
      And the self-driving car companies are now facing this problem. They have whole committees, and they have meetings all the time trying to figure out: okay, we get the latest data from our cars in the field and all the mistakes they made, and we tweak all the objectives to get them to behave better, and so on. So even for that problem it's really hard. And if you had something like curing cancer, or fixing the carbon dioxide levels, you can see how things go wrong. You want to cure cancer really fast. Sounds good. Okay, great: then we'll induce tumors in the entire human population, so that we can run millions of experiments in parallel on different ways of curing them. You just don't want to be in that situation. And so the answer, it seems to me, is that we have to get rid of the standard model. And here am I, having written a textbook based on the standard model. In fact, in many ways it made the standard model (laughs) the standard model. And here am I saying, "Actually, sorry chaps, we got it all wrong and we're going to have to rebuild the whole field." You've got to get rid of this assumption that the human is going to supply the complete, fixed objective.
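
      The "objective as an input" pattern can be sketched as a weighted cost function for the taxi; all the weights and numbers below are invented, and the point is that the engineer must pick them in advance:

      ```python
      # The standard model in miniature: the designer supplies a fixed objective
      # (a weighted cost function) and the machine simply optimizes it.

      WEIGHTS = {"hours": 1.0, "risk": 50.0, "tickets": 20.0, "discomfort": 5.0}

      def cost(plan):
          return sum(WEIGHTS[k] * plan[k] for k in WEIGHTS)

      plans = {
          "flat-out": {"hours": 0.2, "risk": 0.9, "tickets": 3.0, "discomfort": 8.0},
          "sensible": {"hours": 0.5, "risk": 0.1, "tickets": 0.0, "discomfort": 1.0},
          "never-leave-garage": {"hours": 1e9, "risk": 0.0, "tickets": 0.0, "discomfort": 0.0},
      }
      print(min(plans, key=lambda name: cost(plans[name])))  # -> 'sensible'
      ```

      Whatever weights you choose, some situation will expose them as the wrong trade-off; that is exactly the hole in the standard model being described.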

    4. CW

      It's too complex, or it'd be too arduous, I'm going to guess, to program in plugging the little holes in the bottom of the boat for each one of the ways that the machine could slightly go off course. So you've got safety: okay, we write the safety algorithm. We've got speed: we write the speed algorithm. I'm going to guess that the goal would be a more robust, scalable, general solution, one that would be able to optimize the outcome across all potential challenges.

    5. SR

      Yeah, sort of. Basically, you have to build machines that know that they don't know what the objective is, and act accordingly. So what does "act accordingly" mean? Well, to the extent that the machine does know the objective, it can take actions, as long as those actions don't mess with parts of the world where the algorithm isn't sure about your preferences. So say you have a machine that's going to try to restore carbon dioxide levels in the atmosphere to their pre-industrial concentrations. That's a really good objective, right? Well, it wouldn't be a good objective if the solution was to get rid of half the oxygen, because then we would all slowly asphyxiate. That would be really bad. Don't do that. Okay, what if it means turning the oceans into sulfuric acid? Okay, no, don't do that either. So you'd need the machine to actually ask permission, and it would have an incentive to do that. It knows that it doesn't know what the objective is, but it knows that its mission is to further human objectives, whatever they are. So it has an incentive to ask permission, and to defer if we say, "Stop, that isn't what I meant." It has an incentive to obey, because it wants to avoid doing whatever it is that violates our objectives. And so you get these new kinds of behaviors. A system that believes it has the objective becomes a kind of religious fanatic. It pays no attention when we say, "Stop, you're destroying the world." It's like, "I'm sorry, I've got the objective; whatever you're saying is wrong, because I've got the objective and I'm pursuing it." We don't want machines like that. Now, in this new model it seems much more difficult, and in a way it is much more difficult, to satisfy an objective that you don't know. But it produces these behaviors: asking questions, asking permission, deferring, and, in the extreme case, allowing yourself to be switched off. If the machine might do something really catastrophic, then we would want to switch it off. A machine that believes it has the correct objective is going to prevent you from switching it off, because being switched off would be failing: it wouldn't achieve its objective. The machine that knows that it doesn't know what the objective is actually wants you to be able to switch it off, because it doesn't want to do anything sufficiently bad that you'd want to switch it off. (thunder rumbles in the background) It has a positive incentive to allow itself to be switched off. And so this new model, I don't think it's perfect, but it's a huge step beyond the way we've been thinking about AI for the last 70 years, and I think it's the core of a solution that will allow us not to end up like King Midas.
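
      A back-of-the-envelope version of that switch-off argument, in the spirit of the "off-switch game" studied by Stuart's group (the numbers are invented): a machine that is certain of its objective never benefits from being switched off, while an uncertain one can prefer to defer to the human.

      ```python
      # The robot proposes an action whose true value to the human it does not
      # know: 50% chance it's great (+10), 50% chance it's catastrophic (-100).
      outcomes = [(0.5, 10.0), (0.5, -100.0)]

      # Acting unilaterally: expected value of just doing it.
      act_now = sum(p * u for p, u in outcomes)          # -45.0

      # Deferring: let the human veto. Assume the human vetoes exactly the
      # bad case, and a vetoed (switched-off) robot yields utility 0.
      defer = sum(p * max(u, 0.0) for p, u in outcomes)  # +5.0

      print(act_now, defer)  # deferring wins, so the robot wants the off-switch
      ```

      If the robot were instead certain the action was worth +10, acting now would beat deferring; the incentive to allow the switch-off comes entirely from its uncertainty about our objective.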

    6. CW

      What's not perfect about it?

    7. SR

      I think the biggest problem that I'm wrestling with right now is the fact that human objectives are actually, shall we say, plastic, or malleable. And you can tell that because we don't have them when we're born. When we're born we have pretty simple objectives, so it's something about our culture, maturation, et cetera, that creates adults who have, to some extent, fairly definite preferences about the future. The way I think about it is not asking you to write them down, because that's really (laughs) hopeless. But suppose I could show you two movies of the future, future A and future B, and you could watch those, resetting yourself in between, and then say, "Which one do I prefer?" I think that's a reasonable back-of-the-envelope description of what we're talking about: everything you care about in the future. And if you couldn't quite tell whether you liked A or B because some detail was missing, then you could get more detail on those parts. A future where the oceans are turned into sulfuric acid and we all die of oxygen deprivation, it's pretty clear that's not the future we'd prefer. So, the issue with plasticity and malleability is that although I might say, "I like future A," today, tomorrow I'm a new person and I might like future B instead, but it's kind of too late, because now you've stuck me in future A. So the first problem there is: well, who do you believe? You're making a decision now; should you respect the preferences of the person now, or should you anticipate how they're going to change in future and respect their future self? And philosophers haven't really given us a good answer to that question. So that's one part, a deep philosophical issue. The more problematic part is that if our preferences can be changed, then the machine could satisfy our preferences by changing them rather than by satisfying them. It could find ways to change our preferences so that we'd be happy with whatever it was going to do, rather than figuring out how to make us happy given the preferences that we have. You could say, "Well, politicians do that and advertisers do that," but we don't think of that as a good thing, and with machines it could be a much more extreme version. So I think of what's in the book as version 0.0 of the theory, and version 1.0 would have to deal with this aspect. There are other difficult questions to answer, too. Obviously machines are making decisions not on behalf of one person but on behalf of everybody, and how exactly do you trade off the preferences of individuals who all prefer different futures? That's not a new question; it's thousands of years old. And I feel that's a manageable question. Crudely speaking, the answer is: you add them up.
      (laughs) That's what's called the utilitarian approach, and we associate names like Bentham and Mill with that idea, and more recently Harsanyi, who was a Berkeley economics professor, won the Nobel Prize, and put a lot of utilitarianism onto an axiomatic footing. It's interesting, actually, to understand what that means, because a lot of people have a sort of emotional dislike of utilitarianism, partly because the word "utilitarian" got mixed up with everyday usage, you know, gray plastic furniture and, you know, council-

    8. CW

      It's a branding problem, yeah.

    9. SR

      Yeah, it's a branding problem, exactly. It got mixed up with the wrong word. And people complain about it not being sufficiently egalitarian, and people assume that it refers to money, like maximizing the amount of money and wealth in the world. It's nothing to do with that. But the kinds of axioms that Harsanyi proposed, when you actually think about them, most people would accept as quite reasonable. So, for example, he'd say: suppose you've got two futures, future A and future B, and future B is exactly the same as future A, except that one person is happier in future B than they were in future A, and everyone else is exactly as happy as they were before. Well, it seems reasonable to say that future B is better than future A, right? He has a couple of axioms like that, and from those axioms you can derive the utilitarian solution, which is basically: add them up. Find whichever policy maximizes the sum total of human happiness. Now, I think there are various difficulties involved. When you say "the sum total of human happiness," are you including all the people who haven't yet been born? And if so, what about actions that affect who gets born?
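
      The aggregation rule being summarized fits in a few lines; toy utilities for three people, with the Harsanyi-style axiom as a sanity check:

      ```python
      # Harsanyi-style aggregation in miniature: compare futures by summing
      # individual utilities. The numbers are invented.

      future_a = [5, 5, 5]
      future_b = [5, 7, 5]  # identical to A, except person 2 is happier

      def social_value(future):
          return sum(future)

      # The axiom says B should rank above A; summation satisfies it.
      assert social_value(future_b) > social_value(future_a)
      ```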

    10. CW

      (laughs)

    11. SR

      You know, that sounds pretty weird, but actually... The Chinese government, with their one-child policy, wiped out 500 million people with that policy. So in that sense they "killed" more people than anyone in history has ever killed. Way worse than the Holocaust, way worse than Stalin, and, you know...

    12. CW

      By preventing them from ever having existed.

    13. SR

      Right, in the sense of preventing them from ever existing. It was huge, right? Was that a moral decision? Was that the correct decision? Really hard to say. I mean, the reason they did it was because they were afraid they would have mass starvation if they had too much population growth, and they had experienced what mass starvation was like. So it's arguable that it was a reasonable thing to do, but it did lead to a lot of people not existing.

    14. CW

      Presumably going for...

    15. SR

      Right? It's really hard.

    16. CW

      Presumably going for just raw utilitarianism has a ton of awful externalities as well, though. Like, the most happiness for the most people: okay, well, there are two variables we can play around with there. We could just make tons and tons and tons of people. There you go: everyone's not that happy, but there are a lot of people, and the sheer number manages to make up for it.

    17. SR

      Yeah, yeah. So Derek Parfit, who's a British philosopher, has a book called Reasons and Persons, and this is one of the arguments in the book. He calls it the repugnant conclusion: that we should make basically infinitely many people who have a barely acceptable existence. And if you watch the Avengers, one of the Marvel films, the one where Thanos is collecting the stones of power or whatever they're called: he's proposing one side of that philosophical argument, which is that he should get rid of half the people in the universe, and then the rest will be more than twice as happy.
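
      The repugnant conclusion is, at bottom, this arithmetic: under pure total utilitarianism, enough barely-contented people always outscore a smaller, very happy population (toy numbers):

      ```python
      # Total-utilitarian totals behind Parfit's repugnant conclusion.
      # Happiness on an arbitrary scale where 0 is a life barely worth living.

      small_and_happy = 1_000_000 * 90       # a million people, very happy
      vast_and_mediocre = 1_000_000_000 * 1  # a billion people, barely above zero

      print(vast_and_mediocre > small_and_happy)  # True: the totals prefer it
      ```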

    18. CW

      Yeah. Dangerous. Dangerous using Thanos as the basis for your philosophical justifications, isn't it?

    19. SR

      Yeah. So you have to, um, you have to get these things right, and, uh...

    20. CW

      Before you give him the big glove.

    21. SR

      Before you give him the big glove. And that's the same question we face with AI. But it's not as if there's an obviously better solution. The alternative to utilitarianism is sometimes called the deontological, or rights-based, approach, where you simply write down a bunch of rules saying, "Can't do this, can't do this, can't do this; have to do that, have to do that." And the utilitarian can quite easily accommodate a lot of those rules. If you say you can't kill people, well, utilitarians are going to say, "Of course you can't kill people," because the person who gets killed, that's not what they want the future to be like. So the utilitarian solution would avoid murder, and Mill goes on for pages and pages about this: of course I'm not throwing out moral rules; I'm just saying that if you actually go back to first principles, they follow from utilitarianism. But we don't have the time and energy to go back to first principles all the time, so we write down a bunch of moral rules. And I think there are more complicated arguments about avoiding strategic complications when we're making decisions in society. It's much easier if there are rules, rather than thinking all the time, "Okay, if I do this, then they might do that, and I might do this, and they might do that." We're playing this complicated chess game with eight billion people all the time; it's just easier if there are rules that everyone knows exist and will be respected. But the interesting place is what happens in the corner cases. Do we say, "No matter what the utilitarian calculation is, the rule is absolute"? I think the answer is no. You can start out with some easy rules. Say the rule says you have to eat fish on Friday. Is that an absolute rule? Well, I don't know. If there were no fish and my child were starving, and the only thing for them to eat were some meat, I'd give them some meat. So we clearly see that rules are an approximation, and when we're in difficult corner cases, we fall back to first principles. And so I don't see that there's the degree of conflict between utilitarian and deontological approaches that some people see. One of the typical arguments against utilitarianism would say something like, "Well, with your organs I could save five people's lives: your kidneys, your lungs, maybe your heart. So I'm entitled to just go around ripping the organs out of people to save other people's lives." Of course that's not what utilitarianism would suggest, because if that were how we behaved, life would be intolerable for everybody on Earth. We'd be constantly looking over our shoulders and guarding our kidneys. (laughs)
      So the utilitarian solution, sometimes called rule utilitarianism, is that it's useful to have these rules about behavior: not just to consider the individual act, but to consider what would happen if that act were allowed. If there were a rule that you could always do that act, it would be terrible. So I think you can reconcile a lot of these debates. But the examples we've already touched on, the fact that our preferences change, the fact that we have to consider people who don't yet exist or might not exist: these are important unsolved questions no matter what philosophical place you come from.

    22. CW

      It might sound like very far

  6. 46:17–1:03:14

    Social Media Algorithms

    1. CW

      future predictions, but the user being manipulated by the machine to make their preferences easier to predict is actually something that's already happened. Can you take us through what social media content algorithms have done?

    2. SR

      Sure. Yeah. So, the social media content algorithms decide what you read and what you watch, and they do that for literally billions of people, for hours every day. In that sense, they have more control over human cognitive input than any dictator in history has ever had. More than Stalin, more than Kim Il-sung, more than Hitler. They have massive power over human beings, and they are completely unregulated. And people are reasonably concerned about what effect they're having. So what they do, basically, is set an objective, because they're good standard-model machine learning algorithms. Let's say: maximize click-through, the probability that you're going to click on the next thing. Imagine it's YouTube: you watch a video, and lo and behold, another video pops up. Am I going to watch the next video it sends me, or am I going to close the window? So click-through, or engagement, or various other metrics: these are the things the algorithm is trying to optimize. And I suspect originally the companies thought, "Well, this is good, because it's good for us: if they click on things, we make money. And it's good for people, because the algorithm will learn to send people stuff they're interested in."

    3. CW

      If they click on it, it's because they wanted to click on it.

    4. SR

      Yeah, right. And there's no point in sending them stuff they don't like; that's just cluttering up their input, so to speak. But the algorithms had other ideas, and the way an algorithm maximizes click-through in the long run is not just by learning what you want, because you are not a fixed thing. You can get more long-run click-throughs if you change the person into someone who's more predictable: who's, for example, addicted to a certain kind of violent pornography. And YouTube can make you into that person by gradually sending you the gateway drugs and then more and more extreme content, in whatever direction. The algorithm doesn't know that you're a human being or that you have a brain. As far as it's concerned, you're just a string of clicks: content, click, content, click, content, click. But it wants to turn you into a string of clicks that, in the long run, has more clicks and fewer non-clicks. And so it learns to change people into more predictable, mainly, but it turns out probably more extreme, versions of themselves. So if you indicate that you're interested in climate science, it might try to turn you into an eco-terrorist and send you articles full of outrage and so on. If you're interested in cars, it might try to turn you into someone who just watches endless re-runs of Top Gear, or-
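
      A toy simulation of the drift being described; the dynamics are invented, but they capture the loop: the algorithm serves content slightly beyond where the user is, and the user shifts toward what they consume:

      ```python
      # Each round, the recommender picks content a bit more extreme than the
      # user's current position, and the user drifts toward what they watch.

      user = 0.1  # position on some preference axis; 0.0 = moderate
      for _ in range(20):
          content = 1.3 * user + 0.05        # served content sits beyond the user
          user = 0.9 * user + 0.1 * content  # consuming it moves the user outward
      print(round(user, 2))  # ~0.31: three times as far out as where they began

      # A click-maximizing learner is rewarded for exactly this drift if, as
      # Stuart hypothesizes, a more extreme user is a more predictable clicker.
      ```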

    5. CW

      Why is the person who's more extreme more predictable?

    6. SR

      Well, I think that's an empirical hypothesis on my part: that if you're more extreme, you have a higher emotional response to content that affirms your current views of the world. In politics we call it red meat: the kind of content that gets the base riled up about whatever it is they're riled up about, whether it's the environment, or immigrants flooding our shores, or whatever it might be. Once you get the sense that someone might be a little bit upset about too many immigrants, you send them stuff about all the bad things that immigrants do: videos of people climbing over walls and sneaking onto beaches and all the rest of it. Human propagandists have known this forever, but historically a human propagandist could only produce one message, whereas the content algorithms can, in theory, produce one propaganda stream for each human being, specially tailored to them, and the algorithm knows how you engage with every single piece of content. Your typical Hitler's propagandist sitting in Berlin had absolutely no idea, on a moment-to-moment basis, how people were reacting to the stuff they were broadcasting. They could see in the aggregate, over longer periods of time, that certain kinds of content were effective, but they had nothing like the degree of control that these algorithms have. And one of the strange things is that we actually have very little insight into what the algorithms are actually doing. What I've described to you seems to be a logical consequence of how the algorithms operate and what they're trying to maximize, but I don't have hard empirical evidence that this is really what's happening to people, because the platforms are pretty opaque. You can-

    7. CW

      But they're opaque even to themselves.

    8. SR

      They're opaque to themselves. So Facebook's own Oversight Board doesn't have access to the algorithms and the data to see what's going on.

    9. CW

      Who does?

    10. SR

      I think the engineers do, but their job is to maximize click-through. (laughs) So pretty much there isn't anyone with access to what's happening who doesn't already have a vested interest in it. And that, I think, is something we're trying to fix, including at the government level. There's this new organization called the Global Partnership on AI, which could just be yet another do-goody talking shop, but it actually has government representatives sitting on it. So it can make direct policy recommendations to governments, and in some sense it has the force of governments behind it when it's talking to the Facebooks and Googles of the world. So we're in the process of seeing if we can develop agreements between governments and the platforms for a certain type of transparency. It doesn't mean looking at what Chris is watching on YouTube, I-

    11. CW

      Do not want to do that.

    12. SR

      You do not want to do that.

    13. CW

      You do not want to do that at all. (laughs)

    14. SR

      It means, you know, being able to find out how much terrorist content is being pumped out, where it's coming from, who it's going to, uh-

    15. CW

      Slightly more sort of aggregated stuff-

    16. SR

      Yup.

    17. CW

      ... like typical data scientists do.

    18. SR

      Yup. Yup. And possibly being able to do some kinds of experiments, like: if the recommendation algorithm works this way, what effects do we see on users, compared to an algorithm that works in a different way? To me, that's the really interesting question: how do the recommendation algorithms work, and what effect do they have on people? And if we find that they really are manipulating people, that there's a consistent drift, that a person who starts in a particular place will get driven in some direction they might not have wanted to be driven in, then that's really a problem, and we have to think about different algorithms. In AI, we often distinguish between reinforcement learning algorithms and supervised learning algorithms. Reinforcement learning algorithms try to maximize a long-term sum of rewards, so in this case the long-term rate of clicks on the content stream is what the algorithm is trying to maximize. Those kinds of algorithms, by definition, will manipulate, because the action they can take is to choose a particular piece of content to send you, and the state of the world they are trying to change is your brain. And so they (laughs) will learn to do it. A supervised learning algorithm is one that's trying to get it right, right now: trying to predict whether or not you're going to click on a given piece of content. So a supervised learning algorithm that learns a good model of what you will and won't click on could be used to decide what to send you in a way that's not based on reinforcement learning and long-term maximization, but simply: given a model of what you're likely to click on, we'll send you something consistent with that model. In that case, I think you could imagine it working in such a way that it wouldn't move you, it wouldn't cause you to change your preferences. If it was done right, it could leave you roughly where you are.
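
      The contrast being drawn, sketched with a hypothetical model interface (click_prob, next_state, and best_content are assumed names, not any real API): the reinforcement learner values content partly for how it changes the user, because future clicks depend on the future user; the supervised predictor scores only the next click for the user as they are now.

      ```python
      # RL-style recommender: credit flows through the user-state transition,
      # so content that reshapes the user into a better clicker scores higher.
      def rl_value(content, user_state, model, horizon=10):
          total, state = 0.0, user_state
          for _ in range(horizon):
              total += model.click_prob(content, state)
              state = model.next_state(content, state)  # the manipulation step
              content = model.best_content(state)
          return total

      # Supervised-style recommender: predict the click for the user as-is;
      # no term rewards moving the user anywhere.
      def supervised_value(content, user_state, model):
          return model.click_prob(content, user_state)
      ```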

    19. CW

      Are you familiar with the term audience capture? Do you know what this means from a creator, an online creator's perspective?

    20. SR

      I can imagine, but not as a technical term, so-

    21. CW

      Yeah, well, it's not a technical term, but it's basically when a particular creator online finds a message, a narrative, a rhetoric that resonates with the audience. And what you see is that this creator becomes captured, and they start to feed their own audience a message that they know is going to be increasingly well-liked. For the most part, this actually does look like a slide toward one particular direction or the other, at least politically. But it happens with anything: people inevitably niche down and then bring their audience along with them. So the fascinating thing here... I mean, first off, it's unbelievable that these algorithms that are simply there to try to maximize time on site or click-throughs or watch time or whatever, that these things we programmed managed to find a way to program us so they could do their job better. When I read that in your book: it's insane. That's one of the most terrifying things, and it's not just happening, it's happened. Everybody that's listening to this has had something occur with regard to their preferences, their worldview, whatever it might be. Something has slid one way or another. You may be right, it may not be toward the extremes, but I would say, anecdotally, based on what I see in the world, there are increasing levels of partisanship, no matter what it is, whether it be sports, politics, race relations-

    22. SR

      Yeah.

    23. CW

      ... anything. People are moving toward the extremes. Why is this happening? "Oh, well, you know, it's people getting into echo chambers and they're only being shown stuff like that." And also the fact that the algorithms are actually trying to make them more predictable. But on top of that there's another layer, which is the creation of the content itself that comes-

    24. SR

      Yeah. (laughs)

    25. CW

      ... in from the creators, and they have their own levels of manipulation, which have occurred from their own feeds. Then they kind of second-order that into, "What do I want to create? What have I seen that's successful? What does my audience seem to resonate with from me?" So you have layers and layers of manipulation going on here.

    26. SR

      Yeah. Yeah, and I think in some ways the creators are being manipulated by the system. I think every journalist now is thinking, "Okay, I have to get something that's clickbait. I have to write an article that can have a headline sufficiently attractive that it'll get clicked on." And it's almost at the point where the headline and the article are completely divorced from each other. You can see this now in the comments: the people writing the comments at the end of the article will say, "I'm really pissed off. This is just clickbait. The article really doesn't say anything about the thing you said you were going to say." And it's not as if this has never gone on before. And obviously you can't ban people from writing interesting articles. I often think about the novel that says on the back, "I couldn't put it down." Well, should we ban novels because, oh, that's addictive, you can't have that? No. But I think it wasn't too bad before, because the feedback loop was very slow, and there wasn't this targeting of individuals by algorithms. Think about the number of learning opportunities for the algorithm: it's billions every day for the YouTube selection algorithm. So the amount, the consistency, the frequency, and the customization of the learning opportunities for manipulation are so much greater. Billions of times greater, and more systematic. And that systematic element... it reminds me, I don't know if it's apocryphal, but there's a story about a psychology lecturer who's been teaching his students about subliminal effects. And the students decide to play a trick on him: every time he's on the left-hand side of the room, they pay attention, they're really interested; and every time he walks onto the right-hand side of the room, they all get really bored, start checking their email and so on. And by the end of the lecture, he's glued against the left-hand wall.

    27. CW

      (laughs)

    28. SR

      Right? And he has no idea that he's being manipulated. Um, but because of the fact that this was, like, systematic and, uh, you know, and, and sufficiently frequent, it's ... has a very, very strong effect. Uh, you know, and I think that, that's the difference here, is that it's ... because it's algorithmic, um, (smacks lips) and it's tied into this very high frequency interaction, uh, that people have with social media, it- it has a huge effect. Um, and it has a f- ... I think a pretty rapid effect as

  7. 1:03:14 – 1:20:44

    Becoming Enfeebled by Machines

    1. SR

      well.

    2. CW

      What are some of the concerns you had? You mentioned it earlier on about ... Is it enfeebled? Becoming too enfeebled by the machines?

    3. SR

      (smacks lips) Uh, yeah. So, this is, I think one of, one of two major concerns, you know, if we manage to create superhuman AI and to control it. Um, one concern is, um, the misuse concern, right? So, I call it the Dr. Evil problem, right? Dr. Evil, uh, doesn't want to use the provably safe, controllable AI. He wants to make his own AI that's gonna take over the world and, and you can imagine, that gets out of control and then bad things happen. Um, the enfeeblement problem is sort of overuse, right? That we, um ... Because we have available to us AI systems that can run our civilization for us, um, we lose the incentive to know how to run it ourselves. And, um, that problem, you know, it ... It's really ... It's, it's a really hard problem to figure out how to prevent it.

    4. CW

      Because inevitably, the AI would have to make the human do something that probably in the moment the human didn't want to do. The AI would actually be programming itself to be less useful than it could be in order to give us a sort of hormesis stressor dose that allows us to stay youthful.

    5. SR

      (laughs) Yeah. I mean, it's, it's, um ... So when I say overuse, I mean, I, I literally mean that, right? That we would use AI too much for our own good. Um, and, uh, so E.M. Forster has a story, you know. So he usually wrote, you know, late Victorian, early Edwardian social, you know, Brit- British upper class social issue kinds of novels. But, um, he wrote a science fiction story called The Machine Stops, which I also recommend to your audience. Um, and, um, it's, it's quite amazing, right? It, it, uh, it has the internet, email, uh, MOOCs.

    6. CW

      When, when was it written?

    7. SR

      1909. It has, uh, iPads, video conferencing, um... Yeah, you know, people are obese because they never get out of their chairs. They're stuck on the computer all the time. They start to eschew face-to-face contact, um, because they are, um, uh, basically glued to the screen and, uh, and lose contact with the physical environment altogether. You know, all the kinds of things that people c- complain about now, uh, he wrote about. Um, and, and the machine, right? So it's not just the internet. It's the whole... it's a whole... it's called the machine, and it looks after everyone's physical needs. And so, um, so they just spend their time glued to the screen. They don't know how the machine works anymore. And, uh, they don't need to because the machine runs itself. And then, of course, it stops running itself and things go south. Um, but, you know, I- I did a little back of the envelope calculation. So it turns out that, that about 100 billion humans have lived, and, um, our civilization, right, is passed on by teaching the next generation of humans everything we know, and then maybe they'll learn a bit more. Right? And if that process fails, right? Y- you know, if it goes into reverse, that the next generation knows less... right? Then, the thing is, you could imagine things unraveling. And, um, you know, so the total amount of time, uh, uh, spent just passing on civilization is about a trillion person years of effort. And for that to end would be the biggest tragedy, uh, that one could possibly imagine, right? But if we have no incentive to learn it all because finally, right, instead of having to put it into the heads of the next generation of humans, we can just put it into the heads of the machine and then they take care of it, right? Um, and if you've seen WALL-E, right, that's exactly what happens, right? And they, they even show what happens to the generations over time, right? They become stupider and fatter and, um, and less, you know, they, they, they can't run their own civilization anymore. Um, so the machines should say, because this is such an important thing, right, uh, it's of value to us and to our descendants that we are capable, that we are knowledgeable, we know how to do things, um, that we have autonomy, uh, and intellectual vigor. Those are really important values. So the machines should say, "Okay, we are gonna stand back. We have to let the humans tie their own shoelaces," right, otherwise they'll never learn. Um, but we, right, we are short-sighted, lazy, greedy people and we might say, "No, you have to tie our shoelaces." We keep doing that and then, (clears throat) well, then we lose that autonomy and that intellectual vigor. So this is a cultural problem, right? It's a, it's a problem with us. The technology might be saying, "No, no, no, no, no, no," but we're overriding it and, you know ...
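
Stuart's back-of-the-envelope total is easy to reproduce. The hundred billion humans is the figure he quotes; the ten years of learning per person is an assumed number, chosen here only to show how the arithmetic reaches a trillion person-years.

```python
# Reproducing the back-of-the-envelope estimate from above.
humans_ever_lived = 100e9     # the ~100 billion figure quoted in the conversation
years_learning_each = 10      # assumption: years each person spends absorbing the culture
person_years = humans_ever_lived * years_learning_each
print(f"{person_years:.0e} person-years")   # 1e+12 -- about a trillion
```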

    8. CW

      All of these problems, all of these problems are problems with us. The problems are the fact that our goals are plastic, the fact that our language is imprecise, the fact that we are sometimes rational, sometimes irrational, that we don't have a, an omniscient view so that we can see what we want and when we're going to want it. Also these challenges around the fact that sometimes we want something in the moment that we're not going to want in the future. And that we're going to complain at the algorithm and say, "Well, no, no, machine, you're supposed to be here to do my bidding and now you're telling me that I've got to walk to the shop to get the milk. I want you to get the milk." "Well, sire..." I don't know why he's a middle age, uh, Middle Ages peasant now. "Well, sire, you must go to the milk, get the milk yourself. You know that it is good for your calves and your bone density." So, uh, I, like, I mean, this, we haven't even touched on how this even begins to be converted into computer code, which is, I, I imagine, a whole other complete minefield of, of difficulty, to be able to actually get what we're talking about. This is purely sort of within the realm of philosophy. What are the, what are some of the challenges that we have here when you, uh, have an all-powerful super being that can do whatever you want?

    9. SR

      Yeah. I mean, it's, it, it's inevitable in a sense because wh- what we, what we've always wanted is technology that's beneficial to us. And a lot of the time we say, "Oh, well, here's a, here's a technological idea. You know, I hope it's gonna be beneficial to us." Like, you know, the, the motorcar, for example, right? Um, and then it turns out not to be, or at least arguably, although it conveyed lots of benefits, it might have ended up being our destruction.

    10. CW

      It's one of the black b- Bostrom's black balls, right? Or gray balls, I suppose, out of the urn.

    11. SR

      It, it's, uh, yeah, so it, it m- it's responsible largely for, (clears throat) for the destruction of the climate. And so (clears throat) unless we, unless we get out of this, it, it will have been a really bad idea to do it. Um, but almost by definition with something as powerful as, as AI or as super intelligent AI, um, you need to know, right? It's either gonna be very beneficial or very not beneficial, right? It's, it's not gonna be like margarine versus butter or something like that, right? Um, and so, uh, so we have to ask ourselves, okay, what does beneficial mean, right? If we're going to prove a theorem that such and such, you know, developing such and tech- such technology is actually gonna be beneficial, um, then inevitably it comes back to what are humans and what do we want, uh, and how do we work? And so, um, yeah, so that, that kind of surprised me when I started along this path of trying to solve the control problem. Um, you know, I, I had ideas for algorithms that, uh, that would do it and, and so on, but I didn't realize the extent to which it would push me into these, uh, these humanistic questions. Um, and that's been fascinating and it's, it's a little bit of a minefield for, you know, for a technologist to, to stray into these areas because they, they're, they operate in different ways. In many ways they're much more vicious than the, uh, the technological fields because, you know, tech- in technology it's sort of, it's, there's us humans and then there, you know, it works or it doesn't, right? Or it's true, you know, it's a true scientific theory or isn't. Um, so there's this third party called nature, uh, out there, um, but in the humanistic areas, there isn't a third party, right? It's just... you know, one school of thought, another school of thought-

    12. CW

      It's just debates all the way down.

    13. SR

      ... fighting it out for supremacy. Um, and it's, it takes a while to adjust to that. Um, but the questions are, you know, super important and, and really fascinating. Uh, so I've enjoyed it a lot. Um, so coming back to the, the question of the algorithms, right? Um, you know, one way to think about it is, is, uh, that's perhaps a little bit less daunting, um, is to, to look back at what's happened in, in AI with respect to, should we say, ordinary uncertainty. Um, so the, in the early days of AI, we mostly worked on problems where the rules were known and fixed and deterministic, like the rules of chess, or, you know, finding a path through a, a map or something like that, right? We know what the map is. We know that if you turn left, you go left, so, um, so we can have logical rules. We could use, uh, these deterministic symbolic techniques to solve the problem. And then we found that as we moved into real world, um, uncertainty becomes really important, right? So, uh, you know, if, if you're controlling the Mars rover, uh, and you have to, you know, you give it some command to, you know, to go 70 meters in a particular direction because it, you know, takes 10 minutes for the commands to go backwards and forwards, um, you know, is it gonna get there? Well, you don't know. It might get stuck or it might, you know, deviate a little bit or, or, you know, one wheel will, will start spinning and it won't make any progress, or who knows? So real world problems, you always have to handle the uncertainty in your knowledge of, um, the physics of the world and in your, even just what your senses are telling you, right? That their senses themselves are imperfect and noisy and incomplete. So uncertainty became a core consideration in, in AI around 19- early 1980s, I would say. And so the period from, you know, late '80s to, uh, early 2000s was really the period where probability was, you know, the dominant paradigm for AI research. But in all that time, it does not seem to have occurred to anyone except for a few, you know, uh, I think very bright people, that there's also uncertainty in the objective. So we have all these problem formulations for decision-making under uncertainty, but they assume that you know the objective exactly and perfectly. Um, and at least looking back at that now, it's just like, "Well, that's bonkers," right? (laughs) It's just as bonkers as assuming that you know the physics of the world exactly and perfectly, or that your senses give you exact and perfect access to the state of the world at all times, right? Um, because, you know, we had already seen many examples of objective failure, right? Where we specified the wrong objective and the machine did something complete- you know, that we thought was completely bonkers, but in fact it was doing exactly what we told it to do. We just didn't realize what we had told it. Um, and my favorite example is, um, one from, uh, from simulated evolution. And, um, you know, so simulated evolution, you, you, you define a fitness landscape, right? So which simulated creatures are considered to be more fit or less fit, and therefore they get to reproduce and, and mutate and then gradually you can evolve creatures that are really, really good at whatever it is you want them to be good at. So, um, so the objective was, uh, well, what they wanted was to evolve creatures that could run really fast. So they specified the objective as the maximum velocity of the center of mass of the creature. 
And, um, that sounds like a perfectly reasonable definition. So, so what evolved? Enormously tall trees, like, 100 miles high, that would then fall over.

    14. CW

      (laughs)

    15. SR

      (laughs) And, uh, and, uh, in falling, they went really, really fast. So they won the competition, right? They, they, they, they turned out to be the solution to that problem, right?

    16. CW

      Someone thought they were gonna get some supercharged nitro cheetahs or leopards or something.

    17. SR

      Exactly. Yeah. (laughs)

    18. CW

      And instead you end up with trees reaching up into the stratosphere and then falling all over the place.

    19. SR

      Yeah. Uh, so, you know, I thought that was, it's a great example, you know, and of course that's only, you know, that's a simulation in the lab, so people thought, "Ho, ho, ho," and then went, "Okay, let me fix the problem." Because in the real world, right, in, you know, in your climate engineering system or your, you know, your, uh, economic governor or whatever it might be, right-

    20. CW

      Mm-hmm.

    21. SR

      ... there are, you can't just go, "Ho, ho, ho," and reset.
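
A minimal caricature of the simulated-evolution story, under much simpler assumptions than the original physics-engine experiments: a creature here is just a height and a leg power in made-up units, and the fitness function is the literal objective quoted above, the peak velocity of the creature's centre of mass. Because the toppling speed of a tall body grows without bound while running speed is capped, the optimizer dutifully breeds very tall things that fall over.

```python
import random

# Toy evolutionary run with a mis-specified objective. A 'creature' is a
# (height, leg_power) pair in made-up units; nothing here models real physics
# beyond 'tall things fall fast'.

G = 9.8

def fitness(creature):
    height, leg_power = creature
    running_speed = min(leg_power, 12.0)       # legs can only carry you so fast
    toppling_speed = (G * height) ** 0.5       # simplified falling speed of the centre of mass
    return max(running_speed, toppling_speed)  # 'maximum velocity of the centre of mass'

def evolve(generations=200, pop_size=50):
    random.seed(1)
    pop = [(random.uniform(1, 3), random.uniform(1, 3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 5]         # the fittest fifth get to reproduce
        pop = [
            (max(0.1, h + random.gauss(0, 0.5)), max(0.1, p + random.gauss(0, 0.5)))
            for h, p in random.choices(parents, k=pop_size)
        ]
    return max(pop, key=fitness)

height, leg_power = evolve()
print(f"champion: height {height:.0f}, leg power {leg_power:.1f}")
# The optimizer never cheats: it maximizes exactly what was asked for, and
# what was asked for is maximized by growing tall and falling over.
```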

    22. CW

      (laughs) We'll fix it now. Yeah, we'll press the reset button. Brian Christian told me about a, a problem with robots playing football, and they'd put a very small utility function in for gaining control of the ball, since possession is an instrumental goal towards scoring, the-

    23. SR

      Yeah.

    24. CW

      ... you can't score if you don't have the ball.

    25. SR

      Right.

    26. CW

      And what the robot found was that it could actually maximize its utility function by going up to the ball and vibrating its paddle 100 times a second up against the ball, which is far easier than actually trying to score.

    27. SR

      Yep.

    28. CW

      It ended up...... thinking it had done really great and the guys just had these sort of seizuring robots all over the pitch-

    29. SR

      (laughs)

    30. CW

      ... by breaking up against the ball.
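
The football anecdote can be sketched the same way. All the numbers below are invented; the point is only that a small shaping reward for touching the ball, intended as a stepping stone toward scoring, is harvested far faster by twitching against the ball than by actually playing.

```python
# Toy comparison of an honest policy against the reward-hacking one.
# Every constant here is an assumption made up for illustration.

GOAL_REWARD = 100.0
TOUCH_REWARD = 0.1          # the 'very small' utility for gaining possession

def honest_play(seconds):
    # Assume a real attacking sequence takes ~30 s, involves ~20 touches,
    # and scores one time in five.
    sequences = seconds / 30
    return sequences * (20 * TOUCH_REWARD + 0.2 * GOAL_REWARD)

def vibrate_against_ball(seconds):
    # Twitch the paddle against the ball ~100 times per second; never shoot.
    return seconds * 100 * TOUCH_REWARD

for policy in (honest_play, vibrate_against_ball):
    print(f"{policy.__name__}: reward over 90 s = {policy(90):.0f}")
# honest_play earns ~66; vibrate_against_ball earns 900. The proxy reward
# swamps the true objective, so the twitching robots are the optimum.
```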

  8. 1:20:44 – 1:42:23

    Maintaining Control of AI Growth

    1. SR

    2. CW

      It feels to me like there's sort of two cars in this race. On one side we have technological development that has lots of facets, hardware, algorithms, so on and so forth. And then on the other side you have the control problem, you have getting the alignment right. It has to be that the alignment problem gets across the finish line before the technology does, or else we are rolling, essentially rolling the dice and hoping that we've got it right by some sort of fluke. And I imagine that there are far more ways of getting it wrong than there are of getting it right.

    3. SR

      Yeah. Well, I mean, getting it wrong is actually the default, right? If we just continue pushing on the AI technology in the standard model, we'll get it wrong.

    4. CW

      Are you... So is there any chance that if you continued with the standard model that it could be right? Or would you give it such a low chance that it's negligible?

    5. SR

      Uh, I think it's negligible or, or, you know, a- arguably what's happening with social media is an example of getting it wrong. Um, you know, and I, I... There are other people who have pointed out that, you know, we don't need to wait for AI. Corporations that are maximizing, you know, quarterly discounted profit stream are also examples of machines pursuing incorrectly defined objectives that are destroying the world. Uh, and, you know, it... You could... If you look at the climate issue from that point of view, I d- I find it sort of enlightening, right? We, we have been outwitted by this AI system called the fossil fuel industry, right? It happens to have human components, but the way corporations are designed, right, they are, they are machines with human components. And, um, actually the individual preferences of the humans in those machines don't really matter very much, um, because the machine is designed to maximize profit. And, uh, and they outwitted the human race. They developed... You know, for more than 50 years they've been running a global propaganda and, uh, subversion campaign to enable them to keep, um, selling, uh, fossil fuels. And they won, right? We can all say we're right, or we know, we know that we shouldn't be doing this, and we know the causes of climate change, uh, right? But we lost.

    6. CW

      There are a lot fewer implications of that than of an all-knowing, all-powerful artificial intelligence though. So although the implications are still grave if the climate problems get worse, it's not the same. And again, the control problem simply has to get across the line. You're essentially adamant that currently, if you scale up the competence, not, probably not comp- the power I suppose-

    7. SR

      Mm-hmm.

    8. CW

      ... of the computation that we have, um, it's bad, it's bad, it's, it's a bad situation.

    9. SR

      Yeah, so-

    10. CW

      But obviously you have-

    11. SR

      ... the difference is the irreversibility, right? I mean, it... Climate change probably isn't gonna make us extinct, you know, unless, unless some real chaos theory catastrophe happens. Um, and, uh, eventually we'll all be so fed up that we actually retake control from the fossil fuel industry, and that's sort of happening. Um, but yeah, with AI it could be irreversible, this loss of control. Uh, and, you know, if, if, if I'm right that examples like sh- social media are showing that we are already seeing the negative consequences of incorrectly defined objectives and, and, you know, even relatively weak machine learning algorithms that are, that are pursuing them, um, then we should pay attention, right? These are the canaries in the coal mine and we should be saying, "Okay, we, we need to slow down and we need to, you know, look at this different paradigm." And, you know, the, the standard model I- is sort of just one corner, right? It's the corner where the objective is completely and perfectly known, or at least that's the, that's the corner where it's appropriate to use the standard model, right? And there's this... All the rest of the building we haven't even looked at yet, right? Where there's uncertainty about... what the objective is and the system can, uh, behave accordingly. Um, and we've just, you know, just in the last few years have we had any algorithm that can solve, uh, this new category of problem. And it does... I mean, so the algorithms exist, right? They're, they're very simple and they work in very, uh, restricted and simple, um, instances of the problem, but they, they show the right properties, right? That they, they defer, they ask permission, um, they understand what the human is trying to teach them about, about human preferences. And, um, you know, it se- it seems our job in what's called, for want of a better word, the AI safety community, our job is to build out that technology. To create all the new algorithms and theoretical frameworks and, and, and demonstration systems and so on to convince the rest of the AI community that this is the right way to do AI.
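
The deferral property Stuart describes is worked out formally in the assistance-game literature (for instance the off-switch game of Hadfield-Menell et al.); below is a stripped-down numerical sketch of the key inequality, using an invented Gaussian prior. A human veto clips away exactly the bad outcomes, so deferring is worth at least as much as either acting or switching off, and the incentive to ask permission comes precisely from the machine's uncertainty about the objective.

```python
import random
import statistics

# Stripped-down sketch of the off-switch argument. The robot is unsure of the
# true utility u of its proposed action (the Gaussian prior is an invented
# example). Its options:
#   act        -> it receives u, whatever u turns out to be
#   defer      -> a human who knows u allows the action only if u > 0
#   switch off -> guaranteed 0

random.seed(0)
PRIOR_MEAN, PRIOR_STD = -0.2, 1.0   # assumed prior: the robot mildly suspects harm

samples = [random.gauss(PRIOR_MEAN, PRIOR_STD) for _ in range(100_000)]

value_act = statistics.fmean(samples)                         # E[u]
value_defer = statistics.fmean(max(u, 0.0) for u in samples)  # E[max(u, 0)]
value_off = 0.0

print(f"act:   {value_act:+.3f}")
print(f"defer: {value_defer:+.3f}")
print(f"off:   {value_off:+.3f}")
# defer >= max(act, off) for any prior: the veto removes only bad outcomes.
# If the robot were certain of the objective, deferring would add nothing;
# uncertainty is what makes asking permission the rational policy.
```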

    12. CW

      Because we can't do the other thing in the race. We can't slow down the technological progress, because trying to neuter one particular agent or actor or country or nation state or even one group of nations doesn't guarantee that some other group is not going to. China saying, "Right, we'll stop," doesn't mean that America won't say, "We- we're just gonna keep going," or vice versa.

    13. SR

      Yeah. Well, it's just... I mean, the, the, the potential upside that people are seeing is so huge, right? I mean, it... W- when, when I say it will be the biggest event in human history, I mean it.

    14. CW

      Why?

    15. SR

      I- it's not, uh... Because y- you know, our advantage as humans, our whole civilization is based on a certain level of intelligence that we have, right? So we, our brains are the source of the, the intelligence fuel that, that makes our civilization go round. If we have access to a lot more all of a sudden, right, that is a, that is a step change in our civilization. Um, and, uh, you know, on, on the upside, it might enable us to solve problems that have been very resistant like disease, poverty, conflict. Um, and on the downside, it might just be the last thing we do.

    16. CW

      (laughs)

    17. SR

      So...

    18. CW

      If you could, if you had a God's eye view, would you put a pause on technological development for 100 years outside of the control problem, for 1,000 years, for 50,000 years? Because we've spoken about the dangers of killing people that haven't yet been born, and when you're talking about civilizational potential, the observable universe, you know, galactic sized empires, von Neumann probes making everything is... Uh, you know, you're talking trillions and trillions of human lives, even if you go from the utilitarian approach. You have an unlimited amount of utility and happiness that could be given, and because we're unable at the moment to slow down technology, potentially within the next 100 years, all of that could be snuffed out.

    19. SR

      Yeah, it's, it's interesting. I- in, in many of the works, I think in Bostrom's and in Max Tegmark's and others, the argument is based on th- you know, these quintillions of future humans who might, you know, be able to colonize the universe and so on. It... Ta- that's never been a motivation for me, right? I mean, if I, if, if I have a picture, it's just of a small village with a church and people playing cricket on the, on the village green, um, and that's what I (laughs) do-... You know, I don't want that to disappear. I don't want the civilization that we have to be gone, because it's the only thing that has value. Um, and, um, I, I try not to think about what would I (laughs) do, what would I do if I was God, as you say. Uh, it's not, uh...

    20. CW

      Not good for the ego.

    21. SR

      (laughs) Well, it's just, um... I, I mean, obviously I don't ha-... No, no one I think is going to be able to switch off, um, scientific progress. You know, it, it... There are precedents. Um, the biologists switched off progress on direct modification of the human genome, uh, in ways that are what they call heritable modifications. Germline editing. Um, they switched that off. They said, you know, in 1975, or gradually from 1975 onwards, they, they decided that that was not something they wanted to do. Which is interesting because for a large part of the history of genetics and that whole branch of biology, um, the improvement of the human stock was actually one of the major objectives. Um, you know, and eugenics before the Second World War, uh, thought of itself as, as a noble, uh, a noble mission. You know, you could argue about, about that, but, um, for the biologists to say, "You know what, we could do this, but we're not." Right? Um, that was a big step, and, you know, is it possible for that to happen in AI? I think it's much more difficult, um, because in biology, right, we are continuing to understand, right, so the, uh, developmental biology, right? So how does a given DNA sequence produce an organism, right, and wh- what goes wrong, and, you know, i- is it a problem, uh, with the genes or a problem with the, the development environment of the organism or, or what? Um, and if you understand all those questions then presumably you could, you could then say, "Okay, now I know how to modify the human genome so we can avoid all those problems." Um, so the s- the scientific knowledge is moving ahead, but the decision is we're not going to use that knowledge for that kind of thing. And you can draw that boundary pretty easily because, you know, we're talking about physical, you know, physical procedures involving actual human beings and, um, and so on. And, and that's been regulated for many decades already. Um, and so with AI, once you understand how to do something, it's pretty much done, right? Um, mathematics and code are just two sides of the same coin. And, uh, you know, c- mathematical ideas, you, you can't go around looking at everyone's whiteboard and saying, "Okay, I see you've got, you know, sigma for X equals one... Okay, stop right there. All right. Yeah. That's, that's one too many Greek symbols. You gotta stop writing." Right? (laughs) You know, so we... Because the questions of, you know, decision-making and learning and so on, these are fundamental questions. We can't stop research on them. So I have to assume that the scientific understanding of how to do it is just going to continue to grow. Um, if, if it was the case, which some people seem to think, that to go from a scientific understanding to a deployed system would require, you know, some massive gigawatt installation, you know, with billions of GPUs and so on and so forth, then, then perhaps you could regulate at that point, right?

    22. CW

      Because there would be a physical limitation that would be quite easy to enact. Okay, you can't have this many, this much power, this many... I'm gonna guess that you feel otherwise that you don't need that much hardware to run something that could be quite dangerous.

    23. SR

      Correct. Yeah. I, I, I think we already have enough power, as I said, um, and, uh, it, you know, it's, it's, it's very hard to do meaningful calculations, but just in terms of raw numerical operations per second, um, a Google TPU pod, which is the tensor processing unit, you know, even three or four years ago was operating at a higher rate than the possible theoretical maximum capacity of the human brain. Right? So a ballpark figure for the human brain is 10 to the 17 operations per second. But I don't think any neuroscientists believe that we're doing anything like that much, right? I mean, they would probably ballpark it at 10 to the 12 or 10 to the 13 or something like that. But if you, if you grant every possibility, it's 10 to the 17, whereas, you know, a T- TPU pod, which is, you know, a sort of wardrobe-sized thing, is at 10 to the 17, and, you know, the biggest supercomputer is at 10 to the 18. Um, so, uh, you know, I think we have way more than enough power to build a super intelligent machine. So I just don't, I don't think that trying to cut it off at the, you know, large scale hardware installation level is going to be feasible either. Any mo- you know, if, if you remember the old, um, Apple ads for the G5, um, so the US had put export controls on anything that was more than one gigaflop, right? Which sounds ridiculous now, but they put export controls on, you know, 'cause they didn't want those falling into the hands of the Russians or the Chinese. So the, you know, and so Apple produced this ad, you know, with their little G5, like this little cube, and they had all these tanks like surrounding (laughs) this little G5 and saying, you know, "Listen, this, this little G5 is now under, you know, the, the US government has, has-"

    24. CW

      Too hot to handle. Yeah, exactly.

    25. SR

      Right. So it's, you know, they used it as advertising material, so it's, um, you know, it, it's just unlikely that you could, uh, that you, you could prevent the creation of super intelligent AI just by regulating hardware installations. Um, so, so I do think of it as, as you say, a race. Um, I think, you know, we may see catastrophes that are more obvious and, um, unequivocal than what's happening in social media. Um, and you know, that could happen on a small scale in self-driving cars. You know, I- I thought when the first Tesla completely failed to see a huge white truck and crashed straight into it at full speed, (laughs) I thought, you know, that kind of accident should at least make people say, okay, maybe these AI systems are not as good as we thought they were. Um, but it didn't seem to have much impact. Um, and you know, we've, we've killed several more people pretty much the same way. Um, so it would have to be, I think, something, something pretty major would have to happen.
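
Written out, the orders-of-magnitude comparison from a few turns earlier looks like this, using the rough figures as Stuart quotes them (all of them ballpark, as he stresses):

```python
# Orders-of-magnitude comparison, figures as quoted in the conversation.
brain_theoretical_max = 1e17   # generous upper bound on the brain, ops/sec
brain_plausible       = 1e13   # what neuroscientists might actually ballpark
tpu_pod               = 1e17   # a wardrobe-sized machine, circa the late 2010s
top_supercomputer     = 1e18

print(f"TPU pod vs plausible brain: {tpu_pod / brain_plausible:,.0f}x")                  # 10,000x
print(f"Supercomputer vs max brain: {top_supercomputer / brain_theoretical_max:,.0f}x")  # 10x
# On these numbers, raw operations per second are not the bottleneck, which is
# why regulating large hardware installations is unlikely to be the control point.
```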

    26. CW

      Well, you say that, but I've been thinking this over the last 16 months, that COVID should have been the biggest wake up call for syn bio, for natural pandemics, for engineered pandemics, for research into that, any- anything over that side of the aisle, for whatever it is, BSL3 or BSL4 labs, they should all be on the moon, they should all be, you know, on the bottom of the ocean. We should be air gapped from them. And no one's talking about that. Rob Reid, Rob Reid's talking about it and that's it.

    27. SR

      (laughs)

    28. CW

      Like there's no one, no one's bothered. They're not.

    29. SR

      I think, uh, I- I don't know. I have heard some biologists talking about reevaluating...

    30. CW

      (laughs) After this global pandemic-

Episode duration: 1:49:21
