Lex Fridman Podcast

Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108

Sergey Levine is a professor at Berkeley and a world-class researcher in deep learning, reinforcement learning, robotics, and computer vision, including the development of algorithms for end-to-end training of neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, and deep RL algorithms. Support this podcast by signing up with these sponsors: - ExpressVPN at https://www.expressvpn.com/lexpod - Cash App - use code "LexPodcast" and download: - Cash App (App Store): https://apple.co/2sPrUHe - Cash App (Google Play): https://bit.ly/2MlvP5w EPISODE LINKS: Sergey's Twitter: https://twitter.com/svlevine Sergey's Website: http://rail.eecs.berkeley.edu/ Sergey's Papers: https://scholar.google.com/citations?user=8R35rCwAAAAJ PODCAST INFO: Podcast website: https://lexfridman.com/podcast Apple Podcasts: https://apple.co/2lwqZIr Spotify: https://spoti.fi/2nEwCF8 RSS: https://lexfridman.com/feed/podcast/ Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4 Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41 OUTLINE: 0:00 - Introduction 3:05 - State-of-the-art robots vs humans 16:13 - Robotics may help us understand intelligence 22:49 - End-to-end learning in robotics 27:01 - Canonical problem in robotics 31:44 - Commonsense reasoning in robotics 34:41 - Can we solve robotics through learning? 44:55 - What is reinforcement learning? 1:06:36 - Tesla Autopilot 1:08:15 - Simulation in reinforcement learning 1:13:46 - Can we learn gravity from data? 
1:16:03 - Self-play 1:17:39 - Reward functions 1:27:01 - Bitter lesson by Rich Sutton 1:32:13 - Advice for students interested in AI 1:33:55 - Meaning of life CONNECT: - Subscribe to this YouTube channel - Twitter: https://twitter.com/lexfridman - LinkedIn: https://www.linkedin.com/in/lexfridman - Facebook: https://www.facebook.com/LexFridmanPage - Instagram: https://www.instagram.com/lexfridman - Medium: https://medium.com/@lexfridman - Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host), Sergey Levine (guest)
Jul 14, 2020 · 1h 37m

EVERY SPOKEN WORD

  1. 0:00 - 3:05

    Introduction

    1. LF

      The following is a conversation with Sergey Levine, a professor at Berkeley and a world-class researcher in deep learning, reinforcement learning, robotics, and computer vision, including the development of algorithms for end-to-end training of neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, and in general, deep RL algorithms. Quick summary of the ads. Two sponsors: Cash App and ExpressVPN. Please consider supporting the podcast by downloading Cash App and using code LEXPODCAST and signing up at expressvpn.com/lexpod. Click the links, buy the stuff. It's the best way to support this podcast and, in general, the journey I'm on. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcast, follow on Spotify, support it on Patreon, or connect with me on Twitter @LexFridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that could break the flow of the conversation. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of the fractional orders is an algorithmic marvel. So big props to the Cash App engineers for taking a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10 and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. This show is also sponsored by ExpressVPN. 
Get it at expressvpn.com/lexpod to support this podcast and to get an extra three months free on a one-year package. I've been using ExpressVPN for many years. I love it. I think ExpressVPN is the best VPN out there. They told me to say it, but it happens to be true in my humble opinion. It doesn't log your data, it's crazy fast, and it's easy to use, literally just one big power-on button. Again, it's probably obvious to you, but I should say it again, it's really important that they don't log your data. It works on Linux and every other operating system, but Linux, of course, is the best operating system. Shout out to my favorite flavor, Ubuntu Mate 20.04. Once again, get it at expressvpn.com/lexpod to support this podcast and to get an extra three months free on a one-year package. And now, here's my conversation with Sergey Levine.

  2. 3:05 - 16:13

    State-of-the-art robots vs humans

    1. LF

      What's the difference between a state-of-the-art human, such as you and I ... Well, I don't know if we qualify as state-of-the-art humans, but a state-of-the-art human and a state-of-the-art robot?

    2. SL

      That's a very (laughs) interesting question. Robot capability is, it, it's kind of a ... I think it's a very tricky thing to ex- to, to understand because there are some things that are difficult that we wouldn't think are difficult and some things that are easy that we wouldn't think are easy. Um, and there's also a really big gap between capabilities of robots in terms of, uh, hardware and their physical capability and capabilities of robots in terms of what they can do autonomously. There is a, a little video that, um, I think robotics, uh, researchers really like to show, especially robotics learning researchers like myself, from, uh, 2004 from, uh, Stanford, uh, which demonstrates a prototype robot called the PR1. And the PR1 was a robot that was designed as a home assistance robot, and there's this beautiful video showing the PR1, uh, tidying up a living room, uh, putting away toys, and at the end, bringing a, a, a beer to the, uh, person sitting on the couch, uh, which looks really amazing. And then, uh, the punchline is that this robot is entirely controlled by a person.

    3. LF

      Yes.

    4. SL

      So you can ... So the, in some ways, the gap between a state-of-the-art human and a state-of-the-art robot, if the robot has a human brain, is actually not that large. Uh, now obviously, like, human bodies are sophisticated and, and very robust and resilient in many ways, but on the whole, if we're willing to, like, spend a bit of money and do a bit of engineering, we can kind of close the hardware gap almost. But the intelligence gap, that one is, is very wide.

    5. LF

      And when you say hardware, you, you're referring to the physical, sort of the actuators, the actual-

    6. SL

      Mm-hmm.

    7. LF

      ... body of the robot as opposed to-

    8. SL

      Yeah.

    9. LF

      ... the hardware on which the cognition, the nervous, the hardware of the nervous system.

    10. SL

      Yes. Ex- exactly. I, I'm referring to the, the, the body rather than the mind.

    11. LF

      So what's... How much on the human side of the cognitive abilities, in your sense, is nature versus nurture? So- so how much of it is a product of evolution and how much of it is something we learn from sort of scratch-

    12. SL

      Yeah.

    13. LF

      ... or from the day we're born?

    14. SL

      I'm gonna read into your question as asking about the implications of this for AI.

    15. LF

      Of course, exactly.

    16. SL

      Because I'm- I'm not a bi- biologist, I can't really, like, speak authoritatively on the topic.

    17. LF

      Yeah, so and to linger on it, if- if it's so, if it's all about learning-

    18. SL

      Mm-hmm.

    19. LF

      ... then there's more hope for AI?

    20. SL

      Yeah. So the- the way that I look at this is that, um, you know, well, first of course, biology is very messy and it's, if you ask the question, "How does a person do something?" Or, "How does a person's mind do something?" um, you can come up with a- with a bunch of hypotheses and oftentimes you can find support for many different, often conflicting hypotheses. Um, one way that we can approach the question of- of- of what the implications of this for AI- AI are, is we can think about what's sufficient. So, you know, maybe a person is from birth very, very good at some things, like for example, recognizing faces. There's a very strong evolutionary pressure to do that. If you can recognize the- your mother's face, then, uh, you're more likely to survive and therefore people are good at this. But we can also ask like, what's- what's the minimum sufficient thing?

    21. LF

      Right.

    22. SL

      And one of the ways that we can study the minimal sufficient thing is we could, for example, see what people do in- in unusual situations. If you present them with things that evolution couldn't have prepared them for. You know, our- our daily lives actually do this to us all the time. We- we didn't evolve to- to deal with, you know, automobiles and space flight and whatever. Um, so there are all these situations that we can find ourselves in, uh, and we do very well there. Like, I can give you a- a joystick to control a robotic arm, uh, w- which you've never used before and you might be pretty bad for the first couple of seconds but if I tell you, like, "Your life depends on using this robotic arm to, like, open this door." Uh, you'll probably manage it even though you've never seen this device before, you- you've never used a joystick to control this and you'll kind of muddle through it and that's not, um, your evolved natural ability, that's your- your flexibility or your adaptability and that's exactly where current robotic systems really kind of fall flat.

    23. LF

      But I wonder how much general almost, what we think of as common sense, pre-trained models underneath all of that. So that ability to adapt to a joystick, it- is, requires you to have a kind of, you know, I'm human so it's hard for me to introspect all the knowledge I have about the world but it seems like there might be an iceberg underneath of the amount of knowledge we actually bring to the table.

    24. SL

      Yeah.

    25. LF

      That's kind of the open question. What's your take on that?

    26. SL

      I think there's abs- There's absolutely an iceberg of knowledge th- that we bring to the table but I think it's very likely that that iceberg of knowledge is actually built up over our lifetimes. Uh, because we have, you know, we have a lot of prior experience to draw on, uh, and it kind of makes sense that the right way for us to, you know, to- to optimize, uh, our- our efficiency, our evolutionary fitness and so on is to utilize, uh, all that experience to- to build up the best iceberg we can get. Uh, and- and that's actually one of the, you know, while that sounds an awful lot like what machine learning actually does, I think that for modern machine learning, it's actually a really big challenge to take this unstructured massive experience and distill out something that looks like a common sense understanding of the world.

    27. LF

      Mm-hmm.

    28. SL

      And perhaps part of that isn't, it's not because something about machine learning itself is- is broken or hard, but because we've been a little too rigid in subscribing to a very supervised, very rigid notion of learning, you know, kind of the input-output, Xs go- go to Ys sort of model. And maybe what we really need to, uh, to do is to view the world more as like a mass of experience that is not necessarily providing any rigid supervision but sort of providing many, many instances of things that could be and then you take that and you distill it into some sort of com- common sense understanding.

    29. LF

      I see. Well, you're- you're painting an optimistic, beautiful picture, especially from the robotics perspective 'cause that- that means we just need to invest in both better learning algorithms. Uh, figure out how we can get access to more and more data for those learning algorithms to extract signal from and then, uh, accumulate that iceberg of knowledge. It's a beautiful picture. It's a hopeful one.

    30. SL

      I think it's potentially a little bit more than just that. Um, and this is, this is where we're, we perhaps reach the limits of our current understanding but one thing that I think that the research community hasn't really resolved in a satisfactory way is, uh, how much it matters where that experience comes from. Like, you know, do you just like download everything on the internet and cram it into essentially the 21st century analog of the giant language model and- and see what happens or does it actually matter whether your machine physically experiences the world or in- in a sense that it actually attempts things, observes the outcome of its actions and- and kind of augments its experience that way.

  3. 16:13 - 22:49

    Robotics may help us understand intelligence

    1. LF

      step back, what is the goal of robotics? What's the general problem robotics are trying to solve? You actually kind of painted two pictures here. One of sort of the narrow, one of the general. What, in your view, is the big problem with robotics? Again, ridiculously philosophical, high level (laughs) questions.

    2. SL

      I think that, um, you know, maybe there are two ways I can answer this question. One is there's a very pragmatic problem which is like what would make robots... what would sort of maximize the usefulness of robots? And there the answer might be something like a system where, uh, uh, a system that, uh, can, uh, perform whatever task a human user, uh, sets for it, you know, within the physical constraints, of course. If you tell it to teleport to another planet, it probably can't do that. But if you, if you ask it to do something that's within its physical capability, then potentially with a little bit of, uh, additional training or a little edi- bit of additional trial and error, it ought to be able to figure it out in much the same way as like a human teleoperator ought to figure out how to s- how to drive the robot to do that. That's kind of the very pragmatic, uh, view of what it would take to kind of solve the, the robotics problem, if you will. But I think that there is a second answer and that answer, th- that's, the answer is a lot closer to why I want to work on, on robotics which is that I think it's, it's, it's less about what it would take to do a really good job in the world of robotics, but more the other way around of what robotics can bring to the table to help us understand artificial intelligence.

    3. LF

      So your dream fundamentally is to understand intelligence?

    4. SL

      W- yes. I think that's the dream for many people who, who, who w- who actually work in this space. Uh, I think that there is, there is something very pragmatic and very useful about studying robotics, but I, I do think that a lot of people that go into this field actually, uh, you know, the things that they draw inspiration from are the potential for robots to like help us learn about intelligence and about ourselves.

    5. LF

      So wh- that's, that's fascinating that robotics is basically... the space by which you can get closer to understanding the fundamentals of artificial intelligence. So, what is it about robotics that's different from some of the other approaches? So, if we look at some of the early breakthroughs in deep learning or in the computer vision space and the natural language processing, there's really nice clean benchmarks that a lot of people competed on and thereby came up with a lot of brilliant ideas. What's the fundamental difference to you between computer vision, purely defined on ImageNet, and kind of the bigger robotics problem?

    6. SL

      So there are a couple of things. Uh, one is that with robotics, um, you kind of have, um, you kind of have to take away many of the crutches. So, you have to deal with- with both the- the- the particular problems of perception, control, and so on, but you also have to deal with the integration of those things. And, uh, you know, classically, we've always thought of the integration as kind of a separate problem, so a- a classic kind of modular engineering approach is that we solve the individual sub-problems then wire them together, and then the whole thing works. Um, and one of the things that we've been seeing over the last couple of decades is that, well, maybe studying the thing as a whole might lead to just, like, very different solutions than we- if we were to study the parts and wire them together. So, the integrative nature of robotics research helps us see, you know, the different perspectives on the problem. Uh, another part of the answer is that with robotics, um, it- it casts a certain, uh, paradox into very clear relief so that this is sometimes referred to as, uh, Moravec's paradox, the, uh, idea that in artificial intelligence, things that are very hard for people can be very easy for machines and vice versa, things that are very easy for people can be very hard for machines. So, you know, uh, integral and differential calculus, pretty difficult to learn for people, but if you program a computer to do it, it can derive derivatives and integrals for you all day long without any trouble. Uh, whereas, uh, some things like, you know, drinking from a cup of water, very easy for a person to do, very hard, uh, for a robot to deal with. And sometimes when we see such blatant discrepancies, they give us a really strong hint that we're missing something important. So, if we really try to zero in on those discrepancies, we might find that little bit that we're missing. 
And it's not that we need to make machines better or- or worse at math and better at drinking water, but just that by studying those discrepancies, we might find some new, uh, insight.

    7. LF

      So, that- that could be- that could be in any space. It doesn't have to be robotics. But you're saying, um... I mean, I- I get it's kind of interesting that robotics seems to have a lot of those discrepancies, so the-

    8. SL

      Yeah.

    9. LF

      ... the- the Hans Moravec paradox is probably referring to the space of the- the physical interaction, like you said, m- object manipulation, walking, all the kind of stuff we do in the physical world. Th- that... Well, how do you make sense... If you were to try to disentangle, uh, the- the Moravec paradox, like, why is there such a gap in our intuition about it? Why do you think manipulating objects is so hard from everything you've learned from applying reinforcement learning in this space?

    10. SL

      Yeah. I think that one reason is maybe that, um, for many of the pro- for many of the other problems that we've studied in AI and computer science and so on, the notion of input, output, and supervision is much, much cleaner. So, computer vision, for example, deals with very complex inputs, but it's comparatively a bit easier, at least up to some level of abstraction, to cast it as a very tightly supervised problem. It's comparatively much, much harder to cast robotic manipulation as a very tightly supervised problem. You can do it, it just doesn't seem to work all that well. So, you could say that, well, maybe we get a labeled dataset where we know exactly which motor commands to send and then we train on that, but for various, uh, reasons, that's not actually, like, such a great solution, and it also doesn't seem to be even remotely similar to how people and animals learn to do things because we're not told by, like, our parents, "Here's how you, uh, fire your muscles in order to walk." Uh, we, you know, we do get some guidance, but the really low-level detailed stuff we figure out mostly on our own.

    11. LF

      And that's what you mean by tightly coupled, that every single little sub-action gets a supervised signal of whether it's a good one or not?

    12. SL

      Right. So- so while in computer vision you could sort of imagine up to a level of abstraction that maybe, you know, if somebody told you this is a car and this is a cat and this is a dog, in motor control it's very clear that that was not the case.

  4. 22:49 - 27:01

    End-to-end learning in robotics

    1. LF

      If we look at sort of the sub-spaces of robotics that, uh... Again, as you said, robotics integrates all of them together and we get to see how this beautiful mess interplays, but... So there's nevertheless still perception, so it's the- the computer vision problem broadly speaking understanding the environment, then there's also, and maybe you can correct me on this kind of categorization of the space, then there's prediction in trying to anticipate what things are going to do into the future in order for you to be able to, um, act in that world, and then there's also this game theoretic aspect of how your actions will change the behavior of others. In this kind of space, w- what... And this is bigger than reinforcement learning, this is just broadly looking at the problem of robotics, what's the hardest problem here?

    2. SL

      Mm-hmm.

    3. LF

      Or is there... Or is what you said true that when you start to look at all of them together, that's an- that's a whole nother thing? Like, you can't even say which one individually is harder because all of them together, you should only be looking at them all together?

    4. SL

      I think when you look at them all together, some things actually become easier, and I think that's actually pretty important. Um, so we- we had, um, you know, uh, back in, um, 2014, we had, uh, some work, basically our- our first work on end-to-end reinforcement learning for robotic manipulation skills from vision, which, you know, at the time was something that seemed a little inflammatory and controversial in the robotics world. Uh, but, uh, other than the- the inflammatory and controversial part of it, the point that we were actually trying to make in that work is that... for the particular case of combining perception and control, you could actually do better if you treat them together than if you try to separate them. And the way that we tried to demonstrate this is we picked, you know, a fairly simple motor control task where a robot had to insert a little red trapezoid into a trapezoidal hole. Uh, and we had our separated, uh, solution which involved first detecting the hole using a pose detector, and then, uh, actuating the arm to put it in, and then our end-to-end solution which just mapped pixels to, to torques. Uh, and one of the things we observed is that I- if you use the end-to-end solution, essentially the pressure on the perception part of the model is actually lower. Like, it doesn't have to figure out exactly where the thing is in 3D space, it just needs to figure out, uh, where it is, you know, distributing the errors in such a way that the horizontal difference matters more than the vertical difference because vertically it just pushes it down all the way until it can't go any further, and there perceptual errors are a lot less harmful, whereas, uh, perpendicular to the direction of motion, perceptual errors are much more harmful. 
So the point is that if you combine these two things, you can trade off errors between the components optimally to best accomplish the task, and the components can actually be weaker while still leading to better overall performance.
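
The error trade-off described in this answer can be illustrated with a toy Monte Carlo sketch. This is not the setup from the 2014 work: the function name `insertion_success_rate`, the Gaussian error model, the tolerance, and every number below are hypothetical, chosen only to show why shifting perceptual error from the horizontal axis to the vertical axis can help when the arm simply pushes down until contact.

```python
import random


def insertion_success_rate(sigma_horizontal, sigma_vertical,
                           tolerance=1.0, trials=20000, seed=0):
    """Fraction of simulated insertions that succeed (toy model, hypothetical units)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        horizontal_error = rng.gauss(0.0, sigma_horizontal)
        # The vertical error is sampled but deliberately ignored: in this toy
        # model the arm pushes down until it cannot go further, so a vertical
        # misestimate does not affect the outcome.
        _vertical_error = rng.gauss(0.0, sigma_vertical)
        if abs(horizontal_error) < tolerance:
            successes += 1
    return successes / trials


# Same total "error budget" (sum of standard deviations), distributed differently.
isotropic = insertion_success_rate(sigma_horizontal=1.0, sigma_vertical=1.0)
task_aware = insertion_success_rate(sigma_horizontal=0.5, sigma_vertical=1.5)
print(f"isotropic errors:  {isotropic:.3f}")
print(f"task-aware errors: {task_aware:.3f}")
```

In this sketch the task-aware perception is worse in the vertical direction, yet the overall task succeeds more often, which is the sense in which a component can be weaker while the combined system does better.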

    5. LF

      Yeah. That's a profound idea. I mean, in, in the space of pegs and things like that, it's quite simple. It, it almost is tempting to overlook, but that's... seems to be at least intuitively an idea that should generalize to basically all aspects of perception control.

    6. SL

      Of course. And-

    7. LF

      That one strengthens the other.

    8. SL

      Yeah. And, and we, uh, you know, people who have studied, uh, sort of perceptual heuristics in humans and animals find things like that all the time. So one, one very well-known example, there's something called the gaze heuristic, which is a, uh, a little trick that you can use to intercept a flying object. So if you want to catch a ball, for instance, uh, you could try to localize it in 3D space, estimate its velocity, estimate the effect of wind resistance, solve a complex system of differential equations in your head, um, or you can, uh, maintain, uh, a running speed so the object stays in the same position in your field of view. So if it dips a little bit, you speed up. If it rises a little bit, um, you slow down. And if you follow this simple rule, you'll actually arrive at exactly the place where the object lands and you'll catch it. And humans use it when they play baseball. Human pilots use it when they fly airplanes to figure out if they're about to collide with somebody. Frogs use this to catch insects, and so on and so on. So this is something that actually happens in nature, and I'm sure this is just one instance of it that we were able to identify just because it's a, you know, or that scientists were able to identify because it's so prevalent, but there are probably many others.
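
The gaze heuristic described above can be sketched in a few lines, under simplifying assumptions that are not from the conversation: a drag-free ball in a 2D plane, tracking that starts at the apex, and a runner who can always move as fast as needed to hold the gaze angle. The names `ball_position` and `run_gaze_heuristic` and all parameter values are hypothetical. The runner never computes the landing point; the position is whatever keeps the elevation angle to the ball fixed.

```python
G = 9.81  # gravitational acceleration in m/s^2


def ball_position(t, vx, vy):
    """Drag-free projectile launched from the origin; height is clamped at the ground."""
    return vx * t, max(vy * t - 0.5 * G * t * t, 0.0)


def run_gaze_heuristic(vx, vy, runner_x0, dt=0.001):
    """Move the runner so that the gaze elevation angle to the ball stays constant.

    Assumes the runner starts ahead of the ball's apex (runner_x0 > vx * vy / G).
    Returns (final runner position, final ball position) along the ground.
    """
    t = vy / G  # begin tracking at the apex of the flight
    ball_x, ball_y = ball_position(t, vx, vy)
    tan_gaze = ball_y / (runner_x0 - ball_x)  # tangent of the angle to hold
    runner_x = runner_x0
    while ball_y > 0.0:
        t += dt
        ball_x, ball_y = ball_position(t, vx, vy)
        # The only position that keeps the same gaze angle: as the ball drops,
        # the horizontal gap to it must shrink in proportion to its height.
        runner_x = ball_x + ball_y / tan_gaze
    return runner_x, ball_x


runner_x, ball_x = run_gaze_heuristic(vx=10.0, vy=15.0, runner_x0=40.0)
print(f"runner ends at {runner_x:.2f} m, ball lands at {ball_x:.2f} m")
```

Because the gap to the ball is proportional to its height, the gap reaches zero exactly when the ball reaches the ground, so holding the angle is enough to arrive at the landing spot. A real runner adjusts speed from the observed angle instead of being placed directly, so this is an idealization of the rule, not a model of human behavior.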

  5. 27:01 - 31:44

    Canonical problem in robotics

    1. SL

    2. LF

      Do you have a... Just so we can zoom in as we talk about robotics, do you have a canonical problem, sort of a simple, clean, beautiful representative problem in robotics that you think about when you're thinking about some of these problems? We talked about robotic manipulation. To me, that seems intuitively at least, uh, the robotics community has converged towards that as a space. That's the canonical problem. If you agree, then maybe do zoom in in some particular aspect of that problem that you just like. Like, if we solve that problem perfectly, it'll unlock a major step in, um, towards human-level intelligence.

    3. SL

      I don't think I have, like, a really great answer to that. And, and I think partly the reason I don't have a great answer kind of has to do with, um... it has to do with the fact that, that the difficulty is really in the flexibility and adaptability rather than in, in doing a particular thing really, really well. So, um, it, it's hard to just say like, "Oh, if you can, I don't know, like, shuffle a deck of cards as fast as, like, a Vegas, uh-"

    4. LF

      Right.

    5. SL

      "... uh, casino dealer, then you'll, you'll be very proficient." It's really the ability to quickly figure out how to do some arbitrary new thing well enough to, like, you know, to move on to the next (laughs) arbitrary thing.

    6. LF

      But, so the, the source of newness and uncertainty, have you found, uh, problems in which it's easy to, uh, generate new newness-ness-nesses?

    7. SL

      Yeah.

    8. LF

      New types of newness. (laughs)

    9. SL

      Yeah. So a few years ago, uh... so if you'd asked me this question around, like, 2016 maybe, I would have probably said that robotic grasping is a really great, uh, example of that because it's a task with great real world utility. Like, you will get a lot of money if you can do it well. Uh-

    10. LF

      What is robotic grasping?

    11. SL

      Picking up any object.

    12. LF

      With a robotic hand.

    13. SL

      Yeah, exactly. So you will get a lot of money if you do it well because, uh, lots of people want to run warehouses with robots, and it's highly non-trivial because, uh, very different objects, uh, will require very different grasping strategies. But actually since then, people have gotten really good at building systems to solve this problem (laughs) -

    14. LF

      (laughs)

    15. SL

      ... uh, to the point where I'm not actually sure how much more progress we can make, uh, with that as, like, the main guiding, uh, thing. But it's kind of interesting to see the kind of methods that have actually worked well in that space because robotic grasping classically used to be regarded very much as, um, kind of a, almost like a geometry problem. So, uh, you know, people who have, uh, studied the history of computer vision will find this very familiar that it's kind of... in the same way that in the early days of computer vision, people thought of it very much as, like, an inverse graphics thing. In robotic grasping, people thought of it as an inverse physics problem essentially. You, you look at what's in front of you, figure out the shapes, then use your best estimate of the laws of physics to figure out where to put your fingers and you pick up the thing. Um, and it turns out that what works really well for robotic grasping, instantiated in many different recent works, including our own, but also ones from many other labs, is to use learning methods with some combination of either exhaustive simulation or, like, actual real world trial and error. And it turns out that those things actually work really well, and then you don't have to worry about solving geometry problems or physics problems.

    16. LF

      So w- what are... j- just by the way on the grasping, what are the difficulties that have been worked on? So one is, like, the materials of things, maybe occlusions on the perception side. Why, why is it such a difficult... why is picking stuff up such a difficult problem?

    17. SL

      Yeah. It's a difficult problem because the n- number of things that you might have to deal with, uh, or the variety of things that you have to deal with is extremely large. And, um, oftentimes things that work for one class of objects won't work for other classes of objects. So, if you c- if you get really good at picking up boxes and now you have to pick up plastic bags, um, you know, you just need to employ a very different strategy. And there are many properties of, of objects that are more than just their geometry that has to do with, you know, the bits that, uh, that are easier to pick up, the bits that are harder to pick up, the bits that are more flexible, the bits that'll cause the thing to pivot and bend and drop out of your hand, versus the bits that result in a nice secure grasp, things that are flexible, things that if you pick them up the wrong way they'll fall upside down and the contents will spill out. So there's all these little details that come up. But the task is still, kind of, can be characterized as one task. Like, there's a very clear notion of you did it or you didn't do it.

    18. LF

      So,

  6. 31:44 - 34:41

    Commonsense reasoning in robotics

    1. LF

      in terms of spilling things, th- there creeps in this notion that starts to sound and feel like common sense reasoning. Do you think solving the general problem of robotics requires, um, common sense reasoning, requires general intelligence, this kind of human level capability of, you know, like you said, be robust and deal with uncertainty, but also be able to sort of reason and assimilate different pieces of knowledge that you have? Um, yeah. What do you, what, what do you, what are your thoughts on the needs of common sense reasoning in the space of, uh, the general robotics problem?

    2. SL

      So, I'm gonna slightly dodge that question and say-

    3. LF

      Great.

    4. SL

      ... that I think, I think maybe actually it's the other way around, is that studying robotics can help us understand how to put common sense into our AI systems. One way to think about common sense is that, a- and, and why our current systems might lack common sense, is that common sense is a property, it, it is an emergent property of, um, actually having to interact with a particular world, a particular universe, and get things done in that universe. So, you might think that, uh, for instance, like, um, an image captioning system, uh, maybe it looks at pictures of, of the world and it types out English sentences. So it kind of, it kind of deals with our world. And then you can easily construct situations where image captioning systems do things that defy common sense, like give it a picture of a person wearing a fur coat and it will say it's a teddy bear. But what I think what's really happening in those settings is that the system doesn't actually live in our world. It lives in its own world that consists of pixels and English sentences, and doesn't actually consist of like, you know, having to put on a fur coat in the winter so you don't get cold. Uh, so perhaps the, the reason for the disconnect is that the systems that we have now simply inhabit a different universe. And if we build AI systems that are forced to deal with all of the messiness and complexity of our universe, maybe they will have to acquire common sense to essentially maximize their utility. Whereas the systems we're building now don't have to do that. They can take some shortcuts.

    5. LF

      Th- that's fascinating. You've, uh, a couple times already sort of reframed the role of robotics in this whole thing. And for, for some reason, I don't know if, um, my way of thinking is common, but I thought like, we need to understand and solve intelligence in order to solve robotics. And you're kind of framing it as, no, robotics is one of the best ways to just study artificial intelligence and build sort of like, robotics is, like, the right space in which you get to explore some of the fundamental learning mechanisms, fundamental sort of multimodal, multit- task, aggregation of knowledge mechanisms that are required for general intelligence. That's a really interesting way to think about it. But let me ask about learning.

  7. 34:41 - 44:55

    Can we solve robotics through learning?

    1. LF

      Can the general sort of robotics, the epitome of the robotics problem be solved purely through learning, perhaps end-to-end learning? Sort of learning from scratch, as opposed to injecting human expertise and rules and heuristics and so on?

    2. SL

      I think that in, in terms of the spirit of the question, I, I, I would say yes. Uh, I mean, I, I think that in, though in some ways it's maybe like an overly sharp dichotomy. Like, you know-

    3. LF

      Sure.

    4. SL

      ... I, I think that in some ways when we build algorithms, we, you know ... At some point, a person does something. Like-

    5. LF

      Yeah, hyper-parameters.

    6. SL

      ... a person, a person-

    7. LF

      There's always this, this... A person turned on the computer. A person, uh, uh, you know, implemented, uh, TensorFlow. (laughs) Yeah.

    8. SL

      Um, but yeah. I think that in terms of the, in terms of the point that you're getting at, I do think the answer is yes. I think that, uh, I think that we can solve many problems that have previously required meticulous manual engineering through automated optimization techniques. And actually, one thing I will say on this topic is I don't think this is actually a very radical or very new idea. I think people have, uh, have been thinking about automated optimization techniques as a way to do control for a very, very long time. And, mm, in some ways, what's changed is, is really more the name.

    9. LF

      Mm-hmm.

    10. SL

      So, you know, today we would say that, "Oh, my robot does, uh, machine learning and does reinforcement learning." Maybe in the 1960s you'd say, "Oh, uh, my robot is doing optimal control." And maybe the difference between typing out a system of differential equations and doing feedback linearization versus training a neural net, maybe it's not such a large difference. It's just, uh, you know, pushing the optimization deeper and deeper into the thing.

    11. LF

      Well, it's interesting you think that way. But with, uh, deep, especially with deep learning, that, uh, the accumulation of sort of, uh, experiences in data form to form deep representations starts to feel like knowledge as opposed to o- optimal control. So this feels like there's an accumulation of knowledge to the learning process.

    12. SL

      Yes. Uh, yeah, so I think that is a good point, that, uh, one big difference between learning-based systems and classic optimal control systems is that learning-based systems, in principle, should get better and better the more they do something.

    13. LF

      Right.

    14. SL

      And, and I do think that that's actually a very, very powerful difference.

    15. LF

      So if we look back at the world of expert systems, and symbolic AI, and so on, of, uh, using logic to accumulate expertise, human expertise, human encoded expertise. Uh, do you think that will have a role at some point? So the, you know, deep learning, machine learning, reinforcement learning has been inc- has shown incredible results and breakthroughs and just, uh, inspired thousands, maybe millions of researchers. But, you know, there's this, uh, less popular now, but it used to be popular, idea of symbolic AI. Do you think that will have a role?

    16. SL

      I think in some ways, um, the, kind of the, the descendants of, uh, (laughs) uh, of symbolic AI actually already have a role. So, you know, this is the, the highly biased history from my perspective. You, you say that, well, initially we thought that rational decision-making involves, uh, logical manipulation. So you have some model of the world expressed in term- in terms of logic. Uh, you have some query like, "What action do I take in order to, for X to be true?" And then you manipulate your logical symbolic representation to get an answer. What that turned into somewhere in the 1990s is, uh, well, instead of building, uh, kind of predicates and statements that have true or false, uh, values, we'll build probabilistic systems where, uh, things have probabilities associated with them, probabilities of being true and false. And that turned into Bayes' nets and that provided sort of a boost to what were really, you know, still essentially logical inference systems, just probabilistic logical inference systems. And then people said, "Well, let's actually learn the, the, the individual probabilities inside these models." And then people said, "Well, let's not even specify the nodes in the models. Let's just put a, a big neural net in there." But in many ways, I see these as actually kind of descendants from the same idea. It's essentially instantiating rational decision-making by means of some inference process and, uh, learning by means of an optimization process. So, so in a sense I would say yes, that it has a place and in many ways that place is or, you know, it, it already holds that place.

    17. LF

      It's, it's already in there. Yeah. It's just by different, it, uh, looks slightly different than it was before.

    18. SL

      Yeah. But, but in some, there are some things that, that we can think about that make this a little bit more obvious. Like if I train a big neural net model to predict what will happen in response to my robot's actions, and then I run probabilistic inference, meaning I invert that model to figure out the actions that lead to some plausible outcome. Like to me that seems like a kind of logic. You have a model of the world, it just happens to be expressed by a neural net, and you are doing some inference procedure, some sort of manipulation on that model to figure out, uh, you know, the answer to a query that you have.
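A minimal sketch of "a model of the world plus an inference procedure on that model", in Python. It swaps in random shooting (sample candidate actions, keep the one whose predicted outcome is closest to the goal) for the probabilistic inference mentioned above. `learned_model`, `plan`, and the one-dimensional state are hypothetical stand-ins for illustration, not anything taken from the conversation.

```python
import random

def learned_model(state, action):
    # Hypothetical stand-in for a trained neural net that predicts the next state.
    return state + action

def plan(state, goal, num_samples=200, seed=0):
    """Invert the model by search: sample actions and keep the one
    whose predicted next state lands closest to the goal."""
    rng = random.Random(seed)
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    return min(candidates, key=lambda a: abs(learned_model(state, a) - goal))

# Query: "what action do I take for the state to become 0.5?"
best_action = plan(state=0.0, goal=0.5)
```

The query is answered by manipulating the model rather than by acting in the world, which is the sense in which this resembles logical inference.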

    19. LF

      It's the interpretability, it's the explainability though that seems to be lacking more so, because, uh, the nice thing about sort of expert systems is you can follow the reasoning of the system that to us mere humans is somehow compelling. It, it would, it, it's just, um, I don't know what to make of this fact that, uh, there's a human desire for intelligence systems to be able to convey in a poetic way to us why it made the decisions it did, like tell a, a co- convincing story. And, uh, perhaps that's like, um, a silly human thing, like we shouldn't expect that of intelligence systems, like we should be super happy that there is intelligence systems out there. But, uh, i- if I were to sort of psychoanalyze the researchers at the time, I would say expert systems connected to that part, uh, that desire of AI researchers for systems to be explainable. I mean, maybe on that topic, do you have a hope that sort of inference systems, so learning-based systems will be as explainable as the dream was with expert systems, for example?

    20. SL

      I think it's a very complicated question because I think that in some ways the question of explainability is, um, kind of very closely tied to the question of, uh, o- of like performance. Like, you know, why, why do you want your system to explain itself? Well, so that it's, so that when it screws up you can kind of figure out why it did it.

    21. LF

      Right. That's-

    22. SL

      But, but, but in some ways that's a, that's a much bigger problem actually. Like, your system might screw up and then it might screw up in how it explains itself (laughs) uh, or you might have some bug somewhere so that it's not actually doing what it was supposed to do. So, you know, maybe a, a, a good way to view that problem is really as a problem of, as a bigger problem of verification and validation, um, of which explainability is sort of o- one component.

    23. LF

      I see. I, I, I just see it differently. I see explainability, uh, you, you put it beautifully. I think you actually summarized the field of explainability. But to me there's another aspect of explainability, which is like storytelling.

    24. SL

      Mm-hmm.

    25. LF

      That has nothing to do with errors or with like the, the sort of, it doesn't... It uses errors as p- as, uh, elements of its story as opposed to a fundamental need to be explainable when errors occur. It's just that for other intelligence systems to be in our world, we seem to want to tell each other stories. And, (laughs) that, that's true in the political world, that's true in the academic world. And that I, you know, neural networks are less capable of doing that or perhaps they're equally capable. Storytelling is storytelling. Maybe it doesn't matter what the fundamentals of the system are, you just need to be a good storyteller.

    26. SL

      Maybe one, uh, specific story I can tell you about, um, in that space is actually about some work, um, that was done by, uh, by my former collaborator who's now a professor at MIT named Jacob Andreas. Um, Jacob, uh, actually works in natural language processing, but he had this idea to do a little bit of work in reinforcement learning and how, um, on how natural language can basically structure the internals of policies trained with RL. And, uh, one of the things he did is he set up a model that attempts to perform some task that's defined by a reward function. Uh, but the model reads in a natural language instruction. So this is a pretty common thing to do in instruction following. So y- you tell it like, you know, "Go to the red house," and then it's supposed to go to the red house. But then one of the things that Jacob did is he treated that sentence not as a command from a person, but as a representation of the internal kind of, uh, state of the- of the- of the mind of this policy essentially. So that when it was faced with a new task, what it would do is it would basically try to think of possible language descriptions, attempt to do them, and see if they led to the right outcome. So it would kind of think out loud like, you know, "I'm faced with this new task, what am I gonna do? Let me go to the red house. Oh, that didn't work. Let me go to the blue, uh, room or something. Let me go to the green plant." And once it got some reward, it would say, "Oh, go to the green plant. That's what's working. I'm gonna go to the green plant." And then you could look at the string that it came up with and that was a description of how it thought it should solve the problem. So you could do... You could basically incorporate language as internal state and you can start getting some handle on these kinds of things.
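The "think out loud" loop described above can be sketched in a few lines of Python. This is only an illustration of the search over instruction strings, not the actual method from that work: `attempt`, `think_out_loud`, the candidate list, and the reward rule are all hypothetical.

```python
def attempt(instruction, true_goal):
    # Hypothetical environment: following an instruction yields reward 1.0
    # only when the instruction happens to describe the new task's goal.
    return 1.0 if instruction == f"go to the {true_goal}" else 0.0

def think_out_loud(candidates, true_goal):
    """Try candidate language descriptions in order and stop at the
    first one that earns reward. The trace is the readable 'story'."""
    trace = []
    for instruction in candidates:
        reward = attempt(instruction, true_goal)
        trace.append((instruction, reward))
        if reward > 0:
            return instruction, trace
    return None, trace

candidates = ["go to the red house", "go to the blue room", "go to the green plant"]
chosen, trace = think_out_loud(candidates, true_goal="green plant")
```

Here `chosen` is the string the agent settled on, and `trace` records the failed attempts that came before it.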

    27. LF

      And then what I was kinda trying to get to is that also if you add to the reward function the convincingness of that story.

    28. SL

      Mm-hmm.

    29. LF

      So I have another reward signal of like people who review that story, how much they like it. So that y- you know, uh, initially that could be a hyper-parameter or sort of hard-coded heuristic type of thing. But it's an interesting notion of the convincingness of the story becoming part of the reward function, the objective function of the explainability. It's... In the world of, uh, sort of Twitter and fake news, that might be a scary notion that, uh, the- the nature of truth may not be as important as the convincingness of the how convincing you are in telling the story around the facts. Well, let me ask, um, the- the basic question.

  8. 44:55 - 1:06:36

    What is reinforcement learning?

    1. LF

      You're one of the world-class researchers in reinforcement learning, deep reinforcement learning, certainly in the robotic space. What is reinforcement learning?

    2. SL

      I think that reenfor- what reinforcement learning refers to today is really just the, uh, uh, kind of the modern incarnation of learning-based control. So classically, reinforcement learning has a much more narrow definition, which is that it's, you know, literally learning from reinforcement, like the thing does something and then it gets a reward or punishment. But really, I think the way the term is used today is it's used for- more broadly to learning-based control. So some kind of system that's supposed to be controlling something and it uses data to get better.

    3. LF

      And what does control mean? So is action is the fundamental element there?

    4. SL

      Yeah. It means making rational decisions.

    5. LF

      Now-

    6. SL

      And rational decisions are decisions that maximize a measure of utility.

    7. LF

      And sequentially. So you made- made decisions time and time and time again. Now, like so do- it's easier to see that kind of idea in the space of maybe games, in the space of robotics. Do you see it bigger than that? Is it applicable? Like where- where are the limits of the applicability of reinforcement learning?

    8. SL

      Yeah. So rational decision-making is essentially the f- the encapsulation of the AI problem viewed in- through a particular lens. So any problem that we would want a machine to do, uh, an intelligent machine can likely be represented as a decision-making problem. You know, classifying images is- is a decision-making problem, although not a sequential one typically. Uh, you know, controlling a chemical plant is a decision-making problem. Deciding what videos to recommend on YouTube is a decision-making problem. And one of the really appealing things about reinforcement learning is if it does encapsulate the range of all these decision-making problems, perhaps working on reinforcement learning is, uh, you know, one of the ways to reach a very broad swath of AI problems.

    9. LF

      But what- what to you is the fundamental difference between reinforcement learning and maybe supervised machine learning?

    10. SL

      So reinforcement learning can be viewed as a generalization of supervised machine learning. Uh, you can certainly cast supervised learning as a reinforcement learning problem. You can just say your loss function is- is the negative of your reward. Uh, but you have stronger assumptions. You have the assumption that someone actually told you what the correct answer was, uh, that your data was IID, and so on. So you could view reinforcement learning as essentially relaxing some of those assumptions. Now that's not always a very productive way to look at it because if you actually have a supervised learning problem, you'll probably solve it much more effectively by using supervised learning methods because it's easier. Uh, but, uh, you can view reinforcement learning as a generalization of that.
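To make "your loss function is the negative of your reward" concrete, here is a small Python sketch that treats a classification problem as a one-step decision problem. The labels and the 0-1 loss are illustrative assumptions.

```python
def zero_one_loss(prediction, label):
    # Standard supervised loss: 0 when correct, 1 when wrong.
    return 0.0 if prediction == label else 1.0

def reward(action, label):
    # Cast as RL: the action is the predicted label, the reward is minus the loss.
    return -zero_one_loss(action, label)

actions = ["cat", "dog"]
true_label = "dog"  # supervised learning assumes someone told us this

best_by_reward = max(actions, key=lambda a: reward(a, true_label))
best_by_loss = min(actions, key=lambda a: zero_one_loss(a, true_label))
```

Both selections agree, but only because the correct answer is given. Reinforcement learning drops that assumption and has to discover good actions from reward alone.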

    11. LF

      No, for sure. But they're fundamentally different. They c- th- that's a mathematical statement that's absolutely correct. But it seems that reinforcement learning, the kind of tools we bring to the table today, of today, so maybe down the line everything will be a reinforcement learning problem, just like you said. Uh, image classification should be mapped to a reinforcement learning problem. But today, the tools and ideas, uh, the way we think about them are- are different. Sort of, um, supervised learning has been used very effectively to solve basic narrow AI problems. Reinforcement learning kind of represents the dream of AI. It's, uh, very much so in the research space now in sort of captivating the imagination of people of what we can do with intelligent systems, but it hasn't yet had as wide of an impact as the supervised learning approaches. So that, sort of that, um, my question comes from a more practical sense, like what do you see is the gap between the more general reinforcement learning and the very specific... Uh, yes, it's sequential decision-making with one sequence, one step in the sequence of the supervised learning?

    12. SL

      So from a practical standpoint, I think that one- one thing that is, uh, you know, potentially a little tough now, and this is I think something that we'll see... This is a gap that we might see closing over the next couple of years, is the ability of reinforcement learning algorithms to effectively utilize large amounts of prior data. So one of the reasons why it's a bit difficult today to use reinforcement learning for all the things that we might want to use it for is that in most of the settings where we want to do rational decision-making... Um, it's a little bit tough to just deploy some policy that does crazy stuff, uh, and learns purely through trial and error. It's much easier to collect a lot of data, a lot of logs of some other policy that you've got, and then maybe use... You know, if you can get a good policy out of that, then you deploy it and let it kind of fine-tune a little bit. But algorithmically, it's quite difficult to do that. So, I think that once we figure out how to get reinforcement learning to bootstrap effectively from large data sets, then we'll see, uh, very, very, uh, rapid growth in applications of these technologies. So, this is what's referred to as off-policy reinforcement learning or offline RL or batch RL, um, and I think we're, we're seeing a lot of, uh, research right now that does bring us closer and closer to that.

    13. LF

      Can you maybe paint the picture of the different methods? So, you said, uh, off-policy. What's value-based reinforcement learning? What's policy-based? What's model-based? What's off-policy, on-policy?

    14. SL

      Mm-hmm.

    15. LF

      What are the different categories of reinforcement learning?

    16. SL

      Yeah. So, one way we can think about reinforcement learning is that it's, um... It's in some very fundamental way, it's about, um, learning models that can answer, kind of, what-if questions. So, what would happen if I take this action that I hadn't taken before? Um, and you do that, of course, from experience, from data. And oftentimes, you do it in a loop, so you build a model that answers these what-if questions, uh, use it to figure out the best action it can take, and then go and try taking that and see if the outcome agrees with what you predicted. Um, so, uh, the different kinds of techniques, uh, basically refer to different ways of doing it. So, model-based methods answer a question of, uh, what state you would get. Basically, what would happen to the world if you were to take a certain action? Value-based methods, uh, they answer the question of what value you would get, meaning what utility you would get. Um, but in a sense, they're not really all that different because they're both really just answering these what-if questions. Now, unfortunately for us, with current machine learning methods, answering what-if questions can be really hard because they are really questions about things that didn't happen. If you wanted to answer what-if questions about things that did happen, you wouldn't need a learned model. You would just, like, repeat the thing that worked before, um, and that's really a big part of why RL is a little bit tough. So, if you have a purely on-policy kind of online process, then you ask these what-if questions, you make some mistakes, then you go and try doing those mistaken things, and then you observe kind of the counterexamples that'll teach you not to do those things again. Uh, if you have a bunch of off-policy data and you just want to synthesize the best pol- policy you can out of that data, then you really have to deal with the, the challenges of making these, these counterfactual
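As a rough Python illustration of the two kinds of what-if question, the sketch below builds a tabular "model" (what state would I get?) and a tabular "value" (what utility would I get?) from the same logged experience. The door example, the tuples, and the one-step setting where value equals immediate reward are all simplifying assumptions.

```python
# Logged experience as (state, action, next_state, reward) tuples; made-up values.
experience = [
    ("door_closed", "push", "door_closed", 0.0),
    ("door_closed", "turn_handle", "door_open", 1.0),
]

# Model-based view: what state would I get if I took this action?
model = {(s, a): s_next for s, a, s_next, _ in experience}

# Value-based view: what utility would I get? (one-step case, so value equals reward)
value = {(s, a): r for s, a, _, r in experience}

def best_action(state):
    # Use the value table to pick the action with the highest predicted utility.
    actions = [a for (s, a) in value if s == state]
    return max(actions, key=lambda a: value[(state, a)])

predicted_state = model[("door_closed", "turn_handle")]
chosen = best_action("door_closed")
```

A lookup table like this can only answer questions about state-action pairs that appear in the data. Questions about things that did not happen require a learned model that generalizes, which is the hard part.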

    17. LF

      First of all, what's a policy? (laughs)

    18. SL

      Yeah. A policy is, uh, a, a model or some kind of function that maps from observations of the world to actions. So, in reinforcement learning, we often refer to the, uh, the current configuration of the world as the state. So, we say the state kind of encompasses everything you need to fully define where the world is at at the moment, and depending on how we formulate the problem, we might say you either get to see the state or you get to see an observation which is some snapshot, some piece of the state.
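A small Python sketch of these definitions, under made-up assumptions: the state holds position and velocity, the observation exposes only position, and the policy is a hand-written rule rather than a learned function.

```python
from typing import NamedTuple

class State(NamedTuple):
    # Everything needed to fully define where this toy world is at the moment.
    position: float
    velocity: float

def observe(state: State) -> float:
    # An observation is a snapshot of part of the state: here, position only.
    return state.position

def policy(observation: float) -> str:
    # A policy maps observations to actions.
    return "push_left" if observation > 0 else "push_right"

state = State(position=1.5, velocity=-0.2)
action = policy(observe(state))
```

In a fully observed formulation the policy would receive `state` directly instead of `observe(state)`.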

    19. LF

      So, policy is, just includes everything in it in order to be able to act in this world?

    20. SL

      Yes.

    21. LF

      And so, what does off-policy mean if-

    22. SL

      So, yeah. So, the terms on-policy and off-policy refer to how you get your data. So, if you get your data from somebody else who was doing some other stuff, maybe you get your data from some manually programmed, uh, system that was, you know, just running in the world before, that's referred to as off-policy data. But if you got the data by actually acting in the world based on what your current policy thinks is good, we call that on-policy data, and obviously, on-policy data is more useful to you because if your current policy makes some bad decisions, you will actually see that those decisions are bad. Off-policy data, however, might be much easier to obtain because maybe that's all the logged data that you have from before.
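The distinction is about who generated the data, which the following Python sketch tries to show. The one-dimensional world, `scripted_controller`, and `current_policy` are hypothetical names chosen for illustration.

```python
def step(state, action):
    # Toy world: the state is an integer position and reward favors staying near zero.
    next_state = state + action
    return next_state, -abs(next_state)

def collect(behavior, num_steps):
    """Roll out whichever behavior is passed in and log the transitions."""
    state, data = 0, []
    for _ in range(num_steps):
        action = behavior(state)
        next_state, reward = step(state, action)
        data.append((state, action, reward, next_state))
        state = next_state
    return data

def scripted_controller(state):
    # Some other system that was running in the world before: always moves right.
    return 1

def current_policy(state):
    # The learner's own current policy: steps left when right of zero, otherwise right.
    return -1 if state > 0 else 1

off_policy_data = collect(scripted_controller, 5)  # data from somebody else
on_policy_data = collect(current_policy, 5)        # data from acting on the current policy
```

Both calls produce the same kind of log. Only the on-policy log reveals the consequences of the decisions the current policy would actually make.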

    23. LF

      So, we talk about... We offline talked about autonomous vehicles, so you can envision off-policy kind of approaches in, uh, robotic spaces where there's already a ton of robots out there, but they don't get the luxury of being able to, uh, explore based on a reinforcement learning framework. So, how do we make... Again, open question, but how do we make off-policy methods work?

    24. SL

      Yeah. So, uh, this is something that has been kind of a big open problem for a while, and in the last few years, uh, uh, people have made a little bit of progress on that. Um, you know, I can tell you about... Uh, and it's not by any means solved yet, but I can tell you some of the things that, um, for example, we've done to try to, uh, address some of the challenges. Um, it turns out that one really big challenge with off-policy reinforcement learning is that you can't really trust your models to give accurate predictions for any possible action. So, if I've never tried to... If, if in my data set, I never saw somebody steering the car off the road onto the sidewalk, my value function or my model is probably not going to predict the right thing if I ask it what would happen if I were to steer the car off the road onto the sidewalk. So, one of the important things you have to do to get off-policy RL to work is you have to be able to figure out whether a given action will result in a trustworthy prediction or not, and you can use, uh, kind of distribution, uh, uh, estimation methods, kind of density estimation methods, to try to figure that out. So, you could figure out that, well, this action, my model is telling me that it's great, but it looks totally different from any action I've taken before, so my model is probably not correct. Uh, and you can incorporate, uh, regularization, uh, terms into your learning objective that will essentially tell you not to ask those questions that your model is unable to answer.
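One way to picture the trust check described above is a count-based density estimate over logged actions combined with a penalty on the value of rarely seen actions. The Python below is a sketch under stated assumptions, not a specific published algorithm: the action names, `q_values`, `penalty`, and `min_density` are all invented for illustration.

```python
from collections import Counter

# Logged driving data: nobody ever steered onto the sidewalk.
logged_actions = ["stay_in_lane"] * 8 + ["change_lane"] * 2

# Hypothetical learned value estimates, including an over-optimistic
# value for an action the data never covered.
q_values = {"stay_in_lane": 1.0, "change_lane": 0.8, "drive_on_sidewalk": 5.0}

def action_density(action, data):
    # Crude density estimate: fraction of logged actions equal to this one.
    return Counter(data)[action] / len(data)

def penalized_value(action, data, penalty=10.0, min_density=0.05):
    # Regularization: subtract a penalty when the action looks unlike the data,
    # because the value estimate there is probably not trustworthy.
    if action_density(action, data) < min_density:
        return q_values[action] - penalty
    return q_values[action]

naive_choice = max(q_values, key=q_values.get)
safe_choice = max(q_values, key=lambda a: penalized_value(a, logged_actions))
```

The naive maximizer picks the unseen action because its estimate is high, while the penalized version stays within what the data supports.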

    25. LF

      What would lead to breakthroughs in this space do you think? Like, what's needed? Uh, is this a data set question? Do we need to collect big benchmark data sets that allow us to explore this space? Is it, uh, a new kinds of methodologies? Like, what's your sense? Or, or maybe coming together in a space of robotics and defining the right problem to be working on?

    26. SL

      Yeah. I think for off-policy reinforcement learning in particular, it's very much an algorithms question right now, and, uh, you know, this is something that... I think is great because an algorithms question is, you know, that, that just takes some very smart people to get together and think about it really hard. Whereas if it was, like, a, a data problem or a hardware problem, that would take some serious engineering. So that's why I'm pretty excited about that problem because I think that we're in a position where we can make some real progress on it just by coming up with the right algorithms. In terms of which algorithms they could be, um, you know, the, the problems at their core are very related to problems in, you know, things like, like causal inference, right? Because what you're really dealing with is situations where you have a model, a statistical model, that's trying to make predictions about things that it hadn't seen before, uh, and if it's a, if it's a model that's generalizing properly, then it'll make good predictions. If it's a model that picks up on spurious correlations, then it will not generalize properly, and then you can... you have an arsenal of tools you can use. You could, for example, figure out what are the regions where it's trustworthy, or on the other hand, you could try to make it generalize better somehow, or some combination of the two.

    27. LF

      Is there room for mixing sort of, uh, where most of it, m- like 90, 95% is off-policy, you already have the data set, and then you get to send a robot out to do a little exploration? Like what, what's that role of mixing them together?

    28. SL

      Yeah, absolutely. I, I think that this is something that you actually mea- uh, uh, described very well at the beginning of the, o- of our discussion when you talked about the iceberg.

    29. LF

      Yes.

    30. SL

      Like this is the iceberg, the, the 99% of your prior experience, that's your iceberg. You'd use that for off-policy reinforcement learning, and then, of course, if you've never, uh, you know, opened that particular kind of door with that particular lock before, then you have to go out and fiddle with it a little bit, and that's that additional 1% to help you figure out a new task. And I think that's actually like a pretty good recipe going forward.

  9. 1:06:36 - 1:08:15

    Tesla Autopilot

    1. SL

      and quickly.

    2. LF

      So what- what do you think about if we just looked at one particular case study of, uh, Tesla Autopilot that is quickly approaching towards a million vehicles on the road where some percentage of the time, 30-40% of the time is driven using the computer vision, multitask, HydraNet, right? Uh, and then the other percent, that's what they call it, HydraNet. (laughs) Uh, the- the other percent is human controlled. From the human side, how can we use that data? What's your sense? So like, what's the signal? Do you have ideas in this autonomous vehicle space when people can lose their lives? You know, it's a, it's a safety critical environment. So how do we use that data?

    3. SL

      So, um, I think that, uh, actually the kind of problems that come up when we want, um... systems that are, uh, reliable and that can kind of understand the limits of their capabilities, they're actually very similar to the kind of problems that come up when we have, when we're doing off-policy reinforcement learning. So as I mentioned before, in off-policy reinforcement learning, the big problem is you need to know when you can trust the predictions of your model. Because if you, if you're trying to evaluate some pattern of behavior for which your model doesn't give you an accurate prediction, then you shouldn't use that to, to modify your policy. And it's actually very similar to the problem that we're faced when we actually then deploy that thing and we want to decide whether we trust it in the moment or not. Um, so perhaps we just need to do a better job of figuring out that part, and that's a very deep research question, of course.

    4. LF

      Right.

    5. SL

      But it's also a question that a lot of people are working on, so I'm pretty optimistic that we can make some progress on that over the next few years.

  10. 1:08:15 - 1:13:46

    Simulation in reinforcement learning

    1. LF

      What's the role of simulation in reinforcement learning, d- uh, deep reinforcement learning, reinforcement learning? Like, how essential is it? It's been essential for the breakthroughs so far, for some interesting breakthroughs. Do you think it's a crutch that we rely on? I mean, again, this connects to our off-policy, uh, discussion, but do you think we could ever get rid of simulation or do you think simulation will actually take over, will create more and more realistic simulations that will allow us to, uh, to solve actual real-world problems, like transfer the models we learn in simulation to real-world problems?

    2. SL

      Mm-hmm. Yeah. I think that simulation is a very pragmatic tool that we can use to get a lot of useful stuff to work right now. But I think that in the long run, we will need to build machines that can learn from real data because that's the only way that we'll get them to improve perpetually. Because if we can't have our machines learn from real data, if they have to rely on simulated data, eventually the simulator becomes the bottleneck. In fact, this is a general thing. If your machine has any bottleneck that is built by humans and that doesn't improve from data, it will eventually be the thing that holds it back. Uh, and if you're entirely reliant on your simulator, that'll be the bottleneck. If you're entirely reli- reliant on a manually designed controller, that's gonna be the bottleneck. So, simulation is very useful, it's very pragmatic, but it's not a substitute for being able to utilize real experience. Uh, and this is, by the way, this is something that I think is quite, uh, relevant now, especially in the context of some of the things we've discussed because some of these, uh, kind of scaffolding issues that I mentioned, things like the broken dishes and the unknown reward function, like, these are not problems that you would ever stumble on, uh, when working in a purely simulated kind of environment. But they become very apparent when we try to actually run these things, uh, in the real world.

    3. LF

      To throw a brief wrench into our discussion, let me ask, do you think we're living in a simulation?

    4. SL

      Oh, I have no idea. (laughs)

    5. LF

      Do you think that's a useful thing to even think about, about the, sort of the, the, the, the fundamental physics nature of reality? Uh, or another perspec- The reason I think the simulation hypothesis is interesting is, is to think about how difficult is it to create sort of a virtual reality game-type situation that will be sufficiently convincing to us humans, or sufficiently enjoyable that would, we wouldn't want to leave? I mean, that's actually a practical engineering (laughs) challenge. And I, I, I personally really enjoy virtual reality, but it's quite far away, but I kinda think about what would it take for me to want to spend more time in virtual reality versus the real world? And that's a, that's a, sort of a nice, clean question because at that point, we've reached... If I want to live in a virtual reality, that means we're just a few years away where majority of the population lives in a virtual reality, and that's how we create the simulation, right? You don't need to actually simulate the, you know, the quantum gravity and just every aspect of the, of the universe. And that's a re- and that's an interesting question for reinforcement learning, too, is if we want to make sufficiently realistic simulations that may i- i- it blend the difference between sort of the real world and the simulation, thereby just all, the sum of the things we've been talking about, kind of the problems go away if we can create actually interesting, rich simulations.

    6. SL

      It's an interesting question, and actually, I think your question casts your previous question in a very interesting light.

    7. LF

      Yeah.

    8. SL

      Because in, in some ways, asking whether we can, um... Well, the more prac- the more kind of practical ver- version is like, you know, can we build simulators that are good enough to train essentially AI systems that will work in the world, and it's kind of interesting to think about this, about what this implies. If true, it kind of implies that it's easier to create the universe than it is to create a brain. And that seems like-

    9. LF

      And th-

    10. SL

      Put this way, it seems kinda weird.

    11. LF

      The aspect of the simulation most interesting to me is the simulation of the humans. That seems to be a complexity that makes the robotics problem harder. Now, I don't know if every robotics person agrees with that notion. Just, uh, as a quick aside, what are your thoughts about when the human enters the picture of the robotics problem? How does that change the reinforcement learning problem, the learning problem in general?

    12. SL

      Yeah, I think that's kind of a complex question, and, um, I guess my hope for a while had been that if we build these, uh, robotic learning systems that are multitask, that utilize lots of prior data and that learn from their own experience, the bit where they have to interact with people will be perhaps handled in much the same way as all the other bits. So, if they have prior experience in interacting with people and they can learn from their own experience of interacting with people for this new task, maybe that'll be enough. Um, now, of course, if it's not enough, there are many other things we can do, and there's quite a bit of research in that area. But I think it's worth a shot to see whether the, uh, the multi-agent interaction, the ability to understand that other, uh, beings in the world have their own goals and intentions and thoughts and so on, whether that kind of understanding can emerge automatically from simply learning to do things and maximize utility.

    13. LF

      That information arises from the data.

  11. 1:13:46 - 1:16:03

    Can we learn gravity from data?

    1. LF

      You- you've said something about gravity, sort of, um, that you don't need to explicitly inject anything into the system, that it can be learned from the data. And gravity is an example of something that could be learned from data, so like the physics of the world. Like, what... (laughs) What are the limits of what we can learn from data? Do you really, do you think we can... Uh, so, uh, a very simple, clean way to ask that is, do you really think we can learn gravity from just data? The idea, the- the laws of gravity.

    2. SL

      So, so something that I think is, is a common, um, kind of pitfall when thinking about prior knowledge and learning is, um, to assume that just because we know something, then that it's better to tell the machine about that rather than-

    3. LF

      Right.

    4. SL

      ... have it figure it out on its own. In many cases, things that are important that affect many of the events that the machine will experience are actually pretty easy to learn. Like, you know, if things, if every time you drop something, it falls down, like yeah, you might not get the, you know, you might get kind of the- the Newton's version, not Einstein's version, but it'll be pretty good and it will probably be sufficient for you to act rationally in the world because you see the phenomena all the time. So, things that are readily apparent from the data, we might not need to specify those by hand. It might actually be easier to let the machine figure them out.
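As a rough illustration of the point above, here is a minimal sketch of recovering a Newtonian constant purely from repeated "drop" observations. Everything in it is an assumption made for illustration: the data is synthetic, the names `simulate_drops` and `fit_g` are hypothetical, and the fit is told the functional form d = ½·g·t² up front, which a real learning system would not be given.

```python
import random

def simulate_drops(n=200, g_true=9.81, noise_std=0.01, seed=0):
    """Generate synthetic (fall_time, distance) pairs with measurement noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(0.1, 1.0)                              # fall time in seconds
        d = 0.5 * g_true * t * t + rng.gauss(0.0, noise_std)   # noisy distance in meters
        data.append((t, d))
    return data

def fit_g(data):
    """Least-squares estimate of g for the assumed model d = 0.5 * g * t**2."""
    num = sum(d * t**2 for t, d in data)
    den = sum(t**4 for t, _ in data)
    return 2.0 * num / den

data = simulate_drops()
g_hat = fit_g(data)
print(f"estimated g: {g_hat:.3f} m/s^2")
```

Because every observation carries information about the same regularity, a couple of hundred noisy drops are enough to pin the constant down closely, which is the sense in which frequently observed phenomena are "easy to learn".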

    5. LF

      It just feels like that there might be a space of many local, uh, local minima in terms of theories of this world that we would discover and get stuck on.

    6. SL

      Yeah, of course.

    7. LF

      That Newtonian mechanics is not necessarily easy to come by.

    8. SL

      Yeah, and well, and in fact, uh, in- in some fields of science, for example, human civilization is itself full of these local optima. So for example, if you think about how people, uh, try to figure out, uh, biology and medicine, you know, for the longest time, the kind of rules, like the kind of, uh, principles that serve us very well in our day-to-day lives actually serve us very poorly in understanding, uh, medicine and biology. We had kind of very, uh, superstitious and weird ideas about how the body worked until the advent of the modern scientific method. So that does seem to be, you know, a failing of this approach, but it's also a failing of human intelligence, arguably.

  12. 1:16:03 - 1:17:39

    Self-play

    1. SL

    2. LF

      Maybe a small aside, but, you know, the idea of self-play is fascinating in reinforcement learning, sort of creating a competitive context in which agents can play against each other at the same skill level, thereby increasing each other's skill level. It seems this kind of self-improving mechanism is exceptionally powerful in the contexts where it can be applied. First of all, is it beautiful to you that this mechanism works as well as it does, and also, can it be generalized to other contexts, like in the robotic space or anything that's applicable to the real world?

    3. SL

      I think that, um, it's a very interesting idea, but I suspect that the bottleneck to actually generalizing it to the robotic setting is actually going to be the same as the bottleneck for everything else, that we need to be able to build machines that can get better and better through natural interaction with the world. And once we can do that, then they can go out and play with each other, they can play with people, they can play with the natural environment. Uh, but before we get there, we've got all these other problems we have to get out of the way.

    4. LF

      So, there's no shortcut around that. You have to interact with the natural environment that...

    5. SL

      Well, because in a, in a self-play setting, you still need a mediating mechanism. So the- the reason that, um, you know, self-play works for a board game is because the rules of that board game mediate the interaction between the agents. So, the kind of intelligent behavior that will emerge depends very heavily on the nature of that mediating mechanism.
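To make the "mediating mechanism" idea concrete, here is a small sketch of self-play in a board-game-like setting, using a single-pile game of Nim (take 1 to 3 stones, whoever takes the last stone wins). The game choice, the function names, and the tabular update are all illustrative assumptions, not anything described in the conversation. The only thing connecting the two sides of the game is the rule set in `legal_moves` and the win condition.

```python
import random

MAX_TAKE = 3  # rule of the game: a player may take 1..3 stones per turn

def legal_moves(pile):
    """The rules mediate the interaction: they define what each side may do."""
    return range(1, min(MAX_TAKE, pile) + 1)

def greedy_move(q, pile):
    """Best known move for the player about to move from this pile size."""
    return max(legal_moves(pile), key=lambda a: q.get((pile, a), 0.0))

def train_self_play(max_pile=12, episodes=20000, epsilon=0.3, seed=0):
    """One shared value table plays both sides of every game."""
    rng = random.Random(seed)
    q = {}  # q[(pile, take)] = value for the player about to move
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            if rng.random() < epsilon:
                take = rng.choice(list(legal_moves(pile)))  # explore
            else:
                take = greedy_move(q, pile)                 # exploit
            rest = pile - take
            if rest == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # Zero-sum game: my value is the negative of the opponent's best value.
                target = -q.get((rest, greedy_move(q, rest)), 0.0)
            q[(pile, take)] = target  # deterministic game, so a full backup is safe
            pile = rest
    return q

q = train_self_play()
for pile in (5, 6, 7):
    print(pile, "->", greedy_move(q, pile))
```

The learned policy leaves the opponent a multiple of four stones whenever it can, a strategy that comes entirely from the rules rather than from anything told to the agent. Change `MAX_TAKE` or the win condition and a different behavior emerges, which is the dependence on the mediating mechanism described above.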

  13. 1:17:39 - 1:27:01

    Reward functions

    1. LF

      So, on the side of reward functions, that's, uh, coming up with good reward functions seems to be the thing that we associate with, uh, general intel- like, h- human beings seem to value the idea of developing our own reward functions, of, uh, you know, arriving at meaning and so on. And yet for reinforcement learning, we often kind of specify that's the given. What's your sense of how we develop reward func- you know, good reward functions?

    2. SL

      Yeah, I think that's a very, uh, complicated and very deep question. And you're completely right that classically in reinforcement learning this question has, uh, I guess, kind of been treated as a non-issue, that you sort of treat the reward as this external thing that comes from some other bit of your biology and you kind of don't worry about it. And I do think that that's actually, you know, a little bit of a mistake, that we should worry about it. And we can approach it in a few different ways. We can approach it, for instance, by thinking of rewards as a communication medium. We can say, "Well, how does a person communicate to a robot what its objective is?" You can approach it also as sort of more of an intrinsic, uh, motivation. You could say, "Can we write down kind of a general objective that leads to, uh, good capability?" Like, for example, can you write down some objective such that even in the absence of any other task, if you maximize that objective, you'll sort of learn useful things? Uh, this is, uh, something that has sometimes been called unsupervised reinforcement learning, which I think is a really fascinating, uh, area of research, especially today. Uh, we've done a bit of work on that recently. One of the things we've studied is whether, uh, we can have some notion of, uh, unsupervised reinforcement learning by means of, um, you know, information theoretic quantities, like for instance, minimizing a Bayesian measure of surprise. This is an idea that was, uh, you know, pioneered actually in the computational neuroscience community by folks like, uh, Karl Friston. Uh, and we've done some work recently that shows that you can actually learn pretty interesting skills by essentially behaving in a way that allows you to make accurate predictions about the world. It seems a little circular, like do the things that will lead to you getting the right answer (laughs) ... for prediction. But, you know, by doing this, you can sort of discover stable niches in the world. You can discover that if you're playing Tetris, then correctly, uh, you know, clearing the rows will let you play Tetris for longer and keep the board nice and clean, which sort of satisfies some desire for order in the world, uh, and as a result, get some degree of leverage over your domain. Um, so we're exploring that pretty actively right now.
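A toy sketch of the surprise-minimization idea might look like the following. It is not the method from the work mentioned above. It assumes a hypothetical two-action environment (`step`), a Laplace-smoothed count model of visited states, and a reward equal to the log-probability the agent's own model assigns to the next state, so low surprise means high reward.

```python
import math
import random

N_STATES = 10

def step(action, rng):
    """Hypothetical environment: action 0 settles into state 0, action 1 drifts randomly."""
    return 0 if action == 0 else rng.randrange(N_STATES)

def run(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [1] * N_STATES   # Laplace-smoothed state visit counts (the agent's world model)
    values = [0.0, 0.0]       # running mean reward for each action
    pulls = [0, 0]            # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                     # explore
        else:
            action = 0 if values[0] >= values[1] else 1   # exploit
        state = step(action, rng)
        # Reward is negative surprise: log-probability of the observed state
        # under the model built from everything seen so far.
        reward = math.log(counts[state] / sum(counts))
        counts[state] += 1
        pulls[action] += 1
        values[action] += (reward - values[action]) / pulls[action]
    return values, pulls

values, pulls = run()
print("mean reward per action:", values)
print("times each action was taken:", pulls)
```

With no task reward at all, the agent ends up preferring the action that keeps it in a predictable state, a very small analogue of the "stable niche" described above. The circularity mentioned in the conversation is visible here too: the model becomes accurate partly because the agent steers toward what the model already predicts.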

Episode duration: 1:37:30

Transcript of episode kxi-_TT_-Nc
