Pieter Abbeel: Deep Reinforcement Learning | Lex Fridman Podcast #10
- 0:00 – 15:00
- LFLex Fridman
The following is a conversation with Pieter Abbeel. He's a professor at UC Berkeley and the director of the Berkeley Robotics Learning Lab. He's one of the top researchers in the world working on how we make robots understand and interact with the world around them, especially using imitation and deep reinforcement learning. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence Podcast. If you enjoy it, please subscribe on YouTube, iTunes, or your podcast provider of choice, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now, here's my conversation with Pieter Abbeel. You've mentioned that if there was one person you could meet, it would be Roger Federer. So let me ask, when do you think we'll have a robot that fully autonomously can beat Roger Federer at tennis? Uh, R- Ro- Roger Federer-level player at tennis?
- PAPieter Abbeel
Huh, well, first, if you could make it happen for me to meet Roger, let me know. (laughs)
- LFLex Fridman
(laughs)
- PAPieter Abbeel
Um, in terms of getting a, a robot to, um, beat him at tennis, it's kind of an interesting question because f- for a lot of the challenges we think about in AI, the software is really the missing piece. But for something like this, the hardware is nowhere near either. Like, to really have a robot that can physically run around, the Boston Dynamics robots are starting to get there, but still not really human-level ability to, to run around, and then swing a racket... Let's see-
- LFLex Fridman
So you think that's a hardware problem?
- PAPieter Abbeel
I don't think it's a hardware problem only. I think it's a hardware and a software problem. I think it's both. And I think they'll, they'll have independent progress. So I'd say the, the hardware, maybe in 10, 15 years. On-
- LFLex Fridman
On clay, not grass. I mean, grass is probably harder.
- PAPieter Abbeel
(laughs) With the sliding?
- LFLex Fridman
Yeah.
- PAPieter Abbeel
Well, the clay, I'm, I'm not sure what's harder, grass or clay. The clay re- involves sliding, which might be harder to master, actually. Yeah.
- LFLex Fridman
But y- you're not limited to, uh, bipedal. I mean, I'm sure there's no r-
- PAPieter Abbeel
Well, if we can build the machine, it's a whole different question, of course, you know. If we can, if we can say, okay, this robot can be on wheels, it can move around on wheels, and can be designed differently, then I think that, that can be done sooner, probably, than a, a full humanoid type of setup.
- LFLex Fridman
What do you think of swinging a racket? So you've worked on, uh, basic manipulation.
- PAPieter Abbeel
Mm-hmm.
- LFLex Fridman
How hard d- do you think is the task of swinging a racket with a, be able to hit a nice backhand or a forehand?
- PAPieter Abbeel
Let's say, let's say, let's say we just set up stationary, uh, a, a nice robot arm, let's say, you know, a standard industrial arm, and it can watch the ball come and then swing the racket. It's a good question. I'm not sure it would be super hard to do. I mean, I'm sure it would require a lot, if we do it with re- with, with reinforcement learning, it would require a lot of trial and error. It's not gonna swing it right the first time around. But yeah, I don't, I don't see why it couldn't.
- LFLex Fridman
So you think it's learnable?
- PAPieter Abbeel
Swing it the right way. I think it's learnable. I think if you set up a ball machine, let's say, on one side, and then a robot with a tennis racket on the other side, I think it's learnable, and maybe a little bit of pre-training and simulation. Yeah, I, I think that's, I think that's feasible. I thi- I think the swinging the racket is feasible. It'd be very interesting to see how much precision it can get, 'cause, I mean, that's, that's where, I mean, some of the human players can hit it on the lines, which is very high precision.
- LFLex Fridman
With spin. The spin-
- PAPieter Abbeel
With spin.
- LFLex Fridman
... is, is, is an interesting, uh, whether RL can learn to put a spin on the ball.
- PAPieter Abbeel
Oh, you got me interested. Maybe someday we'll set this up. (laughs)
- LFLex Fridman
Some day, sure. But then-
- PAPieter Abbeel
You got me intrigued. (laughs)
- LFLex Fridman
(laughs) Your answer is basically, okay, for this problem, it sounds fascinating, but for the general problem of a tennis player, we might be a little bit farther away. What's the most impressive thing you've seen a robot do in the physical world?
- PAPieter Abbeel
So physically, for me, it's the Boston Dynamics videos always just ring home and I'm just super impressed. Recently, the robot running up the stairs, doing the parkour-type thing. I mean, yes, we don't know what's underneath. They don't really write a lot of detail, but even if it's hardcoded underneath, which it might or might not be, just the physical abilities of doing that parkour, that's a very impressive robot right there.
- LFLex Fridman
So have, have you met Spot Mini or any of those robots in person?
- PAPieter Abbeel
Met Spot Mini last year in, in April at the MARS event that Jeff Bezos organizes.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
They brought it out there. And it was nicely following around Jeff. When Jeff left the room, they had it follow him along, which is pretty impressive.
- 15:00 – 30:00
- LFLex Fridman
we think of as the real world?
- PAPieter Abbeel
So, I think where things get really tricky in the real world compared to the things we've looked at so far with great success in reinforcement learning is the time scales, which takes us to an extreme. So, when you think about real world, I mean, I don't know, maybe some student decided to do a, a PhD here, right? Okay, that's, that's a decision, that's a very high level decision. But if you think about their lives, I mean, any person's life, it's a sequence of muscle fiber contractions and relaxations, and that's how you interact with the world. And that's a very high frequency control thing. But it's ultimately what you do and how you affect the world, um, until I guess we have brain readings, you can maybe do it slightly differently. But typically that's how you affect the world. And the decision of doing a PhD is like so abstract relative to what you're actually doing-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... in the world. And I think that's where credit assignment becomes just completely beyond what any current RL algorithm can do. And we need hierarchical reasoning at a level that is just not available at all yet.
- LFLex Fridman
Where do you think we can pick up hierarchical reasoning? By which mechanisms?
- PAPieter Abbeel
Yeah. So maybe let me highlight what I, I think the limitations are of what already was done 20, 30 years ago. In fact, you'll find reasoning systems that reason over relatively long horizons. But the problem is that they were not grounded in the real world.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
So people would have to hand design, uh, some kind of logical dynamical descriptions of the world and that didn't tie into perception. And so they didn't tie into real objects and so forth. And so that, that w- that was a big gap. Now with deep learning, we start having the ability to really see with sensors, process that, and understand what's in the world. And so it's a good time to try to bring these things together. One... I, I see a few ways of getting there. One way to get there would be to say deep learning can get bolted on somehow to some of these more traditional approaches. Now, bolted on would probably mean you need to do some kind of end-to-end training where you say, "My deep learning processing somehow leads to a representation that in turn uses some kind of traditional underlying dynamical, uh, systems that can be used for planning." And that's, for example, the direction Aviv Tamar and Thanard Kurutach here have been pushing with Causal InfoGAN, and of course other people too.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
That, that's, that's one way. Can we somehow force it into the form factor that is amenable to reasoning? Another direction we've been thinking about for a long time and didn't make any progress on was more information theoretic approaches. So the idea there was that what it means to take high level action is to take and choose a latent variable now that tells you a lot about what's gonna be the case in the future, 'cause that's what it means to, to take a high level action.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
I say, "Okay, what... I decide I'm gonna navigate to the gas station 'cause I need to get gas for my car." Well, that'll now take five minutes to get there. But the fact that I get there, I could already tell that from the high level...... action I took much earlier.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
Um, that we had a very hard time getting success with. Um, not saying it's a dead end necessarily, but we had a lot of trouble getting that to work. And then we started revisiting the notion of, what are we really trying to achieve? Um, what we're trying to achieve is not necessarily hierarchy per se, but you could think about, what does hierarchy give us? Um, what we hope it would give us is better credit assignment. Um, and what does better credit assignment give us? It gives us, um, faster learning.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
Right. And so faster learning is ultimately maybe what we're after, and so that's what we ended up with the RL squared paper on learning to reinforcement learn, which at the time Rocky Duan led. Um, and that's exactly the meta-learning approach where you say, "Okay, we don't know how to design hierarchy, but we know what we wanna get from it, let's just end-to-end optimize for what we wanna get from it-"
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
"... and see if it might emerge." And we saw things emerge. The maze navigation had consistent motion down hallways, which is what you want. A hierarchical control should say, "I wanna go down this hallway," and then when there is an option to take a turn, I can decide whether to take a turn or not, and repeat. Even had the notion of where have you been before or not-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... to not revisit places you've been before. Um, it still didn't scale yet to the real world kind of scenarios I think you had in mind, but it was some sign of life that maybe you can meta-learn these hierarchical concepts.
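(Editor's sketch.) The RL² setup described above can be made concrete: the policy is a recurrent network whose input at each timestep includes not just the observation but the previous action and reward, so a fast, within-trial learning rule can emerge in the hidden state. The sketch below only constructs that augmented input stream for a hypothetical two-armed bandit trial; the arm probabilities, the placeholder random policy, and all names here are illustrative stand-ins, not details from the RL² paper.

```python
import random

random.seed(0)
arm_probs = [0.2, 0.8]          # hidden task: arm 1 pays off more often

def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v

# In RL^2 the recurrent policy sees (observation, previous action,
# previous reward) at every step. We build that sequence for 5 steps.
inputs = []
prev_action, prev_reward = one_hot(0, 2), 0.0   # dummy values at t = 0
for t in range(5):
    obs = [0.0]                  # bandits have no meaningful observation
    inputs.append(obs + prev_action + [prev_reward])
    action = random.randrange(2)              # placeholder for the RNN policy
    reward = 1.0 if random.random() < arm_probs[action] else 0.0
    prev_action, prev_reward = one_hot(action, 2), reward

# Each input vector has length 1 (obs) + 2 (one-hot action) + 1 (reward) = 4.
```

A trained recurrent policy fed this stream can, in principle, infer which arm is better from its own reward history and exploit it, which is exactly the "learning to reinforcement learn" behavior described above.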
- LFLex Fridman
I mean, it seems like, uh, through these meta-learning concepts, we get at the, what I think is one of the hardest and most important problems of AI, which is transfer learning. So it's generalization. How far along this journey towards building general systems are we in being able to do transfer learning well? So there are some signs that you can generalize a little bit, but do you think we're on the right path or s- totally different breakthroughs are needed to be able to transfer knowledge between different learned models?
- PAPieter Abbeel
Yeah. I'm- I'm pretty torn on this, in that I think there are some very impress-
- LFLex Fridman
Depends on the day? (laughs)
- PAPieter Abbeel
(laughs) Well, th- th- there's just some very impressive results already.
- LFLex Fridman
Right.
- PAPieter Abbeel
Right? I mean-
- LFLex Fridman
Yes. Absolutely.
- PAPieter Abbeel
... I would say even with the initial kinda big breakthrough in 2012 with AlexNet, right? The initial, the initial thing is, "Okay, great. This does better on ImageNet, hence image recognition." But then immediately thereafter, there was of course the notion that, wow, what was learned on ImageNet and you now wanna solve a new task, you can fine tune AlexNet for new tasks, and that was often found to be the even bigger deal, that you learn something that was reusable, which was not often the case before. Usually machine learning, you learn something for one scenario and that was it.
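(Editor's sketch.) The fine-tuning recipe described here - keep the pretrained features frozen, train only a small new head for the new task - can be illustrated with a toy example. Everything below is a stand-in: a hand-fixed tanh projection plays the role of AlexNet's convolutional layers, and a synthetic 2-D classification problem plays the role of the new task, chosen only so the example is self-contained and runnable.

```python
import math
import random

random.seed(0)

# Stand-in for a pretrained backbone: a fixed (frozen) feature map.
def frozen_features(x):
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

# New task: classify points by whether x0 + x1 > 0.
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [1.0 if x[0] + x[1] > 0 else 0.0 for x in data]

# Fine-tuning: train only a small logistic-regression "head"
# on top of the frozen features.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(200):
    for x, y in zip(data, labels):
        f = frozen_features(x)
        z = w[0] * f[0] + w[1] * f[1] + b
        p = 1.0 / (1.0 + math.exp(-z))    # sigmoid
        g = p - y                         # gradient of log-loss w.r.t. z
        w[0] -= lr * g * f[0]
        w[1] -= lr * g * f[1]
        b -= lr * g

correct = 0
for x, y in zip(data, labels):
    f = frozen_features(x)
    z = w[0] * f[0] + w[1] * f[1] + b
    correct += (z > 0) == (y > 0.5)
accuracy = correct / len(data)
```

Because the frozen features already encode the relevant structure, the tiny head learns the new task with far less data than training from scratch would need - which is the reusability point being made above.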
- LFLex Fridman
Yeah, and that's really exciting. I mean, that's- that's a huge application. That's probably the biggest success of transfer learning to date, if- in terms of scope and impact.
- PAPieter Abbeel
Mm-hmm. That was a huge breakthrough, and then recently I feel like similar kind of... By- by scaling things up, it seems like this has been expanded upon, like people training even bigger networks, they might transfer even better. If you looked at, for example, uh, some of the OpenAI results on language models and some of the recent Google results on language models, they're learned for just prediction-
- 30:00 – 42:51
- PAPieter Abbeel
for achieving a goal but rewards for making progress, and that becomes time-consuming. And once you're starting to do that, let's say you want a robot to do something, you need to give all this detailed reward, well, why not just give a demonstration?
- LFLex Fridman
Right.
- PAPieter Abbeel
Because why not just show the robot? And now the question is how do you show the robot? One way to show is to tele-operate the robot and then the robot really experiences things, and that's nice 'cause that's really high signal-to-noise ratio data and we've done a lot of that, and you teach a robot skills-... in just 10 minutes, you can teach a robot a new basic skill, like, okay, pick up the bottle, place it somewhere else.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
That's a skill. No matter where the bottle starts, maybe it always goes onto a target or something.
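(Editor's sketch.) The teleoperation recipe described here is essentially behavioral cloning: record (state, action) pairs while a human drives the robot, then fit a policy to the demonstrated actions by supervised learning. A minimal sketch, with a hand-coded proportional controller standing in for the human teleoperator and a hypothetical 1-D "move to target" skill standing in for the bottle-placing task:

```python
import random

random.seed(0)

# Teleoperated demonstrations for a toy 1-D skill:
# state = (gripper position, target position), action = commanded velocity.
# The "expert" here is a proportional controller standing in for a human.
def expert_action(pos, target):
    return 0.5 * (target - pos)

demos = []
for _ in range(100):
    pos = random.uniform(-1, 1)
    target = random.uniform(-1, 1)
    demos.append(((pos, target), expert_action(pos, target)))

# Behavioral cloning: fit a linear policy a = w0*pos + w1*target
# by SGD on squared error against the demonstrated actions.
w, lr = [0.0, 0.0], 0.1
for _ in range(500):
    for (pos, target), a in demos:
        pred = w[0] * pos + w[1] * target
        g = pred - a
        w[0] -= lr * g * pos
        w[1] -= lr * g * target

# The cloned policy should recover roughly w = (-0.5, 0.5),
# i.e. the expert's proportional gains.
```

With a realizable linear policy the cloned weights recover the teleoperator's controller exactly; a real robot skill would use camera observations and a neural network, but the supervised learn-from-demonstration structure is the same.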
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
That's fairly easy to teach a robot with teleop. Now, what's even more interesting if you can now teach a robot through third-person learning, where the robot watches you do something-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... and doesn't experience it, but just watches it and says, "Okay, well, if you're showing me that, that means I should be doing this. And I'm not gonna be using your hand, because I don't get to control your hand, but I'm gonna use my hand and do that mapping." And so that's where I think one of the big breakthroughs has happened this year, this was led by Chelsea Finn here, um, it's almost like learning a machine translation for demonstrations-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... where you have a human demonstration and the robot learns to translate it into what it means for the robot to do it.
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
And that was a meta-learning formulation, learn from one to get the other, um, and that, I think, opens up a lot of opportunities to learn a lot more quickly.
- LFLex Fridman
So my focus is on autonomous vehicles. Do you think this approach of third person watching is ama- uh, the autonomous driving is amenable to this kind of approach?
- PAPieter Abbeel
So for autonomous driving, I would say it's... Third-person is slightly easier, and the reason I'm gonna say slightly easier to do with third-person is because the car dynamics are very well understood. So the, the-
- LFLex Fridman
Easier than, uh, first person, you mean? Or, or-
- PAPieter Abbeel
I think, I think the distin-
- LFLex Fridman
... easier than-
- PAPieter Abbeel
So I think the distinction between third-person and first-person is not a very important distinction for autonomous driving.
- LFLex Fridman
Yeah.
- PAPieter Abbeel
They're very similar, because the distinction is really about who turns the steering wheel-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... and, or maybe I'll, I'll, let me put it differently. How to get from a point where you are now to a point, let's say, a couple meters in front of you, and that's a problem that's very well understood, and that's the only distinction between third and first-person there. Whereas with robot manipulation, interaction forces are very complex-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... and it's still a very different thing. Um, for autonomous driving, I think there is still the question, imitation versus RL. So imitation gives you a lot more signal. I think where imitation is lacking and needs some extra machinery is it doesn't, in its normal format, doesn't think about goals or objectives. And of course there are versions of imitation learning, inverse reinforcement learning type imitation learning-
- LFLex Fridman
Mm-hmm.
- PAPieter Abbeel
... which also thinks about goals. I think then we're getting much closer, but I think it's very hard to think of a fully reactive car generalizing well, if it really doesn't have a notion of objectives to generalize well to the kind of generality that you would want. You'd want more than just that reactivity that you get from just behavioral cloning/supervised learning.
- LFLex Fridman
So a lot of the work, whether it's self play or even imitation learning, would benefit significantly from simulation, from effective simulation, and you're doing a lot of stuff in the physical world and in simulation. Do you have hope for greater and greater power of simulation loo- being boundless eventually, to where most of what we need to operate in the physical world would, could be simulated to a degree that's directly transferable to the physical world? Or are we still very far away from that?
- PAPieter Abbeel
So I think we could even rephrase that question in some sense.
- LFLex Fridman
Please. (laughs)
Episode duration: 42:44
Transcript of episode l-mYLq6eZPY