Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81
- 0:00 – 2:26
Introduction
- LFLex Fridman
The following is a conversation with Anca Dragan, a professor at Berkeley working on human-robot interaction: algorithms that look beyond the robot's function in isolation and generate robot behavior that accounts for interaction and coordination with human beings. She also consults at Waymo, the autonomous vehicle company, but in this conversation, she's 100% wearing her Berkeley hat. She's one of the most brilliant and fun roboticists in the world to talk with. I had a tough and crazy day leading up to this conversation, so I was a bit tired, even more so than usual. (laughs) But, uh, almost immediately as she walked in, her energy, passion, and excitement for human-robot interaction was contagious. So I had a lot of fun and really enjoyed this conversation. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for solving a hard problem that, in the end, provides an easy interface that takes a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with Anca Dragan.
- 2:26 – 5:32
Interest in robotics
- LFLex Fridman
When did you first fall in love with robotics?
- ADAnca Dragan
I think it was a very gradual process, and it was somewhat accidental, actually, because I first started getting into programming when I was a kid, and then into math, and then I decided computer science was the thing I was gonna do, and then in college I got into AI, and then I applied to the Robotics Institute at Carnegie Mellon. And I was coming from this little school in Germany that nobody had heard of, but I had spent an exchange semester at Carnegie Mellon, so I had letters from Carnegie Mellon. So, you know, MIT said no, Berkeley said no, Stanford said no. That was the only place I got into, so I went there to the Robotics Institute. And I thought that robotics is a really cool way to actually apply the stuff that I knew and loved, like optimization. So that's how I got into robotics. I have a better story of how I got into cars, which is, you know, I used to do mostly manipulation in my PhD, but now I do kind of a bit of everything application-wise, including cars. And I got into cars because I was here in Berkeley, while I was still a PhD student, for RSS 2014.
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
Peter Abbeel organized it.
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
And he arranged for, it was Google at the time, to give us rides-
- LFLex Fridman
Yeah.
- ADAnca Dragan
... in self-driving cars. And I was in a robot, and it was just making decision after decision, the right call, and it was so amazing.
- LFLex Fridman
(laughs)
- ADAnca Dragan
So it was a whole different experience, right? Just, I mean, manipulation is so hard you can't do anything. And there it was
- LFLex Fridman
Was it the most magical robot you've ever met? So, like, for me too, uh, meeting the Google self-driving car for the first time was like a transformative moment. Um, I had two moments like that: that, and Spot Mini. I don't know if you've met Spot Mini from Boston Dynamics.
- ADAnca Dragan
(laughs) .
- LFLex Fridman
I felt like, I felt like I fell in love or something, 'cause I thought, like, I know how a Spot Mini works, right? It's just, I mean, there's nothing truly special. It's great engineering work, but the anthropomorphism that went on in my brain.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
Like, it came to life. It had a little arm, and it looked at me. He, she looked at me, you know. I don't know, there's a magical connection there, and, uh, it made me realize, wow, robots can be so much more than things that manipulate objects. They can be things that have a human connection. Do you have-
- ADAnca Dragan
(laughs) .
- LFLex Fridman
Was, uh, the self-driving car the moment, like w- was there a robot that truly sort of inspired you?
- ADAnca Dragan
That was... I remember that experience very viscerally, riding in that car and being just wowed. Um, they gave us a sticker that said, "I rode in a self-driving car."
- LFLex Fridman
(laughs) .
- ADAnca Dragan
And it had this cute little firefly on it.
- LFLex Fridman
Yeah. Yes.
- ADAnca Dragan
And, or logo or something.
- LFLex Fridman
Oh, that was like the smaller one, like the firefly.
- ADAnca Dragan
Yeah, yeah, the really cute one.
- LFLex Fridman
The cute one, the cute one.
- ADAnca Dragan
Yeah. And, and I put it on my laptop and I had that for years until I finally changed my laptop out
- 5:32 – 7:32
Computer science
- ADAnca Dragan
and, you know.
- LFLex Fridman
What about if we walk back? You mentioned optimization. Uh, like, what beautiful ideas inspired you in math and computer science early on? Like, why get into this field? Math seems like a cold and boring field. Like, what was exciting to you about it?
- ADAnca Dragan
The thing is, I liked math from very early on, from fifth grade is when I got into the Math Olympiad and all of that.
- LFLex Fridman
Oh, you competed too, oh?
- ADAnca Dragan
Yeah. This, in Romania, it's like our national sport, you, you gotta understand. (laughs)
- LFLex Fridman
(laughs)
- ADAnca Dragan
So I got into that fairly early, and it was a little... maybe too much just theory, with no kind of... I didn't really have a goal (laughs) other than understanding, which is cool. I always like learning and understanding, but there was no, "Okay, what am I applying this understanding to?" And so I think that's how I got more heavily into computer science, 'cause it was kind of math meets something you can do tangibly in the world.
- LFLex Fridman
Do you remember, like, the first program you've written?
- ADAnca Dragan
Okay. The first program w- I've written-
- LFLex Fridman
Uh-oh.
- ADAnca Dragan
... (laughs) was ... I kind of do. It was in QBasic in fourth grade. (laughs)
- LFLex Fridman
Wow.
- ADAnca Dragan
And, (laughs) and it was drawing, like, a circle. (laughs)
- LFLex Fridman
The graphics.
- ADAnca Dragan
Or a square. Yeah.
- LFLex Fridman
You draw-
- ADAnca Dragan
That was ... I don't know how to do that anymore. (laughs)
- LFLex Fridman
(laughs)
- ADAnca Dragan
But in fourth grade, that's like the first thing that they taught me. It was like, you could take a special ... I wouldn't say it was an extra curri- it was in a sense an extra-curricular, so you could sign up for, you know, dance or music or programming. And I did the programming thing and my mom was like, "Wha- what? (laughs) Why?" (laughs)
- LFLex Fridman
Did you compete in programming? Like, uh, these days in Romania, probably, that's like a big thing, there are programming competitions.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
Was that, did that touch you at all?
- ADAnca Dragan
I did a little bit of the Computer Science Olympiad, but not as seriously as I did the Math Olympiad. So it was programming. Yeah, it's basically, here's a hard math problem, solve it with a computer, is kind of the deal.
- LFLex Fridman
Oh, yeah. It's more like an algorithmic type thing.
- ADAnca Dragan
Exactly. It's al- it was algorithmic.
- 7:32 – 13:25
Favorite robot
- LFLex Fridman
So again, you kind of mentioned the Google self-driving car, but outside of that, uh, who or what is your favorite robot, real or fictional, that, like, captivated your imagination throughout? I mean, I guess you kind of alluded to the Google self-driving car. The Firefly was a magical moment, but is there something else? Is there other ro-
- ADAnca Dragan
It wasn't the Firefly, no. It was, I think it was the Lexus, by the way.
- LFLex Fridman
Oh, Lexus.
- ADAnca Dragan
This was back, back then. But, uh, yeah. So, good question. Uh, m- my... Okay. My favorite fictional robot is WALL-E, and I love how amazingly expressive it is. I'm a person who thinks a little bit about expressive motion, the kinds of things you were saying with, you can do this-
- LFLex Fridman
Yeah.
- ADAnca Dragan
... and it's a head-
- LFLex Fridman
Yeah.
- ADAnca Dragan
... and it's a manipulator and what does it all mean? Um, I like to think about that stuff. I love Pixar. I love animation. I love how they-
- LFLex Fridman
WALL-E has two big eyes, I think? Or no?
- ADAnca Dragan
Yeah. It has these, um, these k- cameras-
- LFLex Fridman
Yeah.
- ADAnca Dragan
... and they move. So yeah, it goes "woo" with them. It's super cute. And, you know, the way it moves is just so expressive. The timing of that motion, what it's doing with its arms and what it's doing with these lenses, is amazing. And so, um, I've really liked that from the start. And then on top of that, sometimes I share this, it's a personal story I share with people or when I teach about AI or whatnot. Um, my, uh, husband proposed to me by building a WALL-E, and he actuated it, so it has seven degrees of freedom including the lens thing, and it kinda came in, and he made it have, like, you know, the belly box opening thing.
- LFLex Fridman
Nice.
- ADAnca Dragan
So it just did that and then it spewed out this, uh, box made out of LEGOs that opened slowly and then bam, right-
- LFLex Fridman
Oh, no.
- ADAnca Dragan
... in it. Yeah. Yeah. It was, uh, it was quite, quite ... It set a bar. (laughs)
- LFLex Fridman
That could be like the f- (laughs) the most impressive thing I've ever heard.
- ADAnca Dragan
(laughs)
- LFLex Fridman
Okay? (laughs)
- ADAnca Dragan
I have a special connection to WALL-E, long story short.
- LFLex Fridman
(laughs)
- ADAnca Dragan
I like WALL-E 'cause I like animation-
- LFLex Fridman
(laughs)
- ADAnca Dragan
... and I like robots, and I like, uh, you know, the fact that... we still have this robot to this day.
- LFLex Fridman
What, uh... How hard is that problem, do you think, the expressivity of robots? Like, with Boston Dynamics, I never talked to those folks about this particular element. I've talked to 'em a lot. But it seems to be almost an accidental side effect for them.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
That they weren't ... I, I don't know if-
- ADAnca Dragan
They weren't planning on-
- LFLex Fridman
... they're faking it. They weren't trying to ...
- 13:25 – 32:01
How difficult is human-robot interaction?
- LFLex Fridman
Right. So when you start to think about incorporating the human into this state model, uh, apologies for the philosophical question, but, uh, how complicated are human beings, do you think? Like, can they be reduced to, uh, almost like an object that moves and then maybe has some basic intents? Or do we have to model things like mood and, uh, general aggressiveness and temper? I mean, all of these kinds of human qualities, or, like, game-theoretic qualities? Like, what's your sense-
- ADAnca Dragan
How complicated it is.
- LFLex Fridman
... how, how hard is the problem with human-robot interaction?
- ADAnca Dragan
Yeah. Should we talk about what the problem with human-robot interaction is?
- LFLex Fridman
Yeah, let's... Yeah, this, uh, what is-
- ADAnca Dragan
And then, and then-
- LFLex Fridman
... human-robot interaction?
- ADAnca Dragan
... and then talk about how that... Yeah. So, and by the way, I'm gonna talk about this very particular view of human-robot interaction, um, right? Which is not so much on the social side or on the side of how do you have a good conversation with a robot? What should the robot's appearance be? It turns out that if you make robots taller versus shorter, this has an effect on how people act with them. So I'm not, I'm not talking about that. (laughs)
- LFLex Fridman
Yeah, yeah.
- ADAnca Dragan
But I'm talking about this very kind of narrow thing, which is: you take a task that a robot can do in isolation, in a lab or out there in the world but in isolation, and now you're asking, what does it mean for the robot to be able to do this task for what is presumably its actual end goal, which is to help some person?
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
That ends up changing the problem in two ways. The first way it changes the problem is that the robot is no longer the single agent acting; you have humans who also take actions in that same space. You know, cars navigate around people. Robots around an office navigate around the people in that office. Um, if I send a robot over to the cafeteria to get me a coffee, then there are probably other people reaching for stuff in the same space. And so now you have your robot, and you're in charge of the actions that the robot is taking, but then you have these people who are also making decisions and taking actions in that same space. And even if, you know, the robot knows what it should do and all of that, just coexisting with these people, right, kind of getting the actions to gel well, to mesh well together, that's sort of problem number one. And then there's problem number two, which goes back to this notion of: if I'm a programmer, I can specify some objective for the robot to go off and optimize. I can specify the task. But if I put the robot in your home, presumably you might have your own opinions about, "Well, okay, I want my house clean, but how do I want it cleaned, and how should the robot do it, and how close to me should it come," and all of that. And so I think those are the two differences that you have: you're acting around people, and what you should be optimizing for should satisfy the preferences of that end user, not of the programmer who programmed you.
- LFLex Fridman
Yeah, and the preferences thing is tricky. So, figuring out those preferences, being able to interactively adjust, to understand what the human is doing. So it really boils down to understanding the humans in order to interact with them and in order to please them.
- ADAnca Dragan
Right. So now-
- LFLex Fridman
So why is this hard?
- ADAnca Dragan
... why, yeah, why is understanding humans hard?
- LFLex Fridman
Yeah.
- ADAnca Dragan
So I think there are two tasks about understanding humans that in my mind are very, very similar, but not everyone agrees. So there is the task of being able to just anticipate what people will do. We all know that cars need to do this, right? We all know that, well, if I navigate around some people, the robot has to get some notion of, "Okay, where is this person gonna be?" Um, so that's kind of the prediction side. And then there's what you were saying, satisfying the preferences, right? So, adapting to the person's preferences, knowing what to optimize for, which is more this inference side: what does this person want? What is their intent? What are their preferences? And to me, those kind of go together, because I think that if, at the very least, you can look at human behavior and understand what it is that they want, then that's sort of the key enabler to being able to anticipate what they'll do in the future, because I think that, you know, we're not arbitrary. We make the decisions that we make, we act in the way we do, because we're trying to achieve certain things.
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
And so I think that's the relationship between them. Now, how complicated do these models need to be in order to, uh, be able to understand what people want? So, we've gotten a long way in robotics with something called inverse reinforcement learning, which is the notion of: someone acts, demonstrates how they want the thing done.
- LFLex Fridman
What is in- inverse reinforcement learning?
- ADAnca Dragan
Yeah.
- LFLex Fridman
You just briefly said it.
- ADAnca Dragan
Right. So, it's-
- LFLex Fridman
Okay.
- ADAnca Dragan
... it's the problem of taking human behavior and inferring a reward function from it, to figure out what it is that that behavior is optimal with respect to. And it's a great way to think about learning human preferences in the sense of, you know, you have a car and the person can drive it, and then you can say, "Well, okay, I can actually learn what the person is optimizing for. Um, I can learn their driving style." Or you can have people demonstrate how they want the house cleaned, and then you can say, "Okay, I'm getting the trade-offs that they're making, I'm getting the preferences that they want out of this." And so we've been somewhat successful in robotics with this, and it's based on a very simple model of human behavior, uh, remarkably simple, which is that human behavior is optimal with respect to whatever it is that people want, right?
- LFLex Fridman
Mm-hmm. Mm-hmm.
- ADAnca Dragan
(laughs) So you make that assumption, and now you can kind of invert it. That's why it's called inverse, well, really inverse optimal control, but also inverse reinforcement learning. So this is based on utility maximization in economics, where back in the '40s, von Neumann and Morgenstern were like, "Okay, people are making choices by maximizing utility. Go." And then in the late '50s, we had Luce and Shepard come in and say, "People are a little bit noisy and approximate in that process, so they might choose something kind of stochastically, with probability proportional to how much utility something has." So there's a bit of noise in there. Um, this has translated into robotics in something that we call Boltzmann rationality. So it's kind of an evolution of inverse reinforcement learning that accounts for human noise. And we've had some success with that too, for these tasks where it turns out people act noisily enough that you can't just do the vanilla version. Uh, you can account for noise and still infer what they seem to want based on this. But now we're hitting tasks where that's not enough. And becau-
- LFLex Fridman
What are examples of this? (overlapping)
- ADAnca Dragan
So imagine you're trying to control some robot that's fairly complicated, or you're trying to control a robot arm 'cause, uh, maybe you're a patient with a motor impairment and you have this wheelchair-mounted arm, and you're trying to control it around. Or one task that we've looked at, with Sergey and our students, is Lunar Lander. So, I don't know if you know this Atari game, it's called Lunar Lander. It's, uh, it's really hard.
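(To make the Boltzmann-rationality model above concrete, here is a minimal sketch of the inference it enables. The utility function, action grid, candidate parameters, and rationality coefficient are all illustrative assumptions, not from the conversation or any specific library.)

```python
# A minimal sketch of Boltzmann-rational inverse reinforcement learning
# in a toy one-shot choice setting; everything here is an assumption
# made for illustration.
import numpy as np

actions = np.linspace(-1.0, 1.0, 5)   # discretized action choices
candidate_thetas = [0.0, 0.5, 1.0]    # hypothetical reward parameters

def utility(a, theta):
    # Toy trade-off: prefer large actions, penalized by a theta-weighted term.
    return a - theta * a ** 2

def boltzmann_policy(theta, beta=5.0):
    # Luce/Shepard-style noisy choice: P(a) proportional to exp(beta * U(a; theta)).
    u = utility(actions, theta)
    p = np.exp(beta * (u - u.max()))  # subtract max for numerical stability
    return p / p.sum()

def posterior_over_theta(observed_actions):
    # "Inverting" the model: Bayes' rule with the Boltzmann likelihood,
    # starting from a uniform prior over the candidate parameters.
    log_post = np.zeros(len(candidate_thetas))
    for a in observed_actions:
        idx = int(np.argmin(np.abs(actions - a)))  # snap observation to grid
        for i, theta in enumerate(candidate_thetas):
            log_post[i] += np.log(boltzmann_policy(theta)[idx])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Observing a noisy "human" who keeps picking mid-range actions shifts
# probability mass toward the thetas that make those choices look good.
print(posterior_over_theta([0.5, 0.5, 1.0]))
```

(The harder tasks Dragan mentions next, like Lunar Lander, are exactly where this noisy-but-optimal likelihood stops explaining human behavior well.)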
- 32:01 – 34:24
HRI application domains
- LFLex Fridman
You mentioned recommender systems, what kind of... and we're talking about human-robot interaction-
- ADAnca Dragan
(laughs)
- LFLex Fridman
... what kind of problem spaces are you thinking about? So is it, um, robots, like wheeled robots with autonomous vehicles? Is it, uh, object manipulation? Like when you think about human-robot interaction in your mind, and maybe s- uh, I'm sure you can speak for the entire community of human-robot interaction.
- ADAnca Dragan
(laughs)
- LFLex Fridman
(laughs) But like, what are the problems of interest here? Is... and does it, um... you know, I- I kinda think of, um, open domain dialogue as human-robot interaction, and that happens not in a physical space but it could just happen in-
- ADAnca Dragan
In the virtual.
- LFLex Fridman
... in the virtual space.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
So where's (laughs) - where's the boundaries of this field for you when you're thinking about the things we've been talking about?
- ADAnca Dragan
Yeah, so I, um, I try to find kind of underlying... I don't know what to even call them. Like, I might call what I do working on the foundations of algorithmic human-robot interaction and trying to make contributions there, um, and it's important to me that whatever we do is actually somewhat domain-agnostic when it comes to: is it about, you know, autonomous cars, or is it about quadrotors, or is it about... the same underlying principles apply. Of course, when you're trying to get a particular domain to work, you usually have to do some extra work to adapt to that particular domain, but these things that we were talking about around, well, you know, how do you model humans? It turns out that a lot of systems, physical robots of all sorts and beyond, could benefit from a better understanding of how human behavior relates to what people want, and from being able to predict human behavior. And so I used to do manipulation, I used to be, you know, picking up stuff, and then it was picking up stuff with people around, and now it's very broad when it comes to the application level but, in a sense, very focused on: okay, how does the problem need to change? How do the algorithms need to change when we're not doing a robot by itself, you know, emptying the dishwasher, but we're stepping outside of that?
- LFLex Fridman
Uh, a
- 34:24 – 45:59
Optimizing the beliefs of humans
- LFLex Fridman
thought that popped into my head just now, um, on the game theoretic side of things, you said, uh, this really interesting idea of using actions to gain more information, but if- if we think of sort of game theory, the humans that are interacting with you... with- with you, the robot?
- ADAnca Dragan
Mm-hmm. Mm-hmm.
- LFLex Fridman
Wow, I'm taking the identity of the robot at this point.
- ADAnca Dragan
Yeah, I do that all the time.
- LFLex Fridman
(laughs) Yeah. Uh, is, um... they also have a world model of you.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
And you can manipulate that. I mean, if we look at autonomous vehicles, people have a certain viewpoint. You said with the kids, uh, people see Alexa in a certain way. Is there some value in trying to also optimize how people see you as a robot?
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
Is that... or is that a little too far in- in- in... away from the specifics of what we can solve right now?
- ADAnca Dragan
So I... well, we can... both, right?
- LFLex Fridman
Yes.
- ADAnca Dragan
So it's really interesting, and, um, we've seen a little bit of progress on pieces of this problem. So you can... um, again, you know, it kind of comes down to how complicated the human model needs to be. But in one piece of work that we were looking at, we just said, "Okay, there are these parameters that are internal to the robot: they're what the robot is about to do, or maybe what objective or driving style the robot has, or something like that." And what we're gonna do is we're gonna set up a system where part of the state is the person's belief over those parameters. And now, when the robot acts, the person gets new evidence about this robot internal state, and so they're updating their mental model of the robot, right? So if they see a car that sort of cuts someone off, they're like, "Oh, that's an aggressive car." They know more, right? Um, if they see sort of a robot head towards a particular door, they're like, "Oh, yeah, the robot's trying to get to that door." So this thing that we have to do with humans to try and understand their goals and intentions, humans are inevitably gonna do that to robots, and then that raises this interesting question that you asked, which is: can we do something about that? This is gonna happen inevitably, but could we be more informative or less confusing to people? And it turns out you can optimize for being more informative and less confusing if you have an understanding of how your actions are being interpreted by the human, how they're using these actions to update their belief. And honestly, all we did is just Bayes' rule.
- LFLex Fridman
(laughs)
- ADAnca Dragan
Basically, okay... the person has a belief; they see an action; they make some assumptions about how the robot generates its actions, presumably as being rational, because robots are rational, it's reasonable to assume that about them; and then they incorporate that new piece of evidence, in a Bayesian sense, into their belief and obtain a posterior. And now the robot is trying to figure out what actions to take such that it steers the person's belief to put as much probability mass as possible on the correct parameters.
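(As a rough sketch of the Bayes'-rule picture she describes: an observer who models the robot as noisily rational updates a belief over the robot's internal parameters from each action, and the robot can score candidate actions by how much posterior mass they place on the truth. Every quantity below, the two candidate goals, the 1-D action grid, and the rationality coefficient, is an illustrative assumption.)

```python
# A minimal sketch of steering a human observer's belief, assuming the
# observer models the robot as Boltzmann-rational; all quantities are
# illustrative assumptions.
import numpy as np

goals = np.array([-1.0, 1.0])        # candidate robot goals; true goal is +1
actions = np.linspace(-1.0, 1.0, 9)  # candidate robot headings

def observer_likelihood(goal, beta=3.0):
    # Observer's model of the robot: it noisily prefers actions near its goal.
    u = -np.abs(actions - goal)
    p = np.exp(beta * (u - u.max()))
    return p / p.sum()

def mass_on_true_goal(action_idx):
    # Bayes' rule from the human's point of view, uniform prior over goals.
    prior = np.array([0.5, 0.5])
    lik = np.array([observer_likelihood(g)[action_idx] for g in goals])
    post = prior * lik
    post /= post.sum()
    return post[1]  # posterior probability mass on the true goal

# The robot chooses the action that leaves the observer most convinced of
# its true goal -- the most informative, least confusing action.
best = max(range(len(actions)), key=mass_on_true_goal)
print(actions[best], mass_on_true_goal(best))
```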
- LFLex Fridman
So, that's kind of a mathematical formulization of that. But, uh, my worry, and, uh, I don't know if you want to go there with me, but, uh-
- ADAnca Dragan
(laughs)
- LFLex Fridman
... (laughs) I talk about this quite a bit, um, the kids talking to Alexa disrespectfully worries me. I worry in general about human nature. Like I said, I grew up in the Soviet Union, World War II, I'm a Jew too, so with the Holocaust and everything. I just worry about how we humans sometimes treat the other, the group that we call other, whatever that is. Through human history, the group that's the other has changed faces.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
But it seems like the robot will be the next other. And one thing is, it feels to me that robots don't get no respect.
- ADAnca Dragan
(laughs) They get shoved around.
- LFLex Fridman
Shoved around. And, one, at the shallow level, for a better experience, it seems that robots need to talk back a little bit. Like, my intuition says... I mean, most companies, from, uh, sort of Roomba to autonomous vehicle companies, might not be so happy with the idea that a robot has a little bit of an attitude. But it feels to me that that's necessary to create a compelling experience. Like, we humans don't seem to respect anything that doesn't give us some attitude. Or, like, a mix of mystery and attitude and anger that threatens us subtly, maybe passive-aggressively. I don't know. It seems like we humans, yeah, need that. Wha- what are your-
- ADAnca Dragan
Sh-
- LFLex Fridman
Is there something... You have thoughts on this or... (laughs)
- ADAnca Dragan
Okay, I'll give you two thoughts on this.
- LFLex Fridman
Okay, sure.
- ADAnca Dragan
One is, we respond to, you know, someone being assertive, um, but we also respond to someone being vulnerable. So, I think robots... Well, my first thought is that robots get shoved around and bullied a lot because they're sort of, you know, tempting, and they're sort of showing off, or they appear to be showing off. And so I think, going back to these things we were talking about in the beginning, making robots a little more expressive, a little bit more like, "Eh, that wasn't cool to do, and now I'm bummed." (laughs)
- LFLex Fridman
Yeah.
- ADAnca Dragan
Right? Um, I think that that can actually help, because people can't help but anthropomorphize and respond to that. Even though the emotion being communicated is not in any way a real thing, and people know that it's not a real thing because they know it's just a machine, we still interpret it. You know, we watch... There's this, um, famous psychology experiment with little triangles and dots on a screen, and a triangle is chasing the square, and you get really angry-
- LFLex Fridman
Yeah.
- ADAnca Dragan
... at the darn triangle (laughs) because why is it not leaving the square alone? So, that's, yeah, we, we can't help. So, that was the first thought.
- 45:59 – 1:05:02
Difficulty of driving when humans are involved
- LFLex Fridman
I've recently had a conversation with Jim Keller, who's this, uh, legendary chip architect, but he also led the Autopilot team for a while, and his intuition is that driving is fundamentally still like a ballistics problem. Like, you can ignore the human element, that it's just not hitting things, and you can kind of learn the right dynamics required to, uh, do the merging and all those kinds of things. And then my sense is, and I don't know if I can provide sort of definitive proof of this, but my sense is it's, like, an order of magnitude or more difficult when humans are involved. Like, it's not simply an object collision avoidance problem. Where does your intuition... Of course, nobody knows the right answer here, but where does your intuition fall on the fundamental difficulty of the driving problem when humans are involved?
- ADAnca Dragan
Yeah. Good question. I have many opinions on this.
- LFLex Fridman
(laughs)
- ADAnca Dragan
Um, imagine downtown San Francisco.
- LFLex Fridman
Yeah.
- ADAnca Dragan
Yeah? It's crazy busy, everything. Okay, now take all the humans out. No pedestrians. No human-driven vehicles. No cyclists. No people on little electric scooters zipping around. Nothing. I think we're done. I think driving at that point is done. We're done. Uh, there's nothing really that still needs to be solved about that.
- LFLex Fridman
Well, let's pause there. I think I agree with you, and I think a lot of people that hear that will agree with it, but we need to sort of internalize that idea. So what's the problem there? 'Cause we might not quite yet be done with that, 'cause a lot of people kind of focus on the perception problem. A lot of people kind of map autonomous driving onto how close we are to solving being able to detect, you know, the drivable area, the objects in the scene. Do you see that as, um... How hard is that problem?
- ADAnca Dragan
I-
- LFLex Fridman
So your intuition there behind your statement was, we might have not solved it yet, but we're close to solving basically the perception problem.
- ADAnca Dragan
I think the perception problem, I mean, and by the way, a bunch of years ago this would not have been true, uh, and a lot of issues in the space were coming from the fact that, oh, we don't really know what's where. But, um, I think it's fairly safe to say that at this point, although you could always improve on things and all of that, you can drive through downtown San Francisco if there are no people around. There are no real perception issues standing in your way there. I think perception is hard, but yeah, we've made a lot of progress on perception, so I don't mean to undermine the difficulty of the problem. I think everything about robotics is really difficult, of course. I think, you know, the planning problem, the control problem, all very difficult, but I think what makes it really kind of-
- LFLex Fridman
Is humans.
- ADAnca Dragan
Yeah. It might be, I mean, uh, you know, and I picked downtown San Francisco... adapting to, "Well, now it's snowing, now it's no longer snowing, now it's slippery in this way, now..." So the dynamics part, I could imagine being still somewhat challenging. But-
- LFLex Fridman
But, uh, no, the thing that I think worries us, and where our intuition's not good, is the perception problem at the edge cases. Sort of, uh, downtown San Francisco, the nice thing... it may not be a good example because, um-
- ADAnca Dragan
'Cause you know what you're, what you're getting r- well, there's like cr- crazy construction zones and all of that.
- LFLex Fridman
Yeah. But the thing is, you're traveling at slow speeds, so it doesn't feel dangerous.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
To me, what feels dangerous is highway speeds when everything is, to us humans, super clear.
- ADAnca Dragan
Yeah. I'm assuming LIDAR here, by the way.
- LFLex Fridman
Yes.
- ADAnca Dragan
I think it's kind of irresponsible to not use LIDAR. That's just my personal opinion.
- LFLex Fridman
(laughs) that's a pre- so what's-
- ADAnca Dragan
I mean, depending on your use case, but I think like, you know, if your, if you have the opportunity to use LIDAR, well, good, and in a lot, in a lot of cases, you might not.
- LFLex Fridman
Good. Your intuition makes more sense now. So, you don't think vision-
- ADAnca Dragan
I just really don't know enough to say. Vision alone, what, you know, what's, like... How many cameras do they have? How are we using them? I don't know.
- LFLex Fridman
Right. Yeah. It's details.
- ADAnca Dragan
There's all sorts of details. I imagine there's stuff that's really hard to actually see, uh, you know, just how do you deal with glare, exactly what you were saying, stuff that people would see that you don't.
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
I think more of my intuition comes from systems that can actually use, uh, LIDAR as well.
- LFLex Fridman
Yeah. And until we know for sure, it makes sense to be using LIDAR; that's kind of the safety focus. But then there's sort of the... I also sympathize with the Elon Musk-
- ADAnca Dragan
(laughs) .
- 1:05:02 – 1:10:39
Semi-autonomous driving
- LFLex Fridman
Le- let me ask sort of an- another s- small side of this-
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
... that we've been talking about the pure autonomous driving problem, but there's also relatively successful systems already deployed out there in what you may call, like, level two autonomy or semi-autonomous vehicles, whether that's Tesla Autopilot; um, I work quite a bit with the Cadillac Super Cruise system, which has a driver-facing camera that detects your state; there's a bunch of, uh, basically lane-centering systems. What's your sense about this kind of way of dealing with the human–
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
... and robot interaction problem by having a really dumb robot and h- and relying on the human to help the robot out to keep them both alive? Is, is that, uh, is that... from the research perspective, how difficult is that problem? And from a practical deployment perspective, is that a fruitful way to approach this human–robot interaction problem?
- ADAnca Dragan
I think what we have to be careful about there is... it seems like some of these systems, not all, are making this underlying assumption that, so, I'm a driver and I'm now really not driving but supervising, and my job is to intervene, right? And so we have to be careful with this assumption that when I'm supervising, I will be just as safe as when I'm driving; like, that if I wouldn't get into some kind of accident if I were driving, I will be able to avoid that accident when I'm supervising too. And I'm concerned about this assumption from a few perspectives. So, from a technical perspective, it's that when you let something kind of take control and do its thing, and it depends on what that thing is, obviously, and how much it's taking control and what things you're trusting it to do, but if you let it do its thing and take control, it will go to states that are, from the person's perspective, off-policy: states that the person wouldn't actually find themselves in if they were the ones driving. And the assumption that the person functions just as well there as they function in the states that they would normally encounter is a little questionable. Now, another part is kind of the human factors side of this, which is that, I don't know about you, but I definitely feel like I'm experiencing things very differently when I'm actively engaged in the task versus when I'm a passive observer. Like, even if I try to stay engaged, right, it's very different than when I'm actually actively making decisions. Um, and you see this in life in general. Like, you see students who are actively trying to come up with the answer learn to think better than when they're passively told the answer. I think that's somewhat related. And I think people have studied this in human factors for airplanes, and I think it's actually fairly established that these two are not the same. So-
- LFLex Fridman
Yeah. I, on, on that point, because I've gotten a huge amount of heat on this, and I stand by it.
- ADAnca Dragan
Okay.
- LFLex Fridman
Uh, 'cause I know the human factors community well, and the work here is really strong, and there's many decades of work showing exactly what you're saying. Nevertheless, I've been continuously surprised that much of the predictions of that work has been wrong in what I've seen.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
So what we have to do... I still agree with everything you said, but we have to be a little bit more open-minded.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
So, uh, I'll tell you, there are a few surprising things. Like, everything you said, to the word, is actually exactly correct, but what you didn't say is... You said you can't assume a bunch of things, but we don't know if these systems are fundamentally unsafe. That's still unknown.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
Uh, there's a lot of interesting things. Like, I'm surprised by the fact... No, not the fact. By what seems to be, anecdotally, from, well, large data collection that we've done, but also from just talking to a lot of people: when in the supervisory role of semi-autonomous systems that are sufficiently dumb, at least, and that might be the key element, the systems have to be dumb, the people are actually more energized as observers.
- ADAnca Dragan
Mm-hmm.
- LFLex Fridman
So they're actually better, they're better at observing the situation. So there might be cases in systems, if you get the interaction right, where you as a supervisor will do a better job with the system together.
- ADAnca Dragan
I agree. I think that is actually really possible. I guess mainly I'm pointing out that if you do it naively, you're implicitly assuming something, and that assumption might actually really be wrong. But I do think that if you explicitly think about what the agent should do such that the person still stays engaged, so that you essentially empower the person to do more than they could alone... That's really the goal, right? You still have a driver, so you wanna empower them to be so much better than they would be-
- LFLex Fridman
Yes.
- ADAnca Dragan
... by themselves. And that's different. It's a very different mindset than, "I want them to basically not drive." (laughs) Right? And, um, but be ready to, to sort of take over.
- 1:10:39 – 1:17:30
How do we specify good rewards?
- LFLex Fridman
So, one of the interesting things we've been talking about is the rewards; they seem to be fundamental to the way a robot behaves. So, broadly speaking, we've been talking about utility functions, but could you comment on how we approach the design of reward functions? Like, how do we come up with good reward functions?
- ADAnca Dragan
Mm-hmm. Well, (laughs) really good question. Because the answer is we don't. (laughs)
- LFLex Fridman
(laughs)
- ADAnca Dragan
Um, this was, um, you know, I used to think about how, well, it's actually really hard to specify rewards for interaction because, you know, it's really supposed to be what the people want, and then we talked about how you have to customize what you wanna do to the end user. But I kind of realized that even if you take the interactive component away, it's still really hard to design reward functions. So what do I mean by that? I mean, if we assume this sort of AI paradigm in which there's an agent and its job is to optimize some objective, some reward, utility, loss, whatever, cost, um, if you write it out, maybe it's a set, depending on the situation or whatever it is... if you write it out and then you deploy the agent, you'd wanna make sure that whatever you specified incentivizes the behavior you want from the agent in any situation that the agent will be faced with, right? So, I do motion planning on my robot arm. I specify some cost function, like, uh, you know, this is how far away you should try to stay, this is how much it matters to stay away from people, and this is how much it matters to be efficient, and blah, blah, blah, right? I need to make sure that whatever I specified, those constraints or trade-offs or whatever they are, that when the robot goes and solves that problem in every new situation, that behavior is the behavior that I wanna see. And what I've been finding is that we have no idea how to do that. Basically, what I can do is I can sample: I can think of some situations that I think are representative of what the robot will face, and I can tune and add and tune some reward function until the optimal behavior is what I want on those situations. Which, first of all, is super frustrating because, you know, through the miracle of AI, we don't have to specify rules for behavior anymore, right? We were saying before, the robot comes up with the right thing to do. You plug in the situation, and it optimizes. But you still have to spend a lot of time on actually defining what it is that that criterion should be, making sure you didn't forget about 50 bazillion things that are important and how they all should combine together to tell the robot what's good and what's bad, and how good and how bad. And so I think (sighs) this is a lesson that, I don't know, I guess I closed my eyes to it for a while, 'cause I've been, you know, tuning cost functions for 10 years now. Um, but it really strikes me that, yeah, we've moved the tuning and the, like, designing of features or whatever from the behavior side into the reward side, and yes, I agree that there's way less of it, but it still seems really hard to anticipate any possible situation and make sure you specify a reward function that, when optimized, will work well in every possible situation.
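(For concreteness, here is a minimal sketch of the kind of hand-tuned, feature-weighted cost she is describing; the two features and their weights are illustrative assumptions. The trouble she points out is that weights tuned on a few sampled situations carry no guarantee in situations the designer never checked.)

```python
# A minimal sketch of a hand-specified motion-planning cost with tunable
# trade-offs; the features and weights are illustrative assumptions.
import numpy as np

def trajectory_cost(traj, human_pos, w_clearance=1.0, w_efficiency=0.1):
    # Feature 1: penalize passing close to the person (inverse distance).
    clearance = np.sum(1.0 / (1e-3 + np.linalg.norm(traj - human_pos, axis=1)))
    # Feature 2: penalize path length (reward efficient motion).
    path_len = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    return w_clearance * clearance + w_efficiency * path_len

# The designer tunes w_clearance and w_efficiency until the optimizer's
# behavior looks right on a few sampled situations -- but nothing ensures
# those weights still incentivize the right behavior in a new situation
# (two people, a narrow corridor, an obstacle the features never mention).
traj = np.array([[0.0, 0.0], [0.5, 0.4], [1.0, 0.0]])
print(trajectory_cost(traj, human_pos=np.array([0.5, 1.0])))
```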
- LFLex Fridman
Uh, so, so y- you're kind of referring to unintended consequences or just in general, any kind of, uh, suboptimal behavior that emerges outside of the things you've said.
- ADAnca Dragan
Suboptimal-
- LFLex Fridman
O- out, out of distribution. (laughs)
- ADAnca Dragan
(laughs) Suboptimal behavior that is, you know, actually optimal. I mean, I guess that's the idea of unintended consequences: uh, you know, it's optimal with respect to what you specified, but it's not what you want.
- LFLex Fridman
And-
- ADAnca Dragan
And there's a difference between those.
- LFLex Fridman
So i- uh, but that's not fundamentally a robotics problem, right? That's a human problem. So like-
- ADAnca Dragan
That's the thing.
- LFLex Fridman
Yeah.
- ADAnca Dragan
Right? So there's this thing called Goodhart's law, which is: you set a metric for an organization, and the moment it becomes a target that people actually optimize for, it's no longer a good metric. Um-
- LFLex Fridman
Oh, what's it called? That's a cool-
- ADAnca Dragan
Goodhart's law.
- LFLex Fridman
Goodhart's law. So the moment you specify a metric, it stops doing its job.
- ADAnca Dragan
Yeah. It stops doing its job. Um, so, yeah, there's such a thing as over-optimizing for things and failing to think ahead of time of all the possible things that might be important. And so that's interesting because, uh, historically, I've worked a lot on reward learning from the perspective of customizing to the end user, but it really seems like it's not just the interaction with the end user that's a problem of the human and the robot collaborating so that the robot can do what the human wants, right? This kind of back and forth, the robot probing, the person being informative, all of that stuff might actually be just as applicable to this kind of maybe new form of human-robot interaction, which is the interaction between the robot and the expert programmer, roboticist, designer in charge of actually specifying what the heck (laughs) the robot should do. Specifying the task for the robot.
- LFLex Fridman
That's fasci- that's so cool, like, collaborating on the reward, uh, design.
- ADAnca Dragan
Right, collaborating on the reward design. And so what does it mean, right? When we think about the problem not as "someone specifies the objective and all of your job is to optimize it," and we start thinking about the robot being in this interaction and this collaboration, the first thing that comes up is: when the person specifies a reward, it's not, you know, gospel. It's not the letter of the law. It's not the definition of the reward function you should be optimizing, 'cause they're doing their best, but they're not some magic perfect oracle.
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
And the sooner we start understanding that, I think the sooner we'll get to more robust (laughs) robots that function better in different situations. And then you can kind of say, okay, well, it's almost like the robots are overlearning (laughs): they're putting too much weight on the specified reward, treating it as the definition, um, and maybe leaving a lot of other information on the table. Like, what are other things we could do to actually communicate to the robot about what we want it to do, besides attempting to specify a reward function?
- LFLex Fridman
Yeah, you have this
- 1:17:30 – 1:21:59
Leaked information from human behavior
- LFLex Fridman
awesome, (laughs) and again, I love the poetry of it, of leaked information.
- ADAnca Dragan
(laughs)
- LFLex Fridman
So, so you, you mention (laughs) humans leak information about what they want, you know, leak reward signal for the, for the robot.
- ADAnca Dragan
Yeah.
- LFLex Fridman
So h- how do we detect these leaks?
- ADAnca Dragan
What is that? Yeah, what are these leaks? (laughs)
- LFLex Fridman
I just, I don't know, I recently saw it, read it somewhere from you, and it's gonna stick with me for a while for some reason, 'cause it's not explicitly expressed. It kind of leaks indirectly from our, uh, behavior.
- ADAnca Dragan
From what we do, yeah, absolutely. So I think maybe some, some surprising bits, right? So we were talking be- before about, "I'm a robot arm. It needs to move around people, carry stuff, put stuff away, all of that," and now imagine that, you know, the robot has some initial objective that the programmer gave it so that it can do all these things functionally. It's capable of doing that. And now I notice that it's doing something and maybe it's coming too close to me, right? And maybe I'm the designer, maybe I'm the end user and this robot is now in my home, and I push it away. So I push it away 'cause, you know, it's a, it's a reaction to what the robot is currently doing, and this is what we call physical human/robot interaction.
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
And now there's a lot of interesting work on how the heck you respond to physical human-robot interaction. What should the robot do if such an event occurs? And there are sort of different schools of thought. Well, you know, you can treat it the control-theoretic way and say, this is a disturbance that you must reject (laughs). Um, or you can treat it more kind of heuristically and say, I'm gonna go into some, like, gravity-compensation mode so I'm easily maneuverable around, or I'm gonna go in the direction that the person pushed me. And to us, part of the realization has been that that is a signal that communicates about the reward, because if my robot was moving in an optimal way and I intervened, that means that I disagree with its notion of optimality (laughs), right? Whatever it thinks is optimal is not actually optimal, and, optimization problems aside, that means that the cost function, the reward function, is incorrect, or at least is not what I want it to be.
- LFLex Fridman
How difficult is that signal to, to, to-... uh, to interpret and make actionable.
- ADAnca Dragan
... learn from.
- LFLex Fridman
So, like... 'cause this connects to our autonomous vehicle discussion: in a semi-autonomous vehicle or autonomous vehicle, when the safety driver disengages the car, like, they could have disengaged it for a million reasons.
- ADAnca Dragan
Reasons, yeah. Yeah, so that's true. Again, it comes back to: can you structure, a little bit, your assumptions about how human behavior relates to what they want? Um, and, you know, one thing that we've done is literally just treat this external torque that they applied as, when you take it and you add it to the torque the robot was already applying, that overall action is probably relatively optimal with respect to whatever it is that the person wants, and then that gives you information about what it is that they want. So you can learn that people want the robot to stay further away from them. Now, you're right that there might be many things that explain just that one signal, and that you might need much more data than that for the person to be able to shape your reward function over time.
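(A minimal sketch of that idea, under the common simplifying assumption of a reward that is linear in hand-designed features: treat the corrected trajectory, the robot's plan plus the human push, as closer to optimal than the plan alone, and nudge the weights along the feature difference. The features and numbers are illustrative, not a specific published implementation.)

```python
# A minimal sketch of updating reward weights from a physical correction,
# assuming a reward linear in two hand-designed features; everything here
# is an illustrative assumption.
import numpy as np

def features(traj, human_pos):
    # [total clearance from the person, negative path length]
    clearance = np.sum(np.linalg.norm(traj - human_pos, axis=1))
    length = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    return np.array([clearance, -length])

def update_from_correction(w, planned, corrected, human_pos, lr=0.1):
    # Treat "robot plan + human push" as more optimal than the plan alone:
    # move the weights toward the features the corrected motion exhibits.
    return w + lr * (features(corrected, human_pos) - features(planned, human_pos))

w = np.array([0.5, 0.5])                                    # initial trade-offs
planned   = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.0]])  # passes close to the person
corrected = np.array([[0.0, 0.0], [0.5, 0.6], [1.0, 0.0]])  # the pushed-away version
human_pos = np.array([0.5, 0.0])

# The clearance weight goes up: the robot learns the person wants it to
# stay further away, at some cost in efficiency.
print(update_from_correction(w, planned, corrected, human_pos))
```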
- LFLex Fridman
Mm-hmm.
- ADAnca Dragan
Um, you can also do this info-gathering stuff that we were talking about. I don't know if we've done that in that context, just to clarify, but it's definitely something we've thought about, where you can, uh, have the robot start acting in a way, like, if there are a bunch of different explanations, right? It moves in a way where it sees if you correct it in some other way or not, and then kind of actually plans its motion so that it can disambiguate and collect information about what you want. Um, anyway, so that's one way, that's kind of leaked information. Maybe even more subtle leaked information is if I just press the E-stop, right? I'm just doing it out of panic 'cause the robot is about to do something bad. There's, again, information there, right? Okay, the robot should definitely stop (laughs), but it should also figure out that whatever it was about to do was not good, and, in fact, it was so not good that stopping and remaining stopped for a while was a better trajectory for it than whatever it was about to do. And that, again, is information about: what are my preferences? What do I want?
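(One way to cash out that E-stop observation is as a preference: the "stop and stay stopped" trajectory beat whatever the robot had planned. Below is a minimal sketch using a logistic, Bradley-Terry-style preference update; the feature vectors, weights, and learning rate are all illustrative assumptions.)

```python
# A minimal sketch of treating an E-stop as preference information: the
# stopped trajectory is observed to be preferred over the planned one.
# Feature vectors, weights, and learning rate are illustrative assumptions.
import numpy as np

def update_from_estop(w, feats_stop, feats_planned, lr=0.1):
    # Logistic preference model: P(stop preferred) = sigmoid(w . (f_stop - f_planned)).
    diff = feats_stop - feats_planned
    p_stop = 1.0 / (1.0 + np.exp(-w @ diff))
    # Gradient ascent on the log-likelihood of the observed preference.
    return w + lr * (1.0 - p_stop) * diff

w = np.array([0.5, 0.5])              # current guess at the reward weights
feats_stop = np.array([1.0, 0.0])     # e.g., [safety margin, progress]
feats_planned = np.array([0.2, 0.9])  # the aborted plan: fast but unsafe
print(update_from_estop(w, feats_stop, feats_planned))
```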
- LFLex Fridman
Speaking of E-Stops,
- 1:21:59 – 1:26:31
Three laws of robotics
- LFLex Fridman
what are your, um, expert opinions on the three laws of robo- uh, robotics from Isaac Asimov?
- ADAnca Dragan
(laughs)
- LFLex Fridman
They don't harm humans, obey orders, protect yourself. I mean, it's, uh, such a silly notion, but I speak to so many people these days, just regular folks, just, I don't know, my parents and so on, about robotics, and they kind of operate in that space of, um, you know, imagining our future with robots and thinking about the ethics. How do we get that dance right? Uh, I know the three laws might be a silly notion, but do you think about what universal reward functions there might be that, uh, we should enforce on the robots of the future? Or is that a little too far out? Or is it, with the mechanism that you just described, that there shouldn't be three laws, it should be a constantly adjusting kind of thing?