No Priors Ep. 107 | With Physical Intelligence Co-Founder Chelsea Finn
- 0:00 – 0:31
Introduction
- SGSarah Guo
Hi, listeners. Welcome to No Priors. This week, we're speaking to Chelsea Finn, co-founder of Physical Intelligence, a company bringing general purpose AI into the physical world. Chelsea co-founded Physical Intelligence alongside a team of leading researchers and minds in the field. She's an associate professor of computer science and electrical engineering at Stanford University, and prior to that, she worked at Google Brain and was at Berkeley. Chelsea's research focuses on how AI systems can acquire general purpose skills through interactions with the world. So,
- 0:31 – 3:10
Chelsea’s background in robotics
- SGSarah Guo
Chelsea, uh, thank you so much for joining us today on No Priors.
- CFChelsea Finn
Yeah. Thanks for having me.
- SGSarah Guo
You've done a lot of, um, really important storied work, um, in robotics between your work at Google, at Stanford, et cetera, so I would, I would just love to hear a little bit firsthand your background in terms of your path in the world of robotics, what drew you to it initially, and some of the work that you've done.
- CFChelsea Finn
Yeah. It's been a long road. At the beginning, I was really excited about the impact that robotics could have in the world, but at the same time, I was also really fascinated by this problem of developing perception and intelligence in machines, uh, and robots embody all of that. And sometimes there's some cool math that you can do as well that keeps your brain active, makes you think, uh, and so I think all of that is really fun about working in the field. I started working seriously in robotics more than 10 years ago at this point, uh, at the start of my PhD at Berkeley, and we were working on neural network control, uh, trying to train neural networks that map directly from image pixels to motor torques on a robot arm. Uh, at the time, this was not very popular, uh, and we've come a long way, and it's-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... a lot more accepted in robotics and also just generally something that a lot of people are excited about. Since that beginning point, it was very clear to me that we could train robots to do pretty cool things, but that getting the robot to do one of those things in many scenarios with many objects was a major, major challenge. So 10 years ago, we were training robots to, like, screw a cap onto a bottle and use a spatula to lift an object into a bowl and kind of do a tight insertion or hang up, um, like, a hanger on a clothes rack. Uh, so pretty cool stuff, uh, but actually getting the robot to do that in many environments with many objects, that's where a big part of the challenge comes in. And, uh, I've been thinking about ways to make broader data sets, train on those broader data sets, and also different approaches for learning, whether it be reinforcement learning, video prediction, uh, imitation learning, all those things. Uh, and so yeah. Um, I spent a year at Google Brain, uh, in between my PhD and joining Stanford. Uh, became a professor at Stanford. Started a lab there. Did a lot of work, um, along all these lines. Uh, and then recently started Physical Intelligence, uh, almost a year ago at this point, so I've been on leave from Stanford for that. And it's been really exciting to be able to try to execute on the vision that, uh, we co-founders collectively have and, um, do it with a lot of resources and so forth. And, um, I'm also still advising students at Stanford as well.
- SGSarah Guo
That's
- 3:10 – 5:13
Physical Intelligence
- SGSarah Guo
really cool. And I, I guess you started Physical Intelligence with four other co-founders and, uh, an incredibly impressive team. Could you tell us a little bit more about what Physical Intelligence is working on and the approach that you're taking? 'Cause I think it's a pretty unique, um, slant on the whole field and approach.
- CFChelsea Finn
Yeah. So we're trying to build a big neural network model that could ultimately control any robot to do anything in any scenario. And, like, a big part of our vision is that in the past, robotics has focused on, like, trying to go deep on one application and, like, developing a robot to do one thing and then ultimately gotten kind of stuck in that one application. It's really hard to, like, solve one thing and then try to get out of that and broaden. And instead, we're really, um, in it for the long term to try to address this broader problem of physical intelligence in the real world. We're thinking a lot about generalization, generalists, uh, and unlike other robotics companies, we think that being able to leverage all of the possible data is very important. And this, uh, comes down to actually not just leveraging data from one robot, but from any robot platform that might have six joints or seven joints or two arms or one arm. We've seen a lot of evidence that you can actually transfer a lot of rich information across these different embodiments, and that allows-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... you to use data. And also if you iterate on your robot platform, you don't have to throw all your data away. Uh, I have faced a lot of pain in the past where we got a new version of the robot and then your policy doesn't work, uh, and y- it's, it's a really painful process to try to get back to where you were-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... um, on the previous robot iteration. So yeah. Trying to build generalist robots, uh, and essentially kind of develop foundation models that will power the next generation of robots, uh, in the real world.
- SGSarah Guo
That's really cool 'cause I mean, I guess there's a lot of sort of, um, parallels to the large language model world where, you know, really a mixture of deep learning, the transformer architecture, uh, and scale has really proven out that you can get real generalizability in different forms of transfer, uh, between different areas. Could you tell us a little bit
- 5:13 – 7:39
Defining their approach and model architecture
- SGSarah Guo
more about the architecture you're taking or the approach or, you know, how you're thinking about the basis for the foundation model that you're developing?
- CFChelsea Finn
At the beginning, we were just getting off the ground. We were trying to scale data collection, uh, and a big part of that is, unlike in language, we don't have, like, Wikipedia or an internet of robot motions, and we're really excited about scaling data on real robots in the real world. This kind of real data is what has fueled machine learning advances in the past, and a big part of that is we actually need to collect that data, and that looks like tele-operating robots in the physical world. We're also exploring other ways of scaling data as well, but the kind of bread and butter is scaling real robot data. Uh, we released something in late October where we showed, um, some of our initial efforts around scaling data and how we can learn very complex tasks: folding laundry, cleaning tables, constructing a cardboard box. Now, where we are in that, uh, journey is really thinking a lot about language interaction, uh, and generalization to different environments. So what we showed in October was the robot in one environment, and it was trained on data in that environment. We were able to see some amount of generalization, so it was able to fold shirts that it had never seen before, fold shorts it had never seen before, but, um, the degree of generalization was very limited and you also couldn't interact with it in any way. You couldn't prompt it and tell it what you want it to do, um, beyond kind of fairly basic things that it saw in the training data. Uh, and so being able to handle lots of different prompts in lots of different environments is a big focus right now. And in terms of the architecture, we're using transformers, uh, and we are using pretrained models, pretrained vision language models, and, uh, that allows you to leverage all of the rich information on the internet.
We had a research result, um, a couple years ago where we showed that if you leverage vision language models then you could actually get the robot to do tasks that require concepts that were never in the robot's training data but were in the internet.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
Like one famous example is that you can pass a Coke can to Taylor Swift, or a picture of Taylor Swift, and the robot has never seen Taylor Swift in person but the internet has lots of images of Taylor Swift in it, and you can leverage all of the information in that data and in the weights of the pretrained model to kind of transfer that to the robot. So we're not starting from scratch and that helps a lot as well. Um, so that's a little bit about the approach. Happy to, to dive deeper as well.
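The approach described here, a pretrained vision-language model with a small action head trained on teleoperated robot data, can be sketched very roughly in Python. This is an illustrative toy, not Physical Intelligence's actual architecture: every class and function below is invented, and the stub "models" just do arithmetic so the pipeline runs end to end. The point is the shape of the system, where web-scale pretraining grounds open-vocabulary concepts (like "Taylor Swift") that never appear in the robot's own data.

```python
# Toy sketch: a pretrained VLM embeds (image, prompt) into one vector; a
# small action head, trained on robot data, decodes it into joint commands.
from typing import List

class StubVLM:
    """Stands in for a web-pretrained vision-language model; the real thing
    grounds concepts seen only in internet data."""
    def embed(self, image: List[float], prompt: str) -> List[float]:
        bias = float(len(prompt) % 7)       # fake language conditioning
        return [px + bias for px in image]

class StubActionHead:
    """Stands in for a small network mapping embeddings to joint commands."""
    def __init__(self, dof: int = 7):
        self.dof = dof
    def decode(self, z: List[float]) -> List[float]:
        mean = sum(z) / len(z)
        return [mean] * self.dof            # one command per joint

class VLAPolicy:
    """Vision-language-action policy: VLM backbone + action head."""
    def __init__(self, vlm: StubVLM, head: StubActionHead):
        self.vlm, self.head = vlm, head
    def act(self, image: List[float], prompt: str) -> List[float]:
        z = self.vlm.embed(image, prompt)   # shared image+language embedding
        return self.head.decode(z)          # motor command for this timestep

policy = VLAPolicy(StubVLM(), StubActionHead(dof=7))
action = policy.act(image=[0.1, 0.2, 0.3], prompt="pick up the coke can")
print(len(action))  # 7 -- one value per arm joint
```

Because the backbone is pretrained and mostly reused, the robot-specific part that must be learned from scratch is only the small head, which is one reason the "not starting from scratch" framing matters.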
- SGSarah Guo
That's really amazing. And then,
- 7:39 – 9:46
Reaching generalizability and diversifying robot data
- SGSarah Guo
um, what do you think is the main basis then for really getting the generalizability? Is it scaling data further? Is it scaling compute? Is it a combination of the two? Is it other forms of post-training? I'm just sort of curious, as you think through the common pieces that people look at now, um, what you think needs to get filled in. Obviously, in the more language model world, people are spending a lot of time on reasoning modules and other things like that as well. So I'm curious, like, what are the components that you feel are missing right now?
- CFChelsea Finn
Yeah, so I think the number one thing, and this is kind of the boring thing, is just getting more diverse robot data.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
So for that release that we had in late October last year, we collected data in three buildings technically.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
The internet, for example, everything that has fueled language models and vision models, is way, way more diverse, because the internet is pictures taken by lots of people and text written by lots of different people. And so we're just trying to collect data in many more diverse places and with many more objects, many more tasks. So scaling the diversity of the data, not just the quantity of the data, is very important, and that's, um, a big thing that we're focusing on right now. Actually bringing our robots into lots of different places and collecting data there. Uh, as a side project of that, we also learn what it takes to actually get your robot to be operational and functional in lots of different places, and that is a really nice byproduct because if you actually want to get robots to work in the real world, you need to be able to do that. So that's the number one thing, but then we're also exploring other things. Leveraging videos of people. Uh, again, leveraging data from the web. Leveraging pretrained models. Uh, thinking about reasoning, although more basic forms of reasoning, in order to, for example, put a dirty shirt into a hamper. If you can recognize where the shirt is and where the hamper is and what you need to do to accomplish that task, um, that's useful. Or if you want to make a sandwich, uh, and the user has a particular request in mind, you should reason through that request. If they're allergic to pickles, you probably shouldn't put pickles on the sandwich. Um, things like that. So there's some basic things around there, although the number one thing is just more diverse
- 9:46 – 12:32
Open source vs. closed source
- CFChelsea Finn
robot data.
- SGSarah Guo
And then I think a lot of the approach you've taken to data has really been an emphasis on releasing open source models and packages for robotics. Uh, do you think that's the long-term path? Do you think it's open core? Do you think it's eventually proprietary models? Or how do you think about that in the context of the industry? Because it feels like there are a few different robotics companies now, each taking different approaches: either hardware plus software, where they're focused on a specific hardware footprint, or software only, and then there's closed source versus open source if you're just doing the software. So I'm sort of curious where in that spectrum, uh, Physical Intelligence lies.
- CFChelsea Finn
Definitely. So we've actually been quite open. Not only have we open sourced some of the weights and released details in technical papers, we've actually also been working with hardware companies and giving designs of robots to hardware companies. When I tell people this, sometimes they're actually really shocked: like, what about the IP? What about, I don't know, confidentiality and stuff like that? Uh, and we've made a very intentional choice around this. Uh, there's a couple reasons for it. One is that we think that the field... it's really just the beginning, and these models will be so, so much better, uh, and the robots should be so, (laughs) so much better, uh, in a year, in three years. And we want to support the development, uh, of the research, and we want to support the community, support the robots, so that, um, when we hopefully develop the technology of these generalist models, the world will be more ready for it. We'll have better, more robust robots that are able to leverage those models, people who have the expertise and understand, uh, what it requires to use those models. And then the other thing is also, like, we have a really fantastic team of researchers and engineers, and really, really fantastic researchers and engineers want to work at companies that are open, especially, uh, researchers, uh, where they can get credit for their work and share their ideas, talk about their ideas.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
Uh, and we think that having the best researchers and engineers will be necessary for solving this problem. The last thing that I'll mention is that I think the biggest risk with this bet is that it won't work. Like, I'm not really worried about competitors, I'm more worried that w-... no one will solve the problem.
- SGSarah Guo
Oh, interesting. And, uh, why do you worry about that?
- CFChelsea Finn
I think robotics is... it's very hard. Uh, and there have been many, many failures in the past. And unlike when you're, like, recognizing an object in an image, uh, there's very little tolerance for error. You can miss a grasp, um, on an object or, like, not make con-... like, the difference between making contact and not making contact in an object-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... is so small, uh, and it has a massive impact on the, um, on the outcome of, of whether your, whether the robot can actually successfully manipulate the object. And I mean, that's just one example. There's challenges on the data side, uh, of collecting data.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
Um, well, just anything involving hardware is hard as well.
- SGSarah Guo
I guess we have
- 12:32 – 14:34
Where will PI’s models integrate first?
- SGSarah Guo
a number of examples now of, uh, robots in the physical world. Um, you know, everything from autopilot on a jet, through to some forms of pick-and-pack and other types of robots in distribution centers, and there's obviously the different, um, robots involved with manufacturing, particularly in automotive, right? So there have been a handful of more constrained environments where people have been using them in different ways. Where do you think the impact of these models will first show up? Because to your point, there are certain things where you have very low tolerance for error, and then there are a lot of fields where actually it's okay, or maybe you can constrain the problem sufficiently, relative to the capabilities of the model, that it works fine. Where do you think, uh, Physical Intelligence will have the nearest-term impact, or in general, where will the field of robotics and these new approaches substantiate themselves?
- CFChelsea Finn
Yeah. As a company, we're really focused on the long-term problem and not, like, on any one particular application, because of the failure modes that can come up when you focus on one application. I don't know, uh, where-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... where the first applications will be. I think one thing that's actually challenging is that typically in machine learning, for a lot of the successful applications, like recommender systems, language models, um, image detection, a lot of the consumers of the model outputs are actually humans who could actually check them, and the humans are good at the thing.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
A lot of the very natural applications of robots are actually the robot doing something autonomously, uh, on its own, uh, where it's not, like, a human consuming the commanded arm position, for example, and then checking it and validating it and so forth. And so I think we need to think about new ways of having some kind of tolerance-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... for mistakes, or scenarios where that's fine, or scenarios where humans and robots can work together. Uh, that's, I think, one big challenge that will come up when trying to actually deploy these, and some of the language interaction work that we've been doing is actually motivated by this challenge, where we think it's really important for humans to be able to provide input on how they want the robot to behave, what they want the robot to do, and how they want the robot to help in a particular scenario.
- SGSarah Guo
That makes sense. I guess the other form of, uh, generalizability,
- 14:34 – 16:28
Humanoid as a form factor
- SGSarah Guo
to some extent, at least in our current world, is the, uh, is human form, right? And so some people are specifically focused on humanoid robots, like Tesla and others, under the assumption that the world is designed for people and therefore it's the perfect form factor to co-exist with people. And then other people have taken very different approaches in terms of saying, "Well, I need something that's more specialized for the home in certain ways or for factories or manufacturing," or you name it. What is your view on kind of humanoid versus not?
- CFChelsea Finn
On one hand, I think humanoids are really cool, and I have one in my lab at Stanford. Uh, on the other hand, I think that they're a little overrated. Uh, and one kind of practical way to look at it is I think that we're generally fairly bottlenecked on data right now, and some people argue that with humanoids, you can maybe collect data more easily because it matches the human form factor. And so maybe it'd be easier to mimic humans, and I've actually heard people make those arguments. But, uh, if you've ever actually tried to teleoperate a humanoid, uh, it's actually a lot harder to teleoperate than-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... than a static manipulator or a mobile manipulator with wheels. Optimizing for being able to collect data, I think, is very important, uh, because if we can get to the point where we have more data than we could ever want, then it just comes down to research and compute and evaluations. That's one of the things we're optimizing for. And so we're using cheap robots, robots that we can very easily develop teleoperation interfaces for, in which you can do teleoperation very quickly, uh, and collect diverse data, collect lots of data.
- SGSarah Guo
Yeah, it's funny. There was that viral, uh, fake Kim Kardashian video of her going shopping with a robot following her around carrying all of her shopping bags. When I saw that, I really wanted a humanoid robot to follow me around everywhere (laughs) 'cause that'd be really funny to do that. So I'm hopeful that someday I can use, um, your software to cause a robot to follow me around to do things. So exciting future. Um (laughs) , how do you think about,
- 16:28 – 17:36
Embodied intelligence
- SGSarah Guo
um, the embodied model of development versus not on some of these things in terms of... that's a- that's another sort of, I think, set of trade-offs that some people are making or deciding between?
- CFChelsea Finn
Well, the AI community is very focused on just, like, language models, vision language models and so forth, and, um, there's, uh, like, a ton of hype around, like, reasoning and stuff like that. Oh, let's create, like, the most intelligent thing. I feel like actually people underestimate how much intelligence goes into motor control. Many, many years of evolution is what led to us being able to use our hands the way that we do, and there are many animals that can't do it, um, even though they had just as many years of evolution. And so I think that there's actually so much complexity and intelligence that goes into being able to do something as basic as, like, make a bowl of cereal, uh, or pour a glass of water. And, um, yeah, so in some ways, I think that actually, like, embodied intelligence or physical intelligence is very core to, uh, intelligence, and maybe kind of underrated compared to some of the less embodied models.
- SGSarah Guo
One of the papers that I really loved over the last couple of years in robotics was your ALOHA paper, and I thought it was a very clever approach. What is some of
- 17:36 – 20:05
Key turning points in robotics progress
- SGSarah Guo
the research over the last two or three years that you think has really caused this flurry of activity? Because I feel like there's been a number of people now starting companies in this area 'cause a lot of people feel like now's the time to do it, and I'm a little bit curious what research you feel was the basis for that shift in, in people thinking this was a good place to work.
- CFChelsea Finn
At least for us, there were a few things that felt like turning points, where it felt like the field was moving a lot faster compared to where it was before. Uh, one was, uh, the SayCan work, where we found that you can plan with language models as kind of the high-level part-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... uh, and then kind of plug that in with a low-level model to do long-horizon tasks. One was the RT-2 work, which showed that you could do the Taylor Swift example that I mentioned earlier and be able to plug in a lot of the web data and get better generalization on robots. Uh, a third was our RT-X work, where we actually were able to train models across robot embodiments, and significantly, we basically took all the robot data that, um, different research labs had. It was a huge effort to aggregate that into a common format and train on it. And when we trained on that, we actually found that we could take a checkpoint, send that model checkpoint to another lab, um, halfway across the country, and the grad student at that lab could run the checkpoint on the robot, and it would, more often than not, do better than the model that they had specifically iterated on themselves in their own lab. And that was, like, another big sign that this stuff is actually starting to work and that you can get benefit by pooling data across different robots. And then also, like you mentioned, I think the ALOHA work, uh, and later the Mobile ALOHA work, showed that you can tele-operate and train models to do pretty complicated dexterous-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... manipulation tasks. We also had a follow-up paper with the shoelace tying. Uh, that was a fun project 'cause someone said that they would retire if they saw a robot tie shoelaces, so.
- SGSarah Guo
Did they retire?
- CFChelsea Finn
Uh, they did not retire. (laughs)
- SGSarah Guo
Oh. That's awful.
- CFChelsea Finn
They're still in the field.
- SGSarah Guo
We need to force them into retirement. Whoever that person is, we need to follow up on that. (laughs)
- CFChelsea Finn
Yeah, so I think those are a few examples, uh, and so yeah, I think we've seen a ton of progress in the field. It also seems like, um, after we started PI, that was kind of a sign to others that if the experts are really willing to bet on this, then, um, maybe something will happen.
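The cross-embodiment data aggregation described above, pooling episodes from labs that log different camera keys, action dimensions, and robot types into one training format, can be sketched as a toy normalization step. All field names and the zero-padding scheme below are invented for illustration; the real effort involved many labs and far messier formats.

```python
# Toy version of aggregating heterogeneous robot datasets into a common
# schema so one model can train across embodiments.
from typing import Dict, List

COMMON_DOF = 14   # pad every action vector to a shared width (e.g. two 7-DoF arms)

def to_common_format(episode: Dict) -> Dict:
    """Map one lab-specific episode record to the shared schema."""
    # Different labs name their main camera differently.
    image = episode.get("rgb") or episode.get("front_cam") or episode["image"]
    action: List[float] = list(episode["action"])
    # Pad shorter embodiments (e.g. a single 7-joint arm) with zeros so
    # 6-, 7-, and 14-dimensional robots can share one action space.
    action += [0.0] * (COMMON_DOF - len(action))
    return {"image": image,
            "action": action[:COMMON_DOF],
            "embodiment": episode.get("robot", "unknown")}

lab_a = {"rgb": "frame_0001.png", "action": [0.1] * 7, "robot": "franka"}
lab_b = {"front_cam": "img.png", "action": [0.2] * 14, "robot": "bimanual"}
pooled = [to_common_format(e) for e in (lab_a, lab_b)]
print({e["embodiment"]: len(e["action"]) for e in pooled})  # both padded to 14
```

Once everything lands in one schema, a single checkpoint can be trained on the pooled data and shipped to any participating lab, which is what made the cross-country checkpoint transfer possible.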
- SGSarah Guo
So one thing that you all came out with today from PI was what you call a hierarchical
- 20:05 – 22:21
Hierarchical interactive robot and decision making
- SGSarah Guo
interactive robot, or Hi Robot. Um, can you tell us a little bit more about that?
- CFChelsea Finn
So this was a really fun project. There are two things that we're trying to look at here. One is that if you need to do, like, a longer-horizon task, meaning a task that might take minutes to do, then if you just train a single policy to, like, output actions based on images (like, if you're trying to make a sandwich and you train a policy that's just outputting the next motor command), uh, that might not do as well as something that's actually kind of thinking through the steps to accomplish that task. That was kind of the first component, and that's where the hierarchy comes in. And the second component is, a lot of the times when we train robot policies, we're just saying, like, we'll take our data, we'll annotate it and say, like, "This is picking up the sponge. This is putting the bowl in the bin. This segment is, I don't know, folding the shirt." And then you get a policy that can follow those basic commands, like fold the shirt or pick up the cup, those sorts of things. Uh, but at the end of the day, we don't want robots just to be able to do that. We want them to be able to interact with us, where we can say, like, "Oh, I'm a vegetarian. Can you make me a sandwich? Oh, and I'm allergic to pickles, uh, so, like, maybe don't include those." Um, and maybe also be able to interject in the middle and say, like, "Oh, hold off on the tomatoes," or something. There's actually kind of a big gap between something that can just follow an instruction like "pick up the cup" and something that can handle those kinds of prompts and those situated corrections and so forth. Uh, and so we developed a system that basically, um, has one model that takes the prompt as input and kind of reasons through it, um, and is able to output, like, the next step that the robot should follow, and that might be...
So it's going to tell it, for example, that the next thing will be "pick up the tomato," um, and then a lower-level model takes "pick up the tomato" as input and outputs the sequence of motor commands for the next, like, half second. That's the gist of it. It was a lot of fun because we actually got the robot to make a vegetarian sandwich or a ham and cheese sandwich or whatever. We also did a grocery shopping example, um, and a good table cleaning example, and I was excited about it, first, because it was just, like, cool to see the robot be able to respond to different prompts and do these challenging tasks, and second, because it actually seems like the right approach for solving the problem.
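The two-level loop described here can be sketched as a toy in Python: a high-level step picks the next language subtask from the user's prompt, honoring constraints like the pickle allergy, and a low-level step turns that subtask plus the current image into a short chunk of motor commands. All names and the keyword-matching logic below are invented for illustration; the real system uses learned models at both levels, not hand-written rules.

```python
# Toy sketch of a hierarchical interactive policy: high level reasons in
# language, low level emits motor commands.
from typing import List

def high_level_step(prompt: str, done: List[str]) -> str:
    """Pick the next language subtask, honoring constraints in the prompt."""
    steps = ["pick up the bread", "pick up the tomato",
             "pick up the pickles", "close the sandwich"]
    for step in steps:
        if "pickles" in step and "allergic to pickles" in prompt:
            continue                      # reason through the user's request
        if step not in done:
            return step
    return "done"

def low_level_step(subtask: str, image: List[float]) -> List[float]:
    """Stand-in for the policy mapping (subtask, image) to ~0.5 s of motor
    commands; here it just returns a fixed-length command chunk."""
    return [0.0] * 14                     # e.g. 7 DoF per arm, two arms

prompt = "Make me a sandwich. I'm allergic to pickles."
done: List[str] = []
while (subtask := high_level_step(prompt, done)) != "done":
    commands = low_level_step(subtask, image=[0.0])   # sent to the robot
    done.append(subtask)

print(done)  # the pickles step is skipped
```

Because the user's prompt is re-read on every high-level step, mid-task interjections ("hold off on the tomatoes") can change the plan without retraining the low-level policy.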
- SGSarah Guo
On the, uh,
- 22:21 – 26:25
Choosing data inputs
- SGSarah Guo
technical capability side, one thing I was wondering about a little bit was, uh, if I look at the world of self-driving, there's a few different approaches that are being taken, and one of the approaches that is the more kind of Waymo-centric one is really incorporating a variety of other types of sensors besides just vision, so you have LIDAR and a few other things, as ways to augment the self-driving capabilities of a vehicle. Where do you think we are in terms of the sensors that we use in the context of robots? Is there anything missing? Is there anything we should add? Are there types of inputs or feedback that we need to incorporate that haven't been incorporated yet?
- CFChelsea Finn
So we've gotten very far just with vision, with RGB images even, uh, and we typically will have, uh, one or multiple external cameras, kind of what we call base cameras, that are looking at the scene, and also cameras mounted to each of the wrists of the robot. We can get very, very far with that. I would love it if we could give our robots skin.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
Unfortunately, a lot of the tactile sensors that are out there are either far less robust than skin, far more expensive, uh, or very, very low resolution, and so there are a lot of challenges on the hardware side there, and we found that mounting RGB cameras to the wrists ends up being very, very helpful and probably gives you a lot of the same information that tactile sensors can give you.
- SGSarah Guo
'Cause when I think about the set of sensors that are incorporated into a person, obviously to your point, there are the tactile sensors, effectively, right? And then there are heat sensors. There's actually a variety of things that are incorporated that people usually don't really think about much.
- CFChelsea Finn
Absolutely.
- SGSarah Guo
And I'm just sort of curious, like, how many of those are actually necessary in the context of robotics versus not? What are some of the things we should think about, like just if we extrapolate off of humans or animals or other, you know...
- CFChelsea Finn
It's a great question. I mean, for the sandwich making, you could argue that you'd want the robot to be able to taste the sandwich to know if it's good or not.
- SGSarah Guo
Or smell it at least, you know. (laughs)
- CFChelsea Finn
Yeah. I've made a lot of arguments for smell to, uh, to Sergey in the past 'cause there are a lot of nice things about smell, although we've never actually attempted it before.
- SGSarah Guo
Yeah.
- CFChelsea Finn
In some ways the redundancy is nice. For example, with audio, if a human hears something that's unexpected, it can actually kind of alert you to something. In many cases, it might actually be very, very redundant with your other sensors 'cause you might be able to actually see something fall, for example.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
And that redundancy can lead to robustness. For us, it's currently not a priority to look into these sensors 'cause we think that the bottleneck right now is elsewhere: on the data front, on kind of the architectures and so forth. Um, the other thing that I'll mention is, actually, right now our policies do not have any memory. They only look at the current image frame. They can't remember even half a second prior. Uh, and so I would much rather add memory to our models before we add other sensors. We can have commercially viable robots for a number of applications without other sensors.
- SGSarah Guo
What do you think is the timeframe on that?
- CFChelsea Finn
I have no idea. (laughs)
- SGSarah Guo
Yeah?
- CFChelsea Finn
There are some parts of robotics that make it easier than self-driving and some parts that make it harder. On one hand, um, it's harder because it's just a much higher dimensional space. Like, um, even our static robots have 14 dimensions, uh, seven for each arm. You need to be more precise, uh, in many scenarios than driving. We also don't have as much data right off the bat. On the other hand, with driving, I feel like you kind of need to solve the entire distribution to have anything that's viable. Like, you have to be able to handle an intersection at any time of day, or with any possible pedestrian scenario, or other cars and all that, whereas in robotics, I think that there are lots of commercial use cases where you don't have to handle this whole huge distribution, uh, and you also don't have as much of a safety risk, uh, as well. That makes me optimistic, and I think that also, like-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... all the results in self-driving have been very encouraging, especially, like, the number of Waymos that I see in San Francisco.
- SGSarah Guo
Yeah, it's been very impressive to watch them scale up, uh, usage. The thing I found striking
- 26:25 – 28:37
Self driving vs robotics market
- SGSarah Guo
about the self-driving world is, you know, there were two dozen startups started roughly, I don't know, 10 to 15 years ago around self-driving, and the industry is largely consolidated, at least in the US, and obviously, the China market's a bit different, but it's consolidated into Waymo and Tesla, which effectively were two incumbents, right? Google and, and Tesla was an automaker. And then there's maybe one or two startups that either SPAC'd and went public or are still kind of working in the area, and then most of it's kind of fallen off, right? And the set of players that existed at that starting moment, 10, 15 years ago was kind of the same players that ended up actually winning, right? There hasn't been a lot of dynamism in the industry other than just consolidation. Um, do you think that the main robotics players are the companies that exist today, and do you think there's any sort of incumbency bias that's likely?
- CFChelsea Finn
A year ago, like, it would be completely different. Uh, and I think that we've had so many new players recently. I think that the fact that self-driving was like that, um, suggested that it might have been a bit too early ten years ago for... A- and I think that arguably, it was. Like, I think deep learning has come a long, long way since then. Uh, and so I think that that's also part of it. Uh, and I think that the same with robotics. Like, if you were to ask me ten years ago or even, even five years ago, honestly, I, I think it would be too early. Uh, I think the technology, like, wasn't there yet. Uh, we might still be too early, for all we know. I mean, it's a very hard problem, and, like, how hard self-driving has been is, I think is a testament to how hard it is to, to build intelligence in the physical world. In terms of, like, major players, there's a lot of things that I really liked about the startup environment and a lot of things that were very hard to do when I was at Google, and Google is an amazing place in many, many ways. But, like, as one example, taking a robot off campus was, like, almost a nonstarter, uh, just for code security reasons, and if you want to collect diverse data, taking robots off campus is-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... is valuable. You can move a lot faster when, when you're a smaller company, um, when you don't have, uh, kind of restrictions, red tape, that sort of things. The really big companies, um, they have a ton of capital, and so they can last longer. Uh, but I also think that there's... they're gonna move slower, too.
- SGSarah Guo
If you were to give advice
- 28:37 – 29:24
Advice to robotics founders
- SGSarah Guo
to somebody thinking about starting a robotics company today, what would you suggest they do, or where would you point them in terms of what to focus on?
- CFChelsea Finn
I think the main advice that I would give someone trying to start a company would be to try to learn as much as possible, uh, quickly, and I think that actually, like, trying to deploy quickly and learn and iterate quickly, um, that's probably the, the main advice, and try to, yeah, dev- like, actually get the robots out there, learn from that. Um, I'm also not sure if I'm the, the best person to be giving startup advice 'cause I've only been an entrepreneur myself for 11 months, but, uh, yeah, that's probably the advice I'd give.
- SGSarah Guo
That's cool. I mean, you're running an ex- an incredibly exciting startup, so I think you have a full ability to (laughs) suggest stuff to people in that area, for sure. I've heard
- 29:24 – 31:57
Observational data and data generation
- SGSarah Guo
a number of different groups are using observational data of people as part of the training set. So, that could be YouTube videos. It could be things that they're recording specifically for the purpose. How do you think about that in the context of training robotic models?
- CFChelsea Finn
I think that data can have a lot of value, but I think that by itself, it won't get you very far, uh, and I think that there's actually some really nice analogies you can make where, um, for example, if you watch, like, an Olympic swimmer swim a race, uh, even if you had their strength, uh, just their practice at moving their own muscles to, to accomplish what they're accomplishing is, like, essential for being able to do it, or if, if you're trying to learn how to hit a tennis ball well, you won't be able to learn it by kind of watching the pros. Now, maybe these examples seem a little bit contrived because they're talking about, like, experts. The reason why I make those analogies is that we humans are experts at motor control, low level motor control already for a variety of things, and our robots are not, and I think the robots actually need experience from their own body, uh, in order to learn. Uh, and so I think that it's really promising to be able to leverage that form of data, especially to expand on, um, the robot's own experience, but it's really gonna be essential to, like, actually have the data from the robot itself, too.
- SGSarah Guo
In some of those cases, is that just general... data that you're generating around that r- robot, or would you actually have it mimic certain activities? Or how do you think about the data generation? Because you mentioned a little bit about the transfer and generalizability. It's interesting to ask, well, what, what is generalizable or not, and what types of data are and aren't, and things like that.
- CFChelsea Finn
I mean, when we collect data, we have... It's kind of like puppeteering, like the original ALOHA work, uh, and then you can record both the actual motor commands and the sensor, like the camera images, and so that is the, like, experience for the robot. And then I also think that autonomous experience will play a huge role, just like we've seen-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... in language models. After you get an initial language model, if you can use reinforcement learning to have the robo- the, the language model bootstrap on its own experience, that's ext- uh, extremely valuable. Yeah, and then-
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
... in terms of what's generalizable versus not, I think it, it all comes down to the breadth of the distribution. It's really hard to quantify or measure how broad, um, the robot's own experience is. Uh, and there's no way to categorize the breadth of the tasks, like w- how different one task is from another, how different one kitchen is from another, that sort of thing. Um, but we can at least get a rough idea for that breadth by, like, looking at things like the number of buildings or the number of scenes, uh, those sorts of things.
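The rough breadth proxies mentioned here — counting distinct buildings and scenes in the robot's experience — can be sketched in a few lines of Python. The episode-metadata schema below is purely hypothetical, for illustration only:

```python
from collections import Counter

def breadth_report(episodes):
    """Rough dataset-breadth proxies from episode metadata.

    episodes: list of dicts, each with hypothetical 'building' and
    'scene' keys describing where the episode was collected.
    """
    buildings = Counter(e["building"] for e in episodes)
    # A scene is counted per building, so "kitchen" in building A
    # and "kitchen" in building B are distinct scenes.
    scenes = Counter((e["building"], e["scene"]) for e in episodes)
    return {
        "episodes": len(episodes),
        "unique_buildings": len(buildings),
        "unique_scenes": len(scenes),
    }
```

As the conversation notes, counts like these are only a crude proxy: they say nothing about how different one kitchen really is from another, only how many distinct locations the data touches.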
- SGSarah Guo
And
- 31:57 – 35:14
Future robotic forms
- SGSarah Guo
then, I guess, uh, we talked a little about, uh, humanoid robots and other sort of formats. If you think ahead, in terms of the form factors that are likely to exist in N years as this sort of robotic future comes into play, do you think there's sort of one singular form, or there, there are a handful? Is it a rich ecosystem, just like in biology? Like, how do you think about what's gonna come out of all this?
- CFChelsea Finn
I don't know exactly, but I think that my bet would be on something where there's actually a, uh, a, a really wide range of different robot platforms. Uh, I think S- Sergey, uh, my co-founder, likes to call it a Cambrian explosion of different robot, uh, hardware types and so forth once we actually can have the technology that can... the, the intelligence that can power all those different robots.
- SGSarah Guo
Mm-hmm.
- CFChelsea Finn
Uh, and I think it's kind of similar to, like, we have all these different devices in our kitchen, for example, that can do all these different things for us, and... rather than just, like, one device that can cook the whole meal for us. And so, um, I think we can envision, like, uh, a world where there's, like, one kind of robot arm that, um, does things in, in the kitchen that has, like, some hardware that's optimized for that, uh, and maybe also optimized for it to be cheap for that s- particular use case, um, and another hardware that's kind of designed for, for, like, folding clothes or something like that, dishwashing, those sorts of things.
- SGSarah Guo
Yeah.
- CFChelsea Finn
This is all, like, speculation, of course, but I think that a world like that is something where, um... 'Cause yeah, it's, it's, I think, different from what a lot of people think about.
- SGSarah Guo
In the, in the book The Diamond Age, there's, there's sort of this, uh, view of, like, matter pipes going into homes, and you have these 3D printers that make everything for you. And in one case, you're like downloading schematics and then you, you 3D print the thing, and then people who are kind of bootlegging some of this stuff end up with almost evolutionarily based processes to build hardware and then select against certain functionalities, the mechanism by which to optimize things. Do you think a future like that is at all likely, or do you think it's more just, hey, you make the foundation model really good, you have a couple form factors, and you know, you don't need that much specialization if you have enough generalizability in the actual underlying intelligence?
- CFChelsea Finn
I think a world like that is very possible, and I think that you can make a cheaper hardware, piece of hardware, if you are optimizing for a particular use case, and maybe it'd be, like, also be a lot faster and so forth. Um, yeah. Obviously, very hard to predict.
- SGSarah Guo
Yeah. It's super hard to predict, because one of the arguments for a smaller number of hardware platforms is just supply chain, right? It's just gonna be cheaper at scale to manufacture all the sub-components, and therefore you're gonna collapse down to fewer things, because unless there's a, uh, dramatic cost advantage, those fewer things will be more easily scalable, reproducible, cheap to make, et cetera, right? If you look at sort of general hardware businesses. So it's an interesting question in terms of that trade-off between those two tensions.
- CFChelsea Finn
Yeah. Although maybe we'll have robots in the supply chain that can manufacture any customizable device that you want.
- SGSarah Guo
It's robots all the way down. So that's our future. Yeah. (laughs) Well, thanks so much for, uh, joining me today. It was a super interesting conversation. We covered a wide variety of things, so really appreciate your time.
- CFChelsea Finn
Yeah. This was fun. (instrumental music plays)
- NANarrator
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 35:14
Transcript of episode AzqsJk1f12k