Y Combinator: Quan Vuong: How Open-X Proved Generalists Beat Specialists
Open-X cross-embodiment training showed generalists beat specialists by 50%; cloud-controlled inference removes the on-device bottleneck for robotics startups.
EVERY SPOKEN WORD
50 min read · 9,513 words
- 0:00 – 0:41
The new robotics startup equation
- QVQuan Vuong
The equation, I think, for starting a robotics business has changed and will continue to change at an accelerating pace, because the upfront cost is not that high anymore.
- JFJared Friedman
Everyone's sort of spending a lot of time in the digital world, and it feels like, you know, now is the time to start thinking about the world of atoms.
- JFJared Friedman
You literally just gave people the playbook for how to build a vertical robotics company.
- QVQuan Vuong
This has really been our mission from the start, is to create that Cambrian explosion.
- JFJared Friedman
It still, like, blows my mind. I didn't know if this would exist even in my entire lifetime.
- JFJared Friedman
[upbeat music]
- 0:41 – 3:05
Intro: GPT-1 moment for robotics
- JFJared Friedman
Welcome back to another episode of The Lightcone. Today, we have a very special guest, Quan Vuong. He's one of the co-founders of Physical Intelligence, which we think might be the robotics AI lab that brings about the GPT-1 moment for all of robotics. Quan, thank you for joining us.
- QVQuan Vuong
Pleasure to be here. I've been a long-time admirer of YC, and our mission is to build a model that can control any robot to do any task that it's physically capable of, and to do so at such a high level of performance that it's gonna be useful to people in all walks of life. And so GPT-1 for robotics, you know, what is it? You know, is the ChatGPT moment for robotics real? Our perspective here is that, um, we want to build a model that's really intelligent. We want to build a platform that allows us to externalize that intelligence to the rest of the world and allow them to use it to build very interesting applications in all sorts of verticals in robotics. And we think that it's gonna be more like a peeling-an-onion analogy, where you start from a really strong base model that has all sorts of common-sense knowledge and already works to some extent on your robot. Um, you then have a mixed-autonomy system, uh, very similar, for example, to an autonomous car today. Um, and then you actually deploy that system to do a real job. That system might make mistakes. Um, that's okay. Um, and then over time, by actually exposing the system to the complexity and the edge cases of the real world, that system gets incrementally, even just slightly, better every day. Um, and you know, one day you wake up, and you suddenly have a system that is just fully autonomous and just provides tremendous value.
- GTGarry Tan
It might be helpful to give the audience a bit of a mini history lesson on why robotics is so hard, and there have been a lot of breakthroughs in the last two years. And, just to simplify, the robotics problem is three pillars: semantics, where I think we ported in a lot of analogs from language models; then you have planning; and then the last thing is control, which needs to be done in real time and interact with an environment that changes. Walk us through the seminal papers that a lot of the team at Physical Intelligence published that gave you the inkling that the GPT-1 moment is near, and that started in 2024.
- 3:05 – 6:17
How AI unlocked robotics (RT-2, PaLM-E)
- QVQuan Vuong
Yeah. So the dream to build general-purpose robots has been a long-time dream, I think, in humanity. Like, you know, we're not the first to say that our mission is to build a model that can work on any robot. Um, and we're really fortunate to be at this moment in history where we feel that it's possible. To walk back a little bit, um, a few years before, there was, I think, first SayCan, which to me was the first demonstration of language models and how you can bring all of the common-sense knowledge in a language model into robotics, and therefore significantly reduce the need to collect robot-specific data. So for example, if you have a task of, oh, I wanna go to the YC office to record a podcast, you know, what are the steps I need to take? You can ask a language model, you know, "Just show me the steps and show me the plan." Um, and that worked incredibly well. Um, and then the way language models infiltrated, if you will, into robotics is it started at the planning level, uh, at the semantic level. But there's still the control problem. You know, at the end of the day, you still need a mechanism to convert the plan into low-level actions that can actually actuate the robot. And that brings us to PaLM-E, and that brings us to RT-2, which stands for Robotics Transformer 2. And what these two works really show is that if you start from a vision-language model that is really powerful, and you use robot data to adapt this model to speak robot language, if you will, um, then you see a lot of transfer from the kind of knowledge that exists in the vision-language model down to the low-level actions. Like, um, one of my favorite examples when we did the RT-2 project was you can have pictures of celebrities on the table.
You have a picture of Taylor Swift, you have a picture of the Queen of England, and you can ask the robot, you know, "Pick up the Coke can and move it to Taylor Swift," even though the concept of Taylor Swift just doesn't exist in the robot data at all in that work. You can do other examples, such as, um, spatial reasoning that doesn't exist in the robot data at all. Like, for example, move the dinosaur next to the red car, and these are all completely unseen objects in the robot data. And so that was RT-2, and that was PaLM-E. Now, RT-2 and PaLM-E are single-embodiment, um, exercises, which-
- GTGarry Tan
Just for the audience, single embodiment meaning it worked for a very specific robot.
- QVQuan Vuong
It worked for a very specific robot. In robotics, you can ask the question, how do you scale? Um, especially how do you scale data collection? And one of the insights that we had back then was, you know, maybe the data from one robot is not that different from another robot anyway. If you have enough robots in your training data, maybe what the model learns isn't to control one specific robot. What the model learns is something more abstract, which is, how do I learn a general notion of what it means to control any particular robot platform, and therefore be better at controlling any particular platform. And that brings us to what we call, uh, Open X-Embodiment and Robotics Transformer X.
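The cross-embodiment idea described here, pooling data from many platforms so one model learns an abstract notion of control, implies a practical step: mapping heterogeneous robot action spaces into one shared format. Below is a minimal, hypothetical sketch of that normalization; the dimension bound and function names are illustrative assumptions, not Physical Intelligence's actual pipeline:

```python
# Hypothetical sketch: zero-pad robot-specific action vectors into one
# shared fixed-size action space so a single model can train on data
# from many different platforms. MAX_ACTION_DIM is an assumed bound.
MAX_ACTION_DIM = 8

def to_shared_action(action):
    """Pad a robot-specific action (e.g. a 6-DoF arm, or a 7-DoF
    arm plus gripper) into the shared action space."""
    if len(action) > MAX_ACTION_DIM:
        raise ValueError("action exceeds shared action space")
    return list(action) + [0.0] * (MAX_ACTION_DIM - len(action))

# A 6-DoF arm and a 7-DoF arm + gripper now share one training format.
arm6 = [0.1, -0.2, 0.3, 0.0, 0.0, 0.5]
arm7_gripper = [0.1, -0.2, 0.3, 0.0, 0.0, 0.5, 1.0]
batch = [to_shared_action(arm6), to_shared_action(arm7_gripper)]
print(len(batch), len(batch[0]))  # → 2 8
```

With every platform's data in the same representation, a high-capacity model can consume all of it at once, which is the precondition for the generalist-vs-specialist comparison discussed next.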
- GTGarry Tan
That was a
- 6:17 – 9:12
Breakthrough: multi-robot scaling (Open-X)
- GTGarry Tan
big paper because it was the first that showed potential scaling laws applied to robotics, because now you could start training these models across multiple kinds of hardware, not just one, which had never been done in robotics before, because all the research labs would train with a very specific set of sensors, actuators, and motors, and it was all very finicky with that particular hardware, right?
- QVQuan Vuong
Yeah. One of the really interesting results from, um, Open X-Embodiment, and let me provide the context here, is that you can take, let's say, 10 different robot platforms, collect data from them, train a policy, and really optimize the policy to work well on each platform. Um, so let's say, you know, you have that, you have 10 different platforms, 10 different policies. And now if you simply take the data and absorb it into a model that is high-capacity enough to really absorb that data, you have this generalist, right? That learns to control the 10 different robots. You can compare it to the specialist that has been optimized to work well on a particular embodiment. How does it compare? And the interesting result from Open-X is it was 50% better.
- GTGarry Tan
Wow.
- QVQuan Vuong
Um, and that was really surprising-
- GTGarry Tan
Wow
- QVQuan Vuong
... because in robotics, it's hard enough to get your model to work on one particular robot platform. And one of the reasons why I say that we're really fortunate to be at this moment in robotics is because Open-X was really only possible because of the support that we received from the robotics community. It was a huge collaboration across the robotics community, and the reason why that's really important is there is this joke in, like, robotics grad school that, you know, if you wanna add two years to your PhD, just work on a new robot platform.
- JFJared Friedman
[laughs]
- QVQuan Vuong
You know, by that logic, if you wanna have 10 robot platforms, that's 20 years, like-
- JFJared Friedman
Why is that? It takes, like, a year or two to just get the platform, um, up and running to d- even collect the data.
- QVQuan Vuong
Yeah.
- GTGarry Tan
Is it fair to say that the dataset that was created for Open X-Embodiment had an impact similar in scale to what ImageNet did for vision? Because it was huge, and it was the first large dataset across multiple hardware platforms, a huge collaboration and-
- QVQuan Vuong
I still think that ImageNet was more impactful in the vision community, and the reasons for that are, um, a few. The first is that ImageNet also allowed for reproducible evaluation, right? Um, you know, Open-X as an effort was more about making data available for people to use, and evaluation is a really difficult problem, um, in robotics that Open-X did not solve. Um, and the second is I think Open-X is a drop in the bucket at this point in the robotics community. Um, if you measure the scale and the volume and the diversity of data that the community is collecting, I think Open-X at this point is a drop in the bucket.
- 9:12 – 13:10
The real bottleneck: data
- JFJared Friedman
I mean, I guess we started talking about sort of GPT-1, but even GPT-1, you know, that was sort of this moment where you can prove, you know, Alec Radford figured out that there was a neuron based on a very specific input and output, um, and then that allowed the scaling laws to sort of take hold. The biggest problem in robotics I've heard is basically actually exactly what we've been talking about, is, like, it's the data problem. You know, uh, language you could bootstrap off of, like, you know, the sum total of what you could get off the internet, which is actually quite a lot. Can you give us, like, a sense for, um, like, scale? Is it, like, petabytes? Like, you know, what do you think is necessary as an input to, you know, the true GPT-1 of robotics?
- QVQuan Vuong
Yeah. So the data scarcity problem in robotics, there are a few ways to look at it. The first way is that it's really two problems in disguise. There is the data generation problem, and there's the data capture problem, and the difference is that with data capture, there might already be lots of robot data being generated, but there's just never been an incentive to capture it, to make it easy to digest in training. Um, and that's one of the goals that Open-X was trying to solve, which is if you have robot data, it's a really good idea to capture it and make it possible to train on. The second way to look at it is that robotics is very different from language models. There is not an internet of robot data that you can use, and so you see this very operationally heavy effort to collect data, and there's the question of, is it going to scale? Well, the way that I look at it is, let's take the US GDP, 24 trillion US dollars. Let's say we actually solve robotics, a model that can control any robot to do any task. Napkin math, maybe that contributes 10% to US GDP. Well, that's already a massive number, um, and I think that promise is one of the reasons that warrants the investment into data collection, um, in robotics. And the third way to look at it is we're very focused on cross-embodiment, and with cross-embodiment there is a data collection aspect as well, which is to really make sure that your model and your organization and infrastructure are set up to consume data from many different sources of, uh, of robots, and that actually allows you to scale easier.
For example, if I were to contrast our approach with, let's say, a company that has a particular hardware platform that they optimize for and scale, um, that's not an approach that has really allowed people to scale, um, because it's just much harder to figure out how to manufacture, like, a thousand units of something for now, compared to making sure that you yourself are ready to absorb data from, like, a thousand different types of robots that are already out there in the community.
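Quan's napkin math above is straightforward to check: 10% of a roughly $24 trillion US GDP.

```python
us_gdp = 24e12          # rough US GDP in dollars, as stated above
robotics_share = 0.10   # hypothetical share if robotics is "solved"
value = us_gdp * robotics_share
print(f"${value / 1e12:.1f} trillion")  # → $2.4 trillion
```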
- JFJared Friedman
I mean, it's a crazy problem, isn't it? I mean, the hardware itself, even within the same design of embodiment, if there's a hardware run that goes awry or like one of the servos is slightly different, like you see it in the data, right?
- QVQuan Vuong
Yeah.
- JFJared Friedman
And then how do you control for that?
- QVQuan Vuong
Yeah. So I think we were doing kind of an inventory of robots in the company, and we were so shocked to find that there are no two robot platforms that are the same. And if you ask people in the robotics community, sometimes there's debate about multi-robot versus single robot, and the argument is that, you know, single robot is simpler to scale. And actually, that's not how it plays out in practice. Like, how it plays out in practice is even if you have a single robot that you're optimizing for, over time, that platform is gonna drift. You know, maybe you wanna make hardware changes or you have software changes. You end up in a situation where it's much harder for you to reuse old data because, you know, in machine learning, if you wanna generalize from a distribution, you would like many samples from that distribution. And if you just have one robot platform that has a major change every three months, maybe you have a few data points from that distribution. Um,
- 13:10 – 16:01
Emergence: zero-shot robot skills
- QVQuan Vuong
whereas if you start from the hypothesis that if you have many robot platforms in your fleet, your model is gonna learn something more abstract, which is, "How do I control a robot, not any particular robot?" Then the model will be able to ingest data from, you know, a slightly different robot better.
- JFJared Friedman
Yeah.
- QVQuan Vuong
And actually, we're starting to see emergent property in this kind of robot la- large foundation model-
- JFJared Friedman
That's good news
- QVQuan Vuong
... we're, we're doing, where you start to see, like, interesting transfer be- between different, um, data sources. For example, today it's possible to perform tasks zero-shot, zero-shot meaning you don't collect any data, and these are the tasks that last year might have required like hundreds and hundreds of hours of data collection.
- JFJared Friedman
What are some examples?
- JFJared Friedman
Yeah. Do we have any videos we can see that like show it?
- QVQuan Vuong
So, you know, um, I get, might get some flack when I come back because this is not published result. Hopefully, this will come out soon. Um, so, you know, I wanna reserve the excitement for that, and I'm kind of-
- JFJared Friedman
Fair enough
- QVQuan Vuong
... like building up the, the, the, the, the excitement a little bit. Um, so hopefully this will come out soon.
- JFJared Friedman
All right.
- QVQuan Vuong
But these are not simple tasks. These are like actually difficult tasks-
- JFJared Friedman
Yeah
- QVQuan Vuong
... that just last year required like hundreds of hours of data collections.
- JFJared Friedman
You hear it here on "Lightcone" first that [laughs] there's some emergent properties that are gonna come out of Pi shortly.
- JFJared Friedman
Can you give us a sense of like the flavor of the tasks?
- QVQuan Vuong
It's really easy to fool yourself, and so we wanted to test across a few different tasks of different flavors: a task that requires precision, a task that requires reasoning with multiple objects in the scene. It all seems to have this property, um, that's really nice. So it does seem like that's something that's a more general property that emerges rather than we just, you know, got lucky-
- JFJared Friedman
Oh, okay
- QVQuan Vuong
... and suddenly the model start working on one particular task.
- JFJared Friedman
Could you help us understand where we are now in terms of like what's working and how well it's working? Like we're not quite at the ChatGPT moment yet. Like where are we? And I think you brought some videos that you were gonna show us to like help everybody visualize what the current state-of-the-art actually looks like.
- QVQuan Vuong
I think where we are is, I think if you have a task where it's okay for the robot to make a mistake, um, and it's possible for you to set up a mixed-autonomy system where you have a person that takes over when the robot makes a mistake and provides corrections, it is possible to get to a level of performance where it starts to make sense to think about scaling robot deployment. And the example that I specifically wanna highlight here is this blog post that we did with Weave and Ultra, and, you know, it's great that these are, uh, both YC companies. I wanna provide a little bit of context here first. The context is that Pi is primarily a research organization. We wanna focus on building the best model, um, but we also want to not be tunnel-visioned. We wanna make sure that the model we build is actually gonna be useful and actually perform tasks that people in society care about.
- 16:01 – 22:21
Real-world demos: laundry & warehouses
- QVQuan Vuong
And one of the really good ways for us to do so is to partner really closely with companies that wanna get robots out there today. And the way that this relationship works is that we treat each other like we're on the same team, very free flow of information, um, and we design a system that tries to get the best possible performance for the tasks that this company cares about. So let me talk about Weave first. What you're seeing in this video is a system that we built together, folding really diverse items of laundry in a real laundromat in the Mission. You can see, you know, people walking outside. And why this task is difficult is because there's just an infinite possibility of observation space. Like, you know, um, clothing is deformable, and no two items of clothing here are the same, and these are also unseen. You know, these are not, like, clothing items that are seen in the training data.
- JFJared Friedman
Yeah, I love this team. They are some of the most cracked people out of Apple I've ever met.
- JFJared Friedman
Garry was the partner. We maybe-
- JFJared Friedman
Yeah
- JFJared Friedman
... wanna like explain like what Weave is and what their like, what their like company is.
- JFJared Friedman
Yeah, I mean, they're actually, you know, shipping their first robots into the home. Uh, we sort of talked about it as, you know, being able to do household tasks like this, and I think they were very inspired by Physical Intelligence's first demos with, um, with laundry folding. So it's actually a total trip to hear about it. You know, a year ago we were talking about them doing it, and now to see them do it working hand-in-hand with you is really awesome. I think this is a great example of, like, you know, you need the model smarts, you need the data collection, and then the hardware and, um, the sort of system integration all working together is just hard to nail.
- QVQuan Vuong
Yeah. And to get back to your question about why robotics is hard, it really is a hard systems problem. Um, like, you need everything to work well and work well together to get this result, and, like, Weave is such an incredible team for us to work with to get this result. And it actually didn't even take us that long to get this result. It was roughly... well, we set a goal, and maybe it was, like, two weeks afterwards when we got a model and a system that was good enough at performing this task.
- JFJared Friedman
It still, like, blows my mind to see a robot actually folding laundry because I remember until... basically, until ChatGPT, I didn't know if this would exist even in my entire lifetime. 'Cause, like, folding laundry, I mean, it's, it's always been, like, the Turing test for robotics because there's no way to, like, deterministically program a system the way that you did, like, pre-AI to do this 'cause the space is, like, so infinite. And, like, we've shown that it's possible for us to do... Like, basically, if a robot can do this, like, robots will be able to do everything. It's only a matter of, like, improving it from here.
- QVQuan Vuong
There was a funny story where, um, when we first published Pi Zero, people thought of us as the laundry company.
- JFJared Friedman
[laughs]
- QVQuan Vuong
Because the demo was just focused on laundry. And actually, picking home tasks, especially tasks that have to do with deformable objects, is a very intentional choice on our end. We're not just after the home. We really wanna make it broadly applicable. But picking home tasks for us to start with has a few benefits. Like, one, it's relatable. You know, you can see the laundry-folding demo, and you can kind of, like, grok how this is gonna be useful, and you can get a sense of why it's hard. And the second is that it's really easy to set up to test generalization.
- GTGarry Tan
Can you talk about, uh, Ultra, which is your company, Jared? A demo of it.
- QVQuan Vuong
Yeah, this is Ultra. The thing that I love about this video is you see, you know, it's bright outside, and you see this is 4x speed, and it's 100 minutes. If I scroll to the end, the sun has set.
- GTGarry Tan
Oh, wow.
- JFJared Friedman
Ah. [laughs]
- GTGarry Tan
That was one of the-
- JFJared Friedman
That's cool
- GTGarry Tan
... big problems in robotics, where it would be so sensitive to the environment and lighting and mess up the vision system, the semantics and part of it.
- QVQuan Vuong
Yeah, and the interesting, uh, thing here is that it is possible to get to a level of autonomy where the robot is just performing the task. This is autonomy at scale. Like, this is ready to be scaled.
- JFJared Friedman
Quan, because this task is less familiar than laundry folding, do you wanna explain what the robot is doing here and what Ultra is, like, doing as a company?
- QVQuan Vuong
Ultra is a company that wants to make it really easy to adapt robots to, you know, new tasks. Um, and right now they are focusing on the logistics space, which is really important because, you know, there's lots of labor shortage in logistics. And the task that we focused on together here is, you know, if you order an item from Amazon, you sometimes get this soft pouch that the item gets shipped in, and the task here is you have a tray of these items, and the robot is supposed to pick one of them at a time and place it inside this pouch. The machine would then close it and then pick up the pouch and put it, um, on the left here to be ready for shipping. Now, this task is hard because there are many different types of objects that can be in this tray, and the opening here is actually very narrow, so you see this interesting example of the robot kind of nudging the item to go into the pouch, and that's, that's really hard. Like, that requires-
- GTGarry Tan
[laughs]
- QVQuan Vuong
... very good understanding of the scene and, like, very precise motion to nudge the object into the pouch. Um, the other thing that's hard about this task is the level of autonomy that's required. Like, this is running for an entire day. There is still human intervention, I wanna say, in, um, this, like, full-day operation, um, but the level of intervention is actually quite minimal.
- JFJared Friedman
This is not just, like, some, like, demo station, right? This is actually recorded in an actual e-commerce warehouse where they're actually shipping real products to real customers. This isn't just, like, a, like, a lab.
- QVQuan Vuong
This is packaging real orders for customers to be shipped out in a real warehouse. So this is real operations.
- JFJared Friedman
So I think this is really cool because I, I think when people think about robots, they tend to think of the consumer use cases like Weave 'cause that's, you know, what we're familiar with in our daily life. What I find really interesting is that there's, like, a million applications like this Ultra thing that you wouldn't think of as obviously like, oh, who packs the, like, soft pouch of things that you get from, like, Amazon? Well, there's some person, like, who does that, and this is, like, a job that we can now build a robot to do.
- QVQuan Vuong
The interesting thing about the approach is that
- 22:21 – 23:16
Robotics becomes a data + ops problem
- QVQuan Vuong
you're converting it from a very difficult engineering problem into an operations problem of how do I identify the use case and how do I collect the right data, which is, in some sense, more scalable, because you can build a system that allows you to collect data from many different tasks. So, you know, it's now a problem of how do I scale data collection rather than, you know, for every new task, how do I design a really difficult engineering system to solve it?
- GTGarry Tan
YC Startup School is back. We're hand-selecting the most promising builders in the world and flying them out to San Francisco for July 25th and 26th to discuss the cutting edge of tech. Apply now for a spot. Okay, back to the video.

I think one thing that the audience may not know is that you have a very unique technical insight that in the past would have made robotics folks gasp and be shocked. Because robots need to run in real time, a lot of times all of
- 23:16 – 29:03
Cloud-controlled robots (big unlock)
- GTGarry Tan
the compute runs on device. But you guys have done something very different. Can you tell us more about that, so that this works in real time with large models and really well?
- QVQuan Vuong
So the context here is that, you know, we talk to many companies that would like to deploy robots, and one of the first question we get is, "What compute unit should we get on the robot?"
- JFJared Friedman
[laughs]
- QVQuan Vuong
You know, it's expensive. It's gonna increase the BOM cost, and they're worried that it's gonna go out of fashion very quickly 'cause the model changes, the model gets bigger. How do I make sure that the hardware that I'm gonna commit to today is gonna be viable for, you know, a couple of years? It's a very difficult question. People are often really surprised when I tell them that in almost all of the robot evaluations that we run at Pi today, including the really complicated demos that we have shown, making coffee, folding laundry, mobile robots navigating around, the model is actually hosted in the cloud. Um, and you know, this is not, like... the cloud isn't a server in the office; it's a real cloud. The model is hosted in a data center somewhere, and within this high-frequency control loop that, um, is controlling the robot, the robot is actually querying an API endpoint that hosts the model, sending it images and language commands, and getting back actions that then execute directly on the robot. And this is surprising for precisely the reason that you mentioned. You know, how do you actually make it work? This is why it's really important for Pi to couple system, hardware, and model development and research very tightly together, because it allows us to solve for this problem. So for example, one of the insights we have here is that you can actually bury the inference time within the robot control loop, because, you know, if I'm a robot, I have enough actions to execute for the next 100 milliseconds. Like, there's no reason for me to wait until I finish executing those actions to ask my model for more actions. You know, I can do it as fast as, um, inference, essentially. Um, and so, you know, maybe when I only have 50 milliseconds' worth of actions left, I can ask for the next set of actions, and when the current 50 milliseconds are over, like, I have something that's ready for me to continue with, you know, my next 100 milliseconds.
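The pipelining insight in the paragraph above, asking the cloud model for the next action chunk while the current one is still executing so that inference latency hides inside the control loop, can be sketched roughly like this. The endpoint stand-in, timings, and chunk format are all hypothetical illustrations, not Pi's actual API:

```python
import threading
import queue
import time

def query_cloud_model(images, command):
    """Stand-in for a request to a cloud-hosted policy endpoint.
    Returns an 'action chunk': ~100 ms worth of low-level actions."""
    time.sleep(0.03)  # simulated ~30 ms of network + inference latency
    return [(command, i) for i in range(10)]  # 10 actions of ~10 ms each

result = queue.Queue(maxsize=1)

def prefetch(images, command):
    result.put(query_cloud_model(images, command))

# Control loop: kick off inference for the next chunk, then execute the
# current chunk; by the time the next chunk is needed it is already there.
current = query_cloud_model(None, "fold the towel")
for _ in range(3):
    worker = threading.Thread(target=prefetch, args=(None, "fold the towel"))
    worker.start()                 # inference overlaps with execution
    for action in current:
        time.sleep(0.01)           # execute one ~10 ms action on the robot
    current = result.get()         # next chunk, its latency fully hidden
    worker.join()
```

Because the simulated inference (~30 ms) finishes well inside the ~100 ms of execution, the robot never stalls waiting on the network.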
Um, so that's one of the insights. The other, uh, kind of algorithmic improvement, um, we refer to it as real-time chunking: design inference in such a way that you know there's gonna be a delay in how long it takes to query the model in the cloud, basically. Like, the problem here, if I get a little bit more technical, is an action chunk is a sequence of actions that I can execute on the robot, so, you know, it's not just one action. And if I have an action chunk that I can execute for 100 milliseconds, and 50 milliseconds in I wanna predict another action chunk, and I'm gonna transition to that new action chunk after my current 50 milliseconds are over, how do I make sure the two are consistent? Like, you know, how do I make sure that if I'm moving this way-
- JFJared Friedman
Mm-hmm
- QVQuan Vuong
... the next action chunk is gonna continue me, c- to allow me to continue to be smoothly moving this way?
- JFJared Friedman
You can pre-compute.
- QVQuan Vuong
Yeah, you can pre-compute, and, like, that's one of the algorithmic improvements that we've made to make inference with a model hosted in the cloud possible.
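One simple way to get the consistency Quan is describing, where the new chunk smoothly continues the motion of the old one, is to cross-fade over the overlapping actions. This is an illustrative blend under simplified assumptions (scalar joint targets, equal-length overlap), not the published real-time chunking algorithm:

```python
def blend_chunks(old_tail, new_head):
    """Linearly cross-fade from the remainder of the old action chunk
    into the start of the new one, so the handoff stays smooth.
    Actions are scalar joint targets here for simplicity."""
    assert len(old_tail) == len(new_head)
    n = len(old_tail)
    # Weight shifts from the old chunk (w near 0) to the new chunk (w = 1),
    # so the last blended action lands exactly on the new chunk.
    return [(1 - (i + 1) / n) * a + ((i + 1) / n) * b
            for i, (a, b) in enumerate(zip(old_tail, new_head))]

# Old chunk was easing toward 1.0; the new chunk continues toward 1.5.
old_tail = [0.5, 0.6, 0.7, 0.8, 0.9]
new_head = [0.7, 0.9, 1.1, 1.3, 1.5]
print(blend_chunks(old_tail, new_head))
```

The blended trajectory never jumps: each step stays between the old and new targets, which is the "smoothly continue moving this way" property discussed above.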
- JFJared Friedman
I studied computer engineering, so I'm not really a, an algorithms person, but when it comes to systems like that, like pipelining, like get me all over that. That sounds great. [laughs] That's so interesting.
- GTGarry Tan
I mean, this simplifies... It's a brilliant choice because it simplifies so much of the system for the robots. You don't need all this clunky... I don't know, people sometimes have two operating systems for robots, an embedded RTOS and then the regular one, and all this complex giant compute and power. And this is how the initial versions of, uh, Waymo used to run, basically a server in the trunk, and you can't afford to do that with general-purpose robotics, which is why it's brilliant that you figured out how to do it.
- JFJared Friedman
Yeah, you don't have to. I mean, you can do things... Some of it, there obviously has to be some compute there, but a lot of the compute can happen elsewhere, and then is there-- There must be a video-- Like, this, this thing that we're looking at in the top left, like, how much of that is sort of, like, video feedback? How much of it is, like, local processed? I mean-
- JFJared Friedman
Yeah, is there any compute locally on this robot, or is it just, like, a dumb, like, video camera that streams data to the cloud?
- QVQuan Vuong
For this, I am not 100% sure, but I am inclined to believe that it's just a dumb computer. Like, for this specific video, um, I don't remember, but I'm just 100% confident that we can make this work with a dumb computer and a robot. And one of the interesting things about our collaboration with Weave and Ultra is, one, I've never seen their robots in person.
- JFJared Friedman
[laughs]
- GTGarry Tan
Oh, wow.
- QVQuan Vuong
Um, two is I have very little idea about how their robot actually works.
- JFJared Friedman
Interesting.
- QVQuan Vuong
Um, and that's a very intentional choice. Like, I wanna stay away from-
- JFJared Friedman
Huh
- QVQuan Vuong
... from that as far as possible. I also don't know how they collect data. Like, I intentionally don't ask them these questions, to understand whether it's possible for an organization like Pi to parachute into their existing system and work really closely with them on the things that actually matter to get the system to work, without having to learn how they've set up their system. Because in a way, that's a more scalable recipe.
- GTGarry Tan
Yeah, you completely decouple a lot of the hardware control loop choices from the semantics and planning, and it just works. Just brilliant.
- QVQuan Vuong
Yeah, it... I, I mean, I'm really surprised that-
- GTGarry Tan
It works. [laughs]
- QVQuan Vuong
Um, when we started the company, we thought that real deployment was only gonna be a conversation like five years into the life of the company.
- JFJared Friedman
Hmm.
- QVQuan Vuong
'Cause the problem is really hard, and we're two years [laughs] in, and, you know, this is the result that we have. And real deployment and scaling
- 29:03 – 32:33
How to start a robotics company today
- QVQuan Vuong
the number of robots is a really serious consideration today. And so the pace of progress has just been very pleasantly much faster than we expected originally.
- JFJared Friedman
Often on this podcast, we talk about what all this means for startup founders. I think that might be an interesting question for us to explore here. So imagine someone listening to this podcast, maybe a college student studying computer science, who thinks robots are really cool and wants to do something like this. How should they get started, and what are the skills that they need? Do they need to be a mechanical engineer to be able to build a robot like this? Can they just buy an off-the-shelf robot arm and camera system and, like, what-
- JFJared Friedman
And load Pi and-
- JFJared Friedman
Yeah
- JFJared Friedman
... you know, you're off and running in, like, a day.
- JFJared Friedman
You're off and running. Yeah.
- JFJared Friedman
Yeah.
- QVQuan Vuong
Before I actually answer your question-
- JFJared Friedman
[laughs]
- QVQuan Vuong
... let me provide a bit more context. The first is that robotics is traditionally really hard because it's an extremely vertically integrated business. You need to have your own customer relationships, your own hardware, your own autonomy stack, your own safety certification, your own everything, and the barrier to entry is just really high because of that. And one of the things that we're trying to change is to provide a foundation of physical intelligence that the community can build on top of, that allows them to onboard autonomy onto their robot and their task much quicker than before. So that's the first: we want to provide that kind of seed of intelligence that allows people to move much faster, so that they can focus on other problems. Um, the second thing is that the recipe for starting a vertical robotics business today is, one, have a really good understanding of the existing workflow, because the robotic system needs to fit into existing workflows, and be very meticulous about identifying where the opportunity is. You know, if there is a workflow that needs X number of workers today, where is the robot, when you insert it, gonna make the biggest difference? And two, really be scrappy when it comes to hardware and data collection. You don't need an incredibly expensive robot capable of very precise motion to be able to do this task today. And the reason why is that these models are really reactive, so they can compensate for some of the inaccuracy in the actual robot movement. And you want to ensure that you have the ability to collect data and to run evaluation, especially evaluation in real deployment. The next step after that is to get to a mixed autonomy system that allows you to reach the point where it's break-even or-
- JFJared Friedman
Like break even economically.
- QVQuan Vuong
Break even economically. Because the reason why that's important is that it allows you to then scale the number of robots.
- JFJared Friedman
Okay. 'Cause if you lose money on every robot, it's very hard to scale.
- SPSpeaker
That has been historically one of the biggest challenges for robotics companies as they go into growth stage. The CAC payback period just doesn't make sense.
- QVQuan Vuong
Yeah. So the equation, I think, for starting a robotics business, um, has changed and will continue to change at an accelerating pace, because the upfront cost is not that high anymore. And now, what is the upfront cost? The upfront cost is much cheaper hardware, the ability to collect data, the ability to run evaluation, and the ability to understand the use case to see where you should insert the robot. It's not about having incredibly expensive hardware; it's not about having your own proprietary classical autonomy stack anymore to be able to do this
- 32:33 – 43:53
The coming explosion of robotics startups
- QVQuan Vuong
task. Um, and so it allows companies to focus on the components that will actually differentiate them from the rest of the space.
- JFJared Friedman
Now that you've sort of unbundled it, and you no longer need to build a fully vertically integrated company in order to build a robotics company, are we on the precipice of a Cambrian explosion of vertical robotics companies, where there's gonna be, like, a thousand companies like Ultra going after every menial job in the economy: getting a deep understanding of the customer, building a robot that can solve that problem, doing a mixed human-machine deployment until it can run fully autonomously, and building a company in every sector? Is that the future that you see people building on top of Pi?
- QVQuan Vuong
It's funny that you mention Cambrian explosion, because when we wrote this blog post, that term was very hotly debated. We are, I think, academics at heart, and we want to be very measured when we communicate. But, you know, myself personally, I believe there's gonna be a Cambrian explosion of robotics companies across the entire world, across many, many different verticals, just because it's so much cheaper to build, and it doesn't require someone with 20 years of experience in robotics to start anymore. It requires someone that is really scrappy, that can move really quickly, can do the system integration, and can understand customers and what they want, to start the deployment.
- JFJared Friedman
I mean, what's coming up for me is, obviously we work with a lot of robotics companies and meet a lot of founders, and it feels like there's this continuum. Um, to use an analogy to personal computing: you could argue that industrial robotics today is basically at the mainframe-
- QVQuan Vuong
[chuckles]
- JFJared Friedman
... or minicomputer level. Like, if you look back at the '70s, there were huge public companies like Digital Equipment that just did these very, very expensive deployments, but they were very, very specialized, and it was all extreme enterprise. Like, the idea of a personal computer was ridiculous, right? It took the Altair, and then the Apple I and Apple II, and then the IBM PC XT to create personal computing. And then the traditional advice for robotics for many years was to go after the dirty and dangerous, and of course those are the industrial cases. Like, you have these giant Tesla robots in the Gigafactory and things like that. It feels like what you said around profitability is really, really big. So does that mean that the people who drive the vertical robotics Cambrian explosion moment, the people who are first in that, would be the first to be profitable, and not dirty and dangerous?
- QVQuan Vuong
I think this is already happening today. We have the fortune of having lots of visibility into the robotics community, because people would like to talk to us, people would like to learn what it's like to build a foundation model for robotics, and people would like to know, "How do I get the same level of autonomy?" And there are so many companies and businesses that we talk to that would love to put a robot into a space where it's okay for the robot to make a mistake, and they just need it so much. I really believe that the recipe I mentioned earlier (identify where the robot can fit in, focus on cheaper hardware, collect data, run evaluation, mixed autonomy, break even, scale robots) will work across many different verticals. And I'm seeing it play out today, and it's just incredibly exciting to see.
- JFJared Friedman
And this is pretty cool that you literally just gave people the playbook for how to build a vertical robotics company.
- QVQuan Vuong
Mm.
- JFJared Friedman
Like this is a playbook that could possibly be followed successfully hundreds or thousands of times.
- QVQuan Vuong
And the reason why I want to mention it is because I do wanna see that Cambrian explosion, and we want to help enable it. You know, for Pi, if we talk about why Pi is going to fail, it's probably going to be because the problem is just way too hard. You know, maybe it takes fifty more years to solve the robotics problem, not a couple of years, or five, or ten. Um, and so we want to enable the community. We want to accelerate progress, and that's why we're very open. Like, we publish our research. We open-sourced Pi Zero and Pi 0.5. And people are sort of shocked when they ask me, "Is there any difference between the Pi Zero and Pi 0.5 that you open-sourced versus the models that you use internally?" And the answer was, "Actually, no, it's the same model."
- JFJared Friedman
Mm.
- QVQuan Vuong
Like, the pre-trained model weights that you're using, that we open-sourced, are also the pre-trained model weights that our researchers use internally for Pi Zero and Pi 0.5. And so we really wanna help accelerate progress in the community, um, and to create that Cambrian explosion.
- JFJared Friedman
Yeah, that's very inspiring. I mean, everyone's sort of spending a lot of time in the digital world, and it feels like now is the time to start thinking about the world of atoms. And this is sort of the perfect mix: how do you take electrons and turn them into abundance in the world of atoms? And I think about Dario Amodei's essay, Machines of Loving Grace, and when you really think about the perfect manifestation of that, it's not perfect agents that look over you just in the electronic world. It's actually something a little bit more akin to what we're seeing here.
- QVQuan Vuong
Yeah. And this has really been our mission from the start: to create that Cambrian explosion. Um, and this is why we chose to focus on the model, because we believe that is the bottleneck to really making robots useful across many different tasks in the world. And that's why we also focus on cross-embodiment. You know, success for us is not defined as only our model on our robot performing tasks that are useful. The surface area for success is actually much larger: it's our model performing really useful tasks on somebody else's robot out there, maybe one where we don't even know what that robot is like, in a way that's useful to the end consumer.
- SPSpeaker
Could we maybe talk a little bit about, um, like the humans behind the robots here?
- QVQuan Vuong
[laughs]
- SPSpeaker
Like, um, how did the company get started? Who are your co-founders? How did you all get together? And what skills do you each bring to such a complex problem?
- QVQuan Vuong
Sometimes the joke I make here is that the humans behind the robots are also robots. [laughing] Not really. Um, yeah, so Pi is a very, I would say, untraditional company. We have a larger-than-average founding team, and some of us worked really closely together when we were on the robotics team at Google. And the robotics team at Google was, I think, a really, really great environment for seeing the signs of life and creating the relationships and the community that allowed the robotics community and these advances to flourish. There is Lachy, whom we met when we were thinking about starting the company, and who has just been really instrumental in making sure that we're a good business. And there is Adnan, our hardware lead, um, who came over from Android. And Adnan has a really difficult job, because if you want to work on cross-embodiment, you know, remember my joke about how if you wanna add two years to your grad school-
- SPSpeaker
[laughs]
- QVQuan Vuong
... you bring on one more robot. The hardware problem and the operational problem for us is: how do we build, improve, and scale a fleet of heterogeneous robots? You know, it's just not one robot platform. And because we built the organization from scratch in the beginning to support that, I think we're able to do it, but it's just a really hard problem, um, because no two robots in the fleet are the same. Like, how do you make sure that everything runs smoothly? We're really good at divide and conquer, if you ask, um-
- SPSpeaker
But so how many, how many co-founders are there in total?
- QVQuan Vuong
We have Brian, we have Chelsea, Sergey, myself, Lachy, and Adnan.
- SPSpeaker
Is it just necessary to have that many co-founders to solve a problem as big as this? Or was it a case of you already being a unit, you'd already worked together, and whatever you started, you would all have wanted to work on together?
- QVQuan Vuong
Yeah. One common question we get is, you know, why band together? And the first answer is that we really enjoy each other's company. Um, we spend a lot of time at work, and it in some sense gives meaning to life, and so we really wanna enjoy the relationships we have at work. Um, and the second is that any one of us could have started a company and been successful, but the problem is just so incredibly hard, and the chances of success are just so much higher when we band together and can divide and conquer the problems. Um, and that's, I think, one of the main reasons why the progress has been much faster-
- SPSpeaker
Mm
- QVQuan Vuong
... than, than we expected.
- JFJared Friedman
What were the differences between working before in academia or at a big company like Google, as opposed to now in a startup?
- QVQuan Vuong
Yeah.
- JFJared Friedman
Because this is, this is the first time for a lot of you doing a startup, right?
- 43:53 – 49:13
What’s still missing (and what comes next)
- QVQuan Vuong
the model further? Like, one of the side projects I would really love to take on is to build an automated robotics research scientist.
- JFJared Friedman
Mm-hmm.
- QVQuan Vuong
Um, which addresses really one of the bottlenecks we have today, because this is a really difficult skill set, um, that requires intuition about the entire stack. So, you know, I would love it if there were a model that could ingest multimodal data such as this and analyze failure modes, um, you know, understanding: oh, is the robot performing this way because of the data that was collected, or the way that it was annotated, or the way that we train the model? And then suggest ideas and actually try them to figure out if those hypotheses are correct. So that's something that I would love to have, and it would dramatically unlock us. So sometimes I make the joke in the company that we should record all of the meetings and then-
- JFJared Friedman
Yes
- QVQuan Vuong
... train a model to basically just make predictions about what the next experimental steps are.
- JFJared Friedman
Oh, you could. You totally could. What if it's OpenClaw and Obsidian and Markdown files and, like, you know, a brain.md with an ontology that's custom to your use case, and what if it's a hundred OpenClaws in the background that you orchestrate?
- QVQuan Vuong
I think there's two sides to this. The first is that we already see a little bit of a sign of life, where, for simple failure modes during evaluation, if you can describe the way that the robot fails in text very precisely and very clearly-
- JFJared Friedman
Mm-hmm
- QVQuan Vuong
... then you can ask the language model to make very reasonable recommendations about what the next step is. Um, but the flip side is that this only works for simple cases today. And the reason why that's the case is, I think, a pretty fundamental limitation of the models that we have today, which is that they are not, at their core, models that take action in the world and see the consequences of their own actions, especially actions that change the physical world. Um, and so I think this kind of very fundamental understanding about how the physical world works is missing from the really large foundation models. Um, and I think that's one of the ingredients that's missing to be able to build this automated robotics research scientist.
- JFJared Friedman
What's interesting about OpenClaw, I don't know. I mean, basically it can go and just do things, which is interesting. And then at that point it's on the research lab to provide, like, CLI and MCP endpoints to the things that might control robots or reconfigure rooms or... I mean, I think Karpathy is starting to talk a bunch about this, where if you mix auto-research plus what he's been talking about with Markdown files, it might just happen in the open. Like, there's this sort of sense that you have to make something much, much more complicated to make it work, but what if that's just wrong? What if we just have Markdown files and agents, and you could make it yourself with literally Claude Code and MCP today? What if it's not an algorithm problem, it's just literally an integration challenge?
- QVQuan Vuong
We have a version of this internally that I use a lot. There was a point when I was spending an, um, embarrassingly large amount of money on API queries.
- JFJared Friedman
Yeah, yeah.
- QVQuan Vuong
Um, and, you know, the, my team was like, "Quan, what are you doing?"
- JFJared Friedman
Oh, I'm that guy at Y Combinator right now. [laughs]
- QVQuan Vuong
[laughs] So, to give you an example, um, we have a Claude skill that is essentially serving the role of a pre-training on-call today. Um, so, you know, we have these pre-training runs that are really large. It's a very difficult exercise, I think, to keep them alive, for them to continue to churn, just because there are so many things that can go wrong. And we have a prototype pre-training on-call that kind of babysits the run and has the permission to take action to remedy errors that it sees. Um, and one of the surprising outcomes of that exercise is that it led to about a 50% improvement in compute usage, like just overall compute utilization, for that large pre-training run, which is huge for us. Um, and, you know, this was just a small, simple prototype that I built, and I think there's a lot more to be done.
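Quan describes the pre-training on-call only at a high level: it watches the run and has permission to take remedial action. A deliberately tiny sketch of that pattern might look like the following. Everything here is invented for illustration (the failure signatures, the remedy names, the health fields); in their system this sits inside a Claude skill rather than hand-written rules, but the loop shape (poll run health, classify the failure, apply a whitelisted remedy) is the same:

```python
# Hypothetical remediation table: maps a detected failure signature to
# the single action the on-call agent is allowed to take for it.
# All names here are invented for illustration.
REMEDIES = {
    "nan_loss": "restart_from_last_checkpoint",
    "stalled_dataloader": "restart_data_workers",
    "node_failure": "cordon_node_and_resume",
}

def diagnose(status):
    """Classify a raw run-status dict into a failure signature,
    or None if the run looks healthy."""
    loss = status.get("loss")
    if loss is not None and loss != loss:   # NaN is the only value != itself
        return "nan_loss"
    if status.get("steps_per_sec", 1.0) == 0.0:
        return "stalled_dataloader"
    if status.get("dead_nodes", 0) > 0:
        return "node_failure"
    return None

def oncall_step(status, actions_taken):
    """One iteration of the babysitting loop: diagnose the run, then
    record (and, in a real system, execute) the whitelisted remedy."""
    failure = diagnose(status)
    if failure is not None:
        actions_taken.append(REMEDIES[failure])
    return actions_taken
```

Restricting the agent to a fixed remedy table is one plausible way to give it "permission to take action" without letting it improvise on a multi-million-dollar run; the compute-utilization win comes simply from remedying failures in minutes instead of waiting for a human on-call.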
- JFJared Friedman
Quan, this is incredible. Thank you so much for everything. Thank you for making Physical Intelligence. Thank you for showing us these incredible demos. And honestly, the thing that gives me the most hope is this idea that there's an entity, a research lab out there, that is focused on giving this to the world, about to create this Cambrian explosion of robotics startups. So someone watching right now will be inspired by this and start playing with your models, and they might create a robot that touches billions of people's lives for the good.
- QVQuan Vuong
Thank you for having me. It's been a pleasure. Um, to the listener, the one takeaway that I want you to have is: robotics has changed a lot, the cost of building in robotics has decreased, and I think it will continue to dramatically decrease. It also requires a very different kind of scrappy skill set, um, that young startups need. We hope to enable a real explosion of many, many different robotics use cases, and, you know, always reach out to us if you wanna collaborate.
- JFJared Friedman
Thanks, man.
- JFJared Friedman
Thanks so much.
- QVQuan Vuong
Thank you.
Episode duration: 49:26