No PriorsNo Priors Ep. 141 | With Sunday Robotics Co-Founders Tony Zhao and Cheng Chi
EVERY SPOKEN WORD
75 min read · 15,449 words
- 0:00 – 0:56
Tony Zhao and Cheng Chi Introduction
- TZTony Zhao
(electronic music) Nobody wants to do their dishes. Nobody wants to do their laundry. People would love to spend more time with their family, with their loved ones. So what we believe is that if the robot is cheap, safe, and capable, everyone will want our robots. And we see a future where we have more than one billion of these robots in people's homes within the decade. (mechanical whirring)
- SGSarah Guo
Thanks, Memo. (mechanical whirring) Hi, listeners. Welcome back to No Priors. Today, we're here with Tony Zhao and Cheng Chi, co-founders of Sunday and makers of Memo, the first general home robot. We'll talk about AI and robotics, data collection, building a full-stack robotics company, and a world beyond toil. Welcome. Cheng, Tony, thanks for being here.
- TZTony Zhao
Thanks for having us.
- CCCheng Chi
Yeah.
- SGSarah Guo
Okay.
- 0:56 – 2:11
State of AI Robotics
- SGSarah Guo
First, I want to ask: why are we here? Because classical robotics has not been an area of great optimism over time, or of massive velocity of work. And now people are talking about a foundation model for robotics, or a ChatGPT moment. Can you contextualize the state of AI robotics and why we should be excited?
- TZTony Zhao
I would say I think we're kind of in between the GPT moment and the ChatGPT moment. In the context of LLMs, what that means is that it seems like we have a recipe that can be scaled, but we haven't scaled it up yet, and we haven't scaled it up so much that we can have a great consumer product out of it. So this is what I mean by GPT, which is a technology, versus ChatGPT, which is a product.
- CCCheng Chi
Yeah, so we're seeing, across academia, consensus around what the method is for manipulation, but everybody's talking about scaling up. We know there are signs of life for the algorithms people are picking, but people don't know what will happen if we have more data, like what happened from GPT-2 to GPT-3. But we see a clear trend, and there's no reason to believe that robotics won't follow the trajectory of other AI fields: scaling up is going to improve performance.
- 2:11 – 3:13
Deploying a Robot Pre-AI
- SGSarah Guo
Maybe take a step back: what was the process for deploying a robot into the world 10 years ago, before this set of generalizable AI algorithms? Why was it so slow as a field?
- CCCheng Chi
Yeah, so previously, classical robotics had this sense-plan-act modular approach, where there's a human-designed interface between each of the modules. And those needed to be designed for each specific task and each specific environment. In academia, every task means a paper. For a paper, you design a task, design an environment, design the interfaces, and then you produce engineered work for that specific task. But once you move on to the next task, you throw away all your code, all your work, and you start over again. And that's also kind of what happened in industry. For each application, people build a very specific software and hardware system around it.
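The sense-plan-act decomposition Cheng describes can be sketched in a few lines. Everything here (the function names, the cup-detection output, the waypoints) is invented for illustration, not taken from any real system:

```python
# Toy sketch of the classical sense-plan-act pipeline: each module has a
# hand-designed interface (here, a dict of detected object positions), and
# the planner and controller are written for one specific task/environment.

def sense(camera_frame):
    # Perception: a hand-engineered detector producing a task-specific output.
    return {"cup_position": (0.4, 0.1, 0.02)}  # hypothetical detection

def plan(world_state):
    # Task-specific planner: a fixed waypoint sequence toward the cup.
    x, y, z = world_state["cup_position"]
    return [(x, y, z + 0.10), (x, y, z)]       # approach above, then grasp

def act(waypoints):
    # Blind execution of the planned trajectory (no feedback from vision).
    return [f"move_to{wp}" for wp in waypoints]

commands = act(plan(sense(camera_frame=None)))
```

The point of the sketch is that each module's interface (here, the `cup_position` dictionary) is designed for one task, so almost none of it survives a move to the next task.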
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
But it's not really generalizable. And therefore it feels like we're just running in loops. We build one system, and then we build the next one, but there's no synergy between them. And as a result, the progress has been
- 3:13 – 4:29
Impact of Diffusion Policy
- CCCheng Chi
somewhat slow.
- SGSarah Guo
I feel like that's a good segue into some of the amazing research work that you guys have contributed over the last five years to the field. Should we start with diffusion policy? What was the impact of that?
- CCCheng Chi
Yeah, so diffusion policy is a specific algorithm for a paradigm called imitation learning. That's really the most intuitive way of using machine learning for robotics.
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
So you collect paired data of action and observation, of what the robot should do. You use that to train a model with supervised learning, and then the robot does the same thing. The problem is that, in the field, it's known to be very finicky. When I started in the field and talked to researchers, people would say the specific researcher themselves needs to collect the data, so that there's exactly one way to do everything. Otherwise, the training will diverge, or the robot will behave in some weird way. And the diffusion model really allows us to capture multiple modes of behavior for the same observation, in a way that still preserves training stability, and that really unlocked more scalable training and more scalable data collection.
- SGSarah Guo
So it doesn't have to be you personally-
- CCCheng Chi
Yeah.
- SGSarah Guo
... wearing, you know, a tele-op set-
- CCCheng Chi
Yep.
- SGSarah Guo
... in order to make a robot learn.
- CCCheng Chi
Yep, yep. So, like, we can have multiple people, sometimes even untrained people, collecting data, and the result will still be great.
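The imitation-learning setup described above, plus the core diffusion-policy training signal, can be sketched as follows. The shapes, names, and noise schedule are illustrative assumptions only; the actual method conditions a network on camera images and denoises whole action sequences:

```python
import numpy as np

# Minimal sketch of imitation-learning data plus a diffusion-style training
# target. All shapes and names are illustrative, not from a real codebase.

rng = np.random.default_rng(0)

# Paired (observation, action) data collected from demonstrations.
observations = rng.normal(size=(256, 8))   # e.g. visual features per step
actions = rng.normal(size=(256, 2))        # e.g. end-effector deltas

def make_denoising_target(action, t, noise):
    """Diffusion-style training pair: blend the demo action with noise at
    level t in [0, 1]. The network (not shown) would regress the noise
    given (observation, noised_action, t). Because the model learns to
    denoise rather than regress a single mean action, multiple valid demo
    actions for the same observation stay distinct modes instead of being
    averaged into one blurry action."""
    noised = np.sqrt(1.0 - t) * action + np.sqrt(t) * noise
    return noised, noise

noised, target = make_denoising_target(actions[0], t=0.5,
                                       noise=rng.normal(size=2))
```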
- 4:29 – 7:02
Role of ACT and ALOHA
- CCCheng Chi
- SGSarah Guo
Where do ALOHA and ACT play into this?
- CCCheng Chi
Mm.
- TZTony Zhao
Yeah, so these two papers are actually super close to each other, like one or two months apart. That's actually how Cheng and I know each other (laughs). It was by looking at each other's papers, and we met on Twitter, I think, when Cheng was back at Columbia. Before ALOHA, I think the typical way people collected data was with a teleoperation setup with a VR headset. And it turns out to be very unintuitive to do, and it's hard to collect data that is actually dexterous. What ALOHA brings is a very simple and reproducible setup. So it's very intuitive.
- SGSarah Guo
Sorry, just for most people who haven't worn a teleop setup: is it the lag? How should I compare it to, say, playing a video game?
- TZTony Zhao
Yeah, I think ALOHA makes it feel more like playing a video game. Normally it feels kind of disconnected.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
You're just moving in free air, and the robot is moving with some delay.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
But ALOHA reduces that delay by a lot, and that contributes to the smoothness and to how fast the human can react. Once we got that really dexterous data, it allowed us to investigate algorithms that actually solve difficult things. In this case, it was through introducing transformers to robotics. There was a long period when robotics was stuck with three-layer MLPs and conv nets, and as you made them deeper, they worked worse. But it turns out that once you have very strong and dexterous datasets, you can just throw a transformer at it and it works quite well.
- SGSarah Guo
Actually, just in terms of the progress of the industry over time, transformers didn't make sense without a certain level of data collection capability.
- TZTony Zhao
Yes.
- SGSarah Guo
Okay.
- TZTony Zhao
And also all the system around it, for example action chunking, which is to predict a trajectory, as opposed to predicting single samples of actions. All these things combined to make dexterous, bimanual tasks more scalable.
- SGSarah Guo
Why is chunking important here, if I think about the analogy to LLMs and text sequence prediction?
- TZTony Zhao
I think it just kind of throws the ML off if you're trying to force it to react every millisecond.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
That's not how humans act. We perceive, and we can actually move quite a bit without looking at things again. And that turns out to make the motion a lot more consistent and overall performance a lot better.
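A minimal sketch of action chunking as described: the policy returns a block of future actions, and the controller replans only when the block is spent, instead of querying the model at every control tick. `fake_policy` and the toy dynamics are stand-ins, not the real ACT model:

```python
import numpy as np

# Action chunking sketch: predict `chunk_size` actions per inference and
# execute them open-loop before replanning. A real policy would be a
# transformer conditioned on camera images; here it is a trivial stand-in.

chunk_size = 16  # actions predicted per policy call

def fake_policy(observation):
    # Stand-in model: returns a (chunk_size, action_dim) trajectory.
    return np.tile(observation, (chunk_size, 1))

def run_episode(first_obs, steps):
    executed = []
    buffer = []          # unexecuted tail of the current chunk
    obs = first_obs
    for _ in range(steps):
        if not buffer:                       # replan only when chunk is spent
            buffer = list(fake_policy(obs))
        action = buffer.pop(0)
        executed.append(action)
        obs = action                          # toy dynamics: next obs = action
    return np.array(executed)

traj = run_episode(np.zeros(7), steps=48)     # 48 steps, only 3 policy calls
```

The design point is the ratio: 48 control steps cost only 3 model inferences, and the executed motion within each chunk is consistent by construction.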
- SGSarah Guo
And, and you discovered that, actually, transformers architecturally did apply to robotics.
- TZTony Zhao
Yes.
- SGSarah Guo
Cheng,
- 7:02 – 10:38
Imitation Learning - Enter UMI
- SGSarah Guo
you felt then that data collection was still a problem, so enter UMI.
- CCCheng Chi
Yeah, so after ALOHA and diffusion policy, I was super excited about imitation learning. But at the time, both of us were still doing teleoperation, and that feels super limiting. The problem is that a teleop setup takes a PhD student a couple of hours to set up in a lab, and it pretty much restricts data collection to the lab. But in order for the robot to actually work as a product, it needs to work in the wild, in unseen environments, and that requires data to also be collected in the wild. At the time, I was thinking, "Okay, is there a way we can collect robotic data without actually using a robot?" That forced me to think, "Okay, what's the most essential part of robotics data?" And after diffusion policy and ACT, the paradigm is actually kind of simple: you just need paired observation and action data. In our case, the observation is the video clip. The action is the movement of your hand, plus how the fingers move. I realized you can get all this information from a GoPro. You can track the movement of the GoPro in space, and you can track the motion of the gripper and the fingers through images as well. And that's why I built this UMI gripper. It's 3D-printed. At the time, the project had three PhD students, and we just took the grippers everywhere. I think it was two weeks before the paper deadline. Every time we went to a restaurant, before the waiter came, we would collect some data. And very quickly we got, I think, 1,500 video clips of this espresso-cup-serving task. That turned out to be one of the biggest datasets in robotics, simply from three people. That's where the power kind of shines.
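The paired-data idea here can be sketched as follows. Pose tracking itself (e.g. visual-inertial SLAM on the GoPro footage) is assumed to have already produced 4x4 camera poses; the function names are illustrative, not from the UMI codebase:

```python
import numpy as np

# Sketch of recovering robot-trainable (observation, action) pairs from a
# handheld gripper. Observation: the wrist-camera frame. Action: the
# gripper's motion between frames, derived from tracked camera poses given
# as 4x4 homogeneous transforms in a world frame.

def relative_action(pose_t, pose_t1):
    """Express the motion from frame t to t+1 in the gripper's own frame
    at time t, so the action transfers to a robot in a different scene."""
    return np.linalg.inv(pose_t) @ pose_t1

pose0 = np.eye(4)
pose1 = np.eye(4)
pose1[:3, 3] = [0.0, 0.0, 0.05]     # gripper moved 5 cm along its z axis
action = relative_action(pose0, pose1)
```

Expressing actions relative to the gripper's own frame, rather than in world coordinates, is what lets data collected in a restaurant train a robot standing somewhere else entirely.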
And then that amount of data allowed us to train the first end-to-end model that can actually generalize to unseen environments. So we could push the robot around at Stanford. Actually, Tony was there as well.
- TZTony Zhao
Yeah.
- CCCheng Chi
You know, push the robot arm around the Stanford campus, and then anywhere, the robot can serve you a drink.
- TZTony Zhao
Yeah, I think that was the moment I was like, "Hey, maybe we should start a company." (laughs) This was actually working so well. I remember just following Cheng and do-
- CCCheng Chi
(laughs) A few times it didn't work well.
- TZTony Zhao
Yes. I think the only exception I saw was, um, when it's under direct sunlight.
- CCCheng Chi
Yeah.
- TZTony Zhao
Right? And I think the reason was, over that whole two or three weeks of data collection-
- CCCheng Chi
Those two weeks it was all raining, so there's no sunlight data.
- TZTony Zhao
Yes.
- CCCheng Chi
So it fails. That also demonstrates the importance of distribution matching. In order for the robot to work in a sunny environment, it must have seen sunny environments in the training data.
- SGSarah Guo
Yeah, it's really interesting, because I remember when I first met you guys, you had spent something like $200,000 across all of your academic research, and yet the scale of data collection, as translated to model capability, was leading, right? So it's very interesting to look at where we are, maybe going back to Tony's point about scaling and massive capital deployment. That entire paradigm wasn't relevant before people realized you should train on all of the internet data.
- CCCheng Chi
Yeah.
- SGSarah Guo
And we just don't have that in robotics. So, the entire field is just blocked on having any scale of data that's relevant.
- TZTony Zhao
Yeah.
- SGSarah Guo
Right?
- TZTony Zhao
I think these days there are so many debates about what is even the right way to scale. There are world models, there are simulations, there is teleoperation, there are all these new ideas. And this is the area where we really want to innovate, where we want to differentiate, where we want to find something that is both high-quality and scalable.
- 10:38 – 11:57
Introducing Sunday
- SGSarah Guo
And then you guys, you decide to start a company pushing this cart around Stanford. (laughs) Tell me about that decision, and congratulations on the launch and- and sort of the- the direction and team you've built.
- TZTony Zhao
Yeah, it's a very interesting journey. I remember in the beginning it was actually the two of us in Cheng's apartment. We would clamp the robot to his desk and try to do some tasks. And it soon became, I think, an eight-person team toward the end of 2024, and now we're at around 30 to 40 people. We're not the best at everything, right? But starting a company allows us to find people who we really love working with, and then bring all the expertise together, from mechanical engineering and supply chain to software engineering and controls, and build a system together that is not a demo but a real product.
- SGSarah Guo
You built this amazing team. What are people actually signing up for? What's the mission of Sunday?
- TZTony Zhao
Yes. It is to put a home robot in everyone's home. I think there are a lot of AIs trying to make you more efficient at work, but there is not enough AI that actually helps you with all these mundane things that are not creative, that really have nothing to do with what makes us intrinsically human. What's ideal is for people to spend more time on their hobbies and their passions, as opposed to spending more time doing chores.
- 11:57 – 15:05
Sunday’s Robot Design Philosophy
- SGSarah Guo
So if you guys are going from these amazing research breakthroughs to "we're actually going to ship a home robot," that's a product where you have to talk about cost and capability and robustness. What's the design philosophy?
- TZTony Zhao
As these AI models become more capable, and as hardware costs continue to go down, home robots, or all kinds of robots, will be everywhere. So if you start from the most surface level, which is the design of the robots: when we design it, we think about what a robot should look like if it is ubiquitous and you see it every single day. And where we ended up is that we really think the robot should have a face. It should have a cute face, and it should be very friendly. So instead of a Terminator doing your dishes, we want the robot to feel like it's out of a cartoon movie. And then a huge decision is how many arms the robot should have. Should it have four arms? One arm? Should it have legs? Should it have five fingers? Two fingers? Three? There's a huge space.
- SGSarah Guo
Why isn't the obvious answer it should just be like a full humanoid?
- TZTony Zhao
Mm-hmm. I think the core motivation for us is: how can we build a useful robot as soon as possible? So whenever we see something we can accelerate with simplification, we'll go simplify it. One example is the hand that we designed, which has three fingers. We essentially combined three of our fingers into one.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
And the reasoning there is just that most of the time when we use those fingers, we use them together, whether it's grasping a handle or opening the dishwasher. So it really doesn't make sense to multiply the cost by 3x to separate it into three, when we can do one with most of the benefits. This is how we think about the whole robot as well, with the constraint that we are building a general-purpose robot that can eventually do all your chores. And we'll simplify everything we possibly can, so that the robot is as low-cost and as easy to repair as possible.
- CCCheng Chi
Yeah, I just want to add a little bit more on the actuator and mechanical design. Traditionally, most robots are designed for industrial use cases.
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
And those robots are very fast, very stiff, and very precise. The reason is that all the industrial robots are blind. They're blindly following a trajectory that's programmed by someone.
- SGSarah Guo
It's not reacting to perception.
- CCCheng Chi
Correct. Correct.
- SGSarah Guo
Okay.
- CCCheng Chi
But because of the breakthroughs we had in AI, now robots have eyes. So a robot can actually correct its own mechanical and hardware inaccuracies.
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
So that opened up a whole different design space.
- SGSarah Guo
And intuitively, it should be like, "I can't tell you exactly what the distance is here on a millimeter scale, but I'm going to get to the cup because I can stop."
- CCCheng Chi
Yeah, exactly. So that allows us to use these low-cost actuators that are cheap and compliant, but imprecise. Because of the AI algorithms and the systems we built, we can build a robot that's mechanically, inherently safe and compliant, while simultaneously achieving the accuracy we need for home tasks.
- 15:05 – 19:02
Sunday’s Shipping Timeline
- SGSarah Guo
Where are we in that timeline? You said we're between GPT and ChatGPT. So when do consumers get ChatGPT, and when will you guys ship something?
- TZTony Zhao
Yeah. It's actually a really exciting time, because we have so many prototypes internally. What we will do next year, 2026, is actually start doing beta programs. We'll put these robots, all kinds of different ones, into people's homes and see how they react. That will be when we learn the most: how people want to talk to a robot, whether people want the robot to maybe teach their kids some new knowledge about the world. And this will inform what the eventual product should look like. Internally, we just have an extremely high standard for the minimal consumer product we want to ship. It needs to be extremely safe, it needs to be extremely capable, and low-cost.
- SGSarah Guo
Do you feel like you know something now that you didn't when you started the company?
- TZTony Zhao
Absolutely. I think at the beginning, I would describe it as seeing light at the end of the tunnel along two axes: there's dexterity, and there's generalization. When we add more data, things work better. And what this company is about is the cross product of those two: how can we scale and have both dexterity and generalization? This is something we were able to show in our generalization demo, where we can pick up these very precise objects, like actual metallic forks on ceramic plates, with very high success rates. And honestly, this is not something we thought would work so easily just by having so much more data.
- CCCheng Chi
Yeah, actually, I just want to expand a little bit. The process was long and painful.
- SGSarah Guo
(laughs)
- TZTony Zhao
(laughs)
- CCCheng Chi
So there are so many issues. Just scaling up a system, a robotic system, is very, very hard. There are mechanical issues, reliability issues. There are data quality issues that come out of it. In the beginning, I actually thought it was going to be much easier than this.
- TZTony Zhao
(laughs)
- CCCheng Chi
(laughs) But really, it just takes time and effort to grind out all the little details for this to work. Also, compared to teleop, it is much harder to get this system scaled up. But once it's scaled up, it's very powerful and very repeatable.
- SGSarah Guo
So it is both harder than you thought it would be to get to here, and you are further than you thought you would be.
- CCCheng Chi
Yes.
- TZTony Zhao
And I remember in the beginning we were having this funny conversation: "If we build this, someone can just take our glove and they will build the same thing." Like, what moat do we have? Are we worried about that? And I think in the beginning we actually were a little bit worried, because we thought, "Oh, you know, they can probably just replicate it." But as we went along the path, it turned out things were so much harder than we thought they were.
- CCCheng Chi
There are just so many small details. Yeah.
- TZTony Zhao
Yes.
- SGSarah Guo
And when you say scaling up a robotic system, you mean the data collection to training pipeline and the hardware itself?
- CCCheng Chi
Yeah, so actually like for-
- SGSarah Guo
To operate it.
- TZTony Zhao
Yeah.
- CCCheng Chi
Yeah, for this to work at all-
- SGSarah Guo
Yeah.
- CCCheng Chi
... you need the data collection system.
- SGSarah Guo
Yeah.
- CCCheng Chi
You need the robotic and control system to be able to deliver the hand to where we want it to go.
- SGSarah Guo
Yeah.
- CCCheng Chi
And you also need the data...... filtering pipeline, and data cleaning pipeline, and the training pipeline.
- TZTony Zhao
Mm-hmm.
- CCCheng Chi
And all of these things need to be iterated together. We've actually gone through several loops of these. It's kind of hard to imagine how this could even be done without having a full-stack team in-house.
- TZTony Zhao
Yeah. The glove we are using right now, we call it V5.
- CCCheng Chi
Mm-hmm.
- 19:02 – 23:58
Scale of Sunday’s Training Data
- SGSarah Guo
So, you guys obviously can't know everything that's happening in every company, in academia and industry, but from what you know, how would you compare the scale of training data that you have today, relative to the industry?
- TZTony Zhao
At this point, we have almost 10 million trajectories collected in the wild. And those trajectories are not just, "Oh, pick up a cup."
- CCCheng Chi
Mm-hmm.
- TZTony Zhao
They're these long trajectories, with walking, with navigation, and then doing these long-horizon tasks.
- SGSarah Guo
Tony, as you mentioned, it's an open question what the right way to scale data up is. There are strong theories around teleop, around pure RL, around video and world models. How did you think about all of these?
- CCCheng Chi
Yeah, so from our perspective, it's actually somewhat surprising. In the beginning, we worried that the data from the glove, the UMI-like data, has higher quantity but lower quality compared to teleop, because with teleop you're using exactly the same hardware and software stack between training and testing. It's perfectly distribution matched.
- TZTony Zhao
But what we realized is that this glove form factor actually encouraged people to do more dexterous and more natural movement, and that results in more intelligent behavior on the modeling side. And in terms of data quality, we don't really see a gap between teleop and glove data.
- CCCheng Chi
Mm-hmm.
- TZTony Zhao
After we did a ton of engineering, that is.
- CCCheng Chi
Yeah. (laughs)
- TZTony Zhao
Yeah. Because, apparently, there is a mismatch, right? In the camera frame, there is a human inside of the robot.
- CCCheng Chi
Yep.
- TZTony Zhao
And there are just a lot of things we need to do to convert human data one-to-one, as if it were robot data, so that the model can't tell the difference.
- CCCheng Chi
Yeah, and that relies, again, on the whole full-cycle iteration between hardware and software.
- SGSarah Guo
What about RL?
- CCCheng Chi
We see a lot of great promise for RL in locomotion, and we think that will continue to be true for locomotion. RL as a method is very powerful, but it is much less sample-efficient compared to imitation learning, and we see it work great in environments that are easy to simulate. For locomotion, you only need to worry about rigid-body dynamics and rigid-body contact between the robot and the ground. And because you engineered the robot, you know everything about it. But for manipulation, it's hard for us to imagine simulation having the same amount of diversity and the same distribution as real objects, matching both appearance and physical properties. We think that's going to be challenging compared to glove data collection or teleop.
- TZTony Zhao
Yeah, I think it's really about which method can get us there faster. There might be different methods that will eventually get there. For example, simulation or a world model, right? It's almost a tautology to say that if I have a perfect world simulator, anything can be done there. As long as you can do it in the real world, you can do it in the simulation; you can cure cancer in a simulator, right? But what it turns out to be for robotics is that some things are just harder than others, and it really depends on the problem itself. So in the case of locomotion, as I mentioned, all we need to model in a simulator are point contacts with a somewhat flat ground.
- CCCheng Chi
Mm-hmm.
- TZTony Zhao
Um...
- CCCheng Chi
Like feet.
- TZTony Zhao
Yes.
- CCCheng Chi
Yeah.
- TZTony Zhao
But the behavior we want out of it is actually very difficult to model.
- CCCheng Chi
Mm-hmm.
- TZTony Zhao
It's all these reactive behaviors: when you feel your leg hitting something, you should retract and step again. These are very, very hard to describe or to learn from demonstrations directly. But in the case of manipulation, I think the difficulty is flipped.
- CCCheng Chi
Mm-hmm.
- TZTony Zhao
It's a lot easier to capture the behavior itself, and a lot harder to simulate the world.
- CCCheng Chi
Mm-hmm.
- 23:58 – 24:56
Importance of Data Quality at Scale
- SGSarah Guo
Is there anything that you have changed your point of view on, in data, over the last year?
- CCCheng Chi
The one thing, I wouldn't say changed, but: data quality really matters. I always knew data quality matters, but once you scale it up, it really matters. The diversity of behavior that you experience in the wild is very hard to control, and the hardware failures are hard to control. You need to constantly monitor them. You need to spend a huge amount of engineering effort just to make sure the data is clean.
- TZTony Zhao
Yeah, and also building all those automatic processes.
- CCCheng Chi
Yeah. Right.
- TZTony Zhao
We have our own way of calibrating the glove before we ship it out, and we have this whole software system to catch if something is broken on a glove; we can detect it automatically. The importance of data quality translates into all these repeatable processes, so we don't need a human staring at the data to know that something is wrong.
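A hypothetical sketch of the kind of automated check being described: flag sensor channels in a glove recording that look dead or outside their calibrated range, so no human has to stare at every trajectory. The thresholds and ranges here are made up for illustration:

```python
import numpy as np

# Illustrative data-quality gate for multi-channel sensor recordings.
# A channel is suspect if it is flat (stuck/dead sensor) or if any sample
# falls outside an assumed calibrated range [lo, hi].

def flag_bad_channels(readings, flat_std=1e-6, lo=-1.0, hi=1.0):
    """readings: (timesteps, channels) array. Returns suspect channel ids."""
    bad = []
    for c in range(readings.shape[1]):
        col = readings[:, c]
        if col.std() < flat_std:                    # stuck/dead sensor
            bad.append(c)
        elif (col < lo).any() or (col > hi).any():  # outside calibration
            bad.append(c)
    return bad

readings = np.column_stack([
    np.linspace(-0.5, 0.5, 100),   # healthy channel
    np.full(100, 0.3),             # stuck channel
    np.linspace(0.0, 2.0, 100),    # drifted past the calibrated range
])
suspects = flag_bad_channels(readings)   # -> [1, 2]
```

A check like this can run on every uploaded recording, turning "a human stares at the data" into a repeatable, automatic gate.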
- SGSarah Guo
When
- 24:56 – 27:59
Technical Challenges
- SGSarah Guo
you describe the beta for next year, a lot of it sounded like: we just want to understand behavior, how people actually want to use it, and we can make some design decisions for the actual product. What technical challenges do you still see?
- CCCheng Chi
So to me, I think there are two kinds. Number one is really figuring out the training recipe at scale. We, as a field, just entered the realm of scaling, and we just got the amount of data that we need. I think now is a perfect time to start doing research and figure out exactly what training recipe we need to get robust behaviors. And I think we're in a unique position because of the amount of data and the entire pipeline we built around data. The second point is just that hardware is hard. We are still pushing the performance envelope of the hardware. It's not really clear what is necessary for the hardware to be reliable, because whenever the mechanical team builds a piece of hardware, the learning team will try harder to push it against the boundary, and it'll break at some point. But what's interesting in this company is that everybody is under the same roof. So immediately after something breaks, it goes straight back into mechanical design, and we have another iteration, say for the hand parts, very quickly. Hardware is hard, but it is important, and it's the hard but right thing to do. We, as a field, shouldn't avoid doing the hard things just because they are hard.
- TZTony Zhao
Yeah. I want to echo Cheng's point about the research first. I think when there is data scarcity, it is really easy to come up with cute, fancy research ideas that don't end up scaling very well. And this is why, when we built the company, we focused on the infrastructure, a scalable data pipeline, and operations before we started to really dive into research, which we only started to do about three months ago. We really want to avoid doing research that doesn't scale, and we want to focus on things that contribute to the final product. The second point is: robotics is so intrinsically a systems problem. Right now there's no existing general-purpose home robot out there, and we don't really know what the interfaces between the different systems are, or what is even good. In that case, if you're working with a partner, it's actually really hard for them to understand your standard of good, because your standard of good is changing all the time.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
This is why we are building everything in-house, in a more full-stack approach. We build our own data collection device that is co-designed with the robot. We build our own operations team, asking how we can most efficiently get the most high-quality data out. And of course, our own AI training team that makes the best use of this data. These things are really not easy. It makes the company a lot harder to build, because we suddenly need so many teams, and they need to orchestrate together, but we believe it is the right thing to do.
- 27:59 – 30:48
When Will People Have Home Robots?
- SGSarah Guo
Okay, I'm gonna ask you a few questions that are uncomfortable guesses now, but when will people be able to buy robots commercially for the home?
- TZTony Zhao
Like, this is something we're really excited about because we have so many prototype robots in our office, and we really want to get them out there. So the next step of our plan is to have a beta group program in 2026, and what it means is that, uh, for people who sign up that we select, they will have a real robot in their home, and it will start doing chores for them. And it's going to be a really interesting learning lesson for us because we will see, like, how humans interact with the robots. We'll see, like, what kind of things people just really want the robot to do. I think this will be before we actually ship it to the masses, because we just have an incredibly high standard for what we are willing to ship from a consumer-experience standpoint. We want the robot to be highly reliable. We want it to be capable. We want it to be cheap. I think it really depends on the results of the beta program when we'll decide what is a good time to ship it. Is it 2027? Is it 2028? But all of those are possible.
- SGSarah Guo
But it's not a decade away.
- TZTony Zhao
No, it's definitely not a decade away.
- SGSarah Guo
How much do you think it could cost?
- TZTony Zhao
Right now, for the prototype robots we have in-house, I think the cost ranges from like $6,000 to something like $20,000, and this is actually pretty interesting: the big difference here is not like, "Oh, we found a better actuator." They are using the same actuators that are, like, very low-cost. It is actually the cladding of the robots. When you're trying to make them at low scale, it's just really expensive, like the claddings are like a few thousand dollars to make. (laughs) But this is the type of thing that, as we scale up, becomes, like, dirt-cheap, because instead of, like, doing CNC, instead of hand-painting them, it'll become injection molding. What we see is that as we get the scale to a few thousand units, we can drastically reduce the material cost, likely under $10,000, and what that implies is that when we sell the robots, the price will be somewhere around that.
- SGSarah Guo
Okay, so, uh, you fast-forward two, three years out. If you look like five years and beyond, the home robots are ubiquitous. Like, what does life look like? How does it change for the average person?
- TZTony Zhao
This is a different answer for everyone. For me, like, I just really hate dishes. Uh, like, in my sink there are always like four or five dishes that are, like, somewhat dirty out there, and it kind of stinks a little bit, and after a long day at work, it really doesn't feel good to come back and, like, see a home like that. Um, so I think the world we'll live in is-
- SGSarah Guo
Is gonna be cleaner.
- TZTony Zhao
... is going to be cleaner. And I was just thinking about it as, like, the marginal cost of labor in homes goes to zero.
- SGSarah Guo
The last thing I want
- 30:48 – 32:34
Failures of Past Demos
- SGSarah Guo
to make sure we do is, like, talk about demos, right?
- TZTony Zhao
Mm-hmm.
- SGSarah Guo
Um, there's a lot of robotics launch videos today. It's been years since you saw an Optimus serving drinks at a bar. Why are those not available? And what is actually hard?
- TZTony Zhao
Yeah. I, I think the way I would put it is make zero assumptions, no priors.
- SGSarah Guo
Okay.
- TZTony Zhao
As in-
- SGSarah Guo
Nice.
- TZTony Zhao
... if you see a robot handing one drink to one person, first ask the question, "Is that autonomous or is that tele-operated?" So this is the first thing. So you should look at the tweet and see what they say about it. And then, does it show the robot giving a slightly different colored cup to the same person or not? If they didn't show it, it means the robot can literally only pick up that single cup and give it to that same person. When we look at demos, we tend to put our human instinct into it. Like, "Oh, if it can hand a cup to that person, it must be able to hand a different cup to another person. Maybe it can also do my dishes, maybe it can do the laundry." There is a lot of, like, wishful thinking that we can have about it, which is what's great about robotics, that there is a lot of imagination. But I think when we look at demos, only index on things that are shown, and that's likely the full scope of that task.
- CCCheng Chi
Yeah. I think another aspect is, at least for me as a researcher, I appreciate the number of interactions that happen in the demos. Usually, the more interactions you have, like, every interaction, there's a chance of failure.
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
So the longer the sequence is, the harder it actually is. So that's something we really emphasize here. And that's actually somewhat uniquely easy for us because the glove way of data collection is so intuitive to people.
- TZTony Zhao
Yeah. It's really about, like, generalization and reliability.
- 32:34 – 36:53
Sunday’s Demos
- SGSarah Guo
So can you explain the demos that you guys are showing?
- TZTony Zhao
Yeah. Of course. So we're showing, like, basically three categories of demos. The first one, as you saw, is we have this whole, like, messy table. And what the robot does is clean up the whole table and, you know, dump the food into the food waste bin, load the dishes in the dishwasher, and then operate the dishwasher. What makes this demo really hard is that it is a mix of really fine-grained manipulation with this super long-horizon, full-range task, as in, like, it needs to reach up high and it also needs to get down very low.
- CCCheng Chi
It's a, it's a mobile manipulation task.
- TZTony Zhao
Exactly. The reason we can show this is just how nimble and easy it is for us to collect these datasets, which makes a long-horizon dexterity demo possible. And it's also about the forces as well. So you might have seen, like, we're trying to pick up two wine glasses with one hand.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
And-
- SGSarah Guo
I struggle with this, but yes. (laughs)
- CCCheng Chi
(laughs) Yeah.
- TZTony Zhao
It's actually really hard. And because it's, like, transparent objects, we need to also load it very precisely into the, into the dishwasher. A lot of it is about how much force you apply.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
Uh, because if you're trying to grasp two in one hand, if you squeeze a little bit harder, you're going to break one of the glasses. And when you load it into a dishwasher, if you're pushing it in the wrong direction and it hits something, it's going to shatter. We did shatter a ton of glasses when we were, like, experimenting with it. So these are tasks that are, like, really high-stakes, where it's not just about recovering from mistakes, but about not making those mistakes in the first place. And this is generally the case in a lot of home tasks: you're just not allowed to make any mistakes. And then we get into the generalization demos, where we basically take our robot, we book like six Airbnbs, and we get it there, zero-shot.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
And see if it can do, like, part of the task. So we use two tasks. One is, like, go around the table and collect all the utensils into the caddy. The other is to grasp a plate and then load it into the dishwasher. What makes these demos very interesting is that we don't need any data when we enter that home.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
It's pure generalization. And this is as close to a real product as you can get, because when someone buys our home robot, we really don't want them to, like, collect a huge dataset themselves.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
Just to, like, unbox it. Also, in addition to the generalization, those two tasks are also really precise. We're using the exact silverware in that home, and you need, like, basically a few millimeters of precision to grasp it properly. Those forks are also hard to perceive because they're reflective, like, the light looks weird on them. We had one home with a transparent table. I think the table looks like nothing. (laughs)
- SGSarah Guo
(laughs)
- TZTony Zhao
And the robot still, like, reacts very well to it. And again, the reason we can do it is because we have, like, more than 500 people collecting data, and we've seen so many glass tables-
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
... in that dataset, so the robot is able to do it. I think the last bit of the task that we did is kind of pushing what's possible in terms of dexterity. The two tasks we chose, one is, uh, espresso, operating an espresso machine. The other is, like, folding socks.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
What makes these hard is that they require very fine-grained force that is hard to get if you're dealing with tele-operation, because these days, there's not a good tele-operation system that can let you feel how much force the robot is feeling.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
So basically when you're tele-operating, your hand is numb.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
And sometimes you're applying, like, a huge amount of force on the robot, but you don't know it, and that can result in, like, very low data quality, where the robot also does things in that aggressive way that we really want to avoid for our system. Um, the sock is a very good example: when you're trying to fold it, your two fingers can ... touch.
- SGSarah Guo
Mm-hmm.
- TZTony Zhao
And that forms a, what we call, like, a force closure. You have a closed loop-
- 36:53 – 39:10
What Sunday’s Hiring For
- SGSarah Guo
One of the things that you will also need to scale as you guys, um, you know, scale up the company is the team. Um, uh, what are you hiring for? What are you, what are you looking for?
- CCCheng Chi
One thing, uh, I'm, uh, really looking for is-
- SGSarah Guo
Cheng speaks up. (laughs)
- TZTony Zhao
(laughs) all right. Yeah.
- CCCheng Chi
Yeah.
- SGSarah Guo
Yeah.
- CCCheng Chi
So it's, uh, full-stack roboticists and people who aspire to become full-stack roboticists.
- SGSarah Guo
Yeah.
- CCCheng Chi
Uh, so really, what you learn in this company, it's just that robotics is such a multidisciplinary field.
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
Uh, you need to know a little bit of mechanical, a little bit of electrical, a little bit of code, a little bit of data to actually fully optimize the system. And we have a couple examples, um, of training, you know, full-stack software engineers to become roboticists.
- SGSarah Guo
Mm-hmm.
- CCCheng Chi
Training engineers to become roboticists. And so if you want to learn about robotics as you move along, if you want to learn the whole thing, not just be boxed into your small, you know, little cubicle, uh, let us know. (laughs)
- SGSarah Guo
And you told me that you, uh, didn't write code until you got to college or something.
- CCCheng Chi
Yeah.
- SGSarah Guo
Yeah.
- CCCheng Chi
I was super enthusiastic about robotics, uh, but I was mostly doing, like, mechanical and electrical design before that. And then I realized, "Okay, the bottom line is actually how the robot will move, and there's something called, like, programming." And then the more I got into it, the deeper it gets. And then, you know, toward the end of college, I realized, "Okay, there's a thing called machine learning, and you figure out how to train models." These things just go on and on. I think it's very natural for me to gradually expand my skill set because I'm always looking forward to building a robot.
- SGSarah Guo
Well, I hope you discover the next field, because you're no longer doing dishes too.
- TZTony Zhao
(laughs) It's a very fun place to work. Whatever you can imagine about robotics and consumer products and machine learning, you can find it here, because we're just fundamentally such a full-stack company. We're not just about the software, we're not just about the hardware, but we are about the whole experience, the whole product, and making sure that product is general and, like, scalable in the future.
- SGSarah Guo
Awesome. Congratulations.
- TZTony Zhao
It's really exciting. (instrumental music)
- SGSarah Guo
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And
- 39:10 – 39:16
Conclusion
- SGSarah Guo
sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 39:16
Transcript of episode 4-VzXoZqAH0