No Priors

No Priors Ep. 48 | With Covariant CEO Peter Chen

Building adaptive AI models that can learn and complete tasks in the physical world requires precision, but these AI robots could completely change manufacturing and logistics processes. Peter Chen, the co-founder and CEO of Covariant, leads the team that is building robots to increase manufacturing efficiency and safety and create the warehouses of the future. Today on No Priors, Peter joins Sarah to talk about how the Covariant team is developing multimodal models with precise grounding and understanding so they can adapt to solve problems in the physical world. They also discuss how they plan their roadmap at Covariant, what could be next for the company, and what use case will bring us to the ChatGPT moment for AI robots.

00:00 Peter Chen background
00:58 How robotics AI will drive AI forward
03:00 Moving from research to a commercial company
05:46 The argument for building incrementally
08:13 Manufacturing robotics today
12:21 Put wall use case
15:45 What's next for Covariant Brain
18:42 Covariant's customers
19:50 Grounding concepts in AI
25:47 How scaling laws apply to Covariant
29:21 Covariant's driving thesis
32:54 The ChatGPT moment for robotics
35:12 Manufacturing center of the future
37:02 Safety in AI robotics

Sarah Guo (host) · Peter Chen (guest)
Jan 24, 2024 · 40m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–0:58

    Peter Chen Background

    1. SG

      (music plays) Hi, listeners. Welcome to another episode of No Priors. This week, I'm joined by Peter Chen, the co-founder and CEO of Covariant, a robotics startup that is developing AI robots. Before he started Covariant, Peter was a research scientist at OpenAI and a researcher at the Berkeley AI Research Lab, where he focused on reinforcement learning, meta-learning, and unsupervised learning. He is a prolific publisher and now a founder. I'm so excited to have you on today to talk about what's, uh, going on in robotics. Welcome, Peter.

    2. PC

      Thanks, Sarah. It's great to be here. Um, there, uh, is, there are many exciting reasons to be here. One is I have been a frequent listener, um, of the podcast, and the second one is just because of the name, like, I just have to be on this show, so it's great to be here.

    3. SG

      Right. Let's go establish, uh, some, some priors for everybody, uh, in a very unknown landscape, right? Can we start with just,

  2. 0:58–3:00

    How robotics AI will drive AI forward

    1. SG

      uh, why you were drawn to robotics and the beginning of your research journey?

    2. PC

      Yeah. When I was working on research at both UC Berkeley as part of my PhD, uh, and at OpenAI, there were two topics that were particularly exciting to me. One topic is, like, as you have introduced, unsupervised learning. Like, how can we build models that learn from vast amounts of data? And we now more colloquially know this as generative AI because, like, we train these large models on large amounts of text, images, videos, uh, and they learn from them in an unsupervised manner. That topic has always been very interesting to me because if you want to train very capable AIs, you want to have a lot of data, uh, and where you can get a lot of data is through these kinds of unsupervised datasets. And then the second topic that was really interesting to me was reinforcement learning. Like, it's not just building models that understand, but building models that can make decisions. Um, and reinforcement learning teaches these models to make decisions by having them make trials and errors and learn to do more of the better decisions and less of the worse ones. And robotics is just a, such a great combination of these fields. Like, in order to build really capable robots, they need to really understand the world in a very, very robust way, and they are not just passive agents that just understand text or what's in an image. They actually need to take actions in the real world, and the consequences do matter. And so we found robotics to be such a great way to both utilize the advances in AI, but also we think of it as a way to also propel AI forward. Like, this is where you get the grounded data. This is where you get that embodied data of not just AI that is trained on browsing the internet, but AI that is trained with physical interactions with the world. And so we also believe robotics would be a key way to advance AI.
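The trial-and-error loop Peter describes can be sketched in a few lines. This is a generic illustration of the idea (a simple epsilon-greedy bandit), not Covariant's code; the reward values and hyperparameters are invented:

```python
import random

def run_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by sampling noisy rewards and averaging:
    try actions, observe outcomes, and gradually prefer what worked."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best estimate so far.
        if rng.random() < epsilon:
            a = rng.randrange(len(true_rewards))
        else:
            a = max(range(len(estimates)), key=estimates.__getitem__)
        reward = true_rewards[a] + rng.gauss(0, 0.1)  # noisy outcome
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # running mean
    return estimates

est = run_bandit([0.2, 0.8, 0.5])
print(max(range(3), key=est.__getitem__))  # the agent settles on action 1
```

The same "do more of the better decisions" principle scales up to robot manipulation, where the action space and the outcome signal are far richer.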

    3. SG

      That makes sense. You were at places that are great places to do research. Why did you

  3. 3:00–5:46

    Moving from research to a commercial company

    1. SG

      decide to start a commercial company?

    2. PC

      It's a really good question. Um, I mean, there are a lot of companies that are founded by prior PhDs that follow the classic journey of: there's a technology that was built in a lab environment, and it got to enough level of maturity that, oh, we should start to commercialize it in the real world. That was not the journey of Covariant. Like, when we started Covariant, there was no AI that was good enough to make robots do useful things commercially. Uh, and so it was not a classic journey of technology developed in academia and then transitioned to a commercial landscape. The key insight that we had at that time, when we left OpenAI in 2017 to start Covariant, was that the future of AI is going to be the future of foundation models. These models that are truly multitask, that learn from large amounts of data, and, as such, are more generalizable. They can solve new tasks more easily, and are also more capable at every single one of the tasks because of the transfer that you get across tasks. We just had early conviction that that was the path to build AI, and that it is also going to be true for the physical world, for robotics. But there's one big problem, which is you have no dataset to build a robotics foundation model. Like, there's no dataset from which you can build this AI that understands the physical world and takes actions in the physical world. Um, and so in order to build these foundation models for robotics, you really have to build a company that can collect data to do it, and the only way to collect enough data is to build fleets of robots that are actually creating value for customers, so that you can collect that data in production, because even if you try to scale up data collection in a lab environment, there's a limit on how much you can do that.
In that perspective, like, we strongly believe in the Tesla approach, like, where they have the most self-driving car data because they ship a great car that people want to drive and a good enough entry-level Autopilot that people are willing to use, and they're creating value for their customers. Like, customers use their product, and the data that they collect allows them to build much more capable models and AI. And so, um, why we left OpenAI and, um, academia to start Covariant is very much this belief that in order to build foundation models for robots, you have to have a lot of data, and in order to have a lot of data, you have to build autonomously working systems for customers.

    3. SG

      Yeah.

    4. PC

      And the only way to do that is to build a company to serve those customers.

    5. SG

      Yeah. There's a really interesting tension if you're trying to build a, let's say, AI capability that doesn't

  4. 5:46–8:13

    The argument for building incrementally

    1. SG

      exist yet, because, you know, there's no model that is good enough, of how much you invest in that upfront versus delivering the product that already exists in the world, right? Like, you could just go build a bunch of robots and deploy them en masse, or, you know, if we draw an analogy to, um, the prior generation, or current existing generation, of autonomy companies. Like, I was, you know, involved early in my prior role in Aurora and Nuro, and then I was a personal investor in Kodiak, right? Like, a lot of these companies, you were trying to build a, a brain as, as an alternative to the Tesla approach, and I think the, the economics of collect-as-you-go is getting, uh, very, very compelling, just in terms of how expensive it is to try to sequence it the other way.

    2. PC

      Yeah. Like, this definitely needs to be an... um, incremental approach. Like, you have to just find, like, the right sequence of: what is the technology advance that I want to build now that enables enough of a product that I can deliver? Which then in turn allows you to build more capable models, which then in turn enable a larger surface, um, area. And this is like... I mean, we, we have seen this play out in the non-robotics world as well, right? Like, if we, if we think about OpenAI, Anthropic, Cohere, a lot of these big language model, um, players, um, like, the models that they have are not fully general language models yet, right? But they are good enough that they can solve a large section of problems, so it's worth productionizing them, um, getting commercial value out of them. Which then in turn allows you to build the next incrementally, um, better system. And I think of it as the s- same kind of, um, roadmapping exercise that you have to do in autonomy. Like, you, you cannot just go straight to the, like, fully general physical AGI, um, at the beginning. Like, you have to build something that, that, like, represents, like, a justifiable R&D spend, as well as a timeline that you can justify. But that allows you to build something that is valuable that you can ship to customers. And from that process, you get more data, you get more learning, which then in turn allows you to build a next-generation model. So, we think of it as very much an iterative approach, uh, and having real products and having real customers, like, allows you to ground that approach, as opposed to, um, just being a philosophical debate of, like, how we build this super, super general thing that is very far out in the future.

  5. 8:13–12:21

    Manufacturing robotics today

    2. SG

      Then I think the right way to start would actually be to ground the conversation in kind of, like, the application landscape. Can you walk us through the sort of limitations of robotics in warehousing and manufacturing that are commonplace right now, and how much intelligence these robots have?

    3. PC

      Robots are extremely common, um, nowadays. Like, so what we typically work on are robotic arms. So think of these as six-axis, seven-axis, um, robotic arms that can do very flexible movements. They are super precise, they're super fast, and super durable, and very cheap. Lots of factories around the world have robots, um, but the challenge is, like, 99+% of the robots that are deployed in the world are dumb robots. Like, these robots are pre-programmed to do the same thing again and again, and they don't really have any kind of intelligence that can adapt to new circumstances, communicate with people, and change what they do on the fly. And so think of the robots that exist today as extremely rigid. And so really the problem that we are solving is, we're not trying to make the existing dumb-robot use cases better, right? Like, we're not trying to say, "Oh, instead of, uh, manually programming this robot, you could just have an AI that programs that robot." Uh, we're not talking about that. Like, we're really talking about, like, opening up a couple orders of magnitude more use cases where the robots actually need to be smart. Like, they need to adapt what they do based on the scenario that is presented to them, right? So, like, a good way to visualize this is, on one hand, like, think about a robot, for example, in a Tesla factory, that is handling a car body, right? Okay. This is a very incredible feat of engineering that can move, like, a multi-ton object, um, very fast, very precisely, but it's just doing the same thing again and again. And then imagine another robot in an e-commerce warehouse that has hundreds of thousands of unique items that it has to distinguish, pick up, and pack carefully into a box that gets shipped to you. That's a very different kind of diversity, um, that we're talking about.
And so when we think about building AI for robots, when we think about building foundation models for robots, we're thinking about really lifting robotics as a category from this former category of just being able to do repeated things, to this category of really being able to handle diversity, um, of environments, changes in the environments, and being able to understand what's around it and make intelligent decisions and actions, um, to handle a diverse set of circumstances. And we think, like, this would enable really a whole different wave of robotics that is not how robotics is used, um, today. And for Covariant specifically, we are starting from, um, logistics and warehouses as the industry that we focus on. Um, so think of it as: the explosive demand that is driven by the growth of e-commerce has injected a lot of complexity into logistics and supply chains. Um, and at the same time, coupling that with demographic change, a changing immigration landscape means fewer and fewer people want to do these kinds of warehouse jobs, like drive an hour and a half to the suburbs and then have to work through midnight. Like, these are not the kind of jobs that people want to do, and our customers have extremely high turnover rates. Like, an average warehouse that we serve typically has more than 100%, uh, year-over-year turnover. And so, like, these are the type of places where we have an extreme shortage of people that want to do those kinds of jobs, and yet at the same time there are no prior robots that can solve pick, pack, and ship in warehouses, because, like, traditional robots are just machines that do the motions that you program them to do repeatedly. But here you actually need systems that are actually adaptive and do it at a very high level of reliability.

    4. SG

      Mm-hmm. Can you, um, describe,

  6. 12:21–15:45

    Put wall use case

    1. SG

      like, like, how we should imagine the physical... Like, you obviously have Covariant Brain, but then you have the physical instantiation. Like, what's a, what's a put wall just for our listeners?

    2. PC

      Yeah, so a, um, common use case that we have for our customers is what we typically call a put wall use case. A put wall is a, uh, term that is used in e-commerce fulfillment, um, which is... like, when you click a button to buy something online, and then the box shows up at your door. And you might wonder, like, "Well, how is that done?" Well, there's a complex set of operations happening in the background, uh, and a put wall is one step of that. And this step is typically used to sort a mix of customer orders to different customers, right? Like, let's say both you and I have ordered a new generation of iPhone, right? And then, like, a robot would be sitting there and picking up one iPhone and saying, "Oh, this one should go to Sarah, and this one should go to Peter." If you think about, like, what that robot needs to do, like, the robot needs to have an incredibly great ability to grasp items without damaging them, uh, and the accurate ability to identify what the item is and then route it to the appropriate customer, like, in this case, like, either you or me. Uh, and so a put wall, you can think of it as a sortation mechanism. You can think of it as a physical router that exists, um, in the world. Like, so instead of thinking about, um, a network router that sends digital packets around, like, you can think about a put wall as a physical router that sends goods to different places.
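The "physical router" idea maps naturally onto code. The sketch below is purely illustrative (the order IDs, cubby names, SKUs, and routing rule are invented, not Covariant's system): identify an item's SKU, find an open order still waiting on it, and return the destination cubby.

```python
# Open orders waiting at the put wall: each order owns one cubby and a
# list of SKUs it is still missing. (All names are hypothetical.)
open_orders = {
    "order-sarah": {"cubby": "A3", "items": ["iphone-15"]},
    "order-peter": {"cubby": "B1", "items": ["iphone-15"]},
}

def route_item(sku, orders):
    """Return (order_id, cubby) for the first open order still waiting on
    this SKU, and mark that unit as fulfilled."""
    for order_id, order in orders.items():
        if sku in order["items"]:
            order["items"].remove(sku)  # this unit is now accounted for
            return order_id, order["cubby"]
    raise LookupError(f"no open order needs {sku}")

r1 = route_item("iphone-15", open_orders)
r2 = route_item("iphone-15", open_orders)
print(r1, r2)  # ('order-sarah', 'A3') ('order-peter', 'B1')
```

The routing table itself is easy; the hard parts Peter highlights, grasping the item without damage and identifying which SKU is actually in the gripper, are what the AI has to supply before this lookup can run.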

    3. SG

      Is it fair to say that, you know, identification and routing are more solved problems than grasping?

    4. PC

      I would say identification and routing are typically more, considered more solved problems than grasping. Like, because if you... there are other, like, more, um, mechanical ways to solve those problems. Like, you can design a piece of conveyor such that, if you always put an item in the same place, then you can route it to a designated location. And so, like, that becomes mostly a mechanical problem, and anything that is a mechanical problem is typically more solved. And so that is very much true. Like, I would say, like, out of grasping, identification, and routing, like, definitely the grasping part, um, involves more AI. But as we build more advanced AI and bring it into more traditional fields, like robotics, like, what we actually find is that even in the identification step, even in the routing step, there are a lot of ways that AI can make more traditional mechanical systems smarter, right? Like, for example, like, a classic way to do identification is through scanning the barcode. But, um, where's the barcode? Like, how do you scan the barcode? Well, that's actually something that AI can inform, right? And, like, oftentimes, like, a human can identify an item without even scanning the barcode, because you can read the packaging. Like, you can infer, like, what is in there. Uh, and that is also something that, uh, AI can help with. And so, like, while it is true that there are some steps of the problem that can be solved by more traditional mechanical and robotic systems, uh, what we have found is that, like, once you have a very flexible AI, you can actually rethink a lot of the processes. Like, you will make something that was previously impossible possible, like grasping, and then you can also improve a lot of the other steps of the processes that were previously possible, but now you can do them in a more intelligent way.

    5. SG

      Is the, um, next step

  7. 15:45–18:42

    What’s next for Covariant Brain

    1. SG

      of expansion that you are excited about for Covariant still within, um, pick and pack or are there other tasks within, um, warehousing and logistics that you think are really interesting to expand into? Or, you know, there's, um, other forays into different robotic applications like, you know, humanoid robots like the Tesla Optimus or other industrial applications.

    2. PC

      Yeah. Um, a couple... Like, starting at the very highest level, right? When we think about the Covariant Brain, this foundation model that we are building, like, we are not building it just for warehouse operations- applications. We are not just building it for pick and place, uh, applications within warehouses. Um, so definitely, like, everything that you're talking about is very exciting to us, like, so both applications outside of warehouses as well as applications to newer hardware form factors like humanoid, um, robots. And so, like, that definitely is the long-term path, um, for us. I would say, like, in the very immediate future, uh, as a company, we are focused on the manipulation space of warehouses, just because there is so much demand and there are so many different kinds of use cases that exist, um, in the warehouse domain already. Because a warehouse for an apparel company is very different from a warehouse for a cosmetics company, which is very different from a warehouse for a meal prep, um, company. And across all of these, you actually have very different manipulation skills that you need and very different kinds of data that you can collect to train the foundation model, and also very different large markets, um, that we can tap into. Um, but we are very intentional in how we build the models in a way that makes sure they're generalizable, so we can actually extend into new domains. Uh, and one more comment on the humanoid, um, question. Like, I think that would be one of the most exciting advances in robotics, to make the humanoid as a form factor possible, right? Because our world is designed around human bodies. Like, so the humanoid is the universal hardware form factor that can be dropped into any place, uh, in our world. And so, like, we really, um, we really cannot wait for, like, humanoids to be, like, commercially and also technologically available.
Like, because when that platform is available, that is really the best mechanism for us to deploy Covariant Brain, this foundation model, to go to more places more quickly. Fortunately, we are not reliant on it. Like, even using the existing industrial robot hardware, um, we can build a scaling business. We can continue to bootstrap and build incrementally more capable models. But when it comes, like, that would be a really big acceleration for us.

    3. SG

      Mm-hmm. Mm-hmm. One more, um, question on this riv- uh, application or, um, maybe just the Covariant side

  8. 18:42–19:50

    Covariant’s customers

    1. SG

      before I would love to talk a little bit more about the research is, um, can you give, uh, our listeners a sense? You're five years into Covariant. Like, how big is the team? You have robots in production. Like, what are your types of customers?

    2. PC

      Yeah. So Covariant is about a, um, 200-person company, uh, and we are extremely international. Um, I would say roughly half of our customers are in Europe, half of our customers are in North America. Um, and, uh, we have robots deployed across three continents at this point, and in more than ten countries. Um, and what is really remarkable is, uh, all of these customers, all of these different robots are networked together, like, as one single foundation model. And everything that they learn comes back and makes this central model, um, better. And our customers are typically large retailers, large e-commerce brands, uh, and essentially anyone that runs a, um, large distribution center or a network of distribution centers, like, would likely choose Covariant, um, as their model that powers the physical world.

  9. 19:50–25:47

    Grounding concepts in AI

    2. SG

      Amazing. Can we talk a little bit, uh, just, uh, about the research? And I, I think the first, uh, thing I'll ask you to explain, just as a very high-level concept, is the concept of grounding in an understanding of the real world, or, you know, foundation models that understand physics and object interactions, like, what that means, or, you know, how that's missing today.

    3. PC

      Yeah. Um, so grounding is this interesting idea of, um, like, if you just read the text on the internet, like, you learn a lot about abstract concepts, right? But, but they could be, like, purely symbolic. Like, you might read, "Apple is delicious." Okay. I, I have this association that, okay, like, something that is an apple could be delicious. Uh, and if I ask what is a delicious thing, you can say, "Apple is a delicious thing." But that is very symbolic. Like, that has, like, no actual grounding in our physical world. Like, what does an apple look like? If I give you an image of an apple, can you recognize it? Uh, and can you recognize, like, the different other physical properties of an apple? Uh, and so, like, the first thing that you want to do with grounding is to ground all these symbolic abstract concepts in something that is real, that is physical. Um, and there are actually a lot of advances on this, even outside of robotics, um, happening already. Like, we have a lot of multi-modal models that exist, um, in the world. Like, if you go to GPT-4V, like, you actually could give it an image, and then it can, um, answer something for you intelligently about what's in the image. Like, so, like, GPT-4V is grounded; like, these types of multi-modal language models, like, already have an understanding, um, of, um, those grounded concepts. So where does, where does it get that grounding from? Like, it gets that grounding from, um, essentially the image and text pairs that occur on the internet, right? Like, if you look at, uh, an Instagram image, right, it might have a set of captions, um, a- along with it. So we can train these kinds of multi-modal models with a combination of those data, right? Like, after you have seen enough, uh, Instagram images of an apple, and enough people tag them as apple, then after you have trained on a large amount of such data, you start to get that grounding.
You start to pick up those, um, associations. Um, so that's, I would say, outside of robotics, like, how grounding typically happens, and how you typically get this kind of multi-modal, um, understanding that goes beyond just pure symbolic concepts, but actually has an understanding of how they get associated with the real physical world, uh, typically manifested through an image of the real world.
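The image-caption grounding Peter describes is typically learned with a contrastive objective, as in CLIP-style training. Here is a toy version of that loss, with random vectors standing in for the outputs of real image and text encoders (illustrative only, not any specific system's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def infonce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matched (image, caption) pairs on the
    diagonal should score higher than every mismatched pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # pairwise cosine similarities
    labels = np.arange(len(logits))
    # Cross-entropy toward the diagonal, in both directions.
    log_p_i2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_t2i = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -(log_p_i2t[labels, labels].mean() +
             log_p_t2i[labels, labels].mean()) / 2

# Perfectly aligned embeddings give a much lower loss than a shuffled pairing.
aligned = rng.normal(size=(8, 32))
loss_matched = infonce_loss(aligned, aligned)
loss_shuffled = infonce_loss(aligned, rng.normal(size=(8, 32)))
print(loss_matched < loss_shuffled)  # True
```

Minimizing this loss over billions of (image, caption) pairs is what pulls the symbol "apple" toward images of apples, which is the association Peter calls grounding.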

    4. SG

      And, uh, if I think about just the concept of an apple is in many videos on YouTube. Um, they are kind of round, they're affected by gravity, they have some mass. Like, what's missing from those captioned images and videos when you talk about, like, the data that's missing that you need to go collect for robotics to improve?

    5. PC

      Yeah. So, um, there are a couple aspects to it. Like, so, um, like, obviously this kind of internet-scale data is very useful. Like, you can already pick up a lot of associations and grounding with the physical world. Um, but there are still a lot of things that are missing, right? So for example, like, when you think about this kind of, um, naturally occurring text and image pair data, they're typically about high-level concepts. Like, they're typically not about something that is very precise. Like, so for example, like, when I present, uh, an apple to you, like, you don't typically describe, like, the precise shape of the apple, right? Like, is this a very round apple? Is this a very full apple? Like, you might use some high-level concept to describe it, but there's really nothing that describes it, say, down to sub-millimeter-level precision, which is kind of, like, the level of, like, precise understanding that you need to interact with the real world. You, you don't just say, "Well, there's kind of an apple there," when there might be up to a two-centimeter, like, difference in understanding of where the boundary of that apple is and how you should grasp it. And so, like, here's, like, the first dimension of, like, things that are missing, which is, like, there's really no precise grounding. Um, and there's no precise understanding of the physical world that's naturally occurring, um, on the internet. Um, so that's, like, where you find the first kind of departure of robotics foundation models from, like, other general, uh, multi-modal foundation models. Like, it's this idea of precision. Like, you now actually need to understand things to a much higher level, um, of precision, um, than otherwise exists, uh, in this kind of data set. Um, and so that's, like, one big thing.
And then another really big thing is, um, like, this ability to, um, understand the effects of your own actions, uh, and a large part of this is just because there are not a lot of robots that are doing interesting things, uh, in the world. And so, like, there are not a lot of data sets that, uh... uh, are in the format of: a robot does something, and then you know the outcome of it. Like, is this a good way to pick up something? Like, if I move an item too quickly, like, would it damage it? If I press, like, for example, a tomato, like, what is the force that is appropriate, that, that is possible? Like, you don't have a lot of these kinds of, um, action and outcome pairs, um, that exist in the world. Like, the closest thing to that is probably on YouTube, you have humans doing those things, but then there's a research question of, like, well, can you have a robot that learns from just watching a human do it, when you don't actually fully know, like, how hard the human presses on the tomato, uh, or, like, how you precisely slice something. So you're still lacking a good amount of the data that, like, completes this feedback loop.
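The "action and outcome pairs" Peter says are missing from web data might look something like the record below. All field names and values here are hypothetical, invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class GraspAttempt:
    """One closed-loop data point: what the robot saw, what it tried,
    and what actually happened afterward."""
    item_id: str
    grasp_pose: tuple     # (x, y, z, rotation) chosen by the model
    suction_force: float  # how hard the gripper pulled, in kPa
    succeeded: bool       # did the item stay attached?
    damaged: bool         # post-hoc outcome label

log = [
    GraspAttempt("tomato-042", (0.31, 0.12, 0.05, 1.57), 40.0, True, True),
    GraspAttempt("tomato-042", (0.31, 0.12, 0.05, 1.57), 25.0, True, False),
]

# Filtering for successful, damage-free attempts yields the kind of
# supervision that no web scrape contains: which force was actually safe.
good = [a for a in log if a.succeeded and not a.damaged]
print(len(good))  # 1
```

A fleet of production robots generates these outcome labels as a side effect of doing real work, which is the data-collection argument behind building a commercial company first.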

    6. SG

      Do you have some sense of, like, how or if scaling laws apply for you?

  10. 25:47–29:21

    How scaling laws apply to Covariant

    1. SG

      Like, do you know how many robots you need to deploy or how much data you need to go collect to get to certain levels of improvement? Or can you try to predict it now?

    2. PC

      So I would say the most technical definition of scaling law, um, does apply, and we have seen it apply, uh, in this domain. And it's somewhat not surprising, because, like, if you think about, like, the scaling law in the most technical sense, which is: if you scale up data and you scale up your model capacity and you scale up the compute that you throw at it, you get a lower loss, like a lower training loss, um, out of it. And we have seen this play out across so many different domains, more than just language models, that it is not surprising. Um, I think the question that you're asking is probably not the most technical definition of scaling law, but the general definition of scaling law, which is: as you scale those up, would you get emergent capabilities out of it? Like, would you kind of, like, get a model that's, like, orders of magnitude smarter in some-

    3. SG

      Mm-hmm.

    4. PC

      ... loose, um, definition of it. Like, which is kind of the thing that we see from the large language model world. Like, when you go from GPT-3 to GPT-4, when you go from Claude 1 to Claude 2, like, you kind of, like, see this step change improvement in reliability, in generalization, um, that you get from it. So I assume that's, like, probably what you're, what you're asking.

    5. SG

      Yes. Do you believe in some emergent...

    6. PC

      So I would say we see some element of it, but it's something we rely on less. And here's where I think there is a really interesting, crucial distinction between, call it, a fully general model that is designed to solve everything in the world, and what I think of as a domain-specific foundation model, which in our case means solving robotic manipulation. In a fully general model, for example a GPT-5 that you want to solve everything in the world, you have this problem of essentially out-of-domain generalization. When we ask whether scaling it up gets you something much smarter, we are not asking whether GPT-5 fits the training data better. We are asking: if you give it a scenario completely outside the training data, how well does it work? That is where you need to rely on this strong form of the scaling law. But you don't need that in a more restricted domain like robotics, because you can actually have so much data coverage that your test scenarios are just part of your training scenarios. So to some degree, we don't need this strong form of the scaling law to hold for us to build really valuable technology. I expect robotics will follow a trend similar to what you see in the language world, but at the same time, we don't require it. We know that as you get more customers and more data, these systems will get better. And especially if you have targeted data coverage for specific domains and specific customers, they are guaranteed to get better. So whether or not you believe robotics can scale, it's a simpler bet: it's just whether you can get data for that domain. And if you can get it, then for sure the model can fit it.
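      The in-domain-coverage point can be sketched with a toy example (the object widths, grasp labels, and thresholds below are all made up for illustration; this is not Covariant's system): a model that simply memorizes densely sampled training scenarios answers in-domain queries with nearby evidence, while out-of-domain queries force it to extrapolate, which is exactly where the strong form of the scaling law would have to carry the load.

```python
# Toy illustration (hypothetical, not Covariant's system): when test
# scenarios are densely covered by training data, a simple memorizing
# model suffices; out-of-domain queries are where scaling laws must help.

# "Training data": grasp outcomes for object widths densely sampled
# across a restricted domain [1.0, 10.0] (units are arbitrary).
train = {round(w * 0.1, 1): ("pinch" if w * 0.1 < 5.0 else "power")
         for w in range(10, 101)}

def nearest_policy(width):
    """Predict a grasp type from the nearest trained width (pure memorization)."""
    key = min(train, key=lambda k: abs(k - width))
    return train[key], abs(key - width)  # prediction + distance to evidence

# In-domain query: coverage is dense, so the nearest evidence is very close.
grasp_in, dist_in = nearest_policy(3.14)
# Out-of-domain query: far outside coverage, the model must extrapolate.
grasp_out, dist_out = nearest_policy(50.0)

print(grasp_in, dist_in)    # close evidence -> trustworthy prediction
print(grasp_out, dist_out)  # distant evidence -> strong-scaling territory
```

      The same contrast is what separates "fit the domain you collected" from "generalize to scenarios you never saw."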

    7. SG

      Last question in this research area: is there a

  11. 29:2132:54

    Covariant’s driving thesis

    1. SG

      specific scientific insight or bet that Covariant has made? Or should we think of this as, not at all trivial, but a full-stack play: the right people, very well-prepared engineers and scientists, doing the relevant data collection that doesn't exist today to support increased robotic intelligence, versus, let's say, an architectural bet or whatever it is?

    2. PC

      Yeah, the architecture has changed maybe five times already. Like-

    3. SG

      Right.

    4. PC

      ... it has gone through significant transformation every year. I don't think you can be married to any single specific architecture in a field that is moving so quickly. But there is one unique bet that we are placing, right? That one unique bet is: we believe the future of robotics will be built by whoever has the most robotics data. And essentially the whole company is built around that thesis. What would an alternative belief be? An alternative belief would be that you can rely solely on simulation, that you don't actually need much real-world data. That would be a different philosophical bet. We also use simulation, but we think of simulation as a way to augment the data, not as a way to replace everything.

    5. SG

      There are lots of smart Tesla and ex-Tesla people, or rather, Tesla has been, I guess, a big proponent of high-quality simulation, including for training-data generation, right? Where are the gaps, or why do you believe that's insufficient?

    6. PC

      When we think about simulation, it's actually somewhat different for different kinds of domains. When you think about simulation for self-driving cars, we are really mostly thinking about systems that hopefully don't physically interact with each other, right? (laughs) Because if two cars come into contact with each other-

    7. SG

      Sure, yes.

    8. PC

      ... that's a really terrible thing, right? And so the simulation there is more about simulation of-

    9. SG

      Avoidance.

    10. PC

      ... multiagent behaviors like-

    11. SG

      (laughs)

    12. PC

      ... yeah, avoidance of contact.

    13. SG

      Yeah.

    14. PC

      But if you think about manipulation, if you never contact something, that's also a big problem, because then you actually don't do any work. And whenever contact is involved, simulating it becomes very, very difficult: items that can deform, the contact dynamics, all of that makes it incredibly challenging. So that is where simulation becomes very difficult: when it involves contact and complex dynamics. And then there is a second thing that makes simulation difficult. As I mentioned earlier, a typical customer we serve may have 100,000 distinct objects in a warehouse. If you want to fully recreate that in your simulation, that is actually more work than just learning a system that can deal with the real world. The identification problem, specifying the real world in your simulation, might actually require more data or more work. That being said, we believe in learned world models. We believe in foundation models that can learn from the real world, so you can simulate new scenarios of what would happen if you did things differently. But I think of that as different from the classical simulation I referred to earlier, which is program-based: you are hard-coding the rules of reality and then building agents that learn from the mechanical interpretation of the rules of reality you encoded in your simulator.

  12. 32:5435:12

    the Chat-GPT moment for robotics

    2. SG

      So for our last couple minutes, should we zoom out and talk a little bit-

    3. PC

      Yeah.

    4. SG

      ... about the future?

    5. PC

      Yeah.

    6. SG

      So you've said we're sort of pre-ChatGPT for the robotics industry. What is the ChatGPT moment for robots? What do you imagine?

    7. PC

      For the ChatGPT moment for robots, you want AI that is as general as ChatGPT, so you would be able to throw robots into any arbitrary new scenario and they would learn how to deal with it very quickly. That is what ChatGPT let people experience: you can ask it arbitrary problems, and it can solve them for you, to some degree. So you want the same kind of generality. But in addition to that, what you also need is really high reliability, because you really don't want robots that only succeed at the task you ask them to do 70% of the time, with really catastrophic outcomes in the other 30%. So I would say the bar for the ChatGPT moment for robotics is high. You need to solve the generality, which is the same kind of problem, but you need to solve it with a high level of reliability. And this is where one of the concepts we talked about earlier comes in: you really need a large amount of high-quality data to densely cover the robotic field you want. So that is what I think about as the model side of the ChatGPT moment for robotics. And then you also need to think about the hardware portion of it, right? Even if you have a robot AI that is very smart, unless you are just interacting with it in some metaverse digital 3D world, you still need some hardware body for the robot. And before humanoids are fully widespread, I think we will see the ChatGPT moment for robotics articulated in industrial settings earlier than in commercial settings, because those are the places that can actually justify the hardware investment: the hardware is being used 24/7, as opposed to home robots that might only be used two hours a week. That's a very different ROI on the hardware you need to put in.

    8. SG

      What does the warehouse, or factory,

  13. 35:1237:02

    Manufacturing center of the future

    1. SG

      or logistics center of the future look like? Lights out, no humans?

    2. PC

      I don't think it will be fully lights-out with no humans, at least in the near future. But I think it will be very robotics-augmented. Think of one person being able to oversee 10, 20, 30 robots. Instead of one person having to manually do all that work, you work with a fleet of robots. Think of it as kind of a physical co-pilot type of setup: you get this large amplification of what one person can do. But most likely it won't be completely lights-out; you will still have people there. I think this form of expression of AI will probably hold true not just for robotics but for many other fields of AI as well.

    3. SG

      I realize you just said industrial applications first, from an ROI perspective. That makes sense. But do you have a guess, or a hope, for the first form or use case of an intelligent robot that your average human, your consumer, interacts with?

    4. PC

      If I have to guess, it would probably be a home robot that doesn't involve much manipulation. So think of a home robot that might be like a Roomba: it can follow you around, and you can talk to it. It has the navigation and movement aspects, but not necessarily the manipulation aspects; it's not actually manipulating the physical world around it. I think that would be the most technologically feasible version. So think of something similar to Amazon's Astro robot, this kind of cute robot with two wheels that can follow you around, and when someone calls it, it can go there. I think that type of form factor is probably where we would see it earliest.

    5. SG

      Robotics AI work,

  14. 37:0240:57

    Safety in AI robotics

    1. SG

      it triggers a lot of concern around safety, both in the short-term practical sense and in the sort of AGI-breaking-into-the-real-world sense. How do you think about safety at Covariant?

    2. PC

      We have a simple carve-out to this question, because we focus on industrial applications, and all industrial robots have a set of safety rules they need to conform to, right? It's not just AI that can be dangerous; manual programming can be dangerous too. You could (laughs) already program a robot to do dangerous things. So there is a really robust set of rules: you have to put safety cages around robots, and if you don't have safety cages, you need certain kinds of certified controllers that make sure the robot doesn't do anything dangerous to the surrounding equipment and people. In that sense, because we are just following the same rules, any robots that we build and deploy are by definition safe, or by construction safe. But that is very different from asking, "Well, what if we hook up an arbitrarily expressive agent to a home robot?" How you limit that to be safe is much harder, similar to hooking up a language agent with arbitrary Python code execution capability and arbitrary access to the internet. It just becomes very difficult to say, "How can you make sure it doesn't do anything dangerous?" That is where the alignment problem comes in, and where a lot of the good safety research comes in. But we have a simple carve-out, at least for the near term, in these kinds of industrial applications.
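      The kind of rule a certified safety controller enforces can be sketched as a simple invariant check. The limits, names, and units below are illustrative placeholders, not values from any actual safety standard or from Covariant's deployments:

```python
# Hypothetical sketch of the kind of invariant a certified safety
# controller enforces (illustrative limits, not a real standard's values).
SPEED_LIMIT_MM_S = 250.0      # e.g. a reduced-speed limit near people
WORKSPACE_MM = (0.0, 1200.0)  # allowed range of motion along one axis

def command_is_safe(target_mm, speed_mm_s):
    """Reject any motion command that leaves the certified envelope."""
    in_bounds = WORKSPACE_MM[0] <= target_mm <= WORKSPACE_MM[1]
    slow_enough = 0.0 <= speed_mm_s <= SPEED_LIMIT_MM_S
    return in_bounds and slow_enough

print(command_is_safe(600.0, 100.0))  # True: inside the envelope
print(command_is_safe(600.0, 900.0))  # False: too fast
```

      The point of the carve-out is that this envelope sits below whatever AI issues the commands, so safety does not depend on the model itself.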

    3. SG

      Peter, what advancement in AI research or application outside of robotics are you most personally interested in?

    4. PC

      Looking backward or looking forward?

    5. SG

      Looking forward. I can only look forward.

    6. PC

      I think the same kind of advances that we saw last year, we will see at least the same order of magnitude of them in the coming year. If you look behind all these advances in large language models and image generation, they are still using relatively primitive technology, right? Large language models, especially, are mostly still trained just on next-token prediction, which, for people who study reinforcement learning, we call behavior cloning: you are just asking the AI to clone the behavior of another agent. And that is one of the most primitive ways possible to train this type of system, because if you are just mimicking something, there is a natural ceiling on how good you can get. And there are just so many other proven toolboxes that we have not deployed yet, so I would say progress is guaranteed in everything we have seen so far. I'm super excited about that, and I'm also super excited about the open-source movement continuing in the AI world, where a lot of these advances are made available to a broad community that can continue to build on them and experiment with them. So I think it will continue to be a very exciting year of AI progress.
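      Peter's framing of next-token prediction as behavior cloning can be sketched in a few lines. The toy corpus below is made up, and a real language model uses a neural network rather than raw counts, but the training signal is the same shape: mimic whatever token the demonstrating data produced next.

```python
# Minimal sketch of next-token prediction as behavior cloning:
# the model only mimics the "demonstrating agent" (the corpus), so its
# ceiling is whatever behavior is present in the data.
from collections import Counter, defaultdict

corpus = "the robot picks the item then the robot picks the box".split()

# "Training": count how often each token follows each context token.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Clone the corpus: emit the most frequent observed continuation."""
    return counts[token].most_common(1)[0][0]

print(predict_next("robot"))  # prints "picks"
```

      A model trained this way can never exceed its demonstrator, which is the ceiling Peter describes and why techniques beyond imitation are the obvious next toolbox.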

    7. SG

      Okay, then looking, looking backward and forward at the same time, last question is your favorite sci-fi book with robots in it, realistic or not?

    8. PC

      It's not a book, but I really like Westworld.

    9. SG

      Okay, (laughs) great. Westworld, the future comes.

    10. PC

      (laughs)

    11. SG

      Peter, thank you so much for joining us on No Priors. Until next time.

    12. PC

      Thanks.

    13. SG

      Find us on Twitter @NoPriorsPod. Subscribe to our YouTube channel if you wanna see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Episode duration: 40:57


Transcript of episode FSrF4r-gyMA
