At a glance
WHAT IT’S REALLY ABOUT
Covariant’s Peter Chen Builds Data-Driven Foundation Models For Real-World Robots
- Peter Chen, CEO and co‑founder of Covariant, explains how the company is building a foundation model for robotics by deploying AI-powered robots in warehouses and logistics centers.
- He contrasts today’s “dumb,” pre-programmed industrial robots with the next wave of intelligent, adaptive systems that can handle diverse objects and environments with high reliability.
- Chen argues that success in robotics will hinge on collecting massive amounts of real-world, embodied interaction data, not just simulation or internet-scale text and images.
- Looking ahead, he expects industrial manipulation to lead the way, with humanoids and consumer robots following once hardware economics improve and safety constraints are addressed.
IDEAS WORTH REMEMBERING
Real-world data is the core strategic asset for robotic intelligence.
Chen’s central bet is that the winner in robotics will be whoever gathers the most high-quality, embodied interaction data from robots operating in production environments, because internet and simulated data miss crucial physical nuances.
Today’s robots are precise but fundamentally “dumb” and inflexible.
Over 99% of industrial robots are pre-programmed to repeat fixed motions and cannot adapt to changing items, layouts, or tasks, leaving huge classes of real-world problems—like e-commerce fulfillment—unsolved.
Foundation models for robotics need both generality and high reliability.
A “ChatGPT moment” for robots requires not just broad task coverage but failure rates low enough to avoid costly or dangerous physical errors, which demands dense data coverage and rigorous real-world training.
Warehouse manipulation is a high-value proving ground for robotic AI.
Logistics centers face booming e-commerce demand, labor shortages, and annual staff turnover above 100%. This makes them ideal environments to deploy robots that learn from diverse manipulation tasks and generate valuable training data.
Grounding in the physical world requires precision beyond internet multimodal data.
Image–text pairs teach high-level concepts, but manipulation requires sub-centimeter understanding of object shapes, contact forces, and dynamics, plus action–outcome data that current online datasets largely lack.
WORDS WORTH SAVING
“When we started Covariant, there was no AI that was good enough to make robots do useful things commercially.”
— Peter Chen
“99+% of the robots that are deployed in the world are dumb robots… doing the same thing again and again.”
— Peter Chen
“We believe the future of robotics would be built by whoever has most robotics data.”
— Peter Chen
“You cannot just go straight to full general physical AGI… you have to build something that is valuable that you can ship to customers, and from that process you get more data.”
— Peter Chen
“The bar for the ChatGPT moment for robotics is high… you need to solve the generality, but you need to solve it with a high level of reliability.”
— Peter Chen
High quality AI-generated summary created from speaker-labeled transcript.