Y Combinator

Quan Vuong: How Open-X Proved Generalists Beat Specialists

Open-X cross-embodiment training showed generalists beat specialists by 50%; cloud-controlled inference removes the on-device bottleneck for robotics startups.

Quan Vuong (guest), Jared Friedman (host), Garry Tan (host)
Apr 15, 2026 · 49m

At a glance

WHAT IT’S REALLY ABOUT

Foundation models and cloud control are accelerating a robotics startup boom

  1. Robotics progress accelerated when vision-language models began transferring semantic knowledge into robot actions, as shown by projects like PaLM‑E and RT‑2.
  2. Open-X demonstrated cross-embodiment training—pooling data from many robot types—can outperform single-robot specialists, suggesting emerging scaling laws in robotics.
  3. The biggest constraint is not algorithms alone but data generation/capture and operational infrastructure for collecting, managing, and evaluating real-world robot data.
  4. Mixed-autonomy deployments (robots operate with minimal human takeover) are now viable for real businesses, as shown in laundromat folding and warehouse packaging demos.
  5. Cloud-hosted models can control robots in real time via API calls using action-chunking/pipelining, reducing on-robot compute requirements and enabling faster iteration as models evolve.
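The action-chunking/pipelining idea in point 5 can be illustrated with a minimal sketch. This is not the speakers' implementation: the summary only says that a cloud-hosted model is queried through an API and returns actions that run on the robot. Everything named here (`query_policy`, `run_episode`, `CHUNK_SIZE`, `PREFETCH_AT`) is hypothetical, and the remote call is simulated by a local function with an artificial delay. The assumed pattern is that each request returns several actions at once, and the next request is issued in the background while the current chunk is still executing, so network latency overlaps with motion.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8    # actions returned per request (hypothetical value)
PREFETCH_AT = 4   # request the next chunk when this many actions remain


def query_policy(observation, latency_s=0.01):
    """Stand-in for a remote inference call (assumption, not a real API).

    Returns a chunk of CHUNK_SIZE placeholder actions, each tagged with
    the observation that produced it.
    """
    time.sleep(latency_s)  # simulated network + inference latency
    return [(observation, i) for i in range(CHUNK_SIZE)]


def run_episode(num_steps, latency_s=0.01):
    """Execute num_steps actions, overlapping inference with execution."""
    executed = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        chunk = query_policy(0, latency_s)  # first chunk: blocking call
        pending = None                      # in-flight request, if any
        idx = 0                             # next action within the chunk
        for step in range(num_steps):
            remaining = len(chunk) - idx
            if pending is None and remaining <= PREFETCH_AT:
                # Pipelining: ask for the next chunk before this one runs out.
                # The step index stands in for a fresh observation.
                pending = pool.submit(query_policy, step, latency_s)
            if idx == len(chunk):
                # Current chunk exhausted: swap in the prefetched one.
                chunk = pending.result()
                pending = None
                idx = 0
            # A real loop would send this action to the robot at a fixed rate.
            executed.append(chunk[idx])
            idx += 1
    return executed


if __name__ == "__main__":
    actions = run_episode(20)
    print(len(actions), actions[8], actions[16])
```

In this sketch the chunk that starts at step 8 was computed from the observation at step 4, which shows the trade-off: prefetching hides latency, but the actions being executed are based on a slightly stale observation.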

IDEAS WORTH REMEMBERING

5 ideas

Robotics is becoming a foundation-model problem, not just a control-stack problem.

Language/vision-language models supply common-sense semantics and planning priors, and adapting them with robot data enables surprising generalization to unseen objects and instructions (e.g., RT‑2 style transfer).

Cross-embodiment training can beat single-robot specialization.

In the Open‑X results, a generalist model trained on data from ~10 robot platforms performed ~50% better than embodiment-specific specialists, suggesting that a shared abstraction (“how to control robots”) emerges from diverse data.

The true bottleneck is operational: collecting, capturing, and evaluating robot data at scale.

Unlike the internet for text, robotics lacks a naturally aggregated dataset; progress depends on building repeatable pipelines for data ingestion, quality, labeling, and evaluation—especially as task duration and capability grow.

Mixed autonomy is the practical bridge to full autonomy and scalable economics.

Deployments can start when mistakes are tolerable and humans can intervene; with corrections and continuous exposure to edge cases, systems improve until they approach full autonomy and can reach economic break-even.

Real-world demos show robots are moving from lab tricks to operations-ready systems.

The Weave laundromat folding example emphasizes deformable-object generalization, while the Ultra warehouse pouching example emphasizes long-duration autonomy under changing lighting and real production constraints.

WORDS WORTH SAVING

5 quotes

Our mission is to build a model that can control any robot to do any task that it's physically capable of.

Quan Vuong

If you simply take the data and absorb it into a model… you can compare it to the specialist… And… it was 50% better.

Quan Vuong

In robotics… if you wanna add two years to your PhD, just work on a new robot platform.

Quan Vuong

Folding laundry… has always been… the Turing test for robotics.

Jared Friedman

Almost all of the robot evaluation that we run… the model [is] hosted in the cloud… querying an API endpoint… and getting back action that then execute directly on the robot.

Quan Vuong

“GPT-1 moment” analogy for robotics
Semantics vs planning vs control in robotics
RT‑2 / PaLM‑E and knowledge transfer to actions
Cross-embodiment scaling and Open‑X dataset
Data generation vs data capture incentives
Zero-shot/emergent robot skills
Mixed autonomy in real deployments
Cloud-controlled robots and real-time chunking
Vertical robotics startup playbook
Robotics infrastructure gaps (teleop, annotation, eval)

High quality AI-generated summary created from speaker-labeled transcript.
