YC Root Access: Robots Don't Need More Compute. They Need This.
At a glance
WHAT IT’S REALLY ABOUT
Encord’s bet: physical AI progress hinges on better data pipelines
- Encord positions itself as an AI-native “universal data layer” that helps physical AI teams ensure the right data enters model training and the wrong data stays out.
- The company evolved from computer-vision annotation automation to a broader multimodal platform after ChatGPT increased trust in AI-assisted data workflows.
- Physical AI faces the inverse constraint of LLMs: compute is abundant, but real-world embodied data collection, curation, and evaluation are hard and costly.
- Encord is expanding into earlier (data collection/pre-training) and later (post-deployment observability and exception handling) stages to create a full data flywheel for robotics teams.
- The founders argue the economic upside is massive because most of the world’s economy involves physical work, and they aim to become the default infrastructure layer for physical AI data.
IDEAS WORTH REMEMBERING
Physical AI is bottlenecked by data, not compute.
LLMs benefited from internet-scale text data; robotics needs embodied, real-world data that must be intentionally captured, curated, and validated to reach similar “scaling law” gains.
Small data errors can become big real-world failures.
In physical systems, mistakes can cause safety-critical outcomes (e.g., vehicles, drones), so the tolerance for noisy labels and poor curation is much lower than in many digital-only AI products.
A single, consolidated view of the data flywheel is a competitive edge.
Encord argues that indexing, curation, annotation, model-assisted pre-labeling, and evaluation in one platform enables automation across the pipeline and faster iteration to production.
ChatGPT didn’t just create demand—it changed trust dynamics.
The founders claim early customers were reluctant to let AI touch their data; widespread consumer trust in AI after ChatGPT made AI-assisted workflows (including annotation automation) more acceptable.
Multimodality is operationally harder than text-only AI.
Coordinating video, sensor streams, audio, and text at scale is harder to visualize, QA, and collaborate on, increasing the need for specialized tooling beyond typical LLM data stacks.
WORDS WORTH SAVING
So ultimately, a model is only as good as the data it's trained on, and even the slightest errors in the data set can influence and impact how the model actually works in the real world.
— Ulrik Waage
They thought that the Icelandic dating market was going to be bigger than the AI market.
— Eric Landau
What it proved was that if you throw data and compute at a problem, then these systems can be extremely performant... With physical AI, it's actually the opposite. Now we have all the compute infrastructure, but you need the data to actually get to the scaling law.
— Eric Landau
But if you have a model in the real world hallucinate, right? That's a self-driving car. That could be a drone that falls down from the sky.
— Ulrik Waage
One thing that was, I think, surprising to us is how much it costs to not make a decision. So there's a big opportunity cost of just indecision, and you're constantly paying interest on decisions that you don't make.
— Eric Landau
High quality AI-generated summary created from speaker-labeled transcript.