YC Root Access
Robots Don't Need More Compute. They Need This.
Nicolas Dessaigne and Ulrik Waage on Encord’s bet: physical AI progress hinges on better data pipelines.
This episode of YC Root Access, “Robots Don't Need More Compute. They Need This.”, features Nicolas Dessaigne and Ulrik Waage and explores Encord’s bet that physical AI progress hinges on better data pipelines. Encord positions itself as an AI-native “universal data layer” that helps physical AI teams ensure the right data enters model training and the wrong data stays out.
At a glance
WHAT IT’S REALLY ABOUT
Encord’s bet: physical AI progress hinges on better data pipelines
- Encord positions itself as an AI-native “universal data layer” that helps physical AI teams ensure the right data enters model training and the wrong data stays out.
- The company evolved from computer-vision annotation automation to a broader multimodal platform after ChatGPT increased trust in AI-assisted data workflows.
- Physical AI faces the inverse constraint of LLMs: compute is abundant, but real-world embodied data collection, curation, and evaluation are hard and costly.
- Encord is expanding into earlier (data collection/pre-training) and later (post-deployment observability and exception handling) stages to create a full data flywheel for robotics teams.
- The founders argue the economic upside is massive because most of the world’s economy involves physical work, and they aim to become the default infrastructure layer for physical AI data.
IDEAS WORTH REMEMBERING
5 ideas
Physical AI is bottlenecked by data, not compute.
LLMs benefited from internet-scale text data; robotics needs embodied, real-world data that must be intentionally captured, curated, and validated to reach similar “scaling law” gains.
Small data errors can become big real-world failures.
In physical systems, mistakes can cause safety-critical outcomes (e.g., vehicles, drones), so the tolerance for noisy labels and poor curation is much lower than in many digital-only AI products.
A single, consolidated view of the data flywheel is a competitive edge.
Encord argues that indexing, curation, annotation, model-assisted pre-labeling, and evaluation in one platform enables automation across the pipeline and faster iteration to production.
ChatGPT didn’t just create demand—it changed trust dynamics.
The founders claim early customers were reluctant to let AI touch their data; widespread consumer trust in AI after ChatGPT made AI-assisted workflows (including annotation automation) more acceptable.
Multimodality is operationally harder than text-only AI.
Coordinating video, sensor streams, audio, and text at scale is harder to visualize, QA, and collaborate on, increasing the need for specialized tooling beyond typical LLM data stacks.
WORDS WORTH SAVING
5 quotes
So ultimately, a model is only as good as the data it's trained on, and even like the slightest errors in the data set can influence and impact like how the model actually works in the real world.
— Ulrik Waage
They thought that the Icelandic dating market was going to be bigger than, than the AI market.
— Eric Landau
What it proved was that if you throw data and compute at a problem, then these systems can be extremely performant... With physical AI, it's actually the opposite. Now we have all the compute infrastructure, but you need the data to actually get to the scaling law.
— Eric Landau
But if you have a model in the real world hallucinate, right? That's a self-driving car. That could be a drone that falls down from the sky.
— Ulrik Waage
One thing that was, I think, surprising to, to us is how much it costs to not make a decision. So there's a big opportunity cost of just indecision, and you're constantly paying interest on decisions that you don't make.
— Eric Landau
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
What does “universal data layer for physical AI” mean in concrete terms—what systems does Encord replace vs. integrate with?
How do you measure and enforce data quality for safety-critical robotics (e.g., label accuracy targets, audit trails, inter-annotator agreement, edge-case coverage)?
What exactly happens inside the Bay Area R&D facility—what environments, sensors, and capture setups are you standardizing for customers?
Where do “humans in the loop” add the most value in physical AI today: frontier task labeling, QA, policy/safety review, or post-deployment exception handling?
You mentioned customers embed their models for pre-labeling—how do you handle model versioning, evaluation drift, and preventing feedback loops from reinforcing errors?
Chapter Breakdown
Encord’s mission: a universal data layer for physical AI
Ulrik explains Encord as AI-native data infrastructure focused on physical AI and robotics teams. The company’s core aim is to ensure the right data goes into models—created, curated, annotated, and evaluated in one platform.
Why data becomes the bottleneck as models scale
The conversation frames model performance as constrained by data quality and data operations complexity. As models and deployments scale, datasets grow and continuous data feeding becomes essential to push performance in production.
Founding story: spotting the defensible wedge in AI (pre-ChatGPT)
Ulrik and Eric describe meeting as AI was taking off in the late 2010s and realizing data wrangling consumed the most time and could be highly defensible. They saw a broken workflow—outsourcing labeling overseas—and believed software could do better.
Early market skepticism: building AI for teams that didn’t trust AI
Eric notes AI/data tooling wasn’t a hot category in their YC era, when crypto/fintech drew more attention. They even struggled to raise seed because some investors doubted the AI market size, reflecting how early the bet was.
Scale today: customers, team footprint, and Series C milestone
Ulrik shares Encord’s current scale: hundreds of AI teams across autonomous driving and robotics, with a sizable team split between London and San Francisco. The announcement includes a $60M Series C, bringing the total raised to $110M.
The first product: automating computer-vision annotation workflows
They describe Encord’s early product as annotation automation for computer vision—improving a process that was slow and operationally heavy. The initial target was image segmentation and related CV labeling workflows.
ChatGPT changed trust: from skepticism to acceptance of automation
Eric explains how ChatGPT was a turning point because it normalized trusting AI systems, even among AI companies. Encord had developed “micro models” to assist labeling, but customers were hesitant until the broader market saw AI work reliably in general settings.
Shift to multimodal and physical AI: why the data problem flips
After ChatGPT, Encord leaned into multimodal data (video, audio, text, sensors) and physical AI. Eric contrasts digital AI—where internet text made data plentiful—with physical AI, where compute exists but real-world embodied data is scarce.
New offering: enabling real-world data collection and pre-training support
Ulrik introduces a new initiative: an R&D facility in the Bay Area to help robotics companies collect embodied data for pre-training and training. Encord doesn’t build robots; it provides environments and the data pipeline to capture and operationalize training data.
Post-deployment needs: exception handling, observability, and QA in the real world
They argue the next phase for physical AI is operating safely in production, where failures have real consequences. Encord aims to support post-deployment workflows like exception handling and observability, linking real-world events back to model improvement.
Humans-in-the-loop remain essential because stakes and frontier tasks are different
They explain why human oversight persists in physical AI: frontier tasks and safety-critical contexts demand high-quality supervision. Humans also act as managers/supervisors of AI systems, especially when tolerance for errors is low.
What customers buy: faster path to market and better model performance
Encord is positioned as infrastructure that lets robotics and autonomy teams focus on building products rather than building data stacks. The value proposition centers on accelerating iteration and improving model quality via unified data workflows.
Example in production: Weave Robotics and the laundry-folding robot
Ulrik highlights Weave Robotics (a YC company) as a customer bringing a laundry-folding robot to market. The example illustrates Encord’s role as the physical AI data platform behind real consumer-facing robotics deployments.
Series C rationale and the outlook for robots: hype, consolidation, then scale
Eric explains the fundraise as an acceleration play as physical AI attention and investment increase. They predict a trajectory similar to self-driving cars: hype, consolidation, then rapid progress toward general-purpose household robots over a few years.
Ambition, hiring (including agents), and founder lessons on decision-making
They share Encord’s long-term ambition: become the default system through which physical AI data flows, analogous to Stripe for payments. The discussion also covers hiring across SF/London, adding internal AI agents, and advice on speed of decisions and adapting tactics while keeping long-term direction.