
Humanoids Cost as Much as an SUV Now | Nikhil Kamath x Brett Adcock | WTF Online Ep 2

Nikhil Kamath and Brett Adcock on humanoid robots approaching consumer costs: the race on data, safety, and autonomy.

Host: Nikhil Kamath · Guest: Brett Adcock
Nov 5, 2025 · 1h 36m
Adcock’s transition: Vettery → Archer Aviation → Figure AI
Humanoid robot internals: actuators, battery, onboard compute
Onboard vs cloud compute; control-loop frequency constraints
Vision-based perception (camera-only) and neural control
Data flywheel: human demos, fleet learning, transfer across tasks
Safety, design language, and “Westworld” realism tradeoffs
Commercial deployment vs home deployment; Project Go Big
Industry landscape: hardware-only vs software-only vs full-stack
China robotics narrative and manufacturing vs capability debate
Ending OpenAI partnership; vertical integration strategy
Voice as next UI/form factor; AI-native devices beyond phones
Macro implications: labor displacement, abundance economics, purpose

In this episode, Nikhil Kamath speaks with Brett Adcock about humanoid robots approaching consumer costs and the race on data, safety, and autonomy. Adcock describes his path from software (Vettery) to electric aviation (Archer) and now humanoid robotics at Figure, arguing humanoids are the “ultimate general-purpose machine” because the world is built for the human form.

At a glance

WHAT IT’S REALLY ABOUT

Humanoid robots approaching consumer costs: the race on data, safety, and autonomy

  1. Brett Adcock describes his path from software (Vettery) to electric aviation (Archer) and now humanoid robotics at Figure, arguing humanoids are the “ultimate general-purpose machine” because the world is built for the human form.
  2. He breaks down Figure’s robot stack—electric actuators, torso battery + CPU/GPU compute, multi-camera vision, force/torque sensing—and explains why deep learning is essential given the enormous state/action space of a 40-joint body.
  3. A core theme is data: Figure trains on real human demonstrations and fleet learning, believes real-world physical interaction is a larger untapped dataset than the internet, and is building a large-scale “robot pretraining set” (Project Go Big).
  4. They discuss near-term commercialization (BMW pilot) vs. home use, safety around kids, competing approaches in the industry, China’s perceived lead, Figure’s decision to end the OpenAI partnership, and how ubiquitous humanoids could collapse costs of goods/services and reshape jobs and purpose.

IDEAS WORTH REMEMBERING

7 ideas

Humanoids are a full-stack problem—hardware and AI must be co-designed.

Adcock argues you can’t win with “hardware-only” bots or “software-only” autonomy; general-purpose performance requires tight integration of actuators, sensing, embedded compute, and learning systems that work end-to-end on the robot.

Deep learning unlocked humanoids because the state space is un-codeable.

With ~40 joints and continuous motion, the number of possible configurations explodes (he frames it as more states than atoms in the universe), making hand-coded approaches infeasible and pushing control toward neural networks.
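The combinatorics behind this claim are easy to sketch. As a purely illustrative assumption (the 40-joint figure is from the episode; the discretization and the ~10^80 atom estimate are not), discretizing each joint’s continuous range into just 100 positions already yields a configuration count on par with the estimated number of atoms in the observable universe:

```python
# Back-of-envelope sketch of the state-space explosion.
# Assumptions (illustrative, not from the episode): 100 discrete
# positions per joint, and ~10^80 atoms in the observable universe.
joints = 40
positions_per_joint = 100                 # coarse slicing of a continuous range

configurations = positions_per_joint ** joints   # 100^40 = 10^80
atoms_in_universe = 10 ** 80                     # common rough estimate

print(configurations >= atoms_in_universe)       # True
```

Finer discretization (or adding velocities and contact states) only makes the gap wider, which is why enumerating or hand-coding behaviors per configuration is infeasible.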

High-frequency control forces significant onboard compute today.

Figure runs control and many neural nets on-robot because stable motion/manipulation needs ~200 Hz closed-loop updates; cloud links are too slow for that layer, though higher-level planning can increasingly move offboard.
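A rough timing budget makes the onboard-compute point concrete. Only the ~200 Hz figure comes from the episode; the latency numbers below are assumed for comparison (wide-area round trips vary a lot, but tens of milliseconds is typical):

```python
# Illustrative control-loop timing budget.
# Assumed values: ~50 ms cloud round trip, ~2 ms on-robot inference.
control_hz = 200
cycle_budget_ms = 1000 / control_hz       # 5.0 ms per closed-loop update

cloud_round_trip_ms = 50                  # assumed typical WAN round trip
onboard_inference_ms = 2                  # assumed on-robot network pass

print(cloud_round_trip_ms <= cycle_budget_ms)    # False: cloud misses the deadline
print(onboard_inference_ms <= cycle_budget_ms)   # True: onboard fits
```

At a 5 ms cycle, a single cloud round trip overruns the deadline roughly tenfold, which is why the low-level control layer stays on-robot while slower, higher-level planning can move offboard.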

Vision is camera-first, similar to Tesla-style perception rather than lidar-heavy stacks.

Figure’s robots “reason in pixel space” using multiple cameras (including palms/hands and head), enabling egocentric perception and closed-loop manipulation without lidar, paired with joint-state inputs and action outputs.

Real-world physical data is the moat; synthetic and internet text aren’t enough for embodiment.

He claims the internet’s text tokens are effectively “tapped out,” while the physical world offers richer, larger data. Figure collects human demonstrations for navigation/manipulation and leverages fleet learning to improve autonomy.

Transfer learning across tasks is surprisingly strong—and central to scaling.

Adcock highlights that training on one task (e.g., logistics) can improve unrelated tasks (e.g., laundry), reinforcing the “one model improves many behaviors” dynamic needed for general-purpose robots.

Safety in homes—especially around kids—is not solved yet, even for proponents.

Despite testing a robot in his home, he would not let it roam freely with young children for “hours and weeks” yet; he frames home-grade autonomy as requiring a much higher safety track record.

WORDS WORTH SAVING

5 quotes

“Would you leave your kids with your humanoid?”

Nikhil Kamath

“We’re not there today. Like, I would not let my… robot roam free for hours and weeks right now with my young kids.”

Brett Adcock

“You can’t solve this with code. You have to use advanced AI, like neural nets.”

Brett Adcock

“The world was designed for humans… so our belief here is that the humanoid is… the ultimate general-purpose machine.”

Brett Adcock

“I just don’t believe it.”

Brett Adcock

QUESTIONS ANSWERED IN THIS EPISODE

5 questions

Project Go Big: what exact modalities (video, depth proxies, torque/force, audio, language prompts) are you capturing, and how will you label/structure it to be “internet-scale” for robotics?


On BMW: what task is Figure 02 doing on the line, what percent is autonomous vs supervised intervention, and what’s been the top failure mode (hardware, perception, planning, grasping)?


You criticize teleoperation demos as “deceiving.” What autonomy benchmarks would you propose so the public can compare humanoid capabilities fairly across companies?


Hands are a major bet for Figure. Where is the line between “human-like dexterity” and “sufficient industrial dexterity,” and how does that affect cost and reliability?


Home-first is a reversal of your earlier view. What specific capability milestones made you flip—navigation, deformable manipulation, long-horizon planning, or safety instrumentation?
