All-In Podcast
Jensen Huang: Why cheap chips still produce expensive tokens
With Groq LPUs alongside its GPUs, Nvidia routes each inference step to the right chip. Vera Rubin targets agentic racks; Huang's key metric is token cost, not the datacenter price tag.
At a glance
WHAT IT’S REALLY ABOUT
Jensen Huang on agents, physical AI, and the economics of inference growth
- Nvidia frames its evolution from a GPU vendor to an “AI factory” company, emphasizing disaggregated inference and heterogeneous compute (GPUs, CPUs, networking, storage processors, and Groq LPUs) orchestrated by an “operating system” layer (Dynamo).
- Huang argues the industry is entering an inference-and-agents era where compute demand has jumped ~10,000x in two years (generative → reasoning → agentic), making token cost-per-output and throughput—not sticker price of a datacenter—the real economic metric.
- He positions “physical AI” as a rare chance for tech to penetrate a ~$50T real-world economy (robots, factories, vehicles), with Nvidia’s three-computer stack (train, simulate/evaluate via Omniverse, and edge robotics compute) enabling that transition.
- The conversation addresses AI’s PR and policy challenges, urging balanced communication that warns without “scaring,” and arguing national security depends on broad US adoption and global diffusion of the American AI tech stack.
- Huang predicts rapid progress in robotics (3–5 years to broad productization), significant healthcare transformation via agents and biology models, and sustained need for deep vertical specialization as the moat in an agent-driven software world.
IDEAS WORTH REMEMBERING
5 ideas
Disaggregate inference to match each step to the right chip.
Huang describes inference as today’s most complex compute pipeline and argues performance improves when prefill/decode and related stages can be split across heterogeneous accelerators, motivating Groq LPUs alongside GPUs and other processors.
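The routing idea can be sketched in a few lines. This is an illustrative toy, not Nvidia's Dynamo API: the stage names, the compute-bound flag, and the accelerator labels are assumptions chosen to mirror Huang's prefill/decode example.

```python
# Toy sketch of disaggregated inference routing (illustrative only;
# stage names and accelerator labels are hypothetical, not a real API).
from dataclasses import dataclass

@dataclass
class Stage:
    name: str            # pipeline stage, e.g. "prefill" or "decode"
    compute_bound: bool  # prefill is compute-heavy; decode is latency/memory-bound

def route(stage: Stage) -> str:
    """Assign each stage the accelerator class that suits its bottleneck:
    compute-bound work to GPUs, latency-sensitive token decode to LPUs."""
    return "GPU" if stage.compute_bound else "LPU"

pipeline = [Stage("prefill", True), Stage("decode", False)]
assignments = {s.name: route(s) for s in pipeline}
# assignments == {"prefill": "GPU", "decode": "LPU"}
```

The point of the sketch is only the shape of the argument: once inference is split into stages, each stage can be matched to whichever processor handles its bottleneck best, rather than running the whole pipeline on one chip type.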
Judge AI infrastructure by cost-per-token (throughput/efficiency), not datacenter sticker price.
He claims a “$50B factory” can produce cheaper tokens than a cheaper build if it delivers dramatically higher throughput, and notes much of capex is land/power/cooling/networking that doesn’t scale down linearly with chip price.
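The arithmetic behind this claim can be made concrete. All figures below are hypothetical placeholders, not numbers from the conversation; the sketch only shows why amortized cost per token, not capex, decides the comparison.

```python
# Illustrative cost-per-token arithmetic (all dollar and throughput
# figures are made up for the example, not from the podcast).
def cost_per_token(capex_usd: float, annual_opex_usd: float,
                   tokens_per_second: float, years: float = 5.0) -> float:
    """Amortize total build + operating cost over lifetime token output."""
    seconds = years * 365 * 24 * 3600
    total_cost = capex_usd + annual_opex_usd * years
    return total_cost / (tokens_per_second * seconds)

# A pricier build with much higher throughput yields cheaper tokens:
# $50B factory at 5B tokens/s beats a $20B factory at 1B tokens/s.
expensive_build = cost_per_token(50e9, 2e9, tokens_per_second=5e9)
cheaper_build = cost_per_token(20e9, 2e9, tokens_per_second=1e9)
assert expensive_build < cheaper_build
```

This also illustrates the land/power/cooling point: opex and fixed infrastructure appear in both builds, so halving chip spend does not halve total cost, while throughput scales the denominator directly.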
Agentic workloads change the datacenter: memory, storage, and tool-use become first-class constraints.
Running agents means constant access to working/long-term memory, heavy storage I/O, tool invocation, and multi-model orchestration, expanding the bill of materials beyond GPUs into storage processors (e.g., BlueField), networking, and specialized accelerators.
“OpenClaw”/agent frameworks resemble a new operating system for modern computing.
Huang highlights agent systems’ OS-like primitives—memory, scheduling, I/O, resource management, tool/skill APIs—and argues this makes “personal AI computers” viable across desktop, enterprise, and embedded contexts, but requires governance/security constraints.
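A minimal toy makes the OS analogy tangible, including the governance point. The interface below is invented for illustration; no real agent framework is being quoted.

```python
# Toy "agent OS" showing the primitives the summary lists: memory,
# a tool/skill registry, and a policy gate. All names are invented.
class AgentRuntime:
    def __init__(self):
        self.memory = {}   # working/long-term memory store
        self.tools = {}    # registered tool/skill APIs

    def register_tool(self, name, fn, allowed=True):
        # Governance/security: a tool can be registered yet blocked by policy.
        self.tools[name] = (fn, allowed)

    def call_tool(self, name, *args):
        fn, allowed = self.tools[name]
        if not allowed:
            raise PermissionError(f"tool {name!r} blocked by policy")
        return fn(*args)

runtime = AgentRuntime()
runtime.register_tool("add", lambda a, b: a + b)
runtime.register_tool("shell", lambda cmd: None, allowed=False)
runtime.memory["last_result"] = runtime.call_tool("add", 2, 3)  # -> 5
```

Like a kernel mediating syscalls, the runtime sits between the model and its tools, which is where the governance and security constraints Huang mentions would be enforced.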
Nvidia’s strategic filter: do the insanely hard things that map to company superpowers.
He says Nvidia leans into problems that are difficult, unprecedented, and painful—because that combination reduces competition and creates durable advantage when executed with full-stack integration.
WORDS WORTH SAVING
5 quotes
“You should not equate the price of the factory and the price of the tokens.”
— Jensen Huang
“Even when the chips are free, it’s not cheap enough.”
— Jensen Huang
“When we went from generative to reasoning… about a hundred times. … reasoning to agentic… another hundred times.”
— Jensen Huang
“Open models… and proprietary… These two things are not A or B, it’s A and B.”
— Jensen Huang
“Warning is good, scaring is less good.”
— Jensen Huang
High-quality AI-generated summary created from a speaker-labeled transcript.