All-In Podcast
Jensen Huang: Why cheap chips still produce expensive tokens
With Groq LPUs alongside its GPUs, Nvidia routes each inference step to the right chip. Vera Rubin targets agentic racks; Huang's key metric is token cost, not the datacenter price tag.
At a glance
WHAT IT’S REALLY ABOUT
Jensen Huang on agents, physical AI, and the economics of inference growth
- Nvidia frames its evolution from a GPU vendor to an “AI factory” company, emphasizing disaggregated inference and heterogeneous compute (GPUs, CPUs, networking, storage processors, and Groq LPUs) orchestrated by an “operating system” layer (Dynamo).
- Huang argues the industry is entering an inference-and-agents era where compute demand has jumped ~10,000x in two years (generative → reasoning → agentic), making token cost-per-output and throughput—not sticker price of a datacenter—the real economic metric.
- He positions “physical AI” as a rare chance for tech to penetrate a ~$50T real-world economy (robots, factories, vehicles), with Nvidia’s three-computer stack (train, simulate/evaluate via Omniverse, and edge robotics compute) enabling that transition.
- The conversation addresses AI’s PR and policy challenges, urging balanced communication that warns without “scaring,” and arguing national security depends on broad US adoption and global diffusion of the American AI tech stack.
- Huang predicts rapid progress in robotics (3–5 years to broad productization), significant healthcare transformation via agents and biology models, and sustained need for deep vertical specialization as the moat in an agent-driven software world.
IDEAS WORTH REMEMBERING
5 ideas
Disaggregate inference to match each step to the right chip.
Huang describes inference as today’s most complex compute pipeline and argues performance improves when prefill/decode and related stages can be split across heterogeneous accelerators, motivating Groq LPUs alongside GPUs and other processors.
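The routing idea can be sketched in a few lines. This is an illustrative toy, not Nvidia's Dynamo API: the stage names, the compute-bound flag, and the accelerator labels are assumptions chosen to mirror Huang's prefill/decode example.

```python
# Toy sketch of disaggregated inference routing (illustrative only;
# stage names and accelerator labels are hypothetical, not a real API).
from dataclasses import dataclass

@dataclass
class Stage:
    name: str            # pipeline stage, e.g. "prefill" or "decode"
    compute_bound: bool  # prefill is compute-heavy; decode is latency/memory-bound

def route(stage: Stage) -> str:
    """Assign each stage the accelerator class that suits its bottleneck:
    compute-bound work to GPUs, latency-sensitive token decode to LPUs."""
    return "GPU" if stage.compute_bound else "LPU"

pipeline = [Stage("prefill", True), Stage("decode", False)]
assignments = {s.name: route(s) for s in pipeline}
# assignments == {"prefill": "GPU", "decode": "LPU"}
```

The point of the sketch is only the shape of the argument: once inference is split into stages, each stage can be matched to whichever processor handles its bottleneck best, rather than running the whole pipeline on one chip type.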
Judge AI infrastructure by cost-per-token (throughput/efficiency), not datacenter sticker price.
He claims a “$50B factory” can produce cheaper tokens than a cheaper build if it delivers dramatically higher throughput, and notes much of capex is land/power/cooling/networking that doesn’t scale down linearly with chip price.
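The arithmetic behind this claim can be made concrete. All figures below are hypothetical placeholders, not numbers from the conversation; the sketch only shows why amortized cost per token, not capex, decides the comparison.

```python
# Illustrative cost-per-token arithmetic (all dollar and throughput
# figures are made up for the example, not from the podcast).
def cost_per_token(capex_usd: float, annual_opex_usd: float,
                   tokens_per_second: float, years: float = 5.0) -> float:
    """Amortize total build + operating cost over lifetime token output."""
    seconds = years * 365 * 24 * 3600
    total_cost = capex_usd + annual_opex_usd * years
    return total_cost / (tokens_per_second * seconds)

# A pricier build with much higher throughput yields cheaper tokens:
# $50B factory at 5B tokens/s beats a $20B factory at 1B tokens/s.
expensive_build = cost_per_token(50e9, 2e9, tokens_per_second=5e9)
cheaper_build = cost_per_token(20e9, 2e9, tokens_per_second=1e9)
assert expensive_build < cheaper_build
```

This also illustrates the land/power/cooling point: opex and fixed infrastructure appear in both builds, so halving chip spend does not halve total cost, while throughput scales the denominator directly.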
Agentic workloads change the datacenter: memory, storage, and tool-use become first-class constraints.
Running agents means constant access to working/long-term memory, heavy storage I/O, tool invocation, and multi-model orchestration, expanding the bill of materials beyond GPUs into storage processors (e.g., BlueField), networking, and specialized accelerators.
“OpenClaw”/agent frameworks resemble a new operating system for modern computing.
Huang highlights agent systems’ OS-like primitives—memory, scheduling, I/O, resource management, tool/skill APIs—and argues this makes “personal AI computers” viable across desktop, enterprise, and embedded contexts, but requires governance/security constraints.
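A minimal toy makes the OS analogy tangible, including the governance point. The interface below is invented for illustration; no real agent framework is being quoted.

```python
# Toy "agent OS" showing the primitives the summary lists: memory,
# a tool/skill registry, and a policy gate. All names are invented.
class AgentRuntime:
    def __init__(self):
        self.memory = {}   # working/long-term memory store
        self.tools = {}    # registered tool/skill APIs

    def register_tool(self, name, fn, allowed=True):
        # Governance/security: a tool can be registered yet blocked by policy.
        self.tools[name] = (fn, allowed)

    def call_tool(self, name, *args):
        fn, allowed = self.tools[name]
        if not allowed:
            raise PermissionError(f"tool {name!r} blocked by policy")
        return fn(*args)

runtime = AgentRuntime()
runtime.register_tool("add", lambda a, b: a + b)
runtime.register_tool("shell", lambda cmd: None, allowed=False)
runtime.memory["last_result"] = runtime.call_tool("add", 2, 3)  # -> 5
```

Like a kernel mediating syscalls, the runtime sits between the model and its tools, which is where the governance and security constraints Huang mentions would be enforced.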
Nvidia’s strategic filter: do the insanely hard things that map to company superpowers.
He says Nvidia leans into problems that are difficult, unprecedented, and painful—because that combination reduces competition and creates durable advantage when executed with full-stack integration.
WORDS WORTH SAVING
5 quotes
“You should not equate the price of the factory and the price of the tokens.”
— Jensen Huang
“Even when the chips are free, it’s not cheap enough.”
— Jensen Huang
“When we went from generative to reasoning… about a hundred times. … reasoning to agentic… another hundred times.”
— Jensen Huang
“Open models… and proprietary… These two things are not A or B, it’s A and B.”
— Jensen Huang
“Warning is good, scaring is less good.”
— Jensen Huang
High-quality AI-generated summary created from a speaker-labeled transcript.