All-In Podcast
Jensen Huang: Why cheap chips still produce expensive tokens
With Groq added to its lineup, Nvidia aims to route each inference step to the best-suited chip. Vera Rubin targets heterogeneous agentic racks, and Huang's key metric is cost per token, not the data center price tag.
CHAPTERS
Jensen Huang at GTC: Nvidia’s shift from GPU maker to “AI factory” company
The hosts welcome Jensen Huang and frame the conversation around Nvidia’s rapid expansion and influence across the AI ecosystem. Jensen sets the tone: Nvidia’s roadmap is laid out years ahead at GTC, and the company now thinks in terms of end-to-end AI computing factories rather than standalone GPUs.
- GTC as the venue where Nvidia previews strategy years in advance
- Reframing Nvidia from a GPU company to an AI factory/infrastructure company
- The coming computing era is defined by diverse workloads, not a single model type
- Nvidia’s ambition is to provide the full stack across the AI lifecycle
Groq acquisition and “disaggregated inference” as the driver of the inference explosion
Jensen explains why inference has become the most complex computing problem today and why disaggregating the inference pipeline matters. He describes the “Dynamo” operating system concept for AI factories and how adding Groq enables placing the right workload on the right silicon.
- Dynamo: Nvidia’s “operating system” for the AI factory concept
- Disaggregated inference: splitting inference stages (e.g., prefill and decode) across different GPUs/chips
- Inference pipelines are complex and require heterogeneous compute resources
- Groq added as another processor option in a broader heterogeneous system
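A toy sketch of the "right workload on the right silicon" idea is below. It assumes the common prefill/decode split used in disaggregated serving (prefill processes the whole prompt in parallel; decode emits tokens one at a time). The `pick_pool` rule and the pool names are hypothetical placeholders, not Dynamo's actual scheduler or a description of how Groq hardware is deployed.

```python
from dataclasses import dataclass

# Hypothetical model of a request's stages; pool names are placeholders and do
# not describe how Dynamo or Groq hardware is actually scheduled.
@dataclass(frozen=True)
class InferenceStage:
    name: str
    compute_bound: bool      # heavy parallel math over the whole prompt
    latency_sensitive: bool  # each step gates the next generated token


def pick_pool(stage: InferenceStage) -> str:
    """Map one stage to the pool whose silicon suits it best (toy rule)."""
    if stage.compute_bound:
        return "gpu_throughput_pool"
    if stage.latency_sensitive:
        return "low_latency_pool"  # where a Groq-style accelerator could sit
    return "general_pool"


def plan(stages: list[InferenceStage]) -> dict[str, str]:
    """Assign every stage of one request to a pool."""
    return {stage.name: pick_pool(stage) for stage in stages}


request = [
    InferenceStage("prefill", compute_bound=True, latency_sensitive=False),
    InferenceStage("decode", compute_bound=False, latency_sensitive=True),
]
print(plan(request))
```

The point of the split is that the two stages stress hardware differently, so a heterogeneous system can place each on the silicon that handles it most efficiently instead of running both on one chip type.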
From LLMs to agents: why data centers now need heterogeneous racks (Vera Rubin era)
The discussion moves from plain LLM serving to agentic workloads that hammer memory, storage, tools, and multi-model orchestration. Jensen argues this diversity is why next-gen systems like Vera Rubin are designed for heterogeneous compute, networking, and storage processing at scale.
- Agentic workloads: working memory, long-term memory, tool use, multi-agent collaboration
- Data center now runs many model types: large/small LLMs, diffusion, autoregressive, etc.
- Vera Rubin designed to handle diverse, mixed workloads
- TAM (total addressable market) expansion: more racks devoted to storage processors, networking, CPUs, and Groq-like accelerators
Three-computer architecture: training, simulation/Omniverse, and edge robotics
Jensen outlines a simple mental model: AI requires separate systems for training, physics-based simulation/evaluation, and edge deployment. He highlights Omniverse as the “virtual gym” for robots and stresses the breadth of edge form factors—from cars to teddy bears to telecom base stations.
- Three computers: training, simulation/evaluation (Omniverse), and edge/robotics computer
- Physics-accurate simulation is essential for robots and autonomy
- Edge AI spans cars, robots, consumer devices, and embedded systems
- Telecom base stations as a future extension of AI infrastructure (a $2T industry transformation)
Token economics vs. factory sticker price: why expensive factories can yield cheaper tokens
Responding to concerns about Nvidia systems costing more than alternatives, Jensen argues price comparisons confuse capex with unit economics. The key metric is token cost, and Nvidia claims system-level efficiency and throughput can make a bigger factory the cheapest way to produce inference.
- Don’t equate factory capex with cost per token
- Non-chip costs (land, power, shell, cooling, networking) dominate data center budgets
- Throughput/efficiency can overwhelm headline hardware price differences
- “Even free chips aren’t cheap enough” if they lag state-of-the-art system performance
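The capex-versus-unit-economics argument reduces to simple arithmetic: amortize the whole factory (chips plus everything else) over its life, add operating costs, and divide by tokens produced. The sketch below uses invented inputs and a four-year straight-line amortization, both assumptions rather than figures from the conversation, to show how a system whose chips cost nothing can still yield more expensive tokens when its throughput lags.

```python
from dataclasses import dataclass

# Every figure below is a made-up placeholder, not a number from the episode.
@dataclass(frozen=True)
class Factory:
    name: str
    chip_capex: float       # dollars spent on accelerators
    other_capex: float      # land, power infrastructure, shell, cooling, networking
    annual_opex: float      # electricity and operations per year (dollars)
    tokens_per_year: float  # sustained output at this system's throughput
    years: int = 4          # straight-line amortization period (assumption)

    def cost_per_million_tokens(self) -> float:
        """Annualized capex plus opex, divided by annual token output."""
        yearly_cost = (self.chip_capex + self.other_capex) / self.years + self.annual_opex
        return yearly_cost * 1_000_000 / self.tokens_per_year


# Same site, power, and cooling for both; only the chips and throughput differ.
fast = Factory("high-throughput system", chip_capex=6e9, other_capex=4e9,
               annual_opex=1e9, tokens_per_year=5e15)
free = Factory("free chips, 1/5 throughput", chip_capex=0.0, other_capex=4e9,
               annual_opex=1e9, tokens_per_year=1e15)

for f in (fast, free):
    print(f"{f.name}: ${f.cost_per_million_tokens():.2f} per million tokens")
```

With these placeholder inputs the high-throughput system comes out at about $0.70 per million tokens and the free-chip system at about $2.00, because the site, power, and operating costs are paid either way but are spread over five times fewer tokens.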
How Jensen makes decisions: choosing problems that are ‘insanely hard’ and uniquely Nvidia
Jensen describes his CEO framework: pursue only problems that are extremely difficult, not yet solved, and aligned with Nvidia’s special capabilities. He emphasizes that meaningful invention requires enduring “pain and suffering,” and that difficulty is a competitive moat.
- CEO’s job: define vision and strategy based on real technological constraints
- Avoid “easy” markets that attract many competitors
- Target never-before-done, high-difficulty problems that match company strengths
- Enduring hardship is part of building defensible breakthroughs
Physical AI and digital biology: Nvidia’s long-tail bets hitting inflection points
Jensen argues physical AI is the tech industry’s first real shot at a $50T market and says Nvidia’s decade-long investment is now compounding into a multi-billion-dollar business. He also predicts a near-term “ChatGPT moment” for digital biology as AI begins to represent genes, proteins, and cells.
- Physical AI positioned as a $50T opportunity; Nvidia started ~10 years ago
- Physical AI now approaching ~$10B/year and accelerating
- Digital biology nearing a major inflection: representing genes/proteins/cells dynamically
- Agriculture, healthcare, and industrial categories cited as imminent beneficiaries
OpenClaw and the rise of personal AI on the desktop: agents as the new operating system
The panel discusses hobbyist momentum and local workstations, then Jensen frames OpenClaw as a cultural and architectural breakthrough that popularized useful agents. He describes agent systems as a reinvention of computing—memory, scheduling, I/O, and “skills” forming a new OS-like blueprint that can run everywhere.
- Three recent inflections: generative (ChatGPT), reasoning, then agentic (Claude Code → OpenClaw)
- OpenClaw popularizes what agents can do beyond enterprise-only tools
- Agent systems resemble an operating system: memory, resource management, scheduling, I/O, skills/APIs
- Governance and security are essential (prevent agents from combining sensitive access, code execution, and outbound comms unchecked)
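One way to make the governance point concrete is a capability check at the moment an agent is granted tools. The sketch below is an assumption-laden illustration, not a mechanism described for Nvidia's stack or OpenClaw: it lets an agent hold any two of the three capabilities named above, but requires explicit human approval before all three are combined. The `Capability` names and `grant_is_allowed` function are hypothetical.

```python
from enum import Enum, auto


class Capability(Enum):
    SENSITIVE_DATA = auto()  # can read private or regulated data
    CODE_EXECUTION = auto()  # can run arbitrary code
    OUTBOUND_COMMS = auto()  # can send email, call external APIs, etc.


# The combination the conversation warns against (hypothetical encoding):
# all three capabilities held by one agent at the same time.
RISKY_COMBINATION = frozenset(Capability)


def grant_is_allowed(requested: set[Capability], human_approved: bool = False) -> bool:
    """Allow any subset of capabilities except the full risky trio,
    which is only allowed with explicit human approval."""
    if RISKY_COMBINATION <= requested:
        return human_approved
    return True


print(grant_is_allowed({Capability.SENSITIVE_DATA, Capability.CODE_EXECUTION}))  # True
print(grant_is_allowed(set(Capability)))  # False
```

Checking at grant time, rather than per action, keeps the rule simple; a real deployment would likely also log and scope each capability.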
AI’s PR crisis and policymaking: countering doomer narratives and Anthropic comms lessons
Jensen urges proactive education of policymakers: AI is software, not a biological or conscious entity, and the industry understands more than it sometimes claims. He praises Anthropic’s technical rigor and safety culture but warns that ‘scaring’ narratives can backfire and slow adoption, weakening national competitiveness.
- Policymakers need current, accurate briefings; don’t let extremism shape regulation
- AI is not conscious/alien/biological; it’s software, which matters for public framing
- Anthropic praised for safety focus, but communications should be balanced and evidence-based
- Biggest national risk: domestic under-adoption while other countries diffuse AI quickly
Enterprise ROI and the compute curve: generative → reasoning → agentic as 10,000× demand
Jensen and the hosts argue that compute demand has surged dramatically as AI moved from token generation to reasoning and then to agents that do real work. Jensen claims agentic systems will drive willingness to pay, creating the conditions for a massive inference buildout and sustained demand growth.
- Compute jumps: ~100× from generative to reasoning, ~100× again to agentic (~10,000× in 2 years)
- People pay more for work done than for information answers
- Agentic tools boost software engineering and knowledge work productivity
- Inference scaling described as heading toward “million×” expansion
Tokens as labor leverage: budgeting AI usage like CAD tools for elite engineers
Jensen proposes a striking internal benchmark: if a highly paid engineer isn’t consuming substantial token spend, the company should worry. The panel explores how agents remove perceived constraints (‘too hard,’ ‘too slow,’ ‘need more people’) and shift engineering toward specifying, evaluating, and orchestrating AI collaborators.
- Token spend reframed as productivity infrastructure, not a discretionary perk
- Benchmark: meaningful token usage should be a large fraction of a top engineer’s compensation
- Agents change engineering from coding to specifying architectures, constraints, and evaluation
- Future: engineers managing “hundreds of agents” and focusing on creativity and judgment
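The compensation benchmark can be expressed as a simple ratio check. In the sketch below, the `min_ratio` threshold of 0.25, the `Engineer` fields, and the team data are all hypothetical placeholders chosen for illustration; the conversation only says token spend should be "a large fraction" of a top engineer's pay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engineer:
    name: str
    annual_compensation: float  # dollars
    annual_token_spend: float   # dollars of model/agent usage


def underusing_ai(team: list[Engineer], min_ratio: float = 0.25) -> list[str]:
    """Return names whose token spend is below min_ratio of their pay.
    min_ratio is a hypothetical threshold, not a figure from the episode."""
    return [
        e.name
        for e in team
        if e.annual_token_spend < min_ratio * e.annual_compensation
    ]


# Hypothetical team data.
team = [
    Engineer("alice", 400_000, 150_000),
    Engineer("bob", 400_000, 5_000),
]
print(underusing_ai(team))  # ['bob']
```

Framing the check as a ratio to compensation, rather than an absolute budget, matches the idea of treating tokens as leverage on an engineer's time.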
Open source, decentralized training, and global diffusion: why both closed and open models matter
The conversation covers distributed/community training examples and the strategic role of open models. Jensen argues models are a technology layer, not merely a service: proprietary frontier models and open-weight models will coexist, with open models enabling domain control and specialization across industries.
- Decentralized training experiments likened to Folding@home for model building
- Closed frontier models thrive for general use; open models are crucial for specialization/control
- Enterprises need tunable models to encode domain expertise and governance requirements
- Practical strategy: route to best proprietary model now, migrate/optimize with open models over time
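The "route now, migrate later" strategy can be sketched as a per-task router that defaults to a frontier API and switches a task to a tuned open-weight model once that model clears an evaluation bar. The model names, task names, scores, and the 0.9 `quality_bar` below are hypothetical placeholders, not recommendations from the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Send each task to a frontier API by default and switch it to a tuned
    open-weight model once that model clears an evaluation bar.
    Model names and the scoring scheme are hypothetical placeholders."""
    default_model: str = "frontier-api-model"
    quality_bar: float = 0.9
    open_models: dict[str, tuple[str, float]] = field(default_factory=dict)

    def register_open_model(self, task: str, model: str, eval_score: float) -> None:
        # Record the best open candidate for a task and its eval score.
        self.open_models[task] = (model, eval_score)

    def route(self, task: str) -> str:
        candidate = self.open_models.get(task)
        if candidate is not None and candidate[1] >= self.quality_bar:
            return candidate[0]
        return self.default_model


router = Router()
router.register_open_model("claims-triage", "tuned-open-model", 0.93)
router.register_open_model("contract-review", "tuned-open-model-v0", 0.71)
print(router.route("claims-triage"))    # tuned-open-model
print(router.route("contract-review"))  # frontier-api-model
```

Gating the switch on a per-task evaluation score lets specialization happen one domain at a time while everything else keeps using the proprietary model.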
Geopolitics and supply chain resilience: China access, Taiwan manufacturing, Iran/Israel families
Jensen addresses export controls and argues US national security benefits from widespread adoption of the American tech stack globally. He discusses efforts to reindustrialize in the US with Taiwanese partners, diversify manufacturing beyond Taiwan, and manage regional risks—from Middle East instability to helium supply concerns.
- Goal: American tech stack broadly used globally, even if models differ by country/region
- China market access framed as strategic; licensing and supply chain ramping discussed
- Taiwan: accelerate US reindustrialization (AZ/TX/CA), diversify supply chain, and practice restraint
- Middle East: employee family concerns; commitment to Israel and regional long-term AI expansion; helium risk noted but buffered
Autonomy, robotics, space and healthcare: platform strategy, timelines, and what comes next
Jensen outlines Nvidia’s autonomy strategy: enable every automaker with a full-stack platform (training, simulation, in-car compute) without building cars itself, and explains how Nvidia competes with customers that also build their own chips. He then covers data centers in space, healthcare agentics and surgical robotics, and predicts humanoid robotics will become real products within roughly 3–5 years, with China’s hardware supply chain strength as a key factor.
- Self-driving stack: training + simulation + vehicle compute + safety OS; reasoning AV approach (e.g., decomposing scenarios)
- Competition with customers (TPUs/ASICs) handled via system-level advantage, portability, and full-stack delivery
- Space compute: already using CUDA in satellites; major challenge is cooling (radiation-only), longer-term exploration
- Healthcare: AI biology for discovery, agents for clinical workflows, physical AI for robotics/instruments
- Humanoid robotics: real products expected in ~3–5 years; China strong in motors/magnets/rare-earth supply chain
AI revenue potential, moats, and advice for the next generation in an agentic world
Jensen predicts model-company revenues could exceed even aggressive forecasts because enterprise software firms may become resellers of model tokens, expanding distribution. He argues application-layer moats come from deep vertical specialization and customer-connected flywheels, then closes with guidance for young people: master AI tools, build deep math/science foundations, and treat language as the new programming interface—while expecting jobs to transform rather than simply vanish.
- Revenue outlook: model/agent companies plus enterprise channels could scale faster than expected
- Moat for AI apps: deep domain specialization + early customer connection to accelerate improvement
- Jobs shift from tasks to purpose; radiology example: AI adoption rose alongside demand for radiologists
- Advice: study deep math/science and language skills; become expert at using AI (specifying, guiding, evaluating)