
Jensen Huang: Nvidia's Future, Physical AI, Rise of the Agent, Inference Explosion, AI PR Crisis
Jason Calacanis (host), Chamath Palihapitiya (host), David Sacks (host), David Friedberg (host), Jensen Huang (guest)
Jensen Huang on agents, physical AI, and inference growth economics
Nvidia frames its evolution from a GPU vendor to an “AI factory” company, emphasizing disaggregated inference and heterogeneous compute (GPUs, CPUs, networking, storage processors, and Groq LPUs) orchestrated by an “operating system” layer (Dynamo).
Huang argues the industry is entering an inference-and-agents era in which compute demand has jumped ~10,000x in two years (generative → reasoning → agentic), making cost per token and throughput, not the sticker price of a datacenter, the real economic metrics.
He positions “physical AI” as a rare chance for tech to penetrate a ~$50T real-world economy (robots, factories, vehicles), with Nvidia’s three-computer stack (train, simulate/evaluate via Omniverse, and edge robotics compute) enabling that transition.
The conversation addresses AI’s PR and policy challenges, urging balanced communication that warns without “scaring,” and arguing national security depends on broad US adoption and global diffusion of the American AI tech stack.
Huang predicts rapid progress in robotics (3–5 years to broad productization), significant healthcare transformation via agents and biology models, and sustained need for deep vertical specialization as the moat in an agent-driven software world.
Key Takeaways
Disaggregate inference to match each step to the right chip.
Huang describes inference as today’s most complex compute pipeline and argues performance improves when prefill/decode and related stages can be split across heterogeneous accelerators, motivating Groq LPUs alongside GPUs and other processors.
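As a rough illustration of what stage-level routing could look like, here is a minimal Python sketch. The device pools, stage attributes, and placement rule are invented for illustration only and are not Dynamo's actual scheduling API.

```python
from dataclasses import dataclass

# Hypothetical device pools; a real scheduler exposes far richer placement
# controls. Pool names and routing rules here are illustrative assumptions.
DEVICE_POOLS = {
    "gpu_prefill": "high-batch matrix math (prompt prefill)",
    "lpu_decode":  "low-latency sequential decode",
    "cpu_tooling": "tool calls, I/O, orchestration glue",
}

@dataclass
class Stage:
    name: str
    compute_bound: bool      # prefill-like: large parallel matmuls
    latency_critical: bool   # decode-like: one token at a time

def route(stage: Stage) -> str:
    """Toy placement rule: match each inference stage to the pool
    whose hardware profile fits it best."""
    if stage.compute_bound:
        return "gpu_prefill"
    if stage.latency_critical:
        return "lpu_decode"
    return "cpu_tooling"

pipeline = [
    Stage("prefill", compute_bound=True, latency_critical=False),
    Stage("decode", compute_bound=False, latency_critical=True),
    Stage("tool_call", compute_bound=False, latency_critical=False),
]

for s in pipeline:
    print(f"{s.name:>9} -> {route(s)}")
```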
Judge AI infrastructure by cost-per-token (throughput/efficiency), not datacenter sticker price.
He claims a “$50B factory” can produce cheaper tokens than a cheaper build if it delivers dramatically higher throughput, and notes much of capex is land/power/cooling/networking that doesn’t scale down linearly with chip price.
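The arithmetic behind this claim is easy to sanity-check. The sketch below compares two hypothetical factories on cost per million tokens; every figure (capex, opex, throughput, service life) is an assumption chosen only to show the shape of the trade-off, not actual datacenter economics.

```python
# Toy comparison of two hypothetical AI factories. All figures are made up.

def cost_per_million_tokens(capex_usd, opex_usd_per_year, tokens_per_second, years=5):
    """Amortize capex over a service life, add opex, divide by total tokens."""
    total_cost = capex_usd + opex_usd_per_year * years
    total_tokens = tokens_per_second * 3600 * 24 * 365 * years
    return total_cost / total_tokens * 1e6

cheap_build = cost_per_million_tokens(
    capex_usd=20e9, opex_usd_per_year=1e9, tokens_per_second=2e8)
expensive_build = cost_per_million_tokens(
    capex_usd=50e9, opex_usd_per_year=1.5e9, tokens_per_second=1e9)

print(f"cheap build:     ${cheap_build:.3f} / M tokens")      # ~$0.793
print(f"expensive build: ${expensive_build:.3f} / M tokens")  # ~$0.365
# Under these assumptions the pricier factory wins on $/token: its
# throughput is 5x while its total cost is only ~2.3x.
```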
Agentic workloads change the datacenter: memory, storage, and tool-use become first-class constraints.
Running agents means constant access to working and long-term memory, heavy storage I/O, tool invocation, and multi-model orchestration, expanding the bill of materials beyond GPUs into storage and networking processors.
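To see why memory and tool I/O become first-class constraints, consider a stripped-down agent step; the model call, tool runner, and memory stores below are hypothetical stand-ins, not any particular framework's API.

```python
# Minimal agent-step sketch: every step touches memory reads/writes and
# tool I/O, not just GPU inference. All interfaces here are placeholders.

working_memory: list[str] = []          # short-term scratchpad (RAM-like)
long_term_store: dict[str, str] = {}    # persistent memory (storage I/O)

def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned "tool request".
    return "TOOL:search('GPU cost per token')"

def run_tool(request: str) -> str:
    # Placeholder tool runner; a real agent dispatches to search, code
    # execution, databases, etc., each adding I/O outside the GPU.
    return f"result-of[{request}]"

def agent_step(goal: str) -> None:
    context = "\n".join(working_memory[-10:])     # memory read
    action = call_model(f"{goal}\n{context}")     # model inference
    if action.startswith("TOOL:"):
        observation = run_tool(action)            # tool I/O
        working_memory.append(observation)        # memory write
        long_term_store[goal] = observation       # storage write

agent_step("estimate inference costs")
print(working_memory, long_term_store)
```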
Agent frameworks such as “OpenClaw” resemble a new operating system for modern computing.
Huang highlights agent systems’ OS-like primitives—memory, scheduling, I/O, resource management, tool/skill APIs—and argues this makes “personal AI computers” viable across desktop, enterprise, and embedded contexts, but requires governance/security constraints.
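One way such governance constraints might look in practice is OS-style access control on tool invocation; the permission table and tool names in this sketch are invented for illustration.

```python
# Sketch of OS-style access control for agent tool use. The permission
# model, agent names, and tools are all hypothetical.

PERMISSIONS = {
    "research_agent": {"web_search", "read_files"},
    "finance_agent":  {"read_files", "payments"},
}

def invoke_tool(agent: str, tool: str, args: dict) -> None:
    allowed = PERMISSIONS.get(agent, set())
    if tool not in allowed:
        raise PermissionError(f"{agent} may not call {tool}")
    print(f"{agent} -> {tool}({args})")  # a real system would dispatch here

invoke_tool("research_agent", "web_search", {"q": "Omniverse"})
try:
    invoke_tool("research_agent", "payments", {"amount": 100})
except PermissionError as e:
    print("blocked:", e)
```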
Nvidia’s strategic filter: do the insanely hard things that map to company superpowers.
He says Nvidia leans into problems that are difficult, unprecedented, and painful—because that combination reduces competition and creates durable advantage when executed with full-stack integration.
Open source and proprietary models will coexist; enterprises need controllable open models.
Huang argues consumers benefit from best-in-class closed models, while industries require domain capture and control that often depends on open weights/models—so startups can route between frontier APIs and specialized fine-tunes to balance capability and cost.
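A minimal sketch of that routing decision, assuming made-up model names, prices, and an in-domain heuristic, none of which come from the episode:

```python
# Toy router between a closed frontier API and a self-hosted open-weights
# fine-tune. Model names, prices, and the routing rule are hypothetical.

MODELS = {
    "frontier_api":  {"usd_per_m_tokens": 15.0, "domain_tuned": False},
    "open_finetune": {"usd_per_m_tokens": 1.0,  "domain_tuned": True},
}

def pick_model(task_domain: str, budget_usd_per_m: float) -> str:
    """Prefer the cheap domain-tuned model when the task is in-domain
    or the budget is tight; fall back to the frontier model otherwise."""
    frontier_price = MODELS["frontier_api"]["usd_per_m_tokens"]
    if task_domain == "specialized" or budget_usd_per_m < frontier_price:
        return "open_finetune"
    return "frontier_api"

print(pick_model("specialized", budget_usd_per_m=20.0))  # open_finetune
print(pick_model("general", budget_usd_per_m=20.0))      # frontier_api
```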
The lasting application-layer moat is deep vertical specialization plus customer-connected learning loops.
He predicts general models will be augmented by specialized sub-agents trained on proprietary domain data, and that connecting agents to real customers early accelerates a flywheel of product improvement and defensibility.
Notable Quotes
“You should not equate the price of the factory and the price of the tokens.”
— Jensen Huang
“Even when the chips are free, it’s not cheap enough.”
— Jensen Huang
“When we went from generative to reasoning… about a hundred times. … reasoning to agentic… another hundred times.”
— Jensen Huang
“Open models… and proprietary… These two things are not A or B, it’s A and B.”
— Jensen Huang
“Warning is good, scaring is less good.”
— Jensen Huang
Questions Answered in This Episode
What exactly changes in the inference pipeline when you move to “disaggregated inference,” and which stages are best-suited to Groq LPUs versus Nvidia GPUs?
You mentioned adding Groq to ~25% of Vera Rubin deployments—what workload characteristics determine that allocation, and how should operators measure the break-even point?
If agents are the new operating system, what are the minimum governance controls needed so agents can use tools and sensitive data safely without crippling usefulness?
You said compute rose ~10,000x from generative → reasoning → agentic—what are the concrete bottlenecks now: memory bandwidth, storage I/O, networking, or model efficiency?
How should a company set internal “token budgets” per employee (like your $250k per $500k engineer heuristic) without incentivizing waste or shallow prompting?
Transcript Preview
Special episode this week. We've preempted the weekly show, and there's only three people we preempt the show for, President Trump, Jesus, and Jensen.
[laughs]
[laughs]
And, uh, I'll let you pick which order we do that. Uh, but what an amazing run you've had and, and a great event. Um-
Every industry is here. Every tech company is here. Every AI company is here. Incredible. Incredible.
Extraordinary. And one of the great announcements of the past year has been Groq. When you made the purchase of Groq, did you realize how insufferable Cha- Chamath would become? [laughs]
[laughs]
I had, I had an inkling that, that, [laughs] that, uh-
'Cause we're his friends. We have to deal with him-
I know
... every week.
I know it. I know it.
You had to deal with him for the six-week close. [laughs]
I know it.
It's like two weeks.
Two weeks.
I know. It's all coming back to me now.
[laughs]
It's, it's making me rather uncomfortable. The, the thing is, uh, many of our strategies are, are presented in, in broad daylight at GTC, years in advance of when we do it. Two and a half years ago, I introduced the operating system of the AI factory, and it's called Dynamo. Dynamo, as you know, is a piece of instrument, a machine that was created by Siemens to turn essentially water into electricity. And Dynamo, uh, w- powered the factory of the last industrial revolution. So I thought it was the perfect name for the operating system of the next industrial revolution, the factory of that. And so inside Dynamo, the fundamental technology is disaggregated inference. Jason, I, I know y- you're, you're-
Yeah
... you're super technical.
Absolutely.
I know it. [laughs]
I'll let you take this one. Go ahead and define it for the audience.
[laughs]
I don't want to step on you.
Yeah. Thank you. I, I, I knew you wanted to jump in there for a second.
Yeah.
But it's, it's disaggregated inference, which means the, the pipeline, the processing pipeline of inference is extremely complicated. In fact, it is the most complicated computing problem today. Incredible scale, lots of mathematics of different shapes and sizes, and we came up, came up with the idea that you would change, you would, you would disaggregate parts of the processing such that some of it can run on some GPUs, rest of it can run on different GPUs, and that led to us realizing that maybe even disaggregated computing could make sense, that we could have different heterogeneous nature of computing. That same sensibility led us to Mellanox.
Yep.
You know, today, Nvidia's computing is spread across GPUs, CPUs, switches, scale-up switches, scale-out switches, networking processors, and now we're gonna add Groq to that, and we're gonna put the right workload on the right chips. You know, we just really evolved from a GPU company to an AI factory company.
I mean, I think that was probably the biggest takeaway that I had. You're seeing this fundamental disaggregation where we've gone from a GPU, and now you have this complexion of all these different options that will eventually exist. The thing that you guys said on stage, or you said on stage was, "I, I would like the high value inference people to take a listen to this," and 25% of your data center space you said should be allocated to this Groq LPU, GPU combo.