Jensen Huang: Nvidia's Future, Physical AI, Rise of the Agent, Inference Explosion, AI PR Crisis

All-In Podcast · Mar 19, 2026 · 1h 6m

Jason Calacanis (host), David Sacks (host), Chamath Palihapitiya (host), David Friedberg (host), Jensen Huang (guest)

Dynamo and disaggregated inference
Groq integration and heterogeneous compute racks
Inference factory economics and token cost
Agents as the new “computer” and operating system paradigm
Three-computer stack: training, simulation/Omniverse, edge robotics
Open models vs proprietary models (A and B)
AI policy/PR, diffusion, and geopolitics (China/Taiwan supply chain)
Self-driving platform strategy and “system-level” advantage
Healthcare, digital biology, and agentic instruments
Robotics timelines, China’s hardware supply-chain edge
Work transformation, token budgets for employees, AI skills for youth
Building moats via deep domain specialization

In this episode of the All-In Podcast, Jensen Huang joins the hosts to discuss Nvidia's future, physical AI, the rise of agents, the inference explosion, and AI's PR crisis.

Jensen Huang on agents, physical AI, and inference growth economics

Nvidia frames its evolution from a GPU vendor to an “AI factory” company, emphasizing disaggregated inference and heterogeneous compute (GPUs, CPUs, networking, storage processors, and Groq LPUs) orchestrated by an “operating system” layer (Dynamo).

Huang argues the industry is entering an inference-and-agents era where compute demand has jumped ~10,000x in two years (generative → reasoning → agentic), making token cost-per-output and throughput—not sticker price of a datacenter—the real economic metric.

He positions “physical AI” as a rare chance for tech to penetrate a ~$50T real-world economy (robots, factories, vehicles), with Nvidia’s three-computer stack (train, simulate/evaluate via Omniverse, and edge robotics compute) enabling that transition.

The conversation addresses AI’s PR and policy challenges, urging balanced communication that warns without “scaring,” and arguing national security depends on broad US adoption and global diffusion of the American AI tech stack.

Huang predicts rapid progress in robotics (3–5 years to broad productization), significant healthcare transformation via agents and biology models, and sustained need for deep vertical specialization as the moat in an agent-driven software world.

Key Takeaways

Disaggregate inference to match each step to the right chip.

Huang describes inference as today’s most complex compute pipeline and argues performance improves when prefill/decode and related stages can be split across heterogeneous accelerators, motivating Groq LPUs alongside GPUs and other processors.
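The placement idea can be sketched in a few lines. This is an illustrative toy, not Nvidia's Dynamo API: the stage names, pool names, and the compute-bound/memory-bound profiles are assumptions used to show how a scheduler might map pipeline stages onto heterogeneous hardware.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    profile: str  # "compute-bound" or "memory-bound"

# Prefill processes the whole prompt in one batched pass (compute-bound);
# decode emits tokens one at a time (memory-bandwidth-bound).
PIPELINE = [Stage("prefill", "compute-bound"),
            Stage("decode", "memory-bound")]

# Hypothetical accelerator pools: GPUs for batched matrix math,
# LPUs for low-latency sequential token generation.
POOLS = {"compute-bound": "gpu-pool", "memory-bound": "lpu-pool"}

def place(pipeline):
    """Map each stage to the pool suited to its bottleneck."""
    return {stage.name: POOLS[stage.profile] for stage in pipeline}

print(place(PIPELINE))  # {'prefill': 'gpu-pool', 'decode': 'lpu-pool'}
```

The point of the sketch is that once stages are split out, the placement decision reduces to matching each stage's bottleneck to the hardware that relieves it.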

Judge AI infrastructure by cost-per-token (throughput/efficiency), not datacenter sticker price.

He claims a “$50B factory” can produce cheaper tokens than a cheaper build if it delivers dramatically higher throughput, and notes much of capex is land/power/cooling/networking that doesn’t scale down linearly with chip price.
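The arithmetic behind that claim can be made concrete. The numbers below are hypothetical back-of-envelope inputs, not figures from the episode: they just show that a pricier factory wins on cost per token whenever its throughput advantage exceeds its capex premium.

```python
def cost_per_million_tokens(capex_usd, years, tokens_per_second):
    """Amortized capex per million tokens (opex omitted for simplicity)."""
    seconds = years * 365 * 24 * 3600
    total_tokens = tokens_per_second * seconds
    return capex_usd / total_tokens * 1e6

# Factory A: $50B build, 5-year amortization, 500M tokens/s (hypothetical).
a = cost_per_million_tokens(50e9, 5, 500e6)
# Factory B: $20B build, same life, 100M tokens/s (hypothetical).
b = cost_per_million_tokens(20e9, 5, 100e6)

print(f"A: ${a:.2f}/M tokens, B: ${b:.2f}/M tokens")
assert a < b  # 2.5x the sticker price, but half the cost per token
```

With these inputs the $50B factory produces tokens at roughly $0.63 per million versus about $1.27 for the cheaper build, because its 5x throughput outruns its 2.5x capex.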

Agentic workloads change the datacenter: memory, storage, and tool-use become first-class constraints.

Running agents means constant access to working/long-term memory, heavy storage I/O, tool invocation, and multi-model orchestration, expanding the bill of materials beyond GPUs into storage processors (e. ...

“OpenClaw”/agent frameworks resemble a new operating system for modern computing.

Huang highlights agent systems’ OS-like primitives—memory, scheduling, I/O, resource management, tool/skill APIs—and argues this makes “personal AI computers” viable across desktop, enterprise, and embedded contexts, but requires governance/security constraints.

Nvidia’s strategic filter: do the insanely hard things that map to company superpowers.

He says Nvidia leans into problems that are difficult, unprecedented, and painful—because that combination reduces competition and creates durable advantage when executed with full-stack integration.

Open source and proprietary models will coexist; enterprises need controllable open models.

Huang argues consumers benefit from best-in-class closed models, while industries require domain capture and control that often depends on open weights/models—so startups can route between frontier APIs and specialized fine-tunes to balance capability and cost.
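The routing pattern described here can be sketched as a simple policy function. The model names, thresholds, and request fields below are hypothetical, invented purely to illustrate the capability/cost/control trade-off; they are not from the episode or any real API.

```python
def route(request):
    """Pick a model tier by data sensitivity and task difficulty.

    request: dict with optional keys
      - "sensitive": bool, data must stay in-house
      - "difficulty": int 0-10, estimated reasoning demand
    """
    if request.get("sensitive"):           # control requirement dominates
        return "open-finetune-onprem"
    if request.get("difficulty", 0) > 7:   # novel, reasoning-heavy task
        return "frontier-api"
    return "open-finetune-onprem"          # cheap default for routine domain work

print(route({"difficulty": 9}))                     # frontier-api
print(route({"difficulty": 3}))                     # open-finetune-onprem
print(route({"difficulty": 9, "sensitive": True}))  # open-finetune-onprem
```

The design choice worth noting is the ordering: the control constraint is checked before the capability check, so sensitive workloads never leave the controllable open model even when the frontier API would be more capable.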

The lasting application-layer moat is deep vertical specialization plus customer-connected learning loops.

He predicts general models will be augmented by specialized sub-agents trained on proprietary domain data, and that connecting agents to real customers early accelerates a flywheel of product improvement and defensibility.

Notable Quotes

“You should not equate the price of the factory and the price of the tokens.”

Jensen Huang

“Even when the chips are free, it’s not cheap enough.”

Jensen Huang

“When we went from generative to reasoning… about a hundred times. … reasoning to agentic… another hundred times.”

Jensen Huang

“Open models… and proprietary… These two things are not A or B, it’s A and B.”

Jensen Huang

“Warning is good, scaring is less good.”

Jensen Huang

Questions Answered in This Episode

What exactly changes in the inference pipeline when you move to “disaggregated inference,” and which stages are best-suited to Groq LPUs versus Nvidia GPUs?

You mentioned adding Groq to ~25% of Vera Rubin deployments—what workload characteristics determine that allocation, and how should operators measure the break-even point?

If agents are the new operating system, what are the minimum governance controls needed so agents can use tools and sensitive data safely without crippling usefulness?

You said compute rose ~10,000x from generative → reasoning → agentic—what are the concrete bottlenecks now: memory bandwidth, storage I/O, networking, or model efficiency?

How should a company set internal “token budgets” per employee (like your $250k per $500k engineer heuristic) without incentivizing waste or shallow prompting?

Transcript Preview

Jason Calacanis

Special episode this week. We've preempted the weekly show, and there's only three people we preempt the show for, President Trump, Jesus, and Jensen.

Jason Calacanis

[laughs]

Jensen Huang

[laughs]

Jason Calacanis

And, uh, I'll let you pick which order we do that. Uh, but what an amazing run you've had and, and a great event. Um-

Jensen Huang

Every industry is here. Every tech company is here. Every AI company is here. Incredible. Incredible.

Jason Calacanis

Extraordinary. And one of the great announcements of the past year has been Groq. When you made the purchase of Groq, did you realize how insufferable Cha- Chamath would become? [laughs]

Jason Calacanis

[laughs]

Jensen Huang

I had, I had an inkling that, that, [laughs] that, uh-

Jason Calacanis

'Cause we're his friends. We have to deal with him-

Jensen Huang

I know

Jason Calacanis

... every week.

Jensen Huang

I know it. I know it.

Jason Calacanis

You had to deal with him for the six-week close. [laughs]

Jensen Huang

I know it.

Jason Calacanis

It's like two weeks.

Jason Calacanis

Two weeks.

Jensen Huang

I know. It's all coming back to me now.

Jason Calacanis

[laughs]

Jensen Huang

It's, it's making me rather uncomfortable. The, the thing is, uh, many of our strategies are, are presented in, in broad daylight at GTC, years in advance of when we do it. Two and a half years ago, I introduced the operating system of the AI factory, and it's called Dynamo. Dynamo, as you know, is a piece of instrument, a machine that was created by Siemens to turn essentially water into electricity. And Dynamo, uh, w- powered the factory of the last industrial revolution. So I thought it was the perfect name for the operating system of the next industrial revolution, the factory of that. And so inside Dynamo, the fundamental technology is disaggregated inference. Jason, I, I know y- you're, you're-

Jason Calacanis

Yeah

Jensen Huang

... you're super technical.

Jason Calacanis

Absolutely.

Jensen Huang

I know it. [laughs]

Jason Calacanis

I'll let you take this one. Go ahead and define it for the audience.

Jason Calacanis

[laughs]

Jason Calacanis

I don't want to step on you.

Jensen Huang

Yeah. Thank you. I, I, I knew you wanted to jump in there for a second.

Jason Calacanis

Yeah.

Jensen Huang

But it's, it's disaggregated inference, which means the, the pipeline, the processing pipeline of inference is extremely complicated. In fact, it is the most complicated computing problem today. Incredible scale, lots of mathematics of different shapes and sizes, and we came up, came up with the idea that you would change, you would, you would disaggregate parts of the processing such that some of it can run on some GPUs, rest of it can run on different GPUs, and that led to us realizing that maybe even disaggregated computing could make sense, that we could have different heterogeneous nature of computing. That same sensibility led us to Mellanox.

Jason Calacanis

Yep.

Jensen Huang

You know, today, Nvidia's computing is spread across GPUs, CPUs, switches, scale-up switches, scale-out switches, networking processors, and now we're gonna add Groq to that, and we're gonna put the right workload on the right chips. You know, we just really evolved from a GPU company to an AI factory company.

Jason Calacanis

I mean, I think that was probably the biggest takeaway that I had. You're seeing this fundamental disaggregation where we've gone from a GPU, and now you have this collection of all these different options that will eventually exist. The thing that you guys said on stage, or you said on stage was, "I, I would like the high value inference people to take a listen to this," and 25% of your data center space you said should be allocated to this Groq LPU, GPU combo.
