All-In Podcast

Jensen Huang: Why cheap chips still produce expensive tokens

Via Groq, Nvidia routes each inference step to the right chip. Vera Rubin targets agentic racks; Huang's key metric is token cost, not the datacenter price tag.

Jason Calacanis (host), Jensen Huang (guest), David Friedberg (host), Chamath Palihapitiya (host)
Mar 18, 2026 · 1h 6m

FREQUENTLY ASKED QUESTIONS

Direct answers grounded in the episode transcript, each followed by its transcript timestamp.

  1. What is Nvidia's disaggregated inference?

    Disaggregated inference is Nvidia's way of splitting the inference pipeline across specialized compute. Huang says inference is now the most complicated computing problem, because it happens at enormous scale and involves many mathematical shapes and sizes. Instead of forcing every part of the pipeline onto one kind of GPU, Nvidia breaks the processing apart so different pieces can run on different GPUs or other accelerators. That same idea led Nvidia toward heterogeneous computing: GPUs, CPUs, scale-up and scale-out switches, networking processors, and Groq all cooperating inside what Huang calls an AI factory. The point is not simply to add more chips. It is to put the right workload on the right chip, so the system can handle the inference explosion more efficiently.

    1:48 in transcript
  2. Why does Nvidia need three computers for physical AI?

    Nvidia's physical AI stack needs separate systems for training, simulation, and edge deployment. Huang says the first computer develops or trains the AI model. The second evaluates the AI inside a virtual gym that represents the physical world, where the software has to obey the laws of physics. Nvidia calls that simulation system Omniverse. The third computer runs at the edge as the robotics computer. That edge device could be inside a self-driving car, a robot, a teddy bear, a factory, a warehouse, or even telecommunications base stations that become part of AI infrastructure. Huang later frames physical AI as a large category and says it is the technology industry's first chance to address a $50 trillion industry that had mostly lacked technology.

    5:17 in transcript
  3. Why is Jensen Huang worried about AI doomerism?

    Jensen Huang's concern is that fear can distort policy before AI adoption takes hold. He says policymakers need direct education about what the technology is and is not: it is not a biological being, an alien, or conscious; it is computer software. He rejects the claim that people do not understand AI at all, arguing that the industry understands a lot while still recognizing that the technology is moving fast. The national risk, in his view, is that other countries adopt AI while American industry and society become angry, afraid, or paranoid and fail to use it. On Anthropic, he praises the company's technology, security focus, safety culture, and desire to warn people, but draws a line between warning and scaring.

    17:33 in transcript
  4. What did Jensen Huang mean by $250,000 in tokens per engineer?

    Jensen Huang used the token budget as a test of whether elite engineers are using AI enough. His thought experiment starts with a software engineer or AI researcher paid $500,000 a year. If that person spent only $5,000 on tokens, he says he would react strongly, and if they did not consume at least $250,000 worth of tokens, he would be deeply alarmed. The comparison is to chip designers refusing to use CAD tools and choosing paper and pencil instead. In Huang's view, agents remove old constraints such as 'this is too hard,' 'this will take a long time,' or 'we need a lot of people.' The work shifts toward ideas, architectures, specifications, organizing agent teams, and defining what good outcomes look like.

    24:51 in transcript
  5. When does Jensen Huang think humanoid robots will become real products?

    Jensen Huang puts useful humanoid robots roughly three to five years away. He says America helped invent much of the robotics industry, but may have started too early and grown tired before the enabling technology arrived. Now that high-functioning existence proofs are available, he says the jump to reasonable products usually takes only two or three technology cycles, which he translates into three to five years. He expects robots to be 'all over the place' in that window. Huang also stresses that China is formidable because microelectronics, motors, rare earths, and magnets are foundational to robotics, and China has the world's best ecosystem in those areas.

    52:35 in transcript
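
The token-budget thought experiment in question 4 comes down to two ratios, and a short sketch makes them explicit. The dollar figures below are the ones quoted in that answer; the function names (`token_share`, `below_threshold`) are hypothetical and chosen only for illustration. Treating the figures as annual amounts is an assumption, since the summary does not state the period for the token spend.

```python
# Figures quoted in the answer to question 4, in US dollars.
# Assumption: all three are annual amounts.
SALARY = 500_000           # engineer's pay in the thought experiment
LOW_SPEND = 5_000          # token spend Huang says he would react strongly to
ALARM_THRESHOLD = 250_000  # minimum token spend before he would be alarmed


def token_share(token_spend: float, salary: float) -> float:
    """Return token spend as a fraction of salary."""
    if salary <= 0:
        raise ValueError("salary must be positive")
    return token_spend / salary


def below_threshold(token_spend: float, threshold: float = ALARM_THRESHOLD) -> bool:
    """Return True when spend falls short of the 'at least' threshold."""
    return token_spend < threshold


if __name__ == "__main__":
    print(f"Low spend as share of salary: {token_share(LOW_SPEND, SALARY):.0%}")
    print(f"Threshold as share of salary: {token_share(ALARM_THRESHOLD, SALARY):.0%}")
    print(f"Is $5,000 below the threshold? {below_threshold(LOW_SPEND)}")
```

On these numbers the low spend is 1% of salary and the threshold is 50%, so the two cases in the thought experiment differ by a factor of 50.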

Answers are AI-generated from the transcript and may contain errors.
