CHAPTERS
- 0:00 – 2:29
NVIDIA’s 10-year roadmap: scaling compute at the system level
Jensen frames the last decade’s shift from CPU-centric, human-coded software to GPU-accelerated machine learning, arguing the entire computing stack has been reinvented. He outlines NVIDIA’s 10-year ambition: compounding improvements in performance, cost, and energy at “data-center scale,” not merely per chip.
- •Shift from coding to machine learning as the primary computing paradigm
- •GPUs as the foundation for massive parallelism across clusters/data centers
- •Goal: double or triple performance every year at scale, with cost and energy falling at a similar rate
- •Compounding gains create a “hyper–Moore’s Law” effect
- •The stack change expands the size of solvable problems
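As a rough illustration of the compounding claim in this chapter, the Python sketch below compares doubling or tripling every year for ten years with a Moore's Law-style doubling every two years. The two-year doubling period and the helper name `compound_gain` are assumptions made for this illustration, not figures from the interview.

```python
def compound_gain(factor_per_period: float, years: float, period_years: float = 1.0) -> float:
    """Total multiplicative gain after `years` of compounding once per period."""
    return factor_per_period ** (years / period_years)


YEARS = 10  # the roadmap horizon named in the chapter

# Assumed baseline: one doubling every two years (a common reading of Moore's Law).
moore_style = compound_gain(2, YEARS, period_years=2)
# The chapter's stated goal: double or triple every year.
doubling_yearly = compound_gain(2, YEARS)
tripling_yearly = compound_gain(3, YEARS)

print(f"Doubling every 2 years: {moore_style:,.0f}x")
print(f"Doubling every year:    {doubling_yearly:,.0f}x")
print(f"Tripling every year:    {tripling_yearly:,.0f}x")
```

Under these assumptions the ten-year gain is 32x for the two-year doubling baseline, 1,024x for yearly doubling, and 59,049x for yearly tripling, which is the gap the “hyper–Moore’s Law” label points at.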
- 2:29 – 3:34
Beyond Moore’s Law: co-design, lower precision, and algorithm-architecture iteration
Asked what drives faster-than-Moore scaling, Jensen explains that the traditional scaling pillars (Dennard scaling and classic VLSI scaling) have faded. The new approach is co-design: iterating between algorithms, numerics, and hardware architecture to unlock step-function improvements.
- •Dennard scaling and classic VLSI scaling have run out of steam
- •Co-design requires controlling both software/algorithms and hardware
- •Precision evolution (FP64→FP32→BF16→FP8→FP4) as a key lever
- •Architecture adapts to algorithms and vice versa in tight loops
- •Accelerated computing enables gains impossible in traditional approaches
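To make the precision lever concrete, here is a minimal Python sketch of how weight storage shrinks along the FP64→FP32→BF16→FP8→FP4 path. The 70-billion-parameter model size is a hypothetical example, and the calculation counts raw weight bits only, ignoring any scaling factors or metadata that real low-precision formats may carry.

```python
# Bits per stored value for each format named in the chapter.
BITS_PER_VALUE = {"FP64": 64, "FP32": 32, "BF16": 16, "FP8": 8, "FP4": 4}

NUM_PARAMS = 70_000_000_000  # hypothetical model size, for illustration only


def weight_gigabytes(num_params: int, bits: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_VALUE.items():
    print(f"{name}: {weight_gigabytes(NUM_PARAMS, bits):,.0f} GB")
```

Each halving of precision halves the bytes that must be stored and moved per value, so the same memory and bandwidth serve twice as many parameters. Whether accuracy survives the change is the part that needs the algorithm-architecture iteration the chapter describes.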
- 3:34 – 4:13
Data-center-scale computing: networking as a compute fabric (Mellanox, InfiniBand, NVLink)
Jensen describes the second major scaling pillar: treating the network as part of the compute system. NVIDIA’s Mellanox acquisition and the fusion of InfiniBand with NVLink aim to make clusters behave like one giant GPU, improving efficiency and scalability.
- •Scaling now depends on data-center architecture, not just chips
- •Network becomes a compute fabric; work is pushed into the fabric
- •Mellanox acquisition as a strategic move for fabric control
- •InfiniBand and NVLink integration to scale as “one processor”
- •Hundreds of GPUs coordinated as a single virtualized compute unit
- 4:13 – 5:59
Training vs inference demands: low latency vs high throughput and inference-time scaling
The discussion turns to inference-time scaling techniques like tree search and self-reflection, which demand very low latency. Jensen contrasts this with the data center’s factory-like need for high throughput and low unit cost, explaining why the infrastructure must evolve to handle both simultaneously.
- •Inference-time scaling requires rapid token generation at low latency
- •Techniques mentioned: tree search, chain-of-thought, internal simulation/reflection
- •Data centers must also optimize for throughput to control cost
- •Latency and throughput are fundamentally in tension
- •NVLink direction supports a large-memory, high-bandwidth “virtual GPU” model
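The latency/throughput tension in this chapter can be sketched with a toy batching model in Python. It assumes each decoding step costs a fixed overhead plus a small per-sequence cost. The constants `FIXED_MS` and `PER_SEQ_MS` are invented for illustration and do not describe any real GPU or any number from the interview.

```python
FIXED_MS = 20.0    # assumed fixed cost per decoding step (hypothetical)
PER_SEQ_MS = 0.5   # assumed extra cost per sequence in the batch (hypothetical)


def step_time_ms(batch_size: int) -> float:
    """Time for one decoding step; also the per-token latency each user sees."""
    return FIXED_MS + PER_SEQ_MS * batch_size


def tokens_per_second(batch_size: int) -> float:
    """Aggregate throughput: one token per sequence per step."""
    return batch_size * 1000.0 / step_time_ms(batch_size)


for batch_size in (1, 8, 64, 256):
    print(
        f"batch={batch_size:>3}  "
        f"latency={step_time_ms(batch_size):6.1f} ms/token  "
        f"throughput={tokens_per_second(batch_size):7.1f} tokens/s"
    )
```

In this toy model, larger batches raise aggregate tokens per second (lower unit cost) while every individual user waits longer per token, which is the trade-off that inference-time techniques such as tree search put pressure on.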
- 5:59 – 7:16
Software stack compounding: CUDA as a stable foundation for rapid performance jumps
Elad notes dramatic token cost declines, and Jensen emphasizes NVIDIA’s contribution through software and stack optimizations. He argues CUDA stability enables rapid iteration above the foundation—citing large performance gains on existing hardware without changing the top-level model algorithms.
- •CUDA as a stable foundation enables fast iteration above it
- •NVIDIA improved Hopper performance ~5× in a year on LLaMA-era workloads
- •Gains come from full-stack optimization, not changing the model layer
- •Accelerated computing enables new co-design techniques
- •Ecosystem productivity is a strategic priority
- 7:16 – 10:40
Infrastructure flexibility: reusing training clusters for inference and a multi-scale model world
Sarah asks about interchangeability between training and inference infrastructure. Jensen argues that training systems remain excellent for inference later, creating a “trail” of valuable deployed compute, while model ecosystems will span frontier models down to distilled tiny specialists.
- •Modern infrastructure is disaggregated across generations and cooling types
- •Training hardware can be repurposed effectively for inference
- •Investment returns improve as older infrastructure becomes inference capacity
- •Frontier models remain necessary for distillation and synthetic data generation
- •Future includes a spectrum: giant models to highly effective tiny domain models
- 10:40 – 12:40
The new unit of delivery: from chips to racks to full data centers
The unit NVIDIA delivers to customers has expanded from chips to systems to racks (e.g., NVL72) and beyond. Jensen explains that NVIDIA must build complete data centers internally to validate real performance, fabric behavior, and software at scale, and then “disaggregate” them and sell the components.
- •NVIDIA builds full data centers to ensure the system works end-to-end
- •Real-world performance often diverges from slideware peak specs
- •“New unit of computing is the data center” as a design/product concept
- •Many configurations are built/tested: cooling, CPU types, Ethernet/InfiniBand/NVLink
- •Vertical integration for optimization, then modularization for customer adoption
- 12:40 – 14:56
Compatibility as strategy: grafting into every cloud while protecting ‘build once, run everywhere’
Jensen details the challenge of making NVIDIA infrastructure fit diverse cloud control planes and security models. He argues long-term architectural and software compatibility is essential because software maintenance is the largest cost—and CUDA’s longevity pledge is a key trust anchor.
- •NVIDIA systems must integrate with AWS, GCP, Azure, OCI, each with unique control/security planes
- •Consistency and modularity help developers run workloads broadly
- •Software maintenance dominates costs; hardware is comparatively smaller
- •NVIDIA commits to maintaining software long-term (CUDA, Shield, GeForce examples)
- •Architectural compatibility prevents exploding software team size and benefits developers
- 14:56 – 17:27
xAI’s 100,000-GPU supercluster: digital twins, staging, and rapid bring-up
Elad highlights the speed and scale of NVIDIA’s work with xAI, and Jensen credits Elon Musk’s execution urgency. Jensen describes a disciplined approach: planning backward from a target date, simulating network configurations, using digital twins, pre-staging supply chain and wiring, and validating with a smaller ‘system zero.’
- •100,000-GPU cluster operating as a single unit, built at unprecedented speed
- •Backward planning from a fixed go-live date
- •Network simulation and configuration validation
- •Digital twin pre-staging for integration and supply chain readiness
- •“System zero” reference setup to de-risk before full delivery
- 17:27 – 20:36
Super-scaling realities: tonnage, logistics, and ‘data center as a product’ operations
Looking to 200K–1M GPU clusters, Jensen says nothing is normal but physics isn’t the limit—execution is. He explains NVIDIA’s internal mindset shift to treating data centers like a product: planned, stood up, tuned, and operated to feel as seamless as opening a new phone.
- •Primary blockers are practical: capital, energy, supply chain, operations—everything
- •Scaling isn’t forbidden by physics; it’s hard due to complexity
- •Typical supercomputers can take a year+ to become fully operational
- •NVIDIA’s “data center as a product” initiative standardizes deployment/operations
- •Aspiration: customer provides space/power/cooling; NVIDIA helps bring up in ~30 days
- 20:36 – 22:20
AI in chip design: ‘AI engineers’ exploring a combinatorial design space
Sarah returns to scientific applications, and Jensen highlights AI chip designers and AI software engineers inside NVIDIA. He argues modern chips are effectively thousands of interacting modules and AI is necessary to explore the vast design space, find better optima, and meet tight schedules.
- •AI-assisted chip design is already highly effective and was essential for building Hopper
- •AI explores larger design spaces than humans can under time constraints
- •Chip design resembles optimizing many modules, not a single monolithic block
- •Cross-module co-design benefits from AI’s ability to traverse combinatorial spaces
- •AI can find solutions hidden behind local minima/maxima
- 22:20 – 27:00
Market cap surge and company evolution: from ‘computers’ to ‘AI factories’ producing tokens
Elad asks what changed as NVIDIA’s valuation soared. Jensen says the company can’t change as fast as its stock price, but the world is realizing that computing has been reinvented and that data centers are becoming single-tenant “factories” that produce tokens, a commodity that is reconstituted into valuable forms of intelligence.
- •NVIDIA pace and culture remain steady despite stock acceleration
- •Computing reinvented after ~60 years of continuity
- •New data centers are token-producing factories rather than file-storing multi-tenant systems
- •Tokens can represent many outputs: language, robotics motion, proteins, chemicals
- •Trillion-dollar modernization plus a new ‘intelligence commodity’ industry opportunity
- 27:00 – 28:33
Embodied AI: from tokenized video/motion to general robotics and self-driving
Sarah asks about embodiment, and Jensen ties robotics to tokenization and multimodal generation. He identifies two major ‘brownfield’ robotics domains—cars and humanoid robots—where deployment can scale without changing the world’s infrastructure, and suggests we’re nearing general robotics.
- •If actions can be tokenized, robots can be prompted similarly to generative models
- •Multimodal alignment (language, video, action) points toward embodied intelligence
- •Two key brownfield domains: self-driving cars and humanoid/embodied robots
- •Robotics can scale because the world is already built for cars and humans
- •Parallel progress toward AGI and ‘general robotics’
- 28:33 – 31:18
AI employees and agent ecosystems: renting specialized digital workers across platforms
Jensen predicts organizations will combine biological employees with AI “digital employees,” managed via prompts and delegation. He envisions specialized agents embedded in SaaS and EDA platforms (e.g., Cadence/Synopsys, SAP, ServiceNow) and argues platforms may benefit rather than be disrupted.
- •Future orgs blend humans and AI employees across functions
- •Management metaphor: prompt with context, delegate missions, iterate collaboratively
- •Chip design as the highest-leverage area for AI employee adoption
- •Vision of ‘renting’ massive numbers of specialized EDA agents when needed
- •SaaS/enterprise platforms become agent-rich ecosystems; NVIDIA builds agents for its own standards (CUDA/OpenUSD)
- 31:18 – 36:48
AI’s sweeping impact on science & engineering—and Jensen’s daily AI workflow
Jensen argues AI’s influence across science is underappreciated and will soon underpin most breakthroughs, dismissing ‘fad’ narratives by pointing to fundamental changes in how software/knowledge is encoded. He closes by sharing his personal habit of using tools like ChatGPT and Perplexity as tutors and for constant fact-checking.
- •AI/ML is transforming every scientific field—from quantum to chemistry to engineering
- •Rapid progress mirrors the short timeline from AlexNet ‘toy’ to superhuman vision
- •Core shift: how humans encode knowledge/software has changed
- •Expectation: generative AI becomes foundational to future papers and breakthroughs
- •Personal usage: AI as tutor, starting point for learning, and routine double-checking tool
