
Stanford CS153 Frontier Systems | Jensen Huang from NVIDIA on the Compute Behind Intelligence

For more information about Stanford's online Artificial Intelligence programs, visit: https://stanford.io/ai

To follow along with the course schedule and syllabus, visit: https://cs153.stanford.edu/

In a CS153 Frontier Systems lecture, the class hosts Jensen Huang, CEO of NVIDIA, who argues that computing is being reinvented for the first time in 64 years as software shifts from prerecorded execution to real-time generation. He credits NVIDIA's extreme co-design across chips, compilers, networks, and systems with a million-fold speedup over the past decade, versus the 100x that Moore's Law would have delivered. He walks through the architectural logic of Hopper (pre-training), Grace Blackwell NVLink72 (inference and decode), Vera Rubin (agents), and the upcoming Feynman generation built for swarms of agents and sub-agents, while pushing back on MFU as a misleading metric in favor of tokens-per-watt and real evals. Huang also defends open models like Nemotron, BioNeMo, and Alpamayo as essential for safety, transparency, and democratizing AI across underserved languages and scientific domains. He forecasts compute energy demand growing roughly a thousandfold, making this the strongest market moment in history to invest in sustainable energy and grid upgrades.

Guest Speaker: Jensen Huang founded NVIDIA in 1993 and has served since its inception as president, chief executive officer, and a member of the board of directors. Since its founding, NVIDIA has pioneered accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI. NVIDIA is now driving the platform shift of accelerated computing and generative AI, transforming the world's largest industries and profoundly impacting society. Huang has been elected to the National Academy of Engineering and in 2026 was appointed to the President’s Council of Advisors on Science and Technology.

He is a recipient of the Semiconductor Industry Association’s highest honor, the Robert N. Noyce Award; the IEEE Founder’s Medal; the Dr. Morris Chang Exemplary Leadership Award; and honorary doctorate degrees from Taiwan’s National Chiao Tung University, National Taiwan University, Oregon State University, Huazhong University of Science and Technology, and Linköping University. He has been named the world’s best CEO by Fortune, the Economist, and Brand Finance, as well as one of TIME magazine’s 100 most influential people. Prior to founding NVIDIA, Huang worked at LSI Logic and Advanced Micro Devices. He holds a BSEE degree from Oregon State University and an MSEE degree from Stanford University.

Follow the playlist: https://youtube.com/playlist?list=PLoROMvodv4rN447WKQ5oz_YdYbS74M5IA&si=DOJ5amlyRdyMJBhG

Guest: Jensen Huang

May 13, 2026 · 1h 8m

CHAPTERS

0:09 – 3:15
Computing is being reinvented: from pre-recorded software to generative, contextual systems
Huang frames this moment as the biggest reinvention in computing in more than 60 years, comparable to the IBM System/360 era but more fundamental. He contrasts traditional, “pre-recorded” computing with real-time generated outputs that adapt to context and intention.
- The classic computing model and its mental model stayed largely constant for decades
- Generative AI enables real-time, contextually relevant outputs
- Shift from explicit instruction-following toward intention-aware interaction
- Implications ripple through software, systems, and go-to-market thinking
3:15 – 5:46
Stack-wide disruption: new development methods, new systems, and new applications
He extends the “everything changed” claim across the entire stack—software engineering workflows, runtime (neural nets vs binaries), infrastructure (network/storage), and applications. Robotics and self-driving are highlighted as examples unlocked by deep learning.
- Software development methodology and org structures must change
- Neural-network execution differs from compiled binaries
- System design impacts networking/storage/cloud services
- AI unlocks robotics/robotaxis and other previously impractical applications
5:46 – 8:34
From GPT to agentic systems: continuous computing and what comes next
Huang describes how GPT made machine “thinking” and step-by-step reasoning feel inevitable, leading to tool use and agentic systems. This transition changes compute from on-demand to continuously running, forcing a rethink of personal computing and cloud services.
- GPT-era models made reasoning and “thinking tokens” obvious
- Tool use emerges as externally consumed tokens
- Agentic systems represent the next phase after generative AI
- Continuous workloads reshape cloud economics and system design
8:34 – 11:06
Co-design explained: why optimizing hardware + compilers + frameworks together wins
He answers the co-design question using RISC as the canonical example: compiler and ISA must be designed together. He generalizes this to post–general-purpose computing, arguing extreme workloads demand end-to-end optimization across chips, software, and systems.
- RISC success came from ISA/compiler co-design (Hennessy’s legacy)
- General-purpose computers aren’t ideal for extreme workloads
- Co-design spans algorithms, compilers/frameworks, and chip/system architecture
- End-to-end optimization beats locally optimized components
11:06 – 13:50
Co-design at NVIDIA: beyond Moore’s Law (the “million‑X” claim)
Huang argues that co-design enabled NVIDIA to outpace Moore’s Law dramatically—claiming 100,000× to 1,000,000× improvement over a decade for relevant workloads. He connects this abundance to why researchers can train on “all the world’s data.”
- Moore’s Law + Dennard scaling limits yield far smaller gains today
- Co-design across CPU/GPU/networking/storage changes the scaling curve
- Claimed 100k×–1M× acceleration over ~10 years in targeted compute
- Radical compute abundance changes what problems and datasets are feasible
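To put the decade-long figures above on a common scale, the sketch below converts a total speedup into an annualized factor and a doubling time. It is plain arithmetic on the numbers quoted in this summary (100×, 100,000×, 1,000,000× over roughly 10 years); the function names are hypothetical and nothing here comes from NVIDIA's own measurements.

```python
import math


def annual_factor(total_factor: float, years: float) -> float:
    """Constant per-year multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Years needed to double, assuming constant exponential growth."""
    return years * math.log(2.0) / math.log(total_factor)


# Figures quoted in the summary, all taken over an assumed 10-year window.
for label, factor in [
    ("Moore's Law (100x)", 100.0),
    ("Co-design, low end (100,000x)", 1e5),
    ("Co-design, high end (1,000,000x)", 1e6),
]:
    print(
        f"{label}: {annual_factor(factor, 10):.2f}x per year, "
        f"doubling every {doubling_time_years(factor, 10):.2f} years"
    )
```

Under these assumptions, 100× per decade corresponds to a doubling roughly every 18 months, while 1,000,000× per decade corresponds to a doubling roughly every six months.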
13:50 – 17:08
How education should change: learn with AI while keeping first principles
He argues AI must be part of education both as subject matter and as a learning tool. Textbooks can’t keep pace with real-time knowledge, but foundational principles still matter and should anchor AI-assisted learning.
- AI should be integrated into curriculum and study workflow
- Textbook publishing cadence can’t match current knowledge velocity
- Use AI to read/summarize papers and act like a personal research assistant
- First principles (e.g., classic methodologies) remain essential context
17:08 – 23:49
Open source vs closed models: why NVIDIA builds open foundation models
Huang recommends frontier proprietary tools for productivity, but explains why NVIDIA also advances open models. He positions open models as necessary for domains lacking data/scale (bio, climate, autonomy, robotics) and for language coverage across smaller communities.
- Frontier closed products (tooling + model) often outperform DIY open downloads
- NVIDIA’s open efforts aim to seed whole ecosystems with usable foundations
- Domain foundation models: BioNeMo, Alpamayo (AV), GR00T (robotics), climate
- Open language models support smaller languages and downstream fine-tuning
23:49 – 26:10
Why openness matters for safety and security: transparency and “swarms” of defense
He argues truly safe AI requires openness because black boxes can’t be audited or secured effectively. For cybersecurity, he advocates deploying vast numbers of cheap, fast models (e.g., Nemotron Nano) as a defensive swarm rather than escalating model-vs-model arms races.
- Opaque AI systems are difficult to defend, secure, or interrogate
- “Reason step-by-step” is not a complete solution because models can lie
- Cyber defense should use massive swarms of cost-effective models
- Nemotron Nano example: fast detection models deployed at huge scale
26:10 – 30:01
Coalition scaling and utilization: why MFU can be a misleading metric
In a discussion about low cluster utilization, Huang argues MFU (model FLOPS utilization) often measures the wrong thing because real systems are bottlenecked by bandwidth, memory, or network. He emphasizes overprovisioning to avoid Amdahl’s-law bottlenecks and focusing on performance outcomes rather than FLOPS.
- MFU can be “simply wrong” depending on bottleneck and workload phase
- Datacenter constraints include flops, bandwidth, capacity, and networking
- Overprovisioning reduces bottlenecks; peak utilization can be brief
- Optimize for real performance, not a single easy-to-game metric
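The Amdahl's-law point above can be made concrete with the standard formula: if only a fraction of the work is accelerated, the remaining fraction caps the overall speedup. The sketch below is a generic illustration; the 90% figure and the function name are hypothetical, not numbers from the lecture.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster.

    Standard Amdahl's law: 1 / ((1 - p) + p / s).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical workload: 90% of step time is compute, 10% is network/memory stalls.
for compute_speedup in (2.0, 10.0, 100.0, 1e6):
    overall = amdahl_speedup(0.9, compute_speedup)
    print(f"compute {compute_speedup:g}x faster -> overall {overall:.2f}x")
```

In this hypothetical case the overall gain never reaches 10× no matter how fast compute becomes, which is one way to read the argument for provisioning bandwidth, memory, and networking generously even if some of that capacity sits idle much of the time.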
30:01 – 33:14
Measuring progress: tokens-per-watt, NVLink bandwidth, and the need for serious evals
He reframes the metric conversation around workload-relevant outcomes like tokens per watt—where bandwidth and interconnect can dominate, especially during decode. He stresses that evaluation choice shapes behavior, and the correct metric must reflect actual value (not all tokens are equal).
- Tokens per watt captures more of the real system tradeoffs than FLOPS
- Decode is bandwidth-dominated; prefill vs decode have different profiles
- High tokens/watt can coincide with low MFU (disaggregated workloads)
- Define rigorous evals aligned with user value; avoid optimizing proxies
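The claim that decode is bandwidth-dominated, and that MFU can be low while throughput per watt improves, can be sketched with a simple roofline-style estimate. Everything below is an assumption for illustration: the hardware numbers are hypothetical round figures rather than the specifications of any NVIDIA product, the cost model (about 2 FLOPs per parameter per token, weights read once per decode step, KV-cache traffic ignored) is a common back-of-envelope approximation, and tokens per joule is used as the energy-normalized form of "tokens per watt".

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_flops: float      # peak FLOP/s (hypothetical)
    mem_bandwidth: float   # memory bandwidth in bytes/s (hypothetical)
    power_watts: float     # sustained power draw in watts (hypothetical)


def decode_step(params: float, bytes_per_param: float, batch: int, hw: Accelerator) -> dict:
    """Roofline-style estimate for one decode step over a batch of sequences."""
    flops = 2.0 * params * batch            # ~2 FLOPs per parameter per generated token
    bytes_moved = params * bytes_per_param  # weights streamed once per step (KV cache ignored)
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.mem_bandwidth
    step_time = max(compute_time, memory_time)
    tokens_per_s = batch / step_time
    return {
        "bound": "memory" if memory_time >= compute_time else "compute",
        "mfu": compute_time / step_time,  # achieved FLOP/s divided by peak FLOP/s
        "tokens_per_s": tokens_per_s,
        "tokens_per_joule": tokens_per_s / hw.power_watts,
    }


# Hypothetical device and a 70B-parameter model stored at 2 bytes per parameter.
hw = Accelerator(peak_flops=1e15, mem_bandwidth=3e12, power_watts=1000.0)
for batch in (1, 64):
    r = decode_step(params=70e9, bytes_per_param=2.0, batch=batch, hw=hw)
    print(
        f"batch={batch}: {r['bound']}-bound, MFU={r['mfu']:.3f}, "
        f"{r['tokens_per_s']:.1f} tokens/s, {r['tokens_per_joule']:.3f} tokens/J"
    )
```

In this toy model both batch sizes stay memory-bound, so step time is set by bandwidth rather than FLOPS. Raising bandwidth would raise tokens per joule while leaving MFU low, which is the sense in which MFU alone can mislead.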
33:14 – 38:12
Architecture roadmap logic: Hopper → Grace Blackwell → Vera Rubin → (future) Feynman
Huang explains NVIDIA’s generation-by-generation design by matching systems to dominant compute patterns. Hopper targeted pre-training, Grace Blackwell NVLink72 targeted inference/decode via rack-scale bandwidth, and Vera Rubin targets agent workloads with long memory and low-latency CPU tool execution; Feynman is hinted as “agent swarms” at scale.
- Hopper designed around pre-training’s scaling demands
- Grace Blackwell NVLink72: rack-scale computer optimized for decode bandwidth
- Vera Rubin: agent-centric design with storage-on-fabric + low-latency CPUs
- Feynman teased as the next step for hierarchical/swarming agent systems
38:12 – 41:57
Energy as the next bottleneck: efficiency, grid upgrades, and sustainable generation
He predicts compute energy demand could grow by ~1000× and argues efficiency gains must compound. He also calls for massive investment in sustainable energy and grid modernization, claiming market forces now support buildout without heavy subsidies.
- Primary controllable lever: energy efficiency (e.g., tokens per watt)
- Future workloads: generative + continuous computing drives demand
- Energy needs could be orders of magnitude higher than today
- Strong market incentives make this a prime moment to invest in clean energy
41:57 – 47:23
Career advice: seek resilience through struggle, not only passion
Huang challenges the “only do what you love” narrative, arguing many people don’t know their passions and that excellence often requires enduring unpleasant work. He frames suffering and struggle as training for resilience—an essential trait when stakes are high.
- Passion-first career advice can be unrealistic for many people
- Most jobs include large portions of hard, unenjoyable work
- Resilience is built through repeated struggle and discomfort
- Treat work as responsibility; do the best possible regardless of glamour
47:23 – 52:52
Policy and geopolitics: GPUs aren’t ‘atomic bombs,’ and restricting markets harms industry
Responding to concerns about adversarial access, he rejects analogies equating GPUs with weapons of mass destruction, emphasizing GPUs’ general-purpose societal uses. He argues conceding markets through policy weakens the domestic tech ecosystem and that “singularity panic” narratives are irresponsible.
- GPUs power broad civilian applications (gaming, medical imaging, logistics)
- He rejects “GPUs = atomic bombs” as a broken starting premise
- Conceding large global markets can hollow out national tech leadership
- Critiques doomer ‘instant singularity’ claims as inaccurate and harmful
52:52 – 57:20
Why universities can’t get enough compute: budgeting and aggregation, not chip supply
In a pointed exchange, Huang claims the limiting factor for universities is not that chips are withheld but that institutions lack centralized, large-scale budgeting and shared infrastructure. He advocates campus-wide supercomputers or buying large cloud allocations—on the order of $1B—to ensure broad access.
- He disputes the idea that orders are placed but not fulfilled
- Core issue: decentralized grants/departments can’t fund shared billion-scale compute
- Proposes campus-wide shared AI supercomputers (like past ‘big science’ projects)
- Concrete suggestion: allocate ~$1B (e.g., via cloud) for universal researcher access
57:20 – 1:08:23
Leadership reflections: CEO joys and vulnerabilities, early mistakes, and forecasting in fog
Huang closes with leadership lessons: the creative joy of aligning vision/strategy/execution, and the heavy responsibility when things go wrong. He recounts early NVIDIA technical missteps that forced strategic learning, cites mobile as a strategic mistake, and explains his forecasting method: observe signals, reason from first principles, build a mental model, then manage optionality and opportunity cost.
- Best part: shaping vision + strategy + execution with exceptional teams
- Hard part: responsibility, fear, and public vulnerability during downturns
- Early product failure taught strategy and maneuvering; mobile push seen as mistake
- Forecasting approach: observe → first principles → mental model → work backward while preserving options
