Lex Fridman Podcast

Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70

Jim Keller is a legendary microprocessor engineer who has worked at AMD, Apple, Tesla, and now Intel. He is known for his work on the AMD K7, K8, K12 and Zen microarchitectures and the Apple A4 and A5 processors, and for co-authoring the specifications for the x86-64 instruction set and the HyperTransport interconnect.

This episode is presented by Cash App. Download it & use code "LexPodcast":
Cash App (App Store): https://apple.co/2sPrUHe
Cash App (Google Play): https://bit.ly/2MlvP5w

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
2:12 - Difference between a computer and a human brain
3:43 - Computer abstraction layers and parallelism
17:53 - If you run a program multiple times, do you always get the same answer?
20:43 - Building computers and teams of people
22:41 - Start from scratch every 5 years
30:05 - Moore's law is not dead
55:47 - Is superintelligence the next layer of abstraction?
1:00:02 - Is the universe a computer?
1:03:00 - Ray Kurzweil and exponential improvement in technology
1:04:33 - Elon Musk and Tesla Autopilot
1:20:51 - Lessons from working with Elon Musk
1:28:33 - Existential threats from AI
1:32:38 - Happiness and the meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host), Jim Keller (guest)
Feb 5, 2020 · 1h 34m

CHAPTERS

  1. 2:01 – 3:44

    Brain vs. computer: memory, computation, and what we don’t know about neurons

    Lex opens by asking how computers compare to the human brain. Keller frames computers as a separation of memory and computation, while noting our limited understanding of what individual neurons compute and how cognition is represented.

    • Computers as memory + computation with global memory architectures
    • Brains as layered neuron structures with distributed representations
    • Neural networks as a partial, not fully understood, analogy
    • Why it’s hard to compare: we can’t specify a neuron’s computation
  2. 3:44 – 5:57

    From atoms to data centers: abstraction layers and how processors are built

    Keller walks through the engineering stack from materials and transistors up to software and data centers. He emphasizes that computer engineering succeeds largely because its abstraction layers are well-defined and composable.

    • Atoms/materials → transistors → gates → functional units → processors
    • Processing elements and coherent cores as modern building blocks
    • Instruction set, assembly, and high-level languages as software abstractions
    • Design starts with a target (speed, power, cost) and spans many disciplines
  3. 5:57 – 6:38

    Instruction sets and why “simple” computers don’t win

    The conversation turns to instruction set architectures (ISA) and why they remain stable over decades. Keller contrasts the ideal of simplicity with the market reality that high-performance chips require significant internal complexity.

    • What an ISA is and why most programs use a small subset of ops
    • Stability of x86/ARM over decades
    • In-order execution vs. modern performance demands
    • Why customers don’t buy “clean but slow” machines
  4. 6:38 – 11:17

    Out-of-order execution as a dependency-graph problem: finding parallelism in a “narrative”

    Keller explains modern CPUs as machines that fetch large instruction windows, build dependency graphs, and execute independent pieces in parallel. He introduces the idea of a program as a serial narrative with latent parallel structure.

    • Fetch hundreds of instructions and compute a dependency graph
    • Execute small independent subgraphs in parallel while preserving correct semantics
    • CPU parallelism is “found” inside a serial narrative
    • GPU parallelism is “given” across many independent elements (e.g., pixels)
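
The idea of "finding" parallelism in a serial instruction stream can be sketched in a few lines. This is a toy model, not how any real CPU is implemented: the `Instr` type, the register names, unit latency, and unlimited execution units are all assumptions made for illustration, and only read-after-write dependencies are tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: writes `dest`, reads every register in `srcs`."""
    dest: str
    srcs: tuple


def schedule(window):
    """Return the earliest cycle in which each instruction could execute.

    Assumes unit latency and unlimited execution units, so the only
    constraint is that an instruction runs after the producers of its inputs.
    """
    last_writer = {}  # register name -> index of the latest instruction writing it
    cycle = []
    for i, ins in enumerate(window):
        producers = [last_writer[r] for r in ins.srcs if r in last_writer]
        cycle.append(1 + max((cycle[p] for p in producers), default=-1))
        last_writer[ins.dest] = i
    return cycle


# A serial "narrative" of five instructions with two independent chains.
window = [
    Instr("a", ("x", "y")),  # a = x + y
    Instr("b", ("a", "z")),  # b = a * z   (needs a)
    Instr("c", ("x", "w")),  # c = x - w   (independent of a and b)
    Instr("d", ("c", "y")),  # d = c + y   (needs c)
    Instr("e", ("b", "d")),  # e = b + d   (needs both chains)
]
cycles = schedule(window)
print(cycles)  # [0, 1, 0, 1, 2]
```

Under these assumptions the five instructions finish in three cycles instead of five, because the `a → b` and `c → d` chains run side by side.
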
  5. 11:17 – 17:54

    Branch prediction: the ‘little supercomputer’ inside your CPU

    Keller details why branch predictability is critical to exploiting large instruction windows. He traces branch prediction from simple last-outcome heuristics to sophisticated predictors resembling pattern recognition systems, and discusses misprediction costs.

    • Modern CPUs need extremely high branch accuracy for large windows
    • Progression: last-time prediction → counters → history-based methods
    • Today’s predictors resemble neural-pattern recognition ensembles
    • Misprediction flushes the pipeline; newer ideas reuse invariant work
    • Complexity explosion: 85% accuracy takes little state; 99% takes megabits
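
The first two steps of that progression can be compared directly. The sketch below is illustrative only: the loop-shaped branch pattern, the starting states, and the function names are assumptions, and the history-based predictors mentioned above are not modeled.

```python
def last_outcome_accuracy(outcomes):
    """Predict that a branch does whatever it did last time."""
    prediction, correct = True, 0
    for taken in outcomes:
        correct += prediction == taken
        prediction = taken
    return correct / len(outcomes)


def two_bit_counter_accuracy(outcomes):
    """Predict with a 2-bit saturating counter (states 0..3, taken if >= 2)."""
    counter, correct = 2, 0  # start in the "weakly taken" state
    for taken in outcomes:
        correct += (counter >= 2) == taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# Assumed workload: a loop branch taken 9 times, then not taken once, 100 times over.
outcomes = ([True] * 9 + [False]) * 100

last_outcome = last_outcome_accuracy(outcomes)
two_bit = two_bit_counter_accuracy(outcomes)
print(last_outcome, two_bit)  # 0.801 0.9
```

On this pattern the last-outcome predictor misses twice per loop (at the exit and again on re-entry), while the counter tolerates the single exit and misses only once. Closing the remaining gap is where the extra predictor state goes.
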
  6. 17:54 – 20:39

    Determinism vs. noise: why programs can be reproducible even with chaotic execution

    Lex asks whether running the same program always yields the same answer. Keller distinguishes language-level determinism from practical nondeterminism seen in graphics/HPC, and discusses why AI workloads sometimes tolerate—or benefit from—noisy computation.

    • Correct C programs are defined to be deterministic
    • Graphics/GPU history of nondeterministic results and why HPC disliked it
    • AI can trade precision for speed; input data is already noisy
    • Developers still want deterministic switches for debugging and trust
    • Execution can be nondeterministic internally yet produce deterministic outputs
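
The summary does not say where the nondeterminism comes from. One common source, assumed here for illustration, is that floating-point addition is not associative, so a parallel reduction that sums in a different order on each run can return slightly different results. A minimal demonstration:

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, added in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])
print(left_to_right == right_to_left)  # False

# math.fsum tracks the exact sum and rounds once, so the order no longer
# matters. It stands in here for a "deterministic switch".
stable_forward = math.fsum(values)
stable_reverse = math.fsum(reversed(values))
print(stable_forward == stable_reverse)  # True
```

The order-independent version costs more work per addition, which mirrors the speed-versus-reproducibility trade-off described above.
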
  7. 20:39 – 30:41

    Building computers and building teams: people as ‘functional units’ and recipe vs. understanding

    Keller shifts from chip architecture to organizational architecture, describing how teams with different skill sets combine to build complex systems. He contrasts executing ‘recipes’ with cultivating deep understanding, using bread-making as an analogy.

    • Reframing organizations like architectures: people as functional units
    • Great teams mix intuitive leap-makers and rigorous evaluators
    • Recipe execution is efficient but brittle outside its scope
    • Deep understanding enables transfer across domains and novel problems
    • Art vs. science revisited: knowing when to unpack assumptions vs. ship
  8. 30:41 – 39:46

    Moore’s Law isn’t dead: cascades of innovations and physical headroom

    Keller argues Moore’s Law persists because it’s not one trick but thousands of innovations with their own S-curves. He discusses transistor scaling in atomic terms, manufacturing constraints, and why pessimistic ‘death’ predictions keep failing.

    • Moore’s Law as a cascade of diminishing-return curves yielding exponential progress
    • Transistors described in atoms; quantum effects appear far smaller than today’s devices
    • Manufacturing, materials, optics, chemistry, and metallurgy as key enablers
    • Expectations shape architecture: teams must be prepared for more transistors
    • Roadmap thinking: believing in scaling prevents organizations from ‘drowning’ in complexity
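
The "cascade of diminishing-return curves" claim can be checked with a little arithmetic. Every number below is an assumption chosen for illustration, not a figure from the episode: each innovation follows its own S-curve and eventually doubles capability, and a new one starts every two years.

```python
import math

SPACING_YEARS = 2.0   # assumed gap between successive innovations
RAMP_YEARS = 0.5      # assumed width of each innovation's S-curve
NUM_INNOVATIONS = 50  # assumed length of the cascade


def s_curve(x):
    """Logistic curve: near 0 well before x = 0, near 1 well after."""
    return 1.0 / (1.0 + math.exp(-x))


def capability(t):
    """Product of saturating gains; each innovation contributes up to 2x."""
    total = 1.0
    for k in range(NUM_INNOVATIONS):
        total *= 1.0 + s_curve((t - k * SPACING_YEARS) / RAMP_YEARS)
    return total


# Each curve flattens, yet the product keeps doubling every SPACING_YEARS.
doublings_per_decade = math.log2(capability(20.0) / capability(10.0))
print(round(doublings_per_decade, 3))  # 5.0
```

No single curve in this model grows for more than a couple of years, but staggering them yields five doublings per decade, which is the shape of the argument that Moore's Law is many overlapping tricks rather than one.
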
  9. 39:46 – 51:27

    When quantity becomes kind: new computation paradigms from scalar to topology

    They explore how more compute enables fundamentally different mathematical abstractions and workloads. Keller links orders-of-magnitude improvements to shifts in what algorithms become practical, including modern AI’s projection-like representations.

    • Evolution of computation: scalar → vector → matrix → topological/data-shape thinking
    • AI progression: rules → search → large-scale training and representation learning
    • Debate: is training ‘search’ or layered projections? (given vs. found search space)
    • Despite new workloads, hardware primitives remain adds/loads/multiplies
    • ‘Difference in quantity becomes difference in kind’ (ant vs. anthill analogy)
  10. 51:27 – 1:00:03

    Technology’s societal trajectory and cosmic framing: physics, philosophy, and abstraction layers

    Lex pushes to the big picture: whether technology is improving the world and what it’s ‘for.’ Keller reflects on complexity growth, machine learning’s opaque-yet-useful patterns, and the possibility that humans are building new layers of abstraction.

    • Keller’s view of individual responsibility vs. massive parallel societal forces
    • ML finds patterns without human-intuitive functional forms
    • Humans as builders of new abstraction layers, not necessarily the ‘peak’
    • Brains: possibly non-magical yet experientially mysterious (meditation, emergence)
    • Compute scale: brains ~10^18 ops; machines can exceed that, but implications are unclear
  11. 1:00:03 – 1:02:59

    Is the universe a computer? Simulation arguments, limits, and ‘weird’ physical rules

    The discussion turns to the computational nature of reality. Keller critiques simplistic ‘universe-as-computer’ ideas by pointing to the immense computation implied by quantum descriptions, while acknowledging physics has curious constraints (uncertainty, lightspeed).

    • Quantum descriptions appear computationally exorbitant at fine scales
    • Physical rules: uncertainty, locality via speed of light, and entanglement weirdness
    • Simulation framing seems to break under close inspection
    • Physics as a set of interdependent equations with unresolved corners
    • Humility: both physics and philosophy grapple with ‘why is there anything?’
  12. 1:02:59 – 1:05:12

    Kurzweil, exponential futures, and ‘computronium’: what extreme scaling could mean

    Lex asks about Kurzweil-style indefinite exponential progress. Keller explores the implications of extreme compute density and scale, argues silicon scarcity isn’t the issue, and emphasizes manufacturing/equipment and first-principles thinking about atoms-in-configuration.

    • Exponential growth at large exponents becomes hard to interpret meaningfully
    • Compute scaling isn’t only shrinking: mass/scale of material could multiply compute
    • ‘Computronium’ as a sci-fi notion of matter optimized for computation
    • Materials are abundant; cost lies in equipment and process mastery
    • Elon’s framing: decide desired atomic configuration, then engineer how to place atoms
  13. 1:05:12 – 1:20:31

    Tesla Autopilot hardware: data vs. cognition, safety, regulators, and specialization trade-offs

    Keller discusses autonomy as an engineering problem shaped by attention, data, and solvable vehicle dynamics, while Lex argues for deeper human-modeling complexity. Keller covers regulator expectations, safety scrutiny, and the hardware tension between specialization and fast-changing ML algorithms under tight cost constraints.

    • Autonomy framed as attention + ballistics/topography; Lex emphasizes human behavior complexity
    • Robots ‘maximize the givens’: leverage mapping, constraints, and updateable priors
    • Regulators focus on scenario outcomes, not prescribing specific technologies
    • Hardware design tension: accelerator specialization vs. algorithm churn (GPU vs. custom)
    • Cost constraint: make autopilot affordable enough to ship in every car
  14. 1:20:31 – 1:28:34

    Working with Elon Musk: first principles, stripping assumptions, and learning as a practice

    Keller reflects on Elon’s style: relentless first-principles decomposition and intolerance for local maxima. He describes the emotional difficulty of abandoning self-protective assumptions, and highlights reading and deliberate learning as ways to acquire leverage quickly.

    • Elon’s ‘keep digging’ approach to first principles beyond superficial layers
    • Innovation can look chaotic because dismantling assumptions is painful
    • Psychological cost: most thinking protects self-conception; real clarity is taxing
    • Craftsmanship plus bold rethinking: why the experience can be fun or brutal
    • Keller’s learning method: intensive reading to rapidly bootstrap new domains (e.g., management)
  15. 1:28:34 – 1:34:43

    AI existential risk, meaning, and embracing the unknown

    The final stretch tackles fears of superintelligence and the meaning of life. Keller is skeptical of AI as an existential enemy, emphasizes niches and meaning-making, and closes with a view that the universe ‘does what it does’—creating complexity that explores itself.

    • Keller’s skepticism about superintelligence wanting human-style conflicts
    • Existential challenge as psychological (status/identity) more than physical annihilation
    • Good/bad tension as part of competitive reality rather than a simple ‘evil’ framing
    • Personal stance: doesn’t want to relive moments; prefers optimism + anxiety of the unknown
    • Meaning as emergent process: atoms → life → exploration and understanding
