Lex Fridman Podcast

David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86

David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero, co-lead on AlphaStar and MuZero, and has done a lot of other important work in reinforcement learning.

Support this podcast by signing up with these sponsors:
- MasterClass: https://masterclass.com/lex
- Cash App - use code "LexPodcast" and download:
  - Cash App (App Store): https://apple.co/2sPrUHe
  - Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:
Reinforcement learning (book): https://amzn.to/2Jwp5zG

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
4:09 - First program
11:11 - AlphaGo
21:42 - Rules of the game of Go
25:37 - Reinforcement learning: personal journey
30:15 - What is reinforcement learning?
43:51 - AlphaGo (continued)
53:40 - Supervised learning and self-play in AlphaGo
1:06:12 - Lee Sedol's retirement from Go
1:08:57 - Garry Kasparov
1:14:10 - AlphaZero and self-play
1:31:29 - Creativity in AlphaZero
1:35:21 - AlphaZero applications
1:37:59 - Reward functions
1:40:51 - Meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host) · David Silver (guest)
Apr 2, 2020 · 1h 48m

At a glance

WHAT IT’S REALLY ABOUT

David Silver on AlphaGo, self-play, and the path to intelligence

Lex Fridman and David Silver trace Silver’s journey from childhood programming to leading DeepMind’s landmark work on AlphaGo, AlphaZero, and MuZero. They explain why Go was such a hard AI challenge, how deep reinforcement learning and self-play enabled systems to exceed human world champions, and what these results suggest about intuition, creativity, and general intelligence. Silver details the transition from hand-crafted knowledge and search to learning-based systems that discover their own strategies, and how removing human priors made the algorithms both stronger and more general. The conversation closes with reflections on future real-world applications, the nature of goals and reward in AI, and layered views on the “meaning” of intelligence and life.

IDEAS WORTH REMEMBERING

5 ideas

Go forced AI beyond brute-force search toward learned intuition.

Unlike chess, Go resists simple material evaluation and has an enormous search space, requiring systems that can learn to “understand” positions and make intuitive judgments rather than rely solely on deep combinatorial search.
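The scale gap behind this point is easy to quantify with rough, commonly cited textbook figures (illustrative assumptions, not exact values: branching factor ~35 over ~80 plies for chess, ~250 over ~150 plies for Go):

```python
import math

# Rough, commonly cited estimates (illustrative only):
# b = average branching factor, d = typical game length in plies.
chess = {"b": 35, "d": 80}
go = {"b": 250, "d": 150}

def tree_size_order(b, d):
    """Order of magnitude of the full game tree, b**d, as a power of ten."""
    return round(d * math.log10(b))

print(f"chess ~ 10^{tree_size_order(**chess)}")
print(f"go    ~ 10^{tree_size_order(**go)}")
```

Even under these crude assumptions, Go's game tree is hundreds of orders of magnitude larger than chess's, which is why brute-force search alone could never suffice.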

Reinforcement learning provides a clean problem definition for intelligence.

Silver views intelligence as an agent interacting with an environment to maximize reward over time, making reinforcement learning a unifying framework to formalize and study many aspects of intelligent behavior.
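That agent–environment framing can be sketched in a few lines. This is a toy stand-in, not Silver's formulation: a two-armed bandit with made-up payoff probabilities and a simple epsilon-greedy value learner.

```python
import random

class Environment:
    """Toy two-armed bandit: arm 1 pays off more often than arm 0 (made-up odds)."""
    def step(self, action):
        return 1.0 if random.random() < (0.3 if action == 0 else 0.7) else 0.0

class Agent:
    """Keeps a running value estimate per action; acts epsilon-greedily."""
    def __init__(self, n_actions=2, epsilon=0.1, lr=0.1):
        self.q = [0.0] * n_actions
        self.epsilon, self.lr = epsilon, lr

    def act(self):
        if random.random() < self.epsilon:          # explore occasionally
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def learn(self, action, reward):
        # Move the estimate a fraction of the way toward the observed reward.
        self.q[action] += self.lr * (reward - self.q[action])

random.seed(0)
env, agent = Environment(), Agent()
for _ in range(2000):
    a = agent.act()          # agent acts on the environment...
    r = env.step(a)          # ...environment returns a reward...
    agent.learn(a, r)        # ...agent updates to maximize future reward
# After training, the agent's value estimates should favor the better arm.
```

The whole framework lives in that three-line loop: act, observe reward, update — everything else (deep networks, search, self-play) is machinery for doing those steps well in harder environments.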

Deep neural networks unlocked scalable representations for RL agents.

Using deep networks to approximate policies, value functions, and models allowed systems like AlphaGo to handle raw board states and complex patterns, overcoming the representational limits of older, hand-designed methods.
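The "policy and value as learned functions" idea can be illustrated with a toy two-headed network — random, untrained weights on a hypothetical 9x9 board, whereas the real systems use deep convolutional/residual networks trained on millions of positions:

```python
import numpy as np

rng = np.random.default_rng(0)
BOARD = 9 * 9  # hypothetical 9x9 board, flattened to a vector

# Hypothetical random weights; in practice these are learned from play.
W_hidden = rng.normal(0, 0.1, (BOARD, 64))
W_policy = rng.normal(0, 0.1, (64, BOARD))  # policy head: one logit per move
W_value = rng.normal(0, 0.1, (64, 1))       # value head: scalar evaluation

def forward(board):
    """Map a raw board encoding to (move probabilities, position value)."""
    h = np.tanh(board @ W_hidden)            # shared representation
    logits = h @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                   # softmax -> move probabilities
    value = np.tanh(h @ W_value)[0]          # position evaluation in [-1, 1]
    return policy, value

board = rng.choice([-1.0, 0.0, 1.0], BOARD)  # -1/0/+1 stone encoding
policy, value = forward(board)
```

The key point is that one network consumes the raw board state directly and emits both an intuition-like move distribution (policy) and a judgment of who is winning (value) — the two quantities older hand-designed evaluation functions struggled to capture.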

Self-play enables systems to surpass human knowledge, not just imitate it.

AlphaGo Zero and AlphaZero learned entirely from games against themselves, starting from random play and iteratively correcting their own errors, ultimately discovering strategies and opening patterns humans had never found.
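The self-play principle — start from random play, learn from your own games, and bootstrap toward stronger play — can be shown at toy scale. The sketch below is my illustration, nothing like AlphaZero's algorithm or scale: a tabular learner plays the game of Nim against itself (take 1–3 stones; whoever takes the last stone wins) and discovers winning moves from self-play alone.

```python
import random

random.seed(1)
Q = {}  # Q[(pile, action)] -> estimated value for the player to move

def best_action(pile):
    moves = range(1, min(3, pile) + 1)
    return max(moves, key=lambda a: Q.get((pile, a), 0.0))

def episode(epsilon=0.2, lr=0.5):
    """One game of the current policy against itself, then learn from it."""
    pile, history = 15, []
    while pile > 0:
        moves = list(range(1, min(3, pile) + 1))
        a = random.choice(moves) if random.random() < epsilon else best_action(pile)
        history.append((pile, a))
        pile -= a
    # The player who took the last stone wins (+1); walking back through
    # the game, rewards alternate sign because both "players" share Q.
    reward = 1.0
    for pile_before, a in reversed(history):
        key = (pile_before, a)
        Q[key] = Q.get(key, 0.0) + lr * (reward - Q.get(key, 0.0))
        reward = -reward

for _ in range(20_000):
    episode()
# After self-play, the agent prefers moves it discovered to be winning,
# e.g. taking all 3 stones from a pile of 3.
```

No human games or domain knowledge go in: the learner's only teacher is its own (initially random) play, and its own mistakes are what it iteratively corrects — the same loop, scaled up enormously and combined with deep networks and search, is the core of AlphaGo Zero and AlphaZero.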

Removing human priors can make algorithms both stronger and more general.

AlphaZero uses almost no game-specific knowledge yet achieves superhuman performance in Go, chess, and shogi with the same code, illustrating that minimal, simple principles can yield powerful, widely applicable systems.

WORDS WORTH SAVING

5 quotes

It seemed to me that the only step of major significance was to try and recreate something akin to human intelligence.

David Silver

In order to crack Go, we would need to get something akin to human intuition.

David Silver

If you're not learning, what else are you doing?

David Silver

When you’ve bestowed in them the ability to judge better than you can, then trust the system to do so.

David Silver

Many abilities, like intuition and creativity, that we previously thought were in the domain only of the human mind are actually accessible to machine intelligence as well.

David Silver (as quoted by Lex Fridman at the end)

- David Silver’s background and early fascination with programming, games, and AI
- Why Go is uniquely challenging for AI compared to chess and other games
- Reinforcement learning fundamentals and the rise of deep reinforcement learning
- The evolution from Monte Carlo tree search and MoGo to AlphaGo
- AlphaGo Zero and AlphaZero: self-play, removing human data, and generalizing across games
- MuZero: learning dynamics and planning without knowing the rules
- Creativity, intuition, and the broader implications of self-play for AI and humanity

High quality AI-generated summary created from speaker-labeled transcript.
