David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86


Lex Fridman Podcast | Apr 3, 2020 | 1h 48m

Lex Fridman (host), David Silver (guest)

David Silver’s background and early fascination with programming, games, and AI

Why Go is uniquely challenging for AI compared to chess and other games

Reinforcement learning fundamentals and the rise of deep reinforcement learning

The evolution from Monte Carlo tree search and MoGo to AlphaGo

AlphaGo Zero and AlphaZero: self-play, removing human data, and generalizing across games

MuZero: learning dynamics and planning without knowing the rules

Creativity, intuition, and the broader implications of self-play for AI and humanity


David Silver on AlphaGo, self-play, and the path to intelligence

Lex Fridman and David Silver trace Silver’s journey from childhood programming to leading DeepMind’s landmark work on AlphaGo, AlphaZero, and MuZero. They explain why Go was such a hard AI challenge, how deep reinforcement learning and self-play enabled systems to exceed human world champions, and what these results suggest about intuition, creativity, and general intelligence. Silver details the transition from hand-crafted knowledge and search to learning-based systems that discover their own strategies, and how removing human priors made the algorithms both stronger and more general. The conversation closes with reflections on future real-world applications, the nature of goals and reward in AI, and layered views on the “meaning” of intelligence and life.

Key Takeaways

Go forced AI beyond brute-force search toward learned intuition.

Unlike chess, Go resists simple material evaluation and has an enormous search space, requiring systems that can learn to “understand” positions and make intuitive judgments rather than rely solely on deep combinatorial search.


Reinforcement learning provides a clean problem definition for intelligence.

Silver views intelligence as an agent interacting with an environment to maximize reward over time, making reinforcement learning a unifying framework to formalize and study many aspects of intelligent behavior.
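The framing above (an agent acting in an environment to maximize reward over time) can be made concrete with a small sketch. Everything named here is an illustrative assumption and not something from the episode: the `CoinFlipEnv` toy environment, the discount factor `gamma`, and the fixed "always guess heads" policy are all hypothetical.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a biased coin, reward 1.0 per correct guess."""

    def __init__(self, p_heads=0.8, horizon=100, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def step(self, action):
        # action: 0 = guess tails, 1 = guess heads
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return reward, done


def run_episode(env, policy):
    """The agent-environment loop: act, observe a reward, repeat until the episode ends."""
    rewards = []
    done = False
    while not done:
        action = policy()
        reward, done = env.step(action)
        rewards.append(reward)
    return rewards


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t, accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    rewards = run_episode(CoinFlipEnv(), policy=lambda: 1)
    print(discounted_return(rewards))
```

A learning agent would adjust its policy to increase this return; the sketch only shows the quantity being maximized, not a learning rule.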


Deep neural networks unlocked scalable representations for RL agents.

Using deep networks to approximate policies, value functions, and models allowed systems like AlphaGo to handle raw board states and complex patterns, overcoming the representational limits of older, hand-designed methods.


Self-play enables systems to surpass human knowledge, not just imitate it.

AlphaGo Zero and AlphaZero learned entirely from games against themselves, starting from random play and iteratively correcting their own errors, ultimately discovering strategies and opening patterns humans had never found.
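As a rough illustration of the self-play idea described above, the sketch below learns a tiny game purely from games against itself. It is not AlphaGo Zero's method, which combines deep networks with tree search. This is a tabular toy for single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins), and the function names, hyperparameters, and update rule are assumptions chosen for brevity.

```python
import random


def legal_moves(pile):
    return [a for a in (1, 2, 3) if a <= pile]


def train_self_play(episodes=20000, max_pile=12, alpha=0.5, epsilon=0.3, seed=0):
    """Learn win estimates for Nim with no human examples, only self-play."""
    rng = random.Random(seed)
    q = {}  # q[(pile, move)]: estimated win probability for the player to move

    def value(pile):
        # Best estimated outcome for whoever moves at this pile (0.5 = unknown).
        return max(q.get((pile, a), 0.5) for a in legal_moves(pile))

    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore
            else:
                # Greedy with random tie-breaking, so early play is effectively random.
                move = max(moves, key=lambda a: (q.get((pile, a), 0.5), rng.random()))
            remaining = pile - move
            # Taking the last stone wins; otherwise the opponent's gain is our loss.
            target = 1.0 if remaining == 0 else 1.0 - value(remaining)
            old = q.get((pile, move), 0.5)
            q[(pile, move)] = old + alpha * (target - old)
            pile = remaining  # the same table now plays the other side
    return q


def best_move(q, pile):
    return max(legal_moves(pile), key=lambda a: q.get((pile, a), 0.5))


if __name__ == "__main__":
    q = train_self_play()
    print({pile: best_move(q, pile) for pile in (5, 6, 7)})
```

Both sides share one table, so every mistake one side makes becomes a learning signal for the other. In this toy, that is enough for the learner to find the known winning rule (leave the opponent a multiple of four stones) without being told it.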


Removing human priors can make algorithms both stronger and more general.

AlphaZero uses almost no game-specific knowledge yet achieves superhuman performance in Go, chess, and shogi with the same code, illustrating that minimal, simple principles can yield powerful, widely applicable systems.


Learning implicit models of the world enables planning without explicit rules.

MuZero learns to predict only the aspects of the environment needed for planning, achieving state-of-the-art performance in Atari and board games without ever being given their formal rules.


AI systems can exhibit genuine creativity in well-defined domains.

Through self-play, systems like AlphaGo produced novel, high-level moves (e. ...


Notable Quotes

It seemed to me that the only step of major significance was to try and recreate something akin to human intelligence.

David Silver

In order to crack Go, we would need to get something akin to human intuition.

David Silver

If you're not learning, what else are you doing?

David Silver

When you’ve bestowed in them the ability to judge better than you can, then trust the system to do so.

David Silver

Many abilities, like intuition and creativity, that we previously thought were in the domain only of the human mind are actually accessible to machine intelligence as well.

David Silver (as quoted by Lex Fridman at the end)

Questions Answered in This Episode

To what extent can the self-play paradigm be transferred safely from games to high-stakes real-world domains like medicine or autonomous driving?



Are there fundamental limits to performance gains from self-play, or will more compute and better architectures keep pushing systems far beyond current human and machine levels?


How should we think about “goals” and reward functions for AI in open-ended environments where human values are complex and sometimes conflicting?


What kinds of new scientific or mathematical discoveries might arise when self-play systems are applied directly to real scientific domains rather than abstract games?


If AI can clearly outperform humans in intuition and creativity within constrained domains, how might that change how we educate, train, and define expertise for future generations?


Transcript Preview

Lex Fridman

The following is a conversation with David Silver, who leads the Reinforcement Learning Research Group at DeepMind, and was the lead researcher on AlphaGo, AlphaZero, and co-led the AlphaStar and MuZero efforts, and a lot of important work in reinforcement learning in general. I believe AlphaZero is one of the most important accomplishments in the history of artificial intelligence, and David is one of the key humans who brought AlphaZero to life, together with a lot of other great researchers at DeepMind. He's humble, kind, and brilliant. We were both jet-lagged but didn't care, and made it happen. It was a pleasure and truly an honor to talk with David.

This conversation was recorded before the outbreak of the pandemic. For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way. Stay strong. We're in this together. We'll beat this thing.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now, and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. Quick summary of the ads. Two sponsors: MasterClass and Cash App. Please consider supporting the podcast by signing up to MasterClass at masterclass.com/lex and downloading Cash App and using code LEXPODCAST.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency, in the context of the history of money, is fascinating. I recommend The Ascent of Money as a great book on this history.
Debits and credits on ledgers started around 30,000 years ago, the U.S. dollar, created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might, redefine the nature of money. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. This show is sponsored by MasterClass. Sign up at masterclass.com/lex to get a discount and to support this podcast. In fact, for a limited time now, if you sign up for an all-access pass for a year, you get to get another all-access pass to share with a friend. Buy one, get one free. When I first heard about MasterClass, I thought it was too good to be true. For $180 a year, you get an all-access pass to watch courses from, to list some of my favorites, Chris Hadfield on space exploration, Neil deGrasse Tyson on scientific thinking and communication, Will Wright, the creator of SimCity and Sims, on game design, Jane Goodall on conservation, Carlos Santana on guitar. His song Europa could be the most beautiful guitar song ever written. Garry Kasparov on chess, Daniel Negreanu on poker, and many, many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. For me, the key is to not be overwhelmed by the abundance of choice. Pick three courses you want to complete, watch each of them all the way through. It's not that long, but it's an experience that will stick with you for a long time, I promise. It's easily worth the money. You can watch it on basically any device. Once again, sign up on masterclass.com/lex to get a discount and to support this podcast. 
And now here's my conversation with David Silver. What was the first program you've ever written, and what programming language? Do you remember?
