Lex Fridman Podcast

Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12

Lex Fridman and Tuomas Sandholm on how AI's mastery of poker reveals the future of game theory and strategy.

Lex Fridman (host), Tuomas Sandholm (guest)
Dec 28, 2018 · 1h 6m

CHAPTERS

  1. 0:00 – 4:10

    Heads-up No-Limit Texas Hold’em as an AI benchmark (rules + why it’s hard)

    Lex and Tuomas lay out the basic structure of heads-up no-limit Texas Hold’em and why it became a flagship benchmark for imperfect-information game solving. They highlight the sequential revelation of public cards, the private hole cards, and why hidden information makes the game fundamentally different from perfect-information games such as chess or Go.

  2. 4:10 – 7:08

    The Libratus vs. top pros match: setup, incentives, and interface

    Tuomas recounts the 20-day Rivers Casino match where Libratus played four elite specialists over 120,000 hands to reach statistical significance. They discuss how the event was structured, how pros were paid, and the practical UI details that ensured fairness.

  3. 7:08 – 10:24

    Confidence, betting markets, and the myth of “poker tells”

    They explore Tuomas’s uncertainty going into the match, including the earlier failure of his previous program, Claudico, and how betting markets underestimated the AI. The conversation then pivots to why people romanticize poker as uniquely human because of tells and psychology, and why those matter less at elite levels.

  4. 10:24 – 12:59

    Scaling to 10^161 decision points: information vs. action abstraction (and abstraction pitfalls)

    Tuomas explains why solving poker requires abstraction and why abstraction in games is trickier than in single-agent settings. They distinguish information abstraction (cards) from action abstraction (bet sizes) and describe how Libratus approached each—while warning about counterintuitive pathologies.

  5. 12:59 – 14:27

    Luck, variance, and why you need 100,000+ hands

    They discuss the balance between card strength and betting skill, emphasizing poker’s high variance—especially in no-limit all-in situations. This variance explains the need for extremely long matches to confidently determine who is better.

  6. 14:27 – 18:57

    Learning vs game-theoretic solving: why imperfect information complicates value functions

    Lex asks why Libratus didn’t use deep learning and how learning could fit into poker. Tuomas explains that in imperfect-information games, the value depends on beliefs (not just states), making AlphaGo-style evaluation functions insufficient without modeling information sets and opponent beliefs.

  7. 18:57 – 22:43

    Beliefs, Bayes, and Nash equilibrium: ‘beliefs are output, not input’

    They unpack how belief distributions arise naturally from rules (uniform dealing + card removal) and are refined via Bayes’ rule under equilibrium strategies. Tuomas uses rock–paper–scissors to illustrate how a strategy implies beliefs, reinforcing that the approach doesn’t require player-specific data.

  8. 22:43 – 25:13

    Opponent exploitation: hybridizing equilibrium play with data (and why Libratus avoided it)

    Tuomas describes work that blends safe equilibrium strategies with selective exploitation based on observed weaknesses. He also explains why Libratus didn’t use exploitation against top pros: exploiting can open you up to counter-exploitation, and elite opponents are best at punishing that.

  9. 25:13 – 30:22

    Taxonomy of games and why multiplayer/general-sum is a major leap

    The discussion broadens into a map of game classes: repeated matrix games, stochastic games, and extensive-form games. Tuomas explains why two-player zero-sum games are comparatively clean, while general-sum or multi-player games introduce equilibrium selection, coordination problems, and collusion.

  10. 30:22 – 32:45

    Collusion and coordination: why cooperative poker variants explode in difficulty

    Lex asks why allowing collaboration would make poker dramatically harder, referencing Noam Brown’s comments. Tuomas connects this to coalitional game theory and gives bridge as an example of team payoffs with restricted communication, highlighting the modeling and computational challenges.

  11. 32:45 – 37:55

    From poker to practice: startups, negotiation, and autonomous-vehicle coordination

    Tuomas outlines how game-solving ideas translate into business, finance, security, and military contexts through his startups. They also discuss a concrete autonomous-vehicle angle: pre-negotiating merging behaviors across fleets via automated, combinatorial negotiation rather than ad hoc human-style interaction.

  12. 37:55 – 43:16

    Performance-oriented research: why scaling experiments matter (and the poker backlash)

    Tuomas argues that theory and small-scale experiments can mislead; large systems and real evaluation reveal what truly works. He recounts the intense negative reaction from parts of the poker community, then reframes AI as enriching the game—humans adopt the ‘Martian’ strategies AIs discover.

  13. 43:16 – 48:31

    Automated mechanism design: impossibility results and ‘islands of possibility’

    They shift to mechanism design—designing the rules to achieve desired outcomes—and Tuomas emphasizes both promise and limits. He explains that impossibility theorems rule out achieving certain properties across broad classes, but automated design can still find workable mechanisms for specific settings, including within otherwise ‘impossible’ classes.

  14. 48:31 – 54:10

    What’s next for AI/game solving: benchmarks, real-world strategy, and interpretability

    Lex asks about the next milestone after poker; Tuomas mentions candidates such as StarCraft, Dota 2, Diplomacy, and Hanabi but notes that the community lacks a single agreed-upon benchmark. He then argues that the bigger breakthrough would be widespread real-world strategic planning, and discusses the tension between provable guarantees and human interpretability.

  15. 54:10 – 1:06:17

    AI risk, societal impact, and the threats Tuomas worries about most

    They close on AI safety and societal consequences: Tuomas is optimistic, citing kidney exchange and supply-chain efficiency as tangible benefits, and sees “value misalignment” as mostly theoretical in his applied experience. He identifies climate change and nuclear war as primary existential risks, and explains why nuclear stability may be more fragile today despite MAD logic.
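
Chapter 5 links poker's variance to the length of the match. A rough way to see why, sketched below under a normal approximation with independent hands (not the statistical method actually used to evaluate the Libratus match), is that the number of hands needed for a confidence interval of half-width `margin` around a per-hand win rate with standard deviation `sigma` grows as (z · sigma / margin)². The function name `hands_needed` and the numbers in the usage lines are hypothetical, chosen only to show the scaling.

```python
import math


def hands_needed(sigma: float, margin: float, z: float = 1.96) -> int:
    """Hands needed so a confidence interval on the mean per-hand result
    has half-width `margin`.

    Assumes independent hands and a normal approximation. `sigma` (the
    per-hand standard deviation) and `margin` share a unit, e.g. big blinds
    per hand. z = 1.96 corresponds to a two-sided 95% interval.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical input: a per-hand standard deviation of 10 big blinds.
print(hands_needed(sigma=10.0, margin=0.1))   # roughly 38,400 hands
print(hands_needed(sigma=10.0, margin=0.05))  # roughly 4x as many
```

Because the requirement scales with 1/margin², halving the edge you want to resolve roughly quadruples the number of hands, which is why small skill differences in high-variance no-limit play take a very large number of hands to separate from luck.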
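
Chapter 7 describes beliefs as arising from the rules (uniform dealing plus card removal) and then being refined by Bayes' rule under the strategies being played. A minimal sketch of that update follows, assuming a toy game in which the opponent holds a single card from a four-card deck; the deck, the function names, and the betting probabilities are all hypothetical and are not an equilibrium of any real game.

```python
from fractions import Fraction


def prior_over_hands(deck, visible_cards):
    """Uniform prior over the opponent's possible one-card hands.

    Card removal: any card we can see cannot be in the opponent's hand.
    """
    remaining = [card for card in deck if card not in visible_cards]
    p = Fraction(1, len(remaining))
    return {card: p for card in remaining}


def posterior_after_action(prior, action_prob):
    """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand).

    `action_prob[hand]` is the probability that the opponent's (assumed)
    strategy takes the observed action when holding `hand`.
    """
    joint = {hand: prior[hand] * action_prob[hand] for hand in prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the observed action has zero probability under this strategy")
    return {hand: weight / total for hand, weight in joint.items()}


# We hold the K, so the opponent holds the A, Q or J with probability 1/3 each.
prior = prior_over_hands(["A", "K", "Q", "J"], visible_cards={"K"})

# Hypothetical strategy: always bet the A, never bet the Q, bet the J half the time.
bet_prob = {"A": Fraction(1), "Q": Fraction(0), "J": Fraction(1, 2)}

posterior = posterior_after_action(prior, bet_prob)
print({hand: str(p) for hand, p in posterior.items()})
# {'A': '2/3', 'Q': '0', 'J': '1/3'}
```

The posterior is fixed entirely by the assumed strategy and the dealing rules, with no data about a particular player, which is the sense in which beliefs are an output of the strategy rather than an input to it.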
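
Chapters 7 and 9 lean on rock–paper–scissors, whose Nash equilibrium is to mix uniformly. The summary does not say how Libratus computed its strategies; as a swapped-in illustration only, the sketch below runs regret matching (the per-decision update at the core of counterfactual regret minimization, a family of algorithms widely used for poker) in self-play on rock–paper–scissors. In two-player zero-sum games the players' average strategies approach an equilibrium, here (1/3, 1/3, 1/3). The function names, the iteration count, and the starting regrets (which make player 0 open with pure rock) are arbitrary assumptions.

```python
# Payoff to the player choosing the row; order is rock, paper, scissors.
# The matrix is antisymmetric, so both players can read their own payoff
# as PAYOFF[own_action][opponent_action].
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: play uniformly


def regret_matching_selfplay(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Return both players' average strategies after `iterations` rounds."""
    regrets = [list(initial_regrets), [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        # Both players commit to this round's strategies before any update.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            mine, theirs = strategies[player], strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            action_values = [
                sum(PAYOFF[a][b] * theirs[b] for b in range(3)) for a in range(3)
            ]
            current_value = sum(mine[a] * action_values[a] for a in range(3))
            for a in range(3):
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += mine[a]
    # Convergence guarantees apply to the average strategy; the current
    # strategies can keep cycling.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = regret_matching_selfplay(50_000)
print([[round(p, 2) for p in player] for player in average])
```

Self-play on a one-shot matrix game is far simpler than solving poker, where CFR applies this kind of update at every information set of an extensive-form game; the sketch only shows the update rule itself.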
