Lex Fridman Podcast

Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12

Lex Fridman and Tuomas Sandholm on AI mastering poker and what it reveals about the future of game theory and strategy.

Lex Fridman (host) · Tuomas Sandholm (guest)
Dec 28, 2018 · 1h 6m · Watch on YouTube ↗


  1. 0:00–15:00


    1. LF

      The following is a conversation with Tuomas Sandholm. He's a professor at CMU and co-creator of Libratus, which is the first AI system to beat top human players in the game of Heads-up No-Limit Texas Hold'Em. He has published over 450 papers on game theory and machine learning, including a best paper in 2017 at NIPS, now renamed to NeurIPS, which is where I caught up with him for this conversation. His research and companies have had wide-reaching impact in the real world, especially because he and his group not only propose new ideas, but also build systems to prove that these ideas work in the real world. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter @lexfridman. And now here's my conversation with Tuomas Sandholm. Can you describe, at a high level, the game of poker, Texas Hold'Em, Heads-up Texas Hold'Em, for people who might not be familiar with, uh, this card game?

    2. TS

      Yeah, happy to. So Heads-up No-Limit Texas Hold'Em has really emerged in the AI community as a main benchmark for testing these application-independent algorithms for imperfect information game solving. And this is a game, uh, that's actually played by humans. You don't see it that much on TV or casinos, because, uh, well, for various reasons, but, uh, uh, you do see it in some expert-level casinos, and you see it in the best poker movies of all time. It's actually an event in the World Series of Poker. But mostly it's played online and typically for pretty, uh, big sums of money. And this is a game that usually only experts play. So if you re- uh, go to your home game on a Friday night, it probably is not gonna be Heads-up No-Limit Texas Hold'Em. It might be, uh, No-Limit Texas Hold'Em in some cases, but typically for a, a big group when it's not as competitive. While heads-up means it's two players, so it's really like me against you. Am I better or are you better? Much like chess or, or, or Go in that sense, but an imperfect-information game, which makes it much harder, of course. I have to deal with issues of, uh, you knowing things that I don't know, and I know things that you don't know, instead of pieces being nicely laid on the board for both of us to see.

    3. LF

      So in Texas Hold'Em, uh, there's, uh, two cards that you only see? The-

    4. TS

      Yes.

    5. LF

      ... they belong to you?

    6. TS

      Yeah.

    7. LF

      And then they gradually lay out some cards that add up overall to five cards that everybody can see.

    8. TS

      Yeah.

    9. LF

      So the imperfect nature of the information is the two cards that you're holding in your hand.

    10. TS

      Up front, yeah. So as you said, you know, you first get two cards in private each, and then you, uh, there's a betting round. Then you get three cards in public on the table, then there's a betting round. Then you get the fourth card in public on the table, there's a betting round.

    11. LF

      Mm-hmm.

    12. TS

      Then you get the five- fifth card on the table, there's a betting round. So there's a total of four betting rounds and four tranches of information revelation, if you will. The only- the first tranche is private, and then it's public from there.

    13. LF

      And this is probably by far the most popular game in AI and, um, for the general public, in terms of imperfect information. So it's probably the most popular spectator game to watch, right? Which is why it's a super exciting game to tackle. So it's, it's on the order of chess, I would say, in terms of popularity, in terms of AI setting it as the bar of what is intelligence. So in 2017, Libratus... How do you pronounce it? Libratus.

    14. TS

      Libratus.

    15. LF

      Libratus. Libratus beats...

    16. TS

      Little Latin there.

    17. LF

      A little bit of Latin. Uh, Libratus beat a few, uh, four expert human players. Can you describe that event? What you learned from it? What was it like? What was the process in general for people who have not read the papers and, uh-

    18. TS

      Yeah.

    19. LF

      ... studied?

    20. TS

      Yeah, so the event was that, uh, we invited four of the top ten players. These were, uh, specialist players in Heads-up No-Limit Texas Hold'Em, which is very important, because this game is actually quite different than the, the multiplayer version. We brought them in to Pittsburgh to play at the Rivers Casino, uh, for 20 days. We wanted to get to 120,000 hands in, because, uh, we wanted to get statistical significance. Uh, so it's a lot of hands for humans to play, even for these top pros who play fairly quickly normally. Uh, so we, uh, couldn't just have one of them play so many hands. 20 days, they were playing basically morning to evening, and, um, I raised $200,000 as a little incentive for them to play. And the setting was such that they didn't all get $50,000. Um, we actually paid them out based on how each did against the AI.

    21. LF

      Yeah.

    22. TS

      So they had an incentive to play, uh, as hard as they could, whether they're way ahead or way behind or right at the mark of beating the AI.

    23. LF

      And you don't make any money, unfortunately.

    24. TS

      Right, no.

    25. LF

      (laughs)

    26. TS

      We can't make any money, so, so originally, a couple of years earlier, I, uh, well, uh, I actually explored whether we could actually play for money, because that would be, of course, uh, interesting as well, uh, to play against the top people for money, but the Pennsylvania Gaming Board said no. So, so we, we couldn't. So this is much like an exhibit, like, uh, like for a musician or a boxer or something like that.

    27. LF

      Nevertheless, you were keeping track of the money and Li- Li- Libratus, uh, won close to $2 million, I think. Uh, so (laughs) so if that- if it was for real money, uh, if you were able to earn money, that was a quite impressive and in- inspiring achievement. Just, uh, a few details. What, what were the players looking at? I mean, uh, were they behind a computer? What, what was the interface like?

    28. TS

      Yes, uh, they were playing much like they normally do. These top players, when they play this game, they play mostly online, so they're used to playing through a UI.

    29. LF

      Yes.

    30. TS

      And they did the same thing here. So there was this layout. You could imagine, there's a table-
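The hand structure Sandholm lays out in this segment (two private hole cards each, then three tranches of public cards across four betting rounds) can be sketched in a few lines. This is an illustrative sketch only, not Libratus code; all names here are invented.

```python
# Illustrative sketch (not Libratus code) of the Heads-up No-Limit Texas
# Hold'em structure described above: each player gets two private cards,
# then three tranches of public cards arrive across four betting rounds.
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards

def deal_hand(rng):
    deck = DECK[:]
    rng.shuffle(deck)
    hole1, hole2 = deck[0:2], deck[2:4]  # private tranche: betting round 1
    flop = deck[4:7]                     # public tranche 1: betting round 2
    turn = deck[7:8]                     # public tranche 2: betting round 3
    river = deck[8:9]                    # public tranche 3: betting round 4
    return hole1, hole2, flop + turn + river

hole1, hole2, board = deal_hand(random.Random(0))
assert len(board) == 5                       # five community cards in total
assert len(set(hole1 + hole2 + board)) == 9  # no card dealt twice
```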

  2. 15:00–30:00


    1. LF

      it's unclear how well it does, but nevertheless uses deep learning? So what are your thoughts about learning methods to aid in the way that, uh, Libratus plays the game of poker?

    2. TS

      Yeah, so as you said, Libratus did not use learning methods, and, um, played very well without them. Since then, we have actually...

    3. LF

      Awesome.

    4. TS

      Actually here, we have, uh, a couple of papers on things that do use learning techniques.

    5. LF

      Excellent.

    6. TS

      Uh, so, um, and, and deep learning in particular. And th- uh, sort of the way you're talking about, where it's learning an evaluation function.

    7. LF

      Mm-hmm.

    8. TS

      But, um, in imperfect information games, unlike, let's say, in Go, or now, now also in chess and shogi-

    9. LF

      Mm-hmm.

    10. TS

      ... it's not, um, sufficient to learn an evaluation for a state, because the value of an information set, uh, depends not only on the exact state, but it also depends on both players' beliefs.

    11. LF

      Mm-hmm.

    12. TS

      Like if I have a bad hand, I'm much better off if the opponent thinks I'm, I have a good hand, and vice versa. If I have a good hand, I'm much better off if the opponent believes I have a bad hand.

    13. LF

      Mm-hmm.

    14. TS

      Uh, so the value of a state is not just a function of the cards. Uh, it depends on, uh, if you will, the path of play, but only to the extent that it's captured in the belief distributions. So, uh, so that's why it's not as simple as, uh, uh, as it is in perfect information games. And I don't wanna say it's simple there either. It's, of course-

    15. LF

      Right.

    16. TS

      ... very complicated computationally there, too. But at least conceptually it's very straightforward. There's a state, there's an evaluation function, you can try to learn it. Here, you have to do something more. Uh, uh, and, uh, w- what we do is in one of these papers we're looking at allowing... where, where we allow the opponent to actually take different strategies-

    17. LF

      Mm-hmm.

    18. TS

      ... at the leaf of the search tree, as, as... if you will. And, and that, uh, is a different way of doing it, and it doesn't assume, therefore, a particular way that the opponent plays. But it allows the opponent to choose from a, uh, set of different continuation strategies, and that forces us to not be too optimistic in the lookahead search. And that's one way you can do sound lookahead search in imperfect information games, which is very diff- difficult. And in, in, in... You were asking about DeepStack.

    19. LF

      Yes.

    20. TS

      What they did, uh, it was very different than what we do, either in Libratus or in this new work. They were gen- randomly generating various situations in the game, then they were doing lookahead from there to the end of the game as if that was the start of a different game.

    21. LF

      Mm-hmm.

    22. TS

      And then they were using deep learning to learn those, uh, values of those states, but the states were not just the physical states. They include the belief distributions.

    23. LF

      When you talk about lookahead, uh, for DeepStack or with Libratus, does it mean considering every possibility that the game can evolve into? Are we talking about this sort of exponential growth of a tree?

    24. TS

      Yes. So we, we're talking about exactly that, much like you do in alpha-beta search or Monte Carlo tree search, but with different techniques. So there's a different search algorithm, and then we have to deal with the leaves differently. So if you think about what Libratus did, we didn't have to worry about this, because we only did it at the end of the game, so we would always terminate into a real situation, and we would know what the payout is. It didn't do these depth-limited lookaheads. But now in this new paper, uh... which is called, I think, depth-limited search for imperfect information games, we can actually do sound depth-limited lookahead. So we can actually start to do the lookahead from the beginning of the game on, because that's too complicated to do for this whole long game. So when we built Libratus, we were just doing it for the end.

    25. LF

      So, and then the other side, this belief distribution, so is it explicitly modeled what kind of beliefs that the opponent might have?

    26. TS

      Yeah. Yeah, it is explicitly modeled, but it's not assumed. The beliefs are actually output, not input. Of course, the starting beliefs are input, but they just fall from the rules of the game, because we know that the dealer deals uniformly from the deck.

    27. LF

      Mm-hmm.

    28. TS

      So I know that every pair of cards that you might have is equally likely.

    29. LF

      Yes.

    30. TS

      I know that for a fact. That just follows from the rules of the game. Of course, except the two cards that I have. I know you don't have those.
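The starting beliefs Sandholm describes (every opponent hole-card pair equally likely, except pairs containing a card I hold) can be written down directly. A toy sketch, not Libratus or DeepStack code:

```python
# Toy illustration of the starting beliefs described above: before any
# betting, every opponent hole-card pair is equally likely, except pairs
# containing one of my own two cards. A sketch, not Libratus/DeepStack code.
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards

my_hole = ("As", "Kd")
remaining = [c for c in DECK if c not in my_hole]  # 50 unseen cards

opponent_hands = list(combinations(remaining, 2))  # C(50, 2) = 1225 pairs
uniform_belief = {h: 1 / len(opponent_hands) for h in opponent_hands}

assert len(opponent_hands) == 1225
assert abs(sum(uniform_belief.values()) - 1.0) < 1e-9
```

As betting proceeds, these beliefs get updated from the players' actions; the uniform prior is only the input that "falls from the rules of the game."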

  3. 30:00–45:00


    1. TS

      and in these non-zero sum games, we may lose some joint benefit w- uh, by being just simply stupid. We could actually both be better off if we did something else.

    2. LF

      Yes.

    3. TS

      And in three-player, you get other problems also, like collusion. Like maybe you and I can gang up on a third player-

    4. LF

      Mm-hmm.

    5. TS

      ... and we can do radically better by colluding. So th- uh, there are lots of issues that come up there.

    6. LF

      So, uh, Noam Brown, the, uh, student you worked with on this, has mentioned... Uh, I looked through the AMA on Reddit. He was asked the question, or both of you were asked the question, how would you make the game of poker beyond being solvable by current AI methods? And he said that there are not many (laughs)

    7. TS

      Yes.

    8. LF

      ... ways of making poker more difficult, uh-... but a collaboration or cooperation between players, uh, would make it extremely difficult. So can you provide the intuition behind why that is, uh, if you agree with that idea?

    9. TS

      Yeah, yeah. So, uh, we've d- uh, I've done a lot of work on, uh, coalitional games.

    10. LF

      Mm-hmm.

    11. TS

      And we actually have a paper here with, uh, my other student, Gabriele Farina and some other collaborators on, uh, at the, at NIPS on that.

    12. LF

      Yes.

    13. TS

      Actually just came back from the poster session where we presented this.

    14. LF

      (laughs)

    15. TS

      But, uh, so, uh, w- when you have collusion, it's a, it's a different problem.

    16. LF

      Yes.

    17. TS

      And it typically gets even harder then. Even the game representations, some of the game representations don't really allow good computation, so we actually introduced a new game representation for, for that.

    18. LF

      Is that kind of cooperation part of the model? Do you have, uh, do you have information about the fact that other players are cooperating, or is it just this chaos where nothing is known?

    19. TS

      So, so there's some- some things unknown.

    20. LF

      Can you give an example of a, a collusion type game? Or, or is it usually-

    21. TS

      So like bridge.

    22. LF

      Right.

    23. TS

      Yeah. So think about bridge. It's like when you and I are on a team, our payoffs are the same.

    24. LF

      Mm-hmm.

    25. TS

      The problem is that we can't talk. So, so when I get my cards, I can't whisper to you what my cards are.

    26. LF

      Mm-hmm.

    27. TS

      That would not be allowed.

    28. LF

      Mm-hmm.

    29. TS

      So, uh, we have to somehow coordinate our strategies ahead of time, and only ahead of time, and then there are certain signals we can talk about-

    30. LF

      Mm-hmm.
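The bridge point above (teammates share one payoff but cannot communicate during play, so they must fix a signaling convention ahead of time) can be illustrated with a deliberately tiny game. Everything here, the game, the signal, and the payoffs, is invented for illustration:

```python
# Toy sketch of the bridge point above: teammates share one payoff but
# cannot communicate during play, so they must fix a signaling convention
# before the game. The game and payoffs here are invented for illustration.
import random

def play(have_convention, rng):
    """Player 1 privately sees a bit; Player 2 must guess it.
    The team scores 1 for a correct guess, 0 otherwise."""
    secret = rng.randint(0, 1)
    if have_convention:            # pre-agreed: the signal encodes the bit
        guess = secret
    else:                          # no convention: partner guesses blindly
        guess = rng.randint(0, 1)
    return int(guess == secret)

rng = random.Random(42)
n = 10_000
with_convention = sum(play(True, rng) for _ in range(n)) / n   # exactly 1.0
without = sum(play(False, rng) for _ in range(n)) / n          # about 0.5
assert with_convention == 1.0 and 0.45 < without < 0.55
```

The gap between the two scores is the joint benefit that coordinated (colluding) players can capture, which is what makes solving such games so much harder.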

  4. 45:00–1:00:00


    1. TS

      X in class C, that is not doable with any mechanism. The good thing about automated mechanism design is that we're not really designing for a class. We're designing for specific settings at a time.

    2. LF

      Mm-hmm.

    3. TS

      So even if there's an impossibility result for the whole class, it just me- doesn't mean that all of the cases in the class are impossible. It just means that some of the cases are impossible. So we can actually carve these islands of possibility within these known impossible classes, and we've actually done that. So one, one of the famous results in mechanism design is the Myerson-Satterthwaite theorem f- by Roger Myerson and Mark Satterthwaite from 1983. It's, it's an impossibility of efficient trade under imperfect information. We show that you can, in many settings, avoid that and get efficient trade anyway.

    4. LF

      Depending on how they design the game. Okay, so-

    5. TS

      D- depending how you design the game. And of course, it's not... uh, it doesn't, in any way, any way, uh, contradict the impossibility result. The impossibility result is still there, but it just, uh, finds spots within this impossible class where, in those spots, you don't have the impossibility.

    6. LF

      Sorry if I'm going a bit philosophical, but, uh, what lessons do you draw towards, like I mentioned, politics or human interaction, designing mechanisms for things outside of just these kinds of trading or auction settings or, um, purely formal games, human interactions like a political system? Do you think it's applicable to, yeah, politics, or to business, uh, to negotiations, these kinds of things, designing rules that have certain outcomes?

    7. TS

      Yeah, yeah. I, I do think so.

    8. LF

      Have you seen that successfully done?

    9. TS

      There hasn't really... Oh, you mean mechanism design or automated mechanism?

    10. LF

      Automated mechanism design.

    11. TS

      So, so mechanism design itself has had fairly limited success so far. There are certain cases, but most of the real-world situations are actually not sound from a mechanism design perspective.

    12. LF

      Mm-hmm.

    13. TS

      Even in those cases where they've been designed by very knowledgeable mechanism design people-

    14. LF

      Mm-hmm.

    15. TS

      ... the people are typically just taking some insights from the theory and applying those insights into the real world-

    16. LF

      Yes.

    17. TS

      ... rather than applying-

    18. LF

      Directly.

    19. TS

      ... the mechanisms directly. So one famous example is the FCC spectrum auctions. So, um, I've also had a small role in that, and, uh, very good econ- excellent economists have been working on that who know game theory. Yet, the rules that are designed in practice there are such that bidding truthfully is not the best strategy. Usually in mechanism design, we try to, uh, make things easy for the participants, so that telling the truth is the best strategy.

    20. LF

      Mm-hmm.

    21. TS

      But, uh, but even in those very high-stakes auctions where you have tens of billions of dollars worth of spectrum being auctioned, truth-telling is not the best strategy.

    22. LF

      (laughs)

    23. TS

      And, and by the way, nobody knows even a single optimal bidding strategy for those auctions.

    24. LF

      What's the challenge of coming up with an optimal... Because there's a lot of players and there's imperfect information?

    25. TS

      It's not so much, um, the number of players, but the many items for sale. And these mechanisms are such that even with just two items, or one item, bidding truthfully wouldn't be the best strategy.

    26. LF

      If you look at the history of AI, it's marked by seminal events: AlphaGo beating a world-champion human Go player. I would put Libratus winning at Heads-up No-Limit Hold'Em as one such event.

    27. TS

      Thank you.

    28. LF

      Uh, and what, what do you think is the next such event, uh, whether it's in your life or broadly in the AI community, that you think might be out there, that would surprise the world?

    29. TS

      ... so that's a great question, and I don't really know the answer. In terms of game solving, uh, Heads-up No Limit Texas Hold 'Em really was the one remaining widely agreed-upon benchmark, so that was the big milestone. Now, are there other things? Yeah, certainly there are, but there, uh, there is not one that the community has kind of focused on. So what could be other things? Uh, there are groups working on StarCraft. There are th- groups working on DOTA 2. These are video games.

    30. LF

      Yes.
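Sandholm's point earlier in this segment, that truthful bidding is not the best strategy in the FCC-style auctions, can be illustrated with the simplest textbook case: a first-price sealed-bid auction with two bidders and values uniform on [0, 1]. A truthful winner pays exactly their value and earns nothing, while the classic equilibrium is to shade the bid to half the value. This is a toy sketch only; the real spectrum auctions are far more complex multi-item mechanisms.

```python
# Toy illustration: in a first-price sealed-bid auction (two bidders,
# values uniform on [0, 1]), bidding your true value earns zero profit
# even when you win; shading to half your value is the classic equilibrium.
import random

def expected_profit(bid_fn, n=100_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        my_value = rng.random()
        rival_value = rng.random()
        my_bid = bid_fn(my_value)
        rival_bid = rival_value / 2        # rival plays equilibrium shading
        if my_bid > rival_bid:             # highest bid wins, pays own bid
            total += my_value - my_bid
    return total / n

truthful = expected_profit(lambda v: v)    # winner pays own value: 0 profit
shaded = expected_profit(lambda v: v / 2)  # positive expected profit
assert truthful == 0.0 and shaded > 0.15
```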

  5. 1:00:00–1:06:01


    1. LF

      you know, nuclear weapons (laughs) have been here, it's an obvious problem that's just been sitting there, so how do you think about... what is the mechanism design there that just made everything seem stable and are you still extremely worried?

    2. TS

      I am still extremely worried. So, uh, y- yeah, you probably know the simple game theory of MAD. So, so, so, uh, this was, uh, mutually assured destruction-

    3. LF

      Mm-hmm.

    4. TS

      ... and it's li- it doesn't require any computation, with small matrices you can actually convince yourself that the game is such that nobody wants to initiate.

    5. LF

      Mm-hmm.

    6. TS

      Yeah, that's a very coarse-grained analysis, and it really works in a situation where you have two super powers or small number of super powers. Now things are very different. You have, uh, smaller nukes so the threshold o- of initiating is smaller and you have smaller countries and non-n- non-nation actors who may get h- nukes and so on, so it's, I, I think it's riskier now than it was maybe ever before.

    7. LF

      And what idea, application of AI, you've talked about it a little bit, but what is the most exciting to you right now? I mean, you're here at NIPS, NeurIPS now. You, you ha- have a few excellent pieces of work. But what are you thinking into the future? With the several companies you're doing, what's the most exciting thing, or one of the exciting things?

    8. TS

      The number-one thing for me right now is coming up with these scalable techniques for game solving and applying them in the real world. Uh, I'm still very interested in market design as well, and we're doing that in Optimized Markets. But what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot: getting that technology out there and seeing, as you are in the trenches doing applications, what actually needs to be filled, what technology gaps still need to be, be filled. It's so hard to just put your feet on the table and imagine what needs to be done. But when you're actually doing real applications, the applications tell you what needs to be done. And I really enjoy that interaction.

    9. LF

      Is it a challenging process to apply some of the state-of-the-art techniques you're working on and have the, the various, uh, players in industry or the military or people who could really benefit from it actually use it? What's that process like? You know, in autonomous vehicles, we work with automotive companies, and, uh, in, in many ways they're a little bit old-fashioned. It's difficult. They really want to use this technology. It clearly will have a significant benefit, but the systems aren't quite in place to easily have it integrated, in terms of data, in terms of compute, in terms of all these kinds of things. So is that one of the bigger challenges, uh, that you're facing? Uh, and how do you tackle that challenge?

    10. TS

      Yeah, I think that's always a challenge. Uh, that, that kind of slowness and inertia really of let's do things the way we've always done it. You just have to find the internal champions at the customer who understand that, hey, things can't be the same way in the future, otherwise bad things are going to happen.

    11. LF

      Right.

    12. TS

      And in autonomous vehicles, it's actually very interesting that the car makers are doing that, and they're very traditional. But at the same time, you have tech companies who have nothing to do with cars or transportation-

    13. LF

      Mm-hmm.

    14. TS

      ... like Google and Baidu really pushing on autonomous cars. I, I find that fascinating.

    15. LF

      Clearly, you're super excited about actually these ideas having an impact in the world. Uh, in terms of the technology, in terms of ideas and research, are there directions that you're also excited about, whether that's on the, some of the approaches you talked about for the imperfect information games, whether it's applying deep learning to some of these problems? Is there something that you're excited in, um, in the research side of things?

    16. TS

      Yeah, yeah. Lot, lots of different things in, in the game solving.

    17. LF

      Mm-hmm.

    18. TS

      Uh, sol- solving even bigger games, uh, games where you have, um, more hidden actions, where the player actions are hidden as well. Uh, poker is a game where really the chance actions are hidden, or some of them are hidden, but the player actions are public. Uh, multiplayer games of various sorts, collusion, uh, opponent exploitation, all, all... and, and even longer games. Some games that basically go forever, but they're not repeated.

    19. LF

      Mm-hmm.

    20. TS

      So think extensive-form games that go forever. What, what would that even look like? How do you represent that? How do you solve that?

    21. LF

      What's an example of a game like that? What... Uh, is this some of the stochastic games that you mentioned?

    22. TS

      Let's say business strategy. So it's a- and not just modeling like a particular interaction, but thinking about the business from here to eternity.

    23. LF

      Mm-hmm.

    24. TS

      Or, uh-

    25. LF

      I see.

    26. TS

      ... or let's, let's say, um, military strategy. So it's not like war is gonna go away. How, uh, how do you dare think about military strategy that's gonna go forever? Uh, how do you even model that? How do you know whether a move was good-

    27. LF

      Mm-hmm.

    28. TS

      ... that you, you... somebody made, and, and, and so on? Uh, so that, that's kind of one direction. I'm also very interested in learning much more scalable techniques for integer programming. So we had an ICML paper this summer on that, the first automated algorithm configuration paper that has theoretical generalization guarantees. So if I see these many training examples and I tune my algorithm in this way-

    29. LF

      Mm-hmm.

    30. TS

      ... it's going to have good performance on the real distribution, which I've not seen. So w- which is kind of interesting that, you know, algorithm configuration has been going on now for at least 17 years seriously-
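The MAD argument from earlier in this segment, that small matrices suffice to convince yourself nobody wants to initiate, can be checked mechanically. The payoff numbers below are invented toy values, not a real model:

```python
# A small-matrix sanity check of the MAD argument: with assured
# second-strike retaliation, neither superpower gains by initiating,
# so (refrain, refrain) is an equilibrium. Payoffs are invented toy values.
REFRAIN, STRIKE = 0, 1

# payoff[row_action][col_action] = (row player's payoff, column player's)
payoff = [
    [(0, 0), (-100, -100)],        # I refrain; opponent refrains / strikes
    [(-100, -100), (-100, -100)],  # I strike: retaliation destroys both
]

def is_equilibrium(r, c):
    """Neither player can gain by unilaterally switching action."""
    row_ok = all(payoff[r][c][0] >= payoff[a][c][0] for a in (REFRAIN, STRIKE))
    col_ok = all(payoff[r][c][1] >= payoff[r][a][1] for a in (REFRAIN, STRIKE))
    return row_ok and col_ok

assert is_equilibrium(REFRAIN, REFRAIN)  # nobody wants to initiate
assert payoff[STRIKE][REFRAIN][0] < payoff[REFRAIN][REFRAIN][0]
```

Sandholm's worry is precisely that this coarse two-player analysis stops applying once you have smaller nukes, more countries, and non-nation actors.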

Episode duration: 1:06:17


Transcript of episode b7bStIQovcY
