Lex Fridman Podcast

Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12

Lex Fridman and Tuomas Sandholm on AI mastering poker and what it reveals about the future of game theory and strategy.

Lex Fridman (host) · Tuomas Sandholm (guest)
Dec 28, 2018 · 1h 6m · Watch on YouTube ↗


  1. 0:00–15:00


    1. LF

      The following is a conversation with Tuomas Sandholm. He's a professor at CMU and co-creator of Libratus, which is the first AI system to beat top human players in the game of Heads-up No-Limit Texas Hold'Em. He has published over 450 papers on game theory and machine learning, including a best paper in 2017 at NIPS, now renamed to NeurIPS, which is where I caught up with him for this conversation. His research and companies have had wide-reaching impact in the real world, especially because he and his group not only propose new ideas, but also build systems to prove that these ideas work in the real world. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter @lexfridman. And now here's my conversation with Tuomas Sandholm. Can you describe, at a high level, the game of poker, Texas Hold'Em, Heads-up Texas Hold'Em, for people who might not be familiar with, uh, this card game?

    2. TS

      Yeah, happy to. So Heads-up No-Limit Texas Hold'Em has really emerged in the AI community as a main benchmark for testing these application-independent algorithms for imperfect information game solving. And this is a game, uh, that's actually played by humans. You don't see it that much on TV or casinos, because, uh, well, for various reasons, but, uh, uh, you do see it in some expert-level casinos, and you see it in the best poker movies of all time. It's actually an event in the World Series of Poker. But mostly it's played online and typically for pretty, uh, big sums of money. And this is a game that usually only experts play. So if you re- uh, go to your home game on a Friday night, it probably is not gonna be Heads-up No-Limit Texas Hold'Em. It might be, uh, No-Limit Texas Hold'Em in some cases, but typically for a, a big group when it's not as competitive. While heads-up means it's two players, so it's really like me against you. Am I better or are you better? Much like chess or, or, or Go in that sense, but an imperfect-information game, which makes it much harder, of course. I have to deal with issues of, uh, you knowing things that I don't know, and I know things that you don't know, instead of pieces being nicely laid on the board for both of us to see.

    3. LF

      So in Texas Hold'Em, uh, there's, uh, two cards that you only see? The-

    4. TS

      Yes.

    5. LF

      ... they belong to you?

    6. TS

      Yeah.

    7. LF

      And then they gradually lay out some cards that add up overall to five cards that everybody can see.

    8. TS

      Yeah.

    9. LF

      So the imperfect nature of the information is the two cards that you're holding in your hand.

    10. TS

      Up front, yeah. So as you said, you know, you first get two cards in private each, and then you, uh, there's a betting round. Then you get three cards in public on the table, then there's a betting round. Then you get the fourth card in public on the table, there's a betting round.

    11. LF

      Mm-hmm.

    12. TS

      Then you get the five- fifth card on the table, there's a betting round. So there's a total of four betting rounds and four tranches of information revelation, if you will. The only- the first tranche is private, and then it's public from there.

    13. LF

      And this is probably by far the most popular game in AI and, um, for the general public, in terms of imperfect information. So it's probably the most popular spectator game to watch, right? Which is why it's a super exciting game to tackle. So it's, it's on the order of chess, I would say, in terms of popularity, in terms of AI setting it as the bar of what is intelligence. So in 2017, Libratus... How do you pronounce it? Libratus.

    14. TS

      Libratus.

    15. LF

      Libratus. Libratus beats...

    16. TS

      Little Latin there.

    17. LF

      A little bit of Latin. Uh, Libratus beat a few, uh, four expert human players. Can you describe that event? What you learned from it? What was it like? What was the process in general for people who have not read the papers and, uh-

    18. TS

      Yeah.

    19. LF

      ... studied?

    20. TS

      Yeah, so the event was that, uh, we invited four of the top ten players. These were, uh, specialist players in Heads-up No-Limit Texas Hold'Em, which is very important, because this game is actually quite different than the, the multiplayer version. We brought them in to Pittsburgh to play at the Rivers Casino, uh, for 20 days. We wanted to get to 120,000 hands in, because, uh, we wanted to get statistical significance. Uh, so it's a lot of hands for humans to play, even for these top pros who play fairly quickly normally. Uh, so we, uh, couldn't just have one of them play so many hands. 20 days, they were playing basically morning to evening, and, um, I raised $200,000 as a little incentive for them to play. And the setting was such that they didn't all get $50,000. Um, we actually paid them out based on how each did against the AI.

    21. LF

      Yeah.

    22. TS

      So they had an incentive to play, uh, as hard as they could, whether they're way ahead or way behind or right at the mark of beating the AI.

    23. LF

      And you don't make any money, unfortunately.

    24. TS

      Right, no.

    25. LF

      (laughs)

    26. TS

      We can't make any money, so, so originally, a couple of years earlier, I, uh, well, uh, I actually explored whether we could actually play for money, because that would be, of course, uh, interesting as well, uh, to play against the top people for money, but the Pennsylvania Gaming Board said no. So, so we, we couldn't. So this is much like an exhibit, like, uh, like for a musician or a boxer or something like that.

    27. LF

      Nevertheless, you were keeping track of the money and Li- Li- Libratus, uh, won close to $2 million, I think. Uh, so (laughs) so if that- if it was for real money, uh, if you were able to earn money, that was a quite impressive and in- inspiring achievement. Just, uh, a few details. What, what were the players looking at? I mean, uh, were they behind a computer? What, what was the interface like?

    28. TS

      Yes, uh, they were playing much like they normally do. These top players, when they play this game, they play mostly online, so they're used to playing through a UI.

    29. LF

      Yes.

    30. TS

      And they did the same thing here. So there was this layout. You could imagine, there's a table-
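The hand structure Sandholm lays out in this segment (two private hole cards each, then three tranches of public cards across four betting rounds) can be sketched in a few lines. This is an illustrative sketch only, not Libratus code; all names here are invented.

```python
# Illustrative sketch (not Libratus code) of the Heads-up No-Limit Texas
# Hold'em structure described above: each player gets two private cards,
# then three tranches of public cards arrive across four betting rounds.
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards

def deal_hand(rng):
    deck = DECK[:]
    rng.shuffle(deck)
    hole1, hole2 = deck[0:2], deck[2:4]  # private tranche: betting round 1
    flop = deck[4:7]                     # public tranche 1: betting round 2
    turn = deck[7:8]                     # public tranche 2: betting round 3
    river = deck[8:9]                    # public tranche 3: betting round 4
    return hole1, hole2, flop + turn + river

hole1, hole2, board = deal_hand(random.Random(0))
assert len(board) == 5                       # five community cards in total
assert len(set(hole1 + hole2 + board)) == 9  # no card dealt twice
```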

  2. 15:00–30:00


    1. LF

      it's unclear how well it does, but nevertheless uses deep learning? So what are your thoughts about learning methods to aid in the way that, uh, Libratus plays the game of poker?

    2. TS

      Yeah, so as you said, Libratus did not use learning methods, and, um, played very well without them. Since then, we have actually...

    3. LF

      Awesome.

    4. TS

      Actually here, we have, uh, a couple of papers on things that do use learning techniques.

    5. LF

      Excellent.

    6. TS

      Uh, so, um, and, and deep learning in particular. And th- uh, sort of the way you're talking about, where it's learning an evaluation function.

    7. LF

      Mm-hmm.

    8. TS

      But, um, in imperfect information games, unlike, let's say, in Go, or now, now also in chess and shogi-

    9. LF

      Mm-hmm.

    10. TS

      ... it's not, um, sufficient to learn an evaluation for a state, because the value of an information set, uh, depends not only on the exact state, but it also depends on both players' beliefs.

    11. LF

      Mm-hmm.

    12. TS

      Like if I have a bad hand, I'm much better off if the opponent thinks I'm, I have a good hand, and vice versa. If I have a good hand, I'm much better off if the opponent believes I have a bad hand.

    13. LF

      Mm-hmm.

    14. TS

      Uh, so the value of a state is not just a function of the cards. Uh, it depends on, uh, if you will, the path of play, but only to the extent that it's captured in the belief distributions. So, uh, so that's why it's not as simple as, uh, uh, as it is in perfect information games. And I don't wanna say it's simple there either. It's, of course-

    15. LF

      Right.

    16. TS

      ... very complicated computationally there, too. But at least conceptually it's very straightforward. There's a state, there's an evaluation function, you can try to learn it. Here, you have to do something more. Uh, uh, and, uh, w- what we do is in one of these papers we're looking at allowing... where, where we allow the opponent to actually take different strategies-

    17. LF

      Mm-hmm.

    18. TS

      ... at the leaf of the search tree, as, as... if you will. And, and that, uh, is a different way of doing it, and it doesn't assume, therefore, a particular way that the opponent plays. But it allows the opponent to choose from a, uh, set of different continuation strategies, and that forces us to not be too optimistic in the lookahead search. And that's one way you can do sound lookahead search in imperfect information games, which is very diff- difficult. And in, in, in... You were asking about DeepStack.

    19. LF

      Yes.

    20. TS

      What they did, uh, it was very different than what we do, either in Libratus or in this new work. They were gen- randomly generating various situations in the game, then they were doing lookahead from there to the end of the game as if that was the start of a different game.

    21. LF

      Mm-hmm.

    22. TS

      And then they were using deep learning to learn those, uh, values of those states, but the states were not just the physical states. They include the belief distributions.

    23. LF

      When you talk about lookahead, uh, for DeepStack or with Libratus, does it mean considering every possibility that the game can evolve into? Are we talking about this sort of exponential growth of a tree?

    24. TS

      Yes. So we, we're talking about exactly that, much like you do in alpha-beta search or Monte Carlo tree search, but with different techniques. So there's a different search algorithm, and then we have to deal with the leaves differently. So if you think about what Libratus did, we didn't have to worry about this, because we only did it at the end of the game, so we would always terminate into a real situation, and we would know what the payout is. It didn't do these depth-limited lookaheads. But now in this new paper, uh... which is called, I think, depth-limited search for imperfect information games, we can actually do sound depth-limited lookahead. So we can actually start to do the lookahead from the beginning of the game on, because that's too complicated to do for this whole long game. So when we built Libratus, we were just doing it for the end.

    25. LF

      So, and then the other side, this belief distribution, so is it explicitly modeled what kind of beliefs that the opponent might have?

    26. TS

      Yeah. Yeah, it is explicitly modeled, but it's not assumed. The beliefs are actually output, not input. Of course, the starting beliefs are input, but they just fall from the rules of the game, because we know that the dealer deals uniformly from the deck.

    27. LF

      Mm-hmm.

    28. TS

      So I know that every pair of cards that you might have is equally likely.

    29. LF

      Yes.

    30. TS

      I know that for a fact. That just follows from the rules of the game. Of course, except the two cards that I have. I know you don't have those.
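The starting beliefs Sandholm describes (every opponent hole-card pair equally likely, except pairs containing a card I hold) can be written down directly. A toy sketch, not Libratus or DeepStack code:

```python
# Toy illustration of the starting beliefs described above: before any
# betting, every opponent hole-card pair is equally likely, except pairs
# containing one of my own two cards. A sketch, not Libratus/DeepStack code.
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards

my_hole = ("As", "Kd")
remaining = [c for c in DECK if c not in my_hole]  # 50 unseen cards

opponent_hands = list(combinations(remaining, 2))  # C(50, 2) = 1225 pairs
uniform_belief = {h: 1 / len(opponent_hands) for h in opponent_hands}

assert len(opponent_hands) == 1225
assert abs(sum(uniform_belief.values()) - 1.0) < 1e-9
```

As betting proceeds, these beliefs get updated from the players' actions; the uniform prior is only the input that "falls from the rules of the game."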

  3. 30:00–45:00


    1. TS

      and in these non-zero sum games, we may lose some joint benefit w- uh, by being just simply stupid. We could actually both be better off if we did something else.

    2. LF

      Yes.

    3. TS

      And in three-player, you get other problems also, like collusion. Like maybe you and I can gang up on a third player-

    4. LF

      Mm-hmm.

    5. TS

      ... and we can do radically better by colluding. So th- uh, there are lots of issues that come up there.

    6. LF

      So, uh, Noam Brown, the, uh, student you worked with on this, has mentioned... Uh, I looked through the AMA on Reddit. He was asked the question, or both of you were asked the question, how would you make the game of poker beyond being solvable by current AI methods? And he said that there are not many (laughs)

    7. TS

      Yes.

    8. LF

      ... ways of making poker more difficult, uh-... but a collaboration or cooperation between players, uh, would make it extremely difficult. So can you provide the intuition behind why that is, uh, if you agree with that idea?

    9. TS

      Yeah, yeah. So, uh, we've d- uh, I've done a lot of work on, uh, coalitional games.

    10. LF

      Mm-hmm.

    11. TS

      And we actually have a paper here with, uh, my other student, Gabriele Farina and some other collaborators on, uh, at the, at NIPS on that.

    12. LF

      Yes.

    13. TS

      Actually just came back from the poster session where we presented this.

    14. LF

      (laughs)

    15. TS

      But, uh, so, uh, w- when you have collusion, it's a, it's a different problem.

    16. LF

      Yes.

    17. TS

      And it typically gets even harder then. Even the game representations, some of the game representations don't really allow good computation, so we actually introduced a new game representation for, for that.

    18. LF

      Is that kind of cooperation part of the model? Do you have, uh, do you have information about the fact that other players are cooperating, or is it just this chaos where nothing is known?

    19. TS

      So, so there's some- some things unknown.

    20. LF

      Can you give an example of a, a collusion type game? Or, or is it usually-

    21. TS

      So like bridge.

    22. LF

      Right.

    23. TS

      Yeah. So think about bridge. It's like when you and I are on a team, our payoffs are the same.

    24. LF

      Mm-hmm.

    25. TS

      The problem is that we can't talk. So, so when I get my cards, I can't whisper to you what my cards are.

    26. LF

      Mm-hmm.

    27. TS

      That would not be allowed.

    28. LF

      Mm-hmm.

    29. TS

      So, uh, we have to somehow coordinate our strategies ahead of time, and only ahead of time, and then there are certain signals we can talk about-

    30. LF

      Mm-hmm.
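The bridge point above (teammates share one payoff but cannot communicate during play, so they must fix a signaling convention ahead of time) can be illustrated with a deliberately tiny game. Everything here, the game, the signal, and the payoffs, is invented for illustration:

```python
# Toy sketch of the bridge point above: teammates share one payoff but
# cannot communicate during play, so they must fix a signaling convention
# before the game. The game and payoffs here are invented for illustration.
import random

def play(have_convention, rng):
    """Player 1 privately sees a bit; Player 2 must guess it.
    The team scores 1 for a correct guess, 0 otherwise."""
    secret = rng.randint(0, 1)
    if have_convention:            # pre-agreed: the signal encodes the bit
        guess = secret
    else:                          # no convention: partner guesses blindly
        guess = rng.randint(0, 1)
    return int(guess == secret)

rng = random.Random(42)
n = 10_000
with_convention = sum(play(True, rng) for _ in range(n)) / n   # exactly 1.0
without = sum(play(False, rng) for _ in range(n)) / n          # about 0.5
assert with_convention == 1.0 and 0.45 < without < 0.55
```

The gap between the two scores is the joint benefit that coordinated (colluding) players can capture, which is what makes solving such games so much harder.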

  4. 45:00–1:00:00


    1. TS

      X in class C, that is not doable with any mechanism. The good thing about automated mechanism design is that we're not really designing for a class. We're designing for specific settings at a time.

    2. LF

      Mm-hmm.

    3. TS

      So even if there's an impossibility result for the whole class, it just me- doesn't mean that all of the cases in the class are impossible. It just means that some of the cases are impossible. So we can actually carve these islands of possibility within these known impossible classes, and we've actually done that. So one, one of the famous results in mechanism design is the Myerson-Satterthwaite theorem f- by Roger Myerson and Mark Satterthwaite from 1983. It's, it's an impossibility of efficient trade under imperfect information. We show that you can, in many settings, avoid that and get efficient trade anyway.

    4. LF

      Depending on how they design the game. Okay, so-

    5. TS

      D- depending how you design the game. And of course, it's not... uh, it doesn't, in any way, any way, uh, contradict the impossibility result. The impossibility result is still there, but it just, uh, finds spots within this impossible class where, in those spots, you don't have the impossibility.

    6. LF

      Sorry if I'm going a bit philosophical, but, uh, what lessons do you draw towards, like I mentioned, politics or human interaction, designing mechanisms for things outside of just these kinds of trading or auction settings or, um, purely formal games, human interactions like a political system? Do you think it's applicable to, yeah, politics, or to business, uh, to negotiations, these kinds of things, designing rules that have certain outcomes?

    7. TS

      Yeah, yeah. I, I do think so.

    8. LF

      Have you seen that successfully done?

    9. TS

      There hasn't really... Oh, you mean mechanism design or automated mechanism?

    10. LF

      Automated mechanism design.

    11. TS

      So, so mechanism design itself has had fairly limited success so far. There are certain cases, but most of the real-world situations are actually not sound from a mechanism design perspective.

    12. LF

      Mm-hmm.

    13. TS

      Even in those cases where they've been designed by very knowledgeable mechanism design people-

    14. LF

      Mm-hmm.

    15. TS

      ... the people are typically just taking some insights from the theory and applying those insights into the real world-

    16. LF

      Yes.

    17. TS

      ... rather than applying-

    18. LF

      Directly.

    19. TS

      ... the mechanisms directly. So one famous example is the FCC spectrum auctions. So, um, I've also had a small role in that, and, uh, very good econ- excellent economists have been working on that who know game theory. Yet, the rules that are designed in practice there are such that bidding truthfully is not the best strategy. Usually in mechanism design, we try to, uh, make things easy for the participants, so that telling the truth is the best strategy.

    20. LF

      Mm-hmm.

    21. TS

      But, uh, but even in those very high-stakes auctions where you have tens of billions of dollars worth of spectrum being auctioned, truth-telling is not the best strategy.

    22. LF

      (laughs)

    23. TS

      And, and by the way, nobody knows even a single optimal bidding strategy for those auctions.

    24. LF

      What's the challenge of coming up with an optimal... Because there's a lot of players and there's imperfect information?

    25. TS

      It's not so much, um, the number of players, but the many items for sale. And these mechanisms are such that even with just two items, or one item, bidding truthfully wouldn't be the best strategy.

    26. LF

      If you look at the history of AI, it's marked by seminal events: AlphaGo beating a world-champion human Go player. I would put Libratus winning at Heads-up No-Limit Hold'Em as one such event.

    27. TS

      Thank you.

    28. LF

      Uh, and what, what do you think is the next such event, uh, whether it's in your life or broadly in the AI community, that you think might be out there, that would surprise the world?

    29. TS

      ... so that's a great question, and I don't really know the answer. In terms of game solving, uh, Heads-up No Limit Texas Hold 'Em really was the one remaining widely agreed-upon benchmark, so that was the big milestone. Now, are there other things? Yeah, certainly there are, but there, uh, there is not one that the community has kind of focused on. So what could be other things? Uh, there are groups working on StarCraft. There are th- groups working on DOTA 2. These are video games.

    30. LF

      Yes.
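Sandholm's point earlier in this segment, that truthful bidding is not the best strategy in the FCC-style auctions, can be illustrated with the simplest textbook case: a first-price sealed-bid auction with two bidders and values uniform on [0, 1]. A truthful winner pays exactly their value and earns nothing, while the classic equilibrium is to shade the bid to half the value. This is a toy sketch only; the real spectrum auctions are far more complex multi-item mechanisms.

```python
# Toy illustration: in a first-price sealed-bid auction (two bidders,
# values uniform on [0, 1]), bidding your true value earns zero profit
# even when you win; shading to half your value is the classic equilibrium.
import random

def expected_profit(bid_fn, n=100_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        my_value = rng.random()
        rival_value = rng.random()
        my_bid = bid_fn(my_value)
        rival_bid = rival_value / 2        # rival plays equilibrium shading
        if my_bid > rival_bid:             # highest bid wins, pays own bid
            total += my_value - my_bid
    return total / n

truthful = expected_profit(lambda v: v)    # winner pays own value: 0 profit
shaded = expected_profit(lambda v: v / 2)  # positive expected profit
assert truthful == 0.0 and shaded > 0.15
```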

  5. 1:00:00–1:06:01


    1. LF

      you know, nuclear weapons (laughs) have been here, it's an obvious problem that's just been sitting there, so how do you think about... what is the mechanism design there that just made everything seem stable and are you still extremely worried?

    2. TS

      I am still extremely worried. So, uh, y- yeah, you probably know the simple game theory of MAD. So, so, so, uh, this was, uh, mutually assured destruction-

    3. LF

      Mm-hmm.

    4. TS

      ... and it's li- it doesn't require any computation, with small matrices you can actually convince yourself that the game is such that nobody wants to initiate.

    5. LF

      Mm-hmm.

    6. TS

      Yeah, that's a very coarse-grained analysis, and it really works in a situation where you have two super powers or small number of super powers. Now things are very different. You have, uh, smaller nukes so the threshold o- of initiating is smaller and you have smaller countries and non-n- non-nation actors who may get h- nukes and so on, so it's, I, I think it's riskier now than it was maybe ever before.

    7. LF

      And what idea, application of AI, you've talked about it a little bit, but what is the most exciting to you right now? I mean, you're here at NIPS, NeurIPS now. You, you ha- have a few excellent pieces of work. But what are you thinking into the future? With the several companies you're doing, what's the most exciting thing, or one of the exciting things?

    8. TS

      The number-one thing for me right now is coming up with these scalable techniques for game solving and applying them in the real world. Uh, I'm still very interested in market design as well, and we're doing that in Optimized Markets. But what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot: getting that technology out there and seeing, as you are in the trenches doing applications, what actually needs to be filled, what technology gaps still need to be, be filled. It's so hard to just put your feet on the table and imagine what needs to be done. But when you're actually doing real applications, the applications tell you what needs to be done. And I really enjoy that interaction.

    9. LF

      Is it a challenging process to apply some of the state-of-the-art techniques you're working on and have the, the various, uh, players in industry or the military or people who could really benefit from it actually use it? What's that process like? You know, in autonomous vehicles, we work with automotive companies, and, uh, in, in many ways they're a little bit old-fashioned. It's difficult. They really want to use this technology. It clearly will have a significant benefit, but the systems aren't quite in place to easily have it integrated, in terms of data, in terms of compute, in terms of all these kinds of things. So is that one of the bigger challenges, uh, that you're facing? Uh, and how do you tackle that challenge?

    10. TS

      Yeah, I think that's always a challenge. Uh, that, that kind of slowness and inertia really of let's do things the way we've always done it. You just have to find the internal champions at the customer who understand that, hey, things can't be the same way in the future, otherwise bad things are going to happen.

    11. LF

      Right.

    12. TS

      And in autonomous vehicles, it's actually very interesting that the car makers are doing that, and they're very traditional. But at the same time, you have tech companies who have nothing to do with cars or transportation-

    13. LF

      Mm-hmm.

    14. TS

      ... like Google and Baidu really pushing on autonomous cars. I, I find that fascinating.

    15. LF

      Clearly, you're super excited about actually these ideas having an impact in the world. Uh, in terms of the technology, in terms of ideas and research, are there directions that you're also excited about, whether that's on the, some of the approaches you talked about for the imperfect information games, whether it's applying deep learning to some of these problems? Is there something that you're excited in, um, in the research side of things?

    16. TS

      Yeah, yeah. Lot, lots of different things in, in the game solving.

    17. LF

      Mm-hmm.

    18. TS

      Uh, sol- solving even bigger games, uh, games where you have, um, more hidden actions, where the player actions are hidden as well. Uh, poker is a game where really the chance actions are hidden, or some of them are hidden, but the player actions are public. Uh, multiplayer games of various sorts, collusion, uh, opponent exploitation, all, all... and, and even longer games. Some games that basically go forever, but they're not repeated.

    19. LF

      Mm-hmm.

    20. TS

      So think extensive-form games that go forever. What, what would that even look like? How do you represent that? How do you solve that?

    21. LF

      What's an example of a game like that? What... Uh, is this some of the stochastic games that you mentioned?

    22. TS

      Let's say business strategy. So it's a- and not just modeling like a particular interaction, but thinking about the business from here to eternity.

    23. LF

      Mm-hmm.

    24. TS

      Or, uh-

    25. LF

      I see.

    26. TS

      ... or let's, let's say, um, military strategy. So it's not like war is gonna go away. How, uh, how do you dare think about military strategy that's gonna go forever? Uh, how do you even model that? How do you know whether a move was good-

    27. LF

      Mm-hmm.

    28. TS

      ... that you, you... somebody made, and, and, and so on? Uh, so that, that's kind of one direction. I'm also very interested in learning much more scalable techniques for integer programming. So we had an ICML paper this summer on that, the first automated algorithm configuration paper that has theoretical generalization guarantees. So if I see these many training examples and I tune my algorithm in this way-

    29. LF

      Mm-hmm.

    30. TS

      ... it's going to have good performance on the real distribution, which I've not seen. So w- which is kind of interesting that, you know, algorithm configuration has been going on now for at least 17 years seriously-
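The MAD argument from earlier in this segment, that small matrices suffice to convince yourself nobody wants to initiate, can be checked mechanically. The payoff numbers below are invented toy values, not a real model:

```python
# A small-matrix sanity check of the MAD argument: with assured
# second-strike retaliation, neither superpower gains by initiating,
# so (refrain, refrain) is an equilibrium. Payoffs are invented toy values.
REFRAIN, STRIKE = 0, 1

# payoff[row_action][col_action] = (row player's payoff, column player's)
payoff = [
    [(0, 0), (-100, -100)],        # I refrain; opponent refrains / strikes
    [(-100, -100), (-100, -100)],  # I strike: retaliation destroys both
]

def is_equilibrium(r, c):
    """Neither player can gain by unilaterally switching action."""
    row_ok = all(payoff[r][c][0] >= payoff[a][c][0] for a in (REFRAIN, STRIKE))
    col_ok = all(payoff[r][c][1] >= payoff[r][a][1] for a in (REFRAIN, STRIKE))
    return row_ok and col_ok

assert is_equilibrium(REFRAIN, REFRAIN)  # nobody wants to initiate
assert payoff[STRIKE][REFRAIN][0] < payoff[REFRAIN][REFRAIN][0]
```

Sandholm's worry is precisely that this coarse two-player analysis stops applying once you have smaller nukes, more countries, and non-nation actors.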

Episode duration: 1:06:17


Transcript of episode b7bStIQovcY
