Lex Fridman Podcast
Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation | Lex Fridman Podcast #344
- 0:00 – 1:09
Introduction
- NBNoam Brown
A lot of people were saying, like, "Oh, this whole idea of game theory, it's just nonsense and if you really want to make money, you gotta, like, look into the other person's eyes and read their soul and figure out what cards they have." But what happened was w- where we played our bot against four top heads-up no-limit hold 'em poker players, and the bot wasn't trying to adapt to them. It wasn't trying to exploit them. It wasn't trying to do these mind games. It was just trying to approximate the Nash equilibrium, and it crushed them.
- LFLex Fridman
The following is a conversation with Noam Brown, research scientist at FAIR, Facebook AI Research Group at Meta AI. He co-created the first AI system that achieved superhuman level of performance in no-limit Texas hold 'em both heads up and multiplayer. And now, recently, he co-created an AI system that can strategically out-negotiate humans using natural language in a popular board game called Diplomacy, which is a war game that emphasizes negotiation. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description, and now, dear friends, here's Noam Brown.
- 1:09 – 5:02
No Limit Texas Hold 'em
- LFLex Fridman
You've been a lead on three amazing AI projects. So we got Libratus that solved, or at least achieved human-level performance on, uh, no-limit Texas hold 'em poker with two players, heads up. You got Pluribus, that solved no-limit Texas hold 'em poker with six players, and just now you have Cicero (these are all names of systems) that solved or achieved human-level performance on the game of Diplomacy, which, uh, for people who don't know, is a popular strategy board game. It was loved by JFK, John F. Kennedy and Henry Kissinger, and many other big famous people in the decades since. So let's talk about poker and Diplomacy today. First poker, what is the game of no-limit Texas hold 'em and how is it different from chess?
- NBNoam Brown
Well no-limit Texas hold 'em poker is the most popular variant of poker in the world. So, you know, you go to a casino, you play, sit down at the poker table. The game that you're playing is no-limit Texas hold 'em. If you watch movies about poker like Casino Royale or Rounders, the game that they're playing is no-limit Texas hold 'em poker. Now it's very different from limit hold 'em in that you can bet any amount of chips that you want, and so the stakes escalate really quickly. You start out with, like, $1 or $2 in the pot and then by the end of the hand, you've got like $1000 in there maybe.
- LFLex Fridman
So the option to increase the number very aggressively and very quickly is always there?
- NBNoam Brown
Right. The no-limit aspect is there's no limit to how much you can bet. You know, you, in limit hold 'em, there's like $2 in the pot. You- you can only bet, like, $2. But if you've got $10,000 in front of you, you're always welcome to put $10,000 into the pot.
- LFLex Fridman
So I got a chance to hang out with, uh, Phil Hellmuth who plays all these different variants of poker, and correct me if I'm wrong, but it seems like no limit rewards crazy versus the other ones rewards more kind of calculated strategy. Or- or no, because you're sort of looking from an- from an analytic perspective, is- is strategy also rewarded in no-limit Texas hold 'em?
- NBNoam Brown
I think both variants reward strategy, but I think what's different about no-limit hold 'em is i- it's- it's much easier to get jumpy, you know. You go in there thinking you're gonna lose, you know, you're gonna play for, like, $100 or something and suddenly there's like, you know, $1000 in the pot. A lot of people can't handle that.
- LFLex Fridman
Can you define jumpy?
- NBNoam Brown
When you're playing poker, you always want to choose the action that's going to maximize your expected value. It's kind of like- kind of like with investing, right? Like, if you're ever in a situation where the amount of money that's at stake is, um, is going to have a material impact on your life, then you're gonna play in a more risk averse style. You know, if somebody makes a huge bet, you're gonna... If you're playing no-limit hold 'em and somebody makes a huge bet, there might come a point where you're like, "This is too much money for me to handle. Like, I can't risk this amount." Uh, and that's what throws a lot of people off. So that's the big difference, I think, between no limit and limit.
- LFLex Fridman
Wha- what about on the action side when you're actually making that big bet? That's what I mean by crazy. I was- I was trying to refer to the technical, the (laughs) the technical term of crazy, meaning use the big jump in the bet to completely throw off the other person in terms of, um, their ability to reason optimally.
- NBNoam Brown
I think that's right. I think one of the key strategies in poker is to put the other person into an uncomfortable position, and if you're doing that, then you're- you're playing poker well. And there's a lot of opportunities to do that in no-limit hold 'em. You know, you can have, like, $50 in there. You throw in a $1000 bet and, um, you know, that's sometimes, if you do it right, it puts the other person in a really tough spot. Now, it's also possible that you make huge mistakes that way, and so it's really easy to lose a lot of money in no-limit hold 'em if you don't know what you're doing. Um, but there- there's a lot of upside potential too.
- 5:02 – 18:12
Solving poker
- LFLex Fridman
So when you build systems, AI systems that play these games, we'll talk about poker, we'll talk about Diplomacy, are you, um, are you drawn in in part by the beauty of the game itself, AI aside? Or is it, to you, primarily a fascinating problem set for the AI to solve?
- NBNoam Brown
I'm drawn in by the beauty of the game. Uh, when I... I started playing poker when I was in high school, and the idea to me that there is a correct, an objectively correct way of playing poker, and if you could figure out what that is then you're, you know, you're making unlimited money basically, that's like a really fascinating concept to me. Um, and so I- I was fascinated by the strategy of poker even when I was, like, 16 years old. It wasn't until, like, much later that I actually worked on poker AIs.
- LFLex Fridman
So there was a sense that you can solve poker, like, uh, in the way you can solve chess, for example, or checkers? I believe checkers got solved, right? I think so.
- NBNoam Brown
Yeah, checkers- checkers is completely solved.
- LFLex Fridman
There's an op- optimal strategy for checkers.
- NBNoam Brown
Optimal strategy. It's impossible to beat the AI.
- LFLex Fridman
Yeah, and so in that same way you could technically solve chess.
- NBNoam Brown
You could solve chess, you could solve poker.
- LFLex Fridman
You could solve poker?
- NBNoam Brown
So, this is... This gets into the concept of a Nash equilibrium.
- LFLex Fridman
Okay.
- NBNoam Brown
So-
- LFLex Fridman
What is a Nash equilibrium?
- NBNoam Brown
Okay. (laughs) So, in any finite two-player zero-sum game, there is an optimal strategy that, if you play it, you are guaranteed to not lose in expectation, no matter what your opponent does. And this is kind of a radical concept to a lot of people, um, but it's true in chess, it's true in poker, it's true in any finite two-player zero-sum game. And to give some intuition for this, you can think of rock-paper-scissors. In rock-paper-scissors, if you randomly choose between throwing rock, paper, and scissors with equal probability, then no matter what your opponent does, you are not going to lose in expectation. You are not going to lose in expectation in the long run. Now, the same is true for poker. There exists some strategy, some really complicated strategy, that if you play that, you are guaranteed to not lose money in the long run. And I should say, this is for two-player poker. Six-player poker is a different story.
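Illustrative sketch (not part of the conversation): the rock-paper-scissors claim above can be checked directly. The code below assumes the usual +1 / 0 / -1 scoring and a handful of made-up opponent strategies. It computes the expected payoff of the uniform random strategy against each one. The names `PAYOFF`, `expected_payoff`, and `opponents` are illustrative only.

```python
from fractions import Fraction

# PAYOFF[mine][theirs] is my payoff: +1 for a win, -1 for a loss, 0 for a tie.
MOVES = ("rock", "paper", "scissors")
PAYOFF = {
    "rock":     {"rock": 0, "paper": -1, "scissors": 1},
    "paper":    {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}

def expected_payoff(my_strategy, opp_strategy):
    """Expected payoff to me when both players sample their moves independently."""
    return sum(
        my_strategy[m] * opp_strategy[o] * PAYOFF[m][o]
        for m in MOVES
        for o in MOVES
    )

UNIFORM = {m: Fraction(1, 3) for m in MOVES}

# A few arbitrary (made-up) opponent strategies, from pure to heavily skewed.
opponents = [
    {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)},
    {"rock": Fraction(1, 2), "paper": Fraction(1, 2), "scissors": Fraction(0)},
    {"rock": Fraction(1, 10), "paper": Fraction(2, 10), "scissors": Fraction(7, 10)},
]

for opp in opponents:
    print(expected_payoff(UNIFORM, opp))  # 0 each time
```

The uniform strategy never loses in expectation, but it also never wins in expectation. The guarantee is "at least break even", not "profit".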
- LFLex Fridman
Yeah, it's a beautiful, giant mess. When you say "in expectation," you're guaranteed not to lose in expectation. What does "in expectation" mean?
- NBNoam Brown
Poker is a very high variance game, so you're gonna have hands where you win, you're gonna have hands that you lose. Even if you're playing the perfect strategy, you can't guarantee that you're gonna win every single hand. But if you play for long enough, then you are guaranteed to at least break even and, and in practice, probably win.
- LFLex Fridman
So, that's in expectation, the size of your stack, generally speaking. Now, that doesn't include anything about the fact that you can go broke. It doesn't include any of those kinds of normal real-world limitations. You're talking in a, in the theoretical world. Uh, what about this, the zero-sum aspect? How big of a constraint is that, and how big of a constraint is finite?
- NBNoam Brown
So, finite's n- not a, a huge constraint. So, I mean, most games that you play are finite in size. Um, it's also true actually, that there exists this like, perfect strategy in many infinite games as well. Technically, the game has to be compact. Um, there are like, some edge cases where you don't have a Nash equilibrium in a two-player zero-sum game. So, you can think of a game where you like... You know, if we're playing a game where whoever names the bigger number is the winner-
- LFLex Fridman
Mm-hmm.
- NBNoam Brown
... there's no Nash equilibrium to that game.
- LFLex Fridman
17.
- NBNoam Brown
Yeah, exactly. 18. I thought you'd beat me.
- LFLex Fridman
You win again.
- NBNoam Brown
(laughs)
- LFLex Fridman
You're good at this.
- NBNoam Brown
I played a lot of games.
- LFLex Fridman
(laughs) Okay. Uh, so that's th-... And then the zero-sum aspect. The zero-sum-
- NBNoam Brown
The zero, zero-sum aspect... So, there exists a Nash equilibrium in non-two-player zero-sum games as well. And by the way, just to clarify what I mean by two-player zero-sum. I mean, there's two players and whatever one player wins, the other player loses. So, if we're playing poker and I win $50, that means that you're losing $50. Now, outside of two-player zero-sum games, there still exists Nash equilibria, but they're not as meaningful because, you know, you can think of a game like Risk. Y- If everybody else at the, on the board decides to team up against you and take you out, there's no perfect strategy you can play that's gonna guarantee that you win there. There's just nothing you can do. So, outside of two-player zero-sum games, there's no guarantee that you're going to win by playing a Nash equilibrium.
- LFLex Fridman
Have you ever tried to model in the other aspects of the game, which is like, the pleasure you draw from playing the game, and then if you're a professional poker player, if you're exciting even if you lose, uh, the, you know, the money you would get from the attention you get to the sponsors and all that kind of stuff. Is that... That'd be a fun thing to model, to model in. Or does that be- make it sort of super complex to, to include the human factor in this, in its full complexity?
- 18:12 – 24:50
Poker vs Chess
- NBNoam Brown
as well."
- LFLex Fridman
Which is the more complex game, chess or poker? Or Go or poker? Do you know?
- NBNoam Brown
That is, that is a controversial question.
- LFLex Fridman
Okay.
- NBNoam Brown
I'm gonna-
- LFLex Fridman
Oh, is, like, somebody screaming on Reddit right now, "It depends on which subreddit you're on"?
- NBNoam Brown
Yeah.
- LFLex Fridman
"Is it chess or is it poker?"
- NBNoam Brown
Yeah. I'm sure, like, David Silver is gonna get really angry at me.
- LFLex Fridman
Yeah.
- NBNoam Brown
Uh, I'll, I'll say, I'm gonna say poker actually, and I think for a couple reasons. Um-
- LFLex Fridman
They're not here to defend themselves.
- NBNoam Brown
(laughs) So, first of all, you have the imperfect information aspect. And so it's, um, i- We, we can go into that, but like once you introduce imperfect information, uh, things get much more complicated.
- LFLex Fridman
So we should say, maybe you can describe what is seen to the players, what is not seen, um, in the game of Texas Hold'em?
- NBNoam Brown
Yeah. So Texas Hold'em, you get two cards face down that only you see. Um, and so that's the hidden information of the game. The other players also all get two cards face down that only they see. Um, and so you have to kind of as you're playing reason about like, "Okay, what do they think I have? What do they have? What do they think I think they have?" That kind of stuff. And, um, that's, that's kind of where bluffing comes into play, right? Because the fact that you can bluff, the fact that you can bet with a bad hand and still win is because they don't know what your cards are.
- LFLex Fridman
Right.
- NBNoam Brown
And that's the, that's the key difference between a perfect information game like poker... uh, sorry, like chess and Go, um, and imperfect information games like poker.
- LFLex Fridman
This is what trash talk looks like. (laughs) The implied statement is, "The game I solved is much tougher."
- NBNoam Brown
(laughs)
- LFLex Fridman
Uh, but yeah, so, uh, w- when you're playing... I'm just gonna do random questions here. So wha- when you're playing your opponent under imperfect information, is there some degree to which you're trying to estimate the range of hands that they have? Or is that not part of the algorithm? So how, uh... what are the different approaches to the imperfect information game?
- NBNoam Brown
So the key thing to understand about why imperfect information makes things difficult-
- LFLex Fridman
Mm-hmm.
- NBNoam Brown
... is that you have to worry not just about which actions to play, but the probability that you're going to play those actions. So you think about, um, rock-paper-scissors for example. Rock-paper-scissors is an imperfect information game, um-
- LFLex Fridman
Right.
- NBNoam Brown
... because you don't know what I'm about to throw.
- LFLex Fridman
I, I do, but yeah, usually not. Yeah.
- NBNoam Brown
Yeah. And so you can't just say like, "Oh, I'm just going to throw a rock every single time-"
- LFLex Fridman
Mm-hmm.
- NBNoam Brown
... because the other person is gonna figure that out and notice a pattern, and then suddenly you're gonna start losing.
- LFLex Fridman
Mm-hmm.
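Illustrative sketch (not part of the conversation): the point about throwing rock every time can be made concrete by computing an opponent's best reply to a predictable strategy. The payoff matrix uses the usual +1 / 0 / -1 scoring. The function name `best_response` and the example strategies are assumptions chosen for illustration.

```python
# Rows: my move, columns: opponent's move, in the order rock, paper, scissors.
# Each entry is my payoff: +1 win, -1 loss, 0 tie.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
MOVES = ["rock", "paper", "scissors"]

def best_response(opp_strategy):
    """Return (move, expected payoff) of the best pure reply to a mixed strategy."""
    values = [
        sum(p * PAYOFF[mine][theirs] for theirs, p in enumerate(opp_strategy))
        for mine in range(3)
    ]
    best = max(range(3), key=lambda i: values[i])
    return MOVES[best], values[best]

always_rock = [1.0, 0.0, 0.0]
mostly_rock = [0.5, 0.25, 0.25]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(best_response(always_rock))  # ('paper', 1.0)
print(best_response(mostly_rock))  # ('paper', 0.25)
print(best_response(uniform)[1])   # 0.0 up to floating-point rounding
```

The more the probabilities lean toward one action, the more a best-responding opponent can win. Only the balanced mix leaves nothing to exploit.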
- 24:50 – 58:18
AI playing poker
- LFLex Fridman
So, who's the greatest poker player of all time and why is it Phil Hellmuth? So, this is for Phil. Uh, so he's known, um, at least in part for maybe playing sub-optimally, and he still wins a lot. It's a bit chaotic. So, maybe can you speak from an AI perspective about the genius of his madness or the madness of his genius? So, playing sub-optimally, playing chaotically, um, as a way to make it hard to pin down about what your strategy is.
- NBNoam Brown
So, okay. The thing that I should explain first of all with like Nash equilibrium, it doesn't mean that it's predictable. The whole point of it is that you're trying to be unpredictable. Now, I think where somebody like Phil Hellmuth might be really successful-
- LFLex Fridman
Right.
- NBNoam Brown
... is not in being unpredictable, but in being able to, um, take advantage of the other player and figure out where they're being predictable or guiding the other player into thinking that you have certain weaknesses and then un- and then understanding how they're going to change their behavior. They're going to deviate from a Nash equilibrium style of play to try to take advantage of those perceived weaknesses and then counter exploit them. So, you kind of get into the mind games there.
- LFLex Fridman
So, you think about at least heads up poker as a, as a dance between two agents? I guess are you playing the cards or are you playing the, the player?
- NBNoam Brown
So, this, this gets down to a big argument in the poker community and the academic community. For a long time, there was this debate of like what- what's called GTO, game theory optimal poker-
- LFLex Fridman
Right.
- NBNoam Brown
... or exploitative play. And, um, up until about like 2017 when we did the Libratus match, I think actually exploitative play had the advantage. A lot of people were saying like, "Oh, this whole idea of game theory, it's just nonsense. And if you really want to make money, you got to like look into the other person's eyes and read their soul and figure out what cards they have." But what happened was people started adopting the game theory optimal strategy, um, and they were making good money. And they weren't trying to adapt so much to the other player. They were just trying to play the Nash equilibrium. And then what really solidified it, I think, was the Libratus, the Libratus match where we played our bot against four top heads up no limit hold 'em poker players, and the bot wasn't trying to adapt to them. It wasn't trying to exploit them. It wasn't trying to do these mind games. It was just trying to approximate the Nash equilibrium, and it crushed them. I think, you know, it... We were playing for $50, $100 blinds, and over the course of about 120,000 hands, it made close to $2 million in-
- LFLex Fridman
120,000 hands?
- NBNoam Brown
120,000 hands in-
- LFLex Fridman
A- against humans?
- NBNoam Brown
Yeah. And this was, this was fake money, to be clear. So, there was real money at stake. There was $200,000-
- LFLex Fridman
First of all-
- NBNoam Brown
Yeah.
- LFLex Fridman
... all money is fake, but, um, that's a, that's- that's a different conversation. Um, we give it meaning. Uh, it's an, it's a, it's- it's a phenomena that gets meaning from our, uh, complex psychology as a human civilization. Um, it's emergent from the collective intelligence of the human species. But that's not what you mean. You mean like there's literally you can't, (laughs) you can't buy stuff with it. Okay. Can you actually, uh, step back and take me through that, um, competition?
- NBNoam Brown
Yeah. Okay. So, when I was in grad school, um, there was this thing called the Annual Computer Poker Competition-
- LFLex Fridman
Mm-hmm.
- NBNoam Brown
... where every year all the different research labs that were working on AI for poker would get together. They would make a bot. They would play them against each other. Uh, and we made a bot that actually won the, um, 2014 competition, the 2016 competition, uh, and so we decided we're gonna take this bot, build on it, and play against real top professional heads up no limit Texas hold 'em poker players. So, we invited four of the world's best, um, players in this specialty, and we challenged them to 120,000 hands of poker over the course of 20 days. Um, and we had $200,000, $200,000 in prize money at stake where it would basically be divided among them depending on how well they did relative to each other. So, we wanted to have some incentive for them to play their best.
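Illustrative sketch (not part of the conversation): the match figures quoted in this section can be turned into a per-hand win rate. The stakes ($50/$100 blinds), the 120,000 hands, and the 20 days come from the conversation. "Close to $2 million" is rounded up to exactly $2,000,000 here, so the results should be read as an upper bound. The variable names are illustrative only.

```python
big_blind = 100          # dollars, from the "$50, $100 blinds" stakes
hands = 120_000          # hands played in the match
total_won = 2_000_000    # dollars; "close to $2 million", so treat as an upper bound
days = 20                # length of the match

per_hand = total_won / hands               # dollars won per hand
bb_per_100 = per_hand / big_blind * 100    # big blinds won per 100 hands
hands_per_day = hands / days               # total hands per day across all players

print(round(per_hand, 2))     # 16.67
print(round(bb_per_100, 1))   # 16.7
print(hands_per_day)          # 6000.0
```

Win rates in poker are commonly quoted in big blinds per 100 hands, which is why the second figure is the one usually compared across stakes.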
- LFLex Fridman
Did you have a confidence, 2014, '16, that this is even possible? How much doubt was there?
- NBNoam Brown
So... And we did a competition actually in 2015 where we also played against professional poker players and the bot lost by- by a pretty sizable margin actually. Now, there were some big improvements from 2015 to 2017 and so-
- LFLex Fridman
Can you speak to the improvements? Is it computational in nature? Is it the algorithm, the- the methods? What?
- NBNoam Brown
It was, it was really an algorithmic approach. That- that was the difference. So, 2015, it was much more focused on trying to come up with a strategy upfront, like trying to solve the entire game of poker, like and then just have a look up table where you're saying like, "Oh, I'm in this situation. What's the strategy?"
- LFLex Fridman
Mm-hmm.
- NBNoam Brown
Um, the approach that we took in 2017 was much more search based. It was trying to say, "Okay, well, let me in real time try to compute, uh, a much better strategy than what I had pre-computed, um, by playing against myself during self play."
- LFLex Fridman
What is the search space for poker? What are you searching over? What's that look like? There's different actions like raising, calling. Yeah, what are the actions? Um, is it just a search over actions?
- NBNoam Brown
So, in a game like chess, the- the search is like, "Okay, I'm in this chess position and I can like, you know, move these different pieces and see where things end up." In poker, what you're searching over is the actions that you can take for your hand, the probabilities that you take those actions, and then also the probabilities that you take other actions with other hands that you might have.
- LFLex Fridman
Right.
- NBNoam Brown
Um, and, and that's kind of like, um, hard to wrap your head around, like why are you searching over these like other hands that you might have and like trying to figure out what you would do with those hands?
- LFLex Fridman
Mm-hmm.
- NBNoam Brown
Um, and the idea is, is again, you- you wanna, you wanna always be balanced and unpredictable. And so if your search algorithm is saying like, "Oh, I want to raise with this hand," well, in order to know whether that's a good action... Like let's say it's a bluff, you know? Let's say you have a bad hand and you're saying like, "Oh, I- I think I should be betting here with this really bad hand and bluffing," well, that on- that's only a good action if you're also... betting with a strong hand. Otherwise, it's an obvious bluff.
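Illustrative sketch (not part of the conversation): a standard toy model shows why a bluff only works alongside bets made with strong hands. Assume a single final bet into a pot, where the caller holds a hand that beats bluffs but loses to strong hands. The pot size, bet size, and function names are hypothetical. This is a textbook simplification, not the search procedure used in the bots.

```python
from fractions import Fraction

def call_ev(pot, bet, bluff_fraction):
    """EV of calling with a hand that beats bluffs only (a 'bluff-catcher')."""
    win = bluff_fraction * (pot + bet)     # bettor was bluffing: caller wins pot + bet
    lose = (1 - bluff_fraction) * bet      # bettor had the strong hand: caller loses the bet
    return win - lose

def balanced_bluff_fraction(pot, bet):
    """Bluff share of the betting range that makes calling and folding equal (EV 0)."""
    return Fraction(bet, pot + 2 * bet)

pot, bet = 100, 100
f = balanced_bluff_fraction(pot, bet)
print(f)                               # 1/3
print(call_ev(pot, bet, f))            # 0
print(call_ev(pot, bet, Fraction(1)))  # 200: a range of only bluffs always gets called
print(call_ev(pot, bet, Fraction(0)))  # -100: a range of only strong hands never gets called
```

Whether a bet with one particular hand is good therefore depends on what the bettor does with every other hand in the same spot, which is why the search has to reason over the whole range.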
- 58:18 – 1:09:08
Heads-up vs Multi-way poker
- LFLex Fridman
So, uh, the leap from that to multi-way six player poker, what's, how difficult is that jump? And what are some interesting differences between heads-up poker and- and multi-way poker?
- NBNoam Brown
Yeah, so I mentioned, you know, Nash equilibrium in two player zero-sum games. If you play that strategy, you are guaranteed to not lose in expectation no matter what your opponent does. Now, once you go to six player poker, you're no longer playing a two player zero-sum game. And so there was a lot of debate among the academic community and among the poker community about how well these techniques would extend beyond just two player heads-up poker. Now, what I had come to realize is that, um, the techniques actually I thought really would extend to six player poker, because even though in theory they don't give you these guarantees outside of two player zero-sum games, in practice, it still gives you a really strong strategy. Now, there were a lot of complications that would come up with six player poker besides like the game theoretic aspect. I mean, for one, the game is just exponentially larger. Um, so the main thing that allowed us to go from two player to six player was the idea of depth-limited search. So I said before, like, you know, we would do search, we would plan out, the bot would plan out, like, what- what it's going to do next and for the next several moves. And in Libratus, that search was done extending all the way to the end of the game. So it would have to start, um... It- it from- from the turn onwards, like looking maybe 10 moves ahead, um, it would have to figure out what it was doing for all those moves. Now when you get to six player poker, it can't do that exhaustive search anymore because the game is just way too large. Um, but by only having to look a few moves ahead and then stopping there and substituting a value estimate of, like, how good is that strategy at that point, then we're able to do a much more scalable form of search.
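Illustrative sketch (not part of the conversation): the core idea of looking a few moves ahead and then substituting a value estimate can be shown on a small perfect-information game. The example uses a stone-taking game (take 1 to 3 stones, whoever takes the last stone wins), which is an assumption chosen for brevity. It is not the imperfect-information search used in the poker bots, and `rough_estimate` is a hypothetical leaf evaluator.

```python
def depth_limited_value(stones, depth, estimate):
    """Value for the player to move (+1 win, -1 loss), looking `depth` moves ahead."""
    if stones == 0:
        return -1.0              # the previous player took the last stone, so the mover lost
    if depth == 0:
        return estimate(stones)  # stop searching here and substitute a value estimate
    return max(
        -depth_limited_value(stones - take, depth - 1, estimate)
        for take in (1, 2, 3)
        if take <= stones
    )

def best_move(stones, depth, estimate):
    """Pick the take (1-3) that maximizes the depth-limited value; depth must be >= 1."""
    return max(
        (take for take in (1, 2, 3) if take <= stones),
        key=lambda take: -depth_limited_value(stones - take, depth - 1, estimate),
    )

def rough_estimate(stones):
    """Hypothetical leaf evaluator: piles that are multiples of 4 are bad for the mover."""
    return -1.0 if stones % 4 == 0 else 1.0

print(best_move(10, depth=2, estimate=rough_estimate))  # 2 (leaves a pile of 8)
print(best_move(21, depth=3, estimate=rough_estimate))  # 1 (leaves a pile of 20)
```

The search cost depends on `depth` rather than on the length of the game, so the quality of the result rests on how good the leaf estimate is.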
- LFLex Fridman
Is there something cool? We're looking at the paper right now. Is there something cool in the paper in terms of graphics? A game tree traversal via Monte Carlo.
- NBNoam Brown
I think if you go down a bit, uh...
- LFLex Fridman
Uh, figure one, an example of equilibrium selection problem. Ooh, so yeah, uh, what do we know about equilibria when is- there's multiple players?
- NBNoam Brown
So when you go outside of two player zero-sum, so a Nash equilibrium is a set of strategies, like one strategy for each player, where no player has an incentive to switch to a different strategy. Um, and so you can kind of think of it as like, imagine you have a game where there's a- a ring, um, that's actually the visual here. You got a ring and the object of the game is to be as far away from the other players as possible. There's... A- a Nash equilibrium is for all the players to be spaced equally apart around this ring, but there's infinitely many different Nash equilibria, right? There's infinitely many ways to space four dots along a ring. And if every single player independently computes a Nash equilibrium, then there's no guarantee that the joint strategy that they're all playing, um, is going to resu- is going to be a Nash equilibrium. There- they're just gonna be like random dots scattered along this ring rather than four coordinated dots being equally spaced apart.
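Illustrative sketch (not part of the conversation): the ring example can be simulated with four players on a ring of circumference 1. Each equilibrium is an equally spaced arrangement rotated by some offset. The specific offsets, and the convention that player i takes seat i of the equilibrium they computed, are assumptions made for illustration.

```python
def equilibrium(offset, players=4):
    """One Nash equilibrium: players equally spaced on a unit ring, rotated by `offset`."""
    return [(offset + i / players) % 1.0 for i in range(players)]

def min_gap(positions):
    """Smallest distance between neighbouring players going around the unit ring."""
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])] + [1.0 - pts[-1] + pts[0]]
    return min(gaps)

# Coordinated: everyone plays their seat from the same equilibrium.
shared = equilibrium(0.10)
print(round(min_gap(shared), 3))     # 0.25

# Uncoordinated: player i computes a different equilibrium (hypothetical offsets)
# and plays seat i from it.
offsets = [0.00, 0.20, 0.05, 0.15]
joint = [equilibrium(off)[i] for i, off in enumerate(offsets)]
print([round(p, 2) for p in joint])  # [0.0, 0.45, 0.55, 0.9]
print(round(min_gap(joint), 3))      # 0.1
```

Each player's choice is part of some valid equilibrium, yet the combined positions are not equally spaced, so the joint outcome is not an equilibrium.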
- LFLex Fridman
Is it possible to sort of optimally do this kind of selection, to do the selection abo- um, of the equilibria you're chasing? So, is there like a meta problem to be solved here?
- NBNoam Brown
So the meta problem is in some sense, um, how do you- how do you understand the Nash equilibrium that the other players are going to play. Um, and- and even if you do that, again, like, there's no guarantee that you're going to win. So-
- LFLex Fridman
Yeah.
- NBNoam Brown
... you know, if you're playing, uh, if you're playing Risk, like I said, and- and all the other players decide to team up against you, you're gonna lose. Nash equilibrium doesn't help you there. And so there was this big debate about whether Nash equilibrium and all these techniques that compute it are even useful once you go outside of two-player zero-sum games. Now, I think for many games there is a valid criticism here and I think when we talk about- when we go to something like Diplomacy, we run into this issue that the approach of a- trying to approximate a Nash equilibrium doesn't really work anymore. But it turns out that in six-player poker, um, because six-player poker is such an adversarial game, um, where none of the players really try to work with each other, the techniques that were used in two-player poker to try to approximate an equilibrium, those still end up working in practice in- in six-player poker as well.
- LFLex Fridman
The- there's some deep way in which six-player poker is just a bunch of heads-up poker, like, games in one. It's like a- it's like embedded in it. So the competitiveness, um, is more fundamental to poker than the cooperation.
- NBNoam Brown
Right, yeah. Poker is just such an adversarial game, there's no real cooperation. In fact, you're not even allowed to cooperate in poker. It's considered collusion. It's against the rules. Um, and so for that reason, the techniques end up working really well and I think that's true more- more broadly in extremely adversarial games in general.
- LFLex Fridman
But that's sort of in practice versus being able to prove something.
- NBNoam Brown
That's right. Nobody has a proof that that's the case and it could be that- that six-player poker belongs to some class of games where apro- approximating a Nash equilibrium through self-play provably works well. Um, and, you know, there are other classes of games beyond just two-player zero-sum where this is proven to work well. So there are these, you know, kinds of games called potential games which I won't go into, it's kind of like a complicated concept, but, um, there are classes of games where, uh, this approach to approximating a Nash equilibrium is proven to work well. Now, six-player poker is not known to belong to one of those classes, but it is possible that there is some class of games where it either provably performs well or it provably performs not that badly.
- LFLex Fridman
So, what are some interesting things about, uh, Pluribus that was able to achieve human-level performance on this, or superhuman-level performance on the six-player version of poker?
- NBNoam Brown
I- personally, I think the most interesting- interesting thing about Pluribus is that it was so much cheaper than Libratus. I mean, Libratus, if you had to put a price tag on- on the computational resources that went into it, I would say the final training run took about $100,000. You go to Pluribus, the final training run would cost, like, less than $150 on AWS.
- LFLex Fridman
Is this normalized to, uh, computational inflation? So meaning, uh, does- is- is this just- does this just have to do with the fact that Pluribus was trained, like, a year later?
- NBNoam Brown
No, no, no. It's not- it's- I mean, first of all, like, yeah, computing resources are- are getting cheaper every day and, like- but you're not gonna see a- a thousand-fold decrease in the computational resources over two years, um, or even anywhere close to that. The- the real improvement was algorithmic improvements, and in particular, the ability to do depth-limited search.
- LFLex Fridman
So it- does depth-limited search also work for Libratus?
- NBNoam Brown
Yep. Yes. So, where this depth-limited search came from is, you know, I- I developed this technique and, um, ran it on two-player poker first and that reduced the computational resources needed to make an AI that was superhuman from, you know, $100,000 for Libratus to something you could train on your laptop.
- LFLex Fridman
What do you learn from that, um, from that discovery?
- NBNoam Brown
What I would take away from that is that algorithmic improvements really do matter.
- LFLex Fridman
How would you describe the more general case of limited-depth search? So it's basically constraining the scale, temporal or in some other way, of the computation you're doing in some clever way. So, like, with- i- i- like, how else can you significantly constrain computation, right?
- NBNoam Brown
Well, I think the idea is that we want to be able to leverage search as much as possible and the way that we were doing it in Libratus required us to search all the way to the end of the game. Now, if you're playing a game like chess, the idea that you're gonna search always to the end of the game i- is kinda unimaginable, right? Like, there's just so many situations where you just won't be able to use search in that case or the cost would be, um, you know, prohibitive. And this technique allowed us to leverage search and without having to pay such a huge computational cost for it and be able to apply it more broadly.
- LFLex Fridman
So to what degree did you use neural nets for, uh, Libratus and Pluribus and more generally, what role do neural nets have to play in, um, in superhuman-level performance in poker?
- NBNoam Brown
So we actually did not use neural nets at all for Libratus or Pluribus and a lot of people found this surprising back in 2017, I think they find it surprising today, um, that we were able to do this without using any neural nets. Um, and I think the reason for that, I mean, I think neural nets are, um, incredibly powerful and the techniques that are used today, even for poker AIs, do rely, uh, quite heavily on neural nets. Um, but it wasn't the main challenge for poker. Like, I think what neural nets are really good for... If you're in a situation where finding features for a value function is really difficult, then neural nets are really powerful. And this was the problem in Go, right? Like the problem in Go was that... Or the final problem in Go at least, was that nobody had a good way of looking at a board and figuring out who was winning or loo- and, and describing, um, through a simple algorithm who was winning or losing. And so there, neural nets were super helpful because you could just feed in a ton of different board positions into this neural net and it would be able to predict then who was winning or losing. But in poker, the features weren't the challenge. The f- the challenge was how do you design a scalable algorithm that would allow you to find this balanced strategy, um, that would understand that you have to bluff with the right probability?
- LFLex Fridman
So, can that be somehow incorporated into the, the value function, this, the complexity of poker that you've described?
- NBNoam Brown
Yeah, so the way the value functions work in poker, like the latest and greatest poker AIs, they do use neural nets for the value function. The way it's done is, is very different from how it's done in a game like chess or Go because in poker, you have to reason about beliefs.
- LFLex Fridman
Right.
- NBNoam Brown
And so the value of a state depends on the beliefs that players have about what the different cards are. Like if you have pocket aces, then whether that's a really, really good hand or just an okay hand depends on whether you know I have pocket aces, right? Like if you know that I have pocket aces, then if I bet, you're gonna fold immediately. But if you think that I have a really bad hand, then I could bet with pocket aces and make a ton of money. So, the value function in poker these days takes the beliefs as an input, which is very different from like how, how chess and Go AIs work.
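[Editor's sketch] The pocket-aces example can be written as a value function that takes a belief as an extra input. This is a deliberately crude illustration, not how any real poker AI computes values. It assumes the aces always win at showdown and that the opponent calls with probability equal to one minus their belief that the bettor is strong. Both assumptions, and all names and numbers, are hypothetical.

```python
def value_of_betting_aces(pot, bet, opponent_belief_strong):
    """Toy expected winnings from betting pocket aces.

    opponent_belief_strong: probability (0..1) the opponent assigns to
    the bettor holding a strong hand. Assumed call probability is
    1 - opponent_belief_strong, and the aces are assumed to always win.
    """
    if not 0.0 <= opponent_belief_strong <= 1.0:
        raise ValueError("belief must be a probability between 0 and 1")
    call_probability = 1.0 - opponent_belief_strong
    win_if_called = pot + bet   # opponent pays off the bet
    win_if_folded = pot         # opponent folds, bettor takes the pot only
    return call_probability * win_if_called + (1.0 - call_probability) * win_if_folded


# Same cards, same pot, same bet: only the opponent's belief changes.
for belief in (0.0, 0.5, 1.0):
    print(belief, value_of_betting_aces(pot=100, bet=50, opponent_belief_strong=belief))
```

In chess or Go, a value function of the board alone would be enough. Here the same cards and the same pot produce different values depending on what the opponent believes, so the belief has to be part of the input.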
- 1:09:08 – 1:12:42
Greatest poker player of all time
- LFLex Fridman
So as a person who appreciates the game, uh, who do you think is the greatest poker player of all time?
- NBNoam Brown
That's a, that's a tough question. Um-
- LFLex Fridman
Can an AI help answer that question? Can you, can it actually a- analyze the quality of play, right? So the AI chess engines can, can give estimates of the quality of play, right? Um, I wonder if there's a... Is there an Elo rating type of system for poker? I suppose you could, but there's just not enough. You would have to play a lot of games, right? A very large number of games. Like, more than you would in chess. The deterministic game makes it easier to estimate Elo, I think.
- NBNoam Brown
I think it is much harder to estimate something like Elo rating in poker. I think it's doable. The problem is that the, the game is very high variance. So you could play... You could be profitable in poker for a year and you could actually be a bad player just because the variance is so high. I mean, you've got top professional poker players that would lose for a year just because they're on a really bad, um, bad streak.
- LFLex Fridman
So yeah, so for Elo, you have to have a nice clean way of saying if player A played player B and A beats B, that says something, that's a signal. In poker, that's a very noisy signal.
- NBNoam Brown
It's a very noisy signal. Now there is a signal there and so you, you could do this, this calculation, it would just be much harder. Um, but the same way that AIs have now taken over chess and, you know, all the top professional chess players train with, with AIs, the same is true for poker. Um, the game has become, uh, a very computational, um... People train with AIs to try to find out where they're making mistakes, um, try to learn from the AIs to improve their strategy. Uh, so now, yeah, so th- the game has been revolutionized es- es- in the past five years by, by the development of AI in this sport.
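[Editor's sketch] A rough way to see how noisy the signal is: estimate how many hands it takes before a player's observed win rate can be statistically distinguished from zero. This uses a standard normal approximation and requires the true win rate to sit at least z standard errors above zero. The win rate and standard deviation below are illustrative assumptions, not figures from the conversation.

```python
import math


def hands_needed(win_rate_bb_per_100, std_dev_bb_per_100, z=1.96):
    """Approximate number of hands needed for a win rate to be
    distinguishable from zero at the given z-score.

    Both inputs are measured in big blinds per 100 hands. With n blocks
    of 100 hands, the standard error is std_dev / sqrt(n), so we need
    n >= (z * std_dev / win_rate) ** 2 blocks.
    """
    if win_rate_bb_per_100 == 0:
        raise ValueError("a zero win rate can never be distinguished from zero")
    blocks = (z * std_dev_bb_per_100 / abs(win_rate_bb_per_100)) ** 2
    return math.ceil(blocks * 100)


# Hypothetical winning player: 5 bb/100 edge, 100 bb/100 standard deviation.
print(hands_needed(5, 100))
# Doubling the edge cuts the required sample by roughly a factor of four.
print(hands_needed(10, 100))
```

Under these assumed numbers the answer comes out on the order of 150,000 hands, which is why a single result, or even a year of results, says little about skill, while in chess each game result is a comparatively clean signal.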
- LFLex Fridman
The skill with which you avoided the question of the greatest of all time was impressive.
- NBNoam Brown
So my feeling is that it's a difficult, it's a difficult question because just like in chess where you can't really compare Magnus Carlsen today to Garry Kasparov, um, because the game has evolved so much. Um, the poker players today are so far beyond the, the, the skills of like people that were playing even 10 or 20 years ago. Um, so you look at the, the kinds of like all-stars that were on ESPN at like the height of the poker boom, pretty much all those players are actually not that good at the game today. At least the, at least the, the strategy aspect. I mean, they might still be good at like reading the player at the other side of the table and trying to figure out like are they bluffing or not, but in terms of the actual like computational strategy of the game, um, a lot of them have really struggled to keep up with that development. Now, so for that reason, I'll, I'll give an answer and I'm gonna say Daniel Negreanu, who you actually had on the podcast recently. I saw it, it was a great episode.
- LFLex Fridman
He's gonna love this so much.
- NBNoam Brown
Yeah. (laughs) And-
- LFLex Fridman
(laughs) And Phil's gonna hate this so much.
- NBNoam Brown
(laughs) A- and I'm, I'm gonna give him, I'm gonna give him credit because he is one of the few like old school really strong players that have kept up with the development of AI.
- LFLex Fridman
So he is trying to... He's constantly studying the, the game theory optimal way of playing.
- NBNoam Brown
Exactly, yeah. And I think a lot of, a lot of the old school poker players have just kind of given up on that aspect and, and I gotta give Daniel Negreanu credit for, for keeping up with all the developments that are happening in the sport.
- LFLex Fridman
Yeah, it's fascinating to watch and it's fascinating to watch where it's headed. Um, yeah, so there you go. Some love for Daniel. Quick pause. Bathroom break?
- NBNoam Brown
Yeah, let's do it.
- 1:12:42 – 1:22:33
Diplomacy game
- LFLex Fridman
Let's go from poker to Diplomacy. What is, at a high level, the game of Diplomacy?
- NBNoam Brown
Yeah, so I talked a lot about two-player zero-sum games and what's interesting about Diplomacy is that it's very different from these like adversarial, uh, games like chess, Go, poker, even StarCraft and Dota. Diplomacy has a much bigger cooperative element to it. It's a seven-player game. It was actually created in the '50s, um, and it takes place, uh, before World War I. It's like a map of Europe with seven great powers, um, and they're all trying to form alliances with each other. There's a lot of negotiation going on. Um, and so the whole focus of the game is on, uh... forming alliances with the other players to take on the other players.
- LFLex Fridman
England, Germany, Russia, Turkey, Austria-Hungary, Italy, and France.
- NBNoam Brown
That's right, yeah. So, the way the game works is on each turn you spend about, you know, five to 15 minutes talking to the other players in private, and you make all sorts of deals with them. You say like, "Hey, let's work together." Um, you know, "Let's team up against this other player." Because the only way that you can make progress is by working with somebody else a- against the others. Um, and then after that negotiation period is done, all the players simultaneously submit their moves and they're all executed at the same time. And so you can tell people like, "Hey, I'm going to support you this turn," um, but then you don't follow through with it, and they're only going to figure that out once they see the moves being read off.
Episode duration: 2:29:21
Transcript of episode 2oHH4aClJQs