No Priors

No Priors Ep. 1 | With Noam Brown, Research Scientist at Meta

AI can beat top players in chess, poker, and, now, Diplomacy. In November 2022, a bot named Cicero demonstrated mastery in this game, which requires natural language negotiation and cooperation with humans. In short, Cicero can lie, scheme, build trust, pass as human, and ally with humans. So what does that mean for the future of AGI? This week’s guest is research scientist Noam Brown. He co-created Cicero on the Meta Fundamental AI Research team and is considered one of the smartest engineers and researchers working in AI today. Co-hosts Sarah Guo and Elad Gil talk to Noam about why all research should be high risk, high reward, the timeline until we have AGI agents negotiating with humans, why scaling isn’t the only path to breakthroughs in AI, and whether the Turing Test is still relevant.

00:00 Introduction
01:43 What sparked Noam’s interest in researching AI that could defeat games
06:00 How AlexNet and AlphaGo changed the landscape of AI research
08:09 Why Noam chose Diplomacy as the next game to work on after poker
09:51 What Diplomacy is and why the game was so challenging for an AI bot
14:50 Algorithmic breakthroughs and significance of AI bots that win in No-Limit Texas Hold'em poker
23:29 The Nash Equilibrium and optimal play in poker
24:53 How Cicero interacted with humans
27:58 The relevance and usefulness of the Turing Test
31:05 The data set used to train Cicero
31:54 Bottlenecks to AI researchers and challenges with scaling
40:10 The next frontier in researching games for AI
42:55 Domains that humans will still dominate and applications for AI bots in the real world
48:13 Reasoning challenges with AI

Sarah Guo (host) · Noam Brown (guest) · Elad Gil (host)
Apr 25, 2023 · 1h 0m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:43

    Introduction

    1. SG

      (instrumental music plays) Noam, welcome to No Priors.

    2. NB

      Oh, thank you for having me.

    3. EG

      Yeah. Thanks a lot for joining. So, you know, I think, uh, in the world today when a lot of people think about AI, they think about it as basically you put a... you put a couple words into a prompt and then you get out an image. Or you have, uh, ChatGPT summarize James Burnham's professional managerial class for you in a rhyming essay in the voice of a cat or something. (laughs) And I think you've pushed in really interesting, uh, directions, uh, that are very different in some ways from what a lot of people have been focused on, and you've been more focused on game theoretic actors interacting with humans and with each other. And in parallel, you're kinda known as, um, as Sarah mentioned, as sort of one of these true 10X engineers and researchers pushing the boundaries on AI. And so I'm sort of curious, like, what first sparked your interest in games and researching AI to defeat games like poker and Diplomacy?

    4. NB

      Well, I think, uh, you know, my journey's a bit, uh, non-traditional. I mean, I started out in finance actually, towards the end of my undergrad career. And also, like, af-... right after undergrad I worked in algorithmic trading for a couple years, and I, I kinda realized that, uh, while it's, it's fun and, uh, it's, you know, exciting, it's kinda like a game. You know you gotta score at the end of the day, which is how much money you've made or lost. Uh, it's not really the most fulfilling thing that I wanna do with my life. Uh, and so I decided that I wanted to do research, and i- it wasn't really clear to me in what area. I was originally planning to do economics actually, and so I went to the Federal Reserve. I worked there for two years. Honestly, I wanted to figure out how to structure financial markets better to encourage more pro-social behavior. And so, in the process, I, I became interested in, in game theory and I, I thought I wanted to pursue a PhD, like, in economics th-... uh, focused on game theory. Two things happened. So first of all, I became a bit, a bit jaded with the pace of progress in economics, because if you come up with an idea, you, you have to get it

  2. 1:43–6:00

    What sparked Noam’s interest in researching AI that could defeat games

    1. NB

      passed through legislation and it's a very long process. Computer science is much more exciting in that way because you can just build something. You don't really need permission to do it. And then the other thing I figured out was that a lot of the most exciting work in game theory was actually happening in computer science. It wasn't happening in economics. And so I applied for, uh, grad schools with the intention of studying algorithmic game theory in a computer science department. And, uh, when I got to grad school, there was conveniently a professor that was looking for somebody to do research on AI for poker, and I thought this was, like, the perfect intersection of everything that I wanted to do. I was interested in game theory. I was interested in, you know, making something, uh, interested in AI. I had played poker when I was in high school and college, and you know, never for high stakes but always just kinda interested in the strategy of the game. I actually tried to make a, a poker bot when I was in undergrad, and it, it, it did terribly but it was a lot of fun a- along... and so to be able to do that for research in grad school, I thought this was like the, the perfect thing for me to work on. And also I felt like there, there was an opportunity here because it felt doable, and I, I, I kinda recognized that if you succeed in making an AI that can play poker, you're going to learn really valuable things along the way, and, and that could have, like, major implications for the future. So that's kinda how I, how I got started in that.

    2. EG

      That's really cool. And did you have a specific end goal of your work when you started or was it just interest? In other words, some... you know, you talk to a lotta people in the field now and they say, "Oh, our end goal is AGI and it's always been." (laughs) And I think sometimes that's sort of invented later as sort of an interesting story for what they're doing. Did you view this as just doing primary research and as just personal interest? Did you view it as, like, there's a path leading to agents that function on behalf of people? Or, or was there some other sort of driving motivator?

    3. NB

      Well, y-... so I started grad school in, in 2012, and it was a very different time in 2012. Uh, you know, the idea of AGI was, was really science fiction. Um, there were, there were some people that were, you know, uh, serious about it but, but very few. The majority opinion was that AI was... if anything, it was kind of a dead field. Um, I actually remember, like, emailing a professor and having this conversation where I was like, "Look, I'm really interested in AI but I'm, I'm kinda worried to pursue a PhD in this because, you know, I, I th-... I get the impression that it's just a dead field and I'm not... I'm worried I'll be able to get... if I'll be able to get a job afterwards." Conveniently, like, a couple years into grad school things changed pretty drastically and, um, and I, I happened to be in the right place at the right time, I think. It was really fortunate in that respect. So the, the original intention wasn't to, to pursue AGI. The original intention was, you know, you learn interesting things about, um, AI and game theory and y- you build slowly, um, and it was really only a, a couple years into grad school that it, it became clear that the pace of progress was, was quite dramatic.

    4. EG

      Was there a specific moment that really drove that home for you? I know for some people they mentioned, oh, AlexNet came out or, oh, you know, some of the early GAN work felt like a wake-up call. I'm just sort of curious if there's a specific technology or paper or something else that came out, or was it just kind of a continuum?

    5. NB

      Uh, I think it was a slow drip. I mean, I think for me especially i- it was, it was the AlphaGo moment, you know-

    6. EG

      Mm-hmm.

    7. NB

      ... like when, when you see that it's just very clear. I mean, AlexNet too. I mean, eh, I, I remember taking an AI class when I was, um... before I, before I started grad school actually. I, I took a computer vision class and they were talking about, like, you know, SIFT and all this stuff. You get something like AlexNet and it just, like, throws all that out the window and it's just, like, mind-boggling how effective that could be.

    8. SG

      Noam, can you... can you explain actually, like, why AlphaGo is so important and, like, just size of search space and how you might contrast that to previous games?

    9. NB

      Yeah. So, uh, there was... you know, a, a big milestone in AI was Deep Blue beating Garry Kasparov in chess in 1997, and that was, that was a big deal. Um, and I think we actually... the... it's kinda downplayed today I think in, like, by a lot of machine learning researchers, but it was, it was... we learned a lot from that. We learned that scale really does work. And in that case it wasn't scaling, you know, training in neural nets, it was scaling search. Uh, but the techniques that were used in Deep Blue, they didn't work in a game like Go because the pattern matching was just not there. Like, y- you couldn't... uh, a big challenge in Go was figuring out, like, how do you even evaluate the state of a board? How do you tell who's winning? Um, in chess it's, like, difficult but you can kinda write a function... e- you can handcraft a function to, to estimate that, right? like, you calculate, oh, each piece is worth this many points and you add it together and you can kinda get a sense of who's winning and who's losing. And in Go that's just almost impossible to do by hand.
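The kind of handcrafted evaluation Noam describes for chess — give each piece a point value and add them up — can be sketched in a few lines. This is a generic illustration using the textbook 1/3/3/5/9 piece values and FEN-style letter casing, not Deep Blue's actual evaluation function:

```python
# Handcrafted material evaluation, in the spirit described above:
# assign each piece a point value and sum. Uppercase letters = White
# pieces, lowercase = Black (FEN convention). The 1/3/3/5/9 weights
# are textbook values, not anything from Deep Blue itself.

PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_eval(pieces: str) -> int:
    """Positive score means White is ahead on material, negative means Black."""
    score = 0
    for ch in pieces:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # kings and non-piece characters are ignored
        score += value if ch.isupper() else -value
    return score

print(material_eval("Rn"))  # White rook vs. Black knight: 5 - 3 = 2
```

As the discussion goes on to say, nothing like this short feature sum exists for Go, which is why a learned evaluation like AlphaGo's was such a breakthrough.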

    10. SG

      It's essentially too big to do that.

    11. NB

      It's too big. It's too subtle. Um, it's just, it's just too complicated and there's too much nuance. And if you asked... you know, the difference

  3. 6:00–8:09

    How AlexNet and AlphaGo changed the landscape of AI research

    1. NB

      is also if you asked a human who's winning, they could tell you who's winning but they couldn't tell you why. You know, one of the things that people assumed was that h-... you know, humans are just better at pattern matching. Um, and to have an AI come along and, like, demonstrate that it can do this pattern matching better than a human can in... even if it's in this constrained game, um, that was, that was a big deal.... I think that was a wake-up call to a lot of people, not, not just me but I think, um, across the world.

    2. SG

      I remember as a former, like, Go nerd looking, uh, just trying to understand the moves that AlphaGo made to try to figure out how to play better 'cause it- it was like a- such a mind-blowing moment.

    3. NB

      Yeah. And, you know, if any, if any of your listeners hasn't, haven't seen the AlphaGo documentary, I highly recommend watching it. You can... I think it's on Netflix and, uh, or YouTube. And it's, it really just like... You can see just how significant this was to a, to a lot of the world, um, when you watch that.

    4. EG

      How did you end up choosing Diplomacy as the next thing to work on after poker? There, there's obviously like a, a wide space of a variety of different types of games and so what, what drove your selection criteria there and how did you think about choosing that as the, the next sort of interesting research problem?

    5. NB

      So basically what happened, uh, we succeeded in, in poker and, um, when we were trying to pick the, the next direction, it became clear that AI was progressing very quickly. Like, much quicker than I think a lot of people appreciated. And there were a lot of conversations about like, "What should the next benchmark be?" A lot of people were throwing around these games like, uh, Hanabi was one. Uh, somebody was talking about like Werewolf or Settlers of Catan, these kinds of things. And I just felt like... You know, this was 2019 and in 2019, you had GPT-2 come out, which was just mind-blowing, and then you also had, you know, DeepMind reach Grandmaster level in StarCraft II, you had OpenAI beating human experts in DOTA 2, and that was just after like a couple of years of, uh, of work, of research and engineering. To then like go to a game like Settlers of Catan, it just, it just felt like too easy. Like, you could just take a team of five people, spend a year on that, and you'd have it cracked. And so we wanted to pick something that would be truly impressive, like that would require fundamentally new techniques in order to succeed, not just scaling up something that, that already exists. And we were trying to think of what would be the hardest game to make an AI for, and we landed on Diplomacy. The idea that you could have an AI that negotiates in natural language with humans and like

  4. 8:09–9:51

    Why Noam chose Diplomacy as the next game to work on after poker

    1. NB

      strategizes with them, it really just felt like science fiction. And even in 2019 knowing all the success that was happening in AI, it still felt like science fiction. And so that's why we aimed for it. I, I think that was the right call. I mean, I'm really glad that we, we aimed high at that point. I was a little afraid to do that, to be honest. Um, it, it's, it's a high-risk thing to aim for, but all research is high risk, high reward, or at least it should be.

    2. EG

      What, what was, uh, the most unexpected thing to come out of working on Diplomacy in terms of what Cicero could do?

    3. NB

      (sighs) I mean, I think the most unexpected thing was just honestly how it didn't get detected as a bot. We were really worried about this leading into, into the human mach- uh, human competitions because first of all there's no way to, like, really test this ahead of time. Like, we can play with the bot, but we know that it's a bot and we can't really, like, gather a bunch of people together and stick them in a game and, you know, have them play with a bot without them... Without telling them or having them realize that something's up, right? Like if, if this company is hiring them to, like, play a game, like, and they know that we're working on Diplomacy, like, clearly they're gonna be playing with a bot. A- and when people know that they're playing with a bot, they behave very differently, right? It... We didn't want this to turn this into a Turing test, and so we had to enter the bot into these games where players did not know that there was a bot in the mix. That was the only way that we could get, like, meaningful results. Um, and, and just to be clear, if like... for people, for those of you that are not familiar with Diplomacy, the reason for this is because, like, Diplomacy is a natural language negotiation game. And so you're having these, like, really complicated, long conversations with these, with these people and, um, it, it's kinda hard to get away with that if... a- as a bot, um, and not be detected. And so our big concern was, like, we stick this bot in a game and within like five games, maybe even two games, um, they figure out it's a bot, word gets out, all the Diplomacy... The Diplomacy community is pretty small so they all talk to each other and then

  5. 9:51–14:50

    What Diplomacy is and why the game was so challenging for an AI bot

    1. NB

      in all the future games, everybody's like asking, you know, Turing test questions trying to figure out who the bot is and our results, our, our experiments are just like meaningless. And so we figured like, "Okay, maybe we get lucky and we manage to get like 10 games in before they figure this out, but at least we have like 10 games worth of data." But surprisingly we managed to go like the full 40 games without being detected as a bot. That was surprising to me and I think, I think that's a testament to the progress of language models, um, in the past, past couple of years especially. And also that maybe humans aren't as good at talking (laughs) as we might think. Like, I- it made me appreciate also that, you know, if somebody is saying something a little weird, 'cause the bot does say weird things every once in a while, their first instinct is not gonna be like, "Oh, I'm talking to a bot." Their first instinct is gonna be like, "Oh, this person is, like, dumb or distracted or, like, you know, they're drunk or something," uh, and then way down the list is like, "Oh, this person is a bot." Uh, so I think that we got pretty lucky in that respect. I mean... But, but also, I mean, the, the bot did manage to, like, actually go these 40 games without being detected, and so I think that is a testament to, to the quality of the language model.

    2. SG

      I think Meta is actually, uh, planning to release the data, which is gonna be so interesting, but can you... um, can you just like describe like an interaction from the bot you thought was interesting in these negotiations?

    3. NB

      Oh, yeah. I mean, I, I think one of the, one of the messages that was like really, you know, honestly kind of scary to me was just when it w- it was, it was talking to another player and the player was saying like, "Hey, you know, I'm really nervous about your units, you know, near my border," and, and the bot honestly was not planning to attack the player. Um, it was planning to go in the other direction. And it just... It, it sent the player this like really empathetic message where it was like, "Look, I totally understand where you're coming from. I, I can assure you 100%, like, I'm not planning to attack you. I'm planning to go the other direction. You have my word." And, um, it, it really felt like a very, um, a very human-like message, and, um, you know, 100% I would have never expected that to come from a bot. And that... When you see kind... When you see stuff like that, like, it makes you appreciate like, yeah, there's something really powerful here.

    4. EG

      How do you think about the Turing test in the context of all this? Like... Or what's your updated model of whether the test is still relevant or how to think about it?

    5. NB

      So there's actually a New York Times article that came out today, um, from Cade Metz at, uh, New York Times on, on like the Turing test and what it means, and he actually talks about Cicero in, in the article. Um, and basically his, his view is that the Turing test is, is kinda dead, and I, I kind of agree with that. I think the Turing test is no longer really a useful, um, measure the way it was intended to be. Um, certainly.... just because we have bots that can... uh, I won't say they can pass the Turing test, but, I mean, like, they're getting, they're getting close enough that it's no longer that useful of a measure. Um, it doesn't mean that we have general intelligence. Um, I think there's still a long way to go on that. There's a lot of things that these bots can't do well. But, yeah, I- I think my view now is that the Turing test is not that useful of a measure anymore. It- it doesn't necessarily mean that it was always a useless measure. I think it- it just shows, like, how much progress we've made. Um, we're not 100% there but, you know, the- the progress really has been staggering, especially in the past few years.

    6. EG

      What- what measure do you think, or measures do you think make sense to use? And then, also, what do you think is missing on sort of the road to general intelligence?

    7. NB

      Um, I think there's a few things that are missing. The big- the big thing that I'm interested in particular is reasoning capabilities. You have these bots and they can, they can- they basically... they're all doing next word prediction, right? Cicero is a bit different actually, in that it's actually conditioning its dialogue generation on a plan. Um, and I think that's one of the really interesting things that- that distinguishes Cicero from a lot of the work that's happening in language models today. But a lot of the research that is happening is- is using next word prediction. And when it's trying to do something that's, like, more sophisticated in terms of reasoning capabilities, it's a lot of chain of thought where it's just, like, rolling out, you know, the kind of reasoning that it's observed humans do in their... in- in its training data and seeing where that leads. I think there's a general recognition among AI researchers that, um, this is a big weakness in the bots today, and that if we want truly general artificial general intelligence then this, uh, this needs to be addressed. Now, there's a big question about how to address it. And that's actually why I really like this direction, because it's still an open question about how- how to actually fix this problem. There's been some progress, but I think there's a lot of room for improvement.

    8. SG

      What do you think are the most, uh, promising possible directions?

    9. NB

      Uh, that is, that is the trillion-dollar question. Um-

    10. SG

      You're the trillion-dollar man. (laughs)

    11. NB

      (laughs) I- I think, I think there's, like, clear- clear base. I mean, like, first- first of all, chain of thought really was, like, a big step. And it's kinda shocking just, like, how effective that was, given how simple of an idea it is.

    12. SG

      I tell myself every day when I wake up, "Now let's think step by step."

    13. NB

      (laughs) Yeah, so for those of you that don't know, it's just, like, you add to the prompt, like, "Oh, let's think th- through this step by step," and then the- the AI will, like, actually generate a- a- a longer, like, thought process about how it reaches its conclusion, and then that, um, actually leads to better conclusions. But you can- you can kinda see that as, like, just rolling out the- the thought process that's observed in human data, and

  6. 14:50–23:29

    Algorithmic breakthroughs and significance of AI bots that win in No-Limit Texas Hold'em poker

    1. NB

      so there's a question of like, okay, well, instead of just rolling that out, could you actually improve it as it's going through each step? Um, and so I think things like that... I mean, I'm kinda keeping it, like, very abstract because it- it's- it's an important question, and also, I think there's not a clear answer yet, um, so I don't wanna speculate too much. But I think that there is, like, room for improvements i- in this direction.
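The chain-of-thought trick described here is literally just prompt text. A minimal sketch follows, where `query_model` is a hypothetical stand-in for whatever completion API you use — no real API or model is assumed:

```python
# Chain-of-thought prompting, as described above: append a cue asking
# the model to spell out its reasoning before giving an answer.

COT_CUE = "Let's think through this step by step."

def build_cot_prompt(question: str) -> str:
    """Turn a bare question into a chain-of-thought prompt."""
    return f"{question}\n{COT_CUE}"

def query_model(prompt: str) -> str:
    # Hypothetical stand-in: a real implementation would call an LLM here
    # and return a completion that includes the intermediate reasoning.
    raise NotImplementedError("plug in a completion API of your choice")

question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")
print(build_cot_prompt(question))
```

The open question raised in the conversation is whether you can do better than a single rollout — for example, checking or improving each intermediate step — rather than just sampling the whole chain once.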

    2. EG

      What was the actual dataset that, um, was necessary in order for the training here? And- and sort of... uh, uh, maybe to take a step back, you know, I've been having a series of conversations with people about data and sort of like when do we run out of data that's easily available, and when do we har- have to start creating either large-scale, uh, synthetic data or, um, human RLHF data or, um, you know, do you literally pay bounties to people who just record themselves all day so you can start collecting interesting data off of them (laughs) to do different things with over time, right? As these models scale to a certain point where, you know, you've used up the internet and you start using up all the video content, you start running out of stuff. I'm just sort of curious, like, how you thought about data in this context and what's necessary to really take things to the next level from a self-driven agent perspective like this.

    3. NB

      It's not clear that data really is the- the bottleneck on performance here. Uh, I- I mean, I've talked to AI researchers about this, and I think there isn't as much of a worry, um, about this as- as people might think. Probably that's because there's a lot more data that's out there than people might realize, um, and that- and that people are using right now. And also it's because I think there are gonna be improvements to, um, sample efficiency as the research progresses. I think we'll be able to- to stretch the data more.

    4. EG

      What do you think is a bottleneck?

    5. NB

      I think the bottleneck is going to be scaling. I mean, so you look at- at the models that exist today, like, they probably cost $50 million to train. Um, you can probably easily 10X that. You know, I wouldn't be surprised if there's a $500 million model that's trained in the next year or two. Um, you can maybe even go another order of magnitude and, like, train a $5 billion model if you're, like, the US government or something or, like, a really big tech company. Um, but what do you do beyond that? Do you- do you train $100 billion model? Um, you'll probably see some improvement, but at some point, it just becomes, like, not realistic anymore. And so that's- that's gonna be the bottleneck. Like, we maybe get, like, two orders of magnitude more scaling, and- and then we have a big problem. And people are focused on, like, "Okay, how do we- how do we make this more efficient? How do we train this, uh... it cheaper, more parallelized?" But you can only squeeze so much out of that, and I think we've- we've squeezed a lot already. This is why I'm interested in the reasoning direction, because I think there is this whole other dimension that people are not scaling right now, which is the amount of compute at, uh... at- at inference time. Like, you know, you can spend $50 million training this model, um, ahead of time, like pre-training this model, and then when it comes to actual inference, it costs, like, a penny. And, you know, what happens if instead of it returning an answer in a second, it returns an answer in, like, an hour, or- or even five seconds or 10 seconds? You know, sometimes if people want to give a better out- a better answer, they'll sit there and they'll think a bit, and that leads to a better outcome. And I think that- that's one of the things that's missing from these models. So I- I think that that's one of the ways to overcome the scaling challenge, and that- that's partly why I'm interested in working on that.

    6. SG

      Going back to, um, related to what Elad said, the Diplomacy problem specifically didn't have, like, you know, internet scale data, right? It... as you mentioned, it's, like, a relatively small community. Can you talk about what you guys did in terms of self-play and the data that actually was involved?

    7. NB

      So Diplomacy, the problem was interesting because... Yes, there- there's actually not a ton of data out there. I mean, we had a relatively good dataset, about- about 50,000 games with dialogue. Um, we did-

    8. SG

      This is from, like, web Diplomacy?

    9. NB

      Yeah, this is from a, a site called webdiplomacy.net. It's been around for, like, almost 20 years where people play, um, Diplomacy casually on this site. Um, we were very lucky to get this data set. I mean, um, honestly, I was scouring the internet trying to find, like, all the sites that have available data, and this was basically the only site that had a meaningful amount of data. Like, there was another popular site, but they periodically deleted their data, which was, you know, just mind-boggling to me. It's just, you're sitting on a goldmine here and you're just deleting it 'cause it's taking up server space. Um, I, I guess they didn't appreciate that, like, AI researchers will one day be interested in that. Um, and then other sites just, like, refused to hand over their data, and so I'm really glad that we managed to, like, work out a deal with webdiplomacy.net because otherwise, uh, the project would have just never happened. Now, that's about 50,000 games of Diplomacy, about 13 million messages, and that is a good size data set, but it's not, it's not enough to train a bot from scratch. Uh, fortunately, we are, we're able to leverage, like, you know, a wider data set from the internet so you kinda, like, have a pre-trained language model and then you fine-tune it on the Diplomacy data, and you get a bot that can actually communicate pretty well in the game of Diplomacy. Now, that helps with the dialogue, um, but there's still a problem, which is that the strategy isn't gonna be up to par. Um, and that's partly because you can't do that well with just supervised learning. You can't learn, like, a really, really good strategy in these kinds of games just with supervised learning, and it's also because the, the people that were, that are playing these games are not very good at the game. Like, the most, most of the data set is from fairly weak players. You know, that's, that's just a reality. You have a, you have a bell curve. 
The actual strong players are, like, a relatively stro- small fraction of any data set that you have. And I should say, this is not limited to Diplomacy. Like, we also found in chess and Go, we actually ran this experiment. Um, if you do just pure supervised learning on a giant data set of human, uh, chess and Go games, the bot that you get out from that is not an expert chess or Go player. Even if it's, like, conditioned to, like, behave like a chess grandmaster, it's not going to be able to match that kind of performance, um, because it's, it's not doing, it's not doing any planning. That's really what's missing. So in order to come up with these, like... In order to get a, a strategy that was able to go beyond just, like, average human performance or, or even, like, you know, strong human performance to something that's, like, much better, we had to do self-play. And this is, like, how all these, like, previous game AIs have been trained, right? If you look at AlphaGo, you look at, um, especially AlphaZero, the latest version of AlphaGo, and you look at the, you know, the DOTA 2 bot, the way they're trained is by playing against themselves for millions, uh, or billions of trajectories. Um, that's also how our poker bot was trained, um, for two-player and six-player poker. Now, the difference is, like, when you go from those games to Diplomacy, suddenly there is this cooperative aspect to the game. Like, you can't just assume that everybody else is going to behave like a machine, like, identically t- the way you're gonna behave. And so in order to c- overcome that, we had to combine self-play with, um, a recognition that humans are gonna behave a lot like how our data suggests. And so using the data set that we have, we're able to build up this model of, a rough model of how humans behave, and then we can improve on that, um, using self-play. 
We're, we're figuring out a good strategy, but basically a strategy that's compatible with how humans are playing the game. So to, to give some intuition for this, like, you know, there, there's... I- because it's not obvious why this changes when you go from a two-player zero-sum game like chess to a cooperative game like Diplomacy. I mean, also I should say, like, Diplomacy is both cooperative and competitive, but there is a cooperative, a big cooperative component. Like, let's say you're trying to develop a bot that negotiates. If you train that bot from scratch with no human data, um, it's going to... It could learn to negotiate, but it could learn to negotiate in a language that's not English. It could learn to negotiate in some, like, gibberish robot language. And then when you stick it in a game with six humans, that's a negotiation task like Diplomacy, it's not going to, to learn... It's not gonna be able to, to communicate with them, and they're just gonna all work with each other instead of with the bot. Um, that same dynamic happens even in the strategy game, the moves in the game, the, the non-verbal communication aspect. Like, the bot will develop these, like, norms and, and expectations around, like, what its ally should be doing this turn. Like, "I'm going to support my ally into this territory because I'm expecting them to go into this territory. I don't, I don't even have to talk to them about this 'cause it's just so obvious that they should be doing this." Um, but the humans have their own meta game where, like, "Oh, it's actually really obvious that I should be supporting you into this territory." If you don't understand the human norms and conventions, then you're not gonna be able to cooperate well with humans, and they're just gonna not work with you and work with somebody else instead. 
So, so that's what we really had to overcome, uh, in Cicero, and we managed to do that by using the human data to build this model of how humans behave and then adding self-play on top of that as kinda like a modifier

  7. 23:2924:53

    The Nash Equilibrium and optimal play in poker

    1. NB

      to, to the human data set.
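
[Editor's note: the idea Noam describes — improving on a human-imitation policy with self-play while staying "compatible with how humans are playing" — can be sketched as a KL-regularized best response. This is a rough illustration, not Cicero's actual algorithm; the action names, values, and the closed form used here are assumptions for the example.]

```python
import math

def anchored_policy(human_policy, action_values, lam):
    """KL-regularized improvement: maximize E[Q] - lam * KL(pi || human_policy).
    Closed form: pi(a) is proportional to human_policy[a] * exp(Q[a] / lam)."""
    weights = {a: p * math.exp(action_values[a] / lam)
               for a, p in human_policy.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# Illustrative numbers: humans favor "hold", but self-play says "support_ally" is better.
human = {"hold": 0.6, "support_ally": 0.3, "attack": 0.1}
values = {"hold": 0.0, "support_ally": 1.0, "attack": -0.5}

weak_anchor = anchored_policy(human, values, lam=0.2)    # mostly follows the values
strong_anchor = anchored_policy(human, values, lam=50.0) # stays close to human play
```

The anchor strength `lam` is the dial: large values keep the bot near the human conventions in the data, small values let self-play dominate.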

    2. SG

      That actually has some really interesting implications, right? Like, if you believe in the long term, we are going to have bots that take action in the real world interacting with humans, and humans are perhaps not very good at optimal play in the game of life, and you're interacting with them, like, you know, y- y- i- it sort of just brings home the point of how important reasoning could be versus learning, learning p- pattern recognition.

    3. NB

      I, I think you're absolutely right that, like, this, this matters a lot if you want to make AIs that interact with humans in the real world, right? Like, if you have a car driving on the road, a self-driving car, you don't want it to assume that all the other drivers are machines that are gonna act, like, perfectly optimally in every st- ev- every step of the way. Like, you want the self-driving car to recognize that these other drivers are humans and humans make mistakes and somebody could, like, swerve into my lane and, um... E- and yeah, and also, like, you know, just like day-to-day interactions, um...... understanding, like, the non-verbal cues of humans and, like, what that means. The- these are things that, um... or even the verbal cues. These are things that an AI has to be able to, to cope with if it's going to, like, really be useful to humans in the real world, and not just beating them at chess.

    4. EG

      Games have been used, um, for a while now as a way to measure AI progress, and you've worked on poker variants and Diplomacy variants, and you mentioned before, uh, other work people have done in terms of chess

  8. 24:5327:58

    How Cicero interacted with humans

    1. EG

      and Go and things like that. What do you think is the next frontier in terms of games and sort of research on, on them in the lens of AI?

    2. NB

      There's a long history of games as benchmarks for AI, and this goes all the way back to, like, the very fou- foundations of AI back in, like, the 50s. Um, like chess, in particular, was held up as this, like, grand challenge for AI because if we can make an AI that was, like, as smart as a human chess grandmaster, then, like, imagine all the other smart things it could do. Um, of course, th- that turned out to be, like, kind of a false promise, right? Like, you get an AI that plays chess, and it turns out it doesn't really do anything else. But we've learned a lot along the way, and, and games are useful as a benchmark because you can compare very objectively to top human performance. Like, it becomes very clear, um, when you're surpassing human ability in this domain, even if it's a restricted domain. You also have this benchmark that's existed before the AI researchers came along. Like, AI researchers, it's really easy for them to come up with a benchmark once they have the technique already created, you know? You, you come up with a... you come up with a technique, and then you're saying, like, "Okay, well, now, uh, it's really easy to come up with a benchmark that this technique will work for." And you don't want that. You want the, uh, the, the problem to come first, um, and games give you that. But I think we're reaching a point now where individual recreational games, um, are just no longer that interesting of a, of a challenge. Um, I think... You know, I said earlier, we chose Diplomacy, uh, because we thought it would be the hardest game to make an AI for, and I think that's true. I, I can't think of any other game out there where, like, if somebody made an AI that could play that game, I would be like, "Wow, that's super impressive, and I did not think that that was possible." And so I think going forward, the field needs to move beyond looking at individual games and starting to look at... 
first of all, going beyond games, but also looking at generality. The approach that we've used in Diplomacy is very different from what we previously did in poker and what others have done in chess and Go and StarCraft. And now there's a question of like, okay, well, if we really want a general system, a general AI, can we have it play all these games at a superhuman level, and also able to do things like, you know, image generation and, uh, question-answering and, like, all these tasks? And if we could accomplish that, then that becomes incredibly impressive. And so I think games will, will continue to serve as this benchmark, but it's not... instead of serving as a benchmark that the research kind of overfits to, my hope is that it will serve as a benchmark that we use along other benchmarks, um, outside of games, like, you know, i- image generation benchmarks and, and language Q&A benchmarks and these kinds of things.

    3. SG

      Do you think the thing that you just described, that, um, two-player games, now multiplayer games of negotiation, cooperation, it, it, it seems clear that if you can beat Diplomacy, you can beat most games and... Given that the AI has already won in these restricted domains that are challenging in specific ways, like, how do you think about the domains that are, um, that are gonna be human-skill dominant? Like, are there going to be domains that... like that?

  9. 27:5831:05

    The relevance and usefulness of the Turing Test

    1. SG

    2. NB

      Well, certainly anything in the physical world, I mean, humans still dominate. I mean, when it comes to actually, like, you know, manipulation tasks, these kinds of things, uh, robotics is really lagging behind. I'm, I'm trying to avoid doing anything in the physical world for that reason. Um, software is just so much nicer to work with. There's still things that you can't... that humans are, are definitely better at, even in restricted domains. Um, you look at something like writing a novel. Um, I don't think you can get an AI to output, like, the next Harry Potter just yet. That might not be that far off, maybe it's like five years away or something, um, but I don't think it's, it's happening just yet. It's kind of scary that it's... I'm really struggling to come up with domains where I'm like, "Oh yeah, uh, AI is not gonna be able to surpass humans in this." (laughs)

    3. EG

      I feel like people often talk about areas where humans will always have an advantage just 'cause they're humans and they wanna feel good about the future-

    4. NB

      Uh-huh.

    5. EG

      ... uh, versus because there's necessarily something that shouldn't be tractable from a, at least sheer logical perspective, right?

    6. NB

      Yeah, uh, it, it certainly is. I, I mean, I think the, the big advantage that humans have, and it's not clear when AI will surpass humans in this, is generality. Um, the ability to learn from a small number of samples, to be able to do, like... you know, be useful across a wide variety of domains.

    7. EG

      But isn't that generality overstated? Because I, I feel like in the examples that you mentioned, you said everything from like image gen to Diplomacy in like a single architecture or AI or something, and often it seems like, you know, if you look at the average person, if they're very good at one thing, they're usually not good at everything, right? And so I kinda feel like the bar that we're using in terms of generality for AI sometimes is higher than the bar we'd use for generality for people in some sense. Or is that not a true statement?

    8. NB

      I think it's... it's not just about generality, it's really about sample efficiency.

    9. EG

      Mm-hmm.

    10. NB

      Like, how many games does it take for an A- for a human to become a good chess player or a good Diplomacy player or a good artist? The answer is orders of magnitude less than it takes for an AI. And that is going to pose a problem when you're in domains that don't ha- that... where there isn't much data. Now, that seems like a problem that could be overcome, it's just... I'm just saying that's a problem that hasn't been overcome yet, and I think that that's one of the clear advantages that, that AIs have today over humans... uh, sorry, that humans have over AIs today.

    11. EG

      When do you think we'll see the emergence of AIs in financial or economic systems? And obviously we have like algorithmic trading and other things like that, and then we have things like crypto where you effectively have programmatic approaches to... effectively money wrapped as code, right? (laughs) And the ability to interact with those things in reasonably rich ways through smart contracts. You know, do you think we're, um... there's any sort of near-term horizon of people experimenting with that or just interesting research being done in terms of the actual interaction of a bot with a financial system?

    12. NB

      I think it's already being done. If you look at financial markets, I'm, I'm sure there's tons of, uh, trading powered by deep learning. Um, I've actually talked to a lot of...... uh, finance companies about this. There's a lot of f- I used to work in finance and also, like, a lot of finance companies love poker, and so I- I've given a few talks at, like, various places on AI for poker. And I've talked to a few places about, like, is reinforcement learning actually useful for financial markets, for trading? And the answer I get is usually no. I think the major challenge with, uh, using things like reinforcement learning for, uh, trading is that it's a non-stationary, uh, environment. So

  10. 31:0531:54

    The data set used to train Cicero

    1. NB

      you can have all this historical data, but it- it's- it's not a, it's not a stationary system and it's gonna... Like, the markets respond to world events, these kinds of things, so you need a technique ideally that, uh, really understands the world, not just treating everything like a black box.

    2. EG

      Uh, but could that at all feed into what you were saying about spending, um, more compute on inference versus training? In other words, incorporating real-time signals at the point of decision-making? Or did you mean something else by that in terms of model architecture that would enable you to update weights in certain ways or things like that over time?

    3. NB

      Well, I think it, I think it goes back to the sample efficiency problem, that humans are pretty good at adapting to novel situations, and you run into these, like, novel situations pretty frequently in financial markets. Yeah, I- I- I think it's also a problem of- of generality, that, um, you need to understand so much about the world to really succeed. Now,

  11. 31:5440:10

    Bottlenecks to AI researchers and challenges with scaling

    1. NB

      that- that said, I mean, I think that the AIs are successful in- in financial markets in, like, fairly limited, um, ways, uh, certainly if you wanna, like, break up big orders, these kinds of things. Also, I should say, like, I'm not an expert in this. Like, this- this is kind of outdated knowledge for- for me, um, 'cause I- I'm sure, like, there's a lot of cutting edge stuff that's happening that people are not telling me about because it's making money, um, but I can tell you that this is, like, kinda the perspective as of, like, maybe five years ago. It's being used in limited ways, but I don't think it's- it's fully replacing humans yet.

    2. SG

      Do you think we're gonna get bots that, um, negotiate with humans soon? Let me preface that as we are eventually going to get them. What do you think the timeline is or the use case?

    3. NB

      That- that seems doable. It depends on how- how constrained the domain is. I think if you were to look at constrained domains, uh, constrained negotiation tasks, I think that AIs could probably s- do better than humans in that today. I- I mean, I'm trying to think of, like, specific examples, but things like, um, you know, if you wanted to negotiate over the price of a good, um, it could probably do better than- than a human in a lot of those, in a lot of those situations. I think if there's things like salary negotiations, um... it might do better than humans at- at that also. Um, I think it depends on how much you need to know about the world. I- I think contract negotiations, for example, would still be difficult because there's so much subtlety, there's so much nuance to, like, every contract, and it's not gonna replace a professional negotiator for that kind of task just yet. Um, but kind of the things that are more constrained don't require as much, like, outside knowledge about the world. I think AIs are probably up to the task already.

    4. EG

      So a friend of mine, um, who used to work with you says that one of the things you're really exceptional at is you tend to pick a neglected research domain with lots of promise, you commit to it long term, and then you become the best at it. And many people in the world kinda get attracted to shiny things instead and kinda distracted by, you know, whatever's in vogue, but then it turns out to be less interesting research. Um, what are you thinking about working on next or what interests you as sort of the next wave of stuff to do?

    5. NB

      I think the- the big thing I'm interested in is, is the reasoning problem. And this is kinda motivated by my experience in these- in these- this game space. You look at things like AlphaZero, uh, the latest version of AlphaGo, and I think that's held up... Like, AlphaGo in particular is held up as this, like, um, this big milestone in deep learning, and to some extent it is. Like, it was not doable without deep learning, but it wasn't deep learning alone that enabled that. If you take out the planning that's being done in AlphaGo and just use the raw policy network, the raw neural network, it's actually substantially below top human performance. And with just, like, raw neural nets, we have all these things that are incredibly powerful, like, you know, chatbots, um, image generation software, but the raw neural net itself still can't play Go. It requires this extra planning algorithm on top of it to- to achieve top human performance. And that planning algorithm that's used in AlphaGo, Monte Carlo tree search, is very domain specific. Um, I think people don't appreciate just how domain specific it is because it works in chess, it works in Go, and these have been, like, the classic domains that people have cared about for investigating these kinds of techniques. Um, it doesn't work in poker. It doesn't work in diplomacy. I- because I've worked in those domains, I kind of, like, recognize that this is- this is a major weakness of these kinds of algorithms, and so I think there's a big question of, like, okay, how do we get these models to be able to do these, like, complex reasoning planning tasks, um, with a more general system that can work across a wide variety of domains? Um, and if you can enable that, i- if you can- if you can do that, if you can succeed in that task, then it enables a lot of really powerful things. Like, one of the domains that I- I'm thinking about is theorem proving. 
You know, it doesn't seem crazy to me that you could have a model that can prove the Riemann hypothesis within the next five years, um, if you can- if you can solve the reasoning problem in a truly general way. And- and- and yeah, you know, maybe- maybe the inference cost is huge. Like, maybe it costs a million dollars per token to generate that proof, but that seems totally worth it if you can pull it off. And maybe you can do other things with it too, like maybe- maybe that's... maybe that allows you to, like, you know, write the next prize-winning novel. Um, maybe that enables you to come up with, like, life-saving drugs.

    6. SG

      Just for context, the R- the Riemann hypothesis is, like, considered the- the most important unsolved problem in math where, I don't know, the first X set of, um, solutions have been checked, but we- we don't know for sure yet.

    7. NB

      Yeah, and I- I think the- the key is that- that I'm really interested in is the generality. Like, it... we can approach... we can solve this problem in domain-specific ways...... but then it's, it always ends up, like, kinda overfit to that domain. And so I think what we need is something, um, as general as what we're seeing with, uh, transformers, um, where you just throw it at any sort of problem and it, and it works surprisingly well.

    8. SG

      And, um, I, I guess you're implying that, um, there are ways to... there are ways to frame the problem to make progress that are more general, but really interesting to making progress in reasoning, and that could be around math or possibly code. Is, is that the right understanding?

    9. NB

      My, my hope is that the techniques are general. I mean, I think it's important to also look at a wide variety of domains in order to, like, prevent you from overfitting. Um, and, yeah, one of the domains that I think would also be a good fit is code generation, because I think to write good code, like next, next token prediction is going to... is getting you surprisingly far, um, but I don't think it's gonna get you all the way there to, like, replacing, you know, um, engineers at, at, uh, big companies.

    10. SG

      Yeah, maybe one piece of just context for listeners is, um, Copilot is amazing, right? But what we are doing with code generation today is very local context specific.

    11. NB

      Yeah, and so if you wanna, like, plan out, like, a whole, a whole product, like, that doesn't seem doable with existing technology. And, you know, I, I think, I think the perspective of a lot of people when they, when they hear me say this is like, "Well, you know, but you just scale it, you know? You scale up the models, you scale up the training, and that's always worked in the past." Um, and the example I like to give is you look at... okay, you look at AlphaGo. You could just, in theory, scale up the training, scale up the model capacity, and you don't need planning then. You just have, like, a really large... you, you run this reinforcement learning algorithm for a really long time, you have this really big network, and it will eventually learn, in, in theory at least, how to beat, uh, expert humans in Go. But there's a question of like, okay, well, how much would you have to scale it up? How much would you have to scale up this raw neural net, the capacity and the training, in order to match the performance that it achieves with Monte Carlo tree search? And if you crunch the numbers and... it ends up being 100,000X. Now, y- these models are already costing, like, $50 million. Like, clearly, you're not gonna be able to scale them, um, by 100,000X, uh, a- and, um, and so then, then there's a question of, like, okay, well, what do you do instead? And the answer in AlphaGo is like, well, instead of having all that computation be during training, you also have it spend, like, 30 seconds to figure out what move to, to make next when it's actually playing the game, and that shifts the, the cost burden from having to, like, pre-compute everything to then being able to think on the fly. And, and so that's why I think that avenue seems like, um, the, the, the piece that's missing.
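
[Editor's note: the training-versus-inference tradeoff has a toy analogue: a fixed policy with no lookahead against an agent that spends compute searching at move time. The sketch below uses the take-1-to-3 counting game, not AlphaGo's Monte Carlo tree search; the game and agents are chosen purely for illustration.]

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(pile):
    """True if the player to move can force a win (take 1-3 stones; taking the last stone wins)."""
    return any(not wins(pile - k) for k in (1, 2, 3) if k <= pile)

def search_move(pile):
    # Test-time compute: search for a move that leaves the opponent in a losing state.
    for k in (1, 2, 3):
        if k <= pile and not wins(pile - k):
            return k
    return 1  # no winning move exists; take one stone

def raw_policy_move(pile):
    # A cheap fixed "policy": always take one stone, no lookahead at all.
    return 1

def play(first, second, pile=21):
    movers = (first, second)
    turn = 0
    while pile > 0:
        pile -= movers[turn % 2](pile)
        turn += 1
    return (turn - 1) % 2  # index of the player who took the last stone

winner = play(search_move, raw_policy_move)  # the searcher moves first and wins
```

The raw policy could in principle be trained until it memorized perfect play, but here the same strength is bought by thinking at move time instead — the point Noam makes about AlphaGo's 30 seconds per move.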

    12. EG

      A, a really random question, because if you look at the human brain, you have these very specialized modules with very specific functions, right? You have the visual cortex for visual processing, you have, like, different things for emotion. In terms of specific modules, like, there's specific, um, parts of the brain that if you ablate, you remove, um, certain emotive or other capabilities, right? There have been accidents where, like, poles have gone through people's heads and ablated (laughs) a very specific place and the people have survived, and so you see this sort of, um, very specific ablation of function through the ablation of specific modules. Why is it the correct assumption to think that there should be a generalizable architecture versus you just have a bunch of sub-models that are all running together that collectively enable a wide range of behavior, which is effectively what we see in the brain?

    13. NB

      That's a good question. Um, I, I don't think that we need to be tied to a specific technique, and the answer might be that we need to have, like, more specialized systems instead of just, like, one truly general architecture.

  12. 40:1042:55

    The next frontier in researching games for AI

    1. NB

      I think what I'm thinking about is, is more the goal rather than the approach. We want something that's able to succeed across a wide variety of domains and having to come up with, like, a unique approach to every single domain, that gets you part of the way, but I think that eventually that will be superseded by something that is truly general.

    2. EG

      Yeah, that makes sense, and I, I guess, uh, you know, one big domain is just reasoning, right? So, uh, I didn't mean to imply that it's different sub-types of reasoning will require different approaches, but more there may be really big things that, uh, fundamentally may function in a very different way. Um, and again, that may be incorrect, right? The brain is a evolved system, which means it has enormous limitations in terms of where it came from and how it got created, and you often end up with these local maxima when you evolve a system, right? Um, I was just sort of curious about how you thought about that.

    3. NB

      Yeah, there's certainly a risk always with research that you, you could end up in a, a local minimum, and it's, like, hard to... people, like, overfit to that, and I think, I think actually, like, machine learning was an example of this. Like, deep learning, not many people were, were focused on this 'cause they kind of assumed it was this dead end, that there were only a few people out in the, you know, like, Canadian wilderness that were working on this. Um, and that ended up, like, being tremendously successful, um, and so there's value in diversity, um, there's value in, uh, diversity of approaches, and, um, I, I think, I think it does help to try to think outside of the box and try to do something that's, like, a little bit different than what everybody else is doing.

    4. SG

      Uh, Noam, you are gonna go work on this really interesting area. Um, I'm sure there are other problems you think are interesting, especially given, uh, the, I don't know, practical limits of how much money we're willing to spend on scaling up beyond another magnitude or two. What do you think other researchers or teams should be working on that they're not paying enough attention to?

    5. NB

      Well, I think we're in an interesting place now in AI where there is a huge opportunity to build out products for... it's, uh, given what, given where things are at now, there's already an opportunity to build out products, um, that can have a big impact on the world. It's great to see that there are people that are going in that direction and, and trying to, like, bring these... this research into the real world and have a big impact there, um, make people's lives better.

    6. SG

      For what it's worth, both Elad and I got emails from multiple people telling us that they're, um, building price negotiation agents as they... as we speak.

    7. NB

      Well, that's... (laughs) E- I, like I said, I think it's doable. So I think, I think it's the right call. I think on the research side, there's, there's still a lot of interesting questions about, like, how do we make these things more efficient? Um, are there, are there better architectures we can use? I mean, I think there's just so many questions across the board, um, that are interesting. I think the, the big thing I would, I would recommend to researchers is not...... about, like, which area to focus on, but just, like, the style of research. I think

  13. 42:5548:13

    Domains that humans will still dominate and applications for AI bots in the real world

    1. NB

      there's a tendency to play it safe and to not take big risks. And I think it's important to recognize that research is an inherently risky field. You know, there's a high probability that what you're working on is not going to, is not gonna be useful in the long term. You have to kind of accept that and be willing to take that risk anyway. I mean, this, this happened to me. Like, the early research in my PhD, in the grand scheme of things, really wasn't that useful. Um, it, it, like, it didn't make as much impact in the long term as, as I, as I would have hoped. And, um, and that's okay because, you know, I had one thing that ended up being quite impactful. Um, and so I, I think it's, uh, it's important to, to be able to take those risks, um, kind of like going into the field recognizing that, that you are taking a risk already by going into research.

    2. SG

      You heard it here first. Be like Noam. Work on things that make you nervous.

    3. NB

      (laughs)

    4. EG

      Do you wanna give a quick, um, minute overview of Diplomacy so people can understand what it is and why the research was such a breakthrough?

    5. NB

      Yeah. Diplomacy is this game. Um, it was developed in the '50s. Um, it was actually developed by, by this guy who saw what happened in World War I and kind of viewed this as a diplomatic failure. And so he wanted to create this game that would teach people how to be better diplomats, essentially. Um, and so it takes place at the onset of World War I. There's seven play- powers that you can play as. Um, England, France, Germany, Italy, uh, Russia, Turkey, and Austria-Hungary. And you engage in these, like, complex negotiations every turn, and your goal is to try to control as much of the map as possible. And the way you win is by controlling a majority of the map. It's kind of like Hunger Games, where even though only one person can win at the end of the day, there's still this, like, incentive to be able to work together, especially early on, um, because you can both benefit and have a better chance of winning in the end if you work together. And so you have these, like, really complex negotiations that happen. Um, players... A- and all the communication is done in private, so y- uh, unlike a game like Risk, for example, or Settlers of Catan where, like, all the negotiation is done, um, in front of everybody else. In Diplomacy, you will actually, like, pull somebody aside, go into a corner, like, scheme about who you're going to attack together this turn, who's gonna support who. Um, and then after you've negotiated with everybody, you write down what your moves are for the turn. And so then all the moves are read off at the same time, and you can see if people, like, actually, um, follow through on their promises about, like, helping you in s- and... Or maybe they, they lied to you and they're just gonna attack you this turn. Um, so it has, it has, like, some elements of Risk, poker, and Survivor. 'Cause there's this big, um, big trust component, and that's really the, the essence of the game. Like, can you build trust with others? 
      Because the only way to succeed in this game is by working together, even though, um, you know, you always have an incentive to, to attack somebody and, um, grow at their expense. So yeah, that's the game. It's been around for a long time, like I said, since the '50s. It was JFK and Kissinger's favorite game. There's research for this game from an AI angle going back to the '80s. Um, but the idea that you could play this game in natural language with humans and beat them was just complete science fiction, um, until a few years ago. Like, it was still science fiction, but we at least thought it was, like, worth pursuing it. Um, and research really took off in 2019 when researchers, um, started using deep learning to make, uh, big bots for this game that could play the non-language version. So there's no communication. You just write down your moves and, um, and you kinda have to communicate non-verbally through the actions that you take. We, we were doing research on this. DeepMind was doing research on this, and then also University of Montreal and a couple other places as well. And there was, there was a lot of interest and, and progress, but, um, we decided to take the risky bet of just, like, jumping to the end point and, instead of taking an incremental approach, aiming for full natural language Diplomacy. Um, and, uh, I'm glad that we aimed for that.

    6. EG

      That's a... It seems like one of the pretty amazing things about what y'all did is you basically created bots that, um, other people, that humans thought were other people, and therefore they had to learn how to collaborate with each other, how to sometimes lie or deceive, how to sometimes, um, think through sort of multiple moves from a game theoretic perspective. And so it's a, it's a radically different thing than playing chess or, or playing Go against another person and then just having almost a probabilistic tree of moves or something.

    7. NB

      Yeah. The... You run into this, like, human element. Um, you really have to understand the human element. And, um, what's really interesting about Diplomacy, aside from just the natural language component, i- is that it really is the first major game AI breakthrough in a game that involves cooperation. That's really important because, you know, at the end of the day, when we make these AIs that play chess and Go, um, we, we're not developing them with the purpose of, of beating humans at games. We, we want to, you know, have them be useful in the real world. And if you wanna have these AIs be useful in the real world, then they have to understand how to cooperate with humans as well.

    8. SG

      Elad and I were talking about, um, centaur play, and, uh, whether or not that would persist as an idea at all given, you know, uh, like, we've accepted that AIs are gonna win games at this point. Um, but I think, like, you know, the idea that AIs are going to take action by cooperating with humans, that needs to be a, a core capability. Seems obvious. And

  14. 48:131:00:36

    Reasoning challenges with AI

    1. SG

      I am... Perhaps this is the, um, making myself feel better story, but I, I am hopeful that that is, uh, a, a human skill that remains quite important, being able to cooperate with AIs.

    2. NB

      Well, from what I hear, centaur play is like, AIs have gotten so strong in games like Chess that it's not clear if the human is really adding that much, uh, these days.

    3. EG

      That's what I told Sarah too. (laughs)

    4. SG

      Yeah, I'm crying. I get it. I get it. I accept it.

    5. NB

      Yeah. Um, I don't... I, I think, I think the humans are still useful in, in a game like Go, 'cause, like, the AIs are super strong, but they will also sometimes, like, a few times in each game, make these, like, really weird blunders. Um, and in Diplomacy, I think, yeah, it's super helpful.... to have, like, an experienced human i- in addition to the AI. Though, like, you know, eventually, I'd imagine that these systems become so strong that, like, it kinda goes the way of chess, where, like, the human's just kinda like adding a marginal difference at the end.

    6. SG

      Yeah, I'm- I'm actually just, uh, you know, wondering how long that window is for humans and Centaur playing the game of life, right? But it's okay. It's okay, I got it. Elad is right.

    7. NB

      Uh, hopefully, yeah, hopefully forever but, you know, we'll see.

    8. EG

      Yeah, so, uh, do you mind explaining the work that you've done in poker and some of the breakthroughs that you made there as well?

    9. NB

      Yeah, my- my PhD research, um, was really focused on how do you get an AI to beat top humans in the game of No-Limit Texas Hold 'Em Poker, especia- uh, specifically during my PhD it was on Heads-up No-Limit Texas Hold 'Em Poker, that's- that's two-player poker. Um, and this was a longstanding challenge problem. Actually, if you go back to the original papers written on game theory by John Nash, uh, the only application that's discussed in the paper is poker. He actually analyzes this, like, simple three-player poker game, uh, in the paper and- and works out the Nash equilibrium by hand. Um, and then actually at the end he says like, "Oh yeah, it'd be really interesting to analyze a much more complex poker game using- uh, using this approach." Um, so I'm glad we finally got a chance to do that, you know, 60 years later. Um, and it's- it's interesting, I think especially after AlphaGo, this became a very, uh, popular problem because after AlphaGo, there was a big question of like, okay, well AIs can now beat humans at chess, they can beat humans at Go, what can't they do? And the big thing that they couldn't do was be able to reason about hidden information, be able to understand that, okay, this other player knows things that I don't know, and I know things that they don't know. And being able to overcome that problem in a strategic setting, um, was a big unanswered question. And yeah, so that- that was the focus of my research for basically, um, my- my whole grad school experience. And there were a few different research labs that were working on this, and what would happen is every year we would all make a poker bot and we would play them against each other in this competition called the Annual Computer Poker Competition. Basically what happened is when I started my PhD, there had already been, like, some progress in AI for poker, and so the competition really turned into a competition of scaling. 
There's about, like, 2.5 billion different hands that you could have on the river, like the last round of poker, uh, in Texas Hold 'Em. And what we would do is cluster those hands together, uh, using k-means clustering and, like, treat similar hands identically, and that allows you to compute, uh, a policy for poker because now instead of having to worry about 2.5 billion hands and, like, having to, uh, come up with a policy for each one of those, you can now, like, bucket them together and now you have, like, 5,000 buckets or something and you can actually compute a policy for th- for that many buckets. And so this was, like, before neural nets, that's why we were doing these, like, this k-means clustering thing instead of, uh, deep neural nets, but you can kind of think of it as, like, the number of buckets that you have is kind of like the number of parameters that you have in your- in your network. And so in grad school, it kinda turned into a- a competition of scaling, um, how many buckets could you have in your bot. When I started, first year, it was like 5,000 buckets, then we got up to 30,000 buckets and then 90,000 buckets. Um, every year we would have these bigger and bigger models, we would train them for longer, parallelize them, and they would always beat the previous year's model. Um, and in 2014, we actually won the Annual Computer Poker Competition, and after that we decided to take our bot and play it against expert human players. And so this was the first, um, what was called the Brains versus AI Poker Competition where we invited, like, these four top Heads-up No-Limit Texas Hold 'Em poker pros, and we had them play 80,000 hands of poker against our bot. And the bot actually lost by a pretty sizable margin, and it occurred to me, like, during this competition that the way the humans were approaching the game was actually very different from how our bot was approaching it.
So we would train our bot for, like, two months leading up to this competition, you know, on- on 1,000 CPUs. Um, but then when it came time to actually play the game, it would act instantly, and the humans would- would do something different. Like, they would, um, you know, obviously they would practice ahead of time, they would ve- develop an intuition for the game, but when they were playing the game against the bot and they were in a difficult spot, they would sit there and they would think, and sometimes it was, like, five seconds, sometimes it was, like, a minute. But- but they would think, uh, that would allow them to come up with this better solution. And it occurred to me that this might be, like, something that we're missing from our bot, and so I did this analysis after the competition to figure out, okay, if we were to add this search, this planning algorithm that would come up with a better strategy, um, when it's actually in the hand, how much better could it do? And the answer was, it improved the performance, uh, by about 100,000X. It was the equivalent of scaling the model, like, scaling the number of parameters, scaling the training by 100,000X. Now, the three years of my, uh, PhD at that point, I had managed to scale things by about 100X and, you know, that's, like, quite good. I was very proud of that. But when I saw that result, it made me appreciate that everything I had done in my PhD up until that point was just a footnote compared to adding search and scaling search. Uh, and so for the next year, I just worked basically non-stop, like, 100-hour weeks trying to scale up search, make it as, like, throw as much computation at the problem a- a- at inference time as possible. And then we did another competition in January 2017 where we played against four top expert poker players again, um, $200,000 in prize money to incentivize them to play their best, and this time, we completely crushed them. 
Poker players were literally telling us they did not think it was possible to beat, uh, expert poker players by that kind of margin. Yeah, and so that's- that's, uh... the story of, like, you know, my grad school experience, working on poker AI. That was for two-player poker. Um, we ended up after that working on multi-player poker, on six-player poker. Again, the, the big breakthrough there was that we developed a more scalable search technique, so instead of always having to search to the end of the game, it could search just a couple moves ahead. Um, and what was really interesting there is the bot... We did another competition that that bot won, and that bot cost under $150 to train if you were to run it on, like, a cloud computing service. Um, and I think that shows that this wasn't just a matter of scaling, um, compute. It, it really was an algorithmic breakthrough. And this, this kind of result would have been doable 20 years ago if people knew the approach to take.
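The hand abstraction described above, collapsing billions of river hands into a few thousand buckets of strategically similar hands via k-means, can be sketched in miniature. Everything here is illustrative: summarizing a hand by a single equity number, the hand count, and the bucket count are stand-ins, not the real bot's features or numbers.

```python
import random

def kmeans_1d(values, k, iters=25, seed=0):
    """Toy 1-D k-means: cluster hands by a single equity feature."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

# Pretend each "hand" is summarized by one number: its equity (win
# probability) against a random hand.  10,000 hands stand in for the
# ~2.5 billion river hands; 50 buckets stand in for the ~5,000.
rng = random.Random(1)
hand_equities = [rng.random() for _ in range(10_000)]
centroids = kmeans_1d(hand_equities, k=50)

def bucket_of(equity):
    """All hands mapped to the same bucket share one policy."""
    return min(range(len(centroids)), key=lambda j: abs(equity - centroids[j]))
```

The solver then only needs a policy per bucket rather than per hand, which is why the yearly competition turned into scaling bucket counts, much as parameter counts are scaled today.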

    10. EG

      How did that... Uh, if you look at a lot of other games, the- those sorts of, uh, big shifts in performance from a bot relative to people then shifts how people play, right? They learn from the bot, or they adapt their game from watching games that the bots play. How did that play out in terms of poker?

    11. NB

      Oh, that's... Yeah, that's a great question. So, you know, the competition, it, it was re- it was really interesting 'cause, you know, so kind of like as a last-minute thing, we, uh, we added this ability. So okay, the way the bot works, we give it different bet sizes that it can use. Like, there's... The game that we were playing, there's 20,000 chips, $100/$200 blinds. Um, or $50/$100 blinds, actually. Um, and so it can bet any- it can bet any amount it wants from, like, $100 up to $20,000. Um, and so there's not much value in, like, being able to bet both $5,000 and $5,001. And so we would discretize that action space to constrain it to, like, only considering a few different options. And so there's a question of, like, okay, well, what sizes do you give it the choice between? You know, towards the end when we were developing this bot, it, like... We just had room for extra computation, and so we just, like, threw in some extra sizes, um, like 4X- 4X the pot, 10X the pot. Like, it- it doesn't cost that much more, so why not just give it the option? Um, I didn't think it would actually use those sizes. And then during the competition, it, it actually ended up using those sizes a lot. Um, and it would sometimes bet, like, you know, $20,000 into a $100 pot, which was completely unheard of in, uh, professional poker play. And, um, you know, I was a little worried about this 'cause I, I thought it was a mistake at first, and I think the, the players that we were playing against also thought it was a mistake at first. Um, but then they, they found that they kept ending up in these, like, really tricky situations and, you know, they would just really struggle with, like, whether to call or fold. And that's, that's how you know you're playing good poker. If you see the other person, like, really struggling with a decision, that is a sign that you're doing something right.
Um, and at the end, they told us, like, "Yeah, that's the one thing that we're gonna try to incorporate into our own play, adding these, like, what are called over-bets into, uh, into our strategy." So instead of, like... Typically, the, the strategy was like, oh, you bet between a quarter of the size of the pot and one times the pot. And now in professional poker play, it's actually really, well, I wouldn't say common, but it, it, it's, um, it's, uh, part of the strategy to bet sometimes like 5X the pot, 10X the pot. Um, if you can pull it off in the right way, it can be a very powerful strategy.
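The bet-size discretization described here can be sketched as follows. The pot fractions, minimum bet, and stack size are illustrative choices, not the actual sizes the bot used.

```python
def bet_options(pot, stack, min_bet=100):
    """Discretize the continuous bet space into a few pot-relative sizes.

    The larger fractions are the 'over-bets' (4x, 10x the pot) that
    turned out to be surprisingly effective."""
    fractions = [0.25, 0.5, 1.0, 2.0, 4.0, 10.0]
    sizes = []
    for f in fractions:
        bet = round(pot * f)
        if min_bet <= bet <= stack:
            sizes.append(bet)
    if stack not in sizes:  # always keep all-in as an option
        sizes.append(stack)
    return sizes

# $100 pot, $20,000 behind: instead of ~20,000 legal bet amounts, the
# bot considers only a handful, up to the 10x-pot over-bet and all-in.
print(bet_options(pot=100, stack=20_000))  # [100, 200, 400, 1000, 20000]
```

Adding the 4x and 10x entries barely grows the strategy to compute, which is why throwing them in was cheap even if they were never used.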

    12. EG

      That's super cool.

    13. NB

      And I should also say, like, the way professional poker players train now, they all use bots to, to assist them. It's- it's a lot like chess, where, um, you play the game and then you have a bot analyze your play afterwards and see, like, okay, did you make mistakes? Where did you make mistakes? How could you do better next time? Um, the game really has been demystified and become a lot like chess. I kind of describe poker as essentially high-dimensional chess. It's ch- it's like chess where you have to reason about, like, a probability distribution over actions instead of just, like, uh, discrete actions.

    14. SG

      Yeah, and it's real- it's really interesting because I don't think people really believed there was fully optimal play in poker before. Like, they understood the probability distribution, but, uh, if you're playing live poker, like, there's, uh, social cues, right, and social play. And, um, that has clearly been swept out. Not as an activity of, like, um, uh, enjoyment, but, uh, in terms of a strategy that actually wins.

    15. NB

      Yeah, I think that's surprising to a lot of people that... This idea that there is an optimal way to play poker, um, that... You know, there- there's this thing called the Nash equilibrium where if you're playing that strategy-

    16. SG

      You always win. (laughs)

    17. NB

      Uh, it... You, you, you'll never lose. So there's, there's a subtle difference. Like, um, it guarantees that in the long run, you will not lose in expectation, and the reason for that is because, like, if you're playing against somebody else that's also playing the Nash equilibrium, like, obviously you can't both win. One of you, um, is going to lose, or you're gonna tie. And so in expectation, if you're playing against each other- against somebody else that's playing the Nash equilibrium, um, you're gonna end up tying. But in practice, what ends up happening is if you're playing the Nash equilibrium in a complicated game like poker, the other person is gonna make these small mistakes over time, and every mistake that they make is money in your pocket. Um, and so you just play the Nash equilibrium, wait for them to make mistakes, and you end up winning. And that is now the conventional wisdom among poker players, that you start by playing the Nash equilibrium. I- if you're, if you're really good, you can look at the other players, see how they're, um, deviating from the Nash equilibrium, playing sub-optimally, and maybe you can, like, deviate yourself to capitalize on those mistakes. But really, the safe thing to do is play the Nash equilibrium, let them make mistakes, and every mistake that they make costs them money and puts money in your pocket.
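A minimal illustration of that "never lose in expectation" property: regret matching (a simplified relative of the counterfactual-regret methods used in poker bots) run in self-play on rock-paper-scissors. The average strategies converge toward the Nash equilibrium, the uniform mix, against which no opponent action has positive expected value.

```python
# Payoff matrix for player 1 (rock, paper, scissors); player 2 gets the negation.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy(regrets):
    """Play each action in proportion to its positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

def train(iters=200_000):
    r1, r2 = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]  # biased starts to break symmetry
    avg1 = [0.0, 0.0, 0.0]
    for _ in range(iters):
        s1, s2 = strategy(r1), strategy(r2)
        avg1 = [a + s for a, s in zip(avg1, s1)]
        # Expected utility of each pure action against the opponent's mix.
        u1 = [sum(PAYOFF[a][b] * s2[b] for b in range(3)) for a in range(3)]
        u2 = [sum(-PAYOFF[a][b] * s1[a] for a in range(3)) for b in range(3)]
        v1 = sum(s * u for s, u in zip(s1, u1))
        v2 = sum(s * u for s, u in zip(s2, u2))
        r1 = [r + u - v1 for r, u in zip(r1, u1)]
        r2 = [r + u - v2 for r, u in zip(r2, u2)]
    total = sum(avg1)
    return [a / total for a in avg1]

avg = train()
# avg ends up close to the equilibrium (1/3, 1/3, 1/3): playing it, every
# opponent strategy has expected value <= 0, so their mistakes only cost them.
```

Poker is vastly larger, but the logic Noam describes is the same: the equilibrium guarantees you at worst break even in expectation, and every opponent deviation from it is profit.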

    18. SG

      I think that's all we have time for. Uh, thank you so much for joining us on the podcast, Noam.

    19. NB

      Yeah, thank you very much for having me.

    20. NA

      (music)

Episode duration: 1:00:36

Transcript of episode AAv1S9E6ogE
