Lex Fridman Podcast

Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20

Lex Fridman and Oriol Vinyals on DeepMind’s AlphaStar: StarCraft Mastery, Language Roots, and Future AI.

Lex Fridman (host), Oriol Vinyals (guest)
Apr 29, 2019 · 1h 46m

EVERY SPOKEN WORD

  1. 0:00-2:05

    Oriol’s early years: from gaming obsession to StarCraft competition

    1. LF

      The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that, he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence-to-sequence learning, audio generation, image captioning, neural machine translation, and of course, reinforcement learning. He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now here's my conversation with Oriol Vinyals. You spearheaded the DeepMind team behind AlphaStar that recently beat a, uh, top professional player at StarCraft. So, you have an incredible wealth of work in deep learning and a bunch of fields, but let's talk about StarCraft first. Let's go back to the very beginning, even before AlphaStar, before DeepMind, before deep learning. First, what came, uh, first for you, a love for programming or a love for video games?

    2. OV

      I think for me, it definitely came first the drive to play video games. I really liked computers. I didn't really code much, but what I would do is I would just mess with the computer, break it and fix it. That was the level of skills, I guess, that I gained in my very early days, I mean, when I was 10 or 11. Um, and then I- I really got into video games, especially StarCraft, actually, the first version. I spent most of my time just playing kind of pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not a very main scene like the- what's called nowadays eSports.

  2. 2:05-3:11

    How Oriol played StarCraft: random race, Zerg preference, and learning opponents

    1. LF

      Right. Of course, in the '90s. So, uh, how'd you get into StarCraft? What- what was your favorite race? How- how do you develop- how did you develop your skill? What- what was your strategy? All that kind of thing.

    2. OV

      So as a player, I tended to try to play not many games, not to kind of disclose the strategies that I kind of developed, and I like to play random, actually. Not in competitions, but just to... I- I think in StarCraft there's, well, there's three main races, and I found it very useful to play with all of them. Um, so I would choose random many times, even sometimes in tournaments to gain skill on the three races, because it's not how you play against someone, but also if you understand the race because you play it, you also understand what's annoying, what... Then when you're on the other side, what to do to annoy that person, to try to gain advantages here and there and so on. So, I actually played random. Although I must say, in terms of favorite race, I really like Zerg. Um, I was probably best at Zerg, um, and that's probably what I tend to use towards the end of my career before starting university.

  3. 3:11-6:57

    StarCraft explained for non-gamers: RTS as partial-information chess in real time

    1. LF

      So, let's step back a little bit. Could you try to describe StarCraft to people that may never have played video games, especially the massively online variety-

    2. OV

      Right.

    3. LF

      ... like StarCraft?

    4. OV

      So StarCraft is a real-time strategy game, and the way to think about StarCraft perhaps if you understand a bit, um, chess, is that there are... There's a board-

    5. LF

      Mm-hmm.

    6. OV

      ... which is called map or- or- or, like, yeah, like the map where people play against each other. There's obviously many ways you can play, but the most interesting one is the one-versus-one setup-

    7. LF

      Mm-hmm.

    8. OV

      ... where you just play against, um, someo- someone else or even the built-in AI, right? The... Blizzard put, uh, a system that can play the game reasonably well if you don't know how to play. And then in this board, you have, again, pieces like in chess, but these pieces are not there initially like they are in chess. You actually need to decide to gather resources to decide which pieces to build, so in a way you're starting almost with no pieces. You start gathering resources. In StarCraft, there's minerals and gas that you, you can gather, and then you must decide how much do you want to focus, for instance, on gathering more resources or starting to build units or pieces. And then once you have enough pieces or maybe, like, attack, um, you know, a- a good attack composition, then you go and attack, um, the other side of the map. And now the other main difference with chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. Um, it's what we call partially observable. So as a result, you must not only decide trading off economy versus building your own units, but you also must decide whether you want to scout to gather information, um, but also by scouting you might be giving away some information that you might- might be hiding from the enemy. So, there's a lot of complex decision-making, um, all in real time. There's also, unlike chess, this is not a turn-based, um, game. You play basically all the time continuously, and thus some skill in terms of speed and accuracy of clicking is also very important, and people that train for this really play this game at- at an, um, amazing skill level. I've seen many times this, and if you can witness this live, it's really, really impressive. 
      Um, so in a way it's kind of a chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources to basically get some money to build other buildings, pieces, technology, and so on.

    9. LF

      From the perspective of the human player, the difference between that and chess, or maybe that and, uh, a game like turn-based strategy like Heroes of Might and Magic-

    10. OV

      Mm-hmm.

    11. LF

      ... is that there's an anxiety 'cause y- you have to make these decisions really quickly, and- and, uh, if you are not actually... uh, aware of what decisions work. It, it's a very stressful balance that you have to ... Everything you described is actually quite stressful, difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, like if they're fully aware of what they have to do, but at the amateur level there's this anxiety, "Oh, crap. I'm being attacked. Oh, crap. I have to, uh, build up resources. Oh, I have to probably expand." And all these, uh, the time, the, uh, real-time strategy aspect is really stressful and computationally, I'm sure, difficult. We'll get into it, but, uh, for me, Battle.net ... Uh, s- so StarCraft was released in, uh, '98, t- 20 years ago, which is hard to believe. Uh, and, uh, uh, Blizzard Battle.net with, uh, Diablo in '96 came out. And to me, I ... It might be a narrow perspective, but it changed online gaming and perhaps society forever. (laughs)

    12. OV

      Yeah.

  4. 6:57-12:16

    Online gaming’s transformation: LAN parties, Battle.net, and the social fabric of games

    1. LF

      But I, I may have way too narrow a, a viewpoint but from your perspective, uh, can you talk about the history of gaming over the past 20 years? Is this ... How transformational, how important is this, uh, line of games?

    2. OV

      Right. So, I think I, I kind of was an active gamer whilst this was developing, the internet, um, online gaming. So for me the, the way it came was I played other games strategy-related.

    3. LF

      Yeah.

    4. OV

      I played a bit of Command & Conquer, and then I played Warcraft II, which is from Blizzard. But at the time, I didn't know. I didn't understand about what Blizzard was or anything. Warcraft II was just a game which is, which was actually very similar to StarCraft in many ways. It's also a realtime strategy game, um, where there's orcs and humans, so there's only two races.

    5. LF

      But it was offline.

    6. OV

      And it was offline, right?

    7. LF

      Yeah.

    8. OV

      So, I remember a friend of mine came to, to school saying, "Oh, there's this new cool game called StarCraft." And I just said, "Oh, this sounds like just a copy of Warcraft II," um, until I kind of installed it. And at the time, um, I, I am from Spain so we didn't have inter- like, very good internet, right? So there was ... For us, StarCraft became first kind of an offline experience where you kind of start to play these missions, right? You play against some sort of scripted things, uh, to, to de- develop the story of the characters in the game.

    9. LF

      Yeah.

    10. OV

      Um, and then later on I started playing against the built-in AI, um, and I thought it was impossible to defeat it. Then eventually you defeat one and you can actually play against seven built-in AIs at the same time, which also felt impossible. But actually, it's not that hard to beat seven built-in AIs at once. So once we achieved that, also we discovered that we could play ... Um, as I said, internet wasn't that great, but we could play with a LAN, right? On, like basically against each other if we were in the same place, because you could just connect machines with, like, cables, right? Um, so we started playing in LAN mode and w- against, you know, as a group of friends, and it was really, really, like, much more entertaining than playing against AIs.

    11. LF

      Mm-hmm.

    12. OV

      And later on, as internet was starting to develop and being a bit faster and more reliable, then it's when I started experiencing Battle.net, which is this amazing universe not only because of the fact that you can play the game against anywhere, anyone in the world, but you can also get to know more people. You just get exposed to now, like, this vast variety of ... It's kind of a bit when the chats came about, right? There's, there's a ... There was a chat system. You could play against people, but you could also chat with people not only about StarCraft but about anything. And that became a way of life for kind of two years, um, a- a- and, and obviously then it became like, kind of, it exploded in me, in that I started to play more seriously, going to tournaments and so on and so forth.

    13. LF

      Do you have a sense on a societal, sociological level what's this whole part of society that many of us are not aware of? And it's a huge part of society, which is gamers. I mean, um, uh, every time I come across that in YouTube or streaming sites, I mean, this is a hu- the huge number of people play games r- religiously. Do you have a sense of those folks, especially now that you've returned to that realm a little bit on the AI side?

    14. OV

      Yeah. So, uh, in fact, I, uh, even after StarCraft, I actually played World of Warcraft, which is maybe the-

    15. LF

      Oh, no. (laughs)

    16. OV

      ... the, the main sort of online world or presence that you get to interact with lots of people. So, I played that for a little bit. It was ... To me it was a bit less stressful than StarCraft because winning was kind of a given. You're just put in this world, and you can always complete missions. But I think it was actually the social aspect of especially StarCraft first and then games like World of Warcraft really shaped me in, in a very interesting way b- because you ... What you get to experience is just people you wouldn't usually interact with, right? So, even nowadays I still have many Facebook friends from the era where I played online, and their ways of thinkings, even political, um, they just ... The, we don't live in, like, we don't interact in the, in the real world but we were connected by basically fiber. And that way I actually get to understand a bit better, um, that we live in a diverse world and these were just connections that were made by ... Because, you know, I happened to go in the city, in a beautiful city, uh, as a priest, and I met this, you know, this warrior and we became friends, and then we started, like, playing together, right? So, I think it's, it's, it's transformative and more and more and more people are more aware of it. I mean, it's, it's becoming quite mainstream. But back in the day, as you were saying, in 2000, 2005 even, it was very, still very strange, um, thing to do, especially in, in Europe. I think there were exceptions. Like Korea, for instance, it was amazing, like, that, tha- that everything happened so early in terms of, um, cyber cafes. Like, it's ... If you go to Seoul, like, it's a city that... um, back in the day, StarCraft was kind of ... you could be a celebrity by playing StarCraft. But this was, like, '99, 2000, right? It's not, like, recently. So, um, yeah, it's quite, it's quite interesting to, to look back and, and ... yeah, I think it's changing society the, the same way, of course, like technology and social networks and so on are also transforming things.

  5. 12:16-13:29

    Gaming and productivity: time tradeoffs and “choose two out of three”

    1. LF

      And a quick tangent. Let me ask, you're also one of the most productive people in your particular chosen, uh, passion and path in life, uh, and yet you're al- also appreciate and enjoy video games. Do you think it's possible to do, uh, uh, to enjoy video games in moderation?

    2. OV

      Someone told me that you could choose two out of three.

    3. LF

      (laughs)

    4. OV

      Um, when I was playing video games, you could choose having a girlfriend, um, playing video games, or studying.

    5. LF

      Yeah.

    6. OV

      And I think for the most part, it was re- relatively true. These things do take time. Games like StarC- if you take the game pretty seriously and you wanna study it, then you obviously will dedicate more time to it. And I definitely took gaming and, obviously, studying very seriously. I, I, I love learning science and etc. Um, so to me, especially when I started university undergrad, I kind of stepped off StarCraft. I actually fully stopped playing. Um, and then World of Warcraft was a bit more casual. You could just connect online, and, I mean, it was, it was fun, um, but I, as I said, that was not, um, as much time investment as it was for me in StarCraft.

  6. 13:29-23:22

    AlphaStar origin story: from Berkeley Overmind to DeepMind + Blizzard partnership

    1. LF

      Okay, so let's get into AlphaStar. What are the, uh ... you're behind the team, the, the ... so DeepMind's been working on StarCraft and released a bunch of cool open source agents and so on the past few years.

    2. OV

      Mm-hmm.

    3. LF

      But AlphaStar really is the moment where, uh, the first time you beat a world-class player. So, what are the parameters of the challenge in the way that AlphaStar took it on, and how did you and David and the rest of the DeepMind team get into it? Consider that you can even beat the best in the world, or top players.

    4. OV

      I think it all started in ... back in 2015. Um, actually I'm lying. I think it was 2014 when DeepMind was acquired by Google, um, and I at the time was at Google Brain, which is in, it was in California, is still in California. Uh, we had this summit where we got together, the two groups. So Google Brain and Google DeepMind got together and we gave a series of talks. And given that they were doing deep reinforcement learning for games, I decided to bring up part of my past, um, which I had developed at Berkeley, like, this thing wha- which we call Berkeley Overmind, which is really just a, a StarCraft I bot, right? So I, I talked about that, and I remember Demis just came to me and said, "Well, maybe not now. It's, it's perhaps a bit too early, but you should just come to DeepMind and do this again, uh, with deep reinforcement learning," right? And at the time, it sounded very science fiction for, for several reasons. Uh, but then in 2016 when I actually moved to London and joined DeepMind, um, transferring from Brain, it became apparent that because of the AlphaGo moment, um, and kind of Blizzard reaching out to us to say, "Wait, like, do you want the next challenge?" And also me being full-time at DeepMind, so sort of kind of all this came together.

    5. LF

      Mm.

    6. OV

      And then I was ... I went to, to Irvine in California to the Blizzard headquarters to just chat with them and try to explain how would it all work before you do anything. And the approach has always been about the learning perspective, right? So, um, in, in Berkeley, we did a lot of, uh, rule-based, you know, conditioning and, oh, if, uh, you have more than three units, then go attack, and if the other has more units than me, I retreat, and so on and so forth. And of course, the point of deep reinforcement learning, deep learning, machine learning in general is that all this should be learned behavior. So that kind of was the DNA of the project since its inception e- in 2016 where we just didn't even have an environment to work with. And so this ... that's how it all started, really.
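
To make the contrast concrete, the hand-written, rule-based style Oriol describes for the Berkeley bot might look roughly like the sketch below. The `Situation` type, the function name, and the exact thresholds are illustrative assumptions, not the Berkeley Overmind code.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    my_units: int     # number of army units we control
    enemy_units: int  # number of enemy units currently visible


def scripted_decision(s: Situation) -> str:
    """Hand-written rules; every threshold here is chosen by a programmer, not learned."""
    if s.enemy_units > s.my_units:
        return "retreat"        # "if the other has more units than me, I retreat"
    if s.my_units > 3:
        return "attack"         # "if you have more than three units, then go attack"
    return "keep_building"      # hypothetical default when no rule fires


print(scripted_decision(Situation(my_units=5, enemy_units=2)))  # attack
print(scripted_decision(Situation(my_units=5, enemy_units=8)))  # retreat
```

In the learned approach he goes on to describe, no such `if` statements are written by hand; the mapping from situation to action is meant to come out of training.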

    7. LF

      So, if you go back to that conversation with Demis, or even in your own head, how far away did you ... 'cause that's ... we're talking about Atari games.

    8. OV

      Mm-hmm.

    9. LF

      We're talking about Go, which is kind of, if you're honest about it, really far away from StarCraft in, in, uh ... well, now that you've beaten it, maybe you could say it's close, but, uh, it's, it's much ... it seems like StarCraft is way harder than Go, philosophically and, uh, mathematically speaking.

    10. OV

      Mm-hmm.

    11. LF

      Uh, so how far away did you d- d- did you think you were? Did you think-

    12. OV

      So-

    13. LF

      ... in 2019 and '18 you could be doing s- as well as you have?

    14. OV

      Yeah, when I, when I kind of thought about, okay, I'm gonna dedicate now a lot of my time and focus on this, and obviously I do a lot of different research in deep learning, so spending time on it, I mean, I really had to kind of think there's gonna be something good happening out of this. Um, so really I thought, well, this sounds impossible, and it probably is impossible to do the full thing, like, the all, like, the full game, um, where you play one versus one, um, and it's only a neural network playing and so on. Um, so it really felt like I just didn't even think it was possible.

    15. LF

      Mm-hmm.

    16. OV

      But on the other hand, I could see some stepping stones, like, towards that goal. Um, clearly you could define sub-problems in StarCraft and sort of dissect it a bit and say, "Okay, here is a part of the game, here is another part." Um, and also, obviously the fact ... so this was really also critical to me, the fact that we could access human replays, right? So Blizzard was very kind, and in fact they open sourced these for the whole community where you can just go, and it's not every single StarCraft game ever played, but it's a lot of them you can just go and download. And every day they will ... you, you can just query a dataset and say, "Well, give me all the games that were played today."

    17. LF

      Mm-hmm.

    18. OV

      And given my kind of experience with language and sequences and supervised learning, I thought, well, that's definitely gonna be very helpful and something quite unique now because... never before we had such a large dataset of replays of people playing the game at this scale of such a complex video game, right? So that to me was a pre- precious resource, and as soon as I knew that Blizzard was able to kind of give this to the community, I started to feel positive about something non-trivial happening. But, but I also thought the full thing, like, really no rules, no, no single line of code that tries to say, "Well, I mean, if you see this, you need to build a detector," all these, um, not having any of these specializations seemed really, really, really difficult to me.

    19. LF

      Right, intuitively.

    20. OV

      Yeah.

    21. LF

      I do also like that Blizzard was teasing or even trolling you, uh, (laughs) uh, sort of almost, uh, yeah, pulling you in, into this really difficult challenge.

    22. OV

      Yeah.

    23. LF

      Did they have any awareness... what's, what's the interest from the perspective of Blizzard, except just curiosity?

    24. OV

      So- Yeah, I think Blizzard has really understood and, and really bring, bring forward this competitiveness-

    25. LF

      Mm.

    26. OV

      ... of esports in, in games. StarCraft really kind of sparked a lot of, like, something that almost was never seen, especially as I was saying back in Korea. So they just probably thought, "Well, this is such a pure one-versus-one setup that it would be great to see, um, if something that can play Atari or Go," and then later on chess, "could, could even tackle this kind of complex real-time strategy game," right? So for them, they wanted to see first obviously whether, um, it was possible, if, if, if the game they created was in a way solvable to some extent.

    27. LF

      Mm-hmm.

    28. OV

      And I think on the other hand, they also are a, a pretty modern company that innovates a lot. So just starting to understand AI for them to how to bring AI into games is not, is not AI for games, but g- games for AI, right? I mean, the, both ways I think can work and you, we obviously at DeepMind use games for AI, right, to drive AI progress, but Blizzard might actually be able to do, and many other companies, to, to start to understand and do the opposite. So I think that is also something they can get out of this and they definitely, we have brainstormed a lot about, about this, right? Yeah.

    29. LF

      But one of the interesting things to me about StarCraft and Diablo and these games that Blizzard has created is the task of balancing classes, for example. Uh, sort of making the game fair from the, uh, starting point and then let skill determine the outcome. Uh, is there, uh, I mean, can you first comment, there's three races, Zerg, Protoss-

    30. OV

      Mm-hmm.

  7. 23:22-26:53

    Why StarCraft is hard: exploration, huge action space, and partial observability

    1. LF

      I mean, so, the, you could... It's almost the same as running, uh, RL agents. What aspect of the problem of StarCraft do you think is the hardest? Is it the, like you said, the imperfect information? Is it the fact they have to do long-term planning? Is it the real-time aspects, so you have to do stuff really quickly? Is it the fact that, uh, large action space so you can do so many possible things? Or is it, uh, you know, in a game theoretic sense, there is no Nash equilibria, or at least you don't know what the optimal strategy is 'cause there's way too many options?

    2. OV

      Right.

    3. LF

      What's, is there something that stands out as just, like, the hardest, the most annoying thing?

    4. OV

      So when we sort of looked at the problem and, and start to define param- like, the parameters of it, right? What are the observations? What are the actions? Um, it became very apparent, um, that... You know, the- the very first barrier that one would hit in StarCraft would be because of the action space being so large, and us not being able to search like you could in, in chess or, or Go even though the search space is vast. Um, the main problem that we identified was that of exploration, right? So without any sort of human knowledge or human prior, if you think about StarCraft and you know how deep reinforcement learning algorithms work, which is essentially by issuing random actions and hoping that they will get some wins sometimes so they could learn. So if you think of the, of the action space in StarCraft, almost anything you can do in the early game is bad, because any action involves taking workers, which are mining minerals for free, that's something that the game does automatically, sends them to mine, and you would immediately just take them out of mining and send them around. So just thinking how, how is it going to be possible to, to get to understand th- these concepts, but, but even more like expanding, right? There's, there's these buildings you can place in other locations in the map to gather more resources, but the location of the building is important, and you have to select a worker, send it walking to that location, build a building, wait for the building to be built, and then put extra workers there so they start mining. That just, that feels like impossible if you just randomly click to produce that state, desirable state that then you could hope to learn from because eventually that may yield to an extra win, right? So for me, the exploration problem and due to the action space, um, and, and the fact that there's not really turns, there are so many turns because the game essentially ticks at 22 times per second if you... I mean that's how they discretize-

    5. LF

      Mm-hmm.

    6. OV

      ... sort of time, um, obviously you always have to discretize time, but there's no, no such thing as real time. Um, but it's really a lot of time steps of things that could go wrong, and that definitely felt a priori like the hardest. Um, you mentioned many good ones, I think partial observability, um, the fact that there is no perfect strategy because of the partial observability. Those are very interesting problems we start seeing more and more now in terms of as we solve the previous ones, but the core problem to me was, uh, exploration and solving it has been basically kind of the focus on how we saw the first breakthroughs.
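
A back-of-the-envelope sketch of why random exploration struggles here. Only the 22 ticks per second comes from the conversation; the 10-minute game length, the 100 available actions per step, and the 4-step "expand" sequence are made-up numbers for illustration, and real StarCraft actions also carry arguments (which unit, which location), so the true odds would be far worse.

```python
TICKS_PER_SECOND = 22  # figure quoted in the conversation
GAME_MINUTES = 10      # assumed game length, for illustration only


def decision_points(minutes: float, ticks_per_second: int = TICKS_PER_SECOND) -> int:
    """Number of discrete time steps in a game of the given length."""
    return int(minutes * 60 * ticks_per_second)


def chance_of_exact_sequence(num_actions: int, sequence_length: int) -> float:
    """Probability that uniformly random choices reproduce one specific ordered sequence."""
    return (1.0 / num_actions) ** sequence_length


steps = decision_points(GAME_MINUTES)
# Hypothetical: 100 choices per step, and expanding needs 4 correct choices in a row
# (select worker, walk to location, place building, send workers to mine).
p = chance_of_exact_sequence(num_actions=100, sequence_length=4)

print(steps)  # 13200
print(p)      # roughly one in a hundred million under these assumptions
```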

    7. LF

      So exploration in a, in a multi hierarchical way, so like 22 times a second, e-exploration has a very different meaning than it does in terms of should I gather resources early or should I wait or so on.

    8. OV

      Right.

  8. 26:53-31:08

    Inside AlphaStar’s policy network: observations as images + units, and transformers over sets

    1. LF

      So how do you solve the long term? Let's talk about the internals of AlphaStar.

    2. OV

      Mm-hmm.

    3. LF

      So, uh, uh, w- how... First of all, how do you represent the state of the game, uh, as input.

    4. OV

      Right.

    5. LF

      How do you then do the long term sequence modeling? How do you build a policy?

    6. OV

      Right.

    7. LF

      Well, what, so what's the architecture like?

    8. OV

      So AlphaStar has obviously several components, but everything passes through what we call the policy, which is a neural network, and that's kind of the beauty of it. There is, I could just now give you a neural network and some weights, and if you fed the right observations and you understood the actions the same way we do, you would have basically the agent playing the game. There is absolutely nothing else needed other than those weights that were trained. Now the first step is observing the game, and we've experimented with a few alternatives. The one that we currently use mixes both spatial sort of images that you would process from the game, that is the zoomed out version of the, of the map, and also a zoomed in version of the camera or the screen as we call it, but also we give to the agent the list of units that it sees, um, more of as a set of objects that it can operate on.

    9. LF

      Mm-hmm.

    10. OV

      Um, that is not necessarily required to use it, and we have versions of the game that play well without this set vision that is a bit not like how humans perceive the game, but it certainly helps a lot because it's a very natural way to encode the game is by just looking at all the units that there are, their- they have properties like health, position, type of unit, whether it's my unit or the enemy's, and that sort of is kind of the, the summary of, of the state of, of the, of the game, you know, that list of units or set of units that you see all the time.
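
As a rough picture of the "set of units" observation described above, each visible unit can be turned into a small feature vector from the properties Oriol lists (type, position, health, owner). The field names, the tiny type vocabulary, and the example values below are assumptions for illustration, not AlphaStar's actual encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    unit_type: str
    x: float
    y: float
    health: float
    is_mine: bool


def to_features(unit: Unit, type_vocab: List[str]) -> List[float]:
    """One-hot unit type followed by position, health, and an ownership flag."""
    one_hot = [1.0 if unit.unit_type == t else 0.0 for t in type_vocab]
    return one_hot + [unit.x, unit.y, unit.health, 1.0 if unit.is_mine else 0.0]


TYPE_VOCAB = ["worker", "zergling", "hatchery"]  # hypothetical, heavily shortened

visible = [
    Unit("worker", 10.0, 12.0, 40.0, True),
    Unit("zergling", 55.0, 30.0, 35.0, False),
]

# The observation is a variable-length set: one feature vector per visible unit.
observation = [to_features(u, TYPE_VOCAB) for u in visible]
print(observation)
```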

    11. LF

      But that's pretty close to the way humans see the game. Why do you say it's not... It's not... You're, you're saying the exactness of it is not-

    12. OV

      Yeah.

    13. LF

      ... similar to humans?

    14. OV

      The exactness of it is perhaps not the problem. I guess maybe the problem if you look at it from how actually humans play the game is that they play with a mouse, and a keyboard, and a screen, and they don't see sort of a structured object with all the units. What they see is what they see on the screen, right?

    15. LF

      Yeah.

    16. OV

      So if-

    17. LF

      Remember that there's a... Sorry to interrupt. There's a plot that you showed with camera based-

    18. OV

      Yeah.

    19. LF

      ... where you do exactly that, right?

    20. OV

      Right.

    21. LF

      You move around, and that seems to converge to similar performance.

    22. OV

      Yeah, I think that's what I... Uh, we're kind of experimenting with what's necessary or not, but using the set... So, so actually if you look at research in computer vision, where it makes a lot of sense to treat images as two-dimensional arrays, there's actually a very nice paper from Facebook, I think, um, I forgot who the authors are, but I think it's, um, part of Kaiming's group. And what they do is they take an image which is this two-dimensional signal, and they actually take pixel by pixel and scramble the image as if it was just a list of pixels.

    23. LF

      Mm-hmm.

    24. OV

      Um, crucially they encode the position of the pixels with the XY coordinates.

    25. LF

      Mm-hmm.

    26. OV

      And this is just kind of a new architecture which we incidentally also use in StarCraft called the transformer, which is a very popular paper from last year which yielded very nice results in machine translation. And if you actually believe in this kind of, oh, it's actually a set of pixels, as long as you encode X, Y, it's okay, then you, you could argue that the list of units that we see is precisely that, because we have each unit as a kind of pixel, if you will, and then their X, Y coordinates. So in that perspective, we, without knowing it, we use the same architecture that was shown to work very well on Pascal and ImageNet and so on.
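
The "order does not matter as long as you encode X, Y" idea can be checked with a toy self-attention layer. This is a minimal sketch with identity projections (queries, keys, and values are all the raw inputs); a real transformer adds learned projections, multiple heads, and feed-forward layers, and the feature values below are invented.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(items: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity projections."""
    scale = math.sqrt(len(items[0]))
    outputs = []
    for q in items:
        weights = softmax([dot(q, k) / scale for k in items])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, items)) for d in range(len(q))]
        )
    return outputs


# Each item is [x, y, health]; position lives in the features, not in the list order.
units = [[0.1, 0.2, 1.0], [0.9, 0.8, 0.5], [0.4, 0.4, 0.7]]
shuffled = [units[2], units[0], units[1]]

out = self_attention(units)
out_shuffled = self_attention(shuffled)

# Shuffling the input only shuffles the output rows in the same way.
print(out[2])
print(out_shuffled[0])
```

Because nothing in the computation depends on where an item sits in the list, the same layer applies equally to scrambled pixels with coordinates or to a list of game units with coordinates.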

    27. LF

      So the interesting thing here is, putting it in that way, it starts to move it towards the way you usually work with language. So what (laughs) ... And especially with your expertise and, uh, uh, work in language, it seems like there's echoes of a lot of, um, the way you would work with natural language in the way you've approached AlphaStar.

    28. OV

      Right.

    29. LF

      What's (laughs) ... Uh, does that help with the long-term sequence modeling there somehow?

  9. 31:08-37:43

    StarCraft as language modeling: imitation learning from replays, then self-play

    1. OV

      Exactly. So, so now that we understand what an observation for a given time step is, uh, we need to move on to say, well, there's going to be a sequence of such observations, and an agent will need to, given all that it's seen, not only the current time step, but all that it's seen. Why? Because there is partial observability. We must remember whether we saw a worker going somewhere, for instance, right? Because then there might be an expansion on the top right of the map. So given that, what you must then think about is, there is the problem of given all the observations, you have to predict the next action.

    2. LF

      Mm-hmm.

    3. OV

      And not only given all the observations, but given all the observations and given all the actions you've taken, predict the next action. And that sounds exactly like machine translation, where ... And that's exactly how kind of I saw the problem, especially when you are given supervised data or replays from humans, because the problem is exactly the same. You're translating essentially a prefix of observations and actions onto what's going to happen next, which is exactly how you would train a model to translate or to generate language as well, right? You have a, a certain prefix. You must remember everything that comes in the past, because otherwise you might start having non-coherent text. And the same architectures, um, we're using LSTMs and transformers to operate on across time to kind of integrate all that's happened in the past. Those architectures that work so well in translation or language modeling are exactly the same as what the agent is using to issue actions in the game. And the way we train it, moreover, for imitation, which is step one of AlphaStar, is take all the human experience and try to imitate it, much like you try to imitate translators that translated many pairs of sentences from French to English say, that sort of principle applies exactly the same. It's you, might ... You ... It's almost the same code, except that instead of words, you have slightly more complicated objects, which are the observations, and the actions are also a bit more complicated than a word.
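
The framing above, where a prefix of observations and actions is "translated" into the next action, can be sketched as a data-preparation step. This is a hedged illustration only: `make_imitation_examples` and the string placeholders for observations and actions are hypothetical, and a real replay would carry much richer structures.

```python
def make_imitation_examples(observations, actions):
    """Turn one replay into (prefix, next_action) supervised examples.

    At step t the model sees every observation up to and including t,
    plus every action taken before t, and must predict action t.
    This mirrors next-token prediction in language modeling.
    """
    if len(observations) != len(actions):
        raise ValueError("expected one action per observation")
    examples = []
    for t in range(len(actions)):
        prefix = (tuple(observations[: t + 1]), tuple(actions[:t]))
        examples.append((prefix, actions[t]))
    return examples

# Made-up replay with placeholder observations and action names.
obs = ["o0", "o1", "o2"]
acts = ["build_probe", "build_pylon", "scout"]
examples = make_imitation_examples(obs, acts)
```

Each example plays the role of a sentence prefix and its next word; a sequence model such as an LSTM or transformer would then be trained to maximize the likelihood of the human's action given the prefix.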

    4. LF

      Is there a self-play component then, too? So once you run out of, uh, imitation?

    5. OV

      Right. So, so indeed, you can bootstrap from human replays, but then the agents you get are actually not as good as the humans you imitated, right? So how do we imitate? Well, we take humans from 3,000 MMR and higher. 3,000 MMR is just a metric of human skill, and 3,000 MMR might be like 50th percentile, right? So it's just kind of average human.

    6. LF

      What's the ... So maybe quick pause.

    7. OV

      Yeah.

    8. LF

      MMR is a ranking scale, the matchmaking rating-

    9. OV

      Yeah.

    10. LF

      ... for players. So it's three thous- ... I remember there's like a master and a grandmaster. What's 3,000? Is it m-

    11. OV

      So 3,000 is, is pretty bad. Um, I think it's kind of gold level. Um-

    12. LF

      It just sounds really good relative to chess, I think.

    13. OV

      Oh, yeah, yeah, no.

    14. LF

      That-

    15. OV

      The, the, the ratings, the best in the world are at 7,000 MMR.

    16. LF

      7,000.

    17. OV

      Um, so 3,000 it's, it's a bit like Elo indeed, right? So 3,500 just allows us to not filter a lot of the data.
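
To give the Elo comparison some intuition, here is a small sketch of how a rating gap maps to an expected win probability under the standard chess Elo formula. Treating StarCraft MMR as if it followed this exact formula, with the chess scale of 400, is an assumption made purely for illustration; the conversation only says MMR is "a bit like" Elo.

```python
def elo_win_probability(rating_gap, scale=400.0):
    """Expected score of the higher-rated player under the Elo model.

    rating_gap: stronger player's rating minus the weaker player's rating.
    scale: 400 is the conventional chess value, assumed here for MMR.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

even_match = elo_win_probability(0)        # equal ratings
large_gap = elo_win_probability(1000)      # a 1,000-point gap
```

Under these assumptions an even match gives 0.5, and a 1,000-point gap gives roughly 0.997, so a gap of that size means the stronger player is expected to win almost every game.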

    18. LF

      Yeah.

    19. OV

      So we like to have a lot of data in deep learning as, as you probably know. So we take this kind of 3,500 and above, but then we do a very interesting trick, which is we tell the neural network what level they are imitating. So we say this replay you're going to try to imitate to predict the next action for all the actions that you're going to see is a 4,000 MMR replay. This one is a 6,000 MMR replay. And what, what's cool about this is then we take this policy that is being trained from humans, and then we can ask it to play like a 3,000 MMR player by setting a bit saying, "Well, okay, play like a 3,000 MMR player or play like a 6,000 MMR player." And you actually see how the policy behaves differently. It gets worse economy if it ... you play like a gold level player. Um, it does less actions per minute, which is the number of clicks or number of actions that you will issue in a whole minute. And it's very interesting to see that it kind of imitates the skill level quite well. But if we ask it to play like a 6,000 MMR player, we tested, of course, these policies to see how well they do, they actually beat all the built-in AIs that Blizzard put in the game, but they're nowhere near 6,000 MMR players, right? They might be maybe around gold level, platinum perhaps.
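
The trick described above, filtering replays by MMR and telling the network which skill level it is imitating, can be sketched as follows. The function names, the dictionary layout, and the choice to normalize MMR by 7,000 (the figure quoted for the best players) are hypothetical; the conversation only says the level is supplied as an input.

```python
MIN_MMR = 3500  # filtering threshold mentioned in the conversation

def prepare_replays(replays, min_mmr=MIN_MMR):
    """Keep replays at or above min_mmr and attach the MMR as a label
    the policy can be conditioned on during training."""
    kept = []
    for replay in replays:
        if replay["mmr"] >= min_mmr:
            kept.append({"condition_mmr": replay["mmr"], "steps": replay["steps"]})
    return kept

def policy_input(observation, condition_mmr, max_mmr=7000):
    """Bundle an observation with a normalized skill-level feature.

    At training time condition_mmr comes from the replay; at play time
    it can be set by hand to request a weaker or stronger style.
    """
    return {"observation": observation, "skill": condition_mmr / max_mmr}

# Made-up replay metadata for illustration.
replays = [
    {"mmr": 3000, "steps": ["..."]},
    {"mmr": 4000, "steps": ["..."]},
    {"mmr": 6000, "steps": ["..."]},
]
training_set = prepare_replays(replays)
gold_like = policy_input("obs", 3500)
pro_like = policy_input("obs", 6000)
```

The same trained policy receives a different `skill` value depending on the requested level, which is how a single network could be asked to play like a weaker or a stronger player.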

    20. LF

      Mm-hmm.

    21. OV

      Um, so there's still a lot of work to be done for the policy to truly understand what it means to win. So far, we only ask them, "Okay, here is the screen and that's what's happened on the game until this point. What would the next action be if we ask, you know, we ask a pro to now say, 'Oh, this, you're gonna click here or here or there'?" And the point is experiencing, experiencing wins and losses is very important to then start to refine. Uh, otherwise the policy can get loose, can, can just go off policy, as we call it.

    22. LF

      That's so interesting that you can at least hope eventually to be able to control a policy, uh, approximately to be at, at some MMR level. That's, that's so interesting, especially given that you have ground truth for a lot of these cases.

    23. OV

      Right.

    24. LF

      Uh, can I ask you a personal question?

    25. OV

      Sure.

    26. LF

      Uh, what's your MMR?

    27. OV

      Well-

    28. LF

      (laughs)

    29. OV

      ... I haven't played StarCraft II, so I am unranked. (laughs)

    30. LF

      Oh, so it's-

  10. 37:4345:09

    Human-likeness constraints: perception quirks, APM limits, and fairness debates

    1. LF

      So what's the difference in interface between AlphaStar and StarCraft and a human player in StarCraft? Is there any significant differences between the way they both see the game?

    2. OV

      I would say the way they see the game, there's a few things that are just very hard to simulate. Um, the main one perhaps, which is obvious in hindsight, is, um, what's called cloaked units, which are invisible units. So in, in StarCraft, you can make some units that you need to have a particular kind of unit to detect it.

    3. LF

      Mm-hmm.

    4. OV

      So these units are invisible. Um, if you cannot detect them, you cannot target them, so they would just, you know, destroy your buildings or kill your workers. But despite the fact you cannot target the unit, there's a shimmer that as a human you observe.

    5. LF

      Mm-hmm.

    6. OV

      I mean, you need to train a little bit. You need to pay attention, but you would see this kind of space-time-

    7. LF

      Oh, yeah.

    8. OV

      ... space-time, like, distortion, and you would know, okay, there are...

    9. LF

      Space-time. (laughs)

    10. OV

      Yeah. (laughs)

    11. LF

      Yeah, there's, like, a wave thing you could say, yeah.

    12. OV

      Yeah, there's a... It's called shimmer. That's-

    13. LF

      Space-time distortion. I like it. (laughs)

    14. OV

      That's really, like... The B- the Blizzard term is shimmer.

    15. LF

      Shimmer. Okay.

    16. OV

      And so this shimmer, um, professional players actually can see it immediately. They, they understand it very well. Uh, but it's still something that requires certain amount of attention, and, and, and it's kind of a bit annoying to deal with, whereas for AlphaStar, in terms of vision, it's very hard for us to simulate sort of, oh, you know, are, are you looking at this pixel in the screen? And so on. So...

    17. LF

      Interesting.

    18. OV

      Um, the only thing we can do is we... There is a unit that's invisible over there, so AlphaStar would know that immediately. Um, obviously still obeys the rules. You cannot attack the unit. You must have a detector and so on. But it's, it's kind of one of the main things that it just doesn't feel there- there's a very proper way. I mean, you could imagine, oh, you, you don't have high pre- maybe you don't know exactly where it is, or sometimes you see it, sometimes you don't. But it's, it's just really, really complicated to, to, to get it so that everyone would agree, oh, that's, that's the best way to simulate this, right?

    19. LF

      You know, it seems like a perception problem.

    20. OV

      It is a perception problem. So, so the only problem is people... Or you ask, "Oh, what's the difference between how humans perceive the game?"

    21. LF

      Yeah.

    22. OV

      I would say they wouldn't be able to tell a shimmer immediately as it appears on the screen, whereas AlphaStar in principle sees it very sharply, right? It sees-

    23. LF

      Okay.

    24. OV

      ... it sees that the bit-

    25. LF

      I got it.

    26. OV

      ... turned from zero to one, meaning there's now a unit there.

    27. LF

      Got it.

    28. OV

      Although you don't know the unit or you don't know it, you know, you know that you cannot attack it and so on.
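
A small sketch of the distinction drawn above between knowing a cloaked unit is present and being allowed to target it. The function `observe_cell` and its fields are hypothetical and are not taken from the actual StarCraft II interface; the sketch only encodes the rule as described in the conversation.

```python
def observe_cell(unit_present, cloaked, detector_in_range):
    """Return what an agent would see for one map location.

    occupied:   flips from 0 to 1 as soon as any unit is there, cloaked
                or not (the sharp signal a human only sees as a shimmer).
    targetable: a cloaked unit can be attacked only if a detector is in
                range, so the game rules still apply to the agent.
    """
    occupied = 1 if unit_present else 0
    targetable = bool(unit_present and (not cloaked or detector_in_range))
    return {"occupied": occupied, "targetable": targetable}

hidden = observe_cell(unit_present=True, cloaked=True, detector_in_range=False)
revealed = observe_cell(unit_present=True, cloaked=True, detector_in_range=True)
empty = observe_cell(unit_present=False, cloaked=False, detector_in_range=False)
```

In the `hidden` case the agent knows something is there but cannot attack it, which matches the described behavior: perfect perception of the presence bit, with the detector rule unchanged.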

    29. LF

      Got it.

    30. OV

      Um, so that... From, from a vision standpoint, that probably is, is the one that is kind of the most obvious one. Then there are things humans cannot do perfectly, even professionals, which is they might miss a detail or they might have not seen a unit. And obviously, as a computer, if there's a corner of the screen that turns green because a, a unit enters the field of view, that can go into the memory of the agent, the LSTM, and persist there for a while and-

  11. 45:0956:03

    Strategy and metagame: openings, scouting beliefs, cheese/all-ins, and the AlphaStar League

    1. LF

      That's really interesting. That's really innovative. So, one of the constraints you put, uh, on, on your, uh, on yourself, or at least focused in, is on the Protoss race, as far as I understand.

    2. OV

      Yeah.

    3. LF

      Uh, can you tell me about the different races and how they... uh, so Protoss, Terran, and Zerg. How do they compare? How do they interact? Why did you choose Protoss? Is-

    4. OV

      Right.

    5. LF

      Yeah. Is ... in the dynamics of the game-

    6. OV

      Mm-hmm.

    7. LF

      ... seen from a strategic perspective.

    8. OV

      So, Protoss ... Um, so in, in StarCraft, there are three races. Um, indeed in the demonstration we saw only the Protoss race, so maybe let's start with that one. Protoss is kind of the most technologically advanced race. Um, it has units that are expensive, but powerful, right? So, in general, you wanna kind of conserve your units, um, as you go attack. So, you wanna ... And, and then you wanna utilize, um, these tactical advantages of very fancy spells and so on and so forth. Um, and at the same time, uh, they're kind of ... People say, like, they're, they're a bit easier to play perhaps, right? But that I actually didn't know. I mean, I, I just talked to now a lot to the players that we, we work with, TLO and Mana, and they said, "Oh, yeah, Protoss is actually ... people think is, is actually one of the easiest races." So, perhaps the easier, that doesn't mean that it's ... You know, obviously professional players excel at the three races, and there's never like a race that dominates, um, for a very long time anyways.

    9. LF

      So, if you look at the top, I don't know, 100 in the world-

    10. OV

      Mm-hmm.

    11. LF

      ... is there one race that dominates that list?

    12. OV

      It would be hard to know, because it depends on the regions. I think, um, it's pretty equal in terms of distribution, and Blizzard wants it to be equal, right? They don't want, they wouldn't want, um, uh, o- one race like Protoss to not be representative in the top place.

    13. LF

      Right.

    14. OV

      Um, so definitely, like, they tried it to be, like, balanced, right? So then maybe the opposite race of Protoss is Zerg. Zerg is a race where you just kind of expand and take over as many resources as you can, and they have a very high capacity to regenerate their units. So, if you have an army, it's not that valuable in terms of ... losing the whole army is not a big deal as Zerg, because you can then rebuild it. And given that you generally accumulate a, a huge bank of resources, um, Zergs typically play by ap- pr- applying a lot of pressure, maybe losing their, their whole army, but then rebuilding it quickly.

    15. LF

      Mm-hmm.

    16. OV

      So, um, although of course, every race ... I mean, there's never, um ... I mean, they're pretty diverse. I mean, there are some units in Zerg that are technologically advanced, and they do some very interesting spells, and there's some units in Protoss that are less valuable, and you could lose a lot of them and rebuild them, and it wouldn't be a big deal.

    17. LF

      All right. So, uh, may- maybe I'm missing out, maybe I'm gonna say some dumb stuff, but, uh, so s- summary of strategy.

    18. OV

      Mm-hmm.

    19. LF

      So first, there's collection of a lot of resources.

    20. OV

      Right.

    21. LF

      So, that's one option. The other one is exp- uh, expanding, so building other bases.

    22. OV

      Mm-hmm.

    23. LF

      Then the other is obviously attack abil- building units and attacking with those units.

    24. OV

      Mm-hmm.

    25. LF

      And then, uh, I don't know what, what else there is. Uh-... maybe there's, uh, the different timing of attacks, like, do an attack early, attack late.

    26. OV

      Right.

    27. LF

      What are the different strategies that emerged that you've learned about? I've, I've read that-

    28. OV

      Yeah.

    29. LF

      ... a bunch of people are super happy that you guys have apparently... that Alpha Pa- uh, Star apparently has discovered that it's really good to, uh, what is it? Saturate-

    30. OV

      Oh, yeah. The, the mi- the mineral line.

  12. 56:031:07:00

    Beating pros and what it really means: limitations, excitement, and next StarCraft steps

    1. LF

      That's fascinating, and where they stick to their different strategies, yeah, that, wow, that's, that's really, really interesting. So, but that said, you, uh, uh, were fortunate enough or just skilled enough to win five-zero. Uh, and so how hard is it to win? I mean, that's not the goal, I guess. I don't know what the goal is. The goal should be to win majority.

    2. OV

      Mm-hmm.

    3. LF

      Not five-zero, but how hard is it in general to win, uh, all match ups on a one v. one?

    4. OV

      So that's a very interesting question because once you see AlphaStar and superficially you think, well, okay, it won. Let's see if you sum all the games like ten to one, right? It lost the game that it played with the camera interface. You might think, well that's, that's done, right? There's, it's, it's superhuman at the game and that's not really the claim we really can make actually. The claim is we beat a professional gamer for the first time. Um, StarCraft has really been a thing that's been going on for a few years, but moments, a moment like this hasn't, not occurred, had not occurred before yet. But are these agents impossible to beat? Absolutely not, right? So that's a bit what's, uh, you know, kind of the, the difference is the agents play at grandmaster level. They definitely understand the game enough to play extremely well. Um, but are they unbeatable? Do they play perfect?

    5. LF

      Mm-hmm.

    6. OV

      Um, no, and actually in StarCraft, because of these sneaky strategies, it's always possible that you might take a huge risk sometimes, but you might get wins, right, out of-

    7. LF

      Right.

    8. OV

      ... out of this. So I think that as a domain, um, it still has a lot of opportunities, not only because of course we want to learn with less experience, we would like to, I mean if I, if I learned to play Protoss, I can play Terran and learn it much quicker than AlphaStar can, right? So there are obvious interesting research challenges as well. But even as, as the raw, like as the raw performance goes, um, really the claim here can be we are at pro level or at, at grand- high grandmaster level. Um, but obviously, um, the players also did not know what to expect, right? This kind of, their prior distribution-

    9. LF

      Yeah.

    10. OV

      ... was a bit off because they played this kind of new like alien brain as, as they like to say it, right? And that's what makes it exciting for them. But also I think if you look at the games closely, you see there were weaknesses in some points. Um, maybe AlphaStar did not scout or if it had, had invisible units going against at certain points, it wouldn't have known and it would have been bad. So there's still quite a lot of work to do. Um, but it's really an, a very, very exciting moment for us to be seeing, wow, a single neural net on a GPU is actually playing against these guys who are amazing. I mean, you have to see them play live. They're really, really amazing players.

    11. LF

      Yeah, I'm sure there's, there's, uh, there must, there must be a, uh, a guy in Poland somewhere right now training his butt off-

    12. OV

      (laughs)

    13. LF

      ... to make sure that this never happens again with AlphaStar. So that's really exciting in terms of AlphaStar having some holes to exploit.

    14. OV

      Yeah.

    15. LF

      Which is great. And then you build on top of each other and it feels like StarCraft, unlike Go, even if you win, it's still not there, it's still not... there's so many different dimensions in which you can explore. So that's really, really interesting. Do you think there's a ceiling to AlphaStar? Y- y- you've said that it hasn't reached, you know, it's, this is a big... wait, wha- y- you know, let me actually just pause for a second.

    16. OV

      Yeah.

    17. LF

      How did it feel (laughs) to, to come here to this point to, to beat a top professional player? Like that night, I mean, you know, Olympic athletes have their gold medal.

    18. OV

      Yeah.

    19. LF

      Right? This is your gold medal in a sense. Sure, you're cited a lot, you've published a lot of prestigious-

    20. OV

      Yeah.

    21. LF

      ... papers, whatever. But this is like a win. Uh, how did it feel?

    22. OV

      I mean it was, for me it was unbelievable. Um, because first the win itself, I mean it was so exciting. I mean the... so looking back to those last days of 2018, really where that's when the games were played. Um, I'm sure I, I look back at that moment, I'll say, "Oh my God, I, I want to be like in a project like that." It's like I already feel the nostalgia of like, yeah, that was huge in terms of the energy and the team effort that went into it. And so in that sense... as, as soon as it happened I, I already knew it was kind of, I was losing it a little bit, so it, it's almost like sad that it happened, and, "Oh my god," like. But on the other hand, it also verifies the approach. Um, but to me also, there's so many challenges and interesting aspects of intelligence that even though we can train a neural network to play at the level of the best humans, um, there's still so many challenges. So for me it's also like, well, this is really an amazing achievement, but I already was also thinking about next steps. I mean, as I said, these agents play Protoss versus Protoss, um, but they should be able to play a different race much quicker, right? So that would be a- an amazing achievement. Some people, um, call this meta-reinforcement learning, meta-learning and so on, right? So there's so many possibilities after that moment, but the moment itself, it really felt great. Um, it's, I, we had this bet, so, so I, I'm kind of a pessimist in general, so I, I, I kind of sent an email to the team and said, "Okay, let's ... against TLO first," right? Like, what's gonna be the result? And I really thought we would lose, like, 5-0, right? I, I, I, we had some calibration made against the 5,000 MMR player. Um, TLO was much stronger than that player, even if he played Protoss, which is his off race. Um, but yeah, I was not imagining we would win, so for me, that was just kind of a test run or something, and then it really kind of... he was really surprised. 
      And unbelievably, we went to this, to this bar to celebrate, and, and Dave tells me, "Well, why, why don't we invite someone who is 1,000 MMR stronger in Protoss?" Like, an actual Protoss player, like, like that, uh, it turned out being Mana, right?

    23. LF

      Mm-hmm.

    24. OV

      And, you know, we had some drinks and I said, "Sure, why not?" But then I thought, "Well, that's really gonna be impossible to beat." I mean, even e- because it, it's, it's so much ahead. 1,000 MMR is really like 99% probability that Mana would beat TLO, uh, as Protoss versus Protoss, right? So we did that, and to me, the second, the second game was much more important even though a lot of uncertainty kind of disappeared after we, we kind of beat TLO. I mean, it, he is a professional player, so that was kind of, oh, that's really a, a very nice achievement. But Mana really was at the top, and you could see he played much better, but our agents got much better, too. So it's like, ah, it's like... And then after the first game, I said, "If we take a single game, at least we can say we beat a game, I mean, even if we don't beat the series." For me, that was a huge relief. And I mean, I remember hugging Demis and I mean, it was, it was really like... This moment for me will resonate forever as a researcher and, I mean, as a person, and, uh, yeah. It's a really, like, great accomplishment. And it was great also to be there with the team in the room. I don't know if you saw, like, the, the sum-

    25. LF

      Yeah, yeah

    26. OV

      ... so it was really like-

    27. LF

      I mean, from my perspective-

    28. OV

      ... yeah.

    29. LF

      ... the other interesting thing is just like watching Kasparov, uh, watching Mana, uh, was also interesting because he did, he was kind of at a loss for words. I mean, whenever you lose, I've done a lot of sports-

    30. OV

      Mm-hmm.

  13. 1:07:001:19:10

    Beyond games: Turing test realism, deep learning’s limits, and hybrid symbolic/program approaches

    1. LF

      Let me go old school and jump to the ... to, uh, Mr. Alan Turing.

    2. OV

      Yeah.

    3. LF

      Uh, so the Turing test, you know, is, is a natural language test, a conversational test.

    4. OV

      Mm-hmm.

    5. LF

      What's your thought of it as a test for intelligence? Uh, do you think it is a grand challenge that's worthy of undertaking? Maybe if it is, would you reformulate it or phrase it somehow differently?

    6. OV

      Right. So I, I really love the Turing test because I also like sequences and language understanding. And in fact, some of the early work we did in machine translation, we tried to appli- to ... apply to kind of a neural chatbot, um, which obviously would never pass the Turing test because it was very limited. Um, but it is a very fascina- fascinating idea that, um, you could really have an AI that would be indistinguishable from humans in terms of asking or conver- conversing with, with it, right? So, I think the test itself seems very nice and it's kind of well-defined actually, like the, the passing it or not, I think there's quite a few rules that feel like pretty simple and, and, and ex- you know, you could, you could really like have ... I mean, and I think they have these competitions every year.

    7. LF

      Yes, so the Loebner Prize, but I don't know if you've seen ... I, I, uh, I, I don't know if you've seen the kind of, uh, bots that emerge-

    8. OV

      Right.

    9. LF

      ... from that competition. They're not quite as, uh, what you would, uh ... so it feels like that there's weaknesses with the way-

    10. OV

      Right.

    11. LF

      ... Turing formulated it. It needs to be m- uh, that the definition of a genuine, rich, fulfilling human conversation-

    12. OV

      Mm-hmm.

    13. LF

      ... needs to be something else, like, uh, the Alexa Prize, which I'm not as well familiar with-

    14. OV

      Mm-hmm.

    15. LF

      ... has tried to define that more, I think by saying you have to continue keeping a conversation for 30 minutes, something like that-

    16. OV

      Right.

    17. LF

      Uh, so basically forcing the agent not to just fool, but to have an engaging-

    18. OV

      Right.

    19. LF

      ... conversation kind of thing. Is that, um ... I mean, i- th- is, is this ... have you thought about this problem richly? Like has ... and if, if you have in general, how, how far away are we from-

    20. OV

      Right.

    21. LF

      ... you worked a lot on language-

    22. OV

      Mm-hmm.

    23. LF

      ... uh, understanding language generation, but the full dialogue, the conversation-

    24. OV

      Right.

    25. LF

      ... you know, just sitting at the bar having a couple of beers for an hour-

    26. OV

      Mm-hmm.

    27. LF

      ... that kind of conversation. Have you thought about it? How far is it?

    28. OV

      Yeah. So, I think you touch here on the critical point, which is feasibility, right? So, so there's, there's a, a great sort of essay by Hamming which describes sort of grand challenges of physics. And he argues that, well, okay, for instance, teleportation or time travel are great grand challenges of physics, but there's no attacks.

    29. LF

      Mm-hmm.

    30. OV

      We really don't know or cannot kind of make any progress, so that's why most phys- physicists and so on, they don't work on this in their PhDs and, and, and as part of their careers. So I see the Turing test as a ... in the full Turing test as a bit still too early. Like I, I am ... I think we're ... especially with the current trend of deep learning language models, uh, we've seen some amazing examples, I think GPT-2 being the most, uh, recent one, which is very impressive. But to understand, to fully solve passing or fooling a human to think that you're ... that there's a human on the other side, I think we're quite far. So as a result, I don't see myself and I probably would not recommend, uh, people doing a PhD on solving the Turing test because it just feels it's kind of too early or too hard of a problem.

  14. 1:19:101:30:43

    Oriol’s broader research philosophy: seq2seq across modalities, long-horizon tricks, and planning research

    1. LF

      Right. Uh, so there's a bunch of different ideas that, that, um, that you've developed that I really enjoy, so by one, one is, um, translating from... so image captioning.

Episode duration: 1:46:00

Transcript of episode Kedt2or9xlo
