Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
- 0:00 – 15:00
- LFLex Fridman
The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that, he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence-to-sequence learning, audio generation, image captioning, neural machine translation, and, of course, reinforcement learning. He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now here's my conversation with Oriol Vinyals. You spearheaded the DeepMind team behind AlphaStar that recently beat a top professional player at StarCraft. So you have an incredible wealth of work in deep learning across a bunch of fields, but let's talk about StarCraft first. Let's go back to the very beginning, even before AlphaStar, before DeepMind, before deep learning. What came first for you, a love for programming or a love for video games?
- OVOriol Vinyals
I think for me, the drive to play video games definitely came first. I really liked computers. I didn't really code much, but I would just mess with the computer, break it and fix it. That was the level of skill, I guess, that I gained in my very early days, when I was 10 or 11. And then I really got into video games, especially StarCraft, actually, the first version. I spent most of my time just playing kind of pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not a very mainstream scene compared to what's called eSports nowadays.
- LFLex Fridman
Right. Of course, in the '90s. So how'd you get into StarCraft? What was your favorite race? How did you develop your skill? What was your strategy? All that kind of thing.
- OVOriol Vinyals
So as a player, I tended not to play many games, so as not to disclose the strategies that I had developed, and I liked to play random, actually. Not in competitions, but just to... I think in StarCraft there's, well, there's three main races, and I found it very useful to play with all of them. So I would choose random many times, even sometimes in tournaments, to gain skill on the three races, because it's not only about how you play against someone. If you understand a race because you play it, you also understand what's annoying about it, and then, when you're on the other side, you know what to do to annoy that person, to try to gain advantages here and there and so on. So I actually played random. Although I must say, in terms of favorite race, I really like Zerg. I was probably best at Zerg, and that's probably what I tended to use towards the end of my career, before starting university.
- LFLex Fridman
So, let's step back a little bit. Could you try to describe StarCraft to people that may never have played video games, especially the massively online variety-
- OVOriol Vinyals
Right.
- LFLex Fridman
... like StarCraft?
- OVOriol Vinyals
So StarCraft is a real-time strategy game, and the way to think about StarCraft, perhaps if you understand a bit of chess, is that there's a board-
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
... which is called the map, where people play against each other. There's obviously many ways you can play, but the most interesting one is the one-versus-one setup-
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
... where you just play against someone else or even the built-in AI, right? Blizzard put in a system that can play the game reasonably well if you don't know how to play. And then on this board you have, again, pieces like in chess, but these pieces are not there initially like they are in chess. You actually need to gather resources and decide which pieces to build, so in a way you're starting with almost no pieces. You start gathering resources. In StarCraft, there's minerals and gas that you can gather, and then you must decide how much you want to focus, for instance, on gathering more resources versus starting to build units or pieces. And then once you have enough pieces, or, you know, a good attack composition, you go and attack the other side of the map. And now the other main difference with chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. It's what we call partially observable. So as a result, you must not only decide how to trade off economy versus building your own units, but you also must decide whether you want to scout to gather information, but also, by scouting, you might be giving away some information that you might be hiding from the enemy. So there's a lot of complex decision-making, all in real time. Also, unlike chess, this is not a turn-based game. You play basically all the time, continuously, and thus some skill in terms of speed and accuracy of clicking is also very important, and people that train for this really play this game at an amazing skill level. I've seen this many times, and if you can witness it live, it's really, really impressive. So in a way it's kind of a chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources to basically get some money to build other buildings, pieces, technology, and so on.
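To make "partially observable" concrete, here is a minimal sketch, assuming a simplified circular vision model in which each unit sees everything within a fixed radius. It ignores StarCraft's real rules for terrain, high ground, and detection, and the `Unit` type, its fields, and the radii are hypothetical. A player's observation is their own units plus any enemy unit inside one of their sight radii; everything else stays hidden, unlike a chessboard.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    owner: int    # player id: 0 or 1
    x: float
    y: float
    sight: float  # hypothetical vision radius in map units


def visible_units(units, player):
    """Return what `player` observes: all of its own units, plus any enemy
    unit within the sight radius of at least one of its own units."""
    own = [u for u in units if u.owner == player]
    enemies_seen = [
        e for e in units
        if e.owner != player
        and any(math.hypot(e.x - o.x, e.y - o.y) <= o.sight for o in own)
    ]
    return own + enemies_seen


# One unit of ours, an enemy unit nearby, and an enemy unit across the map.
units = [
    Unit(owner=0, x=10, y=10, sight=8),
    Unit(owner=1, x=14, y=12, sight=8),   # about 4.5 away: visible to player 0
    Unit(owner=1, x=90, y=90, sight=11),  # far away: hidden from player 0
]
print(len(visible_units(units, player=0)))
```

The two players get different observations of the same game state, which is exactly why scouting, and remembering what was scouted, matters.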
- LFLex Fridman
From the perspective of the human player, the difference between that and chess, or maybe between that and a turn-based strategy game like Heroes of Might and Magic-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... is that there's an anxiety, 'cause you have to make these decisions really quickly, and if you're not actually aware of what decisions work, it's a very stressful balance. Everything you described is actually quite stressful and difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, if they're fully aware of what they have to do, but at the amateur level there's this anxiety: "Oh, crap. I'm being attacked. Oh, crap. I have to build up resources. Oh, I probably have to expand." The real-time strategy aspect is really stressful and, I'm sure, computationally difficult. We'll get into it, but for me, Battle.net ... So StarCraft was released in '98, 20 years ago, which is hard to believe. And Blizzard's Battle.net came out with Diablo in '96. And to me ... It might be a narrow perspective, but it changed online gaming and perhaps society forever. (laughs)
- OVOriol Vinyals
Yeah.
- LFLex Fridman
But I may have way too narrow a viewpoint. From your perspective, can you talk about the history of gaming over the past 20 years? How transformational, how important is this line of games?
- OVOriol Vinyals
Right. So I was an active gamer while this was developing, the internet, online gaming. So for me, the way it came was that I played other strategy-related games.
- LFLex Fridman
Yeah.
- OVOriol Vinyals
I played a bit of Command & Conquer, and then I played Warcraft II, which is from Blizzard. But at the time, I didn't know what Blizzard was or anything. Warcraft II was just a game, which was actually very similar to StarCraft in many ways. It's also a real-time strategy game, where there's orcs and humans, so there's only two races.
- LFLex Fridman
But it was offline.
- OVOriol Vinyals
And it was offline, right?
- LFLex Fridman
Yeah.
- OVOriol Vinyals
So I remember a friend of mine came to school saying, "Oh, there's this new cool game called StarCraft." And I just said, "Oh, this sounds like just a copy of Warcraft II," until I installed it. And at the time, I'm from Spain, so we didn't have very good internet, right? So for us, StarCraft first became kind of an offline experience where you start to play these missions, right? You play against some sort of scripted things that develop the story of the characters in the game.
- LFLex Fridman
Yeah.
- OVOriol Vinyals
And then later on I started playing against the built-in AI, and I thought it was impossible to defeat it. Then eventually you defeat one, and you can actually play against seven built-in AIs at the same time, which also felt impossible. But actually, it's not that hard to beat seven built-in AIs at once. So once we achieved that, we also discovered that we could play ... As I said, the internet wasn't that great, but we could play over a LAN, right? Basically against each other if we were in the same place, because you could just connect machines with cables, right? So we started playing in LAN mode against each other, as a group of friends, and it was really much more entertaining than playing against AIs.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And later on, as the internet was starting to develop and becoming a bit faster and more reliable, that's when I started experiencing Battle.net, which is this amazing universe, not only because you can play the game against anyone in the world, but because you can also get to know more people. You just get exposed to this vast variety of ... It was around when chats came about, right? There was a chat system. You could play against people, but you could also chat with people, not only about StarCraft but about anything. And that became a way of life for about two years, and then obviously it kind of exploded, in that I started to play more seriously, going to tournaments and so on and so forth.
- LFLex Fridman
Do you have a sense, on a societal, sociological level, of this whole part of society that many of us are not aware of? And it's a huge part of society, which is gamers. I mean, every time I come across that on YouTube or streaming sites, a huge number of people play games religiously. Do you have a sense of those folks, especially now that you've returned to that realm a little bit on the AI side?
- OVOriol Vinyals
Yeah. So, in fact, even after StarCraft, I actually played World of Warcraft, which is maybe the-
- 15:00 – 30:00
- OVOriol Vinyals
when I actually moved to London and joined DeepMind, transferring from Brain, it became apparent that, because of the AlphaGo moment, and Blizzard reaching out to us to say, "Wait, do you want the next challenge?", and also me being full-time at DeepMind, all of this kind of came together.
- LFLex Fridman
Mm.
- OVOriol Vinyals
And then I went to Irvine, California, to the Blizzard headquarters, just to chat with them and try to explain how it would all work before we did anything. And the approach has always been about the learning perspective, right? So at Berkeley, we did a lot of rule-based, you know, conditioning: oh, if you have more than three units, then go attack, and if the other has more units than me, I retreat, and so on and so forth. And of course, the point of deep reinforcement learning, deep learning, machine learning in general, is that all of this should be learned behavior. So that was the DNA of the project since its inception in 2016, when we didn't even have an environment to work with. And that's how it all started, really.
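The hand-written style of agent described above fits in a few lines. This is a hypothetical illustration of the two rules quoted (attack with more than three units, retreat when outnumbered), not code from the Berkeley bot; checking the retreat rule first and the fallback "build" action are assumptions.

```python
def scripted_policy(my_units: int, enemy_units: int) -> str:
    """Hand-coded rules of the kind described above (hypothetical sketch).
    Rule order is an assumption: retreating takes priority over attacking."""
    if enemy_units > my_units:
        return "retreat"  # the other side has more units than me
    if my_units > 3:
        return "attack"   # more than three units: go attack
    return "build"        # otherwise keep building (assumed default)


print(scripted_policy(5, 2))
print(scripted_policy(5, 8))
```

In the learning approach, both the thresholds and the branches themselves are replaced by a policy network whose behavior comes from data rather than from rules like these.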
- LFLex Fridman
So, if you go back to that conversation with Demis, or even in your own head, how far away did you ... 'cause that's ... we're talking about Atari games.
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
We're talking about Go, which is, if you're honest about it, really far away from StarCraft in ... well, now that you've beaten it, maybe you could say it's close, but it's much ... it seems like StarCraft is way harder than Go, philosophically and mathematically speaking.
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
So how far away did you think you were? Did you think-
- OVOriol Vinyals
So-
- LFLex Fridman
... in 2019 and '18 you could be doing as well as you have?
- OVOriol Vinyals
Yeah, when I thought about, okay, I'm gonna dedicate a lot of my time and focus to this, and obviously I do a lot of different research in deep learning, so to spend time on it I really had to think there was gonna be something good coming out of this. So really I thought, well, this sounds impossible, and it probably is impossible to do the full thing, the full game, where you play one versus one and it's only a neural network playing and so on. So I really just didn't think it was possible.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
But on the other hand, I could see some stepping stones towards that goal. Clearly you could define sub-problems in StarCraft and sort of dissect it a bit and say, "Okay, here is one part of the game, here is another part." And also, and this was really critical to me, we could access human replays, right? Blizzard was very kind, and in fact they open-sourced these for the whole community. It's not every single StarCraft game ever played, but it's a lot of them, and you can just go and download them. And every day you can just query a dataset and say, "Well, give me all the games that were played today."
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And given my experience with language and sequences and supervised learning, I thought, well, that's definitely gonna be very helpful, and something quite unique, because never before had we had such a large dataset of replays of people playing such a complex video game at this scale, right? So that to me was a precious resource, and as soon as I knew that Blizzard was able to give this to the community, I started to feel positive about something non-trivial happening. But I also thought the full thing, with really no rules, no single line of code that tries to say, "Well, if you see this, you need to build a detector," not having any of these specializations, seemed really, really, really difficult to me.
- LFLex Fridman
Right, intuitively.
- OVOriol Vinyals
Yeah.
- LFLex Fridman
I do also like that Blizzard was teasing or even trolling you, (laughs) sort of almost pulling you into this really difficult challenge.
- OVOriol Vinyals
Yeah.
- LFLex Fridman
Did they have any ... What's the interest from Blizzard's perspective, apart from just curiosity?
- OVOriol Vinyals
So, yeah, I think Blizzard has really understood and brought forward this competitiveness-
- LFLex Fridman
Mm.
- OVOriol Vinyals
... of esports in games. StarCraft really sparked something that had almost never been seen, especially, as I was saying, back in Korea. So they probably thought, "Well, this is such a pure one-versus-one setup that it would be great to see if something that can play Atari or Go," and then later on chess, "could even tackle this kind of complex real-time strategy game," right? So first, they obviously wanted to see whether it was possible, whether the game they created was, in a way, solvable to some extent.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And I think, on the other hand, they're also a pretty modern company that innovates a lot. So for them, starting to understand how to bring AI into games is not games for AI but AI for games, right? I mean, both ways can work. We at DeepMind obviously use games for AI, right, to drive AI progress, but Blizzard, and many other companies, might actually be able to start to understand and do the opposite. So I think that is also something they can get out of this, and we have definitely brainstormed a lot with them about this, right? Yeah.
- LFLex Fridman
But one of the interesting things to me about StarCraft and Diablo and these games that Blizzard has created is the task of balancing classes, for example: making the game fair from the starting point and then letting skill determine the outcome. I mean, can you first comment, there's three races, Zerg, Protoss-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... and Terran? I don't know if I've ever said that out loud. Is that how you pronounce it, Terran?
- OVOriol Vinyals
Yeah, Terran.
- LFLex Fridman
(laughs)
- 30:00 – 45:00
- OVOriol Vinyals
encode the position of the pixels with the XY coordinates.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And this is just a new architecture, which we incidentally also use in StarCraft, called the transformer, from a very popular paper from last year that yielded very nice results in machine translation. And if you actually believe in this idea of, oh, it's actually a set of pixels, and as long as you encode X, Y, it's okay, then you could argue that the list of units that we see is precisely that, because we have each unit as a kind of pixel, if you will, with its X, Y coordinates. So from that perspective, without knowing it, we use the same architecture that was shown to work very well on Pascal and ImageNet and so on.
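The "units as pixels with X, Y coordinates" idea can be sketched as self-attention over an unordered list of unit feature vectors. This is a minimal single-head, pure-Python sketch with random weights, assuming each unit is encoded as a hypothetical `[x, y, health]` vector; it illustrates the transformer's core operation, not AlphaStar's actual network. Because attention sums over all units, shuffling the input list only shuffles the output rows, which is why an unordered set of units is a natural fit.

```python
import math
import random


def self_attention(units, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a list of units.
    Each unit is a feature vector such as [x, y, health]; the list has no
    inherent order, so position enters only through the x, y features."""
    def project(w, vec):
        return [sum(wij * vj for wij, vj in zip(row, vec)) for row in w]

    queries = [project(w_q, u) for u in units]
    keys = [project(w_k, u) for u in units]
    values = [project(w_v, u) for u in units]
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over all units
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w_q, w_k, w_v = (random_matrix(4, 3, rng) for _ in range(3))  # 3 features -> 4 dims
units = [[0.1, 0.2, 1.0], [0.8, 0.4, 0.5], [0.5, 0.9, 0.7]]     # normalized [x, y, health]
out = self_attention(units, w_q, w_k, w_v)                      # one 4-d vector per unit
```

A real model stacks several such layers with multiple heads and learned embeddings; the single head here only shows why a list of units with coordinates can be processed like a set of pixels.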
- LFLex Fridman
So the interesting thing here is that, putting it that way, it starts to move it towards the way you usually work with language. So what (laughs) ... And especially with your expertise and work in language, it seems like there are echoes of a lot of the way you would work with natural language in the way you've approached AlphaStar.
- OVOriol Vinyals
Right.
- LFLex Fridman
What's (laughs) ... Does that help with the long-term sequence modeling there somehow?
- OVOriol Vinyals
Exactly. So now that we understand what an observation for a given time step is, we need to move on to say, well, there's going to be a sequence of such observations, and an agent will need to act given all that it has seen, not only the current time step. Why? Because there is partial observability. We must remember whether we saw a worker going somewhere, for instance, right? Because then there might be an expansion on the top right of the map. So given that, what you must then think about is the problem of predicting the next action given all the observations.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And not only given all the observations, but given all the observations and all the actions you've taken, predict the next action. And that sounds exactly like machine translation, where ... And that's exactly how I saw the problem, especially when you are given supervised data, or replays from humans, because the problem is exactly the same. You're essentially translating a prefix of observations and actions into what's going to happen next, which is exactly how you would train a model to translate or to generate language as well, right? You have a certain prefix. You must remember everything that came in the past, because otherwise you might start producing incoherent text. And the same architectures, we're using LSTMs and transformers, operate across time to integrate all that's happened in the past. Those architectures that work so well in translation and language modeling are exactly the same as what the agent is using to issue actions in the game. And the way we train it, moreover, for imitation, which is step one of AlphaStar, is to take all the human experience and try to imitate it, much like you try to imitate translators who translated many pairs of sentences from, say, French to English. That sort of principle applies exactly the same. It's almost the same code, except that instead of words you have slightly more complicated objects, which are the observations, and the actions are also a bit more complicated than a word.
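The "translation" framing above can be made concrete by turning a replay into (prefix, next action) pairs and scoring a model by the negative log-likelihood of the human's actions, as in language modeling. In this sketch, `CountModel` is a deliberately crude, hypothetical stand-in that looks only at the latest observation; an LSTM or transformer would instead consume the whole prefix. The observation and action names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def training_pairs(observations, actions):
    """Turn one replay into (prefix, next_action) examples. At step t the
    prefix holds observations o_0..o_t and previous actions a_0..a_{t-1};
    the target is the action a_t the human actually took."""
    assert len(observations) == len(actions)
    return [
        ((tuple(observations[: t + 1]), tuple(actions[:t])), actions[t])
        for t in range(len(actions))
    ]


class CountModel:
    """Hypothetical stand-in for an LSTM/transformer: it predicts the next
    action from the latest observation only, using add-one smoothed counts."""

    def __init__(self, action_vocab):
        self.action_vocab = sorted(action_vocab)
        self.counts = defaultdict(Counter)

    def fit(self, pairs):
        for (obs_prefix, _past_actions), target in pairs:
            self.counts[obs_prefix[-1]][target] += 1

    def prob(self, prefix, action):
        obs_prefix, _past_actions = prefix
        seen = self.counts[obs_prefix[-1]]
        return (seen[action] + 1) / (sum(seen.values()) + len(self.action_vocab))


def imitation_loss(model, pairs):
    """Mean negative log-likelihood of the human's actions (the imitation objective)."""
    return -sum(math.log(model.prob(p, a)) for p, a in pairs) / len(pairs)


obs = ["enemy_seen", "quiet", "quiet", "enemy_seen"]
acts = ["retreat", "build", "build", "retreat"]
pairs = training_pairs(obs, acts)
model = CountModel({"attack", "build", "retreat"})
model.fit(pairs)
loss = imitation_loss(model, pairs)
```

Swapping `CountModel` for a sequence model changes only how `prob` uses the prefix; the pair construction and the loss stay the same, which mirrors the "almost the same code" point above.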
- LFLex Fridman
Is there a self-play component then, too? So once you run out of, uh, imitation?
- OVOriol Vinyals
Right. So indeed, you can bootstrap from human replays, but then the agents you get are actually not as good as the humans you imitated, right? So how do we imitate? Well, we take humans from 3,000 MMR and higher. 3,000 MMR is just a metric of human skill, and 3,000 MMR might be around the 50th percentile, right? So it's just kind of an average human.
- LFLex Fridman
What's the ... So maybe quick pause.
- OVOriol Vinyals
Yeah.
- LFLex Fridman
MMR is a ranking scale, the matchmaking rating-
- OVOriol Vinyals
Yeah.
- LFLex Fridman
... for players. So it's three thous- ... I remember there's like a master and a grandmaster. What's 3,000? Is it m-
- OVOriol Vinyals
So 3,000 is, is pretty bad. Um, I think it's kind of gold level. Um-
- LFLex Fridman
It just sounds really good relative to chess, I think.
- OVOriol Vinyals
Oh, yeah, yeah, no.
- LFLex Fridman
That-
- OVOriol Vinyals
As for the ratings, the best in the world are at 7,000 MMR.
- LFLex Fridman
7,000.
- OVOriol Vinyals
So 3,000 is a bit like Elo, indeed, right? So 3,500 just allows us to not filter out a lot of the data.
- LFLex Fridman
Yeah.
- OVOriol Vinyals
So we like to have a lot of data in deep learning, as you probably know. So we take this 3,500 and above, but then we do a very interesting trick, which is that we tell the neural network what level it is imitating. So we say: this replay you're going to try to imitate, predicting the next action for all the actions you're going to see, is a 4,000 MMR replay. This one is a 6,000 MMR replay. And what's cool about this is that we then take this policy that is being trained from humans, and we can ask it to play like a 3,000 MMR player by setting a bit, saying, "Well, okay, play like a 3,000 MMR player, or play like a 6,000 MMR player." And you actually see how the policy behaves differently. It has a worse economy if you ask it to play like a gold-level player. It does fewer actions per minute, which is the number of clicks, or number of actions, that you issue in a whole minute. And it's very interesting to see that it imitates the skill level quite well. But if we ask it to play like a 6,000 MMR player, and we tested these policies, of course, to see how well they do, they actually beat all the built-in AIs that Blizzard put in the game, but they're nowhere near 6,000 MMR players, right? They might be around gold level, platinum perhaps.
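The MMR trick described above amounts to two steps: filter replays by a skill cutoff, then feed each replay's MMR to the network as an extra input, so that at play time you can choose which skill level to imitate. A minimal sketch, assuming the 3,500 cutoff mentioned in the conversation and a hypothetical encoding (MMR scaled linearly to [0, 1] between 3,000 and 7,000); how AlphaStar actually encoded this is not described here, and `Replay` and its fields are illustrative.

```python
from dataclasses import dataclass, field

MIN_MMR = 3500  # cutoff mentioned in the conversation


@dataclass
class Replay:
    mmr: int
    steps: list = field(default_factory=list)  # (observation, action) pairs


def select_replays(replays, min_mmr=MIN_MMR):
    """Keep only replays from players at or above the MMR cutoff."""
    return [r for r in replays if r.mmr >= min_mmr]


def mmr_feature(mmr, lo=3000.0, hi=7000.0):
    """Hypothetical encoding: MMR scaled linearly to [0, 1] and clipped."""
    return min(max((mmr - lo) / (hi - lo), 0.0), 1.0)


def conditioned_examples(replay):
    """Pair every observation with the replay's skill level, so the network
    learns what to do at a given MMR, not just what to do."""
    z = mmr_feature(replay.mmr)
    return [((obs, z), action) for obs, action in replay.steps]


replays = [Replay(3000), Replay(4000, [("o1", "a1")]), Replay(6000, [("o2", "a2")])]
training_set = [ex for r in select_replays(replays) for ex in conditioned_examples(r)]
# At play time, "play like a 6,000 MMR player" would mean feeding mmr_feature(6000).
```

Note that the conditioning only tells the policy which style to imitate; as described above, asking for 6,000 MMR does not by itself make the imitation policy play at that level.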
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
So there's still a lot of work to be done for the policy to truly understand what it means to win. So far, we only ask it, "Okay, here is the screen, and that's what's happened in the game until this point. What would the next action be, if we asked a pro to now say, 'Oh, you're gonna click here or here or there'?" And the point is that experiencing wins and losses is very important to then start to refine. Otherwise the policy can get lost; it can just go off-policy, as we call it.
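Learning from wins and losses, rather than from "what would a human click next", can be illustrated with REINFORCE, a textbook policy-gradient method swapped in here purely for illustration; this part of the conversation does not specify AlphaStar's reinforcement learning algorithm. The toy "game" is a single choice between two strategies with hypothetical win rates, and the only feedback is +1 for a win and -1 for a loss.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(win_rates, episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a one-decision 'game': sample a strategy from the policy,
    observe +1 for a win or -1 for a loss, and move the logits along
    reward * grad log pi(action). win_rates[i] is the (hypothetical)
    probability that strategy i wins. Returns the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(win_rates)
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < win_rates[action] else -1.0
        # d/d logit_i of log softmax(logits)[action] = 1[i == action] - probs[i]
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * reward * (indicator - probs[i])
    return softmax(logits)


policy = reinforce([0.3, 0.7])  # strategy 1 wins 70% of the time
print([round(p, 3) for p in policy])
```

The policy shifts probability toward whatever wins more often, without ever being told what a human would do. This bare version has no baseline and high variance; it is only meant to show the shape of learning from outcomes.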
- LFLex Fridman
That's so interesting that you can at least hope eventually to be able to control a policy to be approximately at some MMR level, especially given that you have ground truth for a lot of these cases.
- OVOriol Vinyals
Right.
- LFLex Fridman
Uh, can I ask you a personal question?
- 45:00 – 1:00:00
- OVOriol Vinyals
of putting more rules that make the agents more human-like in terms of restrictions?"
- LFLex Fridman
So, yeah, putting constraints that, uh ...
- OVOriol Vinyals
More constraints, yeah.
- LFLex Fridman
That's really interesting. That's really innovative. So one of the constraints you put on yourself, or at least focused on, is the Protoss race, as far as I understand.
- OVOriol Vinyals
Yeah.
- LFLex Fridman
Uh, can you tell me about the different races and how they... uh, so Protoss, Terran, and Zerg. How do they compare? How do they interact? Why did you choose Protoss? Is-
- OVOriol Vinyals
Right.
- LFLex Fridman
Yeah. Is ... in the dynamics of the game-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... seen from a strategic perspective.
- OVOriol Vinyals
So, Protoss ... In StarCraft, there are three races. Indeed, in the demonstration we saw only the Protoss race, so maybe let's start with that one. Protoss is kind of the most technologically advanced race. It has units that are expensive but powerful, right? So in general, you wanna conserve your units as you go attack. And then you wanna utilize these tactical advantages of very fancy spells and so on and so forth. And at the same time, people say they're a bit easier to play, perhaps, right? But that I actually didn't know. I've now talked a lot to the players that we work with, TLO and Mana, and they said, "Oh, yeah, Protoss is actually, people think, one of the easiest races." So it may be among the easiest, but that doesn't mean that it's ... You know, obviously professional players excel at all three races, and there's never a race that dominates for a very long time anyway.
- LFLex Fridman
So, if you look at the top, I don't know, 100 in the world-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... is there one race that dominates that list?
- OVOriol Vinyals
It would be hard to know, because it depends on the regions. I think it's pretty equal in terms of distribution, and Blizzard wants it to be equal, right? They wouldn't want one race, like Protoss, to be underrepresented at the top.
- LFLex Fridman
Right.
- OVOriol Vinyals
So definitely, they try for it to be balanced, right? So then maybe the opposite race to Protoss is Zerg. Zerg is a race where you just expand and take over as many resources as you can, and they have a very high capacity to regenerate their units. So if you have an army, it's not that valuable; losing the whole army is not a big deal as Zerg, because you can then rebuild it. And given that you generally accumulate a huge bank of resources, Zergs typically play by applying a lot of pressure, maybe losing their whole army, but then rebuilding it quickly.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
So, although of course, every race ... I mean, they're pretty diverse. There are some units in Zerg that are technologically advanced, and they cast some very interesting spells, and there are some units in Protoss that are less valuable, and you could lose a lot of them and rebuild them, and it wouldn't be a big deal.
- LFLex Fridman
All right. So maybe I'm missing something, maybe I'm gonna say some dumb stuff, but, so, a summary of strategy.
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
So first, there's collection of a lot of resources.
- OVOriol Vinyals
Right.
- LFLex Fridman
So that's one option. The other one is expanding, so building other bases.
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
Then the other is obviously building units and attacking with those units.
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
And then, I don't know what else there is ... Maybe there's the different timing of attacks, like attacking early or attacking late.
- OVOriol Vinyals
Right.
- LFLex Fridman
What are the different strategies that emerged that you've learned about? I've, I've read that-
- 1:00:00 – 1:15:00
- OVOriol Vinyals
first, the win itself. I mean, it was so exciting. Looking back to those last days of 2018, which is when the games were played, I'm sure that when I look back at that moment, I'll say, "Oh my God, I want to be in a project like that." I already feel the nostalgia; that was huge in terms of the energy and the team effort that went into it. And so in that sense, as soon as it happened, I already knew I was kind of losing it a little bit, so it's almost sad that it happened, like, "Oh my god." But on the other hand, it also verifies the approach. But to me, there are also so many challenges and interesting aspects of intelligence: even though we can train a neural network to play at the level of the best humans, there are still so many challenges. So for me it's also like, well, this is really an amazing achievement, but I was already thinking about next steps. As I said, these agents play Protoss versus Protoss, but they should be able to learn to play a different race much more quickly, right? So that would be an amazing achievement. Some people call this meta-reinforcement learning, meta-learning, and so on, right? So there are so many possibilities after that moment, but the moment itself really felt great. We had this bet. I'm kind of a pessimist in general, so I sent an email to the team and said, "Okay, let's ... against TLO first," right? Like, what's gonna be the result? And I really thought we would lose, like, 5-0, right? We had some calibration made against a 5,000 MMR player. TLO was much stronger than that player, even playing Protoss, which is his off-race. But yeah, I was not imagining we would win, so for me, that was just kind of a test run or something, and then it really ... he was really surprised.
And unbelievably, we went to this bar to celebrate, and Dave tells me, "Well, why don't we invite someone who is 1,000 MMR stronger in Protoss?" Like an actual Protoss player, and that turned out to be Mana, right?
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And, you know, we had some drinks and I said, "Sure, why not?" But then I thought, "Well, that's really gonna be impossible to beat," because he's so much further ahead. 1,000 MMR is really like a 99% probability that Mana would beat TLO in Protoss versus Protoss, right? So we did that, and to me, the second match was much more important, even though a lot of uncertainty kind of disappeared after we beat TLO. I mean, he is a professional player, so that was a very nice achievement. But Mana really was at the top, and you could see he played much better, but our agents got much better, too. So it's like ... And then after the first game, I said, "If we take a single game, at least we can say we beat a game, even if we don't win the series." For me, that was a huge relief. And I remember hugging Demis, and it was really like ... This moment will resonate with me forever, as a researcher and as a person, and, yeah. It's a really great accomplishment. And it was great also to be there with the team in the room. I don't know if you saw, like, the sum-
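The "1,000 MMR is like 99%" estimate can be sanity-checked with the standard Elo expected-score formula, since MMR is described in this conversation as being a bit like Elo. This sketch assumes the usual chess-style 400-point logistic scale; Blizzard's actual MMR scale may differ, so treat the number as a rough check only. The ratings below are hypothetical; only the 1,000-point gap matters.

```python
def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the Elo logistic model.
    The 400-point scale is the chess convention, assumed here for MMR."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical ratings: only the 1,000-point gap matters.
p = elo_win_probability(6000, 5000)
print(f"{p:.4f}")  # 0.9968
```

Under this assumption a 1,000-point gap gives roughly a 99.7% expected score, in line with the estimate of about 99%.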
- LFLex Fridman
Yeah, yeah
- OVOriol Vinyals
... so it was really like-
- LFLex Fridman
I mean, from my perspective-
- OVOriol Vinyals
... yeah.
- LFLex Fridman
... the other interesting thing is just like watching Kasparov, uh, watching Mana, uh, was also interesting because he was kind of at a loss for words. I mean, whenever you lose, I've done a lot of sports-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... you, you sometimes say excuses, you look for reasons.
- OVOriol Vinyals
Right.
- LFLex Fridman
Um, and he couldn't really come up with reasons.
- OVOriol Vinyals
Yeah, yeah.
- LFLex Fridman
I mean, uh, with TLO playing his off race, Protoss, you could say, well, it felt awkward, but here it was-
- OVOriol Vinyals
Yup.
- LFLex Fridman
... it was, he was just beaten, and it was beautiful-
- OVOriol Vinyals
Yup.
- LFLex Fridman
... to, to look at a human being being superseded by an AI system. I mean, um, it's a, it's a beautiful moment for, for researchers, so, uh...
- OVOriol Vinyals
Yeah, for sure. It was, it was, I mean, probably the, the highlight of my career so far, um, because of its uniqueness and coolness, and I, I don't know. I mean, it's obviously, as you said, you can look at papers, citations, and so on, but this, this really is like a testament of the whole machine learning approach, um, and using games to advance technology. I mean, it's, it's really, it really was... Everything came together that moment. That, that's really the summary.
- LFLex Fridman
Also, on the other side, it's a popularization of AI too, because just like traveling to, uh, to the moon and so on, I mean, this is where a very large community of people that don't really, uh-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... know AI, they get to really interact with it, uh-
- OVOriol Vinyals
Which is very important. I mean, we-
- LFLex Fridman
It's extremely important.
- OVOriol Vinyals
... we must, um, you know, writing papers helps our peers, researchers-
- LFLex Fridman
Yeah.
- OVOriol Vinyals
... to understand what we're doing, but I think AI is becoming mature enough that we must sort of try to explain what it is, and perhaps games are an obvious way, because these games always had built-in AI, so maybe... Everyone has experienced, um, an AI playing a video game even if they don't know it, because there's always some scripted element, and some people might even call that AI already, right?
- LFLex Fridman
Uh, so what are other applications of, uh, the approaches underlying AlphaStar that you see happening? There's a lot of-
- OVOriol Vinyals
Mm-hmm.
- LFLex Fridman
... echoes of, you said Transformer of, uh, language modeling-
- 1:15:00 – 1:19:22
- OVOriol Vinyals
include that, right? And it's really a difficult problem, and very fascinating, and very mysterious, actually.
- LFLex Fridman
Y- yeah, absolutely. But do you think it's... If you were to try to solve it, c- can you not grow the size of data intelligently in such a way that the distribution of your training set does include the entirety of the testing set?
- OVOriol Vinyals
I think-
- LFLex Fridman
Is that one path? The other path is totally new methodology-
- OVOriol Vinyals
Right.
- LFLex Fridman
... that's not statistical.
- OVOriol Vinyals
So, a path that has worked well, and it worked well in StarCraft and in machine translation and in languages, is scaling up the data and the model, and that's kind of been maybe the only single formula that still delivers today in deep learning, right? It's that scale, data scale and model scale, that really does more and more of the things where we thought, "Oh, there's no way it can generalize to this," or, "There's no way it can generalize to that." But I don't think fundamentally it will be solved with these.
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
And for instance, I'm really liking, um, some style or approach that would not only have neural networks but would also have programs or some discrete decision-making, because that is where I feel there's a bit more like... Like, I mean, the best example I think for understanding this is, um, I also worked a bit on this: we can learn an algorithm with a neural network, right? So you give it many examples and it's gonna sort the input numbers or something like that. But really, strong generalization is: you give me some numbers, or you ask me to create an algorithm that sorts numbers, and instead of creating a neural net, which will be fragile because it's gonna go out of range, uh, at some point-
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
... you're gonna give it numbers that are too large, too small and whatnot, you just create a piece of code that sorts the numbers, and then you can prove that it will generalize to absolutely all the possible inputs you could give. So I think the problem comes with some exciting prospects. I mean, scale is a bit more boring, but it really works, and then maybe programs and discrete abstractions are a bit less developed, but clearly I think they're quite exciting in terms of the future of the field.
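The contrast drawn here, a learned sorter that breaks outside its training range versus a program whose correctness holds for every input, can be illustrated with a minimal sketch. Insertion sort is chosen only for readability; nothing in the conversation specifies an algorithm, and the function name `insertion_sort` is illustrative.

```python
def insertion_sort(values: list[float]) -> list[float]:
    """Explicit sorting program: correct for any list of comparable numbers, whatever their magnitude."""
    result = list(values)  # copy so the caller's list is left untouched
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift every element larger than key one slot to the right.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key  # key now sits in its sorted position within result[:i + 1]
    return result


# Inputs far outside any "training range" a learned sorter might have seen.
print(insertion_sort([3.0, -1e300, 42.0, 1e300, 0.0]))
```

The guarantee comes from a loop invariant (before iteration `i`, the prefix `result[:i]` is sorted), which holds regardless of how large or small the numbers are. That is the kind of proof a neural network trained on examples does not come with.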
- LFLex Fridman
Do you, uh, draw any insight or wisdom from the '80s and expert systems and symbolic systems, symbolic computing? Do you ever go back to those, the reasoning, that kind of logic? Do you think that might make a comeback, that you'll have to dust off those books?
- OVOriol Vinyals
Yeah. I actually love adding more inductive biases. Um, to me, the question really is, what are you trying to solve? If what you're trying to solve is so important that you try to solve it no matter what, then absolutely use rules, use domain knowledge, and then use a bit of the magic of machine learning to empower the system, to make it the best system that will detect cancer or, um, you know, detect weather patterns, right? Um, or in terms of StarCraft, it also was a very big challenge-
- LFLex Fridman
Mm-hmm.
- OVOriol Vinyals
... so I was definitely happy that if we had to cut a corner here and there, it could have been interesting to do. Um, and in fact, in StarCraft, we started thinking about expert systems, because, you know, people actually build StarCraft bots by thinking about those principles, like, you know, state machines and rule-based systems, and then you could think of combining a bit of a rule-based system that also has neural networks incorporated to make it generalize a bit better. So absolutely. I mean, we should definitely go back to those ideas, and anything that makes the problem simpler, as long as your problem is important, that's okay, and that's research driven by a very important problem. And on the other hand, if you wanna really focus on the limits of reinforcement learning, then of course you must try not to look at imitation data, or to look for some rules of the domain that would help a lot, or even feature engineering, right? So there's this tension where, depending on what you do, I think both ways are definitely fine, and I would never rule out one or the other, as long as what you're doing is important and needs to be solved, right?
- LFLex Fridman
Right. Uh, so there's a bunch of different ideas that, um, you've developed that I really enjoy, so one is, um, translating from... so image captioning.
Episode duration: 1:46:00
Transcript of episode Kedt2or9xlo