No Priors Ep. 74 | With Google DeepMind VP of Research Oriol Vinyals
- 0:00 – 0:55
Introduction to Oriol Vinyals
- Sarah Guo
Hi listeners, and welcome to No Priors. Today we're talking to Oriol Vinyals, the VP of Research at Google DeepMind and technical co-lead for Gemini. His storied career in machine learning runs from leading the AlphaStar team, which built a pioneering, professionally competitive StarCraft agent, all the way to today, and we're really excited to get his historical perspective on where we are in machine learning. Welcome to the show, Oriol.
- Oriol Vinyals
Yeah, amazing. Thanks, Sarah, for the invitation. And likewise, thanks a lot for hosting me.
- Elad Gil
Yeah, thanks for joining.
- Sarah Guo
Last year was an eventful year at Google and DeepMind. How is that research effort organized now, and how do you think of the mission internally?
- Oriol Vinyals
Sure, I'm happy to discuss the different phases the research organizations have gone through over the years. But focusing on last year, two major events happened. One was that
- 0:55 – 2:04
The Gemini Project and Its Impact
- Oriol Vinyals
the Gemini project was formed, as a result of having two parallel efforts on LLMs, mostly led by Google Brain and what we now call Legacy DeepMind. So earlier in the year there was an effort to merge the two projects, and that's when Jeff and I came together and brought the two teams together to create the very first Gemini model, which was eventually released later in the year. The second big event was to take all the organizations that were doing AI research or AGI research and form a single organization. That's what today is called Google DeepMind, and it comes from Google Brain and Legacy DeepMind coming together under one roof, with Gemini obviously being a very large and very important project within that organization. And really the goal of Gemini itself is to create an awesome core model to power the technology that LLMs today are powering all around the world. And we obviously expect this to
- 2:04 – 8:29
AI in Google Search and Chat Models
- Oriol Vinyals
all increase.
- Sarah Guo
How do you interact with the rest of the company, and Google as a business? And I feel like I have to ask you: does AI replace traditional search?
- Oriol Vinyals
Even just running that from a research standpoint is super interesting, right? There are two major centers, one in California, one in London, given the organizations we come from. So that in itself is very interesting; in a way we have the project running 24/7, which is helpful when you train these large models. And then you have to do a few things, right? One of the things we do, of course, is try to build state-of-the-art technology: knowing where the field is coming from and where it's going, and trying to showcase, from our own intuitions and ambition, what might come next. A prime example of this was the long context we released earlier in the year, right? Millions of tokens can now be processed by our models. But then of course we also take into consideration all the different needs from the different products we work with. Google has a lot of product areas, so we tried to focus, especially initially when forming the project, on critical products. And you see that very much in how Gemini first surfaced to users and to enterprises. Obviously cloud and enterprise are very important. Developers as well; it's super cool to put these models in the hands of creative minds that are gonna do things you didn't even anticipate these models could do. Then, very important, the product formerly known as Bard, now the Gemini app, which is the chatbot surface for our models. And then maybe the last very important piece is Search, which is trying to integrate this technology into their product, and of course has a lot of users. So it's extremely exciting to think that the decisions you make at the modeling level will, maybe just a couple of months later, make it to the users who are signing up for a beta, et cetera. Super exciting, and it's obviously connected; it's really the core of the company, especially for the products that require very intelligent AI systems like the ones we're creating today.
- Elad Gil
How do you think about the various types of use cases that fall under chat-based models versus search-based models? I remember, I was at Google many years ago at this point, and at the time a lot of the different types of search queries were broken into different chunks. There are navigational queries, where you're trying to get to some other site and search just helps direct you there. There were strong-intent commerce queries. There are medical queries. So you can kind of map out the world of different things users are actually trying to do, with search as the interface to get there. What do you think will move more toward chat, and what do you think remains in the domain of more traditional search-based approaches?
- Oriol Vinyals
We might not know the answer, and we are experimenting, in a way. You can think of an LLM-first experience: that's the chatbot, right? There, Search has a role to play, because it can be seen as a tool that enhances the answers of your chat experience; it provides citations and a bit more reliability. We know that language models can and do hallucinate. So there's that LLM-first point of view, which we're building up as more of a new product. And then there's Search itself, which is embracing and enhancing some of those query types you mentioned where it could be useful to do a bit more, like AI summaries. But there's a lot more coming; I think I/O this year actually showcased a good breadth of vision of what Search is trying to do with language models. Some of it is not ready but is being tested, and obviously feedback is very important. But for now it's hard to think of a convergence, or of one dominating the other; both seem useful in different ways right now, and as a user I certainly use both. What is very clear, though, is that even if Search is the initial point where you have a query and want to research something, that experience will be tremendously enhanced by these models. So the Search product itself is trying to figure out how to integrate the LLM answers and their capabilities, reasoning capabilities, et cetera. So I think we're gonna see a lot of Search integrating LLMs, and of course the vice versa is very obvious as well. And as a core-model project, the Gemini project has an open mind; we don't really need to decide one or the other. One of the things about operating at the scale Google operates at is that, yes, we are the research team that builds the models, but the PAs, the product areas, are the ones that drive the strategy, influenced of course by what the models can do and with input from many of us who have been dreaming about this world. But it's kind of cool to have very experienced folks iterate with their users and their use cases.
- Elad Gil
That makes a lot of sense. I think one of the things that's very understated, or forgotten, about Google is the degree to which it really was the world's first AI-first company, or ML-first company as it used to be called. The entire product, both Search and Ads, was very AI-driven from the earliest days. And part of that is inferring user intent through action and then acting on it algorithmically. So to your point, you could continue to use search as the primary UI or entry point, and then if it routes to more of a chat-based thing, or a chat bubble pops up, or something else, it could just happen organically based on the type of intent the user is exhibiting. I think you're raising really interesting points about the capabilities of Google. I guess related to that: what has been the most surprising thing about how users or companies have been interacting
- 8:29 – 14:42
Infinite Context Length and Its Applications
- Elad Gil
with Gemini?
- Oriol Vinyals
Yeah, there are quite a few. Maybe the one that surprises me the most, because initially I even thought this was just a number you could report, is the fact that infinite context length is coming eventually. I thought, look, this is interesting, right? We come from a world where we had recurrent neural networks and LSTMs that in principle had infinite memory, although it was not very capable; in practice the models never remembered more than a few hundred words or so. So the first thing was that we could make the context length so long, and then seeing the use cases just emerge, even internally, when we were first trying the model. It seems trivial now in hindsight, but putting in a whole one-hour video and just asking anything feels superhuman, right? You literally put the video in, and after 10 seconds, 30 seconds, I mean, it does take some time to process the context, you can just ask anything. And thinking of computer vision as a field, or the video question-answering datasets we come from, those all seemed dwarfed compared to the capability that was in our hands. Then we put it in the hands of developers and saw amazing demos and things people could do, even ones as mind-blowing as pointing the camera directly at the screen, not even at the code: it's just the letters that appear on the screen, and you can debug like that. So you can imagine how future interfaces will be effortless; you just point the camera, ask a question, and get answers. It's an interesting research problem, of course, and you go from thinking "maybe this isn't gonna be that useful" to "wow, that's amazing." Yet it's not very mainstream either, right? So we're still trying to discover what this enables. We showed Project Astra, where you take the phone and just interact with it as if it were an agent, and memory is very important there. But it's still not very clear what this might be in a few years. Although you could imagine the whole web being in the context, or all your personal data, which could remain on your device because it sits in the working memory of the model rather than in the weights. There are lots of applications, but it's still fairly early days. So it surprised me, and yet in a way it hasn't fully taken off, hasn't gone fully mainstream, although it does feel magical when you start interacting and realize you can ask anything and get an answer like that, without watching the whole movie, or across the many books you can upload, et cetera.
- Elad Gil
What do you think is the timeframe for very long context windows really being in broad-based use? This isn't a Google-specific question; it seems like multiple people are on a trajectory to add this at scale, and you're also seeing it crop up in biology models and other things like that. So I'm just a little bit curious about the timeframe at which it actually hits either large-enterprise or consumer use cases. To your point, you can imagine ones where the company is doing something on behalf of the consumer by adding all the context from their device or from their interactions into the context window. Or an enterprise uploads a large folder including a bunch of legal documents, and it gets incorporated into some query.
- Oriol Vinyals
If there is a compelling use case, the technology is not far from being deployable at scale. And of course hardware is also being updated based on what the research developments are, right? So certainly within one or two years, the context length that is a commodity will definitely be enhanced by a factor of, I don't know, 10x or so everywhere. And then extremely long context, I think, is gonna be a motivational drive, definitely from a research perspective. For deploying it at scale, many techniques have been explored already, like hierarchical memories and so on, and even RAG is pretty common, right? So we're gonna combine these, and probably, thanks to the use cases, on the order of one or two years you might see, wow, we went another order of magnitude on both the state of the art and what's considered commodity. I'm pretty certain about this. It comes back to finding use cases compelling enough to serve a model that requires more memory; there are certain limitations, of course, but these will be figured out from a technological standpoint if the motivation is there, for sure.
- Sarah Guo
In the very quickly coming era of infinite context, what is the relevance of retrieval architectures and more hierarchical memory? I think you can make an argument for them being continually relevant just from an efficiency perspective, but how do you think about it?
- Oriol Vinyals
Yeah, I think the efficiency argument for hierarchical memories that make context even longer definitely makes a lot of sense. And even from the standpoint of efficiency of learning, and of retrieving memories in a refined way, like an intelligent being such as ourselves might do, it makes sense. So I think quality alone will motivate this sort of solution regardless. And we do have a lot of experience with retrieval-based methods, certainly at Google, and with combining them with neural methods. I think it's a matter of time and of finessing the details and the use cases. The problem with retrieval-based methods, of course, is that they tend to oversimplify things, to say, "Hey, this whole book is just a single vector," whereas if you upload a whole book into Gemini and ask questions, it can really reason about every single word, right? So probably finding the middle ground for different use cases is needed. But to me it seems like a feature, not a bug, that we have a bit of a hybrid mode going into the future, and research will be driven like this.
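A minimal sketch of the hybrid Oriol is gesturing at: cheap, lossy retrieval narrows a large corpus, and the surviving documents then go into the model's long context verbatim, so every word stays available for reasoning. `embed` and `generate` are hypothetical stand-ins for any embedding and LLM API, not a specific Google interface:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call; returns a deterministic unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=256)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Hypothetical LLM call."""
    return "<model answer>"

def answer(query: str, corpus: list[str], k: int = 3) -> str:
    # Stage 1: retrieval, where each document is "just a single vector".
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    # Stage 2: the top-k documents go into the long context in full,
    # so the model can still reason over every single word at answer time.
    context = "\n\n".join(corpus[i] for i in top)
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```

The design point is exactly the middle ground he mentions: the vector stage buys efficiency, and the long-context stage restores the word-level reasoning that a single vector per book throws away.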
- 14:42 – 31:55
Scaling AI and Reward Functions
- Sarah Guo
How do you contextualize this moment in time, in terms of what the biggest limitations are for current state-of-the-art LLMs and what's worth working on?
- Oriol Vinyals
There's one reflection: even many years ago, friends who were kind of "early" in the game said, "Well, get ready. As this goes mainstream, lots of brilliant people will enter the field." And you certainly see this, right? With open sourcing there's even a bit of random search going on: selection bias, where someone does something random, people actually want it, and it becomes sort of viral in a way. So the sheer size of the field is one aspect I think we were anticipating, but to me that's one of the biggest changes I've seen: there are more brains, more different backgrounds, coming into the field. And that is combined with the other drivers. I usually tend to assign credit for what has happened to the scale of data and compute; to algorithmic advances, which you can undersell, but there have certainly been some important ones in the last 10 or 20 years; and then to accessibility, right? The software and the open-sourcing efforts have been quite critical in creating these exponentials, or linear trends in log scale, that we're seeing. Now, a bit more on how I see the field: I tend to call maybe 2010 to 2020 the deep learning era. What that era did is take a set of algorithms that were general, right? The algorithms are stochastic gradient descent, deep learning, neural networks, reinforcement learning. You can think of these as ingredients; they're common, you expand them over the years, but they're essentially the same. And then you just apply them to a domain, and you get extremely good at that domain. So we have mastering the game of Go, winning the ImageNet challenge, state-of-the-art speech recognition, state-of-the-art image generation, right? In that decade the models themselves certainly are not general, but the algorithms are general: you could take the same algorithm, change the dataset, tweak a couple of things, and voilà, you get protein folding really enhanced over the traditional methods. And then I think the greatest insight came from realizing, and I think this is a lucky factor for us, because it makes communication with these entities much easier, that modeling language turns out to be such a powerful abstraction for generality. The GPT-2 paper especially posited this; in the abstract it essentially says, look, by modeling language you can solve every task, not one task. And then perfecting that, with the whole field building on many years of research, created something where it's not just the algorithms that are reusable: the model itself now becomes general. That's why I think AGI is getting closer. We had powerful models in 2010 to 2020; now we have general models. And multimodality has more recently been another amazing breakthrough: these techniques extend not just to language but also to vision, sound, video, et cetera. So that means we have very powerful general models that, from an AGI-definition standpoint, start to tick many boxes.
The reasoning capabilities of the models are there, but I don't think we've perfected making the reasoning crisp and accurate, so that these models would not hallucinate; the model might solve an Olympiad math problem, yet then fail to work out a very basic puzzle, right? For this reasoning step there are a lot of ideas, a lot of experience, and a lot of algorithmic advances we've made in the last few years, over search and so on, but we haven't quite perfected it. And then, of course, the question is how to push the frontier and move forward, in certain domains at least. But yeah, we've come a long way. Luckily these models are finding usefulness, so for the resources that go into training them there's now a good feedback loop of revenue and reinvestment, certainly for the biggest players. As researchers, of course, we welcome that.
- Sarah Guo
What is the difference between having reasoning capability and having reasoning capability that's crisp and accurate?
- Oriol Vinyals
Sometimes it's the distinction between probabilistically solving something and solving it exactly. Right now these models assign probability mass over every sequence of, let's simplify and think not of multimodal tokens but of words. For every single sequence of words, the model assigns a probability. You're absorbing all the knowledge on the internet and then sharpening the model toward following instructions and being aligned with humans, but you still have a probability distribution that will assign non-zero probability to certain things that would not be deemed correct. Although in language there are so many ways to say the same thing correctly, which is why these models shine at the end of the day: they are very efficient ways to integrate over all possible sequences. Now, say you're predicting on a hard problem, a tricky question that requires deep knowledge. You might be at 95% accuracy, but of course that will create errors, and even if the error rate is small, this is deployed to the whole world, so you will certainly get to see the mistakes. One thought would be: you just keep making the models larger, you keep improving the algorithms, and you're gonna hit a point where the probability of a mistake vanishes. That's possible; we will obviously explore it. But to accelerate that progress, you want to start really exploring what reasoning the model has, and by making it more redundant, more logical, by iterating more on these kinds of ideas, you could imagine generating a very small program that runs with the language model at the center and gets you to that 99-point-many-nines percent faster. And of course you're gonna do both, as an ambitious lab. But "crisply" means that this probability of error diminishes. And we can always put in more compute. We humans make mistakes, we get tired, et cetera, but these models are powerful; we can put more hardware into inference. So the hope is that, in that sense, they become at least as good as humans. (laughs)
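A back-of-the-envelope illustration of the "95% accuracy" point above: if each step of a long chain is independently right with probability p, the whole chain survives with probability p to the n, which is why pushing per-step reliability toward 99-point-many-nines matters so much. The numbers are illustrative only:

```python
# If each reasoning step is independently correct with probability p,
# an n-step chain is fully correct with probability p**n, so small
# per-step error rates are amplified over long chains.
for p in (0.95, 0.99, 0.9999):
    for n in (10, 100):
        print(f"p={p}, n={n}: chain success = {p**n:.4f}")
# p=0.95 collapses to under 1% success at n=100, while p=0.9999
# still succeeds about 99% of the time.
```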
- Sarah Guo
One way to frame this problem is that even DeepMind, or any large lab, or the human race, has some limited amount of compute to put toward this problem, right? And there seems to have been a strong shift in how much of that compute should be spent at training time versus at inference time, with test-time search or some of the System 2 techniques you describe. What is your prediction for the mix of training compute versus inference-time compute, say, two or three years from now?
- Oriol Vinyals
What we're all aiming to discover is how to make the bitter lesson from Rich Sutton true, right? The bitter lesson says: use learning and search, the methods that scale, and that's all you have to do as a computer scientist. It's controversial. Simply said, I'm a deep learner at heart, so I don't disagree with it. Certainly the learning-scalability part has been tested and proven quite heavily recently. The search part less so, at least when you do not have access to a perfect reward, which is the current problem we have in language, and one of the key research areas: how do you assign reward to statements that even you and I might not agree are true? Is the sky blue? I don't know; at night it's not, right? It's quite an interesting point that assigning a one or a zero for truth, which is what games require, is not so applicable here. Now, historically, if you look at AlphaGo, which actually followed quite closely the recipe of pre-train your model on all human data, then use RL to make it better, then do some search at inference time, the compute there was heavily skewed toward the middle step, the reinforcement learning step. I don't know the exact numbers, but certainly the majority of compute was spent training the self-play, the self-improvement loop, as we called it at the time. If you look at today, that's clearly not the case: most compute is spent in pre-training, and in fact you will see over and over that you have to stop the reinforcement learning process early so you don't overfit to these imperfect reward functions. Otherwise you start doing a bit of adversarial search against a reward function that might be imperfect: you have a dataset of human preferences, and all of a sudden the model might discover, "Hey, you can output lots of emoticons," and the reward function thinks that's great, right? Clearly we have a problem of reward functions not being as accurate as in the game of Go or chess or whatnot. And then there's the third component: now that you have your model trained, how much do you let it ponder? In AlphaGo, again using the same example, the rules of Go say, don't quote me, but roughly that a game must last around four hours of compute time from the human perspective. So we had quite limited inference time, because we obviously couldn't go over time; there was parallelism and so on involved, but the compute there was certainly not as big as what you would use to train. So to me that balance feels correct. Some on pre-training, and here we're trying to learn every task, so let's say that can be as high as 50%, not over 90% like today. Then the rest mostly on reinforcement learning, or on whatever gets us access to good rewards; that would be the next big piece of compute, much bigger than it is today. And then inference time: System 2 is slow, but it's not terribly slow. Unless you ask the model to, I mean, solve protein folding, in which case you can probably go on vacation for a month and the model will come back with a solution, I think a few seconds of compute is okay for inference.
So that would be the rough split: inference is probably a small percentage compared to what you spend in training, although of course you're serving many queries for billions of people. And that means research is needed especially on the middle bucket, the reinforcement learning step, which currently still feels like it's in reasonably early stages of research.
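For concreteness, the split Oriol sketches can be written out as simple arithmetic. The percentages below are his rough, speculative figures (over 90% on pre-training today, versus something closer to 50/45/5 in the future), not measured numbers, and the total budget is a made-up placeholder:

```python
# Illustrative only: compute-budget splits implied by the discussion above.
TOTAL_FLOPS = 1e25  # hypothetical overall budget

splits = {
    "today":      {"pre-training": 0.90, "RL post-training": 0.09, "inference search": 0.01},
    "speculated": {"pre-training": 0.50, "RL post-training": 0.45, "inference search": 0.05},
}
for era, split in splits.items():
    print(era)
    for stage, frac in split.items():
        print(f"  {stage:18s} {frac:4.0%}  {frac * TOTAL_FLOPS:.1e} FLOPs")
```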
- Elad Gil
How do you think about scaling a reward function beyond games once you really hit superhuman performance of the model? I'm thinking of things like the older Med-PaLM 2 models, where they outperformed human physicians in terms of output, relative to physician expert panels. Obviously you could then do some post-training with physician experts in the loop. But at some point the machine will be better than that, so how do you keep scaling reward functions?
- Oriol Vinyals
Traditionally it's supervised learning, right? A reward function means good or bad, so we can scale that process the way we have so far. Obviously many players are scaling up the process of human annotation; in fact, deep learning took off thanks to the amazing effort by Fei-Fei Li and her lab to label a dataset of a million examples. So that way of scaling is one. But then I strongly believe there might be a bootstrapping effect, where the models become better at judging their own outputs. And that is probably even the main hypothesis of reinforcement learning as I see it; I'm not a huge expert in RL, but if checking that something is correct is easier than creating the solution, then we're in business, because the language models will be able to evaluate their own samples more accurately than they can generate them. And then we have a reinforcement learning loop, because we can reinforce the samples that seem more promising, and the model gets better, right? So that's using the model itself as a reward, which incidentally uses language, which is already fuzzy; it's one area I'm excited about. There's a leaderboard of reward models; the name they use is, I think, "generative reward model." I think that area goes beyond this need for task-specific annotations. We might still need task-specific annotations; then the question is how many labels we'll need, and the hope is that in the limit you need only as many labels as the user will provide the system when they want to teach it something new.
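A minimal sketch of the sample-and-judge bootstrapping loop described here, in the spirit of generative reward models: the model proposes several answers, the same model (via a judge prompt) scores them, and the preferred pairs become reinforcement data. `sample` and `score` are hypothetical stand-ins; real RLHF/RLAIF pipelines add far more machinery:

```python
import random

def sample(model, prompt: str) -> str:
    """Hypothetical generation call."""
    return f"<candidate answer to: {prompt}>"

def score(model, prompt: str, answer: str) -> float:
    """Hypothetical generative-reward-model call: the model judges the answer."""
    return random.random()  # stand-in for a learned judgment

def self_improvement_step(model, prompts: list[str], k: int = 8):
    """One round of 'verification is easier than generation' bootstrapping."""
    preferred = []
    for prompt in prompts:
        candidates = [sample(model, prompt) for _ in range(k)]
        best = max(candidates, key=lambda a: score(model, prompt, a))
        preferred.append((prompt, best))
    # In a real loop these pairs would be used to update the policy,
    # e.g. via rejection-sampling fine-tuning; here we just return them.
    return preferred
```

The loop only helps if the judge really is more reliable than the generator, which is exactly the hypothesis Oriol flags.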
- Elad Gil
Not to abuse it, but a friend of mine used to use the Nyquist-Shannon sampling theorem as a sort of proxy for how much an intelligent person or machine can actually extrapolate the intelligence of something smarter than itself. And it feels like you're almost falling into some version of that. I think the theorem basically states that you have some frequency on a wave, you're sampling it, and you need to sample above a certain rate to be able to actually reconstruct the wave, right? You could argue that's some form of learning or intelligence or something else. And so you need to be smart enough to actually tell how smart you can be, in some sense.
- Oriol Vinyals
Yeah, I love that analogy. In my undergrad degree I studied the Nyquist theorem quite heavily, which is funny-
- Elad Gil
Yeah.
- Oriol Vinyals
... because we broke it so much, right? Okay, sorry, an aside on Nyquist. What it says, roughly, is: if you want to reconstruct content at a certain resolution or frequency in the Fourier domain, you need to sample at at least twice that frequency; otherwise the information is simply not there. And you can see that if you take a sinusoid and sample it too sparsely, you will not recover the original frequency. But then look at these super-resolution generative models, right? You input a 32 by 32 pixel image, and it fills in the details in a way that completely violates that principle. I talked to my signal processing teachers about this, and of course it is violating the principle, and it is inventing; it's hallucinating. But of course it's cheating, because it turns out that the world has certain structure that you can learn, which is essentially what all these generative models of images do. Going back to your point, I agree. There's a bit of this argument about emergent properties, right? At some point the model might gain a capability, let's call it self-correction, and as soon as it hits that capability you can see how, wow, bootstrapping now is trivial. Of course it's not gonna be blanket across all domains, and it's not gonna be that dramatic; it's gonna be more like, "Hey, this one has now emerged," and so on. But once that capability is there, you need the algorithms to exploit it, especially these reward models that will be mostly driven by the model itself, right? And then how many labels, and how are you gonna correct the mistakes it might still make? Those are very interesting questions, and even from a product standpoint they're quite cool. We all play with these models; if one fails, you should be able to just say, "Look, I didn't like this," and the model should adapt. That's where long context also plays into the equation, right? You have long interactions and you finesse what the model is doing for you. That sounds good a priori, so how you enable those capabilities is one of the many exciting future directions
- 31:55 – 38:14
The Future of General Models and Specialization
- Oriol Vinyals
in the field.
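The Nyquist claim from the preceding exchange is easy to check numerically: sampling a 9 Hz sinusoid at 10 Hz (below the required 18 Hz) yields samples identical to those of a 1 Hz sinusoid, so the original frequency is unrecoverable, unless, like a super-resolution model, you bring in prior knowledge about which signals are plausible. A small demo:

```python
import numpy as np

fs = 10.0                      # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)    # one second of samples
f_true, f_alias = 9.0, 1.0     # alias pair: |9 - 10| = 1
x_true = np.cos(2 * np.pi * f_true * t)
x_alias = np.cos(2 * np.pi * f_alias * t)
print(np.allclose(x_true, x_alias))  # True: the sample sets are identical
```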
- Sarah Guo
We have this progression of the field from general algorithms to increasingly general models; we'll see how far we get in that vein. DeepMind has also done really amazing work in particular domains, right? Protein folding, materials science, whatever. Are we fully in the era of general models? Does starting with language and going to audio, video, and so on solve the rest for us, or what other datasets or domains do you think are unique and not well represented in that corpus?
- Oriol Vinyals
The way you explained it is perfect, but I guess there's a question of time, and also of level of performance, here. Could current models do a pseudo-reasonable job at folding proteins? Perhaps, right? They might even use tools on the internet, download a piece of software, and figure it out. One way I characterize general models is, I mean, the exact level of performance is irrelevant, but say they're 20% good at everything, okay? That's the level of performance they reach, but it's general, which is powerful, so you want that 20% to keep going up relatively uniformly across the board. So you don't wanna over-specialize the models or the research. But the world has very important challenges that are worth solving with specialization, right? What I tell people on the team and around me is: look, if the problem is worth it, then let's obviously specialize. We can, and we might see more and more bootstrapping from Gemini and these models toward a specific solution, where that specialized model is in a sense a throwaway, except that it, you know, maybe cracks protein folding, or figures out nuclear fusion, or models the weather, right? Some of the projects currently active in the DeepMind portfolio. You'd better choose the ones that will matter, because the timeline is such that we'll get there sooner if we specialize in those domains; perhaps eventually the general model will overtake, but that's probably further away. So when it's worth it, do it. I think we're still gonna see this hybrid mode, although more and more it's taking Gemini and then doing something amazing in a narrower domain, and to me that seems like a good situation to be in, because that directionality of taking a generalist model and specializing it by fine-tuning has a loop back as well. You do something amazing, let's say in math, as we recently did, and then the data, or that model acting as a reward model, or something it produces, will help the main model get better as well. So it's quite a synergistic thing to do. But if it's important for the world, I'm okay with task specialization, that's for sure.
- Sarah Guo
There is a vein of criticism of math and computer science, as with games or any other constrained domain like this, and tell me if you think this is a really niche view, that it's a dead end versus general reasoning advancement, even though it has all these attractive attributes, like the ability to generate and self-validate in many ways. How do you react to that criticism?
- Oriol Vinyals
There's some validity to it. Again, going back to the reward function question, right? You wanna create the most general model, or agent, or intelligence. It turns out that reward is never perfectly defined by the environment. Even if you say, no, no, surviving is the reward... look, it is extremely complicated to compute a reward. So you could argue: when you do get these rewards from somewhat artificial, interesting but artificial, domains, that might not generalize to the real problem, and in that sense it could be a dead end. Then how you do the research is important. Let's use math as an example. We do have access to the reward, but even there it's not that simple, right? If I'm doing a simple calculation, okay, four plus four equals eight; that I can check, simple. But now you start thinking, "Well, prove this theorem." The proof is either correct or not, but that starts to be more complex as a crisp reward signal, unless you can formalize it, and then there might be bugs in the formalization process, or maybe in the engine that checks the math underneath. And even then, if I ask "four plus four," and instead of saying "eight" the answer is "four plus four equals eight," is that correct or not, right? How do you check correctness? You start to interweave language explanations. So I think by trying to solve these problems not strictly as "the reward is sharp and crisp: did you win or did you not?" but by saying "the reward model just needs to understand what's correct or roughly correct, look at what you wrote, and assess," you might start generalizing and moving away from what would be this niche world where reward is perfectly defined, toward "what's true and what's not?" That's quite complicated, right? So in that sense, depending on how you attack the problem, it's definitely not a dead end. And because it's so hard to have a perfect reward once you interweave language, I think even by accident the field will move forward as we get to discover these more general reward functions and train them better, and they will themselves push the models, and hopefully some sort of self-improvement loop will appear, more than it has so
- 38:14 – 43:09
Reflections on AGI and Personal Insights
- Oriol Vinyals
far.
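A toy contrast between the two reward regimes in the "four plus four" example above: a crisp checker that only accepts the literal value, and a fuzzier one that accepts any phrasing entailing the right answer. The fuzzy version here is a deliberately naive stand-in for what would be a learned judge model in practice:

```python
import re

def crisp_reward(answer: str, target: float) -> float:
    """1.0 iff the answer is exactly the target number."""
    try:
        return float(float(answer.strip()) == target)
    except ValueError:
        return 0.0

def fuzzy_reward(answer: str, target: float) -> float:
    """Accept any phrasing that contains the target value; a crude
    stand-in for a learned judge that understands 'roughly correct'."""
    numbers = [float(n) for n in re.findall(r"-?\d+\.?\d*", answer)]
    words = {"eight": 8.0, "four": 4.0}  # tiny hypothetical lexicon
    numbers += [v for w, v in words.items() if w in answer.lower()]
    return float(target in numbers)

print(crisp_reward("8", 8.0))                            # 1.0
print(crisp_reward("four plus four equals eight", 8.0))  # 0.0: crisp misses it
print(fuzzy_reward("four plus four equals eight", 8.0))  # 1.0
```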
- Sarah Guo
I think you already said you feel AGI is within grasp in the next-
- Oriol Vinyals
Mm-hmm.
- Sarah Guo
... in the next few years. What is your most contrarian take, or most contrarian, under-discussed take, on this or on AI in general?
- Oriol Vinyals
I sort of didn't like the term AGI too much, which is funny because Shane Legg, our co-founder, had a lot of incredible foresight, right? I just had a discussion with him about AGI timelines recently. In 2009 he predicted 2028, and, depending on how you take the definition, how strict it is, and what the test is, that claim from quite a long time ago still seems about right; I think Metaculus and so on suggest the world roughly agrees with that prediction. His prediction is maybe a bit more optimistic than the world estimate, which is at 2030 or something. The contrarian view would be: I'm not sure it matters that we achieve AGI. It might not look like, hey, it's exactly the cognitive tasks we can do and then we reach parity. It's gonna be a distribution of things these models can do or can't do, and that to me feels like what's still worth pursuing, right? You see plenty of examples, as I was saying earlier, of the model cracking an impossible puzzle in math and then somehow trivially contradicting itself. So I think we need to be ready not to be too fixated on AGI. We should still fix the most egregious errors, and that's very important, because they point at something profound that is wrong in the models, but it's probably not the exact goal. It's more of a distribution than a single point where before it wasn't AGI and now it is. And honestly, it's gonna be impossible to get agreement, so it's gonna be quite a cloud of moments at which different people feel it has happened. But it doesn't matter, because the models might be used for amazing things: products, and research itself. Bootstrapping research and science is one of the things we're excited about; Google DeepMind's mission has science very much present in the mission statement, and that's what motivates at least myself. So I don't really care, maybe, about building AGI in the strict definition, but I understand it's a good goal, and I appreciate having a single number to aim for, though it's gonna be hard to agree with everyone, as usual. (laughs)
- Sarah Guo
Maybe one last one for you. I'm guessing no, because it sounds like the mission continues quite a bit beyond that, but do you live your life any differently believing in 2028?
- Oriol Vinyals
I was reflecting on, I guess, smartphones, right? With kids, how do you present them with the option of smartphones? We don't have that many samples or data points, and obviously that question is almost obsolete now; there are these new technologies. My kids are young, so I don't yet get to the "oh, you can try Gemini, ChatGPT, whatnot" stage. But I do think about that, in the more human, being-a-dad sense. And I think you adapt, sort of the way you do with scaling up, right? Scaling up is very important as you progress in your career: you go from individual contributor, writing code, to helping others figure out what their path is. And there's a huge opportunity to scale yourself up thanks to this technology. So I've of course been trying to use these models to figure out what has happened in the endless chat rooms I'm in. I'm in London, so when I wake up, California has given me a lot of tokens and long context to process. So personally you also try to figure out how best to use the technology to scale yourself. And I've talked to quite a few people who are not that much into technology, and all I say is, "Look, try to figure out how you could collaborate with it or use it as a tool," because that change is definitely coming, AGI or not. And I might self-apply that advice. But from an intelligence-creation standpoint, which is having kids, that is much more complicated. Sadly, it's a bit zero-shot learning. So we'll see; I'll tell you how that's going on the next podcast, probably.
- Elad Gil
Can
- 43:09 – 45:37
Will the Next Generation Study Computer Science?
- Elad Gil
I ask you one more question? If you were to give advice to somebody with kids today, what do you think their children should study? Say they go to college in 10 years: what should they major in, or what should they be doing, to prepare for the future world?
- Oriol Vinyals
Honestly, from a studying perspective, there's always the passion element that can't be engineered away. In the early days, many of us did deep learning not because it was the thing to do; it was just what we liked to do. And I feel personally I cannot give advice that says "find the top professions and choose based on that." So I'll modulate my answer: definitely find the aspect that is your true passion, or someone you admire, role models, et cetera. Now, what I will say, and I've been saying this for quite a few years, is: project how that thing changes with AI, and exploit that. Of course, not everyone will understand the basics of the model technology, but to be honest, as I was saying, language is the driver, so it's not that hard to understand roughly how this works. I was talking to my sister, who is a teacher, and she just used it to create a summary of the kids' homework, uploading all the homework. That way of thinking needs to go into quite a few professions. So first order of business: find your passion and follow it. Second, really embrace some of the tools that are present today. And of course, if you're into technology and computer science, another very fruitful direction is to find the corners of the space that AI hasn't gotten into. Maybe you'll train a specialized model, because, again, it might be worth doing; there are still plenty of opportunities. I was just chatting with someone about climate modeling. Weather modeling is kind of cracked with deep learning, but climate is quite different: we only have one sample, which is one planet, so it's pretty tricky. Think about what areas could be enhanced like that. And otherwise, if you like the technology, I think there's quite a lot of LLM research still to be done, for five or ten years at least. So that's probably still a worthwhile way to investigate,
- 45:37 – 46:08
Closing thoughts
- Oriol Vinyals
and if you go into research, there's definitely lots to do.
- Narrator
Well, thanks for doing this, Oriol. It was a great conversation.
- Elad Gil
Yeah, thanks for joining.
- Oriol Vinyals
Yeah, likewise. Great questions, and hopefully next time I'm in the Bay we can do one where we're in 3D.
- Narrator
(instrumental music) Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.