No PriorsNo Priors Ep. 120 | With Google DeepMind’s Pushmeet Kohli and Matej Balog
EVERY SPOKEN WORD
75 min read · 14,828 words
- 0:00 – 0:48
Pushmeet Kohli and Matej Balog Introduction
- SGSarah Guo
Hi, listeners, and welcome back to No Priors. Today, we're joined by two of the key folks behind one of the most compelling developments in AI this year, AlphaEvolve. Pushmeet Kohli and Matej Balog worked on this autonomous coding agent that uses Gemini models and evolutionary search to discover new algorithms. It marks a major leap in AI's ability to contribute to core computer science and math, and perhaps sciences beyond that. It's not just a stochastic parrot or a boilerplate generator; it has shown what you might consider technical creativity, in the way that Move 37 did with AlphaGo, something humans hadn't done before, even in thousands of years of play. It might even be a real step on the path to self-improving AI. Pushmeet, Matej, thank you so much for being here.
- MBMatej Balog
Thank you for having us.
- PKPushmeet Kohli
It's
- 0:48 – 2:31
Origin of AlphaEvolve
- PKPushmeet Kohli
a pleasure.
- SGSarah Guo
Congratulations on the success and the launch of AlphaEvolve. Can you give me a brief description of- of what it is, broadly?
- MBMatej Balog
Yeah, so in maybe one sentence, AlphaEvolve is an AI coding agent that is able to discover new algorithms that make new discoveries on open scientific problems, and at the same time, those algorithms can be so practical that they are already deployed in key parts of Google's own infrastructure.
- SGSarah Guo
What is the origin story of working on this particular form of coding agent, or this problem statement?
- MBMatej Balog
So we are not new to this space of algorithm discovery. As you might know, the mission of all of DeepMind is to build AI responsibly to benefit humanity, and the way our particular team has been doing it for years now is to look for ways that AI can discover new algorithms. New algorithms are everywhere around us, so this is a very important question and can have very high impact when we can discover algorithms that solve important computational problems with higher efficiency than what we have been able to achieve so far. The first breakthrough we had in this space was in 2022, when we released a system called AlphaTensor. That was an AI system using reinforcement learning that, for a very specific but fundamental computational task, multiplying matrices, showed for the first time that AI agents can discover better algorithms than what humans had been able to do before them. So this was the first system that gave weight to this idea that indeed, with AI, we'll be able to go into the superhuman region of algorithms that we as humans have not been able to discover ourselves.
- 2:31 – 8:02
AlphaEvolve’s Progression from AlphaGo and AlphaTensor
- SGSarah Guo
How do you differentiate, um, AlphaEvolve from, like, AlphaTensor and FunSearch and some other, um, projects in the sort of lineage of this?
- PKPushmeet Kohli
One way to describe what we have done is to look back at the history of DeepMind, at the projects that came even before we started working on computer science. If you go back to AlphaGo, where the AlphaGo agent was able to beat the world champion in the game of Go, the remarkable thing about that agent was that it was able to explore the amazingly large search space of all possible Go positions so efficiently that it could come up with the optimal move at any given time. And it really surprised people, both Go professionals and scientists. Scientists believed that that event would come much, much later, because it was a very hard problem. What that gave evidence for is the ability of these large-scale, neural-network-based systems to reason, to do very efficient exploration in these large search spaces, and to come up with amazing new insights about a particular domain. In the game of Go, there is this move called Move 37, a very creative new move that the agent discovered, that was not in the Go literature, and that really surprised the Go professionals. So in some sense, we asked ourselves the question: if you have an agent which can do very efficient search in the domain of Go, why can't you use the same kind of philosophy to search in the space of algorithms? That was, in fact, the underlying basis of our first attempt at that problem, which culminated in AlphaTensor. So, how we structured the algorithmic discovery problem is that we first looked at a very important problem, and that problem was matrix multiplication.
It is a problem that is ubiquitous in computer science. It's one of the key fundamental operators that underlies not only computer science, but also neural networks, machine learning, and AI. We said, "Can we find a way to improve matrix multiplication algorithms?" There's a history of matrix multiplication which is very interesting for people who might be curious about it. Even though it's such a fundamental operator, people thought that the complexity, the time it takes to multiply two matrices, is O(n³).
- SGSarah Guo
Mm-hmm.
- PKPushmeet Kohli
And around 50 years back, more than 50 years back now, a German mathematician, Strassen, came up with this very counterintuitive construction which showed that, in fact, the complexity is not n to the power 3, not cubic, where n is the dimensionality of the matrix. It's lower. That was a very counterintuitive result, and it stood for more than 50 years, until AlphaTensor came along and we said, "Well, can we actually improve this result?" And remarkably, we were able to show that. AlphaTensor, by having this amazing ability to search in this very large space, even much larger than the space of possible Go moves, was able to come up with an amazing new algorithm which improved things. But then the question was: well, we have now proved the thesis that you can have these super-intelligent agents which can go beyond what human computer scientists have been able to do, but can we generalize them?
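For readers who want to see the construction being described, here is Strassen's original two-by-two scheme written out as a small Python function. It uses seven multiplications where the schoolbook method needs eight; applied recursively to the blocks of a larger matrix, that saving is what drops the exponent below 3, to log₂7 ≈ 2.81. (This sketch is for illustration; it is not code from AlphaTensor or AlphaEvolve.)

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    (the naive algorithm needs 8). Matrices are [[x11, x12], [x21, x22]]."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # Strassen's seven products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine into the four entries of the product matrix.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]
```

Because the entries can themselves be matrix blocks, the same seven-product recipe applies recursively, which is where the sub-cubic running time comes from.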
- SGSarah Guo
Mm-hmm.
- PKPushmeet Kohli
AlphaTensor was very smart, but it was purposefully constructed only for the matrix multiplication problem. Can we build an agent that is more general? More general both in the sense that it can handle more general problems, and in that it can search more naturally, in the space of programs rather than in the space of the very specific operations that were required for matrix multiplication. That was the origin of our first attempt at this, FunSearch, which was an LLM-based agent which, for the first time, by searching in the space of programs, showed that you can come up with completely new solutions, and made the first scientific discovery from an LLM. And AlphaEvolve is basically an extension
- 8:02 – 11:18
The Open Problem of Matrix Multiplication Efficiency
- PKPushmeet Kohli
of that.
- SGSarah Guo
I'm very inspired by the idea, I think as many people are, that AI will actually have creativity, does actually have technical creativity, as you were describing, as one way to conceptualize this, where you're outside of the patterns that we already know as engineers. I want to go back to some of the mechanics here, and the limits to generalization, and how to think about automated evaluators, and a lot of different topics. But when you think about these problems that are clearly economically valuable and interesting, like matrix multiplication and its potential efficiency, what is your intuition for why those solutions have not been found before? Is it simply that the search space is too large, or were people in this field complacent, in that they believed a certain solution was the maximum efficiency? Because clearly there's value to be had here.
- PKPushmeet Kohli
My opinion on this is basically that if you look at the structure of the algorithm that Strassen produced, it was quite ingenious. It was not a natural thing that you would think of, and that was for only two-by-two matrices. As you go to larger sizes, the space is so huge, and the constructions are not something very natural. These are very involved and intricate constructions that would be very hard to discover by chance.
- SGSarah Guo
Mm-hmm.
- PKPushmeet Kohli
So it's quite interesting that it has this special structure, but it's not something that comes naturally to a human computer scientist.
- MBMatej Balog
Just to add to that. So I definitely agree: the search space is just unbelievably vast, and the solutions are maybe non-intuitive. The third thing I want to emphasize is that I really believe the people who worked on this in the past were definitely not complacent. In fact, the problems we chose to apply AlphaEvolve to in the first instance, both on the scientific side and the practical side, were deliberately chosen as problems which have been worked on for a very long time by the very best people. On the scientific side, since we're talking about matrix multiplication, this has been a known open problem for decades, and many people have been working on it. And similarly for the practical applications that we mentioned in our AlphaEvolve release, in key parts of Google's infrastructure: again, these are things that have been heavily optimized inside Google because they are so important. So having a system like AlphaEvolve, or any other, discover something new on these problems is, I think, as strong a demonstration as I can imagine of the fact that this is indeed something new, because no one found it before, and also something that was not easy to discover, because those results stood for such a long time and have been worked on by such strong people.
- SGSarah Guo
Noted that this is not a comment on the broad efforts of the computer science industry to date on matrix multiplication or data center optimization.
- 11:18 – 14:43
How AlphaEvolve Evolves Code
- SGSarah Guo
I think this is a good moment to try to demystify what's happening under the hood for a broader set of people. Can you walk us through a concrete example of how AlphaEvolve actually evolves code? Let's take the example of trying to optimize data center scheduling. What does the step-by-step process look like, from initial random code to a final solution that saves millions of dollars of power?
- MBMatej Balog
I can walk you through that. So the user of a system like AlphaEvolve basically specifies what is the problem that they're trying to solve. That's the most important thing. And you specify it by providing what is called an evaluation function. What this function does is, whenever there is a proposed solution for the problem, it tells you how good this solution is. So you basically define what makes a good solution. For discovering an algorithm for scheduling jobs on a data center, this evaluation function could be something like a simulator of jobs in a data center: given an algorithm for doing the scheduling, it simulates how good this algorithm is. So that's what the user provides.
- SGSarah Guo
And this is a simulator you already had?
- MBMatej Balog
Yes, that's a simulator that we already had. And I would say it's something that is quite natural to have in many domains, because whenever you want to innovate on something, you need to have a way of telling whether the innovation is actually good or not. So it's-
- SGSarah Guo
Okay.
- MBMatej Balog
... a very natural object to have, at least in principle. So you define the what by providing the evaluation function, and then AlphaEvolve fills in the how. That's the job of our system. And you can do it in two fairly different ways. One is you tell AlphaEvolve, "I have no idea how to solve this problem. Let's start completely from scratch, try to be creative, and come up with something completely new." That's one option you can take. The other option is, "Actually, we have already worked on this problem for a really long time. Here is a very strong initial solution that we can provide to the system, and you can start from here." And that's what we did for the application of discovering new algorithms for scheduling jobs in a data center. So AlphaEvolve takes this initial solution and then, on a high level, it combines the creative power of large language models, which propose creative new ways to improve that solution, with the strictness of the evaluation function provided by the user, which is able to filter out the things that work from the ones that don't. And then this is wrapped inside an evolutionary algorithm that makes sure we explore the whole space of algorithms in that region, so that we don't commit to a very specific type of solution early on, but instead maintain a diverse pool of potential solutions. Over time, maybe we combine ideas from different solutions that are already strong, until we actually have an algorithm that's so strong that we're happy to deploy it to a critical part of Google's infrastructure, let's say.
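The loop just described, propose with a creative mutation step, filter with the user's evaluation function, and keep a pool of strong candidates, can be sketched in a few lines. This is only a toy illustration, not AlphaEvolve's actual implementation: `mutate` stands in for the LLM proposal step, `evaluate` is a stand-in for the user's simulator, and every name and parameter here is invented for the example.

```python
import random

def evaluate(candidate):
    """User-supplied evaluation function: a toy stand-in for something
    like a data-center simulator. Higher is better."""
    return -sum((x - 3.14) ** 2 for x in candidate)

def mutate(parent):
    """Stand-in for the LLM proposal step: in AlphaEvolve a Gemini model
    rewrites code; here it is just a small random edit."""
    child = list(parent)
    i = random.randrange(len(child))
    child[i] += random.gauss(0, 0.5)
    return child

def crossover(p, q):
    """Combine ideas from two strong parents."""
    return [random.choice(pair) for pair in zip(p, q)]

def evolve(initial, generations=200, pop_size=20):
    # Maintain a diverse pool rather than committing to one solution.
    population = [initial]
    for _ in range(generations):
        # The evaluation function filters what works from what doesn't.
        parents = sorted(population, key=evaluate, reverse=True)[:pop_size]
        children = [mutate(random.choice(parents)) for _ in range(pop_size)]
        if len(parents) >= 2:
            children.append(crossover(*random.sample(parents, 2)))
        population = parents + children
    return max(population, key=evaluate)

best = evolve(initial=[0.0, 0.0, 0.0])
```

Because the top candidates are carried into every generation, the best solution found never gets worse over time, which is the "each generation gets stronger" property discussed next.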
- SGSarah Guo
And intuitively, um, uh, not in the machine learning sense but in the evolution sense, you have different generations where you're getting closer to an optimal solution.
- MBMatej Balog
Yeah, that's right. You would expect that in each iteration of evolution, you're looking at the previous iteration, looking at maybe the strongest solutions you have, and then trying to be creative about how to combine ideas from those solutions, or maybe bring in completely new ideas, to come up with something even better. And so each generation gets stronger and stronger.
- 14:43 – 16:52
Scaling and Predicting Iterations
- SGSarah Guo
How much scaling are we talking about? Like, is there a way to predict how many generations it takes, or how do you constrain the number of iterations that the model can use?
- MBMatej Balog
So there are two parts to your question. One is about how scaling works, and the other is how you can predict it. For the first part, this is actually a really nice feature of AlphaEvolve: it can adapt to the difficulty of the problem. If you ask AlphaEvolve to find a solution to a problem that's actually unexpectedly easy, then it will just do it very, very quickly; almost immediately, you will have the solution. But if you ask it a problem that's really, really difficult, and by really, really difficult I mean, like, really difficult, maybe an open question that has stood for decades in the sciences, or you want a practical algorithm for a really high-value application in Google, then you would, of course, expect this is not an easy problem. You might need to spend a longer time considering different solutions, exploring the space, combining ideas. But what's really nice about AlphaEvolve is that it is able to sustain this scaling in a way that it keeps improving over time, and it keeps improving for so long that you can make discoveries at this level of difficulty, like breaking decades-old scientific challenges or discovering high-value algorithms. Now, I know it maybe sounds trivial that if you wait longer, you'll get better results. But in practice, that's actually-
- SGSarah Guo
(laughs) .
- MBMatej Balog
... a really difficult thing, to build automated agents that are able to sustain this continual improvement without plateauing quite early. This is, I think, a nice feature. There was a second part to the question, about predicting how many iterations you will need. That is something that is actually not so easy, because it's like asking a priori, "Do you know how difficult this question is going to be?" And especially in the sciences, that's something that often has a very surprising answer: very trivial questions can turn out to be extremely difficult, and vice versa. But the nice thing is that you have continual improvement if you run this system. And as long as you can run it, you can expect to get better and better results, and you just have to see where this gets
- 16:52 – 19:42
Implications for Coding Agents
- MBMatej Balog
you.
- SGSarah Guo
If you think about the coding agents that general developers have access to and are increasingly using today, one frustration with them is that on relatively trivial problems an agent is set out to work autonomously and will get lost, and blow itself up or plateau, as you said, in frustrating ways. Can you talk about whether you think there are implications from AlphaEvolve for these other general coding agents?
- PKPushmeet Kohli
While large language models and coding agents are getting much better in their understanding of code, they're not perfect, right? So they do make mistakes. The other element is to think about what task these agents have been assigned. Mostly, if you are asking an agent to solve a particular task or write a particular program, you are providing a specification.
- SGSarah Guo
Mm-hmm.
- PKPushmeet Kohli
You are specifying the task either in natural language or you're saying, "Well, I'm trying to do something." Right? So it's not a complete characterization of what you want. It's a partial specification of what you want, and the agents then try to solve the problem, and might get lucky and get the right result, or they might hallucinate and get the wrong result. And the issue is, how do you know whether the result is right or wrong? That depends on having a good evaluator. That's how AlphaEvolve solves the problem. So in some sense, we are able to leverage the hallucinations for a beneficial purpose, right? The creative and wrong answers that AlphaEvolve can somehow come up with: how do we know that they're wrong? They might be very good; we just don't see them in that way. Which is why the role of the evaluator is really important. And how we even do the evaluation is very important, because when you come up with a new idea, should you try to explore that idea much further? How deep should you go into stress testing that idea? Should you try that idea out on a few different instances, or a thousand different instances, or really stress test that the idea actually works for the whole thing? This is one of the interesting parts of AlphaEvolve: getting that balance right is really important, so that you can look at where the creative solutions are, and how you can filter out the ones that are promising-
- SGSarah Guo
Mm-hmm.
- PKPushmeet Kohli
... and then use them later to refine the search process to get the final solution?
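The few-instances-versus-a-thousand trade-off described here can be pictured as a staged evaluator: a cheap screen on a handful of instances, with the full stress test reserved only for candidates that survive it. This is a hypothetical sketch, not AlphaEvolve's actual evaluation code; `run`, the case lists, and the threshold are all invented for illustration.

```python
def cascaded_evaluate(candidate, cheap_cases, full_cases, run, threshold):
    """Spend little compute on most proposals and a full stress test only
    on promising ones. `run(candidate, case)` returns a score in [0, 1]."""
    # Stage 1: quick screen on a handful of instances.
    quick = sum(run(candidate, c) for c in cheap_cases) / len(cheap_cases)
    if quick < threshold:
        return quick, False          # filtered out early, cheaply
    # Stage 2: full stress test on the whole instance set.
    full = sum(run(candidate, c) for c in full_cases) / len(full_cases)
    return full, True

# Toy usage: `run` checks whether a candidate function doubles its input.
run = lambda cand, case: 1.0 if cand(case) == 2 * case else 0.0
score, survived = cascaded_evaluate(lambda x: 2 * x, [1, 2, 3],
                                    list(range(50)), run, threshold=0.5)
```

Most hallucinated candidates are rejected after only a few cheap instances, so the expensive full evaluation is spent almost entirely on promising proposals.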
- 19:42 – 25:21
Overcoming Limits of Automated Evaluators
- SGSarah Guo
If evaluation functions, automated evaluators, are really such a limiting constraint here in terms of what we can get agents to do, do you have any intuition from this project or others on how to overcome that? Like, can models get good at helping us create automated evaluators? Should we imagine simulators that are better for lots of different domains? If I'm, you know, a lame product manager putting an incomplete natural-language spec into a coding agent, should I work with an assistant to complete that spec? Do I use traces? How do you think that gets solved?
- MBMatej Balog
That's a really great question, and I think you can view it from two perspectives that will happen at the same time. One is that, yes, currently the strict evaluation function plays a key role in AlphaEvolve, and one takeaway you can take from this, thinking about the future, is that it shows the really high value of having these evaluators available. Because in many cases, you might have a really important problem but not a very precise definition of what makes for a good solution. And one takeaway you can have from a system like this is that if you actually do build a very precise evaluation function, then this unlocks the possibility of having an agent like AlphaEvolve discover something that's way beyond what, let's say, humans or your best developers have been able to discover. So that's one takeaway. But the other takeaway, that I'm maybe even more excited about from the research perspective, is that we don't actually think this is a conceptual limitation.
- SGSarah Guo
Huh.
- MBMatej Balog
So, today, this was maybe the easiest way to get into this game of discovering new things: by looking at problems that already come with these very precise evaluation functions. That's just a natural first step to take. But I do believe that this assumption can be relaxed in very significant ways. In particular, you already mentioned one example, where maybe language models themselves will be able to evaluate whether proposed solutions look promising or not, or whether they fail in some particular ways. And indeed, there is parallel work from DeepMind called AI CoScientist which demonstrates this very clearly: if you propose ideas in natural language, then you can get language models to provide meaningful critiques and separate the ideas that work from the ones that don't. So I really do see a lot of hope for relaxing this assumption. And even in between these two extremes, of strict evaluation that exactly tells you how good a solution is on one end, and natural-language evaluation by a language model on the other, there is a continuous spectrum of simulators and auxiliary evaluation functions, which are maybe not perfect, but as long as they are correlated with the true signal, we can build the algorithmic scaffolding of the evolutionary algorithm around them in such a way that we still make meaningful progress. Maybe it will take a few more iterations, but we can still go really, really far.
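The point about imperfect-but-correlated evaluators can be demonstrated with a toy experiment: select candidates using only a noisy proxy of the true objective, and the true objective still improves. Again, this is an illustrative sketch with invented functions, not anything from the AlphaEvolve system.

```python
import random

random.seed(0)  # for reproducibility of this sketch

def true_score(x):
    """Ground truth the search never gets to query directly."""
    return -(x - 10.0) ** 2

def noisy_proxy(x, noise=2.0):
    """Imperfect evaluator: correlated with the truth but not equal to it."""
    return true_score(x) + random.gauss(0, noise)

def search_with_proxy(x0, steps=500):
    best = x0
    for _ in range(steps):
        candidate = best + random.gauss(0, 0.5)
        # Selection consults only the noisy proxy, never the truth.
        if noisy_proxy(candidate) > noisy_proxy(best):
            best = candidate
    return best

found = search_with_proxy(0.0)
```

Even though individual comparisons are sometimes wrong, the correlation with the true signal biases the walk toward genuinely better solutions, which is exactly the property that makes approximate simulators usable inside the evolutionary loop.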
- PKPushmeet Kohli
Just to add to what Matej mentioned, I think one of the takeaways is basically that LLM-based agents like AlphaEvolve, especially when we structure them in this way with population-based search, with evolutionary approaches, are extremely effective in searching. They can search very convincingly and very effectively in very large spaces, and come up with very counterintuitive new solutions for important problems, problems that we have studied for many, many years, and in some cases decades. So that's one. The other element is the evaluator. As Matej mentioned, there is work on using other sources for evaluation, so you don't have the perfect evaluator. Even for AlphaEvolve, even if you have a simulator, that's not a perfect evaluator, right? Because you are going to evaluate things on a specific distribution of problem instances. You might want to prove certain properties of the solution; you might want to say that the solution always has certain performance. And proving certain properties of the solution might require other work: you might have to have a proof agent which tries to prove certain properties of the solution. On the other hand, you have these LLM-based evaluators which can look at the solution, where nobody has built a simulator, but they can just make a guess at how good that solution is. And in fact, that approach also works very well. We have shown that with AI CoScientist, which we have used for hypothesis generation. It basically uses a multi-agent setup, where LLMs themselves are able to figure out that certain hypotheses are better in terms of novelty and significance and impact and should be propagated, right?
And that whole process ends up, and this might be surprising and counterintuitive to some, producing much, much better results than the base large language model, right? So you are really able to discover new information beyond what the large language model itself alone was able
- 25:21 – 28:10
Are We At Self-Improving AI?
- PKPushmeet Kohli
to produce.
- SGSarah Guo
That begs the question, which I think is one of the biggest meta questions posed by this sort of work: do we get self-improving AI? One of the things you demonstrated with AlphaEvolve is that you can optimize the systems used to train AlphaEvolve, right? You have this 23% speedup in part of the training infrastructure, if I recall correctly. Are we now witnessing the early stages of recursive self-improvement in AI, and what do you think the implications are if that's true?
- PKPushmeet Kohli
I think in some senses, yes, but at the moment what we have seen is basically improvements in computation time. What AlphaEvolve has been able to do is make training more efficient. But you can ask the question: can you improve the training process such that the underlying model is not only trained faster, but is actually fundamentally better-
- SGSarah Guo
Hmm.
- PKPushmeet Kohli
... in certain cognitive tasks? And that is something that still has to be validated, right? But it is a direction that is definitely very appealing, and something that is being actively looked at by many people.
- SGSarah Guo
Do you have a reason to believe it won't work?
- PKPushmeet Kohli
Well, it should work, but as we mentioned, having good evaluators is an important element, right? You need an evaluator which can say, "This proposal that you have just suggested for improving the training process will yield a good result." If you have that kind of evaluator, then it will work. There is no reason why such an evaluator cannot exist, but we need to work on building such evaluation functions.
- MBMatej Balog
Maybe just one thing to add to that: I would also agree that we are maybe seeing the first signs of self-improvement, but one also needs to be very specific about what we have shown so far. As Pushmeet mentioned, it's speeding up the training of the next generation of the Gemini model. So the feedback loop is fairly long, at least currently, maybe on the order of months. But you can call it self-improvement, for sure. Maybe the big question that many people are curious about is: how does this extrapolate into the future? You can have different types of self-improvement. One is where you get maybe just a one-off benefit: the model improves itself once, and that's it. Another is that the model keeps improving itself continuously, but maybe the improvements get marginally smaller and smaller, and you converge to some limit. Or maybe the improvements will keep accumulating, up and up and up. And that's a big open question that we don't have an answer to today.
- 28:10 – 31:50
Effects on Scientific Discovery and Mathematics
- SGSarah Guo
Let's take that projection to other fields, and obviously these are all interrelated. But, Matej, one of the things you're really excited about is how AI applies to the sciences. When you think about new mathematical constructions, improved solutions to open problems, or problems that looked solved to humanity 50 years ago, what do you think the implication is in different fields? Is it a fundamental shift in how scientific discovery or mathematics gets done?
- MBMatej Balog
First of all, yes, I'm super excited to be working in this area of using AI to accelerate the sciences, because in a way it's the most exciting application of AI that I can imagine. What could be more valuable or exciting than advancing the frontiers of human knowledge? So yes, that is definitely there. And then, of course, in different fields of science, the speed of progress or the advance you get from AI might be slightly different. In AlphaEvolve we've primarily focused on mathematics and computer science, because these are the domains where it's easiest to get these automated evaluation functions; you often get them basically for free. That's not to say you cannot get them in other branches of science, but in maths and computer science they're just most common. If you think about biology or chemistry, where you want to design a molecule, then you can have an evaluation function again in the form of a simulator or a predictive model that, given a candidate molecule, makes a meaningful prediction about whether this is actually going to work in practice. And if you are in this regime, then again AlphaEvolve would be applicable. And we are only talking about the version of AlphaEvolve that we have built today, and the problems that we can address today. We don't think the journey of AlphaEvolve finishes here. We have many ideas about how to make this system more powerful and more broadly applicable, and I'm fairly confident that we'll see many applications across many branches of science. And this is only talking about AlphaEvolve. There are many other agents; Pushmeet mentioned AI CoScientist, and many others that I'm sure will keep transforming how science is being done across the whole spectrum.
- PKPushmeet Kohli
Yeah. I think broadly, if you look at it, a lot of science involves searching, right? Searching for the right idea, searching for the right construction, searching for the right solution, the right drug candidate, and so on. And in some sense, what scientists have been trying to do is somehow make that process repeatable. At the moment, there is still an element of serendipity to some of the discoveries. But as we move towards rational material discovery or rational drug discovery, we are seeing computational approaches and very systematic evaluations playing a much more important role in many areas of science. And I think as that work propagates, you will have systems like AlphaEvolve which will be able to search in those spaces and use these evaluations much more effectively. So you can see this as a tool that will give scientists a superpower in their ability to search over very complex, and sometimes counterintuitive, solution spaces.
- 31:50 – 38:30
Role of Human Scientists with AlphaEvolve
- SGSarah Guo
When I think about one logical extension to this approach, it is automated evaluation in the real world, right? So a lab, assays, a bunch of robotic arms doing experimentation if you're screening molecules or something. Very near term, if that vision is true, what do you think the role of the human scientist or engineer is? Is it the problem framing, like determining the evaluation? Is it constraining the search, giving some intuition for a starting point or a search space? What should the human scientist be good at from here?
- PKPushmeet Kohli
There are many elements, right? First of all, as we have been talking about a lot, the role of the evaluation function: that needs to be defined. How do we want to assess these solutions? But then there are many other elements as well. When we are trying to find a solution, it has to have certain properties. What are those properties? Giving hints. For example, if you're trying to discover a new drug, you want to make sure that the drug treats the disease but does not kill the patient, right? Its side effects are low. Or, what is the delivery mechanism for it? So there are many different requirements that a solution might need to satisfy. Some of them are encoded in the evaluation function, and some of them you might want to hard-constrain in the solution. Can you specify those so that an agent like AlphaEvolve can take them into account while it is thinking about how it explores the search space or how it constructs the solutions it will generate? These are all very interesting places where human input might be required, especially as we look at many different types of domains. So yes, I think we should definitely see this as an amazing tool for scientists, computer scientists, mathematicians. And this has in fact been our experience as well: in the right hands, it is a very powerful tool. Mathematicians who have explored it have been able to specify what types of solutions they're looking for, and they can be much more productive and much more effective in finding those solutions.
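[Editor's note] To make Kohli's distinction between hard constraints and evaluator-encoded requirements concrete, here is a hedged sketch; every field name, threshold, and weight is hypothetical, not from AlphaEvolve. The evaluator disqualifies infeasible candidates outright and scores the rest:

```python
def evaluate_candidate(candidate):
    """Hypothetical evaluator mixing hard constraints with a soft score.

    `candidate` is a dict of predicted properties (all fields invented for
    illustration); in practice these might come from a simulator or a
    predictive model, as discussed in the episode.
    """
    # Hard constraints: violating any of these disqualifies the candidate
    # outright, mirroring "treats the disease but does not kill the patient".
    if candidate["toxicity"] > 0.2:
        return None  # infeasible: the searcher should discard it
    if not candidate["deliverable"]:
        return None  # no viable delivery mechanism

    # Soft objectives: feasible candidates are ranked by a weighted score.
    return 2.0 * candidate["efficacy"] - 0.5 * candidate["side_effects"]

good = {"toxicity": 0.05, "deliverable": True, "efficacy": 0.9, "side_effects": 0.2}
bad  = {"toxicity": 0.6,  "deliverable": True, "efficacy": 0.99, "side_effects": 0.0}
print(evaluate_candidate(good))  # ≈ 1.7
print(evaluate_candidate(bad))   # None (fails the toxicity hard constraint)
```

The design choice matters for search: a hard constraint prunes whole regions of the space, while a soft penalty lets the search pass through mediocre candidates on the way to good ones. Deciding which requirement goes where is exactly the kind of human input Kohli describes.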
- MBMatej Balog
I just wanted to highlight that even though we have been describing AlphaEvolve as this kind of autonomous agent that does things on its own, in practice, using this agent often turns out to be surprisingly collaborative.
- SGSarah Guo
Hm.
- MBMatej Balog
And we have seen this in particular with mathematicians that we have collaborated with. There are a few reasons for this, but one is that AlphaEvolve is an agent that doesn't just give you the solution; it searches for an algorithm that constructs that solution. And so, depending on how you set up your problem definition, often it's actually the algorithm that's even more valuable than the solution itself, because the algorithm tells you how to construct the solution. That means you understand what ideas go into building that solution. And especially in mathematics, that's what people really care about: to understand the nature of our universe and build up an understanding of fundamental ideas. So it's often almost not interesting at all what the solution is; what you care about is how you build it. We had first-hand experience collaborating with multiple mathematicians, and it's been really fascinating to see. We would share with them the output from AlphaEvolve, and they would be really fascinated, looking at the code that it found and trying to understand, okay, what is it actually doing? And then understanding: oh, okay, this is doing this, this is doing that, and now I can see why, if you put it together, it leads to a really good solution. I can also confirm from my own personal experience that looking at the code or the algorithms that the system finds is often a really interesting experience, because it's code that looks human-like, like something you could have written. But would you have thought of writing it in exactly this way? Trying to understand exactly what it is doing is a really interesting experience.
But at the same time, it's one of the key strengths of the system, not only for scientific applications, where you can look at the code and get some understanding out of it, but also for many of the practical applications. It's hugely valuable that the artifact you get out of AlphaEvolve is a piece of code, and then you deploy that piece of code. So before you do that, experts and engineers who have worked on that system can visually inspect the code, understand it, and make the final decision on whether it's going to be deployed. It's in a completely different league from, let's say, considering using a neural network to make decisions in some production system, where you kind of need to trust that the neural network is always going to behave in the way that you hope it will. With the code, you can look at it, understand it, and make the decision yourself.
- PKPushmeet Kohli
I might add that not all code is interpretable by humans, right? The solutions and the programs that AlphaEvolve finds are interpretable by human programmers. So this is going to be a very interesting area of work in the future: when you find these solutions, what can we learn from them? As Matej was mentioning, this was a very interesting experience we had working with Jordan Ellenberg on the earlier version of AlphaEvolve, when we were working on the cap set problem. The programs that it discovered had these very interesting symmetries that mathematicians did not know about. So not only was the solution, the actual construction, mathematically interesting, but the algorithm for producing that construction, the structure of it, was interesting in itself.
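[Editor's note] For background on the problem mentioned here: a cap set is a subset of Z_3^n containing no three distinct vectors that sum to the zero vector mod 3 (equivalently, no line, i.e. no 3-term arithmetic progression). The checker below is editorial illustration, not code from the system; the search agents discussed here evolve programs whose outputs are verified by exactly this kind of evaluation function:

```python
from itertools import combinations

def is_cap_set(vectors, n):
    """Check that `vectors` (tuples over {0,1,2} of length n) form a cap set:
    no three distinct vectors sum to the zero vector mod 3. In Z_3^n this is
    equivalent to containing no line (no 3-term arithmetic progression)."""
    vs = set(vectors)
    if len(vs) != len(vectors) or any(len(v) != n for v in vs):
        return False  # duplicates or wrong-dimension vectors not allowed
    for a, b, c in combinations(vs, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False  # a, b, c form a line
    return True

# In dimension n=2 the largest cap set has size 4, for example:
cap = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_cap_set(cap, 2))              # True

# Adding (2, 2) creates the line (0,0), (1,1), (2,2), which sums to zero mod 3:
print(is_cap_set(cap + [(2, 2)], 2))   # False
```

Note the asymmetry the speakers highlight: verifying a cap set is a few lines of code, but the interesting discovery is the program that constructs large cap sets, whose structure (the symmetries Kohli mentions) can itself carry mathematical insight.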
- SGSarah Guo
For
- 38:30 – 40:18
Making AlphaEvolve Broadly Accessible
- SGSarah Guo
listeners who are thinking about accessibility, or the implications for themselves when they're not professional mathematicians collaborating with AlphaEvolve, what are the considerations in making some of these capabilities more broadly available?
- PKPushmeet Kohli
We want to make these capabilities accessible to as many people as we can, to the wider community. We have started a trusted tester program, where we have asked people to submit proposals, and what we intend to do with that program is figure out the right ways in which people can really leverage AlphaEvolve. We have used it internally across Google, but as you know, it requires certain things, such as the function evaluator. As part of the trusted tester program, we are going to be evaluating AlphaEvolve on a bunch of different types of applications, and that will inform our future release strategy as to how we make it more broadly applicable. The second element is that not only do you need the evaluator, but you also need a significant amount of computational resources, right? Because it's not just one single LLM call; it requires a significant amount of function evaluation depending on the difficulty of the problem. If it's an easy problem, then you can do it very quickly, but if you really are going for some very hard problems, with a very large extended search space, and you want to spend a significant amount of time searching over it, then how do you build the overall system that people can use effectively and efficiently? That's the other thing we'll be thinking about.
- SGSarah Guo
Last question
- 40:18 – 41:39
Applying AlphaEvolve Within Google
- SGSarah Guo
for you both. Is there, um, sort of practical application within Google that you think will be interesting that you haven't tried AlphaEvolve on yet?
- MBMatej Balog
In this white paper, we tried to think holistically: when we look at the computational infrastructure of Google, what are the key parts of this infrastructure where we can demonstrate that AlphaEvolve can make discoveries across the stack, not only in one part of it, and that it can make discoveries that are highly valuable? So we tried to cover the entire spectrum: we show that AlphaEvolve can improve the efficiency of the data center, that it can contribute to hardware design, and that it can contribute to improving the efficiency of some of the most important pieces of software being run inside Google. One intention here was to demonstrate that this is a really versatile tool that you can apply across the spectrum. And as Pushmeet was saying, this is a tool that is already available inside Google and is being used for many, many problems. There are quite a few exciting ones. I'm not ready to share the particulars yet, but as you can imagine, there are so many exciting computational problems in a place like Google, within AI and also outside, that I'm sure there will be many really cool results coming in the future.
- SGSarah Guo
I think
- 41:39 – 42:08
Conclusion
- SGSarah Guo
that's a great note to end on. Pushmeet, Matej, anything we didn't cover?
- PKPushmeet Kohli
No, I think that was great.
- SGSarah Guo
Thank you guys so much for being here. Congrats.
- PKPushmeet Kohli
Okay. Great.
- MBMatej Balog
Thank you very much.
- SGSarah Guo
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 42:08