No Priors Ep. 90 | With Google DeepMind's AlphaProof Team
EVERY SPOKEN WORD
80 min read · 16,086 words
- 0:00 – 2:19
Personal introductions
- SGSarah Guo
Hi, listeners, and welcome to No Priors. Today, we have Thomas Hubert, Rishi Mehta, and Laurent Sartran from DeepMind's AlphaProof team. AlphaProof is a new AI system that can find and verify mathematical proofs, building on DeepMind's earlier successes in chess and Go to tackle one of AI's greatest challenges, mathematical reasoning. In today's episode, we'll explore how AlphaProof works, its implications for math and AI, um, more about test-time RL, and what this reveals about machine learning's capability to reason rigorously. Really happy to have you guys. Welcome.
- LSLaurent Sartran
Thank you for having us. Yeah.
- SGSarah Guo
Maybe you can start by just talking a little bit about your backgrounds and how you came to be working on AlphaProof together.
- RMRishi Mehta
I'm Rishi. Uh, I was one of the tech leads on AlphaProof. I've been working in sort of computer science and- and machine learning for a while. I'm a chess player, and I came across, um, the AlphaZero paper and saw some of the- the chess games, uh, that that agent produced, and I found it really inspiring. And I thought, like, "This is the kind of thing I need to work on." Like, coming up with something beautiful and superhuman and almost alien, uh, felt magical. I came over to DeepMind and, uh, the AlphaZero team, which Thomas was leading, uh, was working on, um, math, and that's how I got into math.
- THThomas Hubert
My background, so yeah, started working in, uh, in industry in- in my early career. Um, I worked on anomaly detection in computer networks, I worked on, uh, ad targeting, then switched to AI research. Uh, and there, a constant interest of mine has been, uh, systems that can, uh, spend more compute to either tackle harder problems, uh, or to think more. And, uh, math seemed to be a perfect domain for that.
- LSLaurent Sartran
Yeah, on my side, I, um, I was actually a Go player. Um, so instead of doing programming since the age of 10, I was actually playing Go. And I played a lot of Go during my youth. And, um, and then at some point, like, it was also my- my dad's dream to build a computer Go program. And so I was kind of figuring out, like, "What do I need to know to be able to build a computer Go program?" And then I realized that maybe it was being built at that time. And so that's how I discovered DeepMind and how I discovered AGI. And, uh, that's how I joined the company, and I've been kind of involved with AlphaGo and AlphaZero,
- 2:19 – 3:52
Achieving silver medal in IMO competition
- LSLaurent Sartran
MuZero, this line of work. Um, and recently, we had worked on AlphaCode and AlphaTensor. So, you know, like, that's way before, uh, ChatGPT, but we already knew that transformers were- were kind of changing a little bit how things were done. And so we, you know ... I found that in math, yes, you could get this perfect verifiability, and with AlphaCode, we realized we can generate a lot of good code. And so it was very natural at that time to think about, um, the potential, uh, there was for- for mathematics.
- SGSarah Guo
Can you contrast ... Um, uh, so maybe just as context, like, for- for any listener who wasn't a super cool mathlete like me, uh, the IMO is the, um, sort of oldest, uh, most prestigious math competition for young people. Uh, there's a set of six problems. They feel impossibly hard, and AlphaProof had this, um, really amazing result of solving four of the six problems this year. Uh, can you talk a little bit about, um, uh, math and the IMO in particular as a problem relative to game playing and other search problems like chess?
- LSLaurent Sartran
I think, you know, like first, there is a big difference, which is that in board games, you- you play against someone, uh, and that's a lot of fun, uh, in- in the board games. And- and for instance, when we did these AlphaGo, AlphaZero algorithms, uh, we could really have this thing, uh, this- this thing about self-play. You could always play against, uh, someone who is exactly your strength. Uh, and that- that proved to be a powerful idea. Uh, when you're kind of trying to learn to do math, uh, in some sense, you know, you- you don't really have an opponent. Uh,
- 3:52 – 5:56
How AlphaProof works
- LSLaurent Sartran
you just have to- to think about it. And- and math is a bit special in the sense that it's almost like a purely cognitive kind of thing where you- you can just, you know... The only thing you can do is to think more. Uh, maybe you can't really go into the real world and run an experiment. Um, I guess mathematicians say that, you know, sometimes it's a good thing to take a nap and let your unconscious self kind of think about the problem, and that's a good way to come up with new ideas. But it's- it's a lot about thinking, and so, you know, like, when you're confronted with a really hard problem, uh, there's a whole question about how do you go and- and try to solve it.
- SGSarah Guo
Maybe this is a good time to- to ask you to just describe for our listeners, uh, most of whom are technical or somewhere in the tech field, but also a broader business audience, like, you know, h- how- h- how does AlphaProof work overall, architecturally?
- LSLaurent Sartran
Uh, yeah, sure. So, um, AlphaProof is based on this thing called AlphaZero, so maybe let me start there. AlphaZero was this program we developed to, you know, like, basically solve kind of perfect-information board games. Um, and- um, and the way it works is- it's based on a reinforcement learning algorithm. So, you can think of AlphaZero as maybe three components, like, uh, one, a neural network, uh, two, kind of, um, large-scale reinforcement learning, uh, basically learning from trial and, uh, trial and error. And- and three, it also has, like, this planning and kind of search component to it, to try to, given the current situation, search for kind of the best answer. Um, and, uh, it turned out that, you know, like, maybe we weren't imagining that when we were doing chess, but if you can handle kind of infinite action spaces, uh, then, you know, instead of kind of looking for a chess move, you can look for, for instance, a line of a proof. Um, and so that's what we- we tried to do when we- we started AlphaProof: basically, our action space is to generate lines of proofs, uh, and- uh, and maybe very importantly, we used, um, formal language to do that. So basically, uh, another way to say that is we use kind of code to write math. Uh, and it's become quite popular recently. Um, and- and then the
- 5:56 – 8:56
AlphaProof’s strengths within mathematical reasoning
- LSLaurent Sartran
advantage of that is that once kind of the- the proof is complete, uh, then, you know, the- the machine would give you a signal back to say, yes, your proof is correct or not. And so we could search for kind of correct proofs. Uh, once we find a correct proof, we can learn from it, uh, and get a better neural network, and kind of go into this self-improving loop of tackling harder and harder problems and learning more and more about maths. So at a very high level, that's kind of how it works and how kind of the ideas behind AlphaZero were adapted to- to do math.
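The loop Laurent describes (propose candidate proof steps, machine-check them, learn from verified successes) can be sketched in miniature. Everything below is an invented toy: the two "actions", the arithmetic verifier, and the counting-based policy update are stand-ins for illustration, not AlphaProof's actual components.

```python
import random

# Toy sketch of the AlphaZero-style loop described above: a policy
# weights candidate "proof steps", search samples step sequences
# guided by those weights, a verifier machine-checks each candidate,
# and verified proofs become training signal that sharpens the policy.

ACTIONS = ["add1", "double"]  # stand-ins for proof tactics

def apply_step(state, action):
    return state + 1 if action == "add1" else state * 2

def verifier(steps, target):
    """Machine check: replay the steps from 0 and confirm they reach target."""
    state = 0
    for a in steps:
        state = apply_step(state, a)
    return state == target

def search(policy, target, tries=200, depth=8, seed=0):
    """Sample step sequences, weighted by the policy, until one verifies."""
    rng = random.Random(seed)
    for _ in range(tries):
        steps = [rng.choices(ACTIONS, weights=[policy[a] for a in ACTIONS])[0]
                 for _ in range(depth)]
        for n in range(1, depth + 1):
            if verifier(steps[:n], target):
                return steps[:n]
    return None

def learn(policy, proof):
    """Reinforce the actions that appeared in a verified proof."""
    for a in proof:
        policy[a] += 1.0

policy = {"add1": 1.0, "double": 1.0}
for target in [3, 6, 12, 24]:  # a small curriculum of harder "problems"
    proof = search(policy, target)
    if proof is not None:
        learn(policy, proof)
```

The key structural point is that the reward comes only from the verifier's binary check, never from human judgment of the proof itself.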
- SGSarah Guo
One of the things that was an interesting tidbit from your announcement was, uh, y- you know, there's a series of problems; it solves some problems within minutes, and takes three days to solve other problems. Um, is there a way you can characterize, like, the search space for math overall, or what AlphaProof is better at, um, in terms of domains, like types of reasoning within math?
- LSLaurent Sartran
The search space in math is quite large compared to, like, something like chess or- or other board games. So, you know, the, uh, y-
- RMRishi Mehta
... is some people might think of writing math proofs as, like, picking from a bag of known tricks, uh, at each step. But in fact, like, um, there are many proofs where, like, you've got to come up with some non-trivial constructions. Like, you've got to invent a function out of thin air, or you've got to, like, um, come up with, like, some way to manipulate... Even if you're just rewriting an expression, uh, you can rewrite it in infinite ways. And, uh, there's only, like, a few ways you could rewrite it to actually make progress on the proof. Sometimes thinking about some novel problem requires, like, decades of theory building and approaching it from an entirely new perspective, uh, to arrive at the angle that helps you solve it. Uh, and in that sense, uh, you know, the reason why there are, like, a lot of math problems that are, like, very simple to state but have stood the test of time, in that they've been unsolved for centuries even, is that the search space is not sort of easy to navigate, um, in most cases. I think what AlphaProof is good at amongst the IMO categories: so the IMO problems come in four categories. There's, uh, algebra, uh, number theory, combinatorics, and geometry. Um, the two that it's strongest at are algebra and number theory. Um, it's relatively weaker at combinatorics, although it can do quite well at some IMO combinatorics problems. Um, and we didn't apply it to geometry at this year's contest. Um, and one of the ways in which it navigates this massive search space is, uh, we had an idea that we came up with which we call test-time RL. So this is an idea where, like, let's say you're confronted with a problem that you don't know how to solve, um, and, you know, you can do some search with what you know right now and you're not able to find a proof to it.
Um, uh, what the agent then does is it constructs many variations of the problem, um, in the vicinity of that problem and attempts to solve all of them. And if it manages to solve any of them, it learns from that experience and sort of, uh, comes closer and closer to solving the original problem. Um, and you can think of this, uh, as like when confronted with a new scientific problem, um, many of your priors that you've developed,
- 8:56 – 13:40
Challenges in scaling AlphaProof
- RMRishi Mehta
uh, from other problems may not directly apply to it. Uh, and so you've got to do, like, very bespoke experiments on this problem itself to learn something about it, uh, and then hill climb your way towards a solution to this problem. And so this- this process kind of mimics that. Um, and so when you talk about, you know, these problems that are solved after three days, like, these are the problems that had this, you know, this loop of, like, "Um, we can't solve it, uh, with the base network, but we propose many, many variations. We try all of them. We get slightly better, slightly better, slightly better, and then we hill climb our way to the final solution after, like, days of, uh, inference-time compute."
- EGElad Gil
What are the limitations or approaches to scaling or, you know, uh, increasing the set of problems that AlphaProof can solve? And I guess there's two dimensions to that. To your point, there's a set of areas where it already does quite well, uh, algebra and other areas. And then there's a set of areas where you may need new forms of training data or other things, like certain aspects of geometry or, you know, potentially other, um, areas of mathematics. So I'm a little bit curious how you think about what is necessary to increase performance dramatically from here. And obviously what you've done is incredibly impressive. I'm just sort of curious, like, how you think about what are the obstacles to future growth?
- THThomas Hubert
Maybe the main thing, um, that AlphaProof doesn't do is, uh, theory building. It doesn't invent theories; it's not equipped for that. Um, uh, it wouldn't be able to come up with the, uh, complex analysis, uh, necessary to derive results in, uh, in number theory. So that's one of the main things, uh, that we don't do. Even though we generate, uh, variants, uh, and get better by solving these, and that enables us to tackle the original problems of interest, uh, theory building is a component that, um, would be required to go further. Maybe another dimension: some problems, like, uh, combinatorics, can be, um, stated in a somewhat obfuscated manner. That doesn't mean that all combinatorics problems are. Just as a, uh, as a reminder, combinatorics problems are all about counting the number of things. For instance, if you have, uh, 40 socks in a drawer, how many pairs do you have? It can be stated in some obfuscated way, and how to, uh, translate them into Lean is a major difficulty, and then how to solve them in Lean is a bit, is a bit unwieldy. How to best tackle this is still, uh, is still an open question.
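To make "translating into Lean" concrete, here is a toy formalization of the sock puzzle in Lean 4 (our own illustration, not from AlphaProof's pipeline; `omega` is Lean's built-in linear-arithmetic tactic):

```lean
-- "If you have 40 socks in a drawer, how many pairs do you have?"
-- Once phrased precisely, it becomes a statement the kernel can check.
example : 40 / 2 = 20 := by decide

-- The general fact behind it: 2 * n socks always make n pairs.
example (n : Nat) : 2 * n / 2 = n := by omega
```

The difficulty Thomas points to is that real competition combinatorics statements (games, colorings, strategies) have no such ready-made vocabulary in the library, so even writing the statement down is hard.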
- LSLaurent Sartran
But I think there is a link a little bit to your theory building because... In some sense, part of the difficulty comes from the fact that the tools you need to express those things don't actually exist currently in the, in the library that we're using. And so, for instance, if you had, like, some kind of, uh, things that expressed, you know, what is an agent, what is an environment, uh, what is a strategy, uh, then, for instance, a lot of the combinatorics problems would be much easier to state. Uh, but because that doesn't exist at the moment, in some sense kind of the autoformalization would need to come up with all these things beforehand, before actually kind of formalizing the problem. Uh, and so m- maybe that's kind of linked to the fact that, you know, at some point, you will need to be able to develop new definitions, new mathematical objects, then come up with their properties and their proofs, et cetera, et cetera.
- EGElad Gil
So I guess in... I think it was 1900 when David Hilbert posed his famous 23 problems. They kind of defined a lot of the big areas that, at the time, he felt were important to mathematics and the sort of unsolved things that were important or topical. Or is there some Turing test equivalent, I guess, as a generic question? (laughs)
- LSLaurent Sartran
It's an excellent question. Um, that- that totally defined kind of, you know, like, the mathematics of the 20th century. And maybe, you know, the millennium problems, uh, you know, the seven millennium problems, of which, you know, one has been solved so far, uh, is another attempt at defining kind of what could be one trajectory of mathematics for the 21st century. What are our chances of solving that? That's like kind of, that's like a really hard question, because, you know, our brain kind of thinks linearly, but we know from our experience in working on AI, we should actually try to think exponentially. (laughs) Uh, things might- might change pretty quickly. Um, at the same time, you know, like, I think we have no idea how hard something like the Riemann hypothesis is. Is it, you know, is it two orders of magnitude, three orders? Maybe it is like 10 orders of magnitude away from what we can do, right? Uh, because we don't have an existence kind of proof, it's- it's much harder to kind of, uh, have an estimate of how hard this actually is. But, uh, I imagine that, you know, solving that problem will involve like kind of creating brand new maths, brand new theories and things like this. So, if we want to take AlphaProof all the way there, I think that's a capability we,
- 13:40 – 17:50
Why solve math?
- LSLaurent Sartran
we need to, um... Either it emerges, and it's possible, because, you know, like, to prove harder problems you start to need to be able to introduce these new kinds of, uh, mathematical objects, um, to decompose the problem, solve problems. And we see that already happening for the IMO at a, you know, small scale. Uh, but, uh, it's very possible that it could emerge from just trying to solve problems, or potentially, you know, we have to kind of explicitly think more about what does it mean to build theories and- and how can we encourage AlphaProof to do these kinds of things.
- SGSarah Guo
Yeah. Just for, um, our- our listener friends who, uh, don't spend as much time on the- the unsolved math problems, uh, um, just to contextualize a little bit: the Riemann hypothesis essentially predicts that prime numbers follow a specific pattern, like a distribution, uh, and it has important implications. Maybe we can go from there to thinking a little bit about when you were beginning to work on math as a search space. Um, there are, like, lots of interesting hypotheses for, like, why go work on math at all. One is, like, the field itself, right? And so I'd be curious if any of you have, um, problems you want to work on. Another is the premise that, like, this sort of advancement in reasoning will transfer to other domains, um, be they, you know, science or what we think of as more, like, language-based, less verifiable, non-code, non-rules-based domains as well. Um, where do- where do you all want to take this?
- LSLaurent Sartran
I'm not really a mathematician, so I didn't have like a problem I absolutely wanted to solve. But it's been my favorite subject when I was, when I was a student. Um, and I think, yeah, you're absolutely right. There are like, you know, at least two kind of main reasons why you would want to potentially spend a lot of time on maths. One is that I guess it's been described as the language of the universe, and, you know, it's been extremely powerful to both, you know, describe and- and predict the natural world, and of course, to shape it. And you see that, you know, basically we- we see maths being at the core of all the technology we're using now. So kind of having a good understanding of maths is probably kind of, uh, very important to understand our current world. And then of course, you could, as you- as you alluded to, make an argument that, you know, kind of solving math, which requires, you know, reasoning, generalization, abstraction, all these things that we- we think about when we are talking about something like a cognitive AGI, um, is a path to, you know, go towards AGI, and that could really help in kind of the development of AGI. So I think at least for me, you know, um, these are maybe the two main reasons as to, you know, why math, uh, is a particularly interesting, you know, topic to try to solve in a general way, even though, you know, okay, let's say it's- it's still kind of a constrained domain, but with kind of unbounded complexity, and actually it's representing a lot of stuff. Uh, but it's, it might be worth kind of making a really kind of, um, focused effort on it, uh, because of all the potential implications it might have.
- THThomas Hubert
I can add something about my own motivation. Um, I've always been interested in the question of making systems, uh, that get better by thinking more. I looked at this in machine translation, in text diffusion, in navigating agents, uh, that discover their- their environment and where- where they do better by thinking more. Uh, math, in particular proving statements, so the- the core of AlphaProof, um, seems to be one of the last big, uh, challenge domains where AI still had a long way to go, uh, and where, uh, one of the key ingredients would be figuring out a way of, uh, thinking more. And so that can be searching more, that can be test-time RL, and maybe the next step of thinking more, uh, will be thinking so much that the agent comes up with its own theories. So, um, math as a test bed for thinking more was, uh, was my motivation.
- RMRishi Mehta
One of the big reasons to pursue AGI at all is to like uncover the secrets of the universe, you know, like the, the deep questions of like, "Why do, why do things work the way they do
- 17:50 – 21:30
Pursuing knowledge versus practical applications
- RMRishi Mehta
and why are we here?" And, you know, "Why am I conscious?" And, "What's going on?" And, uh, I guess like math, like solving these pure math problems, uh, it feels like one domain where we're already... like there's like a large number of people who are already just doing this. They're seeing math as a search for truth. Like, you know, people are pursuing these problems not because they have any applications often, but more that it's like, you know, "What, what actually is the answer? Why, why does this thing work this way?" Something like even solving the Riemann hypothesis feels like, uh, it has the flavor of like, you know, a sort of pure search for truth, which is, which is quite appealing.
- EGElad Gil
I guess like related to that, um, you know, there's a long history in mathematics of people, uh, or in, uh, science in general or science and mathematics, um, of doing things for their own sake or for the pursuit of knowledge for its own sake. And, you know, Sarah mentioned as an example number theory and its applications in cryptography. Zero-knowledge proofs are popping up in different ways in cryptocurrencies. You know, um, group theory and algebra popped up in quantum mechanics over time, and were sort of developed beforehand and then applied and developed further there. Um, are there specific areas that you're most excited about from an applications area for some of the work that you're doing? Or is it mainly, um, doing it for the- the love of the...
- THThomas Hubert
... theoretical part of it.
- LSLaurent Sartran
It's a good question. I think we can maybe, you know, maybe the answer depends on- on each one of us. I think one thing that, um, I'd be motivated about is to learn from mathematicians, you know, what they are interested in and what they find interesting in the current, you know, uh, world of mathematics. And so, for instance, you know, I've been reading about this thing called the Langlands program that is trying to connect, uh, different areas of math, and it was described to me a little bit as, kind of, you know, just like in physics, where, uh, trying to look for this unified theory of physics... it's like trying to find this unified theory of mathematics, where maybe, you know, we have number theory here, and we have, like, geometry there, but maybe there is kind of something behind that, you know, that is more unifying. Um, so I- I personally like those abstract ideas (laughs), uh, even though maybe I don't understand them very well. Um, but yeah, I'd be, I'd be very motivated to just see, you know, like, kind of what mathematicians care about, and, uh, and they might disagree (laughs) between themselves as well, yeah.
- SGSarah Guo
I think they would disagree, right? Like, if you look at the historical examples, like when Bombelli starts working on imaginary numbers, he gets completely ridiculed by, I don't know, the more important mathematicians at the time, and like, many years later, we get alternating current and the ability to describe electricity. And so, I- I- I think it's, um, really interesting as a question. Like, if you, if you have the machine, the machine is able to develop its own theorems. Like, where do you point it for, for interest or usefulness eventually is a really big question.
- THThomas Hubert
... to that. One domain I'm currently excited about is, um, code verification. Um, that is, at the moment, when we write software, we, uh, write the code and we write tests, and when the tests pass, we are reasonably happy. Uh, every once in a while, bugs pass the tests, or even, uh, security issues. It would be much better if we could express in code the properties that the algorithm is supposed to verify, and to prove in code, uh, that the algorithm does verify these properties. So, it's already done for very critical domains, like avionics, uh, cryptography, uh, where it's very important, but it has to be done by humans. I think the software industry would be in a much better place if, um, code verification was much more common, and if we removed the bottleneck, uh, which is humans writing those proofs. So, if humans could be enabled, uh, by specialized tools, uh, tools that could handle the minutiae of these, uh, of these proofs, I- I think we'd have made, uh, a major step forward.
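A minimal taste of the kind of code verification Thomas describes, in Lean 4 (a toy function and property we invented for illustration, nowhere near an avionics-grade example):

```lean
-- Instead of only testing `double` on a few inputs, we state the
-- properties it must satisfy and prove them for every input.
def double (n : Nat) : Nat := n + n

-- Specification: `double` agrees with multiplication by two.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- A consequence the checker also certifies: the result is always even.
theorem double_mod_two (n : Nat) : double n % 2 = 0 := by
  unfold double
  omega
```

Once such theorems are in place, the proof checker guarantees the property for all inputs, which is exactly the gap between "the tests pass" and "the code is verified".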
- RMRishi Mehta
I think an- an- another application area I'm
- 21:30 – 28:27
Insights on verifying correctness within reinforcement learning
- RMRishi Mehta
excited about is, like, the transfer of this technology to many other domains. So like, there are two kinds... two
- LSLaurent Sartran
Yeah.
- RMRishi Mehta
... ways to take it. Like, one is just the transfer of, like, the mathematical reasoning skill that this agent acquires via this kind of training, to, like, many other domains, and, you know, as we see, like, math is critical for engineering and science and, uh, you name it. Um, but also, like, you know, some of the sort of tech we developed here, of, like, you know, scaling RL and, like, figuring out how to spend a lot of inference-time compute, stuff like this feels like it's quite generally applicable to many other problems.
- SGSarah Guo
Uh, I'm gonna ask a question that is, um, uh, more- more conjecture, right? Uh, we- we've been talking about math and code, um, uh, domains which are formalizable and verifiable. Um, how much do you think this applies in the- in the language domain? Can, um, an AI make something funnier? And, uh, it's easier to tell if something is funny than to, like, write a good standup set, right? Uh, and- and so I- I think it's kind of an instructive example, but, like, do you have any intuition for, um, how to, uh, take some of these learnings around the ability to verify correctness or quality, um, uh, you know, within the RL approach?
- LSLaurent Sartran
I guess, um, you know, there are... So as you said, you know, there are domains where there is a ground truth, and there are domains where there aren't any ground truths. And so when- when there aren't, and, you know, like, for instance, funny is probably kind of, um, you know, a fuzzy human concept, and maybe if there were kind of aliens out there, they would have a different sense of humor. Um, and so, for those kinds of problems, I think that, you know, the only way you can kind of get your grounding is through... like, the- the humans are your grounding. For other domains, maybe, you know, the- the real world is some sort of grounding, and then, you know, you- you'll have to go to the real world to get your grounding. Um, but basically, the kind of the RL would allow you to say, well, you know, like, "Where does my reward come from?" I guess sometimes it comes from the real world, sometimes you can perfectly ground it, and sometimes, you know, like, the- the reward comes from the human, and- and I think that's fine. Um, there are some techniques basically that we developed in the sense that, you know, of course, when you can kind of machine-check things, you can run, uh, at a scale that is- is quite different from, you know, if you have to ask, uh, humans in general, even though we are- we are quite a lot of humans, so, uh, there is still probably quite a lot of scaling that is possible, especially for things like this where, you know, like, humor, uh, where, you know, everyone, uh, should have a say about what is funny or not. Um, so definitely, you know, for instance, the RL framework, uh, I think is- is still a- a great framework to think about those questions. Um, and- and part of, you know, what makes RL work, that should kind of also transfer, um, and maybe the things where we- we rely on perfect verification, that's maybe...
that we don't transfer to that particular question.
- SGSarah Guo
Where is there room for, um, I suppose, like, human rating or input, in terms of the explicit description or labeling of their reasoning, in domains that are more verifiable, right? Like, if you have unlimited access to Terry Tao to do labeling for you, is that useful? Or, you know, how- how should we think of that versus, like... just do- just do more search and work on better, like, formalization?
- THThomas Hubert
The way AlphaProof currently operates is that it discovers its own proofs, and when they are valid, it learns from them and develops its own style, um, which has been commented upon as, uh, looking, yeah, quite- quite alien. So with unlimited access to Terry Tao, we could train on proofs that are correct, uh, with some niceness as perceived by this human. Uh, and we could start optimizing, um, amongst the set of valid proofs, towards proofs that look nice, for purposes of interpretability, for purposes of teaching. Uh, so there is definitely still room, in this space of, uh, valid proofs, for proofs that, um, that humans might- might prefer.
- SGSarah Guo
That's actually, like, a pretty damning statement to some degree, right? Because it's-it's just a, it's just, like, preference versus, um, perhaps, like, you know, capability advancement. Like, what do I care if the proof looks alien if I've, if we have new knowledge? Uh, uh, interpretability seems useful in this case, but...
- THThomas Hubert
Yes, yes. Uh, I don't think that's the only angle, uh, for sure. Uh, with more supervised data, we can avoid, uh, the exploration problem, and, uh, we could translate all the proofs that are- that are known to man, uh, and, um, that certainly would make the agent much better. That would save us, uh, years of compute for- for sure. Uh, I'm not saying that interpretability is the only way we could use, uh, math translation, for sure, um, but it could, uh, provide a signal that we, uh, couldn't get otherwise. Presumably, uh, with time, compute, theory building, uh, we might be able to- to maybe rediscover proofs that are- that are already known.
- RMRishi Mehta
I think it's an interesting question because it, like, uh, like, it highlights the sort of complementarity of, like, specialist human data and, like, RL data, and I think it's especially prominent with, like, LLMs, where, like, LLMs have these, like, very strong human priors because they're pretrained on a lot of human data. Uh, but then when we do RL with them, they have an opportunity to take these human priors and build on them and, you know, develop their own, um, styles of doing things, like AlphaProof has done. And I think one thing we've seen in this project is that, uh, often, like, small amounts of precious human data can be really useful to, like, seed the agent's behavior and sort of, you know, get it from, like, a complete zero state to, like, somewhere much higher, where- where it can play with the environment in a much more efficient way. And- and then beyond that, it can take it from there and, you know, perhaps match or exceed the human, but with its own style. Um, that's how... Uh, I- I guess, like, that feels to me like that's probably gonna be the way forward with RL for LLMs in general: the specialist humans are gonna serve to, like, get the LLM from, like, just a bunch of weights that knows how to do nothing to, like, something that, like, uh, is, like, surprisingly strong. And then the RL is gonna take you from there to, like, something that's, like, superhuman.
- EGElad Gil
I think it was Poincaré, or somebody like that, who was supposedly the last mathematician who knew all of mathematics, or at least understood big portions of it. And suddenly you have an AI, a system that could potentially encapsulate all of mathematics in a single program, and it could really be used as something that could help check a proof, that could help push a mathematician forward in their own research. You know, often when somebody proves something now, if they're in a more specialized field, there aren't that many people who can actually verify or check the proof that they've done. How many mathematicians have access right now to AlphaProof? Or how do you think about engaging with the mathematics community
- 28:27 – 30:28
How AI could foster more collaboration among mathematicians
- EGElad Gil
about day-to-day usage of this pretty amazing, um, set of advancements?
- LSLaurent Sifre
So at the moment, you know, mathematicians don't have access to AlphaProof. And to be honest with you, at the moment we can't rival at all someone like Terry Tao. What we've demonstrated is that we can learn general mathematics almost from scratch and arrive at a pretty impressive high-school level. And then we need to grow our knowledge base so that we become useful. But I don't know if you've seen Terry Tao's recent interviews-
- EGElad Gil
Yeah.
- LSLaurent Sifre
... over the last year, and he's been saying that one thing that's interesting in math is that collaborations have been relatively small. A lot of papers are one- or two-author papers, five authors at most, and that's because it's been very hard to collaborate with more mathematicians: you need to check what they are doing, and so it's very time-consuming. But if you instead relied on a formal system to check everyone else's work, then you could do it a little bit like in astronomy, where you could have an amateur living in the middle of nowhere whom you've never met, and you wouldn't have to trust him. You could trust, in some sense, the machine to check the work, and if the machine says it's a correct proof, then it's a correct proof. And then you can start to work with many, many more people. And that's cool, because you can start to think about collaborations between many humans, but also potentially AIs, right? So that's one way we are thinking about it as well: could we potentially collaborate on those projects? And at the moment, the way we think about it is, it would be great if we could be like a graduate student, and the mathematicians could give us the questions. I think that's maybe related to theory building. At the moment, we're not
- 30:28 – 34:17
Surprising insights from AI proof generation
- LSLaurent Sifre
really good at asking the right questions, and there's definitely... We would like to rely on the whole mathematical community to ask the questions that they care about, and see if we can help, even a little bit, in answering those questions.
- SGSarah Guo
I was also a Go player growing up, and so, you know, seeing the Lee Sedol match and, as you described, alien moves in Go was incredibly interesting to me too. Is there something that has felt alien or surprising in terms of an AlphaProof proof so far that you can talk about?
- RMRishi Mehta
Uh, yeah, sure. So let me show you guys the proof, or part of the proof, that AlphaProof came up with for problem six at the IMO. I think it'll be much easier to understand if I flash it on the screen. So this is problem six. The IMO typically has six problems, and problems three and six are the hardest ones. So this is supposed to be one of the hardest problems of this year's IMO, but it actually turned out to be one of the hardest problems ever, because only 5 out of the 500-odd contestants managed to solve it. So, you know, it was really hard. We can go over very quickly what this question is saying, and I won't go through the whole proof, 'cause it's probably incomprehensible in the time that we have, but I'll point out one cool construction that AlphaProof came up with. So the question talks about the definition of a function being aquaesulian. Aquaesulian is not really a mathematical term; it's just a tongue-in-cheek reference to the fact that Bath, where this contest was being held, used to be called Aquae Sulis by the Romans. But anyway, they've given us two equations that characterize an aquaesulian function, and we've got to show that there exists some integer C such that for any aquaesulian function F, there are at most C different rational numbers of this form. So this is a bit complicated to understand, but essentially, this form is the sort of even part of the function. What it's trying to say is that you could take any aquaesulian function, plug in many rational numbers, and figure out what the values of this expression are for those numbers, and you might get infinite values, or you might get five values, or three values.
So it's asking you to prove that there is some bound C beyond which you cannot have more unique values, and also asking you to find what this bound C is. So, okay, that's the question: what is C? Often when you're trying to answer a question like this, one strategy you can use is to show an upper bound for C and a lower bound for C, and then show that those two converge, and then you find C. So I'm gonna pick up in the proof from the point where AlphaProof had already proved that C is less than or equal to two. That's itself quite an interesting proof, and you can look at it on our blog post if you're interested. But at this point in the proof, AlphaProof has to decide: is C one or two? And the way it's gonna have to decide that is by finding... So it's trivial to show that C can be one. What's hard to show is: is there any aquaesulian function F such that C is two? And, you know, you can pause and try to find functions F that satisfy these properties, and see if you can find one for which you can find two different values of this F of R plus F of minus R expression. One thing that's interesting is that Tim Gowers, who was one of our judges and is also a Fields Medalist, tried this question for a couple hours, and he couldn't find the construction for a function that had this property. Which is of course not to say that AlphaProof is better than Tim Gowers at math, but it just highlights how hard this part of the problem is. And AlphaProof came up with this construction,
- 34:17 – 39:21
Future of math and AI: advice for math enthusiasts and researchers
- RMRishi Mehta
which is this function f(x) = -x + 2⌈x⌉, minus x plus twice the ceiling of x, which is this funky-looking function that I've plotted over here. And interestingly, with this function, if you plug in R equal to minus one and R equal to half, you get two different values. And so you can tell that C must be equal to two. So in my view, this is maybe the hardest problem that AlphaProof ever solved, and it was a really cool, funky construction that relies on this ceiling function.
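The construction Rishi describes can be checked numerically. This is a minimal sketch, assuming the function in question is f(x) = -x + 2⌈x⌉ over the rationals and that "this form" is f(r) + f(-r); it spot-checks the aquaesulian property on a small grid of rationals and confirms that the expression takes exactly two distinct values, including at r = -1 and r = 1/2:

```python
# Numeric sanity check of the IMO 2024 Problem 6 construction
# (assumption: "C of x" in the episode means the ceiling of x).
from fractions import Fraction
from math import ceil

def f(x: Fraction) -> Fraction:
    return -x + 2 * ceil(x)

def is_aquaesulian_at(x: Fraction, y: Fraction) -> bool:
    # The defining property: for all x, y, at least one of
    # f(x + f(y)) = f(x) + y  or  f(f(x) + y) = x + f(y) holds.
    return f(x + f(y)) == f(x) + y or f(f(x) + y) == x + f(y)

# Spot-check the property on a grid of rationals.
samples = [Fraction(p, q) for p in range(-4, 5) for q in (1, 2, 3)]
assert all(is_aquaesulian_at(x, y) for x in samples for y in samples)

# f(r) + f(-r) takes exactly two distinct values: 0 on integers, 2 otherwise.
values = {f(r) + f(-r) for r in samples if r != 0}
assert values == {Fraction(0), Fraction(2)}
```

Plugging in the two witnesses from the episode: f(-1) + f(1) = 0, while f(1/2) + f(-1/2) = 2, so the bound C = 2 is attained.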
- SGSarah Guo
Very cool. Very cool. Can we start with advice for people who are working on math today and investing time in it, given AlphaProof and related models?
- RMRishi Mehta
Formal math is gonna be an increasingly important thing going forward, not just because AI is gonna be a very, very strong tool within formal math, but also because it's emerging as a way for mathematicians to collaborate, and many prominent mathematicians are adopting it. It's still a small minority of the mathematical community that operates in Lean, but it's a growing minority, and AI will only accelerate that. I guess I'm not a mathematician, so who am I to advise mathematicians? But my advice from the outside would be: learn Lean as early as you can.
- LSLaurent Sifre
And I realize as well, it's a great tool for education, because most proofs are written at the level of abstraction of the person who writes the proof. So, for instance, if that person thinks a step is trivial, you won't get any explanation of it. But if you have that proof in a formal language, then you can zoom in at the level of detail that's adapted for you. So that's one thing: you can learn much more at your own level. And the second thing is you can actually do self-play with yourself (laughs), in the sense that maybe you're trying to learn a very abstract part of math, and the usual thing is, it's difficult to explain math because it's quite abstract, and maybe you can't tell whether what you're doing is correct or not. If you're doing it on paper, the next best thing you can do is try to find someone, your teacher, and that might take you a lot of time, right, to get some feedback. But if you're using a formal language, then you would know exactly what's left to be proved, whether you've done the thing correctly or whether you've fooled yourself a little bit. So even for educational purposes, I think it might be very interesting.
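The feedback loop Laurent describes might look like this in practice. A small illustrative Lean 4 snippet (using only core-Lean lemma names), showing how the checker either accepts a complete proof or flags exactly what remains unproved:

```lean
-- A complete proof: the kernel checks every step and accepts it.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- An incomplete proof: `sorry` is a placeholder, and Lean warns you,
-- reporting the precise goal left open. You get immediate feedback
-- on whether you've "fooled yourself", with no teacher required.
example (a b : Nat) : a + b = b + a := sorry
```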
- THThomas Hubert
It's not really advice, it's more speculation. Somebody said that there were two types of mathematicians: the ones who build theories and the ones who prove theorems. I understand that in order to prove theorems, sometimes you have to build a whole theory, as was done for a famous one last year. But it seems that there is more room for human creativity, human taste, human skill in building theories than in proving theorems.
- SGSarah Guo
One thing that I kind of struggle with here is that it's increasingly true, across a bunch of domains where the hill is being climbed with AI, that taste seems to matter more and more. It could be Terry Tao's taste in problems, because if you're working on arithmetic progressions in primes, that actually leads to advancements in methods and tools. But I actually think it's challenging, because, one, telling somebody, "Go develop good taste," is a more nuanced directive than, "Make sure you learn Lean," right? And two, it's also pretty interesting and parallel to the work you guys have been doing, in that, to some degree, if you look at the behavior of where you're applying computation, it's like, "Oh, well, why don't you work abstractly on some problems around this area? Because it'll improve your ability to solve the problem," right? So I think maybe, as you said, we'll all do a little bit more test-time RL. I don't know if we'll all develop taste, even if that looks useful.
- RMRishi Mehta
I saw this written somewhere: as machines get better at finding the answers, we're gonna have to get better at finding the questions. You know, these systems don't have their own notion of what questions are interesting, or, given a large question, how to break it down into smaller questions that might be interesting. And that's probably true not just in math but in many domains. Even as a software engineer, right, I find that more and more my job is not figuring out the small low-level details but making good high-level decisions that tools can help me with. And that's probably gonna be true of more and more domains.
- SGSarah Guo
Thanks so much for doing this. It's great to meet all of you, and I appreciate you doing this as well. Find us on Twitter at No Priors Pod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 39:21