EVERY SPOKEN WORD
105 min read · 21,005 words
- 0:00 – 11:32
AI is discovering new proofs. Is that AGI?
- DPDwarkesh Patel
Today, I'm chatting with Grant Sanderson, who runs 3Blue1Brown and is now working on a new project documenting the progress AI is making in math. And I wanted to talk to you about this because AI's been making faster progress in mathematics than in any other field. So whatever is happening here and whatever way we're seeing AI progress happen or not happen would tell us about what will happen to the rest of the world as AI gets better and better. So I wanted to start with this question I asked you when I first interviewed you three years ago. And I asked you, once we have AIs that can get gold in the International Math Olympiad, wouldn't that just be AGI? Wouldn't this just be able to do anything any human can do, given how hard these problems are? And you had an answer which in retrospect turned out to be very wise and, um, correct, which is like, it'll be another benchmark, like all these other benchmarks that AIs are passing. Obviously, AI's gotten better in general ways since then, but there won't be some aha moment when this happens. First, I'd be curious to get your heuristics on why that turned out to be true. And second, I'm curious how long you think this narrowness can continue to be true. So by the point that AI has solved a Millennium Prize problem, do you think it's still possible that at that point there's lots of tasks that humans are doing that AI still can't automate in the economy?
- SPSpeaker
It's an interesting question because it's hard to answer without knowing what the solution looks like ahead of time. I mean, if we take the IMO, that's something where I think the spirit of your question three years ago was in looking at how some of the solutions to these problems really seem to require creativity.
- DPDwarkesh Patel
Yeah.
- SPSpeaker
And the designers of these problems, they'll try to have, um, them come up with things that you can't train for as easily. I think the dirty secret with the IMO is that you really can train for a lot of them. And so with the, with the whole AI and math project, uh, underway, I think as you point out, one of the reasons it's interesting at all is that there's a spiky frontier to AI. Math is just right there in one of the spikes. Um, but there's kind of a fractal nature to that spikiness, because when you zoom into the specific progress within math, you have some things that are a lot easier than others. So if we just think about IMO, which is old news at this point, right? It's kind of like two years ago that they're really, like, doing quite well. They would've gotten a gold in 2024 if not for the following reason. They were-- They're very good. They just, like, cold solved geometry, basically. And the IMO has these four categories of problems, there's geometry, number theory, algebra, and combinatorics. So, like, geometry just solves in like nineteen seconds in 2024, 'cause it's kind of a brute force solver. And the dirty secret is for students, there's also sort of a brute force way-
- DPDwarkesh Patel
Mm
- SPSpeaker
... that you kind of can go at it. Combinatorics is the one that's the wild card of much more, like, playful, puzzly-seeming problems. Um, and there were two combinatorics problems on that year's test. There's not always. There's four categories, six different problems, so it's kind of a, a toss-up which one is gonna have, um, two questions. Had it been more geometry questions, they would've gotten a gold that year.
- DPDwarkesh Patel
Mm-hmm.
- SPSpeaker
Um, but it struggles on those combinatorics ones. And, you know, someone who's trying to keep that torch of the last holdout of, like, math, uh, for humanity might say, "Well, you know, those are the ones that require the more creativity." Even then, though, I think the spirit of your question on, like, if they're solving, you know, a Millennium Prize problem, does that also, uh, service a lot of white-collar work? It suggests that whatever the rate limiter is between where we are now and that is the same as the rate limiter for making things better at white-collar work. And we can maybe, like, paint a couple different ways that, like, if we focus on, I don't know, Riemann hypothesis, like what would it look like to solve that? Um, one possibility would be these things are extremely good at a specific domain of knowledge and just knowing it very deeply, and then knowing another domain, and knowing another domain. And you've pointed this out. It's like bizarre to have something with this, um, superhuman breadth that, like, knows all the fields so well that's not just finding those lightning bolts-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... that connect them. Um, I think we're starting to see sparks of that, of, uh, like, actually finding connection between the things that it's an expert at. I'm sure we'll talk about it. If the nature of the solution to the Riemann hypothesis was something like that, that feels pretty distinct to me from, uh, what's necessary to get good at white-collar work. Um, and there's a reason to believe actually that that might be the nature of the solution. I don't know if you know the story of, um, like Hugh Montgomery and Freeman Dyson at the IAS, like-
- DPDwarkesh Patel
No.
- SPSpeaker
This is, this is a side tangent, but it's just kind of a fun story on how, um, th- I don't know if it was over lunch or something like that. Basically, you have this number theorist who is pointing out, just trying to understand the statistical correlation between pairs of zeros of the Riemann zeta function.
- DPDwarkesh Patel
Yeah.
- SPSpeaker
So the Riemann hypothesis is all about, like, do all these zeros sit on a straight line? And he's finding this, like, this quantitative question you could ask about, and he writes down a formula. It looks like one over sine squared or something like that. Freeman Dyson, a physicist, is like, "I know that expression. That expression comes up in studying the eigenvalues for random Hermitian matrices," which was something that comes up in studying the energy levels of, um, of, uh, like a nucleus. And the idea that the statistics of those two seemingly different things were the same sort of prompted a, a potential exploration on, hey, are there aspects of random matrix theory that might be relevant to, like, Riemann zeta function? And, uh, I think it's a little bit of an open question, like is there fruit to be had there? But that kind of bridging together from two different fields, like if it turned out that the solution to, uh, the Riemann hypothesis was exploring an idea like that even further, that has this character of kind of how you expect LLMs to be good at math. It's like-
- DPDwarkesh Patel
Right
- SPSpeaker
... they're an expert at the quantum physics, they're an expert at the analytic number theory. They should be able to see that similarity in a way that doesn't require, like, Montgomery and Dyson to be having lunch and, like, happening to talk about that. That's totally different from white-collar work, right? In terms of, like, the, the extent to which you maybe have a hard time using an AI as an editor.
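For reference, the expression in the Montgomery and Dyson story is usually written in the form below, rather than as "one over sine squared". This is a sketch of the standard statement of Montgomery's pair correlation conjecture, not a quotation from the conversation: the names R_2 and u are notational assumptions, with u standing for the gap between two zeros of the zeta function measured in units of the average gap. The same function describes the pair correlation of eigenvalues of large random Hermitian matrices, which is the coincidence Dyson recognized.

```latex
\[
  R_2(u) \;=\; 1 - \left( \frac{\sin(\pi u)}{\pi u} \right)^{2}
\]
```

Since R_2(u) tends to 0 as u tends to 0, zeros that sit very close together are rare, the same "repulsion" seen between neighboring eigenvalues.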
- DPDwarkesh Patel
Yeah.
- SPSpeaker
It's not because they know everything-
- DPDwarkesh Patel
[laughs]
- SPSpeaker
... and you just need them to find that lightning bolt-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... in between. Different possibility would be, um... What's the right analogy? Maybe, like, if we think of Fermat's last theorem between the moment of Fermat phrasing the question and then what the solution itself looks like, where ultimately the solution involves such heavy machinery in math, right? So the beauty of that problem is you can phrase it so simply. You ask about, you know, x to the n plus y to the n equals z to the n. Do you have integer solutions for this when, when n is bigger than two? And it's, um- It's something you might expect there to be an elementary number theory approach to it, but just as far as we can tell, there's just not. Whereas the actual solution, you know, maybe there is something simpler, but this might be the, the, uh, what it has to be. There's such a complicated set of ideas that build on, like, centuries of work, uh, centered around elliptic curves, and then this other, like, mountain of ideas centered around these things called modular forms. And like, both of those mountains have to be built before you can ask the right question that connects it. So if the solution to the Riemann hypothesis involved building a new mountain, like, that's a kind of skill, like, the, the ability to, like, come up with the right new ideas that feels sufficiently different from, like, the character of how they're intelligent right now that it's not like that's what you need from your hired video editor per se, but that, like, if it's capable of building mountains, uh, that are, you know, the correct new theory that, like, crystallizes how we should be thinking about a subject, that's just such a level of intelligence that then it starts to feel like it would be surprising if that didn't permeate into other aspects of the economy besides, like, just the mountain building for math itself.
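Written out, the statement being described is short. The form below is the classical one, which covers every integer exponent greater than 2 and asks for positive integer solutions; the symbols simply follow the spoken description.

```latex
\[
  x^{n} + y^{n} = z^{n}
  \quad \text{has no solutions in positive integers } x, y, z
  \text{ for any integer } n > 2 .
\]
```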
- DPDwarkesh Patel
Yeah. Or at the very least, even if it couldn't, like, literally do every single thing white collar humans can do-
- SPSpeaker
Yeah
- DPDwarkesh Patel
... it would just have transformative effects in the way that getting gold in the IMO did not have transformative effects-
- SPSpeaker
Yeah
- DPDwarkesh Patel
... on the world. First of all, I do wanna point out that I'm totally moving the goalpost here.
- SPSpeaker
[laughs]
- DPDwarkesh Patel
Because when I interviewed Dario about two, three years ago-
- SPSpeaker
Yeah
- 11:32 – 26:12
The verification loop on conceptual breakthroughs can be a century long
- DPDwarkesh Patel
Right. And so there, there, it's very interesting. The kinds of things you can't make benchmarks for are also the kinds of things, at least in the current paradigm, you can't easily train for.
- SPSpeaker
Yes.
- DPDwarkesh Patel
Right? Because there, there's really no fundamental difference between a benchmark and, um, a training environment.
- SPSpeaker
Yes.
- DPDwarkesh Patel
I think it's very easy to come up with some dichotomy of, like, here's a deep reason why AI can't do a certain thing, and then it turns out, well, you're just thinking about it the wrong way, and actually I can do it pretty soon thereafter. But I'm gonna come up with- [laughs]
- SPSpeaker
[laughs] You're gonna come up with a couple anyway? [laughs]
- DPDwarkesh Patel
And I think that this, this will probably, uh, uh, it'll probably turn out that there's ways in which the, we can train AIs to do these kinds of things in the relatively near term. But it seems like it would have to be different from current RLVR training. So the thing I'm curious about, and the thing it seems to me that drives a lot of the big progress in mathematics and in science generally, is, like, coming up with a new way to think about, um, a problem or the new way to understand the world that then- ... unifies different fields, spawns entire new fields, um, solves problems we weren't even thinking we were trying to solve in the first place. Like, the reason, um, Einstein was thinking about GR is not because he wanted to explain why light bends or why black holes exist. These are phenomena he didn't even know needed to be explained in the first place. But in mathematics, it often seem-- Okay, a total outsider, I don't even know the details of what I'm talking about here. From the outside, um, it seems like there's often ways to, say, prove a specific problem that can motivate a new conceptualization, uh, one which results in a whole new field, a whole new way of thinking, which is immensely productive, and one which doesn't. I think, um, I, I'd be curious to hear you talk about whether, uh, Galois coming up with group theory and distinguishing his, like-
- SPSpeaker
Sure
- DPDwarkesh Patel
... solution to the, um, the quintic having no formula for the roots and Abel coming up with a different proof a few years earlier that didn't come up with group theory. But then if you wanted to do a verification loop on, like, is group theory an interesting concept that was like, was something useful done here? Why is this proof better? Potentially, that verification loop is a hundred years long-
- SPSpeaker
Yeah
- DPDwarkesh Patel
... and it involves, um, the cryptography coming around and physics making progress and all of the ideas in group theory being relevant and understanding, like, symmetries in physics and all those kinds of things. There's like a hundred-year verification loop of why is this a productive concept in the first place. Um-
- SPSpeaker
Yeah. Um, boy. Uh, y- yeah, you, you struck a nerve because I, I had this, like, project about Galois I was gonna do in 2022 that I put on the shelf, but I spent like a year of my life, like, thinking a lot about what he did. So, uh, there's a risk of me accidentally talking too long on the specifics [laughs] you can hold me back on. It's, um, it's a, it's a perfect example for your case, because describing why it was a valuable insight, um, does not come from immediate utility. And so as, uh, it, certainly if you're thinking about RLVR environments, it's like, okay, this is gonna be really hard to do. But it's interesting to note how even with, like, human verifiers at the time, like, it took a really long time to recognize it as being useful.
- DPDwarkesh Patel
Mm.
- SPSpeaker
Like, I think Einstein with GR, people sort of felt.
- DPDwarkesh Patel
Yeah, yeah.
- SPSpeaker
You can, like, feel, this feels like a good theory right away. Like the, what makes the Galois theory such an interesting example is you have, um, literally this one-hundred-year segment of, like, an idea that, like, flows through many different people's heads before it, like, settles into something that the math community, like, agrees is good. So to back up a little bit, I don't-- Do, I mean, do you want the background-
- DPDwarkesh Patel
Yes, yes. [laughs]
- SPSpeaker
... on the problem at all? All right. Uh, well, so we all learn about the quadratic formula in school. Um-
- DPDwarkesh Patel
I thought you were gonna say, "We all learn about group theory in school."
- SPSpeaker
We all learn, we all-
- DPDwarkesh Patel
[laughs] I missed that class.
- SPSpeaker
We all learn about group theory. About quadratic formula. Um, so, so this was, this was known, in some sense, like, Greeks could solve quadratics, but they didn't really write things in algebra, and so it's, it's really more like the Arabs that, like, wrote down, like a, uh, that formula. There's this delightful story around some, like, dueling Italian mathematicians. Not real duels, just like, um, like intellectual challenges, who, uh, like, secretively found a formula for the cubic. Um, and then, uh, very shortly thereafter found a formula for degree four polynomials. So a natural open question for, like, mathematicians is, um, can you find a formula that solves degree five equations? Now, the nature-- The degree four, it's monsters. It's like, it's a, it, it would be wild to write it down. You usually don't really write it down in full. You break it up as, like, a procedural thing. Um, so you might believe these things have this exponentially increasing complexity. So many hundreds of years, nobody is, like, really answering that question. Usually, we say Abel was the first to prove it. Um, he was this young, precocious Norwegian mathematician, and he showed it's, it's simply impossible. It's not that you can find a quintic formula. He thought he found one, but he showed it's impossible. I think the real credit, though, like, you have to back up a little bit and talk about Lagrange, where Lagrange found the right kind of question to ask about this. Um, I can go into the details if you want, but I'll give it a very high level. He, he, he was studying the question, and he recognized being able to solve these polynomials is actually very related to understanding, like, the way that certain algebraic expressions are, like, symmetric, like, more or less so. Like, if I write down A plus B plus C plus D, just like adding four variables, if I permute those, it doesn't change the value of the expression.
Whereas if I write, like, A plus B multiplied by C plus D, some of the permutations don't change it, but some of them do. And he had this really, really nice insight about how if you can find expressions like this that have, like, four free variables, but all the permutations take on three distinct values, that had this unexpected relationship with being able to reduce degree four into degree three.
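The counting in this example can be checked by brute force. What follows is a minimal sketch, not anything from Lagrange or from the conversation: the helper name `count_distinct_values` and the sample values 2, 3, 5, 7 are assumptions chosen for illustration. It plugs every ordering of four numbers into an expression and counts how many different results appear.

```python
from itertools import permutations


def count_distinct_values(expression, sample_values):
    """Count the different results `expression` gives over all orderings of its inputs.

    `sample_values` is a hypothetical choice of test numbers; they should be
    generic enough that two genuinely different arrangements do not collide.
    """
    return len({expression(*ordering) for ordering in permutations(sample_values)})


# Illustrative inputs (an assumption, not from the source): four distinct primes.
SAMPLE = (2, 3, 5, 7)

# A + B + C + D: unchanged by every one of the 24 permutations.
fully_symmetric = count_distinct_values(lambda a, b, c, d: a + b + c + d, SAMPLE)

# (A + B) * (C + D): only the way the variables split into two pairs matters.
partly_symmetric = count_distinct_values(lambda a, b, c, d: (a + b) * (c + d), SAMPLE)

print(fully_symmetric)   # 1
print(partly_symmetric)  # 3
```

Four variables split into two unordered pairs in exactly three ways, which is why the second count is 3. A numeric check like this can undercount if the sample values happen to collide, so it illustrates the idea rather than proving it.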
- DPDwarkesh Patel
Mm.
- SPSpeaker
So he started approaching the, like, can we find a quintic, um, polynomial by saying, "Hmm, I wonder if I can extend that." Um, and to extend that method, you would have to have an expression that has five free variables such that as you permute them over all the five factorial permutations, it takes on only four values or fewer. So that's a, like, you could put that in a puzzle book. You could put that in a brain teaser that, like, a 12-year-old could engage with. Um, and it's, it's not too hard to, like, find yourself feeling like that's an impossible task. And so Lagrange is sitting here saying, "Hmm, here's a strategy that I'm trying to solve this problem. Can I find a quintic polynomial?" This strategy doesn't-- It seems like it might be impossible, at least from this strategy. But that was the first time in history that people had the instinct that some kind of question about symmetry was the right way to be studying these polynomials. In his mind, it was just a way. It had yet to be discovered that, like, actually there's a tighter connection. And also, like, maybe rather than searching for the formula, we should be asking the opposite question. Can you prove that it's impossible? So he sort of planted that seed.
- DPDwarkesh Patel
Mm.
- SPSpeaker
Like, around 50 years later, Abel definitely read Lagrange and was influenced by it. Galois, we know that he loved Lagrange when he was, like, falling in love with math. And so it's very hard to imagine that, like, these two young geniuses-
- DPDwarkesh Patel
Mm
- SPSpeaker
... the fact that they both come up with, like, pretty similar insights around that problem, it's not, like, born from Lagrange. But to your question on, like, are you, are you able to verify that this was a good idea? There, there wasn't any, like, result that Lagrange came to. There's never, like, he solved the problem and therefore we know that that was, like, the right question to ask. He asked it. There's some, like, intrinsically interesting thing. It also wasn't very important for math at the time. Like, most people were more interested in, like, the applications to physics. This is almost in that, like, side, almost recreational hobbyist type thing. Like Abel, you know, he started working on quintic stuff, but then he was advised to, uh, spend more of his effort studying elliptic functions, and so more of his work was on that before he, he died young. He, he died at 26 from tuberculosis. And then Galois, um, he, he- Pushed both of those ideas, like, to the, in the right direction, where he really understood, um, the nature of abstraction. And so he had this really nice piece that he wrote while he was in prison, actually. He was like ... We could talk all about his life story. It's pretty wild. But he, he's like this te- teenager. He's in prison. He had tried to submit his math papers, and they had been rejected. So again, it's like verifiable reward. The, like, verifier function that is the academy at that time-
- DPDwarkesh Patel
Right
- SPSpeaker
... is rejecting what, what he wrote.
- 26:12 – 38:08
Will we understand an AI proof of the Riemann hypothesis?
- DPDwarkesh Patel
So people have this worry about mathematics in particular that, you know, the, the AIs will prove the Riemann hypothesis and our understanding of mathematics won't be any the better for it. I have a couple questions about this. The first one is whether this is, like, a thing you should expect.
- SPSpeaker
Mm.
- DPDwarkesh Patel
Like, isn't the reason humans come up with general, natural, uh, objects and sub-goals and whatever when we're working on a big problem is that it's just, like, p- useful when you're trying to work on a complicated, important problem. And w- so we, we can just think about, like, theoretically, is it, would this even be a simpler way to solve the Riemann hypothesis as opposed to just coming up with the natural abstractions that are relevant to thinking about the problem? And then two, empirically, is this what we observe when AIs do and make progress on problems today? When the, um, when the AI came up with that counterexample to the unit distance problem conjecture, uh, you can just read its chain of thought, and it seems... It's not understandable to me, 'cause I don't know anything about mathematics, but [laughs] it seems to other mathematicians it was, like, understandable, and it made s- it made use of, like, known concepts of mathematics and, like, proved relationships between them and all in natural language, and as a result, accelerated our understanding of the connection between, um, this object and this conjecture. So is this even a th- like, e- empirically, are, is this a thing we should be worried about?
- SPSpeaker
I think it depends on the nature of, yeah. Like, a- again, if we sort of break down, like, the three possible ways of, um, like, solving the Riemann hypothesis, that one, and o- the other, like, big one from this year was the, like, a certain Erdos problem numbered, like, 1196, but it's a, it, about these things called primitive sets. But basically, it had that character of bringing an idea from a seemingly different field. As soon as you just present the basic idea to a mathematician, you say, like, "Hmm, what if we, like, use this, uh... like, try the Markov chain process where we, uh, show that this thing is one from the bottom up probabilistically rather than the top down, and, like, use the von Mangoldt function?" If you, like, say that to someone in the know, they'd be like, they'd kinda know how to run with it. So you have this very, like, small idea that has the form of expertise in one field, expertise in another, draw a little lightning bolt between them. Like, those are, those are gonna be very human parsable, right? 'Cause all you have to do is just, like, show the start and end point of the, what those connections are. If the character of it is mountain building, uh, you do have to, you have to put in a lot more time to, like, understand that new mountain w- that was built, because it's, like, a new thread that's not just, like, lightning bolt between them. And then if the nature of the progress was just, like, raw hustle, right? It's just, like, this just super long thing that's no new theories, but it's just, like, long, long, long chain of reasoning answer. Then, then you would have that worry of, like, okay, there's this whole digestion process. So I don't think there's one clear answer. I think it depends on what the s- what the, like, solution there would look like. And on the mountain building side, that would actually be really interesting to see. 
Like, is it by default a very human understandable, like, the way that we, like, see new theories, um, from, like, great mathematicians? Or is it, like, a, like, an alien different kind of mountain being built where we even have to, like, reprocess the kinds of abstractions that we, we engage with?
- DPDwarkesh Patel
Right.
- SPSpeaker
Well, the closest example here would be, like, the, you know, the, the attempted solution of the ABC conjecture that was, um... Uh, we maybe shouldn't get into that one, but the, it, it's prob- it just is not, probably not a correct solution, but basically it's this, like, whole new way of thinking that this, um, otherwise reputable mathematician in Japan had, like, come up with. And it just took mathematicians, like, a long, long time to even parse what he was saying. But it had the feeling of just, like, an alien bit of mathematics that's theory-building. It's not just-
- DPDwarkesh Patel
Mm
- SPSpeaker
... like, long cha- long chain of reasoning. Um, it's like, he called it, like, inter-universal geometry or something.
- DPDwarkesh Patel
[laughs]
- SPSpeaker
And so the fear that you would have is that, like, the AI, like, does that. The, the biggest fear would be that it does that, and then much like the ABC conjecture, like, people work for years to go up the mountain, and they're like, "Dang it. This just isn't right." [laughs] Right? And, like, if there, if it turns out to be wrong, but it, like, really looked right. But even if it was right, there's, there's just a lot of effort to, like, hike up a new mountain.
- DPDwarkesh Patel
Yeah. I- if we end up in that situation, David Be- uh, Bessis had a re- really great blog post called, um, The Fall of the Theorem Economy.
- SPSpeaker
Hmm.
- DPDwarkesh Patel
Where he's talking about this, um, you know, uh, hi- historically, there, as you were saying, mathematics is coming up with these definitions and problems, and it's about proving theories, theorems about them. And that, um, really the theorem-proving stuff is what gets all the credit, but it's, like, really a parasite on the coming up with the definition stuff.
- SPSpeaker
Yeah.
- DPDwarkesh Patel
And historically this has not been a problem in terms of credit apportionment, because if you come up with a definition, you're probably gonna be the guy who, who comes up with a theorem. But now we're in a situation where, um, if the valuable work is the, the coming up with the insight, and then AI just automates the latter part, the, it... So, okay, imagine a scenario where we have, uh, AI comes up with, like, the Abel-like direct arguments about a bunch of important conjectures in the world, and then we just have these proofs. And now it's up to humans or to future AIs to then consolidate. I mean, I'm sure if you had acc- again, having no object-level understanding-
- SPSpeaker
[laughs]
- DPDwarkesh Patel
... of this argument whatsoever, I'm sure if you had access to it, it would make it easier for you to then think about, like, well, what is going on here? Is th- is there some deeper way in which we can understand how, why this proof works that would make it easier to come up with the ideas behind group theory?
- SPSpeaker
Yeah. I think it would, it would be hugely helpful, right? Like, uh, 'cause I mean, so much of, like, trying to discover new math is, like, like mostly being wrong, right? You're, like, trying to solve a problem. It, like, what it, it do- it doesn't feel like constantly taking the correct step up the mountain.
- DPDwarkesh Patel
Right.
- SPSpeaker
Like, mostly it feels like a random drunken walk where you're, like, doing a thing and then, oh, you're wrong-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... and, like, constantly discovering wrong. So if at the very least you know that trying to digest what you know is ultimately leading to, like, a correct solution, like, that feels like progress simply because it's pr- it's providing, like, a, a sense of knowing that it leads to a solution. And there's plenty of, plenty of, like, instances in the recent history of math where it feels like the reach has sort of exceeded the grasp, where there's things that are proven, like, long before they're understood. And, uh, I mean, one of my favorite, uh, like, openings to a, a paper, it's not even, like, a research paper, it's more like an expository one, is from this, um, a mathematician named Timothy Chow, who was trying to understand a concept called forcing. And so there's this problem called the continuum hypothesis that more or less asks, um, like, you have a size of infinity for the natural numbers, you have a size of infinity for the real numbers. Is there something in between?
- DPDwarkesh Patel
Mm.
- SPSpeaker
And the answer is both yes and no. It depends on your axioms. Like, it's sort of outside the scope of, of our usual axiom systems, which is an interesting answer. But the method to, to, um, describe it is just really, really hard to understand. It's the thing called forcing. And in the beginning of this paper he, he writes, like, "I, I wanna..." Like, everyone knows the idea of an unsolved research problem. Like, I wanna propose the idea of an unsolved expository problem-
- DPDwarkesh Patel
Oh
- SPSpeaker
... where like, sure, we've proven it, but we don't really know why it's true.
- DPDwarkesh Patel
Mm.
- SPSpeaker
And so then he proposes, like, a partial solution to that expository problem. You can imagine why I loved that framing, 'cause, like, this is my whole life. [laughs]
- DPDwarkesh Patel
[laughs]
- SPSpeaker
It's like I don't do research math. It's just, it's just wholly about, like, what's the most clear way to understand this, um, even if it's proven? Just, like, there is a difference between proof and explanation. And so on that side, I think that you are basically, like, getting to the, the importance of that distinction.
- DPDwarkesh Patel
Yeah. And that, that will be the main incentive for... Or the incentive would have to change in not just mathematics, but in other areas of science from, um, proving things about the world to consolidating proofs into problems or higher level insights. But we were having a discussion earlier at, at lunch about, like, uh, a recent talk you were giving about, you know, design and how it, um, helps us understand things. And then in the limit, is there really a difference between the conceptualization for an idea and the idea itself? So, you know, if, if you think about special relativity and, like, space-time diagrams, um, and Mi- Minkowski space-time, is it like, yeah, this is, like, a way in which we illustrate this idea of, like, why there's length contraction and time dilation, but is that like... Is it, like, that is the reality. So the exposition does seem to be, like, the explanation in some sense here.
- 38:08 – 53:48
Can AI find the hidden bridges between fields?
- DPDwarkesh Patel
I, I wanna go back to this question of, uh, earlier I was-- we were sort of just as AI has crossed this threshold, this important benchmark of being able to connect existing ideas to come up with a new discovery or, uh, pr-prove or disprove something, just as it's crossed this threshold, we're like, "Okay, but what, what's the next thing?" Um, I wanna just, um-
- SPSpeaker
There's a lot more to do on that one, by the-- like just because a couple lightning bolts have been str- I still-- I, I think there's like this flourishing future over the next couple years of like really connecting.
- DPDwarkesh Patel
Right.
- SPSpeaker
Yeah.
- DPDwarkesh Patel
And, and so in the limit, you could even say, um, I, I don't know if this is accurate to say, but potentially a lot of maybe the biggest breakthroughs ha- like look like this at some level. It's just, um, general relativity. Oh, I, I, you-- like you're just, you're just connecting together like Riemannian geometry and special relativity, right? And so as AIs keep getting better and better at this connection thing, maybe a lot of big breakthroughs are not really of a different qualitative nature. I don't know if you have a take on that.
- SPSpeaker
Well, I mean, a lot of the conversation focus has been on problem-solving and that nature of math. You know, like ticking off Erdos problems or something.
- DPDwarkesh Patel
Yeah.
- SPSpeaker
Um, I would say it's not even a majority of mathematicians who would maybe characterize their work as like really targeting the next problem to take down. Uh, are you familiar with like the Langlands program?
- DPDwarkesh Patel
No.
- SPSpeaker
Um, ah, okay. So th-this is like-- it's not even a field of math so much as it is, uh, like a, like a research ethos where Fermat's Last Theorem is one i-inkling of this, and you had like these two different seemingly, uh, disparate things, and a connection between them like led to a solution. Um, so, uh, Langlands was a mathematician. He has this like famous letter now essentially spelling out how it seems likely that there's a lot more connections like that, and e-even got like a little bit more specific about the nature of c- the connections, such that you might imagine this like large map, and you've got this like valley over here and this mountain over here and this like set of plains over there. And there's a lot of mathematicians who, who would characterize their work as being part of like trying to understand the threads-
- DPDwarkesh Patel
Mm.
- SPSpeaker
Like on this map. And the progress there, it's not even like, here's this one specific problem that we know will be solved by that connection. It's more that there's been enough time and time again cases where big problems were knocked down by finding connections, that it's almost preemptively finding the connections. And so you could have-
- DPDwarkesh Patel
Interesting.
- SPSpeaker
Yeah. It's, it's, it's actually very interesting. Like this, um... [laughs] Anytime you run into a mathematician, it is like ask them whe-whether, you know, the, the character of their work is more akin to like Langlands' program or if it's more akin to like targeting one particular problem.
- DPDwarkesh Patel
Mm.
- SPSpeaker
Right? And you get a, a certain like bifurcated split there. But the, uh, the possibility of AIs being supercharged connectors feels like it might be, you know, an amplifying tool in that pursuit. It's hard to measure though, right? Like, 'cause this cuts to what we were saying earlier. How do you, how do you assign a score to say like, "Yes, you've done it"? Um, if it's, if it's knocking down a problem, you have a clear way of saying, "Yes, you've done it." You can write the headline. You can have your like PR move as the AI company to say we did it. Whereas like if it feels like that was the right connection drawn, you can like, you can write theorems around it, and this is the nature of what the papers in that, that field look like. But I think it, I, I think it will require a lot more like human in the loop to basically like say, "What was it, uh, like the kind of connection that we're going for?"
- DPDwarkesh Patel
Mm.
- SPSpeaker
Um, but that's my guess on what most of the useful progress, uh, from these models will look like, like in the next five years, is just really filling in that landscape of like connections that you can draw if you're an expert in multiple fields. Like you've pointed out, it's kinda surprising we haven't already had this.
- DPDwarkesh Patel
Right.
- SPSpeaker
And what I'd be curious-- Like I would be curious to know at a technical level what causes the unlock there, because on the one end, you can kind of paint an explanation in your head for why you could be an expert in all of these things and not be drawing those connections, which is when the thing is reasoning, like the method of reasoning is this, um, auto-regressive chain of thought phenomenon. Auto-regression is actually like a really, really weird way to, uh, produce stuff-
- DPDwarkesh Patel
Mm.
- SPSpeaker
I think if, if, if you think about it. Like, like you're an intelligent person. Imagine I've locked you in a box, right? And then the, the only way that you have of interacting with the world is that you receive a slip of paper, and then someone says, "Can you like predict what will come next?" Right? And then you predict what will come next, and then your memory's wiped, right?
- DPDwarkesh Patel
Yeah.
- SPSpeaker
And then you get like another slip of paper and you go, um... Imagine that was done a whole bunch, and then what comes out on the other end, they're like, "Look at this essay that you wrote." You might look at that and be like, "This is awful. That's not the essay that I would've written."
- DPDwarkesh Patel
[laughs]
- SPSpeaker
Right? 'Cause like the process of like repeatedly like predicting something is just pretty different from how you would think as a writer to like compose it and think it through and everything. Um, and in particular what would probably happen is you're sort of a slave to your context, where, uh, you might be answering some question about some particular field, and so you like draw on all the context around that and you're going there. The, the connection that actually is where all the substance is gonna come from is like by its nature a very like unlikely one. And, you know, you can do all the RL that you want to try to like get better in some way, but like what's the thing that's specifically up-weighting and incentivizing making these unlikely connections when the vast majority of them like aren't the predictable, you know-
- DPDwarkesh Patel
Right.
- SPSpeaker
Next token that would come in there. And so it's like- It might be the case that you just have this intelligence that's sort of locked in there inside that box, but it's just a weird way of interacting with it. So the thing I'm curious about is, like, do you ever get any fruit by just, like, questioning the premise of how tokens are generated, like, every now and then in some way? Right. And I don't think it would be as simple as you, like, manipulate the temperature or something like that, but, like, are there any things that you can do that take, like, the existing level of intelligence but, like, find the right ways of sparking those connections, uh, that, like, unlocks these sorts of things that we're seeing? Or do you need just a, a little bit more intelligence such that at the level of prediction, it's kind of predicting that it should be making that lightning bolt to another field?
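Editor's illustration (not part of the spoken transcript): the "slip of paper" picture of autoregression, and the remark that turning up the temperature is probably not enough, can be made concrete with a toy sampler. This is a minimal sketch under loud assumptions: the `LOGITS` table, the tiny vocabulary, and the function names are all hypothetical, and a real language model conditions on the whole context rather than only the last token.

```python
import math
import random

# Hypothetical toy "model": next-token scores (logits) keyed by the last token only.
LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"proof": 2.0, "map": 1.0, "bridge": -1.0},
    "a": {"proof": 1.0, "bridge": 0.0},
    "proof": {"</s>": 2.0, "the": 0.0},
    "map": {"</s>": 2.0},
    "bridge": {"</s>": 2.0},
}


def next_token_probs(context, temperature=1.0):
    """Softmax over the toy logits for the last token in the context."""
    logits = LOGITS[context[-1]]
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(temperature=1.0, max_tokens=10, seed=0):
    """One slip of paper at a time: predict, append, repeat."""
    rng = random.Random(seed)
    context = ["<s>"]
    for _ in range(max_tokens):
        probs = next_token_probs(context, temperature)
        tokens = list(probs)
        choice = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        if choice == "</s>":
            break
        context.append(choice)
    return context[1:]  # drop the start marker


if __name__ == "__main__":
    for temp in (0.5, 1.0, 2.0):
        p = next_token_probs(["the"], temp)["bridge"]
        print(f"temperature={temp}: P('bridge' | 'the') = {p:.3f}")
```

In this toy setup, raising the temperature does make the unlikely token "bridge" more probable, but it flattens the whole distribution at once. Nothing in the sampler knows which unlikely continuation is the valuable one, which is the gap the speaker is pointing at.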
- DPDwarkesh Patel
I, I think it's more productive to reason instead of architecture or even loss function-
- SPSpeaker
Yeah
- 53:48 – 1:07:07
Why real-world tasks don’t fit into RL environments
- DPDwarkesh Patel
Obviously, AI for math is making a lot faster progress than everything else, and people point to verifiability of the domain as the key reason this is happening. I think that's one of the two important reasons, but I don't think... I, I think people really neglect the other one. And, um, I'm, I'm outside the labs. I don't know what's actually going on. But this, this is, like, you know, this totally naive, uh, um, theory. Okay, a, a tangential question to why AI is making so much progress in math. Why has it been so slow at computer use?
- SPSpeaker
Hmm.
- DPDwarkesh Patel
Which is what you would... You know, computer use is actually very verifiable. It's like, you know, is my Etsy package coming?
- SPSpeaker
[laughs]
- DPDwarkesh Patel
Or, like, is my event booked, you know? Whatever. These are extremely verifiable things to survey. What computer use lacks is grindability.
- SPSpeaker
Hmm.
- DPDwarkesh Patel
Um, so because websites have, like, bot detectors and also it takes a tremendous amount of compute to run parallel rollouts, it's very hard to just run, like, 1,000 parallel rollouts of the same checkout flow on Amazon, um, because you'll get, like, shut down by Andy Jassy, right? And so you can-
- SPSpeaker
Him personally. [laughs]
- DPDwarkesh Patel
Yeah. [laughs]
- SPSpeaker
Presses the, like, red X on Dwarkesh button.
- DPDwarkesh Patel
Exactly. [laughs] And so you could try to build clones of every single website. This is very labor-intensive and slows you down. So, and the reason you, by the way, you need to do so many parallel rollouts in order to learn a skill currently with, um, deep learning is that we haven't solved sample efficiency.
- SPSpeaker
Sucking supervision through a straw-
- DPDwarkesh Patel
Yeah, exactly
- SPSpeaker
... like Karpathy says.
- DPDwarkesh Patel
And, uh, of course, people are working on many different techniques, but fundamentally, there's this big problem and there, there's this big constraint in the way we train AIs that we just... With code also, you can containerize a given, uh, level of progress in a repository and then just- Pair, spin out thousands of parallel containers or hundreds of parallel containers and say, like, "Try to implement this feature." And it's totally deterministic, and because it's deterministic, you can solve the credit assignment problem because you know that whatever caused this rollout to succeed and this one to fail, the diff is the thing that, like, worked. And y- this way you solve the credit assignment problem. If you have situations that are starting off at different starting points, it, this credit assignment problem becomes much harder to solve. But most of the mo- things in the real world are just very hard to containerize in the same way. Like, coding and math are, um, exceptions to this rule. But if you're just trying to figure out, how do I build a new business that succeeds? How do I, like, go trade in the markets for a day and, like, make money? You can't, like... The fact that you have to interact with the real world-
- SPSpeaker
Mm-hmm
- DPDwarkesh Patel
... and, like, things change day after day means that you can't keep replaying and grinding and farming the simulator. But the, the math, of course, is the exception, and I, I feel like this is actually an important driver of progress in this domain and also in, um, in, uh, in coding. Um, it's not just verifiability. It has to be grindable. Uh, the third reason that people point out that AI is making fast progress is they focus a lot on Lean and formalization. Again, I have literally no idea what's going on in the labs.
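Editor's illustration (not part of the spoken transcript): the grindability argument, many rollouts from one frozen starting state with an automatic check at the end, can be sketched in a few lines. Everything here is a hypothetical stand-in: `SNAPSHOT` plays the role of a containerized repository, `CANDIDATE_ACTIONS` the role of attempted diffs, and `verify` the role of a test suite. It is not a description of how any lab actually trains.

```python
import random

# Hypothetical frozen starting state: every rollout begins from this exact snapshot.
SNAPSHOT = {"target": 42}

# Hypothetical candidate "diffs" a policy might try.
CANDIDATE_ACTIONS = [40, 41, 42, 43]


def verify(state, action):
    """Deterministic, automatic check: 1 if the attempt succeeds, else 0."""
    return 1 if action == state["target"] else 0


def run_rollouts(n, seed=0):
    """Replay the same task n times, each from a fresh copy of the snapshot."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        state = dict(SNAPSHOT)  # fresh copy, like spinning up a new container
        action = rng.choice(CANDIDATE_ACTIONS)
        results.append((action, verify(state, action)))
    return results


def success_rate_by_action(results):
    """Because the start state is identical, reward differences trace back to the action."""
    totals, wins = {}, {}
    for action, reward in results:
        totals[action] = totals.get(action, 0) + 1
        wins[action] = wins.get(action, 0) + reward
    return {action: wins[action] / totals[action] for action in totals}


rates = success_rate_by_action(run_rollouts(1000))
```

Since every rollout starts from the same state and the check is deterministic, the only thing that varies between a success and a failure is the action, so credit assignment is trivial in this sketch. A live website or a market offers no equivalent of `dict(SNAPSHOT)`, which is the contrast being drawn.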
- SPSpeaker
[laughs]
- DPDwarkesh Patel
I feel like Lean just doesn't matter that much for, like, the current level of progress in AI. Or, like, why is AI able to solve the unit distance problem? Well, they re- or, sorry, disprove the conjecture about the unit distance problem. They release the chain of thought, or at least the su- uh-
- SPSpeaker
Mm-hmm
- DPDwarkesh Patel
... a rewrite of the chain of thought. Didn't have any Lean in it. [laughs] I think it just, like, the, the process-based supervision that Lean provides, where you know each step is correct-
- SPSpeaker
Mm-hmm
- DPDwarkesh Patel
... seems, like, less relevant than just having this grindable outcome that is verifiable.
- SPSpeaker
That's an interesting point, like, grindability mattering more. I guess I will say on the, yeah, okay, so naively you might think Lean provides, um, something unique for math because you're able to see if it can prove it. You have old-school software that can tell you yes or no. You use that as your VR. I mean, what, so the, what would corroborate your point is the idea that, like, the initial attempts, again, I'll just circle back to IMO. It's like initially DeepMind basically does that. It's like everything in Lean, and then the next year it's all in natural language. So it's, to your point, not needed. I do, I think there is a, um, a yet-to-be-explored benefit of that formalization domain, which is at the moment you still need, you know, ultimately, like, a human is, is reviewing that, um, counterexample to the unit distance conjecture to say, "Looks good." And that, that provides a certain bound on how, like, endlessly explorable things are. Like, if you consider, like, AlphaGo, AlphaZero-style stuff where they're just, like, off in their own universe just, like, playing a bunch of Go and exploring themselves, just completely going potentially off the rails of what any human needs to look at, but they still have this automated verifiable reward. It's not just that, hey, you can do RL on that. It's also, you basically never have to check in, and you can just, like, pour compute at them, like, exploring the universe of Go. Um, what stands to be interesting, like, maybe this won't pan out, but I think the, the jury h- should still be out on, like, um, whether this will yield anything. With Lean, you could imagine having a basically endlessly running program that's constantly trying to extend Mathlib. So Mathlib, it's this GitHub repository that's basically, like, all of math, uh, written in code. It's very far from all of math, but they want it to be all of math written in code that you can ask, like, "Is this proof correct?"
It's very labor-intensive to write these proofs. There's, like, a whole subcommunity around it. Um, but you could imagine, what if you just had an AI th- where you say, "Simply try to extend Mathlib"? Maybe it's a fork of it so it's, that it doesn't have, you know, like, uh, trash in it because people, you know, people have certain taste for, for what they want to be in there. So you have, like, your fork of, like, the pure AI Mathlib, and it just goes, and it just, like, doesn't stop. It doesn't need anybody to check in on it, right? It could just keep going. Uh, it might come up with its own conjectures. It might come up with its own theories and, like, different definitions. Maybe many of them are useless, but it just has this infinite tree that it can, like, grow out. That's a very unique thing that math has that nothing else has, where you could press Go and then just, like, just, just pour compute at it and, like, look away for 10 years and then come back and say, like, "What do you have?" And there's, there's gonna be something, right? And then there's a question, is it useful or not? Like, how do you suss that out? That's just an interesting thing to be able to do.
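Editor's illustration (not part of the spoken transcript): for readers who have not seen Lean, this is roughly what "math written in code that you can ask, is this proof correct?" looks like at the smallest scale. It is a sketch in Lean 4 using only the core library. The theorem name `add_comm_example` is made up for this note, while `Nat.add_comm` is an existing core lemma. Real Mathlib entries are far more involved.

```lean
-- Lean 4, core library only (no Mathlib import needed for these toy statements).

-- A statement plus a proof term. Lean accepts the file only if the proof checks.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, verified by computation.
example : 2 + 2 = 4 := rfl
```

If either proof were wrong, Lean would reject it with an error rather than a judgment call, which is why an unattended program could in principle keep adding statements like these without a human reviewing each one.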
- DPDwarkesh Patel
Yeah, yeah, yeah.
- SPSpeaker
It would be very surprising if that didn't yield, like, some sort of interesting, uh, mathematical insight from it, right?
- DPDwarkesh Patel
That's, that's interesting, yeah.
- SPSpeaker
So I think, like, that's the real case for... Okay, there, there's, there's, like, two different ways that, like, Lean is important in this story. That's the first one of them, basically, is how it's like you could let go, not even check in, and progress will be made. You can do that with Go. I don't think you can do that with natural language math.
- DPDwarkesh Patel
Hmm. That's very interesting. Have, did you see Karpathy's auto research, uh-
- SPSpeaker
No
- 1:07:07 – 1:16:02
Good writing requires theory of mind that AI still lacks
- DPDwarkesh Patel
Uh, by the way, on the writing front, I actually have, I, I have a theory of why writing is making worse progress than these other domains. So I think one of, one of them is what you said, that they're bad at judging not only A versus B, but they get, like, v- just totally derailed by B star-
- SPSpeaker
Okay
- DPDwarkesh Patel
... which is this, like, shitty essay that just hits all the, um, all the wh- bells and whistles that, like, A is supposed to hit.
- SPSpeaker
Mm.
- DPDwarkesh Patel
And then so the reward hack thing just, like, totally goes off the rails. But I think the other important thing is that writing is not modular in the same way that code and math are. Like- You know, you can write a function many different ways, and they kind of do the same thing, and of course, you want it to be very clean and stuff. But like at the end of the day, it works, it works. Same with like lemmas in mathematics. And then, you know, you can like have some end product d- that is different from the way it is produced.
- SPSpeaker
Mm-hmm.
- DPDwarkesh Patel
So the code is the thing that produces some end product, and you are-- you want a functional end product. Um, whereas in writing, the end product is directly the thing the AI is producing, and each paragraph, sentence, word matters because that is a thing that is like-- like, that is the substance. It's not like some separate thing that is produced out of the writing. And so it, any... It's a-- It can't just be, it can't like be slop. It ha- the, uh, in the way that like code can be slop and still produce some out-outcome that you want.
- SPSpeaker
But you, but you were just pointing out how actually, um, we've gotten much better at agents writing not just functional code, but clean code.
- DPDwarkesh Patel
Yeah.
- SPSpeaker
Why is it not the case that the same progress that allows you to go from merely functional to like clean and, and like a mergeable PR-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... doesn't also result in, um, like clearer writing?
- DPDwarkesh Patel
Yeah. That's a good point. I mean, also, has it not? Like, I agree there's many ways in which they're, um, terrible writers. But for a lot of writing I consume, I find it's better to just copy-paste it into, uh, an LLM and just say like, "Explain this to me." The explanation will be better than the thing that is produced, uh, by the human. So it's funny that we say, like, "These are such terrible writers." And also my real preference is just like, "Can I just have an LLM explain it?" Even when I'm talking to a human expert like live on a call-
- SPSpeaker
Mm-hmm
- DPDwarkesh Patel
... um, if it's a piece of knowledge they have that only they have that's not encoded in, uh, the distribution, I want them to explain it to me. But then if in order to understand that, I need to understand a more basic concept, uh, I would prefer if it was socially acceptable for me to just be able to say, "Let's pause there. I'm just gonna ask, uh, an LLM how that works, and then we can come back to your, um, your, your special piece of knowledge."
- SPSpeaker
Well, it sounds-- I mean, that's distillation, right? An explanation. And so if, if you're, if I'm thinking of like quality of you as an essay writer, um, if it's that I give you a book to read and I want a book report, right, then I might believe that, okay, the LLM maybe gives me a better book report. Um, but I think what we, what people are really-
- DPDwarkesh Patel
Mm-hmm
- SPSpeaker
... getting at when they say it's better, right, like what is writing? It's not just distillation of pre-existing ideas. It's not just like how do you explain clearly, 'cause they are good explainers. It's like what is the insight?
- DPDwarkesh Patel
Yeah.
- SPSpeaker
And, and, and this is, this is where it gets like just autoregression is a very weird way to generate stuff because, um, like when you're writing, you sort of, you sort of know in order for it to be good, you have to have an element of the unpredictable. And it's, it's not just like increasing temperature in your mind or something, right? It's like knowing exactly the correct point when you want to make an unpredictable move, and that that's gonna be-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... what's more insightful. And so even if it's like better at explaining a pre-existing thing, it's like what generated that book that you wanted distilled in the first place?
- DPDwarkesh Patel
Yeah.
- SPSpeaker
Right? It wasn't, it wasn't an LLM that like generated it and you just needed it. It's like some author who, who through a lot of exploration of ideas in the world and then deciding what aspects of it were interesting and which ways of presenting it were like the, the coherent, um, well-motivated narrative. It's like they put that all together in some way. And, you know, if they're a good author, it's probably one that actually you would, uh, err on the side of reading their book instead of the distillation. But still, what makes it worthwhile to like explore at all in the first place, and you're uploading it at all, um, I think it's all of that side of it that's the, like when, when people will cite them being bad at writing.
- DPDwarkesh Patel
Yeah.
- SPSpeaker
And it's that element of unpredictability-
- DPDwarkesh Patel
That's right
- SPSpeaker
... of being deliberately, um, choosing something that's novel, that's like very directly contradictory to like the way that things are being produced.
- DPDwarkesh Patel
Yeah. That's a good point. They-- I think they're also really bad at building really good mental models-
- SPSpeaker
Mm
- 1:16:02 – 1:33:38
Why learning will still depend on human curation
- DPDwarkesh Patel
What advice do you have about, um, using LLMs to learn? I, I-- So as I was describing for a lot of well-known concepts, I find them very helpful and but often just a couple of further messages down and I'm trying to understand something and I just-- They're, they're so confused themselves, they're confusing me and they don't explain it the right way, and then I'm just-- I know that talking to the right human could clear up my confusion in three minutes. I don't know, and I feel like yet more and more we're gonna want to use these things as somebody who's talked a lot about education-
- SPSpeaker
Yeah
- DPDwarkesh Patel
... and, you know, representation stuff, we're gonna want to use these things to learn things. So, um, yeah, have you, have you noticed the ways to use them more productively to understand concepts?
- SPSpeaker
I'm curious to hear your take on this. I mean, I'll, I'll give mine. I, um, e- even pre-LLM I feel like a relevant insight in learning was, um, recognizing that like who matters more than what.
- DPDwarkesh Patel
Mm.
- SPSpeaker
So like advice to any college student when they're choosing what courses to take, uh, care a little bit less about your preexisting interests 'cause they're kind of arbitrary right now and care a little bit more about whether like the person teaching it is a good educator and someone-
- DPDwarkesh Patel
Mm
- SPSpeaker
... you resonate with. Um, I think in choosing what to read, like what books to read, like who the author is maybe matters more than if it's a prior interest. So if there's a book you've liked before, read what else that author has written rather than reading another thing on that subject. Um, on-- And I, I'm getting to like LLMs on this. So like there's a, there's a difference in feel for trying to learn something if you look at a Wikipedia page of it versus if you look at, let's say like it's a philosophy topic and you go to the Stanford Encyclopedia of Philosophy, or if it's a math topic, you go to the like Princeton Compendium of Math, where the, uh, the difference there is like the articles are deliberately written by one individual-
- DPDwarkesh Patel
Mm
- SPSpeaker
... who, uh, like tries to actually craft a motivation-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... around it and everything. Whereas Wikipedia, it's this like, um-
- DPDwarkesh Patel
Mm
- SPSpeaker
... local minimum that's reached where basically every sentence has to be correct. And I think a good exposition, you care a little bit less about like correctness on the way, but you can like deliberately craft things that are a little bit wrong that you correct along the way-
- DPDwarkesh Patel
Yeah
- SPSpeaker
... that gets like edited out in a crowdsourced environment. So like that-- LLM explanations feel to me at the moment a lot like Wikipedia.
- DPDwarkesh Patel
Yeah.
- SPSpeaker
Which is to say, amazing, right? Like imagine a world before Wikipedia, like how, how long it would take to like find and like suss in and everything. But nevertheless, what's the most useful part of a Wikipedia page? It's often just the references at the bottom, right? You look at the like key references and you go to them and you read them and it's like actually sometimes that gives a much like-
- DPDwarkesh Patel
Yes
- SPSpeaker
... better overview of it. So often I like to a- just ask an LLM, um, like who should I read, right? Like, uh, and, and maybe I can even give some specifics on ways I wanna learn. I actually got gaslit by this once where I remember trying to learn about like, I don't know, like semiconductors or something. I was like, "This feels very visual. This is all like text." I'm like, "Is there any really good like well-visualized math video?" Uh, or not math, sorry. "A well-visualized video kind of like explaining the concepts that you're getting at?" And Claude was like, "Yeah, here's a couple." And the top one it was like, "Here's one from 3Blue1Brown." I'm like, "I can guarantee that there's not." [laughs]
- DPDwarkesh Patel
[laughs]
- SPSpeaker
Thank God. And it was an actual video, an actual link, but it just had like misattributed- [laughs]
- DPDwarkesh Patel
[laughs]
- SPSpeaker
... to someone else's. I mean, and it was good, and it was like I had a much better experience clicking over and watching that video to learn about the thing rather than like trying to proceed forward with questions there. So in that sense, basically using it like a very souped-up version of Google-
- DPDwarkesh Patel
Mm
- SPSpeaker
... on like zero in on the right human-written resource. Um, what about you? Like what-- You, you, you engage with these a lot. What's the best way to learn from them?
- DPDwarkesh Patel
I, I think you put your finger on it. The most productive learning sessions I've had is when there's some artifact that a human has produced, whether it's an article, a book, a video, that organizes the relevant concepts in the correct way and builds up the motivation of why building up the next idea would be relevant to, uh, solving the next problem you did encounter and the next idea and the next idea. And then using the LLMs to just do a little bit of pruning around this, uh, this, this branch that the book has identified. So I was, um, I was actually-- I, I was going through... I think you might have recommended Steven Strogatz's textbook on-
- SPSpeaker
The chaos one?
- DPDwarkesh Patel
Yeah.
- SPSpeaker
Chaos and Nonlinear Dynamics?
Episode duration: 1:33:39
Transcript of episode TfyPshgMbug
