What happens now that AI is good at math? — the OpenAI Podcast Ep. 17
EVERY SPOKEN WORD
45 min read · 8,987 words
- 0:00 – 1:27
Intro
- AMAndrew Mayne
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, our guests are researchers Sébastien Bubeck and Ernest Ryu, and we're going to talk about math, how it went from almost laughable to Olympiad level, and why you need math to reach AGI.
- SBSébastien Bubeck
The progress of the last few years has been nothing short of miraculous.
- ERErnest Ryu
We will be able to have, uh, LLMs be able to solve problems that require more than 50 pages of thinking.
- SBSébastien Bubeck
Mathematics was just the perfect benchmark to see the model making progress during the last four years.
- AMAndrew Mayne
Sébastien, Ernest, I'd love to know more about you. So how would you explain your roles?
- SBSébastien Bubeck
Yeah, sure. Uh, so I have been, uh, working in mathematics for almost 20 years now. Uh, I used to work in optimization and, uh, theory of, of machine learning. I was a, a professor at Princeton, uh, for a few years before moving to Microsoft, and now I'm at, uh, a researcher at, at OpenAI. And in the last few years, I've been really trying to understand how AI can help mathematics and, and to really evaluate the progress that we're making in terms of solving difficult math problems with AI.
- AMAndrew Mayne
Ernest, how about you?
- ERErnest Ryu
Yeah. So, um, I've recently joined OpenAI as a researcher, but before that, I was an applied mathematician, uh, working on optimization and, uh, uh, machine learning theory. Uh, and I was, I-- in, in my previous job, I worked as a professor of ma-mathematics at the UCLA math department.
- AMAndrew Mayne
So
- 1:27 – 3:01
The surprising progress of AI’s math capabilities
- AMAndrew Mayne
I think a lot of people have this perception that these models aren't good at math, literally they're called language models.
- SBSébastien Bubeck
Yes.
- AMAndrew Mayne
And how has that changed? What's gone on?
- SBSébastien Bubeck
Yeah, I think, you know, the progress of the last few years has been nothing short of miraculous. Um, it's important to remember that two years ago, we didn't even have reasoning models-
- AMAndrew Mayne
Mm-hmm
- SBSébastien Bubeck
... let alone models that could prove, you know, difficult mathematical theorems. Today, two years later, the models are able to help Fields Medalists in their day-to-day work. So really, the, the jump is, is just simply astounding. And maybe if I can build a, a little bit more on that, um, something which is important to understand is that everybody has been surprised by this progress, including us. So to tell you a, a, a story, um, a year and a half ago, uh, I was at a, a workshop at a conference with other fellow mathematicians, and there was a debate that I participated in on whether LLM, scaling LLMs will help us resolve major open problems.
- AMAndrew Mayne
Mm.
- SBSébastien Bubeck
So this was a debate, you know, a year and a half ago, and, and the room was very divided. In fact, they did a poll at the beginning, and I think it was like 80% said, "No. Impossible that this would happen." So then the debate unfolded, and, you know, by the end of the debate, it was more like 50/50, so, you know, pre-pretty good progress during that, that hour. This obviously was just so wrong in hindsight. Like, just mere eight months later, the model was starting to be able to do research-level mathematics.
- AMAndrew Mayne
What was the breakthrough moment for you realizing that there was a really good intersection between AI and mathematics?
- 3:01 – 6:57
Solving an open problem with ChatGPT
- ERErnest Ryu
So summer of '25, the big news was, uh, ChatGPT was able to achieve a top human-level performance, uh, at the International Math Olympiad, uh, you know, gold medal performance. Um, so that was amazing news, and that demonstrated that, well, at least for the competition-level mathematics, uh, the models are capable, very highly capable, um, only, um, you know, on par with the, the top human high school com, uh, contestants. Um, but, uh, well, competition problems are canned problems. Uh, they have relatively short solutions because they are meant to be solved in, within a few hours, and they're not novel because, well, somebody came up with it as, uh, there's a solution. So it's not research-level math. Um, so then I got curious, and a lot of people got curious, can ChatGPT do, uh, research-level mathematics? And there was a lot of debate online, and then I thought to myself, "I should try it on my own problem, problems. Maybe I'll try it for myself and make up my own minds as opposed to, you know, listening to what other people say, 'cause I'm a mathematician myself." So I took a, uh, uh, a, a classical open problem in, uh, in optimization theory, which is a, a branch of applied mathematics that I work on, work in. And, um, the question specifically is, there's a, uh, a famous algorithm called the Nesterov accelerated gradient method, um, and does this have this convergent behavior, or is it possible that the, um, for, you know, in, in cer-certain bad cases, can there be a certain divergent behavior? This question was, was genuinely open in the sense that, um, pe-people know that in most cases, the algorithm behaves well, it's convergent.
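For readers unfamiliar with the algorithm Ernest names, here is a minimal sketch of the Nesterov accelerated gradient iteration, applied to a toy quadratic. The objective, step size, and momentum schedule below are illustrative assumptions for the well-behaved convergent case, not the divergent instance from the open problem.

```python
# Sketch of the Nesterov accelerated gradient (NAG) method on the
# toy quadratic f(x) = 0.5 * x^2, whose gradient is simply x.

def nesterov_agd(grad, x0, step, n_iters):
    """Run NAG: y is the extrapolated point, x the main iterate."""
    x_prev = x0
    x = x0
    for k in range(1, n_iters + 1):
        momentum = (k - 1) / (k + 2)        # classical Nesterov weighting
        y = x + momentum * (x - x_prev)     # extrapolation (momentum) step
        x_prev, x = x, y - step * grad(y)   # gradient step taken at y
    return x

# On this well-conditioned quadratic the iterates converge quickly.
x_star = nesterov_agd(grad=lambda x: x, x0=10.0, step=0.5, n_iters=100)
```

On benign instances like this one the iterates shrink toward the minimizer; the open problem concerned whether carefully constructed bad instances could instead diverge.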
- AMAndrew Mayne
Mm.
- ERErnest Ryu
But people really did not know, like, is there a bad instance? Does it, in the worst case, could it diverge? The answer turned out to be yes, and the way I d-discovered it is, um, I remember it distinctly. So, so, um, my bedtime for my son is 8:00 PM, and then I try not to stay awake after midnight. So I had four hours of usually-
- AMAndrew Mayne
[chuckles]
- ERErnest Ryu
... uh, evening hours to myself if I want to focus on something. So I decide, okay, I'm gonna spend a few days working on this. So over th-th-the course of three days, so that's 12 hours total, I interacted with ChatGPT on this question. It wasn't as simple as me just putting in the prompt and getting a solution. I played the role of the, the verifier. I told, whenever the model made a mistake, um, I corrected it. I also, um, uh, tried to point the conversation into j- uh, areas that I felt, uh, approaches that I felt were novel. And after a while, the proof, uh, uh, there was a proof, and I checked it. I also asked ChatGPT to double-check it, and it, it was correct. And that's how this, uh, 42-year-old open problem got resolved. And once I got this, uh, uh, this solution, um, I thought to myself, what, what would be the most fun thing for, for me, fun way for me to, um, publicize this? 'Cause I could just write a paper, and that would be... but that'd be less fun. So I decided, let me go to Twitter and, and talk-
- SBSébastien Bubeck
[chuckles]
- ERErnest Ryu
... about this. And, um-
- SBSébastien Bubeck
Dangerous, but yeah.
- ERErnest Ryu
Yeah. [laughs] But, uh, well, I had a lot of fun.
- SBSébastien Bubeck
Yes.
- ERErnest Ryu
Yeah, so people pa- uh, it was- I think one of the earliest instances of a, a genuinely open prob-- mathematical open problem being solved by, uh, AI and, um, and yeah, I mean, peop- uh, people like the-- people ate it up, and it was, it was a lot of fun.
- AMAndrew Mayne
It, it is interesting as you brought that up that we, we've seen sometimes people said, "Hey, I found something cool or novel," and then sometimes it gets torn apart, sometimes it stands up. And going into social media can be kind of scary, but it sounds like we do need these kind of feedback cycles. I think part of the challenge for a lot of us is we hear terms, you know, we hear like the Interma- International Math Olympiad, and we're trying to figure out, like, okay, what does that mean from like a scale of a problem? You know, we-- I can understand addition, subtraction, multiplication. Could you give me an example of understanding, like, where we went from, from like, you know, first ChatGPT, which could kind of sort of use it, then it could do, could do math or it could use a tool, but then the model sort of implicitly understanding that.
- 6:57 – 11:32
How models went from basic math to research level
- ERErnest Ryu
When ChatGPT, uh, you know, just entered the scene in, in twenty-- uh, twenty-- early '23, I, I, I started testing the-- I was very curious about how the model, uh, is-- would perform fair on, on, um, sort of common math problems. So, so these would include math problems that you would see in like the high school level, but al- but also like day-to-day like math-ish problems. So for example, imagine a scenario where we-- like the three of us went camping together, and then I paid for this, Seb paid for this, um, sorry, and, and then Andrew, you paid for whatever, and then we might wanna clear the ledger. We wanna split things evenly at, at the end. Um, can ChatGPT do the calculations for us? And it's-- this is moderately complicated if you have like seventeen items that we purchased. In '23, '24, and also in twenty-- early '25, I remember, uh, the models couldn't do this. Another example would be, um, uh, I'm in, let's say, in Korea, uh, Seb's in Paris, Andrew, you're in California, and want to set up a Zoom meeting. Like, what would be a good hour to do so? Um, it-- a-again, in early '25, the models, uh, couldn't do this. But then just suddenly things just, um, changed. Uh, and I wasn't in OpenAI at the time, so I'm not at all-- I don't-- I'm not quite privy to what exactly you, you did, but, um, suddenly the models started solving IMO problems, and then, uh, furthermore, it started solving research problems. And the, uh, the way I sort of calibrate this right now is that, um, unless you are a professional mathematician trying to discover new mathematics, um, uh, if you are somebody who's like, let's say a, a physicist or a chemist who, who uses relatively complicated mathematics like differential equations, um, things, I mean, differential geometry, things like this, but, um, but you're not inventing new math, then ChatGPT can do all of the math that you would need. 
So any basically user of high-level mathematics it-- from STEM can now use, uh, ChatGPT to basically have their math taken care of. You would wanna, you wanna exercise some, some degree of caution, you know, to check the-
- AMAndrew Mayne
Mm-hmm
- ERErnest Ryu
... check whether things are right, you know, run simulations just to double-check. The models can make mistakes. But now, um, any math problem that you would wanna solve, most people, for ninety-nine percent of the population, the models can do it.
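The camping-ledger calculation Ernest describes amounts to computing each person's net balance against an equal share of the total. A toy sketch, with made-up names and amounts:

```python
# Toy expense-splitting: settle a shared ledger so everyone ends up
# having paid an equal share. Amounts here are illustrative.

def settle(payments):
    """Return each person's net balance (positive = is owed money)."""
    total = sum(payments.values())
    share = total / len(payments)
    return {person: round(paid - share, 2) for person, paid in payments.items()}

balances = settle({"Ernest": 90.0, "Seb": 30.0, "Andrew": 60.0})
# Each share is 60: Ernest is owed 30, Seb owes 30, Andrew is even.
```

With seventeen items and uneven participation the bookkeeping gets fiddly for a person, which is what made it a useful informal test of the models.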
- AMAndrew Mayne
When I worked on the release of GPT-4, I used scheduling as one of those examples, and I could put three people into a schedule and have it figure out time slots, but pushing it beyond that, that was really hard. Why, why did... Was there a change? So Ernest just talked about noticing all of a sudden it got better. Now we know one thing was tool use. You could let the model use a calculator, but something else happened with the models themselves.
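The scheduling task mentioned here boils down to intersecting working-hour windows across time zones. A toy sketch, using fixed, assumed UTC offsets (real code should use the IANA tz database via `zoneinfo` rather than hard-coded offsets):

```python
# Toy meeting scheduler: find UTC hours that fall inside local working
# hours (9:00-18:00) for every participant. Offsets are assumptions.

def common_hours(offsets, start=9, end=18):
    """Return UTC hours that are working hours for all participants.
    offsets: dict mapping participant -> UTC offset in whole hours."""
    good = []
    for utc_hour in range(24):
        local = [(utc_hour + off) % 24 for off in offsets.values()]
        if all(start <= h < end for h in local):
            good.append(utc_hour)
    return good

# Seoul UTC+9, Paris UTC+1 (winter), California UTC-8.
slots = common_hours({"Seoul": 9, "Paris": 1, "California": -8})
# For these three zones no hour works for everyone, so slots is empty.
```

The empty result for this particular trio is part of what makes the question nontrivial: a good answer requires reasoning about relaxing someone's window, not just enumeration.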
- SBSébastien Bubeck
So going back to, to, to the debate that I just told you about, like the framing was really about can scaling alone-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... scaling of LLMs alone bring you to, you know, solving research, uh, uh, breakthroughs in, in mathematics? And this is a wrong framing. What we do at OpenAI, we do a lot of research, innovative research. It's not just about scaling the model. So when you ask what happened or, you know, when you, you're asking what happened middle of last year when suddenly the model were able to solve math problems, well, a lot of things happened. We do a lot of research, and, and all of this has to progress at the same time. So I can't really point to a single element.
- AMAndrew Mayne
Mm-hmm. But it was able to do it itself though without the tools, so.
- SBSébastien Bubeck
Yeah. So I, I, I think it's, it's really, really important to, you know, just d- double down on what Ernest was saying about the progress-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... and, you know, the scheduling problems that the model wasn't able to do back then. I said that two years ago we didn't have reasoning models. Well, think about four years ago. Four years ago, so this is pre-ChatGPT, and I remember Google came out with a mathematics model called Minerva at the time. And I fell from my chair. I was so impressed. What was I impressed by? That the model, I could give it the coordinates of points in the plane, and it would give me a line that goes through those points. Like, when I say that, you know, now it's, it's almost hard to understand. What are you talking about? Obviously, a model can do that. So I think we, we have kind of forgotten how quickly things have happened. And now, yeah, you know, Ernest was saying that it, it's basically at the point where unless you're trying to invent new mathematics, it's kind of at the right level already. I would say we're already seeing glimmers that even to invent new mathematics, it's, it's, it's getting there.
- AMAndrew Mayne
Could you break down though, aside from somebody who's interested in developing new fields of mathematics or just making new proofs, it's-- what does this affect everything else? What is the impact of this going to be on science? What is the impact of, of the rest of what you're working on?
- 11:32 – 14:26
Why math matters for AGI
- AMAndrew Mayne
Why is this really important and not just, "Oh, cool. It does math"?
- SBSébastien Bubeck
So I think the, "Oh, cool. It does math" part, what did matter as we were developing those models as a good way to benchmark the progress. The nice thing about mathematics is that the question are very clear, non-ambiguous. You know, everybody agrees on what the question is asking. So that's point number one. Point number two, you can verify the answer. So once a model can give an answer, everybody will agree was it correct or was it not correct. Although you can put a pin on that because we will talk about, you know, in research level it's not that simple anymore-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... to evaluate, but before research level it's very easy to evaluate. So s- mathematics was just the perfect benchmark to see the model making progress during the last four years. Now, we'd say we have kind of saturated that aspect, and, and you can ask, okay, now, now, okay, fine, the models do mathematics. We have understood. W- what, what about the next steps? And for the next step, I would say that having our models be good at mathematics is gonna be good for many, many other things, and let me explain why. A key feature of mathematics is that to resolve a problem, you have to think for a long time, be it days, weeks, sometimes years. So this long thinking, not only do you have to think for a long time, but you also have to think consistently for a long time. If at some point in your chain of reasoning there is a mistake, this will kill the entire argument. It doesn't matter if everything after that is correct. If there is one single failure point, everything, the entire argument is destroyed. So this property makes it that this is what you want out of reasoning models, that if they make mistakes, they will be able to correct themself. So we are hoping that this property that they acquire through mathematics will generalize to other domain, which by the way, is exactly the same thing with human beings. Why do we train human beings in mathematics? I mean, it's a very fun topic. I love it. We did it professionally. Uh, maybe we still do some of it a little bit. [laughs] Uh, but w- why do we train humans in mathematics? Exactly for the same reason. It gives you this kind of very logical thinking.
- AMAndrew Mayne
Do we need to think about new ways to talk about these discoveries?
- SBSébastien Bubeck
Yeah. So I, I personally view it, uh, a little bit as, as part of my role to try to, to educate the research community about the recent advances, uh, because I, I have this, you know, dual background of both being a former mathematician and, and now working on the, on the frontier of AI. And, and indeed, like Twitter and social media is a great place to, to try to explain what is the progress, in particular because this progress is so fast.
- AMAndrew Mayne
Mm-hmm.
- SBSébastien Bubeck
So, you know, f- for example, I, I, I... Maybe we can talk a little bit about the Erdős problems-
- AMAndrew Mayne
Yeah
- SBSébastien Bubeck
... uh, you know, and, and, and, and some of the controversies that happened [laughs] uh, a- around that. So, um, there was a first example. So there was first, you know, Ernest example, and then there were a few other problems that were solved-
- AMAndrew Mayne
Do sort of explain Paul Erdős though too, just so I think-
- SBSébastien Bubeck
Oh
- AMAndrew Mayne
... people would love to know who he is and why his problems are sort of
- 14:26 – 21:26
AI and the Erdős problems
- AMAndrew Mayne
interesting.
- SBSébastien Bubeck
Yeah, of course. So Paul Erdős is one of the most prolific mathematician of the last, uh, century. He has written, I think, uh, 1,500, uh, research paper. He was a very iconoclastic, uh, figure. You know, he didn't have a house or an apartment. He was just traveling, uh, from one university to the next, trying to find new collaborators, and every time he would go to a place and basically ask question. He was very, very, very gifted at asking questions. Not all the questions that he asked were interesting. Let me just say that-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... uh, right away, but still it was very productive and, you know, they... The research community wrote a lot of papers with him. There is even this concept of an Erdős number-
- AMAndrew Mayne
Mm-hmm
- SBSébastien Bubeck
... which is, you know, how far away, uh, are you in the chain of collaborators from having also, uh, a paper with, with Erdős. Uh, my Erdős number is, is two. I, I, I've co-authored a paper with someone who co-authored with Erdős.
- AMAndrew Mayne
Wow.
- SBSébastien Bubeck
So, you know. Yeah, I'm, I'm pretty happy about that.
- AMAndrew Mayne
My number's three.
- SBSébastien Bubeck
Oh. [laughs]
- AMAndrew Mayne
[laughs]
- SBSébastien Bubeck
Yeah.
- AMAndrew Mayne
The, the joke was, you know, you could be on a train ride with him, and then by the end of the train ride, you'd maybe work on a paper with him and have your name mentioned.
- SBSébastien Bubeck
Absolutely. Absolutely. I think the two versus three basically says something about our respective age. [laughs]
- AMAndrew Mayne
[laughs] Yeah.
- SBSébastien Bubeck
That's, uh, essentially what it says. So anyway, uh, so Erdős has, uh, you know, all of this problem and, and there is a, uh, a very nice website by Thomas Bloom, uh, who is keeping track of all the Erdős problems that are still open.
- AMAndrew Mayne
Hmm.
- SBSébastien Bubeck
So I think there is like 1,000 problem or something like that, uh, on that website, and Thomas himself has done the work of trying to find, you know, he's an expert in combinatorics, so he can kind of say, "Okay, this is open. This is, you know, uh, resolved. This has some complicated status," you know, for every, every, uh, problem. Of course, he doesn't necessarily know the answer to all of them, so if there is a paper which is marked open, it is not necessarily true that nobody knows-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... how to solve it. But it is also a very interactive website where people can go on it and, you know, add comments to every, uh, problem and explain whether there is a solution, et cetera. So it's a very dynamic, a, a great website. So of course, once we started to have, uh, GPT be able to solve research math problem, this sounded like a treasure trove of problem to try our models on. And, um, we tried a, a couple, and to our great surprise, the model came back with answers to some of them that were marked as open. So we got really excited, uh, about this. The, the first one, you know, that I tweeted about, uh, I don't remember when it was, maybe it was in October or something like that, uh, last year, it was a, a deep literature search result.
- AMAndrew Mayne
Mm.
- SBSébastien Bubeck
So let me explain what that means. It means that what GPT did is that it did a vast literature search, trying to scan, you know, thousands of papers, and it found in some unrelated field the answer to the question. Now, it's really important to understand that it's not like in that, you know, unrelated field the person said, "Okay, I'm solving an Erdős problem." It was written in a completely different language. It was different mathematics. You have to do work to connect the two pieces, and GPT did that. So that was kind of amazing. And, and this was very ad hoc, like, you know, we just tried by hand basically in the ChatGPT interface. Once we saw that, um, uh, Mark Sellke, who is, you know, uh, in, in, in our team also, decided to have a, a, a more systematic approach of trying all of the problems. And he tried that, and the, the model came back with solutions to 10 Erdős problems. And this was... You, you have to remember, at that point, there was still, I think, a, a very dynamic discussion about whether, you know, those models could go beyond the state of the art and discover, invent new mathematics. So I, I, I got very excited about, uh, this, this result, and, and, and I tweeted about it and, and, you know, it's, it's kind of an infamous tweet because people misunderstood it as kind of saying it really found the solution to 10 open problems that are very hard, and the solution is completely new and did not exist in the literature. But that's not what happened. It was connected, of course, to the-
- AMAndrew Mayne
Hmm
- SBSébastien Bubeck
... previous case, where it is a deep literature search. So there was some, you know, feud with, uh, with Google about, uh, you know, with Demis about whether, you know, th- this is, uh, the right way to talk about such results. But now the, the punchline is kind of amazing, which is: a few months later, so again, I said 10, uh, solutions to, uh, open problems, and these were solutions in the literature, and then the question is can you find solutions that are not in the literature? By now, we have more than 10 actual solutions that are completely new, that are publishable in top journal in combinatorics, completely obtained by, you know, s- some by ChatGPT and some by our internal, uh, models. So just within, again, this, this really speaks to the acceleration. In the span of just a few months, we went from it's kind of a ridiculous statement to say that there would be 10 solutions to Erdős problems.
- ERErnest Ryu
Mm-hmm.
- SBSébastien Bubeck
To it's actually happening for real, and it's accelerating.
- ERErnest Ryu
Yeah, it's interesting 'cause it seems like that, you know, step one is have models be able to do really good literature research, and there have been m- major papers and awards done, given to people who've just done literature searches and found the solution was solved here and it actually applies elsewhere. So it's neat that G- it does that as a first step, but now that it's actually doing original.
- SBSébastien Bubeck
I mean, you know, the one thing that I really like about AI research is that it forces us to confront big questions about intelligence and about, you know, research and, and, and progress and how do we discover new things. In particular, there is this question of whether the progress that we're seeing in science, is it just putting together different pieces and, you know, doing a little bit of-
- ERErnest Ryu
Mm
- SBSébastien Bubeck
... reasoning on top of it? Or are there those brilliant, you know, sparks of insight? Everybody, of course, points to Einstein's, you know, relativity. I'm, I'm not even sure that really counts, to be honest. So I think the jury is still out on whether this process of just recombination and a little bit of thinking, whether you can kind of increase, you know, human knowledge with, with no limit, or do you really need the sparks of genius that would be somehow only human?
- 21:26 – 28:19
Building an automated researcher
- ERErnest Ryu
We've heard the term auto researcher.
- SBSébastien Bubeck
Right.
- ERErnest Ryu
Do you wanna unpack that a bit?
- SBSébastien Bubeck
Right now, the, the way we, we work is exactly what Ernest described, which is really an interaction. It's, it's kind of a professor-student interaction where ChatGPT is a student and the professor is kind of, you know, giving a, a, a first problem, and the student comes back, and then they talk a little bit. The student goes away for another week, comes back. One point, of course, is that it's compressing those timelines greatly. In Ernest's story, you know, of solving this problem in 12 hours, I mean, I don't know. Without ChatGPT, how long would it have taken you?
- ERErnest Ryu
Well, I have spent more than 40 hours failing [laughs] my-
- SBSébastien Bubeck
[laughs]
- ERErnest Ryu
... without, without AI, and I don't know, maybe a month at-
- SBSébastien Bubeck
Right.
- ERErnest Ryu
Yeah.
- SBSébastien Bubeck
So, so exactly. So, you know, there is this, this thing of just compressing timelines. Now, when we talk about the automated researcher, that's a slightly different vision where the model or maybe a collection of model would work autonomously-
- ERErnest Ryu
Mm
- SBSébastien Bubeck
... for a long period of time. This is kind of needed if we want to go beyond the current level. The current level of interaction, you know, the professor-student interaction where the student comes back after a week, it's gonna be very hard with that mode of interaction to do real breakthroughs, to solve actually longstanding, you know, research problems, or to make problem, progress in, in, you know, very difficult fields in biology where you need to interact, you know, with the wet lab and do all kinds of experiments. So once you want to go towards a real breakthrough, we will need to work over longer timelines, and this is where the automated researcher, uh, comes in. Maybe let me say it in a, in a slightly different way. One, one concept that I'm a big fan of is this concept of AGI time.
- ERErnest Ryu
Mm-hmm.
- SBSébastien Bubeck
So you can have AGI seconds, minutes, hours, days, and so on. So that really means you have an AI, and for, like, it can mimic human thinking, but for how long? So as Ernest was saying, you know, two years ago, maybe models were mimicking, you know, a high school student who thinks for a few minutes on a problem. Now we can mimic a researcher who can think for hours, maybe a few days. We really want to go towards... A- and, and this progress has, has been going on for now, you know, very consistently for four years, where we went literally from seconds to minutes to hours to days, and now we are roughly at days/one week. We want to go to weeks, if not months. This is open research. You know? I, I don't think anyone on the planet knows exactly how to do it. But this goes back to we are doing a lot of research, a lot of innovation, and I think once everything will be put together, we're just seeing this arc of progress where we keep making progress in AGI time. But this is, this is the direction of the automated researcher.
- ERErnest Ryu
So the people, the other mathematicians that I, you know, t- talk to, their mode of using, uh, AI is they open up ChatGPT and then they talk to ChatGPT within that context window. And you can have multiple sessions, but each session has a, uh, a finite context length, and roughly, um, uh, on the order of, like, 50 pages of a math paper. Um, and that's not long enough to make true, like, deep m- uh, math, groundbreaking, uh, math breakthroughs, uh, because a lot of math papers are longer than 50 pages, and also the, the, the thought, the human thought that went into to produce, let's say, a 10 or 30-page paper is usually, well, much orders of magnitude longer than the final output. Um, so there's a limitation with the, the limited context win- window. But, um, for users, uh, but people who've used Codex will, will know that you can actually have very long work sess- sessions with Codex. So you just keep, you know, giving instructions as to what kind of code you wanna write, and then the code itself that, that you're working on, the repository of your, of your code, uh, which in the math sense, uh, the analogy would be that would be analogous to, like, math notes that you write down. That can be very, very, very long. Um, but Codex has a, is, is pretty good at dealing with that. It, it, it, it once in a while, it compactifies its conte- uh, its, its conversations. Um, and it has its way of bec- of becoming this, um, really, um, uh, amazing agent that can do really complex jobs over, uh, huge repositories of code over a long, a really long, uh, context of conversation. Um, and this, uh, I believe is going to happen with mathematics research as well. So we will be able to have, uh, LLMs be able to solve problems that are longer than just, you know, that require more than 50 pages of thinking, and that's what humans do, that's what human mathematic- ma- mathematicians do.
We pe- when people think for a day on a certain problem, and then we kind of summarize our ideas and then put it into notes. The next day or the next week, we come back to it, and then over several months, we've thought for so, so long, but, um, uh, it's sort of summarized, it's sort of organized in a way that becomes manageable, and in the end, the final output becomes a 30-page paper, uh, uh, that r- summarizing the thoughts over, you know, many, many, uh, months or even years. So yeah, I think that's gonna happen.
- AMAndrew Mayne
I was working on a, a very, very, uh, laughable problem to you guys over the weekend, and using an LLM to try to do it, uh, to figure out, like, how to use a really small LLM to do math. In the middle of it, I needed a benchmark, and I came across EasyMath, which is a benchmark for small LLMs, and the problem is just a paper on it. There wasn't really, like, a lot of data. And I just, in the middle of Codex, I go, "Can you create our own benchmark here and just generate the data for that?"
- SBSébastien Bubeck
Yeah, yeah.
- AMAndrew Mayne
And five minutes later, I had it.
- SBSébastien Bubeck
[laughs]
- AMAndrew Mayne
And that was magical to me because I'm in the middle of working on the tool that would've involved me all of a sudden, okay, I gotta spend a few hours, go do a generator, go produce this sort of stuff.
- SBSébastien Bubeck
Absolutely.
- AMAndrew Mayne
And it runs in the background. I can't imagine what it's like for you guys doing grown-up problems.
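The kind of on-the-fly benchmark generation Andrew describes, programmatically producing simple arithmetic problems with known answers, can be sketched as follows. The item format here is hypothetical, not the actual EasyMath schema:

```python
# Sketch of generating a small arithmetic benchmark: each item pairs a
# question string with its exact integer answer, so a small model's
# outputs can be graded automatically. Format and ranges are assumptions.
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def generate_benchmark(n_items, seed=0):
    rng = random.Random(seed)  # seeded for reproducible benchmark runs
    items = []
    for _ in range(n_items):
        a, b = rng.randint(2, 99), rng.randint(2, 99)
        op = rng.choice(list(OPS))
        items.append({"question": f"What is {a} {op} {b}?",
                      "answer": OPS[op](a, b)})
    return items

bench = generate_benchmark(100)
```

Seeding the generator means the same benchmark can be regenerated exactly for later comparison runs, which is the property a paper-only benchmark with no published data lacks.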
- SBSébastien Bubeck
Yeah, I mean, what you describe is really, you know, what, what we went after when, when we published the paper w- whose title was, uh, Early Experiments, uh, in Science Acceleration-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... uh, with GPT-5. Like, we... What, what you have experienced is, is literal acceleration. Like, this is something that would have taken you before, I don't know, maybe a few days of work or something.
- AMAndrew Mayne
I would've given up.
- SBSébastien Bubeck
Yeah. [laughs]
- AMAndrew Mayne
[laughs]
- SBSébastien Bubeck
Yeah, so that, that, uh, that's actually a great point, you know, I would have given up. This really enables scientists everywhere, like for example, mathematicians, to be able to use code.
- AMAndrew Mayne
Mm-hmm.
- 28:19 – 33:52
The role of humans as models improve
- SBSébastien Bubeck
to before, uh, we should talk about the role of the human in, in all of this. What is the place for the human, especially if we start to think about, you know, let's think a little bit about the future. I, I, I'm, I'm not a big fan of trying to predict the future.
- AMAndrew Mayne
Mm.
- SBSébastien Bubeck
I like to explain what are the-
- AMAndrew Mayne
But what do you think will happen?
- SBSébastien Bubeck
[laughs] Um, I think, you know, there is what my heart tells me, and there is the rational aspect. So what my head tells me is, look, the progress has been happening very consistently for the last four years, from being able to solve math problems that would take you seconds, to minutes, to hours, to days. There is no reason... Anybody who would look at the situation would say, "Okay, a year from now you will have systems that can think for weeks; two years from now, systems that can think for, you know, years." Not only that, but already today, we're finding that our models are able to really surpass humans in the sense that they can find mistakes in papers.
- AMAndrew Mayne
Mm-hmm.
- SBSébastien Bubeck
You know, we had agents internally that have been able to go over papers and say, "Hey, actually this is wrong. Here is the correct answer." Not only that, but people tend to think that AI is only good at answering questions. Actually, no, it's also pretty good at asking questions. Of course, again, you need some research innovation there, which we had, and now our models are very good at asking questions. So good, in fact, that humans are looking at those questions and saying, "Hey, maybe I should write a paper [laughs] based on this question." So this is really already happening now. So what I'm trying to say is that in a year, in two years, yes, models could do basically, more or less, everything that human researchers do. So now what? What is the role of humans? Well, why is it that we're doing science? What's the point? The point is not... At least, it shouldn't be to just solve problems for the fun of solving problems. We're solving problems because we're trying to understand something. The understanding piece is key. We're not solving problems to write papers, to say that we can write ten times more papers than our neighbor. That's not the point. You know, you can do competitive chess if that's your kind of deal. We're trying to really understand deeper things. And why are we trying to understand deeper things? Because we want to have better control over our environment. We want to be able to cure diseases. We want to be able to build things better, faster, more robust, more solid, all of those things. So I think there is a chance that we're looking at a very, very bright future using those tools, as long as the human stays in control and guides what are the problems that matter.
Problems that... you know, the AI doesn't care about curing diseases. I mean, they will not suffer from the same diseases as we do, but we do care, so we have to control them and guide them towards those problems.
- AMAndrew Mayne
At the time of the advent of the first computers, when the computer went from being a person who did the math to an actual machine that did it, you saw some people saying maybe we all have to move from math to physics, because that's where the hard problems are gonna be, and there aren't gonna be any more hard problems in mathematics 'cause-
- SBSébastien Bubeck
Yeah
- AMAndrew Mayne
... computers will solve that. And that was in the 1940s and 1950s, and it turned out that that's not the case, that computation opened up a whole new branch of mathematics. Is that what's gonna continue, that the mathematician who's in high school today is gonna have a very exciting future 30 years from now because of what's happening here?
- ERErnest Ryu
I think math is going to be so much fun, so, um-
- SBSébastien Bubeck
[laughs]
- ERErnest Ryu
Okay, so math is... So mathematicians enjoy solving problems, but, um, you know, pre-AI, we would think for months to solve a problem, and there's enjoyment in that, but it's quite grueling, and to-
- SBSébastien Bubeck
There is pain. There is pain too. [laughs]
- ERErnest Ryu
There is a lot of pain, um, and there's a huge surge of dopamine when you actually find the solution. That's gonna be accelerated, so, you know, more solutions, more fun. But also I think math is going to become much richer, because it's going to be much more interconnected. At the research level, a lot of math is hyper niche, and when you write the paper, you know that there are only five living humans right now who will care about this paper, but you like the result, so you put it out, and then the five other people appreciate it, so they read it. But then, 20 years later, it's gonna be on the arXiv somewhere and nobody will read it. But now that we have AI, the AI will have read it, and if there is a useful connection, as-
- SBSébastien Bubeck
Yeah
- ERErnest Ryu
... uh, as Sébastien mentioned, it will surface it, and then people, you know, 100 years down the line will discover it and use it for whatever they want to use it for. So I would now have much more confidence that my results that are just put out there will be used if there is a use in the future. And also, I'm now able to access mathematics in a much broader way. There are fields that I've not studied, but if a result comes up, then I would still have to study that field to be able to use that particular result in my research; but there is no way I could have found that result without the assistance of AI. But now it's accessible. The model tells me, "Hey, you can use this to solve your problem," and then, well, okay, I'll go and try to use that. So math is going to be a much more interconnected
- 33:52 – 36:00
Verifying proofs with AI
- ERErnest Ryu
enterprise. And also, verifying the correctness of mathematics is actually quite non-trivial, because imagine there's a proof written by somebody, and it's 300 pages long, and it claims to solve a really important problem. And this person is a very reputable person, and the paper at the surface looks, you know, plausible. How do you know? Well, this is a process that takes years to verify, and it's also not enough that one person reads it. Many people need to read it and then try to extend it and look into the details. This is a process that takes years. And sometimes, um, fatally incorrect proofs are published, so that's also a very slow process where the field initially accepts a result but later on discovers that it's unsalvageable, so then it needs to get filtered out. This is gonna be so much more accelerated with AI. So right now ChatGPT and our AI models are not perfect at verifying mathematics, but they're very good, and they also have much more patience than humans.
- SBSébastien Bubeck
Yeah. [laughs]
- ERErnest Ryu
So the truth is, so much of published mathematics has minor mistakes, and a lot of them do have major mistakes, and we know because we have tested these things with our models. Um, but now I think the richer future of mathematics is that, through AI verification, we will have much more certainty as to which results are correct and which results are incorrect, and we'll have much faster feedback on this. A paper put out a week ago, we could get a verification on that, and then we could trust it and build on that, as opposed to waiting for five years to really ascertain the correctness. So overall, math is gonna be much more fun. It's gonna be much more interconnected. We'll be able to trust the results more. We'll be able to move faster, and mathematicians will solve harder and more interesting problems.
- SBSébastien Bubeck
So maybe one thing that I want to add: I totally agree with everything that you just said. It's gonna be
- 36:00 – 41:19
The risk of shallow understanding
- SBSébastien Bubeck
a lot of fun, but I also want to talk about one potential danger of the current progress, which would be that we kind of hand the keys to the castle to the AIs, and that humans just start to trust the system a lot more, and they don't do the hard work that we did to own our skills, to be able to verify and to sit patiently for hours, many days in a row or many weeks in a row, to try to understand a result deeply, and instead just kind of ask ChatGPT to explain it to us in simpler terms. So basically I'm worried about potentially having a shallower understanding of things because we rely too much on the tool. So I think it's really important for the audience, for everyone listening to us, to understand that expertise is even more valuable than it ever was. The reason why we are able to squeeze out those results from ChatGPT is because of all of those years of training and our deep understanding of the subject. If it wasn't for that, we would not be able to push the state of the art. And we're seeing it. It's not like we're seeing thousands of non-mathematicians suddenly being able to prove new results. In fact, if anything, we have seen recent examples on social media where non-mathematicians have tried to use those tools to prove CRM and come up with many tens of pages of proof, and then it turns out to be just wrong.
- ERErnest Ryu
Mm-hmm.
- SBSébastien Bubeck
So, so this is a danger that we have to grapple with.
- AMAndrew Mayne
It seems like that's going to be a problem in a lot of things. You see people, you know, using current models that often just reinforce things you wanna hear, and that can be kind of your, you know, "I'm gonna come up with some sort of unified theory or whatever." Like, well, guess what? That's gonna be a lot harder.
- ERErnest Ryu
Yeah, I mean, this sort of issue of mental atrophy, if you will-
- AMAndrew Mayne
Mm-hmm
- ERErnest Ryu
... uh, is also, I think, very prominent in coding as well.
- AMAndrew Mayne
Mm-hmm.
- ERErnest Ryu
So I mean, I wasn't a computer science major, but I took some computer science courses, and I did code myself. I, you know, wrestled with the debugger, and most people of my age did. But nowadays you don't have to do that in your university curriculum, and I think that's very dangerous.
- AMAndrew Mayne
I've heard some people in the sciences who look at the progress and are very optimistic, like, "Well, we're not gonna need scientists. We're not gonna need this anymore." And-
- SBSébastien Bubeck
No. Yeah, no. Wow, this is terrible. [laughs] Uh, so really I want to make sure, anybody listening, please do not say that. This is the opposite of what we need. We need more scientists than ever. Those scientists are gonna be more productive, more powerful. They will do better things, but we need them to be really, really good at their craft. And I think this is where, you know, obviously OpenAI cannot do everything, just to say it out loud, and this is where the existing institutions have a very big role to play.
- AMAndrew Mayne
Mm-hmm.
- SBSébastien Bubeck
So academia needs to both understand the rate of progress, you know, how fast this is going, but also to kind of reclaim its role in that process.
- AMAndrew Mayne
Yeah, my hope and expectation is we're gonna see more people go into the sciences, because if you decide later on in life that you wanna get into this, it's easier to catch up if you're dedicated, 'cause you have the greatest tutor in the world. OpenAI just added it to ChatGPT. It, it has-
- SBSébastien Bubeck
Wow
- AMAndrew Mayne
... has a visual explanation tool now that helps explain things, and I think that, you know, just because all of a sudden an AI model is able to completely top out a benchmark doesn't mean that you go, "Okay, we're done."
- SBSébastien Bubeck
No.
- AMAndrew Mayne
"We solved grade school math. Congratulations, everybody. AI is done." It's like, no, there's a next level and a next level, and you're gonna need people.
- SBSébastien Bubeck
No, I think it will help the young generation to get up to speed in science so much more quickly. That's for sure. Like, I cannot imagine if I had ChatGPT, [laughs] you know, as a teenager. I mean, I remember looking at Maxwell's equations and being like, "What does it really mean? How did they come up with this stuff?" Now you can just ask it-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... and it will explain it to you so beautifully. It, it's a big deal, but you still need to do the hard work on top of it.
- AMAndrew Mayne
With a lot more people trying to create mathematical proofs who don't know what they're doing, and who aren't really putting in the right scholarship to make sure of that, we've seen areas of code repos and whatnot with people contributing fixes that aren't real fixes, and things like this. How do you solve for that? If I'm somebody who's involved in mathematics or a journal right now, I'm a little bit terrified.
- SBSébastien Bubeck
Yeah, so I think, as Ernest said, AI can help with that too. So we can have, on the other side of those systems, AI agents that are also going over everything, trying to verify as much as possible. And then, again, we do not want to fully trust the AI to verify and to accept-
- AMAndrew Mayne
Mm
- SBSébastien Bubeck
... a paper or to accept a comment, but we can have the AI agents flagging specific potential issues, kind of bringing to the front, "Okay, uh, maybe this part, I'm not totally sure about it." So that will accelerate things. That will help the human to have less to verify, basically.
- ERErnest Ryu
And I think, um, the sort of social structure of mathematics, or, you know, code, has to change a little bit, in a way that the human doing the commit, or the human controlling the agent, takes responsibility. So in-
- SBSébastien Bubeck
Yes
- ERErnest Ryu
... mathematics there already is a-
- SBSébastien Bubeck
Yes
- ERErnest Ryu
... culture of, well, if you put out an incorrect proof, then that hurts your reputation, and you're putting your reputation on the line when you put out a paper with your name. Um, and that has to... I think we need more of
- 41:19 – 43:28
Advice for learning math with ChatGPT
- ERErnest Ryu
that.
- AMAndrew Mayne
If you're mathematically curious and somebody's watching this or listening and they maybe have an interest in math but maybe they didn't feel they were a math person, but they're kinda curious to get started, what would you tell them?
- SBSébastien Bubeck
Go chat with ChatGPT. [laughs]
- ERErnest Ryu
If you are interested in learning, then it's so helpful. Like, even at the research level, when I need to learn a new concept, I would habitually go to Wikipedia, and then it's just very dense, and after, like, 30 seconds I go, "Okay, let me ask ChatGPT," and then I ask it. And then I also ask follow-up questions, and when I do so, it gives me so much more helpful information that is tailored to the parts of my knowledge that are missing, 'cause I'm asking the questions tailored towards that. Um, and you could imagine explaining to ChatGPT-
- AMAndrew Mayne
Mm-hmm
- ERErnest Ryu
... your mathematical background, the books that you've read, the material that you've learned, and then asking it to come up with a question that would be open and also would be understandable at your level of expertise. Sébastien mentioned this. I don't think people yet appreciate that, uh-
- SBSébastien Bubeck
[laughs]
- ERErnest Ryu
... these LLMs are able to come up with good questions, but I think they can. So having this companion that you can talk about math with, and talk about questions with, you could ask the model to help you solve it. And once you have a solution, then you could keep talking and come up with the next question, you know, variations of this. Even though you're still in your room alone, it feels like much less of a solitary process, and that's what really makes mathematics fun. Because math, I think, really is a social endeavor.
- AMAndrew Mayne
And toy problems can be fun. I tell people you can start with, like, how many M&M's can you fit in your bathtub, you know?
- ERErnest Ryu
[laughs]
- SBSébastien Bubeck
Yeah. [laughs]
- AMAndrew Mayne
It sounds silly, and then you start to ask, like, how many words did you read last year? How would you figure this out? And then you can start to have this real wonderful conversation and start asking these questions. Next thing you know, you're starting to do more and more complex mathematics and realize how much it affects you.
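The bathtub question above is a classic Fermi estimate, and it can be worked out in a few lines. Every number here is a rough assumption for illustration (tub volume, M&M volume, and packing fraction are all ballpark figures):

```python
# Rough Fermi estimate: how many M&M's fit in a bathtub?
# All three inputs are assumptions, not measured values.

tub_liters = 150.0        # assume a typical bathtub holds ~150 L
mm_volume_ml = 0.64       # assume one M&M occupies roughly 0.64 mL
packing_fraction = 0.64   # assume ~64% of the tub is candy, rest is air gaps

tub_ml = tub_liters * 1000.0  # convert liters to milliliters
estimate = tub_ml * packing_fraction / mm_volume_ml

print(f"~{estimate:,.0f} M&M's")  # on the order of 150,000
```

The point of the exercise isn't the exact number; it's that multiplying three rough assumptions already pins the answer to within an order of magnitude.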
- ERErnest Ryu
Yeah.
- AMAndrew Mayne
Uh, gentlemen, this is great. Sébastien, Ernest, thank you very much.
- SBSébastien Bubeck
Thank you.
- ERErnest Ryu
Thank you for having us.
Episode duration: 43:28