OpenAI: How AI is accelerating scientific discovery today and what's ahead - the OpenAI Podcast Ep. 10
- 0:00 – 1:15
OpenAI for Science: compressing decades of research into years
- AMAndrew Mayne
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, my guests are Kevin Weil, head of OpenAI for Science, and Alex Lupsasca, who is an OpenAI research scientist and professor of physics at Vanderbilt University. We're gonna be discussing how AI is impacting science, an upcoming research paper, and where science may be headed in the next five years.
- KWKevin Weil
Maybe the most profound way that people are going to feel AGI in their lives is through science.
- ALAlex Lupsasca
With ChatGPT, I can just launch it in that direction, in that direction, that direction.
- KWKevin Weil
The acceleration that is going to come from these tools is going to change science.
- AMAndrew Mayne
[upbeat music] So you're running the OpenAI for Science initiative. Could you explain what that's about?
- KWKevin Weil
Yeah, the, the mission of OpenAI for Science is to accelerate science. So the, the question is, can we help scientists do the next, say, twenty-five years of scientific research and scientific discovery in five years instead? Science underpins so much of, of, you know, what we do and how we live, and if we can make progress go faster by putting our most advanced models into the hands of the best scientists in the world, we should do that, and that's what we're trying to do. The, the-- you could ask, like, why now?
- 1:15 – 2:52
From "can’t" to "can barely" to "can’t imagine without AI": the acceleration curve
- KWKevin Weil
Why didn't we do this a year ago? Why aren't we doing this a year from now? One of the big reasons is we're just starting to see our frontier AI models being able to do novel science. So we're starting to see examples where GPT-5 can actually prove new things. Maybe not yet things that humans could not do-
- AMAndrew Mayne
Mm-hmm.
- KWKevin Weil
-but things that humans have not done. So the- these are, like, these little existence proofs of GPT-5 being able to break out past the frontier of human knowledge and into the unknown. And if there's one thing that I've learned from now, uh, you know, a year and a half or so at OpenAI, it's that you go very quickly from the model can't do something, to the model can just barely do something, and it's not great at it yet, but you see these, these, these early examples, and then, you know, six months later, twelve months later, all of a sudden, you couldn't imagine doing this thing without AI. And I think science is in that initial phase where we're seeing real acceleration for scientists that are using AI, sometimes novel, uh, you know, n- not yet maybe large breakthroughs, call them small breakthroughs, and that just says that there's so much potential in this space.
- AMAndrew Mayne
We've seen examples of, let's say, AI helping with mathematical proofs. Could you give me an example of how it might do things in some other areas, like physics or whatever kind of things we might see in the short term?
- KWKevin Weil
Yeah, I mean, we're seeing examples every day, and they're across the, the range of, of sort of the scientific frontier. You see examples in mathematics, in physics, uh, astronomy, life sciences, like biology. Uh, Alex, I mean, you've, you've worked on some of these. Maybe, maybe it's a good time to talk about some of the physics stuff that you've seen.
- 2:52 – 5:37
Physics case study: GPT finds obscure math identity to solve a pulsar PDE (with a typo)
- ALAlex Lupsasca
Yeah, I think coming back to, uh, Kevin's point about how this is a special time, that's very much how I feel as well, because I started the year, uh, twenty twenty-five thinking, "Yeah, ChatGPT is cool." Like everybody, I used it when it came out, and I thought it's a great chatbot, but I, I was sure it would take a very long time before it would become really relevant for my own work. Uh, so I started the year, I would say, as an AI skeptic-
- AMAndrew Mayne
Mm-hmm.
- ALAlex Lupsasca
... uh, because I like to see evidence before I'm convinced of something. And, um, I saw people using it to help in their writing, and it-- I started to use it for that as well. It's very useful for proofreading, but I thought, "Oh, it's gonna be a while before it gets to do the special stuff that I'm really a specialist at."
- AMAndrew Mayne
You're, like, black holes.
- ALAlex Lupsasca
Like black hole physics, exactly.
- AMAndrew Mayne
Yeah.
- ALAlex Lupsasca
And I had this experience early this year where I was trying to find this magnetic field solution that describes, um, what happens around a pulsar, which is a, a, a rotating star with very powerful magnetic fields, and I was going for this very particular solution. I had to solve a partial differential equation, and I was able to, um, identify that solution as an infinite sum over products of special functions called Legendre polynomials. And if you w- go to physics grad school, this is the kind of thing that you spend a, a lot of time getting, um, familiar with, and I also like these puzzles, and I was playing around with the sum, and I felt like there should be a, a simple formula that it evaluates to. And I thought, "Okay, I have this friend who has ChatGPT o3-Pro-
- AMAndrew Mayne
Mm-hmm
- ALAlex Lupsasca
... which I didn't have access to at the time, and I thought, "Okay, I'm just gonna send it to him and see what comes out of it." [lips smack] And [chuckles] he sends me back this output. It thought for eleven minutes, which at the time I'd never seen it do, because I was using the free version, which doesn't think for as long. And it gave this beautiful answer where it was able to understand what the sum was, um, and break it down into pieces that it could tackle, and then it had to go and find this special identity, uh, that was published in one paper from the 1950s in a Norwegian Journal of Mathematics. And so it understood what the problem was, and it knew about this random identity that was just the thing for the job, and it used them, and it gave this beautiful output, and at the end, the answer was wrong because it made this silly typo. It added an extra factor in front-
- AMAndrew Mayne
Mm.
- ALAlex Lupsasca
-for... It was almost kind of like a human making a, a silly typo at the end. But it was very easy to check the derivation, and I, I went through it, and I realized, "Okay, there's this extra factor, but aside from that, it, it did the work." And that really sent me reeling because I thought, "Okay, [chuckles] I would say that's a uniquely human ability." I, I thought that's something that makes theoretical physicists special. Um, you know, now in twenty twenty-five, clearly, they're capable of doing things that I, I would consider amazing.
- KWKevin Weil
Yeah, I think one of the cool things, so you've got examples like Alex's, where it was probably not something that he... Like, he could have done it himself over-
- ALAlex Lupsasca
Mm-hmm
- 5:37 – 8:26
Conceptual literature search across fields and languages
- KWKevin Weil
... you know, eventually, but GPT was able to do it faster. That's acceleration on its own. And there's something qualitative about that even as well, because if you can explore-- instead of exploring two paths over the course of a week, if you can explore ten paths in parallel in, you know, an hour, all of a sudden there's a lot more ideas that you can try, and that's also acceleration. We also see examples in, like, literature search, which you don't think of as m- maybe like deep scientific innovation, but it's really important to be able to understand, you know, h- has somebody worked on this problem before? And if so, is there something I can learn to speed up my own work? So, um, and we've seen interesting examples where, uh, there was one-- I might get the details of this wrong, but we were talking to this researcher, and, uh, he was saying he was exploring this particular idea in, like, high-dimensional optimization. And he was like: Man, you know, this thing I'm working on, it's interesting, but somebody must have worked on this before. I can't be the first person to have had this idea. I just can't. But I can't find any examples. And then he had given it, he, he'd sort of given a description of what he was working on to GPT-5, and GPT-5 found an example from, I think it was like economics or something, a completely different field-
- ALAlex Lupsasca
Mm-hmm
- KWKevin Weil
... that used completely different terminology, so no keyword lookup would have ever worked. GPT-5 did sort of a con- a conceptual-level literature search-
- ALAlex Lupsasca
Yeah
- KWKevin Weil
... found somebody's PhD thesis in German, so also-
- ALAlex Lupsasca
[chuckles]
- KWKevin Weil
... a completely different language. You know, it was like basically lost to time, but this person had done really interesting, sort of related work that helped him in his research. And so, you know, that's another area. So you can talk about, uh, the acceleration that comes from just, like, novel proofs-
- ALAlex Lupsasca
Mm-hmm
- KWKevin Weil
... and GPT-5 being able to do something on its own or guided by an expert, but there's also these examples of acceleration in calculations and literature search, and, and all of them contribute to accelerating science.
- ALAlex Lupsasca
Yeah, and the exact same thing happened to me. I was trying to derive this property of black holes, and I got this equation that described this phenomenon I was after, and it had a three derivative term, which is pretty unusual, and I looked at it, and I recognized it's something called the Schwarzian derivative, which is a special thing that appears in math, and I thought: Hmm, well, this is really strange that this would show up. And I just copy-pasted the equation into ChatGPT and said, "Do you-- have you seen this before?" And it said, "Oh, yes, this is the conformal bridge equation." I had no idea what a conformal bridge was at the time. [chuckles] And it said, "Oh, just look up this paper," and that was amazing because it turns out that this equation that showed up in my work had already been studied in some other works, and I've heard from a lot of colleagues doing research in physics that there's a lot of that going on. And at the forefront of knowledge, everything becomes so niche that it's very hard to know the latest details in, in neighboring fields, and GPT is an amazing help with that.
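Editor's note: the three-derivative object mentioned here is usually called the Schwarzian derivative. Its standard textbook definition is given below as background only. The symbols f and x are generic notation chosen for illustration; the specific black hole equation from the episode is not stated in the conversation and is not reproduced here.

```latex
\[
  \{f; x\} \;=\; \frac{f'''(x)}{f'(x)} \;-\; \frac{3}{2}\left(\frac{f''(x)}{f'(x)}\right)^{2}
\]
```

This expression vanishes exactly when f is a Möbius map, f(x) = (ax + b)/(cx + d), which is one reason it tends to appear alongside conformal symmetry.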
- 8:26 – 11:19
AI as a 24/7 collaborator: going deeper and broader than specialization allows
- KWKevin Weil
Yeah, that's another thing that we've heard from, uh, professors, researchers that we've talked to is there's so much you have to be so specialized today. And so sometimes it gets hard to explore an area outside of your main area.
- ALAlex Lupsasca
Mm-hmm.
- KWKevin Weil
There's one particular, um, mathematician we were talking to who said, "You know, one of my last papers, I knew there was an area that I, I wanted to go follow it off in this direction, but it wasn't my specialty, and it would've taken me a long time, and I just kind of ended up feeling like, you know, maybe that's not the most efficient place for me to spend my time. Now, with GPT-5, I'm going to go back and explore that because I've got a, a coworker, effectively, a collaborator, who has read just about every scientific paper that's out there and is, is, a, you know, a pretty meaningful expert on just about any topic you want. And I think I'm going to be able to go explore these adjacencies in a far better way with ChatGPT than I could have on my own." And so that's also a fascinating new take, right? It helps every- it, it, it can help you go deeper, like you were saying, and it can also help you go more broad.
- ALAlex Lupsasca
Literature search is pretty interesting because, like, one of my weird hobbies is I like to go back and look at when was some early scientific discovery made that didn't get utilized till much later on. You know, a famous one was carbon filaments, you know, when Thomas Edison spent all that effort to try to find it. It had been published, like, twenty years before.
- KWKevin Weil
Yeah.
- ALAlex Lupsasca
Of course, you know, Dewey Decimal System was invented that year, so you can't blame him. Uh, other things like, uh, silicon as a semiconductor. You know, if somebody had been reading the literature, we might have had that five to ten years earlier. Ability to replicate DNA, that had been published, like, ten or twelve years earlier-
- KWKevin Weil
Yeah
- ALAlex Lupsasca
... before somebody figured that out. And then the shotgun technique we use for DNA, you know, understanding, you know, figuring out, like, the DNA sequencing, that was first published in, like, 1982, but at that time, there weren't supercomputers that could run it.
- KWKevin Weil
Right.
- ALAlex Lupsasca
And that's exciting just to think of just having a really good tool that can search through all of this stuff and pull up these answers you have.
- KWKevin Weil
Yeah, and I think especially some of the most interesting research now happens at the, at the intersections of, of two fields. And a- again, it's, it's hard for one person to be an expert in two fields, let alone three or four or five.
- ALAlex Lupsasca
Mm-hmm.
- KWKevin Weil
And sometimes it's tough for humans to collaborate. You don't necessarily find the right person. The person doesn't have infinite patience. And here, with GPT, you have now the option to have a collaborator that will work twenty-four/seven, has infinite patience, you know, has read substantially every scientific paper written in the last however many years. And so it's just-- it's, it's a new kind of collaboration-
- ALAlex Lupsasca
Mm
- KWKevin Weil
... that is its own form of acceleration.
- ALAlex Lupsasca
You think about, like, Claude Shannon's wife was a mathematician and how much that helped what he was able to do, and I think we forget how much collaboration really is a factor of that. But I would say some people hearing this might go, "Yeah, but it couldn't spell strawberry last year."
- KWKevin Weil
Yeah.
- ALAlex Lupsasca
"It couldn't do math, so why are we going to have it do, you know, science?"
- 11:19 – 14:50
Kevin’s “fusion ladder” demo: undergrad to 20-year expert questions
- KWKevin Weil
Yeah. I- so actually, uh, I don't even know if I've told you this, my own sort of origin story-
- ALAlex Lupsasca
No
- KWKevin Weil
... with appreciating-
- ALAlex Lupsasca
All right
- KWKevin Weil
... uh, what GPT-5 could do, or in this case, it was, I think, o... this was almost a year ago, so it was o1-
- ALAlex Lupsasca
One
- KWKevin Weil
... preview, maybe.
- ALAlex Lupsasca
Yeah.
- KWKevin Weil
Um, but I was meeting with, uh, this, this guy named Brian Spears, who's a physicist at Lawrence Livermore. Um, it was in DC, and we'd never met before, so I, I didn't know sort of what to expect. I, I thought maybe I was going to go in and, um, be talking to him about what was new and what he could do with o1-preview and why he should give it a try. Little did I know, I sat down, and, and he immediately took control of the conversation and said: Let me tell you what I can do with your models, and, like, these are the most amazing things for science, and this is going to change the world. And he was like: Okay, let me take you through this. And he opened up his laptop, uh, and, and, you know, he works on fusion, right? Lawrence Livermore, uh, was the first to, to do large-scale fusion with positive energy, like super exciting. So he's like: "All right, we're gonna take a fusion example, and first, I'm gonna start with the undergrad version of this problem." And so he, he shows me this conversation, and he's like: "All right, so you've got, you know, a, a, a copper rod, and we're gonna bombard it with super high-pressure waves. What happens?" And, you know, he's like, so he, he answered, and, and o1-preview gives a good answer. It's like, okay, cool. So it got the, it got the, uh, got the undergrad problem right. And then, "Now let's, let's ask the graduate version of this. Now what happens inside the, the rod itself as you're doing this? And, you know, what, what needs to be true in order for it to generate these certain kinds of shock waves?" And he goes through, and he's like: "Okay, so got that right. All right, now let's ask the postdoc-level question. All right, now let's ask the-"
- AMAndrew Mayne
[chuckles]
- KWKevin Weil
... And at this point, I'm like, you know, despite having a physics background, I'm just following along-
- AMAndrew Mayne
Right
- KWKevin Weil
... for the ride because he's beyond anything I can do. Like, all right, now let's ask the, you just joined Lawrence Livermore and you, you know, kind of question. You've gone through your postdoc, you're a nuclear physicist, and he keeps going, and o1-preview keeps getting the answer right. And then he's like: "All right, now let me ask you the you've worked at Lawrence Livermore-
- AMAndrew Mayne
[chuckles]
- KWKevin Weil
... for twenty years question."
- AMAndrew Mayne
[laughing]
- KWKevin Weil
And it goes and, and it gets it right, and then not only that, but it like suggests that the only way to go forward is to use these, these set of simulation tools that are like partially classified, or that only Lawrence Livermore has.
- AMAndrew Mayne
[laughing]
- KWKevin Weil
It's like, you know, I don't have access to these, but if you did, you would want to use these tools. And he's like: "Look, nothing in here, that nothing that I just showed you is something that I couldn't do, but it would've taken me days, and certainly not everybody at the lab can do this. Like, the acceleration that comes, that is going to come from these tools, is going to change science." And so I went from like sitting down with this guy, who I thought maybe I was gonna be sort of talking to him about the value of AI, to him just completely blowing my mind about the, the potential of AI. And this is a year ago, this is o1-preview, you know? We've come leaps and bounds since then, and the thing that I always try and, and like remind everybody, the AI models that we're using today, as good as GPT-5.1 Pro is, these are the worst AI models that we will ever use for the rest of our lives. And when you think about that, the fact that we're here-
- AMAndrew Mayne
Mm-hmm
- KWKevin Weil
... just implies that the future is very bright.
- 14:50 – 18:13
Black-hole symmetries: warm-ups, priming, and frontier reliability
- AMAndrew Mayne
How have your colleagues been using these tools?
- ALAlex Lupsasca
Yeah, there's a lot of different usages, I think. Um, literature search, here's what I'm working on-
- AMAndrew Mayne
Mm
- ALAlex Lupsasca
... does it connect to any other thing? And this is something that we spend a lot of time on as scientists, just understanding when something new shows up in our work, how it connects to other things. Um, and okay, m- my own experience that made me become AI-pilled, I think, is-
- AMAndrew Mayne
This is the reason you came to OpenAI?
- ALAlex Lupsasca
The reason, yeah. [chuckles] And, uh, when GPT-5 Pro came out, um, uh, I met, uh, Mark Chen, who works here at OpenAI, he's Chief Research Officer, and he gave me a challenge. He was very proud. He said: "You know, why don't you just give it a, a hard problem?" And I thought: Huh-
- AMAndrew Mayne
[chuckles]
- ALAlex Lupsasca
... huh, you want a hard problem? Okay. And so I, I gave it this question.
- KWKevin Weil
Quantum gravity. [chuckles]
- ALAlex Lupsasca
[chuckles] Right. So I had just found these, uh, new symmetries of black holes, which is something that doesn't happen that often, and I'd written up a paper that came out in June on the arXiv, and I was very happy about that. Um, and I thought, "Okay, well, let's see how GPT Pro handles this new question." And so I, I gave it the equation, and I didn't say that it has some symmetries, I did... Yeah, I didn't give it a leading question. I just said, "What are the symmetries?" And it, it thought for five minutes, and it said: "Hmm, mm, no symmetries." And I go: Ha! [chuckles] It's not there yet, still better than the AI. [chuckles] And, and Mark Chen is visibly crestfallen, and he goes:
- AMAndrew Mayne
[chuckles]
- ALAlex Lupsasca
"Okay, well, just, um, just give it an easier question then." And so I think, okay, I- I'm gonna give it the warm-up baby version of the problem, which is, find the symmetries of this equation, not in the full black hole spacetime, which is complicated, but in the flat space limit, where, where the spacetime is empty. And hit Enter, it thinks for, you know, nine minutes, and it comes back with this beautiful answer: "Oh, this equation has conformal symmetry, which is the, the correct thing, and here are the three generators," and it was very beautiful. And, you know, this version of the equation, it probably has been studied, I'm sure has been studied-
- AMAndrew Mayne
Mm
- ALAlex Lupsasca
... many times over the decades, so I don't know what it did exactly, but it came up with the answer. I thought, "Okay, this is very good. Like, you know, this, this is a great outcome." And then Mark said: "Okay, well, but now that it's been primed on the warm-up example, try again in this instance of chat, the harder problem." And I thought, "Okay, let's go." And so we give it the hard problem again, hit Enter, and it thinks, and it thinks, and that was the first time I saw it think for so long. I think it's like e- eighteen minutes. And it comes out with this beautiful answer that was completely correct, and that blew my mind, 'cause I had been working on this for a very long time. Um, and I would say that that calculation is at the edge of my abilities. I think it's something that, you know, very few people could have done the way I did it. Um, and so I was really shocked. [chuckles] Because, you know, you spend years of your life training to be best in class at something and finding symmetries of black holes and these kinds of equations, that's, that's my jam.
- AMAndrew Mayne
[chuckles]
- ALAlex Lupsasca
Um, and I thought, "Okay, so I guess that just happened?" [chuckles] And it really sent my mind reeling, and, um, I was a little bit shell-shocked for a few days, and I just couldn't stop thinking about it. And after that, I realized, okay, I have to become involved in this, because to see this capability emerge into the world, like, like right now, and not, to not be involved with this, just seemed crazy to me.
- 18:13 – 24:33
Low pass-rate problems and the hidden frontier: making iteration less painful
- KWKevin Weil
I was gonna... I, I actually think you made, uh... you made a really important point in the middle of that around the, the fact that you gave it the hard question, it didn't get it right. You gave it an easier question, it got that right, and then you were able to give it a harder question, it got-- There is still, you know, as excited as we clearly are about the future here, there's also a very real set. Like y- y- when you're giving GPT-5 or any of these AI models a problem that's on the frontier, that's at the limit of their capabilities, they tend to still be wrong a lot, right? Kinda like any human would be at operating at the level of, uh, at, at the frontier of their capabilities. And it takes, you know, it isn't just automatic yet.
- AMAndrew Mayne
Mm.
- KWKevin Weil
Hopefully, in the future, it will be, you know, enter in any hard question, and the model answers it. But today, there's a lot of back and forth, and the people that are best, the, the researchers that are best at getting the most out of the models, have a sort of patience to go back and forth with them.
- AMAndrew Mayne
Mm-hmm.
- KWKevin Weil
I think that's natural. It's probably the way that you would work with any, any, you know, any two people operating at, at about the limit of their capabilities. But I, I think it's important, uh, especially for folks listening to this who are doing research with the models, to know that it's not-- it isn't just one shot, and it always works.
- AMAndrew Mayne
Right.
- KWKevin Weil
It-- there really is a, a back and forth and sort of a patience that it takes. Um, and one of the interesting research problems that we're spending a lot of time thinking about is how we, how we, uh, help people with... uh, yeah, how we sort of help reduce that, that cognitive load.
- AMAndrew Mayne
Mm-hmm.
- KWKevin Weil
'Cause when you're working on a problem, say, the model is, has a 5% pass rate on some problem. So technically, the model can get it right one out of twenty times, but it's really at the frontier, so it's not gonna get it right nearly, you know, even close to every time. If you're sitting inside ChatGPT and just entering in this question, you're gonna have to enter it in, you know, what? Ten times before you have the odds that it's gonna get the right answer.
- AMAndrew Mayne
Mm.
- KWKevin Weil
And that's-- most people aren't gonna do that. Um, and so there's a whole host of problems that the model can solve, that people probably try and are like, "Oh, after three tries, it didn't get it right, so let's-- I'll move on. The model's not good enough yet." And actually it is, but it's just very hard to tell apart low pass rate problems from problems that are too hard. And, uh, I think that's actually a really important thing for us to help researchers and mathematicians get past, uh, because the most interesting problems right now are gonna be the ones where the model has a very low but non-zero pass rate. Those are gonna be the hardest problems that the model can solve, the best ways that it can, that it can help accelerate science. And so that's a really interesting research problem that we're taking on, uh, to try and make that a little more automatic, a little less grunt work. But for now, uh, like, putting in the time and really going back and forth with the model does yield results.
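Editor's note: the pass-rate arithmetic above can be made concrete with a short sketch. It assumes each attempt is an independent trial with the same success probability, which is a simplification and not something stated in the episode. The function names are hypothetical and chosen for illustration only.

```python
import math


def prob_at_least_one_success(pass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - pass_rate) ** attempts


def attempts_needed(pass_rate: float, target: float) -> int:
    """Smallest number of independent tries whose combined success chance reaches `target`.

    Assumes 0 < pass_rate < 1 and 0 < target < 1.
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - pass_rate))


if __name__ == "__main__":
    # 5% pass rate, as in the example from the conversation.
    for n in (3, 10, 20):
        print(n, round(prob_at_least_one_success(0.05, n), 3))
    print(attempts_needed(0.05, 0.5))
```

Under that independence assumption, three tries at a 5% pass rate succeed at least once only about 14% of the time, ten tries about 40%, and it takes 14 tries to pass even odds. This is consistent with the point that giving up after three attempts makes a low-pass-rate problem look unsolvable.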
- AMAndrew Mayne
Well, it, it feels like we're at a moment, kind of like when we went from GPT-3.5 to ChatGPT. 3.5 was a model, extremely capable model, but it was still effectively a base model.
- KWKevin Weil
Yeah.
- AMAndrew Mayne
And, and I was a prompt engineer at the time, and knowing how to prompt it, I could get great results from it, but it took all those little tricks to sort of understand the context. Then when we went to ChatGPT, and we understood, okay, we know the kind of problems people are trying to solve, let's make it a little bit easier for them to get there without having to do that. It feels like that's kind of where we're heading in science, though, that now that you have people like Alex explaining the problems you're trying to solve and what you're doing, that we may see like a big acceleration with this.
- KWKevin Weil
I think it's probably just a characteristic of any question that's on the frontier of or sort of at the limit of what the models can do. And back with GPT-3.5 and early versions of 4, the questions that were at the, the limit of what the model can do were, were much more basic.
- AMAndrew Mayne
Mm.
- KWKevin Weil
Now they're, they're questions of, you know, scientific research, but you still, when you're operating at the frontier, the, the pass rate will be low, and so you've gotta kinda... Like, there's value in, in-
- AMAndrew Mayne
Yeah
- KWKevin Weil
... sticking with it and trying a few different things and taking the parts that it gets right and refining them, while telling the model where it got other things wrong.
- AMAndrew Mayne
I-
- ALAlex Lupsasca
I- in this example I mentioned, it needed a warm-up, but the warm-up was the obvious warm-up that you would do as a human.
- AMAndrew Mayne
Right.
- ALAlex Lupsasca
Because actually, when I was attacking this problem, I wasn't thinking about the black hole case first. The, the, this flat space limit was the obvious place to start, and that is where, where I, I began. And so I think, um, the models are actually really good, but we could get better at making them think of the warm-up-
- AMAndrew Mayne
Mm
- ALAlex Lupsasca
... problem themselves, so they can go, go there directly. But more generally, I think there's this, um, thing we have to bear in mind, which is that as scientists, our role is to push the edge of knowledge. There are things that are just beyond the edge, and our goal is to bring them before the edge of knowledge by understanding them, but this edge is very jagged.
- AMAndrew Mayne
Mm.
- ALAlex Lupsasca
So there are very basic questions about the universe, like, why are there three dimensions of space? Or, you know, h- what happened at the Big Bang? These are things that everybody w- wants to know the answer to, and yet, even though they're simple questions, there's really nothing intelligent to say about this. W- we just don't know. They're very hard problems, actually. And then, meanwhile, there are these very hard questions that you would think we wouldn't be able to answer at all, to which we have extremely detailed answers. We can predict the electron [magnetic] dipole moment to, I don't know, 12 decimal places, something crazy. So there-- the, the edge of human knowledge itself is, is very jagged.
- AMAndrew Mayne
Mm-hmm.
- ALAlex Lupsasca
And it takes many years of graduate school to learn where the edge is, and I think what we've-- we're finding with these AI models is that the edge of their knowledge is also very jagged. So you mentioned, you know, there's some basic questions that it can't a-- that the models can't answer. That's true. At the same time, there are some very hard questions that they're very well suited for already today, and I think what's exciting is that their edge of knowledge is very jagged in a way that's different from ours. Um, so obviously, as time goes on, I think the edge of, of a- ability for these models is going to keep expanding, but as long as it expands in a way that is slightly different from our edge, that's also really interesting because at, at the intersection... where, where it can go farther than us, so we can get ahead of it, that's where a lot of interesting things are gonna happen, I think.
- KWKevin Weil
Yeah, human and AI together-
- 24:33 – 27:22
A snapshot paper: documented workflows plus new mathematical results
- AMAndrew Mayne
I, I wanna explore that a little bit more, but first, uh, tell me about the research paper.
- KWKevin Weil
Yeah, so, uh, we, we've, we've talked a bunch about these anecdotal examples-
- AMAndrew Mayne
Mm-hmm
- KWKevin Weil
... that Alex has gotten from the time that he spent with his colleagues, that we see coming in across Twitter, you know, on a semi-daily basis at this point. And we wanted to sort of bring them together and just write something, publish something about, that, that, that lays out the current sort of state of GPT-5 with respect to science. Um, and so what we, we've got, it's a handful of collaborators from inside OpenAI, and I think eight or nine academics from, uh, beyond our walls, across a bunch of different fields: math, physics, astronomy, uh, computer science, biology, materials science. And, uh, the paper is, uh, something on the order of 12 sections, each one highlighting a different way that GPT-5 is accelerating their work. The goal was not to be, you know, hypey and say, "Everything is solved."
- AMAndrew Mayne
[chuckles]
- KWKevin Weil
It's, you know, it's really to say, look, there-
- AMAndrew Mayne
Hoverboards for everybody. [chuckles]
- KWKevin Weil
Yeah. Like, this is what works, this is what doesn't work, here's what I tried. In many cases, we're sharing the, the ChatGPT, you know, the full share links-
- AMAndrew Mayne
Mm-hmm
- KWKevin Weil
... the conversation, so you can see the back and forth that the scientist has with the model. Um, and it's, it's meant to be kind of a moment in time to say, "This is where we are today," and I think we'll look back in six months, 12 months, and, and, you know, we- we'll probably be much further, and that'll be exciting. But even where we are today, we've got, uh, a section in the paper on the different... a bunch of different examples around literature search, a section in the paper with a bunch of different examples around acceleration, whether it's calculations or other things like that. Um, and then a section where we actually contribute four or five new non-trivial results in mathematics.
- AMAndrew Mayne
Hmm.
- KWKevin Weil
Um, and a couple of these are small, a couple of them probably could have been papers on their own. Um, and so it, it-- you go from kind of the, the mundane, but, but very pragmatic and real bits of acceleration to the, the more sort of profound GPT-5 actually pushing past the, the current frontier of human knowledge. Um, and so we're super excited about this paper. Um, it's, you know, I think there'll be a lot more to come. We're not the only lab doing great work, by the way. Um, Google has been doing this for a while, and I have a ton of respect for what, uh, Demis and the team have done with AlphaFold and, and more. Uh, I just think we're at a really exciting time. You know, ideas in science often have their moment, when you have multiple people coming with the same idea, whether it's quantum mechanics, like, like Alex was talking about, or the light bulb. Right now, it's very clear that AI is, is just beginning to change science. Um, and it, it, it's gonna be an exciting few years.
- 27:22 – 29:53
Advice to students: AI as a pathfinder through the unknown
- AMAndrew Mayne
What advice do you have for students and grad students in the sciences? 'Cause I hear people talk about like, "Oh, we're not gonna need scientists anymore," which sounds absolutely crazy. It's not like the telescope got rid of the astronomer-
- KWKevin Weil
Right, exactly
- AMAndrew Mayne
... it actually created the astronomer. Um, how do you feel about that, and what advice do you have?
- ALAlex Lupsasca
Okay, I think first of all, it's important to acknowledge there's a lot of anxiety in academia right now that is unrelated to AI. It has to do with lots of changes in the way that science is organized in this country, and we're still going through these changes. Um, so I think that, uh, talking to young people, there's a lot of anxiety surrounding this. I actually think AI is, um, a really exciting new tool that's coming, uh, that's becoming available, that is going to help a lot, because, um, it's just gonna make everybody just so much more efficient. Um, as Kevin was mentioning earlier, when you work on a research project, oftentimes you don't know which way exactly to go. Um, you know you're here, you wanna get there, but there are different possible paths, uh, different lines of attack, and the whole point of research is that from the get-go, you don't know which, which way to go. And, um, one of the things that's really fun, actually fun, um, with GPT, is that you can just say, "Hey, I'm trying to solve this. Here are some ideas I have." You can upload some notes that you have or just dis- describe it in a few sentences, and it's very good at getting what you're trying to do. And then you can just say: What if I approached it this way? Or what if I were to do it this way? And it, it can immediately go off and chart a path through the unknown, just signposting, um, different potential avenues. And that actually saves so much time because, you know, I, okay, I'm a human, I have limited time, energy, and when I'm gonna put in the effort to do a calculation, I spend a lot of time trying to prototype it and think ahead where it's gonna take me. And with ChatGPT, I, I, I, I can just launch it in that direction, in that direction, in that direction. And it doesn't like completely get everything right, but just having these signposts along the way is so helpful because then, when you do go down the path yourself, y- you have somebody helping you along, it feels like. 
Um, and I, I think that's just gonna make everybody faster, more productive. And, you know, I-- already, the young people that I meet are spending a lot of time experimenting with ChatGPT and, and figuring out its capabilities. Um, and I, I, I think it's gonna be a, a boon for everyone.
- 29:53 – 36:43
12 months vs 5 years: bottlenecks, biology, and turning predictions into reality
- AMAndrew Mayne
You mentioned part of the idea of the paper was to say, "Okay, this is where we are now. Let's go look in six months." Let's talk, we're five years since GPT-3-
- KWKevin Weil
Mm-hmm
- AMAndrew Mayne
... five years from now, we're sitting down here. What are we gonna see?
- KWKevin Weil
Oh, man! [chuckles] The five-year question is so hard. I mean-
- AMAndrew Mayne
[chuckles] It's a great question.
- KWKevin Weil
[chuckles] 'Cause, eh-
- ALAlex Lupsasca
Here's a crystal ball. [chuckles]
- KWKevin Weil
Yeah. You know, I think, I mean, the exciting thing about this field in general is from, like, you look back 12 months, and you're completely embarrassed by where you were 12 months ago. You know, the idea if, if I... When GPT-3 launched... it was unbelievable, right? I mean, it, it, I'll speak for myself, it blew my mind-
- ALAlex Lupsasca
Mm-hmm.
- KWKevin Weil
-the idea that AI could do any of these things. And then somewhere in around, like, three poi- GPT-3.5 and 4, the, the, the Turing test, which we had held up for, like, what, 75 years as the pinnacle of artificial intelligence research. Like, "Oh man, the world will be different when an AI-
- ALAlex Lupsasca
Mm
- KWKevin Weil
... can pass the Turing test," we just went whooshing by, and, like, now we just don't talk about the Turing test anymore.
- ALAlex Lupsasca
It's totally forgotten. [chuckles]
- KWKevin Weil
Um, and even you look back to the beginning of, of this year, of 2025, and most people were writing code themselves. Most engineers were writing all of their own code, and the idea that you-
- ALAlex Lupsasca
It's gross, writing it yourself. Ugh. [chuckles]
- KWKevin Weil
[chuckles] And now fast-forward, and you've got, like, uh, the idea that you would do really much of anything without leveraging Codex-
- ALAlex Lupsasca
Mm
- KWKevin Weil
... Claude Code, GitHub Copilot, you know, any of these tools, they're all incredible, is crazy, right? You're so much more productive with it. So just in 12 months, and I don't, I... I- in 12 months, software engineering has fundamentally changed. I think over the next 12 months, we're going to see profound changes in the way that science is done, you know, both in the stuff that we can do in silico, in theoretical physics, and mathematics, and computer science, and I think we're going to begin to see it in the life sciences and the physical sciences. That's over the next 12 months. I mean, five years? [chuckles]
- ALAlex Lupsasca
So that, yeah, that, that's a, a question I think about a lot because when it comes to mathematical proof, I can kind of go into a computer, and I can test that, and I can verify that or at least test with it-
- KWKevin Weil
Yeah
- ALAlex Lupsasca
... with some extent. The same with some sort of equation for physics. But when you get into talking about the life sciences or material sciences and stuff, are we going to have a bottleneck of way more predictions than ways to test them?
- KWKevin Weil
Well, I think one of the valuable, uh, there's so many areas where models can help with life sciences. If you take, you know, biology, drug discovery, for example, there, um, y- you have a huge search space, and the, the more that the models can learn, uh, how to prune that search space, the more... Even if you're going to end up with, uh, a bunch of physical, real-world experiments to run at the end of the day, if you can intelligently prune the search space-
- ALAlex Lupsasca
Mm
- KWKevin Weil
... then you can more rapidly converge on the, the drugs that are likely to work in particular scenarios. Um, and, and then you can think about the impact, you know, for, uh, for that to have real-world impact, you need to make it all the way through, um, the, the, the regulatory process.
- ALAlex Lupsasca
Mm-hmm.
- KWKevin Weil
That is its own process that AI can help speed up, um, because you end up needing to write these huge papers that, that bring together, uh, you know, tons of different findings and, and, and so on. So it- you can take each step of the process, it can... and AI can help upfront as you prune the search space and try and find, you know, candidates that are more likely to be, to meet your needs and meet the goals that you have. And then as you go through the process to getting this thing out to consumers and making a real-world impact, uh, AI can contribute there, and we have, we have pilots with a number of the companies in the space doing that. So it's, um, it, it, it really is fairly broad-based.
- AMAndrew Mayne
You started off with an interest in particle physics. You were studying that, and then you found other things.
- KWKevin Weil
[chuckles]
- AMAndrew Mayne
And now you find yourself back in the sciences. Do you think other people are going to follow that pattern?
- KWKevin Weil
Uh, I mean, I- it is an absolute privilege for me to, to get to come back and work on science, and, you know, I am nowhere near the scientist that, uh, folks like Alex and other people here at OpenAI are, but I, I don't know of something... You know, I, it, it, I think we talk a lot about AGI at OpenAI, Artificial General Intelligence. I think maybe the most profound way that people are going to feel AGI in their lives is through science.
- 36:43 – 40:31
Compute and thinking time: longer reasoning boosts success on hard problems
- AMAndrew Mayne
I, I think you've both made a very good point, and that is that these models improve at such a rapid pace that sometimes people have a very firm idea of what they are 'cause they tried something six months ago.
- KWKevin Weil
Yeah.
- AMAndrew Mayne
And I've encountered with people who I really respect, and the scientists are like, "Oh, I tried it," and I'm like, "Well, you tried it 18 months ago."
- KWKevin Weil
Yeah.
- AMAndrew Mayne
And they're not used to a tool evolving that quickly.
- KWKevin Weil
Yeah, or they're using the free version because-
- AMAndrew Mayne
Yes
- KWKevin Weil
... you know, of course, that's how everyone starts. Um, and the free version doesn't think for as long, and so-
- AMAndrew Mayne
Yeah
- KWKevin Weil
... it can't solve problems that are as challenging. Um, yeah, I think that's really real. It's one of the reasons that I, I, I think the best advice is to just, like, keep trying the problems. Even if you're working on problems, and as you try them on GPT-5, it, like, isn't super helpful, I wouldn't give up. I would keep trying it every few months, and I think at some point, you know, it's gonna start being valuable if it's not already there today. W- we talked about sort of, uh, thinking time. Yeah, that's another area that we're really-
- AMAndrew Mayne
Mm
- KWKevin Weil
... ex- excited to see that with GPT-5 Pro, you can get the model... I've seen it think for what? Maybe 40 minutes on, on-
- ALAlex Lupsasca
Yeah, we have an example in the paper where-
- KWKevin Weil
... some of the hardest problems. Um, but, you know, it has a, it has a certain amount of, of sort of compute allowance because we have to serve it to many, many, many people.
- ALAlex Lupsasca
Mm.
- KWKevin Weil
Uh, 40 minutes is certainly not a limit on thinking. Like, it-- the models can think for 2 hours, 6 hours, 12 hours, 24 hours, and one thing we continue to see is that pass rate on hard problems continues to improve as you give the models more time to think.
- ALAlex Lupsasca
Mm.
- KWKevin Weil
Which is like, you know, it, it's surprising, actually, the number of times there's a totally reasonable human, like, intuitive human analogy to these things. There are a lot of problems that I can't solve in 20 minutes, but that I might be able to solve if you gave me two hours.
- AMAndrew Mayne
System one and system two thinking.
- KWKevin Weil
Yeah, and some that I can't solve in two hours, but if I had a day to really think about it and try different things, I might get there, and the models are the same way. So being able to give a much smaller-- you know, there aren't as many scientists in the world as there are users of ChatGPT. If we could find ways to give scientists that really know how to use the models well, just a huge amount of compute-
- AMAndrew Mayne
Mm-hmm
- KWKevin Weil
... I think that is yet another way that we can accelerate science.
- AMAndrew Mayne
Yeah, it's a, it's a very good point because you'll hear people talk about we hit a wall or whatever, and one of the things that was really an amazing discovery, which, you know, a year ago, we found out about the whole the reasoning paradigm and the fact that you can just take the model of today and let it think longer. And we think about, you know, people go, "What would we do with all this compute we're building, all these, this hyperscaling?"
- KWKevin Weil
Yeah.
- AMAndrew Mayne
It's like even using today's models and letting them think for a long time, we could probably have some amazing discoveries.
- KWKevin Weil
Yeah, 100%. I, I think if, if model progress stopped today, just the process of, of driving awareness within the scientific community and giving people more of the best that the models can deliver, I think we would see a, a large amount of scientific acceleration. But of course, progress is not gonna stop, as Alex was saying. And so it-- when you think about the models being able to think for a longer time, um, being able to train them to do harder and harder scientific tasks, um, and actually also just, you know, getting out in the scientific community and helping people see what the frontier really is and how they can use the models better to do the work that they're doing, um, I, I just-- like, I'm excited to see where this goes over the course of the next 6 months, 12 months, 24 months.
- ALAlex Lupsasca
Yeah, I think this is a really unique time in history. It feels like a special moment. And to be clear, we're not telling people, "Drop whatever you're doing and come do AI." That's not the message.
- KWKevin Weil
Mm.
- ALAlex Lupsasca
I think what we want to say is, "Keep doing what you're doing-
- KWKevin Weil
Mm
- 40:31 – 42:18
Scientific benchmarks after saturation: from GPQA to frontier evaluations
- AMAndrew Mayne
One of the challenges right now with benchmarks is that models, when we talk about terms like saturation-
- KWKevin Weil
Mm-hmm
- AMAndrew Mayne
... a lot of things, like models, have done that. Also, a lot of them are just don't seem that impressive anymore. Now it looks like we're moving to the scientific frontier. What do scientific benchmarks look like?
- KWKevin Weil
Yeah. Like with many things, there's sort of an intuitive, uh, way to understand this. As the models get smarter, benchmarks are just a way of, of testing the model in some sense.
- AMAndrew Mayne
Mm.
- KWKevin Weil
And as the models get smarter, you need to give them harder and harder tests because they learn how to ace the-
- AMAndrew Mayne
Mm
- KWKevin Weil
... the earlier tests. So if you take GPQA, which, uh, stands for Google-Proof Q&A, it's a, it's a, a scientific benchmark that asks basically PhD-level questions across a range of scientific fields. We thought for a long time that was a very hard benchmark to beat. I, I think it came out in 2023, and GPT-4 originally was, like, at 39% on this benchmark. Humans, by the way, are at about 70%. But now you fast-forward two years, and our latest models are nearly at 90%.
- AMAndrew Mayne
Wow!
- KWKevin Weil
So they're surpassing the capability of most humans in their field of scientific study across every field at once, which is kind of amazing when you think about it. Um, but that isn't-- you know, those aren't the hardest questions in the world, and that's one of the reasons that we're focused on, uh, new evaluations that ask frontier science and mathematics questions. It's also, you know, we released something called GDPval recently, which is an eval that tests the model's ability to do economically valuable tasks.
- AMAndrew Mayne
Mm.
- KWKevin Weil
So the, the smarter the models get, the harder the tests that we want to keep giving them. Uh, because, you know, every gap that we see, every place where the model can't answer a certain question, that's feedback for us and gives us a way to improve the model further.
- AMAndrew Mayne
Curing disease, great. [laughing]
- KWKevin Weil
[laughing]
- 42:18 – 48:12
Where they want acceleration most: black holes, dark matter, and fusion—then global adoption
- AMAndrew Mayne
What area, though, beyond that, would you really like to see? And it could be crazy or weird or odd you'd like to see scientific acceleration.
- KWKevin Weil
You wanna go first?
- ALAlex Lupsasca
Well, I'm very selfish, so I have- [laughing] ... my own interests. Um, I really like black holes.
- KWKevin Weil
Right.
- ALAlex Lupsasca
That's my passion.
- AMAndrew Mayne
You want to build a black hole?
- ALAlex Lupsasca
Uh, I think there's a lot of potential for how AI can, uh, accelerate black hole research. And, of course, I, I, I want to see it help with cancer and drug discovery and all these good things, but my first priority is-
- AMAndrew Mayne
Yeah, we get that. Yeah, yeah.
- ALAlex Lupsasca
Yeah, I want, I want to see more AI helping with black holes. Um, so... you know, [scoffs] there's a lot of ideas on the table and so much potential. O- one thing is, um, there are a lot of theoretical questions that are very thorny, and I think if, uh, you just sat down and you could understand everything that is known, and you could integrate it, integrate that knowledge, I think a lot of things would fall out of that. Um, and that's one of the things that we're exploring. Um, you know, um, dark matter, for instance, is something that we've been talking about because, um, there's a lot of data on dark matter from, like, various experiments, but we still have no idea really what it is. There's a bunch of theories out there. I think a really interesting idea is, could it be that by, um, feeding ChatGPT all the experimental data that is known about dark matter and all the theories, it could rule some of them out already by combining bits of knowledge that are just so disparate that it's hard for our human minds to hold them together? Um, I think that's kind of an exciting frontier. Um, and then, uh, I think also, since we were talking about the far future, um, experimental work is totally not out of the question. Right now, we're focused on more theoretical fields because they can be done in silico. But you could totally imagine using AI, um, to design better experiments and maybe run very hard, complicated experiments, including maybe for black hole physics and other fields. I think there's a lot of, um, ground to explore here and very exciting possibilities.
- KWKevin Weil
And I'll say fusion. Um-
- ALAlex Lupsasca
Yeah
- KWKevin Weil
... just because the, i- if we can actually... We have, again, small scale, uh, I mean, large scale, but small, small, uh, existence proofs of it. Um, so clearly it can work, and the, the challenge now is to do it, like, uh, at, at bigger scale, m- more reliably. Clearly, it's possible. We will figure this out, but if we can accelerate it, then, you know, the world, the world with fusion is a, a significantly better place than the world without. We solve a lot of problems if we, if we solve fusion, and, uh, you know, I'm excited to see if maybe we can contribute in some way.
- AMAndrew Mayne
I think it's easily overlooked by people how much we're dependent upon energy, and if we had the same orders of magnitude improvement on energy producti- production that we had in the last two hundred years, what that unlocks. And you think about, you know, things that are energy-intensive, like desalinization, you know, or construction and other things, and when you have really, really, really unbound energy-
- KWKevin Weil
Yeah
- AMAndrew Mayne
- it's incredible.
- KWKevin Weil
I mean, some groups might need to, like, might be looking to build lots of infrastructure for lots of GPUs, for example.
- ALAlex Lupsasca
Mm. Yeah. [chuckles]
- AMAndrew Mayne
Yeah. [chuckles]
- KWKevin Weil
Yeah. [chuckles]
- ALAlex Lupsasca
Who knows? Who, who might want to do that? [chuckles]
- AMAndrew Mayne
But even, yeah, even beyond that, I think that we're gonna probably see from that, the infrastructure build-out, a lot more energy devoted to energy. And much like mobile phones and laptops made electric cars a lot more efficient because of all this money being thrown into battery technology, I think we'll probably see that offshoot.
- KWKevin Weil
Yeah, and I think, you know, anytime you change something by an order of magnitude, the world changes. I think the, the-- what we've seen over the past year with the way that software engineering has changed, you now don't need to be trained as a software engineer to write, you know, meaningful amounts of, of code. That means you can bring, you know, there are, like, what? Thirty million software engineers in the world. I think now three hundred million, maybe three billion people can, can write software, and that's gonna fundamentally change things. If we can move... You know, if we can make energy ten times more prevalent, ten times cheaper, it will change the world, and I think it's a really, uh, high-potential place for us to apply the intelligence of our models.
- ALAlex Lupsasca
And if I can add something, we have ideas that we're excited about in terms of the potential of AI to change science, but this is very much not supposed to be a, a top-down-
- KWKevin Weil
Yeah
- ALAlex Lupsasca
... effort, where we dictate what AI is gonna do in the world. We're actually very excited about building the best general-purpose AI, and if we release that into the world, then everybody will take it and use it for their own purposes. And, you know, for me, I'm a black hole physicist, I want to use AI to further black hole science. But, you know, for a scientist in another field, I think it's natural to use it for that. And the nature of research is such that it's very hard to know where the next breakthrough is gonna come from, really. And so, I think our, our vision is to push this out into the world. Um, we can see a-- I think we could see a lot more adoption than we have today, and once that happens, um, who knows, who knows where the next biggest discovery will come? But that's how we give us the, the-- give ourselves the best chance to accelerate scientific discovery.
- KWKevin Weil
Yeah, it's such an important point. The, the, the frontier or the, the, the surface area of science is massive, and this is not about what we can do within OpenAI individually to, to accelerate science or to accelerate specific scientific projects. It's about giving scientists all around the world AI, so that they can accelerate their work. That's how we move science forward faster. Um, so, you know, there, there are pieces I think that we will try and do because it'll help us learn, but the vast majority, like, what we really want, is to see a hundred scientists win Nobel Prizes using AI.
- ALAlex Lupsasca
Mm-hmm.
- AMAndrew Mayne
Yeah, it feels like it's not the end of science, it's really the start.
- KWKevin Weil
Exactly.
- ALAlex Lupsasca
Exactly. [chuckles]
Episode duration: 48:12
Transcript of episode 0sNOaD9xT_4