
AGI progress, surprising breakthroughs, and the road ahead — the OpenAI Podcast Ep. 5

How close are we to automating scientific discovery? What do AI competition wins really tell us about progress toward AGI? OpenAI Chief Scientist Jakub Pachocki and researcher Szymon Sidor share inside stories—from gold medals at the International Math Olympiad to surprising leaps in reasoning—that reveal where AI is headed next.

Chapters

1:20 – From high school in Poland to AI research leaders
4:50 – Explaining AGI: technical and everyday perspectives
6:30 – Automating scientific discovery with AI
7:50 – Breakthroughs in medicine, AI safety, and alignment
10:30 – Today is a decade in the making
14:30 – Benchmark saturation and its limits
16:50 – Why math competitions matter for AI
18:15 – How models reason without tools
21:45 – Recognizing when a model can’t solve a problem
23:30 – Storytime: AtCoder competition in Japan
26:50 – How reasoning breakthroughs really happen
28:55 – What’s next for scaling and long-horizon reasoning
30:30 – What AGI will look and feel like
34:00 – Advice to high school students in 2025
36:25 – Balancing trust and personal value

Andrew Mayne (host) · Jakub Pachocki (guest) · Szymon Sidor (guest)
Aug 15, 2025 · 40m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:20

    Intro

    1. AM

      Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, our guests are OpenAI's chief scientist, Jakub Pachocki, and Szymon Sidor. We're gonna talk about measuring AI progress, how you determine AGI, and where the next breakthrough might come from.

    2. JP

      The model was able to correctly identify that it didn't make progress on the problem.

    3. SS

      We started asking very, very seriously the question, like, are we ready as an organization for, for incredibly fast-paced progress?

    4. JP

      When we think about how we shape our research program at OpenAI, we seek to create intelligence that is very general.

    5. AM

      I want to first start off by understanding your roles. So, Jakub, you're the chief researcher, chief scientist at OpenAI?

    6. JP

      Chief scientist, yes.

    7. AM

      Okay, what does chief scientist mean?

    8. JP

      So the primary thing I'm responsible for is setting the research roadmap for the company. Um, so deciding what is the technical path we are going to bet on, and what is the, um, the underlying long-term research that, that, that, that, that we're going to pursue.

    9. AM

      So how about you, Szymon, what do you do?

    10. SS

      Random things.

    11. AM

      Random things. [chuckles] Okay.

    12. SS

      Um, yeah, I, I, I mostly do IC work. Uh, I try to, um... Well, maybe sprinkle of leadership somewhere in there.

    13. AM

      Mm-hmm.

    14. SS

      Uh, I try to do what's the very s- most

  2. 1:20–4:50

    From high school in Poland to AI research leaders

    1. SS

      useful.

    2. AM

      Now, you two knew each other before working at OpenAI, right?

    3. SS

      Yeah, we went to the same high school.

    4. AM

      Same high school?

    5. SS

      Yeah.

    6. AM

      Were you guys friends?

    7. SS

      Uh, I think we became best friends w- when, when, uh, after we left. Like, I think kind of coming to US is the kind of, uh, emotional experience that forms bonds.

    8. AM

      Right.

    9. SS

      Uh, I think in, in, uh, in high school, uh, uh, w- we were more like colleagues.

    10. AM

      What, what kind of high school produces guys like you? [chuckles]

    11. JP

      So, well, yeah, we, we went to this high school in, um, in Gdynia, in Poland. Uh, I think we were both drawn there by this, uh, computer science teacher-

    12. AM

      Mm-hmm

    13. JP

      ... uh, Mr. Ryszard Szubartowski, um, who's had a great track record, uh, before, be- before we went there, of, of, of, of, uh, bringing up, uh, um, computer scientists, programmers, uh, um, with this, like, big focus on programming competitions and kind of, and pursuing, uh, you know, excellence in this, like, like, one field. Yeah, so and I, I think that was, like, a very formative experience and a great mentor for us.

    14. AM

      Oh, wow!

    15. SS

      Yeah. No, definitely. Uh, I think there he was, like, really going deep on programming. I think it went way beyond, like, typical high school curriculum. Like, there was, like, graph theory, matrices, and all sorts of stuff like that. I actually hope that maybe with ChatGPT, it's a little bit easier for people now to do these kind of-

    16. AM

      Mm-hmm

    17. SS

      ... deep dives. 'Cause, um, you know, without the right mentor and without a lot of work, it's, it's kind of h- hard to replicate that experience.

    18. AM

      I've been using it to explain things like there's, you know, the Monty Hall problem, where you have to choose which door, and you go into ChatGPT, and you say, "Make a graphic, interactive version of this," and all of a sudden, you can see it. It can show you the different solutions if you do one thing or the other. I think that's... It's one of these things where I'm excited about the ability to not just explain in text, but to build multimedia to do things, and it does get into the area of there's not really a measure for that. You know, that it's a use case. You know, this didn't exist before, and, you know, we're, we talk about AGI, but we kind of have very loose definitions and whatever, and I'd love to hear kind of like what, how you would describe it, both from a technical and also, like, a layperson's understanding of it.
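Andrew's Monty Hall example is easy to sanity-check in code. The sketch below (my illustration, not the interactive graphic he describes) simulates both strategies and shows why switching wins about two-thirds of the time:

```python
import random

def monty_hall(trials: int = 100_000, switch: bool = True) -> float:
    """Estimate the win rate of the stay/switch strategies by simulation."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # The host opens a door that is neither the contestant's pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in doors if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(f"stay   ≈ {monty_hall(switch=False):.3f}")  # ~0.333
print(f"switch ≈ {monty_hall(switch=True):.3f}")   # ~0.667
```

Seeing the two numbers converge to 1/3 and 2/3 is the same insight an interactive graphic makes visual.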

    19. JP

      Yeah. Uh, oh, m- maybe to, like, address the point about teaching, um, you know, this sort of, like, better, um, explanation of some concept or, like... or, or, you know, teaching through Socratic methods, like, definitely, uh, yeah, a powerful use of ChatGPT, and I think works well with, like, a, um, um, a teacher, like, like, like, uh, like our Mr. Szubartowski. But at the same time, um, I think, like, the, the thing that, like, he was able to provide was more, like, kind of emotional support and space-

    20. AM

      Mm-hmm

    21. JP

      ... which I think is, uh, it will be, it will be hard for AI to, to do, to, to do alone.

    22. AM

      That's a great point. I think, I think that gets lost a lot 'cause I sometimes hear people talk about, "Oh, well, AI will replace education," and for me, I had teachers that maybe their facts weren't always right, but their heart was there, and they were caring, and they answered questions for stuff, and so I think that's a good point, that these are companions to that, and I think that a teacher using these tools can be a even more capable teacher.

    23. JP

      Yeah.

    24. AM

      But, uh, so on the subject, though, of AGI, though, uh, I want to hear first, like, give me your technical definition of it, or actually not a technical definition. Give me, like, how you would describe it to, like, if you were talking to, you know, a younger sibling.

  3. 4:50–6:30

    Explaining AGI: technical and everyday perspectives

    1. JP

      You know, a, a few years ago, when we would talk about AGI, um, it, you know, it felt like the technol- the technology, deep learning, uh, you know, has incredible promise, but at the same time, the concept still felt a little bit abstract, right, and far away. Um, and so I think, you know, whether you talk about, like, you know, human-level intelligence, ability to converse naturally, uh, you know, ability to, you know, solve math problems or pursue research, like, they all kind of felt like in the same, um, space. Um, I think, yeah, as, as the technology has progressed, now we see there's, like, this... these are actually, like, quite distinct, uh, capabilities, and I think we pretty clearly are at the point where the AI is able to converse naturally, uh, on a, a wide range of topics. Um, you know, it is able to solve math problems. I think, you know, like, getting a gold medal at IMO is something-

    2. AM

      Mm-hmm

    3. JP

      ... we've long discussed as, like, a milestone on the path to AGI, and that, that happened. Um, I think, um, solving, you know, solving all the problems on the National Math Olympiad is actually a little bit harder, and, and, you know, I, I think it's, like, another milestone on the path there. But I think increasingly, um-... you know, we see that, like, these kind of pointwise measures are less adequate, and so we, we turn to thinking about, like, what is, what is its actual impact in the world?

    4. AM

      Hmm.

    5. JP

      Um, for me personally, the, the, the, the thing that I think about when I think about, uh, how, um, AI progress really

  4. 6:30–7:50

    Automating scientific discovery with AI

    1. JP

      impacts the world meaningfully, I, I, I, I first think about its potential for automating, um, the discovery and production of new technology.

    2. AM

      Hmm.

    3. JP

      Um, [lips smack] I think, um, you know, we tend to associate kind of new ideas, you know, fundamental technological progress with just human ingenuity. Uh, and, you know, we, we, we measure kind of the, you know, the, the, uh, our progress by these, like, kind of major milestone i- inventions and, and, and, and technological revolutions. Um, and I think it is just hard to internalize, like, this is possi- it, it, it is possible to automate most of this process. It is possible to have, uh, a big computer that is coming up with ideas that fundamentally change our understanding of the world, and I, I actually think that is not that far away. Uh, and so thinking about that, you know, what separates us from that, and what are the consequences of such technology is, is, yeah, i- it's my first thought.

    4. AM

      I, I just ordered a little Mac Studio because I wanna take, uh, the open source model, GPT-OSS, and I wanna just let it run non-stop. 'Cause that idea, just the idea of letting it generate and do stuff twenty-four seven is fascinating to me. But you're talking about a scale of basically automating science at a huge scale, and so what kind of discoveries, what kind of things do we think

  5. 7:50–10:30

    Breakthroughs in medicine, AI safety, and alignment

    1. AM

      might be the first things we could see from that?

    2. JP

      Wh- when we think about how we shape our research program at OpenAI, we seek to create intelligence that is very general. Um, we, you know, drive towards this automated researcher as a priority, but we, um, you know, we don't really think of it as, like, "Let's take these specific domains, and let's kind of, like, deploy this technology there." I think that is a way to, like, make faster pointwise progress, but I think the potential for, like, the really big discoveries and, and, and, and, a- and, and, and the most meaningful technology advancement comes from this general- generality.

    3. AM

      Mm-hmm.

    4. JP

      Um, although still I think we see kind of like, you know, the technology, like, uh, you know, it's, it's, it's kind of easier to apply in some domains than others. I think, um, I think especially in places that combine, uh, a large amount of reasoning with a lot of kind of domain knowledge and intuition, uh, seem, uh, very, um, very amenable to, to, to, to, to, to these systems. I think, in particular, we see, like, pretty incredible results on, uh, medicine, which is very encouraging.

    5. AM

      Mm-hmm.

    6. JP

      Uh, I have high hopes about that. Um, yeah, I think naturally, being a company of AI researchers, uh, we think a lot about, uh, automating our, our own work. I think it is also kind of a, uh... You know, I, I don't think it's, um... You know, if it is, if AI can indeed reach a point where you can automate AI research, then that is probably a very important thing to automate. Uh, and, you know, and similarly thinking about, like, how we can help with automating research on AI alignment and safety.

    7. SS

      I'm obviously impressed by the, like, IMO and IOI results. I, I mean, uh, I was actually about to add that, like, um, in the past, when we were talking about the IMO with Jakub, like, that was, like, you know, a few years ago, uh, and, um, and we, we are still, like, kind of trying to even figure out what our definition of AGI might be. Like, like, one, one, one kind of concept we are considering is something like, you know, solving all the problems on, on, uh, on the Math Olympiad. Uh, and, um, and, and, and why did that feel appropriate? Is just like, okay, if, if you have a model of such a superior mathematical reasoning, then [clears throat] it should be able to, like, disrupt, like, a bunch of different, uh, domains that kind of, uh, can be mathematically modelled. Um, right? Um, I mean, I'm, I'm in general just, um... I think maybe this podcast is just a good opportunity to, to kind of, like, share a little bit more of an inside perspective. I was

  6. 10:30–14:30

    Today is a decade in the making

    1. SS

      just astounded by the progress. I, I think-- So sometimes I see those headlines w-where, uh, where, where people say that, like, "Oh, um, the economic kind of impact of AI is only, like, 3% or 5%," right?

    2. AM

      Mm.

    3. SS

      And, and the- those headlines are often accompanied by, by comments like, "Well, so AI is slowing down," or-

    4. AM

      [chuckles]

    5. SS

      ... or, or, you know, like, like, like, like, uh, "People are s- over-hyping AI so much, and it's only, like, 3%, so what's up with that," right? And, uh, and when I, you know, wh- when I see headlines like this, I, I remember to, like, uh, maybe 10 years ago, uh, I was working on, uh, natural language processing, uh, i- in, in, in with deep learning, and back then, it just didn't really work. Like, I remember Jakub once came to test, like, one of the technologies we were working on, and that was, like, trying to detect sentiment of sentences, and he was trying, "This movie is bad," correctly classified as negative.

    6. JP

      Mm-hmm.

    7. SS

      "This movie, uh, is good," correctly classified as positive. And then he would say, "This movie is not bad," [chuckles] and the model is like, "Oh, negative." [chuckles]

    8. JP

      Yeah, yeah.
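The failure mode Szymon describes is easy to reproduce with a deliberately naive keyword-count classifier (a made-up sketch, not the actual system they were testing): because the model only counts words in isolation, "not bad" looks negative.

```python
# Illustrative bag-of-words sentiment classifier (hypothetical word lists).
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}

def naive_sentiment(sentence: str) -> str:
    words = sentence.lower().split()
    # Count positive vs. negative keywords; negation like "not" is ignored.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"

print(naive_sentiment("This movie is bad"))      # negative (correct)
print(naive_sentiment("This movie is good"))     # positive (correct)
print(naive_sentiment("This movie is not bad"))  # negative (wrong!)
```

Handling "not bad" requires modeling word order and context, which is exactly what the early deep-learning systems of that era struggled with.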

    9. SS

      Right? So that was 10 years ago, right? And since then, like, okay, like, like we, we slowly got like, uh, uh... We slowly started solving tasks like this, solving tasks like, decide, d- is this word a noun, a verb? That was like sentiment neuron. Th- then we had, uh, uh, th- then we had GPT-1, GPT-2 started producing, like, a paragraph of text that made sense, right? That was such a breakthrough. Right now it feels so simple, but back then, it was such a breakthrough. Then we had, like, G- GPT-3, GPT-4. GPT-4 was like, to me, like, kind of-

    10. SS

      ... like, let's say, my personal AGI moment, uh-

    11. AM

      Mm-hmm

    12. SS

      ... because it would sometimes say things that surprised me, and I was like, like, "Can this model actually surprise me," right? It's still back then, like, ChatGPT, for my personal, mm, use, kind of felt a little bit m- m- more like a nuance and kind of like maybe slightly better Google, but like-

    13. AM

      Mm-hmm

    14. SS

      ... that like w- was the big deal. And then, like, suddenly we get to deep research and, and this can actually, like, answer questions to really, like, rarely make things up, that, that felt useful. And then finally, now we have, like, models that can, like, compete in programming competitions, which was like a, [inhales] you know, like, very hard-earned for me personally, and even [chuckles] more so for Jakub, obviously. Uh, and, uh, yeah, the, the, the pace of progress, just like from the perspective of somebody working on this technology, is, like, absolutely amazing. So, so when you see that three percent, like I raised you, like ten years ago, if we had to quantify it, it would probably be like zero point zero zero zero zero zero one percent or something, right? So like, really, like really I think, I think those numbers need to be put in perspective, right? And, and, and there is no reason not to believe that, like, in a year it will be ten percent, in two years it will be twenty, and so on, so forth.

    15. AM

      Yeah, I've heard it said that if you looked at, like, a graph of the economy from, let's say, like, you know, uh, World Wide Web, you know, early nineties forward, and you said, "Point to the internet happening to the economy," you can't find the point. There's no point you go, "Oh, okay, Tim Berners-Lee announced this," whatever. And I think AI is a lot like that, where people go, "Oh, we've only measured this one. Our measures are hard." It's hard to know that, you know, one, who's using it, how they're using it. And you brought up a very good point, too, about if you've been following it for a while... I remember training, like, a very simple next character predictor on my computer, and it was terrible, right? One, I'm using a small computer, but even then and then you've got, you know, the sentiment analysis, you're playing with BERT, and it's kind of, it's getting a little bit better. But then GPT-2 comes out, and I read every single output on GitHub, every single output GPT-2 came out, because I'm like, "There is something going on with this." And that's how I ended up working at OpenAI, was because I was this obsessive person about that, and then with access to GPT-3, kept saying, "Oh, this is really this path that's moving forward." But it's kind of crazy now, 'cause, like, if six weeks go by and a benchmark hasn't been broken, people are like, "Oh, we hit the wall!"
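The "very simple next character predictor" Andrew mentions can be as small as a bigram frequency table. This hypothetical sketch (not his actual code) shows the idea, and also why such models were terrible at anything beyond short-range patterns:

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count, for each character, which characters follow it."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts: dict, ch: str) -> str:
    # Return the most frequent character observed after `ch` in training.
    return counts[ch].most_common(1)[0][0] if counts[ch] else ""

model = train_bigram("the theory of the thing")
print(predict_next(model, "t"))  # 'h' — in this text 't' is always followed by 'h'
```

A model like this only sees one character of context, so its output degenerates into gibberish after a few characters; the jump to GPT-2's coherent paragraphs is what made that release feel like a breakthrough.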

    16. JP

      [chuckles]

    17. AM

      "We hit the wall." And I would say part of the problem,

  7. 14:30–16:50

    Benchmark saturation and its limits

    1. AM

      though, is that benchmarks in some ways feel like you'll see modest improvement on them. I've heard some of the benchmarks have problems, and some of them actually have wrong answers, and it's impossible to get a hundred percent if you answer everything correctly. But also, we talk about the term... Internally, I've heard people talk about this as, you know, saturation. Do you wanna talk about that?

    2. JP

      Yeah. I, I think, I think there's a, there's a few issues that, that, that, that, that we're hitting with benchmarks right now. [lips smack] Um, yeah, I mean, a pretty clear one is saturation, and that is just the models genuinely reaching a point where, you know, for the kind of standardized forms of measuring, um, intelligence or ability, like, they are, uh, at human level for a lot of them. Uh, you know, if you're kind of like able to, to, you know, perform in, in, in, in, in, uh, among the top on this, like, h- very hard high school competitions, where we have, um, you know, the, the, the best competitors from around the world, um, it just becomes quite hard to, like, have this, like, like, very, uh, very constrained, uh, measurement. You know, previously when we were looking at, you know, just like, yeah, GPT-1, GPT-2, GPT-3, really far scaling paradigm, um, you know, the benchmarks were really very- they were really just, like, measuring the rising of the tide. Um, I think now, um, you know, the field has developed a lot of, uh, more data-efficient ways to train for specific abilities, right? It doesn't mean, you know, train on these benchmarks, but you can train models that are, like, disproportionately good at math-

    3. AM

      Mm

    4. JP

      ... compared to their ability, um, you know, to write, for example, right? And, and so they will do better on math benchmarks, but it's no longer as representative of their, of their, of their overall intelligence, uh, in other topics. Um, I think, you know, these two issues combined, we, yeah, I think, I think we really have to think about the overall utility of these models, and especially, like, their ability to, uh, discover new insights.

    5. AM

      Yeah, I guess that's a, a thing that sort of, kind of gets overlooked, is that you can build a model that's a really good test taker, but that model may not really be that useful for work. Ideally, your model should score well on tests, but just 'cause a model got these scores, doesn't mean you're gonna find it personally useful. And I certainly think that's a challenge we're at now, where when people say,

  8. 16:50–18:15

    Why math competitions matter for AI

    1. AM

      "Is a model good or bad?" It's kind of like saying, you know, you're trying to create a blanket assessment when there is a hundred different use cases for it. You know, is a model good or bad? Maybe it's great at creative writing, maybe it's bad at math, maybe it's great at math and bad at creative writing, and that becomes a really big challenge. And we've talked about this with, like, one for math, the International Math Olympiad, and these kinds of metrics. Why are they important? Why is it important to put it into these sort of human-level competitions?

    2. JP

      I think the reason we've been excited about these competitions, like the International Math Olympiad and, uh, the International Olympiad in Informatics, is that, um, they are a pretty interesting example of, like, a, a test that is constrained, doesn't require that much knowledge, but really tests your ability to, to think about a problem hard for-

    3. AM

      Mm

    4. JP

      ... you know, an hour or two or, or three. Um, and, you know, and we have, like, a very kind of good, uh... we have very good evidence that, like, these problems are hard. Uh, you know, there's a lot of people that try to solve them and compete at solving them, and it matters to them. Um, so yeah, so I think this is the, then, you know, d- like for, for models that, like, you know, excelled at, like, kind of knowing a lot of things but not necessarily, uh, you know, thinking very hard in the past, that, that really seemed like the kind of the, the right milestone to be working towards.

    5. AM

      Now, I'm-- as I understand

  9. 18:15–21:45

    How models reason without tools

    1. AM

      it, the model that scored gold medal level on that-... wasn't using, like, a calculator, it wasn't using other tools, it wasn't using some of the frameworks. It was doing it purely through reasoning.

    2. JP

      Yeah, that's right. For, for the, for the International Math Olympiad, uh, yeah, the model was not using, uh, other tools, like, uh, yeah, it-

    3. AM

      And, and, and again, and that was like two years ago, you ask it to multiply two four-digit numbers, it would fail.

    4. JP

      Yeah, but, but definitely, like, you know, for, for this kind of contest, it's really, like... It is, of course, like, in a limited domain-

    5. AM

      Mm

    6. JP

      ... of, of math, but it really is about, like, fairly creative thinking, not about applying a formula.

    7. AM

      I guess that's part of the challenge, though, is that once you start moving outside of math, it gets to be harder. You can start to come up with things like humanity's last exam, which I think is a pretty neat test, but you find that certain models, after they learn a certain kind of tool use, kind of figure out maybe sort of how to solve these problems better. And I would wonder, what kind of benchmarks are we gonna need? You know, what are you, what are you looking at to say, "Okay, this is, this is how I can kind of get an objective measure of a capability?"

    8. SS

      One thing that surprised me in the past, I was talking to one, uh, one of our, uh, coworkers here, uh, Anna MacAndrew.

    9. AM

      Mm.

    10. SS

      And I was, uh, telling her about, uh, IMO. I was excited about, like, so- some progress, and she's like: "What's IMO?"

    11. AM

      Yeah. [chuckles]

    12. SS

      And, uh, and that, that, that kind of like, it was, like, very front of mind for me because I, I, I do realise that, like, w- with some of those benchmarks, we kind of live in a bubble a little bit.

    13. AM

      Mm-hmm.

    14. SS

      F- for me, that competition, uh, feels important, and especially like, the, the computer science c- counterpart, uh, IOI, because it was a big part of my life, and so it's true for many coworkers here. But actually, like, for an average person, like, wor- w- working in other field or maybe not as interested in mathematics or, or, or, or, or computer science, maybe they're interested in history or something, the, the, the-

    15. AM

      Yeah, Anna speaks, like, five languages, too-

    16. SS

      Yeah

    17. AM

      ... so I could see for her, a different metric based on that would be interesting.

    18. SS

      Yeah. So, so, so I think, like, o- o- one thing that, like, um, that, uh, it's not a perfect metric, but, uh, at least helps keep us honest and, uh, helps us escape the, uh, the bubble is, uh, is just ChatGPT usage, right?

    19. AM

      Mm.

    20. SS

      Because everybody uses ChatGPT, and they use it for all sorts of use cases. Uh, and obviously, there's, like, a lot of pitfalls to using that as a metric, but at least it avoids that partic- particular problem where, like, there are just some things that I'm more familiar with, and other people might, uh, appreciate other things, and this gives you, like, a very wide coverage.

    21. AM

      Yeah, and in there, too, you have subsets of users, people who are building GPTs and doing more complicated stuff. You mentioned before, too, the fact that the model will reason longer, and that seems like a very interesting way to evaluate capabilities.

    22. JP

      Yeah, yeah, and, and I think this is also maybe, like, one, um, you know, challenge with focusing on, on the kind of usage of, like, ChatGPT and broad adoption of AI as, as the metric of progress. Like, I think, uh, this hasn't really happened to a very meaningful extent yet, but I think it will start happening pretty soon. We should be able to use vastly more compute, um, than, you know, a, a, a, a, a, a, a user would normally be willing to, um, um, to, to, to, to, to, to buy for themselves to produce, you know, technology artefacts that are useful to a lot of people. Um, and I, I think, I think that, that, for

  10. 21:45–23:30

    Recognizing when a model can’t solve a problem

    1. JP

      me, will be a very important measure of progress.

    2. AM

      Which of these wins were the most surprising to you?

    3. JP

      I think we definitely kind of anticipated getting to this point when we saw the reasoning models starting to work.

    4. AM

      Mm-hmm.

    5. JP

      Um, at the same time, I think this, like, recent, uh, set of wins is very impressive. I think maybe out of those, uh, um, I think, I think IMO came a little bit sooner than I expected.

    6. AM

      Mm.

    7. JP

      IMO gold, again, like, I think IMO problem six will, will still, uh... IMO has, like, all the problems require, like, creative thought and some, uh, new insights, I think, but typically, like, you know, there's this proverbial problem six that is, like, requires very out-of-the-box thinking. Uh, and, you know, it's really kind of like, usually, like, outside the kind of typical domains of the other problems. Um, and you know, so in the past, we were actually kind of drawing a boundary between, like, getting a gold, you know, like, solving these other problems, and, like, actually consistently solving all the problems, and in particular, problem six. So it was pretty, it's pretty, pretty, uh, you know, hilarious in some way to see ourselves and also Google DeepMind at the same time, like, "Oh, yeah, we solved problems one to five perfectly, and we didn't make any progress on problem, problem six." I think that kind of makes that challenge pretty clear.

    8. AM

      Yeah, that was, I think that was what's interesting is that, yeah, that I think that the, the OpenAI model said, like, "Yeah, I don't think I can solve this," didn't even try or said that it had a problem with that. Was that correct?

    9. JP

      Yeah, the model was able to correctly identify that it didn't make progress on the problem.

    10. AM

      That's pretty fascinating to think about that, that the model's able to sort of determine that. 'Cause, uh, you know, there's a lot of conversations about when people talk about hallucination, which I think is a kind of a poorly understood thing, and there's a difference between fluid and crystalline thinking, and, you know, one is how much knowledge a model has, and the other is its problem-solving capability.

  11. 23:30–26:50

    Storytime: AtCoder competition in Japan

    1. AM

      And when you get to the point where it's able to do that, it's able to say, "Hey, no, I think, you know, I won't be able to answer this," that's pretty interesting sort of point to get to. Um, I've been told to ask this question about a Jap- uh, a live stream in Japan.

    2. JP

      Oh, so, so, so I think in the, in the past few weeks, actually, like, our models have performed incredibly well in, uh, three competitions. Uh, so we talked about two of them, which is IOI and IMO. Um, there is also this competition for... that is open to everyone, not just, not, not just high schoolers, um, called, um, AtCoder. It's a, it's a kind of very prestigious-

    3. AM

      Mm

    4. JP

      ... very, uh, um, um, high-quality, uh, competition organised in Japan, uh, but open to, o- open to competitors worldwide. And in this particular contest, it was about, um, kind of longer-horizon, um, heuristic problems, where you're, you're given only a single problem, and you have ten hours to solve it.... and so you have competitors, uh, racing to figure out, like, the best approach to, to, up to, to, to, to this, like, difficult optimization problem. Um, so it's a bit different because there isn't, like, a single correct solution. There isn't, like, a single pattern to follow. Like, these tasks are, like, extremely diverse, and you, you can focus on the single task for, for, for, for, for ten hours. And so we, we entered our, a model into this contest, and, um, you know, to, to, to me, this, this had a little bit of a personal significance. Um, I used to be a kind of very, uh, engaged competitor in the past in this, like, more short-form, like, closed-form contests, like IOI. Um, and, um, my friend, uh, Saiho, um, who also works at OpenAI at the time, um, excelled at this, like, long-duration, uh, contest. And when we worked together, he would mock me a little bit that, you know, my sort of contest would be automated, uh-

    5. AM

      Mm

    6. JP

      ... long before his. Um, because, you know, they are-- that kind of, like, longer duration require, uh, um, require, require kind of more focused work. Um, and turns out, like, in this, in this, in this contest in Japan, Saiho was actually, like, one of the top contenders. Um, and so I was watching this live stream, watching our model kind of race with Saiho throughout the competition. Um, in the end, um, our model actually got second place, and Saiho won.
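The contest format Jakub describes, a single open-ended optimization problem with hours to iterate, boils down to local search: propose a change, score it, keep it if it helps. A toy sketch with an invented objective (not an actual AtCoder task):

```python
import random

def score(x: list[int]) -> int:
    # Hypothetical objective for illustration: prefer values close to 50.
    return -sum(abs(v - 50) for v in x)

def hill_climb(n: int = 10, steps: int = 10_000, seed: int = 0) -> list[int]:
    """Random-restart-free hill climbing: mutate one element, keep improvements."""
    rng = random.Random(seed)
    best = [rng.randrange(100) for _ in range(n)]
    for _ in range(steps):
        cand = best[:]
        cand[rng.randrange(n)] = rng.randrange(100)  # mutate one position
        if score(cand) >= score(best):               # keep if no worse
            best = cand
    return best

print(score(hill_climb()))  # approaches 0, the optimum
```

Real heuristic contests layer much smarter moves, annealing schedules, and problem-specific insight on top of this loop, which is why a ten-hour human effort like Saiho's can still edge out a model.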

    7. AM

      [chuckles]

    8. JP

      So, you know, he alone stood in the way of, of his, uh, uh, of, of, you know, his prediction not coming true. [chuckles]

    9. AM

      Still two wins for OpenAI.

    10. SS

      I think, uh, one, one thing that also stood out to me is, like, uh, Saiho, at the end of the competition, he was, like, really, really tired, and they interviewed him a little bit to, to, to talk about his experience, uh, like, in the middle of the competition. And, um, I, I don't think I can quote him directly on this podcast, but he's like: "Your models are very, very bad. I want to go to sleep."

    11. AM

      [chuckles]

    12. SS

      "I am tired." [laughing]

    13. AM

      Yeah. We've heard talk about, like, the wall. We mentioned that before, and I think it was interesting because reasoning kind of came out of nowhere. I mean, there were hints of stuff, some papers and such, but people really hadn't drawn the line. Then all of a sudden, the o1 model comes out, and the whole idea that you can not just have a model give answers, you can let the model

  12. 26:5028:55

    How reasoning breakthroughs really happen

    1. AM

      kind of have an inner monologue, talk to itself, and reason things through. Do we think that's enough to take us to AGI, or are there other breakthroughs needed, or are there just other breakthroughs you think are gonna happen?

    2. SS

      Yeah, I, I just need to point out that-

    3. AM

      Yeah

    4. SS

      ... like, the team here worked extremely hard on this particular thing. It feels like something simple, like you just need a longer chain of thought, but actually making it work was really hard-earned. And back to your previous question about surprising results: when we first noticed that it was working, that we could train those models and give them more data and they get better, that was, I think, one of the most shocking moments. We started asking very, very seriously the question: "Are we ready as an organization for incredibly fast-paced progress?" I remember there was one particular evening, I think eleven PM, when we were on the line with Sam and Mira, [chuckles] just kind of trying to... I think we got a little bit freaked out by those results. Sometimes that happens. Yeah.

    5. AM

      The pace is fast. I mean, it is a fast thing, and like I said, the joke is, nothing happens for six weeks and people think it's slowed down. But if you look year over year, it is fast. And it's a fair point, 'cause you have things that you're aware of internally when you've worked on something for a couple of years, and then, "Hey, there's a research paper," but it's not like it came out last night. There was a lot of work behind it. But I'd say, too, the world was sort of surprised by the fact that there is this really fundamental new way to make these models do even more, to take the existing infrastructure, so to speak, and get a lot more capability out of it. Where do you think the next breakthroughs are gonna happen?

    6. JP

      I think one thing we always try to not underestimate is the importance of scaling. Um, I think, you know, even as we look at these new reasoning models,

  13. 28:5530:30

    What’s next for scaling and long-horizon reasoning

    1. JP

      you know, it's not like the previous scaling paradigm of pre-training has vanished, right?

    2. AM

      Mm.

    3. JP

      Like, I think we will see these things compound, and I think there are also new directions that we can move in. In particular, we were talking about extending the horizon that these models can plan and reason over. And if you look at it from the perspective of just compute spend, you know, we say, "Okay, we went from GPT-4, which was doing some amount of compute for every answer, to, like, GPT-5 Pro, which maybe uses, I don't know, ten X, twenty X," some non-trivial, but in some ways not that impressive, amount of compute more, right? And can produce much better answers. I think on the scale of what amount of compute you would be willing to spend on a problem that actually matters to a lot of people, right? Like progress on a medical research question, or progress on developing the next generation of models. These are incomparably larger amounts.

    4. AM

      Mm.

    5. JP

      And so I think that question of model persistence, and the ability to work for a very long time on focused problems, is a pretty clear next step.

    6. AM

      Mm. How would you put... the practical implications of

  14. 30:3034:00

    What AGI will look and feel like

    1. AM

      AGI to, say, a typical ChatGPT user? Like, what would their experience be like a few years from now, or five years from now? Which sounds far away, but it's really not, 'cause just five years ago GPT-3 came out, and that feels like a blur. What would an AGI-like model be capable of?

    2. JP

      So, I was talking about automating research.

    3. AM

      Mm-hmm.

    4. JP

      Um, you know, my picture of how that would actually look is: imagine a company of very capable researchers and engineers that is largely automated, right? And now, again, I think that is something that will interface with the world in all sorts of ways. It won't just be a black box. It will talk to people, it will take in inputs, it will run experiments. But I think having this sort of potential for developing new technology and other kinds of artifacts, you know, code bases, designs, will radically accelerate the pace of technical progress. So I think that is something we will feel, and we need to do a lot of work to get it right, from a technical and from a societal perspective. But I think that is kind of our time frame. I think we should also expect a lot of progress on the actual interfaces to AI that we interact with.

    5. AM

      Mm-hmm.

    6. JP

      You know, we see ChatGPT can feel quite human-like. We can form attachments with it. I think, as it becomes more persistent, as it becomes capable of expressing itself in different forms than text, right, those effects will become stronger. And again, that will be something that, I think, will become a very big and important conversation.

    7. AM

      I just got access in ChatGPT to have it actually read my calendar in Gmail, and I realized how far we've come, 'cause I'm excited about that now. I'm not really terrified that it's gonna start writing, you know, Ewok fan fiction to somebody. And I think that's this neat threshold that we've crossed, this level of trust.

    8. JP

      I think there's definitely... Like, we are in a place where there's a very tough trade-off: there is such clear economic and personal value you can extract out of having the model have access to a lot of your data. At the same time, I think we are not yet at the threshold of robustness where we can fully trust these models not to be exploited by someone trying to exploit them.

    9. AM

      Mm-hmm.

    10. JP

      Um, yeah, it's definitely a big problem that I think we as a field will have to iterate on.

    11. AM

      What would you tell the high-school versions of yourselves today? If you were visiting your old classroom, what would you say right now, what would you tell them about the future? What advice would you give?

    12. SS

      Invest in Bitcoin. [laughing]

    13. JP

      [laughing]

    14. AM

      No, I mean today. Today, even today. In 2025, what would you tell a high school student?

    15. SS

      High school students today? Oh, yeah, that one is also, I think, a great question, right? Because I hear a lot of what I consider misinformation on that online. So: you should absolutely learn to code.

    16. AM

      Yeah.

    17. SS

      Like, one skill that is at a premium and will continue being at a premium is

  15. 34:0036:25

    Advice to high school students in 2025

    1. SS

      having, like, a really structured intellect that can break complicated problems into pieces. And, you know, that might not be programming in the future, but programming is a fine way to acquire that skill. So are other domains where you need to think a lot. So don't let people tell you that you should not learn to code.

    2. AM

      Yeah, I learned to code late in life, and that's actually how I ended up working at OpenAI as an engineer. And I try to explain to people: just because a system can do the thing doesn't mean you don't want to know how it works anymore. And as you said, when you understand how to break down a task... When I worked at OpenAI in prompt engineering, my coding understanding helped me take language, break it down, and make it do better things. And I think that people who bridge those gaps are really at an advantage. So whenever I hear people say, "Don't learn to code," it's like: do I want an airplane pilot who doesn't understand aerodynamics? That doesn't make much sense to me.

    3. JP

      Well, you know, thinking about how I thought about things in high school, I think it's pretty incredible how many perceived constraints are not actually there when you really think about it. Maybe the first revelation to me was like, "Hey, if I'm really passionate about this computer science stuff, I can actually spend a bit more time on it, at the cost of maybe spending a bit less time on the other twelve subjects in school." But then, somehow, it was another big revelation to me that I could actually go and study in the USA at some point. That's not really something that seemed obviously in my action space. And, you know, obviously spending some time here in Silicon Valley, and seeing how people are willing to really attack these big problems with ambition, and with the belief that you can actually make a meaningful, positive change in the world. Yeah, I think it has been incredibly inspiring, and something I cherish about this community.

  16. 36:2540:23

    Balancing trust and personal value

    1. AM

      Is there a book or something that, like, inspired you?

    2. JP

      ... I think there's a couple of books. Um, I remember, [chuckles] it's actually very hilarious thinking about it now. I didn't really connect the dots, but my dad gave me this book once when I was, like, 15, and pretty unsure what I wanted to do. It was a Polish version of a book by some author I didn't know, called Hackers and Painters.

    3. AM

      Mm.

    4. JP

      Uh, [chuckles] yeah, it was actually Paul Graham. So, I guess, again, kind of this community. I found that pretty inspiring. [chuckles]

    5. AM

      Yeah, there's something helpful, I think, in hearing the message of, "No, it's okay to dream big and go do stuff," that you can just make things happen in the world. And I think the more people realize that, the better the world gets to be. Was there any book that influenced you, or movie, TV show?

    6. SS

      Oh, movie. I have a stupid answer to that question-

    7. AM

      I love stupid answers

    8. SS

      ... which kind of feels bad after the profound one. But, okay, so I watched Iron Man.

    9. AM

      Yeah.

    10. SS

      And it inspired me to start a PhD in robotics.

    11. AM

      Th- that's a great answer, though. Like, you know, The Martian by Andy Weir: I met a scientist at NASA, a botanist, who had read that book, and I'm like-

    12. SS

      Yeah

    13. AM

      ... "Well, they got the atmospheric physics wrong and all this." He's like, "Well, that's why I'm here." I'm like, "Oh!"

    14. SS

      Well-

    15. AM

      Yeah.

    16. SS

      Yeah, I guess I didn't get to the stupid part.

    17. AM

      Oh.

    18. SS

      The stupid part was, like, when I started working on robotics, I was very disappointed by how bad those robots are. Somehow it didn't occur to me that maybe the movie is a movie.

    19. AM

      [laughing]

    20. SS

      Uh, yeah. So that whole experience would have been kind of bad for me, if not for the fact that there I met a friend who was into deep learning. And at the time, I thought all of machine learning was hype, but an interesting systems problem. And then, out of nowhere, and as I'm sure, you know, I would frustrate some DeepMind folks by saying that, AlphaGo came out. Uh, [chuckles] no, I'm sure it wasn't out of nowhere.

    21. AM

      Yeah.

    22. SS

      I'm sure it was years in the making. Uh, and, and that, like, was very inspiring, actually to both of us.

    23. AM

      Yeah.

    24. SS

      Uh, a- a- and, and since then, it was just hard not to work.

    25. JP

      Yeah. It took me a while to become convinced that deep learning is more than a fad. Because, you know, we don't really understand the underlying optimization. And I think this has kind of been the story of our research here: trying to make progress on these questions about how it really works. But it really is like studying a physical phenomenon in some way. And to a classically trained computer scientist, that was a weird thing to accept.

    26. AM

      Mm-hmm. Yeah-

    27. SS

      I do remember when Jakub was telling me about, like, scaling up principled convex optimization.

    28. AM

      Like-

    29. SS

      That was before AlphaGo.

    30. AM

      Yeah, and AlphaGo was interesting, 'cause first it was like, "Oh, cool, it solved Go." And then we're like, "Yeah, but it just learned by watching all those games." Then they did AlphaGo Zero, where it's self-taught, and you're like, "Okay, game over, folks." Like, there's a trajectory here, and I think that's continued on. But I think that, yeah, if you hadn't watched Iron Man, maybe Thor instead, you know, maybe things would've turned out better.

Episode duration: 40:23
