
AGI progress, surprising breakthroughs, and the road ahead — the OpenAI Podcast Ep. 5

How close are we to automating scientific discovery? What do AI competition wins really tell us about progress toward AGI? OpenAI Chief Scientist Jakub Pachocki and researcher Szymon Sidor share inside stories—from gold medals at the International Math Olympiad to surprising leaps in reasoning—that reveal where AI is headed next.

Chapters

1:20 – From high school in Poland to AI research leaders
4:50 – Explaining AGI: technical and everyday perspectives
6:30 – Automating scientific discovery with AI
7:50 – Breakthroughs in medicine, AI safety, and alignment
10:30 – Today is a decade in the making
14:30 – Benchmark saturation and its limits
16:50 – Why math competitions matter for AI
18:15 – How models reason without tools
21:45 – Recognizing when a model can’t solve a problem
23:30 – Storytime: AtCoder competition in Japan
26:50 – How reasoning breakthroughs really happen
28:55 – What’s next for scaling and long-horizon reasoning
30:30 – What AGI will look and feel like
34:00 – Advice to high school students in 2025
36:25 – Balancing trust and personal value

Andrew Mayne (host) · Jakub Pachocki (guest) · Szymon Sidor (guest)
Aug 15, 2025 · 40m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:20

    Intro

    1. AM

      Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, our guests are OpenAI's chief scientist, Jakub Pachocki, and Szymon Sidor. We're gonna talk about measuring AI progress, how you determine AGI, and where the next breakthrough might come from.

    2. JP

      The model was able to correctly identify that it didn't make progress on the problem.

    3. SS

      We started asking very, very seriously the question, like, are we ready as an organization for, for incredibly fast-paced progress?

    4. JP

      When we think about how we shape our research program at OpenAI, we seek to create intelligence that is very general.

    5. AM

      I want to first start off by understanding your roles. So, Jakub, you're the chief researcher, chief scientist at OpenAI?

    6. JP

      Chief scientist, yes.

    7. AM

      Okay, what does chief scientist mean?

    8. JP

      So the primary thing I'm responsible for is setting the research roadmap for the company. Um, so deciding what is the technical path we are going to bet on, and what is the, um, the underlying long-term research that, that, that, that, that we're going to pursue.

    9. AM

      So how about you, Szymon, what do you do?

    10. SS

      Random things.

    11. AM

      Random things. [chuckles] Okay.

    12. SS

      Um, yeah, I, I, I mostly do IC work. Uh, I try to, um... Well, maybe sprinkle of leadership somewhere in there.

    13. AM

      Mm-hmm.

    14. SS

      Uh, I try to do what's the very s- most

  2. 1:20–4:50

    From high school in Poland to AI research leaders

    1. SS

      useful.

    2. AM

      Now, you two knew each other before working at OpenAI, right?

    3. SS

      Yeah, we went to the same high school.

    4. AM

      Same high school?

    5. SS

      Yeah.

    6. AM

      Were you guys friends?

    7. SS

      Uh, I think we became best friends w- when, when, uh, after we left. Like, I think kind of coming to US is the kind of, uh, emotional experience that forms bonds.

    8. AM

      Right.

    9. SS

      Uh, I think in, in, uh, in high school, uh, uh, w- we were more like colleagues.

    10. AM

      What, what kind of high school produces guys like you? [chuckles]

    11. JP

      So, well, yeah, we, we went to this high school in, um, in Gdynia, in Poland. Uh, I think we were both drawn there by this, uh, computer science teacher-

    12. AM

      Mm-hmm

    13. JP

      ... uh, Mr. Ryszard Szubartowski, um, who's had a great track record, uh, before, be- before we went there, of, of, of, of, uh, bringing up, uh, um, computer scientists, programmers, uh, um, with this, like, big focus on programming competitions and kind of, and pursuing, uh, you know, excellence in this, like, like, one field. Yeah, so and I, I think that was, like, a very formative experience and a great mentor for us.

    14. AM

      Oh, wow!

    15. SS

      Yeah. No, definitely. Uh, I think there he was, like, really going deep on programming. I think it went way beyond, like, typical high school curriculum. Like, there was, like, graph theory, matrices, and all sorts of stuff like that. I actually hope that maybe with ChatGPT, it's a little bit easier for people now to do these kind of-

    16. AM

      Mm-hmm

    17. SS

      ... deep dives. 'Cause, um, you know, without the right mentor and without a lot of work, it's, it's kind of h- hard to replicate that experience.

    18. AM

      I've been using it to explain things like there's, you know, the Monty Hall problem, where you have to choose which door, and you go into ChatGPT, and you say, "Make a graphic, interactive version of this," and all of a sudden, you can see it. It can show you the different solutions if you do one thing or the other. I think that's... It's one of these things where I'm excited about the ability to not just explain in text, but to build multimedia to do things, and it does get into the area of there's not really a measure for that. You know, that it's a use case. You know, this didn't exist before, and, you know, we're, we talk about AGI, but we kind of have very loose definitions and whatever, and I'd love to hear kind of like what, how you would describe it, both from a technical and also, like, a layperson's understanding of it.
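Andrew's Monty Hall example is easy to sanity-check in code. The sketch below (my illustration, not the interactive graphic he describes) simulates both strategies and shows why switching wins about two-thirds of the time:

```python
import random

def monty_hall(trials: int = 100_000, switch: bool = True) -> float:
    """Estimate the win rate of the stay/switch strategies by simulation."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # The host opens a door that is neither the contestant's pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in doors if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(f"stay   ≈ {monty_hall(switch=False):.3f}")  # ~0.333
print(f"switch ≈ {monty_hall(switch=True):.3f}")   # ~0.667
```

Seeing the two numbers converge to 1/3 and 2/3 is the same insight an interactive graphic makes visual.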

    19. JP

      Yeah. Uh, oh, m- maybe to, like, address the point about teaching, um, you know, this sort of, like, better, um, explanation of some concept or, like... or, or, you know, teaching through Socratic methods, like, definitely, uh, yeah, a powerful use of ChatGPT, and I think works well with, like, a, um, um, a teacher, like, like, like, uh, like our Mr. Szubartowski. But at the same time, um, I think, like, the, the thing that, like, he was able to provide was more, like, kind of emotional support and space-

    20. AM

      Mm-hmm

    21. JP

      ... which I think is, uh, it will be, it will be hard for AI to, to do, to, to do alone.

    22. AM

      That's a great point. I think, I think that gets lost a lot 'cause I sometimes hear people talk about, "Oh, well, AI will replace education," and for me, I had teachers that maybe their facts weren't always right, but their heart was there, and they were caring, and they answered questions for stuff, and so I think that's a good point, that these are companions to that, and I think that a teacher using these tools can be a even more capable teacher.

    23. JP

      Yeah.

    24. AM

      But, uh, so on the subject, though, of AGI, though, uh, I want to hear first, like, give me your technical definition of it, or actually not a technical definition. Give me, like, how you would describe it to, like, if you were talking to, you know, a younger sibling.

  3. 4:50–6:30

    Explaining AGI: technical and everyday perspectives

    1. JP

      You know, a, a few years ago, when we would talk about AGI, um, it, you know, it felt like the technol- the technology, deep learning, uh, you know, has incredible promise, but at the same time, the concept still felt a little bit abstract, right, and far away. Um, and so I think, you know, whether you talk about, like, you know, human-level intelligence, ability to converse naturally, uh, you know, ability to, you know, solve math problems or pursue research, like, they all kind of felt like in the same, um, space. Um, I think, yeah, as, as the technology has progressed, now we see there's, like, this... these are actually, like, quite distinct, uh, capabilities, and I think we pretty clearly are at the point where the AI is able to converse naturally, uh, on a, a wide range of topics. Um, you know, it is able to solve math problems. I think, you know, like, getting a gold medal at IMO is something-

    2. AM

      Mm-hmm

    3. JP

      ... we've long discussed as, like, a milestone on the path to AGI, and that, that happened. Um, I think, um, solving, you know, solving all the problems on the National Math Olympiad is actually a little bit harder, and, and, you know, I, I think it's, like, another milestone on the path there. But I think increasingly, um-... you know, we see that, like, these kind of pointwise measures are less adequate, and so we, we turn to thinking about, like, what is, what is its actual impact in the world?

    4. AM

      Hmm.

    5. JP

      Um, for me personally, the, the, the, the thing that I think about when I think about, uh, how, um, AI progress really

  4. 6:30–7:50

    Automating scientific discovery with AI

    1. JP

      impacts the world meaningfully, I, I, I, I first think about its potential for automating, um, the discovery and production of new technology.

    2. AM

      Hmm.

    3. JP

      Um, [lips smack] I think, um, you know, we tend to associate kind of new ideas, you know, fundamental technological progress with just human ingenuity. Uh, and, you know, we, we, we measure kind of the, you know, the, the, uh, our progress by these, like, kind of major milestone i- inventions and, and, and, and technological revolutions. Um, and I think it is just hard to internalize, like, this is possi- it, it, it is possible to automate most of this process. It is possible to have, uh, a big computer that is coming up with ideas that fundamentally change our understanding of the world, and I, I actually think that is not that far away. Uh, and so thinking about that, you know, what separates us from that, and what are the consequences of such technology is, is, yeah, i- it's my first thought.

    4. AM

      I, I just ordered a little Mac Studio because I wanna take, uh, the open source model, GPT-OSS, and I wanna just let it run non-stop. 'Cause that idea, just the idea of letting it generate and do stuff twenty-four seven is fascinating to me. But you're talking about a scale of basically automating science at a huge scale, and so what kind of discoveries, what kind of things do we think

  5. 7:50–10:30

    Breakthroughs in medicine, AI safety, and alignment

    1. AM

      might be the first things we could see from that?

    2. JP

      Wh- when we think about how we shape our research program at OpenAI, we seek to create intelligence that is very general. Um, we, you know, drive towards this automated researcher as a priority, but we, um, you know, we don't really think of it as, like, "Let's take these specific domains, and let's kind of, like, deploy this technology there." I think that is a way to, like, make faster pointwise progress, but I think the potential for, like, the really big discoveries and, and, and, and, a- and, and, and the most meaningful technology advancement comes from this general- generality.

    3. AM

      Mm-hmm.

    4. JP

      Um, although still I think we see kind of like, you know, the technology, like, uh, you know, it's, it's, it's kind of easier to apply in some domains than others. I think, um, I think especially in places that combine, uh, a large amount of reasoning with a lot of kind of domain knowledge and intuition, uh, seem, uh, very, um, very amenable to, to, to, to, to, to these systems. I think, in particular, we see, like, pretty incredible results on, uh, medicine, which is very encouraging.

    5. AM

      Mm-hmm.

    6. JP

      Uh, I have high hopes about that. Um, yeah, I think naturally, being a company of AI researchers, uh, we think a lot about, uh, automating our, our own work. I think it is also kind of a, uh... You know, I, I don't think it's, um... You know, if it is, if AI can indeed reach a point where you can automate AI research, then that is probably a very important thing to automate. Uh, and, you know, and similarly thinking about, like, how we can help with automating research on AI alignment and safety.

    7. SS

      I'm obviously impressed by the, like, IMO and IOI results. I, I mean, uh, I was actually about to add that, like, um, in the past, when we were talking about the IMO with Jakub, like, that was, like, you know, a few years ago, uh, and, um, and we, we are still, like, kind of trying to even figure out what our definition of AGI might be. Like, like, one, one, one kind of concept we are considering is something like, you know, solving all the problems on, on, uh, on the Math Olympiad. Uh, and, um, and, and, and why did that feel appropriate? Is just like, okay, if, if you have a model of such a superior mathematical reasoning, then [clears throat] it should be able to, like, disrupt, like, a bunch of different, uh, domains that kind of, uh, can be mathematically modelled. Um, right? Um, I mean, I'm, I'm in general just, um... I think maybe this podcast is just a good opportunity to, to kind of, like, share a little bit more of an inside perspective. I was

  6. 10:30–14:30

    Today is a decade in the making

    1. SS

      just astounded by the progress. I, I think-- So sometimes I see those headlines w-where, uh, where, where people say that, like, "Oh, um, the economic kind of impact of AI is only, like, 3% or 5%," right?

    2. AM

      Mm.

    3. SS

      And, and the- those headlines are often accompanied by, by comments like, "Well, so AI is slowing down," or-

    4. AM

      [chuckles]

    5. SS

      ... or, or, you know, like, like, like, like, uh, "People are s- over-hyping AI so much, and it's only, like, 3%, so what's up with that," right? And, uh, and when I, you know, wh- when I see headlines like this, I, I remember to, like, uh, maybe 10 years ago, uh, I was working on, uh, natural language processing, uh, i- in, in, in with deep learning, and back then, it just didn't really work. Like, I remember Jakub once came to test, like, one of the technologies we were working on, and that was, like, trying to detect sentiment of sentences, and he was trying, "This movie is bad," correctly classified as negative.

    6. JP

      Mm-hmm.

    7. SS

      "This movie, uh, is good," correctly classified as positive. And then he would say, "This movie is not bad," [chuckles] and the model is like, "Oh, negative." [chuckles]

    8. JP

      Yeah, yeah.
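The failure mode Szymon describes is easy to reproduce with a deliberately naive keyword-count classifier (a made-up sketch, not the actual system they were testing): because the model only counts words in isolation, "not bad" looks negative.

```python
# Illustrative bag-of-words sentiment classifier (hypothetical word lists).
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}

def naive_sentiment(sentence: str) -> str:
    words = sentence.lower().split()
    # Count positive vs. negative keywords; negation like "not" is ignored.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"

print(naive_sentiment("This movie is bad"))      # negative (correct)
print(naive_sentiment("This movie is good"))     # positive (correct)
print(naive_sentiment("This movie is not bad"))  # negative (wrong!)
```

Handling "not bad" requires modeling word order and context, which is exactly what the early deep-learning systems of that era struggled with.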

    9. SS

      Right? So that was 10 years ago, right? And since then, like, okay, like, like we, we slowly got like, uh, uh... We slowly started solving tasks like this, solving tasks like, decide, d- is this word a noun, a verb? That was like sentiment neuron. Th- then we had, uh, uh, th- then we had GPT-1, GPT-2 started producing, like, a paragraph of text that made sense, right? That was such a breakthrough. Right now it feels so simple, but back then, it was such a breakthrough. Then we had, like, G- GPT-3, GPT-4. GPT-4 was like, to me, like, kind of-

    10. SS

      ... like, let's say, my personal AGI moment, uh-

    11. AM

      Mm-hmm

    12. SS

      ... because it would sometimes say things that surprised me, and I was like, like, "Can this model actually surprise me," right? It's still back then, like, ChatGPT, for my personal, mm, use, kind of felt a little bit m- m- more like a nuance and kind of like maybe slightly better Google, but like-

    13. AM

      Mm-hmm

    14. SS

      ... that like w- was the big deal. And then, like, suddenly we get to deep research and, and this can actually, like, answer questions to really, like, rarely make things up, that, that felt useful. And then finally, now we have, like, models that can, like, compete in programming competitions, which was like a, [inhales] you know, like, very hard-earned for me personally, and even [chuckles] more so for Jakub, obviously. Uh, and, uh, yeah, the, the, the pace of progress, just like from the perspective of somebody working on this technology, is, like, absolutely amazing. So, so when you see that three percent, like I raised you, like ten years ago, if we had to quantify it, it would probably be like zero point zero zero zero zero zero one percent or something, right? So like, really, like really I think, I think those numbers need to be put in perspective, right? And, and, and there is no reason not to believe that, like, in a year it will be ten percent, in two years it will be twenty, and so on, so forth.

    15. AM

      Yeah, I've heard it said that if you looked at, like, a graph of the economy from, let's say, like, you know, uh, World Wide Web, you know, early nineties forward, and you said, "Point to the internet happening to the economy," you can't find the point. There's no point you go, "Oh, okay, Tim Berners-Lee announced this," whatever. And I think AI is a lot like that, where people go, "Oh, we've only measured this one. Our measures are hard." It's hard to know that, you know, one, who's using it, how they're using it. And you brought up a very good point, too, about if you've been following it for a while... I remember training, like, a very simple next character predictor on my computer, and it was terrible, right? One, I'm using a small computer, but even then and then you've got, you know, the sentiment analysis, you're playing with BERT, and it's kind of, it's getting a little bit better. But then GPT-2 comes out, and I read every single output on GitHub, every single output GPT-2 came out, because I'm like, "There is something going on with this." And that's how I ended up working at OpenAI, was because I was this obsessive person about that, and then with access to GPT-3, kept saying, "Oh, this is really this path that's moving forward." But it's kind of crazy now, 'cause, like, if six weeks go by and a benchmark hasn't been broken, people are like, "Oh, we hit the wall!"
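The "very simple next character predictor" Andrew mentions can be as small as a bigram frequency table. This hypothetical sketch (not his actual code) shows the idea, and also why such models were terrible at anything beyond short-range patterns:

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count, for each character, which characters follow it."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts: dict, ch: str) -> str:
    # Return the most frequent character observed after `ch` in training.
    return counts[ch].most_common(1)[0][0] if counts[ch] else ""

model = train_bigram("the theory of the thing")
print(predict_next(model, "t"))  # 'h' — in this text 't' is always followed by 'h'
```

A model like this only sees one character of context, so its output degenerates into gibberish after a few characters; the jump to GPT-2's coherent paragraphs is what made that release feel like a breakthrough.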

    16. JP

      [chuckles]

    17. AM

      "We hit the wall." And I would say part of the problem,

  7. 14:30–16:50

    Benchmark saturation and its limits

    1. AM

      though, is that benchmarks in some ways feel like you'll see modest improvement on them. I've heard some of the benchmarks have problems, and some of them actually have wrong answers, and it's impossible to get a hundred percent if you answer everything correctly. But also, we talk about the term... Internally, I've heard people talk about this as, you know, saturation. Do you wanna talk about that?

    2. JP

      Yeah. I, I think, I think there's a, there's a few issues that, that, that, that, that we're hitting with benchmarks right now. [lips smack] Um, yeah, I mean, a pretty clear one is saturation, and that is just the models genuinely reaching a point where, you know, for the kind of standardized forms of measuring, um, intelligence or ability, like, they are, uh, at human level for a lot of them. Uh, you know, if you're kind of like able to, to, you know, perform in, in, in, in, in, uh, among the top on this, like, h- very hard high school competitions, where we have, um, you know, the, the, the best competitors from around the world, um, it just becomes quite hard to, like, have this, like, like, very, uh, very constrained, uh, measurement. You know, previously when we were looking at, you know, just like, yeah, GPT-1, GPT-2, GPT-3, really far scaling paradigm, um, you know, the benchmarks were really very- they were really just, like, measuring the rising of the tide. Um, I think now, um, you know, the field has developed a lot of, uh, more data-efficient ways to train for specific abilities, right? It doesn't mean, you know, train on these benchmarks, but you can train models that are, like, disproportionately good at math-

    3. AM

      Mm

    4. JP

      ... compared to their ability, um, you know, to write, for example, right? And, and so they will do better on math benchmarks, but it's no longer as representative of their, of their, of their overall intelligence, uh, in other topics. Um, I think, you know, these two issues combined, we, yeah, I think, I think we really have to think about the overall utility of these models, and especially, like, their ability to, uh, discover new insights.

    5. AM

      Yeah, I guess that's a, a thing that sort of, kind of gets overlooked, is that you can build a model that's a really good test taker, but that model may not really be that useful for work. Ideally, your model should score well on tests, but just 'cause a model got these scores, doesn't mean you're gonna find it personally useful. And I certainly think that's a challenge we're at now, where when people say,

  8. 16:50–18:15

    Why math competitions matter for AI

    1. AM

      "Is a model good or bad?" It's kind of like saying, you know, you're trying to create a blanket assessment when there is a hundred different use cases for it. You know, is a model good or bad? Maybe it's great at creative writing, maybe it's bad at math, maybe it's great at math and bad at creative writing, and that becomes a really big challenge. And we've talked about this with, like, one for math, the International Math Olympiad, and these kinds of metrics. Why are they important? Why is it important to put it into these sort of human-level competitions?

    2. JP

      I think the reason we've been excited about these competitions, like the International Math Olympiad and, uh, the International Olympiad in Informatics, is that, um, they are a pretty interesting example of, like, a, a test that is constrained, doesn't require that much knowledge, but really tests your ability to, to think about a problem hard for-

    3. AM

      Mm

    4. JP

      ... you know, an hour or two or, or three. Um, and, you know, and we have, like, a very kind of good, uh... we have very good evidence that, like, these problems are hard. Uh, you know, there's a lot of people that try to solve them and compete at solving them, and it matters to them. Um, so yeah, so I think this is the, then, you know, d- like for, for models that, like, you know, excelled at, like, kind of knowing a lot of things but not necessarily, uh, you know, thinking very hard in the past, that, that really seemed like the kind of the, the right milestone to be working towards.

    5. AM

      Now, I'm-- as I understand

  9. 18:15–21:45

    How models reason without tools

    1. AM

      it, the model that scored gold medal level on that-... wasn't using, like, a calculator, it wasn't using other tools, it wasn't using some of the frameworks. It was doing it purely through reasoning.

    2. JP

      Yeah, that's right. For, for the, for the International Math Olympiad, uh, yeah, the model was not using, uh, other tools, like, uh, yeah, it-

    3. AM

      And, and, and again, and that was like two years ago, you ask it to multiply two four-digit numbers, it would fail.

    4. JP

      Yeah, but, but definitely, like, you know, for, for this kind of contest, it's really, like... It is, of course, like, in a limited domain-

    5. AM

      Mm

    6. JP

      ... of, of math, but it really is about, like, fairly creative thinking, not about applying a formula.

    7. AM

      I guess that's part of the challenge, though, is that once you start moving outside of math, it gets to be harder. You can start to come up with things like humanity's last exam, which I think is a pretty neat test, but you find that certain models, after they learn a certain kind of tool use, kind of figure out maybe sort of how to solve these problems better. And I would wonder, what kind of benchmarks are we gonna need? You know, what are you, what are you looking at to say, "Okay, this is, this is how I can kind of get an objective measure of a capability?"

    8. SS

      One thing that surprised me in the past, I was talking to one, uh, one of our, uh, coworkers here, uh, Anna MacAndrew.

    9. AM

      Mm.

    10. SS

      And I was, uh, telling her about, uh, IMO. I was excited about, like, so- some progress, and she's like: "What's IMO?"

    11. AM

      Yeah. [chuckles]

    12. SS

      And, uh, and that, that, that kind of like, it was, like, very front of mind for me because I, I, I do realise that, like, w- with some of those benchmarks, we kind of live in a bubble a little bit.

    13. AM

      Mm-hmm.

    14. SS

      F- for me, that competition, uh, feels important, and especially like, the, the computer science c- counterpart, uh, IOI, because it was a big part of my life, and so it's true for many coworkers here. But actually, like, for an average person, like, wor- w- working in other field or maybe not as interested in mathematics or, or, or, or, or computer science, maybe they're interested in history or something, the, the, the-

    15. AM

      Yeah, Anna speaks, like, five languages, too-

    16. SS

      Yeah

    17. AM

      ... so I could see for her, a different metric based on that would be interesting.

    18. SS

      Yeah. So, so, so I think, like, o- o- one thing that, like, um, that, uh, it's not a perfect metric, but, uh, at least helps keep us honest and, uh, helps us escape the, uh, the bubble is, uh, is just ChatGPT usage, right?

    19. AM

      Mm.

    20. SS

      Because everybody uses ChatGPT, and they use it for all sorts of use cases. Uh, and obviously, there's, like, a lot of pitfalls to using that as a metric, but at least it avoids that partic- particular problem where, like, there are just some things that I'm more familiar with, and other people might, uh, appreciate other things, and this gives you, like, a very wide coverage.

    21. AM

      Yeah, and in there, too, you have subsets of users, people who are building GPTs and doing more complicated stuff. You mentioned before, too, the fact that the model will reason longer, and that seems like a very interesting way to evaluate capabilities.

    22. JP

      Yeah, yeah, and, and I think this is also maybe, like, one, um, you know, challenge with focusing on, on the kind of usage of, like, ChatGPT and broad adoption of AI as, as the metric of progress. Like, I think, uh, this hasn't really happened to a very meaningful extent yet, but I think it will start happening pretty soon. We should be able to use vastly more compute, um, than, you know, a, a, a, a, a, a, a user would normally be willing to, um, um, to, to, to, to, to, to buy for themselves to produce, you know, technology artefacts that are useful to a lot of people. Um, and I, I think, I think that, that, for

  10. 21:45–23:30

    Recognizing when a model can’t solve a problem

    1. JP

      me, will be a very important measure of progress.

    2. AM

      Which of these wins were the most surprising to you?

    3. JP

      I think we definitely kind of anticipated getting to this point when we saw the reasoning models starting to work.

    4. AM

      Mm-hmm.

    5. JP

      Um, at the same time, I think this, like, recent, uh, set of wins is very impressive. I think maybe out of those, uh, um, I think, I think IMO came a little bit sooner than I expected.

    6. AM

      Mm.

    7. JP

      IMO gold, again, like, I think IMO problem six will, will still, uh... IMO has, like, all the problems require, like, creative thought and some, uh, new insights, I think, but typically, like, you know, there's this proverbial problem six that is, like, requires very out-of-the-box thinking. Uh, and, you know, it's really kind of like, usually, like, outside the kind of typical domains of the other problems. Um, and you know, so in the past, we were actually kind of drawing a boundary between, like, getting a gold, you know, like, solving these other problems, and, like, actually consistently solving all the problems, and in particular, problem six. So it was pretty, it's pretty, pretty, uh, you know, hilarious in some way to see ourselves and also Google DeepMind at the same time, like, "Oh, yeah, we solved problems one to five perfectly, and we didn't make any progress on problem, problem six." I think that kind of makes that challenge pretty clear.

    8. AM

      Yeah, that was, I think that was what's interesting is that, yeah, that I think that the, the OpenAI model said, like, "Yeah, I don't think I can solve this," didn't even try or said that it had a problem with that. Was that correct?

    9. JP

      Yeah, the model was able to correctly identify that it didn't make progress on the problem.

    10. AM

      That's pretty fascinating to think about that, that the model's able to sort of determine that. 'Cause, uh, you know, there's a lot of conversations about when people talk about hallucination, which I think is a kind of a poorly understood thing, and there's a difference between fluid and crystalline thinking, and, you know, one is how much knowledge a model has, and the other is its problem-solving capability.

  11. 23:30–26:50

    Storytime: AtCoder competition in Japan

    1. AM

      And when you get to the point where it's able to do that, it's able to say, "Hey, no, I think, you know, I won't be able to answer this," that's pretty interesting sort of point to get to. Um, I've been told to ask this question about a Jap- uh, a live stream in Japan.

    2. JP

      Oh, so, so, so I think in the, in the past few weeks, actually, like, our models have performed incredibly well in, uh, three competitions. Uh, so we talked about two of them, which is IOI and IMO. Um, there is also this competition for... that is open to everyone, not just, not, not just high schoolers, um, called, um, AtCoder. It's a, it's a kind of very prestigious-

    3. AM

      Mm

    4. JP

      ... very, uh, um, um, high-quality, uh, competition organised in Japan, uh, but open to, o- open to competitors worldwide. And in this particular contest, it was about, um, kind of longer-horizon, um, heuristic problems, where you're, you're given only a single problem, and you have ten hours to solve it.... and so you have competitors, uh, racing to figure out, like, the best approach to, to, up to, to, to, to this, like, difficult optimization problem. Um, so it's a bit different because there isn't, like, a single correct solution. There isn't, like, a single pattern to follow. Like, these tasks are, like, extremely diverse, and you, you can focus on the single task for, for, for, for, for ten hours. And so we, we entered our, a model into this contest, and, um, you know, to, to, to me, this, this had a little bit of a personal significance. Um, I used to be a kind of very, uh, engaged competitor in the past in this, like, more short-form, like, closed-form contests, like IOI. Um, and, um, my friend, uh, Saiho, um, who also works at OpenAI at the time, um, excelled at this, like, long-duration, uh, contest. And when we worked together, he would mock me a little bit that, you know, my sort of contest would be automated, uh-

    5. AM

      Mm

    6. JP

      ... long before his. Um, because, you know, they are-- that kind of, like, longer duration require, uh, um, require, require kind of more focused work. Um, and turns out, like, in this, in this, in this contest in Japan, Saiho was actually, like, one of the top contenders. Um, and so I was watching this live stream, watching our model kind of race with Saiho throughout the competition. Um, in the end, um, our model actually got second place, and Saiho won.
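The contest format Jakub describes, a single open-ended optimization problem with hours to iterate, boils down to local search: propose a change, score it, keep it if it helps. A toy sketch with an invented objective (not an actual AtCoder task):

```python
import random

def score(x: list[int]) -> int:
    # Hypothetical objective for illustration: prefer values close to 50.
    return -sum(abs(v - 50) for v in x)

def hill_climb(n: int = 10, steps: int = 10_000, seed: int = 0) -> list[int]:
    """Random-restart-free hill climbing: mutate one element, keep improvements."""
    rng = random.Random(seed)
    best = [rng.randrange(100) for _ in range(n)]
    for _ in range(steps):
        cand = best[:]
        cand[rng.randrange(n)] = rng.randrange(100)  # mutate one position
        if score(cand) >= score(best):               # keep if no worse
            best = cand
    return best

print(score(hill_climb()))  # approaches 0, the optimum
```

Real heuristic contests layer much smarter moves, annealing schedules, and problem-specific insight on top of this loop, which is why a ten-hour human effort like Saiho's can still edge out a model.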

    7. AM

      [chuckles]

    8. JP

      So, you know, he alone stood in the way of, of his, uh, uh, of, of, you know, his prediction not coming true. [chuckles]

    9. AM

      Still two wins for OpenAI.

    10. SS

      I think, uh, one, one thing that also stood out to me is, like, uh, Saiho, at the end of the competition, he was, like, really, really tired, and they interviewed him a little bit to, to, to talk about his experience, uh, like, in the middle of the competition. And, um, I, I don't think I can quote him directly on this podcast, but he's like: "Your models are very, very bad. I want to go to sleep."

    11. AM

      [chuckles]

    12. SS

      "I am tired." [laughing]

    13. AM

      Yeah. We've heard talk about, like, the wall. We mentioned that before, and I think it was interesting because reasoning kind of came out of nowhere. I mean, there were hints of stuff, some papers and such, but people really hadn't drawn the line. Then all of a sudden, the o1 model comes out, and the whole idea that you can not just have a model give answers, you can let the model

  12. 26:5028:55

    How reasoning breakthroughs really happen

    1. AM

      kind of have an inner monologue, talk to itself, and reason things through. Do we think that's enough to take us to AGI, or are there other breakthroughs needed, or are there just other breakthroughs you think are gonna happen?

    2. SS

      Yeah, I, I just need to point out that-

    3. AM

      Yeah

    4. SS

      ... like, the team here worked extremely hard on this particular thing. It feels like something simple, like you just need a longer chain of thought, but actually making it work was really hard-earned. And back to your previous question about surprising results: when we first noticed that it was working, that we could train those models and give them more data and they get better, that was, I think, one of the most shocking moments. We started asking very, very seriously the question: "Are we ready as an organization for incredibly fast-paced progress?" I remember there was one particular evening, I think eleven PM, when we were on the line with Sam and Mira, [chuckles] just kind of trying to... I think we got a little bit freaked out by those results. Sometimes that happens. Yeah.

    5. AM

      The pace is fast. I mean, it is a fast thing, and like I said, the joke is, nothing happens for six weeks and people think it's slowed down. But if you look year over year, it is fast. And it's a fair point, 'cause you have things that you're aware of internally when you've worked on something for a couple of years, and then, "Hey, there's a research paper," but it's not like it came out last night. There was a lot of work behind it. But I'd say, too, the world was sort of surprised by the fact that there is this really fundamental new way to make these models do even more, to take the existing infrastructure, so to speak, and get a lot more capability out of it. Where do you think the next breakthroughs are gonna happen?

    6. JP

      I think one thing we always try to not underestimate is the importance of scaling. Um, I think, you know, even as we look at these new reasoning models,

  13. 28:5530:30

    What’s next for scaling and long-horizon reasoning

    1. JP

      you know, it's not like the previous scaling paradigm of pre-training has vanished, right?

    2. AM

      Mm.

    3. JP

      Like, I think we will see these things compound, and I think there are also new directions that we can move in. In particular, we were talking about extending the horizon that these models can plan and reason over. And if you look at it from the perspective of just compute spend, you know, we say, "Okay, we went from GPT-4, which was doing some amount of compute for every answer, to, like, GPT-5 Pro, which maybe uses, I don't know, ten X, twenty X," some non-trivial, but in some ways not that impressive, amount of compute more, right? And can produce much better answers. I think on the scale of what amount of compute you would be willing to spend on a problem that actually matters to a lot of people, right? Like progress on a medical research question, or progress on developing the next generation of models. These are incomparably larger amounts.

    4. AM

      Mm.

    5. JP

      And so I think that question of model persistence, and the ability to work for a very long time on focused problems, is a pretty clear next step.

    6. AM

      Mm. How would you put... the practical implications of

  14. 30:3034:00

    What AGI will look and feel like

    1. AM

      AGI to, say, a typical ChatGPT user? Like, what would their experience be like a few years from now, or five years from now? Which sounds far away, but it's really not, 'cause just five years ago GPT-3 came out, and that feels like a blur. What would an AGI-like model be capable of?

    2. JP

      So, I was talking about automating research.

    3. AM

      Mm-hmm.

    4. JP

      Um, you know, my picture of how that would actually look is: imagine a company of very capable researchers and engineers that is largely automated, right? And now, again, I think that is something that will interface with the world in all sorts of ways. It won't just be a black box. It will talk to people, it will take in inputs, it will run experiments. But I think having this sort of potential for developing new technology and other kinds of artifacts, you know, code bases, designs, will radically accelerate the pace of technical progress. So I think that is something we will feel, and we need to do a lot of work to get it right, from a technical and from a societal perspective. But I think that is kind of our time frame. I think we should also expect a lot of progress on the actual interfaces to AI that we interact with.

    5. AM

      Mm-hmm.

    6. JP

      You know, we see ChatGPT can feel quite human-like. We can form attachments with it. I think, as it becomes more persistent, as it becomes capable of expressing itself in different forms than text, right, those effects will become stronger. And again, that will be something that, I think, will become a very big and important conversation.

    7. AM

      I just got access in ChatGPT to have it actually read my calendar in Gmail, and I realized how far we've come, 'cause I'm excited about that now. I'm not really terrified that it's gonna start writing, you know, Ewok fan fiction to somebody. And I think that's this neat threshold that we've crossed, this level of trust.

    8. JP

      I think there's definitely... Like, we are in a place where there's a very tough trade-off: there is such clear economic and personal value you can extract out of having the model have access to a lot of your data. At the same time, I think we are not yet at the threshold of robustness where we can fully trust these models not to be exploited by someone trying to exploit them.

    9. AM

      Mm-hmm.

    10. JP

      Um, yeah, it's definitely a big problem that I think we as a field will have to iterate on.

    11. AM

      What would you tell the high-school versions of yourselves today? If you were visiting your old classroom, what would you say right now, what would you tell them about the future? What advice would you give?

    12. SS

      Invest in Bitcoin. [laughing]

    13. JP

      [laughing]

    14. AM

      No, I mean today. Today, even today. In 2025, what would you tell a high school student?

    15. SS

      High school students today? Oh, yeah, that one is also, I think, a great question, right? Because I hear a lot of what I consider misinformation on that online. So: you should absolutely learn to code.

    16. AM

      Yeah.

    17. SS

      Like, one skill that is at a premium and will continue being at a premium is

  15. 34:0036:25

    Advice to high school students in 2025

    1. SS

      having, like, a really structured intellect that can break complicated problems into pieces. And, you know, that might not be programming in the future, but programming is a fine way to acquire that skill. So are other domains where you need to think a lot. So don't let people tell you that you should not learn to code.

    2. AM

      Yeah, I learned to code late in life, and that's actually how I ended up working at OpenAI as an engineer. And I try to explain to people: just because a system can do the thing doesn't mean you don't want to know how it works anymore. And as you said, when you understand how to break down a task... When I worked at OpenAI in prompt engineering, my coding understanding helped me take language, break it down, and make it do better things. And I think that people who bridge those gaps are really at an advantage. So whenever I hear people say, "Don't learn to code," it's like: do I want an airplane pilot who doesn't understand aerodynamics? That doesn't make much sense to me.

    3. JP

      Well, you know, thinking about how I thought about things in high school, I think it's pretty incredible how many perceived constraints are not actually there when you really think about it. Maybe the first revelation to me was like, "Hey, if I'm really passionate about this computer science stuff, I can actually spend a bit more time on it, at the cost of maybe spending a bit less time on the other twelve subjects in school." But then, somehow, it was another big revelation to me that I could actually go and study in the USA at some point. That's not really something that seemed obviously in my action space. And, you know, obviously spending some time here in Silicon Valley, and seeing how people are willing to really attack these big problems with ambition, and with the belief that you can actually make a meaningful, positive change in the world. Yeah, I think it has been incredibly inspiring, and something I cherish about this community.

  16. 36:2540:23

    Balancing trust and personal value

    1. AM

      Is there a book or something that, like, inspired you?

    2. JP

      ... I think there's a couple of books. Um, I remember, [chuckles] it's actually very hilarious thinking about it now. I didn't really connect the dots, but my dad gave me this book once when I was, like, 15, and pretty unsure what I wanted to do. It was a Polish version of a book by some author I didn't know, called Hackers and Painters.

    3. AM

      Mm.

    4. JP

      Uh, [chuckles] yeah, it was actually Paul Graham. So, I guess, again, kind of this community. I found that pretty inspiring. [chuckles]

    5. AM

      Yeah, there's something helpful, I think, in hearing the message of, "No, it's okay to dream big and go do stuff," that you can just make things happen in the world. And I think the more people realize that, the better the world gets to be. Was there any book that influenced you, or movie, TV show?

    6. SS

      Oh, movie. I have a stupid answer to that question-

    7. AM

      I love stupid answers

    8. SS

      ... which kind of feels bad after the profound one. But, okay, so I watched Iron Man.

    9. AM

      Yeah.

    10. SS

      And it inspired me to start a PhD in robotics.

    11. AM

      Th- that's a great answer, though. Like, you know, The Martian by Andy Weir: I met a scientist at NASA, a botanist, who had read that book, and I'm like-

    12. SS

      Yeah

    13. AM

      ... "Well, they got the atmospheric physics wrong and all this." He's like, "Well, that's why I'm here." I'm like, "Oh!"

    14. SS

      Well-

    15. AM

      Yeah.

    16. SS

      Yeah, I guess I didn't get to the stupid part.

    17. AM

      Oh.

    18. SS

      The stupid part was, like, when I started working on robotics, I was very disappointed by how bad those robots are. Somehow it didn't occur to me that maybe the movie is a movie.

    19. AM

      [laughing]

    20. SS

      Uh, yeah. So that whole experience would have been kind of bad for me, if not for the fact that there I met a friend who was into deep learning. And at the time, I thought all of machine learning was hype, but an interesting systems problem. And then, out of nowhere, and as I'm sure, you know, I would frustrate some DeepMind folks by saying that, AlphaGo came out. Uh, [chuckles] no, I'm sure it wasn't out of nowhere.

    21. AM

      Yeah.

    22. SS

      I'm sure it was years in the making. Uh, and, and that, like, was very inspiring, actually to both of us.

    23. AM

      Yeah.

    24. SS

      Uh, a- a- and, and since then, it was just hard not to work.

    25. JP

      Yeah. It took me a while to become convinced that deep learning is more than a fad. Because, you know, we don't really understand the underlying optimization. And I think this has kind of been the story of our research here: trying to make progress on these questions about how it really works. But it really is like studying a physical phenomenon in some way. And to a classically trained computer scientist, that was a weird thing to accept.

    26. AM

      Mm-hmm. Yeah-

    27. SS

      I do remember when Jakub was telling me about, like, scaling up principled convex optimization.

    28. AM

      Like-

    29. SS

      That was before AlphaGo.

    30. AM

      Yeah, and AlphaGo was interesting, 'cause first it was like, "Oh, cool, it solved Go." And then we're like, "Yeah, but it just learned by watching all those games." Then they did AlphaGo Zero, where it's self-taught, and you're like, "Okay, game over, folks." Like, there's a trajectory here, and I think that's continued on. But I think that, yeah, if you hadn't watched Iron Man, maybe Thor instead, you know, maybe things would've turned out better.

Episode duration: 40:23
