Dwarkesh Podcast

Ilya Sutskever on the Dwarkesh Podcast: Why RL Overfits the Evals

Why RL training aimed at benchmark evals produces models that ace the evals but cycle between the same bugs in real use: Sutskever links this to skipping value functions in the training mix.

Ilya Sutskever (guest) · Dwarkesh Patel (host)
Nov 25, 2025 · 1h 36m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–9:39

    Explaining model jaggedness

    1. IS

      You know what's crazy?

    2. DP

      Uh-huh.

    3. IS

      That all of this is real.

    4. DP

      Yeah? Meaning what?

    5. IS

      Don't- don't you think so?

    6. DP

      Meaning what?

    7. IS

      Like all this AI stuff, and all this Bay-

    8. DP

      Like, it actually happened?

    9. IS

      ...Area? Yeah. That it's happe- like, isn't it straight out of science fiction?

    10. DP

      Yeah. Another thing that's crazy is, like, how normal this slow takeoff feels. The idea that we'd be investing 1% of GDP in AI, I feel like it would have felt like a bigger deal, you know? But right now, it just feels like-

    11. IS

      We get used to things pretty fast, turns out, yeah. But also, it's kind of abstract, like, what does it mean? What it means is that you see it in the news-

    12. DP

      Yeah.

    13. IS

      ...that such and such company announced such and such dollar amount.

    14. DP

      Right.

    15. IS

      That's- that's all you see.

    16. DP

      Right.

    17. IS

      It's not really felt in any other way, so far.

    18. DP

      No. Should we actually begin here? I think this is an interesting discussion.

    19. IS

      Sure.

    20. DP

      I think your point about, well, from the average person's point of view, nothing is that different, will continue being true, even into the singularity.

    21. IS

      No, I don't think so.

    22. DP

      Okay. Interesting.

    23. IS

      So, the thing which I was referring to as not feeling different is, okay, such and such company announced some, uh, difficult-to-comprehend dollar amount of investment.

    24. DP

      Right.

    25. IS

      I don't think anyone knows what to do with that.

    26. DP

      Yeah.

    27. IS

      But I think that the impact of AI is gonna be felt. AI is going to be diffused through the economy. There are very strong economic forces for this, and I think the impact is going to be felt very strongly.

    28. DP

      When do you expect that impact? I think the models seem smarter than their economic impact would imply.

    29. IS

      Yeah, this is one of the very confusing things about the models right now, how to reconcile the fact that they are doing so well on evals.

    30. DP

      Mm-hmm.

  2. 9:39–18:49

    Emotions and value functions

    1. DP

      Here are analogies that people have proposed for what the human analogy to pre-training is, and I'm curious to get your thoughts on why they're potentially wrong. One is to think about the first 18 or 15 or 13 years of a person's life, when they aren't necessarily economically productive, but they are doing something that is making them understand the world better and so forth. The other is to think about evolution as doing some kind of search for three billion years which then results in a human lifetime instance. I'm curious if you think either of these is actually analogous to pre-training, or how you would think about what lifetime human learning is like, if not pre-training.

    2. IS

      I think there are some similarities-

    3. DP

      Yeah.

    4. IS

      ... between both of these and pre-training.

    5. DP

      Yeah.

    6. IS

      And pre-training tries to play the role of both of these, but I think there are some big differences as well. The amount of pre-training data is very, very staggering.

    7. DP

      Yes.

    8. IS

      And somehow a human being, even after 15 years, with a tiny fraction of the pre-training data, knows much less.

    9. DP

      Yeah.

    10. IS

      But whatever they do know, they know much more deeply, somehow. And already at that age, you would not make the mistakes that AIs make.

    11. DP

      Yeah.

    12. IS

      There is another thing. You might say, "Could it be something like evolution?" And the answer is maybe, but in this case, I think evolution might actually have an edge. I remember reading about a case... You know, one thing that neuroscientists do, or rather one way in which neuroscientists can learn about the brain, is by studying people with brain damage to different parts of the brain.

    13. DP

      Mm-hmm.

    14. IS

      And some people have the most strange symptoms you could imagine. It's actually really, really interesting. And there was one case that comes to mind that's relevant. I read about a person who had some kind of brain damage, a stroke or an accident, that took out his emotional processing, so he stopped feeling any emotion. He still remained very articulate, he could solve little puzzles, and on tests he seemed to be just fine, but he felt no emotion. He didn't feel sad, he didn't feel anger, he didn't feel animated, and he became somehow extremely bad at making any decisions at all. It would take him hours to decide on which socks to wear, and he would make very bad financial decisions. What does that say about the role of our built-in emotions in making us a viable agent, essentially? And I guess to connect to your question-

    15. DP

      Yes.

    16. IS

      ... about pre-training, it's like: maybe if you are good enough at getting everything out of pre-training that you can get, you could get that as well. But that's the kind of thing which seems... Well, it may or may not be possible to get that from pre-training.

    17. DP

      What is that? Clearly not just emotion directly. It seems like some almost value-function-like thing, which is telling you what the end reward for any decision should be. And you think that doesn't sort of implicitly come from pre-training?

    18. IS

      I think it could.

    19. DP

      Oh.

    20. IS

      I'm just saying it's not, uh, 100% obvious.

    21. DP

      Yeah. But what is that? How do you think about emotions, and what is the ML analogy for emotions?

    22. IS

      It should be some kind of a value function thing.

    23. DP

      Yeah.

    24. IS

      But I don't think there is a great ML analogy, because right now, value functions don't play a very prominent role in, uh, the things people do.

    25. DP

      It might be worth defining for the audience what a value function is, if- if you're gonna do that.

    26. IS

      I mean, certainly. I'll be very happy to do that. So when people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have a neural net, and you give it a problem, and then you tell the model to go solve it. The model takes maybe thousands, hundreds of thousands, of actions or thoughts or something, and then it produces a solution. The solution is scored, and then the score is used to provide a training signal for every single action in your trajectory.

    27. DP

      Mm-hmm.

    28. IS

      So that means that if you are training on a task that takes a long time to solve, you will do no learning at all until you come up with a proposed solution. That's how reinforcement learning is done naively, and that's how o1 and R1 are ostensibly done. The value function says something like, "Okay, look. Maybe I could sometimes, not always, tell you if you are doing well or badly." The notion of a value function is more useful in some domains than others. For example, when you play chess and you lose a piece, you know, "I messed up." You don't need to play the whole game to know that what you just did was bad, and therefore whatever preceded it was also bad. So the value function lets you short-circuit the wait until the very end. Let's suppose that you are doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution direction. After, let's say, a thousand steps of thinking, you conclude that this direction is unpromising. As soon as you conclude this, you could already send a reward signal back to the point, a thousand steps earlier, when you decided to pursue this path. You say, "Oh, next time, I shouldn't pursue this path in a similar situation," long before you actually come up with a proposed solution.
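
      A minimal sketch may make the contrast concrete. The names and shapes below are illustrative, not any lab's actual recipe: the naive scheme broadcasts one end-of-trajectory score to every action, while an actor-critic-style scheme uses a learned value function to produce a local training signal at every step.

```python
import torch

def naive_end_of_episode_update(log_probs, final_score):
    """Naive RL: the single end-of-trajectory score is the training
    signal for every action, so no learning happens until a full
    solution is produced and graded."""
    return -final_score * torch.stack(log_probs).sum()

def value_function_update(log_probs, values, rewards, gamma=1.0):
    """Actor-critic-style sketch: a learned value function scores
    intermediate states, so "this direction is unpromising" becomes
    a training signal long before the episode ends."""
    loss = torch.zeros(())
    for t in range(len(log_probs)):
        next_value = values[t + 1].detach() if t + 1 < len(values) else torch.zeros(())
        td_error = rewards[t] + gamma * next_value - values[t]  # local, per-step signal
        loss = loss - td_error.detach() * log_probs[t]          # policy term
        loss = loss + td_error ** 2                             # value-regression term
    return loss
```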

    29. DP

      Mm. The point in the DeepSeek R1 paper is that the space of trajectories is so wide that maybe it's hard to learn a mapping from an intermediate trajectory to value. And also, given that, you know, in coding, for example, you will have the wrong idea, then you'll go back, then you'll change something.

    30. IS

      This sounds like such lack of faith in deep learning.

  3. 18:49–25:13

    What are we scaling?

    2. DP

      Yeah. People have been talking about scaling data, scaling parameters, scaling compute. Is there a more general way to think about scaling? What are the other scaling axes?

    3. IS

      So here is a perspective that I think might be true. The way ML used to work is that people would just tinker with stuff and try to get interesting results. That's what's been going on in the past. Then the scaling insight arrived: scaling laws, GPT-3, and suddenly everyone realized we should scale. This is an example of how language affects thought. Scaling is just one word, but it's such a powerful word because it informs people what to do. They say, "Okay, let's try to scale things." And so you say, "Okay, so what are we scaling?" And pre-training was the thing to scale.

    4. DP

      Yeah.

    5. IS

      It was a particular scaling recipe.

    6. DP

      Yes.

    7. IS

      The big breakthrough of pre-training is the realization that this recipe is good. So you say, "Hey, if you mix some compute with some data into a neural net of a certain size, you will get results, and you will know that it will be better if you just scale the recipe up." And this is also great. Companies love this because it gives you a very, uh, low-risk way of investing-

    8. DP

      Yeah.

    9. IS

      ... your resources.

    10. DP

      Yeah.

    11. IS

      Right? It's much harder to invest your resources in research. Compare the two: if you do research, you need researchers to go forth and come up with something, versus just getting more data and more compute, where you know you'll get something from pre-training. And indeed, based on various things people say on Twitter, it appears that Gemini may have found a way to get more out of pre-training. At some point, though, pre-training will run out of data. The data is very clearly finite. So then, okay, what do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one we've done before, or you do RL, or maybe something else. But now that compute is very big, in some sense we are back to the age of research. So maybe here's another way to put it. From 2012 to 2020, it was the age of research. From 2020 to 2025, it was the age of scaling, plus or minus; let's add error bars to those years. People said, "This is amazing, you gotta scale more, keep scaling." The one word: scaling. But now the scale is so big. Is the belief really that, "Oh, it's so big, but if you had 100x more, everything would be so different"? It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don't think that's true. So it's back to the age of research again, just with big computers.

    12. DP

      That's a very interesting way to put it. But let me ask you the question you just posed then. What are we scaling, and what would it mean to have a recipe? Because the only very clean relationship I'm aware of, one that almost looks like a law of physics, is the one that existed in pre-training: there's a power law between data or compute or parameters and loss. What is the kind of relationship we should be seeking, and how should we think about what this new recipe might look like?
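
      For reference, the pre-training relationship being alluded to is usually written as a power law in parameters and data. Here is a minimal sketch of the Chinchilla-style form; the constants are the published Chinchilla fits and are shown purely as an illustration, since the conversation doesn't commit to particular values.

```python
def pretraining_loss(n_params, n_tokens,
                     E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style scaling law: loss falls as a power law in model
    parameters and training tokens toward an irreducible term E.
    Constants are the published Chinchilla fits, used illustratively."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: a 70B-parameter model trained on 1.4T tokens comes out to
# roughly 1.9 nats/token under these fits:
# pretraining_loss(70e9, 1.4e12)
```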

    13. IS

      So we've, we've already witnessed a transition from one type of scaling to a different type of scaling, from pre-training to RL. Now people are scaling RL. Now, based on what people say on Twitter, they spend more compute on RL than on pre-training at this point, because RL can actually consume quite a bit of compute.

    14. DP

      Yeah.

    15. IS

      You know, you do very, very long rollouts.

    16. DP

      Yes.

    17. IS

      So it takes a lot of compute to produce those rollouts. And then you get a relatively small amount of learning per rollout-

    18. DP

      Right.

    19. IS

      ... so you really can spend a lot of compute. And at this stage, I wouldn't even call it scaling. I would say, "Hey, like, what are you doing?"

    20. DP

      (laughs)

    21. IS

      "And is the thing you are doing the b- the, the, the most productive thing you could be doing?"

    22. DP

      Yeah.

    23. IS

      "Can you find a most, more productive way of using your compute?" We've discussed the value function business earlier. And maybe once people get good at value functions, they will be using their comp- their, um, resources more productively. And if you find a whole other way of training models, you could say, "Is this scaling, or is it just using your resources?" I think it becomes a little bit ambiguous in the sense that when people were in the age of research, back then, it was like people say, "Hey, let's try this and this and this. Let's try that and this and that. Oh, look. Something interesting is happening." And I think there will be a return to that.

    24. DP

      So if we're back in the era of research, stepping back, what is the part of the recipe that we need to think most about? When you say value function, people are already trying the current recipe but having an LLM as a judge and so forth, and you could say that's a value function. But it sounds like you have something much more fundamental in mind. Should we even rethink pre-training itself, and not just add more steps to the end of that process?

    25. IS

      Yeah. So the discussion about value functions, I think it was interesting. I want to emphasize that the value function is something that's going to make our RL more efficient, and I think that makes a difference. But I think that anything you can do with a value function, you can do without one, just more slowly.

    26. DP

      Mm-hmm.

    27. IS

      The thing which I think is the most fundamental is that these models somehow just generalize dramatically worse than people.

    28. DP

      Yes.

    29. IS

      And it's super obvious. That's, that seems like a very fundamental thing.

  4. 25:13–35:45

    Why humans generalize better than models

    1. DP

      Okay, so this is the crux, the generalization, and there are two sub-questions. One is about sample efficiency: why should it take so much more data for these models to learn than for humans? The second is, even separate from the amount of data it takes, why is it so much harder to teach the thing we want to a model than to a human? Which is to say, for a human, we don't necessarily need a verifiable reward. You're probably mentoring a bunch of researchers right now: you're talking with them, you're showing them your code, you're showing them how you think, and from that, they're picking up your way of thinking and how they should do research. You don't have to set a verifiable reward for them that's like, "Okay, this is the next part of your curriculum, and now this is the next part of your curriculum," and, "Oh, this training was unstable, and we gotta..." There's not this schleppy, bespoke process. So perhaps these two issues are actually related in some way, but I'd be curious to explore this second thing, which feels more like continual learning, and this first thing, which feels just like sample efficiency.

    2. IS

      Yeah. So, you know, one possible explanation for human sample efficiency that needs to be considered is evolution: evolution has given us a small amount of the most useful information possible. And for things like vision, hearing, and locomotion, I think there's a pretty strong case that evolution actually has given us a lot.

    3. DP

      Mm-hmm.

    4. IS

      So for example, human dexterity far exceeds... I mean, robots can become dexterous too if you subject them to a huge amount of training in simulation, but training a robot in the real world to quickly pick up a new skill the way a person does seems very out of reach.

    5. DP

      Yeah.

    6. IS

      And here you could say, oh yeah, locomotion: all our ancestors needed great locomotion, squirrels and so on. So for locomotion, maybe you've got some unbelievable prior.

    7. DP

      Yeah.

    8. IS

      You could make the same case for vision. You know, I believe Yann LeCun made the point that children learn to drive at 16 after 10 hours of practice, which is true, but our vision is so good. At least for me, when I remember myself as a five-year-old (I was very excited about cars back then), I'm pretty sure my car recognition was already more than adequate for self-driving. You don't get to see that much data as a five-year-old. You spend most of your time in your parents' house, so you have very low data diversity.

    9. DP

      Yeah.

    10. IS

      But you could say maybe that's evolution too. In language, math, and coding, though, probably not.

    11. DP

      It still seems better than models. I mean, obviously models are better than the average human at language, math, and coding, but are they better than the average human at learning?

    12. IS

      Oh yeah. Oh yeah, absolutely. What I meant to say is that language, math, and coding, and especially math and coding, suggest that whatever it is that makes people good at learning is probably not so much a complicated prior, but something more fundamental.

    13. DP

      Wait, I'm, I'm not sure I understood. Wh- wh- why should that be the case?

    14. IS

      So consider a skill at which people exhibit some kind of great reliability or, you know, um...

    15. DP

      Yeah.

    16. IS

      If the skill is one that was very useful to our ancestors for many millions of years, hundreds of millions of years, you could say, you could argue that maybe humans are good at it because of evolution...

    17. DP

      Yeah.

    18. IS

      ... because we have a prior.

    19. DP

      Yeah.

    20. IS

      An evolutionary prior that's encoded in some very non-obvious way...

    21. DP

      Yeah.

    22. IS

      ... that somehow makes us so good at it.

    23. DP

      Yeah.

    24. IS

      But if people exhibit great ability, reliability, robustness, ability to learn, in a domain that really did not exist until recently, then this is more an indication that people might have just better machine learning, period.

    25. DP

      Mm-hmm. But then how should we think about what that is? What is the ML analogy? There's a couple of interesting things about it. It takes fewer samples. It's more unsupervised. You don't have to set a verifiable reward. A child learning to drive a car, well, a child should not learn how to drive a car, a teenager (laughs) learning how to drive a car is not exactly getting some pre-built verifiable reward; it comes from their interaction with the machine and with the environment. And yet it takes far fewer samples, it seems more unsupervised, and it seems more robust.

    26. IS

      Much more robust. The robustness of people is really staggering.

    27. DP

      Yeah. Okay, so do you have a unified way of thinking about why all these things are happening at once? What is the ML analogy that could realize something like this?

    28. IS

      So this is where... One of the things that you've been asking about is how the teenage driver can self-correct and learn from... their experience-

    29. DP

      Yeah.

    30. IS

      ... without an external teacher? And the answer is, well, they have their value function.

  5. 35:45–46:47

    Straight-shotting superintelligence

    1. DP

      I'm curious: if you say we are back in the era of research, well, you were there from 2012 to 2020. What is the vibe going to be now if we go back to the era of research? For example, even after AlexNet, the amount of compute that was used to run experiments kept increasing, and the size of frontier systems kept increasing. Do you think this era of research will still require tremendous amounts of compute? Do you think it will require going back into the archives and reading old papers? What was the vibe like when you were at Google, OpenAI, and Stanford, these places, when there was more of a vibe of research? What kind of things should we be expecting in the community?

    2. IS

      So one consequence of the age of scaling is that scaling sucked out all the air in the room.

    3. DP

      Yeah.

    4. IS

      And so because scaling sucked out all the air in the room, everyone started to do the same thing. We got to the point where, uh, we are in a world where there are more companies than ideas-

    5. DP

      Hmm.

    6. IS

      ... by quite a bit. Actually, on that, you know there is the Silicon Valley saying that ideas are cheap and execution is everything. And people say that a lot.

    7. DP

      Yeah.

    8. IS

      And there is truth to that. But then I saw, I saw someone say on Twitter, um, something like, "If ideas are, are so cheap, how come no one's having any ideas?"

    9. DP

      (laughs)

    10. IS

      And I think it's true too. If you think about research progress in terms of bottlenecks, there are several. One of them is ideas, and one of them is your ability to bring them to life-

    11. DP

      Yeah.

    12. IS

      ... which might be compute, but also engineering. So if you go back to the '90s, let's say, you had people who had pretty good ideas. If they'd had much larger computers, maybe they could have demonstrated that their ideas were viable. But they could not. They could only produce very, very small demonstrations that did not convince anyone.

    13. DP

      Yeah.

    14. IS

      So the bottleneck was compute. Then in the age of scaling, compute increased a lot. Of course there is a question of how much compute is needed, but compute is now large, large enough that it's not obvious you need that much more compute to prove some idea. I'll give you an analogy. AlexNet was built on two GPUs. That was the total amount of compute used for it. The transformer was built on 8 to 64 GPUs. No single transformer paper experiment used more than 64 GPUs of 2017, which would be like, what, two GPUs of today? The same with ResNet. And you could argue that o1 reasoning was not the most compute-heavy thing in the world. So for research you definitely need some amount of compute, but it's far from obvious that you need the absolutely largest amount of compute ever for research.

    15. DP

      Hmm.

    16. IS

      You might argue, and I think it is true, that if you want to build the absolutely best system, if you want to build the absolutely best system, then it helps to have much more compute. And especially if everyone is within the same paradigm, then compute becomes one of the big differentiators.

    17. DP

      Yeah. I'm asking you for the history because you were actually there; I'm not sure what actually happened. But it sounds like it was possible to develop these ideas using minimal amounts of compute, yet the transformer didn't immediately become famous. It became the thing everybody started doing, experimenting on top of and building on top of, because it was validated at higher and higher levels of compute.

    18. IS

      Correct.

    19. DP

      And if you at SSI have 50 different ideas, how will you know which one is the next transformer and which one is, you know, brittle without having the kinds of compute that other frontier labs have?

    20. IS

      So I can comment on that. The short comment is: you mentioned SSI. Specifically for us, the amount of compute that SSI has for research is really not that small, and I want to explain why. Some simple math can explain why the amount of compute that we have is actually a lot more comparable for research than one might think. So SSI has raised $3 billion, which is a lot by any absolute sense, but you could say, "Look at the other companies raising much more." Well, a lot of their compute goes to inference. These big numbers, these big loans, are earmarked for inference. That's number one. Number two, if you want to have a product on which you do inference, you need to have a big staff of engineers and salespeople, and a lot of the research needs to be dedicated to producing all kinds of product-related features. So when you look at what's actually left for research, the difference becomes a lot smaller. The other thing is: if you are doing something different, do you really need the absolute maximal scale to prove it? I don't think that's true at all. I think that in our case, we have sufficient compute to convince ourselves and anyone else that what we are doing is correct.

    21. DP

      There have been public estimates that, uh, you know, companies like OpenAI spend on the order of $5 or $6 billion a year just on experiments so far.

    22. IS

      Mm-hmm.

    23. DP

      This is separate from the amount of money they're spending on inference and so forth. So it seems like they're spending more per year running research experiments than you guys have in total funding.

    24. IS

      I think it's a question of what you do with it. In their case, in the case of others, I think there is a lot more demand on the training compute. There are a lot more different workstreams, there are different modalities. There is just more stuff, and so it becomes fragmented.

    25. DP

      How will SSI make money?

    26. IS

      You know... My answer to this question is: right now we just focus on the research, and then the answer to that question will reveal itself. I think there will be lots of possible answers.

    27. DP

      Hmm. Is SSI's plan still to straight-shot superintelligence?

    28. IS

      Maybe. I think that there is merit to it.

    29. DP

      Hmm.

    30. IS

      I think there's a lot of merit, because I think that it's very nice to not be affected by the day-to-day market competition. But I think there are two reasons that may cause us to change the plan. One is pragmatic, if timelines turned out to be long, which they might. And second, I think there is a lot of value in the best and most powerful AI being out there impacting the world.

  6. 46:47–55:07

    SSI’s model will learn from deployment

    1. IS

      Even in the straight-shot scenario, you would still do a gradual release of it, is how I would imagine it. Gradualism would be an inherent component of any plan. It's just a question of what is the first thing that you get out the door. That's number one. Number two, I believe you have advocated for continual learning more than other people.

    2. DP

      Yeah.

    3. IS

      And I actually think that this is an important and correct thing, and here is why. I'll give you another example of how language affects thinking. In this case, it will be two words. Two words that have shaped everyone's thinking, I maintain. First word: AGI. Second word: pre-training. Let me explain. The term AGI, why does this term exist? It's a very particular term, and there's a reason. The reason the term AGI exists is, in my opinion, not so much because it's a very important, essential descriptor of some end state of intelligence, but because it is a reaction to a different term that existed, and that term is narrow AI. If you go back to the ancient history of game-playing AI, of checkers AI, chess AI, computer-games AI, everyone would say, "Look at this narrow intelligence. Sure, the chess AI can beat Kasparov, but it can't do anything else. It is so narrow, artificial narrow intelligence." So in response, as a reaction to this, some people said, "Well, this is not good. It is so narrow. What we need is general AI." General AI, an AI that can just do all the things. And that term got a lot of traction.

    4. DP

      Yeah.

    5. IS

      The second thing that got a lot of traction is pre-training, specifically the recipe of pre-training. I think the way people do RL now is maybe undoing the conceptual imprint of pre-training. But pre-training had the property that you do more pre-training and the model gets better at everything, more or less uniformly.

    6. DP

      Yeah.

    7. IS

      General AI. Pre-training gives AGI. But the thing that happened with AGI and pre-training is that in some sense they overshot the target. Because if you think about the term AGI, especially in the context of pre-training, you will realize that a human being is not an AGI. Yes, there is definitely a foundation of skills, but a human being lacks a huge amount of knowledge. Instead, we rely on continual learning. And so then, let's suppose that we achieve success and produce some kind of safe superintelligence. The question is, how do you define it? Where on the curve of continual learning is it going to be? Suppose I produce a superintelligent 15-year-old that's very eager to grow. They don't know very much at all, but they're a great student, very eager. You say, "You go and be a programmer. You go and be a doctor. Go and learn." So you could imagine that the deployment itself will involve some kind of a learning-

    8. DP

      Mm-hmm.

    9. IS

      ... trial and error period. It's a process as opposed to you drop the finished thing.

    10. DP

      Okay, I see. So you're suggesting that the thing you're pointing at with superintelligence is not some finished mind which knows how to do every single job in the economy, the way, say, the original OpenAI charter or whatever defines AGI, as something that can do every single thing a human can do. You're proposing instead a mind which can learn to do every single job.

    11. IS

      Yes.

    12. DP

      And that is superintelligence. And then but once you have the learning algorithm, it gets deployed into the world the same way a human laborer might join an organization.

    13. IS

      Exactly.

    14. DP

      And it seems like one of these two things might happen, or maybe neither of them happens. One, this super-efficient learning algorithm becomes as good as you, and potentially even better, at the task of ML research, and as a result, the algorithm itself becomes more and more superhuman. The other is that even if that doesn't happen, if you have a single model, and this is explicitly your vision, if you have a single model or instances of a model which are deployed through the economy, doing different jobs, learning how to do those jobs, continually learning on the job, picking up all the skills that any human could pick up, but actually picking them all up at the same time and then amalgamating their learnings, you basically have a model which functionally becomes superintelligent even without any sort of recursive self-improvement in software, right? Because you would now have one model that can do every single job in the economy, and humans can't merge our minds in the same way. So do you expect some sort of intelligence explosion from broad deployment?
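
      One simple, hypothetical mechanism for the merging described here is parameter averaging across instances of the same base model, the "model soup" idea from the literature. The episode doesn't specify a mechanism, so treat this as an illustrative sketch only; the instance names are made up.

```python
import torch

def merge_instances(state_dicts):
    """Uniformly average the parameters of N same-architecture models,
    each fine-tuned on a different job. A crude stand-in for
    "amalgamating their learnings"; real skill merging may need more."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# Usage: merged = merge_instances([doctor.state_dict(), coder.state_dict()])
# Humans have no analogous operation: two people cannot average their brains.
```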

    15. IS

      I think it is likely that we will have rapid economic growth. With broad deployment, there are two conflicting arguments you could make. One is that once you get to a point where you have an AI that can learn to do things quickly, and you have many of them, there will be a strong force to deploy them in the economy, unless there is some kind of regulation that stops it, which by the way there might be. So I think the idea of very rapid economic growth for some time is very possible from broad deployment. The other question is how rapid it's going to be. I think this is hard to know, because on the one hand, you have this very efficient worker, but on the other hand, the world is just really big and there's a lot of stuff, and that stuff moves at a different speed. But then on the other hand, now the AI could-

    16. DP

      Yeah, exactly.

    17. IS

      ... you know? So I think very rapid economic growth is possible, and we will see, like, all kinds of things like different countries with different rules and the ones which have different rules, the economic growth will be faster. Hard to predict.

    18. DP

      Some people in our audience like to read the transcripts instead of listening to the episode, and so we put a ton of effort into making the transcripts read like they are stand-alone essays. The problem is that if you just transcribe a conversation verbatim using a speech-to-text model, it'll be full of all kinds of fits and starts and confusing phrasing. We mentioned this problem to Labelbox and they asked if they could take a stab. Working with them on this is probably the reason that I'm most excited to recommend Labelbox to people. It wasn't just, "Oh, hey, tell us what kind of data you need and we'll go get it." They walked us through the entire process, from helping us identify what kind of data we needed in the first place to assembling a team of expert aligners to generate it. Even after we got all the data back, Labelbox stayed involved. They helped us choose the right base model and set up auto QA on the model's output so that we could tweak and refine it. And now we have a new transcriber tool that we're gonna use for all our episodes moving forward. This is just one example of how Labelbox meets their customers at the ideas level and partners with them through their entire journey. If you wanna learn more, or if you wanna try out the transcriber tool yourself, go to labelbox.com/dwarkesh.

  7. 55:07–1:18:13

    Alignment

    1. DP

      It seems to me that this is a very precarious situation to be in. Looking at the limit, we know that this should be possible: if you have something that is as good as a human at learning, but which can merge its different instances in a way that humans can't, that already seems like a thing that should physically be possible. Humans are possible. Digital computers are possible. You just need both of those combined to produce this thing. And it also seems like this kind of thing is extremely powerful. Economic growth is one way to put it, I mean, a Dyson sphere is a lot of economic growth, but another way to put it is that you will have potentially a very short period of time, because a human on the job can... You know, you're hiring people at SSI, and in six months they're probably net productive, right? A human learns really fast, and so this thing is becoming smarter and smarter very fast. How do you think about making that go well, and why is SSI positioned to do that well? What is SSI's plan there, basically, is what I'm trying to ask.

    2. IS

      Yeah. So one of the ways in which my thinking has been changing is that I now place more importance on AI being deployed incrementally and in advance. One very difficult thing about AI is that we are talking about systems that don't yet exist, and it's hard to imagine them. I think that one of the things that's happening is that, in practice, it's very hard to feel the AGI. We can talk about it, but it's like having a conversation about what it will be like to be old and frail. You can have the conversation, you can try to imagine it, but it's just hard, and you come back to reality, where that's not the case. And I think that a lot of the issues around AGI and its future power stem from the fact that it's very difficult to imagine. Future AI is going to be different. It's going to be powerful. Indeed, what is the problem of AI and AGI? The whole problem is the power. When the power is really big, what's gonna happen? And one of the ways in which I've changed my mind over the past year, and that change of mine may, I'll hedge a little bit, back-propagate into the plans of our company, is this: if it's hard to imagine, what do you do? You've got to be showing the thing. And I maintain that most people who work on AI also can't imagine it, because it's too different from what people see on a day-to-day basis. Here is something which I predict will happen. That's a prediction. I maintain that as AI becomes more powerful, people will change their behaviors, and we will see all kinds of unprecedented things which are not happening right now. I'll give some examples. I think, for better or worse, the frontier companies will play a very important role in what happens, as will the government. And the kind of things that I think we'll see, which you see the beginnings of, is companies that are fierce competitors starting to collaborate on AI safety. You may have seen OpenAI and Anthropic doing a first small step, but that did not exist before. That's actually something which I predicted in one of my talks about three years ago, that such a thing would happen. I also maintain that as AI continues to become more powerful, more visibly powerful, there will also be a desire from governments and the public to do something. And I think that this is a very important force of showing the AI. That's number one. Number two: okay, so then the AI has been built. What needs to be done? One thing that I maintain is that right now, for people who are working on AI, the AI doesn't feel powerful because of its mistakes. I do think that at some point the AI will start to feel powerful, actually. And I think when that happens, we will see a big change in the way all AI companies approach safety. They'll become much more paranoid. I say this as a prediction that we will see happen. We'll see if I'm right. But I think this is something that will happen, because they will see the AI becoming more powerful. Everything that's happening right now, I maintain, is because people look at today's AI, and it's hard to imagine the future AI.
      And there is a third thing which needs to happen. I'm talking about it in broader terms, not just from the perspective of SSI, because you asked me about our company. But the question is: what should the companies aspire to build?

    3. DP

      Yeah.

    4. IS

      What should they aspire to build? There has been one big idea that everyone has been locked into, which is the self-improving AI. And why did that happen? Because there are fewer ideas than companies. But I maintain that there is something that's better to build, and I think that everyone will actually want that: the AI that's robustly aligned to care about sentient life specifically. There is a case to be made that it will be easier to build an AI that cares about sentient life than an AI that cares about human life alone, because the AI itself will be sentient. And if you think about things like mirror neurons and human empathy for animals, which, you know, you might argue is not big enough, but it exists, I think it's an emergent property of the fact that we model others with the same circuit that we use to model ourselves, because that's the most efficient thing to do.

    5. DP

      So even if you got an AI to care about sentient beings, and it's not actually clear to me that that's what you should try to do if you solve alignment, it would still be the case that most sentient beings will be AIs. There will be trillions, eventually quadrillions, of AIs. Humans will be a very small fraction of sentient beings. So it's not clear to me, if the goal is some kind of human control over this future civilization, that this is the best criterion.

    6. IS

      It's true. I think it's possible it's not the best criterion. I'll say a few things. Number one, I think that care for sentient life has merit to it, and it should be considered. Number two, I think it would be helpful if there were some kind of short list of ideas that the companies, when they are in this situation, could use. Number three, I think it would be really materially helpful if the power of the most powerful superintelligence was somehow capped, because it would address a lot of these concerns. The question of how to do it, I'm not sure, but I think that would be materially helpful when you're talking about really, really powerful systems.

    7. DP

      Yeah. Um, before we continue the alignment discussion, I wanna double-click on that. How much room is there at the top? How do you think about superintelligence? Using this learning-efficiency idea, maybe it's just extremely fast at learning new skills or new knowledge. Does it just have a bigger pool of strategies? Is there a single cohesive "it" in the center that's more powerful or bigger? And if so, do you imagine that this will be sort of godlike in comparison to the rest of human civilization, or does it just feel like another agent or another cluster of agents?

    8. IS

      So this is an area where different people have different intuitions.

    9. DP

      Yeah.

    10. IS

      I think it will be very powerful for sure. I think that... What I think is most likely to happen is that there will be multiple such AIs being created roughly at the same time.

    11. DP

      Mm.

    12. IS

      I think that if the cluster is big enough, if the cluster is literally continent-sized, that thing could be really powerful indeed. If you literally have a continent-sized cluster, those AIs can be very powerful. All I can tell you is that if you're talking about extremely powerful AIs, truly dramatically powerful, then yeah, it would be nice if they could be restrained in some ways, or if there were some kind of an agreement or something. Because what is the concern of superintelligence? What is one way to explain the concern? If you imagine a system that is sufficiently powerful, really sufficiently powerful, and you say, "Okay, you need to do something sensible, like care for sentient life, let's say, in a very single-minded way," we might not like the results. That's really what it is. And so maybe, by the way, the answer is that you do not build an RL agent in the usual sense. And actually, I'll point several things out. I think human beings are semi-RL agents: we pursue a reward, and then the emotions or whatever make us tire of the reward, and we pursue a different reward. The market is a very shortsighted kind of agent. Evolution is the same: very intelligent in some ways, but very dumb in other ways. The government has been designed to be a never-ending fight between three parts, which has an effect. So I think about things like this. Another thing that makes this discussion difficult is that we are talking about systems that don't exist, that we don't know how to build. That's the other thing, and that's actually my belief. I think what people are doing right now will go some distance and then peter out. It will continue to improve, but it will also not be it. So the it, we don't know how to build, and I think a lot hinges on understanding reliable generalization. And I'll say another thing. One of the things you could say causes alignment to be difficult is that your ability to learn human values is fragile, and then your ability to optimize them is fragile: will you actually learn to optimize them? And then can't you say, "Are these not all instances of unreliable generalization?" Why is it that human beings appear to generalize so much better? What if generalization were much better? What would happen in that case? What would be the effect? But those questions are, right now, still unanswerable.

    13. DP

      Um, how does one think about what AI going well looks like? Because I think you've scoped out how AI might evolve: we'll have these sorts of continual-learning agents, AI will be very powerful, maybe there will be many different AIs. How do you think about lots of continent-sized compute intelligences going around? How dangerous is that? How do we make it less dangerous? And how do we do that in a way that protects an equilibrium where there might be misaligned AIs and bad actors out there?

    14. IS

      So, one reason why I liked the AI that cares for sentient life-

    15. DP

      Hmm.

    16. IS

      ... you know, and we can debate whether it's good or bad. But if the first N of these dramatic systems actually do care for sentient life, love humanity or something like that, and obviously this also needs to be achieved, then I can see it going well, at least for quite some time. And then there is the question of what happens in the long run.

    17. DP

      Yeah.

    18. IS

      What happens in the long run? How do you achieve a long-run equilibrium? I think there is an answer there as well, and I don't like this answer, but it needs to be considered. If you have a world where powerful AIs exist, in the short run you could say, okay, you have universal high income and we're all doing well. But we know that, what do the Buddhists say? Change is the only constant. So things change. There is some kind of government political structure, and it changes, because these things have a shelf life. Some new government thing comes up and it functions, and then after some time it stops functioning. That's something you see happening all the time. So for the long-run equilibrium, one approach, you could say, is that maybe every person will have an AI that will do their bidding, and that's good. If that could be maintained indefinitely, great. But the downside is that the AI goes and earns money for the person and advocates for their needs in the political sphere, and maybe then writes a little report saying, "Okay, here's what I've done. Here's the situation." And the person says, "Great, keep it up." But the person is no longer a participant. And you could say that's a precarious place to be in. So I'm going to preface this by saying I don't like the solution, but it is a solution: people become part AI, with some kind of Neuralink++. Because what will happen as a result is that now the AI understands something and we understand it too, because the understanding is transmitted wholesale. So if the AI is in some situation, it's like you are involved in that situation yourself, fully. And I think this is the answer to the equilibrium.

    19. DP

      I wonder if, uh, the fact that emotions, which were developed millions, or in many cases, billions of years ago in a totally different environment, are still guiding our actions so strongly is an example of alignment success. Uh, to- to maybe spell out what I mean, the brain stem has these... I don't know if it's more accurate to call it a value function or a reward function, but the brain stem has a directive where it's saying, "Mate with somebody who's more successful." The cortex is the part that understands, what does success mean in the modern context? But the brain stem is able to align the cortex and say, "However you recognize success to be," and I- I'm not smart enough to understand what that is, "you're still gonna pursue this directive."

    20. IS

      I think, I think there is... So I think there's a more general point.

    21. DP

      Hmm.

    22. IS

      I think it's actually really mysterious how the brain encodes high-level desires. Sorry, how evolution encodes high-level desires.

    23. DP

      Hmm.

    24. IS

      Like, it's pretty easy to understand how evolution would endow us with the desire for food that smells good, because smell is a chemical, so you just pursue that chemical. It's very easy to imagine evolution doing such a thing. But evolution has also endowed us with all these social desires. We really care about being seen positively by society. We care about being in good standing. All these social intuitions that we have, I feel strongly that they are baked in, and I don't know how evolution did it, because it's a high-level concept that's represented in the brain. Let's say you care about some social thing. It's not a low-level signal like smell. It's not something for which there is a sensor. The brain needs to do a lot of processing to piece together lots of bits of information to understand what's going on socially. And somehow evolution said, "That's what you should care about."

    25. DP

      Yes.

    26. IS

      How did it do it? And it did it quickly, too.

    27. DP

      Yeah.

    28. IS

      Because I think all these sophisticated social things that, um, we care about, I think they evolved pretty recently.

    29. DP

      Yeah.

    30. IS

      So evolution had an easy time hardcoding this high-level desire, and I maintain, or at least I'll say, I'm unaware of good hypotheses for how it's done. I had some ideas I was kicking around, but none of them are satisfying.

  8. 1:18:13–1:29:23

    “We are squarely an age of research company”

    1. DP

      What is SSI planning on doing differently? Presumably your plan is to be one of the frontier companies when this time arrives, and presumably you started SSI because you thought, "I have a way of approaching how to do this safely in a way that the other companies don't." What is that difference?

    2. IS

      So the way I would describe it is: there are some ideas that I think are promising, and I want to investigate them and see if they are indeed promising or not. It's really that simple. It's an attempt. These are the ideas that we discussed around understanding generalization.

    3. DP

      Mm.

    4. IS

      If these ideas turn out to be correct, then I think we will have something worthy. Will they turn out to be correct? We are doing research. We are squarely an age-of-research company. We are making progress. We've actually made quite good progress over the past year.

    5. DP

      Mm-hmm.

    6. IS

      But we need to keep making more progress.

    7. DP

      Yeah.

    8. IS

      More research. And that's how I see it: as an attempt to be a voice and a participant.

    9. DP

      Um, your co-founder and previous CEO left to go to Meta recently, and people have asked, "Well, if there were a lot of breakthroughs being made, that seems like a thing that should have been unlikely." I wonder how you respond.

    10. IS

      Yeah, for this I will simply recall a few facts that may have been forgotten; I think these facts provide the context that explains the situation. The context was that we were fundraising at a $32 billion valuation, and then Meta came in and offered to acquire us. I said no, but my former co-founder, in some sense, said yes. As a result, he was able to enjoy a lot of near-term liquidity, and he was the only person from SSI to join Meta.

    11. DP

      It sounds like SSI's plan is to be a company that is at the frontier when you get to this very important period in human history, where you have superhuman intelligence and you have these ideas about how to make superhuman intelligence go well. But other companies will be trying their own ideas. What distinguishes SSI's approach to making superintelligence go well?

    12. IS

      The main thing that distinguishes SSI is its technical approach. We have a different technical approach that I think is worthy, and we are pursuing it. I maintain that in the end, there will be a convergence of strategies. At some point, as AI becomes more powerful, it's going to become more or less clear to everyone what the strategy should be. It should be something like: you need to find some way to talk to each other, and you want your first actual, real superintelligent AI to be aligned and to somehow care for sentient life, care for people, be democratic, some combination thereof. I think this is the condition that everyone should strive for, and that's what SSI is striving for. And I think that with time, if not already, all the other companies will realize this and strive towards the same thing. And we'll see. I think that the world will truly change as AI becomes more powerful.

    13. DP

      Yeah.

    14. IS

      And as for a lot of these forecasts, I think things will be really different and people will be acting really differently.

    15. DP

      Speaking of forecasts, what is your forecast for this system you're describing, which can learn as well as a human and subsequently, as a result, becomes superhuman?

    16. IS

      I think, like, uh, five to 20.

    17. DP

      Five to 20 years?

    18. IS

      Mm-hmm.

    19. DP

      So I just wanna unroll how you might see the world going. We have a couple more years where these other companies are continuing the current approach, and it stalls out. And "stalls out" here meaning they earn no more than low hundreds of billions in revenue, or how do you think about what stalling out means?

    20. IS

      Yeah. I think it could stall out, and I think stalling out will look like... it will all look very similar-

    21. DP

      Yeah.

    22. IS

      ... among all the different companies, something like this. I'm not sure, because I think even with stalling out, these companies could make stupendous revenue. Maybe not profits, because they will need to work hard to differentiate themselves from each other. But revenue, definitely.

    23. DP

      But something in your model implies that when the correct solution does emerge, there will be convergence among all the companies. I'm curious why you think that's the case.

    24. IS

      Well, I was talking more about convergence on their larger strategies.

    25. DP

      Hmm.

    26. IS

      I think eventual convergence on the technical approach is probably going to happen as well, but I was alluding to convergence on the larger strategies.

    27. DP

      Right.

    28. IS

      So what, what, what exactly is the thing that should be done?

    29. DP

      I just wanna better understand how you see the future unrolling. So currently we have these different companies, and you expect their approach to continue generating revenue.

    30. IS

      Yes.

Episode duration: 1:36:03
