
Ilya Sutskever – We're moving from the age of scaling to the age of research
Ilya Sutskever (guest), Dwarkesh Patel (host)
In this episode of the Dwarkesh Podcast, host Dwarkesh Patel talks with Ilya Sutskever about why the age of scaling is giving way to a new age of research.
Ilya Sutskever: Beyond scaling laws toward deeply generalizing superintelligence
Ilya Sutskever argues that the era of simply scaling pre‑training is ending and we are re‑entering an era where genuine research and new training recipes matter more than raw compute. He highlights a glaring gap between benchmark performance and real‑world usefulness, blaming overfitting to evals, weak generalization, and poorly understood RL fine‑tuning. Much of the discussion contrasts human learning and robustness with current models, exploring value functions, emotions, evolution, and why humans generalize so much better from far less data. Sutskever outlines SSI’s bet on a different technical path to human‑like continual learners, the societal implications of such systems, and his views on alignment, superintelligence, and what “AI going well” might require.
Key Takeaways
Benchmark‑driven RL can cause models to overfit evals while underperforming in reality.
Teams design RL environments inspired by public benchmarks, so models become like hyper‑specialized competition coders: great on targeted tests, but surprisingly brittle and repetitive in open‑ended workflows.
Pre‑training reached diminishing returns; future gains demand new recipes, not just more scale.
Pre‑training was a clear, low‑risk scaling recipe—add data, compute, parameters—but data is finite, compute is now huge, and 100× more of the same is unlikely to radically transform capabilities, pushing the field back into exploratory research.
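To make the diminishing-returns point concrete, here is a minimal numeric sketch (not from the episode) using the Chinchilla-style scaling law of Hoffmann et al. (2022), L(N, D) = E + A/N^alpha + B/D^beta. The constants are that paper's reported fit; the specific model sizes are illustrative assumptions, and the point is the shape of the curve, not the exact numbers.

```python
# Illustrative only: Chinchilla-style scaling law L(N, D) = E + A/N**alpha + B/D**beta.
# Constants are the published fit from Hoffmann et al. (2022); model sizes are made up.

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted pre-training loss for N parameters trained on D tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

base = loss(7e10, 1.4e12)                 # a 70B-parameter model on 1.4T tokens
scaled = loss(100 * 7e10, 100 * 1.4e12)   # 100x more of everything
print(f"base: {base:.2f}, 100x scaled: {scaled:.2f}, floor: {E}")
```

Both correction terms shrink only polynomially while the irreducible floor E stays fixed, so in this sketch a 100x scale-up moves the predicted loss from roughly 1.94 to about 1.75, which is the sense in which "100x more of the same" stops being transformative.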
Generalization is the core unsolved problem separating current models from human‑like intelligence.
Humans learn deeply and robustly from tiny amounts of data—even in domains like math and coding that didn’t shape our evolution—while today’s models require massive data and still fail in simple but off‑distribution situations.
Value functions and richer intermediate feedback could make RL vastly more compute‑efficient.
Instead of only rewarding final outcomes after long trajectories, learning robust value estimates for partial progress (as humans do with emotions and gut feelings) could massively reduce wasted exploration and improve stability.
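As a rough illustration of the mechanism, here is a minimal sketch of standard temporal-difference (TD) value learning, not SSI's method or anything shown in the episode. The toy environment and all function names are hypothetical; the sketch only shows how a learned value function gives graded feedback on partial progress instead of a single end-of-trajectory reward.

```python
# Minimal sketch (illustrative): a learned value function assigns credit to
# intermediate states via TD(0) bootstrapping, even though the environment
# only pays a sparse reward at the very end of a long trajectory.

import random

class ChainEnv:
    """Toy task: walk right along a chain of N states; reward only at the end."""
    def __init__(self, n_states=10):
        self.n = n_states

    def rollout(self, policy):
        """Run one episode; return visited states and the final sparse reward."""
        state, states = 0, [0]
        while state < self.n - 1:
            state += policy(state)   # policy returns +1 (step right) or 0 (stay)
            states.append(state)
        return states, 1.0           # reward arrives only on reaching the goal

def td0_value_estimates(env, policy, episodes=500, alpha=0.1, gamma=0.99):
    """Learn V(s) with TD(0): each step bootstraps from the next state's
    estimate, so middle-of-trajectory states get credit long before the end."""
    V = [0.0] * env.n
    for _ in range(episodes):
        states, final_reward = env.rollout(policy)
        for s, s_next in zip(states, states[1:]):
            r = final_reward if s_next == env.n - 1 else 0.0
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

if __name__ == "__main__":
    env = ChainEnv()
    policy = lambda s: 1 if random.random() < 0.9 else 0
    print(td0_value_estimates(env, policy))
    # Intermediate states acquire graded values (roughly gamma**steps_to_goal),
    # giving dense feedback analogous to a human "gut feeling" of progress.
```

The relevance to the takeaway: with bootstrapped value estimates, feedback arrives at every step rather than once per long trajectory, so far less exploration is wasted on rollouts whose outcome was already predictably bad, which is the compute-efficiency argument Sutskever gestures at.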
Alignment may be easier if advanced AIs care about sentient life, including themselves.
Sutskever suggests that an AI which models itself as a sentient being could more naturally extend empathy to other sentient beings, analogous to human mirror neurons and empathy, though this may not perfectly align with specifically human interests.
Continual, human‑like learners deployed across the economy could trigger rapid growth.
Instead of a single monolithic AGI that ‘knows everything,’ Sutskever envisions powerful learners that can quickly master any job, whose many instances specialize, learn on the job, and potentially aggregate their knowledge, driving very fast economic expansion.
Future safety strategy will likely converge once systems feel unmistakably powerful.
He predicts that as models begin to clearly feel powerful to their creators (not just impressive on paper), frontier labs and governments will become more paranoid, collaborate more on safety, and seek common strategies for constraining superintelligent systems.
Notable Quotes
“The models seem smarter than their economic impact would imply.”
— Ilya Sutskever
“Up until 2020 it was the age of research; from 2020 to 2025 it was the age of scaling; now it’s back to the age of research again, just with big computers.”
— Ilya Sutskever
“These models somehow just generalize dramatically worse than people, and it’s super obvious.”
— Ilya Sutskever
“I think the fact that people are like that is proof it can be done.”
— Ilya Sutskever
“There are more companies than ideas by quite a bit.”
— Ilya Sutskever
Questions Answered in This Episode
If generalization is the core bottleneck, what concrete research directions could most improve it beyond today’s architectures and training regimes?
How can we design RL and eval systems that incentivize real‑world robustness instead of benchmark overfitting and ‘reward hacking’ by researchers?
What would a practical implementation of an AI value function that mirrors human emotions and gut judgments actually look like in modern ML systems?
How realistic—and desirable—is the idea of advanced AIs that explicitly care about sentient life, given conflicts between human and non‑human interests?
In a world of many superintelligent, continually learning agents, what governance or technical mechanisms could robustly cap their power and prevent destructive competition?
Transcript Preview
You know what's crazy?
Uh-huh.
That all of this is real.
Yeah? Meaning what?
Don't- don't you think so?
Meaning what?
Like all this AI stuff, and all this Bay-
Like, it actually happened?
...Area? Yeah. That it's happe- like, isn't it straight out of science fiction?
Yeah. I- i- another thing that's crazy is, like, how normal this low takeoff feels. The idea that we'd be investing 1% of GDP in AI, like, I feel like it would ha- felt like a bigger deal, you know? But right now, it just feels like-
We get used to things pretty fast, turns out, yeah. But also, it's kinda like it's abstract, like, what does it mean? What it means that you see it in the news-
Yeah.
...that such and such company announced such and such dollar amount.
Right.
That's- that's all you see.
Right.
It's not really felt in any other way, so far.
No. Should we actually begin here? I think this is an interesting discussion.
Sure.
I think your point about, well, from the average person's point of view, nothing is that different, will continue being true, even into the singularity.
No, I don't think so.
Okay. Interesting.
So, the thing which I was referring to not feeling different is, okay, so such and such company announced some, uh, difficult to comprehend dollar amount of investment.
Right.
I don't think anyone knows what to do with that.
Yeah.
But I think that the impact of AI is gonna be felt. AI is going to be diffused through the economy. There are very strong economic forces for this, and I think the impact is going to be felt very strongly.
When do you expect that impact? I think the models seem smarter than their economic impact would imply.
Yeah, this is one of the very confusing things about the models right now, how to reconcile the fact that they are doing so well on evals.
Mm-hmm.
And you look at the evals, and you go, "Those are pretty hard evals."
Right.
They're doing so well. But the economic impact seems to be dramatically behind.
Yes.
And it's almost like it's- it's very difficult to make sense of, how can the model, on the one hand, do these amazing things-
Yeah.
...and then, on the other hand, like, repeat itself twice in some situation, in a kind of a... An- an example would be, let's say you use vibe coding to do something, and you go to some place, and then you get a bug. And then you tell the model, "Can you please fix the bug?"
Yeah.
And the model says, "Oh, my God. You're so right, I have a bug. Let me go fix that." And it produces a second bug.