Ilya Sutskever on the Dwarkesh Podcast: Why RL Overfits the Evals
Why RL targeted at benchmark evals produces models that ace the tests yet cycle between the same bugs in real use: Sutskever links this to skipping value functions in the training mix.
At a glance
WHAT IT’S REALLY ABOUT
Ilya Sutskever: Beyond scaling laws toward deeply generalizing superintelligence
- Ilya Sutskever argues that the era of simply scaling pre‑training is ending and we are re‑entering an era where genuine research and new training recipes matter more than raw compute. He highlights a glaring gap between benchmark performance and real‑world usefulness, blaming overfitting to evals, weak generalization, and poorly understood RL fine‑tuning. Much of the discussion contrasts human learning and robustness with current models, exploring value functions, emotions, evolution, and why humans generalize so much better from far less data. Sutskever outlines SSI’s bet on a different technical path to human‑like continual learners, the societal implications of such systems, and his views on alignment, superintelligence, and what “AI going well” might require.
IDEAS WORTH REMEMBERING
Benchmark‑driven RL can cause models to overfit evals while underperforming in reality.
Teams design RL environments inspired by public benchmarks, so models become like hyper‑specialized competition coders: great on targeted tests, but surprisingly brittle and repetitive in open‑ended workflows.
Pre‑training reached diminishing returns; future gains demand new recipes, not just more scale.
Pre‑training was a clear, low‑risk scaling recipe—add data, compute, parameters—but data is finite, compute is now huge, and 100× more of the same is unlikely to radically transform capabilities, pushing the field back into exploratory research.
Generalization is the core unsolved problem separating current models from human‑like intelligence.
Humans learn deeply and robustly from tiny amounts of data—even in domains like math and coding that didn’t shape our evolution—while today’s models require massive data and still fail in simple but off‑distribution situations.
Value functions and richer intermediate feedback could make RL vastly more compute‑efficient.
Instead of only rewarding final outcomes after long trajectories, learning robust value estimates for partial progress (as humans do with emotions and gut feelings) could massively reduce wasted exploration and improve stability; a toy sketch of this mechanism follows the list of ideas.
Alignment may be easier if advanced AIs care about sentient life, including themselves.
Sutskever suggests that an AI which models itself as a sentient being could more naturally extend empathy to other sentient beings, analogous to human mirror neurons and empathy, though this may not perfectly align with specifically human interests.
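The value‑function idea above is easiest to see in code. Below is a minimal TD(0) sketch on a toy chain environment; this is a generic textbook construction chosen to illustrate the contrast with outcome‑only reward, not anything from SSI or the episode. The environment, state count, and hyperparameters are all illustrative assumptions.

```python
import random

# Toy chain: states 0..9, start at 0; reaching state 9 pays reward 1.
# With outcome-only reward, every intermediate step gets zero feedback.
# A learned value function V(s) instead scores partial progress, so each
# step yields a training signal via one-step TD(0) bootstrapping.
# All numbers here are illustrative, not tuned.
N_STATES = 10
ALPHA, GAMMA = 0.1, 0.99

def step(state, action):
    """Move right (+1) or left (-1) along the chain; reward only at the end."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

V = [0.0] * N_STATES  # learned value estimates for partial progress

for episode in range(500):
    s = 0
    while True:
        a = random.choice([-1, 1])  # exploratory random policy
        s2, r, done = step(s, a)
        # TD(0) update: nudge V(s) toward the one-step target
        # r + gamma * V(s2), giving feedback at EVERY step rather than
        # only when the trajectory finally pays off.
        target = r + (0.0 if done else GAMMA * V[s2])
        V[s] += ALPHA * (target - V[s])
        s = s2
        if done:
            break

print([round(v, 2) for v in V])
```

After training, the learned values rise steadily toward the goal state, so the agent gets a usable progress signal mid‑trajectory instead of one sparse reward at the end; that per‑step feedback is the compute‑efficiency point, and the "gut feeling" analogy, from the discussion.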
WORDS WORTH SAVING
The models seem smarter than their economic impact would imply.
— Ilya Sutskever
Up until 2020 it was the age of research; from 2020 to 2025 it was the age of scaling; now it’s back to the age of research again, just with big computers.
— Ilya Sutskever
These models somehow just generalize dramatically worse than people, and it’s super obvious.
— Ilya Sutskever
I think the fact that people are like that is proof it can be done.
— Ilya Sutskever
There are more companies than ideas by quite a bit.
— Ilya Sutskever
AI-generated summary created from a speaker-labeled transcript.