Ilya Sutskever on the Dwarkesh Podcast: Why RL Overfits the Evals
Why RL targeted at benchmark evals produces models that ace the tests yet cycle between the same bugs in real use: Sutskever links this to skipping value functions in the training mix.
At a glance
WHAT IT’S REALLY ABOUT
Ilya Sutskever: Beyond scaling laws toward deeply generalizing superintelligence
- Ilya Sutskever argues that the era of simply scaling pre‑training is ending and we are re‑entering an era where genuine research and new training recipes matter more than raw compute. He highlights a glaring gap between benchmark performance and real‑world usefulness, blaming overfitting to evals, weak generalization, and poorly understood RL fine‑tuning. Much of the discussion contrasts human learning and robustness with current models, exploring value functions, emotions, evolution, and why humans generalize so much better from far less data. Sutskever outlines SSI’s bet on a different technical path to human‑like continual learners, the societal implications of such systems, and his views on alignment, superintelligence, and what “AI going well” might require.
IDEAS WORTH REMEMBERING
Benchmark‑driven RL can cause models to overfit evals while underperforming in reality.
Teams design RL environments inspired by public benchmarks, so models become like hyper‑specialized competition coders: great on targeted tests, but surprisingly brittle and repetitive in open‑ended workflows.
Pre‑training reached diminishing returns; future gains demand new recipes, not just more scale.
Pre‑training was a clear, low‑risk scaling recipe—add data, compute, parameters—but data is finite, compute is now huge, and 100× more of the same is unlikely to radically transform capabilities, pushing the field back into exploratory research.
Generalization is the core unsolved problem separating current models from human‑like intelligence.
Humans learn deeply and robustly from tiny amounts of data—even in domains like math and coding that didn’t shape our evolution—while today’s models require massive data and still fail in simple but off‑distribution situations.
Value functions and richer intermediate feedback could make RL vastly more compute‑efficient.
Instead of only rewarding final outcomes after long trajectories, learning robust value estimates for partial progress (as humans do with emotions and gut feelings) could massively reduce wasted exploration and improve stability; a toy sketch of this mechanism follows the list of ideas.
Alignment may be easier if advanced AIs care about sentient life, including themselves.
Sutskever suggests that an AI which models itself as a sentient being could more naturally extend empathy to other sentient beings, analogous to human mirror neurons and empathy, though this may not perfectly align with specifically human interests.
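The value‑function idea above is easiest to see in code. Below is a minimal TD(0) sketch on a toy chain environment; this is a generic textbook construction chosen to illustrate the contrast with outcome‑only reward, not anything from SSI or the episode. The environment, state count, and hyperparameters are all illustrative assumptions.

```python
import random

# Toy chain: states 0..9, start at 0; reaching state 9 pays reward 1.
# With outcome-only reward, every intermediate step gets zero feedback.
# A learned value function V(s) instead scores partial progress, so each
# step yields a training signal via one-step TD(0) bootstrapping.
# All numbers here are illustrative, not tuned.
N_STATES = 10
ALPHA, GAMMA = 0.1, 0.99

def step(state, action):
    """Move right (+1) or left (-1) along the chain; reward only at the end."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

V = [0.0] * N_STATES  # learned value estimates for partial progress

for episode in range(500):
    s = 0
    while True:
        a = random.choice([-1, 1])  # exploratory random policy
        s2, r, done = step(s, a)
        # TD(0) update: nudge V(s) toward the one-step target
        # r + gamma * V(s2), giving feedback at EVERY step rather than
        # only when the trajectory finally pays off.
        target = r + (0.0 if done else GAMMA * V[s2])
        V[s] += ALPHA * (target - V[s])
        s = s2
        if done:
            break

print([round(v, 2) for v in V])
```

After training, the learned values rise steadily toward the goal state, so the agent gets a usable progress signal mid‑trajectory instead of one sparse reward at the end; that per‑step feedback is the compute‑efficiency point, and the "gut feeling" analogy, from the discussion.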
WORDS WORTH SAVING
The models seem smarter than their economic impact would imply.
— Ilya Sutskever
Up until 2020 it was the age of research; from 2020 to 2025 it was the age of scaling; now it’s back to the age of research again, just with big computers.
— Ilya Sutskever
These models somehow just generalize dramatically worse than people, and it’s super obvious.
— Ilya Sutskever
I think the fact that people are like that is proof it can be done.
— Ilya Sutskever
There are more companies than ideas by quite a bit.
— Ilya Sutskever
AI-generated summary created from a speaker-labeled transcript.