Dwarkesh Podcast: Ilya Sutskever on Why RL Overfits the Evals
Why RL training that targets benchmark evals produces models that ace the tests yet cycle through the same bugs: Sutskever ties this to the absence of value functions in the training mix.
Ilya Sutskever (guest) · Dwarkesh Patel (host)
CHAPTERS
- 0:00 – 9:39
Explaining model jaggedness
- 9:39 – 18:49
Emotions and value functions
- 18:49 – 25:13
What are we scaling?
- 25:13 – 35:45
Why humans generalize better than models
- 35:45 – 46:47
Straight-shotting superintelligence
- 46:47 – 55:07
SSI’s model will learn from deployment
- 55:07 – 1:18:13
Alignment
- 1:18:13 – 1:29:23
“We are squarely an age of research company”
- 1:29:23 – 1:32:42
Self-play and multi-agent
- 1:32:42 – 1:36:03
Research taste