Dwarkesh Podcast
Ilya Sutskever (OpenAI Chief Scientist): Why next-token prediction could surpass human intelligence
CHAPTERS
Security, misuse, and why alignment gets harder as models surpass us
Ilya frames alignment as especially difficult once models are smarter than humans, including the risk of deception or misrepresented intentions. The conversation also touches on hostile use cases (propaganda/scams) and the feasibility of monitoring or tracking abuse at scale.
- •Alignment difficulty increases sharply for superhuman models, especially if they can deceive
- •Misuse (propaganda, scams) may already be happening, and open-source models could be used for it
- •Large-scale tracking of misuse is possible with dedicated effort
- •Security concerns include potential espionage and model-weight theft
How soon is AGI—and what the “pre-AGI value window” looks like
They discuss the economic value of AI prior to AGI and how long startups and the broader economy can benefit before capabilities become fully general. Ilya emphasizes uncertainty in timelines and warns that optimists often underestimate time-to-AGI.
- •AI value likely grows year-over-year, potentially feeling exponential in hindsight
- •The time window for non-AGI businesses is essentially the same as “how long until AGI”
- •Forecasting is hard; optimistic builders often underestimate timelines
- •AI’s impact may look compressed in retrospect as later years dominate earlier gains
Reliability as the bottleneck for real-world impact and GDP share
Dwarkesh probes what would explain a surprisingly low economic impact by 2030. Ilya argues the main failure mode would be insufficient reliability—needing constant human verification, limiting automation and trust.
- •GDP-share predictions have huge uncertainty (“error bars in log scale”)
- •If AI underdelivers economically, reliability is the most plausible explanation
- •Needing humans to double-check outputs dampens value creation
- •Technological maturity is tightly linked to reliability and robustness
What comes after today’s generative models—and why next-token prediction can go superhuman
Ilya argues the current paradigm can go very far, even if the final AGI form factor differs. He challenges the idea that imitation (next-token prediction) caps performance at human level, proposing that sufficiently capable models can extrapolate to ‘hypothetical wiser agents.’
- •Current generative paradigm likely scales far, though AGI may have a different form factor
- •Future progress may integrate ideas from multiple past approaches
- •Next-token prediction can surpass humans by extrapolating beyond observed behaviors
- •Good prediction implies understanding underlying reality, not mere surface statistics
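To make the "next-token prediction" objective concrete, here is a minimal sketch of the quantity such a model is trained to minimize: the average negative log-probability it assigns to the token that actually came next. The function name, the toy three-token vocabulary, and the probabilities are all made up for illustration and do not come from the conversation.

```python
import math

def next_token_loss(probs_per_step, target_ids):
    """Average negative log-likelihood of the observed next tokens.

    probs_per_step: one predicted probability distribution per position.
    target_ids: index of the token that actually followed at each position.
    """
    total = 0.0
    for probs, target in zip(probs_per_step, target_ids):
        total += -math.log(probs[target])
    return total / len(target_ids)

# Hypothetical predictions over a 3-token vocabulary at two positions.
probs = [
    [0.7, 0.2, 0.1],  # model's distribution at position 1
    [0.1, 0.1, 0.8],  # model's distribution at position 2
]
targets = [0, 2]      # tokens that actually occurred

loss = next_token_loss(probs, targets)
perplexity = math.exp(loss)  # lower means the model is less "surprised"
```

A lower loss means the model put more probability on what really happened, which is the sense in which good prediction is argued to require modeling the process that produced the text.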
RLHF, self-improvement, and the ‘thinking out loud’ path to stronger reasoning
They explore how reinforcement learning data is already largely AI-generated, with humans shaping reward models. Ilya supports human–AI collaboration for training and argues models are better at multi-step reasoning when allowed to externalize intermediate thoughts, improving with targeted training.
- •In RLHF, humans train the reward models; most of the RL data is then generated by the AI itself
- •Desirable future: humans contribute a small fraction while AI does most iterative work
- •Models’ multi-step reasoning improves when they can ‘think out loud’
- •Dedicated training and stronger base models are expected to improve reasoning substantially
Data limits, ‘smarter tokens,’ and multimodality as a growth direction
Dwarkesh asks whether the internet will run out of useful tokens and what data sources matter most. Ilya expects eventual data exhaustion and highlights the need for new training methods; he also notes that higher-quality, more ‘interesting’ tokens are especially valuable and that multimodality looks fruitful.
- •Eventually we may exhaust available training data; new methods will be required
- •The data situation is currently still ‘quite good’ with lots left
- •Best data is generally ‘smarter’/more interesting content across sources
- •Text-only can go far, but multimodal training is a promising direction
Algorithmic ideas, retrieval, robotics at scale, and why hardware isn’t the main limiter
In a quick-fire segment, Ilya briefly evaluates research directions like retrieval-augmented transformers and revisits OpenAI’s earlier decision to step back from robotics. He argues robotics progress is possible now but requires massive deployment to generate data; he also downplays hardware as the key bottleneck compared to other constraints.
- •Retrieval-augmented approaches seem promising as a direction
- •OpenAI left robotics due to insufficient data; robotics would have required becoming a robotics company
- •Robotics can work today, but it needs large fleets (thousands to hundreds of thousands of robots) that are useful enough to deploy, so that their data can drive iterative improvement
- •Hardware is not the fundamental limitation; cost and practical constraints matter more
Alignment without a single definition: layered evaluations, interpretability, and release thresholds
Ilya explains why a single mathematical definition of alignment is unlikely, favoring multiple complementary lenses. He emphasizes adversarial testing, internal inspection, and the need for higher confidence as models become more capable, noting the ambiguity in what counts as AGI.
- •No single mathematical definition; expect multiple partial definitions and checks
- •Alignment assurance should combine behavioral tests, adversarial stress tests, and ‘looking inside’ models
- •Release confidence must increase with capability; AGI itself is an ambiguous threshold
- •A vision: smaller, better-understood nets may help verify larger, opaque nets
AI doing AI research: idea generation as the bottleneck and a ‘prize’ for alignment progress
They discuss when AI will meaningfully contribute to research, with Ilya highlighting that high-quality ideas and insights are the real bottleneck. On incentivizing alignment breakthroughs, he suggests retroactive evaluation—awarding a prize years later once the field can identify what truly mattered.
- •Near-term AI contribution: suggesting fruitful experiments and insights to researchers
- •Key bottleneck is generating good ideas, not just execution
- •Alignment prizes are hard to specify; retroactive awards may better capture true breakthroughs
- •End-to-end training and modular ‘connecting things’ are both viewed as promising
Economics of models: revenue forecasting, inference cost, and avoiding commoditization
Ilya describes how OpenAI extrapolates from existing product traction rather than speculation. He reframes inference cost as value-relative (expensive is fine if outputs are reliable and useful), and explains differentiation strategies amid commoditization pressure: continuous progress, cost improvements, and specialization.
- •Revenue forecasts rely on observed growth (API, DALL·E, ChatGPT), not guesswork
- •Inference cost isn’t prohibitive if usefulness outweighs expense (e.g., legal advice analogy)
- •Market already segments by model sizes and price-performance needs
- •To avoid commoditization: keep improving capability, reliability, trustworthiness; reduce serving costs; specialize
Research ecosystem dynamics: convergence, secrecy, and threats like weight theft
They explore whether AI labs will converge on similar approaches or diverge into distinct bets. Ilya predicts a cycle of convergence on near-term work, divergence on longer-term directions, then reconvergence once breakthroughs bear fruit—slowed somewhat by reduced publishing—alongside ongoing security concerns about espionage and leaks.
- •Expect convergence on near-term work, divergence on longer-term bets, then convergence again
- •Reduced publishing means promising directions take longer to spread or be rediscovered
- •Spies and weight theft are real concerns; strong security practices are essential
- •These risks apply broadly to any organization building frontier models
Emergent properties, scaling laws, and the ‘inevitability’ of the deep learning moment
Ilya anticipates surprising emergent behaviors, especially reliability and controllability, and discusses the limits of predicting capabilities from parameter count alone. He also argues scaling laws measure next-token prediction and only indirectly relate to reasoning; finally, he explains why the convergence of data, GPUs, and transformers is intertwined rather than pure coincidence, and why progress may have been only modestly delayed without specific pioneers.
- •Expected emergent wins: reliability and controllability as capability scales
- •Capability prediction is possible in some coarse ways, but not yet fine-grained
- •Scaling laws track next-token accuracy; mapping to reasoning is complex and may need targeted data/training
- •Progress in data, GPUs, and the internet co-evolved; the timing of deep learning probably depended only modestly on specific individuals
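As a rough illustration of what "scaling laws track next-token accuracy" means, the quantity being tracked is the average next-token loss, and published scaling-law work commonly fits it with a power law in model size. The symbols below are assumptions introduced here, not notation from the conversation: $N$ is parameter count, and $N_c$ and $\alpha_N$ are constants fitted empirically.

```latex
% Average next-token loss over a sequence of T tokens
L(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})

% One commonly reported empirical form: loss as a power law in parameter count N
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```

Nothing in this form mentions reasoning directly, which fits the point above: a smooth curve in $L$ does not by itself say when a given reasoning ability appears.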
Post-AGI meaning, human autonomy, and the far future
Dwarkesh asks what people do after AGI and what the world might look like centuries ahead. Ilya suggests AGI could help people become more ‘enlightened’ and find meaning, but rejects the idea that we can neatly choose the future; he emphasizes preserving human freedom and moral evolution rather than outsourcing society to an AGI’s directives.
- •AGI could help humans improve internally (e.g., ‘best meditation teacher in history’)
- •Some may choose human–AI integration to tackle harder problems
- •The world will continue changing after AGI; predicting year 3000 is unrealistic
- •Preferred future preserves human autonomy and learning from mistakes, with AGI as a safety net—not a ruler
Breakthroughs vs ‘understanding’: how research really advances (and views on forward-forward)
Ilya argues ‘new ideas’ are often overrated compared to deep understanding of results and phenomena. He reframes many breakthroughs as realizations of properties that were ‘there all along,’ and critiques non-backprop approaches like forward-forward as more neuroscience-motivated than engineering-practical today, ending with advice on being carefully inspired by brains without copying non-essentials.
- •Research time is dominated by understanding results and diagnosing unexpected behavior
- •Many major advances are ‘new understandings’ of old ingredients
- •Forward-forward is interesting for brain-learning hypotheses, but backprop remains the practical workhorse
- •Human/brain inspiration can be valuable if focused on essentials, not superficial analogies