Dwarkesh Podcast
AGI is still 30 years away — Ege Erdil & Tamay Besiroglu
CHAPTERS
- 0:34 – 4:35
Why Ege & Tamay expect AGI (a “drop-in remote worker”) to take ~30 years
Dwarkesh opens by probing their surprisingly long timelines for automating remote work. Ege and Tamay argue you can’t just extrapolate recent capability curves, because real economic automation requires many distinct competencies and sustained scaling of compute, data, and deployment.
- Ege’s estimate for a fully drop-in remote-worker replacement: ~2045
- Why recent progress doesn’t cleanly extrapolate to full automation
- Automation’s impact should be measured by the fraction of the economy that is truly automated
- Framing: count the major capability “unlocks” still missing for broad labor automation
- 4:35 – 8:01
Compute scaling, capability “unlocks,” and hard physical constraints
They describe AI progress as a series of stepwise capability unlocks correlated with large increases in training compute. The discussion shifts to practical limits: energy, chip production, and the share of world output required to keep scaling data centers.
- ~9–10 orders of magnitude (OOM) of training-compute growth since AlexNet, tied to key unlocks (games → LLMs → coding/reasoning)
- Rule of thumb: a major unlock every few years, or every few OOM of compute scaling
- Remaining headroom may be only a few more OOM before a large fraction of GDP goes to compute
- Scaling is constrained by fabs, energy infrastructure, and global supply chains
- 8:01 – 18:05
Why “unhobbling” isn’t enough: jobs are messy bundles of tasks
Dwarkesh suggests models are “almost there” and mainly need better tooling, context, and agency scaffolds. Ege pushes back: most jobs aren’t single tasks, and even seemingly simple computer tasks require robust, long-horizon execution across messy contexts.
- “Booking a flight” is not equivalent to automating the job that includes it
- Jobs are bundles; automating a slice often has limited economic impact
- Agency is about composing many simple actions reliably over long horizons
- Skepticism that current computer-use agents generalize well beyond curated tasks
- 18:05 – 24:02
Generalization gaps: from Pokémon to arbitrary new environments
They debate whether existing models can transfer their vast knowledge into effective action. Pokémon is used as a case study: models may ‘know’ the game from internet text yet still get stuck, highlighting a gap between explicit knowledge and robust situational control.
- Knowledge from pretraining doesn’t automatically translate into effective policies and actions
- “Claude plays Pokémon” still gets stuck despite having walkthrough-level knowledge
- A stronger test: download a brand-new Steam game released after the training cutoff and finish it
- What would update them: long-horizon agency, multimodal integration, and broad handling of novelty
- 24:02 – 31:43
Why reasoning models still fall short of innovation and “research taste”
Dwarkesh turns to the intelligence-explosion claim via automating AI R&D. Ege and Tamay argue current models excel at short-horizon, well-specified problems but have not demonstrated the creative conceptual recombination and direction-setting that drives real research progress.
- Eval tasks are compact and closed; real R&D has fuzzy goals and huge context
- Models leverage massive memorized knowledge more than they create novel concepts
- No compelling examples yet of new math concepts that mathematicians find interesting
- Research automation requires broader competences than current coding and math benchmarks show
- 31:43 – 45:39
Moravec’s paradox and why impressive benchmarks can mislead
They introduce Moravec’s paradox: AI often advances fastest on tasks that humans find cognitively impressive but that evolution didn’t heavily optimize for. This weakens inferences from chess, Go, and coding contests to full economic agency, and highlights mismatches between benchmark success and practical usefulness.
- Hard-for-humans symbolic tasks can be easy for machines (and vice versa)
- Evolutionary recency explains why some ‘impressive’ skills are less fundamental
- Competitive programming skill doesn’t map cleanly to real-world coding assistance
- Don’t over-update from narrow benchmark wins to broad worker replacement
- 45:39 – 1:02:21
The “software-only singularity” debate: R&D returns and complementarity with experiments
Dwarkesh lays out the software-only singularity loop: AI researchers optimize AI efficiency, which enables more copies and cheaper experiments, so progress compounds. Tamay responds using the economics of R&D: diminishing returns, ambiguous scaling effects, and strong evidence that compute and hardware for experiments are complementary bottlenecks.
- Intelligence-explosion argument: more AI copies → more parallel research → cheaper training → faster progress
- Economics lens: diminishing returns vs. ‘standing on the shoulders of giants’
- Evidence that compute and experiments matter: progress rates track hardware trends, and innovations cluster in GPU-rich labs
- Many algorithmic advances are motivated by hardware and efficiency (transformers, FlashAttention, scaling laws)
- 1:02:21 – 1:06:50
Ege & Tamay’s origin story: Epoch AI, “internet misfits,” and motivations
The conversation pivots to how they met and why they built Epoch AI: frustration with bureaucracy and low-quality debate, plus a desire for grounded empirical analysis. They discuss recruiting unusually insightful online collaborators and focusing on neglected questions academia/industry wouldn’t prioritize.
- Met via Metaculus and forecasting communities; early collaborations
- Founded Epoch to move faster than academia and answer ‘unsexy’ but important questions
- Recruiting unconventional talent outside standard pipelines
- Motivation: raise the quality of public reasoning about AI trajectories
- 1:06:50 – 1:33:54
What explosive economic growth would actually look like (and why it isn’t just a “desert robot city”)
They unpack their ‘broad transformation’ view: explosive growth requires upgrading many sectors, supply chains, and deployment channels, not just smart software in isolation. Even if growth starts locally, it likely spreads across jurisdictions depending on regulatory and institutional differences.
- A ‘Shenzhen in the desert’ is more plausible than a pure software singularity, but it is still supply-chain constrained
- The stock of internet data and broad deployment are key to acquiring future capabilities
- Learning-by-doing and capital deepening are part of discovery, not just post-invention rollout
- Heterogeneous adoption across countries; regulatory jurisdictions may define growth hotspots
- 1:33:54 – 1:51:44
AI economy vs human economy: integration, power, and takeover concerns
Dwarkesh presses on the idea of a separate AI economy that could outgrow and dominate human institutions. Ege and Tamay argue it’s usually more efficient for AIs to integrate with existing markets and norms, and that ‘power imbalance implies takeover’ is not a complete predictive model without motives, coordination assumptions, and political economy context.
- A separate AI economy is typically less efficient than leveraging existing supply chains and markets
- Power imbalances don’t automatically predict conquest; incentives and coordination matter
- They’re skeptical of rapid value lock-in and simplistic ‘single utility function’ futures
- Conflict literature: misaligned interests alone don’t explain wars; bargaining, commitment, and information problems matter
- 1:51:44 – 2:13:02
Can we predictably influence the far future? Humility, institutions, and war-planning analogies
They argue that reasoning alone is weak for detailed long-range planning; real-world contact, experiments, and iteration dominate. They advocate flexibility and institution-building over brittle grand plans, illustrated by dramatic pre-WWII forecasting errors about aerial bombardment impacts.
- Policy planning under deep uncertainty: prioritize adaptability over detailed master plans
- Example: British pre-WWII forecasts of bombing casualties were off by orders of magnitude
- Digital systems may not ‘lock in’ values; even digital information decays (link rot), and culture shifts quickly
- Focus on near-term, predictably positive actions; build capacity to respond as evidence arrives
- 2:13:02 – 3:09:03
Why accelerate automation (Mechanize), and objections to explosive growth (Tyler Cowen, bottlenecks, geopolitics)
Dwarkesh challenges their stated goal of accelerating broad labor automation. Ege and Tamay argue that the welfare gains of earlier automation are enormous and that slowing down may not improve alignment, since progress (including alignment progress) often depends on scaling and deployment. They end with objections to explosive growth, such as the claim that intelligence isn’t the bottleneck because institutions, adoption, and geopolitics shape outcomes, along with uneven growth across jurisdictions and possible arms-race dynamics.
- Case for acceleration: faster growth, medical breakthroughs, massive consumer surplus; near-term complementarity boosts wages
- Skepticism that pausing at a fixed compute level yields major alignment progress; scaling reveals new alignment-relevant phenomena
- Objection handling: the clean-water example shows that management and institutions are hard to ‘plug and play’
- Geopolitics: uneven adoption, potential embargoes, and arms-race incentives; debate about the relevance of China’s industrial capacity