Dwarkesh Podcast

AGI is still 30 years away — Ege Erdil & Tamay Besiroglu

Ege Erdil and Tamay Besiroglu have 2045+ timelines, think the whole "alignment" framing is wrong, don't think an intelligence explosion is plausible, but are convinced we'll see explosive economic growth (with the economy literally doubling every 1 or 2 years). This discussion offers a totally different scenario than my recent interview with Scott and Daniel.

Ege and Tamay are the co-founders of Mechanize (disclosure: I’m an angel investor), a startup dedicated to fully automating work. Before founding Mechanize, Ege and Tamay worked on AI forecasts at Epoch AI.

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkesh.com/p/ege-tamay
* Apple Podcasts: https://podcasts.apple.com/us/podcast/agi-is-still-30-years-away-ege-erdil-tamay-besiroglu/id1516093381?i=1000703894255
* Spotify: https://open.spotify.com/episode/68eeIiy3mT6PRlrTej9dtq?si=8bd51bdc846e47f6

𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
* WorkOS makes it easy to become enterprise-ready. With simple APIs for essential enterprise features like SSO and SCIM, WorkOS helps companies like Vercel, Plaid, and OpenAI meet the requirements of their biggest customers. To learn more about how they can help you do the same, visit https://workos.com
* Scale’s Data Foundry gives major AI labs access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you’re an AI researcher or engineer, learn about how Scale’s Data Foundry and research lab, SEAL, can help you go beyond the current frontier at https://scale.com/dwarkesh
* Google's Gemini 2.5 Pro is THE model we use the most at Dwarkesh Podcast: it helps us generate transcripts, identify interesting clips, and code up new tools. Check out our internal Gemini-powered tools here: https://huggingface.co/spaces/dwarkesh/transcriber, https://huggingface.co/spaces/dwarkesh/producer. And if you want to try it for yourself, it's now available in Preview with higher rate limits! Start building with it today at https://aistudio.google.com

To sponsor a future episode, visit https://dwarkesh.com/advertise.

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 - AGI will take another 3 decades
00:23:01 - Even reasoning models lack animal intelligence
00:45:38 - Intelligence explosion
01:01:31 - Ege & Tamay's story
01:06:58 - Explosive economic growth
01:33:34 - Will there be a separate AI economy?
01:47:42 - Can we predictably influence the future?
02:20:22 - Arms race dynamic
02:30:22 - Is superintelligence a real thing?
02:36:19 - Reasons not to expect explosive growth
02:49:34 - Fully automated firms
02:55:17 - Will central planning work?

Ege Erdil (guest) · Tamay Besiroglu (guest) · Dwarkesh Patel (host)
Apr 17, 2025 · 3h 9m

CHAPTERS

  1. 0:34 – 4:35

    Why Ege & Tamay expect AGI/“drop-in remote worker” to take ~30 years

    Dwarkesh opens by probing their surprisingly long timelines for automating remote work. Ege and Tamay argue you can’t just extrapolate recent capability curves, because real economic automation requires many distinct competencies and sustained scaling of compute, data, and deployment.

    • Ege’s estimate for fully drop-in remote worker replacement: ~2045
    • Why recent progress doesn’t cleanly extrapolate to full automation
    • Automation impact should be measured by the fraction of the economy that is truly automated
    • Framing: count major capability “unlocks” still missing for broad labor automation
  2. 4:35 – 8:01

    Compute scaling, capability “unlocks,” and hard physical constraints

    They describe AI progress as a series of stepwise capability unlocks correlated with large increases in training compute. The discussion shifts to practical limits: energy, chip production, and the share of world output required to keep scaling data centers.

    • ~9–10 orders of magnitude (OOM) of compute growth since AlexNet, tied to key unlocks (games → LLMs → coding/reasoning)
    • Rule of thumb: a big unlock every few years, or every few OOM of scaling
    • Remaining headroom may be only a few more OOM before a large fraction of GDP goes to compute
    • Scaling is constrained by fabs, energy infrastructure, and global supply chains
  3. 8:01 – 18:05

    Why “unhobbling” isn’t enough: jobs are messy bundles of tasks

    Dwarkesh suggests models are “almost there” and mainly need better tooling, context, and agency scaffolds. Ege pushes back: most jobs aren’t single tasks, and even seemingly simple computer tasks require robust, long-horizon execution across messy contexts.

    • “Booking a flight” is not equivalent to automating the job that includes it
    • Jobs are bundles; automating a slice often has limited economic impact
    • Agency is about composing many simple actions reliably over long horizons
    • Skepticism that current computer-use agents generalize well beyond curated tasks
  4. 18:05 – 24:02

    Generalization gaps: from Pokémon to arbitrary new environments

    They debate whether existing models can transfer their vast knowledge into effective action. Pokémon is used as a case study: models may ‘know’ the game from internet text yet still get stuck, highlighting a gap between explicit knowledge and robust situational control.

    • Knowledge from pretraining doesn’t automatically translate into effective policies/actions
    • “Claude plays Pokémon” still gets stuck despite having walkthrough-level knowledge
    • A stronger test: download a brand-new Steam game post-cutoff and finish it
    • Signals that would change their view: long-horizon agency, multimodal integration, and broad handling of novelty
  5. 24:02 – 31:43

    Why reasoning models still fall short of innovation and “research taste”

    Dwarkesh turns to the intelligence-explosion claim via automating AI R&D. Ege and Tamay argue current models excel at short-horizon, well-specified problems but have not demonstrated the creative conceptual recombination and direction-setting that drives real research progress.

    • Eval tasks are compact/closed; real R&D has fuzzy goals and huge context
    • Models leverage massive memorized knowledge more than novel concept creation
    • No compelling examples yet of new math concepts interesting to mathematicians
    • Research automation requires broader competences than current coding/math benchmarks show
  6. 31:43 – 45:39

    Moravec’s paradox and why impressive benchmarks can mislead

    They introduce Moravec’s paradox: AI often advances fastest on tasks humans find cognitively impressive but evolution didn’t heavily optimize. This weakens inferences from chess/Go/coding contests to full economic agency, and highlights mismatches between benchmark success and practical usefulness.

    • Hard-for-humans symbolic tasks can be easy for machines (and vice versa)
    • Evolutionary recency explains why some ‘impressive’ skills are less fundamental
    • Competitive programming skill doesn’t map cleanly to real-world coding assistance
    • Don’t over-update from narrow benchmark wins to broad worker replacement
  7. 45:39 – 1:02:21

    The “software-only singularity” debate: R&D returns and complementarity with experiments

    Dwarkesh lays out the software-only singularity loop: AI researchers optimize AI efficiency, enabling more copies and cheaper experiments, compounding progress. Tamay responds using economics of R&D: diminishing returns, ambiguous scaling effects, and strong evidence that compute/hardware experiments are complementary bottlenecks.

    • Intelligence-explosion argument: more AI copies → more parallel research → cheaper training → faster progress
    • Economics lens: diminishing returns vs ‘standing on shoulders of giants’
    • Evidence that compute/experiments matter: progress rates track hardware trends; innovations cluster in GPU-rich labs
    • Many algorithmic advances are hardware/efficiency-motivated (transformers, FlashAttention, scaling laws)
  8. 1:02:21 – 1:06:50

    Ege & Tamay’s origin story: Epoch AI, “internet misfits,” and motivations

    The conversation pivots to how they met and why they built Epoch AI: frustration with bureaucracy and low-quality debate, plus a desire for grounded empirical analysis. They discuss recruiting unusually insightful online collaborators and focusing on neglected questions academia/industry wouldn’t prioritize.

    • Met via Metaculus/forecasting communities; early collaborations
    • Founding Epoch to move faster than academia and answer ‘unsexy’ but important questions
    • Recruiting unconventional talent outside standard pipelines
    • Motivation: raise the quality of public reasoning about AI trajectories
  9. 1:06:50 – 1:33:54

    What explosive economic growth would actually look like (and why it’s not just a “desert robot city”)

    They unpack their ‘broad transformation’ view: explosive growth requires upgrading many sectors, supply chains, and deployment channels, not just smart software in isolation. Even if growth starts locally, it likely spreads across jurisdictions depending on regulatory and institutional differences.

    • A ‘Shenzhen in the desert’ is more plausible than a pure software singularity, but it is still supply-chain constrained
    • Internet data stock and broad deployment are key to future capability acquisition
    • Learning-by-doing and capital deepening are part of discovery, not just post-invention rollout
    • Heterogeneous adoption across countries; regulatory jurisdictions may define growth hotspots
  10. 1:33:54 – 1:51:44

    AI economy vs human economy: integration, power, and takeover concerns

    Dwarkesh presses on the idea of a separate AI economy that could outgrow and dominate human institutions. Ege and Tamay argue it’s usually more efficient for AIs to integrate with existing markets and norms, and that ‘power imbalance implies takeover’ is not a complete predictive model without motives, coordination assumptions, and political economy context.

    • Separate AI economy is typically less efficient than leveraging existing supply chains/markets
    • Power imbalances don’t automatically predict conquest; incentives and coordination matter
    • They’re skeptical of rapid value lock-in and simplistic ‘single utility function’ futures
    • Conflict literature: misalignment alone doesn’t explain wars; bargaining/commitment/information problems matter
  11. 1:51:44 – 2:13:02

    Can we predictably influence the far future? Humility, institutions, and war-planning analogies

    They argue that reasoning alone is weak for detailed long-range planning; real-world contact, experiments, and iteration dominate. They advocate flexibility and institution-building over brittle grand plans, illustrated by dramatic pre-WWII forecasting errors about aerial bombardment impacts.

    • Policy planning under deep uncertainty: prioritize adaptability over detailed master plans
    • Example: British pre-WWII bombing casualty forecasts were off by orders of magnitude
    • Digital systems may not ‘lock in’ values; even digital info decays (link rot) and culture shifts quickly
    • Focus on near-term, predictably positive actions; build capacity to respond as evidence arrives
  12. 2:13:02 – 3:09:03

    Why accelerate automation (Mechanize), and objections to explosive growth (Tyler Cowen, bottlenecks, geopolitics)

    Dwarkesh challenges their stated goal to accelerate broad labor automation. Ege and Tamay argue the welfare gains of earlier automation are enormous and that slowing down may not improve alignment, since progress (including alignment progress) often depends on scaling and deployment. They end with objections to explosive growth: intelligence isn’t the bottleneck; institutions, adoption, and geopolitics shape outcomes, with uneven growth across jurisdictions and possible arms-race dynamics.

    • Case for acceleration: faster growth, medical breakthroughs, massive consumer surplus; near-term complementarity boosts wages
    • Skepticism that pausing at fixed compute yields major alignment progress; scaling reveals new alignment-relevant phenomena
    • Objection handling: the clean water example, where management and institutions prove hard to ‘plug and play’
    • Geopolitics: uneven adoption, potential embargoes, and arms-race incentives; debate about China’s industrial capacity relevance
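Two headline numbers from this episode, the ~9–10 OOM of compute growth since AlexNet and an economy "doubling every 1 or 2 years", are easier to compare once they are converted into annual rates. The sketch below is illustrative arithmetic only, not the guests' model. It assumes a 13-year window from AlexNet (2012) to this recording (2025); that window is our assumption, and the function names are hypothetical helpers.

```python
import math


def annual_growth_from_ooms(ooms: float, years: float) -> float:
    """Multiplicative growth per year implied by `ooms` orders of magnitude over `years`."""
    return 10 ** (ooms / years)


def annual_rate_from_doubling_time(doubling_years: float) -> float:
    """Annual growth rate g such that (1 + g) ** doubling_years == 2."""
    return 2 ** (1 / doubling_years) - 1


def doubling_time_from_rate(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log1p(rate)


if __name__ == "__main__":
    # Assumed window: AlexNet (2012) to this episode (2025), i.e. 13 years.
    years = 2025 - 2012
    for ooms in (9, 10):
        factor = annual_growth_from_ooms(ooms, years)
        print(f"{ooms} OOM over {years} years -> ~{factor:.1f}x per year")
    # "Doubling every 1 or 2 years" expressed as annual growth rates.
    for t in (1, 2):
        print(f"doubling every {t} year(s) -> {annual_rate_from_doubling_time(t):.1%} per year")
```

Under that assumed window, 9–10 OOM works out to roughly 5–6x more training compute each year. An economy that doubles every 1 or 2 years is growing about 41–100% per year.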
