Dwarkesh Podcast

Dylan Patel — The single biggest bottleneck to scaling AI compute

Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power. He also walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy!

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkesh.com/p/dylan-patel
* Apple Podcasts: https://podcasts.apple.com/us/podcast/dylan-patel-deep-dive-on-the-3-big-bottlenecks-to/id1516093381?i=1000755126873
* Spotify: https://open.spotify.com/episode/5qiibwoBWY5rXyflK7WJzH?si=SX4ajSKXT-KeNtaHsiTNzw

𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
- Mercury has already saved me a bunch of time this tax season. Last year, I used Mercury to request W-9s from all the contractors I worked with. Then, when it came time to issue 1099s this year, I literally just clicked a button and Mercury sent them out. Learn more at https://mercury.com
- Labelbox noticed that even when voice models appear to take interruptions in stride, their performance degrades. To figure out why, they built a new evaluation pipeline called EchoChain. EchoChain diagnoses voice models’ specific failure modes, letting you understand what your model needs to truly handle interruptions. Check it out at https://labelbox.com/dwarkesh
- Jane Street is basically a research lab with a trading desk attached – and their infrastructure backs this up. They’ve got tens of thousands of GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This is what it takes to find subtle signals hidden deep within noisy market data. If this sounds interesting, you can explore open positions at https://janestreet.com/dwarkesh

To sponsor a future episode, visit https://dwarkesh.com/advertise.

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 – Why an H100 is worth more today than 3 years ago
00:24:52 – Nvidia secured TSMC allocation early; Google is getting squeezed
00:34:34 – ASML will be the #1 constraint for AI compute scaling by 2030
00:55:47 – Can't we just use TSMC's older fabs?
01:05:37 – When will China outscale the West in semis?
01:16:01 – The enormous incoming memory crunch
01:42:34 – Scaling power in the US will not be a problem
01:54:44 – Space GPUs aren't happening this decade
02:14:07 – Why aren't more hedge funds making the AGI trade?
02:18:30 – Will TSMC kick Apple out from N2?
02:24:16 – Robots and Taiwan risk

Dwarkesh Patel (host), Dylan Patel (guest)
Mar 13, 2026 · 2h 30m

CHAPTERS

  1. Hyperscaler CapEx vs real compute coming online: why timelines lag

    Dwarkesh opens by puzzling over how hundreds of billions in hyperscaler CapEx translates into only modest near-term data center power additions. Dylan explains that much of today’s CapEx is actually pre-spend for future years: power deposits, construction lead times, and long-term procurement that enables rapid scaling later. The key idea is that the spend and the compute deployment are offset in time, and much of the system is being built for 2027–2029, not “this year.”

    • CapEx includes forward-looking items (turbine deposits, PPAs, land, construction) not just GPUs
    • Data center and power projects require multi-year sequencing even if final buildouts look fast
    • Incremental US critical IT capacity additions are far below what “rent-equivalent” GW math on the CapEx would imply
    • Supply chain and deployment cadence (not funding) sets the near-term slope
    • Compute coming online is partially funded in prior-year budgets
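
A minimal Python sketch of the timing offset described above, assuming a hypothetical lag profile (`LAG_PROFILE`) in which each year's CapEx turns into live capacity over that year and the following two. The profile and the spend figures are placeholders, not numbers from the episode; the point is only that capacity landing in one year draws on several earlier budgets.

```python
# Sketch: CapEx spent in year t becomes live compute over later years.
# LAG_PROFILE is a HYPOTHETICAL assumption, not a figure from the episode:
# 20% of a year's spend yields capacity that year, 50% the next, 30% two years out.
LAG_PROFILE = [0.2, 0.5, 0.3]


def deployed_by_year(capex_by_year: dict[int, float]) -> dict[int, float]:
    """Map each year's CapEx onto the years in which it becomes live capacity."""
    deployed: dict[int, float] = {}
    for year, spend in capex_by_year.items():
        for offset, share in enumerate(LAG_PROFILE):
            target = year + offset
            deployed[target] = deployed.get(target, 0.0) + spend * share
    return deployed


if __name__ == "__main__":
    # Hypothetical, steeply rising spend (arbitrary units).
    capex = {2025: 100.0, 2026: 200.0, 2027: 300.0}
    for year, amount in sorted(deployed_by_year(capex).items()):
        print(year, round(amount, 1))
```

With rising spend, the capacity that lands in any given year trails that year's budget, which is the lag the chapter describes. A longer profile (for example, one that includes turbine or construction deposits paid several years ahead) stretches the gap further.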
  2. AI lab fundraising, inference demand shock, and scramble for capacity

    The conversation shifts to why labs like OpenAI and Anthropic raise massive sums despite already spending billions on compute. Dylan argues inference demand is compounding so quickly that revenue growth itself requires multi-gigawatt inference expansion, even before training/R&D growth. Anthropic, being more conservative on early commitments, must now hunt for capacity in less ideal places and structures.

    • Inference scaling can require multiple new GW just to serve accelerating revenue
    • Compute needs aren’t just training; inference can dominate as adoption inflects
    • Conservative contracting can become a liability when demand surprises upward
    • Anthropic can reach ~5–6 GW by year-end via direct capacity + hyperscaler channels
    • OpenAI likely ends the year slightly higher due to broader multi-provider strategy
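
The claim that revenue growth alone can require multiple new GW can be framed as a simple ratio. This sketch assumes a hypothetical `REVENUE_PER_GW_YEAR` (annual inference revenue that one GW of capacity can support); the value below is a placeholder, not an estimate from the episode.

```python
# Sketch: inference capacity implied by a revenue run-rate.
# REVENUE_PER_GW_YEAR is a HYPOTHETICAL placeholder (USD of annual inference
# revenue that one GW of deployed capacity supports), not a figure from the episode.
REVENUE_PER_GW_YEAR = 10e9


def inference_gw_needed(annual_revenue_usd: float,
                        revenue_per_gw_year: float = REVENUE_PER_GW_YEAR) -> float:
    """GW of inference capacity needed to serve an annualized revenue run-rate."""
    if revenue_per_gw_year <= 0:
        raise ValueError("revenue_per_gw_year must be positive")
    return annual_revenue_usd / revenue_per_gw_year


def incremental_gw(start_revenue_usd: float, end_revenue_usd: float,
                   revenue_per_gw_year: float = REVENUE_PER_GW_YEAR) -> float:
    """New GW required to grow from one run-rate to another at a fixed revenue per GW."""
    return (inference_gw_needed(end_revenue_usd, revenue_per_gw_year)
            - inference_gw_needed(start_revenue_usd, revenue_per_gw_year))


if __name__ == "__main__":
    # Hypothetical lab growing its run-rate from $10B to $40B.
    print(incremental_gw(10e9, 40e9))  # -> 3.0
```

At a fixed revenue per GW, required capacity scales linearly with revenue, so a lab whose demand surprises upward needs capacity it never contracted for. Training and R&D compute would come on top of this.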
  3. Spot vs long-term GPU deals: why “last-minute compute” is expensive

    Dwarkesh asks what it concretely means to acquire compute in a pinch. Dylan details how shorter contracts roll off, and the buyer willing to pay the most can crowd out others at much higher hourly rates, especially for H100/Hopper. Long-term contracts lock in margin advantages, while late buyers pay spot-like premiums or revenue shares via hyperscalers’ platforms.

    • Compute is not all 5-year locked; some supply turns over via short deals and on-demand
    • Recent H100 deals can reach ~$2.40/hr on 2–3 year terms, far above what build costs alone would justify
    • Neoclouds/hyperscalers may have flex capacity from expiring contracts or reallocations
    • Revenue-share delivery (Bedrock/Vertex/Foundry) embeds an additional margin layer
    • Early commitments buy both price certainty and availability during scarcity
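
The pricing gap can be made concrete with per-GPU arithmetic. The ~$2.40/hr rate and 2–3 year terms come from the chapter; `ASSUMED_COST_PER_HOUR` (an all-in cost to own and operate one H100 for an hour) is a hypothetical placeholder, so the margin it produces is illustrative only.

```python
# Sketch: what a multi-year H100 rental is worth per GPU.
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years

RATE_PER_HOUR = 2.40          # ~$2.40/hr for recent H100 deals (from the chapter)
ASSUMED_COST_PER_HOUR = 1.40  # HYPOTHETICAL all-in owner cost per GPU-hour


def contract_revenue(rate_per_hour: float, years: float,
                     utilization: float = 1.0) -> float:
    """Total rental revenue for one GPU over a contract term."""
    return rate_per_hour * HOURS_PER_YEAR * years * utilization


def contract_margin(rate_per_hour: float, cost_per_hour: float,
                    years: float) -> float:
    """Gross margin per GPU over the term, assuming every hour is billed."""
    return (rate_per_hour - cost_per_hour) * HOURS_PER_YEAR * years


if __name__ == "__main__":
    for years in (2, 3):
        revenue = contract_revenue(RATE_PER_HOUR, years)
        margin = contract_margin(RATE_PER_HOUR, ASSUMED_COST_PER_HOUR, years)
        print(f"{years}-year term: revenue ${revenue:,.0f}, margin ${margin:,.0f}")
```

A buyer who committed early keeps whatever spread it locked in for the whole term, while a late buyer pays the scarcity rate; routing the same capacity through a revenue-share platform would add one more margin layer on top.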
  4. Why an H100 can be worth more today than years ago: utility beats specs

    Dwarkesh and Dylan explore GPU depreciation and why older GPUs may not lose value as quickly as bears claim. Dylan contrasts “perf/$ vs new chips” with a second lens: the value generated by running better models, which can increase over time even on the same hardware. As model quality and efficiency improve, the same H100 can produce more valuable tokens than it could years earlier, supporting higher market value in a compute-constrained world.

    • Depreciation depends on market scarcity and token value, not just new-chip perf/$
    • Model progress (architecture, RL, sparsity, data) can raise output value per GPU
    • H100 utility can rise if it serves more tokens of better models at lower cost
    • Long-term contracts lock in margins while new deployments clear at current scarcity pricing
    • Economic value per GPU can grow as models become more monetizable
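
The "utility beats specs" lens can be sketched as revenue per GPU-hour: the throughput of the model served on the chip times what customers pay for its tokens. All numbers below are hypothetical; they only show how a better, more efficient model on the same H100 can raise its earning power even if the price per token falls.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """One model served on one GPU (all values hypothetical)."""
    tokens_per_hour: float         # throughput of this model on the GPU
    usd_per_million_tokens: float  # price customers pay for this model's tokens

    def revenue_per_gpu_hour(self) -> float:
        return self.tokens_per_hour / 1e6 * self.usd_per_million_tokens


# Same H100 at two points in time (HYPOTHETICAL figures): the later model
# serves more tokens per hour and stays monetizable at a lower price.
earlier = Deployment(tokens_per_hour=1_000_000, usd_per_million_tokens=2.0)
later = Deployment(tokens_per_hour=3_000_000, usd_per_million_tokens=1.5)

if __name__ == "__main__":
    ratio = later.revenue_per_gpu_hour() / earlier.revenue_per_gpu_hour()
    print(ratio)  # -> 2.25
```

A pure perf-per-dollar comparison against new chips would miss this effect, which is why the chapter ties depreciation to token value and scarcity rather than to specs alone.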
  5. Nvidia’s early TSMC allocation lock-in and Google’s TPU squeeze

    Dwarkesh asks why TSMC appears to have ceded so much N3 capacity to Nvidia instead of balancing allocation across customers. Dylan argues Nvidia signaled demand earlier with stronger commitments, while some rivals had delays and weaker conviction, leading to a self-reinforcing allocation outcome. He also explains how Google’s TPU capacity was unexpectedly pulled toward external sales (notably Anthropic) before internal demand fully woke up.

    • TSMC responds to credible demand signals: deposits, non-cancelable orders, long lead planning
    • Nvidia moved earlier than Google/Amazon on commitments, tightening rivals’ options
    • TSMC often prefers stable CPU demand, but Nvidia’s platform breadth increases pull
    • Google TPU allocation shifted due to Anthropic deal timing and internal under-forecasting
    • Google “woke up” after usage/revenue inflected, but near-term wafer capacity was already spoken for
  6. Bottlenecks over time: from CoWoS and power back to chips

    Dylan frames bottlenecks as a moving target driven by lead times: short-lead items get fixed faster, pushing constraints upstream to long-lead manufacturing. As AI absorbed more of the overall semiconductor pie, there’s less remaining “slack” to reallocate from phones/PCs into AI. The scaling limit becomes the semiconductor production chain itself—fabs, tools, logic wafers, and memory.

    • Bottlenecks shift upstream as shorter-lead constraints (CoWoS, power, DCs) get addressed
    • Fabs and tools have multi-year lead times; data centers can be built in under a year
    • AI can no longer rely on reallocating capacity away from consumer electronics
    • Nvidia becomes top customer at TSMC and SK Hynix, showing AI’s dominance
    • Long-run compute scaling is capped by semiconductor manufacturing expansion speed
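
The upstream drift of bottlenecks follows from lead times and can be sketched as a filter: anything whose lead time fits inside the planning horizon can be expanded, and whatever remains is binding. The episode says only that data centers take under a year and fabs and tools take multiple years; the specific month values in `LEAD_TIME_MONTHS` are hypothetical.

```python
# HYPOTHETICAL lead times in months. Only the ordering reflects the chapter
# (data centers < 1 year; fabs and tools multi-year); exact values are placeholders.
LEAD_TIME_MONTHS: dict[str, int] = {
    "data_center": 9,
    "cowos_packaging": 12,
    "power": 12,
    "memory_fab": 30,
    "logic_fab": 36,
    "euv_tools": 36,
}


def binding_constraints(horizon_months: int,
                        lead_times: dict[str, int] = LEAD_TIME_MONTHS) -> list[str]:
    """Items that cannot be expanded within the horizon, longest lead time first."""
    stuck = [name for name, months in lead_times.items() if months > horizon_months]
    # sorted() is stable even with reverse=True, so ties keep dictionary order.
    return sorted(stuck, key=lambda name: lead_times[name], reverse=True)


if __name__ == "__main__":
    for horizon in (6, 12, 24):
        print(horizon, binding_constraints(horizon))
```

On a short horizon everything looks binding; as the horizon lengthens, the short-lead items drop out and only fabs and tools remain, which mirrors the shift from CoWoS and power back to chips.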
  7. ASML as the #1 constraint by 2030: translating EUV tools into gigawatts

    Dylan argues that by late decade, ASML’s EUV tool output becomes the hard ceiling on global compute scaling. He quantifies how many wafers and EUV “passes” are needed per gigawatt of next-gen AI capacity and derives a rough tools-per-GW mapping. This produces a striking conclusion: relatively small amounts of lithography tool capacity can gate tens of billions in downstream data center CapEx and AI value.

    • EUV tool shipments: ~70/year now, ~80 next year, and only a bit over ~100/year by decade end (aggressive case)
    • A gigawatt of cutting-edge AI compute implies ~2M EUV passes across logic + memory wafers
    • Rule of thumb: ~3.5 EUV tools per 1 GW of latest-gen AI capacity (illustrative)
    • The existing installed base plus annual shipments could support on the order of hundreds of GW by 2030 if allocated to AI
    • Small tooling dollars can bottleneck vastly larger compute and revenue opportunity
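
The tools-per-GW rule of thumb can be reproduced from the chapter's own figures. `EUV_PASSES_PER_GW` and `TOOLS_PER_GW` come from the bullets above, and the per-tool throughput is derived from them rather than taken from ASML specs. The installed base, the intermediate shipment years, and the assumption that newly shipped tools run for a full year are hypothetical simplifications.

```python
# Figures from the chapter (rounded, illustrative):
EUV_PASSES_PER_GW = 2_000_000  # ~2M EUV passes per GW of latest-gen AI compute
TOOLS_PER_GW = 3.5             # ~3.5 EUV tools per GW (rule of thumb)

# Implied annual passes per tool, derived from the two figures above (~571k).
PASSES_PER_TOOL_YEAR = EUV_PASSES_PER_GW / TOOLS_PER_GW


def gw_per_year(tools: float, ai_share: float = 1.0) -> float:
    """GW of latest-gen AI compute a fleet of EUV tools could pattern in one year."""
    return tools * ai_share / TOOLS_PER_GW


def cumulative_gw(installed_base: float, shipments: list[float],
                  ai_share: float = 1.0) -> float:
    """Total GW patterned as the fleet grows by each year's shipments.

    Simplifying assumptions: tools shipped in a year run the whole year,
    nothing retires, and `ai_share` of the fleet goes to AI logic + memory.
    """
    fleet, total = installed_base, 0.0
    for shipped in shipments:
        fleet += shipped
        total += gw_per_year(fleet, ai_share)
    return total


if __name__ == "__main__":
    print(round(PASSES_PER_TOOL_YEAR))  # -> 571429
    print(gw_per_year(70))              # 70 tools running for a year -> 20.0
    # HYPOTHETICAL installed base of 250 tools; 90 and 105 interpolate the chapter's
    # "~70 now, ~80 next year, a bit over ~100/year by decade end".
    print(round(cumulative_gw(250, [70, 80, 90, 100, 105]), 1))
```

Under these assumptions, every ~70 tools of fleet capacity supports roughly 20 GW/year of latest-gen compute, and the fleet reaches the hundreds of GW by 2030 only if most of it is pointed at AI. That is how a comparatively small tooling budget can gate much larger data center spending.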
  8. Why ASML can’t just double output: artisanal complexity and supply-chain inertia

    Dwarkesh asks why ASML can’t simply triple CapEx and produce far more tools. Dylan walks through the extreme complexity of EUV systems—source, optics, stages, metrology—and the long, specialized supply chain behind them, including Zeiss optics and Cymer sources. Even with high demand, scaling production is limited by specialized labor, intricate components, and a historically cautious semiconductor industry that’s reluctant to overbuild into cyclicality.

    • EUV comprises multiple ultra-complex subsystems: source, optics, reticle stage, wafer stage
    • Precision requirements are extreme (overlay on the order of nanometers), making it hard to ramp labor and QA
    • Supply chain spans thousands of suppliers; assembly/shipping/reassembly takes months
    • Industry culture avoids “YOLO expansion” due to boom-bust memory/semicap cycles
    • Even if demand is clear, training specialized talent and qualifying parts takes years
  9. Can we go back to older fabs (7nm/DUV) if EUV is scarce?

    Dwarkesh proposes a fallback: use older nodes like 7nm via DUV multi-patterning to sidestep EUV constraints. Dylan explains why this underestimates system-level effects: performance isn’t just FLOPS; it’s compounded by networking, memory bandwidth, packaging, and scaling efficiency across many GPUs. He cites large real-world gaps (e.g., Hopper vs Blackwell on optimized inference) that far exceed naive FLOPS ratios due to communication and architecture constraints.

    • Node fallback isn’t just a die shrink question; system throughput depends on interconnect + memory
    • Cross-chip communication is orders of magnitude slower than on-chip; older designs pay heavier scaling penalties
    • Optimized inference can show ~order-of-magnitude differences beyond FLOPS comparisons
    • You can port some packaging advances to old nodes, but leading-edge also keeps advancing packaging
    • Older-node compute might be used at extreme scarcity, but it’s not a clean substitute
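
Why the real-world gap can exceed the FLOPS ratio shows up in a toy cost model where each unit of work pays compute time plus non-overlapped communication, and traffic that leaves the fast scale-up domain crosses a much slower network. Every parameter below is a normalized, hypothetical value chosen for illustration; none are measurements of Hopper, Blackwell, or any 7nm design.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Normalized, HYPOTHETICAL system parameters (illustration only)."""
    chip_flops: float          # relative peak compute per chip
    scaleup_bw: float          # relative bandwidth inside the fast scale-up domain
    scaleout_bw: float         # relative bandwidth over the slower scale-out network
    offdomain_fraction: float  # share of traffic that must leave the scale-up domain


def time_per_unit_work(s: System, flops: float = 1.0, bytes_moved: float = 1.0) -> float:
    """Compute time plus communication time, with no compute/communication overlap."""
    compute = flops / s.chip_flops
    in_domain = bytes_moved * (1.0 - s.offdomain_fraction) / s.scaleup_bw
    off_domain = bytes_moved * s.offdomain_fraction / s.scaleout_bw
    return compute + in_domain + off_domain


def throughput(s: System) -> float:
    return 1.0 / time_per_unit_work(s)


# Leading-edge system: 3x the FLOPS, and the model fits inside one scale-up domain.
leading = System(chip_flops=3.0, scaleup_bw=2.0, scaleout_bw=0.5, offdomain_fraction=0.0)
# Older-node system: smaller domain, so half the traffic crosses the slow network.
older = System(chip_flops=1.0, scaleup_bw=1.0, scaleout_bw=0.1, offdomain_fraction=0.5)

if __name__ == "__main__":
    naive = leading.chip_flops / older.chip_flops
    effective = throughput(leading) / throughput(older)
    print(f"FLOPS ratio {naive:.1f}x, effective ratio {effective:.1f}x")
```

With these placeholder numbers the 3x FLOPS advantage becomes roughly a 7.8x throughput advantage, because the older system spends most of its time on slow off-domain communication. Real systems overlap some communication with compute, so this is a caricature of the effect, not a forecast.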
  10. China vs the West: when could scale outweigh node advantage?

    The discussion turns geopolitical: whether China’s scale and verticalization could eventually outproduce the West’s leading-edge advantage. Dylan notes China still relies heavily on imported DUV tools, but expects indigenous DUV by ~2030 and possibly early EUV prototypes—though not mass production. The key uncertainty is production ramp (“production hell”), and how takeoff speed affects whether US-led compute scaling compounds faster than China can catch up.

    • China’s current advanced-node output still depends heavily on imported ASML DUV tools
    • Indigenous DUV by ~2030 seems likely; EUV may exist but mass production is harder
    • Catch-up depends on both tool capability and manufacturing scale/quality ramp
    • Fast AI economic compounding favors the US/West; slower timelines give China more time to verticalize
    • As models become productized work (not exposed chains), distillation becomes harder for lagging ecosystems
  11. The enormous incoming memory crunch: HBM vs DDR, consumer squeeze, and pricing spillovers

    Dwarkesh presses on memory as a near-term limiter: HBM is much less bit-dense per wafer than commodity DRAM, making AI’s memory demand exceptionally wafer-hungry. Dylan explains why switching to DDR isn’t a free fix: bandwidth per package-edge area collapses, leaving compute underutilized. They then connect memory price inflation to consumer device costs and volume destruction, especially in low/mid-range phones, as memory vendors prioritize AI willingness-to-pay.

    • HBM trades bit density for bandwidth; AI demand is constrained by bandwidth/wafer, not bits/wafer
    • DDR provides far less bandwidth per edge area (orders-of-magnitude gap vs HBM stacks)
    • Using DDR would shift bottlenecks and waste FLOPS; ‘slow mode’ is economically unattractive
    • Memory price increases can raise iPhone BOM sharply; low/mid-range phones get hit hardest
    • Consumer volume declines release some supply, but AI keeps DRAM/HBM pressure intense
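
The "wasted FLOPS" point is a roofline argument: attainable throughput is the lesser of peak compute and arithmetic intensity times memory bandwidth. The peak FLOP/s, bandwidths, and intensity below are hypothetical round numbers, not specs of any shipping HBM or DDR configuration.

```python
# Roofline sketch: performance is capped by either compute or memory bandwidth.
# All figures are HYPOTHETICAL round numbers for illustration.
PEAK_FLOPS = 2e15         # accelerator peak, FLOP/s
HBM_BW = 8e12             # bytes/s from HBM stacks on the package
DDR_BW = 0.5e12           # bytes/s from a DDR-based alternative
DECODE_INTENSITY = 100.0  # FLOPs per byte moved (memory-bound inference regime)


def attainable_flops(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Roofline: min(compute roof, bandwidth roof)."""
    return min(peak_flops, intensity * mem_bw)


def utilization(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Fraction of peak compute actually usable."""
    return attainable_flops(peak_flops, mem_bw, intensity) / peak_flops


if __name__ == "__main__":
    for name, bw in (("HBM", HBM_BW), ("DDR", DDR_BW)):
        u = utilization(PEAK_FLOPS, bw, DECODE_INTENSITY)
        print(f"{name}: {u:.1%} of peak FLOPS usable")
```

In this toy case the DDR variant leaves 97.5% of the chip idle, which is why swapping memory types moves the bottleneck instead of relieving it. HBM's lower bits per wafer is the price paid for that bandwidth.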
  12. Why power in the US won’t be the limiting factor: many supply-chain “escape hatches”

    Dylan argues power is easier to scale than chip manufacturing because generation options are diverse and supply chains are less specialized. Beyond combined-cycle turbines, providers can deploy aero-derivatives, reciprocating engines, ship engines, fuel cells, and solar-plus-storage, and can also “unlock” grid capacity by addressing peak-load requirements. Labor and interconnection are real frictions, but the market can pay to overcome them, especially when power costs are a small fraction of total GPU TCO.

    • Critical IT power differs from nameplate due to losses, cooling, derating, and capacity factors
    • Beyond big turbines: many gas-based options + fuel cells + renewables + batteries can scale
    • Addressing peak-load constraints can free substantial grid capacity most of the year
    • Even higher $/kW power is tolerable because GPUs dominate TCO and model value is rising
    • Labor is a constraint, but modularization and factory-built components can reduce onsite work
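
Two pieces of arithmetic sit behind this chapter: converting generation nameplate into critical IT power, and checking how much of per-GPU cost electricity actually is. The sketch uses a simplified conversion (derating and capacity factor applied to nameplate, then divided by PUE), and every numeric input is a hypothetical placeholder rather than a figure from the episode.

```python
HOURS_PER_YEAR = 8760


def critical_it_mw(nameplate_mw: float, derate: float, capacity_factor: float,
                   pue: float) -> float:
    """Simplified: usable generation after derating and capacity factor, divided by PUE."""
    return nameplate_mw * derate * capacity_factor / pue


def power_share_of_tco(gpu_kw: float, pue: float, usd_per_kwh: float,
                       capex_per_gpu_year: float) -> float:
    """Electricity cost as a fraction of (electricity + amortized hardware) per GPU-year."""
    power_cost = gpu_kw * pue * HOURS_PER_YEAR * usd_per_kwh
    return power_cost / (power_cost + capex_per_gpu_year)


if __name__ == "__main__":
    # HYPOTHETICAL inputs throughout.
    print(round(critical_it_mw(1000, derate=0.9, capacity_factor=0.9, pue=1.3)))
    for price in (0.08, 0.16):  # doubling the $/kWh
        share = power_share_of_tco(gpu_kw=1.2, pue=1.3, usd_per_kwh=price,
                                   capex_per_gpu_year=8000)
        print(f"${price}/kWh -> power is {share:.0%} of per-GPU cost")
```

Doubling the electricity price in this toy setup moves power from about 12% to about 21% of per-GPU cost, which illustrates why operators can pay up for faster-to-deploy generation when the chips themselves dominate spending.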
  13. Why space GPUs aren’t happening this decade: reliability, networking, and time-to-token

    Dwarkesh asks about Musk’s “space data centers” idea; Dylan is skeptical on both near-term economics and engineering. He emphasizes that in a chip-constrained world, the dominant objective is minimizing time from chip manufacture to useful tokens, and space adds months of delay plus servicing challenges. Space also makes networking vastly harder and more expensive, while GPU and optical component failure rates already demand constant hands-on remediation on Earth.

    • Compute scarcity makes ‘time-to-deploy’ crucial; space adds long delays and logistics complexity
    • GPU and network components have high failure/RMA rates requiring frequent physical intervention
    • Inter-satellite links don’t match aggregate rack-scale bandwidth needs (per-GPU vs per-rack scaling)
    • Space networking would require expensive, fragile laser interconnect at massive scale
    • Space may make sense later when energy/land dominate and chips are no longer the bottleneck
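
The time-to-token argument reduces to simple arithmetic: every month a GPU spends in transit or awaiting launch is a month of token revenue forgone and a slice of its useful life consumed. The sketch reuses the ~$2.40/hr H100 rate from the spot-pricing chapter as a stand-in; the fleet size, delay, and useful life are hypothetical.

```python
HOURS_PER_MONTH = 8760 / 12  # 730 hours


def delay_cost_usd(rate_per_gpu_hour: float, delay_months: float, n_gpus: int) -> float:
    """Rental-equivalent revenue forgone while GPUs sit idle before deployment."""
    return rate_per_gpu_hour * HOURS_PER_MONTH * delay_months * n_gpus


def life_fraction_lost(delay_months: float, useful_life_years: float) -> float:
    """Share of a chip's economic life spent before it serves a single token,
    assuming the useful-life clock starts at manufacture (an assumption)."""
    return delay_months / (useful_life_years * 12)


if __name__ == "__main__":
    # HYPOTHETICAL: 100k GPUs delayed 6 months, 5-year useful life.
    print(f"${delay_cost_usd(2.40, 6, 100_000):,.0f} forgone")
    print(f"{life_fraction_lost(6, 5):.0%} of useful life lost")
```

Space deployment would add months to that delay and make later repairs far harder, which is the core of the skepticism in this chapter.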
  14. Why aren’t more hedge funds making the AGI/infra trade? Market disbelief and interpretation gaps

    Dwarkesh asks why obvious-seeming supply-chain signals (memory, fabs, tools) aren’t more widely traded. Dylan claims many actors still believe the projections are “too high,” even after repeated upward revisions, and that conviction about AI’s economic takeoff is the gating factor. SemiAnalysis data makes the baseline more accurate, but profits accrue to those willing to act on their interpretation with strong directional bets and appropriate position sizing.

    • Many customers (including industry) resist high-growth forecasts until after they materialize
    • Trading requires conviction that AI demand will sustain and intensify constraints
    • Opportunities come from what’s ‘outside expectations’ (e.g., memory pricing, fab scarcity)
    • Information advantage is as much about interpretation and positioning as raw data access
    • Persistent skepticism creates repeated lag between obvious technical drivers and market pricing
  15. TSMC, Apple, N2 allocation, Huawei counterfactual, robots, and Taiwan risk

    The episode closes with a cluster of strategic questions: whether TSMC could push Apple out of N2, what it means that AI becomes the first customer for future nodes, and how Huawei might compare if it had leading-edge access. Dylan also discusses robots as a competing sink for scarce leading-edge chips, and ends on Taiwan risk, arguing that losing Taiwan would collapse incremental compute growth even if its talent were evacuated, because replicating the lost capacity takes years.

    • TSMC likely won’t ‘kick Apple out,’ but Apple’s leverage shrinks as AI becomes the growth driver
    • Future nodes (e.g., A16 variants) may be led by AI customers, changing historical dynamics
    • Huawei could be extremely formidable with leading-edge access due to end-to-end strengths
    • Robots likely offload major cognition to cloud for batching; on-device needs still consume scarce leading-edge silicon
    • Taiwan disruption would crater global semiconductor capacity; airlifting engineers doesn’t replace lost fabs quickly
