Dwarkesh Podcast

Dylan Patel — The single biggest bottleneck to scaling AI compute

Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power. And walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy!

EPISODE LINKS

* Transcript: https://www.dwarkesh.com/p/dylan-patel
* Apple Podcasts: https://podcasts.apple.com/us/podcast/dylan-patel-deep-dive-on-the-3-big-bottlenecks-to/id1516093381?i=1000755126873
* Spotify: https://open.spotify.com/episode/5qiibwoBWY5rXyflK7WJzH?si=SX4ajSKXT-KeNtaHsiTNzw

SPONSORS

- Mercury has already saved me a bunch of time this tax season. Last year, I used Mercury to request W-9s from all the contractors I worked with. Then, when it came time to issue 1099s this year, I literally just clicked a button and Mercury sent them out. Learn more at https://mercury.com
- Labelbox noticed that even when voice models appear to take interruptions in stride, their performance degrades. To figure out why, they built a new evaluation pipeline called EchoChain. EchoChain diagnoses voice models’ specific failure modes, letting you understand what your model needs to truly handle interruptions. Check it out at https://labelbox.com/dwarkesh
- Jane Street is basically a research lab with a trading desk attached – and their infrastructure backs this up. They’ve got tens of thousands of GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This is what it takes to find subtle signals hidden deep within noisy market data. If this sounds interesting, you can explore open positions at https://janestreet.com/dwarkesh

To sponsor a future episode, visit https://dwarkesh.com/advertise.

TIMESTAMPS

00:00:00 – Why an H100 is worth more today than 3 years ago
00:24:52 – Nvidia secured TSMC allocation early; Google is getting squeezed
00:34:34 – ASML will be the #1 constraint for AI compute scaling by 2030
00:55:47 – Can't we just use TSMC's older fabs?
01:05:37 – When will China outscale the West in semis?
01:16:01 – The enormous incoming memory crunch
01:42:34 – Scaling power in the US will not be a problem
01:54:44 – Space GPUs aren't happening this decade
02:14:07 – Why aren't more hedge funds making the AGI trade?
02:18:30 – Will TSMC kick Apple out from N2?
02:24:16 – Robots and Taiwan risk

Dwarkesh Patel (host), Dylan Patel (guest)
Mar 13, 2026 · 2h 30m

CHAPTERS

  1. Hyperscaler CapEx vs real compute coming online: why timelines lag

    Dwarkesh opens by puzzling over how hundreds of billions in hyperscaler CapEx translates into only modest near-term data center power additions. Dylan explains that much of today’s CapEx is actually pre-spend for future years: power deposits, construction lead times, and long-term procurement that enables rapid scaling later. The key idea is that the spend and the compute deployment are offset in time, and much of the system is being built for 2027–2029, not “this year.”

  2. AI lab fundraising, inference demand shock, and scramble for capacity

    The conversation shifts to why labs like OpenAI and Anthropic raise massive sums despite already spending billions on compute. Dylan argues inference demand is compounding so quickly that revenue growth itself requires multi-gigawatt inference expansion, even before training/R&D growth. Anthropic, having been more conservative on early commitments, must now hunt for capacity in less ideal places and structures.

  3. Spot vs long-term GPU deals: why “last-minute compute” is expensive

    Dwarkesh asks what it concretely means to acquire compute in a pinch. Dylan details how shorter contracts roll off, and the buyer willing to pay the most can crowd out others at much higher hourly rates, especially for H100/Hopper. Long-term contracts lock in margin advantages, while late buyers pay spot-like premiums or revenue shares via hyperscalers’ platforms.

  4. Why an H100 can be worth more today than years ago: utility beats specs

    Dwarkesh and Dylan explore GPU depreciation and why older GPUs may not lose value as quickly as bears claim. Dylan contrasts “perf/$ vs new chips” with a second lens: the value generated by running better models, which can increase over time even on the same hardware. As model quality and efficiency improve, the same H100 can produce more valuable tokens than it could years earlier, supporting higher market value in a compute-constrained world.

  5. Nvidia’s early TSMC allocation lock-in and Google’s TPU squeeze

    Dwarkesh challenges why TSMC appears to have ceded so much N3 capacity to Nvidia instead of balancing customers. Dylan argues Nvidia signaled demand earlier with stronger commitments, while some rivals had delays and weaker conviction, leading to a self-reinforcing allocation outcome. He also explains how Google’s TPU capacity was unexpectedly pulled toward external sales (notably Anthropic) before internal demand fully woke up.

  6. Bottlenecks over time: from CoWoS and power back to chips

    Dylan frames bottlenecks as a moving target driven by lead times: short-lead items get fixed faster, pushing constraints upstream to long-lead manufacturing. As AI absorbs more of the overall semiconductor pie, there is less remaining “slack” to reallocate from phones/PCs into AI. The scaling limit becomes the semiconductor production chain itself: fabs, tools, logic wafers, and memory.

  7. ASML as the #1 constraint by 2030: translating EUV tools into gigawatts

    Dylan argues that by late decade, ASML’s EUV tool output becomes the hard ceiling on global compute scaling. He quantifies how many wafers and EUV “passes” are needed per gigawatt of next-gen AI capacity and derives a rough tools-per-GW mapping. This produces a striking conclusion: relatively small amounts of lithography tool capacity can gate tens of billions in downstream data center CapEx and AI value.
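
The tools-per-GW mapping described above is a chain of unit conversions, which can be sketched roughly as follows. This is an illustrative sketch only: the function names, parameter names, and every number in the usage lines are hypothetical placeholders, not figures quoted in the episode.

```python
HOURS_PER_YEAR = 8760


def euv_tools_per_gw(wafers_per_gw, euv_passes_per_wafer,
                     passes_per_tool_hour, utilization):
    """Estimate EUV tools needed to produce one GW of AI capacity per year.

    All inputs are assumptions supplied by the caller:
      wafers_per_gw        - leading-edge wafers consumed per GW deployed
      euv_passes_per_wafer - EUV exposure passes each wafer needs
      passes_per_tool_hour - wafer passes one tool completes per hour
      utilization          - fraction of the year the tool is exposing wafers
    """
    passes_needed = wafers_per_gw * euv_passes_per_wafer
    passes_per_tool_year = passes_per_tool_hour * HOURS_PER_YEAR * utilization
    return passes_needed / passes_per_tool_year


def gw_per_year(tools_available, tools_per_gw):
    """Invert the mapping: GW of capacity a given tool fleet can supply per year."""
    return tools_available / tools_per_gw


# Placeholder inputs, chosen only to exercise the arithmetic.
tools = euv_tools_per_gw(
    wafers_per_gw=100_000,
    euv_passes_per_wafer=20,
    passes_per_tool_hour=100,
    utilization=0.5,
)
print(f"EUV tools per GW per year (placeholder inputs): {tools:.2f}")
print(f"GW per year from 50 such tools: {gw_per_year(50, tools):.2f}")
```

The point of the structure is that the tool count scales linearly with wafers per GW and passes per wafer, so a small change in either assumption moves the implied ceiling on annual compute additions proportionally.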

  8. Why ASML can’t just double output: artisanal complexity and supply-chain inertia

    Dwarkesh asks why ASML can’t simply triple CapEx and produce far more tools. Dylan walks through the extreme complexity of EUV systems—source, optics, stages, metrology—and the long, specialized supply chain behind them, including Zeiss optics and Cymer sources. Even with high demand, scaling production is limited by specialized labor, intricate components, and a historically cautious semiconductor industry that’s reluctant to overbuild into cyclicality.

  9. Can we go back to older fabs (7nm/DUV) if EUV is scarce?

    Dwarkesh proposes a fallback: use older nodes like 7nm via DUV multi-patterning to sidestep EUV constraints. Dylan explains why this underestimates system-level effects: performance isn’t just FLOPS; it’s compounded by networking, memory bandwidth, packaging, and scaling efficiency across many GPUs. He cites large real-world gaps (e.g., Hopper vs Blackwell on optimized inference) that far exceed naive FLOPS ratios due to communication and architecture constraints.

  10. China vs the West: when could scale outweigh node advantage?

    The discussion turns geopolitical: whether China’s scale and verticalization could eventually outproduce the West’s leading-edge advantage. Dylan notes China still relies heavily on imported DUV tools, but expects indigenous DUV by ~2030 and possibly early EUV prototypes—though not mass production. The key uncertainty is production ramp (“production hell”), and how takeoff speed affects whether US-led compute scaling compounds faster than China can catch up.

  11. The enormous incoming memory crunch: HBM vs DDR, consumer squeeze, and pricing spillovers

    Dwarkesh presses on memory as a near-term limiter: HBM is much less bit-dense per wafer than commodity DRAM, making AI’s memory demand exceptionally wafer-hungry. Dylan explains why switching to DDR isn’t a free fix: bandwidth per package-edge area collapses, leaving compute underutilized. They then connect memory price inflation to consumer device costs and volume destruction, especially in low/mid-range phones, as memory vendors prioritize AI willingness-to-pay.

  12. Why power in the US won’t be the limiting factor: many supply-chain “escape hatches”

    Dylan argues power is scalable relative to chip manufacturing because generation options are diverse and supply chains are less specialized. Beyond combined-cycle turbines, providers can deploy aero-derivatives, reciprocating engines, ship engines, fuel cells, and solar-plus-storage, and can also “unlock” grid capacity by addressing peak-load requirements. Labor and interconnection are real frictions, but the market can pay to overcome them, especially when power costs are a small fraction of total GPU TCO.
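
The claim that power is a small fraction of GPU TCO can be checked with simple arithmetic once the inputs are known. The sketch below assumes straight-line amortization of hardware cost and a flat electricity price; the function name, its parameters, and the example values are hypothetical placeholders, not numbers from the episode.

```python
HOURS_PER_YEAR = 8760


def power_share_of_tco(capex_per_gpu_usd, amortization_years,
                       power_kw_per_gpu, usd_per_kwh,
                       other_opex_per_year_usd=0.0):
    """Return electricity's share of annual per-GPU total cost of ownership.

    Assumes straight-line amortization and continuous operation at
    power_kw_per_gpu (all-in draw per GPU, including facility overhead).
    """
    annual_capex = capex_per_gpu_usd / amortization_years
    annual_power = power_kw_per_gpu * HOURS_PER_YEAR * usd_per_kwh
    total = annual_capex + annual_power + other_opex_per_year_usd
    return annual_power / total


# Placeholder inputs, chosen only to exercise the arithmetic.
base = power_share_of_tco(40_000, 4, 1.0, 0.10)
doubled = power_share_of_tco(40_000, 4, 1.0, 0.20)
print(f"Power share of TCO at base price: {base:.1%}")
print(f"Power share of TCO at double the price: {doubled:.1%}")
```

Because hardware amortization dominates the denominator under inputs like these, doubling the electricity price less than doubles power's share of TCO, which is the intuition behind paying a premium for faster power.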

  13. Why space GPUs aren’t happening this decade: reliability, networking, and time-to-token

    Dwarkesh asks about Musk’s “space data centers” idea; Dylan is skeptical for near-term economics and engineering. He emphasizes that in a chip-constrained world, the dominant objective is minimizing time from chip manufacture to useful tokens, and space adds months plus servicing challenges. Space also makes networking vastly harder and more expensive, while GPU and optical component failure rates already demand constant hands-on remediation on Earth.

  14. Why aren’t more hedge funds making the AGI/infra trade? Market disbelief and interpretation gaps

    Dwarkesh challenges why obvious-seeming supply-chain signals (memory, fabs, tools) aren’t universally traded. Dylan claims many actors still believe the projections are “too high,” even after repeated upward revisions, and that conviction about AI’s economic takeoff is the gating factor. SemiAnalysis data makes the baseline more accurate, but profits accrue to those willing to interpret it with strong directional bets and sizing.

  15. TSMC, Apple, N2 allocation, Huawei counterfactual, robots, and Taiwan risk

    The episode closes with a cluster of strategic questions: whether TSMC could displace Apple for N2, what it means that AI becomes the first customer for future nodes, and how Huawei might compare if it had leading-edge access. Dylan also discusses robots as a competing sink for scarce leading-edge chips, and ends on Taiwan risk—arguing that losing Taiwan would collapse incremental compute growth regardless of evacuated talent because capacity replication takes years.
