Dylan Patel — The Single Biggest Bottleneck to Scaling AI Compute

Dwarkesh Podcast · Mar 13, 2026 · 2h 31m

Dwarkesh Patel (host), Dylan Patel (guest)

Topics: Capex timing vs deployed compute · Long-term GPU rental contracts and spot pricing · Why H100 can appreciate in value · TSMC allocation dynamics and Nvidia prebooking · ASML EUV tooling as end-of-decade bottleneck · EUV passes per gigawatt and tool throughput math · Older-node fallback limits (7nm, DUV multipatterning) · Scale-up domains (NVL72 vs TPU pods) and topology · HBM vs DDR bandwidth/shoreline constraints · Consumer electronics demand destruction from memory prices · Power scaling via behind-the-meter and alternative generation · China semiconductor indigenization timelines (DUV/EUV) · Space data centers skepticism (deployment latency, failures, networking) · Taiwan risk and global capacity fragility

In this episode of the Dwarkesh Podcast, Dwarkesh Patel talks with SemiAnalysis CEO Dylan Patel about the single biggest bottleneck to scaling AI compute: where the binding constraints actually sit across chips, tools, and memory.

AI compute scaling hits bottlenecks in chips, tools, and memory

Hyperscaler AI capex largely pre-pays long-lead items like turbines, land, and construction years ahead, so spend today does not translate linearly into gigawatts online this year.

GPU economics are shifting because chip value is increasingly set by the utility of today’s best models under scarcity, making older GPUs (e.g., H100) potentially more valuable now than years ago despite newer generations.

By the late 2020s the binding constraint to AI compute is expected to move upstream to semiconductor manufacturing capacity—ultimately ASML EUV tools and their intricate supplier network—rather than power or data center construction.

Falling back to older nodes (e.g., 7nm) is not a clean escape hatch because modern performance depends heavily on system-level factors like interconnect, memory bandwidth, and packaging, not just raw FLOPS or node shrink.

A major memory crunch (HBM/DRAM) is projected to raise prices, reallocate wafers away from consumer devices, and potentially shrink low-end phone/PC volumes while AI captures a growing share of global memory output.

Key Takeaways

Capex is increasingly about reserving future constraints, not just buying GPUs.

Hyperscalers spend heavily on deposits for turbines, power agreements, land, and multi-year buildouts; a meaningful portion of “this year’s capex” is actually securing 2027–2029 capacity.

Early compute commitments create durable margin advantages.

Five-year GPU contracts lock in pricing before scarcity reprices compute to model value; late buyers face higher spot/shorter-term rates or revenue-share markups through cloud channels.
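A toy comparison makes the spread concrete. This is a minimal sketch; all of the prices and the markup below are placeholder assumptions, not figures from the episode.

```python
# Toy comparison of early long-term contracts vs. late spot/channel pricing.
# All prices and the markup are placeholder assumptions, not episode figures.

HOURS_PER_YEAR = 8760

def total_cost(rate_per_gpu_hour: float, gpus: int, years: float) -> float:
    """Total rental cost in dollars for a fleet at a flat hourly rate."""
    return rate_per_gpu_hour * gpus * HOURS_PER_YEAR * years

gpus = 10_000
early_rate = 2.00      # $/GPU-hr, locked in before scarcity (assumed)
late_spot_rate = 3.20  # $/GPU-hr, repriced to model value (assumed)
channel_markup = 1.25  # revenue-share markup via cloud channels (assumed)

early = total_cost(early_rate, gpus, 5)
late_spot = total_cost(late_spot_rate, gpus, 5)
late_channel = total_cost(late_spot_rate * channel_markup, gpus, 5)

print(f"early 5-yr contract: ${early / 1e9:.2f}B")
print(f"late spot:           ${late_spot / 1e9:.2f}B")
print(f"late via channel:    ${late_channel / 1e9:.2f}B")
```

Under these assumed rates, the late buyer pays roughly twice what the early contract holder does over five years, which is the durable margin gap the takeaway describes.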

GPU “depreciation” can invert under compute scarcity.

If demand is constrained by supply, pricing is anchored to the value generated by current frontier models; as models become cheaper/faster to serve, an H100 can produce more valuable tokens than it could years earlier.
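A minimal sketch of that inversion; the throughput and token-value numbers are invented purely for illustration.

```python
# Sketch of the depreciation-inversion argument. Throughput and token-value
# numbers below are invented for illustration only.

def gpu_value_per_hour(tokens_per_hour: float, value_per_token: float) -> float:
    """Dollar value an accelerator produces per hour of serving."""
    return tokens_per_hour * value_per_token

# Three years ago: weaker models, less optimized serving stacks.
then = gpu_value_per_hour(tokens_per_hour=0.5e6, value_per_token=2e-6)

# Today: the same H100 serves better models faster, and under supply
# scarcity pricing stays anchored to what frontier-model tokens are worth.
now = gpu_value_per_hour(tokens_per_hour=2.0e6, value_per_token=3e-6)

print(f"then: ${then:.2f}/hr   now: ${now:.2f}/hr")  # same chip, more value
```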

Nvidia’s share at TSMC reflects earlier, firmer demand signals than rivals.

TSMC allocates based on credible long-term commitments and supply-chain readiness; Nvidia secured capacity earlier while some custom silicon programs faced delays, leaving them squeezed later.

By ~2028–2030, EUV tools (ASML) become the ‘lowest rung’ constraint.

Even aggressive expansion only raises EUV shipments to ~100 tools/year by decade end, and the ecosystem’s scaling is limited by highly specialized sub-suppliers and long ramp times.

A simple EUV accounting links tools to AI gigawatts.

Using Rubin-like assumptions, ~2M EUV ‘passes’ per critical-IT gigawatt implies roughly ~3.5 EUV tools per gigawatt of new AI capacity.
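The arithmetic can be checked directly from the episode's round numbers (2M passes per gigawatt, 3.5 tools per gigawatt, ~100 tools per year):

```python
# Back-of-envelope EUV accounting using the round numbers from the episode.

passes_per_gw = 2_000_000  # EUV exposures per critical-IT GW (episode figure)
tools_per_gw = 3.5         # EUV tools per critical-IT GW (episode figure)
tools_per_year = 100       # optimistic ASML shipment rate by decade end

# Implied sustained throughput each tool must deliver:
passes_per_tool_per_year = passes_per_gw / tools_per_gw
print(f"~{passes_per_tool_per_year / 1e3:.0f}k exposures per tool per year")

# Ceiling on new AI capacity addable per year of EUV shipments:
gw_per_year = tools_per_year / tools_per_gw
print(f"~{gw_per_year:.0f} incremental GW/year supported by new tools")
```

At ~100 tools per year, new EUV shipments alone cap incremental AI capacity near ~29 GW/year under these assumptions, which is why the tool fleet becomes the lowest rung.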

Older-node fallback is possible but far less powerful than it looks on FLOPS charts.

Modern gains compound from packaging, interconnect, memory bandwidth, and larger scale-up domains; cross-chip and cross-rack communication penalties mean node regressions degrade real throughput far more than nominal FLOPS suggest.
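A toy model shows why the delivered gap exceeds the nominal one; the efficiency factors here are invented, since the episode makes this argument only qualitatively.

```python
# Toy model of why a node regression hurts delivered throughput more than
# the nominal FLOPS delta. Efficiency factors are invented for illustration.

def effective_throughput(peak_flops: float, mfu: float) -> float:
    """Delivered throughput = peak FLOPS x model FLOPS utilization (MFU)."""
    return peak_flops * mfu

# Leading-edge part: large scale-up domain, high HBM bandwidth -> high MFU.
modern = effective_throughput(peak_flops=1.0, mfu=0.45)

# 7nm fallback: per-chip FLOPS drop, and smaller scale-up domains plus lower
# memory bandwidth push more traffic over slow links, so MFU falls too.
fallback = effective_throughput(peak_flops=0.5, mfu=0.20)

print(f"nominal FLOPS ratio:  {0.5 / 1.0:.0%}")          # 50%
print(f"delivered ratio:      {fallback / modern:.0%}")  # ~22%
```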

Memory (HBM/DRAM) is an enormous and underappreciated limiter.

HBM consumes far more wafer area per bit and shifts the key metric to bandwidth-per-wafer; the result is rising DRAM costs, pressure on phone/PC BOMs, and potential volume declines in low/mid-range devices.
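A rough illustration of the bandwidth-per-wafer framing; all device parameters below are placeholder assumptions, not figures from the episode.

```python
# Illustrative "bandwidth per wafer" comparison. All device parameters are
# rough placeholders, not figures from the episode.

DIES_PER_WAFER = 600        # usable DRAM dies per 300mm wafer (assumed)

# HBM: many dies are stacked to feed one wide, fast interface.
hbm_dies_per_stack = 12     # 12-high stack (assumed)
hbm_bw_per_stack_gbs = 1200  # ~1.2 TB/s per stack, HBM3E-class (assumed)
hbm_bw_per_wafer = DIES_PER_WAFER / hbm_dies_per_stack * hbm_bw_per_stack_gbs

# DDR: each die serves a much narrower, slower channel.
ddr_bw_per_die_gbs = 8      # effective GB/s attributable per die (assumed)
ddr_bw_per_wafer = DIES_PER_WAFER * ddr_bw_per_die_gbs

print(f"HBM: ~{hbm_bw_per_wafer / 1e3:.0f} TB/s of bandwidth per wafer")
print(f"DDR: ~{ddr_bw_per_wafer / 1e3:.1f} TB/s per wafer")
```

Under these assumptions a wafer sold as HBM delivers an order of magnitude more bandwidth than one sold as commodity DRAM, which is why AI demand pulls wafers away from consumer devices.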

Power is hard but likely not the long-run binding constraint.

Behind-the-meter generation (reciprocating engines, aero-derivatives, ship engines), fuel cells, solar+battery, and grid-peak smoothing can unlock hundreds of gigawatts; energy cost increases are small relative to GPU TCO under high model value.
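A quick check of the energy-versus-TCO claim, using the episode's ~$10–13B per gigawatt-year rental figure; the power price is an assumption.

```python
# Rough check that energy cost is small next to GPU TCO, using the episode's
# ~$10-13B per GW-year rental figure. The electricity price is assumed.

HOURS_PER_YEAR = 8760

rental_cost_per_gw_year = 13e9  # $/GW-year (upper end of episode figure)
power_price_per_mwh = 120       # $/MWh, pricey behind-the-meter (assumed)

# 1 GW = 1,000 MW running all year: MW * hours * $/MWh = dollars.
energy_cost = 1_000 * HOURS_PER_YEAR * power_price_per_mwh
share = energy_cost / rental_cost_per_gw_year

print(f"energy: ${energy_cost / 1e9:.2f}B/yr vs rental "
      f"${rental_cost_per_gw_year / 1e9:.0f}B/yr")
print(f"energy is ~{share:.0%} of compute rental cost")
```

Even doubling the assumed power price only moves energy from roughly 8% to 16% of the rental cost, which is why expensive behind-the-meter generation still pencils out when model value is high.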

Space GPUs are not a near-term answer in a chip-constrained world.

Launch/deployment delays reduce the most valuable ‘early’ compute time, hardware failures require hands-on servicing, and inter-satellite networking is mismatched to rack-scale bandwidth needs; chips should produce tokens immediately on Earth.

China’s long-timeline advantage hinges on semiconductor verticalization.

China is likely to fully indigenize DUV by ~2030 and may have working EUV prototypes, but mass production is harder; slow AI timelines increase the chance China scales past the West in manufacturing capacity.

Taiwan disruption would crash global compute expansion regardless of engineer airlifts.

Even preserving know-how doesn’t replace destroyed fab capacity quickly; without Taiwan, the world’s ability to add compute could drop from hundreds of GW/year trajectories to a small fraction, reshaping GDP and AI progress.

Notable Quotes

An H100 is worth more today than it was three years ago.

Dylan Patel

By ’28, ’29, the bottleneck falls to the lowest rung on the supply chain, which is ASML.

Dylan Patel

It might be a hundred billion dollars worth of AI value… held up by this one point two billion dollars worth of tooling.

Dylan Patel

If takeoff or timelines are slow enough, then certainly China… [can] catch up drastically.

Dylan Patel

Space data centers… are not this decade.

Dylan Patel

Questions Answered in This Episode

You estimate ~3.5 EUV tools per critical-IT gigawatt for Rubin-class systems—how sensitive is that ratio to packaging changes, chiplets, and shifts in numerics (FP4/FP6/FP8)?

What are the most realistic ways ASML/Zeiss could increase EUV output beyond ~100 tools/year: workforce scaling, modularization, design simplification, or parallel supplier lines?

If memory is headed for a multi-year crunch, which end markets get “sacrificed” first—low-end smartphones, PCs, consoles, automotive—and what leading indicators would confirm demand destruction is underway?

For labs like Anthropic that must add inference GW fast, what’s the practical cost difference between (a) last-minute neocloud capacity, (b) revenue-share via Bedrock/Vertex/Foundry, and (c) long-term direct leasing?

You argue older nodes won’t save us because system effects dominate—what specific architectural features (NVLink scale-up, HBM bandwidth, packaging) are hardest to replicate on 7nm at scale?

Transcript Preview

Dwarkesh Patel

All right, this is the episode where my roommate teaches me semiconductors. [laughs]

Dylan Patel

[laughs] It's also the send-off for this, uh, this current set.

Dwarkesh Patel

It's fi-- yeah, you're-- you know, after you use it, I'm like, "I can't use this again." [laughs]

Dylan Patel

[laughs] Oh.

Dwarkesh Patel

I gotta get out of here.

Dylan Patel

No, no sloppy seconds for Dwarkesh. [laughs]

Dwarkesh Patel

[laughs] Okay, Dylan is the, uh, CEO of SemiAnalysis. Dylan, the burning question I have for you: if you add up the big four, Amazon, Meta, Google, Microsoft, their combined forecasted capex that you published recently this year is six hundred billion dollars. And given, you know, yearly prices of renting that compute, that would be, like, close to fifty gigawatts. Now, obviously, we're not putting on fifty gigawatts this year, so presumably that's paying for compute that is gonna be coming online over the coming years. So I have a question about how to think about the timeline around when that capex comes online. Similar question for the labs, where, you know, OpenAI just announced that they raised a hundred and ten billion dollars. Anthropic just announced they raised thirty billion dollars. And if you look at the compute that they have coming online this year, um, you should tell me how much it is, but is it not on the order of four gigawatts total that they'll have this year? It feels like the cost to rent the compute that OpenAI and Anthropic will have this year, to, like, sustain their compute spend at, you know, ten, thirteen billion dollars a gigawatt, those individual raises alone are, like, enough to cover their compute spend for the year, and this is not even including the revenue that they're gonna earn this year. So help me understand: first, when is the timescale at which the big tech capex is actually coming online? And two, what are the labs raising all this money for if, like, the yearly price of a one gigawatt data center is, like, thirteen billion dollars?

Dylan Patel

So when you talk about the capex of these hyperscalers, right, on the order of six hundred billion dollars, and you look across the rest of the supply chain, it gets you to on the order of a trillion dollars. A portion of this is, you know, immediately for compute going online this year, right? The chips and the, uh, other parts of capex that do get paid this year. But there's a lot of setup capex as well, right? So when we're talking about twenty gigawatts this year in America, roughly-

Dwarkesh Patel

Incremental

Dylan Patel

... incremental added capacity, a portion of this is not spent this year. A portion of that capex is actually spent the prior year. And so when you look at, hey, Google's got a hundred and eighty billion dollars, actually a big chunk of that is spent on turbine deposits for '28 and '29. A chunk of that is spent on data center construction for '27. A chunk of that is spent on, you know, power purchasing agreements and down payments and all these other things that they're doing, uh, for further out into the future so that they can set up this super fast scaling, right? And this applies to all the hyperscalers and other people in the supply chain. And so, you know, twenty gigawatts roughly deployed this year, a big chunk of that being hyperscalers, a chunk not being. And all of these companies, their biggest customers are Anthropic and OpenAI. Um, Anthropic and OpenAI are in the, you know, two gigawatt, two and a half gigawatt, one and a half gigawatt range roughly right now. They're trying to scale to much larger, right? If you look at what Anthropic has done over the last few months, you know, four billion, six billion revenue added, and if we just draw a straight line, hey, yeah, they'll add another six billion dollars of revenue a month. Uh, people would argue that's bearish and that they should go faster. What that implies is that they're gonna add sixty billion dollars of revenue across the next ten months, right? And sixty billion dollars of revenue at the current gross margins that Anthropic had, at least as last reported by media, um, would imply that they have, you know, roughly forty billion dollars of compute spend for that inference, for that sixty bill of revenue. That forty billion of compute at roughly ten billion dollars a gigawatt, um, rental cost means that they need to add four gigawatts of inference capacity just to grow revenue, and that's saying that their research and development training fleet stays flat, right? So, you know, in a sense, Anthropic needs to get to well above five gigawatts by the end of this year, and it's gonna be really tough for them to get there, but it's possible.
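Dylan's back-of-envelope can be reproduced in a few lines, using the numbers exactly as he states them (the ~$40B of compute on $60B of revenue reflects his gross-margin assumption, not a reported figure):

```python
# Re-running Dylan's back-of-envelope from the transcript, with his numbers.

revenue_added_per_month = 6e9  # $/month, straight-line extrapolation
months = 10
new_revenue = revenue_added_per_month * months  # $60B over ten months

compute_spend_for_inference = 40e9  # $ implied by his gross-margin assumption
rental_cost_per_gw_year = 10e9      # $/GW-year rental cost (his figure)

inference_gw_needed = compute_spend_for_inference / rental_cost_per_gw_year
print(f"new revenue: ${new_revenue / 1e9:.0f}B")
print(f"inference capacity to add: ~{inference_gw_needed:.0f} GW")
# -> ~4 GW of new inference, holding the training/R&D fleet flat.
```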
