Dwarkesh Podcast
Dylan Patel on Dwarkesh Patel: How EUV Tools Cap AI by 2030
Carl Zeiss optics and specialized mirror stacks bottleneck ASML output; CoWoS deposits and turbine delays add years between capex and delivered compute.
EVERY SPOKEN WORD
150 min read · 30,028 words
- 0:00 – 24:52
Why an H100 is worth more today than 3 years ago
- DPDwarkesh Patel
All right, this is the episode where my roommate teaches me semiconductors. [laughs]
- DPDylan Patel
[laughs] It's also the send-off for this, uh, this current set.
- DPDwarkesh Patel
It's fi-- yeah, you're-- you know, after you use it, I'm like, "I can't use this again." [laughs]
- DPDylan Patel
[laughs] Oh.
- DPDwarkesh Patel
I gotta get out of here.
- DPDylan Patel
No, no sloppy seconds for Dwarkesh. [laughs]
- DPDwarkesh Patel
[laughs] Okay, Dylan is the, uh, CEO of SemiAnalysis. Dylan, the burning question I have for you, um, if you add up the big four, Amazon, Meta, Google, Microsoft, their combined, uh, forecasted capex that y-you published recently this year is six hundred billion dollars. And given, uh, y-you know, yearly prices of renting that compute, that would be, like, close to fifty gigawatts. Now, obviously, we're not putting on fifty gigawatts this year, so presumably that's paying for compute that is gonna be coming online over the coming years. So I have a question about w-what-- how to think about the timeline ar-a-around when that capex comes online. Similar question for the labs, where, you know, OpenAI just announced that they raised a hundred and ten billion dollars. Anthropic just announced they raised thirty billion dollars. And if you look at the compute that they have coming online this year, um, you, you should tell me how much it is, but, like, is it not another four gigawatts total that they'll have this year? It feels like the cost to rent the compute that OpenAI and Anthropic will have this year to, like, sustain their compute spend at, you know, ten, thirteen billion dollars a gigawatt, th-those individual raises alone are, like, enough to cover their compute spend for the year, and then this is not even including the revenue that they're gonna earn this year. So help me understand first, what is the timescale at which the big tech capex is actually coming online? And two, what are the labs raising all this money for if, like, the, the yearly price of a, a one gigawatt data center is, like, thirteen billion dollars?
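A rough sketch of the arithmetic behind this question, treating $12B as a midpoint of the ten-to-thirteen-billion-dollar-per-gigawatt rental range quoted above (our assumption, not a reported figure):

```python
# Back-of-envelope: how many gigawatts would $600B of annual capex imply
# if all of it were spent renting compute at quoted prices?
big_four_capex = 600e9       # combined forecasted capex: Amazon, Meta, Google, Microsoft ($)
rental_per_gw_year = 12e9    # assumed midpoint of the $10-13B per gigawatt-year range

implied_gw = big_four_capex / rental_per_gw_year
print(f"Implied capacity: {implied_gw:.0f} GW")  # ~50 GW, far above what comes online this year
```

The gap between the ~50 GW implied here and the roughly 20 GW actually added is what Dylan attributes below to setup capex: deposits and construction paid now for capacity delivered in later years.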
- DPDylan Patel
So when you talk about the capex of these hyperscalers, right, on the order of six hundred billion dollars, and you look across the rest of the supply chain, it gets you to on the order of a trillion dollars. A portion of this is, you know, immediately for compute going online this year, right? The chips and the, uh, the, the other parts of capex that do get paid this year. But there's a lot of setup capex as well, right? So when we have-- when we're talking about twenty gigawatts this year in America, roughly-
- DPDwarkesh Patel
Incremental
- DPDylan Patel
... incremental added capacity, a portion of this is not spent this year. A portion of that capex is actually spent the prior year. And so when you look at, hey, Google's got a hundred and eighty billion dollars, actually a ch- big chunk of that is spent on turbine deposits for '28 and '29. A chunk of that is spent on data center construction for '27. A chunk of that is spent on, you know, power purchasing agreements and down payments and all these other things that they're doing, uh, for further out into the future so that they can set up this super fast scaling, right? And, and, and this applies to all the hyperscalers and other people in the supply chain. And so, you know, twenty gigawatts roughly deployed this year, um, a big chunk of that being hyperscalers, chunk not being... And all of these companies, their biggest customers are Anthropic and OpenAI. Um, Anthropic and OpenAI are in the, you know, two gigawatt and, you know, two and a half gigawatt and one and a half gigawatts roughly right now. They're trying to scale to much larger, right? If you look at what Anthropic has done over the last few months, you know, four billion, six billion revenue added, and if we just draw a straight line, hey, yeah, they'll add another six billion dollars of revenue a month. Uh, people would argue that's bearish and that they should go faster. What that implies is that they're gonna add sixty billion dollars of revenue across the next ten months, right? And sixty billion dollars of revenue at the current gross margins that Anthropic had at least last, uh, reported by media, um, would imply that they have, you know, roughly forty billion dollars of compute spend for that inference, for that sixty bill of revenue. That forty billion of compute at roughly ten billion dollars a gigawatt, um, rental cost means that they need to add four gigawatts of inference capacity just to grow revenue, and that's saying that their research and development training fleet stays flat, right? So, you know, in a sense, Anthropic needs to get to well above five gigawatts by the end of this year, and it's gonna be really tough for them to get there, but it's possible.
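Dylan's Anthropic math, spelled out as a minimal sketch. The monthly revenue add, implied gross margin, and rental price are his rough figures from above, not company disclosures:

```python
# Anthropic back-of-envelope: revenue growth -> inference capacity needed.
monthly_revenue_add = 6e9     # ~$6B of new revenue per month, straight-line extrapolation
months = 10                   # remainder of the year
gross_margin = 1 / 3          # implied by ~$40B of compute cost on $60B of revenue

new_revenue = monthly_revenue_add * months          # ~$60B
inference_spend = new_revenue * (1 - gross_margin)  # ~$40B of compute for that revenue
rental_per_gw_year = 10e9                           # ~$10B per gigawatt-year rental cost

extra_gw = inference_spend / rental_per_gw_year
print(f"+${new_revenue/1e9:.0f}B revenue -> ${inference_spend/1e9:.0f}B compute -> +{extra_gw:.0f} GW")
# ~4 GW of new inference capacity, holding the research and training fleet flat.
```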
- DPDwarkesh Patel
Can, can I ask a question about that? So-
- DPDylan Patel
Yeah
- DPDwarkesh Patel
... um, if Anthropic was not on track to have five gigawatts by the end of, end of this year, but it needs that to serve both the n-revenue that's gone crazier than expected, and maybe it's gonna be even more than that, plus the research and training to make sure its models are good enough for next year, how, how-- where is that gonna come from?
- DPDylan Patel
You know, Dario, when he was on your podcast, was very, very, like, conservative. He's like, "You know, I'm not gonna go crazy on compute because if my revenue inflects at a different rate, at a different point, I don't wanna go bankrupt. You know, I wanna make sure that we're being responsible with this scaling." But in reality, you know, he's definitely missed the pooch in terms of, like, going like OpenAI, which was, "Let's just sign these crazy fucking deals," right? Um, and OpenAI has k-kind of got way more access to compute than Anthropic by the end of the year. And so what does Anthropic have to do to get the compute? Well, they have to go to lower quality providers that they would not have gone to before, right? You know, optimally, you know, Anthropic, at least historically, has had the best quality providers, being Google and Amazon. Um, whereas, you know, at least historically minded, you know, the biggest companies in the world, um, now Microsoft, and now they're expanding across the supply chain and going to other players that are newer. Um, OpenAI has been, you know, a bit more aggressive on going to many players. Yes, they have tons of capacity from Microsoft. Uh, they have Google and Amazon as well, but they also have, like, tons with CoreWeave and Oracle, and they've gone to, like, random companies, or, you know, one would think random companies like SoftBank Energy, who has never built a data center in their life, but, you know, they're building data centers now for OpenAI. So they've gone to, and, and, and many others like Nscale and others, um, that they're going and getting capacity from. And so there's this, like, conundrum for Anthropic because they were so conservative on compute, um, because they didn't wanna go crazy, right? Anthro-- Oh, in, in some sense, a lot of the financial freak outs in the second half of last year were like, OpenAI signed all these deals, but they don't have the money to pay for them. Um, okay, Oracle stock's gonna tank. Oh, okay, CoreWeave stock's gonna tank. Oh, okay, like, you know, all these companies' stocks tanked, um, and credit markets went crazy because people were like, "The end buyer can't pay for this." Now it's like, "Oh, wait, they raised a ton of money. Okay, fine, they can pay for it." But in a sense, Anthropic was a lot more conservative. They were like, "We'll sign contracts, but we'll be principled, um, and we'll purposely undershoot what we think we can possibly do, um, and be conservative because we don't wanna potentially go bankrupt."
- DPDwarkesh Patel
But the thing I don't understand is, so in a-- But what, what, what does it mean to have to acquire compute in a pinch? Um, is it that you have to go with, like, neoclouds that-- Is it that they have worse compute or so- Like, in what way is it worse? And is it that you had to pay gross margins to a cloud provider that you wouldn't have otherwise had to pay to because you're coming in at the last minute? Who built the spare capacity such that i-it's available for Anthropic and OpenAI to get last minute? And, like, basically, what is the concrete advantage that OpenAI has gotten if they end up at similar compute numbers by twenty twenty-seven? Um, is it just, like, they're gonna end this year with different gigawatts? If so, h-how many gigawatts is Anthropic and OpenAI gonna have by the end of this year?
- DPDylan Patel
Yeah. So to, to acquire excess compute, I mean, yes, there is capacity at hyperscalers that, um... And not all contracts for compute are long-term, right? Five years, right? There's compute, H100s, that in twenty twenty-three or twenty twenty-four or twenty twenty-five was signed at not five-year deals, right? OpenAI, the vast majority of their compute is signed at five-year deals. But they can-- You know, there were, there were many other customers that had one-year, two-year, three-year deals, six-month deals, on demand. And as these contracts roll off, who is the participant in the market most willing to pay the price? Um, and in this sense, right, we've seen H100 prices inflect a lot and go up, and people willing to sign long-term deals for, you know, at above two dollars even, right? Like, I've seen deals where certain AI labs, I'm gonna be a little bit, uh, vague here for a reason, uh, have signed at as high as two dollars and forty cents for two to three years for H100s, which, if you think about the margin, a dollar forty f-for Hopper when you release it, uh, for Hopper to build it, um, across five years. And now two years in, you're signing deals that are two to three years that are at two dollars and forty cents. Those margins are way higher, right?
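The margin math Dylan is gesturing at, as a sketch; both hourly figures are the rough numbers quoted above, with the $1.40 treated as the all-in, five-year-amortized cost:

```python
# H100 rental economics on the deals described above.
all_in_cost_per_hr = 1.40  # rough cost to build and run Hopper, amortized over five years
new_deal_rate = 2.40       # observed 2-3 year contract pricing, two years into the asset's life

margin = (new_deal_rate - all_in_cost_per_hr) / new_deal_rate
print(f"Implied gross margin: {margin:.0%}")
# ~42%, and on hardware that is already two years depreciated, so the
# marginal economics on the remaining life are even better than this suggests.
```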
- DPDwarkesh Patel
Insane.
- DPDylan Patel
And so now you can crowd out all of these other suppliers, whether it's Amazon had these or CoreWeave had these, or Together AI or Nebius or whoever it is, right? You know, the, the-- Or these, these neoclouds are the firms that had a higher percentage of Hopper in general because they were more aggressive on it, A. And B, they tended to sign shorter-term deals. You know, not CoreWeave, but the others tended to sign shorter-term deals. And so, hey, if I want Hopper, there is some capacity out there. And then also, while most of the capacity at, like, an Oracle or a CoreWeave is signed for a long-term deal in terms of Blackwell, uh, anything that's going online this quarter is already sold. Um, and, and in some cases, they're not even hitting all the numbers that they promised they would sell because there are some data center delays, not just those two, but, like, Nebius and all the other folks, M-Microsoft, Amazon, Google. But there is a lot of neoclouds, as well as some of the hyperscalers who have capacity they're building that they did not sell yet, or capacity that they were gonna allocate to some internal use, uh, that is not necessarily super A-AGI focused that they may now turn around and sell.
- DPDwarkesh Patel
Interesting.
- DPDylan Patel
Or they may, you know, in the case of Anthropic, they don't have to have all the compute directly, right? Amazon can have the compute. They can serve Bedrock, or Google can have the compute and serve Vertex, or Microsoft can have the compute and serve Foundry and then do a revenue share with Anthropic or vice versa.
- DPDwarkesh Patel
Okay. Basically, you're saying Anthropic is having to pay either this, like, fifty percent markup in the sense of the revenue share or in the sense of last-minute spot compute that they wouldn't have otherwise had to pay had they bought the compute early.
- DPDylan Patel
Right. And, and, you know, there's a trade-off there. Uh, but also at the same time, um, you know, for a solid, like, four months, everyone was like, "OpenAI, we're not gonna sign deals with you."
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Like, that sounds crazy, right? Because you guys don't have the money. Now everyone's like, "Yeah, OpenAI, we believed you the whole time. We can, we can sign any deal 'cause you've raised all this money." Um, but i-in a sense, O-Open-- Anthropic is constrained in that sense. Um, there are not that many incremental buyers of compute yet because Anthropic hit the capability tier first where their revenue is mooning.
- DPDwarkesh Patel
Oh, th-that's interesting. Like, the, the-- That's, uh, this... You know, 'cause otherwise you were like, well, having the best model is an extremely depreciating asset that, you know, three months later you don't have the best model. But, like, the reason it's important is that you can sign these deals and then lock in the compute in advance, get better prices. Um, doesn't this also imply, by the way, and maybe this is an obvious point, but there's, at least until recently, people had made this huge point about, oh, what is the depreciation cycle of a GPU? And the bears, the, like Michael Burrys or whatever, have said, "Look, people are saying that four or five years for these GPUs." And in fact, if you... Uh, uh, maybe it's because the technology's improving so fast or whatever. In fact, it makes sense to have two-year depreciation cycles for these GPUs, which increases the sort of, like, reported amortized CapEx in a given year, uh, and so makes it maybe financially less lu-lu-lucrative to build all these clouds. But in fact, you were pointing at, like, m-maybe the depreciation cycle is even longer than five years, 'cause if we're using Hoppers, and then e-especially if AI really takes off and in twenty-thirty we're like, "Fuck, we gotta, like, get the seven-nanometer fabs up, and we gotta, like, you know, we gotta go back to the A100s," [chuckles] like, turn on the A100s again, uh, then it's like actually the depreciation cycle, cycle is incredibly long. And, um, uh, so I, I feel like that's an interesting financial implication of-
- DPDylan Patel
Yeah, there's-
- DPDwarkesh Patel
... what you're saying.
- DPDylan Patel
There's a few, um, strings to pull on there. One is, um, what happens to depreciation of GPUs, right? Um, and, and I guess I didn't answer your prior question, which is, like, Anthropic I think will be able to get to, like, five gigawatts-ish, maybe a little bit more by the end of the year, through themselves as well as their product being served through Bedrock or through-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... Vertex or through Foundry. Uh, I think they'll be able to get to five or six gigawatts, uh, which is way above their, like, initial plans, right?
- 24:52 – 34:34
Nvidia secured TSMC allocation early; Google is getting squeezed
- DPDwarkesh Patel
let's get into logic and memory. Um, it-- how specifically NVIDIA has been able to lock up so much of both. So if you-- I think according to your numbers, by '27, NVIDIA is gonna have like seventy-plus percent of N3 wafer capacity or something like that, um, uh, or, or around that area. And then I, I, I forget what the numbers were for memory at SK hynix and Samsung and so forth. But, um, if you look at-- So think about how the neocloud business works and how NVIDIA works with that, or how the, uh, RL environment business works and how Anthropic works with that. In both those cases, NVIDIA is purposely trying to fracture the complementary industry to make sure that they have as much leverage as possible, so they're giving, you know, a-allocation to random neoclouds to make sure that there's not one person that has all the compute. Similarly, Anthropic or OpenAI, when they're working with the data providers, they say, "No, we're gonna just seed a huge industry of th-these things so that, um, we're not locked into any one supplier for, uh, for data environments." And I wonder why on the three-nanometer process th-that's gonna be Trainium 3, that's gonna be TPU v7, uh, other accelerators potentially, and why is TSMC just giving it all up to NVIDIA rather than, you know, trying to fracture the market?
- DPDylan Patel
Yeah. So I think, um, there's a couple like points here, right? Um, on three nanometer, you know, if we go back to last year, the vast majority of three nanometer was Ap- Apple.
- DPDwarkesh Patel
Right.
- DPDylan Patel
Right? Apple's being moved to two nanometer. Memory prices are going up, so Apple's volumes may go down, right? Because as memory prices go up, they have to-- either they cut margin or they, uh, move, move on. You know, there, there's some time lag 'cause they have long-term contracts. But basically, Apple likely reduces demand/moves to two nanometer faster, where two nanometer is only capable of sort of mobile chips today. Um, and in the future, AI chips will move there, so sort of Apple has that. And then Apple's also talking to, uh, third-party vendors because they're getting squeezed out of TSMC a little bit, um, because TSMC's margins on high-performance computing, um, HPC, AI chips, et cetera, are higher than they are for mobile, um, because they have a bigger advantage in mobile, um, sorry, in HPC than they do in mobile. But anyways, when you look at what's, what's TSMC's calculus here, actually they're providing really good, um, allocations to companies that are doing CPUs, right? So when you think about, hey, Amazon has Trainium and Amazon has, uh, Graviton, both of those are on three nanometer, Graviton being their CPU, Trainium being their s-- their AI chip. They're actually-- TSMC is much more excited to give allocation to Graviton than they are to Trainium because they view CPU business as more stable long-term growth, right? And as a company that is conservative and doesn't want to ride cycles of growth too hard, you actually wanna allocate to the, uh, the market that is more stable and lower growth rate first before you allocate all the incremental capacity to the fast growth rate market. Now, that is, that is the case generally, and so when you look at like, hey, same for AMD, right? The ca- allocations they get on, um, you know, their CPUs is, is like-- TSMC is much more excited about those than they are for GPUs. Um, likewise for Amazon. And NVIDIA, um, is, is a bit unique because all-- yes, they have CPUs, yes, they make switches, yes, they make networking, um, they make NVLink, they make all these different InfiniBand, Ethernet, all these dif- different products, NICs. Um, by and large, most of these things will be on three nanometer by the end of this year with the Rubin launch and all the chips that are in that family, um, the GPU being the most important one. And yet NVIDIA is getting the majority of supply, right? Part of this is because you look at the market and you like sort of like, you know, TSMC and others like they, there are many ways that they forecast market demand. Um, but also it's dema-- market signal, right? The market signaled, "Hey, we need this much capacity next year. We need this much. We need this much. We'll sign non-cancel-- non-returnable. We may even pay deposits," right? Things like this. NVIDIA just did it way earlier than Google or Amazon. And in some cases, Google and Amazon had stumbling blocks. You know, there was one, one of the chips got delayed slightly by, by a couple quarters, uh, Trainium and, and all these sorts of things happened. And then so in that case, there was a huge sort of like, okay, well, these guys are delaying, but NVIDIA is wanting more, more, more, more, more. And we are checking with the rest of the supply chain, is there enough capacity, right? So they're going to all the PCB vendors and they're saying, "Hey, is there enough, uh, Victory Giant? Is there enough PCB?" This is like one of the largest suppliers of PCBs to NVIDIA, and they're a Chinese company. 
All the, all the PCBs come from China, sort of from them, um, or many of them. And, and anyways, they're like, "Do you have enough PCB capacity? Great. Oh, hey, uh, memory vendors, who has all the memory capacity? Oh, okay, NVIDIA does. Great." Um, so when you look at sort of in the same way, you know, who, who is AGI-pilled enough to buy compute and long timelines at levels that seem ridiculous to people who aren't AGI-pilled, but nonetheless they're willing to pay a pretty good margin, um, and sign it now because they view in the future that, that ratio is screwed up. The same thing happens with the supply chain for semiconductors, right? NVIDIA was-- while the-- I don't think NVIDIA is quite AGI-pilled, right? You know, Jensen doesn't believe software's gonna be automated fully and all these things, right?
- DPDwarkesh Patel
A-accelerated computing, not AI chips, right? [chuckles]
- DPDylan Patel
Eh, it's AI chips, right?
- DPDwarkesh Patel
Yeah. But that's what he calls it, right? He-
- DPDylan Patel
Yeah. 'Cause I mean, I think there's a broader term, right? 'Cause AI is within that, but like physics modeling and simulations-
- DPDwarkesh Patel
Yeah, yeah
- DPDylan Patel
... and like-
- DPDwarkesh Patel
Or but he just like he's not embracing the sort of like main use case and-
- DPDylan Patel
I think he's embracing it. But like I, I just don't think he's like AGI-pilled like Dario, right?
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Or Sam. But he's still way, way more AGI-pilled than Google was at Q3 of last year, or Amazon was at Q3 of last year, and he saw way more demand, right? Um, and, and, and, and the reason is pretty simple. You know, you can see all the data center construction. He's like, "Okay, I wanna have this market share." Um, you know, we sort of like have all the data centers tracked and, you know, you can see, you know, there's, there's a lot of data centers that you could say, well, they could be one or the other, right? And so in some-- to some extent, Google and Amazon, you know, Google especially, even though their, you know, their TPU is just better for them to deploy, they have to deploy a crap load of GPUs because they don't have enough TPUs to fill up their data centers. They can't get them fabbed.
- DPDwarkesh Patel
Wait, can I-- So I, I have a question about that. Google sold, I think, a million, was it the V7s-
- DPDylan Patel
Yes
- DPDwarkesh Patel
... the Ironwoods, to Anthropic. And you're saying in general, there's this, the b-big bottleneck right now, this year or next year, I, I mean, I guess going forward forever now, is gonna be the, the, uh, you know, logic memory, the stuff that like it takes to build these chips. And Google has DeepMind. This is the other third prominent AI lab. And if this is the big bottleneck, why would they sell it rather than just giving it to DeepMind?
- DPDylan Patel
Right. So, so this is again, like a, a problem with like... You know, DeepMind people are like, "This is insane. Why did we do this?"
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Right? But then Google Cloud people and Google executives saw a different like thought process, right? And basically, um, you, you, you know, you and I know the Compute team. There's one guy from-- You know, both of them actually came from Google, uh, at, at, uh, the main people on, on, uh, the Compute team at Anthropic. They saw this dislocation, they negotiated a deal, and they were able to get access to these-- to this compute before Google realized. And so the, actually the chain of events, at least from our data that we found, was in, in early Q3, um, we saw over the course of two, uh, o-over the course of like six weeks, we, we saw capacity on, um, Anthropic-- or sorry, on TPUs go up by a significant amount over the course of those six weeks, and it went up like multiple times in those six weeks, right? There were multiple requests. Google even had to go to TSMC and explain to them why they needed this, uh, increase in capacity because it was so sudden. But that, a lot of that capacity increase was for selling to Anthropic.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Because Anthropic saw it before Google. And then Google had Nano Banana and Gemini 3, which caused their user metrics to skyrocket, and leadership at Google was like, "Oh." And then they started making the statement of we have to double compute every, is it six months, or I don't remember the exact number that they said. Um, but they, they really woke up a lot more, and then they were like, "Oh, hey, TSMC, we want more. We want more." And it's like, "Well, sorry, guys," like, "we're sold out for next year. Um, we can work on next year. We can maybe get like five, 10% more for '26, but really we're gonna work on '27," right? It's sorta like, you know, there's this like information asymmetry of the labs in my mind, right? I don't know if this is exactly... It's the narrative I've spun myself from seeing all the data in the supply chain from like wafer orders and like what's going on with the data centers that, you know, Anthropic signed and Fluidstack signed and all this. Like sort of it's, it's, it's, it's pretty clear to me that Google screwed up. And you can see this from Google's Gemini ARRs, right? Um, they had next to nothing in Q1 to Q3. Uh, Q3 a little bit, right, once they started inflecting. But Q4, they were at like five billion ARR, right? Um, exiting or something like this. So it's like... Or 5 billion revenue for Q4, uh, on an ARR basis. Um, and so it's clearly like Google didn't see revenue skyrocket. Um, and in a sense, right, Anthropic was not willing, you know, has kinda had like a little bit of commitment issues before their ARR exploded-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... even though they have far more information asymmetry and see what's coming down the pipe. Google is going to be more conservative than Anthropic is, A. And B, Google had, had even less ARR. Um, so they, they sort of were like, I think, just not willing to like sort of do it, and then they realized they should do it. And so now since then, Google, um, has gotten absurdly AGI-pilled, right, uh, in terms of like what they're doing. They bought an energy company. They're buying, putting d-deposits down for turbines. Uh, they're buying a ridiculous percentage of the powered land. Uh, they're going to utilities and negotiating long-term agreements. They're doing this on the data center and, um, power side, um, very, very aggressively, right? So, you know, I think Google woke up towards the end of last year, but it took them some time.
- DPDwarkesh Patel
And h-how many gigawatts do you think Google will have by the end of next year?
- DPDylan Patel
By my data. [both laughing]
- DPDwarkesh Patel
You charge for that kind of information.
- DPDylan Patel
Yes. Yes.
- DPDwarkesh Patel
[chuckles] Um,
- 34:34 – 56:06
ASML will be the #1 constraint for AI compute scaling by 2030
- DPDwarkesh Patel
I feel like every year the bottleneck for what is preventing us from scaling AI compute keeps changing. Uh, a couple years ago it was CoWoS. Last year it was power. This year... You'll tell me what the bottleneck is this year. But I wanna understand five years out, what will be the thing that is constraining us from deploying the singularity?
- DPDylan Patel
Yeah. I think the biggest bottleneck is compute, and for that, the longest lead time supply chains are not power or data centers. They're actually the semiconductor supply chain themselves, right? It switches back from being power and data center, uh, as a major bottleneck to chips. And in the chip supply chain, there's a number of different bottlenecks, right? There's memory, there's, uh, logic wafers from TSMC. There's, uh, there's fabs themselves. Construction of the fabs takes a couple years, three, two to three years, versus a data center takes, uh, less than a year, right? Uh, we've seen Amazon build data centers in as fast as eight months, right? So there's a big difference in lead times because of the complexity of the building, the fab that actually makes the chips. And then the tools, right? Those also have really long lead times. And so the bottlenecks, as we've scaled, have shifted from, hey, what is the supply chain currently not, what is it currently not able to do? Um, which was CoWoS and power and data centers, but those were all shorter lead time items, right? CoWoS is a much more simple process of packaging chips together. Um, power and data centers are ultimately way more simple than the actual manufacturing of the chips. And so there's been some sliding of, of, of capacity across, you know, mobile or PC to data center chips, but that's been somewhat fungible, whereas, um, in, uh, whereas CoWoS and power and data centers, those have sort of had to start anew as supply chains. But now there's sort of no more capacity for the mobile and PC industries, which used to be the majority of the semiconductor industry, to shift over to AI, right? Nvidia is now the largest customer at TSMC, and Nvidia is the largest customer at SK hynix, the largest memory manufacturer, right? So it's sort of impossible for the scaling or the sliding of resources away from the common person, right? PCs and, and smartphones to shift any more towards the AI chips. And so now how do we scale the AI chip production? And that's the biggest bottleneck as we go to 2030 is those.
- DPDwarkesh Patel
It'd be very interesting if there's an absolute gigawatt ceiling that you can project out to 2030 based just on, "Hey, we can't produce more than this many EUV machines."
- DPDylan Patel
Right. So to scale compute further, right, there's some different bottlenecks this year, next year. Uh, but ultimately by '28, '29, the bottleneck falls to the lowest rung on the supply chain, which is ASML, right? ASML makes the world's most complicated machine, i.e. an EUV tool, um, and the selling price for those is three hundred, four hundred million dollars. And currently they can make about seventy. Next year they'll get to eighty. Uh, even under very aggressive supply chain expansion, they only get to a little bit over a hundred by the end of the decade. And so what does that mean? Okay, they can make a hundred of these tools by the end of the decade and, um, you know, seventy right now. How does that actually translate to AI compute, right? We, we see all these numbers from Sam Altman and, and many others across the supply chain, gigawatts, gigawatts, gigawatts, right? How many gigawatts are we adding? Um, and we see, you know, Elon saying, "Hey, the hun- hundred gigawatts in space."
- DPDwarkesh Patel
A year.
- DPDylan Patel
A year, right. The, the problem with any of, uh, these numbers or the challenge to these numbers is, you know, actually not the power, not the data center. We can dive into that. But it's, it's, it's manufacturing the chips, right? So a gigawatt of, you know, Nvidia's Rubin chips, right? So Rubin is announced at GTC, uh, I believe the week this podcast goes live. And to make a gigawatt worth of data center capacity of Nvidia's latest chip that they're releasing at the end of this year or towards the end of this year, you need, you know, a few different wafer technologies, right? Um, you need about fifty-five thousand wafers of three nanometer. You need about six thousand wafers of five nanometer, and then you need about a hundred and seventy thousand wafers of DRAM, right? Memory. And so across these three different buckets, um, each of these requires different amounts of EUV, right? So when you manufacture a wafer, uh, there's thousands and thousands of process steps where you're depositing material, removing them. But the sort of key critical step, which at least in advanced logic is like thirty percent of the cost of the chip, is something that doesn't actually put anything on the wafer, right? You take the wafer, you deposit photoresist, which is like a chemical that basically chemically changes when you expose it to light, and then you stick it into the EUV tool, which shines light at it in a certain way. It patterns it, right? 'Cause there's what's called a mask, st- which is a stencil effectively for the design. And so when you look at a wafer, um, you know, a leading-edge three nanometer wafer has seventy or so masks, right? Seventy or so layers of lithography, but twenty of them are the most advanced EUV, right? And that specifically, you know, if you think about, okay, well, if I need fifty-five thousand wafers for a gigawatt, if I do twenty EUV wa-- uh, passes per wafer, you then, you can do the math that's like, okay, that's one point one million passes of EUV for a single gigawatt. So actually, like it's pretty simple. And then once you add the rest of the stuff, it ends up being two million, right, across five nanometer and all the memory. You're at roughly, um, two million EUV passes for a single gigawatt. You know, these, these tools are very complicated, so, um, when you think about what it's doing across a wafer, it's taking the wafer and it's scanning and it's stepping across, or it's scanning, stepping across, and it does this hundreds of times across the entire... or dozens of times across the whole wafer. And, and so when you're talking about, hey, how many EUV passes? That's the entire wafer is being exposed, um, at a certain rate. A wafer... A, a EUV tool can do roughly seventy-five wafers per hour, um, and the tool is up roughly ninety percent of the time, right? So in the end, you end up with, actually, I need about three and a half EUV tools to do the two million EUV wafer passes for the gigawatt. So three and a half EUV tools, uh, satisfies a gigawatt. So it's funny to think about the numbers, right? Because we're talking about, oh, what's a gigawatt cost? It costs like fifty billion dollars roughly, right? Whereas what does three and a half EUV tools cost? That's like one point two, right?
- DPDwarkesh Patel
Right.
- DPDylan Patel
Um, it's actually like quite a lower number, which is, which is interesting to think about, like, oh, fifty billion dollars of economic, you know, sort of CapEx in, in the data center, and what gets built on top of that in terms of tokens is even larger, right? It might be a hundred billion dollars worth of AI value into the supply chain-
- DPDwarkesh Patel
Right
- DPDylan Patel
... is held up by this one point two billion dollars worth of tooling that simply just cannot expand its supply chain quickly.
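Dylan's EUV arithmetic end to end, as a sketch. The wafer counts, layer counts, throughput, and uptime are the rough figures he quotes; the tool price is the midpoint of his $300-400M range:

```python
# From wafers per gigawatt to EUV tools per gigawatt.
passes_3nm = 55_000 * 20      # 3nm logic wafers x EUV layers each = ~1.1M wafer-passes
passes_other = 900_000        # 5nm logic + DRAM, to reach Dylan's ~2M total (rough)
passes_per_gw = passes_3nm + passes_other

wafers_per_hour = 75          # EUV tool throughput
uptime = 0.90                 # fraction of the time the tool is up
passes_per_tool_year = wafers_per_hour * 24 * 365 * uptime  # ~590k passes per tool-year

tools_per_gw = passes_per_gw / passes_per_tool_year
tool_price = 350e6            # assumed midpoint of the $300-400M range
print(f"EUV tools per GW: {tools_per_gw:.1f}")                           # ~3.4, 'three and a half'
print(f"Tooling capex per GW: ${tools_per_gw * tool_price / 1e9:.1f}B")  # ~$1.2B
# ...versus roughly $50B of total data center capex for that same gigawatt.
```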
- DPDwarkesh Patel
In, in, in fact, it, it goes... Even the intermediate layers, um, are sort of shocking here. So, um, Carl Zeiss, which is like the optics supplier that is bottlenecking ASML itself, the... I checked its market cap this morning. You know what it is? Two point five billion dollars.
- DPDylan Patel
Dude, let's, let's LBO that.
- DPDwarkesh Patel
[laughs]
- DPDylan Patel
Let's LBO it.
- DPDwarkesh Patel
Um, and I think... So you, you, you wrote this article recently where you were saying over the last three years, TSMC has done a hundred billion dollars of CapEx, so it's like thirty, thirty, uh, forty. And if, if you think about... I mean, a small fraction of that is sort of like b-being used by Nvidia for the three nanometer that it's gonna... or, you know, previously four nanometer that, that it's using for its chips. Um, but Nvidia has turned that into... What was, what, what are its like... Earnings last quarter was like forty billion, and so forty billion times four, so a hundred and sixty billion dollars. So Nvidia alone is turning some small fraction of a hundred billion in CapEx that's gonna be depreciated over many years, not just this one year, into a hundred and sixty billion dollars in a single year. And then that gets even more intense when you go down the supply chain to ASML, which is taking a billion dollars worth of machines to produce a gigawatt. And then, of course, those machines last for more than a year, right? So it's, it's doing more than that. Okay, so now I wanna understand, okay, well, how many such machines will there be by 2030 if you include not just the ones that are sold that year but are... have been accumulating over the previous years? Um, and what does that imply about the... Sam Altman says he wants to do a gigawatt a week in 2030. Are, are, uh, when you add up those numbers, is that compatible with that?
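A rough version of the leverage Dwarkesh is sketching here, using only the approximate figures quoted in the conversation:

```python
# Capex at each layer of the supply chain vs. the revenue built on top of it.
tsmc_capex_3yr = 100e9        # TSMC capex over the last ~3 years, as quoted
nvidia_annualized = 40e9 * 4  # last quarter's ~$40B, annualized to ~$160B
euv_tooling_per_gw = 1.2e9    # ~3.5 tools at ~$350M each, from the earlier math
datacenter_capex_per_gw = 50e9

print(f"Nvidia annualized vs. TSMC 3-year capex: {nvidia_annualized / tsmc_capex_3yr:.1f}x")
print(f"Data center capex vs. EUV tooling, per GW: {datacenter_capex_per_gw / euv_tooling_per_gw:.0f}x")
# Each step down the chain, a smaller pool of capex (spread over multi-year
# tool and fab lifetimes) supports a much larger pool of value above it.
```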
- DPDylan Patel
Right. That's, that's completely compatible, right?
- DPDwarkesh Patel
Okay.
- DPDylan Patel
'Cause if you think about TSMC and the entire ecosystem has something like 250 to 300 EUV tools already, um, and then you stack on 70 this year, 80 next year, growing to 100 by 2030, you're at, like, 700 EUV tools by the end of the decade. Um, 700 EUV tools, three and a half tools per gigawatt, um, assuming it's all allocated to AI, which it's not, but three and a half tools per gigawatt gets you to 200 gigawatts-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... worth of AI chips for the data centers to deploy, right? So 200 gigawatts, Sam wants 50 gigawatts, right, 52 gigawatts a year. He's only taking 25% share then, right? Obviously, there's some share given to, um, you know, mobile and PC, uh, assuming that, you know, s- for some reason we're allowed to even have consumer goods still, [chuckles] um, you know, and we don't get priced out of them. But, you know, roughly, like, he, he's saying 25%, 50% l- you know, 25% market share of the total chips fabbed. That's, that's kind of, like, very reasonable given, you know, this year alone, I think he's gonna have access to 25% of the Blackwell GPUs that are deployed, right? So it's, it's not that crazy.
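The installed-base math from this exchange, as a sketch. Only the 2025 base, the 70 and 80 shipment years, and the 100-by-2030 endpoint are quoted above; the middle years are interpolated assumptions:

```python
# Cumulative EUV fleet by 2030 and the AI capacity it could support.
installed_base = 275               # ~250-300 tools already in the field (midpoint)
shipments = [70, 80, 85, 90, 100]  # 2026-2030; middle years interpolated, not quoted

fleet_2030 = installed_base + sum(shipments)  # ~700 tools
tools_per_gw = 3.5
gw_per_year = fleet_2030 / tools_per_gw       # ~200 GW/yr if every tool ran for AI

sam_target_gw = 52                            # "a gigawatt a week"
print(f"~{fleet_2030} tools -> ~{gw_per_year:.0f} GW/yr; Sam's target is "
      f"{sam_target_gw / gw_per_year:.0%} of that output")
```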
- DPDwarkesh Patel
I find it surprising that, you know, wha-when was the first, uh... When did ASML start shipping EUV tools? When the seven-nanometer started, so I don't know when that was exactly. But you're saying in 2030, they're gonna be using machines that initially were shipped in 2020. So 10 years, you're using the same most important machine in this m-most technologically advanced industry in the world. I, I find that surprising.
- DPDylan Patel
So ASML's been shipping EUV tools now for roughly a decade, but it only entered p-mass volume production around 2020. You know, the tool's not the same. Um, you know, back then the tools were even lower throughput. Um, there were-- there's various specifications around them called overlay, right? You know, I was mentioning you're stacking layers on top of each other, right? You'll do some EUV, you'll do a bunch of different process steps, depositing stuff, etching stuff, cleaning the wafer, you know, dozens of those steps before you do another EUV layer. Um, there's a spec called overlay, right? Which is, okay, you did all this work, you know, you drew these lines on the wafer, um, now I wanna draw these dots, right? Let's just say I wanna draw these dots to connect this li- these lines of metal to, and then do- you know, holes, and then the next layer up is another set of lines g-it goes perpendicular, so now you're connecting wires going perpendicular to each other. Um, there-- you have to, you have to be able to land them on top of each other, so it's called overlay. And overlay is a spec that's been improved rapidly by ASML. Wafer throughput has been improved rapidly by ASML. And also the price of the tool has gone up, but not as much as the capabilities of the tool, right? Initially, the EUV tools were, like, $150 million, and over time, they're now, like, $400 million, uh, you know, as I, as I look out to 2028. But the capabilities of the tools have more than doubled as well, right? Especially, um, on throughput and overlay accuracy, which is the ability to stack, you know, accurately align the, the sub-subsequent passes on top of each other, um, even though you do tons of steps between. And so this is, this is, um... You know, ASML is improving super rapidly. I think it's also something noteworthy to say ASML is m- you know, maybe one of the most generous companies in the world, right? They have this linchpin thing. No one has anything competitive. Maybe China will have some EUV by the end of the decade, but no one else, you know, has anything even close to EUV, um, and yet they haven't taken price and margins up like crazy, right? You know, you go ask, you know, some other folks, you know, that we talk to all the time, like, you know, for example, Leopold, and they're like, "Why?" You know, "Let's, let's, you know, let's, let's have the price go up," right? Uh, 'cause they can. The margin is there. You can, you can take the margin. Like Nvidia takes the margin, memory players are taking the margin, but ASML has never raised the price more than they've increased the capability of the tool. Um, and so in a sense, they've always provided net benefit to their customer. It's not that the tool is stagnant, it's just that, like, you know, these tools are old. Yes, you can upgrade them some, and the new tools are coming. And for simplicity's sake, we're kind of ignoring, you know, the advances for this podcast, the advances in overlay or throughput per tool.
- DPDwarkesh Patel
So you say we're producing 60 of these machines, uh, this year, and then 70, 80 over subsequent years. W-what, what would happen if ASML just decided to double its CapEx or triple its CapEx? What is preventing them from producing more than 100 in 2030? Why, why are, why so confident that even five years out you can be, uh, relatively sure what their production will be?
- DPDylan Patel
So I think, I think a couple factors here, right? ASML has not decided to just go YOLO, let's expand capacity as fast as possible, right? Um, in general, the semiconductor supply chain has not, right? It's lived through the booms and busts, and, uh, we can talk a bit more about it, but basically no one, you know, m- some players as of very recently have, like, woken up, but in general, no one really sees demand for 200 gigawatts a year of AI chips or, you know, trillions of dollars of spend a year in the semiconductor supply chain. They're just like, they're not, they're not s- AI-pilled, right? They're not AGI-pilled.
- DPDwarkesh Patel
We're gonna get to tr- [chuckles] a trillion dollars this year. [laughs]
- DPDylan Patel
Yeah. I, I, I, I, I, I feel you, but I'm just saying, like, no one really understands this in the supply chain. Um, constantly we're told our numbers are way too high, and then when they're right, they're like, "Oh, yeah, yeah, but your, your next year's numbers are still too high." And it's like... But anyways, like ASML has sort of-- their tool has four major components, right? It has, um, the source, right, which is made by Cymer in San Diego, um, has the, uh, reticle stage, which is made in Wilton, uh, Connecticut, right, has the wafer stage, um, and the, um, the optics, right, the lenses and such, and those two are made in Europe, right? And so when you, when you look at each f- each of these four, they're tremendously complex supply chains that, A, they have not tried to expand massively, and B, when they try to expand them, the time lag is quite long, right? Um, and so again, this is the most complicated machine that humans make, period, right, at, at a volume, um, a-any sort of volume. But, like, let's talk about the source specifically, right? What does the sp- source do? It drops these tin droplets. It hits it three subsequent times with a laser perfectly, so the first one, uh, hits this tin droplet, it expands out. It hits it again, so it expands out to this perfect shape, and then it blasts it at super high power, and, um, the tin droplets get excited enough that they release, uh, EUV light, 13.5 nanometer, and then it's in this thing that is, like, basically collecting all the light and directing it into the lens stack, right? Then you have the lens stack, which is Carl Zeiss, right, as you mentioned, and, and, and some other folks, but Zeiss being the most important part of it. Um. They also have not tried to expand production capacity because they don't see any, you know, they, they're like, "Oh, yeah, yeah. Like we're growing a lot because of AI. We're growing from sixty to a hundred," right? It's like, "No, no, no, no, no. We need to go to like a couple hundred, but it's, it's fine, whatever." Um, each of these tools has, you know, I think eighteen, um, of these lenses effectively, um, mirrors. Um, they are, they are multilayer mirrors, which are perfect layers of molybdenum and, uh, ruthenium, if I recall correctly, um, stacked on top of each other in many layers, and then the light bounces off of it perfectly. But it's not just like, you know, like when we think about a lens, you know, it's, it's like in a shape, and it focuses the light. This is a, this is like a mirror that's also a lens, and so it's pretty complicated. Any defect in this perfect layer of sta- in this, in these like, uh, super thinly deposited stacks will mess it up. Uh, any curvature issues, like there is a lot of challenges with scaling the production. Um, it's quite artisanal, right? In the sense, right? Because you're not making tens of thousands of these a year. You're making hundreds. You're making thousands, right? Uh, you know, talk about sixty tools a year, um, eighteen of these per tool. You end up with, you know, you're still in the, um, you know, hundreds of tools, uh, or thousand-- You're at the thousand number roughly for these, these lenses, um, and projection optics. So then you, and then you step forward to the reticle stage, uh, which is also something really, uh, crazy. This thing moves at, I wanna say, nine Gs. Like it, it will shift nine Gs because as you step across a wafer, the tool will go, um, and, and the wafer stage is complementary. It's the wafer part. 
So you, you line these t- two things up. You're taking all the light through the lenses that's focused, and, and here's the reticle, here's the wafer, and you're passing, uh, the reticle's moving one direction, the wafer's moving the direction, the other direction as it scans a, uh, twenty-six by thirty-three millimeter section of the wafer, and then it stops. It shifts over to another part of the wafer and does it again, and it does that in just seconds, right? And it, and each of them are moving at nine Gs in opposite directions. So each of these things is like a wonder and marvel of like chemistry, uh, fabrication, you know, um, you know, sort of like mechan-me-me-me-mechanical engineering, um, optical engineering, because you have to align all these things and make sure they're perfect. Uh, all these things have crazy amounts of metrology because you have to perfectly test everything, 'cause if anything is messed up, the yield goes to zero, right? 'Cause this is such a finely tuned system. And by the way, you-- it's so large that you're building it in all these, you're building in the factory in Eindhoven, uh, Netherlands, and they're deconstructing it and shipping it on p- many planes to the customer site, and then you're reassembling it there and testing it again, and that process takes many, many months. So like, it's, it's just there's so many steps in the supply chain, right? Whether it's Zeiss making their le- uh, lenses and projection optics, or Cymer, which is an ASML-owned company, making the EUV source, and each of these has th- its own complex supply chain, right? ASML's commented their supply chain has te- over ten thousand people in it, right?
- SPSpeaker
Like individual suppliers.
- DPDylan Patel
Yes. And, and it might not be directly. It might be through like, hey, you know, Zeiss has so many suppliers and, you know, XYZ company has so many suppliers, but, you know, they, these, you know... If you just think about like, okay, you're talking about two physically moving objects that are like this large and this large, you know, um, the size of a wafer, right? And it has to be accurate to the level of, you know, single digit nanometers or even smaller because the entire system, the overlay, right? Uh, layer to layer, uh, variation has to be on the order of three nanometers, right? Um, and so if the overlay is three nanometers, that means each individual part, the accuracy of its physical movement has to be even less than that, right? Has to be sub one nanometer in most cases because the, the error of these things stacks up, right? And, and, and so there's no way to like, you know, just like snap your fingers and increase production, right? You know, it's things simple as power, right? The US going from zero percent power growth to two percent power growth, even though China's already at thirty, was like so hard for America to do, right? Um, and, and, and that's a really simple supply chain with very few people in the supply chain, right? Uh, who make difficult things. And there's, you know, probably what? A hundred thousand electricians/people working the supply chain, uh, of electricity, um, or more in the US. And, you know, when you look at, oh, ASML employs like so few people. Carl Zeiss probably employs like less than a thousand people working on this, and all of those people are like super, super specialized.
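Overlay error budgets are commonly combined in quadrature (root-sum-square), which is one way to see why each moving part has to be accurate to well under the ~3 nm layer-to-layer spec Dylan cites. The four-way split below is purely illustrative, not ASML's actual budget:

```python
import math

# Toy RSS overlay budget: independent error sources add in quadrature.
contributors_nm = {
    "wafer stage": 1.2,
    "reticle stage": 1.0,
    "projection optics": 0.9,
    "metrology / alignment": 0.8,
}
total = math.sqrt(sum(v ** 2 for v in contributors_nm.values()))
print(f"Combined overlay: {total:.2f} nm vs. a ~3 nm spec")
# Each contributor has to sit near or below 1 nm for the stack-up to hold.
```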
- SPSpeaker
Yeah.
- DPDylan Patel
So it's, you know, you can't just train random people up for this like in the snap of a finger. You can't just get your entire supply chain to g-get galvanized, right? Nvidia's had to do a lot, um, to get the entire supply chain to even deliver the capacity they're gonna make this year, even though when you lo- go talk to Anthropic, they're like, "Well, we're short of TPUs, we're short of Trainium, and we're short of GPUs." When you go talk to OpenAI, they're like, "We're short of these things," right? Um, so OpenAI and Anthropic, they know they need X. Nvidia is not quite as AGI-pilled, and they, they're building, uh, you know, X minus one. Um, and you go down the supply chain, everyone's doing minus one, and in some cases, they're doing like divided by two, right? Because they just don't, they're not AGI-pilled, right? I think... And, and, and so you end up with the time lag for this whip to react, right? You know, the, the cur- the sort of AI-pilledness is, and, and desire to increase production is so long. And then once they finally understand, hey, we need to increase production rapidly, right? And they, they think they understand, oh, AI means we have to go from sixty to a hundred in, in addition to the tools all just getting better and faster, you know, the source getting higher power from five hundred watts to a thousand and, you know, all these other aspects of the supply chain, you know, advancing technically plus increase of production. They think they're, they're like actually increasing production a lot. But if you flow through the numbers of, hey, what does Elon want? He wants a hundred gigawatts a year in space by 2028, is it? Um, or 2029. And, you know, Sam Altman wants fifty gigawatt, fifty-two gigawatts a year, um, by the end of the decade. And you look at, you know, probably Anthropic needs the same, and then, you know, Google needs that. You know, you, you go across the supply chain, it's like, wait, no, the, the supply chain can't possibly build enough capacity for everyone to get what they want on the side of compute.
- 56:06 – 1:05:56
Can’t we just use TSMC’s older fabs?
- DPDwarkesh Patel
in the data center supply chain for the last few years, people have been making arguments of this specific thing we are bottlenecked by, therefore AI compute can't scale more than X. But then as you've written about, oh, no, if, you know, say the grid is a bottleneck, then we just do, um, we just do behind the meter on the site, we do gas turbine, et cetera. If that doesn't work, there's like all these other alternatives that people fall back on. And I, I wanna ask you a question about wha- whether we can imagine a similar thing happening in the semiconductor supply chain. So if EUV becomes a bottleneck, well, we, we, you know, what if we just went back to seven nanometer and did what China is doing currently, producing seven-nanometer chips with, uh, multi-patterning with DUV machines? Um, and you know, if, if you look at a seven-nanometer chip like the A100, um, there's been a lot of progress obviously since from the A100 to the B100 or B200, but, um, how much of that progress is just numerics? And then like if, if you just held constant, say, uh, FP16 from A100 to B100, the B100 is like a little over one petaflop, and then, uh, A100 is like three hundred teraflops. And so-
- DPDylan Patel
Yeah, three twelve
- DPDwarkesh Patel
... you have, you have like basically three X, uh, holding numerics constant. You have like a three X improvement from A100 to B100, and then some of that is the process improvement, some of that is just the accelerator design improving, which, you know, we could replicate again in the future. And so th-then it just seems like actually it's like very small effect from the process improving from seven nanometer to four nanometer. So I don't know. This is-- Say we have, uh, I don't know the numbers offhand, but let's say there's like a hundred and fifty K wafers per month of three nanometer and then eventually similar amounts for two nanometer, but then there's a similar amount for seven nanometer, right? So if you have all those old wafers, and then there's maybe a fifty percent haircut because the process, you know, the bits per wafer area are like, what is it, fifty percent less or something? Um, then it's like, it doesn't seem like that bad to just bring on seven nanometer wafers and then, oh, that gives you another fifty or a hundr- another hundred gigawatts. Um, yeah, tell, tell me why that's naive.
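For concreteness, here is the naive per-chip version of Dwarkesh's case, which Dylan then argues against; the A100 and B100 figures are the ones quoted above, and the wafer-supply framing is his hypothetical:

```python
# Chip-level comparison at constant FP16 numerics, as stated above.
a100_fp16_tflops = 312   # A100, as quoted ("three twelve")
b100_fp16_tflops = 1050  # "a little over one petaflop", as quoted

print(f"Gain at constant numerics: {b100_fp16_tflops / a100_fp16_tflops:.1f}x")  # ~3x
# On this per-wafer view, spare 7nm capacity with a ~50% density haircut looks
# like cheap extra gigawatts. Dylan's rebuttal below: measured at the serving-
# system level (networking, memory, packaging), Hopper vs. Blackwell on the
# same 8-bit models is ~20x apart, so chip-level flops understate the gap.
```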
- DPDylan Patel
Yeah. So I think, you know, we potentially do go crazy enough that this is, this happens because we just need incremental compute, and the compute is worth the higher cost, power, et cetera, of these chips. But it's, it's also unlikely to some extent, uh, to a large extent because of, I think, I think just comparing, you know, some of these are like not fair comparisons, right? Um, for example, you know, from A100, which is three twelve teraflops, to, uh, Blackwell, which is like a thousand, um, ish of FP16, um, or maybe it's two thousand, and then Rubin is like five thousand or so FP16. It's, it's not a fair comparison because these chips have vastly different, um, you know, design targets, right? At, at A100, that's what-- that, that is what Nvidia optimized for was FP16, bfloat16 numerics. When you look at, uh, Hopper, they didn't care as much about that. They cared about FP8. When, when you looked at Rubin, they don't care about, about FP16 and BF16 as much. Uh, they care mostly about FP4 and six, right? Um, and so numerics like are what they've designed the search, uh, designed their chip for. Um, and so there's a couple like, you know, okay, let's just say, let's redesign, let's make a new chip design on seven nanometer. Sure, we can do that. Like, and then it's optimized for, uh, the numerics of the modern day. The performance difference is still gonna be much larger than the flops difference you mentioned, right? Um, often it's easy to boil things down to flops, uh, per watt or flops per dollar, but that's actually not a fair comparison, right? Um, and so this is where sort of you can bring in, hey, let's look at Kimi K one or DeepSeek. When you look at Kimi or, or K-Kimi K two point five, sorry, and DeepSeek, when you look at these two models, and you look at their performance on Hopper versus Blackwell on, you know, very optimized software, you get vastly different performance, right? And most of this is not attributed to flops. A lot of this is attri-- or numerics, right? Because those models are actually eight bit. So it's not like Blackwells, uh, and Hopper, they're both optimized for eight bit, and Blackwell's not really taking advantage of its four bit there. Um, you know, the, the performance gulf is, is actually much larger. And, you know, the way you can sort of compare them and think about them is, sure it's one thing to, you know, shrink process technology and make the transistor smaller, and each chip has X number of flops, but you forget the big gating factors, which is these models don't run on a single chip. They run on hundreds of chips at a time, right? If you look at DeepSeek's production deployment, which is well over a year old now, they were running on a hundred and sixty GPUs, right? Um, and that's what they serve production traffic on. And so they split the model across a hundred and sixty GPUs. Every time you cross the barrier of a chip to another chip, there is an efficiency loss because you now have to tr- uh, transmit over, you know, high-speed electrical serdes, and there is a latency cost, there's a power cost, there's a, there's all these, um, dynamics that hurt. As you shrink and shrink and shrink the process node, you've increased the amount of compute in a single chip. Now, in chip, right, uh, movement of data is, you know, at, at least tens of terabytes a second, if not hundreds of terabytes a second. Um, whereas between chips, you're on the order of a terabyte a second, right? 
Um, and so this movement of data between chips that are super close to each other physically... and then you can only put so many chips close to each other physically, so you have to put chips in different racks. The data rate between those is on the order of hundreds of gigabits a second, right, four hundred gig or eight hundred gig a second. Um, so a hundred gigabytes a second, roughly. And so you've got this huge ladder of, like, oh, on chip I can communicate at super fast speeds. Within the rack I can communicate at, you know, an order of magnitude lower. Outside the rack I can communicate at even an order of magnitude lower than that. And as you break the bounds of chips, you end up with this performance loss. So anyways, the reason I explain this is because when you look at Hopper versus Blackwell, even if both of them are using, you know, a rack worth of chips, the Hopper is significantly slower, because the amount of performance that you can leverage toward the task within each domain, hey, tens of terabytes a second of communication between these processing elements, and, you know, terabytes a second between chips, is much, much higher on Blackwell, and therefore the performance is much higher. So when you look at inference at, let's say, a hundred tokens a second for DeepSeek and Kimi K two point five, Hopper versus Blackwell, the performance difference is on the order of twenty x.
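A quick sketch of the ladder just described; the values are the illustrative round numbers from the conversation, not specs for any specific part:

```python
# The data-movement "ladder" described above, in rough orders of magnitude.
# All values are illustrative round numbers, not specs for any specific part.
ladder_gb_per_s = {
    "on chip (between processing elements)":   10_000,   # tens of TB/s
    "within the rack (scale-up, e.g. NVLink)":  1_000,   # ~a terabyte a second
    "outside the rack (800 Gb/s links)":        800 / 8, # ~100 GB/s
}
for level, bw in ladder_gb_per_s.items():
    print(f"{level}: ~{bw:,.0f} GB/s")
# Each rung is roughly an order of magnitude slower than the one above,
# which is why splitting a model across more chips costs efficiency.
```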
- DPDwarkesh Patel
Interesting.
- DPDylan Patel
Not two or three x like the FLOPS performance difference indicates, even though those are on the same process node.
- DPDwarkesh Patel
Makes sense, yeah.
- DPDylan Patel
Um, you know, there's just differences in networking technologies and what they've worked on. And so you can translate some of these back. Um, but when you look at like Rubin, what they're doing on three nanometer, some of these things are just not possible to do all the way back on A100, even if you make a new chip for-
- DPDwarkesh Patel
Interesting
- DPDylan Patel
... uh, seven nanometer. There's just certain architectural improvements you can port, and certain ones you cannot. Um, and so the performance difference is not just gonna be the difference in FLOPS. It's in some senses cumulative between the difference in, you know, FLOPS per chip, networking speed between chips, how many FLOPS are on a chip versus a system, memory bandwidth on a single chip and on an entire system. All of these things compound.
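To illustrate the compounding point: a minimal sketch where each factor's value is a hypothetical placeholder, not measured Hopper-versus-Blackwell data, chosen only to show how modest per-factor gains multiply out to the ~20x end-to-end figure cited above:

```python
# Illustrative only: how modest per-factor gains compound into a much larger
# end-to-end gap than the raw FLOPS ratio suggests. These factor values are
# hypothetical placeholders, not measured Hopper-vs-Blackwell data.
factors = {
    "FLOPS per chip (at favored numerics)":   2.5,
    "network bandwidth between chips":        2.0,
    "FLOPS reachable in one scale-up domain": 2.0,
    "memory bandwidth per chip/system":       2.0,
}
total = 1.0
for name, gain in factors.items():
    total *= gain
print(f"Compounded: ~{total:.0f}x")  # ~20x, versus 2-3x from FLOPS alone
```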
- DPDwarkesh Patel
Can I ask you a very naive question? So, uh, this year, last year, the B200 now has two dies on a single package, so you can get that bandwidth on a single chip, uh, without having to go through NVLink or InfiniBand. And then next year, Rubin Ultra will have four dies on one package. What is preventing us from just doing more of that? Like, how many dies could you have on a single package and still get these tens of terabytes a second?
- DPDylan Patel
Yeah, so, so even within Blackwell, um, there are differences in performance when you go-- when you're communicating on the chip versus across the chips. Uh, those, those bounds are obviously much smaller than when you're going, you know, out of the entire chip, but each die versus, uh-
- DPDwarkesh Patel
Mm-hmm
- DPDylan Patel
... you know, within the package. And so anyways, when you scale the number of chips up, there is some performance loss. It's not just perfect, but it is way better than going across entirely different packages. Now, how large can advanced packaging scale? Um, the way Nvidia's doing it is CoWoS, and the way, you know, Google with Broadcom, and MediaTek, and, you know, Amazon's Trainium, all these chips are doing it is also CoWoS. But actually, you can go and look back at what, um, Tesla did with Dojo, right? Dojo, uh, which they canceled and restarted. A-anyways, Dojo was a chip that was the size of an entire wafer. They had twenty-five dies on it. Um, and there were some trade-offs, right? They couldn't put HBM on it. Um, but the positive side was that they had twenty-five dies, uh, on it. And so to date, it is still probably the best chip for running convolutional neural networks. Um, it's just not great at transformers, because, you know, the sort of the shape of the chip, the memory, the arithmetic, all these various specifications of it are just not well suited for transformers. They're well suited for CNNs. Um, and anyway, so, you know, Dojo chips were optimized around that. They made a bigger package, but at the same time, you know, as you make packages bigger and bigger and bigger, you have other constraints, right? Networking speed, uh, memory bandwidth, cooling capabilities, all of these things start to rear their heads. It's not simple. But yes, you will see a trend line of more chips on the package. And yes, you're gonna be able to do that on seven nanometer. In fact, that's what Huawei did with their, um, Ascend 910C or D. Uh, initially it was just one die, and then they did two. Um, and they're focusing on scaling the packaging up, because that is an area where they can advance faster than process technology, where they can't shrink. But at the end of the day, that's something you can do on the leading-edge chips too, right? Anything you do on seven nanometer packaging-wise, you can also probably do on three nanometer.
- DPDwarkesh Patel
Um, so if
- 1:05:56 – 1:16:20
When will China outscale the West in semis?
- DPDwarkesh Patel
we're-- if you end up in this world in 2030 where the West has the most advanced process technology but has not ramped it up as much, whereas China, I don't know if you think by 2030 they would have EUV and, I don't know, two nanometer or whatever, but they are a semiconductor power, so they are producing in mass quantity. Um, basically, I'm wondering what the year is where there's a crossover, where our advantage in process technology has faded enough, and their advantage in scale has increased enough, and also their advantage in, like, having one country that has the entire supply chain indigenized, rather than having random suppliers in Germany and the Netherlands and wherever, would mean that China is ahead in its ability to, like, produce mass FLOPS.
- DPDylan Patel
Yeah, so to date, um, China still does not have, you know, an entirely indigenized semiconductor supply chain, right?
- DPDwarkesh Patel
But will they in 2030?
- DPDylan Patel
Yeah. By 2030, it's possible that they do. Uh, but to date, right, all of China's seven nanometer and fourteen nanometer capacity uses ASML DUV tools, right? Um, and the amount that they can ship and import from ASML is large and, uh... But the point being that the vast majority of ASML's revenue, and on EUV all of it, uh, is outside of China. So the scale advantage is still in favor of, let's call it, the West plus Taiwan, Japan, et cetera.
- DPDwarkesh Patel
But they're trying to make their own DUV and EUV tools, right?
- DPDylan Patel
They're trying to do all these things. The question is how fast they can advance, um, and scale up production as well as quality. And to date, we haven't seen that. Now, I'm quite bullish that they're gonna be able to do these things over the next five to ten years, right? Really scale up production, really, uh, kick it into high gear. They have more engineers working on it. They have more, uh, desire to throw capital at the problem.
- DPDwarkesh Patel
So, so by 2030, do they have fully indigenized DUV?
- DPDylan Patel
Uh, I think for sure. For sure.
- DPDwarkesh Patel
Okay.
- DPDylan Patel
DUV, yes.
- DPDwarkesh Patel
And fully indigenized EUV by 2030?
- DPDylan Patel
I think they'll have working tools. I don't think that they'll be able to manufacture a bunch yet, right? You know, there's, there's sort of having it work, and then there's production hell, right? Um, and ultimately, like ASML had EUV working in the early 2010s at some capacity.
- DPDwarkesh Patel
Right.
- DPDylan Patel
Right? Now, the tools were not accurate enough. They were not-
- DPDwarkesh Patel
Right
- DPDylan Patel
... uh, scaled for high-volume manufacturing, not reliable enough. And then they had to ramp production, and that all took time. Production hell takes time, right? Which is why it took another five to seven years to get EUV into mass production at a fab, rather than just working in the lab.
- DPDwarkesh Patel
So how many, um, DUV tools do you think they need to be able to manufacture in 2030?
- DPDylan Patel
ASML?
- DPDwarkesh Patel
No, uh, China.
- DPDylan Patel
Oh, that's a great question. Um, you know, it's a bit of a challenge to look into the supply chain especially. We try really hard, um, but, you know, in some instances they're, like, buying stuff from Japanese vendors, and if they wanna fully indigenize the supply chain, they need to not buy-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... these lenses or these, uh, projection optics or stages from Japanese vendors. They need to build it internally. So it's really tough to say where they'll be able to get to. Like, I honestly think it's a shot in the dark. But it's probably not unlikely that they'll be able to do, you know, on the order of 100 DUV tools a year, uh, whereas ASML is doing hundreds of DUV tools a year currently. You know, no company has a process node where they make a million wafers a month, right? Um, Elon says he wants to do it, and China's obviously going to do it, right? Uh, and I don't think they'll... y-you know, TSMC is trying to do that. Um, the memory makers may get there as well, right, to the million wafers a month, but not in a single fab. It's sort of mind-boggling to think of that scale, um, and challenging to see the supply chain galvanized for that. So I'm not sure. You know, I don't wanna doubt, you know, China's capability to scale.
- DPDwarkesh Patel
Right. I guess this is an interesting question, and I think at some point SemiAnalysis will do the deep dive on this. But I think this question of, like, by when indigenized Chinese production could be bigger than the rest of the West combined, if you just put into your model when they'll have DUV machines at scale and EUV machines at scale. 'Cause I think there's this, like, question around: if you have long timelines on AI, by long meaning 2035, which is not that long in the grand scheme of things, um, should you expect a world where China is, like, dominating in semiconductors? Which, I don't know, doesn't get asked enough, because if you're in San Francisco, we're just, like, thinking on a timescale of, like, you know, weeks. And if you're outside of San Francisco, you're not thinking about AGI at all. And so this question of, like, okay, what if we have AGI? What if you have this transformational thing that is commanding tens of trillions or hundreds of trillions of dollars of economic growth and, you know, token output and so forth, uh, but it happens in 2035? What does that imply for the West versus China? I don't know. SemiAnalysis has gotta write the definitive model on this.
- DPDylan Patel
Yeah. So I, I think it's, it's really challenging when you move timescales out that far, right?
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Like, what we tend to focus on is, like, we're tracking every data center, we're tracking every fab, we're tracking all the tools, and we're tracking where they're going. But the time lags for these things are relatively short, right? Um, we can only make, like, reasonably accurate estimates for data center capacity based on, you know, land purchasing and, you know, permits and turbine purchasing and all these things. We know where all these things are going, and, like, that's the data we sell. But, you know, as you go out to, like, 2035, things are just so radically different, and, you know, your error bars get so large it's kinda hard to make an estimate. Uh, but at the end of the day, if takeoff or timelines are slow enough, right, um, then certainly China, I don't see why they wouldn't be able to catch up drastically, right? Um, you know, in some sense we've got, like, this valley, right? Where, you know, call it three to six months ago, Chinese models were, or maybe even now, Chinese models are as competitive as they've ever been. Uh, I think Opus 4.6 and GPT 5.4 have really pulled away and made the gap a little bit bigger, but I'm sure, you know, some new Chinese models will come out. But as we move from, hey, these companies are selling tokens where they provide the entire, uh, reasoning chain and all that, to selling automated, you know, white collar work, right, automated software engineer, send them the request, they give you the result back, and there's a bunch of thinking on the back end that they don't show you, the ability to distill out of American models into Chinese models will be harder. That's A. B is the scale of the compute that the labs have, right? Uh, OpenAI exited last year with roughly two gigawatts. Um, Anthropic will get to, you know, two-plus gigawatts this year, and by the end of next year they'll both be at, like, 10 gigawatts of capacity. Um, China is not scaling their AI lab compute nearly as fast. And so at some point, when you can't distill the learnings from these labs into the Chinese models, plus this compute race that OpenAI, Anthropic, Google, Meta, et cetera are all racing on, at some point you get to a point where, you know, the model performance should start to diverge more. Um, and then all of this CapEx that's being spent on, you know, data centers and all that, right? Amazon, you know, 200 billion, Google 180, you know, so on and so forth. All these companies are spending hundreds of billions of dollars of CapEx. Um, there's, you know, nearly a trillion dollars of CapEx being invested in data centers in America this year, roughly, right? Um, you end up with, okay, well, what's the return on invested capital here? Uh, you and I would think that the return on invested capital for data center CapEx is very high. Um, and at least if we look at Anthropic's revenues, in, you know, January they added like four billion. In February, which was a shorter month, they added like six. Um, we'll see what they can do in March and April, uh, given compute constraints are what's bottlenecking their growth, right? The reliability of Claude Code is actually quite low because they're so compute constrained. Uh, but if this continues, then the ROIC on these data centers is super high.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Um, and at some point, the US economy starts growing faster and faster over, you know, this year and next year, because of all this CapEx and all this revenue that these models are generating, um, and the downstream supply chain, versus China doesn't have that yet, right? Um, they have not built the scale of infrastructure to then invest in models, to get to the capabilities, to then deploy those models at such scale, right? Uh, 'cause when you look at, like, Anthropic's numbers, hey, they're at, call it, twenty billion ARR. Of that, you know, the margins are sub fifty percent, at least as last reported by The Information. So then, you know, you're at, okay, that's like thirteen, fourteen billion dollars of compute that it's running on, rental cost-wise, which is actually like fifty, uh, billion dollars worth of CapEx that someone laid out for Anthropic to generate their current revenue. Um, and China has just not done this. If and when Anthropic 10X's revenue again, uh, and I think our answer would be when, not if, um, then China doesn't have the compute to deploy at that scale. And so there is some sense of, like, oh, we're in fast takeoff-ish, right? And it's not like we're talking about, you know, Dyson sphere by X date. It's more like the revenue is compounding at such a rate that it does affect the economic growth, um, and the resources these labs are gathering are growing so fast that... You know, and China hasn't done that yet. So in that case, the US, uh, and the West is actually diverging. The flip side is, these infrastructure investments have middling returns. Maybe they're not as good as hoped. You know, maybe Google is wrong for wanting to take free cash flow to zero and spend three hundred billion dollars on CapEx next year. Maybe they're just wrong. Um, and, you know, people on Wall Street who are bearish and people who don't understand AI are correct, right? Um, in which case the US is building all this capacity, it doesn't get really great returns, and China's able to build the fully vertical, indigenized supply chain, versus, you know, US, Japan, Korea, uh, Taiwan, Southeast Asia, you know, Europe, all these countries together building this less vertical supply chain. Um, and in a sense, at some point, China is able to scale past us, if AI takes longer to get to certain capability levels than-
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
You know, I would say the vast majority of your guests on this podcast believe.
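A sketch of the ARR-to-CapEx chain Dylan walks through above; the margin and the CapEx figure are the rough numbers cited in the conversation, not audited data:

```python
# Rough ARR-to-CapEx arithmetic from the conversation, illustrative only.
arr = 20e9                 # call it $20B annual recurring revenue
gross_margin = 0.33        # "sub fifty percent"; ~a third for illustration
rental = arr * (1 - gross_margin)   # ~$13.4B/yr of compute on a rental basis
capex = 50e9               # the CapEx someone laid out, as cited
print(f"Rental: ~${rental/1e9:.1f}B/yr; implied CapEx multiple: ~{capex/rental:.1f}x")
```

The point of the multiple: every dollar of rented compute sits on several dollars of someone's data center CapEx, which is the infrastructure China has not yet built at this scale.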
- 1:16:20 – 1:42:53
The enormous incoming memory crunch
- DPDwarkesh Patel
Let's go back to memory, because I think this is maybe... people on Wall Street and people in the industry are understanding how big this is, but maybe generally people don't understand how big a deal this is. So we've got this memory crunch, as you were talking about, and earlier I was asking about, oh, could we solve the EUV tool shortage by going back to seven nanometer? So let me ask a similar question about memory. Um, HBM is made of DRAM, but has three to four X fewer, uh, bits per wafer area than the DRAM it's made out of. Is it possible that accelerators in the future could just use commodity DRAM and not HBM, so we can make much more, uh, capacity out of the DRAM we get? And the reason to think this might be possible is, look, if we're gonna have agents that are just going off and doing work, and it's not a synchronous chatbot application, then you don't necessarily need extremely, uh, fast latency kinds of things anymore. And so maybe you can live with the low, low bandwidth, uh, 'cause the reason you stack DRAM into stacks and make HBM is for higher bandwidth. And so is it possible to go to non-HBM, uh, accelerators and, um, basically have the opposite of Claude Code fast? Like, have Claude Code slow, and [laughs] and do that?
- DPDylan Patel
Yeah, I think, I think at the end of the day, the incremental purchaser who's willing to pay the highest price for tokens also ends up being the one that's like less price sensitive.
- DPDwarkesh Patel
Mm-hmm.
- DPDylan Patel
And, you know, the compute should be allocated, in a capitalistic society, towards the goods that have the highest value, and the private market determines this by willingness to pay. And so to some extent, um, sure, Anthropic could actually release a slow mode, right? They could release Claude Slow mode and increase tokens per dollar by a significant amount. Um, they could probably, like, reduce the price of Opus four-six by, you know, four X, five X, and reduce the speed by maybe just, like, two X. Like, the curve on inference throughput versus speed is there already, just on HBM. Um, and yet they don't, because no one actually wants to use a slow model. And furthermore, on these agentic tasks, you know, it's great that the model can run at this time horizon of hours, but, okay, well, if the model was just running slower, those hours would become a day, right? Um, or vice versa, right? If the model's running faster, those hours become an hour. Um, and yet no one really wants to move to that day-long wait period, because the highest value tasks also have some time sensitivity to them, right? And so I struggle to see... Yes, you could use DDR, um, but then there's a couple things that are challenging with this, right? You could use regular DRAM. Um, one is you're still limited by IO. You know, one of the, like, core constraints of chips is that a chip is a certain size, and all of the IO escapes on the edges of the chip, right? So oftentimes, you know, what you see is the left and the right of the chip are HBM, the IO from the chip to the HBM is on the sides, and then the top and bottom are IO to other chips, right? Um, and so if you were to change from HBM to DDR, then all of a sudden this IO on this edge would have significantly less bandwidth, but significantly more capacity per chip.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
And so yes, you're making less, um... you know, the metric that you actually care about is bandwidth per wafer, not bits per wafer.
- DPDwarkesh Patel
Because the thing that is constraining the flops is just getting in and out the next matrix, and for that you just need more bandwidth.
- DPDylan Patel
Yeah, getting out the weights, and getting in and out the, uh, KV cache.
- DPDwarkesh Patel
Right.
- DPDylan Patel
And so in many cases, these GPUs are not running at full memory capacity. Yes, it's obviously, like, a system design thing, you know, model hardware software co-design of, hey, how much KV cache do I keep on the chip? How much do I offload to other chips and call when I need it, for tool calling or whatever? How many chips do I parallelize this on? Obviously, the search space of this is, like, very broad, which is why we have, like, InferenceX. Like, this is, like, an open source model that searches all the optimal points on inference for a variety of, uh, eight different chips, um, and models. Um, anyways, the point is, you're not always necessarily constrained by memory capacity. Uh, you can be constrained by flops. You can be constrained by network bandwidth. You can be constrained by memory bandwidth, uh, or you can be constrained by memory capacity. If you were really to simplify it down, there's, like, four constraints, and each of these can break out into more. But in this case, if you switch to DDR, yes, you produce four X the bits per DRAM wafer, but all of a sudden the constraints shift a lot, and your system design shifts a lot. You go slower, yes. Is the market smaller? Okay, maybe, possibly. But also now all of a sudden, all these flops are wasted-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... because they're just sitting there waiting for memory. It's like, great, I don't need all that capacity because I can't really increase batch size because then the KV cache is gonna take even longer to read.
- DPDwarkesh Patel
Makes sense. Yeah.
- DPDylan Patel
And so you never-- You can-- Yeah.
- DPDwarkesh Patel
Interesting. Uh, what is the bandwidth difference between HBM and, uh, normal DRAM?
- DPDylan Patel
Yeah. So an HBM stack, uh, of HBM4, let's just talk about like the stuff that's in Rubin-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... 'cause that's what we've been indexing on, is 2048 bits across, connected in an area that's, like, thirteen millimeters wide. Um, so 2048 bits, and it transfers memory at around ten giga transfers a second. So a stack of HBM4 is 2048 bits on an area that's thirteen millimeters wide, roughly, or eleven, and that's the shoreline that you're taking on the chip. And in that shoreline, um, you have 2048 bits transferring at ten giga transfers per second. Uh, you multiply those together, and you divide by eight, bits to bytes, and you're at roughly two and a half terabytes a second per HBM stack, right? When you look at DDR, um, in that same area, it's maybe sixty-four or a hundred and twenty-eight bits wide.
- DPDwarkesh Patel
Hmm.
- DPDylan Patel
And that DDR5 is transferring at, you know, anywhere from six-point-four giga transfers a second to maybe eight giga transfers a second. So your bandwidth is significantly lower, right? At sixty-four bits times eight giga transfers divided by eight, um, you're at sixty-four gigabytes a second. Um, and even if you take a generous interpretation of one twenty-eight bits times eight giga transfers, you're at a hundred and twenty-eight gigabytes a second for the same shoreline, versus two and a half terabytes a second.
- DPDwarkesh Patel
Interesting.
- DPDylan Patel
There's an order of magnitude difference in bandwidth per edge area. And if your chip is a square, or it's twenty-six by thirty-three millimeters, right, which is the maximum size for an individual die, um, you only have so much edge area, and then on the inside of that chip you put all your compute. Um, there's things you can do to try and change that, right? More SRAM, more caching, blah, blah, blah. Uh, but at the end of the day, you're very constrained by bandwidth.
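The shoreline arithmetic above, written out; the bus widths and transfer rates are the figures from the exchange:

```python
# The shoreline-bandwidth arithmetic from the exchange above.
def bandwidth_gb_s(bus_bits: int, giga_transfers: float) -> float:
    """GB/s = (bits wide * giga-transfers/s) / 8 bits per byte."""
    return bus_bits * giga_transfers / 8

hbm4_stack = bandwidth_gb_s(2048, 10)   # ~2,560 GB/s, i.e. ~2.5 TB/s
ddr5_x64 = bandwidth_gb_s(64, 8)        # 64 GB/s
ddr5_x128 = bandwidth_gb_s(128, 8)      # 128 GB/s (the generous case)
print(f"HBM4 stack: ~{hbm4_stack/1000:.1f} TB/s per ~11-13 mm of shoreline")
print(f"DDR5 x64: {ddr5_x64:.0f} GB/s, x128: {ddr5_x128:.0f} GB/s in similar area")
print(f"Gap: ~{hbm4_stack/ddr5_x128:.0f}x")  # an order of magnitude and more
```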
- DPDwarkesh Patel
Interesting. So, um, then there's the question of, like, where can you destroy demand to free up enough for AI? And I guess the picture is especially bad because, as you're saying, if it takes four X more wafer area to get the same byte of HBM, you have to destroy four X as much consumer demand for, uh, laptops and phones and whatever, in order to free up one byte for AI. So, yeah, what does this imply for the next year or two? Sorry for the run-on question. I think in your newsletter, you said thirty percent of the CapEx in 2026 of big tech is going towards memory.
- DPDylan Patel
Yes.
- DPDwarkesh Patel
That's insane, right?
- DPDylan Patel
Yeah.
- DPDwarkesh Patel
Like of the six hundred billion or whatever, you're saying thirty percent is going just to, uh, just to-
- DPDylan Patel
And, you know, obviously there's some level of like margin stacking that Nvidia does, and so if you separate out-
- DPDwarkesh Patel
Yeah
- DPDylan Patel
... you know, and you apply their margin to the memory and the logic, but at the end of the day, yeah, like a third of their CapEx is going to memory.
- 1:42:53 – 1:55:03
Scaling power in the US will not be a problem
- DPDwarkesh Patel
Okay. Let me ask you about power now. So it sounds like you think power can be arbitrarily scaled. Um-
- DPDylan Patel
Not arbitrarily, but yes.
- DPDwarkesh Patel
But beyond these numbers. And, um, I think, if I'm remembering correctly, your blog post on power, how AI labs are increasing power, you were implying that, uh, GE Vernova and Mitsubishi and, uh, Siemens could produce, in gas turbines, like 60 gigawatts a year. And then there's these other sources, but they're less significant than the turbines. And only a fraction of that goes to AI, I assume. So, yeah, if in 2030 we have enough logic and memory to do 200 gigawatts a year, do you think that these things are on a path to ramp up to more than 200 gigawatts a year, or what do you see?
- DPDylan Patel
Yeah. So I mean, I mean, right now we're at 30, right?
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Um, or 20... So this is critical IT capacity, by the way, right? This is an important thing to mention. When I'm talking about these gigawatts, I'm talking about critical IT capacity: server plugged in, that's how much power it pulls. But there's losses along the chain, right? There are losses on transmission. There are losses on conversion. Uh, there's losses on cooling, et cetera. And so you should gross this figure up, you know, from 20 gigawatts for this year, or 200 gigawatts by the end of the decade, um, to some number 20, 30% higher. And then you have capacity factors, right? Turbines don't run at 100%. In fact, if you look at PJM, uh, which is the largest grid, I think, in America, um, sort of the Midwest, sort of Northeast kind of area-ish, not the full Northeast. But anyways, PJM, in their models they rate it like, "Hey, turbines, how much capacity? We wanna have excess, you know, roughly 20% capacity." Uh, and in addition to that 20% excess capacity, we're running all the turbines at 90%, because they are derated some for reliability: oh, things go down, maintenance, et cetera, et cetera, et cetera. So in reality, the nameplate capacity for, uh, energy is always way higher than the actual end critical IT capacity, because of all of these factors.
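Grossing that up with the rough factors just cited; this is illustrative arithmetic, not a grid planning model:

```python
# Grossing critical IT capacity up to nameplate generation, using the rough
# factors cited above. Illustrative arithmetic, not a grid planning model.
critical_it_gw = 20      # servers' actual plug load
overhead = 1.25          # +20-30% for transmission, conversion, cooling losses
reserve = 1.20           # ~20% excess capacity margin (PJM-style planning)
derate = 0.90            # turbines run ~90%, derated for maintenance/reliability
nameplate_gw = critical_it_gw * overhead * reserve / derate
print(f"~{nameplate_gw:.0f} GW nameplate for {critical_it_gw} GW critical IT")
# ~33 GW: nameplate always lands well above the end critical IT number.
```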
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Um, but it's not just turbines, right? If you're just making power from turbines, like, that's simple, boring, easy, right? Um, we're, you know, humans, and capitalism is far more effective. And so the whole point of that blog was, yes, there's only three people making combined cycle gas turbines, but there's so much more we can do, right? We can do aeroderivatives, right? We can take airplane engines and turn them into, uh, turbines as well, and there's even new entrants in the market, like Boom Supersonic's trying to do that, right? And they're working with Crusoe, and there's also all the other ones that already exist in the market. There's, um, there's medium speed reciprocating engines, right? Engines that spin in circles, right? Sort of like any diesel engine, right? There's, like, 10 people who make engines that way, right? So Cummins, you know... At least, I'm from Georgia, and, you know, people used to be like, "Oh man, you got a Cummins engine in there," um, you know, regarding Ram trucks. But it's like, well, actually automobile manufacturing's going down. These companies all have capacity and could scale and convert that to data center power, right? Stick in all these reciprocating engines. Yes, it's not as clean as combined cycle. Maybe you can convert them from diesel to gas if you want. Um, but at the end of the day, these spinning engines... Oh, what about ship engines, right? All of these engines for these massive cargo ships, those are great. Nebius is doing that for a data center for Microsoft, um, in New Jersey, right? They're running these ship engines to generate power. Oh, there's, um, you know, Bloom Energy's doing, uh, fuel cells. We've been, like, very positive on them for, like, a year and a half now, um, because they have such a capability to increase their production, um, and their payback period for production increases is very fast, even if the cost is a little bit higher than combined cycle, which is, like, the best cost and efficiency. Um, you know, and then there's solar plus battery, which, as these cost curves continue to come down, those can come online. There's wind, and, you know, of course, the derating of those... hey, when you put on a wind turbine, you might say, "Oh, I'm only gonna expect 15% of the maximum power, because things just oscillate." But yeah, batteries, there's all these things. And then the other thing is that the grid is scaled for, um, you know, hey, we're not gonna cut off power at peak usage, which is, like, the hottest day in the summer. Um, but in reality, that's a load spike that is 10, 15, 20% higher than the average. Well, if you just put in enough utility scale batteries, or you put in peaker plants that only run a small portion of the year, and those could be gas, they could be industrial gas turbines, they could be combined cycle, they could be any of the other sources of power I mentioned, um, they could be batteries, then all of a sudden you've unlocked 20% of the US grid for data centers, because most of the time that capacity is sitting idle, and it's really only there for that peak, right? Which is a day or two, right? Maybe a few hours across a few days of the full year is that peak. And so you just have enough capacity to absorb that peak load, and all of a sudden you've freed all that capacity up.
And today, data centers are only 3, 4% of the power of the US grid, and by '28 they'll be 10%. But if you can just unlock 20% of the US grid like this, it's, like, not that crazy. Um, you know, and the US grid is terawatt level, not hundreds of gigawatts level, right? So we can add a lot more energy. It's not easy. I'm not saying it's easy. These things are gonna be hard. There's a lot of hard engineering. There's a lot of risks that people have to take. There's a lot of new technologies people have to use. But Elon was the first to do this behind-the-meter gas. Um, and since then we've seen an explosion of different things that people are doing to get power, and they're not easy, but people are gonna be able to do them, and the supply chains are just way more simple than chips.
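The peak-shaving argument in numbers; the shares are the rough figures from the conversation, and the exact unlocked fraction would vary grid by grid:

```python
# Peak-shaving headroom sketch, using the rough shares cited above.
us_grid_gw = 1_000       # "terawatt level"
peak_headroom = 0.20     # capacity held back for the hottest-day load spike
dc_share_today = 0.035   # data centers at ~3-4% of US grid power today
dc_share_2028 = 0.10     # ~10% by '28
print(f"Headroom if peaks are shaved: ~{us_grid_gw * peak_headroom:.0f} GW")
print(f"Data centers today: ~{us_grid_gw * dc_share_today:.0f} GW, "
      f"by '28: ~{us_grid_gw * dc_share_2028:.0f} GW")
```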
- DPDwarkesh Patel
Interesting. So I, I, I guess, uh, he made the point during the interview that the specific blade for the specific turbine he was looking at, the lead times for that go out beyond 2030. And your point is that-
- DPDylan Patel
That's great. There's so many other ways to make energy.
- DPDwarkesh Patel
Okay. So you're saying-
- DPDylan Patel
Like just be inefficient, like it's fine. [chuckles]
- DPDwarkesh Patel
Right. So, like, right now, I guess, combined cycle gas turbines have CapEx of $1,500 per kilowatt, and you're saying it would make sense to have either technologies that are much more expensive than that, or other things are getting cheap enough to make it competitive?
- DPDylan Patel
Exactly. Exactly.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
You know, it can be as high as $3,500 per kilowatt, uh, even, right? So it could be twice as much as the cost of combined cycle, and the total cost of the GPU, you know, on a TCO basis, has gone up a few cents per hour.
- DPDwarkesh Patel
Right.
- DPDylan Patel
Right? Again, because we've been talking about, uh, Hopper pricing: $1.40, and now, oh, the power price doubles. Okay, the Hopper that was $1.40 is now $1.50 in cost.
- DPDwarkesh Patel
Right.
- DPDylan Patel
It's like, oh, I don't care because the models are improving so fast that the marginal utility of them is worth way more than that 10 cent increase in energy.
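A sketch of why doubling power CapEx barely moves GPU economics; the $/kW figures are from the exchange, while the per-GPU power draw and amortization assumptions are mine:

```python
# Why doubling power CapEx barely moves GPU TCO. The $/kW figures are from
# the exchange; the per-GPU draw and amortization period are assumptions.
baseline_per_kw = 1_500      # combined cycle CapEx, $/kW
expensive_per_kw = 3_500     # pricier behind-the-meter alternative, $/kW
gpu_kw = 1.4                 # assumed ~1.4 kW per GPU including overhead
hours = 5 * 8_760 * 0.8      # assumed 5-year life at 80% utilization
delta = (expensive_per_kw - baseline_per_kw) * gpu_kw / hours
print(f"Extra cost: ~${delta:.2f} per GPU-hour")   # on the order of a dime
print(f"$1.40/hr Hopper -> ~${1.40 + delta:.2f}/hr")
```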
- DPDwarkesh Patel
Right. Okay. And then you're saying 20% of the grid, so about one terawatt, uh, 20% of that can kinda just come, uh, online from utility scale batteries increasing, um, what you'd be comfortable putting on the grid.
- DPDylan Patel
The regulatory mechanism there is like not easy, by the way. But yeah
- DPDwarkesh Patel
But, but like that's two hundred gigawatts, like if that hypothetically happens. But you're saying on just from the different sources of gas generation you mentioned, the different kinds of engines and turbines, um, combined, how, how many gigawatts could they unlock by the end of the decade?
- DPDylan Patel
Yeah. So we're tracking, uh, in some of our data, where all... You know, there's over 16 different manufacturers of power-generating, uh, things just from gas alone, right? So, you know, yes, there's only three turbine manufacturers for combined cycle. Um, but we're tracking 16 different vendors, and we have all of their orders and things like that. And it turns out there are just hundreds of gigawatts of orders to various data centers. As we get to the end of the decade, we think something like half of the capacity that's being added will be behind the meter. Um, and when we look at it... Actually, behind the meter is almost always more expensive than grid connected, but there's just a lot of problems with getting grid connected and, uh-
- DPDwarkesh Patel
Right
- DPDylan Patel
... you know, permits and interconnection queues and all this sort of stuff. So even though it's more expensive, people are doing behind the meter, and what they're doing behind the meter ranges widely, right? It could be reciprocating engines, it could be ship engines, it could be aeroderivatives, it could be combined cycle, although combined cycle's not that great for behind the meter. Um, it could be, uh, Bloom Energy fuel cells. It could be solar plus battery, right? Like, it could be any of these things.
- DPDwarkesh Patel
And you're saying e- e- any of these individually could do like tens of gigawatts?
- DPDylan Patel
Any of these individually will do tens of gigawatts, and in whole, they will do hundreds of gigawatts.
- DPDwarkesh Patel
Okay. So that, that alone should more than, um-
- DPDylan Patel
I mean, it's gonna take... I mean, like, electrician wages are probably gonna double or triple again, right?
- 1:55:03 – 2:14:26
Space GPUs aren't happening this decade
- DPDwarkesh Patel
Elon Musk is very bullish on space GPUs. If you're right that power is not a constraint on Earth, that physically there will be enough gas turbines or whatever to build it on Earth, then I think Elon's next argument is, like, you can't get the permitting to build hundreds of gigawatts on Earth. Do you buy that argument?
- DPDylan Patel
Land-wise, it's pretty... America's big. There does-- Data centers don't take that much space.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
You can solve that. Um, permitting-wise, air pollution permits are a challenge, but the Trump administration's made it much easier. You go to Texas, and you can skip a lot of this red tape. Uh, and so, you know, Elon had to deal with a lot of this complex stuff in Memphis, and then building a power plant across the border, um, and all these things for Colossus One and Two. But at the end of the day, there's a lot more you can get away with in the middle of Texas, right?
- DPDwarkesh Patel
But why-- Given that Elon lives in Texas, why didn't he just go to Texas?
- DPDylan Patel
I think, I think it was partially like they over-indexed on grid power-
- DPDwarkesh Patel
Ah.
- DPDylan Patel
-for a temporary period of time, right? Because that's just what they, they thought they needed more of and then-
- DPDwarkesh Patel
They just had an aluminum refinery connected to the grid there.
- DPDylan Patel
Uh, no. It's a, it was a, it was an appliance factory that was-
- DPDwarkesh Patel
Oh, sorry. Never mind.
- DPDylan Patel
-that was, uh, idled. Um, but I think they may have indexed more to what had grid power. They may have indexed more to, like, water access and gas access, because... Actually, I think they bought that knowing that the gas line was right there and they were gonna tap it. Same with water. Um, it was a whole host of different constraints. It was probably an area where electricians and things like that were easier to find. But at the end of the day, I'm not exactly sure why they chose that site. I bet Elon would've chosen somewhere in Texas if he could've gone back. But yeah, because of the regulatory, uh, challenges he's faced... Ultimately, like, permitting is a challenge, but America is a big place, and there are 50 states, and things will get done. And there are a lot of small jurisdictions where you can just transport in all of the workers that you need for a temporary period of six months to a year, it can be even three months, depending on the type of contractor that's coming in, and put them in temporary housing, and pay out the butt, because labor is very cheap relative to the GPUs and the... not the power, but the GPUs and, like, the networking and so on and so forth, and the end value of the tokens it's gonna produce. So all of these things have plenty of room to, like, be paid for. Um, and so I think it's fine, right? And also, people are diversifying now, right? Australia, uh, Malaysia, Indonesia, India, these are all places where data centers are going up at a much faster pace. But currently still seventy percent plus of the AI data centers are in America, and that continues to be the trend. And so I think people are figuring out how to build these things. And, like, ultimately, permitting and red tape in middle-of-nowhere Texas or middle-of-nowhere Wyoming or middle-of-nowhere, like, New Mexico is probably a hell of a lot easier than sending stuff into space.
- DPDwarkesh Patel
Right. Well, other than the fact that the economic argument, uh, makes less sense once you consider that energy is a small fraction of the cost of ownership of a data center, what are the other reasons you're skeptical?
- DPDylan Patel
Yeah. So obviously power's free in space, basically. Uh-
- DPDwarkesh Patel
Yeah. No, no, so that's the reason to do it.
- DPDylan Patel
Yeah, that's the reason to do it. But then there's all the other counterarguments, right? Which is, even if power costs double, you're still at a fraction of the total cost of the GPU. The main challenge is... And we've seen what differentiates, right? We have ClusterMax, which rates all the neoclouds, and we test them. We test over 40 cloud companies, including the hyperscalers and neoclouds. What differentiates some of these clouds the most, outside of software, is their ability to deploy and manage failure, right? GPUs are horrendously unreliable. Uh, even today, 15% of Blackwells or so that get deployed have to be RMA'd. You have to take them out. You have to, you know, maybe just unplug them and plug them back in. But sometimes you have to take them out and ship them to Nvidia, or rather their partners who do these RMAs and such. Um-
- DPDwarkesh Patel
What do you make of Elon's kind of argument that once you're past the initial, um, the initial phase, they actually don't fail that much?
- DPDylan Patel
Sure. But now you've done this: you've tested them all, you deconstructed them, put them on a spaceship, fucking put them into space, and then put them online again. That's months, right? And if your argument is that, you know, hey, GPUs have a useful life of X years, right? If a GPU has a useful life of five years, and it takes three additional months, probably six, let's say six additional months, then that is 10% of your cluster's-
- DPDwarkesh Patel
Right.
- DPDylan Patel
-useful life.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
And, and because we're so capacity constrained, that compute is most valuable theoretically in the first six months you have it because we're more constrained now than in the future because that compute now can contribute to a better model in the future-
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
-or contribute to revenue now, which you can use to raise more money to get be- you know, all these, all these sorts of things. Now is always the most important moment. And so you've delayed your compute deployment by six months potentially. And, and the thing that separates these clouds is we see clouds that take six months to deploy GPUs today on Earth, right? We see clouds that take a lot less than six months, right? And so the question is, where does space get in there? I don't see how you would test them all on Earth, deconstruct them, and ship, ship them-
- DPDwarkesh Patel
Right. Right. Right.
- DPDylan Patel
-and shoot them into space and it not take longer than just putting them in the spot that you were testing them.
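The useful-life argument above, in one line of arithmetic; the five-year life is the figure cited, and the six-month delay is the assumed extra time to test, launch, and redeploy:

```python
# Dylan's useful-life argument in one line of arithmetic.
useful_life_months = 5 * 12   # a five-year GPU life, as cited
delay_months = 6              # assumed extra time to test, launch, redeploy
print(f"~{delay_months / useful_life_months:.0%} of useful life lost")  # 10%
# And since compute is most valuable early (training the next model, earning
# revenue now), the effective loss is worse than the raw 10%.
```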
- DPDwarkesh Patel
Yeah. So the question I wanted to ask is about the topology of space communication. So, um, right now, uh, Starlink satellites talk to each other at 100, uh, gigabits per second. And you could imagine that being much higher with optical intersatellite laser links that are optimized for this. Um, and that actually ends up being, like, quite close to the InfiniBand bandwidth, which is like 400 gigabits a second, right?
- DPDylan Patel
But that's per GPU, not per rack.
- DPDwarkesh Patel
I see. Okay.
- DPDylan Patel
So, so multiply that by 72. Also, like, that was Hopper. When you go to Blackwell and Rubin, that two X's and two X's again.
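Scaling that up, per the exchange; the per-GPU and per-rack figures follow directly from the numbers just cited, with the NVL72-style rack size as the assumption:

```python
# Per-rack bandwidth versus optical intersatellite links, per the exchange.
per_gpu_gbps = 400        # InfiniBand per GPU in the Hopper era, Gb/s
gpus_per_rack = 72        # an NVL72-style rack (assumed)
laser_link_gbps = 100     # the Starlink laser-link figure mentioned above
hopper_rack = per_gpu_gbps * gpus_per_rack      # 28,800 Gb/s
rubin_rack = hopper_rack * 2 * 2                # "two X's and two X's again"
print(f"Hopper rack: {hopper_rack:,} Gb/s (~{hopper_rack // laser_link_gbps}x a laser link)")
print(f"Rubin-era rack: {rubin_rack:,} Gb/s (~{rubin_rack // laser_link_gbps}x)")
```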
- 2:14:26 – 2:18:49
Why aren’t more hedge funds making the AGI trade?
- DPDwarkesh Patel
Spicy question. Um, you know, SemiAnalysis sells these spreadsheets, and you're always like, "Ah, six months ago or a year ago, we told people about the memory crunch," or now you're telling people about the clean room crunch, and in the future, the tool crunch. Um, why is Leopold the only person that is using your spreadsheets to make outrageous money?
- DPDylan Patel
Um-
- DPDwarkesh Patel
What, what is everybody else doing? [chuckles]
- DPDylan Patel
I think, I think there are a lot of people making money in many ways. I think obviously Leopold, Leopold jokes that, you know, he's the only client of mine that tells me our numbers are too low.
- DPDwarkesh Patel
[chuckles]
- DPDylan Patel
Everyone else tells me our numbers are too high, uh, almost ad nauseam. Um, you know, whether it's a hyperscaler saying, "Hey, that other hyperscaler, their numbers are too high," you know, and we're like, "Nah, that's it." And they're like, "No, no, no, no, it's impossible," blah, blah, blah. And then you, like, finally have to convince them through all these facts and data, when we're working with hyperscalers or AI labs, that in fact, no, that number isn't too high, um, that's correct. And eventually... like, sometimes it takes them six months to realize, or a year. Um, I think other clients, like on the trading side, also use our data, right? We sell data to a lot of, um... you know, I think roughly 60% of my business is industry, so AI labs, data center companies, hyperscalers, semiconductor companies, uh, you know, the whole supply chain across AI infrastructure. Um, but then, like, 40% of our revenue is, like, hedge funds, right? And, you know, I'm not gonna comment on who our customers are, but I think a lot of people use the data; it's just how do you interpret it, and then what do you, like, view beyond it? And I will say Leopold is pretty much the only person who always tells me my numbers are too low. Um, and sometimes he's too high, sometimes I'm too low, right? Uh, but in general, I think other people are, you know, doing that, and you can check certain... You can look across the space at hedge funds and look at their 13Fs, and see actually they own maybe not exactly what Leopold does, uh, because it's always, like, a question of what is the most constrained thing? What's the thing that's most outside of expectations? And that's what you're really trying to exploit: inefficiencies in the market. And in a sense, what our data does is, like, make the market more efficient, by making the base data, uh, of what's happening more accurate. But in a sense, I think many, many funds do trade on information, um, that is out there. I don't think Leopold's the only person. I think he has the most conviction, uh, about the entire AGI takeoff, though, right?
- DPDwarkesh Patel
Right. I mean, but the bets are not about, like, what happens in 2035. The bets that you're making, that are at least exemplified by public returns we can see for different funds, including Leopold's, are about what has happened in the last year, and the last year's stuff could be predicted using your spreadsheets, right? So it's, um, it's less about the far future; it's about buying, like, next year's spreadsheets.
- DPDylan Patel
They're not just, they're not just spreadsheets. You know, there's reports-
- DPDwarkesh Patel
Yeah, yeah
- DPDylan Patel
... there's API access to the data.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
There's a lot of data. But anyways, you know, I think-
- DPDwarkesh Patel
But do, do you see what I mean? Like, it's like he, he... It's not about some crazy singularity thing. It's about like, oh-
- DPDylan Patel
It's a simple-
- DPDwarkesh Patel
... okay, do you buy the memory crunch?
- DPDylan Patel
A simple one, though, is, like, you only buy the memory crunch if you believe AI is gonna take off in a huge way. And, um, the memory crunch, a lot of it was predicated on... like, you know, at least for people in the Bay Area who think about infrastructure, it's, like, obvious. KV cache explodes as context lengths go longer-
- DPDwarkesh Patel
Right
- DPDylan Patel
... so you need more memory, and then you do the math, and you... And you also have to have a lot of supply chain understanding of, like, what fabs are being built and what data centers are being built, and how many chips, and all these things. And so we track all these different data sets very tightly, but at the end of the day, it takes, you know, someone to fully believe that this is gonna happen. Like, I think a year ago, if you told someone memory prices were gonna quadruple and smartphone volumes were gonna go down 40%, um, you know, over the year or two after that, people were like, "You're crazy. That'll never happen." Except a few people did believe that, and those people did trade memory, right? And people did. I don't think, like, Leopold was the only person buying, like, memory companies. I think there were a lot of people buying memory companies. He, of course, sized and positioned and did things in a better way than some, um, maybe most, right? I don't wanna comment on whose returns are what. Um, but he certainly did well. Um, but other people also did really well, right? Um, trying to be, like, this... Wow, you've made me diplomatic for the first time ever.
- DPDwarkesh Patel
[laughs] Yeah.
- DPDylan Patel
No, no, you're fine. You're fine. I think it's hilarious, right? I'm being a diplomat, you know, whereas usually I'm like spicy. [laughs]
- DPDwarkesh Patel
Yeah. Okay, uh, maybe some rapid fire, uh, to, to close out. Um,
- 2:18:49 – 2:24:35
Will TSMC kick Apple out from N2?
- DPDwarkesh Patel
can TSMC... if you're saying, look, the memory, logic, et cetera... N3 is mostly gonna be AI accelerators, but then there's N2, which is mostly Apple now, and then in the future, I guess, AI would also wanna go on N2. Can they kick out Apple, if Nvidia and Amazon and Google say, "Hey, we really... We're willing to pay a lot of money for N2 capacity"?
- DPDylan Patel
So I think the challenge with this is that chip design timelines take a long while. So that's more than a year, and the designs that are on 2 nanometer are more than a year out.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
And so what would really happen is Apple... or sorry, Nvidia and all these others will be like, "Hey, we're gonna prepay for the capacity, and you're gonna expand it for us." And maybe TSMC takes a little bit of margin, but not a ton. They're not gonna kick Apple out entirely, right? What they're gonna do is, when Apple orders X, they might say, "Hey, we project you only need Y, or X minus one, and so what we're gonna give you is X minus one," and then on that flex capacity, Apple's kinda screwed. Um, whereas traditionally, Apple's always over-ordered by, like, 10% and cut back by 10% over the course of the year. And some years they hit the entire 10%; volumes just vary, right? Based on the season and macro, blah, blah, blah.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Um, and so I don't think TSMC would kick out Apple. I think Apple will become a smaller and smaller and smaller percentage of TSMC's revenue, and therefore be less relevant for TSMC to cater to their demands. And TSMC could eventually start saying, "Hey, you gotta pre-book your capacity two years out, and you have to prepay for the CapEx," 'cause that's what Nvidia and Amazon and Google are doing.
Episode duration: 2:31:03