No PriorsNo Priors Ep. 127 | With SemiAnalysis Founder and CEO Dylan Patel
EVERY SPOKEN WORD
- 0:00 – 0:31
Dylan Patel Introduction
- SGSarah Guo
(instrumental music) Hi, listeners. Welcome back to No Priors. Today, I'm here with Dylan Patel, the chief analyst at SemiAnalysis, a leading source for anyone interested in chips and AI infrastructure. We talk about open source models, the bottlenecks to building a data center the size of Manhattan, geopolitics, and poker as a tell for entrepreneurship. Welcome, Dylan. Dylan, thank you so much for being here.
- DPDylan Patel
Thank you for having me.
- SGSarah Guo
I've been really looking forward to this conversation. Um, you're such a deep thinker about this
- 0:31 – 2:10
Dylan’s Love for Android Products
- SGSarah Guo
space. And then also, it's very odd, you clearly have the Samsung watch.
- DPDylan Patel
Yeah. I- I got the Blink-
- SGSarah Guo
The folding phone.
- DPDylan Patel
... I got the Blink, I got the-
- SGSarah Guo
And the laptop.
- DPDylan Patel
... the Fold. Yeah, yeah.
- SGSarah Guo
Tell me more.
- DPDylan Patel
So part of the sto- origin story is that I was moderating forums when I was a child, and my dad's first Android phone was the Droid, right?
- SGSarah Guo
Okay, yes.
- DPDylan Patel
And for some reason, I was obsessed with, like, messing with it, like rooting it, like under-clocking it, improving the battery life, all these things, because when we were on a road trip, there's nothing to do besides, like, mess around on this phone. So I posted so much about Android that I became a moderator of slash r/Android on Reddit, and- and like many other subreddits related to hardware and NVIDIA, and Intel, and all this stuff. But because of that, I've just always had Android. Now, I've had work iPhones before, but I just really love Android, and then it's like, if you're gonna like technology, I'm not like someone who pushes it, but like get the best stuff. So I have like the Ultra Samsung watch, which I think looks cool, and the- the folding phone, right? It's fun. It's obviously different and weird. No- no iMessage is- is a travesty.
- SGSarah Guo
What does it dominate at? What is it better at?
- DPDylan Patel
Um-
- SGSarah Guo
Besides the openness of, like, the hackability.
- DPDylan Patel
I don't even hack that much stuff anymore, right? It's like, what do you use your phone for? I think- I think the main thing is, like, you can have, like, Slack and an email up on two different parts of your phone. I think that's probably the main thing. Or like, you can actually use, like, a spreadsheet on a folding phone. You cannot use a spreadsheet on a regular phone.
- SGSarah Guo
Okay.
- DPDylan Patel
And that's not even an Android thing.
- SGSarah Guo
Yeah.
- DPDylan Patel
Like, Apple's folding phone next year will be able to do that just fine, and I'll have no argument then.
- SGSarah Guo
Yeah.
- DPDylan Patel
But I just like it, you know? People- people have their preferences. People are creatures of habit.
- SGSarah Guo
You got to look at the GPU purchasing forecast-
- DPDylan Patel
Yes.
- SGSarah Guo
... on a sheet, on your phone, I think.
- DPDylan Patel
Yes, I do. I do. No. Like, it's like someone's telling you numbers. You're like, "Wait, this is, like, slightly different than my number," right? Like...
- 2:10 – 6:50
Predictions About OpenAI’s Open Source Model
- SGSarah Guo
Okay, so we have a week of big rumored announcements coming up. Tell me your, like, reaction to the OpenAI open source model.
- DPDylan Patel
In theory, it's gonna be amazing, right? Like, I- I assume this is releasing after it's released or...
- SGSarah Guo
Yes.
- DPDylan Patel
(laughs) So that's okay. The open source model is amazing, guys. Like... (laughs) I think the world is going to be really, like, shocked and excited. It's the first time America's had the best open source model in six months, nine months, a year. LLaMA 3.1 405B was the last time we had the best model. And then Mistral took over for a little bit, if I recall correctly, and then the Chinese labs have been dominating for the last, like, six, nine months.
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
Right? So it'll be interesting. It'll also be funny because, like, the open source model probably won't be the best for just regular chat, because it is like more reasoning-focused and all these things. But it'll be really good at code, and that's... I'm excited for that. Yeah. Like tool use, although that's, like, going to be confusing. Like, how do you use the tools if you don't have access to OpenAI's tool use stuff, but the model is trained to do so? That'll be interesting for people to figure out. I think the last thing is, like, the way they're rolling it out is really interesting. They accidentally leaked all the weights, but no one in the open source community has figured out how to actually run inference on it, because there's just some weird stuff in the model with the architecture, like 4-bit and, like, the biases and all this other stuff. But what's interesting is, other companies drop the model weights and say, "Go, make your own inference implementation." But OpenAI is, like, actually, like, dropping the model weights and, like, all these custom kernels for people to implement in inference. So everyone has a very optimized inference stack day one.
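To make the quantization point here concrete, a rough memory sketch. The parameter count and bit widths below are illustrative assumptions, not the actual specs of OpenAI's release:

```python
# Rough memory-footprint arithmetic for serving an open-weights model.
# All numbers are illustrative, not the released model's actual specs.

def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed just for the weights, in GB."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A hypothetical 120B-parameter model:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(120, bits):.0f} GB")
# 16-bit needs ~240 GB (multiple GPUs); 4-bit fits in ~60 GB, i.e. a
# single 80 GB GPU -- which is why low-bit formats change who can
# actually run the model.
```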
- SGSarah Guo
And they work with partners on it too.
- DPDylan Patel
Yeah, working with partners on this. But this is very interesting because, like, when DeepSeek drops, it's like... Well, Together and Fireworks are like, "Yeah, we're the best at inference because we have all these, like, people who are really good at low-level coding," whether it be, like, Fireworks with all their, like, former PyTorch Meta people, or Together with, like, you know, Tri Dao and all the, you know, Dan Fu and all these, like, super cracked, like, kernel people. They have, like, higher performance, right? Uh, but in this case, like, OpenAI is releasing a lot of this stuff. So it's, it's- it's interesting for the inference providers too, like, how do they differentiate now?
- SGSarah Guo
Yeah, I- I mean, my premise on this is, um, in the end, a lot of the model optimization performance layer is open source, and it's a commodity. Um, a- and it will end up being like a fight at the infrastructure level, actually.
- DPDylan Patel
Interesting.
- SGSarah Guo
Um, and so, you know, all of these inference providers, like, as you mentioned, you know, Fireworks and Together and Baseten and such, they- they compete on both dimensions. And the question is, what's going to matter in the long term?
- DPDylan Patel
Why would these model level software optimizations all be open? They haven't been open so far and the advancements are so fast, right? Like...
- SGSarah Guo
Oh, well, I think they... Uh, um, a bunch of them have been partially open, and I think-
- DPDylan Patel
Yeah.
- SGSarah Guo
... O- OpenAI is also pushing for them to be open as well, right? Um, and so I think there's a lot of force in the ecosystem to, um, open source from both, like, the NVIDIA level up-
- DPDylan Patel
Yes, yes.
- SGSarah Guo
... and from the model providers down, right?
- DPDylan Patel
Agreed. Yeah.
- SGSarah Guo
And so, uh, I think today, these providers all fight on that dimension.
- DPDylan Patel
Yeah.
- SGSarah Guo
And they also fight on the infrastructure dimension, and I think infrastructure is going to end up being a bigger differentiator.
- DPDylan Patel
That makes sense, yeah.
- SGSarah Guo
You can't open source your actual infrastructure, right?
- DPDylan Patel
Yeah, yeah.
- SGSarah Guo
You just have to have the network and you have to run it, right?
- DPDylan Patel
Yeah, yeah. That makes a lot of sense. Although, like, I see today, the inference, like, providers have such a wide variance, right? Like, the ones you mentioned are on the, like, the leading edge, especially, like, Together and Fireworks, I think are on the leading edge of, like, their own custom stacks, all the way down to, like, there's a lot of people who just take the out of box open source software.
- SGSarah Guo
Yeah, I think there's no market for that.
- DPDylan Patel
But those guys have just-
- SGSarah Guo
Yeah.
- 6:50 – 10:48
Implications of an American Open Source Model for the Application Ecosystem
- SGSarah Guo
move, uh, uh, you know-
- DPDylan Patel
(laughs)
- SGSarah Guo
... out and a layer down. Like, what does having access to an American open source model mean, or just more and more powerful, like, uh, open source AI models mean for the application ecosystem?
- DPDylan Patel
I mean, I know, like, a lot of people in some enterprises are really iffy about, like, using, like, the best open source model. They're, like, worried. It's like, there's nothing wrong with them today. There's nothing in them today, right? You know, there's the worry that one day they will-
- SGSarah Guo
How do you check?
- DPDylan Patel
I mean-
- SGSarah Guo
How do you know?
- DPDylan Patel
... you don't, but you can just vibes it out. Like, they're, like, competing with each other to just release as fast as possible, right? Like, like, DeepSeek and Moonshot and all these other la- you know, Alibaba, et cetera. Like, they're competing to release as fast as they can with each other. The Alibaba team's in Singapore. Like, I don't think that they're, like, putting Trojan horses in these models, right? And like, there's some interesting papers that Anthropic did on like, you know, trying to embed some stuff in models and ended up, like, being detectable pretty easily. Again, like, I don't know how to... You know, I'm not, I'm not too much into that space of interpretability and, like, evals, but I just don't think that they are, right? It's just a vibes thing. But some people are worried that they could be, or they're just, like, iffy. Like, "Oh, I don't wanna use a Chinese model." It's like, well, fine, but now you're gonna go use a service that is backed by a Chinese model, which is fine. Like, you know, like, uh, but they, you know, they're fine with that, they just don't wanna directly use the model. I don't know, I think, I think it's, it's interesting for some enterprises who are still stuck on LLaMA, but it's mostly just really interesting because it continues to move the commodity bar up. Now, with this tier being open source, and sure, like, probably won't be, like, drastically better than Kimi, but Kimi is so big, it's so difficult to run. Like, people aren't running it, whereas the OpenAI model's, like, relatively small, so you can run it without being, like, giga brain of infrastructure. You end up with that commoditizing so much more of the closed source API market. I don't know, I think that's just gonna be great for adoption, right? Like...
- SGSarah Guo
Yeah. One of my, um, hopes is, uh, for our companies that are doing more with reasoning, it is, like, they're still blocked on cost and latency.
- DPDylan Patel
So, this is something that I've found very interesting, is that w- we, we've been trying to build a lot of alternative data sources for token usage. Um, who's using what tokens, what models, where, et cetera, why? And it's very clear that people aren't actually using the reasoning models that much in API. Like, Anthropic has eclipsed OpenAI in API revenue, and their API revenue is primarily not thinking. It's Claude 4, but it's not in the thinking mode. You know, code is, code being the biggest, uh, use case that's skyrocketing. Um, and the same applies to, like, OpenAI and, and DeepMind, and, and from what we see, querying big users and other ways of, like, scraping alternative data, because the latency issues, because the cost issues especially, right? The cost is just ridiculous.
- SGSarah Guo
You're... Exactly. So I guess my view is, um, you're not allowed to have a tech podcast without saying the words Jevons Paradox now. And I, I think, like-
- DPDylan Patel
Oh, nice.
- SGSarah Guo
... I, I think the behavior is gonna be, like, we see a lot of people use reasoning because it's so much cheaper to run if you take out a big piece of the margin layer and you make it smaller. And so I think, like, we have a lot of companies that are at scale who are using it, but it's so expensive that they restrain themselves.
- DPDylan Patel
For a long time, OpenAI was charging more per token for the reasoning model, right, o1 and o3, uh, than they were for GPT-4o even though the architecture's, like, basically the same. It's just the weights are different. And there's, like, some reason for it to be a little bit more expensive per token because the context length is on average longer. But in general, like, it made no sense for it to be, like, what was it, like, 4X the cost per token? That didn't make any sense. And then finally they, like, cut it. Uh, but for a long time, not only was it, like, way more tokens outputted, it was also a way higher price per token, even though... And they were just taking that as margin.
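The margin math being described compounds two factors: more output tokens and a higher per-token price. A sketch with made-up prices, not OpenAI's actual rates:

```python
# Back-of-envelope on why reasoning-model pricing stung: reply cost is
# (output tokens) x (price per token), and reasoning models inflate
# BOTH factors. Prices below are hypothetical.

def reply_cost(output_tokens: int, price_per_million: float) -> float:
    """Cost in dollars of one reply, given $/1M-token pricing."""
    return output_tokens * price_per_million / 1e6

chat = reply_cost(output_tokens=500, price_per_million=10.0)        # plain model
reasoning = reply_cost(output_tokens=5_000, price_per_million=40.0)  # 10x tokens, 4x price
print(f"chat: ${chat:.3f}, reasoning: ${reasoning:.3f}, ratio: {reasoning / chat:.0f}x")
# The reasoning reply costs 40x more, even though the per-token price
# is "only" 4x higher.
```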
- SGSarah Guo
Because they could, right?
- DPDylan Patel
Yeah.
- SGSarah Guo
Because they had the only thing out there.
- DPDylan Patel
Yeah.
- SGSarah Guo
Yeah.
- DPDylan Patel
And then, you know, DeepSeek dropped and, and Anthropic and Google and others started releasing models and it, like, you know, commoditized quite a bit. But this is gonna just, like, kneecap, like, like, take, cut everyone off at the hip, right? Uh, and bring margins down again. So that'll be fun.
- SGSarah Guo
Oh, who has an API business, you mean?
- DPDylan Patel
Yeah, yeah.
- SGSarah Guo
Yeah, yeah.
- DPDylan Patel
For API for models that aren't, like, like, super leading edge.
- 10:48 – 17:26
Evolution of Neoclouds
- SGSarah Guo
What do you think, uh, evolves in the sort of neo-cloud layer over time?
- DPDylan Patel
It's funny, every day we still find a new neo-cloud. Like, we have, like, like, 200 now. And still every day we find new ones, right? Like-
- SGSarah Guo
Should they all exist?
- DPDylan Patel
(laughs) Obviously not, right? Like-
- SGSarah Guo
Okay.
- DPDylan Patel
So, so to, to some extent it depends on what the neo-cloud business is, right? Like today, it's, there is quite a bit of differentiation between the neo-clouds. It's not just, like, buy a GPU, put it in a data center. Otherwise you wouldn't have some neo-clouds with, like, horrible utilization rate, and you wouldn't have some neo-clouds who are, like, completely sold out on four, five, six-year contracts, right? Like CoreWeave, for example, who doesn't even quote most startups, or they just give them a stupid quote 'cause they're just like, "I don't want your business." Or, like, they want a long-term contract, right? Which a lot of people don't wanna sign. And so, like, there's quite a bit of differentiation in, in financial performance of these neo-clouds, time to deploy, reliability, the software they're putting on top, right? Like, many of them can't even install Slurm for you. It's like, what are you doing? Like, and you should have some sort of, like, Kubernetes-
- SGSarah Guo
So, very low-level hardware management, yeah.
- DPDylan Patel
Yeah, yeah. It's, like, very... And it's like, to some extent from the investor side we see a lot more debt and equity flowing in from the commercial real estate folks.
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
As commercial real estate has been really poor over the last couple years, few years, they've been starting to pour money into cloud space. And obviously the return profile is quite different because it's, like, a short-lived asset versus, like, a longer lived asset. But at the end of the day, like, these companies, they're okay with a 10, 15% return on equity, right? Uh, and, and over time that's falling. That is... not okay for venture capital, right? And yet, a lot of these neoclouds are backed by venture capital. So a lot of these companies will fail, either because it no longer makes sense for them to continue to get venture funding, or they end up getting out-competed because they just can't get their utilization up, unlike, you know, some other clouds, right? Like, like the, like the, uh, CoreWeaves and Crusoes and, and such of the world, right? So there, there's sort of, like, a rock and a hard place for a hundred of these neoclouds. And, and there's many of them who are, like, "Oh, no, I purchased these GPUs. I have a loan. It costs me this much, and because my utilization is here, I'm, like, burning cash."
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
Right? And they, they should a- the very least not be burning cash, right? And so some of them are, like, you know, they're desperate to sell the remaining GPUs, so they go out to, like, you know, companies and give them insanely low deals. There, there are some startups who I really commend because they, like, really figured out how to get the desperate neoclouds to give them GPUs. But those neoclouds are gonna go bankrupt at some point because their cash flow is worse than their debt payment. But at the end of the day, like, there's gonna be a lot of consolidation. There is gonna be differentiation, right? There's a lot of software today, uh, but we have this, like, thing called ClusterMax where we review all the neoclouds, um, and, and major clouds, and it's like, like actually, some of these neoclouds are better than Amazon and Google and Microsoft in terms of software.
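The "burning cash" trap described above is just rental revenue versus debt service. A toy model with invented numbers, not data on any real provider:

```python
# Toy neocloud cash-flow model: does renting the GPU cover the loan?
# Every number here is a made-up illustration of the dynamic, not a
# real provider's economics.

def monthly_cash_flow(gpu_capex: float, loan_rate: float, term_years: int,
                      rental_per_hour: float, utilization: float) -> float:
    months = term_years * 12
    r = loan_rate / 12
    # Standard amortized loan payment.
    debt_payment = gpu_capex * r / (1 - (1 + r) ** -months)
    revenue = rental_per_hour * utilization * 730  # ~730 hours/month
    return revenue - debt_payment

# Hypothetical $30k GPU, 12% debt, 4-year term, $2/hr list price:
for util in (0.9, 0.5):
    print(f"utilization {util:.0%}: ${monthly_cash_flow(30_000, 0.12, 4, 2.0, util):+.0f}/month")
# High utilization is cash-flow positive; at 50% the debt payment
# exceeds revenue -- the point where desperate fire-sale deals start.
```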
- SGSarah Guo
In terms of, um, uptime and availability, or however you, uh, measure that.
- DPDylan Patel
Yeah, uptime, availability-
- SGSarah Guo
Yeah.
- DPDylan Patel
... um, reliability, network performance. Like, there's just a variety of things, and they don't have all the old baggage. Uh, but the vast majority are worse, and we, we measure across, like, you know, a bunch of different metrics, including the ones I mentioned, and security and so on and so forth. But our vision of, like, ClusterMax is that it starts at, like, a really low stage today, which is, like, does the cloud work? And how long does it take the user to, like, get a workload running? Because you have Slurm installed or you have K8s installed and, um, you know, your network performance is good and your reliability is good and it's secure, right?
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
Like, these are, like, table stakes. Like, what we consider gold or platinum tier today will be just, like, table stakes in, like, you know, six months, a year, a couple of years. Uh, there'll be a whole layer of, like, software on top, and then it's, like, do neoclouds build this software, right?
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
And some of them are, right? Like Together, Nebius, um, are offering inference services on top, right?
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
So they are, they are saying, "Hey, we actually wanna provide an API endpoint, not just rent GPUs." And CoreWeave, r- rumored by The Information to be, uh, attempting to buy Fireworks for the same reason, right? Like, do you move up or do you just slide down into, like, "I'm making commercial real estate returns"? Or you have to go crazy, right? Like, Crusoe's like, "We're gonna build gigawatt data centers," right? Like, okay, there's no competition there. There's, like, a few companies doing that, right? So it's very different. So you either have to go, like, really, really big or you need to move into the software layer, or you just make commercial real estate, uh, or you go bankrupt, right?
- SGSarah Guo
(laughs)
- DPDylan Patel
Like, these are the paths for all neoclouds, I think.
- SGSarah Guo
I really have to believe there's a, a reason for being for these companies. And my, like, simple framework for it is, I think the software layer is really hard for people coming from this operation to, to try and build, right? There's actually a lot of very specialized software, so I think people will buy or partner into it.
- DPDylan Patel
Yeah.
- SGSarah Guo
But if you think about other inputs, it could be, like, "I'm very good at, like, finding and controlling power agreements," right?
- DPDylan Patel
Yeah.
- SGSarah Guo
Um, it could be like, "I build at a scale other people are incapable of doing so," as you mentioned.
- 17:26 – 27:43
What It Would Take to Challenge Nvidia
- DPDylan Patel
just be able to pay away.
- SGSarah Guo
I mean, I, I feel like the multi-trillion dollar question that you, um, have thought about for perhaps longer than, um, almost anyone else is, like, what does it take to actually challenge NVIDIA? You know, asking for a friend, what would it take? (laughs)
- DPDylan Patel
The, like, you know, simple way to put it is, like, it's a three-headed dragon, right? Like you have, you have ... They're actually just really, really good at, you know, engineering hardware and GPUs. Like, that is difficult. Um, they're really, really good at networking. And then they're really ... I would actually say they're, like, okay at software, but everyone else is just terrible. No one else is even close on software. But, you know, and, and I guess in that argument, you can say they're great at software. But, like, actually, like, you know, installing NVIDIA drivers is not, like, not always easy, right?
- SGSarah Guo
Well, there's great, and there's also just, like, well, there's, like, 20 years plus of work in the ecosystem, right?
- DPDylan Patel
Yeah, yeah. I- yeah. And I think-
- SGSarah Guo
There's today's capability and, like, usability, and there's just, like, mass of, like, libraries. (laughs) Yeah.
- DPDylan Patel
Yeah, so I think NVIDIA is really hard to take down because of those three reasons. And it's like, okay, as a hardware provider, can I do the same thing as NVIDIA and win? No, they're an execution machine, and they have these three different pillars, right? I'm sure they have a lot of margin, but, like, you have to do something different... right? Um, in the case of the hyper-scalers, right, Google, Amze- uh, with TPUs, Amazon with Trainium, Meta with MTIA, they are making a bet of, "I can actually do something pretty similar to NVIDIA." If you squint your eyes now, like Blackwell and TPU is starting... Like, the, the NVIDIA architecture with TPU architectures are actually converging, like same memory hierarchies and similar sizes of systolic arrays. Like, it's actually not that different anymore. It's still quite different, right? Uh, but, hand-wavy, it's, like, pretty similar. And Trainium and TPUs are very similar. Architecturally, the hyper-scalers are not doing anything crazy, but that's okay because they can just, like, do the mass, the margin game. That's fine. But for a chip company to try and compete, they must do something very unique. Now, if you do something unique, it's like, okay, all your energy is focused on that one unique thing, but on every other vector, you're gonna be worse. Like, are you gonna be there at the latest process node as fast as NVIDIA? No? Okay, that's like 20, 30%, right, on cost/performance and power, right? Are you gonna be on the latest memory technology as fast as NVIDIA? No, you'll be, like, a year behind. Great. Same, same penalty. Are you gonna be the same on networking? No? Okay. You know, you just stack all these penalties up, it's like, "Oh, wait, your unique thing can't just be, like, two to 4X faster. It has to be, like, way faster." But then the problem is, if you really look at it simplistically, right? Like, a flop is a flop, right?
Uh, again, like, this is super simple, but, like, there is not 10X you can get out of doing a standard von Neumann architecture on efficiency of compute. Um, in which case, do all of these things that NVIDIA will engineer better than you because they have a team of 50 people working on, you know, just memory controllers and HBM, and just, like, a- and networking, or actually, like, thousands of people working on networking. But, like, each of these things, do they just cut you by 1,000? And that's like, "Oh, actually, what would have been 5X faster is now only, like, 2X faster. Plus, if I, like, misstep, I'm, like, six months behind, and now the new chip is there." Right? And you're screwed. So, or, or supply chain, or, like, intrinsic, like, challenges with, oh, okay, getting other people to deploy it now, or rack deployments. There's all these supply chain challenges, right? Like, literally, in Amazon's most recent earnings, they said their, like, chip architecture is not aggressive. Their, their rack architecture is very simple. It's not that aggressive. They were like, "Yeah, we have rack integration yield issues, which is why we've had..." Uh, which is they, like, blamed their miss on AWS for their Trainium not coming online fast enough because of rack integration issues. And when you look at the architecture, like, we have an article on it, but it's like, it's not, like, that crazy. Like, it's like what Google was doing, like, four or five years ago, right? It's like, "Oh, wait, supply chain is hard." And Amazon couldn't get everything in supply chain to work, and so therefore they missed their AWS revenue by a few percent, right? Which caused the whole stock market to freak out. But it's like, there are so many things that could go wrong in hardware, and the timescales are so long. And then the last thing is that, like, model architecture is not stagnant. If it was, NVIDIA would s- optimize for it. 
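The penalty-stacking argument above can be put in rough numbers. The percentages are illustrative, not measured figures:

```python
# Compounding the "you're behind on everything else" penalties: each
# one individually looks survivable, but they multiply. All numbers
# here are illustrative assumptions.

penalties = {
    "older process node": 0.25,  # ~20-30% hit on cost/perf/power
    "older memory tech":  0.25,
    "weaker networking":  0.20,
}

remaining = 1.0
for name, hit in penalties.items():
    remaining *= (1 - hit)

print(f"effective advantage kept: {remaining:.2f}x")
# ~0.45x: a chip that was "4x faster" on its one unique trick is only
# ~1.8x faster end to end -- before any execution misstep adds more.
```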
But model architecture and hardware, right, software-hardware co-design is the thing that matters. Right? And these two things, you can't just, like, look at one in individual, right? Like, there's a reason why Microsoft's hardware programs suck, right? Because they don't understand models at all, right? Meta, Meta, their chips actually work for recommendation systems and they're deployed for recommendation systems, because they can do hardware-software co-design. Google is awesome because they do hardware-software co-design. Uh, why is AMD not catching up despite being awesome at hardware engineering? Well, yeah, they're bad at networking, but also they suck at software and they can't do hardware-software co-design. You know, there's, like, much deeper reasons why you can get into this, but you have to understand the hardware and the software, and they move in lockstep. And whatever your optimization is doesn't end up working, right? So one example is all of the first wave AI company, AI hardware companies, right? Cerebras, Groq, uh, SambaNova- Yup. ... uh, Graphcore. Graphcore, yup. All of them made a very similar bet. Now, they were very different, right? Well, some of these are architecturally pretty weird, really. Yeah. Right, they're architecturally pretty weird, but they made the same bet on memory versus compute, right? "We're gonna have more on-chip memory and lower bandwidth." Right? Off chip, right? Because tha- that was the trade-off they decided to make. So all of them had way more on-chip memory, uh, than NVIDIA, right? NVIDIA, their, their on-chip memory has not really grown much from A100, H100, Blackwell, right? It's up 30% in, like, three generations. Whereas these guys had, like, 10X the on-chip memory, right? All the way back in, like, A... when they were competing with A100 or even the generation before. But that ended up being a problem because they were like, "Oh, yeah, we could just run the model on the chip," right? Mm-hmm.
"If we put the whole weight, all the weights on there. And then, you know, we'll-" Totally. "... be so much more efficient." Yeah. And then th- the models just got way too big, right? Yeah. And Cerebras was like, "Oh, wait, but our chip is huge." Yeah. "Oh, wait, but still the model's way too big to fit on it." This is, like, very simple, right? You know, the same thing's happening in the other direction, right? Like, some companies are like, "Oh, we're going to make our s- like, systolic array, your compute unit, super, super, super large, because let's say LLaMA 70B is an 8K hidden dimension and your batch and all that, like, it's, it's a pretty large matmul. Oh, great. Okay, we'll make this chip," and then all of a sudden, all the models get super, super sparse MoEs. Mm-hmm. Right? Like, the hidden dimension of DeepSeek's models are, like, really tiny because they have a lot of experts, right? Instead of one large matmul, it's a bunch of small ones, you do route, route, right? Like, and all of a sudden, like, if I made a really, really large hardware unit, but I have all these small experts, how am I gonna run it efficiently? You know, I, I, no one pre- They didn't really predict that the models would go that way, but then it ended up going that way. And this is like... This is actually the case with at least two of the AI hardware companies today. I don't wanna, I don't wanna shit talk them just because, you know, let's be friendly. Uh, but like, this is, like, like, clearly, like, what's happening, right? So it's like, you can make a b- decision. It's a hardware bet that will actually be way better on today's architectures, but then architecture evolves and the generality of, like, NVIDIA's GPUs or even, like, TPUs and Trainium is, like, ge- more general than, like, as an architecture. But then it doesn't beat NVIDIA by that much, right?
In which case, they're just gonna destroy you with their six months or a year ahead on every technology because they have more people working on it and their supply chain is better, right? So you, you, you... It's kind of really tough to make that architecture bet, have the models not just go in a different direction that no one predicted because no one knows where models are headed, right? Even like, you know, you could get Greg Brockman and he might, like, have, like, a good idea, but, like, I'm sure he doesn't even know what our models will look like in two years. So there's got to be a level of generality. And it's hard to, like, hit that intersection properly. And so I'm very hopeful people compete with NVIDIA. Um, I think it would be a lot more fun. There'd be a lot less margin eaten up by the infra and there'd just be a lot more deployment of AI potentially... um, if someone was able to compete with NVIDIA effectively. But NVIDIA charges a lot of money because they're the best. And like, if there was something better, people would use it, but there isn't. And it's just really hard to get be- be better than them.
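The dense-versus-MoE mismatch in this exchange can be sketched in concrete shapes. All dimensions below are illustrative, not any specific model's:

```python
# Why a chip sized for one giant matmul struggles with sparse MoEs:
# the same work arrives as many small matmuls instead of one big one.
# Shapes are illustrative assumptions.

batch, hidden = 32, 8192

# Dense FFN: one large (M,K) @ (K,N) matmul -- easy to keep a big
# systolic array saturated.
dense_shape = ((batch, hidden), (hidden, 4 * hidden))

# Sparse MoE: many experts, few active per token, small expert width.
n_experts, active, expert_dim = 128, 8, 2048
tokens_per_expert = batch * active // n_experts  # ~2 tokens per expert
moe_shape = ((tokens_per_expert, hidden), (hidden, expert_dim))

def matmul_flops(a, b):
    # 2*M*K*N multiply-accumulates for an (M,K) @ (K,N) matmul
    (m, k), (_, n) = a, b
    return 2 * m * k * n

print("dense:", dense_shape, f"-> {matmul_flops(*dense_shape):.1e} FLOPs in one matmul")
print("MoE:  ", moe_shape, f"-> {matmul_flops(*moe_shape):.1e} FLOPs x {n_experts} small matmuls")
# A compute unit sized to saturate on the dense shape runs the tiny
# per-expert tiles at a fraction of its peak utilization.
```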
- SGSarah Guo
I mean, you have to give the first gen AI hardware companies some credit because they, like, made a secular-
- DPDylan Patel
Yes.
- SGSarah Guo
... correct decision about the workload. But then the architectural decisions, like, ended up being hard to predict correctly.
- DPDylan Patel
Yeah.
- SGSarah Guo
Right? Then you have the cycle of NVIDIA innovation, which is really hard to compete with.
- DPDylan Patel
Yep.
- SGSarah Guo
Um, both hardware and also, as you said, supply chain issues.
- DPDylan Patel
Even just putting together servers is hard.
- SGSarah Guo
Yes. Um, I think the thing that you point out that, like, people oversimplified was with maybe a- a current generation of AI chip startups, they're like, "We're betting on transformers," and it's a lot more complicated than that in terms of-
- DPDylan Patel
Yeah.
- SGSarah Guo
... like, workload at scale and continued evolution in model architecture. And it's also not exposed, so if you're not working with the SOTA labs, like, from the beginning, then you can't make predictions, because nobody can make a lot of predictions right now. It's very hard to, like, say, "I'm going to be better at the workload two years from now," in a, like a very comfortable way-
- DPDylan Patel
Yeah.
- SGSarah Guo
... with no other changes happening. Like, I can't make that bet right now.
- DPDylan Patel
Yeah, and it's like, one- one of the interesting things about OpenAI's open source models, it's like all their training pipelines, but on a quite boring architecture. Right? Like, it's not their crazy, like, cool architecture advantages that they have in their closed source models, which are, make it better for long contexts or more efficient KV cache or all these other things, right? They're doing it on a standard model architecture that's publicly available. They, like, intentionally made the decision to open source a model with a boring architecture that's pretty much open source, right, already. Like, people have already done all these things and kept all the secrets internal that they wanted to keep. And it's like, what's- what's in there, right? Are they even doing standard scaled dot-product attention? Probably, but like, there's probably a lot of weird things they're doing which don't map directly to hardware, uh, like you mentioned, right? Like, transformer chip architecture is like, there's a lot more complicated here than just like, oh, it's optimized for transformers, because like, so is an NVIDIA chip and a TPU, and their next generation is more optimized for it. Like, they take steps towards it, they don't leap, but as long as they're, like, close enough to where you are architecturally optimized for workload, they'll beat you because of all the other reasons.
- SGSarah Guo
And I think your description of like, how might a, like, a chip startup win or any vendor win by specializing, like, that ex- actually is really hard in this era, like, generalization may-
- DPDylan Patel
And- and-
- SGSarah Guo
... continue to win to a degree, yeah.
- DPDylan Patel
And it happened with all the edge hardware companies too. You know, we talk about the first gen AI hardware companies for the data center, there were a handful, but for the edge there were, like, 40, 50. And, like, none of them are winning, because it turns out the edge is just, take a Qualcomm chip or an Intel chip that's made for PC or smartphone and deploy it on the edge, right? Like, that ended up being way more meaningful. So- so it ends up being like, the incumbents, they can take steps towards what you're going for, and if you didn't execute perfectly, or if the models changed their architecture away from what you thought it would
- 27:43 – 28:18
What Would an Nvidia Challenger Look Like?
- DPDylan Patel
be, you end up failing.
- SGSarah Guo
If you had to make a bet that something becomes competitive, what is the configuration or company type that- that does that?
- DPDylan Patel
I don't want to shill any company that I've invested in or anything like that, and so therefore I'm just-
- SGSarah Guo
Not investment advice, okay, yeah.
- DPDylan Patel
No, no, no. But like, I've like, I like, I would just say like, I probably think that like AMD GPUs or Amazon's Trainium will be probably more likely to be a best second choice for people, or Google TPU of course, but I think Google's just more interested in it for internal workloads. I- I just think that those will be much more likely options, uh, to succeed than a chip hardware startup, yeah. But I mean, I really hope they do 'cause there's some really cool
- 28:18 – 34:48
Understanding Operational and Power Constraints for Data Centers
- DPDylan Patel
stuff they're doing.
- SGSarah Guo
If we zoom out to, um, the macro and we think about just the scale of, um, hardware and data center deployment for these workloads, people talk a lot about the operational constraint on building data centers of this size, the, uh, power constraints. I think in particular on the power side, it's very interesting how that practically shows up. Is it generation, at scale, at cost? Is it grid issues? Is it real... Like, how- how should, you know, more people in technology understand this?
- DPDylan Patel
Yeah, so supply chain is always, like, fun, because, like, people want to point at one thing as the issue, but it always ends up being, these things are so complicated, like, if one thing was solved, you could increase production another 20% and then something else would be the issue.
- SGSarah Guo
You think it's a multi-bottleneck issue?
- DPDylan Patel
Yeah, or like-
- SGSarah Guo
Yeah.
- DPDylan Patel
... "Hey, for company A it's actually, because their supply chain is this, this is the issue. And for company B it's, this is the issue." But, you know, that's sort of in generalities, but like, I think zooming out, right, like, Noah, Noah Pinion, like he had a really fun blog about like, is this AI hardware build out going to cause a recession? I- I think it's actually funny because you can flip the statement and be like, actually, the US economy would not be growing that much this year if it weren't for all the AI build outs, and as a result, data center infrastructure. As a result, electricians' wages have soared. As a result, power deployments and other capital investments which have 15, 30 year lifespans are being made, and all of this CAPEX is in turn actually growing the economy. And, like, actually, maybe the economy wouldn't even be growing much or at all if it weren't for all of these investments.
- SGSarah Guo
One thing that is perhaps overlooked from the, um, White House AI action plan was the view of, like, we're going to build these AI data centers in the United States, we're actually going to need, like, a lot of general investment beyond the GPUs and the power, which are everybody's-
- DPDylan Patel
Yeah.
- SGSarah Guo
... first two items-
- DPDylan Patel
Yeah.
- SGSarah Guo
... into, like, labor, for example. Right? So if you just, you know, for simplicity's sake be like, it's the size of Manhattan and we have to run it, and it's a new system with changing topology and like very high degree of relatively novel hardware with failure-
- DPDylan Patel
Yeah.
- SGSarah Guo
... and like lots of networking, then I'm like, hmm, like kind of feels like we need to have a bunch of new capacity, like from a labor or robotics sort of view.
- DPDylan Patel
In, like, '23 it was very simple, it's like, NVIDIA can't make enough chips. Oh, okay. Why can't NVIDIA make enough chips? Oh, CoWoS, right? Chip-on-wafer-on-substrate packaging technology. And it was like, oh, HBM, right? Like, those were like... It was, like, very simple in '23-
- SGSarah Guo
Bonders, yeah. Yeah.
- DPDylan Patel
... '24. Like, yeah, all these tools involved in that supply chain. It was great. But then it, like, very quickly became much more murky, right? Then it was like, oh, data centers are the issue. Oh, okay. We'll just build a lot of data centers. Oh wait, substation equipment and transformers are the issue. Oh wait, power generation is the issue. It's not like the other issues went away, right? Like, actually, you know, ch- CoWoS, uh, is still a bottleneck, and HBM is still a bottleneck. Optical transceivers are still a bottleneck, but so is power generation and data center ph- physical real estate, right? Like, as I mentioned, like, Meta is literally building these, like, temporary, like, tent structures to put GPUs in, because building the building takes too long.
- SGSarah Guo
(laughs)
- DPDylan Patel
And it takes too much labor.
- SGSarah Guo
Yeah.
- DPDylan Patel
Right? As you mentioned labor, right? That's, like, one way they were able to remove, uh, part of a constraint.
- SGSarah Guo
Yeah.
- DPDylan Patel
They're still constrained on power, and they had to delay the, uh, bring-up of some GPUs in Ohio because AEP, the grid in Ohio, like, had some issues, right? The utility, right? They, uh, would, like, bring in a generator or something, right? Oh, okay, great. Uh, we'll, we'll buy our own generators and put them on site. Oh, wait, now there's an eight-year backlog or whatever, a four-year backlog, for GE's turbines.
- SGSarah Guo
Yeah.
- DPDylan Patel
Oh, okay, great. Um, I'm, I'm Elon, I'm gonna buy a power plant from overseas that's already existing, we're gonna move it in. Okay, great, now there's, like, permits and people protesting against me in Memphis. Like, you know, there's, like, there's, like, a bajillion things that could go wrong, and labor is a huge one. I've literally had people in pitches be like, "No, no, no. We've already booked all the contractors, so no one else is going to be able to build a data center in this entire area of this magnitude besides us."
- SGSarah Guo
Because we took all the people. (laughs)
- DPDylan Patel
We took all the people. They're gonna have to fly them in. But it's like, okay, fine. Like, you can fly them in, but it's like, there's just, like, not that many electricians in America. And as a result, we've seen the wages rise a lot for people building data center infra. There's a group of, like, these Russian guys who used to work for Yandex, Russia's search engine, who, like, wire up data centers, who now live in America, and they get paid a ton. Like, and they get paid bonuses for being faster, and therefore, they do, like, certain drugs to be able to finish the build-out faster.
- SGSarah Guo
What? (laughs)
- DPDylan Patel
Because they get bonuses based on how fast they build it, right? Like, it's like, there is crazy stuff going on to alleviate bottlenecks, but it's like, there's bottlenecks everywhere. And it really just takes a really, really hyper-competent organization tackling each of these things and creatively thinking about each of these things. Because if you do it the lame old way, you're gonna, you're gonna lose and you're gonna, like, you're gonna be too slow, right? Which is why OpenAI and Microsoft partially, like, Microsoft is not building Stargate for OpenAI, right? Just because it would have just been too slow, and they were doing it the lame old way. You have to go crazy. You have to go... That's why Microsoft rents from CoreWeave a ton, right? Because, "Oh, wait, we, we, we need someone who can do things faster than us," and, "Oh, look, CoreWeave's doing it faster." And now, like, you know, OpenAI is, like, going to Oracle and CoreWeave and, and others, right? Nscale in Finland and all these other companies all around the world. The Middle East, right? G42. Like, anywhere and everywhere they can get compute, because you put your eggs in many baskets and whoever executes the best will win. And this infrastructure is very, very hard. Software is, like, fast turnaround times. Like, you know, it's, it's still hard. Software is not easy. But it's like, the cycle time is very fast for, like, try something, fail, right? Try something else. It is not for infra, right? Like, what has xAI actually done to deserve their prior funding rounds? They haven't released a leading edge model, right? And yet their valuation is higher than Anthropic's today, right? At least, you know, Anthropic's raising, but whatever, right? Like, it's, A, Elon, and, B, they've tackled a problem creatively and done it way faster than anyone else, which is building Colossus, right? Like, and that's, like, commendable, because that is part of the equation of being the best at models, right?
- SGSarah Guo
It's the input. Yeah, besides the talent, yeah.
- 34:48 – 43:01
Dylan’s View on the American Stack
- DPDylan Patel
and tackled creatively.
- SGSarah Guo
Speaking of, like, the policy, uh, and, uh, geopolitics implication here, like, what do you think about the, you know, White House, um, implication that America needs to, like, export the AI stack or, like, needs to control important components of it? Like, it's better for us to be exporting NVIDIA chips than to foster a new industry. It's better for us to have, like, a globally leading open source model, et cetera. Like, what actually makes sense to you there?
- DPDylan Patel
I want to tell a crazy story. I was in Lebanon-
- SGSarah Guo
Okay.
- DPDylan Patel
... uh, for a week.
- SGSarah Guo
It's a good start. Yeah, yeah.
- DPDylan Patel
Yeah. (laughs) This is completely unrelated-
- SGSarah Guo
Yeah.
- DPDylan Patel
... but it just popped into my head. I think it'll be entertaining.
- SGSarah Guo
Yeah.
- DPDylan Patel
I was in Lebanon. I was with a few of my friends. Uh, so it's, like, two Indian people, two Chinese people, then a Lebanese person, right? Um, and these, like, 12-year-old girls ran up to the Chinese woman that was with us, like, my friend, and they were like, "Oh, my God, your skin's so beautiful. Do you like sushi?" Right? It's like, fine, you're just ignorant. But what was really interesting is, like, when they asked where we're from, we're like, "San Francisco." They're like, "Do people get shot in the streets?" Because their entire worldview was built from TikTok-
- SGSarah Guo
Okay, yeah.
- DPDylan Patel
... of politics. And it's like-
- SGSarah Guo
Yeah.
- DPDylan Patel
... when you think about the global propaganda machine that is Hollywood, and it's not intentional, it's just that American media is pervasive, it built such a positive image of America. Now, like, with monoculture broken and it's more social media based, a lot of the world thinks America is, like, people are getting shot all the time, and it's, like, really bad, and it's, like, bad lives, and people are working all the time, it's unsafe, and, like... You know, like, Europe has a certain view of America and, like, I don't think it's accurate. Like, a random Lebanese 12-year-old had a really negative view of some... Like, they liked America, they loved Target for some reason-
- SGSarah Guo
(laughs)
- DPDylan Patel
... because some influencers posted TikToks about Target-
- SGSarah Guo
Yeah.
- DPDylan Patel
... but, like, they had negative views of America. And it's like, from a sense of, like, what is important is, like, the world should still run on American technology, right? Uh, and they generally do still, in terms of the web, although, you know, ByteDance TikTok has broken that to a large degree. But in this next age, do you want them to run on Chinese models, which now have Chinese values, which then spread Chinese values to, uh, the world? Or do you want them to have American models, to have American values? Like, you talk to Claude and it has a worldview, right?
- SGSarah Guo
Yeah.
- DPDylan Patel
And it's like, I don't know if you want to call that propaganda or what. There's a worldview that you're pushing, right? And so I think it makes sense that we need that worldview espoused. Now, how do you do that, right? The prior administration, current administration had different viewpoints on this, right? Prior administration said, "Yes, we would love for the whole world to use our chips, but it has to be run by American companies." And so it was like, "Microsoft, Oracle, we're cool with you building shitloads of capacity in Malaysia. We don't want random other companies doing it in Malaysia." And so the prior diffusion rule had a lot of technical ways in which, like, you know, you could be, you, you could have these, like, licenses and all this. And it was very hard for, like, random small companies to build... large GPU clusters, right? But it was very easy for Microsoft and Oracle to do it in, in Malaysia. Of course, the current administration tore that up, and they have their own view on things. I, I, I mean, I think there were a lot of things wrong with the diffusion rules, right? They were just too complicated. They pissed a lot of people off, et cetera. Now they have a different view, which is, like, what did they do in the Middle East, right, with the deal they signed? Well, actually, most of those GPUs are being operated by American companies or rented to American companies, right? Either or, right? Like, G42 operating them, but renting them mostly to, like, OpenAI and such, for a large part. Or Amazon and Oracle and others are operating the GPUs themselves in the Middle East. So it's like, okay, that's effectively the same thing, but in a very different way. That is still, I think, a view, right? Which is, like, we want America to be as high in the value stack as possible, right? If we could sell tokens, or if we could sell services, we should.
- SGSarah Guo
Mm-hmm.
- DPDylan Patel
Okay, but if we can't sell the service, let's at least sell them tokens. Okay, if we can't sell them tokens, at least sell them, like, infra, right? Um, whether it be data centers or renting GPUs or just the GPUs physically. Um, and it sort of, like, makes sense, right, in the value chain. Like, give them the highest value, highest margin thing where we capture most of the value, and, like, squeeze it down to where, like, actually, for, like, the bottom of the stack, right? Like, the tools to make chips, maybe you shouldn't sell. And so, like, current export controls and policy dictate that, yes, uh, you know, it's better to sell them services, but sell them both, right? Like, give the option, let us compete, uh, and don't let anyone else win. I think the challenge here is, like, how much are you enabling China by selling them GPUs? Like, how much fearmongering around, like, Huawei's production capacity is there? Like, how realistic is it versus not, because of the bottlenecks of, like, the sanctions that America's made Korea put on China for memory, or Taiwan on China for, uh, chips, or, you know, US equipment on China, right? Like, there's a lot of different sanctions. Many of these are not well enforced/have holes, but it's sort of like a... It's a very difficult argument on, like, how much capacity of GPUs should be sold to China. A lot of people in, in San Francisco, frankly, say don't sell China any GPUs. But then they cut off rare earth minerals, and, you know, like, ostensibly, most people think that, like, the deal was that you get, you get GPUs and also EDA software, 'cause the administration banned EDA software for a little bit, just for, like, a few weeks basically, until China was like, "Okay, we'll ship rare earth minerals." You can't just ban everything, because China can retaliate.
If they banned rare earth minerals and magnets and such, car factories in America would've shut down and the entire supply chain there would've had, like, hundreds of thousands of people not working, right? Like, you know, like there is like-
- SGSarah Guo
Yeah.
- DPDylan Patel
There is a push and pull here.
- SGSarah Guo
There's a standoff here, yeah.
- DPDylan Patel
There is a push and pull here. So, like, do I think China should just have the best NVIDIA GPUs? No, like, that, that would suck. But, like, you know, can you give them no GPUs? No, they're gonna retaliate. Like, there is a middle ground, and, like, Huawei is eventually going to have a lot of production capacity, but there's ways to slow them down, right? Like, properly ban the equipment, because there's a lot of loopholes there. Uh, properly ban the sub-components, uh, like, the memory and wafers, 'cause Huawei is still getting, uh, you know, wafers from TSMC in Taiwan through, like, shell companies, right? Like, it's like, you know, there, there's a lot of enforcement challenges, because parts of the government are not, like, funded properly, or not competent enough and have never been competent, right? So it's like, how do you work within this framework? Well, like, okay, fine, we should sell them some GPUs so that, you know, that kind of slows them down on the Huawei front, although not really, right? Um, but also, like, gets us back the rare earth minerals. But don't sell them too many, right? Like, how do you find that massive gray line is what the administration's grappling with, in my view.
- SGSarah Guo
Implied in that opinion is your belief that they are going to be able to build NVIDIA-equivalent GPUs eventually.
- DPDylan Patel
Um.
- SGSarah Guo
If forced.
- 43:01 – 44:22
What Dylan Would Ask Mark Zuckerberg
- SGSarah Guo
is the game.
- DPDylan Patel
(laughs)
- SGSarah Guo
Um, I wanna ask you, like, a wild card, uh, question to, uh, finish out. Um, we're trying to get Mark to do the podcast.
- DPDylan Patel
Zuck?
- SGSarah Guo
Yes. Uh, you can ask him any question. What would you ask? Mark, you gotta do the podcast.
- DPDylan Patel
I thought, like, the like... Did you read the doc- the page they put up? I thought that was very interesting, that they were like, "We want AI to be your companion." So my question to him is not, like, around his infra stuff, 'cause I feel like I know most everything. Like, you can figure that stuff out from supply chain and, like, satellites and all this stuff. But, like, the interesting thing I'm curious about is, philosophically, what exactly, like, does the world look like if everyone is talking to AIs more than other people, or if they're interacting socially with the AIs more than other people? Do we lose our human element? Do we lose our human connection? It's not the same thing as, hey, I'm posting on social media and we're interacting with our social media posts, which, that already breaks the brain of a lot of people. What happens when it's, like, always on your face? Like, you know, his worldview is, like, Meta Reality Labs makes these, like, devices that you wear, and they're always... They have all this AI on them, and you're talking to the AI companion all the time. How does that change the human psyche? Like- This human-machine evolution, like, what are the negative ramifications of it? What are the positive ramifications? How do we- how are you going to make sure that there's more positive ramifications from this than, like, you know, the sloppification and, like, complete brain rot of, like, our youth, right? Which I- I, like, love my brain rot, right? Like
- 44:22 – 46:51
Poker and AI Entrepreneurship
- DPDylan Patel
it's like, okay.
- SGSarah Guo
Obviously, the coding wars continue to be like very central.
- DPDylan Patel
Mm-hmm.
- SGSarah Guo
And we were talking about Cognition's relevance and, like, how- how to think about the strategy here. But I do think it's really funny what flipped your bit on Cognition. Can you tell the story?
- DPDylan Patel
I- I thought Cognition, NGMI, right? Like, you know, like, OpenAI, Anthropic, xAI, et cetera, they're just gonna make better code models. Like, you know, they just have way more resources. General models will win. You know, I hadn't- hadn't really met too many people there. It was just, like, a pure vibes-based thing. And I had, you know, I'd used a little bit of- of Devin, but I was like, "Whatever," right? Like, it was like, "Claude Code seems better," and we use that internally. But, like, I went to Coatue's East Meets West event. It's an awesome event where there's people from Asia. Like, there was, like, you know, all these, like, CFOs and CEOs of, like, major Chinese companies, uh, East Coast of the US, all these finance bros. Also West Coast, like, a lot of tech people, right? So you and I were both there. There were people from governments and major companies. And Scott was there. Um, I spoke with him, like, very briefly, but then what was interesting is, like, it's like, you know, they have a poker night one night and everyone gets plastered. The- the, like, leader of Coatue is, like-
- SGSarah Guo
Laffont, yeah.
- DPDylan Patel
... very good at poker. These hedge fund guys are just good at poker generally.
- SGSarah Guo
And they love it.
- DPDylan Patel
Technically, they like poker as well.
- SGSarah Guo
Yeah.
- DPDylan Patel
You know, there's a big poker culture in- in the Bay. I was playing. I'm okay, right? Um, but I see- I see- I look over at the super high stakes table. Scott's just dominating everyone, right? I'm like, "What is going on?" Like, how are you, like... You're, like, taking chips from, like, the CEO of a major Chinese company. I don't want to name people's names, because I think there's-
- SGSarah Guo
Yeah.
- DPDylan Patel
... like, some terms around them, like, naming who's there. But, like, you know, it's like, you're- you're, like, winning, like, a lot of chips from a lot of big people. And it's like, all of a sudden my vibes were like, "I don't know, maybe, like, maybe he can win. Maybe he can take from the lion, you know?" Uh, so I was, like, very excited about that. Uh, you know, I thought it was funny. Uh, I still have zero... Like, I- I have not done much due diligence on their code product. Like, you know, nor have I on, like, Claude Code, besides the fact that we use it. But it's like, you know, cool.
- SGSarah Guo
Well, I think Windsurf acquisition part two is like a- a pretty good hand to play here. Um, and, uh, you know, as somebody who invests a lot at a, you know, violently competitive application level-
- DPDylan Patel
Yeah.
- SGSarah Guo
... poker game is live, man. Everybody, they're-
- DPDylan Patel
(laughs) Exactly.
- SGSarah Guo
You just invest in live players.
- DPDylan Patel
Exactly. And- and so I- I just loved that, you know, that was how he, uh, he dominated everyone. And it's like- it's like it's such a stupid reason 'cause I pride myself on being analytical and like data-driven. And it's like, you know, vibes.
- SGSarah Guo
Correct. For any entrepreneurs listening, I think, like, you know, Dylan might angel invest, or we might back you fully, if you- if you win the Cognition poker game.
- DPDylan Patel
(laughs)
- 46:51 – 47:17
Conclusion
- SGSarah Guo
Uh, and we'll host at Conviction. Um, okay. Think we got it? Good. Awesome.
- DPDylan Patel
Yeah. Thank you.
- NANarrator
(instrumental music)
- SGSarah Guo
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 47:17
Transcript of episode vGhlJqnECd0