Satya Nadella on the Dwarkesh Podcast: Why GitHub Copilot Leads AI
Fairwater 2 is a 10x compute step beyond GPT-5 in Microsoft's roadmap; Satya argues GitHub Copilot holds its lead even as Claude Code and Cursor close the gap.
EVERY SPOKEN WORD
150 min read · 30,082 words
- 0:00 – 4:15
Fairwater 2
- SNSatya Nadella
... maybe after the industrial revolution, this is the biggest thing. But at the same time, I'm a little grounded in the fact that this is still early innings. If you're a model company, you may have a winner's curse. You may have done all the hard work, done unbelievable innovation, except it's kinda like one copy away from that being commoditized. We didn't want to just be a hoster for one company and have just a massive book of business with one customer. That- that's not a business. You can't build an infrastructure that's optimized for one model. If you do that, you're one tweak away from some MoE-like breakthrough that happens, and your entire network topology goes out of the window. Then that's a scary thing. Our business, which today is an end user tools business, will become essentially an infrastructure business in support of agents doing work. The thing that you have to think through is not what you do in the next five years, but what do you do for the next 50?
- DPDwarkesh Patel
Today, we are interviewing Satya Nadella, we being me and Dylan Patel, who is the founder of SemiAnalysis. Satya, welcome.
- SNSatya Nadella
Thank you. It's great. Thanks for comin' over to Atlanta.
- DPDwarkesh Patel
Yeah. Thank you for giving us the tour of, uh, the new facility. It's been really cool to see.
- SNSatya Nadella
Absolutely.
- DPDwarkesh Patel
Satya and Scott Guthrie, Microsoft's EVP of Cloud and AI, give us a tour of their brand new Fairwater 2 data center, currently the most powerful in the world.
- SGScott Guthrie
We try to 10X the training capacity every 18 to 24 months. And so this would be, effectively, a 10X increase, 10X from what GPT-5 was trained with. And so to put it in perspective, the number of optics, the network optics in this building, is almost as much as all of Azure had across all our data centers two and a half years ago.
- SNSatya Nadella
It's kinda, what, five million network connections.
- DPDwarkesh Patel
You've got all this bandwidth between different sites in a region and between the two regions. So is this like a big bet on scaling in the future, that you anticipate in the future there's gonna be some huge model that requires two whole different regions to train?
- SNSatya Nadella
The goal is to be able to kind of aggregate these FLOPs for a large training job and then put these things together across sites.
- DPDwarkesh Patel
Right.
- SNSatya Nadella
And the reality is you'll use it for, uh, training, and then you'll use it for data gen, you'll use it for inference in all sort of ways.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
It's not like it's going to be used only for one workload forever.
- SGScott Guthrie
Fairwater 4, which you're gonna see under construction nearby-
- SNSatya Nadella
Mm-hmm.
- SGScott Guthrie
... yeah, will also be on that one petabit network-
- SNSatya Nadella
Yep.
- SGScott Guthrie
... so that we can actually link the two at a very high rate. And then basically we do the AI WAN connecting to Milwaukee, where we have multiple other Fairwaters being built.
- SNSatya Nadella
Literally, you can see the- the model parallelism and the data parallelism. And it's kinda built for, um, essentially the training jobs, the pods, the super pods across this campus, and then with the WAN, you can go to the Wisconsin data center and literally run a training job with all of them getting aggregated.
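What Satya is describing is the standard two-level split in large training jobs: model parallelism over fast links within a site, data parallelism over the WAN between sites. A minimal sketch of the idea, assuming a PyTorch-style setup; the mesh shape, site count, and two-site layout are illustrative, not Microsoft's actual topology, and it assumes torch.distributed has been initialized across all ranks:

```python
# Sketch only: model parallelism inside a site, data parallelism across sites.
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

# Two sites (say, Atlanta and Wisconsin on the AI WAN), 8 GPUs each,
# aggregated into one logical 16-GPU training job.
mesh = init_device_mesh(
    "cuda",
    mesh_shape=(2, 8),                     # (sites, gpus_per_site)
    mesh_dim_names=("site", "intra_site"),
)

# Fast local links carry the model-parallel traffic within a site...
tp_group = mesh.get_group("intra_site")
# ...while the petabit-class WAN carries the data-parallel gradient averaging.
dp_group = mesh.get_group("site")

def optimizer_step(model, loss, opt):
    loss.backward()
    for p in model.parameters():
        # This all-reduce over the "site" dimension is the cross-region
        # traffic that the inter-site network has to sustain.
        dist.all_reduce(p.grad, group=dp_group)
        p.grad /= dist.get_world_size(group=dp_group)
    opt.step()
    opt.zero_grad()
```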
- SGScott Guthrie
And what we're seeing right here is, this is a cell with no servers in it yet, no racks.
- DPDylan Patel
How many, uh, racks are in a cell?
- SGScott Guthrie
We- think about it, uh, y- we don't necessarily share that per se, but- but we- we- let me-
- DPDylan Patel
That's the reason I asked. (laughs)
- SGScott Guthrie
Uh, you'll see upstairs and so-
- DPDylan Patel
I'll start counting. I'll start counting.
- SGScott Guthrie
You can start counting. We'll let you start counting.
- DPDylan Patel
How many cells are there in this building?
- SGScott Guthrie
That part also I can't tell you, but-
- DPDylan Patel
Division is easy, right? My God, it's kinda loud.
- 4:15 – 13:42
Business models for AGI
- DPDylan Patel
When you look at all the past technological transitions, whether it be, you know, railroads or the internet or, you know, replaceable parts and industrialization, uh, the cloud, all of these things, each revolution has gotten much faster in the time it goes from technology discovery to ramp-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... and pervasiveness through the economy. Many folks who have been on Dwarkesh's podcast believe this is sort of the final, uh, technological revolution or transition, and that this time is very, very different. Um, and at least so far in the markets it's sort of, you know, in three years we've already skyrocketed to, you know, hyperscalers are doing $500 billion of CapEx next year, which is a scale that's un- unmatched to prior revolutions in terms of speed. And the end state seems to be quite different. How- how do you, y- y- your- your framing of this seems quite different than sort of the, I would say the AI bro, who is-
- DPDwarkesh Patel
(laughs)
- DPDylan Patel
... who is quite, you know, AGI is coming and, you know, I- I'd like to understand that more.
- SNSatya Nadella
Yeah, I mean, look, I- I- I start with the excitement that I also feel for maybe after the industrial revolution this is the biggest thing. Um, and so therefore, I- I- I- I- I start with that premise. Uh, but at the same time, I'm a little grounded in the fact that, uh, this is still early innings. Uh, we've built some very useful things. We're seeing some great properties. The scaling laws seem to be working. Um, and I'm optimistic that they'll continue to work, right? Some of it is, um, you know, it does require real science breakthroughs, but it's also a lot of engineering and what have you. But that said, I also sort of take the view that, you know, even what has been happening in the last 70 years of computing, uh, has also been a march, uh, that has helped us move, um, you know, with the... As I said, you- you know, I- I like one of the things Raj Reddy-... has as a metaphor for what AI is, right? He's a, he's a Turing Award winner out of, uh, CMU, um, and he's always ... And he had this even pre-AGI, uh, but he had this metaphor of, uh, AI should either be a guardian angel or a cognitive amplifier. I love that. Uh, it's a simple way to think about what this is. Ultimately, what is its u- human utility? It is going to be a cognitive amplifier, uh, and a guardian angel. And so if I sort of view it that way, I view it as a tool. But then you can also go very mystical about it and say, "Well, this is, you know, more than a tool. It does all these things which only humans did so far." But that has been the case with many technologies in the past. Only humans did a lot of things, and then we add tools that did them.
- DPDwarkesh Patel
Mm. I guess, uh, we don't have to get wrapped up in the definition here, but maybe one way to think about it is like may- maybe it takes five years, 10 years, 20 years. At some point, eventually a machine is producing Satya tokens, right? And the Microsoft board thinks that Satya tokens are worth a lot.
- SNSatya Nadella
(laughs)
- DPDylan Patel
How much, how much are you wasting of this-
- DPDwarkesh Patel
(laughs)
- SNSatya Nadella
... of, of like economic value by interviewing Satya? (laughs)
- DPDwarkesh Patel
(laughs) You could not afford the API cost of Satya tokens. Um, but so, you know, whatever you wanna call it, whether the Satya tokens are a tool or an agent, whatever. Um, right now, if you have models that cost on the order of dollars or cents per million tokens, there's just an enormous room for expansion, uh, a margin expansion there, where a million tokens of Satya are, like, worth a lot. Um, and where does that margin go, and what level of that margin is Microsoft involved in, is the question I have.
- SNSatya Nadella
So I think, um, in, in some sense, this goes back, again, to essentially what's the, uh, economic growth picture gonna really look like. Um, what's the firm gonna look like? What's productivity gonna look like?
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
And that to me is where, again, with the Industrial Revolution, it was after, whatever, 70 years of diffusion that you started seeing the economic growth, right? It took ... That's the other thing to remember is, um, even if the tech is diffusing fast, uh, this time around, for true economic growth to appear, it has to sort of diffuse to a point where the work, the work artifact and the workflow have to change. And so that's kinda one place where I think, uh, the change management required for a corporation to truly change is something we shouldn't discount. So, I think going forward, do humans and the tokens they produce, uh, get higher leverage, right? Uh, whether it's the Dwarkesh or the Dylan tokens of the future. I mean, think about the amount of technolo- would you be able to run SemiAnalysis or this podcast without technology? No chance.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
Right? I mean, the scale that you would be able to achieve, no chance. So the question is what's that scale? Is it gonna be ten X'd with something that comes through? Uh, absolutely. Uh, and therefore with their, your ramp to some revenue number or your ramp to some audience number or what have you. And so that I think is what's going to happen, right?
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
I mean, the, the point is, uh, that s- whatever, what took 70 years, maybe 150 years for the Industrial Revolution may happen in 20 years, 25 years. That's a better way to f- like, I would love to compress what happened in 200 years of the Industrial Revolution into a 20-year period, if you're lucky.
- DPDwarkesh Patel
Mm.
- DPDylan Patel
So Microsoft historically has been perhaps, you know, the greatest software company, the largest software as a service company. You know, you've gone through a transition in the past where you used to sell Windows licenses and discs of Windows or Office, and now you sell, you know, subscriptions to 365 or, um, a- a- as, as we go from sort of, you know, that transition to wh- where your business is today, um, there's also a transition going after that, right? Uh, software as a service, incredibly low incremental cost per user. Uh, there's a lot of R&D, there's a lot of customer acquisition cost. This is why, not Microsoft, but the SaaS companies have m- underperformed massively in the markets, because the COGS of AI is just so high and that-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... just completely breaks how these business models work. H- how do you as a, as, as a, as perhaps the greatest software company, um, software as a service company transition Microsoft to this new age where COGS matters a lot, um, and, and the incremental cost per user is different, right? 'Cause right now you're charging, hey, it's 20 bucks for Copilot.
- SNSatya Nadella
Yeah. So I think that this is a, it's a great question because in some sense, the business models themselves, I think the levers are gonna remain similar, right? Which is if I look at the, the, if, if you look at the menu of models, uh, starting from, like, say, consumer all the way, right, there will be some ad unit, uh, there will be some transaction, there'll be some device gross margin for somebody who builds an AI device. Um, uh, there will be subscriptions, consumer and enterprise. Uh, and then there'll be consumption, right? So I still think that that's kinda how ... Those are all the meters. To your point, what is a subscription? Up to now, people like subscriptions because they can budget for them, right? They are essentially entitlements to some consumption rights that come encapsulated in a subscription. So that, I think, is what, in some sense, it becomes a pricing decision. Uh, so how much consumption are you entitled to? If you look at all the coding subscriptions, that's kinda what they are, right?
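That framing, a subscription as an entitlement to consumption rights plus metered overage, maps onto a very simple billing scheme. A sketch with invented tiers, quotas, and prices, not actual Copilot pricing:

```python
# Illustrative only: tiers, quotas, and prices are invented.
from dataclasses import dataclass

@dataclass
class Tier:
    monthly_price: float      # what the user budgets for
    included_tokens: int      # consumption rights bundled into the subscription
    overage_per_1k: float     # metered consumption beyond the entitlement

TIERS = {
    "standard": Tier(monthly_price=20.0, included_tokens=5_000_000, overage_per_1k=0.01),
    "pro":      Tier(monthly_price=60.0, included_tokens=25_000_000, overage_per_1k=0.008),
}

def monthly_bill(tier_name: str, tokens_used: int) -> float:
    """A subscription is a pricing decision: flat fee plus metered overage."""
    t = TIERS[tier_name]
    overage = max(0, tokens_used - t.included_tokens)
    return t.monthly_price + (overage / 1000) * t.overage_per_1k

# e.g. monthly_bill("standard", 8_000_000) -> 20.0 + 3_000 * 0.01 = 50.0
```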
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
And they kinda have the pro tier, the standard tier, and what have you. And so I think that's how the pricing will h- uh, you know, and the margin structures will get tiered. Um, the interesting thing is having ... At Microsoft, the good news for us is we kinda are in that business, uh, across all those meters. In fact, at a portfolio level, uh, we pretty much have consumption, subscriptions, uh, and all of the other consumer levers as well. Um, and then I think time will tell which of these models make sense in what categories. Um, one thing on the SaaS side, since you brought it up, which I think a lot about is, uh, take, uh, Office 365 or Microsoft 365. I mean, man, uh, having a low ARPU is great because h- here's an interesting thing, right? During the transition from server to cloud, one of the questions we used to ask ourselves was, "Oh my God, what if all we did was just basically move the same users who were using, let's call it our Office licenses and our servers at that time, Office servers, right, to the cloud, and we had COGS? This is going to basically not only shrink our margins, uh, but we'll be fundamentally an unprofitable or even less profitable company." Except what happened was, the move to the cloud expanded the market like crazy, uh, right? I mean, we sold a few servers in India, didn't sell much, whereas in the cloud, suddenly everybody in India also could afford fractionally buying, uh, servers. The IT cost. I mean, in fact, the biggest thing I had not realized, for example, was the amount of money people were spending buying storage underneath SharePoint. In fact, EMC's biggest segment may have been storage servers for SharePoint. All that sort of dropped in the cloud because nobody had to go buy. In fact, it was working capital... I mean, basically it was cash flow out, right? And so, it expanded the market massively. So this AI thing will be that, right? So if you take coding, um, lit- what we built with GitHub and VSCode over, whatever, decades, uh, suddenly the coding assistant is that big in one year. And so that I think is what's gonna happen as well, which is the market expands massively.
- DPDwarkesh Patel
Hmm. I, I
- 13:42 – 20:56
Copilot
- DPDwarkesh Patel
guess there's a question of the market will expand but will the parts of the revenue that touch Microsoft expand? So, Copilot is an example where if you look, uh, early this year, I think, uh, I guess according to Dylan's numbers, um, the Copilot revenue, GitHub Copilot revenue was like 500 million or something like that. And then, uh, there were, like, no close competitors. Whereas now you have Claude Code, Cursor, and Copilot with around similar revenue, around a billion. And then Codex is catching up, around 700, 800 million. And so the question is, across all the services that Microsoft has access to, what is the advantage that Microsoft's equivalents of Copilot have?
- SNSatya Nadella
Yeah. By the way, I love this chart. You know, I love this chart for so many reasons. One is we're still on the top.
- DPDwarkesh Patel
(laughs)
- DPDylan Patel
(laughs)
- SNSatya Nadella
Um, second is all these companies that are listed here are all companies that have been born in the last four, five years.
- DPDwarkesh Patel
Yeah.
- DPDylan Patel
Yeah.
- SNSatya Nadella
That to me is the best sign, right? Which is if you have new competitors, new existential problems when you say, "Man, who's hit now? Oh, Claude's going to kill you, Cursor's going to kill you."
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
It's not boring, right? So thank God. Like that means we are in the right direction. But this is it, right? The fact that we went from nothing to this scale is the market expansion. So this is like the cloud-like stuff. This, uh, fundamentally this category of coding and AI is probably going to be one of the biggest categories, right? It is a software factory category.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
In fact, it may be bigger than knowledge work.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
So I kind of want to keep myself open-minded about... I mean, we're going to have tough competition. I think that's your point-
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
... which I think is a great one. Uh, but man, like, I'm glad we have, we parlayed, uh, what we had into this, and now we have to compete. And so in the compete side, uh, even in the last quarter, we just fini- we did our quarterly, uh, announcement, I think we grew from 20 to 26 million subs, right? So I feel good about our sub growth, uh, and where the direction of travel on that is. But the more interesting thing that has happened is, guess where all the repos of all these other guys, uh, who are generating lots and lots of code go to. They go to GitHub. So it- GitHub is at an all-time high in terms of repo creation, PRs, everything. So that, in some sense, we want to keep that open, by the way. That means we want to have that, right? Because we don't want to conflate that with our own growth, right? The f- interestingly enough, we are getting one developer joining GitHub, uh, a second or something.
- DPDwarkesh Patel
Hmm.
- SNSatya Nadella
That is the stat, I think. And then 80% of them just fall into some GitHub Copilot, uh, workflow just because they're there. And by the way, many of these things will even use some of our coding, uh, code review agents, which are on by default just because you can use it. So we'll have many, many structural shots at this. The thing that we're also going to do is what we did with the primitives of GitHub, starting with Git, to issues, to actions. These are powerful, lovely things because they kind of are all built around your repo. So we want to extend that. Last week at GitHub Universe, that's kind of what we did, right? So we said Agent HQ was the conceptual thing that we said we are gonna build out. This is where, for example, you have a thing called Mission Control and you go to Mission Control, and now I can fire off... Sometimes I describe it as the cable TV of all these AI agents, because I'll have essentially packaged into one subscription Codex, Claude, um, you know, Cognition stuff, anyone's agents, Grok, all of them will be there. So I get one package and then I can literally go issue a task, steer them, so they'll all be working in their independent branches. Uh, I can monitor them. Uh, so I literally have... Because I think that's going to be one of the biggest places of innovation, right? Because right now I want to be able to use multiple agents, I want to be able to then digest the output of the multiple agents, I want to be able to then keep a han- a handle on my repo. So there's some- some kind of a heads-up display that needs to be built, then, for me to quickly steer and triage what the coding agents have generated. That, to me, between VSCode, GitHub, and all of these new primitives we'll build, uh, as Mission Control, I think, uh, with a control plane, observability... I mean, think about everyone who's going to deploy all this. We'll require a whole host of observability: what agent did what, at what time, to what code base? So I feel that's the opportunity, uh, and at the end of the day, your point is well taken, which is we better be competitive and innovate. And if we don't, yes, we will get toppled. But I like the chart, at least as long as we're on top even with competition.
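Agent HQ's actual interface isn't public, but the Mission Control idea Satya describes, one task fanned out to many agents that each work in an isolated branch, can be sketched roughly like this. The agent list and the run_agent stand-in are hypothetical:

```python
# Hypothetical sketch of the Mission Control idea: issue one task to several
# coding agents, each in its own branch, then triage the results.
import subprocess
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    branch: str
    summary: str

def run_agent(agent: str, task: str, repo: str) -> AgentResult:
    # Each agent works in an isolated branch cut from main, so their
    # changes never collide and a human can diff and steer each one.
    branch = f"agents/{agent}/task-1"
    subprocess.run(["git", "-C", repo, "switch", "-c", branch, "main"], check=True)
    summary = f"{agent} attempted: {task}"  # stand-in for the real agent call
    return AgentResult(agent, branch, summary)

def mission_control(task: str, repo: str) -> list[AgentResult]:
    # One subscription, many agents: fan the task out, digest the outputs,
    # keep a handle on the repo from a single heads-up display.
    return [run_agent(a, task, repo) for a in ("copilot", "claude", "codex", "grok")]
```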
- DPDylan Patel
The key point here is sort of that GitHub will keep growing regardless of whose coding agent wins. But that, that market only grows at, you know, call it 10, 15, 20%, which is way above GDP. It's a great compounder. But these AI coding agents have grown from, you know, call it $500 million run rate at the end of last year, which was basically just GitHub Copilot, to now the current run rate across, you know, GitHub Copilot, Claude Code, Cursor, Cognition, Windsurf, Replit, uh, Codex, OpenAI Codex, that's, that's, that's run-rating at five, $6 billion now.
...um, for the, for the Q4 of this year. That's a 10X, right? And when you look at, hey, what's the TAM of software agents? Is it, is it the $2 trillion of wages you pay people or is it, is it something beyond that, uh, because every company in the world will now be able to-
- SNSatya Nadella
Absolutely.
- DPDylan Patel
...you know, develop software more? No question Microsoft takes a slice of that, but you've gone from near 100% or certainly way above 50% to, you know, sub 25% market share in just one year. What is the sort of confidence that people can get that Microsoft will be-
- SNSatya Nadella
Look, I mean, there's no... Again, it goes back a little bit, Dylan, to sort of there's no birthright here that we should have any confidence other than to say, "Hey, we should go innovate." And knowing the lucky break we have in some sense is that, uh, this category is gonna be a lot bigger than anything we had high share in. Let's- let me say it that way, right?
- DPDylan Patel
Mm-hmm.
- SNSatya Nadella
In some sense, you could say, "Well, we kinda had high share in VS Code. We had high share in the repos for- with GitHub." Uh, and that was a good market, but the point is even having a decent share in what is a much more expansive market, right? I mean, you could say we had a high share in client-server computing. We have much lower share than that in hyperscale, but is it a much bigger business? By orders of magnitude. So at least there's existence proof of Microsoft being okay, uh, even if our share position has not been as strong as it was, uh, as long as the markets we're competing in are creating more value. Uh, and there are multiple winners, uh, so I think that's the stuff. But I- I- I take your point that ultimately it all means you have to get competitive, so I watch that every quarter. And so that's why I think, well, I'm very optimistic about, uh, what we're going to do with GitHub HQ and, uh, or Agent HQ, turning GitHub into a place where all these agents come. Uh, as I said, we'll have multiple shots on goal on there, right? It need, it need not be that only we win; hey, some of these guys can succeed along with us. Uh, and so it ne- doesn't need to be just one winner, uh, and one subscription.
- DPDylan Patel
Hmm.
- 20:56 – 37:12
Whose margins will expand most?
- DPDylan Patel
I- I guess the reason to focus on this question is that it's not just about GitHub, but fundamentally about Office and all the other software that Microsoft offers, which is that one vision you could have about how AI proceeds is that, look, the models are going to keep being hobbled, and you'll need this direct, visible, um, observability all the time. And another vision is over time, these models can... now they're doing tasks that take two minutes. Next they're gonna be doing tasks that take 10, 30 minutes. In the future, maybe they're doing days' worth of work autonomously, and then the model companies are charging thousands of dollars maybe for access to, uh, really a coworker which could use any UI to communicate with their human and so forth and migrate between platforms. So w- if we were getting closer to that, why aren't the model companies that are, uh, just getting more and more profitable the ones that are taking all the margin? Why is the- the place where the scaffolding happens, which becomes less and less relevant as AI becomes more capa- capable, gonna be that important? And that goes to, you know, Office as it exists now versus coworkers that are just doing knowledge work autonomously and so forth.
- SNSatya Nadella
I think that's a great point. I mean, I think that's a gr- I mean, for example, I mean, this is where, you know, does all the mo- value migrate just to, uh, the model, um, and, uh, or does the mo- you know, the- does it get split between the scaffolding, um, and, uh, the model and what have you? I- I think that, uh, time will tell, but my- my fundamental point also is the incentive structure gets clear, right? Which is if you take, um, let's take, uh, let's take information work or take even coding. Um, already, in fact, one of the favorite settings I have, uh, in GitHub Copilot is called Auto, um, right? Which will just optimize. In fact, I buy a subscription, the Auto one will start picking and optimizing for what I am asking it to do. Uh, and it could even be fully autonomous, and it could sort of arbitrage the tokens available across multiple models to go get a task done. So if that is the- that- that mean- that... If you take that argument, the commodity there will be models. Uh, and especially with open-source models, you can pick a checkpoint, and you can take a bunch of your data. And you're seeing it, right? I think all of us will start u- whether it's from Cursor or from Microsoft, uh, we will start seeing some in-house models even, uh, which will... And then you'll offload most of your, uh, tasks to it. So, I think that one argument is if you win the scaffolding, uh, which today is dealing with all the hobbling problems or the, uh, the jaggedness of these intelligence problems, which you kinda have to. Um, if you win that, then you will vertically integrate yourself into the model just because you will have the liquidity of the data and what have you, and there are enough and more checkpoints that are gonna be available. Uh, that's the other thing, right? So structurally, I think there will always be an open-source model, uh, that will be fairly capable in the world that you could then use as long as you have something that you can use that, uh, with, which is data, uh, and a scaffolding, right? So I can make the argument that, oh my god, uh, if you're a model company, you may be- you may have a winner's curse. You may have done all the hard work, done unbelievable innovation, except it's kinda like one copy, uh, away from that being commoditized. And then the person who has the data for grounding and context engineering, um, and the liquidity of data can then go take that checkpoint and train it. So I think the argument can be made both ways.
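The Auto setting in GitHub Copilot is real, but its routing logic isn't public. Here is a hedged sketch of what arbitraging tokens across models could look like, with invented model names, prices, and capability scores:

```python
# Sketch of an "Auto"-style router: pick the cheapest model that is judged
# capable enough for the task. All names and numbers are invented.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    usd_per_million_tokens: float
    capability: float          # 0..1, e.g. from internal evals

CATALOG = [
    Model("small-open-source", 0.20, 0.55),
    Model("mid-frontier",      2.00, 0.80),
    Model("top-frontier",     15.00, 0.95),
]

def route(task_difficulty: float) -> Model:
    """Arbitrage tokens across models: cheapest one clearing the bar wins."""
    eligible = [m for m in CATALOG if m.capability >= task_difficulty]
    if not eligible:
        return max(CATALOG, key=lambda m: m.capability)  # fall back to the best
    return min(eligible, key=lambda m: m.usd_per_million_tokens)

# route(0.6) -> mid-frontier; route(0.9) -> top-frontier
```

If a router like this sits between the user and the models, the model behind it becomes substitutable, which is exactly the commoditization pressure Satya is describing.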
- DPDylan Patel
Un- unpacking sort of what you said, there's two views of the world, right? One is that models... There's so many different models out there. Open source exists. There will be differences between the models that will drive some level of, you know, who wins and who doesn't, but the scaffolding is what enables you to win.
- SNSatya Nadella
Yeah.
- DPDylan Patel
The other view is that actually models are the key IP. And yes, we're in a very- everyone's in a tight race, and there's some, you know, "Hey, I can use Anthropic or OpenAI," and you can see this in the revenue charts, right? Like OpenAI's revenue started skyrocketing once they finally had a code model with similar capabilities to Anthropic's, although in different ways. Um-... th- th- there's a view that, like, the model companies are actually the ones that garner all the margin, right? Because, you know, if you look across this year, at least on Anthropic, their gross margins on inference went from, you know, well below 40% to north of 60%, right-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... by the end of the year. Um, the- these- the margins are-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... expanding there despite, hey, more Chinese open-source models than ever.
- SNSatya Nadella
Yeah.
- DPDylan Patel
Hey, OpenAI's competitive. Hey, Google's competitive. Hey, x- Grok is now competitive, right? All these- all these companies are now competitive. And yet despite this, the margins have expanded at the model layer significantly.
- SNSatya Nadella
Yeah.
- DPDylan Patel
Um, h- h- how do you think about the...
- SNSatya Nadella
It's a gr- it's a great question. It, it, I, I think that the one thing is perhaps a few years ago people were saying, "Oh, I can just wrap a model and build a successful company." Uh, and that, I think, has probably gotten debunked just because of the model capabilities, um, and with tool use in particular. But the interesting thing is there's no... Like, when I look at Office 365, let's take even this little thing we built called Excel Agent. Uh, it's interesting, right? Excel Agent is not a UI-level wrapper. It's actually a model that is in the middle tier. Uh, in this case, because we have all the IP from the, the GPT family, uh, we are taking that and putting it into the core middle tier of the Office system to teach it what it means to natively understand Excel, everything in it, so it's not just, "Hey, I just have a pixel-level understanding," I have a n- full understanding of all the native artifacts of Excel, uh, both when I see it. Like, because if you think about it, if I'm going to give it some reasoning task, right, I need to even fix the reasoning mistakes I make. And so that means I need to not just see the pixels. I need to be able to see, oh, I got that formula wrong, and I need to understand that. And so to some degree, that's all being done not at the UI wrapper level with some prompt, but it's being done in the middle tier by teaching it all the tools of Excel, right? So I'm giving it even, essentially, a markdown to teach it the skills of what it means to be a sophisticated Excel user. So it's a weird thing that it, it goes back a little bit to AI brain, right? Which is you're building not just Excel. You're taking the Excel business logic in the traditional sense and wrapping essentially a cognitive layer around it, using this model which knows how to use the tool. So in some sense, Excel will come with an analyst bundled in, and with all the tool use.
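Microsoft hasn't published the Excel Agent's internals; the sketch below only illustrates the distinction Satya is drawing, exposing native artifacts (formulas) to the model as callable tools rather than pixels. It uses the openpyxl library, and the tool functions are hypothetical:

```python
# Hypothetical tool schema: instead of screenshots, the model is handed
# Excel's native artifacts (cells, formulas) as tools, so it can inspect
# a formula, spot a reasoning mistake, and repair it.
import openpyxl  # widely used library for .xlsx files

def get_formula(path: str, sheet: str, cell: str) -> str | None:
    """Expose the native artifact, not the rendered pixels."""
    ws = openpyxl.load_workbook(path)[sheet]
    v = ws[cell].value
    return v if isinstance(v, str) and v.startswith("=") else None

def set_formula(path: str, sheet: str, cell: str, formula: str) -> None:
    """Let the agent fix a formula it got wrong, then re-check its work."""
    wb = openpyxl.load_workbook(path)
    wb[sheet][cell] = formula
    wb.save(path)

# These would be registered as tools for the model, e.g.:
# tools = [get_formula, set_formula]  # plus a markdown skills doc
```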
- DPDylan Patel
Mm.
- SNSatya Nadella
That's the type of stuff that will get built by everybody. So even for the model companies, they'll be allowed to compete, right? So if they price stuff high, uh, guess what? If I'm a builder of a tool like this, I'll substitute you. I may use you e- for a while, and so as long as there's competition, it's not a winner-take-all thing, right? If there's going to be one model that is better than everybody else by a massive distance, yes, that's a winner-take-all. As long as there's gonna be competition where there's multiple models, just like hyperscale competition, and there's an open-source check, uh, there is enough room here, uh, to go build value on top of models.
- DPDylan Patel
Mm.
- SNSatya Nadella
Uh, but at Microsoft, the way I look at it and say is we are going to be in the hyperscale business which will support multiple models. We will have access to OpenAI models for, uh, uh, you know, seven more years, which we will innovate on top of, so royalty for... And essentially, I think of ourselves as having a frontier-class model, uh, that we can use and innovate on with full, uh, flexibility, and we'll build our own models, uh, with MAI. Um, and, and so we will always have a model level, and then we'll build these, whether it's in security, whether it's in knowledge work, whether it's in coding, or in science. We will build our own application scaffolding-
- DPDylan Patel
Mm.
- SNSatya Nadella
... which will be model-forward, right?
- DPDylan Patel
Mm-hmm.
- SNSatya Nadella
It won't be a wrapper on a model, but the model will be wrapped into, uh, the application.
- DPDylan Patel
I have so many questions about th- the other things you mentioned.
- SNSatya Nadella
(laughs)
- DPDylan Patel
But before we move on to those topics, um, I still wonder whether this is, like, not forward-looking on AI capabilities, where you're imagining models like they exist today, where, yeah, you have this thing that takes a screenshot of your screen but can't, like, look inside each cell and see what the formula is. And I think the better m- mental model here is, like, look, just imagine that these models actually will be able to use a computer as well as a human.
- SNSatya Nadella
100%, yeah.
- DPDylan Patel
And a human knowledge worker who is using Excel can look into the formulas, can w- you know, use alternative software, can migrate data between Office 365 and another piece of software if the migration is necessary, et cetera. So what is-
- SNSatya Nadella
That's kind of what I'm saying. So what-
- DPDylan Patel
But, but if that's the case, then the integration with Excel doesn't matter that much-
- SNSatya Nadella
No, no, no.
- 37:12 – 48:42
MAI
- DPDwarkesh Patel
speaking of model companies, you say, okay, we will also be one of the... Not only will we have the infrastructure, we'll have the model itself. Right now Microsoft AI's most recent model that was released two months ago is number 36 in Chatbot Arena, and there's a qu- I mean you obviously have the IP rights to OpenAI, so there's a question of ... first, to the extent you agree w- that l- it seems to be behind, why is that the case? Especially given the fact that you could, um ... You theoretically have the right to just, like, fork OpenAI's monorepo or distill on their models. Um, y- yeah. E- especially if it's a big part of your strategy that we need to have a leading model company.
- SNSatya Nadella
Yeah. I mean, f- so first of all, we are g- absolutely going to use the OpenAI models, uh, to the maximum, uh, across all of our products, right? I mean, that's, I think, the core thing that we're gonna continue to do all the way for the next seven years. Uh, and not just use it, uh, but then add value to it. That's kinda where the analyst and this Excel Agent come in, and these are all things that we will do where, you know, we'll do r- you know, RL fine-tuning, we'll do some mid-training runs on top of a GPT family where we have unique data assets and build capability. Um, the MAI model, the way I think we're gonna think about it is, the, the good news here, in fact with the new agreement, is we can be very, very clear that we're gonna build a world-class superintelligence team and go after it with high ambition. But th- at the same time, we're also gonna use this time to be smart about how to use both these things. So that means we will, on one end, be very product-focused and, on the other end, be very research-focused. In other words, uh, because we have access, aha, to the GPT family, the last thing I wanna do is use my FLOPs in a way that is just duplicative and doesn't add much value. So I want to be able to take, uh, the FLOPs that we use to generate a GPT family and maximize its value while my MAI FLOPs are being used for ... Let's take the image model that we launched, which I think launched, uh, at number nine in the, uh, image arena. You know, we're using it, you know, both for cost optimization. It's on Copilot, it's in Bing, and we're gonna use that. We have an audio model in Copilot, which really, it's got personality and what have you. We optimized it for our products. So we will do those. Even on the LM Arena, we started on the text one; I think it debuted at ni- 13. And by the way, it was v- it was done only on, whatever, 15,000, uh, H100s. And so it was a very small model. And, uh, so it was, again, to prove out, uh, the core capability, the instruction-following and everything else, which b- you know, we wanted to make sure we can match what was state-of-the-art. And so that shows us, given scaling laws, what we are capable of doing if you g- gave more FLOPs to it, right? So the next thing we will do is an omni model where we will take sort of the work we have done in audio, what we have done in image, and what we have done in text. That'll be the next pit stop on the MAI side. So when I think about the MAI roadmap, we're gonna build a first-class superintelligence team. We're gonna continue to drop some of these models in the open. They will either be in our products being used because they're going to be latency-friendly, COGS-friendly, or what have you, or they'll have some special capability. And we will do real research in order to be ready for some next five, six, seven, eight brea- breakthroughs, uh, that are all needed on this march towards superintelligence. So I think that's ... And while exploiting the advantage we have of having the GPT family that we can work on top of as well.
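The RL fine-tuning and mid-training runs Satya mentions aren't public. As a generic sketch of what continued training of a checkpoint on proprietary domain data looks like, with the checkpoint id and data as placeholders and real runs involving far more machinery:

```python
# Generic sketch of "mid-training": continue next-token training of an
# existing checkpoint on a domain corpus. Everything here is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-open-checkpoint")   # placeholder id
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModelForCausalLM.from_pretrained("some-open-checkpoint")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

domain_texts = ["...unique domain documents..."]              # placeholder data

for i in range(0, len(domain_texts), 8):                      # batches of 8
    batch = tok(domain_texts[i:i + 8], return_tensors="pt",
                padding=True, truncation=True)
    out = model(**batch, labels=batch["input_ids"])           # causal LM loss
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```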
- DPDwarkesh Patel
Mm.
- DPDylan Patel
S- say we roll forward seven years. Uh, you no longer have access to OpenAI models. Where does one get confidence, or what does Microsoft do to make sure they are leading or have a leading AI lab, right? Today, you know, it's, it's OpenAI that has developed many of the breakthroughs, whether it be scaling or reasoning, or Google's developed breakthroughs like Transformers. Uh, but, but it- it is also a big talent game, right? You know, you've seen Meta spend, you know, north of $20 billion on talent, right? Uh, you've seen Anthropic pr- uh, poach the entire Blueshift reasoning team from Google last year. You've seen Meta poach a large reasoning and post-training team from Google more recently. These, these sorts of talent wars are very capital-intensive. They're the ones that, you know, arguably, you know, if you're spending $100 billion on infrastructure, you should also spend, you know, X amount of money on, on the people using the infrastructure so that they're more efficiently making these new breakthroughs. What, what confidence can one get that, you know, hey, Microsoft will have a team that's world class that can make these breakthroughs? And, you know, once you decide to turn on the money faucet... you know, you're, you're being a bit capital-efficient right now, which is, which is smart it seems, uh, to not waste money d- doing duplicative work. But once you decide you need to, you know, how, how can one say, "Oh, yeah, now you can shoot up to, we're the top five model now"?
- SNSatya Nadella
Well, look, I mean, at the e- eh, the end of the day, we are gonna build a world-class team, and we, uh, already have a world-class (laughs) team that's beginning to be sort of a- assembled, right? With Mustafa coming in. We have Karen, we have Amar Subramanya, who did a l- lot of the post-training at Gemini, Toufi who's at Microsoft, Nando who did a lot of the multimodal s- work at DeepMind is there. And so we're gonna build a world-class team. And in fact, I think later this week even, Mustafa published some, you know, a little more clarity on what our lab is going to go do. Um, I think the thing that I want, uh, the world to know, perhaps, uh, is we are gonna build the infrastructure that'll support multiple models. Uh, you know, uh, we ... Because from a hyperscale perspective, we wanna build the most scaled infrastructure fleet that's capable of supporting all the models the world needs, whether it's from open source or whether obviously from OpenAI and others. And so that's kinda one job. Second is, on our own model capability, we will absolutely use the OpenAI model in our products, and we will start building our own models. And we may... like in- in GitHub Copilot, Anthropic is used. So, we will even have other frontier models that are gonna be wrapped into our products as well. So, I think that that's kind of how, at least, uh, at the end of the day, the eval of the product as it meets a particular task or a job is what matters. And we'll sort of work back from there into the vertical integration needed. Uh, knowing that as long as your service, you know, you're serving the market well with the product, you can always cost-optimize.
- DPDwarkesh Patel
Mm-hmm. Th- th- there's a question going forward. So right now, we have models that have this distinction between training and inference, and one could argue that there's like a- a smaller and smaller difference between the different models. Um, going forward, if you're really expecting something like human-level intelligence, humans learn on the job. H- you know, if you think about your last 30 years, what- what makes Satya's tokens so valuable? It's the last 30 years of wisdom and experience you've gained at Microsoft. Um, and w- we will eventually have models, if they get to human level, which will have this ability to continuously learn on the job, and that will drive so much value to the model company that is ahead, at least in my view. Because you have copies of one model broadly deployed through the economy, learning how to do every single job. And unlike humans, they can amalgamate their learnings to that model. So, there's this sort of continuous learning sort of exponential feedback loop, um, which almost looks like a sort of intelligence explosion. Uh, if that happens and Microsoft isn't the leading model company by that time, doesn't then this, uh, you know, you're saying, "Well, we substitute one model for another," et cetera, matter less? 'Cause it's just like this one model knows how to do every single job in the economy. The others in the long tail don't.
- SNSatya Nadella
Yeah. No, I think that your point about if there's one model that is the only model that's most broadly deployed in the world and it sees all the data and it has continuous learning- (laughs)
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
... that's game, set, match-
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
... and, you know, it's shot sharp, right? I mean, the- the reality, at least as I see it, um, is the world, even today, for all the dominance of any one model, it's not the case. Um, a- it's like take any... Take coding. There's multiple models. In fact, every day, it's less and less the case that there is one model that is getting deployed broadly. In fact, there's multiple models that are getting deployed. It's kind of like databases, right? The thing is always like, "Hey, can one database be the one that just is used everywhere?" Except it's not. Uh, there are multiple types of databases that are getting deployed, uh, for different use cases. So, I think that there is going to be some network effect of continual learning or data, you- you know, I'll call it liquidity, that any one model has. Uh, is it gonna happen in all domains? I don't think so. Is it gonna happen in all geos? I don't think so. Is it gonna happen in all segments? I don't think so. Will it happen in all categories at the same time? I don't think so. So therefore, I feel like the design space is so large, uh, that there's plenty of opportunity. But your fundamental point is having a capability which is at the infrastructure layer, model layer, and at the scaffolding layer, and then to be able to compose these things not just as a vertical stack, but to be able to compose each thing for what its purpose is, right? You can't build an infrastructure that's optimized for one model. If you do that, what if you fall behind? In fact, all the infrastructure you built will be a waste, right? You kind of need to build an infrastructure that's capable of supporting multiple sort of families and lineages of models. Otherwise, the capital you put in, which is optimized for one model architecture, means you're one tweak away from some MoE-like breakthrough that happens with somebody else, and your entire network topology goes out of the window. Then that's a scary thing, right? So therefore, you kind of want the infrastructure to support whatever may come, in fact, in your own model family and other model families, and you've got to be open. If you- if you're serious about the hyperscale business, you've got to be serious about that, right? Um, if you're serious about being a model company, you've got to basically say, "Hey, what are the ways people can actually do things on top of the model so that I can have an ISV ecosystem?" Unless I'm thinking I'll own every category. That just can't be. Then- then you won't have an API business, and that by definition will mean you'll never be, uh, a platform company that's gonna be successfully deployed everywhere, right? So therefore, the industry structure is so- such that it will, uh, really force people to specialize. And in that specialization, a company like Microsoft should compete in each layer on its merits, uh, but not think that this is all a road to game, set, match where I just compose vertically all these layers. That's- that just doesn't happen.
- DPDwarkesh Patel
(instrumental music) So, according to Dylan's numbers, there's gonna be half a trillion in AI CapEx next year alone. And labs are already spending billions of dollars to snag top researcher talent. But none of that matters if there's not enough high-quality data to train on. Without the right data, even the most advanced infrastructure and world-class talent won't translate into end value for the user. That's where LabelBox comes in. LabelBox produces high-quality data at massive scale, powering any capability that you want your model to have. It doesn't matter whether you need a coding agent that needs detailed feedback on multi-hour trajectories, or a robotics model that needs thousands of samples on everyday tasks, or a voice agent that can also perform real-world actions for the user, like booking them a flight. To be clear, this isn't just off-the-shelf data. LabelBox can design and launch a custom production-scale data pipeline in 48 hours, and they can get you tens of thousands of targeted examples in weeks. Reach out at labelbox.com/dwarkesh. All right, back to Satya.
- 48:42 – 1:03:39
The hyperscale business
- DPDylan Patel
So last year, Microsoft was on- on path to be the largest infrastructure provider, uh, by far. You were- or at least in '23. So, you- you went out there, you acquired all the resources in terms of leasing data centers, starting construction, securing power, everything. You guys were on pace to beat Amazon in '26 or '27. Um, but certainly by '28, you were gonna beat them. Um, since then, you, you know, in, let's call it the second half of last year, Microsoft did this big pause, right, where they let go of a bunch of leasing sites that they were gonna take, which then Google, Meta, um, Amazon in some cases, Oracle, uh, took these sites. We're sitting in one of the largest data centers in the world, so obviously it's not everything. You guys are expanding like crazy. Uh, but there are sites that you just stopped working on.
- SNSatya Nadella
Mm-hmm.
- DPDylan Patel
Wh- why, why did you do this, right?
- SNSatya Nadella
Yeah, I mean, the fundamental thing we ... This goes back a little bit to what is the hyperscale business all about, right, which is one of the key decisions we made was that if you're gonna build out Azure to be fantastic for all sorts of stages of AI, uh, from training to mid-training, to data gen, to inference, we just need fungibility, uh, of the fleet. Um, and, and so that entire thing caused us not to basically go build a, a whole lot of capacity with a particular set of generations, uh, because the other thing that you got to realize is, having actually up to now 10X-ed every 18 months enough training capacity for the various OpenAI models, uh, we realized that, um, the key is to stay on that path. But the more important thing is to actually have a balance, to not just train, but to be able to serve these models all around the world, because at the end of the day, the rate of monetization is what then will allow us to even keep, uh, funding, and then the infrastructure was going to need us to support, as I said, multiple models and what have you. So once we said that that's the case, since then, we just course corrected to whe- the path we're on, right? If I look at the path we are on, we are doing a lot more starts now. Uh, we are also buying up as much managed capacity as we can, whether it's to build, whether it's to lease, or even GPUs as a service. But we are building it for where we see the demand, uh, and the serving needs and our training needs. And we didn't want to just be a hoster for one company, uh, and have just a massive book of business with one customer. That, that's not a business, right? That is sort of ... Uh, you know, you should be vertically integrated with that company.
- DPDylan Patel
Yeah.
- SNSatya Nadella
Uh, and so, given the, the fact that OpenAI was going to be a successful independent company, which is fantastic, right? I think it makes sense, right? And even Meta may use third-party capacity, but ultimately they're all going to be first party. Uh, for anyone who has large scale, they'll be, you know, they'll be a hyperscaler on their own. And so to me, the job was to build out a hyperscale fleet and our own research compute, uh, and that's what the adjustment was. Um, you know, and then, and so I feel very, very good. Oh, by the way, the other thing is I didn't want to get stuck with massive scale of one generation. I mean, we just saw the, the GB200s. I mean, the GB300s are coming, right? And by the time I get to Vera Rubin, Vera Rubin Ultra, guess what? The data center is gonna look very different because the power per rack, power per row is going to be so different. Uh, the cooling requirements are going to be so different. And that, that means I don't want to just go build out like a whole number of gigawatts that are only for one generation, one family. And so I think the pacing matters and the fungibility and the location matters. Uh, the workload diversity matters, customer diversity matters, and that's what we're building towards. The other thing that we've learned a lot is, um, every AI workload does require not only the AI accelerator, but it requires a whole lot of other things, right? And in fact, a lot of the margin structure for us will be in those other things. And so therefore, we want to build out Azure as being fantastic for the long tail of the workloads, because that's the hyperscale business, while knowing that we've got to be super competitive starting with the bare metal for the highest-end training. But that can't crowd out the rest of the business, right? Because we are not in the business of just doing five contracts with five customers, being their bare metal service. That's not a, a Microsoft business. That may be a business for someone else, and that's a good thing. What we have said is we are in the hyperscale business, which is, at the end of the day, a long tail business, uh, for AI workloads. And in order to do that, we will have some leading bare-metal-as-a-service capabilities for a set of models, including our own. Uh, and that I think is the balance you see.
- DPDylan Patel
The, another sort of question that comes around this whole fungibility topic is, okay, it's not where you want it, right? You would rather have it in a good population center like Atlanta, as you're- we're here. Um, there, there's, there's also the question of, like, well, how much does that matter as the horizon of AI tasks grows? Well, actually-
- SNSatya Nadella
It's a great question.
- DPDylan Patel
... you know, 30 seconds for a reasoning prompt, or, you know, 30 minutes for a deep research or, you know, it's gonna be hours for software agents at some point, um, and days and so on and so forth, the time to human interaction. Why does it matter if it's-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... if it's, uh-
- SNSatya Nadella
It's a great, it's a great question.
- DPDylan Patel
... uh, location A, B, or C?
- SNSatya Nadella
That's exactly right. So, in fact, that's one of the other reasons why we want to think about, like, hey, what does an Azure region look like, and what is, in fact, the networking between Azure regions? So this is where, uh, I think, as the model capabilities evolve, and I think the usage of these tokens, whether it's synchronous or asynchronous, evolves. And in fact, you don't want to be out of position, right? Then on top of that, by the way, what are the data residency laws, right? Wh- where do I ... Like, I mean, the entire EU thing, uh, for us, where we literally had to create an EU data boundary, uh, basically meant that you can't just round trip a call to where- wherever, even if it's asynchronous. And so therefore you need to have maybe regional things that are high density, and then the power costs and so on. But you're a hundred percent right in bringing up that the topology as we build out, uh, a- will have to evolve. One, for tokens per dollar per watt. Uh, what are the economics? So, uh, overlay that with what is the usage pattern, um, usage pattern in terms of synchronous, asynchronous, but also what is the compute storage, because the latencies may matter for certain things. Uh, the storage better be there. If I have a Cosmos DB close to this for session data or even for an autonomous thing, then that also has to be somewhere close to it, and so on. So I think that all of those considerations are what will shape, um, the hyperscale business.
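"Tokens per dollar per watt" is easy to make concrete with back-of-envelope numbers; every figure below is invented for illustration, not Azure's actual economics:

```python
# Back-of-envelope sketch of "tokens per dollar per watt". All numbers invented.
tokens_per_sec_per_gpu = 2_000          # serving throughput
gpu_power_watts = 1_000                 # accelerator plus a share of cooling
gpu_cost_per_hour_usd = 3.00            # amortized capex + opex

tokens_per_hour = tokens_per_sec_per_gpu * 3600               # 7.2M tokens
tokens_per_dollar = tokens_per_hour / gpu_cost_per_hour_usd   # 2.4M
tokens_per_dollar_per_watt = tokens_per_dollar / gpu_power_watts  # 2,400

print(f"{tokens_per_dollar_per_watt:,.0f} tokens per dollar per watt")
```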
- DPDylan Patel
Mm. You know, prior to the pause, you were, you were, you're, you know, versus, you know, what we had forecasted for you by '28, you're gonna be, like, 12, 13 gigawatts-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... and now we're at, you know, nine and a half or so, right? But, you know, something that's even more relevant, right, and it's, it's, you know, I just want you to, like, more concretely state that this is the business you don't want to be in, but, like, Oracle's going from like one-fifth your size to bigger than you by end of 2027. And while it's not a Microsoft level quality of return on invested capital, right, they're still making 35% gross margins, right? So sort of the question is like does it, is it, isn't it, is it, is it, you know, hey, it's not Microsoft's business to maybe do this, wha- wh- what-
- SNSatya Nadella
(laughs) .
- DPDylan Patel
But you've created a hyperscaler now-
- SNSatya Nadella
Yeah.
- DPDylan Patel
... by refusing this business, by giving away the right of, uh-
- SNSatya Nadella
Look, I'm, I'm-
- DPDylan Patel
... first refusal, et cetera.
- SNSatya Nadella
I'm not... First of all, I don't, I don't want to take away any, uh, uh, thing from the success Oracle has had-
- DPDylan Patel
Yeah.
- SNSatya Nadella
... uh, in building their business, and I wish them well. And so the thing that I think I've answered for you is, it didn't make sense for us, uh, to go be a hoster for one model company, uh, with limited time horizon RPO. Let's-
- DPDylan Patel
Yeah.
- SNSatya Nadella
... let's just put it that way, right? The thing that you have to think through is not what you do in the next five years, but what do you do for the next 50? Uh, because that's kind of what I w- We made our set of decisions. Um, I feel very good about our OpenAI, uh, partnership and what we're doing. We have a decent book, uh, book of business. We wish them a lot of success. In fact, we are buyers of Oracle capacity, we wish-
- DPDylan Patel
Yep.
- SNSatya Nadella
... them success. But, you know, at this point, I think the industrial logic for what we are trying to do is pretty clear, which is it's not about like chasing... I mean, first of all, I track, by the way, your, uh, things, whe- whether it's the AWS or the Google and ours, which I think is super useful. Uh, but doesn't mean I (laughs) gotta chase those.
- 1:03:39 – 1:10:30
In-house chip & OpenAI partnership
- DPDwarkesh Patel
mentioned the c- how the, you know, you're depreciating this asset over five, six years, and this is the majority of the, you know, 75% of the TCO of a data center, and Jensen is taking a 75% margin on that. So what all the hyperscalers are trying to do is develop their own accelerator so that they can reduce this overwhelming cost for, um, uh, equipment to increase their margins.
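To put rough numbers on that trade-off, here is a back-of-envelope comparison; every figure is invented for illustration:

```python
# Sketch of the TCO argument: a custom accelerator competes not with NVIDIA's
# list price but with the full TCO of even the previous NVIDIA generation.
def cost_per_million_tokens(chip_cost, useful_life_yrs, power_w,
                            usd_per_kwh, tokens_per_sec):
    hours = useful_life_yrs * 365 * 24
    capex_per_hour = chip_cost / hours
    power_per_hour = (power_w / 1000) * usd_per_kwh
    tokens_per_hour = tokens_per_sec * 3600
    return (capex_per_hour + power_per_hour) / tokens_per_hour * 1e6

nvidia = cost_per_million_tokens(30_000, 5, 1_000, 0.08, 2_000)  # bought at margin
custom = cost_per_million_tokens(12_000, 5, 900, 0.08, 1_400)    # cheaper, slower

print(f"NVIDIA-class: ${nvidia:.3f}/M tok, custom: ${custom:.3f}/M tok")
# The custom part only wins if whole-system throughput keeps pace, which is
# why a closed loop with your own models matters.
```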
- DPDylan Patel
Yeah. And then, and then like, you know, when you look at where they are, right? Google's way ahead of everyone else, right? They've been doing it for the longest. They're going to make something like five to seven million chips, right, of their own TPUs. You look at Amazon, they're trying to make three to five million. Uh, but when we look at what, you know, Microsoft is ordering of their own chips, it's, it's, it's way below that number. Um, you've had a program for just as long. What's going on with your internal chips?
- SNSatya Nadella
Yeah, it's a good question. So, so the, couple of things. One is the thing that is the biggest competitor for any new accelerator is kind of even the previous generation of NVIDIA, right? I mean, in a fleet, what I'm going to look at is the overall TCO. So the bar I have even for our own, and which by the way, you know, I was just looking at the data for Maia 200, which looks great, um, e- e- except that one of the things that we learned even on the compute side, right, which is we had a lot of Intel, then we introduced AMD, and then we introduced Cobalt. And so that's kind of how we scaled it. And so we have good, um, sort of existence proof of at least in core compute on how to build your own silicon and then manage a fleet where all three are at play in some balance. Uh, because by the way, even Google's buying NVIDIA and so is, uh, Amazon. And it makes sense because NVIDIA is innovating and it's the general purpose thing. All models run on it, uh, and customer demand is there, because if you build your own vertical thing, you better have your own model, uh, which is, you know, either going to use it for training or inference, and you have to generate your own demand for it or subsidize the demand for it. So therefore, you want to, uh, make sure, um, you scale it appropriately. So the way we are going to go do it is have a closed loop between our own MAI models and our silicon because I feel like that's the, that's what gives you the birthright to really do your own silicon, right, where you literally have, uh, designed the microarchitecture with what you're doing and then you keep pace with your own models. Um, in our case, the good news here is OpenAI has a program, uh, which we have access to. Um, and so therefore, to think that Microsoft is not going to have something that's scaled-
- DPDylan Patel
What level of access do you have to that?
- SNSatya Nadella
All of it.
- DPDylan Patel
You just get the IP for all of that?
- SNSatya Nadella
Yeah.
- DPDylan Patel
So the only IP you don't have is the consumer hardware?
- SNSatya Nadella
That's it.
- DPDylan Patel
Oh, wow. Okay.
- SNSatya Nadella
Yeah. (laughs)
- DPDwarkesh Patel
(laughs) Interesting. (laughs)
- SNSatya Nadella
Yeah. Oh, and by the way, we gave them, uh, a bunch of IP as well to bootstrap them, right? So this is one of the reasons why they had a mass ... Because we built all these supercomputers together. Uh, we built it for them and they, uh, benefited from it, rightfully so. And, uh, and now as they innovate even at the system level, we get access to all of it. Uh, and, uh, we first wanna instantiate what they built, uh, for them. Uh, but then we'll extend it. And so to think that we don't have ... And so if anything, the way I think about your question is, uh, Microsoft wants to be a fantastic, I'll call it, speed-of-light execution partner for NVIDIA. Because quite frankly, that fleet, uh, is life itself. I'm not worried about ... I mean, obviously Jensen's doing super well with his margins, but the TCO has many dimensions to it, and I wanna be great at that TCO. Uh, on top of that, I wanna be able to sort of really work with the OpenAI lineage, uh, and the MAI lineage, and the system design, knowing that we have the IP rights on both ends.
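To make concrete Satya's earlier point that the bar for any new accelerator is the previous generation of NVIDIA on fleet-wide TCO, here is a toy per-token cost comparison. Every number below is invented for illustration; only the structure of the comparison reflects the argument.

```python
# Toy TCO-per-token comparison behind "the bar is previous-gen NVIDIA".
# All numbers are invented placeholders.

def tco_per_token(capex, years, power_kw, price_per_kwh, tokens_per_sec):
    """Annualized capex plus power cost, divided by annual token throughput."""
    annual_capex = capex / years
    annual_power = power_kw * 24 * 365 * price_per_kwh
    annual_tokens = tokens_per_sec * 3600 * 24 * 365
    return (annual_capex + annual_power) / annual_tokens

# Previous-generation vendor GPU, already proven in the fleet:
prev_gen = tco_per_token(capex=25_000, years=5, power_kw=0.7,
                         price_per_kwh=0.08, tokens_per_sec=900)

# Hypothetical in-house accelerator: cheaper to buy, but it only wins
# if its delivered tokens/sec (software maturity included) holds up:
in_house = tco_per_token(capex=12_000, years=5, power_kw=0.6,
                         price_per_kwh=0.08, tokens_per_sec=500)

print(f"prev-gen: ${prev_gen:.2e}/token vs in-house: ${in_house:.2e}/token")
```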
- DPDwarkesh Patel
Yeah. Hmm. Uh, sp- speaking of rights, one thing, you know, you had an interview a couple of days ago, uh, where you said that, under the new agreement you made with OpenAI, you have exclusivity on the stateless API calls that OpenAI makes. And we were sort of confused about if there's any state whatsoever. I mean, you were just mentioning a second ago that all these complicated workloads that are coming up are gonna require memory and databases and storage and so forth. And is that now not stateless? Isn't ChatGPT storing stuff across sessions?
- SNSatya Nadella
No, but that's the reason why. So the, the thing, the business, the strategic decision we made, and also accommodating for the flexibility OpenAI needed in order to be able to procure compute for ... Essentially, think of OpenAI having, um, a PaaS business and a SaaS business. The SaaS business is ChatGPT; their PaaS business is their API.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
That API is Azure-exclusive. The SaaS business, they can run it anywhere.
- DPDwarkesh Patel
And they can partner with anyone they want to to build SaaS products?
- SNSatya Nadella
Th- So if they want a partner, and this partner wants to use a s- a stateless API, then Azure is the place where they can get the stateless API.
- DPDwarkesh Patel
It seems like there's a way for them to make ... You, you, you know, build the product together and, and it's a stateful thing?
- SNSatya Nadella
No, even that, they'll have to come to Azure.
- DPDwarkesh Patel
Okay.
- SNSatya Nadella
So if it is any partner ... And so, so fundamentally, you know, so, uh, again, this is done in the spirit of, what is it that we valued as part of our partnership? And we made sure of that, while at the same time being good partners to OpenAI, giving them all the flexibility they need.
- DPDwarkesh Patel
So, so for example, Salesforce wants to integrate OpenAI. It's not through an API. They actually work together, train a model together, deploy it on, let's say, Amazon now. Is that, is that allowed? Or, uh, or do they have to use your ...
- SNSatya Nadella
No, for any custom agreement like that, they will have to come run it. There are some few exceptions for the US government and so on that we made. But other than that, they'll have to come to Azure.
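The stateless/stateful distinction being negotiated here maps to a familiar API pattern. A minimal sketch follows; the endpoints and payload shapes are hypothetical illustrations, not OpenAI's actual contract.

```python
# Sketch of stateless vs. stateful chat APIs. Endpoints and payloads
# are hypothetical, not any real provider's interface.

import requests

# Stateless: the server keeps nothing between calls, so the client
# resends the entire conversation on every request. This is the shape
# of the PaaS/API business discussed above.
history = [
    {"role": "user", "content": "Summarize our Q3 plan."},
    {"role": "assistant", "content": "Focus on the three launches..."},
    {"role": "user", "content": "Now draft the announcement."},
]
requests.post("https://api.example.com/v1/chat",
              json={"model": "some-model", "messages": history})

# Stateful: the server stores the session (memory, threads, files),
# so the client sends only the new turn plus a session id. This is
# the shape of a SaaS product like ChatGPT.
requests.post("https://api.example.com/v1/sessions/abc123/messages",
              json={"content": "Now draft the announcement."})
```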
- DPDwarkesh Patel
So as Satya explained, as AI agents get more capable, you're gonna need more and more observability into what they're doing. You're gonna need to catch them when they're making mistakes. You're gonna need high-level summaries of what they're doing. And you're gonna need a picture of how everything that they're doing fits together. This is exactly what CodeRabbit provides. You just make a normal pull request and CodeRabbit automatically reviews the PR. It generates a summary of changes so you can understand exactly what the PR's author was intending, and it uses the context from your full codebase to provide line-by-line feedback on how things could be improved. This is helpful whether you're reviewing a PR from a coworker or an agent. In either case, CodeRabbit will write up its thoughts and flag any issues so that your teammate or your agent can go fix them. I've noticed that when I'm coding with agents, CodeRabbit catches a lot of mistakes that the models make by default. For example, the models have a bad habit of using old versions of libraries. So in one session, I watched CodeRabbit catch a call to an old model, figure out what the new version was, and then suggest that improvement. Go to coderabbit.ai/dwarkesh to learn more.
- 1:10:30 – 1:16:01
The CAPEX explosion
- DPDwarkesh Patel
Stepping back, a question I have is, you know, when we were, uh, w- walking back and forth to the, uh, the factory, one of the things you were talking about is, you know, that Microsoft, you can think of it as a software business, but now it's really becoming an industrial business. Uh, there's all this capex, there's all this construction. And if you just look over the last, um, two years, your sort of capex has, like, tripled, and maybe you extrapolate that forward, and it just becomes this huge industrial, uh, explosion. Well, other hyperscalers are taking loans, right? Meta's done a $20 billion loan in Louisiana. They've done a corporate loan. It seems clear everyone's free cash flow is going to zero.
- SNSatya Nadella
(laughs)
- DPDwarkesh Patel
Um, which is, which is ... I'm sure Amy is, like, gonna beat you up for-
- SNSatya Nadella
(laughs)
- DPDwarkesh Patel
... for even, if you even try to do that. But, like, uh, what, what, what's happening?
- SNSatya Nadella
I mean, I think, uh, I think the structural change, um, is what you're referencing, which I think is massive, right? Which is ... I describe it as: we are now a capital-intensive business and a knowledge-intensive business. And in fact, we have to use our knowledge to increase the ROIC on the capital spend, right? Because that's kind ... You know, look, uh, the hardware guys have done a great job, uh, of marketing Moore's Law, which I think is unbelievable and it's great. But if you even look at, I think, some of the stats I did in my earnings call: for a given GPT family, right, uh, the software improvements in throughput, in terms of tokens per dollar per watt, that we're able to get, uh, quarter over quarter, year over year, is massive. Right now it's 5X, 10X, maybe 40X in some of these cases, right, just because of, uh, how you can optimize. That's s- sort of knowledge intense-
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
... intensity coming to bring out capital efficiency. So that, at some level, uh, that's what we have to master. What does it mean? Like, so many people ask me, "What is the difference between, uh, you know, a classic, old-time host, uh, and a hyperscaler?" It is software. So, yes, it is capital-intensive, but as long as you have the systems know-how, the software capability to optimize by workload, by fleet. That's why I think when we say fungibility, there's so much software in it, it's just not about the fleet, right? It's kind of the ability to evict a workload, uh, you know, and then schedule another workload. Can I, like, manage that algorithm of scheduling around? Uh, that is the type of stuff that we have to be world-class at. And so yes, I think we'll still remain a software company.
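The "evict a workload, then schedule another workload" capability Satya describes is, at its core, priority-preemptive scheduling. Here is a toy version; the job names, sizes, and priorities are all invented for illustration.

```python
# Toy priority-preemptive scheduler illustrating the "evict a workload,
# schedule another" fungibility idea. All jobs are invented examples.

import heapq
import itertools

CAPACITY = 8                 # GPUs in this toy fleet
seq = itertools.count()      # tie-breaker so the heap never compares dicts
running = []                 # min-heap of (priority, seq, job)
used = 0

def schedule(job):
    """Admit a job, evicting lower-priority preemptible work if needed."""
    global used
    while used + job["gpus"] > CAPACITY and running:
        prio, _, victim = running[0]
        if prio >= job["priority"] or not victim["evictable"]:
            break                            # nothing cheaper to evict
        heapq.heappop(running)
        used -= victim["gpus"]
        print(f"evicted {victim['name']} (requeued as preemptible batch)")
    if used + job["gpus"] <= CAPACITY:
        heapq.heappush(running, (job["priority"], next(seq), job))
        used += job["gpus"]
        print(f"running {job['name']}")
    else:
        print(f"queued {job['name']}")       # must wait for capacity

# A preemptible training run fills the fleet, then paid inference arrives:
schedule({"name": "train-exp-17", "gpus": 8, "priority": 1, "evictable": True})
schedule({"name": "inference-burst", "gpus": 4, "priority": 9, "evictable": False})
```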
- DPDwarkesh Patel
Mm-hmm.
- SNSatya Nadella
Uh, but yes, this is a different business. Um, and we're gonna manage. Look, at the, at the end of the day, uh, the cash flow that Microsoft has allows us to have, um, both these arms firing un- you know, uh, well.
- DPDwarkesh Patel
It seems like in the short term, you have more sort of, um, credence on things taking a while, being more jagged. But maybe in the long term you think, like, the people who talk about AGI and ASI are correct. Like, Sam will be right, but eventually. Um, and I have a broader question about what makes sense for a hyperscaler to do, given that you have to invest massively in this thing, which depreciates over five years. So if you have 2040 timelines to the kind of thing that somebody like Sam anticipates in three years, um, you know, what is a reasonable thing for you to do in that world?
- SNSatya Nadella
There needs to be an allocation, uh, to I'll call it research compute-
- DPDwarkesh Patel
Mm-hmm.
- SNSatya Nadella
... that needs to be done like you did R&D.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
Right? So that's the best way to even account for it, quite frankly, because you think of it as just R&D expense, and you should say, "Hey, what's the research compute, and, uh, how do you wanna scale it?"
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
Um, and let's even say it's an order of magnitude scale, uh, in some period. Pick your thing.
- DPDwarkesh Patel
Yeah.
- SNSatya Nadella
Is it two years? Is it 16 months? What have you, right? So that's sort of one piece, which is kinda table stakes. That's R&D expenses. And the rest is all demand-driven, right? I mean, ultimately you'll have to build ahead of demand, but you better have a demand plan, uh, that doesn't go completely off-kilter.
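The split Satya proposes is easy to state in accounting terms: carve out a research-compute allocation and treat it like R&D expense, then size the rest off a demand plan with straight-line depreciation. A toy model follows; all figures are invented, and only the five-year life and the R&D framing come from the discussion.

```python
# Toy split of AI capex into "research compute" (expensed like R&D)
# and demand-driven fleet capex (depreciated). All figures invented.

total_capex = 50.0        # $B in a given year, placeholder
research_share = 0.15     # research-compute allocation, placeholder
useful_life_years = 5     # the five-year depreciation discussed above

research_rd = total_capex * research_share      # hits opex like R&D
fleet_capex = total_capex - research_rd         # must map to a demand plan
annual_depreciation = fleet_capex / useful_life_years

print(f"R&D-style research compute: ${research_rd:.1f}B expensed")
print(f"fleet depreciation charge:  ${annual_depreciation:.1f}B/year")
```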
- DPDwarkesh Patel
Do you buy ... So, uh, these labs are now projecting revenues of $100 billion in '27, '28, uh, and they're projecting, you know, revenue keeps growing at this rate of, like, 3X, 2X a year.
- SNSatya Nadella
See, it's like, in the marketplace, right, there's all kinds of incentives right now, and, and rightfully so, right? I mean, what do you expect an independent lab that is sort of trying to raise money to do, right? They have to put some numbers out there such that they can actually go raise money so that they can pay their bills for compute-