
Steeve Morin: Why Google Will Win the AI Arms Race & OpenAI Will Not | E1262
Steeve Morin (guest), Harry Stebbings (host)
In this episode of The Twenty Minute VC, Harry Stebbings sits down with Steeve Morin (ZML) to discuss why Google will win the AI arms race and OpenAI will not.
Google’s Compute Advantage: Why It Wins AI, NVIDIA Loses Margin
Steeve Morin (ZML) argues the real power in AI will accrue to players that own all three pillars: product distribution, proprietary data, and their own compute—positioning Google as the long‑term ‘sleeping giant’ over OpenAI and even Microsoft.
He explains how today’s NVIDIA‑centric world is an artifact of CUDA/PyTorch lock‑in and supply, not technical superiority, and why the economics of H100s are a fragile bubble once alternative chips (TPU, Trainium, AMD and new ASICs) become easy to adopt.
Morin predicts a massive shift from training to inference (95% of spend in five years), driven by agents and deep reasoning workloads that are latency‑bound and favor new architectures, memory technologies, and inference‑specialized chips over general‑purpose GPUs.
He believes the key strategic leverage will move to software that makes hardware and model choice interchangeable, collapses switching costs, and forces true price/spec competition in the AI compute stack.
Key Takeaways
Owning compute is the real strategic moat in AI.
If you rent NVIDIA GPUs through clouds, most of every dollar you bill is paying NVIDIA and the hyperscaler’s margin. ...
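A quick back-of-the-envelope sketch makes the margin stacking concrete; every percentage below is an illustrative assumption, not a figure quoted in the episode:

```python
# Back-of-the-envelope margin stacking for rented GPU compute.
# Every number here is an illustrative assumption, not a figure
# quoted in the episode.

revenue = 1.00        # each dollar you bill your customer
cloud_margin = 0.30   # assumed hyperscaler markup on the rental
chip_margin = 0.75    # assumed NVIDIA gross margin on the hardware

cloud_cut = revenue * cloud_margin         # goes to the hyperscaler
hardware_spend = revenue - cloud_cut       # what buys actual GPU time
nvidia_cut = hardware_spend * chip_margin  # NVIDIA margin inside that

real_compute = revenue - cloud_cut - nvidia_cut
print(f"Of $1.00 billed: hyperscaler ${cloud_cut:.2f}, "
      f"NVIDIA margin ${nvidia_cut:.2f}, real compute ${real_compute:.2f}")
```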
Inference will dwarf training, and its needs are fundamentally different.
Morin predicts AI infra spend will be roughly 95% inference, 5% training in five years. ...
Latency‑bound agents and reasoning will break today’s GPU assumptions.
For agents and deep reasoning, users care about end‑to‑end latency per request, not aggregate throughput. ...
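A toy calculation shows why dependent agent calls cannot be batched away; the step count and per-call latency here are hypothetical:

```python
# Why sequential agents are latency-bound: dependent calls cannot be
# batched away. The step count and per-call latency are hypothetical.

sequential_steps = 10      # an agent chaining 10 dependent model calls
latency_per_call_s = 2.0   # wall-clock seconds per call

# Doubling cluster throughput serves more concurrent users, but each
# dependent call still takes the same wall-clock time, so one user's
# wait is unchanged:
end_to_end_s = sequential_steps * latency_per_call_s
print(f"User-visible latency: {end_to_end_s:.0f}s, "
      f"regardless of aggregate tokens/sec")
```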
CUDA/PyTorch lock‑in is a social/stack problem, not a permanent moat.
Most of the industry is on NVIDIA because PyTorch was built around CUDA and the ecosystem snowballed. ...
Overbuying and mis‑provisioning compute is creating a looming oversupply.
Because on‑demand GPU pricing is punitive and scaling is hard, companies over‑reserve GPUs and then underutilize them, using GPUs as collateral for multi‑year financing. ...
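The effective price of an underutilized reservation follows directly: divide the committed rate by utilization. Both numbers below are hypothetical, for illustration only:

```python
# Effective cost of an over-reserved, underutilized GPU fleet.
# Both numbers are hypothetical, for illustration only.

committed_rate = 2.00   # $/GPU-hour on a multi-year reservation
utilization = 0.30      # fraction of reserved hours actually used

effective_rate = committed_rate / utilization
print(f"At {utilization:.0%} utilization, each useful GPU-hour "
      f"really costs ${effective_rate:.2f}")
```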
Efficiency and new architectures may matter more than brute‑force scaling.
DeepSeek showed that careful engineering can deliver severalfold gains without more GPUs, and researchers like Yann LeCun are pushing non-transformer, world-model approaches that could radically reduce compute requirements for the same or better capability.
For startups, reselling compute is a trap—own the product, not the GPU bill.
If your business model is buying API tokens or GPU time and marking them up, 90–98% of your revenue flows to someone else’s margins. ...
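The gross-margin arithmetic, using the 90–98% pass-through range from the takeaway above:

```python
# Gross margin of a pure compute/token reseller, using the 90-98%
# pass-through range stated in the takeaway above.

for pass_through in (0.90, 0.98):
    gross_margin = 1.0 - pass_through
    print(f"{pass_through:.0%} of revenue goes upstream -> "
          f"{gross_margin:.0%} gross margin")
```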
Notable Quotes
“If you don't own your compute, you're starting with something at your ankle.”
— Steeve Morin
“In five years, I would say 95% inference, 5% training.”
— Steeve Morin
“You have the products, the data, and the compute. Who has all three? Google.”
— Steeve Morin
“The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about.”
— Steeve Morin
“Constraint is the mother of innovation. They had no choice, so they delivered efficiency.”
— Steeve Morin (on DeepSeek and China)
Questions Answered in This Episode
If software truly drives hardware fungibility, how quickly could NVIDIA’s pricing power collapse once switching costs go to zero?
What concrete actions can Google take to fully exploit its product–data–compute triangle and ‘wake up’ as the AI leader Morin describes?
How should AI product teams architect today to be ready for a world of latency‑bound agents and non‑transformer models tomorrow?
What indicators should investors or operators watch to spot the impending oversupply and fire‑sale phase in GPU capacity?
Where is the ethical line between legitimate distillation/efficiency work (like DeepSeek) and exploitative use of proprietary models and training data?
Transcript Preview
The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about. And they were very successful. Like, who gives a shit about CUDA? OpenAI is amazing, but it's not their compute. Ultimately, if you don't own your compute, you're starting with, you know, something at your ankle. In five years, I would say 95% inference, 5% training. You have the products, the data, and the compute. Who has all three? Google has, like, you know, Android, Google Docs, whatever. They have everything. They can sprinkle everywhere. This is the sleeping giant in my mind.
Ready to go? Steeve, dude, I am so grateful to you for joining me today. I've wanted to make this one happen for a while, but when we were discussing who'd be best for this topic, I was like, "We've got to have Steeve on." So thank you for joining me today, man.
Man, well, well thank you. I feel, uh, humbled. I appreciate it. Thank you.
Dude, I want to start. Can you just give us a quick overview of ZML, and specifically your role in the infrastructure strategy today and where you sit?
At the very bottom of things, um, ZML is an ML framework that runs any models on any hardware. Uh, and it does so without compromise. So we sit ultimately at, at the, um, at the infrastructure layer. Uh, we enable anybody to run their model better, faster, more reliably, uh, but on any compute whatsoever. It doesn't really matter, it could be NVIDIA, it could be AMD, it could be TPU, and whatnot, uh, and we do all that without compromise. That's the key point because if there's a compromise, then it's not really, you know, agnostic, right?
Can I ask you then, if we think about sitting between any model and any provider, there in terms-
Right.
... of AMD, NVIDIA-
Right.
... do you think then we will be existing in a world where people are using multiple models simultaneously and that is-
Yeah.
... concurrently running?
Yes, um, you, you actually can see it. It's been happening for a while. Models now are, are not the right abstractions. At least if you look at closed-source models, they're not really models. They're more like backends, right? Uh, and there are a lot of tricks that you feel like you're talking to one model, but ultimately you're talking to a constellation, an assembly of backends that produces, you know, a response. Uh, probably the number one, you know, I would say obvious thing, would be that if you ask a model to generate an image, then it will, you know, switch to a diffusion model, right, not an LLM. So, and there's many, many more tricks, the turbo models at OpenAI do that. There's a lot of tricks. So definitely, uh, models in the sense of, you know, getting, you know, weights and running them is something that is ultimately going away, you know, in favor of, like, full-blown backends, right? You feel like you're talking to a model, but ultimately you're talking to an API. The thing is that API will be running locally, right? Or locally I mean in your own, you know, uh, cloud, you know, instances and so on.