
Steeve Morin: Why Google Will Win the AI Arms Race & OpenAI Will Not | E1262
Steeve Morin (guest), Harry Stebbings (host)
In this episode of The Twenty Minute VC, Harry Stebbings sits down with Steeve Morin (ZML) to discuss why Google will win the AI arms race and OpenAI will not.
Google’s Compute Advantage: Why It Wins AI, NVIDIA Loses Margin
Steeve Morin (ZML) argues the real power in AI will accrue to players that own all three pillars: product distribution, proprietary data, and their own compute—positioning Google as the long‑term ‘sleeping giant’ over OpenAI and even Microsoft.
He explains how today’s NVIDIA‑centric world is an artifact of CUDA/PyTorch lock‑in and supply, not technical superiority, and why the economics of H100s are a fragile bubble once alternative chips (TPU, Trainium, AMD and new ASICs) become easy to adopt.
Morin predicts a massive shift from training to inference (95% of spend in five years), driven by agents and deep reasoning workloads that are latency‑bound and favor new architectures, memory technologies, and inference‑specialized chips over general‑purpose GPUs.
He believes the key strategic leverage will move to software that makes hardware and model choice interchangeable, collapses switching costs, and forces true price/spec competition in the AI compute stack.
Key Takeaways
Owning compute is the real strategic moat in AI.
If you rent NVIDIA GPUs through clouds, most of every dollar you bill is paying NVIDIA and the hyperscaler’s margin. ...
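A quick back-of-the-envelope sketch makes the margin stacking concrete; every percentage below is an illustrative assumption, not a figure quoted in the episode:

```python
# Back-of-the-envelope margin stacking for rented GPU compute.
# Every number here is an illustrative assumption, not a figure
# quoted in the episode.

revenue = 1.00        # each dollar you bill your customer
cloud_margin = 0.30   # assumed hyperscaler markup on the rental
chip_margin = 0.75    # assumed NVIDIA gross margin on the hardware

cloud_cut = revenue * cloud_margin         # goes to the hyperscaler
hardware_spend = revenue - cloud_cut       # what buys actual GPU time
nvidia_cut = hardware_spend * chip_margin  # NVIDIA margin inside that

real_compute = revenue - cloud_cut - nvidia_cut
print(f"Of $1.00 billed: hyperscaler ${cloud_cut:.2f}, "
      f"NVIDIA margin ${nvidia_cut:.2f}, real compute ${real_compute:.2f}")
```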
Inference will dwarf training, and its needs are fundamentally different.
Morin predicts AI infra spend will be roughly 95% inference, 5% training in five years. ...
Latency‑bound agents and reasoning will break today’s GPU assumptions.
For agents and deep reasoning, users care about end‑to‑end latency per request, not aggregate throughput. ...
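A toy calculation shows why dependent agent calls cannot be batched away; the step count and per-call latency here are hypothetical:

```python
# Why sequential agents are latency-bound: dependent calls cannot be
# batched away. The step count and per-call latency are hypothetical.

sequential_steps = 10      # an agent chaining 10 dependent model calls
latency_per_call_s = 2.0   # wall-clock seconds per call

# Doubling cluster throughput serves more concurrent users, but each
# dependent call still takes the same wall-clock time, so one user's
# wait is unchanged:
end_to_end_s = sequential_steps * latency_per_call_s
print(f"User-visible latency: {end_to_end_s:.0f}s, "
      f"regardless of aggregate tokens/sec")
```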
CUDA/PyTorch lock‑in is a social/stack problem, not a permanent moat.
Most of the industry is on NVIDIA because PyTorch was built around CUDA and the ecosystem snowballed. ...
Overbuying and mis‑provisioning compute is creating a looming oversupply.
Because on‑demand GPU pricing is punitive and scaling is hard, companies over‑reserve GPUs and then underutilize them, using GPUs as collateral for multi‑year financing. ...
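The effective price of an underutilized reservation follows directly: divide the committed rate by utilization. Both numbers below are hypothetical, for illustration only:

```python
# Effective cost of an over-reserved, underutilized GPU fleet.
# Both numbers are hypothetical, for illustration only.

committed_rate = 2.00   # $/GPU-hour on a multi-year reservation
utilization = 0.30      # fraction of reserved hours actually used

effective_rate = committed_rate / utilization
print(f"At {utilization:.0%} utilization, each useful GPU-hour "
      f"really costs ${effective_rate:.2f}")
```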
Efficiency and new architectures may matter more than brute‑force scaling.
DeepSeek showed that careful engineering can deliver severalfold gains without more GPUs, and researchers like Yann LeCun are pushing non-transformer, world-model approaches that could radically reduce compute requirements for the same or better capability.
For startups, reselling compute is a trap—own the product, not the GPU bill.
If your business model is buying API tokens or GPU time and marking them up, 90–98% of your revenue flows to someone else’s margins. ...
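The gross-margin arithmetic, using the 90–98% pass-through range from the takeaway above:

```python
# Gross margin of a pure compute/token reseller, using the 90-98%
# pass-through range stated in the takeaway above.

for pass_through in (0.90, 0.98):
    gross_margin = 1.0 - pass_through
    print(f"{pass_through:.0%} of revenue goes upstream -> "
          f"{gross_margin:.0%} gross margin")
```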
Notable Quotes
“If you don't own your compute, you're starting with something at your ankle.”
— Steeve Morin
“In five years, I would say 95% inference, 5% training.”
— Steeve Morin
“You have the products, the data, and the compute. Who has all three? Google.”
— Steeve Morin
“The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about.”
— Steeve Morin
“Constraint is the mother of innovation. They had no choice, so they delivered efficiency.”
— Steeve Morin (on DeepSeek and China)
Questions Answered in This Episode
If software truly drives hardware fungibility, how quickly could NVIDIA’s pricing power collapse once switching costs go to zero?
What concrete actions can Google take to fully exploit its product–data–compute triangle and ‘wake up’ as the AI leader Morin describes?
How should AI product teams architect today to be ready for a world of latency‑bound agents and non‑transformer models tomorrow?
What indicators should investors or operators watch to spot the impending oversupply and fire‑sale phase in GPU capacity?
Where is the ethical line between legitimate distillation/efficiency work (like DeepSeek) and exploitative use of proprietary models and training data?
Transcript Preview
The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about. And they were very successful. Like, who gives a shit about CUDA? OpenAI is amazing, but it's not their compute. Ultimately, if you don't own your compute, you're starting with, you know, something at your ankle. In five years, I would say 95% inference, 5% training. You have the products, the data, and the compute. Who has all three? Google has, like, you know, Android, Google Docs, whatever. They have everything. They can sprinkle everywhere. This is the sleeping giant in my mind.
Ready to go? Steeve, dude, I am so grateful to you for joining me today. I've wanted to make this one happen for a while, but when we were discussing who'd be best for this topic, I was like, "We've got to have Steeve on." So thank you for joining me today, man.
Man, well, well thank you. I feel, uh, humbled. I appreciate it. Thank you.
Dude, I want to start. Can you just give us a quick overview of ZML, and specifically your role in the infrastructure strategy today and where you sit?
At the very bottom of things, um, ZML is an ML framework that runs any models on any hardware. Uh, and it does so without compromise. So we sit ultimately at, at the, um, at the infrastructure layer. Uh, we enable anybody to run their model better, faster, more reliably, uh, but on any compute whatsoever. It doesn't really matter, it could be NVIDIA, it could be AMD, it could be TPU, and whatnot, uh, and we do all that without compromise. That's the key point because if there's a compromise, then it's not really, you know, agnostic, right?
Can I ask you then, if we think about sitting between any model and any provider, there in terms-
Right.
... of AMD, NVIDIA-
Right.
... do you think then we will be existing in a world where people are using multiple models simultaneously and that is-
Yeah.
... concurrently running?
Yes, um, you, you actually can see it. It's been happening for a while. Models now are, are not the right abstractions. At least if you look at closed-source models, they're not really models. They're more like backends, right? Uh, and there are a lot of tricks that you feel like you're talking to one model, but ultimately you're talking to a constellation, an assembly of backends that produces, you know, a response. Uh, probably the number one, you know, I would say obvious thing, would be that if you ask a model to generate an image, then it will, you know, switch to a diffusion model, right, not an LLM. So, and there's many, many more tricks, the turbo models at OpenAI do that. There's a lot of tricks. So definitely, uh, models in the sense of, you know, getting, you know, weights and running them is something that is ultimately going away, you know, in favor of, like, full-blown backends, right? You feel like you're talking to a model, but ultimately you're talking to an API. The thing is that API will be running locally, right? Or locally I mean in your own, you know, uh, cloud, you know, instances and so on.