The Twenty Minute VC | Steeve Morin: Why Google Will Win the AI Arms Race & OpenAI Will Not | E1262
At a glance
WHAT IT’S REALLY ABOUT
Google’s Compute Advantage: Why It Wins AI, NVIDIA Loses Margin
- Steeve Morin (ZML) argues the real power in AI will accrue to players that own all three pillars: product distribution, proprietary data, and their own compute—positioning Google as the long‑term ‘sleeping giant’ over OpenAI and even Microsoft.
- He explains how today’s NVIDIA‑centric world is an artifact of CUDA/PyTorch lock‑in and supply, not technical superiority, and why the economics of H100s are a fragile bubble once alternative chips (TPU, Trainium, AMD and new ASICs) become easy to adopt.
- Morin predicts a massive shift from training to inference (95% of spend in five years), driven by agents and deep reasoning workloads that are latency‑bound and favor new architectures, memory technologies, and inference‑specialized chips over general‑purpose GPUs.
- He believes the key strategic leverage will move to software that makes hardware and model choice interchangeable, collapses switching costs, and forces true price/spec competition in the AI compute stack.
IDEAS WORTH REMEMBERING
5 ideas

Owning compute is the real strategic moat in AI.
If you rent NVIDIA GPUs through a cloud, much of every dollar you bill goes to NVIDIA's and the hyperscaler's margins. Players like Google (TPUs) and hyperscalers with their own silicon keep that margin themselves and can undercut NVIDIA‑based economics.
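The margin‑stacking argument can be made concrete with a small back‑of‑the‑envelope calculation. All percentages below are illustrative assumptions, not figures from the episode:

```python
# Hypothetical margin stack: what fraction of a $1 cloud-GPU bill
# pays for actual silicon vs. margins captured up the stack.
# Both rates below are assumed for illustration only.

nvidia_gross_margin = 0.75   # assumed chip-vendor gross margin
cloud_markup = 0.30          # assumed hyperscaler markup over its hardware cost

def silicon_share(bill=1.0):
    # The cloud's underlying hardware cost is the bill net of its markup.
    hw_cost = bill / (1 + cloud_markup)
    # Of that, the vendor keeps its gross margin; the rest is silicon cost.
    return hw_cost * (1 - nvidia_gross_margin)

print(round(silicon_share(), 3))  # ~0.192: roughly 19 cents per dollar
```

Under these assumed numbers, less than a fifth of each rented‑GPU dollar reaches the underlying silicon cost, which is the slice a vertically integrated player like Google can reclaim.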
Inference will dwarf training, and its needs are fundamentally different.
Morin predicts AI infra spend will be roughly 95% inference, 5% training in five years. Inference is production: reliability, autoscaling, and cost efficiency dominate, and interconnect matters far less than in massive distributed training runs.
Latency‑bound agents and reasoning will break today’s GPU assumptions.
For agents and deep reasoning, users care about end‑to‑end latency per request, not aggregate throughput. That shifts the advantage toward architectures with high single‑stream tokens‑per‑second (SRAM‑heavy chips, compute‑in‑memory, new ASICs) and away from GPU farms optimized for batch throughput.
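The batch‑throughput vs. single‑stream distinction is easy to quantify. All throughput and batch numbers here are hypothetical assumptions chosen to illustrate the trade‑off, not measured figures:

```python
# Why latency-bound agent workloads favor high single-stream speed.
# All tokens-per-second and batch-size figures are assumptions.

def request_latency_s(tokens, single_stream_tps):
    """End-to-end decode time for one request: what an agent waits on."""
    return tokens / single_stream_tps

# A batch-optimized GPU: huge aggregate throughput, modest per-stream speed.
gpu = {"single_stream_tps": 60, "batch": 128}
# An SRAM-heavy inference ASIC: smaller batches, much faster per stream.
asic = {"single_stream_tps": 500, "batch": 8}

tokens = 2_000  # one deep-reasoning / agent step

print(request_latency_s(tokens, gpu["single_stream_tps"]))   # ~33.3 s per step
print(request_latency_s(tokens, asic["single_stream_tps"]))  # 4.0 s per step

# Aggregate throughput still favors the GPU (7680 vs 4000 tok/s total),
# but the agent's wall-clock time per step is over 8x worse on it.
print(gpu["single_stream_tps"] * gpu["batch"],
      asic["single_stream_tps"] * asic["batch"])
```

A throughput benchmark would declare the GPU the winner; a user waiting on a chain of agent steps experiences the opposite.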
CUDA/PyTorch lock‑in is a social/stack problem, not a permanent moat.
Most of the industry is on NVIDIA because PyTorch was built around CUDA and the ecosystem snowballed. If software like ZML makes switching hardware frictionless (zero buy‑in), even modest efficiency gains on AMD/TPUs/ASICs become compelling.
Overbuying and mis‑provisioning compute is creating a looming oversupply.
Because on‑demand GPU pricing is punitive and scaling is hard, companies over‑reserve GPUs and then underutilize them, using GPUs as collateral for multi‑year financing. Morin expects fire‑sale data centers and cold emails for cheap GPU capacity as this overhang hits the market.
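The over‑reservation dynamic follows from a simple break‑even calculation. The hourly rates below are assumptions for illustration, not quoted cloud prices:

```python
# Why teams over-reserve GPUs: break-even utilization of committed
# capacity vs. punitive on-demand pricing. Rates are illustrative.

on_demand = 4.00   # $/GPU-hour, assumed on-demand rate
reserved = 1.60    # $/GPU-hour, assumed multi-year committed rate

# Reserved capacity is paid for 24/7 whether used or not, so its
# effective cost per *useful* hour scales with 1/utilization.
def effective_reserved_rate(utilization):
    return reserved / utilization

# Break-even utilization: above this, reserving beats on-demand.
break_even = reserved / on_demand
print(break_even)                     # 0.4: only 40% utilization needed
print(effective_reserved_rate(0.15))  # at 15% use, ~$10.67 per useful hour
```

At these assumed rates, reserving pays off at just 40% utilization, which rationalizes locking in far more capacity than is needed; but a fleet that then sits at 15% utilization costs more per useful hour than on‑demand ever would, which is the overhang Morin expects to hit the market.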
WORDS WORTH SAVING
5 quotes

If you don't own your compute, you're starting with something at your ankle.
— Steeve Morin
In five years, I would say 95% inference, 5% training.
— Steeve Morin
You have the products, the data, and the compute. Who has all three? Google.
— Steeve Morin
The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about.
— Steeve Morin
Constraint is the mother of innovation. They had no choice, so they delivered efficiency.
— Steeve Morin (on DeepSeek and China)
High quality AI-generated summary created from speaker-labeled transcript.