The Twenty Minute VC | Steeve Morin: Why Google Will Win the AI Arms Race & OpenAI Will Not | E1262
At a glance
WHAT IT’S REALLY ABOUT
Google’s Compute Advantage: Why It Wins AI, NVIDIA Loses Margin
- Steeve Morin (ZML) argues the real power in AI will accrue to players that own all three pillars: product distribution, proprietary data, and their own compute—positioning Google as the long‑term ‘sleeping giant’ over OpenAI and even Microsoft.
- He explains how today’s NVIDIA‑centric world is an artifact of CUDA/PyTorch lock‑in and supply, not technical superiority, and why the economics of H100s are a fragile bubble once alternative chips (TPU, Trainium, AMD and new ASICs) become easy to adopt.
- Morin predicts a massive shift from training to inference (95% of spend in five years), driven by agents and deep reasoning workloads that are latency‑bound and favor new architectures, memory technologies, and inference‑specialized chips over general‑purpose GPUs.
- He believes the key strategic leverage will move to software that makes hardware and model choice interchangeable, collapses switching costs, and forces true price/spec competition in the AI compute stack.
IDEAS WORTH REMEMBERING
5 ideas

Owning compute is the real strategic moat in AI.
If you rent NVIDIA GPUs through a cloud, much of every dollar you bill goes to NVIDIA's and the hyperscaler's margins. Players like Google (TPUs) and hyperscalers with their own silicon keep that margin themselves and can undercut NVIDIA‑based economics.
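The margin‑stacking argument can be made concrete with a small back‑of‑the‑envelope calculation. All percentages below are illustrative assumptions, not figures from the episode:

```python
# Hypothetical margin stack: what fraction of a $1 cloud-GPU bill
# pays for actual silicon vs. margins captured up the stack.
# Both rates below are assumed for illustration only.

nvidia_gross_margin = 0.75   # assumed chip-vendor gross margin
cloud_markup = 0.30          # assumed hyperscaler markup over its hardware cost

def silicon_share(bill=1.0):
    # The cloud's underlying hardware cost is the bill net of its markup.
    hw_cost = bill / (1 + cloud_markup)
    # Of that, the vendor keeps its gross margin; the rest is silicon cost.
    return hw_cost * (1 - nvidia_gross_margin)

print(round(silicon_share(), 3))  # ~0.192: roughly 19 cents per dollar
```

Under these assumed numbers, less than a fifth of each rented‑GPU dollar reaches the underlying silicon cost, which is the slice a vertically integrated player like Google can reclaim.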
Inference will dwarf training, and its needs are fundamentally different.
Morin predicts AI infra spend will be roughly 95% inference, 5% training in five years. Inference is production: reliability, autoscaling, and cost efficiency dominate, and interconnect matters far less than in massive distributed training runs.
Latency‑bound agents and reasoning will break today’s GPU assumptions.
For agents and deep reasoning, users care about end‑to‑end latency per request, not aggregate throughput. That shifts the advantage toward architectures with high single‑stream tokens‑per‑second (SRAM‑heavy chips, compute‑in‑memory, new ASICs) and away from GPU farms optimized for batch throughput.
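The batch‑throughput vs. single‑stream distinction is easy to quantify. All throughput and batch numbers here are hypothetical assumptions chosen to illustrate the trade‑off, not measured figures:

```python
# Why latency-bound agent workloads favor high single-stream speed.
# All tokens-per-second and batch-size figures are assumptions.

def request_latency_s(tokens, single_stream_tps):
    """End-to-end decode time for one request: what an agent waits on."""
    return tokens / single_stream_tps

# A batch-optimized GPU: huge aggregate throughput, modest per-stream speed.
gpu = {"single_stream_tps": 60, "batch": 128}
# An SRAM-heavy inference ASIC: smaller batches, much faster per stream.
asic = {"single_stream_tps": 500, "batch": 8}

tokens = 2_000  # one deep-reasoning / agent step

print(request_latency_s(tokens, gpu["single_stream_tps"]))   # ~33.3 s per step
print(request_latency_s(tokens, asic["single_stream_tps"]))  # 4.0 s per step

# Aggregate throughput still favors the GPU (7680 vs 4000 tok/s total),
# but the agent's wall-clock time per step is over 8x worse on it.
print(gpu["single_stream_tps"] * gpu["batch"],
      asic["single_stream_tps"] * asic["batch"])
```

A throughput benchmark would declare the GPU the winner; a user waiting on a chain of agent steps experiences the opposite.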
CUDA/PyTorch lock‑in is a social/stack problem, not a permanent moat.
Most of the industry is on NVIDIA because PyTorch was built around CUDA and the ecosystem snowballed. If software like ZML makes switching hardware frictionless (zero buy‑in), even modest efficiency gains on AMD/TPUs/ASICs become compelling.
Overbuying and mis‑provisioning compute is creating a looming oversupply.
Because on‑demand GPU pricing is punitive and scaling is hard, companies over‑reserve GPUs and then underutilize them, using GPUs as collateral for multi‑year financing. Morin expects fire‑sale data centers and cold emails for cheap GPU capacity as this overhang hits the market.
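The over‑reservation dynamic follows from a simple break‑even calculation. The hourly rates below are assumptions for illustration, not quoted cloud prices:

```python
# Why teams over-reserve GPUs: break-even utilization of committed
# capacity vs. punitive on-demand pricing. Rates are illustrative.

on_demand = 4.00   # $/GPU-hour, assumed on-demand rate
reserved = 1.60    # $/GPU-hour, assumed multi-year committed rate

# Reserved capacity is paid for 24/7 whether used or not, so its
# effective cost per *useful* hour scales with 1/utilization.
def effective_reserved_rate(utilization):
    return reserved / utilization

# Break-even utilization: above this, reserving beats on-demand.
break_even = reserved / on_demand
print(break_even)                     # 0.4: only 40% utilization needed
print(effective_reserved_rate(0.15))  # at 15% use, ~$10.67 per useful hour
```

At these assumed rates, reserving pays off at just 40% utilization, which rationalizes locking in far more capacity than is needed; but a fleet that then sits at 15% utilization costs more per useful hour than on‑demand ever would, which is the overhang Morin expects to hit the market.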
WORDS WORTH SAVING
5 quotes

If you don't own your compute, you're starting with something at your ankle.
— Steeve Morin
In five years, I would say 95% inference, 5% training.
— Steeve Morin
You have the products, the data, and the compute. Who has all three? Google.
— Steeve Morin
The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about.
— Steeve Morin
Constraint is the mother of innovation. They had no choice, so they delivered efficiency.
— Steeve Morin (on DeepSeek and China)
High quality AI-generated summary created from speaker-labeled transcript.