The Twenty Minute VC

Jonathan Ross, Founder & CEO @ Groq: NVIDIA vs Groq - The Future of Training vs Inference | E1260

Jonathan Ross is the Founder & CEO of Groq, the creator of the world’s first Language Processing Unit (LPU™). Prior to Groq, Jonathan began what became Google’s Tensor Processing Unit (TPU) as a 20% project, designing and implementing the core elements of the first-generation TPU chip. Jonathan next joined Google X’s Rapid Eval Team, the initial stage of the famed “Moonshots Factory”, where he devised and incubated new Bets (Units) for Google’s parent company, Alphabet.

----------------------------------------------

In Today’s Episode We Discuss:

(00:00) Intro
(01:29) Scaling Laws and AI Model Training
(06:48) Synthetic Data and Model Efficiency
(09:00) Inference vs. Training Costs: Why NVIDIA Loses Inference
(15:12) The Future of AI Inference: Efficiency and Cost
(16:35) Chip Supply and Scaling Concerns
(19:40) Energy Efficiency in AI Computation
(25:37) Why Most Dollars Into Datacenters Will Be Lost
(31:41) Meta, Google, and Microsoft’s Data Center Investments
(43:24) Distribution of Value in the AI Economy
(44:17) Stages of Startup Success
(45:46) The AI Investment Bubble
(47:45) The Keynesian Beauty Contest in VC
(51:52) NVIDIA’s Role in the AI Ecosystem
(57:30) China’s AI Strategy and Global Implications
(01:02:25) Europe’s Potential in the AI Revolution
(01:17:13) Future Predictions and AI’s Impact on Society

----------------------------------------------

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on Twitter: https://twitter.com/HarryStebbings
Follow Jonathan Ross on Twitter: https://twitter.com/JonathanRoss321
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

Jonathan Ross (guest) · Harry Stebbings (host)
Feb 16, 2025 · 1h 25m

At a glance

WHAT IT’S REALLY ABOUT

Groq’s Jonathan Ross Redefines AI Inference, Chips, and Global Power Dynamics

Jonathan Ross, founder and CEO of Groq, explains why AI inference, not training, will be the dominant long‑term bottleneck and economic driver, and how Groq’s LPU architecture is built specifically to win that market. He argues that synthetic data, better algorithms, and massive compute will push scaling laws further than most people expect, while energy and power infrastructure become true hard constraints. Groq’s model avoids high‑margin GPU training, instead targeting low‑margin, ultra‑high‑volume inference with a capital‑light, revenue‑sharing deployment model, exemplified by its multibillion‑dollar Saudi deal. Ross also examines NVIDIA’s position, China’s AI trajectory, Europe’s strategic choices, and the societal risks of abundance and over‑delegation to AI, framing Groq’s mission as preserving human agency in the age of AI.

IDEAS WORTH REMEMBERING

5 ideas

Inference will dwarf training as the dominant AI compute market.

Ross notes that at Google, inference consumed 10–20x the compute of training, and argues that as AI becomes embedded in every workflow, the need to serve tokens (inference) will massively outstrip one‑off model training.
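A ratio like this falls out of simple arithmetic once a model is served at scale. The sketch below uses entirely illustrative numbers (the FLOP counts and token volume are assumptions, not figures from the episode) to show how cumulative serving compute overtakes a one-off training run:

```python
# Back-of-envelope comparison of training vs. lifetime inference compute.
# All numbers are illustrative assumptions, not figures from the episode.

TRAIN_FLOPS = 1e24        # one-off cost to train the model
FLOPS_PER_TOKEN = 2e11    # forward-pass cost to serve one token
TOKENS_PER_DAY = 1e11     # tokens served across all users per day

def inference_to_training_ratio(days_in_service: float) -> float:
    """Ratio of cumulative inference compute to the one-off training compute."""
    inference_flops = FLOPS_PER_TOKEN * TOKENS_PER_DAY * days_in_service
    return inference_flops / TRAIN_FLOPS

# After a year in service, inference compute already dwarfs the training run,
# and the ratio keeps growing linearly for as long as the model is deployed.
print(inference_to_training_ratio(365))  # 7.3
```

Training is paid once; serving is paid on every token, so any sufficiently popular model ends up inference-dominated regardless of the exact constants chosen.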

Synthetic data and smarter training pipelines can extend scaling laws.

By iteratively training models to generate higher-quality synthetic data, pruning errors, and retraining on that curated data, the effective scaling curve becomes much steeper than traditional token counts suggest, blurring the idea of a near-term ‘scaling limit.’
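The generate → prune → retrain loop described here can be sketched abstractly. The toy simulation below is an assumption-laden illustration (model “quality” is collapsed to a single number and the dynamics are invented for clarity), not Groq’s or anyone’s actual pipeline:

```python
import random

def train_on(data: list[float]) -> float:
    """Toy 'training': the new model's quality is the mean of its training data."""
    return sum(data) / len(data)

def generate_synthetic(quality: float, n: int) -> list[float]:
    """Toy 'generation': sample quality scatters around the current model's quality."""
    return [quality + random.gauss(0, 0.2) for _ in range(n)]

def curate(samples: list[float], quality: float) -> list[float]:
    """Prune errors: keep only samples better than the model that produced them."""
    return [s for s in samples if s > quality]

random.seed(0)
quality = 0.5                  # quality of the initial model
for _ in range(10):            # iterate: generate, filter, retrain
    samples = generate_synthetic(quality, 1000)
    kept = curate(samples, quality)
    quality = train_on(kept)   # retraining on curated data lifts quality

# Each round trains only on above-average outputs, so quality ratchets upward
# without any new human-written tokens -- the point of the argument above.
print(round(quality, 2))
```

Because the filter discards everything below the current model’s level, every retraining set is strictly better than the model that generated it, which is why the effective scaling curve can outrun raw token counts.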

Groq’s LPU design trades chip count for radically better inference efficiency.

Instead of relying on scarce, power-hungry HBM, Groq keeps model parameters on-chip across many LPUs, avoiding expensive external memory traffic and delivering ~3x better energy efficiency per token and more than 5x lower cost per token than GPUs for inference.
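Taking those rough multipliers at face value, the per-token economics can be made concrete. The GPU baseline figures below are illustrative assumptions; only the ~3x energy and >5x cost factors come from the discussion:

```python
# Per-token economics using the episode's rough multipliers.
# The GPU baseline figures are illustrative assumptions, not quoted numbers.

GPU_JOULES_PER_TOKEN = 3.0   # assumed GPU energy per generated token
GPU_COST_PER_MTOK = 5.00     # assumed GPU cost in $ per million tokens
ENERGY_ADVANTAGE = 3.0       # ~3x better energy efficiency (from the episode)
COST_ADVANTAGE = 5.0         # >5x lower cost per token (from the episode)

lpu_joules_per_token = GPU_JOULES_PER_TOKEN / ENERGY_ADVANTAGE
lpu_cost_per_mtok = GPU_COST_PER_MTOK / COST_ADVANTAGE

# At fleet scale the multipliers compound into real power budgets:
TOKENS_PER_DAY = 1e11        # assumed tokens served per day
gpu_kwh_per_day = GPU_JOULES_PER_TOKEN * TOKENS_PER_DAY / 3.6e6
lpu_kwh_per_day = lpu_joules_per_token * TOKENS_PER_DAY / 3.6e6

print(lpu_joules_per_token, lpu_cost_per_mtok)         # 1.0 J/token, $1.00/Mtok
print(round(gpu_kwh_per_day), round(lpu_kwh_per_day))  # 83333 vs 27778 kWh/day
```

At inference volumes, constant factors like these decide which deployments pencil out, which is the economic core of the on-chip-memory bet.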

NVIDIA will likely remain dominant in training while inference unbundles.

Ross sees NVIDIA’s GPU stack as a solved problem for training and expects them to sell every GPU they can manufacture into that high-margin market, while Groq and others absorb the low‑margin, high‑volume inference workloads.

Groq’s capex-light, revenue-share model removes capital as the main constraint.

Partners like Aramco fund the hardware build‑out; Groq repays with a targeted IRR and then flips more upside to itself, so growth is governed by chip supply and deployment capacity rather than Groq’s own balance sheet.
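The repayment mechanics can be illustrated with a toy cash-flow model: the partner funds the hardware, collects the larger revenue share until a target return is reached, and the split then flips in the operator’s favor. All numbers, the share percentages, and the simplified IRR treatment are illustrative assumptions, not deal terms:

```python
def partner_payout(capex: float, annual_revenue: float,
                   target_irr: float, years: int) -> list[float]:
    """Yearly cash to the partner: 80% of revenue until a target total
    (approximated here as capex compounded at the target IRR over the full
    horizon) has been repaid, then 20% -- the upside 'flips' to the operator."""
    target = capex * (1 + target_irr) ** years  # total the partner must receive
    paid, flows = 0.0, []
    for _ in range(years):
        share = 0.8 if paid < target else 0.2   # flip once the target is met
        cash = annual_revenue * share
        paid += cash
        flows.append(cash)
    return flows

# Partner funds $1B of hardware; deployment earns $600M/yr; 15% target IRR.
flows = partner_payout(1e9, 6e8, 0.15, 8)
print([round(f / 1e6) for f in flows])  # payouts in $M per year
```

The point of the structure is visible in the output: once the partner’s target is hit, incremental revenue accrues mostly to the operator, so growth is bounded by chip supply and deployment capacity rather than the operator’s own balance sheet.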

WORDS WORTH SAVING

5 quotes

Your job is not to follow the wave, your job is to get positioned for the wave.

Jonathan Ross

We did not raise 1.5 billion, that's revenue. That's actually about 30% of the revenue of OpenAI.

Jonathan Ross

You could almost say we're one of the best things that's ever happened to NVIDIA... we'll take the low margin, high volume inference business off their hands.

Jonathan Ross

When you are growing faster than exponential, there is no amount of profit that you can make that matters. What matters is getting a toehold in the market and becoming relevant.

Jonathan Ross

To preserve human agency in the age of AI, we need to be one of the most important compute providers in the world.

Jonathan Ross

TOPICS COVERED

AI scaling laws, synthetic data, and algorithmic efficiency
Inference vs. training economics and future infrastructure bottlenecks
Groq’s LPU architecture vs. NVIDIA GPUs (performance, energy, and cost)
Business model innovation: revenue-share, capex-light deployments, Saudi/Aramco deal
Global AI geopolitics: U.S., China, Europe, and power infrastructure
Startup strategy: positioning for the wave, product–market fit, and talent
Societal implications: human agency, abundance, and AI-driven work transformation

High quality AI-generated summary created from speaker-labeled transcript.
