
Jonathan Ross, Founder & CEO @ Groq: NVIDIA vs Groq - The Future of Training vs Inference | E1260
Jonathan Ross (guest), Harry Stebbings (host)
In this episode of The Twenty Minute VC, host Harry Stebbings sits down with Jonathan Ross, founder and CEO of Groq, to discuss NVIDIA vs Groq and the future of training versus inference.
Groq’s Jonathan Ross Redefines AI Inference, Chips, and Global Power Dynamics
Jonathan Ross, founder and CEO of Groq, explains why AI inference, not training, will be the dominant long‑term bottleneck and economic driver, and how Groq’s LPU architecture is built specifically to win that market. He argues that synthetic data, better algorithms, and massive compute will push scaling laws further than most people expect, while energy and power infrastructure become true hard constraints. Groq’s model avoids high‑margin GPU training, instead targeting low‑margin, ultra‑high‑volume inference with a capital‑light, revenue‑sharing deployment model, exemplified by its multibillion‑dollar Saudi deal. Ross also examines NVIDIA’s position, China’s AI trajectory, Europe’s strategic choices, and the societal risks of abundance and over‑delegation to AI, framing Groq’s mission as preserving human agency in the age of AI.
Key Takeaways
Inference will dwarf training as the dominant AI compute market.
Ross notes that at Google, inference consumed 10–20x the compute of training, and argues that as AI becomes embedded in every workflow, the need to serve tokens (inference) will massively outstrip one‑off model training.
Synthetic data and smarter training pipelines can extend scaling laws.
By iteratively training models to generate higher-quality synthetic data, pruning errors, and retraining on that curated data, the effective scaling curve becomes much steeper than traditional token counts suggest, blurring the idea of a near-term ‘scaling limit.’
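The generate-filter-retrain loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Groq's or any lab's actual pipeline: `model_generate` and `score` are placeholder stand-ins for a real model and a real quality filter.

```python
import random

# Hypothetical sketch of the iterative synthetic-data pipeline: generate
# candidates with the current model, prune low-quality samples, and keep
# the curated set for the next round of training.
def curate(samples, score, threshold=0.8):
    """Keep only samples whose quality score clears the threshold."""
    return [s for s in samples if score(s) >= threshold]

def synthetic_round(model_generate, score, n=1000, threshold=0.8):
    """One generate-then-filter round; returns the curated training set."""
    candidates = [model_generate() for _ in range(n)]
    return curate(candidates, score, threshold)

# Toy usage: "samples" are random floats and the quality score is the
# value itself, so curation simply drops everything below the threshold.
random.seed(0)
curated = synthetic_round(lambda: random.random(), lambda s: s, n=1000)
```

The point of the sketch is the shape of the loop: each round discards errors before they re-enter training, which is why the effective scaling curve can be steeper than raw token counts suggest.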
Groq’s LPU design trades chip count for radically better inference efficiency.
Instead of relying on scarce, power-hungry HBM, Groq keeps model parameters on-chip across many LPUs, avoiding expensive external memory traffic and delivering ~3x better energy efficiency per token and more than 5x lower cost per token than GPUs for inference.
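As a back-of-envelope check, the quoted ratios can be applied to an invented GPU baseline. All absolute figures below (dollar cost, joules per token, daily token volume) are illustrative assumptions, not measured numbers; only the ~3x and >5x ratios come from the episode.

```python
# Apply the episode's claimed ratios (~3x energy efficiency per token,
# >5x lower cost per token) to a made-up GPU baseline.
gpu_cost_per_m_tokens = 1.00       # hypothetical $ per 1M tokens on GPUs
gpu_joules_per_token = 3.0         # hypothetical energy per token on GPUs
tokens_per_day = 1e12              # hypothetical fleet-wide daily volume

lpu_cost_per_m_tokens = gpu_cost_per_m_tokens / 5   # >5x lower cost claim
lpu_joules_per_token = gpu_joules_per_token / 3     # ~3x efficiency claim

# Daily energy in kWh (1 kWh = 3.6e6 joules) to make the gap concrete.
gpu_kwh_per_day = tokens_per_day * gpu_joules_per_token / 3.6e6
lpu_kwh_per_day = tokens_per_day * lpu_joules_per_token / 3.6e6
```

At high volume, the energy line item scales linearly with tokens served, which is why per-token efficiency, not peak throughput, dominates inference economics.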
NVIDIA will likely remain dominant in training while inference unbundles.
Ross sees NVIDIA’s GPU stack as a solved problem for training and expects them to sell every GPU they can manufacture into that high-margin market, while Groq and others absorb the low‑margin, high‑volume inference workloads.
Groq’s capex-light, revenue-share model removes capital as the main constraint.
Partners like Aramco fund the hardware build‑out; Groq repays with a targeted IRR and then flips more upside to itself, so growth is governed by chip supply and deployment capacity rather than Groq’s own balance sheet.
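A toy sketch of the waterfall mechanics described above: the partner funds the hardware, receives revenue first until a target return is met, after which the split flips in Groq's favour. The hurdle formula, split ratio, and all numbers are invented for illustration; the episode gives no deal terms.

```python
def waterfall(capex, target_return, revenues, late_groq_share=0.8):
    """Return (partner_total, groq_total) over a list of period revenues.

    Hypothetical revenue-share waterfall: the partner is repaid first up
    to capex plus a target return, then most remaining upside flips to
    Groq. `late_groq_share` is an assumed, illustrative split.
    """
    hurdle = capex * (1 + target_return)   # partner's repayment target
    partner, groq = 0.0, 0.0
    for r in revenues:
        if partner < hurdle:
            pay = min(r, hurdle - partner)  # repay the partner first
            partner += pay
            r -= pay
        # past the hurdle, the bulk of remaining revenue goes to Groq
        groq += r * late_groq_share
        partner += r * (1 - late_groq_share)
    return partner, groq

# Toy usage: $100 of capex, 20% target return, three $60 revenue periods.
partner_total, groq_total = waterfall(100.0, 0.2, [60.0, 60.0, 60.0])
```

The structural point is that the partner's capital, not Groq's balance sheet, sits in front of the hardware, so Groq's growth constraint moves to chip supply and deployment capacity.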
Global power and data center infrastructure will become a hard bottleneck.
Ross anticipates a near-term illusion of power and data center oversupply due to fake or under‑specced projects, followed by a genuine power crunch as chip counts continue doubling and AI inference demand explodes, especially outside hyperscaler campuses.
Winning in AI requires solving unsolved customer problems, not specsmanship.
Ross dismisses ‘specsmanship’ (e.g. ...
Notable Quotes
“Your job is not to follow the wave, your job is to get positioned for the wave.”
— Jonathan Ross
“We did not raise 1.5 billion, that's revenue. That's actually about 30% of the revenue of OpenAI.”
— Jonathan Ross
“You could almost say we're one of the best things that's ever happened to NVIDIA... we'll take the low margin, high volume inference business off their hands.”
— Jonathan Ross
“When you are growing faster than exponential, there is no amount of profit that you can make that matters. What matters is getting a toehold in the market and becoming relevant.”
— Jonathan Ross
“To preserve human agency in the age of AI, we need to be one of the most important compute providers in the world.”
— Jonathan Ross
Questions Answered in This Episode
If inference becomes the primary bottleneck and economic engine, how will that reshape the business models of today’s leading AI labs and cloud providers?
What risks arise from synthetic data loops—could self-generated training data amplify subtle model biases or errors over time?
How sustainable is Groq’s low-margin, high-volume inference strategy if hyperscalers or NVIDIA aggressively cut prices or develop competing inference-optimized hardware?
At what point do power, cooling, and water constraints fundamentally cap global AI capacity, and who will control those chokepoints?
How can societies encourage the productivity gains of agentic AI without drifting into over-dependence, where individuals and institutions cede too much decision-making to models?
Transcript Preview
We did not raise 1.5 billion, that's revenue. That's actually about 30% of the revenue of OpenAI. Your job is not to follow the wave, your job is to get positioned for the wave. You could almost say we're one of the best things that's ever happened to NVIDIA, because they can make every single GPU that they were gonna make, and they can sell it for training, high margin, gets amortized across deployment. You know, we'll take the low margin, high volume inference business off their hands, and they won't have to sell you the margin. We're growing faster than exponential. And when you are growing faster than exponential, there is no amount of profit that you can make that matters. What matters is getting a toehold in the market and becoming relevant.
Ready to go? Jonathan, thank you so much for agreeing to do this in Paris. You look fantastic, by the way. I feel so underdressed, but you look great.
Thank you. I, I could take the tie off if you want, but I'll never be able to tie it again. I don't know how to tie a tie.
(laughs)
No, literally, my Chief of Staff has to tie it for me, it's, and it's, like, a struggle 'cause, like, he's putting it on himself, he's tying it. I, I literally only bought this suit recently.
Well, I mean, you look fantastic. I don't, I, I think, have a suit, so you're one up on me. I wanna split the show into two parts today. I wanna talk about the landscape where we're at, and then I wanna dive specifically into Groq, where you're at. You've announced a massive new deal that I think everyone's slightly misunderstanding-
(laughs)
... what we were just talking about. Um, I just want to start on where we're at. In terms of, like, scaling laws, everyone says we are at the limits of scaling laws, and then there seems to be exponential innovation happening with the likes of DeepSeek and others.
Yeah.
Where are we at in terms of the limits of scaling laws?
So scaling laws is a paper that was published by OpenAI, and what it does is it effectively says the more parameters your model has, basically the better it can absorb information. So you'll, you'll see these curves that they draw, and they're, they're amazing. You should show it if you can. But effectively, um, you have these sort of asymptotic drop-offs where y- you keep getting better and better, but you get a logarithmic improvement when you put a linear number of tokens in. This is why you see people doing 15 mil- uh, trillion tokens of training and whatnot. But they're misunderstood because, um, the assumption is that all of the, all of the data is the same quality. So, uh, you have a kid now, right?
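[For readers who want the curve Ross is gesturing at: the scaling-laws fit is a power law, which looks like a straight line on log-log axes and gives diminishing returns per token. The exponent and scale constant below are illustrative assumptions, not the paper's fitted values.]

```python
# Data-scaling term of the loss in power-law form: L(D) = (D_c / D) ** alpha.
# A linear increase in tokens D buys only a multiplicative (diminishing)
# improvement in loss; constants here are illustrative, not fitted.
ALPHA = 0.095    # assumed exponent
D_C = 5.4e13     # assumed scale constant

def loss(tokens: float) -> float:
    return (D_C / tokens) ** ALPHA

# Each 10x increase in tokens shrinks loss by the same constant factor:
ratio_a = loss(1e12) / loss(1e13)
ratio_b = loss(1e13) / loss(1e14)
```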
Mm-hmm.
So eventually, you're gonna be training your kid, and you're gonna say, and play along with me here, "What's one plus one?"