
Jonathan Ross, Founder & CEO @ Groq: NVIDIA vs Groq - The Future of Training vs Inference | E1260
Jonathan Ross (guest), Harry Stebbings (host)
In this episode of The Twenty Minute VC, host Harry Stebbings sits down with Jonathan Ross, founder and CEO of Groq, to discuss NVIDIA vs Groq and the future of training versus inference.
Groq’s Jonathan Ross Redefines AI Inference, Chips, and Global Power Dynamics
Jonathan Ross, founder and CEO of Groq, explains why AI inference, not training, will be the dominant long‑term bottleneck and economic driver, and how Groq’s LPU architecture is built specifically to win that market. He argues that synthetic data, better algorithms, and massive compute will push scaling laws further than most people expect, while energy and power infrastructure become true hard constraints. Groq’s model avoids high‑margin GPU training, instead targeting low‑margin, ultra‑high‑volume inference with a capital‑light, revenue‑sharing deployment model, exemplified by its multibillion‑dollar Saudi deal. Ross also examines NVIDIA’s position, China’s AI trajectory, Europe’s strategic choices, and the societal risks of abundance and over‑delegation to AI, framing Groq’s mission as preserving human agency in the age of AI.
Key Takeaways
Inference will dwarf training as the dominant AI compute market.
Ross notes that at Google, inference consumed 10–20x the compute of training, and argues that as AI becomes embedded in every workflow, the need to serve tokens (inference) will massively outstrip one‑off model training.
Synthetic data and smarter training pipelines can extend scaling laws.
By iteratively training models to generate higher-quality synthetic data, pruning errors, and retraining on that curated data, the effective scaling curve becomes much steeper than traditional token counts suggest, blurring the idea of a near-term ‘scaling limit.’
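The generate-filter-retrain loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Groq's or any lab's actual pipeline: `model_generate` and `score` are placeholder stand-ins for a real model and a real quality filter.

```python
import random

# Hypothetical sketch of the iterative synthetic-data pipeline: generate
# candidates with the current model, prune low-quality samples, and keep
# the curated set for the next round of training.
def curate(samples, score, threshold=0.8):
    """Keep only samples whose quality score clears the threshold."""
    return [s for s in samples if score(s) >= threshold]

def synthetic_round(model_generate, score, n=1000, threshold=0.8):
    """One generate-then-filter round; returns the curated training set."""
    candidates = [model_generate() for _ in range(n)]
    return curate(candidates, score, threshold)

# Toy usage: "samples" are random floats and the quality score is the
# value itself, so curation simply drops everything below the threshold.
random.seed(0)
curated = synthetic_round(lambda: random.random(), lambda s: s, n=1000)
```

The point of the sketch is the shape of the loop: each round discards errors before they re-enter training, which is why the effective scaling curve can be steeper than raw token counts suggest.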
Groq’s LPU design trades chip count for radically better inference efficiency.
Instead of relying on scarce, power-hungry HBM, Groq keeps model parameters on-chip across many LPUs, avoiding expensive external memory traffic and delivering ~3x better energy efficiency per token and more than 5x lower cost per token than GPUs for inference.
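As a back-of-envelope check, the quoted ratios can be applied to an invented GPU baseline. All absolute figures below (dollar cost, joules per token, daily token volume) are illustrative assumptions, not measured numbers; only the ~3x and >5x ratios come from the episode.

```python
# Apply the episode's claimed ratios (~3x energy efficiency per token,
# >5x lower cost per token) to a made-up GPU baseline.
gpu_cost_per_m_tokens = 1.00       # hypothetical $ per 1M tokens on GPUs
gpu_joules_per_token = 3.0         # hypothetical energy per token on GPUs
tokens_per_day = 1e12              # hypothetical fleet-wide daily volume

lpu_cost_per_m_tokens = gpu_cost_per_m_tokens / 5   # >5x lower cost claim
lpu_joules_per_token = gpu_joules_per_token / 3     # ~3x efficiency claim

# Daily energy in kWh (1 kWh = 3.6e6 joules) to make the gap concrete.
gpu_kwh_per_day = tokens_per_day * gpu_joules_per_token / 3.6e6
lpu_kwh_per_day = tokens_per_day * lpu_joules_per_token / 3.6e6
```

At high volume, the energy line item scales linearly with tokens served, which is why per-token efficiency, not peak throughput, dominates inference economics.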
NVIDIA will likely remain dominant in training while inference unbundles.
Ross sees NVIDIA’s GPU stack as a solved problem for training and expects them to sell every GPU they can manufacture into that high-margin market, while Groq and others absorb the low‑margin, high‑volume inference workloads.
Groq’s capex-light, revenue-share model removes capital as the main constraint.
Partners like Aramco fund the hardware build‑out; Groq repays with a targeted IRR and then flips more upside to itself, so growth is governed by chip supply and deployment capacity rather than Groq’s own balance sheet.
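A toy sketch of the waterfall mechanics described above: the partner funds the hardware, receives revenue first until a target return is met, after which the split flips in Groq's favour. The hurdle formula, split ratio, and all numbers are invented for illustration; the episode gives no deal terms.

```python
def waterfall(capex, target_return, revenues, late_groq_share=0.8):
    """Return (partner_total, groq_total) over a list of period revenues.

    Hypothetical revenue-share waterfall: the partner is repaid first up
    to capex plus a target return, then most remaining upside flips to
    Groq. `late_groq_share` is an assumed, illustrative split.
    """
    hurdle = capex * (1 + target_return)   # partner's repayment target
    partner, groq = 0.0, 0.0
    for r in revenues:
        if partner < hurdle:
            pay = min(r, hurdle - partner)  # repay the partner first
            partner += pay
            r -= pay
        # past the hurdle, the bulk of remaining revenue goes to Groq
        groq += r * late_groq_share
        partner += r * (1 - late_groq_share)
    return partner, groq

# Toy usage: $100 of capex, 20% target return, three $60 revenue periods.
partner_total, groq_total = waterfall(100.0, 0.2, [60.0, 60.0, 60.0])
```

The structural point is that the partner's capital, not Groq's balance sheet, sits in front of the hardware, so Groq's growth constraint moves to chip supply and deployment capacity.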
Global power and data center infrastructure will become a hard bottleneck.
Ross anticipates a near-term illusion of power and data center oversupply due to fake or under‑specced projects, followed by a genuine power crunch as chip counts continue doubling and AI inference demand explodes, especially outside hyperscaler campuses.
Winning in AI requires solving unsolved customer problems, not specsmanship.
Ross dismisses ‘specsmanship’ (e.g. ...
Notable Quotes
“Your job is not to follow the wave, your job is to get positioned for the wave.”
— Jonathan Ross
“We did not raise 1.5 billion, that's revenue. That's actually about 30% of the revenue of OpenAI.”
— Jonathan Ross
“You could almost say we're one of the best things that's ever happened to NVIDIA... we'll take the low margin, high volume inference business off their hands.”
— Jonathan Ross
“When you are growing faster than exponential, there is no amount of profit that you can make that matters. What matters is getting a toehold in the market and becoming relevant.”
— Jonathan Ross
“To preserve human agency in the age of AI, we need to be one of the most important compute providers in the world.”
— Jonathan Ross
Questions Answered in This Episode
If inference becomes the primary bottleneck and economic engine, how will that reshape the business models of today’s leading AI labs and cloud providers?
What risks arise from synthetic data loops—could self-generated training data amplify subtle model biases or errors over time?
How sustainable is Groq’s low-margin, high-volume inference strategy if hyperscalers or NVIDIA aggressively cut prices or develop competing inference-optimized hardware?
At what point do power, cooling, and water constraints fundamentally cap global AI capacity, and who will control those chokepoints?
How can societies encourage the productivity gains of agentic AI without drifting into over-dependence, where individuals and institutions cede too much decision-making to models?
Transcript Preview
We did not raise 1.5 billion, that's revenue. That's actually about 30% of the revenue of OpenAI. Your job is not to follow the wave, your job is to get positioned for the wave. You could almost say we're one of the best things that's ever happened to NVIDIA, because they can make every single GPU that they were gonna make, and they can sell it for training, high margin, gets amortized across deployment. You know, we'll take the low margin, high volume inference business off their hands, and they won't have to sell you the margin. We're growing faster than exponential. And when you are growing faster than exponential, there is no amount of profit that you can make that matters. What matters is getting a toehold in the market and becoming relevant.
Ready to go? Jonathan, thank you so much for agreeing to do this in Paris. You look fantastic, by the way. I feel so underdressed, but you look great.
Thank you. I, I could take the tie off if you want, but I'll never be able to tie it again. I don't know how to tie a tie.
(laughs)
No, literally, my Chief of Staff has to tie it for me, it's, and it's, like, a struggle 'cause, like, he's putting it on himself, he's tying it. I, I literally only bought this suit recently.
Well, I mean, you look fantastic. I don't, I, I think, have a suit, so you're one up on me. I wanna split the show into two parts today. I wanna talk about the landscape where we're at, and then I wanna dive specifically into Groq, where you're at. You've announced a massive new deal that I think everyone's slightly misunderstanding-
(laughs)
... what we were just talking about. Um, I just want to start on where we're at. In terms of, like, scaling laws, everyone says we are at the limits of scaling laws, and then there seems to be exponential innovation happening with the likes of DeepSeek and others.
Yeah.
Where are we at in terms of the limits of scaling laws?
So scaling laws is a paper that was published by OpenAI, and what it does is it effectively says the more parameters your model has, basically the better it can absorb information. So you'll, you'll see these curves that they draw, and they're, they're amazing. You should show it if you can. But effectively, um, you have these sort of asymptotic drop-offs where y- you keep getting better and better, but you get a logarithmic improvement when you put a linear number of tokens in. This is why you see people doing 15 mil- uh, trillion tokens of training and whatnot. But they're misunderstood because, um, the assumption is that all of the, all of the data is the same quality. So, uh, you have a kid now, right?
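[For readers who want the curve Ross is gesturing at: the scaling-laws fit is a power law, which looks like a straight line on log-log axes and gives diminishing returns per token. The exponent and scale constant below are illustrative assumptions, not the paper's fitted values.]

```python
# Data-scaling term of the loss in power-law form: L(D) = (D_c / D) ** alpha.
# A linear increase in tokens D buys only a multiplicative (diminishing)
# improvement in loss; constants here are illustrative, not fitted.
ALPHA = 0.095    # assumed exponent
D_C = 5.4e13     # assumed scale constant

def loss(tokens: float) -> float:
    return (D_C / tokens) ** ALPHA

# Each 10x increase in tokens shrinks loss by the same constant factor:
ratio_a = loss(1e12) / loss(1e13)
ratio_b = loss(1e13) / loss(1e14)
```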
Mm-hmm.
So eventually, you're gonna be training your kid, and you're gonna say, and play along with me here, "What's one plus one?"