David Senra
From Near Death to a $20B NVIDIA Deal | Jonathan Ross, Groq
At a glance
WHAT IT’S REALLY ABOUT
Groq’s Jonathan Ross on speed, leadership, and seizing luck fast
- Ross says Groq’s LPU and NVIDIA GPUs are complementary because different LLM operations are compute-bound versus memory-throughput-bound, and splitting work between the two relieves bottlenecks along the decode path.
- He argues inference speed becomes decisive in an “AI talking to AI” world, where agents chain tasks and payments, making latency and token-generation throughput economically and strategically critical.
- Ross describes an NVIDIA partnership, rumored to be worth $20B, that moved from first call to money wired in roughly three weeks. He credits Groq’s autonomous team, which had been experimenting with GPU+LPU integration, and NVIDIA’s urgency around customer value and opportunity cost.
- He reframes modern success around asking the right questions (not answering them), tying it to interactive AI learning, agent workflows, and a coming shift where more people become “leaders of AI.”
- Ross shares founder-operating lessons from Groq: intentional leadership (“I intend to…”), minimizing politics via radical transparency, hiring for negatives and “reality quotient,” and survival tactics like “Groq bonds” when the company neared running out of cash.
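The compute-bound versus memory-throughput-bound distinction in the first bullet can be made concrete with the standard roofline model. This is a generic illustration, not Groq's or NVIDIA's analysis: the `Hardware` class, its peak FLOP/s and bandwidth figures, and the matrix shapes are all hypothetical. The sketch computes a matrix multiply's arithmetic intensity (FLOPs per byte moved) and compares it with the hardware's ridge point. A one-token decode step that applies a large weight matrix lands far below the ridge (memory-bound), while a many-token batch over the same weights lands above it (compute-bound).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    """Hypothetical accelerator; the numbers passed in are illustrative, not real GPU/LPU specs."""
    name: str
    peak_flops: float     # peak compute, FLOP/s
    mem_bandwidth: float  # memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) where the roofline turns from memory-bound to compute-bound.
        return self.peak_flops / self.mem_bandwidth


def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOP per byte for C[m, n] = A[m, k] @ B[k, n], assuming each operand crosses memory exactly once."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_flops(hw: Hardware, intensity: float) -> float:
    # Roofline: throughput is capped by peak compute or by bandwidth * intensity, whichever is lower.
    return min(hw.peak_flops, intensity * hw.mem_bandwidth)


def classify(hw: Hardware, intensity: float) -> str:
    return "compute-bound" if intensity >= hw.ridge_point else "memory-bound"


if __name__ == "__main__":
    hw = Hardware("hypothetical-accelerator", peak_flops=1e15, mem_bandwidth=2e12)  # ridge = 500 FLOP/byte
    d = 8192  # hypothetical hidden size; the weight matrix is d x d in 16-bit precision
    for label, tokens in [("decode, 1 token", 1), ("batch of 4096 tokens", 4096)]:
        ai = matmul_intensity(tokens, d, d)
        print(f"{label}: {ai:.1f} FLOP/byte -> {classify(hw, ai)}, "
              f"attainable {attainable_flops(hw, ai):.2e} FLOP/s")
```

Under these assumed numbers, the one-token step does about 1 FLOP per byte and is limited by bandwidth, while the 4096-token batch reaches 2048 FLOP per byte and is limited by compute. That asymmetry is the kind of split Ross points to when he argues against betting on one "perfect" architecture.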
IDEAS WORTH REMEMBERING
5 ideas
GPUs and LPUs win together by targeting different bottlenecks.
Ross frames GPUs as best for compute-heavy portions (e.g., attention) and LPUs as best for memory-throughput-heavy portions (e.g., applying weights), arguing hybrid execution improves performance across varied matmul bottlenecks rather than betting on one “perfect” architecture.
Speed is not just UX—faster inference can make models “smarter.”
Using AlphaGo’s TPU jump as an analogy, Ross claims more compute per unit time enables deeper search/verification, surfacing better moves (or answers), so latency and throughput improvements can translate into higher-quality outputs, not merely quicker ones.
Agentic AI makes latency an economic multiplier, not a nicety.
Humans tolerate seconds; agents chain tools, spawn parallel research, and potentially execute micropayments—so time saved compounds across workflows, making fast token generation a key enabler of “AI using AI.”
To sell speed, you must let people experience it firsthand.
Ross says demonstrations weren’t enough—value clicked only when users could ask their own questions and feel instantaneous responses; a viral clip of Groq speed triggered developer experimentation and accelerated adoption.
Autonomy requires one clear objective and fewer constraints to unlock surprise.
He credits Groq’s progress to setting an extremely crisp goal (e.g., “25M tokens/sec” on a challenge coin) while avoiding over-constraint, because innovation at scale requires teams to surprise leadership with solutions.
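The claim that agents turn latency into an economic multiplier can be illustrated with simple arithmetic. The sketch assumes a hypothetical agent workflow (planning, a three-way parallel research fan-out, synthesis, and a verification pass) with made-up token counts and fixed tool/network overheads; none of these figures come from the episode. Sequential stages add their latencies, and a parallel fan-out costs as much as its slowest branch.

```python
from typing import List, Tuple

# One LLM call: (tokens generated, fixed overhead in seconds such as tool or network time).
Call = Tuple[int, float]
# A stage is a group of calls that run concurrently; stages run one after another.
Stage = List[Call]

# Hypothetical workflow; token counts and overheads are illustrative only.
WORKFLOW: List[Stage] = [
    [(500, 0.2)],                            # plan
    [(1000, 0.5), (800, 0.5), (1200, 0.5)],  # parallel research fan-out
    [(2000, 0.2)],                           # synthesize findings
    [(300, 0.1)],                            # verify the answer
]


def workflow_latency(stages: List[Stage], tokens_per_sec: float) -> float:
    """Wall-clock seconds for the workflow at a given generation speed."""
    total = 0.0
    for stage in stages:
        # A stage finishes when its slowest concurrent call finishes.
        total += max(tokens / tokens_per_sec + overhead for tokens, overhead in stage)
    return total


if __name__ == "__main__":
    slow = workflow_latency(WORKFLOW, tokens_per_sec=50)
    fast = workflow_latency(WORKFLOW, tokens_per_sec=500)
    print(f"50 tok/s: {slow:.1f} s, 500 tok/s: {fast:.1f} s, speedup {slow / fast:.1f}x")
```

With these assumed numbers the workflow takes about 81 s at 50 tokens/s and about 9 s at 500 tokens/s. The tenfold generation speedup yields roughly a ninefold end-to-end gain because fixed overheads do not shrink. Each additional chained step, or an agent that calls other agents, adds its own generation time, which is why the savings compound as workflows get deeper.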
WORDS WORTH SAVING
5 quotes
The fewer constraints that you give someone, the more freedom they have to solve the problem, and the more freedom they have to surprise you with the solution.
— Jonathan Ross
The only way for your team to innovate, right, without you being the innovator is they must be able to surprise you in a good way, which means you must not overconstrain the goal.
— Jonathan Ross
There are plenty of really smart people who wouldn't recognize reality if it tapped them on the shoulder.
— Jonathan Ross
Moving from being an engineer to being a founder, the thing that finally clicked for me was if I was gonna do something disruptive, my job was full-time change management. And the first principle of change management is to make it feel like it isn't a change.
— Jonathan Ross
We were about three weeks from running out of money at one point.
— Jonathan Ross
High-quality AI-generated summary created from a speaker-labeled transcript.