From Near Death to a $20B NVIDIA Deal | Jonathan Ross, Groq

Jonathan Ross is the founder of Groq and the inventor of the Google Tensor Processing Unit (TPU), now a senior executive at NVIDIA following the company's $20 billion partnership with Groq. Before Groq, Ross built something that didn't exist: a custom AI chip at Google called the TPU, which became the backbone of DeepMind's AlphaGo, the system that defeated world Go champion Lee Sedol in 2016. After watching the TPU push AlphaGo's Elo rating up by hundreds of points overnight, Ross grasped a principle that would define his next decade: faster inference produces more capable models. He left Google to act on it.

Groq's first decade was brutal. Early West Coast VCs passed, and would later watch as NVIDIA announced what Ross describes as the firm's largest deal by nearly 3x. Ross came within weeks of running out of money. Rather than lay off the engineers he needed to hit a critical product milestone, he created "Groq bonds": war-bond-style instruments that exchanged salary for equity. About 80% of the team participated; nearly half took statutory minimum wage. They saved two months of runway and kept the company alive.

The core bet Ross made, that fast inference would matter, was widely dismissed, inside Groq and out. When the CEO of GitHub called needing chips to run LLMs, Ross's own engineers told him it couldn't be done. He eventually stopped asking and started declaring: "I intend to do this." He describes that shift, from inviting pessimism to announcing direction, as the most important leadership change he made.

Now at NVIDIA, Ross carries what he calls manufactured discontent: a deliberate refusal to rest, convinced that every day without sufficient compute is a day the world waits longer for cures for cancer and aging.
Show notes: https://www.davidsenra.com/episode/jonathan-ross

Made possible by
Ramp: https://ramp.com
AppLovin: https://applovin.com/senra
Deel: https://deel.com/senra

David Senra
X: https://x.com/davidsenra
Instagram: https://www.instagram.com/davidsenra
LinkedIn: https://www.linkedin.com/in/davidsenra
Facebook: https://www.linkedin.com/company/senrashow
Threads: https://www.threads.com/@davidsenra
Spotify: https://spti.fi/TVrr557
Apple Podcasts: https://apple.co/4msoZtb
Website: https://www.davidsenra.com

Jonathan Ross
X: https://x.com/JonathanRoss321
LinkedIn: https://www.linkedin.com/in/ross-jonathan

Chapters
00:00:00 The $20 Billion NVIDIA Deal Closed In 3 Weeks
00:00:25 Why GPUs And LPUs Are Better Together
00:01:46 When AI Talks To AI, Speed Wins
00:03:30 Always Start With A Hobby Project
00:05:55 Ask The Right Questions, Not Answer Them
00:08:23 There Are Infinite Ways To Be A Leader
00:13:00 I Was One Of The World's Worst Leaders
00:14:34 Fewer Constraints, More Room To Surprise You
00:16:31 At NVIDIA There Is No Politics
00:19:44 You Have To Learn Confidence
00:22:23 East Coast VCs Think, West Coast VCs Follow
00:23:50 The Keynesian Beauty Contest Of Silicon Valley
00:26:48 The Autonomy That Created The NVIDIA Deal
00:30:07 Making A Model Smarter By Making It Faster
00:34:52 Reality Quotient Beats Intelligence Quotient
00:35:44 Find The Dominant Game And Play It
00:37:11 A Founder's Job Is Full-Time Change Management
00:38:34 Return On Luck: Seize It Better Than Anyone
00:42:54 You Can't Sell Speed, You Have To Let People Try It
00:46:32 I Intend To: Intentional Leadership
00:51:07 Groq Bonds: Trading Salary For Survival
00:54:13 Hire For Negatives, Grow For Positives
00:58:46 Loss Aversion And Booking The Win Early
01:00:37 How Michael Jordan Weaponized Humiliation
01:03:13 Manufactured Discontent Drives Everything
01:05:02 Every Day Without Compute Has A Real Cost
01:07:07 Code Was Rationed, Now It's Nearly Free
01:10:04 Teach Kids To Ask Questions, Not Answer Them

#davidsenra #groq #nvidia

David Senra (host), Jonathan Ross (guest)

Jul 5, 2026 · 1h 11m

WHAT IT’S REALLY ABOUT

Groq’s Jonathan Ross on speed, leadership, and seizing luck fast

Ross says Groq’s LPU and NVIDIA GPUs are complementary because different LLM operations are compute-bound versus memory-throughput-bound, and splitting work across both defeats bottlenecks across the decode path.
He argues inference speed becomes decisive in an “AI talking to AI” world, where agents chain tasks and payments, making latency and token generation throughput economically and strategically critical.
Ross describes a rumored $20B NVIDIA partnership that moved from first call to money wired in roughly three weeks, enabled by Groq’s autonomous team experimenting with GPU+LPU integration and by NVIDIA’s urgency around customer value and opportunity cost.
He reframes modern success around asking the right questions (not answering them), tying it to interactive AI learning, agent workflows, and a coming shift where more people become “leaders of AI.”
Ross shares founder-operating lessons from Groq: intentional leadership (“I intend to…”), minimizing politics via radical transparency, hiring for negatives and “reality quotient,” and survival tactics like “Groq bonds” when the company neared running out of cash.

IDEAS WORTH REMEMBERING

5 ideas

GPUs and LPUs win together by targeting different bottlenecks.

Ross frames GPUs as best for compute-heavy portions (e.g., attention) and LPUs as best for memory-throughput-heavy portions (e.g., applying weights), arguing hybrid execution improves performance across varied matmul bottlenecks rather than betting on one “perfect” architecture.
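The compute-bound versus memory-bound distinction can be made concrete with arithmetic intensity: floating-point operations per byte moved. The sketch below is illustrative only and is not Groq's or NVIDIA's method. It assumes each matrix crosses memory exactly once, uses 2-byte elements, and picks a made-up hardware "ridge point" (`RIDGE_FLOPS_PER_BYTE`); the function names and matrix sizes are hypothetical.

```python
def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an (m x k) @ (k x n) matmul.

    Assumes both inputs and the output each cross memory exactly once.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def classify(intensity: float, ridge_flops_per_byte: float) -> str:
    """Label a workload relative to a device's FLOPs-per-byte ridge point."""
    if intensity >= ridge_flops_per_byte:
        return "compute-bound"
    return "memory-bound"


# Hypothetical ridge point; real devices differ.
RIDGE_FLOPS_PER_BYTE = 100.0

# One token multiplied by a 4096 x 4096 weight matrix (a single decode step).
single_token = matmul_intensity(1, 4096, 4096)

# 2048 tokens multiplied by the same weights in one batched pass.
batched = matmul_intensity(2048, 4096, 4096)

print(classify(single_token, RIDGE_FLOPS_PER_BYTE))  # memory-bound
print(classify(batched, RIDGE_FLOPS_PER_BYTE))       # compute-bound
```

Under these assumptions, the single-token case does roughly one FLOP per byte, so the time is dominated by streaming the weights, while the batched case reuses each weight many times and is limited by arithmetic instead. That difference is the kind of split the hybrid argument relies on.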

Speed is not just UX—faster inference can make models “smarter.”

Using AlphaGo’s TPU jump as an analogy, Ross claims more compute per unit time enables deeper search/verification, surfacing better moves (or answers), so latency and throughput improvements can translate into higher-quality outputs, not merely quicker ones.

Agentic AI makes latency an economic multiplier, not a nicety.

Humans tolerate seconds; agents chain tools, spawn parallel research, and potentially execute micropayments—so time saved compounds across workflows, making fast token generation a key enabler of “AI using AI.”
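A back-of-the-envelope sketch of why per-step latency compounds in a sequential agent chain. Every number here (step count, tokens per step, generation rates, fixed overhead) is a hypothetical assumption chosen for illustration, not a figure from the episode.

```python
def chain_seconds(steps: int, tokens_per_step: int,
                  tokens_per_sec: float, overhead_sec: float = 0.0) -> float:
    """Wall-clock time for a chain where each step waits on the previous one."""
    per_step = tokens_per_step / tokens_per_sec + overhead_sec
    return steps * per_step


# Hypothetical workload: 20 dependent steps, 500 generated tokens each,
# plus 0.2 s of fixed tool-call overhead per step.
slow = chain_seconds(steps=20, tokens_per_step=500, tokens_per_sec=50, overhead_sec=0.2)
fast = chain_seconds(steps=20, tokens_per_step=500, tokens_per_sec=500, overhead_sec=0.2)

print(f"slow chain: {slow:.0f} s, fast chain: {fast:.0f} s")
```

With these made-up inputs the slower chain takes minutes while the faster one takes seconds. Because each step blocks the next, a delay a person would tolerate once is paid at every link in the chain.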

To sell speed, you must let people experience it firsthand.

Ross says demonstrations weren’t enough—value clicked only when users could ask their own questions and feel instantaneous responses; a viral clip of Groq speed triggered developer experimentation and accelerated adoption.

Autonomy requires one clear objective and fewer constraints to unlock surprise.

He credits Groq’s progress to setting an extremely crisp goal (e.g., “25M tokens/sec” on a challenge coin) while avoiding over-constraint, because innovation at scale requires teams to surprise leadership with solutions.

WORDS WORTH SAVING

5 quotes

The fewer constraints that you give someone, the more freedom they have to solve the problem, and the more freedom they have to surprise you with the solution.

— Jonathan Ross

The only way for your team to innovate, right, without you being the innovator is they must be able to surprise you in a good way, which means you must not overconstrain the goal.

— Jonathan Ross

There are plenty of really smart people who wouldn't recognize reality if it tapped them on the shoulder.

— Jonathan Ross

Moving from being an engineer to being a founder, the thing that finally clicked for me was if I was gonna do something disruptive, my job was full-time change management. And the first principle of change management is to make it feel like it isn't a change.

— Jonathan Ross

We were about three weeks from running out of money at one point.

— Jonathan Ross

Rumored $20B NVIDIA deal executed in weeks
GPU + LPU workload partitioning for LLM decoding
Agentic AI and why latency dominates
Hobby projects as a sandbox for innovation
Asking questions as the core AI-age skill
Autonomy, constraints, and “brutally clear priority” goals
Reality Quotient, dominant game selection, and change management
Return on Luck and intentional leadership language
Groq bonds (salary-for-equity) as a runway extension
Hiring: avoid negatives, loss aversion, “poetic design”
Manufactured discontent and mission-driven urgency
Code becoming nearly free; education redesigned around questions

High quality AI-generated summary created from speaker-labeled transcript.
