From Near Death to a $20B NVIDIA Deal | Jonathan Ross, Groq

Jonathan Ross is the founder of Groq and the inventor of the Google Tensor Processing Unit (TPU), now a senior executive at NVIDIA following the company's $20 billion partnership with Groq. Before Groq, Ross built something that didn't exist: a custom AI chip at Google called the TPU, which became the backbone of DeepMind's AlphaGo — the system that defeated world Go champion Lee Sedol in 2016. After watching the TPU push AlphaGo's ELO score up by hundreds of points overnight, Ross grasped a principle that would define his next decade: faster inference produces more capable models. He left Google to act on it. Groq's first decade was brutal. Early West Coast VCs passed — and would later watch as NVIDIA announced what Ross describes as the firm's largest deal by nearly 3x. Ross came within weeks of running out of money. Rather than lay off the engineers he needed to hit a critical product milestone, he created "Groq bonds" — war-bond–style instruments that exchanged salary for equity. About 80% of the team participated; nearly half took statutory minimum wage. They saved two months of runway and kept the company alive. The core bet Ross made — that fast inference would matter — was widely dismissed, inside Groq and out. When the CEO of GitHub called needing chips to run LLMs, Ross's own engineers told him it couldn't be done. He eventually stopped asking and started declaring: "I intend to do this." He describes that shift — from inviting pessimism to announcing direction — as the most important leadership change he made. Now at NVIDIA, Ross carries what he calls manufactured discontent: a deliberate refusal to rest, convinced that every day without sufficient compute is a day the world waits longer for cures for cancer and aging. 
Show notes: https://www.davidsenra.com/episode/jonathan-ross

Made possible by
Ramp: https://ramp.com
AppLovin: https://applovin.com/senra
Deel: https://deel.com/senra

David Senra
X: https://x.com/davidsenra
Instagram: https://www.instagram.com/davidsenra
LinkedIn: https://www.linkedin.com/in/davidsenra
Facebook: https://www.linkedin.com/company/senrashow
Threads: https://www.threads.com/@davidsenra
Spotify: https://spti.fi/TVrr557
Apple Podcasts: https://apple.co/4msoZtb
Website: https://www.davidsenra.com

Jonathan Ross
X: https://x.com/JonathanRoss321
LinkedIn: https://www.linkedin.com/in/ross-jonathan

Chapters
00:00:00 The $20 Billion NVIDIA Deal Closed In 3 Weeks
00:00:25 Why GPUs And LPUs Are Better Together
00:01:46 When AI Talks To AI, Speed Wins
00:03:30 Always Start With A Hobby Project
00:05:55 Ask The Right Questions, Not Answer Them
00:08:23 There Are Infinite Ways To Be A Leader
00:13:00 I Was One Of The World's Worst Leaders
00:14:34 Fewer Constraints, More Room To Surprise You
00:16:31 At NVIDIA There Is No Politics
00:19:44 You Have To Learn Confidence
00:22:23 East Coast VCs Think, West Coast VCs Follow
00:23:50 The Keynesian Beauty Contest Of Silicon Valley
00:26:48 The Autonomy That Created The NVIDIA Deal
00:30:07 Making A Model Smarter By Making It Faster
00:34:52 Reality Quotient Beats Intelligence Quotient
00:35:44 Find The Dominant Game And Play It
00:37:11 A Founder's Job Is Full-Time Change Management
00:38:34 Return On Luck: Seize It Better Than Anyone
00:42:54 You Can't Sell Speed, You Have To Let People Try It
00:46:32 I Intend To: Intentional Leadership
00:51:07 Groq Bonds: Trading Salary For Survival
00:54:13 Hire For Negatives, Grow For Positives
00:58:46 Loss Aversion And Booking The Win Early
01:00:37 How Michael Jordan Weaponized Humiliation
01:03:13 Manufactured Discontent Drives Everything
01:05:02 Every Day Without Compute Has A Real Cost
01:07:07 Code Was Rationed, Now It's Nearly Free
01:10:04 Teach Kids To Ask Questions, Not Answer Them

#davidsenra #groq #nvidia

David Senra (host), Jonathan Ross (guest)

Jul 5, 2026 · 1h 11m

CHAPTERS

0:02 – 0:25
A $20B NVIDIA partnership negotiated and wired in three weeks
David opens with the rumored $20B NVIDIA partnership and how unusually fast it came together. Jonathan emphasizes speed of execution as a core trait of Jensen Huang and why moving quickly is decisive in tech.
- The first call floated the idea ~3 weeks before money hit the bank
- Jensen’s bias for speed as a competitive advantage
- Deal origin is tied to Groq’s work integrating with NVIDIA hardware
0:25 – 1:46
Why GPUs and LPUs work better together than either alone
Jonathan explains the complementary strengths of GPUs and Groq’s LPUs across different bottlenecks in LLM inference. The combined system improves performance because no single architecture wins across all workloads.
- LLM token processing includes both compute- and memory-throughput-bound matmuls
- Compute-heavy pieces map well to GPUs; memory-throughput-heavy pieces map well to LPUs
- Bottlenecks vary across layers, so hybrid architectures can “defeat” more of them
- Initial goal was to buy ~100,000 GPUs to deploy internally
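As a rough illustration of the compute-bound versus memory-throughput-bound distinction in the bullets above, the sketch below applies a simple roofline-style check to a matrix multiply. The hardware figures (`peak_flops`, `mem_bw`), the matrix shapes, and the function names are made-up assumptions for illustration, not numbers or code from the episode.

```python
def matmul_intensity(m, k, n, bytes_per_elem=2):
    """Arithmetic intensity (FLOPs per byte) of an (m x k) @ (k x n) matmul.

    Assumes each operand and the result cross the memory bus exactly once,
    which is an idealization.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def bound(m, k, n, peak_flops, mem_bw, bytes_per_elem=2):
    """Return "compute" or "memory" depending on which limit is hit first."""
    machine_balance = peak_flops / mem_bw  # FLOPs the chip can do per byte moved
    intensity = matmul_intensity(m, k, n, bytes_per_elem)
    return "compute" if intensity >= machine_balance else "memory"


# Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
PEAK_FLOPS = 1e14
MEM_BW = 1e12

# Many tokens at once: weights are reused across rows, so intensity is high.
print(bound(4096, 4096, 4096, PEAK_FLOPS, MEM_BW))
# One token at a time: every weight is read for a single multiply-add.
print(bound(1, 4096, 4096, PEAK_FLOPS, MEM_BW))
```

Under these assumed numbers the large batched multiply is limited by arithmetic throughput while the single-token multiply is limited by how fast weights can be streamed from memory, which is the kind of mismatch a mixed architecture is meant to address.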
1:46 – 3:30
When AI talks to AI, latency becomes the bottleneck (and payments follow)
They zoom out to the idea that AI agents will increasingly call other AIs, making response speed far more critical than human-interactive UX. This leads into agentic workflows that trigger many more transactions, especially micropayments.
- Humans tolerate seconds of waiting; AIs idle at machine speed waiting on latency
- Agentic systems chain tools/agents, creating compounding workloads
- Micropayments could explode as agents transact autonomously
- A hobby project revealed friction: proving “human” to buy phone numbers (Twilio)
3:30 – 5:47
Always start with a hobby project: building personal tools as R&D
Jonathan describes using side projects to test risky, cutting-edge ideas outside the work codebase. He shares examples like travel planning apps and a personalized “daily brief,” then iterates toward more interactive, question-driven consumption.
- Personal builds in GCP/AWS to explore ideas safely and quickly
- Examples: travel routing/seat optimization, math utilities, daily brief
- Daily brief evolves from long text → headline summary + follow-up questions
- Interactive learning beats static feeds/podcasts for rapid drill-down
5:47 – 8:19
The AI age rewards asking the right questions, not knowing the answers
Jonathan expands on his thesis: as AI commoditizes answers, the differentiator becomes question quality. He links this to leadership—effective leaders orchestrate, probe, and surface the questions others miss.
- Success shifts from answering → asking (school trains the opposite)
- Leaders add value by asking incisive questions across domains
- AI output quality is tightly coupled to prompt/question quality
- Trend toward “everyone becomes a leader,” increasingly leading AI agents
8:19 – 12:51
Leadership has infinite styles—find the one that’s true to you
Jonathan frames leadership as “having followers” and argues there’s no single correct style. He explains his natural bias toward delegation and autonomy (even in personal life), and why mismatched borrowed styles fail founders.
- Leadership definition: you’re not a leader without followers
- Many valid leadership modes, analogous to many investing styles
- Jonathan’s style: high autonomy, low need for control (e.g., doesn’t drive)
- Choose environments early that teach lessons compatible with your temperament
12:51 – 16:31
From ‘world’s worst leader’ to ‘one brutally clear priority’
Jonathan details painful early mistakes at Groq: delegating to people not ready for autonomy and then failing at command-and-control when frustrated. He evolved toward crisp goals (e.g., a challenge coin) and minimizing constraints to unlock creativity.
- Early failure mode: too much latitude without hiring for autonomy
- Commanding late felt unnatural and was rejected by the team
- Solution: simple, organization-wide goal (challenge coin: 25M tokens/sec)
- Fewer constraints → more freedom → more ‘good surprises’ from creative teams
16:31 – 22:18
What NVIDIA taught him: no politics, radical clarity, and customer truth
Jonathan contrasts NVIDIA’s operating cadence with typical org dysfunction. He highlights how minimizing private side-channels reduces politics and how Jensen’s customer-first truth-telling avoids “3D chess” distractions.
- At NVIDIA: exceptionally low politics for a large organization
- Avoid mismatched messaging: prefer group communication over fragmented 1:1s
- Make discussions public: copy all relevant people; don’t enable backchannels
- Jensen’s focus: build what customers need; don’t sell what you don’t believe
22:18 – 26:08
Fundraising realities: East Coast independence, West Coast herding, and the beauty contest
Jonathan explains why Groq struggled raising capital at times, including investor network effects and reputation dynamics. He introduces the Keynesian beauty contest metaphor to describe why VCs often optimize for what others will fund, not what’s best.
- Groq raised from VCs who later fell out of favor, complicating follow-on rounds
- West Coast pattern: one pass can cascade into many passes; one yes can attract others
- East Coast pattern: more independent analysis; less signaling dependence
- Keynesian beauty contest: betting on what others will reward, not intrinsic merit
- Modern shift: startups can be ‘overfunded’; extra money no longer always confers advantage
26:08 – 34:41
The autonomy that created the NVIDIA deal—and why Jensen moved immediately
Jonathan tells the internal story: the integration push came from his COO, reflecting the culture of autonomy. He then explains why speed mattered strategically—fast token generation changes what’s possible and compounds across workflows.
- Insight originated with COO Sunny; team executed without needing Jonathan to drive it
- Correct split is within generation/decoder work, not just prefill vs generate
- They weren’t afraid to show NVIDIA because they wanted to be a GPU customer
- Jensen moved fast due to opportunity cost and immediate customer value from speed
34:41 – 34:52
Making models smarter by making them faster: the AlphaGo/TPU lesson
Jonathan uses AlphaGo’s TPU migration as a concrete example where more compute depth improved “intelligence” without changing the model. Faster search enables deeper rollouts, surfacing higher-quality moves—and the same principle applies to LLM inference speed.
- DeepMind urgently ported to TPUs ahead of the Lee Sedol match
- ELO jumped dramatically on faster hardware despite the ‘same model’
- AI ranks moves then plays out deeper chains; more depth uncovers better options
- Speed increases effective reasoning (reflection/search), improving output quality
- GPU+LPU pairing aims to unlock this advantage for modern models
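The link between speed and search depth described above can be sketched with a toy model: under a fixed time budget, faster evaluation lets a search look more moves ahead. This is a simplified full-width search over a constant number of candidate moves, not AlphaGo's actual Monte Carlo tree search, and the evaluation rates, branching factor, and function name are assumptions chosen only for illustration.

```python
def reachable_depth(evals_per_sec, seconds, branching):
    """Deepest fully explored ply within a fixed evaluation budget.

    Toy model: every position has `branching` candidate moves and each
    position costs one evaluation.
    """
    budget = int(evals_per_sec * seconds)
    depth = 0
    nodes_at_depth = 1  # the root position
    total = 0           # evaluations spent so far
    while True:
        nodes_at_depth *= branching  # positions in the next ply
        if total + nodes_at_depth > budget:
            return depth             # next ply does not fit in the budget
        total += nodes_at_depth
        depth += 1


# Same "model" (3 candidate moves per position), same 1-second budget,
# but hypothetical hardware that evaluates 10x more positions per second.
slow = reachable_depth(evals_per_sec=1_000, seconds=1, branching=3)
fast = reachable_depth(evals_per_sec=10_000, seconds=1, branching=3)
print(slow, fast)  # prints "5 8"
```

In this toy setting a tenfold speedup buys three extra plies of lookahead with no change to how moves are ranked, which mirrors the claim that the same model can play better on faster hardware.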
34:52 – 38:34
Reality Quotient: pick the dominant game—and lead as full-time change management
Jonathan distinguishes “reality quotient” from intelligence: seeing what game matters most and aligning the org to it. He explains how founders must reframe work as continuity (not disruptive change) so teams can adapt without resistance.
- Reality quotient: recognizing reality and choosing the dominant game to win
- Example: Facebook optimizing MAUs vs MySpace optimizing signups
- Groq’s dominant metric: deployed capacity toward 25M tokens/sec (many paths to contribute)
- Founder’s job becomes full-time change management
- Principle: make change feel like it isn’t change by anchoring on stable purpose
38:34 – 50:52
Return on luck, virality of speed, and ‘I intend to’ intentional leadership
Jonathan recounts missed and seized moments around early LLM inference opportunities, then the breakthrough: you can’t sell speed—people must feel it. He connects this to “intentional leadership,” changing phrasing to reduce pessimistic friction while preserving critical feedback.
- Return on luck: winners don’t get more luck; they capitalize better
- Missed early chances (e.g., GitHub/Microsoft GPU scarcity) by accepting internal pessimism
- Market didn’t ‘get’ speed until users tried it; a viral demo made it obvious
- Speed became ‘eye candy’ and drove organic experimentation and app building
- Intentional leadership (Marquet): say “I intend to…” to mobilize execution without inviting endless opinion
50:52 – 1:11:50
Survival tactics and the founder mindset: Groq bonds, hiring for negatives, and compute urgency
Jonathan explains how Groq survived a near-death cash moment by swapping salary for equity, putting “everyone’s hands on the steering wheel.” He then shares counterintuitive hiring lessons (screen for negatives, loss-aversion as drive), manufactured discontent, and his optimism about AI’s future—especially as code becomes nearly free and education shifts to question-asking.
- Groq bonds: salary-for-equity to avoid fatal layoffs; ~80% participated; runway extended
- Psychology: shared control reduces fear and increases commitment
- Hiring shift: stop selecting for positives; screen out organizationally toxic negatives
- Loss aversion ‘book the win early’ mindset accelerates product decisions and ambition
- Manufactured (divine) discontent sustains elite performance; current discontent: global compute shortage
- Optimistic AI future: code marginal cost → ~0; non-engineers build useful software; education should teach asking questions
