David Senra
From Near Death to a $20B NVIDIA Deal | Jonathan Ross, Groq
CHAPTERS
- 0:02 – 0:25
A $20B NVIDIA partnership negotiated and wired in three weeks
David opens with the rumored $20B NVIDIA partnership and how unusually fast it came together. Jonathan emphasizes speed of execution as a core trait of Jensen Huang and why moving quickly is decisive in tech.
- The first call floated the idea ~3 weeks before money hit the bank
- Jensen’s bias for speed as a competitive advantage
- Deal origin is tied to Groq’s work integrating with NVIDIA hardware
- 0:25 – 1:46
Why GPUs and LPUs work better together than either alone
Jonathan explains the complementary strengths of GPUs and Groq’s LPUs across different bottlenecks in LLM inference. The combined system improves performance because no single architecture wins across all workloads.
- LLM token processing includes both compute-bound and memory-throughput-bound matmuls
- Compute-heavy pieces map well to GPUs; memory-throughput-heavy pieces map well to LPUs
- Bottlenecks vary across layers, so hybrid architectures can “defeat” more of them
- Initial goal was to buy ~100,000 GPUs to deploy internally
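One way to picture the compute-bound vs. memory-throughput-bound split is the standard roofline estimate: compare a matmul's FLOPs per byte moved against a chip's ratio of peak compute to memory bandwidth. The sketch below is illustrative only and is not taken from the episode. The hardware figures (`PEAK_FLOPS`, `MEM_BANDWIDTH`), the matrix shapes, and the function names are all hypothetical, and it assumes each operand and the output cross the memory bus exactly once.

```python
def matmul_arithmetic_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) matmul.

    Assumption: both inputs and the output are each moved through
    memory exactly once (no cache reuse modeled).
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def classify(intensity, peak_flops, mem_bandwidth):
    """Roofline rule of thumb: below the ridge point, memory is the limit."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte at the crossover
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical accelerator, chosen only to make the arithmetic concrete.
PEAK_FLOPS = 1.0e15      # FLOP/s (assumed)
MEM_BANDWIDTH = 3.0e12   # bytes/s (assumed)

# Same weight matrix (8192 x 8192), applied to 1 token vs. 4096 tokens at once.
one_token = matmul_arithmetic_intensity(1, 8192, 8192)
many_tokens = matmul_arithmetic_intensity(4096, 8192, 8192)

print(f"1 token:     {one_token:8.2f} FLOPs/byte ->",
      classify(one_token, PEAK_FLOPS, MEM_BANDWIDTH))
print(f"4096 tokens: {many_tokens:8.2f} FLOPs/byte ->",
      classify(many_tokens, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumed numbers, the single-token matmul does roughly one FLOP per byte and lands on the memory-bound side, while the 4096-token matmul reuses the same weights across many tokens and lands on the compute-bound side. That is the general shape of the argument for pairing architectures with different strengths.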
- 1:46 – 3:30
When AI talks to AI, latency becomes the bottleneck (and payments follow)
They zoom out to the idea that AI agents will increasingly call other AIs, making response speed far more critical than human-interactive UX. This leads into agentic workflows that trigger many more transactions, especially micropayments.
- Humans tolerate seconds of waiting; AIs run at machine speed, so any latency leaves them idle
- Agentic systems chain tools/agents, creating compounding workloads
- Micropayments could explode as agents transact autonomously
- A hobby project revealed friction: having to prove you are “human” to buy phone numbers (Twilio)
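The compounding effect of chained calls can be made concrete with back-of-the-envelope arithmetic. This is a rough sketch with made-up numbers, not figures from the episode: it assumes every call in the chain must wait for the previous call's full output, and the call count, token counts, and generation rates are all hypothetical.

```python
def chain_latency_s(num_calls, tokens_per_call, tokens_per_second, overhead_s=0.0):
    """Wall-clock seconds for a strictly sequential chain of model calls.

    Assumption: each call waits for the previous one to finish, so the
    per-call times simply add up.
    """
    per_call = tokens_per_call / tokens_per_second + overhead_s
    return num_calls * per_call


# Hypothetical agent workflow: 20 sequential calls, 500 output tokens each.
slow = chain_latency_s(num_calls=20, tokens_per_call=500, tokens_per_second=50)
fast = chain_latency_s(num_calls=20, tokens_per_call=500, tokens_per_second=1000)

print(f"at 50 tokens/s:   {slow:.0f} s")
print(f"at 1000 tokens/s: {fast:.0f} s")
```

A delay that is tolerable for one human-facing response is multiplied by every step in the chain, which is why generation speed matters more once agents are calling other agents.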
- 3:30 – 5:47
Always start with a hobby project: building personal tools as R&D
Jonathan describes using side projects to test risky, cutting-edge ideas outside the work codebase. He shares examples like travel planning apps and a personalized “daily brief,” then iterates toward more interactive, question-driven consumption.
- Personal builds in GCP/AWS to explore ideas safely and quickly
- Examples: travel routing/seat optimization, math utilities, daily brief
- Daily brief evolves from long text → headline summary + follow-up questions
- Interactive learning beats static feeds/podcasts for rapid drill-down
- 5:47 – 8:19
The AI age rewards asking the right questions, not knowing the answers
Jonathan expands on his thesis: as AI commoditizes answers, the differentiator becomes question quality. He links this to leadership—effective leaders orchestrate, probe, and surface the questions others miss.
- Success shifts from answering → asking (school trains the opposite)
- Leaders add value by asking incisive questions across domains
- AI output quality is tightly coupled to prompt/question quality
- Trend toward “everyone becomes a leader,” increasingly leading AI agents
- 8:19 – 12:51
Leadership has infinite styles—find the one that’s true to you
Jonathan frames leadership as “having followers” and argues there’s no single correct style. He explains his natural bias toward delegation and autonomy (even in personal life), and why mismatched borrowed styles fail founders.
- Leadership definition: you’re not a leader without followers
- Many valid leadership modes, analogous to many investing styles
- Jonathan’s style: high autonomy, low need for control (e.g., doesn’t drive)
- Choose environments early that teach lessons compatible with your temperament
- 12:51 – 16:31
From ‘world’s worst leader’ to ‘one brutally clear priority’
Jonathan details painful early mistakes at Groq: delegating to people not ready for autonomy and then failing at command-and-control when frustrated. He evolved toward crisp goals (e.g., a challenge coin) and minimizing constraints to unlock creativity.
- Early failure mode: too much latitude without hiring for autonomy
- Switching to command-and-control late felt unnatural and was rejected by the team
- Solution: a simple, organization-wide goal (challenge coin: 25M tokens/sec)
- Fewer constraints → more freedom → more ‘good surprises’ from creative teams
- 16:31 – 22:18
What NVIDIA taught him: no politics, radical clarity, and customer truth
Jonathan contrasts NVIDIA’s operating cadence with typical org dysfunction. He highlights how minimizing private side-channels reduces politics and how Jensen’s customer-first truth-telling avoids “3D chess” distractions.
- At NVIDIA: exceptionally low politics for a large organization
- Avoid mismatched messaging: prefer group communication over fragmented 1:1s
- Make discussions public: copy all relevant people; don’t enable backchannels
- Jensen’s focus: build what customers need; don’t sell what you don’t believe
- 22:18 – 26:08
Fundraising realities: East Coast independence, West Coast herding, and the beauty contest
Jonathan explains why Groq struggled raising capital at times, including investor network effects and reputation dynamics. He introduces the Keynesian beauty contest metaphor to describe why VCs often optimize for what others will fund, not what’s best.
- Groq raised from VCs who later fell out of favor, complicating follow-on rounds
- West Coast pattern: one pass can cascade into many passes; one yes can attract others
- East Coast pattern: more independent analysis; less signaling dependence
- Keynesian beauty contest: betting on what others will reward, not intrinsic merit
- Modern shift: startups can be ‘overfunded’; extra money no longer always confers advantage
- 26:08 – 34:41
The autonomy that created the NVIDIA deal—and why Jensen moved immediately
Jonathan tells the internal story: the integration push came from his COO, reflecting the culture of autonomy. He then explains why speed mattered strategically—fast token generation changes what’s possible and compounds across workflows.
- Insight originated with COO Sunny; team executed without needing Jonathan to drive it
- Correct split is within generation/decoder work, not just prefill vs generate
- They weren’t afraid to show NVIDIA because they wanted to be a GPU customer
- Jensen moved fast due to opportunity cost and immediate customer value from speed
- 34:41 – 34:52
Making models smarter by making them faster: the AlphaGo/TPU lesson
Jonathan uses AlphaGo’s TPU migration as a concrete example where more compute depth improved “intelligence” without changing the model. Faster search enables deeper rollouts, surfacing higher-quality moves—and the same principle applies to LLM inference speed.
- DeepMind urgently ported to TPUs ahead of the Lee Sedol match
- Elo rating jumped dramatically on faster hardware despite the ‘same model’
- AI ranks moves then plays out deeper chains; more depth uncovers better options
- Speed increases effective reasoning (reflection/search), improving output quality
- GPU+LPU pairing aims to unlock this advantage for modern models
- 34:52 – 38:34
Reality Quotient: pick the dominant game—and lead as full-time change management
Jonathan distinguishes “reality quotient” from intelligence: seeing what game matters most and aligning the org to it. He explains how founders must reframe work as continuity (not disruptive change) so teams can adapt without resistance.
- Reality quotient: recognizing reality and choosing the dominant game to win
- Example: Facebook optimizing MAUs vs MySpace optimizing signups
- Groq’s dominant metric: deployed capacity toward 25M tokens/sec (many paths to contribute)
- Founder’s job becomes full-time change management
- Principle: make change feel like it isn’t change by anchoring on stable purpose
- 38:34 – 50:52
Return on luck, virality of speed, and ‘I intend to’ intentional leadership
Jonathan recounts missed and seized moments around early LLM inference opportunities, then the breakthrough: you can’t sell speed—people must feel it. He connects this to “intentional leadership,” changing phrasing to reduce pessimistic friction while preserving critical feedback.
- Return on luck: winners don’t get more luck; they capitalize better
- Missed early chances (e.g., GitHub/Microsoft GPU scarcity) by accepting internal pessimism
- Market didn’t ‘get’ speed until users tried it; a viral demo made it obvious
- Speed became ‘eye candy’ and drove organic experimentation and app building
- Intentional leadership (Marquet): say “I intend to…” to mobilize execution without inviting endless opinion
- 50:52 – 1:11:50
Survival tactics and the founder mindset: Groq bonds, hiring for negatives, and compute urgency
Jonathan explains how Groq survived a near-death cash moment by swapping salary for equity, putting “everyone’s hands on the steering wheel.” He then shares counterintuitive hiring lessons (screen for negatives, loss-aversion as drive), manufactured discontent, and his optimism about AI’s future—especially as code becomes nearly free and education shifts to question-asking.
- Groq bonds: salary-for-equity to avoid fatal layoffs; ~80% participated; runway extended
- Psychology: shared control reduces fear and increases commitment
- Hiring shift: stop selecting for positives; screen out organizationally toxic negatives
- Loss-aversion mindset (‘book the win early’) accelerates product decisions and ambition
- Manufactured (divine) discontent sustains elite performance; current discontent: global compute shortage
- Optimistic AI future: the marginal cost of code approaches zero; non-engineers build useful software; education should teach asking questions