The Twenty Minute VC

Jonathan Ross: DeepSeek Special - How Should OpenAI and the US Government Respond | E1253

Jonathan Ross is the Co-Founder and CEO of Groq, providing fast AI inference. Prior to founding Groq, Jonathan started Google’s TPU effort, where he designed and implemented the core elements of the original chip. Jonathan then joined Google X’s Rapid Eval Team, the initial stage of the famed “Moonshots factory,” where he devised and incubated new Bets (Units) for Alphabet.

Timestamps:
(00:00) Intro
(02:01) Is DeepSeek News as Big a Deal as It Seems?
(04:33) Distillation & DeepSeek's Use of OpenAI Data
(07:18) Scraping OpenAI Models for Higher Quality Output
(11:40) Concerns About US Customer Data Going to China
(13:08) DeepSeek and Its Potential Use by the CCP
(23:13) Is DeepSeek Diminishing OpenAI's Distribution Advantage?
(33:07) Advising the EU on Europe's Stance Today
(34:48) Perplexity in 3 Years
(37:22) Commoditization of Models & Big Tech's Stock Struggles
(41:01) Nvidia's High Margins and the Strength of Their Moat
(42:55) The Future of Efficiency After Nvidia's Success
(49:50) The $500BN Stargate Project
(54:01) Excitement or Nerves in the AI Arms Race?
(57:54) Where Does Value Accrue in Wrapper Apps & Foundation Models?

The 10 Most Important Questions on DeepSeek:
- How did DeepSeek innovate in a way that no other model provider has done?
- Do we believe that they only spent $6M to train R1?
- Should we doubt their claims on limited H100 usage? Is Josh Kushner right that this is a potential violation of US export laws?
- Is DeepSeek an instrument used by the CCP to acquire US consumer data?
- How does DeepSeek being open-source change the nature of this discussion?
- What should OpenAI do now? What should they not do?
- Does DeepSeek hurt or help Meta, which already has its open-source efforts with Llama?
- Will this market follow Satya Nadella’s suggestion of Jevons Paradox?
- How much more efficient will foundation models become?
- What does this mean for the $500BN Stargate project announced last week?

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on Twitter: https://twitter.com/HarryStebbings
Follow Jonathan Ross on Twitter: https://twitter.com/JonathanRoss321
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

#20vc #harrystebbings #jonathanross #groq #founder #CEO #venturecapital #startups #deepseek #openai #ai #samaltman #trump

Harry Stebbings (host), Jonathan Ross (guest)
Jan 29, 2025 · 1h 0m

CHAPTERS

  1. 0:00 – 1:39

    DeepSeek as “Sputnik 2.0”: why this moment matters

    Harry and Jonathan open with a blunt claim: DeepSeek is a geopolitical and industry inflection point comparable to Sputnik. Jonathan previews the core themes—true cost vs headline cost, distillation from OpenAI, and why “open” changes the competitive landscape.

    • DeepSeek framed as a wake-up call for the US AI ecosystem
    • Training cost headlines vs the broader reality of what it took to get results
    • Early thesis: OpenAI may need to respond by open-sourcing to keep users
    • Why this is bigger than a normal model release (speed, surprise, adoption)
  2. 1:39 – 2:36

    Jonathan Ross’s vantage point: Google TPU → Groq and inference-first thinking

    Jonathan explains his background in AI hardware, from Google’s TPU work to founding Groq. This context sets up his emphasis on inference economics and how hardware constraints shape real-world AI competition.

    • Experience building accelerators (TPU) and founding Groq (LPUs)
    • Why hardware/inference realities matter as much as model quality
    • Positioning to comment on compute, efficiency, and deployment constraints
  3. 2:36 – 4:29

    Compute, data, and the marketing narrative behind the “$6M model”

    They unpack why DeepSeek surprised people: it appears to reach frontier-like performance with far fewer GPUs and a far smaller budget. Jonathan argues the $6M number is partly marketing: data quality and post-training/distillation spend are the real story.

    • Most labs can access similar raw data; compute and data quality drive deltas
    • Scaling laws: more tokens/compute generally improves models, but not uniformly
    • DeepSeek’s reported 2,000 GPUs/60 days is plausible but incomplete framing
    • Key nuance: high-quality training signals can beat “more tokens of average data”
  4. 4:29 – 7:29

    Distillation explained: using OpenAI outputs as high-quality training signal

    Jonathan explains distillation as ‘learning from a smarter tutor’ and ties it to scaling-law assumptions about uniform data quality. He uses AlphaGo/AlphaGo Zero as an analogy for iterative self-improvement and shows how ‘jumping’ to a better teacher accelerates progress.

    • Distillation: train a student model on outputs from a stronger teacher model
    • Scaling laws assume uniform data quality; better data reduces required tokens
    • Synthetic data flywheel: model generates data → retrain → improved model
    • DeepSeek likely spent far more on generating/scraping high-quality outputs than on the base training run
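
The distillation idea in this chapter can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical stand-in rather than anything described in the episode: `teacher` is an ordinary function playing the role of the stronger model, the "prompts" are plain numbers, and the student is a one-variable linear fit. The only point is the shape of the loop: query the teacher, keep its outputs, and train the student on them.

```python
def teacher(x: float) -> float:
    """Hypothetical stand-in for a stronger model: maps a prompt (a number) to an answer."""
    return 3.0 * x + 2.0


def collect_teacher_outputs(prompts):
    """Query the teacher once per prompt and keep (prompt, answer) pairs as training data."""
    return [(x, teacher(x)) for x in prompts]


def fit_student(pairs):
    """Fit a linear student by ordinary least squares on the teacher's answers."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The student never sees the teacher's internals, only its outputs.
pairs = collect_teacher_outputs([0.0, 1.0, 2.0, 3.0, 4.0])
slope, intercept = fit_student(pairs)
```

In this toy the student recovers the teacher's behavior from five queries. With real language models the same pattern applies to text outputs, and the cost of generating those outputs is what Jonathan suggests dominated the spend.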
  5. 7:29 – 9:31

    What DeepSeek innovated beyond copying: automated RL-style checking

    They address the ‘just copying’ critique. Jonathan argues DeepSeek also introduced genuinely clever reinforcement/automation tricks, especially replacing human grading with programmatic, deterministic checks where possible.

    • Not merely duplication—some reinforcement learning techniques are novel and simple
    • Replacing human preference labeling with automated ‘answer-in-the-box’ verification
    • Why automation reduces friction, cost, and ambiguity in training signals
    • Limits: Jonathan notes he hasn’t dug deeply into all reward-model details
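
A programmatic check of the kind described above might look like the following minimal sketch. It assumes, purely for illustration, that the model is asked to put its final answer inside a LaTeX-style `\boxed{...}` marker and that a reference answer is available as a string; the function names and the exact-match rule are hypothetical, not DeepSeek's actual reward code.

```python
import re

# Matches \boxed{...} with no nested braces (an assumption made to keep the sketch simple).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in the text, or None if there is none."""
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None


def reward(model_output, reference):
    """Deterministic reward: 1.0 if the boxed answer equals the reference exactly, else 0.0."""
    answer = extract_boxed(model_output)
    if answer is not None and answer == reference.strip():
        return 1.0
    return 0.0
```

Because the check is deterministic, the same output always earns the same reward and no human grader is involved, which is the reduction in cost and ambiguity this chapter points to.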
  6. 9:31 – 11:41

    Export controls loopholes and OpenAI’s accidental subsidy via API usage

    Discussion shifts to how DeepSeek could get the compute and data it needed despite restrictions. Jonathan argues cloud access creates a major enforcement gap, and that heavy OpenAI API usage can effectively subsidize competitors if tokens are not fully profitable.

    • Why smuggling GPUs may be unnecessary if cloud GPUs are rentable from abroad
    • IP-blocking is porous; identity/location verification is hard to enforce
    • If API tokens are subsidized, large-scale scraping/distillation shifts cost to OpenAI
    • OpenAI may retain logs/data that could theoretically be used for training
  7. 11:41 – 13:08

    The biggest risk: US customer data exposure and the reality of “delete”

    They focus on data security and why users underestimate retention and government access. Jonathan argues ‘delete’ often means ‘marked deleted,’ and that even indirect data (neighbors, spouses, health info) can create vulnerabilities.

    • Data retention practices: deletion frequently means soft-delete, not true erasure
    • National security angle: state access to aggregated user data is the key concern
    • Second-order exposure: others can leak information about you unintentionally
    • Why AI apps amplify risk due to sensitive prompts and habitual usage
  8. 13:08 – 15:41

    CCP influence, content shaping, and the TikTok analogy—now with open source

    Harry asks directly whether DeepSeek could be used to increase CCP control; Jonathan says the structural issue is that any China-based operator can be compelled by the state to hand over data or constrain answers. They discuss censorship behavior (e.g., Tiananmen responses) and the scarier prospect of subtle persuasion on contested topics.

    • Operating in China/HK can imply compulsory data access and answer constraints
    • Demonstrations of selective topic sensitivity and controlled outputs
    • Risk evolves from censorship to persuasion: “cogent” biased arguments at scale
    • Open source complicates traditional responses like bans or forced divestiture
  9. 15:41 – 17:21

    Why Groq chose to host DeepSeek: offering a ‘no-data-retention’ alternative

    Jonathan explains Groq’s difficult decision to run DeepSeek after it became the #1 app. The goal: let users access the model without sending data to China, leveraging Groq’s claim of storing nothing (memory-only).

    • Strategic shift: refusing Chinese models initially, then adding DeepSeek due to adoption
    • User protection framing: provide an alternative where prompts aren’t retained
    • Prediction: CCP may shift strategy after seeing success and seek data capture
    • DeepSeek as hedge-fund-origin project, but ‘influenced’ and potentially leveraged by the state
  10. 17:21 – 19:09

    Models are now commoditized: seven powers, moats, and OpenAI’s open-source dilemma

    Jonathan argues DeepSeek makes commoditization undeniable and shifts focus to defensibility (Hamilton Helmer’s seven powers). He claims OpenAI’s strongest current moat is brand, and suggests open-sourcing could be the best strategic response to preserve distribution and goodwill.

    • Commoditization reduces switching costs and erodes pricing power
    • Seven powers applied: brand, scale, network effects, switching costs, etc.
    • OpenAI’s choice: protect proprietary advantage vs win users via openness
    • Distribution advantage is weakening as alternatives spread quickly
  11. 19:09 – 37:23

    Why $500B ‘Stargate’ isn’t absurd: inference dwarfs training

    They debate whether efficiency gains make massive infrastructure spending look ridiculous. Jonathan argues the opposite: training breakthroughs trigger far larger inference spend, especially with test-time compute, so total compute demand can grow even as unit costs fall.

    • Google TPU origin story: ML ‘works’ but production cost explodes
    • Inference was historically 10–20x training at Google; Jonathan predicts inference at ~95% of the total long-term
    • Test-time compute increases inference tokens dramatically for some queries
    • Efficiency doesn’t necessarily reduce spend; it often expands usage
  12. 37:23 – 41:01

    Big Tech stocks, Jevons paradox, and why cheaper tokens increase demand

    Harry asks why markets punished AI/hardware names; Jonathan says investors are over-indexing on training demand. He argues Jevons paradox and price elasticity mean lower costs create more applications, more developers, and ultimately more inference—and then more training again.

    • Market confusion: assuming efficiency implies fewer chips needed
    • Jevons paradox: lower cost → higher consumption → higher total spend
    • Developer adoption spikes when token costs drop and quality rises
    • Positive feedback loop: greater inference demand motivates more training
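
The Jevons argument in this chapter comes down to simple arithmetic on price elasticity. The sketch below assumes a constant-elasticity demand curve, quantity = scale × price^(−elasticity), and uses illustrative elasticity values (1.5 and 0.5) that are not figures from the episode. Under that assumption, total spend is price × quantity, so it rises when price falls only if elasticity is above 1.

```python
def demand(price, elasticity, scale=1.0):
    """Constant-elasticity demand: quantity consumed at a given unit price (assumed model)."""
    return scale * price ** (-elasticity)


def total_spend(price, elasticity, scale=1.0):
    """Total spend is unit price times quantity consumed."""
    return price * demand(price, elasticity, scale)


# Halve the unit price of a token under two hypothetical elasticities.
before = total_spend(1.0, elasticity=1.5)
after_elastic = total_spend(0.5, elasticity=1.5)    # demand more than doubles
after_inelastic = total_spend(0.5, elasticity=0.5)  # demand grows less than price fell
```

With the elastic curve, halving the price raises total spend by a factor of √2; with the inelastic one, spend falls. Jonathan's claim amounts to saying that demand for tokens sits on the elastic side.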
  13. 41:01 – 42:55

    Nvidia’s margins and the emerging split: high-margin training vs high-volume inference

    They discuss whether Nvidia’s high margins invite disruption. Jonathan frames training as a premium niche and inference as the larger market, suggesting inference-specialized providers can absorb lower-margin volume while Nvidia preserves high-margin positioning.

    • Training as ‘mainframe-like’ high-margin business; inference as the larger TAM
    • Inference-focused chips/services can complement Nvidia’s margin structure
    • Many investors still misunderstand inference’s dominance despite years of signals
    • Clear definitions: training builds the model; inference uses it at scale
  14. 42:55 – 48:19

    Next efficiency wave: Mixture-of-Experts, sparse compute, and what competitors will copy

    Jonathan explains MoE architectures and how DeepSeek uses many experts but activates only a subset per token, reducing compute while maintaining capacity. He predicts widespread adoption of these ideas and intensified synthetic-data generation using vast GPU fleets.

    • MoE basics: dense models use all parameters; MoE routes to a subset of experts
    • DeepSeek’s large expert count enables sparsity and efficiency
    • More parameters can help retain information; sparsity controls runtime cost
    • Competitors will replicate architecture and scale synthetic data + training
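
The routing idea behind Mixture-of-Experts can be sketched in a few lines. This is a generic top-k router under stated assumptions, not DeepSeek's architecture: the gate scores are supplied by hand (a real model computes them from the token with a learned router), each "expert" is a toy function that scales its input, and the expert count (8) and `k=2` are arbitrary illustrative choices.

```python
import math


def softmax(scores):
    """Convert raw gate scores to probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_scores, experts, k):
    """Route one token: run only the top-k experts and mix outputs by renormalized gate weight."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    output = sum(probs[i] / norm * experts[i](x) for i in top)
    return output, top


calls = []  # records which experts actually ran


def make_expert(idx, scale):
    """Build a toy expert that multiplies its input by a fixed scale and logs the call."""
    def expert(x):
        calls.append(idx)
        return scale * x
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(8)]
gate_scores = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]  # hypothetical router output
output, chosen = moe_forward(1.0, gate_scores, experts, k=2)
```

Only the two highest-scoring experts run for this token; the other six contribute parameters to the model's capacity but no compute to this forward pass, which is the sparsity trade-off the chapter describes.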
  15. 48:19 – 1:00:01

    Compute bottlenecks, AI arms-race nerves, and where value accrues in apps

    They note DeepSeek limiting signups (Chinese phone numbers) as a sign of inference scarcity. The conversation closes on dual optimism and fear: AI-enabled cyber offense, deniable escalation, and—on the upside—rapid product creation, with value accruing to polished, crafted user experiences even in a ‘wrapper’ world.

    • Inference scarcity is real: serving users scales with end-user count, not researchers
    • AI security risk: LLMs finding exploits and automating offensive cyber operations
    • Arms-race dynamics: deniability and low-friction attacks increase escalation risk
    • Value in the stack: craftsmanship, reliability, and ‘details’ differentiate products
