The Twenty Minute VC
Andrew Feldman, Cerebras Co-Founder and CEO: The AI Chip Wars & The Plan to Break Nvidia's Dominance
At a glance
WHAT IT’S REALLY ABOUT
Cerebras CEO on Beating Nvidia, AI Inference, and Future Architectures
- Andrew Feldman, co-founder and CEO of Cerebras, explains how AI’s explosive growth is reshaping chip design, data centers, and the balance of power in the semiconductor industry. He details why current GPU architectures—especially their reliance on off-chip HBM—are fundamentally suboptimal for large-scale inference and how Cerebras’ wafer-scale SRAM-based design aims to outcompete Nvidia on speed, efficiency, and cost. Feldman argues that we are still early across compute, algorithms, and data, with vast gains ahead from better architectures, more efficient models, and synthetic data. He also discusses export controls, China’s capabilities, the economics of AI infrastructure, and why he believes hardware providers will ultimately be more valuable than model companies over a five-year horizon.
IDEAS WORTH REMEMBERING
5 ideas
GPU architecture is fundamentally misaligned with large-scale inference needs.
GPUs were built around off-chip HBM, which is high-capacity but relatively slow; generative inference requires repeatedly moving massive model weights from memory to compute, making memory bandwidth and off-chip I/O a hard bottleneck that wastes power and limits utilization.
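A back-of-the-envelope sketch of why this bottleneck bites: when every generated token must stream all of a dense model's weights from memory, single-stream decode speed is capped by bandwidth divided by weight bytes. The model size and bandwidth figures below are illustrative assumptions, not numbers from the episode.

```python
# Memory-bound ceiling for autoregressive decoding: each token requires
# streaming the full set of (active) weights from memory to compute, so
# tokens/sec <= memory_bandwidth / weight_bytes. Figures are assumptions.

def max_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       mem_bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode speed for a dense model."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return mem_bandwidth_tb_s * 1e12 / weight_bytes

# Assumed: a 70B-parameter dense model in fp16 (2 bytes/param) on ~3.35 TB/s HBM.
print(f"~{max_tokens_per_sec(70, 2, 3.35):.0f} tokens/s ceiling")  # ~24 tokens/s
```

No matter how many FLOPs the chip offers, the decode loop cannot beat this ceiling without batching, quantization, or keeping the weights somewhere faster.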
Wafer-scale chips with on-chip SRAM can dramatically improve inference performance and efficiency.
By using an entire wafer packed with fast SRAM and many identical tiles, with redundant tiles routed around defects to overcome yield problems, Cerebras keeps far more data on-chip, cuts off-chip I/O power, and can run large models on far fewer devices, unlocking speed and power advantages over GPU clusters.
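Plugging approximate public bandwidth figures (treated here as assumptions, and ignoring interconnect, batching, and kernel efficiency) into the same memory-bound ceiling shows the scale of the gap Feldman is pointing at:

```python
# Same tokens/sec ceiling as above, contrasting off-chip HBM with aggregate
# on-chip SRAM. Bandwidths are rough public figures, used only for scale;
# a 70B fp16 model would be sharded across devices in either case.

HBM_TB_S = 3.35       # assumed HBM bandwidth of a modern datacenter GPU, TB/s
SRAM_TB_S = 21_000.0  # assumed aggregate SRAM bandwidth of a wafer-scale chip, TB/s

weight_bytes = 70e9 * 2  # 70B parameters at 2 bytes each (fp16)

for name, tb_s in [("off-chip HBM", HBM_TB_S), ("wafer-scale SRAM", SRAM_TB_S)]:
    print(f"{name:>16}: ~{tb_s * 1e12 / weight_bytes:,.0f} tokens/s ceiling")
```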
AI algorithms and utilization are still highly inefficient, leaving large headroom for improvement.
Current inference workloads often leave GPUs 93–95% idle, and most models are dense (all-to-all connected), touching every weight for every token; algorithmic advances such as sparsity, MoEs, and better architectures can both cut compute waste and improve quality, driving cost per token down significantly over time.
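The MoE point can be made concrete with toy arithmetic: per-token weight traffic (and thus the memory-bound cost above) tracks the parameters a token actually activates, not the model's total size. The architecture numbers below are hypothetical.

```python
# Mixture-of-experts: each token is routed to a few experts, so only the
# shared weights plus the chosen experts move per token. Hypothetical sizes.

def moe_params(shared_b: float, n_experts: int,
               experts_per_token: int, expert_size_b: float):
    """Return (total, active) parameter counts in billions."""
    total = shared_b + n_experts * expert_size_b
    active = shared_b + experts_per_token * expert_size_b
    return total, active

total, active = moe_params(shared_b=8, n_experts=64,
                           experts_per_token=2, expert_size_b=2.0)
print(f"total {total:.0f}B, active {active:.0f}B "
      f"-> ~{total / active:.1f}x less weight traffic per token")  # ~11.3x
```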
Inference will dwarf training as AI shifts from novelty to daily workflow.
Feldman frames inference demand as the number of users × usage frequency × compute per use; all three are rising simultaneously as AI moves into mainstream workflows, meaning inference capacity and efficiency will dominate infrastructure spending in the coming years.
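That framing is literally a three-factor product, so growth in each factor compounds multiplicatively. A minimal sketch with made-up growth rates:

```python
# Feldman's inference-demand framing: demand = users x usage frequency x
# compute per use. Growth rates below are invented to show the compounding.

def inference_demand(users: float, uses_per_day: float,
                     tokens_per_use: float) -> float:
    """Tokens of inference demanded per day."""
    return users * uses_per_day * tokens_per_use

base  = inference_demand(1e8,     2,     1e3)        # assumed starting point
later = inference_demand(1e8 * 3, 2 * 4, 1e3 * 10)   # all three factors rise

print(f"~{later / base:.0f}x demand growth")  # 3 * 4 * 10 = 120x
```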
Synthetic data will likely become the dominant source for training.
He expects most training data to be synthetic within five years, particularly for rare or dangerous scenarios (like edge cases in flying or driving), where simulators can generate high-value examples more efficiently than collecting real-world data.
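The efficiency argument reduces to rare-event arithmetic: if an edge case occurs with probability p in the real world, collecting N examples of it means logging roughly N/p real samples, while a simulator can emit the targeted cases directly. The numbers below are illustrative assumptions.

```python
# Cost of collecting rare edge cases from real-world data versus generating
# them in a simulator. Both figures are assumptions for illustration.

p = 1e-6         # assumed real-world frequency of the edge case
n_needed = 1e4   # assumed number of examples needed for training

print(f"real-world logging: ~{n_needed / p:.0e} samples to capture the event")  # ~1e+10
print(f"simulator: ~{n_needed:.0e} targeted examples generated directly")       # ~1e+04
```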
WORDS WORTH SAVING
5 quotes
Our AI algorithms today are not particularly efficient. In a GPU, most of the time it's doing inference, it's 5 or 7% utilized.
— Andrew Feldman
The fundamental architecture of the GPU with off-chip memory is not great for inferencing… it can be beaten and I think they know it.
— Andrew Feldman
We won't be as dependent on transformers in three years or five years as we are now, 100%.
— Andrew Feldman
There are no examples in compute in 50 years in which by making things cheaper and faster, the market got smaller.
— Andrew Feldman
In five years from now Nvidia's gonna have somewhere between 50 and 60% of the market. Right now they have approximately all of it.
— Andrew Feldman