
Andrew Feldman, Cerebras Co-Founder and CEO: The AI Chip Wars & The Plan to Break Nvidia's Dominance
Andrew Feldman (guest), Harry Stebbings (host), Narrator
In this episode of The Twenty Minute VC, Harry Stebbings sits down with Andrew Feldman, co-founder and CEO of Cerebras, to discuss the AI chip wars and the plan to break Nvidia's dominance.
Cerebras CEO on Beating Nvidia, AI Inference, and Future Architectures
Andrew Feldman, co-founder and CEO of Cerebras, explains how AI’s explosive growth is reshaping chip design, data centers, and the balance of power in the semiconductor industry. He details why current GPU architectures—especially their reliance on off-chip HBM—are fundamentally suboptimal for large-scale inference and how Cerebras’ wafer-scale SRAM-based design aims to outcompete Nvidia on speed, efficiency, and cost. Feldman argues that we are still early across compute, algorithms, and data, with vast gains ahead from better architectures, more efficient models, and synthetic data. He also discusses export controls, China’s capabilities, the economics of AI infrastructure, and why he believes hardware providers will ultimately be more valuable than model companies over a five-year horizon.
Key Takeaways
GPU architecture is fundamentally misaligned with large-scale inference needs.
GPUs were built around off-chip HBM, which is high-capacity but relatively slow; generative inference requires repeatedly moving massive model weights from memory to compute, making memory bandwidth and off-chip I/O a hard bottleneck that wastes power and limits utilization.
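To see why off-chip bandwidth is the binding constraint, here is a minimal back-of-envelope sketch. All numbers are illustrative assumptions (roughly H100-class bandwidth and a 70B-parameter dense model), not figures from the episode: during batch-1 decoding, every generated token must stream all model weights from memory, so the token rate is capped at bandwidth divided by weight size.

```python
# Back-of-envelope: memory-bound decode ceiling for a dense model.
# All numbers are illustrative assumptions, not from the episode.

hbm_bandwidth_gb_s = 3350   # assumed: ~H100-class HBM bandwidth, GB/s
params_billion = 70         # assumed: 70B-parameter dense model
bytes_per_param = 2         # fp16/bf16 weights

weight_bytes_gb = params_billion * bytes_per_param  # GB of weights to stream

# Each decoded token reads every weight once (batch size 1, no batching
# or caching tricks), so tokens/sec is bounded by bandwidth / weight size.
max_tokens_per_s = hbm_bandwidth_gb_s / weight_bytes_gb

print(f"Weights: {weight_bytes_gb} GB")
print(f"Memory-bound ceiling: ~{max_tokens_per_s:.0f} tokens/s per device")
```

With these assumed inputs the ceiling is roughly 24 tokens/s per device regardless of how much raw compute the chip has, which is the sense in which inference is memory-bound rather than compute-bound.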
Wafer-scale chips with on-chip SRAM can dramatically improve inference performance and efficiency.
By using an entire wafer packed with fast SRAM and many identical tiles plus redundancy, Cerebras keeps far more data on-chip, cuts I/O power, overcomes yield issues, and can run large models on far fewer devices, unlocking speed and power advantages over GPU clusters.
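The same ceiling moves dramatically when the weights live in on-chip SRAM. A sketch under stated assumptions (the aggregate SRAM bandwidth below is a rough public wafer-scale figure, the HBM figure is the same assumption as above):

```python
# Sketch: how the memory-bound ceiling moves when weights sit in on-chip SRAM.
# Bandwidth figures are rough public/assumed numbers, not from the episode.

weight_bytes_gb = 140              # assumed: 70B params at 2 bytes each
hbm_bandwidth_gb_s = 3_350         # assumed: ~H100-class off-chip HBM
sram_bandwidth_gb_s = 21_000_000   # assumed: ~21 PB/s aggregate wafer-scale SRAM

for name, bw in [("off-chip HBM", hbm_bandwidth_gb_s),
                 ("on-chip SRAM", sram_bandwidth_gb_s)]:
    print(f"{name}: ceiling ~{bw / weight_bytes_gb:,.0f} tokens/s")
```

The point of the sketch is the ratio, not the absolute numbers: keeping weights on-chip raises the theoretical bandwidth ceiling by orders of magnitude, which is the basis of the speed and power claim.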
AI algorithms and utilization are still highly inefficient, leaving large headroom for improvement.
Current inference workloads often leave GPUs 93–95% idle, and many models are densely (all-to-all) connected, so every weight participates in every token; algorithmic advances such as sparsity, mixture-of-experts (MoE) routing, and better architectures can both reduce compute waste and improve quality, driving down cost per token significantly over time.
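A minimal sketch of the MoE point, with made-up illustrative parameter counts (not a description of any specific model): routing each token to a few experts means only a fraction of total parameters do work per token.

```python
# Sketch: why mixture-of-experts (MoE) routing cuts compute per token.
# Parameter counts below are illustrative assumptions.

total_params_b = 640     # assumed: total MoE parameters (billions)
n_experts = 16           # assumed: experts per MoE layer
experts_per_token = 2    # assumed: experts activated per token (top-2 routing)

# Only the routed experts' weights participate in each token's forward pass,
# so active compute scales with experts_per_token / n_experts (ignoring
# shared attention/embedding layers for simplicity).
active_fraction = experts_per_token / n_experts
active_params_b = total_params_b * active_fraction

print(f"Active parameters per token: ~{active_params_b:.0f}B "
      f"({active_fraction:.0%} of {total_params_b}B total)")
```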
Inference will dwarf training as AI shifts from novelty to daily workflow.
Feldman frames inference demand as the number of users × usage frequency × compute per use; all three are rising simultaneously as AI moves into mainstream workflows, meaning inference capacity and efficiency will dominate infrastructure spending in the coming years.
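A worked instance of that framing, with made-up illustrative inputs (none of these values come from the episode), shows why the growth compounds multiplicatively:

```python
# Sketch of Feldman's inference-demand framing:
#   demand = users x usage frequency x compute per use.
# All inputs below are made-up illustrative values.

users = 500_000_000      # assumed: people using AI tools
uses_per_day = 20        # assumed: interactions per user per day
tokens_per_use = 2_000   # assumed: tokens generated per interaction

daily_tokens = users * uses_per_day * tokens_per_use
print(f"Daily inference demand: {daily_tokens:.2e} tokens")

# Because all three factors are rising at once, demand compounds
# multiplicatively: doubling each factor yields an 8x increase.
```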
Synthetic data will likely become the dominant source for training.
He expects most training data to be synthetic within five years, particularly for rare or dangerous scenarios (like edge cases in flying or driving), where simulators can generate high-value examples more efficiently than collecting real-world data.
Nvidia’s biggest moat is market share and default status, not just CUDA.
Feldman argues that in inference there is effectively no CUDA lock-in—models can be moved across providers with minimal changes—but Nvidia’s entrenched position, installed base, and ecosystem habits create a powerful, often under-discussed moat.
Hardware providers are likely to accrue more long-term value than model companies.
Over a five-year horizon, he expects chip and systems companies to be larger than model vendors in enterprise value, as infrastructure economics stabilize and markets “weigh” durable margin and growth more than short-term hype around models.
Notable Quotes
“Our AI algorithms today are not particularly efficient. In a GPU, most of the time it's doing inference, it's 5 or 7% utilized.”
— Andrew Feldman
“The fundamental architecture of the GPU with off-chip memory is not great for inferencing… it can be beaten and I think they know it.”
— Andrew Feldman
“We won't be as dependent on transformers in three years or five years as we are now, 100%.”
— Andrew Feldman
“There are no examples in compute in 50 years in which by making things cheaper and faster, the market got smaller.”
— Andrew Feldman
“In five years from now Nvidia's gonna have somewhere between 50 and 60% of the market. Right now they have approximately all of it.”
— Andrew Feldman
Questions Answered in This Episode
If wafer-scale SRAM-based designs prove superior for inference, how might Nvidia adapt its architecture or product strategy to defend its dominance?
What concrete algorithmic innovations beyond transformers and current MoEs does Feldman expect to have the biggest impact on utilization and cost over the next five years?
How should policymakers balance AI export controls with the reality that nations like China can rapidly build domestic alternatives and work around restrictions?
As synthetic data becomes the majority of training data, what mechanisms will ensure that models remain grounded in real-world behavior and avoid compounding synthetic biases?
For enterprises planning AI infrastructure, when does it make sense to move off Nvidia GPUs for inference, and what trade-offs should they evaluate between performance, ecosystem, and vendor risk?
Transcript Preview
Our AI algorithms today are not particularly efficient. In a GPU, most of the time it's doing inference, it's 5 or 7% utilized. That means it's 95 or 93% wasted. We won't be as dependent on transformers in three years or five years as we are now, 100%. The fundamental architecture of the GPU with off-chip memory is not great for inferencing. Now, they will continue to do well in inference, but it can be beaten and I think they know it.
Ready to go? (upbeat music) Andrew, it is such a pleasure to meet, man. I've wanted to do this one for a while. I've heard so many good things from Eric for a long time, so thank you so much for joining me.
Oh, ha- Harry, thank you for having me. I appreciate it.
Not at all. This will be a fantastic conversation. I have my pen ready.
(laughs)
I feel like this is gonna be a learning experience for me. Um, I wanna go back to 2015. What did you and the team see in the AI landscape in 2015 that led to the founding of Cerebras?
We, we saw the rise of a new workload, and th- this is every computer architect's dream. We, we, we saw a, a new problem to solve, and what that means is, is m- maybe you can build a new machine better suited to that problem. And so in 2015, and the credit goes to Gary and Shawn and JP and Michael, my co-founders, they, they saw on the horizon the rise of AI. And that, what that meant was there'd be a new problem for computers, that what the AI software would ask from the underlying chip, the processor, would be different. And we came to believe that we could build a, a better machine for that problem. That's what we saw. You know, obviously, we didn't see it exactly right. I underestimated it. You know, this is my fifth startup and the, the first time I, I underestimated the size of the market by a lot. (laughs) Um, you know, it, uh ... But what we did get right was that this was gonna be big and it would put a different type of pressure on a processor, and that it would put pressure on the memory bandwidth, that it would put pressure on the communication structure. So, that's what we saw. We dove in. Uh, it's been an extraordinary nine years.
It's been an extraordinary nine years. Can you just help me understand, how does the movement into an age of AI change the requirements from a chip perspective of what is needed for a provider, and how that then resulted in how you built Cerebras?
The way to think about a, a, a chip is it does two things. It does calculations and it moves data. Right? Th- th- this is wha- what, what, what a chip does. And, uh, sometimes along the way, it stores data. And so, uh, what AI presented was a very unusual combination of challenges. First, the underlying calculation is trivial. It's a matrix multiplication. And an FMAC can be developed by any second-year electrical engineering student. So, you say to yourself, "Holy cow, this has a huge number of very, very simple calculations." The hard part with AI work is results and intermediate results have to be moved a lot. And th- therein is the most complicated part. They have to be moved to memory and from memory, and they have to be broken up and moved among GPUs. And what we saw was that this was gonna be the hard problem. And that if we could solve for that problem, we would build an AI computer that was faster and used less power.