No Priors Ep. 31 | With Cerebras CEO Andrew Feldman

No Priors · Sep 7, 2023 · 30m

Elad Gil (host), Andrew Feldman (guest), Sarah Guo (host)

- Cerebras’ wafer-scale chip architecture and dataflow design for AI workloads
- GPU constraints, NVIDIA’s market power, and the global AI compute crunch
- Partnership with G42 and deployment of massive AI supercomputers
- Open-source models (BTLM, Arabic LLM) and ease of training on Cerebras systems
- Model size trade-offs, inference costs, and the future of training vs. inference hardware
- Challenges and structure of building a semiconductor startup vs. software startups
- Multilingual and culturally grounded LLMs and the strategic value of proprietary data

Cerebras CEO on Beating GPU Bottlenecks With Wafer-Scale AI Supercomputers

Andrew Feldman, CEO of Cerebras, explains how wafer-scale dataflow chips and disaggregated memory architectures can radically reduce AI training times and bypass today’s GPU supply crunch.

He describes Cerebras’ partnership with G42 to build nine AI supercomputers totaling 36 exaflops, and how surplus compute capacity led them to train and open-source state-of-the-art models like BTLM and the largest Arabic LLM.

Feldman contrasts chip startups with software businesses, outlines the economics and risks of building custom silicon, and analyzes NVIDIA’s dominance amid a deeply inflexible semiconductor supply chain.

He also explores model scaling limits, inference costs, multilingual and culturally specific LLMs, and why proprietary enterprise data sets will become the key differentiator in the next wave of AI.

Key Takeaways

Purpose-built wafer-scale chips can drastically simplify and accelerate AI training.

Cerebras’ dinner-plate–sized wafer-scale processor with 850,000 dataflow tiles and abundant on-chip SRAM runs models in pure data parallel fashion, eliminating complex tensor/model parallelism and cutting training projects from months to days.
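The simplicity of pure data parallelism can be sketched in a few lines (a toy illustration, not Cerebras software): every worker holds the full model and a shard of the batch, and with equal shard sizes, averaging the per-shard gradients reproduces the full-batch gradient exactly, so no tensor or layer splitting is needed.

```python
# Toy data-parallel step for a scalar linear model (illustrative only).
# Each "worker" gets an equal shard of the batch; averaging the
# per-shard gradients equals the full-batch gradient.

def mse_grad(xs, ys, w):
    # d/dw of 0.5 * mean((w*x - y)^2)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.9, 3.2, 4.1, 4.8, 6.2, 7.1, 7.9]
w = 0.3

# Full-batch gradient: what a single large chip computes directly.
g_full = mse_grad(xs, ys, w)

# Data-parallel: split across 4 workers with equal shards, then average.
shards = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, 8, 2)]
g_avg = sum(mse_grad(sx, sy, w) for sx, sy in shards) / len(shards)

print(abs(g_full - g_avg) < 1e-12)  # True
```

The equivalence holds because the mean over the whole batch decomposes into a mean of equal-sized shard means; that is what lets a data-parallel system scale without partitioning the model itself.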

Disaggregating memory from compute unlocks flexible, larger models and easier debugging.

By storing parameters off-chip in a perimeter memory store, Cerebras lets users mix-and-match compute and memory independently, supporting trillion-parameter-scale models on a single system for experimentation without GPU-style memory limits.

The current AI compute crunch stems from a rigid, slow-moving chip supply chain.

TSMC’s advanced fabs are capital-intensive and capacity-constrained, so when demand (like NVIDIA’s) is massively underestimated, ramping supply takes 6–8 months; this structural inflexibility guarantees ongoing shortages and persistent under-forecasting.

Model size must be balanced against inference cost and business use cases.

Hyper-large models (e. ...

Multilingual and culturally specific LLMs require careful tokenization and data curation.

Underrepresented languages like Arabic, Hebrew, or Hindi have fewer tokens and more complex scripts, so naive tokenization and English-dominated corpora can wash out cultural nuance—creating demand for localized data sets and bespoke training efforts.
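One concrete mechanism behind this penalty can be shown with plain UTF-8 (the strings below are just examples): ASCII letters encode in one byte each, while Arabic letters take two, so a tokenizer that falls back to bytes for scripts its vocabulary under-covers spends roughly twice as many units per character on Arabic text.

```python
# Byte-level view of why naive tokenization penalizes Arabic:
# UTF-8 encodes each Arabic letter in 2 bytes, each ASCII letter in 1.
# A byte-fallback tokenizer with no learned Arabic merges therefore
# pays ~2x the units per character for Arabic text.

english = "hello"   # 5 letters
arabic = "مرحبا"    # 5 letters ("hello" in Arabic)

en_bytes = len(english.encode("utf-8"))  # 5  -> 1 byte per letter
ar_bytes = len(arabic.encode("utf-8"))   # 10 -> 2 bytes per letter

print(en_bytes, ar_bytes)  # 5 10
```

Shorter effective context and higher per-token cost follow directly, which is why localized vocabularies and curated corpora matter for these languages.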

Chip startups demand huge upfront capital, long timelines, and heavy QA investment.

Unlike SaaS, a new chip can take 2–3 years and $50–60M before first customer feedback; bugs can cost tens of millions in re-spins, so engineering organizations skew toward verification and simulation, with software still comprising ~75% of headcount.

Enterprise-owned data will be a primary competitive edge in AI.

As internet-scale scraped data becomes a commodity, organizations like Bloomberg, Reuters, and major pharma companies with rich proprietary data will drive differentiated models via training from scratch, continuous training, or high-value fine-tuning.

Notable Quotes

Why would a machine built for pushing pixels to a monitor be ideal for AI?

Andrew Feldman

How long it takes your customer to train a model is the answer.

Andrew Feldman

We went to a technology that had never worked before, called wafer-scale, and we built a chip that's sort of the size of a dinner plate.

Andrew Feldman

NVIDIA's made hay when the sun shines… I think they're now in a situation where they're extorting customers, they're extremely expensive, they're unable to ship.

Andrew Feldman

Data is the new gold… companies that have years of research and exceptional data are gonna step into the fore.

Andrew Feldman

Questions Answered in This Episode

How does Cerebras’ wafer-scale approach practically change the workflow and organizational structure of ML teams compared to GPU-first setups?

What specific kinds of enterprise data sets are most likely to produce outsized gains when used for fine-tuning versus full training from scratch?

Given the rigid semiconductor supply chain, how should AI startups plan their product roadmaps around future compute constraints and delays?

What technical and ethical considerations arise when building culturally specific or national LLMs that encode local norms and values?

At what point do model compression, pruning, and distillation become more attractive than scaling model size further for real-world applications?

Transcript Preview

Elad Gil

The shortage of compute for AI, also known as the GPU crunch, is increasingly impacting AI companies big and small. It is delaying training runs and launches for multiple players in the generative AI world. One company coming to the rescue is Cerebras Systems, which is developing the largest computer chip and one of the fastest purpose-built AI processors ever. This week, Sarah and I are joined by Andrew Feldman, CEO of Cerebras Systems. Andrew is one of the few entrepreneurial veterans in the semiconductor world. He previously started SeaMicro, a pioneer of energy-efficient, high-bandwidth micro servers. SeaMicro was acquired by AMD in 2012. Andrew, thank you so much for joining us today.

Andrew Feldman

Elad and Sarah, thank you so much for having me.

Elad Gil

I, I think you all recently announced that Cerebras has closed a $100 million deal with G42 to develop one of the largest AI supercomputers in the world.

Andrew Feldman

We, we did announce a strategic partnership with a group called G42, (clears throat) and we announced that we were building nine, uh, supercomputers. Each supercomputer would be four exaflops of AI compute, so (laughs) in total, 36, uh, exaflops of AI compute. Th- th- that was extraordinarily exciting. When you encounter a partner that, that shares your vision and w- wants to build with you (laughs) a- and, uh, you get to start building the biggest computer on earth. I mean, it had (laughs) ... It doesn't get better than that.

Elad Gil

And I think, in general, you all were very forward-thinking and early to identifying AI as a really important market for custom semiconductors. Could you tell us a little bit more about your thinking early on as you started Cerebras and why you focused on AI many years ago?

Andrew Feldman

In, in late 2015, five of us started meeting regularly. I think we were meeting in, in Sarah's offices, actually, at that point. All of us had worked together at our previous company, and we began w- working on, on ideas. And, uh, one day our CTO, Gary, leaned back and he said, "Why would a machine built for pushing pixels to a monitor be ideal for AI?" Then he said, "Wouldn't it be serendipitous if 20 years of optimizing a part for one job left it really well-suited for another?" And that got us excited. We, we began looking at GPUs, looking at AI work, and by, uh, early 2016, we, we'd decided that w- we could build a better part for this work. And our, our strategy was not to build a little bit better, but it, it was to try and do something vastly better. And we went to a technology that had never worked before, called wafer-scale, and we built a chip that's, uh, sort of the size of a dinner plate, whereas most chips are the size of a postage stamp. And we, we, we did that because we knew that this workload would go big (laughs) and that the problems of memory bandwidth and problems with breaking up work and spreading it over lots of little machines would be daunting. This is my fifth startup. This is the first time I was wrong on, uh, the market size on the under side, uh, had no idea it was gonna be this big. And I, I think very few people saw, even those of us who were in it, how big this was gonna be. And so ear- early 2016, we went out, we did eight pitches. We ra- we got eight term sheets. We, we raised money and we're building multi-exaflop AI supercomputers for, for customers around the world.
