No Priors Ep. 31 | With Cerebras CEO Andrew Feldman
At a glance
WHAT IT’S REALLY ABOUT
Cerebras CEO on Beating GPU Bottlenecks With Wafer-Scale AI Supercomputers
- Andrew Feldman, CEO of Cerebras, explains how wafer-scale dataflow chips and disaggregated memory architectures can radically reduce AI training times and bypass today’s GPU supply crunch.
- He describes Cerebras’ partnership with G42 to build nine AI supercomputers totaling 36 exaflops, and how surplus compute capacity led them to train and open-source state-of-the-art models like BTLM and the largest Arabic LLM.
- Feldman contrasts chip startups with software businesses, outlines the economics and risks of building custom silicon, and analyzes NVIDIA’s dominance amid a deeply inflexible semiconductor supply chain.
- He also explores model scaling limits, inference costs, multilingual and culturally specific LLMs, and why proprietary enterprise data sets will become the key differentiator in the next wave of AI.
IDEAS WORTH REMEMBERING
5 ideas
Purpose-built wafer-scale chips can drastically simplify and accelerate AI training.
Cerebras’ dinner-plate–sized wafer-scale processor, with 850,000 dataflow tiles and abundant on-chip SRAM, runs models in a pure data-parallel fashion, eliminating complex tensor/model parallelism (the two approaches are contrasted in the sketch below) and cutting training projects from months to days.
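A minimal numpy sketch of the distinction Feldman draws; this is toy code, not Cerebras' software stack. In pure data parallelism each worker holds the full model and only the batch is split, while tensor parallelism also shards the weight matrices, so every worker touches every sample and partial results must be reassembled.

```python
import numpy as np

# Toy layer: y = x @ W. Illustrative only; not Cerebras' stack.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))    # full weight matrix
batch = rng.normal(size=(8, 512))  # one global batch

# Pure data parallelism: every worker holds ALL of W and a slice
# of the batch; the only cross-worker step is averaging gradients.
def data_parallel_forward(batch, W, n_workers=4):
    shards = np.array_split(batch, n_workers)       # split the batch
    return np.concatenate([x @ W for x in shards])  # each worker: full W

# Tensor (model) parallelism: W itself is sharded column-wise, so
# every worker sees EVERY sample and outputs must be reassembled,
# adding the coordination complexity wafer scale is said to avoid.
def tensor_parallel_forward(batch, W, n_workers=4):
    w_shards = np.array_split(W, n_workers, axis=1)  # split the weights
    return np.concatenate([batch @ w for w in w_shards], axis=1)

# Both compute the same result; they differ in what each worker stores.
assert np.allclose(data_parallel_forward(batch, W),
                   tensor_parallel_forward(batch, W))
```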
Disaggregating memory from compute unlocks flexible, larger models and easier debugging.
By storing parameters off-chip in a perimeter memory store, Cerebras lets users mix and match compute and memory independently, supporting trillion-parameter-scale models on a single system for experimentation without GPU-style memory limits (sketched below).
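A schematic Python sketch of the weight-streaming idea; the class and method names here are hypothetical stand-ins, not Cerebras' API. Parameters live in an external store and are streamed to the compute fabric one layer at a time, so model capacity is bounded by the store rather than by on-chip memory.

```python
import numpy as np

# Hypothetical external parameter store, standing in for the
# perimeter memory appliance described in the episode.
class ExternalParamStore:
    def __init__(self, layer_shapes, seed=0):
        rng = np.random.default_rng(seed)
        # Weights live OFF the compute device; capacity scales with
        # this store, not with on-chip SRAM.
        self._layers = [rng.normal(size=s) * 0.01 for s in layer_shapes]

    def stream(self):
        # Yield one layer's weights at a time to the compute fabric.
        yield from self._layers

def forward(x, store):
    # Compute holds only the current layer plus activations, so
    # depth and width are limited by the external store, not the chip.
    for W in store.stream():
        x = np.maximum(x @ W, 0.0)  # toy ReLU MLP layer
    return x

store = ExternalParamStore([(256, 256)] * 12)  # 12-layer toy model
out = forward(np.ones((4, 256)), store)
print(out.shape)  # (4, 256)
```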
The current AI compute crunch stems from a rigid, slow-moving chip supply chain.
TSMC’s advanced fabs are capital-intensive and capacity-constrained, so when demand (like NVIDIA’s) is massively underestimated, ramping supply takes 6–8 months; this structural inflexibility guarantees ongoing shortages and persistent under-forecasting.
Model size must be balanced against inference cost and business use cases.
Hyper-large models (e.g., 175B+ parameters) are expensive to train and, because per-token inference cost scales roughly with parameter count, up to ~17.5x more expensive to serve than 10B models; many production deployments will cluster around 3-13B parameters, where performance is strong but inference stays affordable (arithmetic below).
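A back-of-the-envelope check of the ~17.5x figure, assuming per-token inference cost scales linearly with parameter count (roughly 2 FLOPs per parameter per generated token, a common heuristic rather than anything quoted in the episode):

```python
# Rough per-token inference cost, assuming ~2 FLOPs per parameter
# per generated token (a common heuristic, not a quoted figure).
def flops_per_token(n_params):
    return 2 * n_params

large = flops_per_token(175e9)  # 175B-parameter model
small = flops_per_token(10e9)   # 10B-parameter model
print(large / small)            # 17.5 -- matching the ~17.5x claim

# The same linear heuristic puts the 3-13B "sweet spot" at roughly
# 13x-58x cheaper per token than a 175B model.
print(175e9 / 13e9, 175e9 / 3e9)  # ~13.5, ~58.3
```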
Multilingual and culturally specific LLMs require careful tokenization and data curation.
Underrepresented languages like Arabic, Hebrew, or Hindi have fewer available training tokens and more complex scripts, so naive tokenization and English-dominated corpora can wash out cultural nuance, creating demand for localized data sets and bespoke training efforts (see the tokenizer comparison below).
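One quick way to see the tokenization penalty, using OpenAI's tiktoken library; the episode does not name a specific tokenizer, so this is illustrative only. English-centric BPE vocabularies typically spend noticeably more tokens per word on Arabic text than on English:

```python
import tiktoken  # pip install tiktoken

# An English-centric BPE vocabulary; chosen for illustration,
# not because the episode references it.
enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Artificial intelligence is changing the world.",
    "Arabic":  "الذكاء الاصطناعي يغير العالم.",  # same sentence in Arabic
}

for lang, text in samples.items():
    tokens = enc.encode(text)
    words = text.split()
    # Underrepresented scripts usually cost more tokens per word,
    # which inflates training/inference cost and dilutes nuance.
    print(f"{lang}: {len(tokens)} tokens for {len(words)} words "
          f"({len(tokens) / len(words):.1f} tokens/word)")
```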
WORDS WORTH SAVING
5 quotes
Why would a machine built for pushing pixels to a monitor be ideal for AI?
— Andrew Feldman
How long it takes your customer to train a model is the answer.
— Andrew Feldman
We went to a technology that had never worked before, called wafer-scale, and we built a chip that's sort of the size of a dinner plate.
— Andrew Feldman
NVIDIA's made hay when the sun shines… I think they're now in a situation where they're extorting customers, they're extremely expensive, they're unable to ship.
— Andrew Feldman
Data is the new gold… companies that have years of research and exceptional data are gonna step into the fore.
— Andrew Feldman