No Priors Ep. 31 | With Cerebras CEO Andrew Feldman
At a glance
WHAT IT’S REALLY ABOUT
Cerebras CEO on Beating GPU Bottlenecks With Wafer-Scale AI Supercomputers
- Andrew Feldman, CEO of Cerebras, explains how wafer-scale dataflow chips and disaggregated memory architectures can radically reduce AI training times and bypass today’s GPU supply crunch.
- He describes Cerebras’ partnership with G42 to build nine AI supercomputers totaling 36 exaflops, and how surplus compute capacity led them to train and open-source state-of-the-art models like BTLM and the largest Arabic LLM.
- Feldman contrasts chip startups with software businesses, outlines the economics and risks of building custom silicon, and analyzes NVIDIA’s dominance amid a deeply inflexible semiconductor supply chain.
- He also explores model scaling limits, inference costs, multilingual and culturally specific LLMs, and why proprietary enterprise data sets will become the key differentiator in the next wave of AI.
IDEAS WORTH REMEMBERING
5 ideas
Purpose-built wafer-scale chips can drastically simplify and accelerate AI training.
Cerebras’ dinner-plate–sized wafer-scale processor, with 850,000 dataflow tiles and abundant on-chip SRAM, runs models in a pure data-parallel fashion, eliminating complex tensor/model parallelism (the two approaches are contrasted in the sketch below) and cutting training projects from months to days.
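A minimal numpy sketch of the distinction Feldman draws; this is toy code, not Cerebras' software stack. In pure data parallelism each worker holds the full model and only the batch is split, while tensor parallelism also shards the weight matrices, so every worker touches every sample and partial results must be reassembled.

```python
import numpy as np

# Toy layer: y = x @ W. Illustrative only; not Cerebras' stack.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))    # full weight matrix
batch = rng.normal(size=(8, 512))  # one global batch

# Pure data parallelism: every worker holds ALL of W and a slice
# of the batch; the only cross-worker step is averaging gradients.
def data_parallel_forward(batch, W, n_workers=4):
    shards = np.array_split(batch, n_workers)       # split the batch
    return np.concatenate([x @ W for x in shards])  # each worker: full W

# Tensor (model) parallelism: W itself is sharded column-wise, so
# every worker sees EVERY sample and outputs must be reassembled,
# adding the coordination complexity wafer scale is said to avoid.
def tensor_parallel_forward(batch, W, n_workers=4):
    w_shards = np.array_split(W, n_workers, axis=1)  # split the weights
    return np.concatenate([batch @ w for w in w_shards], axis=1)

# Both compute the same result; they differ in what each worker stores.
assert np.allclose(data_parallel_forward(batch, W),
                   tensor_parallel_forward(batch, W))
```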
Disaggregating memory from compute unlocks flexible, larger models and easier debugging.
By storing parameters off-chip in a perimeter memory store, Cerebras lets users mix and match compute and memory independently, supporting trillion-parameter-scale models on a single system for experimentation without GPU-style memory limits (sketched below).
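A schematic Python sketch of the weight-streaming idea; the class and method names here are hypothetical stand-ins, not Cerebras' API. Parameters live in an external store and are streamed to the compute fabric one layer at a time, so model capacity is bounded by the store rather than by on-chip memory.

```python
import numpy as np

# Hypothetical external parameter store, standing in for the
# perimeter memory appliance described in the episode.
class ExternalParamStore:
    def __init__(self, layer_shapes, seed=0):
        rng = np.random.default_rng(seed)
        # Weights live OFF the compute device; capacity scales with
        # this store, not with on-chip SRAM.
        self._layers = [rng.normal(size=s) * 0.01 for s in layer_shapes]

    def stream(self):
        # Yield one layer's weights at a time to the compute fabric.
        yield from self._layers

def forward(x, store):
    # Compute holds only the current layer plus activations, so
    # depth and width are limited by the external store, not the chip.
    for W in store.stream():
        x = np.maximum(x @ W, 0.0)  # toy ReLU MLP layer
    return x

store = ExternalParamStore([(256, 256)] * 12)  # 12-layer toy model
out = forward(np.ones((4, 256)), store)
print(out.shape)  # (4, 256)
```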
The current AI compute crunch stems from a rigid, slow-moving chip supply chain.
TSMC’s advanced fabs are capital-intensive and capacity-constrained, so when demand (like NVIDIA’s) is massively underestimated, ramping supply takes 6–8 months; this structural inflexibility guarantees ongoing shortages and persistent under-forecasting.
Model size must be balanced against inference cost and business use cases.
Hyper-large models (e.g., 175B+ parameters) are expensive to train and, because per-token inference cost scales roughly with parameter count, up to ~17.5x more expensive to serve than 10B models; many production deployments will cluster around 3-13B parameters, where performance is strong but inference stays affordable (arithmetic below).
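A back-of-the-envelope check of the ~17.5x figure, assuming per-token inference cost scales linearly with parameter count (roughly 2 FLOPs per parameter per generated token, a common heuristic rather than anything quoted in the episode):

```python
# Rough per-token inference cost, assuming ~2 FLOPs per parameter
# per generated token (a common heuristic, not a quoted figure).
def flops_per_token(n_params):
    return 2 * n_params

large = flops_per_token(175e9)  # 175B-parameter model
small = flops_per_token(10e9)   # 10B-parameter model
print(large / small)            # 17.5 -- matching the ~17.5x claim

# The same linear heuristic puts the 3-13B "sweet spot" at roughly
# 13x-58x cheaper per token than a 175B model.
print(175e9 / 13e9, 175e9 / 3e9)  # ~13.5, ~58.3
```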
Multilingual and culturally specific LLMs require careful tokenization and data curation.
Underrepresented languages like Arabic, Hebrew, or Hindi have fewer available training tokens and more complex scripts, so naive tokenization and English-dominated corpora can wash out cultural nuance, creating demand for localized data sets and bespoke training efforts (see the tokenizer comparison below).
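One quick way to see the tokenization penalty, using OpenAI's tiktoken library; the episode does not name a specific tokenizer, so this is illustrative only. English-centric BPE vocabularies typically spend noticeably more tokens per word on Arabic text than on English:

```python
import tiktoken  # pip install tiktoken

# An English-centric BPE vocabulary; chosen for illustration,
# not because the episode references it.
enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Artificial intelligence is changing the world.",
    "Arabic":  "الذكاء الاصطناعي يغير العالم.",  # same sentence in Arabic
}

for lang, text in samples.items():
    tokens = enc.encode(text)
    words = text.split()
    # Underrepresented scripts usually cost more tokens per word,
    # which inflates training/inference cost and dilutes nuance.
    print(f"{lang}: {len(tokens)} tokens for {len(words)} words "
          f"({len(tokens) / len(words):.1f} tokens/word)")
```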
WORDS WORTH SAVING
5 quotes
Why would a machine built for pushing pixels to a monitor be ideal for AI?
— Andrew Feldman
How long it takes your customer to train a model is the answer.
— Andrew Feldman
We went to a technology that had never worked before, called wafer-scale, and we built a chip that's sort of the size of a dinner plate.
— Andrew Feldman
NVIDIA's made hay when the sun shines… I think they're now in a situation where they're extorting customers, they're extremely expensive, they're unable to ship.
— Andrew Feldman
Data is the new gold… companies that have years of research and exceptional data are gonna step into the fore.
— Andrew Feldman