No Priors Ep. 31 | With Cerebras CEO Andrew Feldman
Elad Gil talks with Andrew Feldman, CEO of Cerebras, about beating GPU bottlenecks with wafer-scale AI supercomputers.
Cerebras CEO on Beating GPU Bottlenecks With Wafer-Scale AI Supercomputers
Andrew Feldman, CEO of Cerebras, explains how wafer-scale dataflow chips and disaggregated memory architectures can radically reduce AI training times and bypass today’s GPU supply crunch.
He describes Cerebras’ partnership with G42 to build nine AI supercomputers totaling 36 exaflops, and how surplus compute capacity led them to train and open-source state-of-the-art models like BTLM and the largest Arabic LLM.
Feldman contrasts chip startups with software businesses, outlines the economics and risks of building custom silicon, and analyzes NVIDIA’s dominance amid a deeply inflexible semiconductor supply chain.
He also explores model scaling limits, inference costs, multilingual and culturally specific LLMs, and why proprietary enterprise data sets will become the key differentiator in the next wave of AI.
Key Takeaways
Purpose-built wafer-scale chips can drastically simplify and accelerate AI training.
Cerebras’ dinner-plate–sized wafer-scale processor with 850,000 dataflow tiles and abundant on-chip SRAM runs models in pure data parallel fashion, eliminating complex tensor/model parallelism and cutting training projects from months to days.
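To make the contrast concrete, here is a minimal NumPy sketch of pure data parallelism (the model Feldman describes), where every worker holds the full weights and only gradients are averaged. This is an illustrative toy, not Cerebras' actual software stack; the names and the toy loss are hypothetical.

```python
import numpy as np

# Illustrative sketch (not Cerebras' real API): pure data parallelism.
# Each worker holds the FULL weight matrix and a shard of the batch;
# the only cross-worker step is averaging gradients. Contrast with
# tensor/model parallelism, where individual layers are split across
# devices and activations must be exchanged mid-layer.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))        # full weights, replicated on every worker
batch = rng.normal(size=(8, 4))    # global batch of 8 samples

def grad(W, x):
    # toy gradient of the loss 0.5 * ||x @ W||^2 with respect to W
    return x.T @ (x @ W)

# Data parallel: split the batch across 2 "workers", average local grads.
shards = np.split(batch, 2)
g_dp = sum(grad(W, s) for s in shards) / len(shards)

# Single-device reference: same math on the unsplit batch.
g_ref = grad(W, batch) / len(shards)

assert np.allclose(g_dp, g_ref)   # data parallelism changes nothing numerically
```

Because the per-shard gradients simply sum to the full-batch gradient, data parallelism adds no mathematical complexity; that is why eliminating tensor/model parallelism simplifies the workflow so much.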
Disaggregating memory from compute unlocks flexible, larger models and easier debugging.
By storing parameters off-chip in a perimeter memory store, Cerebras lets users mix and match compute and memory independently, supporting trillion-parameter-scale models on a single system for experimentation without GPU-style memory limits.
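The idea can be sketched as weight streaming: parameters live in an external store and flow to the compute device one layer at a time, so model capacity is bounded by the store rather than by on-chip memory. The class and function names below are hypothetical, chosen only to illustrate the separation of memory from compute.

```python
import numpy as np

# Hypothetical sketch of disaggregated memory / weight streaming.
# All parameters live in an external store; the "compute" side only
# ever holds ONE layer's weights at a time, so total model size is
# limited by the store's capacity, not by on-chip memory.

rng = np.random.default_rng(1)

class ParameterStore:
    """Stands in for an external perimeter memory store."""
    def __init__(self, n_layers, dim):
        self.layers = [rng.normal(size=(dim, dim)) * 0.1
                       for _ in range(n_layers)]

    def stream(self):
        for W in self.layers:   # layers streamed to compute one at a time
            yield W

def forward(store, x):
    # Compute never materializes more than one layer's weights.
    for W in store.stream():
        x = np.tanh(x @ W)
    return x

store = ParameterStore(n_layers=4, dim=8)
out = forward(store, rng.normal(size=(2, 8)))
print(out.shape)   # (2, 8)
```

Scaling the model then means growing the store, not the chip, which is why compute and memory can be sized independently.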
The current AI compute crunch stems from a rigid, slow-moving chip supply chain.
TSMC’s advanced fabs are capital-intensive and capacity-constrained, so when demand (like NVIDIA’s) is massively underestimated, ramping supply takes 6–8 months; this structural inflexibility guarantees ongoing shortages and persistent under-forecasting.
Model size must be balanced against inference cost and business use cases.
Hyper-large models (e.g., …) carry serving costs that scale with parameter count, so model size should be matched to the economics of the target use case.
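A back-of-the-envelope calculation shows why this balance matters, using the standard rule of thumb of roughly 2 FLOPs per parameter per generated token. The parameter counts below are illustrative examples, not figures from the episode.

```python
# Rough inference-cost estimate: ~2 FLOPs per parameter per token
# (standard rule of thumb; model sizes here are illustrative).

def inference_flops(params, tokens):
    return 2 * params * tokens

big = inference_flops(175e9, 1000)    # 175B-parameter model, 1k tokens
small = inference_flops(3e9, 1000)    # 3B-parameter model, same output

print(f"large: {big:.2e} FLOPs, small: {small:.2e} FLOPs")
print(f"cost ratio: {big / small:.0f}x")   # ~58x per generated token
```

Because serving cost grows linearly with parameter count, a model an order of magnitude smaller that meets the quality bar for a given business use case is dramatically cheaper to run at scale.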
Multilingual and culturally specific LLMs require careful tokenization and data curation.
Underrepresented languages like Arabic, Hebrew, or Hindi have fewer tokens and more complex scripts, so naive tokenization and English-dominated corpora can wash out cultural nuance—creating demand for localized data sets and bespoke training efforts.
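One concrete way the tokenization penalty shows up: in UTF-8, Arabic characters occupy two bytes each, so a naive byte-level tokenizer emits roughly twice as many tokens per character for Arabic as for English. This sketch only illustrates that encoding-level effect; production tokenizers (BPE and similar) mitigate it with language-aware vocabularies.

```python
# Why naive byte-level tokenization penalizes non-Latin scripts:
# Arabic characters are 2 bytes in UTF-8, so a byte-level tokenizer
# produces roughly twice as many tokens per character as for English.

english = "hello"    # 5 characters
arabic = "مرحبا"     # 5 characters ("hello" in Arabic)

print(len(english), len(english.encode("utf-8")))   # 5 chars -> 5 bytes
print(len(arabic), len(arabic.encode("utf-8")))     # 5 chars -> 10 bytes
```

A model with a fixed context window therefore sees effectively shorter documents in such languages, one of the reasons bespoke tokenizers and curated corpora matter for models like the Arabic LLM discussed in the episode.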
Chip startups demand huge upfront capital, long timelines, and heavy QA investment.
Unlike SaaS, a new chip can take 2–3 years and $50–60M before first customer feedback; bugs can cost tens of millions in re-spins, so engineering organizations skew toward verification and simulation, with software still comprising ~75% of headcount.
Enterprise-owned data will be a primary competitive edge in AI.
As internet-scale scraped data becomes a commodity, organizations like Bloomberg, Reuters, and major pharma companies with rich proprietary data will drive differentiated models via training from scratch, continuous training, or high-value fine-tuning.
Notable Quotes
“Why would a machine built for pushing pixels to a monitor be ideal for AI?”
— Andrew Feldman
“How long it takes your customer to train a model is the answer.”
— Andrew Feldman
“We went to a technology that had never worked before, called wafer-scale, and we built a chip that's sort of the size of a dinner plate.”
— Andrew Feldman
“NVIDIA's made hay when the sun shines… I think they're now in a situation where they're extorting customers, they're extremely expensive, they're unable to ship.”
— Andrew Feldman
“Data is the new gold… companies that have years of research and exceptional data are gonna step into the fore.”
— Andrew Feldman
Questions Answered in This Episode
How does Cerebras’ wafer-scale approach practically change the workflow and organizational structure of ML teams compared to GPU-first setups?
What specific kinds of enterprise data sets are most likely to produce outsized gains when used for fine-tuning versus full training from scratch?
Given the rigid semiconductor supply chain, how should AI startups plan their product roadmaps around future compute constraints and delays?
What technical and ethical considerations arise when building culturally specific or national LLMs that encode local norms and values?
At what point do model compression, pruning, and distillation become more attractive than scaling model size further for real-world applications?