Dwarkesh Podcast

Jeff Dean & Noam Shazeer — 25 years at Google: from PageRank to AGI

This week I welcome two of the most important technologists in any field. Jeff Dean is Google's Chief Scientist, and over 25 years at the company he has worked on many of the most transformative systems in modern computing, from MapReduce, BigTable, TensorFlow, and AlphaChip to Gemini. Noam Shazeer invented or co-invented the main architectures and techniques used in modern LLMs, from the Transformer itself to Mixture of Experts, Mesh TensorFlow, Gemini, and much more. We talk about their 25 years at Google, going from PageRank to MapReduce to the Transformer to MoEs to AlphaChip – and soon to ASI.

EPISODE LINKS

* Transcript: https://www.dwarkesh.com/p/jeff-dean-and-noam-shazeer
* Apple Podcasts: https://podcasts.apple.com/us/podcast/jeff-dean-noam-shazeer-25-years-at-google-from-pagerank/id1516093381?i=1000691556147
* Spotify: https://open.spotify.com/episode/4atx1POpKIL8WGvdVfdnbb?si=DLn5uQYMQMWKPTTkj5pt_A

SPONSORS

* Meter wants to radically improve the digital world we take for granted. They’re developing a foundation model that automates network management end-to-end. To do this, they just announced a long-term partnership with Microsoft for tens of thousands of GPUs, and they’re recruiting a world-class AI research team. To learn more, go to https://meter.com/dwarkesh
* Scale partners with major AI labs like Meta, Google DeepMind, and OpenAI. Through Scale’s Data Foundry, labs get access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you’re an AI researcher or engineer, learn how Scale’s Data Foundry and research lab, SEAL, can help you go beyond the current frontier at https://scale.com/dwarkesh
* Curious how Jane Street teaches their new traders? They use Figgie, a rapid-fire card game that simulates the most exciting parts of markets and trading. It’s become so popular that Jane Street hosts an inter-office Figgie championship every year. Download it from the app store or play on your desktop at https://www.figgie.com/

To sponsor a future episode, visit https://www.dwarkesh.com/p/advertise

TIMESTAMPS

00:00:00 - Intro
00:03:29 - Joining Google in 1999
00:06:20 - Future of Moore's Law
00:11:04 - Future TPUs
00:13:56 - Jeff’s undergrad thesis: parallel backprop
00:15:54 - LLMs in 2007
00:25:09 - “Holy shit” moments
00:27:28 - AI fulfills Google’s original mission
00:32:00 - Doing Search in-context
00:36:12 - The internal coding model
00:37:29 - What will 2027 models do?
00:43:20 - A new architecture every day?
00:49:10 - Automated chips and intelligence explosion
00:53:07 - Future of inference scaling
01:02:38 - Already doing multi-datacenter runs
01:08:15 - Debugging at scale
01:12:41 - Fast takeoff and superalignment
01:20:51 - A million evil Jeff Deans
01:24:22 - Fun times at Google
01:27:51 - World compute demand in 2030
01:34:37 - Getting back to modularity
01:44:48 - Keeping a giga-MoE in-memory
01:49:35 - All of Google in one model
01:57:59 - What’s missing from distillation
02:03:10 - Open research, pros and cons
02:09:58 - Going the distance

Noam Shazeer (guest) · Jeff Dean (guest) · Dwarkesh Patel (host)
Feb 11, 2025 · 2h 15m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Jeff Dean and Noam Shazeer Envision AI’s Self‑Improving, Compute‑Hungry Future

  1. Jeff Dean and Noam Shazeer reflect on 25 years at Google, from early search and massive n‑gram language models to today’s Gemini and TPUs. They describe how hardware and algorithms have co‑evolved, enabling deep learning, mixture‑of‑experts architectures, massive context windows, and increasingly capable coding and reasoning systems. A major theme is the coming feedback loop where AI designs better AI and hardware, driving rapid progress in algorithms, chips, and large‑scale systems, with huge implications for productivity and global GDP. They also discuss modular, continually‑learning models, inference‑time scaling, multi‑datacenter training, and the need to shape powerful systems safely while exploiting their vast economic and social upside.

IDEAS WORTH REMEMBERING

5 ideas

Hardware specialization and co‑design have been crucial to modern AI progress.

General‑purpose CPUs stopped scaling as fast, so Google built TPUs and leaned into reduced‑precision linear algebra; algorithms like deep learning and transformers then evolved to exploit cheap arithmetic and tolerate quantization.
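
To make the reduced-precision point concrete, here is a minimal NumPy sketch (my own illustration, not Google's TPU stack) of symmetric int8 weight quantization: the matrix multiply runs on small integers plus one scale factor, and the answer stays close to the full-precision result.

```python
# Illustrative sketch of reduced-precision linear algebra: quantize weights to
# int8 with a single scale factor, do the matmul cheaply, then rescale.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ≈ scale * q with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))            # activations
w = rng.standard_normal((256, 128))          # full-precision weights

q, scale = quantize_int8(w)
y_full = x @ w                               # reference result
y_int8 = (x @ q.astype(np.float64)) * scale  # cheap low-precision path

# quantization error is small relative to the magnitude of the outputs
print(np.abs(y_full - y_int8).max(), np.abs(y_full).max())
```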

Language models evolved from n‑grams to neural nets and now power many Google products.

Early massive n‑gram models and spelling correction showed that modeling word sequences at web scale was powerful, but only with later neural architectures and huge compute did these ideas become today’s LLMs and Gemini.
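
A toy bigram model makes the n-gram idea concrete: count adjacent word pairs, then score a continuation by its conditional frequency. (This is purely illustrative; the web-scale systems they describe counted n-grams over trillions of tokens.)

```python
# Toy bigram language model: P(next word | previous word) from raw counts.
from collections import Counter, defaultdict

def train_bigram(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts

def p_next(counts, prev, word):
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

corpus = ["the cat sat on the mat", "the cat ate the fish", "a dog sat on the rug"]
counts = train_bigram(corpus)
print(p_next(counts, "the", "cat"))  # 0.4: "cat" follows "the" in 2 of 5 cases
```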

Inference‑time compute is an underexploited axis for improving AI capability.

Because tokens are extremely cheap compared to human labor, there is enormous headroom to spend 10–1,000× more compute at inference, using search, multi‑step reasoning, and drafter–verifier setups to get significantly better answers.
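
One simple way to spend that headroom is a drafter–verifier (best-of-N) loop: a cheap model proposes many candidate answers and a more expensive scorer keeps the best one. The sketch below is a hedged illustration; `draft_answer` and `verify_score` are hypothetical stand-ins for real model calls, not any particular API.

```python
# Best-of-N with a drafter and a verifier: ~2N model calls per query instead of 1,
# trading cheap inference compute for a better final answer.
import random

def draft_answer(prompt, temperature):
    # placeholder for a small, fast model sampling one candidate answer
    return f"candidate-{random.random():.3f}"

def verify_score(prompt, answer):
    # placeholder for a larger model (or reward model) scoring the candidate
    return random.random()

def best_of_n(prompt, n=32):
    candidates = [draft_answer(prompt, temperature=0.8) for _ in range(n)]
    return max(candidates, key=lambda a: verify_score(prompt, a))

print(best_of_n("Summarize the proof...", n=32))
```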

AI is already materially boosting software development productivity and will increasingly assist research.

About 25% of characters in Google’s code commits are AI‑generated, and Dean and Shazeer foresee near‑term systems that can generate, run, and iterate on complex research experiments from high‑level natural‑language specs.
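
The "generate, run, and iterate" workflow can be pictured as a simple loop; the sketch below is hypothetical, with placeholder stubs (`propose_experiment`, `run_experiment`) standing in for a coding model and an experiment runner.

```python
# Hypothetical automated-research loop: propose an experiment from a natural-language
# spec, run it, feed the results back, and repeat within a compute budget.
import random

def propose_experiment(spec, history):
    # placeholder: a coding model turns the spec plus past results into runnable code
    return f"# experiment {len(history)} for: {spec}"

def run_experiment(code):
    # placeholder: launch the job and collect its metrics
    return {"code": code, "metric": random.random()}

def research_loop(spec, budget=10, target=0.95):
    history, best = [], {"metric": float("-inf")}
    for _ in range(budget):
        result = run_experiment(propose_experiment(spec, history))
        history.append(result)                        # results inform the next proposal
        best = max(best, result, key=lambda r: r["metric"])
        if best["metric"] >= target:                  # stop once the spec is satisfied
            break
    return best

print(research_loop("ablate attention variants on a small LM"))
```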

Future models may be large, sparse, modular “blobs” that grow organically and are continually updated.

They propose mixture‑of‑experts‑style architectures with specialized modules, different compute depths per query, modular training by many teams, and frequent distillation into efficient sub‑models for serving.
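
To show the routing idea behind such sparse models, here is a minimal NumPy sketch of a top-k mixture-of-experts layer (my own illustration, not Gemini's architecture): each token activates only a couple of the experts, so most parameters sit idle per query.

```python
# Minimal top-k mixture-of-experts layer: a router picks 2 of 8 experts per token.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

router_w = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02) for _ in range(n_experts)]

def moe_layer(x):
    """x: [tokens, d_model] -> [tokens, d_model], each token routed to top_k experts."""
    logits = x @ router_w                                  # [tokens, n_experts]
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in np.argsort(probs[t])[-top_k:]:            # indices of the chosen experts
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)                # expert feed-forward (ReLU)
            out[t] += probs[t, e] * (h @ w_out)             # gate-weighted combination
    return out

tokens = rng.standard_normal((5, d_model))
print(moe_layer(tokens).shape)  # (5, 64); only 2 of 8 experts ran per token
```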

WORDS WORTH SAVING

5 quotes

Organizing information is clearly like a trillion‑dollar opportunity, but a trillion dollars is not cool anymore. What's cool is a quadrillion dollars.

Noam Shazeer

The world GDP is almost certainly going to go way, way up to, like, orders of magnitude higher than it is today... due to the fact that we have all of these artificial engineers.

Noam Shazeer

25% of the characters that we're checking into our code base these days are generated by our AI‑based coding models.

Jeff Dean

I think one of the beauties of deep learning is you don't need to understand or hand‑engineer every last feature... as long as the collective output and characteristics of the overall system are good.

Jeff Dean

We’re going to need, like, a million automated researchers to invent all of this stuff.

Noam Shazeer

Early Google history: search, n‑gram language models, spelling correction, and machine translation
Hardware–algorithm co‑design: Moore’s Law, TPUs, quantization, and ML accelerators
Transformers, mixture of experts, long context, and inference‑time compute scaling
AI for coding and research: autonomous software engineers and automated experiment search
Future chip design with AI, multi‑datacenter training, and massive inference demand
Modular, sparse, continually‑learning “blob” models and Pathways‑style architectures
Safety, alignment, deployment control, and the economic impact of AI on GDP

High-quality AI-generated summary created from the speaker-labeled transcript.
