Dwarkesh Podcast

Sholto Douglas & Trenton Bricken — How LLMs actually think

Had so much fun chatting with my good friends Trenton Bricken and Sholto Douglas on the podcast. No way to summarize it, except:

  * This is the best context dump out there on how LLMs are trained, what capabilities they're likely to soon have, and what exactly is going on inside them.
  * You would be shocked how much of what I know about this field I've learned just from talking with them.
  * To the extent that you've enjoyed my other AI interviews, now you know why.

There's a transcript with links to all the papers the boys were throwing down, which may help you follow along.

EPISODE LINKS

  * Transcript: https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken
  * Spotify: https://open.spotify.com/episode/2dtDauiE4v8ldNRqPFq0uP?si=7S4n69QuTjeYz0lZwW4xIw
  * Apple Podcasts: https://podcasts.apple.com/us/podcast/sholto-douglas-trenton-bricken-how-to-build-understand/id1516093381?i=1000650748087
  * Trenton Bricken's twitter: https://twitter.com/TrentonBricken
  * Sholto Douglas's twitter: https://twitter.com/_sholtodouglas

TIMESTAMPS

  00:00:00 - Long contexts
  00:17:04 - Intelligence is just associations
  00:33:27 - Intelligence explosion & great researchers
  01:07:44 - Superposition & secret communication
  01:23:26 - Agents & true reasoning
  01:35:32 - How Sholto & Trenton got into AI research
  02:08:08 - Are feature spaces the wrong way to think about intelligence?
  02:22:04 - Will interp actually work on superhuman models
  02:45:57 - Sholto's technical challenge for the audience
  03:04:49 - Rapid fire

Dwarkesh Patel (host) · Trenton Bricken (guest) · Sholto Douglas (guest)
Mar 28, 2024 · 3h 13m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Inside LLM Minds: Context Windows, Features, and Future Superintelligence

  1. Dwarkesh Patel interviews Google’s Sholto Douglas and Anthropic’s Trenton Bricken about how large language models work internally, why long context windows matter, and what an “intelligence explosion” might actually look like from the perspective of frontier researchers.
  2. They describe in‑context learning as a kind of gradient descent happening inside the forward pass, argue that long context dramatically boosts effective intelligence and “working memory,” and discuss why current agentic systems are bottlenecked more by reliability than by context length.
  3. Bricken explains mechanistic interpretability work on “features” and superposition—how many more latent concepts than neurons are packed into LLMs—and how dictionary learning might let us detect circuits for things like deception and safely ablate them in future models.
  4. All three also reflect on AI research practice, hiring and talent development, and the risk that alignment might “succeed too well,” giving institutions extremely fine‑grained control over powerful systems.

IDEAS WORTH REMEMBERING

5 ideas

Long context windows are a genuine capability unlock, not just a UX upgrade.

Being able to ingest hundreds of thousands or millions of tokens lets models “instant‑onboard” to complex codebases or esoteric languages and achieve performance jumps comparable to large increases in model scale—giving them a form of working memory far beyond humans.

In‑context learning behaves like gradient descent happening inside the forward pass.

Work the guests cite shows that as you give more examples in context (e.g., for linear regression), the model’s loss drops along a curve that closely matches multiple steps of gradient descent, suggesting attention can implement on‑the‑fly learning separate from weight updates.
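A minimal sketch of the comparison the cited work makes (this is not the paper's experiment, just the gradient-descent side of it): run plain gradient descent on a small linear-regression task and record the loss after each step. The claim is that a transformer's in-context loss, as more examples are appended to the prompt, falls along a curve that closely tracks one like this. All names and hyperparameters below are illustrative choices.

```python
# Hypothetical setup: loss curve of plain gradient descent on 1-D linear
# regression, the baseline that in-context learning curves are compared to.
import random

random.seed(0)
true_w = 2.5
data = [(x, true_w * x) for x in [random.uniform(-1, 1) for _ in range(32)]]

def mse(w):
    """Mean squared error of slope w on the dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w, lr = 0.0, 0.3
losses = [mse(w)]
for step in range(8):
    # Gradient of MSE with respect to w, then one descent step.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad
    losses.append(mse(w))

print([round(l, 4) for l in losses])  # loss shrinks step by step
```

Each additional in-context example in the cited setup plays the role of one more descent step here: the model's prediction error falls without any weight update.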

The main blocker for agents is reliability, not horizon length or context.

Chaining many tasks multiplies error probabilities, so even 90% per‑step accuracy fails over long workflows; small “extra nines” of reliability likely unlock agentic behavior more than raw context size, which so far hasn’t been the dominant constraint.
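The compounding-error point is simple arithmetic: if each step succeeds independently with probability p, an n-step chain succeeds with probability roughly p**n. The independence assumption is a simplification, but it shows why "extra nines" matter more than longer context.

```python
# End-to-end reliability of a chained agent workflow, assuming each of
# n steps succeeds independently with per-step accuracy p.
def chain_success(p: float, n: int) -> float:
    return p ** n

for p in (0.90, 0.99, 0.999):
    print(f"p={p}: 10 steps -> {chain_success(p, 10):.3f}, "
          f"100 steps -> {chain_success(p, 100):.3f}")
```

At 90% per-step accuracy, a 10-step workflow already succeeds only about a third of the time, and a 100-step workflow essentially never; at 99.9% the 100-step chain still works roughly nine times out of ten.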

Compute and synthetic data may drive an “intelligence explosion” more than new algorithms.

Douglas argues more compute directly accelerates research (he estimates ~5× speedup from 10× compute), while both guests think high‑quality, reasoning‑dense synthetic data generated by stronger models could become a primary driver of further capability gains.

LLMs appear under‑parameterized and rely heavily on superposition to compress concepts.

Bricken explains that with high‑dimensional, sparse real‑world data, models learn to pack many more “features” than neurons into shared activation space; this compression makes neurons look polysemantic and motivates moving to a feature‑based view of model internals.
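A toy illustration of the geometry behind superposition (an assumed setup for intuition, not Anthropic's actual method): in a d-dimensional activation space you can assign far more than d feature directions as long as they are only *nearly* orthogonal. Random unit vectors achieve this, with typical pairwise overlap around 1/sqrt(d), which is exactly why any single neuron reading such a space looks polysemantic.

```python
# Pack 4x more random "feature" directions than dimensions and measure
# how much they overlap. Small (but nonzero) overlaps are superposition.
import math, random

random.seed(0)
dim, n_features = 64, 256  # 4x more feature directions than dimensions

def rand_unit(d):
    """Random unit vector: isotropic Gaussian sample, then normalize."""
    v = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

feats = [rand_unit(dim) for _ in range(n_features)]

def cos(a, b):
    return sum(x * y for x, y in zip(a, b))

overlaps = [abs(cos(feats[i], feats[j]))
            for i in range(n_features) for j in range(i + 1, n_features)]
mean_overlap = sum(overlaps) / len(overlaps)
print(f"{n_features} features in {dim} dims: "
      f"mean |cos| = {mean_overlap:.3f}, max |cos| = {max(overlaps):.3f}")
```

Because no pair of directions is fully aligned, sparse combinations of features remain approximately decodable, which is the property dictionary learning exploits to recover them.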

WORDS WORTH SAVING

5 quotes

This allows them to know things that you don’t in a way that like, it just ingests a huge amount of information in a way you just can’t.

Sholto Douglas

Most intelligence is pattern matching. And you can do a lot of really good pattern matching if you have a hierarchy of associative memories.

Trenton Bricken

It keeps me up at night how quickly the models are becoming more capable and, like, just how poor our understanding still is of what’s going on.

Trenton Bricken

If you do everything, you’ll win.

Sholto Douglas

You should thank the model by giving it a sequence that’s very easy to predict.

Trenton Bricken

  * Long context windows and their impact on LLM capabilities
  * In‑context learning as gradient descent within the forward pass
  * Reliability, long‑horizon tasks, and why agents haven’t yet taken off
  * Intelligence explosion dynamics: compute, synthetic data, and research automation
  * Mechanistic interpretability: residual streams, features, and superposition
  * Dictionary learning and circuits for deception, safety, and control
  * Research practice, talent pipelines, and agency in frontier AI labs

High-quality AI-generated summary created from the speaker-labeled transcript.
