
Sholto Douglas & Trenton Bricken — How LLMs actually think
Dwarkesh Patel (host), Trenton Bricken (guest), Sholto Douglas (guest)
Inside LLM Minds: Context Windows, Features, and Future Superintelligence
Dwarkesh Patel interviews Google’s Sholto Douglas and Anthropic’s Trenton Bricken about how large language models work internally, why long context windows matter, and what an “intelligence explosion” might actually look like from the perspective of frontier researchers.
They describe in‑context learning as a kind of gradient descent happening inside the forward pass, argue that long context dramatically boosts effective intelligence and “working memory,” and discuss why current agentic systems are bottlenecked more by reliability than by context length.
Bricken explains mechanistic interpretability work on “features” and superposition—how many more latent concepts than neurons are packed into LLMs—and how dictionary learning might let us detect circuits for things like deception and safely ablate them in future models.
All three also reflect on AI research practice, hiring and talent development, and the risk that alignment might “succeed too well,” giving institutions extremely fine‑grained control over powerful systems.
Key Takeaways
Long context windows are a genuine capability unlock, not just a UX upgrade.
Being able to ingest hundreds of thousands or millions of tokens lets models “instant‑onboard” to complex codebases or esoteric languages and achieve performance jumps comparable to large increases in model scale—giving them a form of working memory far beyond humans.
In‑context learning behaves like gradient descent happening inside the forward pass.
Work the guests cite shows that as you give more examples in context (e.g., additional input–output demonstrations), the model's predictions improve in much the way they would after further gradient steps, suggesting the forward pass itself implements something like a learning update.
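To make that claim concrete, here is a minimal sketch (in Python, with illustrative names of our own choosing) of the kind of construction this line of work demonstrates: in a stylized linear setting, a single linear-attention readout over the in-context examples produces exactly the same prediction as one explicit gradient-descent step on a least-squares loss.

```python
import numpy as np

# Toy demo: a single linear-attention layer can reproduce one step of
# gradient descent on least-squares regression over in-context examples.
# (Illustrative sketch of the construction in work like von Oswald et al.,
# "Transformers Learn In-Context by Gradient Descent"; the specific setup
# and variable names here are our own.)

rng = np.random.default_rng(0)
d, n = 8, 32                     # input dim, number of in-context examples
w_true = rng.normal(size=d)      # hidden linear task: y = w_true . x

X = rng.normal(size=(n, d))      # in-context inputs
y = X @ w_true                   # in-context targets
x_q = rng.normal(size=d)         # query input to predict
eta = 0.1                        # learning rate for the explicit GD step

# Explicit gradient descent: one step on L(W) = 0.5 * sum_i (y_i - W.x_i)^2,
# starting from W = 0, gives W_1 = eta * sum_i y_i * x_i.
W_1 = eta * (y @ X)              # shape (d,)
pred_gd = W_1 @ x_q

# Linear attention: keys = x_i, values = y_i, query = x_q, no softmax.
# Output = eta * sum_i y_i * (x_i . x_q) -- the same expression.
attn_scores = X @ x_q            # shape (n,)
pred_attn = eta * (y @ attn_scores)

print(pred_gd, pred_attn)        # identical up to floating point
assert np.isclose(pred_gd, pred_attn)
```

The equivalence is exact only in this stylized linear case; the cited work is about trained transformers approximating such updates in their forward pass.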
The main blocker for agents is reliability, not horizon length or context.
Chaining many tasks multiplies error probabilities, so even 90% per‑step accuracy fails over long workflows; small “extra nines” of reliability likely unlock agentic behavior more than raw context size, which so far hasn’t been the dominant constraint.
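The arithmetic behind that claim is worth seeing directly; here is a quick back-of-envelope calculation in Python, assuming step failures are independent:

```python
# An n-step workflow succeeds only if every step does, so end-to-end
# success decays as p**n. An "extra nine" per step changes everything.
for p in (0.90, 0.99, 0.999):
    for n in (10, 50, 100):
        print(f"per-step {p:.3f}, {n:4d} steps -> {p**n:10.4%} end-to-end")
```

Going from 90% to 99% per-step reliability takes a 100-step workflow from essentially never succeeding (~0.003%) to succeeding about a third of the time, which is why small gains in reliability dominate raw context size.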
Compute and synthetic data may drive an “intelligence explosion” more than new algorithms.
Douglas argues more compute directly accelerates research (he estimates ~5× speedup from 10× compute), while both guests think high‑quality, reasoning‑dense synthetic data generated by stronger models could become a primary driver of further capability gains.
LLMs appear under‑parameterized and rely heavily on superposition to compress concepts.
Bricken explains that with high‑dimensional, sparse real‑world data, models learn to pack many more “features” than neurons into shared activation space; this compression makes neurons look polysemantic and motivates moving to a feature‑based view of model internals.
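A toy example makes superposition tangible. The sketch below (Python; a much-simplified version of the setup in Anthropic's "Toy Models of Superposition," with arbitrary constants of our own) packs 512 features into 64 dimensions using random directions and shows that, as long as only a few features are active at once, a linear readout still separates them cleanly.

```python
import numpy as np

# Toy illustration of superposition: store many more sparse "features"
# than dimensions by giving each feature a random, nearly-orthogonal
# direction. Random unit vectors in d dims have pairwise dot products
# concentrated near 0 (scale ~1/sqrt(d)), so features interfere weakly.

rng = np.random.default_rng(0)
d, m, k = 64, 512, 4             # 512 features packed into 64 dims, 4 active

D = rng.normal(size=(m, d))      # one direction per feature
D /= np.linalg.norm(D, axis=1, keepdims=True)

# Activate k random features with strength 1 and superpose them.
active = rng.choice(m, size=k, replace=False)
x = D[active].sum(axis=0)        # a single d-dim activation vector

# Linear readout: project back onto every feature direction.
scores = D @ x
mask = np.ones(m, dtype=bool)
mask[active] = False
print("mean score, active features :", scores[active].mean())   # ~1.0
print("mean |score|, inactive ones :", np.abs(scores[mask]).mean())  # ~0.15
```

Because each direction must be shared, any one neuron responds to many unrelated features (polysemanticity), which is exactly why a feature-based view is more interpretable than a neuron-based one.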
Dictionary learning can reveal human‑meaningful features and circuits inside models.
By projecting activations into a higher‑dimensional, sparse space and then back, Anthropic finds monosemantic features (e.g., a feature that fires on Base64‑encoded text) and hopes to trace the circuits those features form.
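The core mechanism is easy to sketch. Below is a minimal sparse-autoencoder implementation in PyTorch; this is our own illustrative reduction of the dictionary-learning idea, not Anthropic's actual training recipe: activations are encoded into a much wider feature space under an L1 sparsity penalty, then decoded back.

```python
import torch
import torch.nn as nn

# Minimal sparse autoencoder for dictionary learning over model activations:
# map d_model-dim activations into a wider feature space, penalize the
# feature activations toward sparsity, and reconstruct the input.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # project up
        self.decoder = nn.Linear(d_features, d_model)   # project back

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse, non-negative features
        x_hat = self.decoder(f)           # reconstruction of the activation
        return x_hat, f

d_model, d_features = 512, 4096           # e.g., an 8x overcomplete dictionary
sae = SparseAutoencoder(d_model, d_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                           # sparsity vs. reconstruction trade-off

acts = torch.randn(1024, d_model)         # stand-in for real model activations
for _ in range(100):
    x_hat, f = sae(acts)
    loss = ((x_hat - acts) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training on real activations, the columns of sae.decoder.weight act
# as a dictionary of feature directions, and f shows which fire on an input.
```

In this framing, "ablating a circuit" amounts to zeroing selected entries of f before decoding, which is what makes detect-and-remove interventions for concepts like deception even conceivable.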
Agency and problem selection matter as much as raw technical skill in frontier labs.
Both guests attribute their outsized impact to aggressively choosing high‑leverage unsolved problems, pushing past organizational blockers, and iterating quickly—often aided by mentors who deliberately “bootstrap” non‑traditional candidates into central research roles.
Notable Quotes
“This allows them to know things that you don’t in a way that like, it just ingests a huge amount of information in a way you just can’t.”
— Sholto Douglas
“Most intelligence is pattern matching. And you can do a lot of really good pattern matching if you have a hierarchy of associative memories.”
— Trenton Bricken
“It keeps me up at night how quickly the models are becoming more capable and, like, just how poor our understanding still is of what’s going on.”
— Trenton Bricken
“If you do everything, you’ll win.”
— Sholto Douglas
“You should thank the model by giving it a sequence that’s very easy to predict.”
— Trenton Bricken
Questions Answered in This Episode
If long‑context in‑context learning is already superhuman in some respects, what concrete new economic or scientific workflows could that enable over the next few years?
How confident should we be that circuits for high‑level concepts like deception, loyalty, or ambition will have clean, localizable signatures in very large, superhuman models?
At what point does synthetic data generation by models themselves become more important than scraping new human data, and how would we know if we’ve crossed that threshold?
How might extremely fine‑grained interpretability and editability of model internals change the balance of power between companies, governments, and individuals?
What kinds of empirical evidence would most change your mind about whether AI progress will be driven by steady scaling and engineering or by some qualitatively new algorithmic breakthrough?
Transcript Preview
(laughs) It's right after this, and you ruin it. (laughs)
(laughs)
Oh, my God. (laughs)
You're failing the line test right now, really badly. This is like...
Yeah, it is. It is. (laughs)
I'm like, "Wait, really?"
"Can we drink on our glasses?"
That's funny. (laughs)
The glass go? (laughs)
Yeah, let's go. Uh... (laughs)
Oh my God, dude. I'm like, I feel like leaving the house.
(laughs)
My backpack is like, launching...
(laughs)
(laughs) Uh...
Let's get like no context on the chair.
(laughs)
(laughs)
Let's go. (laughs)
Dude, it is literally falling over.
Yeah. It's like... (laughs)
Have you seen the videos?
Yeah.
(laughs)
I think the video has shown it enough that we can almost live it out.
Let's do it.
Like, don't want to collapse it.
(laughs)
(laughs)
Okay. Today I have, uh, the pleasure to talk with two of my good friends, Sholto and Trenton. Um, Sholto-
You just got us mixed up. (laughs)
(laughs)
(laughs)
I knew we did. (laughs) I wasn't going to say anything.
So let's do this in reverse.
(laughs)
How about I started with "my good friends"? (laughs)
Yeah, Gemini 1.5, the context length, just wow.
(laughs)
(laughs) Oh shit. Anyways, um, Sholto, uh, Noam Brown... (laughs)
(laughs)
Noam Brown, the guy who wrote the Diplomacy paper, he said this about Sholto. He said, "He's only been in the field for 1.5 years, but people in AI know that he was one of the most important people behind Gemini's success." Um, and Trenton, who's at Anthropic, uh, works on mechanistic interpretability, and it was widely reported that he has solved alignment.
(laughs)
(laughs)
With his recent paper.
He read random Twitter.
Oh. (laughs)
On, uh... Um, so this will be a capabilities only podcast. Alignment is already solved, so no need to discuss further. Um, okay, so let's start by talking about context lengths.
Yep.
G- it seemed to be under hyped given how important it seems to me to be that you can just put a million tokens into context. There's apparently some other news that, you know, got pushed to the front for some reason. But, um, yeah, I- uh, is, tell me about how you see the future of long context lengths and what that implies for these models?
Yeah. So I think it's really under hyped because until I started working on it, I didn't really appreciate how much of a step up in intelligence it was for the model to be, have the onboarding problem basically instantly solved. Um, and you can see that a little bit in the perplexity graphs in the paper, where just throwing millions of tokens' worth of context about a code base allows it to become dramatically better at predicting the next token in a way that you'd normally associate with huge increments in model's scale. But you don't need that. All you need is like a new context. Um, so under hyped, uh, and yeah, buried by some other news.