Sholto Douglas & Trenton Bricken — How LLMs actually think

Dwarkesh Podcast · March 28, 2024 · 3h 13m

Speakers: Dwarkesh Patel (host), Sholto Douglas (guest), Trenton Bricken (guest), Narrator

Topics:
Long context windows and their impact on LLM capabilities
In‑context learning as gradient descent within the forward pass
Reliability, long‑horizon tasks, and why agents haven’t yet taken off
Intelligence explosion dynamics: compute, synthetic data, and research automation
Mechanistic interpretability: residual streams, features, and superposition
Dictionary learning and circuits for deception, safety, and control
Research practice, talent pipelines, and agency in frontier AI labs

Inside LLM Minds: Context Windows, Features, and Future Superintelligence

Dwarkesh Patel interviews Google’s Sholto Douglas and Anthropic’s Trenton Bricken about how large language models work internally, why long context windows matter, and what an “intelligence explosion” might actually look like from the perspective of frontier researchers.

They describe in‑context learning as a kind of gradient descent happening inside the forward pass, argue that long context dramatically boosts effective intelligence and “working memory,” and discuss why current agentic systems are bottlenecked more by reliability than by context length.

Bricken explains mechanistic interpretability work on “features” and superposition—how many more latent concepts than neurons are packed into LLMs—and how dictionary learning might let us detect circuits for things like deception and safely ablate them in future models.

All three also reflect on AI research practice, hiring and talent development, and the risk that alignment might “succeed too well,” giving institutions extremely fine‑grained control over powerful systems.

Key Takeaways

Long context windows are a genuine capability unlock, not just a UX upgrade.

Being able to ingest hundreds of thousands or millions of tokens lets models “instant‑onboard” to complex codebases or esoteric languages and achieve performance jumps comparable to large increases in model scale—giving them a form of working memory far beyond humans.

In‑context learning behaves like gradient descent happening inside the forward pass.

Work the guests cite shows that as you give the model more examples in context (e.g., few-shot demonstrations), its behavior updates much as if it had taken gradient-descent steps on those examples, with performance improving the way additional training would.

The main blocker for agents is reliability, not horizon length or context.

Chaining many tasks multiplies error probabilities, so even 90% per‑step accuracy fails over long workflows; small “extra nines” of reliability likely unlock agentic behavior more than raw context size, which so far hasn’t been the dominant constraint.
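The compounding-error arithmetic behind this point can be sketched directly. The workflow length and probabilities below are illustrative numbers, not figures from the episode, and the independence assumption is a simplification:

```python
# Per-step success probabilities compound multiplicatively across a chained
# agentic workflow, so small gains in reliability ("extra nines") matter far
# more than they look at first glance.

def chain_success(per_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    return per_step ** n_steps

for p in (0.90, 0.99, 0.999):
    print(f"per-step {p:.1%} -> 20-step workflow succeeds {chain_success(p, 20):.1%}")
```

At 90% per step, a 20-step chain completes only about 12% of the time; at 99.9% per step it completes about 98% of the time, which is why extra nines dominate raw context size here.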

Compute and synthetic data may drive an “intelligence explosion” more than new algorithms.

Douglas argues more compute directly accelerates research (he estimates ~5× speedup from 10× compute), while both guests think high‑quality, reasoning‑dense synthetic data generated by stronger models could become a primary driver of further capability gains.

LLMs appear under‑parameterized and rely heavily on superposition to compress concepts.

Bricken explains that with high‑dimensional, sparse real‑world data, models learn to pack many more “features” than neurons into shared activation space; this compression makes neurons look polysemantic and motivates moving to a feature‑based view of model internals.
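The packing argument can be illustrated with random directions in a shared space. This is a toy sketch of the superposition idea, not Anthropic's actual setup; the dimensions and sparsity level are arbitrary choices for illustration:

```python
import numpy as np

# Toy superposition sketch: store many more "feature" directions than the
# space has dimensions. Random high-dimensional unit vectors are nearly
# orthogonal, so if the data is sparse (few features active at once), a
# simple dot-product readout still identifies the active features.
rng = np.random.default_rng(0)
d, n_features, k_active = 256, 1024, 4   # 4x more features than dimensions

F = rng.normal(size=(n_features, d))
F /= np.linalg.norm(F, axis=1, keepdims=True)   # unit feature directions

active = rng.choice(n_features, size=k_active, replace=False)
x = F[active].sum(axis=0)                       # superposed activation vector

scores = F @ x                                  # interference is O(1/sqrt(d))
recovered = set(np.argsort(scores)[-k_active:]) # top-scoring features
print(recovered == set(active))
```

Because every feature direction overlaps slightly with every other, any single basis direction (a "neuron") responds to many features at once, which is the polysemanticity Bricken describes.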

Dictionary learning can reveal human‑meaningful features and circuits inside models.

By projecting activations into a higher‑dimensional, sparse space and then back, Anthropic finds monosemantic features (e.g., directions that fire on a single human‑interpretable concept), opening a path to locating circuits for behaviors like deception.
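The project-up-and-back operation has the shape of a sparse autoencoder. The sketch below shows only the structure of that computation with random, untrained weights; in the real method the encoder and decoder are trained on model activations so that individual dictionary features become monosemantic:

```python
import numpy as np

# Structural sketch of dictionary learning: encode d-dim activations into a
# larger feature space with a thresholded ReLU that enforces sparsity, then
# decode back. Weights are random here; the actual technique learns them.
rng = np.random.default_rng(0)
d, n_dict = 64, 512                      # 8x overcomplete dictionary

W_enc = rng.normal(size=(d, n_dict)) / np.sqrt(d)
W_dec = W_enc.T.copy()                   # tied decoder for simplicity
b = 1.0                                  # ReLU threshold encouraging sparsity

def encode(x: np.ndarray) -> np.ndarray:
    """Sparse, higher-dimensional feature activations."""
    return np.maximum(x @ W_enc - b, 0.0)

def decode(f: np.ndarray) -> np.ndarray:
    """Map active features back to the original activation space."""
    return f @ W_dec

x = rng.normal(size=d)                   # stand-in for a residual-stream vector
f = encode(x)
print(f"{(f > 0).mean():.0%} of dictionary features active")
```

Ablating a feature then amounts to zeroing one coordinate of `f` before decoding, which is what makes the feature-level view useful for safety interventions.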

Agency and problem selection matter as much as raw technical skill in frontier labs.

Both guests attribute their outsized impact to aggressively choosing high‑leverage unsolved problems, pushing past organizational blockers, and iterating quickly—often aided by mentors who deliberately “bootstrap” non‑traditional candidates into central research roles.

Notable Quotes

This allows them to know things that you don’t in a way that like, it just ingests a huge amount of information in a way you just can’t.

Sholto Douglas

Most intelligence is pattern matching. And you can do a lot of really good pattern matching if you have a hierarchy of associative memories.

Trenton Bricken

It keeps me up at night how quickly the models are becoming more capable and, like, just how poor our understanding still is of what’s going on.

Trenton Bricken

If you do everything, you’ll win.

Sholto Douglas

You should thank the model by giving it a sequence that’s very easy to predict.

Trenton Bricken

Questions Answered in This Episode

If long‑context in‑context learning is already superhuman in some respects, what concrete new economic or scientific workflows could that enable over the next few years?

How confident should we be that circuits for high‑level concepts like deception, loyalty, or ambition will have clean, localizable signatures in very large, superhuman models?

At what point does synthetic data generation by models themselves become more important than scraping new human data, and how would we know if we’ve crossed that threshold?

How might extremely fine‑grained interpretability and editability of model internals change the balance of power between companies, governments, and individuals?

What kinds of empirical evidence would most change your mind about whether AI progress will be driven by steady scaling and engineering or by some qualitatively new algorithmic breakthrough?

Transcript Preview

Narrator

(laughs) It's right after this, and you ruin it. (laughs)

Dwarkesh Patel

(laughs)

Sholto Douglas

Oh, my God. (laughs)

Trenton Bricken

You're failing the line test right now, really badly. This is like...

Narrator

Yeah, it is. It is. (laughs)

Trenton Bricken

I'm like, "Wait, really?"

Dwarkesh Patel

"Can we drink on our glasses?"

Narrator

That's funny. (laughs)

Dwarkesh Patel

The glass go? (laughs)

Sholto Douglas

Yeah, let's go. Uh... (laughs)

Dwarkesh Patel

Oh my God, dude. I'm like, I feel like leaving the house.

Narrator

(laughs)

Dwarkesh Patel

My backpack is like, launching...

Narrator

(laughs)

Trenton Bricken

(laughs) Uh...

Dwarkesh Patel

Let's get like no context on the chair.

Narrator

(laughs)

Trenton Bricken

(laughs)

Dwarkesh Patel

Let's go. (laughs)

Sholto Douglas

Dude, it is literally falling over.

Narrator

Yeah. It's like... (laughs)

Dwarkesh Patel

Have you seen the videos?

Sholto Douglas

Yeah.

Narrator

(laughs)

Dwarkesh Patel

I think the video has shown it enough that we can almost live it out.

Sholto Douglas

Let's do it.

Narrator

Like, don't want to collapse it.

Sholto Douglas

(laughs)

Narrator

(laughs)

Dwarkesh Patel

Okay. Today I have, uh, the pleasure to talk with two of my good friends, Sholto and Trenton. Um, Sholto-

Sholto Douglas

You just got us mixed up. (laughs)

Narrator

(laughs)

Trenton Bricken

(laughs)

Dwarkesh Patel

I knew we did. (laughs) I wasn't going to say anything.

Narrator

So let's do this in reverse.

Dwarkesh Patel

(laughs)

Narrator

How about I started with "my good friends"? (laughs)

Dwarkesh Patel

Yeah, Gemini 1.5, the context length, just wow.

Narrator

(laughs)

Dwarkesh Patel

(laughs) Oh shit. Anyways, um, Sholto, uh, Noam Brown... (laughs)

Narrator

(laughs)

Dwarkesh Patel

Noam Brown, the guy who wrote the Diplomacy paper, he said this about Sholto. He said, "He's only been in the field for 1.5 years, but people in AI know that he was one of the most important people behind Gemini's success." Um, and Trenton, who's at Anthropic, uh, works on mechanistic interpretability, and it was widely reported that he has solved alignment.

Narrator

(laughs)

Sholto Douglas

(laughs)

Dwarkesh Patel

With his recent paper.

Trenton Bricken

He read random Twitter.

Narrator

Oh. (laughs)

Dwarkesh Patel

On, uh... Um, so this will be a capabilities only podcast. Alignment is already solved, so no need to discuss further. Um, okay, so let's start by talking about context lengths.

Trenton Bricken

Yep.

Dwarkesh Patel

G- it seemed to be underhyped given how important it seems to me to be that you can just put a million tokens into context. There's apparently some other news that, you know, got pushed to the front for some reason. But, um, yeah, I- uh, is, tell me about how you see the future of long context lengths and what that implies for these models?

Sholto Douglas

Yeah. So I think it's really underhyped because until I started working on it, I didn't really appreciate how much of a step up in intelligence it was for the model to be, have the onboarding problem basically instantly solved. Um, and you can see that a little bit in the perplexity graphs in the paper, where just throwing millions of tokens' worth of context about a code base allows it to become dramatically better at predicting the next token in a way that you'd normally associate with huge increments in model scale. But you don't need that. All you need is like a new context. Um, so underhyped, uh, and yeah, buried by some other news.
