
Richard Sutton – Father of RL thinks LLMs are a dead end
Richard Sutton (guest), Dwarkesh Patel (host)
In this episode of the Dwarkesh Podcast, host Dwarkesh Patel interviews Richard Sutton about why he believes LLMs are a dead end for real intelligence.
Richard Sutton: Why Reinforcement Learning Beats LLMs for Real Intelligence
Richard Sutton argues that large language models (LLMs) are fundamentally limited because they imitate human text without grounding in experience, goals, or consequences, whereas reinforcement learning (RL) is built around agents acting in the world, getting reward, and learning from outcomes over time.
He emphasizes that intelligence is about achieving goals via continual learning from an ongoing stream of sensation, action, and reward, and that animals (e.g., squirrels) already embody most of what matters for intelligence, with language being a relatively thin layer on top.
Sutton believes scalable AI will come from agents that learn directly from rich real-world experience, form world models that predict what actually happens, and use value functions and temporal-difference learning to bridge long-term goals and short-term actions, rather than from ever-larger supervised or imitation systems.
On the long-term future, he sees a likely “succession” from biological to designed digital intelligences as an inevitable stage of the universe, and encourages humans to view this not only through a human-centric lens but as a major cosmic transition they can be proud to have initiated.
Key Takeaways
Intelligence requires goals and experience, not just prediction of text.
Sutton insists that real intelligence is about achieving goals in the external world via actions and feedback; LLMs optimize next-token prediction without grounded goals or consequences, so they lack a principled notion of right or wrong behavior.
Reinforcement learning is built for continual, online learning from the world.
RL agents act, observe what happens, receive reward, and update their policies and value functions during normal interaction, allowing them to adapt to specific environments and tasks that could never be fully anticipated in training data.
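The act-observe-reward-update loop described above can be sketched in code. The following is a minimal illustration, not anything from the episode: a tabular Q-learning agent on a made-up two-state environment where only action 1 in state 0 pays off. The point is that the agent updates its estimates during normal interaction, from its own stream of experience, with no labeled dataset.

```python
import random

def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward)."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    return (1 - state), reward  # alternate between the two states

def run(steps=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # value estimates for every (state, action) pair, learned online
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    state = 0
    for _ in range(steps):
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if rng.random() < epsilon:
            action = rng.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        # online update from this single interaction
        best_next = max(q[(next_state, a)] for a in (0, 1))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q

q = run()
print(q[(0, 1)] > q[(0, 0)])  # the rewarded action should be valued higher
```

Nothing here is batch-trained: every update uses only the most recent transition, which is what lets such agents keep adapting to environments that were never anticipated in a fixed dataset.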
Supervised and imitation learning are not the primary learning mechanisms in nature.
Drawing from psychology and animal learning, Sutton argues that animals largely learn via prediction and trial-and-error control, not from labeled examples of correct behavior, implying that RL-style learning is closer to biological intelligence than supervised LLM training.
Scalable AI will prioritize experience-based learning over human-provided knowledge.
Echoing his essay ‘The Bitter Lesson,’ Sutton predicts that methods which rely heavily on embedded human knowledge and fixed datasets (like LLMs) will be overtaken by agents that can generate vast amounts of their own data through interaction and learn directly from it.
Value functions and temporal-difference learning are key to handling long-term goals.
To solve tasks with sparse, delayed rewards (like startups or winning a long game), agents must learn value functions that predict long-term outcomes and use TD learning so that incremental progress can reinforce intermediate actions along the way.
Current deep learning has weak, fragile generalization and transfer.
Sutton notes that gradient descent optimizes performance on seen data but does not inherently promote good generalization or transfer; phenomena like catastrophic forgetting show we still lack robust, automated mechanisms for generalization across states and tasks.
AI succession to digital intelligences is likely and not inherently catastrophic.
Given the absence of a unified global authority, the inevitability of solving intelligence, and the power advantages of smarter systems, Sutton expects digital or augmented intelligences to dominate over time and urges humans to see this as a major, possibly positive, transition in the history of the universe.
Notable Quotes
“For me, having a goal is the essence of intelligence.”
— Richard Sutton
“If we understood a squirrel, we'd be almost all the way there to understanding human intelligence.”
— Richard Sutton
“Large language models are about mimicking people… They're not about figuring out what to do.”
— Richard Sutton
“Supervised learning is not something that happens in nature… Squirrels don't go to school.”
— Richard Sutton
“I think we should be proud that we are giving rise to this great transition in the universe.”
— Richard Sutton
Questions Answered in This Episode
If intelligence fundamentally requires goals and reward, how far can LLM-style systems realistically go before they hit a hard ceiling?
What would a practical, large-scale continual-learning RL agent look like when embedded in messy real-world environments like companies or cities?
How could we design algorithms that explicitly favor good generalization and transfer, rather than relying on ad hoc human architecture choices?
What kinds of intrinsic motivations or reward signals should we give general-purpose AI agents to encourage safe, useful exploration and world-model building?
In a future with many digital minds that can copy and merge, how should we think about issues of corruption, value drift, and identity continuity?
Transcript Preview
Why are you trying to distinguish humans? Humans are animals. What we have in common is more interesting. What distinguishes us, we should be paying less attention to.
I mean, we're trying to replicate intelligence, right? No animal can go to the moon or make semiconductors, so we wanna understand what makes humans special.
So, I like the way you consider that obvious, 'cause I consider the opposite obvious. If we understood a squirrel, we'd be almost all the way there. I am personally just kind of content being out of sync with my field for a long period of time, perhaps decades, because occasionally I have been proved right in the past. I don't think learning is really about training. It's about an active process. The child tries things and sees what happens. I think we should be proud that we are giving rise to this great transition in the universe.
Today, I'm chatting with Richard Sutton, who is one of the founding fathers of reinforcement learning, and inventor of many of the main techniques used there, like TD learning and policy gradient methods. And for that, he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for computer science. Richard, congratulations.
Thank you, Dwarkesh.
And, uh, thanks for coming on the podcast.
It's my pleasure.
Okay, so first question. My audience and I are familiar with the LLM way of thinking about AI. Conceptually, what are we missing in terms of thinking about AI from the RL perspective?
Well, yes, I think it's really quite a different point of view, and it's, it can easily get separated and lose the ability to talk to each other.
Mm-hmm.
And, um, yeah, large language models have become such a big thing, generative AI in general a big thing, um, and our field is subject to bandwagons and fashions, so we lose, we lose track of the, uh, basic, basic things. 'Cause I consider reinforcement learning to be basic AI, and what is intelligence? Uh, the problem is, is to understand your world.
Right.
And, um, reinforcement learning is about understanding wh- your world, whereas large language models are about mimicking people, doing what people say you should do. They're not about figuring out what to do.
Huh. I guess y- y- you would think that t- uh, to emulate the trillions of tokens in the corpus of internet text, you would have to build a world model. In fact, these models do seem to have very robust world models, and they, they're the best, um, world models we've made to date in AI, right? So, what, what, what do you think that, that's missing?
Uh, I would disagree with most of the things you just said.
(laughs) Great.
(laughs)
(laughs)
Just to mimic the, the, what people say is not really to build a model of the world at all, I don't think. You know, you're mimicking things that have, uh, a model of the world, the people.