
François Chollet: Why Scaling Alone Isn’t Enough for AGI
François Chollet (guest), Garry Tan (host), Diana Hu (host)
In this episode of the Y Combinator podcast, hosts Garry Tan and Diana Hu talk with François Chollet about why scaling alone isn't enough for AGI. Key points:
Chollet argues symbolic learning and ARC benchmarks reveal scaling limits
Chollet predicts “AGI around 2030,” but distinguishes automation of valuable tasks from human-like general intelligence defined by sample-efficient skill acquisition across novel domains.
Ndea is positioned as a new ML paradigm that replaces parametric neural models with minimal symbolic programs found via “symbolic descent,” aiming for far greater data efficiency, compositionality, and inference efficiency.
He argues recent breakthroughs in coding agents come primarily from verifiable reward signals and RL-style post-training loops, not a large jump in underlying fluid intelligence.
ARC-AGI’s evolution (V1→V2→V3) is designed to track the remaining gap to human capabilities and resist benchmark “targeting,” shifting from static pattern induction to interactive, efficiency-scored agentic evaluation.
Chollet encourages more diverse AI research bets (including revived older ideas) so long as they can scale without human bottlenecks, and he shares practical lessons from building Keras into a major open-source project.
Key Takeaways
Scaling pretraining can miss key ingredients of intelligence.
ARC V1 stayed low for base LLMs despite massive scaling, which Chollet interprets as evidence that pretraining alone doesn’t produce the kind of causal, generalizable program induction associated with fluid intelligence.
Coding agents work because code is verifiable.
Unit tests, compilation, and execution traces create trustworthy reward signals, enabling models to generate attempts, verify outcomes, and produce dense training data via post-training loops.
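As a sketch of that loop: the fraction of unit tests a candidate passes can serve directly as a reward, and each (candidate, reward) pair becomes training data. This is a minimal illustration, not any lab's actual pipeline; the `solve` entry-point name and the `generate_candidate` callable are assumptions for the example.

```python
# Hypothetical sketch of a verifiable-reward loop for code generation.
# `generate_candidate` stands in for a model call, and `solve` is an
# assumed entry-point name; the reward comes purely from executing the
# candidate against unit tests.

def run_tests(candidate_src: str, tests: list) -> float:
    """Execute candidate source and return the fraction of tests passed."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # compiling and running is itself a verifiable signal
    except Exception:
        return 0.0
    solve = namespace.get("solve")
    if solve is None:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing attempt simply earns no reward
    return passed / len(tests)

def post_training_loop(generate_candidate, tests, attempts=8):
    """Collect dense (candidate, reward) training pairs from repeated attempts."""
    return [(src, run_tests(src, tests))
            for src in (generate_candidate() for _ in range(attempts))]
```

The key property is that the reward needs no human judgment: execution either passes the tests or it doesn't.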
Post-training loops can yield usefulness without much more fluid intelligence.
Chollet argues recent leaps (e.g., ...
Non-verifiable domains will improve more slowly on today’s stack.
Tasks like essay writing lack reliable formal rewards, so progress depends on costly human annotations and may stall relative to domains that can be turned into verifiable environments.
Symbolic program synthesis aims at optimality: smaller models, less data, better generalization.
Ndea’s approach replaces parameter fitting with searching for minimal symbolic models (MDL-style compression), which Chollet expects to be more data-efficient, compositional, and cheap at inference.
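A toy illustration of the MDL idea, assuming a made-up three-primitive DSL (this is not Ndea's algorithm or DSL): enumerate candidate programs shortest-first and return the first one consistent with the data, so the result is automatically a minimal description of the mapping.

```python
# Toy MDL-style program search: shortest-first enumeration over a tiny
# hypothetical DSL. Primitive names and the DSL itself are invented for
# illustration only.
import itertools

PRIMITIVES = [
    ("inc",    lambda x: x + 1),
    ("double", lambda x: 2 * x),
    ("square", lambda x: x * x),
]

def search_minimal_program(pairs, max_depth=3):
    """Return the shortest primitive sequence consistent with all (input, output) pairs."""
    for depth in range(1, max_depth + 1):          # shortest programs first
        for combo in itertools.product(PRIMITIVES, repeat=depth):
            def run(x, combo=combo):
                for _, fn in combo:
                    x = fn(x)
                return x
            if all(run(i) == o for i, o in pairs):
                # First hit is minimal-length by construction (MDL principle)
                return [name for name, _ in combo]
    return None
```

A handful of examples suffices to pin down the program, which is the data-efficiency argument in miniature: the hypothesis space is tiny symbolic programs, not millions of parameters.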
ARC-AGI V3 is designed to test exploration and goal inference, not just pattern modeling.
V3 drops agents into instructionless mini-games and scores action efficiency, forcing systems to gather information, infer objectives/controls, build a world model, plan, and act like humans do under novelty.
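One way such efficiency scoring could work (my assumption; ARC-AGI V3's published metric may differ) is to compare the agent's action count against a human baseline on the same game:

```python
# Hypothetical action-efficiency score for an interactive benchmark:
# ratio of a human baseline's action count to the agent's, capped at 1.
# The exact formula is an assumption, not ARC-AGI V3's actual metric.

def efficiency_score(agent_actions: int, human_actions: int, solved: bool) -> float:
    """1.0 = solved with at least human-level efficiency; 0.0 = unsolved."""
    if not solved or agent_actions <= 0:
        return 0.0
    return min(1.0, human_actions / agent_actions)
```

Under any scheme like this, brute-force exploration is penalized: an agent that flails through thousands of actions scores far below one that infers the goal quickly.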
Alternative paradigms must be scalable without continuous human engineering.
Chollet’s filter for promising approaches (symbolic search, genetic algorithms, architecture changes, search-based optimization) is whether capability gains can come mainly from scaling compute/data/environments rather than expert hand-tuning.
Notable Quotes
“I think we're probably looking at AGI 2030.”
— François Chollet
“Code provides you with a verifiable reward signal.”
— François Chollet
“AGI is basically gonna be a system that can approach any new problem... with the same degree of efficiency as a human could.”
— François Chollet
“The moment models started performing well on ARC 1 was with the first reasoning models.”
— François Chollet
“I do believe that... it will turn out that it's a code base that's less than 10,000 lines of code.”
— François Chollet
Questions Answered in This Episode
What exactly does “symbolic descent” look like in practice, and how do you measure progress toward it the way we measure loss/gradients in deep learning?
Ndea says it’s not competing with coding agents—what are the first concrete tasks where symbolic program synthesis should beat the LLM+agent harness stack?
ARC V2 saturation depended heavily on harness engineering; how do you separate “model capability” from “human-provided strategy” when interpreting benchmark results?
For ARC-AGI V3, what prevents labs from building a giant distribution of synthetic games and effectively reintroducing memorization via scale?
If verifiable rewards are the unlock, what are the most promising next ‘verifiable’ domains beyond code and math, and what would it take to make them verifiable?
Transcript Preview
I think we're probably looking at AGI twenty thirty. Around the time, uh, that we're gonna be releasing like maybe ARC six or ARC seven. You're not gonna stop, uh, AI progress. I think, I think it's too late for that. And so the next question is, okay, like AI progress is here. Uh, it's actually gonna keep accelerating. How do you make use of it? How do you leverage? How do you ride the wave? That's the question to ask.
[on-hold music] Today, we're lucky to be joined by François Chollet, founder of the ARC Prize, a global competition to solve the ARC-AGI benchmark. His latest project is Ndea, a lab exploring a new paradigm in frontier AI research. François is one of the best people in the world to help us understand the current AI moment and where all of this is going. François, thank you so much for joining us today, and congrats on the launch of ARC-AGI V3.
Thanks so much for having me. I'm super excited to be here. Super exciting time to talk about AI.
So François, tell us a little bit about Ndea. So what exactly is it and what are you guys trying to achieve?
Right. So Ndea is this new AGI research lab, and we are trying some very different ideas. And so our goal is basically to build this new branch of machine learning that will be much closer to optimal, unlike, unlike deep learning.
All of us right now are sort of taken by what's going on with code. Uh, I have sort of this viral moment right now where I got to forty thousand stars this morning-
Oh, wow
... on, uh, GStack. So it's like, oh, this is an open source project that now is one of the biggest ones, and I have more than a hundred PRs from contributors to deal with. I guess you're, you know, one of the best people to talk to about this because you're, you're actually literally coming up with something that is a totally different pathway.
That's right. That's right. So, uh, what we're doing at Ndea is, uh, we're doing program synthesis research. And when I talk about program synthesis, often people ask me, "Oh, so are you doing like codegen? Are you, uh, building an alternative to coding agents?" And it's actually not at all what we are doing. We are working at a much, much more, uh, much lower level than that. Uh, what we're actually doing is that we are trying to build a new branch of machine learning, an alternative to deep learning itself, uh, rather than like coding agents. Coding agents are like this very, very high level last layer piece of the stack, and we're actually trying to rebuild the whole stack on top of different foundations. So we're building a new learning substrate that's very different from, you know, parametric learning, deep learning. So if you go back to, uh, the problem of machine learning, you have some input data, some target data, and you're trying to find a function that will map the inputs to the targets that will hopefully generalize to new inputs. And, uh, if you're doing deep learning, what you're doing is that you have this parametric curve that serves as your, as your function, as your model, and you're trying to fit the parameters of the curve via gradient descent. And this is basically what we are doing, uh, except we are replacing the parametric curve with a symbolic model that is meant to be as small as possible. It's like the simplest, uh, possible, uh, model to explain the data, to model what's going on. Uh, and of course, if you're doing that, you cannot apply gradient descent anymore, so we are building something that we call, uh, symbolic descent, which is like the symbolic space equivalent of gradient descent. The idea is to build this new machine learning engine that's giving you, uh, extremely concise symbolic models of the data you're feeding into it, and then we are gonna make it scale. 
And so everything you're doing with machine learning today, with parametric curves, we should be able to do it, uh, with symbolic models in the future in a, in a way that will be much, much closer to optimality. Much closer to optimality in the sense that you're gonna need much less data to obtain the models. The models are gonna run much more efficiently at, at inference time because they're gonna be so small. And because they are so small, they will also generalize much better and compose much better. You know, the, the minimum description length principle, that the model of the data that is most likely to generalize is the shortest. And I think you cannot find a model like this if you're doing parametric learning. You need to, you need to try symbolic learning.
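The "symbolic descent" idea described above could be caricatured as greedy local search over program edits, minimizing prediction error plus description length. This is purely a speculative illustration over a toy three-primitive DSL, not Ndea's actual method.

```python
# Speculative sketch of "symbolic descent": start from a random small
# program and greedily apply the single edit (insert/delete/substitute)
# that most reduces error + program length. Toy DSL invented for
# illustration; not Ndea's algorithm.
import random

OPS = ["inc", "double", "square"]
APPLY = {"inc": lambda x: x + 1, "double": lambda x: 2 * x, "square": lambda x: x * x}

def run(program, x):
    for op in program:
        x = APPLY[op](x)
    return x

def cost(program, pairs):
    # Fit term (absolute error) + MDL term (program length)
    error = sum(abs(run(program, i) - o) for i, o in pairs)
    return error + len(program)

def neighbors(program):
    """All programs one edit away: insertions, deletions, substitutions."""
    for i in range(len(program) + 1):
        for op in OPS:
            yield program[:i] + [op] + program[i:]
    for i in range(len(program)):
        yield program[:i] + program[i + 1:]
        for op in OPS:
            yield program[:i] + [op] + program[i + 1:]

def symbolic_descent(pairs, steps=50, seed=0):
    random.seed(seed)
    program = [random.choice(OPS)]
    for _ in range(steps):
        best = min(neighbors(program), key=lambda p: cost(p, pairs))
        if cost(best, pairs) >= cost(program, pairs):
            break  # local minimum reached
        program = best
    return program
```

The analogy to gradient descent: instead of following a gradient through parameter space, the search follows the steepest cost reduction through a discrete space of program edits.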