Y Combinator | François Chollet: Why ARC-AGI Shows Scaling Hits a Wall
ARC-AGI benchmarks expose where LLMs stop at pattern recognition; Ndea pursues program synthesis as a more efficient alternative to gradient descent.
At a glance
WHAT IT’S REALLY ABOUT
Chollet argues symbolic learning and ARC benchmarks reveal scaling limits
- Chollet predicts “AGI around 2030,” but distinguishes automation of valuable tasks from human-like general intelligence defined by sample-efficient skill acquisition across novel domains.
- Ndea is positioned as a new ML paradigm that replaces parametric neural models with minimal symbolic programs found via “symbolic descent,” aiming for far greater data efficiency, compositionality, and inference efficiency.
- He argues recent breakthroughs in coding agents come primarily from verifiable reward signals and RL-style post-training loops, not a large jump in underlying fluid intelligence.
- ARC-AGI’s evolution (V1→V2→V3) is designed to track the remaining gap to human capabilities and resist benchmark “targeting,” shifting from static pattern induction to interactive, efficiency-scored agentic evaluation.
- Chollet encourages more diverse AI research bets (including revived older ideas) so long as they can scale without human bottlenecks, and he shares practical lessons from building Keras into a major open-source project.
IDEAS WORTH REMEMBERING
5 ideas
Scaling pretraining can miss key ingredients of intelligence.
ARC V1 scores stayed low for base LLMs despite massive pretraining scale, which Chollet interprets as evidence that pretraining alone doesn't produce the kind of causal, generalizable program induction associated with fluid intelligence.
Coding agents work because code is verifiable.
Unit tests, compilation, and execution traces create trustworthy reward signals, enabling models to generate attempts, verify outcomes, and produce dense training data via post-training loops.
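This generate-then-verify loop can be sketched in a few lines. The sketch below is illustrative only (the `reward` function and toy candidates are hypothetical, not from the talk): candidate programs are scored with a binary reward based on whether they pass all unit tests, which is the kind of trustworthy signal an RL-style post-training loop can consume.

```python
def reward(candidate_fn, test_cases):
    """Binary verifiable reward: 1.0 if the candidate passes every
    test case, else 0.0. Real pipelines would execute model-generated
    code in a sandbox; here candidates are plain Python callables."""
    try:
        return 1.0 if all(candidate_fn(x) == y for x, y in test_cases) else 0.0
    except Exception:
        return 0.0  # crashing code earns no reward

# Toy "generation" step: three candidate programs for doubling a number.
candidates = [lambda x: x + x, lambda x: x * 3, lambda x: x ** 2]
tests = [(1, 2), (4, 8), (0, 0)]

# Only the first candidate passes all tests, so only it is rewarded;
# passing attempts become dense training data for the next round.
rewards = [reward(c, tests) for c in candidates]
```

Because the reward comes from execution rather than human judgment, it scales without the annotation bottleneck that Chollet says limits non-verifiable domains.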
Post-training loops can yield usefulness without much more fluid intelligence.
Chollet argues recent leaps (e.g., ARC V2 saturation) reflect targeted RL pipelines and better training/harnesses rather than a fundamental jump in “IQ,” highlighting the tradeoff between knowledge/training and intelligence.
Non-verifiable domains will improve more slowly on today’s stack.
Tasks like essay writing lack reliable formal rewards, so progress depends on costly human annotations and may stall relative to domains that can be turned into verifiable environments.
Symbolic program synthesis aims at optimality: smaller models, less data, better generalization.
Ndea’s approach replaces parameter fitting with searching for minimal symbolic models (MDL-style compression), which Chollet expects to be more data-efficient, compositional, and cheap at inference.
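The MDL intuition can be illustrated with a toy enumerative synthesizer. This is a minimal sketch under assumed simplifications, not Ndea's method or "symbolic descent": it enumerates expressions over a tiny hypothetical DSL in order of increasing size, so the first expression consistent with the examples is a minimal-length program.

```python
import itertools

# Toy DSL: arithmetic expressions over an input x (hypothetical, for illustration).
PRIMITIVES = ["x", "1", "2"]
OPS = ["+", "*"]

def enumerate_exprs(max_terms):
    """Yield expression strings in order of increasing term count,
    so the first consistent hit is a shortest-description fit."""
    for n in range(1, max_terms + 1):
        for terms in itertools.product(PRIMITIVES, repeat=n):
            for ops in itertools.product(OPS, repeat=n - 1):
                expr = terms[0]
                for op, t in zip(ops, terms[1:]):
                    expr = f"({expr} {op} {t})"
                yield expr

def synthesize(examples, max_terms=3):
    """Return the shortest DSL expression consistent with all (x, y) examples."""
    for expr in enumerate_exprs(max_terms):
        if all(eval(expr, {"x": x}) == y for x, y in examples):
            return expr
    return None

# Recover f(x) = 2x + 1 from three input/output pairs.
program = synthesize([(0, 1), (1, 3), (5, 11)])
```

Three examples suffice here because short programs generalize: unlike fitting millions of parameters, the search returns a compositional symbolic model that is trivially cheap to run at inference time.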
WORDS WORTH SAVING
5 quotes
I think we're probably looking at AGI 2030.
— François Chollet
Code provides you with a verifiable reward signal.
— François Chollet
AGI is basically gonna be a system that can approach any new problem... with the same degree of efficiency as a human could.
— François Chollet
The moment models started performing well on ARC 1 was with the first reasoning models.
— François Chollet
I do believe that... it will turn out that it's a code base that's less than 10,000 lines of code.
— François Chollet
High quality AI-generated summary created from speaker-labeled transcript.