Dwarkesh Podcast
Francois Chollet — Why the biggest AI models can't solve simple puzzles
At a glance
WHAT IT’S REALLY ABOUT
ARC Prize challenges LLM dominance and demands real progress toward machine intelligence
- Francois Chollet explains the ARC (Abstraction and Reasoning Corpus) benchmark and a new $1M ARC Prize as a way to measure and drive progress toward genuine machine intelligence, not just larger language models.
- ARC is designed like an IQ test: it requires only the core knowledge all humans share (objects, counting, basic physics, spatial patterns) yet is intentionally resistant to memorization, which makes it very hard for current LLMs despite being easy for humans.
- Chollet argues that today’s LLMs scale skill and memorized patterns, not true general intelligence, which he defines as the ability to rapidly acquire new skills and adapt to novel tasks from very little data.
- He and co-sponsor Mike Knoop hope ARC will catalyze new architectures that merge deep learning with program synthesis / discrete search, and that open, reproducible solutions will reorient AI research away from closed, LLM-only paths.
IDEAS WORTH REMEMBERING
5 ideas
ARC is explicitly built to defeat memorization-based AI.
Each ARC puzzle is a small, novel grid-based transformation task that cannot be solved by recalling seen examples; success requires synthesizing a new solution program from minimal demonstrations, much like human IQ tests.
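To make the format concrete, here is a minimal Python sketch of an ARC-style task, assuming the public JSON schema of the fchollet/ARC repository (a few "train" demonstration pairs plus a "test" input, with grids as lists of rows and cell values 0–9 encoding colors); the mirror rule here is an invented toy example, not an actual puzzle from the corpus.

```python
# Toy ARC-style task in the repository's JSON layout (an assumption for
# illustration): the hidden rule here is "mirror the grid left-to-right".
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]],      "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [0, 0]]},  # expected answer: [[5, 0], [0, 0]]
    ],
}

def mirror_lr(grid):
    """Candidate solution program: flip each row left-to-right."""
    return [row[::-1] for row in grid]

# A candidate counts as a solution only if it reproduces every
# demonstration pair exactly; then it is applied to the test input.
assert all(mirror_lr(p["input"]) == p["output"] for p in task["train"])
print(mirror_lr(task["test"][0]["input"]))  # [[5, 0], [0, 0]]
```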
Current LLMs mainly scale skill and stored patterns, not intelligence.
Chollet distinguishes between expanding a model’s bank of solution templates (skill) and the ability to quickly learn entirely new tasks from sparse data (intelligence); scaling LLMs improves the former but shows little evidence of the latter on ARC.
True general intelligence hinges on rapid adaptation to novelty.
Because real-world tasks and environments constantly change, no system can be pre-trained on every possible situation; intelligence is therefore defined as efficiently acquiring new skills and handling out-of-distribution scenarios on the fly.
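Chollet develops this definition formally in his paper "On the Measure of Intelligence"; as a loose schematic (not his exact formula, which is stated in algorithmic-information-theoretic terms), intelligence is skill-acquisition efficiency:

\[
\mathrm{Intelligence} \propto \frac{\text{skill acquired over a scope of tasks}}{\text{priors} + \text{experience}}
\]

The more skill a system attains from fewer built-in priors and less experience, the more intelligent it is under this definition.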
Hybrid architectures that combine deep learning with program search are likely necessary.
Deep learning excels at pattern recognition and intuition (System 1), while discrete program synthesis excels at data-efficient, explicit reasoning (System 2) but is computationally expensive; Chollet argues AGI will require fusing these strengths.
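As a rough illustration of the discrete-search half, the Python sketch below brute-forces compositions of a tiny, hypothetical DSL of grid primitives and accepts the first program that reproduces every demonstration; the primitive set and enumeration strategy are invented for illustration. In the hybrid systems Chollet envisions, a learned model (System 1) would rank or prune candidates instead of enumerating blindly, taming the combinatorial explosion.

```python
from itertools import product

# Hypothetical DSL of grid primitives (an assumption for illustration).
PRIMITIVES = {
    "flip_lr":   lambda g: [row[::-1] for row in g],
    "flip_ud":   lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def search(train_pairs, max_depth=3):
    """Return the first primitive composition consistent with all demos."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(grid, names=names):
                for name in names:
                    grid = PRIMITIVES[name](grid)
                return grid
            if all(program(p["input"]) == p["output"] for p in train_pairs):
                return names
    return None  # search failed within the depth budget

# Demo: the hidden rule is "rotate the grid 90 degrees clockwise".
demos = [{"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]}]
print(search(demos))  # ('flip_ud', 'transpose'), i.e. a 90-degree rotation
```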
Test-time learning and ‘active inference’ are crucial missing ingredients.
Most LLM use is static inference over frozen weights; methods like Jack Cole’s ARC approach, which fine-tune or adapt the model per task at inference time, point toward architectures that genuinely learn during problem-solving.
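The sketch below shows the general shape of test-time adaptation in PyTorch; it is a schematic in the spirit of, not a reproduction of, Jack Cole's method, and the tiny MLP, flattened 2x2 grids, and hyperparameters are all placeholder assumptions.

```python
import copy
import torch
from torch import nn

# Placeholder model: a tiny MLP over flattened 2x2 grids (an assumption;
# real ARC solvers use far richer encodings and architectures).
base_model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 4))

def solve_task(base_model, train_pairs, test_input, steps=50, lr=1e-2):
    """Adapt a copy of the model to one task's demos, then predict."""
    model = copy.deepcopy(base_model)   # the base weights stay frozen
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x = torch.stack([inp for inp, _ in train_pairs])
    y = torch.stack([out for _, out in train_pairs])
    for _ in range(steps):              # learning happens at inference time
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    with torch.no_grad():
        return model(test_input)

# Toy task: "copy the input", expressed as flattened 2x2 float grids.
demos = [(torch.tensor([1., 0., 0., 2.]), torch.tensor([1., 0., 0., 2.]))]
print(solve_task(base_model, demos, torch.tensor([0., 3., 3., 0.])))
```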
WORDS WORTH SAVING
5 quotes
If you scale up the size of your database, you are not increasing the intelligence of the system one bit.
— Francois Chollet
ARC is intended as a kind of IQ test for machine intelligence… it’s designed to be resistant to memorization.
— Francois Chollet
General intelligence is not task-specific skill scaled up to many skills.
— Francois Chollet
Intelligence is what you use when you don’t know what to do.
— Francois Chollet (quoting Jean Piaget)
OpenAI basically set back progress towards AGI by probably, like, five to ten years.
— Francois Chollet