Dwarkesh Podcast

Francois Chollet — Why the biggest AI models can't solve simple puzzles

Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they're launching today. I did a bunch of Socratic grilling throughout, but Francois’s arguments about why LLMs won’t lead to AGI are very interesting and worth thinking through. It was really fun discussing/debating the cruxes. Enjoy!

Check out the ARC-AGI Prize here: https://arcprize.org

EPISODE LINKS

* Transcript: https://www.dwarkeshpatel.com/p/francois-chollet
* Apple Podcasts: https://podcasts.apple.com/us/podcast/francois-chollet-mike-knoop-llms-wont-lead-to-agi-%241/id1516093381?i=1000658672649
* Spotify: https://open.spotify.com/episode/7bmeJQOvXGy4LYl6YoiYYP?si=obUSUEwjSA6tkB8EBcb18w
* Follow me on Twitter: https://x.com/dwarkesh_sp

TIMESTAMPS

* 00:00:00 – The ARC benchmark
* 00:11:53 – Why LLMs struggle with ARC
* 00:19:43 – Skill vs intelligence
* 00:28:38 – Do we need “AGI” to automate most jobs?
* 00:49:11 – Future of AI progress: deep learning + program synthesis
* 01:01:23 – How Mike Knoop got nerd-sniped by ARC
* 01:09:20 – Million $ ARC Prize
* 01:11:16 – Resisting benchmark saturation
* 01:18:51 – ARC scores on frontier vs open source models
* 01:27:02 – Possible solutions to ARC Prize

Francois Chollet (guest) · Dwarkesh Patel (host) · Mike Knoop (guest)

Jun 10, 2024 · 1h 34m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

The ARC Prize challenges LLM dominance and demands real progress toward machine intelligence

  1. Francois Chollet explains the ARC (Abstraction and Reasoning Corpus) benchmark and a new $1M ARC Prize as a way to measure and drive progress toward genuine machine intelligence, not just larger language models.
  2. ARC is designed like an IQ test requiring humans’ core knowledge (objects, counting, basic physics, spatial patterns) but is intentionally resistant to memorization, which makes it very hard for current LLMs despite being easy for humans.
  3. Chollet argues that today’s LLMs scale skill and memorized patterns, not true general intelligence, which he defines as the ability to rapidly acquire new skills and adapt to novel tasks from very little data.
  4. He and co-sponsor Mike Knoop hope ARC will catalyze new architectures that merge deep learning with program synthesis / discrete search, and that open, reproducible solutions will reorient AI research away from closed, LLM-only paths.

IDEAS WORTH REMEMBERING

5 ideas

ARC is explicitly built to defeat memorization-based AI.

Each ARC puzzle is a small, novel grid-based transformation task that cannot be solved by recalling seen examples; success requires synthesizing a new solution program from minimal demonstrations, much like human IQ tests.
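To make the task format concrete, here is a sketch of what an ARC-style task looks like; the grids, the `task` structure, and the `solve` function are invented for illustration and are not drawn from the official dataset:

```python
# A schematic ARC-style task: grids are lists of rows, integers are colors.
# (Illustrative only; not an item from the official ARC dataset.)
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [{"input": [[3, 0], [0, 3]]}],
}

# The "program" a solver must synthesize from the two demonstrations is,
# in this toy case, a horizontal mirror of the grid.
def solve(grid):
    return [list(reversed(row)) for row in grid]

# The synthesized program must explain every demonstration pair.
assert all(solve(p["input"]) == p["output"] for p in task["train"])
```

Each task ships only a handful of demonstration pairs, so recalling solutions from a training corpus does not help; the solver must infer the transformation fresh for every task.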

Current LLMs mainly scale skill and stored patterns, not intelligence.

Chollet distinguishes between expanding a model’s bank of solution templates (skill) and the ability to quickly learn entirely new tasks from sparse data (intelligence); scaling LLMs improves the former but shows little evidence of the latter on ARC.

True general intelligence hinges on rapid adaptation to novelty.

Because real-world tasks and environments constantly change, no system can be pre-trained on every possible situation; intelligence is therefore defined as efficiently acquiring new skills and handling out-of-distribution scenarios on the fly.

Hybrid architectures that combine deep learning with program search are likely necessary.

Deep learning excels at pattern recognition and intuition (System 1), while discrete program synthesis excels at data-efficient, explicit reasoning (System 2) but is computationally expensive; Chollet argues AGI will require fusing these strengths.
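As a toy illustration of the discrete, System 2 side of this proposal, the sketch below brute-force enumerates compositions of a few grid primitives until one explains all the demonstrations. The DSL and every name here are hypothetical, not Chollet's actual system:

```python
from itertools import product

# A tiny hypothetical DSL of grid primitives (grids are lists of rows of ints).
def identity(g):  return [list(r) for r in g]
def flip_h(g):    return [list(reversed(r)) for r in g]
def flip_v(g):    return [list(r) for r in reversed(g)]
def transpose(g): return [list(r) for r in zip(*g)]

PRIMITIVES = [identity, flip_h, flip_v, transpose]

def search_program(demos, max_depth=2):
    """Enumerate compositions of primitives (discrete program search)
    and return the first program consistent with all demonstrations."""
    for depth in range(1, max_depth + 1):
        for ops in product(PRIMITIVES, repeat=depth):
            def run(g, ops=ops):
                for op in ops:
                    g = op(g)
                return g
            if all(run(d["input"]) == d["output"] for d in demos):
                return run
    return None  # no program found within the depth budget

# One demonstration: the grid is rotated 90 degrees clockwise.
demos = [{"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]}]
program = search_program(demos)
```

Even with four primitives and depth two, the search space grows combinatorially with program length, which is the sense in which program synthesis is computationally expensive; the hybrid proposal is to use a deep network's pattern-recognition intuition to rank which branches of this search are worth exploring.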

Test-time learning and ‘active inference’ are crucial missing ingredients.

Most LLM use is static inference over frozen weights; methods like Jack Cole’s ARC approach, which fine-tune or adapt the model per task at inference time, point toward architectures that genuinely learn during problem-solving.
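To illustrate per-task adaptation at inference time in the simplest possible form, this toy "learns" a color substitution from the demonstration pairs before touching the test input. Real test-time fine-tuning updates network weights per task; this sketch only fits a lookup table, but the per-task fitting step is analogous. All names and data here are invented:

```python
def fit_color_map(demos):
    """Fit a per-task hypothesis at inference time: a cell-wise color
    substitution estimated from the demonstration pairs."""
    mapping = {}
    for d in demos:
        for row_in, row_out in zip(d["input"], d["output"]):
            for a, b in zip(row_in, row_out):
                if mapping.setdefault(a, b) != b:
                    return None  # demos are inconsistent with this hypothesis
    return mapping

def apply_map(mapping, grid):
    """Apply the fitted per-task mapping to an unseen test grid."""
    return [[mapping.get(c, c) for c in row] for row in grid]

# One demonstration pair implying "recolor 1 -> 2, leave 0 alone".
demos = [{"input": [[1, 1], [0, 1]], "output": [[2, 2], [0, 2]]}]
mapping = fit_color_map(demos)
```

The point of the toy is the shape of the loop, not the method: nothing about the task is known before inference, and the system builds its solution procedure from the demonstrations while solving, rather than relying solely on frozen weights.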

WORDS WORTH SAVING

5 quotes

If you scale up the size of your database, you are not increasing the intelligence of the system one bit.

Francois Chollet

ARC is intended as a kind of IQ test for machine intelligence… it’s designed to be resistant to memorization.

Francois Chollet

General intelligence is not task-specific skill scaled up to many skills.

Francois Chollet

Intelligence is what you use when you don’t know what to do.

Francois Chollet (quoting Jean Piaget)

OpenAI basically set back progress towards AGI by probably, like, five to ten years.

Francois Chollet

* Design and purpose of the ARC benchmark as an IQ test for machines
* Limitations of LLMs: memorization vs. true generalization and intelligence
* Core knowledge in humans vs. learned knowledge and its role in ARC
* Program synthesis, discrete search, and hybrid System 1 / System 2 architectures
* Details and goals of the $1M ARC Prize competition
* Impact of LLM hype and closed research practices on AGI progress
* Debate over scaling laws, multimodal models, and the path to AGI

High-quality AI-generated summary created from a speaker-labeled transcript.
