Grant Sanderson (@3blue1brown) – AI and the future of math

Always so much fun to chat with @3blue1brown.

AI has been making much faster progress in math than in other fields. As a result, mathematics is showing us, very concretely, what AI progress in other fields will look like.

Even within mathematics, there's a jagged landscape. What does it look like? What is the nature of the most important conceptual breakthroughs in the history of mathematics, and how different are they from what AIs are currently able to do? Does AI (on net) increase or decrease human understanding of the field? How big is the overhang from having AIs systematically try to connect ideas already in the literature? And what advice does Grant have for aspiring mathematicians, coders, and other students who are passionate about the fields being most transformed by AI?

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkesh.com/p/grant-sanderson-2
* Apple Podcasts: https://podcasts.apple.com/us/podcast/grant-sanderson-ai-and-the-future-of-math/id1516093381?i=1000774870615
* Spotify: https://open.spotify.com/episode/0X3t4uRlpVT4MXPYDIrNYX?si=HZf_0Ky2Q42tOWYZNvWi6w

𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
* Gemini 3.5 Live Translate is what I wished I'd had on my last trip to China. It detects more than 70 languages and translates them in near real-time… and it preserves your original pacing and intonation. If you're building an app that needs live translation, you should check out Gemini 3.5 Live Translate. Get started at https://ai.studio/live
* Cursor’s harness lets me use models for a huge range of tasks at the podcast. For example, Cursor cuts out the ads from each episode I produce so I can post them on Bilibili. It also helps me prep for interviews: I have a repo full of books and papers that Cursor sorts through to find the exact right file for any given question. Try Cursor yourself at https://cursor.com/dwarkesh
* Jane Street sponsors 3Blue1Brown, so Grant has gotten to spend a lot of time with various Jane Streeters. He actually just recorded an interview with a few of them, so when we sat down for this episode, he told me about some of the things he learned, like how Jane Street keeps their role definitions fuzzy to make sure their people keep learning and growing. Go check out Grant’s full interview at https://3b1b.co/janestreet

To sponsor a future episode, visit https://dwarkesh.com/advertise.

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 – AI is discovering new proofs. Is that AGI?
00:11:32 – The verification loop on conceptual breakthroughs can be a century long
00:26:12 – Will we understand an AI proof of the Riemann hypothesis?
00:38:08 – Can AI find the hidden bridges between fields?
00:53:48 – Why real-world tasks don’t fit into RL environments
01:07:07 – Good writing requires theory of mind that AI still lacks
01:16:02 – Why learning will still depend on human curation

Dwarkesh Patel (host)

Jun 30, 2026 · 1h 33m

CHAPTERS

0:00 – 3:17
IMO gold, “spiky” progress in math, and why benchmarks aren’t AGI
Dwarkesh revisits his earlier claim that an IMO-gold AI would imply AGI, and Grant explains why it instead became “just another benchmark.” They discuss how AI progress in math is uneven—strong in some subdomains, weak in others—so a single milestone doesn’t translate to universal capability.
- Benchmarks rarely create an “aha, it’s AGI” moment; they become another rung on a ladder
- Math is a “spiky frontier,” and even within math there’s fractal spikiness
- IMO categories differ sharply: geometry is much more solvable than combinatorics
- The key question isn’t “can it solve hard math?” but “what rate-limiters carry over to other work?”
3:17 – 3:42
What would solving the Riemann hypothesis actually look like? Three paths
Grant lays out different ways an AI might solve a Millennium Prize problem, especially the Riemann hypothesis, and why each implies different things about broader automation. He contrasts lightning-bolt cross-field connections, building entirely new theory “mountains,” and brute-force long proofs.
- Path 1: connect deep expertise across fields (the “lightning bolt” model)
- Path 2: build new conceptual machinery (the “mountain-building” model)
- Path 3: brute-force a very long, hard-to-digest proof (the “raw hustle” model)
- The form of the solution matters more than the headline milestone for predicting economic impact
3:42 – 8:01
Hidden bridges between fields: Montgomery–Dyson and why LLMs ‘should’ excel
They discuss the famous story of Montgomery and Freeman Dyson connecting zeta zeros to random matrix theory, as an archetype of cross-field insight. Grant argues LLM-like systems—broad experts across domains—seem naturally suited to spotting such analogies at scale.
- Riemann zeta zero statistics unexpectedly match random matrix eigenvalue statistics
- Serendipity (lunch conversations) currently drives many big cross-field insights
- LLMs’ breadth suggests they could industrialize this kind of connection-finding
- This kind of progress is distinct from the demands of many white-collar tasks (e.g., editing)
8:01 – 13:10
Beyond “solving”: the real premium is conjectures and definitions (and why it’s hard to benchmark)
Dwarkesh proposes that after theorem-proving, the next frontier is generating good problems, conjectures, and even new definitions that reshape fields. Grant agrees these are higher-status mathematical achievements, but notes they resist clean benchmarking and reward-model training.
- “Good prove theorems; great make conjectures; greatest make definitions”
- Conjecture/definition quality is subjective and hard to score as a benchmark
- Progress may appear as a ‘tone shift’ in how mathematicians use AI, not a single PR headline
- Hard-to-benchmark abilities are also hard to train with current RL/benchmark paradigms
13:10 – 23:22
Century-long verification loops: Galois, group theory, and delayed payoff
Grant uses the history of solving polynomial equations and the birth of group theory to illustrate how some conceptual breakthroughs take decades or a century to be recognized as valuable. The discussion highlights why short feedback loops (human or machine) can miss the most important advances.
- Lagrange reframed polynomial solvability via symmetry/permutations; Abel proved general quintic unsolvability
- Galois introduced deeper abstraction about underlying symmetries, but was rejected and poorly understood initially
- Recognition required later interpreters (Liouville, Jordan) to formalize and disseminate the ideas
- Ultimate ‘verification’ arrived much later via broad utility (physics symmetries, quarks, cryptography)
23:22 – 35:37
Proof vs explanation: will AI progress deepen human understanding or produce alien math?
Dwarkesh asks whether AI might prove major theorems without improving our understanding. Grant argues it depends on whether progress comes via bridges, new theory-building, or brute-force reasoning, and introduces the idea of “unsolved expository problems” where results exist but intuition lags.
- Some solutions are naturally human-parsable (small bridging ideas)
- New theory-building can feel ‘alien’ and take years to digest (ABC conjecture as a cautionary example)
- There’s a meaningful gap between a proof and an explanation; understanding can lag behind correctness
- Compression/conciseness may be a proxy for elegance and interpretability
35:37 – 38:07
Who explains the future math? From expositor to curator (and why humans may still matter)
They explore what roles remain for humans if AIs can both prove and explain. Grant suggests the lasting human value may shift toward curation—helping others navigate what’s worth learning—analogous to museum curatorship, driven by trust and social motivation.
- Great researchers are often lucid writers; AI might inherit both abilities, not just theorem-proving
- Even with perfect explanations, people want trusted guides to choose what to focus on
- Curation is already much of educational/content work: deciding what’s worth saying and showing
- Social trust and relationships shape motivation more than objective quality alone
38:07 – 53:50
Engineering discovery: multi-agent ‘serendipity,’ context resets, and entropy in research
Grant and Dwarkesh discuss how digital minds can be parallelized and systematically diversified to search idea-space. They focus on advantages like restarting from fresh contexts, exploring prove/disprove branches, and deliberately injecting different biases to avoid local minima.
- Parallelization applies a capability ‘waterline’ across many problems, not just one rare genius
- Agents can be designed to mimic institute-style cross-pollination and serendipitous conversations
- Refreshing context (starting over) can escape misleading problem framings, which is useful in contests and research
- Diversity of heuristics/biases may counter “entropy collapse” where models converge on the same style
53:50 – 56:33
Why math (and code) advance faster than computer-use: verifiability vs grindability
Dwarkesh argues that fast progress in math isn’t just because answers are verifiable; it’s because training is grindable and containerizable. They compare this to computer-use tasks where bot detection, high rollout costs, and changing environments limit large-scale reinforcement learning.
- ‘Grindability’ enables massive parallel rollouts and clearer credit assignment
- Code and math are unusually containerizable and deterministic compared to real-world tasks
- Computer-use is verifiable but expensive to simulate at scale (websites, bot detectors)
- Sample inefficiency makes large-scale repetition crucial in current deep learning paradigms
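As a toy illustration of what a "grindable" task looks like (this is not from the episode), the sketch below runs many cheap attempts against a deterministic checker and keeps only the verified ones. The factoring task, the deliberately weak random "policy", and the names `verify`, `rollout`, and `grind` are all hypothetical choices made for this example.

```python
import random
from typing import List, Tuple

def verify(n: int, candidate: Tuple[int, int]) -> bool:
    """Cheap, deterministic check: is candidate a nontrivial factorization of n?"""
    a, b = candidate
    return a > 1 and b > 1 and a * b == n

def rollout(n: int, rng: random.Random) -> Tuple[int, int]:
    """One attempt by a deliberately weak 'policy': guess a divisor at random."""
    a = rng.randint(2, n - 1)
    return a, n // a

def grind(n: int, num_rollouts: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Run many independent rollouts and keep only the verified successes."""
    rng = random.Random(seed)  # seeded so a run can be reproduced exactly
    attempts = (rollout(n, rng) for _ in range(num_rollouts))
    return [c for c in attempts if verify(n, c)]

if __name__ == "__main__":
    wins = grind(91, 1000)
    print(f"{len(wins)} verified successes out of 1000 rollouts")
```

Every rollout here is independent, costs almost nothing, and is scored without a human in the loop, which is the property the bullets contrast with computer-use tasks, where each attempt touches live websites and bot detectors.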
56:33 – 1:06:52
Lean, Mathlib, and autonomous exploration: what formalization uniquely enables
They debate whether formal proof systems like Lean are central to current breakthroughs, concluding they’re not strictly necessary for many headline results—but may unlock a different regime: long-running, self-verifying, open-ended mathematical exploration. The conversation highlights the value of “green checkmark” certainty and the prospect of endlessly extending formal libraries.
- Recent successes often occur in natural language; Lean may be overrated as the *only* driver of progress
- Formalization enables ‘AlphaZero-style’ self-play for math: run for years without human checking
- A fully formal “AI Mathlib” could explore vast trees of logic and definitions autonomously
- Formal proofs mitigate the ‘insufferable’ trust problem when models generate many papers with nonzero error rates
1:06:52 – 1:15:49
Why AI writing lags: novelty, non-modularity, and missing theory of mind
They dig into why writing remains difficult even as math and code improve: writing’s product is the text itself, not a separable functional artifact. Grant connects weak theory-of-mind to embodiment and empathy, using a Botox study anecdote to illustrate how humans simulate others’ feelings to understand them.
- Writing quality depends on deliberate unpredictability and original insight, not just correctness
- Unlike code/math, writing isn’t modular; every sentence is “the product,” so slop is more visible
- Models can explain/distill but struggle to produce genuinely insightful narratives
- Theory of mind may rely on embodied simulation; LLMs lack human-like mechanisms for mentalizing
1:15:49 – 1:22:30
Using LLMs to learn: best practices, limits, and the value of human-authored structure
Grant and Dwarkesh compare LLM explanations to Wikipedia—useful but often a local minimum constrained by surface correctness. They recommend using models to find great human resources, while relying on carefully curated textbooks/lectures for motivation and sequencing, and using the LLM for targeted clarification rather than full guidance.
- “Who matters more than what”: author/teacher quality dominates topic choice for learning
- LLMs are strong at pointing to references, resources, and alternative explanations
- Best learning stack: human-crafted curriculum + LLM for pruning/clarification + practice
- LLMs struggle to reframe a learner’s mistaken mental model; great teachers can ‘jujitsu’ it into insight
1:22:30 – 1:33:39
Careers, funding, and the practical value of accelerated math
Grant advises students to think concretely about value creation and the funding structures behind math careers, regardless of AI progress. They discuss potential economic spillovers from accelerated applied math (e.g., PDEs, simulation) while acknowledging that some subfields may remain distant from practical impact.
- Career advice: understand where money comes from (teaching, grants, prestige, public-good funding)
- Teaching may be among the most stable post-AGI roles due to its relational/coaching nature
- Applied areas (PDEs, simulation) plausibly yield direct economic benefits; pure math spillovers are uncertain
- A faster ‘math engine’ increases the leverage of human judgment in directing useful applications
