No Priors Ep. 90 | With Google's DeepMind's AlphaProof Team

In this week’s episode of No Priors, Sarah and Elad sit down with the Google DeepMind team behind AlphaProof, Laurent Sartran, Rishi Mehta, and Thomas Hubert. AlphaProof is a new reinforcement learning-based system for formal math reasoning that recently reached a silver-medal standard in solving International Mathematical Olympiad problems. They dive deep into AI and its role in solving complex mathematical problems, featuring insights into AlphaProof and its capabilities. They cover its functionality, unique strengths in reasoning, and the challenges it faces as it scales. The conversation also explores the motivations behind AI in math, practical applications, and how verifiability and human input come into play within a reinforcement learning approach. The DeepMind team shares advice and future perspectives on where math and AI are headed. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Rishicomplex | @LaurentSartran | @ThomasHubert Show Notes: 0:00 Personal introductions 2:19 Achieving silver medal in IMO competition 3:52 How AlphaProof works 5:56 AlphaProof’s strengths within mathematical reasoning 8:56 Challenges in scaling AlphaProof 13:40 Why solve math? 17:50 Pursuing knowledge versus practical applications 21:30 Insights on verifying correctness within reinforcement learning 28:27 How AI could foster more collaboration among mathematicians 30:28 Surprising insights from AI proof generation 34:17 Future of math and AI: advice for math enthusiasts and researchers

Sarah GuohostLaurent SifreguestThomas HubertguestRishi MehtaguestElad Gilhost

Nov 14, 202439mWatch on YouTube ↗

CHAPTERS

0:06 – 2:49
Meet the AlphaProof team: backgrounds from AlphaZero, Go, and applied ML
Sarah Guo welcomes Thomas Hubert, Rishi Mehta, and Laurent Sifre and frames AlphaProof as an AI system for finding and verifying mathematical proofs. The team shares how their paths through games (chess/Go) and ML research led them to formal mathematics and proof search.
- •AlphaProof positioned as a leap from game-playing to rigorous mathematical reasoning
- •Rishi’s inspiration from AlphaZero’s “alien” chess and joining DeepMind
- •Thomas’s focus on systems that improve by “thinking more” with more compute
- •Laurent’s journey from Go/AlphaGo lineage to transformers and math potential
2:49 – 4:24
Why IMO is different from chess: no opponent, pure cognition, harder navigation
The conversation contrasts IMO-style math problems with board games. Unlike self-play with an opponent, math offers no adversary—progress comes from structuring thought, exploring ideas, and persisting through ambiguity.
- •Board games benefit from self-play against an evenly matched opponent
- •Math is ‘purely cognitive’: you can’t run real-world experiments for feedback
- •Hard problems raise the question of how to systematically search for progress
- •IMO context: six extremely hard problems, measuring deep reasoning
4:24 – 6:27
AlphaProof architecture: adapting AlphaZero to formal proof generation
Laurent explains AlphaProof by mapping AlphaZero’s components—neural network, large-scale RL, and planning/search—into the proof domain. Instead of choosing a move, the agent generates lines of formal proof and uses machine-checkable verification as the feedback signal.
- •AlphaZero components: neural net + RL + planning/search
- •Key adaptation: handle (effectively) infinite action spaces (proof steps)
- •Use of formal language (“code to write math”) enables verification
- •Closed loop: find correct proofs → learn from them → improve over time
6:27 – 7:56
The search space in math: strengths, weaknesses, and what takes days
The team discusses why math proof search is far larger than chess: infinite rewrites, novel constructions, and theory dependencies. They describe where AlphaProof currently excels (algebra/number theory) and where it struggles (combinatorics; geometry not attempted in this IMO).
- •Proof steps aren’t just selecting known tricks—often require invention
- •Infinite ways to rewrite expressions; few lead to progress
- •Strongest IMO areas: algebra and number theory
- •Weaker area: combinatorics; geometry not applied this year
7:56 – 9:28
Test-Time Reinforcement Learning (TTRL): learning near the problem at inference time
Rishi introduces TestTime RL as a method for tackling unfamiliar problems by generating many nearby variations, solving them, and learning from those successes. This “bespoke experimentation” at inference time helps hill-climb toward the original problem, sometimes over days of compute.
- •When base policy can’t solve a problem, generate many nearby variants
- •Solve variants → learn → iteratively move toward the target problem
- •Analogy to scientific experimentation when priors don’t apply
- •Explains why some IMO problems solved in minutes vs. after days
9:28 – 12:09
Scaling limits: theory-building and the bottleneck of formalizing combinatorics
Elad asks about obstacles to dramatic future gains; Thomas and Laurent highlight missing capabilities. AlphaProof doesn’t yet build new theories/definitions the way humans do, and many combinatorics problems are hard to translate into Lean and solve due to library/tooling gaps.
- •Primary gap: theory building (creating new concepts and frameworks)
- •Harder domains may require cross-area tools (e.g., analysis for number theory)
- •Combinatorics is often “obfuscated” in natural language, hard to formalize
- •Lean library limitations make some statements awkward to express/solve
12:09 – 14:11
Big milestones and ‘math Turing tests’: Hilbert, Millennium problems, and unknown difficulty
The discussion turns to whether there’s an equivalent of Hilbert’s problems as a benchmark for AI in mathematics. They note the uncertainty in estimating how far systems are from results like the Riemann hypothesis, and emphasize that solving such problems likely requires creating brand-new mathematics.
- •Hilbert’s 23 problems and Millennium problems as historical ‘north stars’
- •Hard to estimate distance to RH: could be orders of magnitude away
- •Major breakthroughs may require novel theories and objects
- •Open question: will theory-building emerge from scaling problem-solving, or need explicit incentives?
14:11 – 18:20
Why solve math: language of the universe, AGI reasoning, and ‘thinking more’ as a testbed
Sarah asks why work on math at all—intrinsic value vs transfer to other domains. The team argues math is both foundational to technology/science and a uniquely rigorous arena for developing general reasoning systems that improve with more compute.
- •Math as a compact, verifiable domain with unbounded complexity
- •Motivation 1: math underpins understanding/prediction of the natural world
- •Motivation 2: math exercises abstraction/generalization central to AGI
- •Thomas: proving as one of the last major domains requiring ‘thinking more’
18:20 – 22:00
Knowledge vs applications: what this could enable (including software verification)
Elad probes application areas and whether this is “for the love of knowledge.” The team highlights both curiosity-driven math (e.g., unifying programs like Langlands) and concrete benefits like scalable formal verification of software and broader transfer of RL and inference-time compute techniques.
- •Pure math often becomes practical later (historical pattern)
- •Laurent’s interest in unifying structures (e.g., Langlands program)
- •Thomas: code verification—prove properties instead of relying on tests
- •Rishi: tech transfer—scaling RL and inference-time compute applies widely
22:00 – 24:28
Verification and RL beyond math: grounding rewards in fuzzy human domains
Sarah asks how the ‘verifiable reward’ lesson transfers to language tasks like humor. Laurent distinguishes domains with ground truth (machine-checkable) from those requiring human or real-world grounding, and argues RL remains a useful framework even without perfect verification.
- •Some domains have hard ground truth; others are inherently fuzzy (e.g., funny)
- •In fuzzy domains, humans (or the real world) provide reward grounding
- •Perfect verification enables much larger scale than human feedback
- •RL framing still applies; what changes is the source/quality of reward
24:28 – 27:52
Human expertise + RL: interpretability, seeding with supervised data, and ‘alien’ proof style
The hosts explore whether expert human labeling (e.g., Terry Tao) would help. Thomas explains AlphaProof learns from self-discovered valid proofs and can look alien; human preference signals could optimize for readability/teaching, while supervised formalized proofs could reduce exploration and accelerate progress.
- •Current loop: discover proofs → verify → learn; style may look ‘alien’
- •Expert preferences could steer toward human-readable proofs
- •Large supervised datasets of formalized proofs could cut exploration cost
- •Rishi: specialist human data seeds behavior; RL can then push to superhuman
27:52 – 30:45
Math community impact: formal checking as a collaboration multiplier (humans + AIs)
Elad asks about access and how this could change mathematical practice. Laurent notes AlphaProof isn’t available yet and isn’t close to top mathematicians, but formal proof systems could enable larger, more distributed collaborations by removing trust/verification bottlenecks—like astronomy’s citizen science model.
- •No public access yet; current level framed as impressive ‘high school’ math
- •Formal systems reduce the need to personally verify collaborators’ work
- •Potential for scaling collaboration across many humans and AIs
- •Near-term vision: act like a ‘graduate student’ responding to mathematician questions
30:45 – 34:46
A surprising ‘alien’ construction: IMO Problem 6 and finding the bound C=2
Sarah asks for an example of something surprising in AlphaProof’s outputs. Rishi walks through the IMO Problem 6 setup and highlights AlphaProof’s key constructive insight—a funky function that demonstrates the maximum number of distinct values is 2—on a problem solved by only a handful of contestants.
- •Problem 6 described as exceptionally hard (only ~5 solves among 500+)
- •Goal: show a universal bound C on distinct values of an expression
- •AlphaProof proves upper bound C≤2, then must show C is not 1
- •Key construction: a specific function yielding two distinct values, implying C=2
34:46 – 39:21
Advice for math learners and researchers: learn Lean, use formalism for learning, focus on questions
In closing, Sarah asks for advice in a world with AlphaProof-like tools. The team encourages learning formal math tooling (Lean) early, using formal proofs as an educational microscope for any level of detail, and developing the human skill of choosing and decomposing worthwhile questions.
- •Formal math (Lean) likely to grow as a collaboration and tooling standard
- •Formal proofs let learners zoom into ‘trivial’ steps at the right granularity
- •Interactive feedback reduces self-deception and speeds learning
- •As machines answer more, humans’ edge shifts to taste: asking/breaking down the right questions

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.

iOS

Android

Claude

Chrome

Meet the AlphaProof team: backgrounds from AlphaZero, Go, and applied ML

Why IMO is different from chess: no opponent, pure cognition, harder navigation

AlphaProof architecture: adapting AlphaZero to formal proof generation

The search space in math: strengths, weaknesses, and what takes days

Test-Time Reinforcement Learning (TTRL): learning near the problem at inference time

Scaling limits: theory-building and the bottleneck of formalizing combinatorics

Big milestones and ‘math Turing tests’: Hilbert, Millennium problems, and unknown difficulty

Why solve math: language of the universe, AGI reasoning, and ‘thinking more’ as a testbed

Knowledge vs applications: what this could enable (including software verification)

Verification and RL beyond math: grounding rewards in fuzzy human domains

Human expertise + RL: interpretability, seeding with supervised data, and ‘alien’ proof style

Math community impact: formal checking as a collaboration multiplier (humans + AIs)

A surprising ‘alien’ construction: IMO Problem 6 and finding the bound C=2

Advice for math learners and researchers: learn Lean, use formalism for learning, focus on questions

Get more out of YouTube videos.