No PriorsNo Priors Ep. 90 | With Google's DeepMind's AlphaProof Team
CHAPTERS
- 0:06 – 2:49
Meet the AlphaProof team: backgrounds from AlphaZero, Go, and applied ML
Sarah Guo welcomes Thomas Hubert, Rishi Mehta, and Laurent Sifre and frames AlphaProof as an AI system for finding and verifying mathematical proofs. The team shares how their paths through games (chess/Go) and ML research led them to formal mathematics and proof search.
- •AlphaProof positioned as a leap from game-playing to rigorous mathematical reasoning
- •Rishi’s inspiration from AlphaZero’s “alien” chess and joining DeepMind
- •Thomas’s focus on systems that improve by “thinking more” with more compute
- •Laurent’s journey from Go/AlphaGo lineage to transformers and math potential
- 2:49 – 4:24
Why IMO is different from chess: no opponent, pure cognition, harder navigation
The conversation contrasts IMO-style math problems with board games. Unlike self-play with an opponent, math offers no adversary—progress comes from structuring thought, exploring ideas, and persisting through ambiguity.
- •Board games benefit from self-play against an evenly matched opponent
- •Math is ‘purely cognitive’: you can’t run real-world experiments for feedback
- •Hard problems raise the question of how to systematically search for progress
- •IMO context: six extremely hard problems, measuring deep reasoning
- 4:24 – 6:27
AlphaProof architecture: adapting AlphaZero to formal proof generation
Laurent explains AlphaProof by mapping AlphaZero’s components—neural network, large-scale RL, and planning/search—into the proof domain. Instead of choosing a move, the agent generates lines of formal proof and uses machine-checkable verification as the feedback signal.
- •AlphaZero components: neural net + RL + planning/search
- •Key adaptation: handle (effectively) infinite action spaces (proof steps)
- •Use of formal language (“code to write math”) enables verification
- •Closed loop: find correct proofs → learn from them → improve over time
- 6:27 – 7:56
The search space in math: strengths, weaknesses, and what takes days
The team discusses why math proof search is far larger than chess: infinite rewrites, novel constructions, and theory dependencies. They describe where AlphaProof currently excels (algebra/number theory) and where it struggles (combinatorics; geometry not attempted in this IMO).
- •Proof steps aren’t just selecting known tricks—often require invention
- •Infinite ways to rewrite expressions; few lead to progress
- •Strongest IMO areas: algebra and number theory
- •Weaker area: combinatorics; geometry not applied this year
- 7:56 – 9:28
Test-Time Reinforcement Learning (TTRL): learning near the problem at inference time
Rishi introduces TestTime RL as a method for tackling unfamiliar problems by generating many nearby variations, solving them, and learning from those successes. This “bespoke experimentation” at inference time helps hill-climb toward the original problem, sometimes over days of compute.
- •When base policy can’t solve a problem, generate many nearby variants
- •Solve variants → learn → iteratively move toward the target problem
- •Analogy to scientific experimentation when priors don’t apply
- •Explains why some IMO problems solved in minutes vs. after days
- 9:28 – 12:09
Scaling limits: theory-building and the bottleneck of formalizing combinatorics
Elad asks about obstacles to dramatic future gains; Thomas and Laurent highlight missing capabilities. AlphaProof doesn’t yet build new theories/definitions the way humans do, and many combinatorics problems are hard to translate into Lean and solve due to library/tooling gaps.
- •Primary gap: theory building (creating new concepts and frameworks)
- •Harder domains may require cross-area tools (e.g., analysis for number theory)
- •Combinatorics is often “obfuscated” in natural language, hard to formalize
- •Lean library limitations make some statements awkward to express/solve
- 12:09 – 14:11
Big milestones and ‘math Turing tests’: Hilbert, Millennium problems, and unknown difficulty
The discussion turns to whether there’s an equivalent of Hilbert’s problems as a benchmark for AI in mathematics. They note the uncertainty in estimating how far systems are from results like the Riemann hypothesis, and emphasize that solving such problems likely requires creating brand-new mathematics.
- •Hilbert’s 23 problems and Millennium problems as historical ‘north stars’
- •Hard to estimate distance to RH: could be orders of magnitude away
- •Major breakthroughs may require novel theories and objects
- •Open question: will theory-building emerge from scaling problem-solving, or need explicit incentives?
- 14:11 – 18:20
Why solve math: language of the universe, AGI reasoning, and ‘thinking more’ as a testbed
Sarah asks why work on math at all—intrinsic value vs transfer to other domains. The team argues math is both foundational to technology/science and a uniquely rigorous arena for developing general reasoning systems that improve with more compute.
- •Math as a compact, verifiable domain with unbounded complexity
- •Motivation 1: math underpins understanding/prediction of the natural world
- •Motivation 2: math exercises abstraction/generalization central to AGI
- •Thomas: proving as one of the last major domains requiring ‘thinking more’
- 18:20 – 22:00
Knowledge vs applications: what this could enable (including software verification)
Elad probes application areas and whether this is “for the love of knowledge.” The team highlights both curiosity-driven math (e.g., unifying programs like Langlands) and concrete benefits like scalable formal verification of software and broader transfer of RL and inference-time compute techniques.
- •Pure math often becomes practical later (historical pattern)
- •Laurent’s interest in unifying structures (e.g., Langlands program)
- •Thomas: code verification—prove properties instead of relying on tests
- •Rishi: tech transfer—scaling RL and inference-time compute applies widely
- 22:00 – 24:28
Verification and RL beyond math: grounding rewards in fuzzy human domains
Sarah asks how the ‘verifiable reward’ lesson transfers to language tasks like humor. Laurent distinguishes domains with ground truth (machine-checkable) from those requiring human or real-world grounding, and argues RL remains a useful framework even without perfect verification.
- •Some domains have hard ground truth; others are inherently fuzzy (e.g., funny)
- •In fuzzy domains, humans (or the real world) provide reward grounding
- •Perfect verification enables much larger scale than human feedback
- •RL framing still applies; what changes is the source/quality of reward
- 24:28 – 27:52
Human expertise + RL: interpretability, seeding with supervised data, and ‘alien’ proof style
The hosts explore whether expert human labeling (e.g., Terry Tao) would help. Thomas explains AlphaProof learns from self-discovered valid proofs and can look alien; human preference signals could optimize for readability/teaching, while supervised formalized proofs could reduce exploration and accelerate progress.
- •Current loop: discover proofs → verify → learn; style may look ‘alien’
- •Expert preferences could steer toward human-readable proofs
- •Large supervised datasets of formalized proofs could cut exploration cost
- •Rishi: specialist human data seeds behavior; RL can then push to superhuman
- 27:52 – 30:45
Math community impact: formal checking as a collaboration multiplier (humans + AIs)
Elad asks about access and how this could change mathematical practice. Laurent notes AlphaProof isn’t available yet and isn’t close to top mathematicians, but formal proof systems could enable larger, more distributed collaborations by removing trust/verification bottlenecks—like astronomy’s citizen science model.
- •No public access yet; current level framed as impressive ‘high school’ math
- •Formal systems reduce the need to personally verify collaborators’ work
- •Potential for scaling collaboration across many humans and AIs
- •Near-term vision: act like a ‘graduate student’ responding to mathematician questions
- 30:45 – 34:46
A surprising ‘alien’ construction: IMO Problem 6 and finding the bound C=2
Sarah asks for an example of something surprising in AlphaProof’s outputs. Rishi walks through the IMO Problem 6 setup and highlights AlphaProof’s key constructive insight—a funky function that demonstrates the maximum number of distinct values is 2—on a problem solved by only a handful of contestants.
- •Problem 6 described as exceptionally hard (only ~5 solves among 500+)
- •Goal: show a universal bound C on distinct values of an expression
- •AlphaProof proves upper bound C≤2, then must show C is not 1
- •Key construction: a specific function yielding two distinct values, implying C=2
- 34:46 – 39:21
Advice for math learners and researchers: learn Lean, use formalism for learning, focus on questions
In closing, Sarah asks for advice in a world with AlphaProof-like tools. The team encourages learning formal math tooling (Lean) early, using formal proofs as an educational microscope for any level of detail, and developing the human skill of choosing and decomposing worthwhile questions.
- •Formal math (Lean) likely to grow as a collaboration and tooling standard
- •Formal proofs let learners zoom into ‘trivial’ steps at the right granularity
- •Interactive feedback reduces self-deception and speeds learning
- •As machines answer more, humans’ edge shifts to taste: asking/breaking down the right questions