How a reasoning model cracked an 80-year-old math problem — the OpenAI Podcast Ep. 20

Last month, AI found something mathematicians had missed for decades. Reasoning researchers Alexander Wei, Hongxun Wu, and Lijie Chen join the podcast to discuss how a general-purpose model helped disprove an 80-year-old conjecture from the famed mathematician Paul Erdős. They walk through the moment the result started looking real, what it took to verify the proof, and what has happened since they shared the discovery with the world. They also explore what this means for the future of math and for researchers learning to work with AI.

Chapters
0:44 AI and the International Mathematical Olympiad and International Olympiad in Informatics
6:35 An OpenAI model disproves the Erdős unit distance conjecture
8:33 Running the model and checking the proof
11:04 Why general models matter for discovery
15:55 Creativity, tools, and how the proof worked
18:25 Why AI should feel empowering for mathematicians
22:31 Advice for researchers using AI
27:24 What comes next for math and AI research
37:30 Cryptography, quantum computing, and the future

Host: Andrew Mayne. Guests: Lijie Chen, Hongxun Wu, Alexander Wei.

Jun 4, 2026 · 41m

WHAT IT’S REALLY ABOUT

OpenAI reasoning model disproves Erdős conjecture, showing discovery potential

The team reports that an OpenAI reasoning model generated a credible disproof of Erdős’s unit distance conjecture, a long-standing central problem in combinatorial/discrete geometry.
They emphasize that the breakthrough came from a general-purpose model using more test-time compute (letting it “think longer”) rather than a narrow math-only system trained to a benchmark.
Verification relied on multiple layers of checking—model self-checking followed by several days of review by mathematically trained colleagues—reflecting how “too good to be true” results are stress-tested.
The proof’s key novelty, as described, was bridging distant fields (notably class field theory/number theory with combinatorial geometry), which they interpret as evidence of genuine creative synthesis.
The speakers argue this should feel empowering to mathematicians and researchers, shifting humans toward digesting, generalizing, and building new theory while models handle more of the exploratory and computational workload.
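For readers new to the problem: Erdős's unit distance question asks how many pairs among n points in the plane can lie at distance exactly 1 from each other. The minimal sketch below only illustrates that definition by counting such pairs for a given point set. It is our own illustration, not code from the episode. The function name count_unit_pairs is hypothetical, and so is the choice of integer coordinates, where the plane is rescaled so the "unit" has squared length unit_sq, which keeps every comparison exact.

```python
from itertools import combinations


def count_unit_pairs(points, unit_sq=1):
    """Count unordered pairs whose squared Euclidean distance equals unit_sq.

    Assumption (for illustration only): points are integer (x, y) tuples and
    the "unit" length is sqrt(unit_sq), so no floating-point error can occur.
    """
    count = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == unit_sq:
            count += 1
    return count


# A 3x3 integer grid: with unit_sq=1, the unit pairs are horizontal and
# vertical neighbors (6 + 6 of them).
grid = [(x, y) for x in range(3) for y in range(3)]
print(count_unit_pairs(grid))             # 12
# Rescaling so sqrt(5) is the unit picks out "knight's move" pairs instead.
print(count_unit_pairs(grid, unit_sq=5))  # 8
```

Counting pairs for a fixed configuration is the easy part. The conjecture is about how fast the maximum possible count can grow as n increases.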

IDEAS WORTH REMEMBERING

5 ideas

Test-time compute is a major driver of reasoning gains.

They describe a clear pattern: giving the model more time/compute increases correctness, with reported performance approaching ~50% success on the hard unit-distance task under large budgets.

General-purpose capability can yield frontier math results without benchmark-specific training.

The team frames the result as emerging from a broadly capable model (Codex-like tool use, web lookup, Python execution) rather than a system narrowly tuned for one competition or theorem-proving setup.

The breakthrough relied on cross-domain synthesis, not just grinding search.

At a high level, they attribute the proof's novelty to connecting techniques from class field theory and number theory to combinatorial geometry, an unusual bridge that still required delicate execution.

Validation requires human-style skepticism and multi-stage review.

Their process was: ask the model to check itself, then enlist internal mathematicians to attempt to find errors over several days until confidence rose from “no way” to “likely correct.”

AI outputs can seed rapid follow-on progress by humans.

They claim mathematicians used the construction’s intuition to improve bounds and to tackle other problems (they mention a follow-on disproof of a sum-product conjecture variant over the reals).
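As background on why number theory touches this geometric problem at all, here is the classical picture. It is not the model's construction, which the speakers tie to class field theory. In a square integer grid rescaled so that the "unit" is sqrt(m), a point far enough from the grid's edge has one unit-distance neighbor for every integer solution of x² + y² = m. The sketch below counts those solutions directly and checks the count against Jacobi's two-square formula; the helper names r2_brute and r2_jacobi are hypothetical.

```python
import math


def r2_brute(m):
    """Count integer pairs (x, y) with x*x + y*y == m by direct search."""
    count = 0
    r = math.isqrt(m)
    for x in range(-r, r + 1):
        rest = m - x * x
        y = math.isqrt(rest)
        if y * y == rest:
            # y and -y are two distinct solutions unless y == 0.
            count += 1 if y == 0 else 2
    return count


def r2_jacobi(m):
    """Jacobi's two-square theorem: r2(m) = 4 * (d1(m) - d3(m)), where
    d1 and d3 count the divisors of m congruent to 1 and 3 mod 4."""
    divisors = [d for d in range(1, m + 1) if m % d == 0]
    d1 = sum(1 for d in divisors if d % 4 == 1)
    d3 = sum(1 for d in divisors if d % 4 == 3)
    return 4 * (d1 - d3)


# Number of unit-distance neighbors per grid point when the unit is sqrt(m).
for m in (1, 5, 25, 65, 325):
    print(m, r2_brute(m), r2_jacobi(m))
```

Values of m with many divisors congruent to 1 mod 4, such as 325 = 5² · 13 (24 solutions), give every interior grid point many unit-distance neighbors. That divisor-counting lever is the number-theoretic ingredient in the classical grid lower bound.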

WORDS WORTH SAVING

5 quotes

Maybe this is the one in 100 times where it's too good to be true, but it's, it's actually true.

— Alexander Wei

And at that moment you feel like, okay, this model is something that's really amazing.

— Hongxun Wu

Everyone had a hard time sleeping because it's so, so exciting, yeah.

— Lijie Chen

I think it should not be intimidating. I just think it should be empowering instead.

— Hongxun Wu

Get GPT Pro subscription.

— Hongxun Wu

Topics covered:
IMO/IOI as reasoning benchmarks
Inference-time (test-time) compute scaling
Erdős unit distance conjecture disproof
General-purpose models vs. specialized math models
Proof checking and reliability practices
Cross-field idea transfer (number theory ↔ geometry)
Implications for research workflows, cryptography, and quantum computing

High quality AI-generated summary created from speaker-labeled transcript.
