OpenAI

How a reasoning model cracked an 80-year-old math problem — the OpenAI Podcast Ep. 20

Last month AI found something mathematicians had missed for decades. Reasoning researchers Alexander Wei, Hongxun Wu, and Lijie Chen join the podcast to discuss how a general-purpose model helped disprove an 80-year-old conjecture from famed mathematician Paul Erdős. They walk through the moment the result started looking real, what it took to verify the proof, and what’s happened since sharing the discovery with the world. They also explore what this means for the future of math and for researchers learning to work with AI.

Chapters
0:44 AI and the International Math Olympiad and International Olympiad in Informatics
6:35 An OpenAI model disproves the Erdős unit distance conjecture
8:33 Running the model and checking the proof
11:04 Why general models matter for discovery
15:55 Creativity, tools, and how the proof worked
18:25 Why AI should feel empowering for mathematicians
22:31 Advice for researchers using AI
27:24 What comes next for math and AI research
37:30 Cryptography, quantum computing, and the future

Andrew Mayne (host), Lijie Chen (guest), Hongxun Wu (guest), Alexander Wei (guest)
Jun 4, 2026 · 41m

CHAPTERS

  1. Meet the reasoning research team and the path from Olympiads to OpenAI

    Host Andrew Mayne introduces Alexander Wei, Hongxun Wu, and Lijie Chen and why this breakthrough felt like a “can’t sleep” moment. The guests describe their personal routes into reasoning research, including the shock of models reaching Olympiad-level performance and the career pull toward building smarter systems.

  2. What IMO/IOI measure—and why inference-time compute changed the game

    Alex explains the International Math Olympiad (IMO) and International Olympiad in Informatics (IOI) as historically hard benchmarks for AI. The conversation links recent leaps to test-time/inference-time compute: letting models “think longer,” explore alternatives, and self-correct before answering.

  3. Beyond benchmarks: from IMO gold to ‘can it solve P vs NP?’

    The guests contrast competition success with deeper research frontiers. Lijie argues that problems like P vs NP likely require building entirely new theory—potentially “many books” worth—highlighting what still feels out of reach even as problem-solving accelerates.

  4. The 80-year-old target: Erdős’ unit distance conjecture explained

    The team describes the unit distance problem in combinatorial/discrete geometry: given n points in the plane, how many pairs can be exactly distance 1 apart, asymptotically? Erdős conjectured a square-grid arrangement was essentially optimal; the model produced a disproof and a better construction.
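
    The counting question itself can be illustrated with a small sketch. This is not the model's construction, which this summary does not describe; it is only a brute-force count on a square grid, and the helper names `grid_points` and `count_pairs_at_squared_distance` are invented for the example. It compares a 10×10 integer grid at its natural spacing with the same grid rescaled by 1/5, so that lattice distance 5 becomes distance 1. Because 25 = 0² + 5² = 3² + 4², the rescaled grid has more unit-distance pairs, which is the general idea behind choosing the grid scale carefully.

```python
from itertools import combinations


def grid_points(k):
    """Return the k*k integer lattice points (x, y) with 0 <= x, y < k."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at_squared_distance(points, d2):
    """Count unordered pairs whose squared Euclidean distance equals d2.

    Working with squared integer distances avoids floating-point error.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


points = grid_points(10)  # n = 100 points

# Natural spacing: only horizontal and vertical neighbours are 1 apart.
unit_spacing = count_pairs_at_squared_distance(points, 1)

# Rescale by 1/5: lattice distance 5 (squared distance 25) becomes distance 1.
rescaled = count_pairs_at_squared_distance(points, 25)

print(unit_spacing, rescaled)  # prints "180 268"
```

    The brute-force count is quadratic in the number of points, so it is only suitable for small illustrations like this one.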

  5. How they ran the experiment: prompting, parallel internal models, and validation

    Hongxun and Lijie describe stress-testing model capability using a subset of Erdős problems rather than picking a single target. After seeing a plausible disproof, they used the model to sanity-check, then enlisted mathematicians inside the company for multi-day scrutiny until confidence grew.

  6. Why general models matter: discovery without training to a single benchmark

    The discussion emphasizes that this wasn’t a narrow ‘math-only’ system; it behaved like a general-purpose ChatGPT/Codex-style model. The guests argue that broad reasoning competence plus test-time compute can unlock research-level results, and that similar capability should become accessible to many users.

  7. Inside the proof’s novelty: cross-field creativity and high-powered number theory

    Alex characterizes the proof as beyond typical olympiad difficulty and surprising even to trained researchers. A key novelty is bridging class field theory (number theory) into combinatorial geometry—an unusual connection requiring both insight and careful execution.

  8. Tools and ‘grounding’: web access, coding abilities, and the Cambridge dictionary moment

    The guests clarify the model used common tool abilities like browsing and coding (Python), not formal proof systems like Lean for this result. They share anecdotes showing the model grounding definitions—famously looking up what “unit” means—before proceeding, reflecting a cautious interpretive step.

  9. Why this should feel empowering, not intimidating, for mathematicians

    The team argues that AI changes who can attempt hard problems and how quickly ideas propagate, but humans remain central for understanding, generalizing, and building theory. They cite early examples where mathematicians improved bounds and reused the construction’s intuition to tackle other conjectures.

  10. Practical advice for researchers: trust calibration and asking bolder questions

    They offer concrete workflow tips: use stronger reasoning tiers (e.g., Pro), ask direct ambitious questions, and don’t over-impose human decompositions that may encode wrong priors. They recommend iteratively increasing trust in the model and learning its failure modes to maximize leverage safely.

  11. What’s next for math + AI: longer horizons, new-theory generation, and scaling time

    The guests discuss future milestones: models doing AI research, solving major complexity questions, and eventually inventing new mathematical frameworks. They describe an apparent ‘Moore’s law’ in how long models can work effectively, but note that theory invention may require years/decades of coherent progress.

  12. From one proof to broader science—and implications for cryptography and quantum computing

    They emphasize their goal isn’t racing through Erdős’ list but empowering communities with tools that can accelerate discovery. The conversation turns to cryptography—where AI might prove security assumptions or find weaknesses—and to quantum computing, where AI could speed progress via new error-correction ideas even if paradigms differ.
