How a reasoning model cracked an 80-year-old math problem (The OpenAI Podcast, Ep. 20)
CHAPTERS
Meet the reasoning research team and the path from Olympiads to OpenAI
Host Andrew Mayne introduces Alexander Wei, Hongxun Wu, and Lijie Chen and discusses why this breakthrough felt like a “can’t sleep” moment. The guests describe their personal routes into reasoning research, including the shock of seeing models reach Olympiad-level performance and the career pull toward building smarter systems.
What IMO/IOI measure—and why inference-time compute changed the game
Alex explains the International Math Olympiad (IMO) and the International Olympiad in Informatics (IOI) as historically hard benchmarks for AI. The conversation links recent leaps to test-time (inference-time) compute: letting models “think longer,” explore alternatives, and self-correct before answering.
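As a toy illustration of why extra compute at answer time helps, the sketch below uses a generic sample-and-verify loop: propose candidates in random order, check each one, and answer only with a candidate that passes. This is an assumption-laden analogy, not how OpenAI's models actually reason (the episode does not detail that); the function `sample_and_verify` and the divisor-finding task are hypothetical stand-ins.

```python
import random
from typing import Callable, Optional, Sequence


def sample_and_verify(
    candidates: Sequence[int],
    verify: Callable[[int], bool],
    budget: int,
    seed: int = 0,
) -> Optional[int]:
    """Toy test-time search: try up to `budget` distinct candidates, return the first that verifies."""
    rng = random.Random(seed)
    # Explore alternatives in random order; a larger budget stands in for "thinking longer".
    for guess in rng.sample(list(candidates), min(budget, len(candidates))):
        # Self-check each candidate before committing to it as the answer.
        if verify(guess):
            return guess
    # Budget exhausted without a verified candidate: abstain instead of guessing.
    return None


n = 91
divisors = range(2, n)
print(sample_and_verify(divisors, lambda d: n % d == 0, budget=1))
print(sample_and_verify(divisors, lambda d: n % d == 0, budget=len(divisors)))
```

With the full budget the search is guaranteed to return 7 or 13; with a budget of 1 it usually returns `None`, the toy analogue of answering without enough time to explore and check.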
Beyond benchmarks: from IMO gold to ‘can it solve P vs NP?’
The guests contrast competition success with deeper research frontiers. Lijie argues that problems like P vs NP likely require building entirely new theory, potentially “many books” of it, which highlights what still feels out of reach even as problem-solving accelerates.
The 80-year-old target: Erdős’ unit distance conjecture explained
The team describes the unit distance problem in combinatorial/discrete geometry: given n points in the plane, how many pairs can be exactly distance 1 apart, asymptotically? Erdős conjectured a square-grid arrangement was essentially optimal; the model produced a disproof and a better construction.
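To make the setup concrete, here is a minimal Python sketch of the classical square-grid picture behind Erdős’ conjecture (not the model’s new construction, which this summary does not describe). It assumes points on a k × k integer grid, counts pairs by exact squared distance, and reports the most frequent one; scaling the grid by 1/√d turns that distance into 1. The helper names `grid_points` and `best_unit_distance_count` are illustrative, not from the episode.

```python
from collections import Counter
from itertools import combinations


def grid_points(k):
    """Points of a k x k integer grid (n = k * k points)."""
    return [(x, y) for x in range(k) for y in range(k)]


def best_unit_distance_count(points):
    """Return (d, count) for the squared distance d shared by the most pairs.

    Squared distances between grid points are integers, so the count is exact.
    Scaling every point by 1 / sqrt(d) makes those `count` pairs unit distances.
    """
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts.most_common(1)[0]


for k in (2, 3, 5):
    d, count = best_unit_distance_count(grid_points(k))
    print(f"{k}x{k} grid: squared distance {d} occurs {count} times")
```

On a 5 × 5 grid the most frequent squared distance is already 5 (48 pairs) rather than 1 (40 pairs). That is the grid’s core trick: choose a distance with many representations as a sum of two squares, then rescale it to 1.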
How they ran the experiment: prompting, parallel internal models, and validation
Hongxun and Lijie describe stress-testing model capability on a subset of Erdős problems rather than picking a single target. After seeing a plausible disproof, they used the model to sanity-check it, then enlisted mathematicians inside the company for multi-day scrutiny until their confidence grew.
Why general models matter: discovery without training to a single benchmark
The discussion emphasizes that this wasn’t a narrow ‘math-only’ system; it behaved like a general-purpose ChatGPT/Codex-style model. The guests argue that broad reasoning competence plus test-time compute can unlock research-level results, and that similar capability should become accessible to many users.
Inside the proof’s novelty: cross-field creativity and high-powered number theory
Alex characterizes the proof as beyond typical olympiad difficulty and surprising even to trained researchers. A key novelty is bridging class field theory (number theory) into combinatorial geometry—an unusual connection requiring both insight and careful execution.
Tools and ‘grounding’: web access, coding abilities, and the Cambridge dictionary moment
The guests clarify the model used common tool abilities like browsing and coding (Python), not formal proof systems like Lean for this result. They share anecdotes showing the model grounding definitions—famously looking up what “unit” means—before proceeding, reflecting a cautious interpretive step.
Why this should feel empowering, not intimidating, for mathematicians
The team argues that AI changes who can attempt hard problems and how quickly ideas propagate, but humans remain central for understanding, generalizing, and building theory. They cite early examples where mathematicians improved bounds and reused the construction’s intuition to tackle other conjectures.
Practical advice for researchers: trust calibration and asking bolder questions
They offer concrete workflow tips: use stronger reasoning tiers (e.g., Pro), ask direct ambitious questions, and don’t over-impose human decompositions that may encode wrong priors. They recommend iteratively increasing trust in the model and learning its failure modes to maximize leverage safely.
What’s next for math + AI: longer horizons, new-theory generation, and scaling time
The guests discuss future milestones: models doing AI research, solving major complexity questions, and eventually inventing new mathematical frameworks. They describe an apparent ‘Moore’s law’ in how long models can work effectively, but note that theory invention may require years/decades of coherent progress.
From one proof to broader science—and implications for cryptography and quantum computing
They emphasize their goal isn’t racing through Erdős’ list but empowering communities with tools that can accelerate discovery. The conversation turns to cryptography—where AI might prove security assumptions or find weaknesses—and to quantum computing, where AI could speed progress via new error-correction ideas even if paradigms differ.