What happens now that AI is good at math? — The OpenAI Podcast Ep. 17
At a glance
WHAT IT’S REALLY ABOUT
AI’s math leap enables research, verification, and automated discovery workflows
- Researchers describe math as a uniquely clean benchmark for AI progress because problems are unambiguous and solutions are often verifiable, making capability jumps easy to measure.
- Ernest Ryu recounts using ChatGPT, through iterative human verification and steering, to resolve a 42-year-old open problem about divergence in Nesterov’s accelerated gradient method (a sketch of the method follows this list).
- Sébastien Bubeck argues the apparent “sudden” math improvement wasn’t just scaling, but a bundle of training and reasoning innovations that expanded models’ ability to sustain long, consistent chains of thought.
- The conversation frames strong math reasoning as central to AGI because it demands long-horizon, self-correcting reasoning that should transfer to other sciences and enable “automated researcher” systems.
- They emphasize both upside (faster discovery, deeper literature connections, proof checking) and risk (shallow understanding, over-trust, low-quality AI-generated proofs), concluding humans remain essential for direction, standards, and accountability.
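For readers unfamiliar with the method at the center of Ryu’s result, here is a minimal sketch of the standard Nesterov accelerated gradient iteration. The quadratic test objective, step size, and momentum schedule below are illustrative assumptions chosen to make the example runnable; they are not details from the episode, and the open problem itself concerned subtler long-run divergence behavior of the iterates.

```python
import numpy as np

def nesterov_agm(grad, x0, L, iters=200):
    """Standard Nesterov accelerated gradient iteration for an L-smooth
    convex objective. Illustrative sketch only; the open problem Ryu
    describes concerned the divergence behavior of these iterates."""
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for k in range(iters):
        x = y - grad(y) / L                    # gradient step at the look-ahead point
        y = x + (k / (k + 3)) * (x - x_prev)   # momentum extrapolation
        x_prev = x
    return x_prev

# Assumed toy problem: f(x) = 0.5 * x^T A x, whose gradient is A x.
A = np.diag([1.0, 10.0])                       # L = largest eigenvalue = 10
x_min = nesterov_agm(lambda x: A @ x, [5.0, 5.0], L=10.0)
print(x_min)                                   # approaches the minimizer [0, 0]
```

The momentum extrapolation is what gives the method its acceleration, and it is also what makes the iterates’ long-run behavior nontrivial to analyze.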
IDEAS WORTH REMEMBERING
5 ideas
Math capability gains reflect more than scaling—multiple innovations compound.
Bubeck rejects “scaling alone” as the right framing, noting OpenAI’s progress came from concurrent research advances; this helps explain why users perceived an abrupt jump in reliability on tasks like scheduling and ledger-splitting.
AI can already contribute to research when paired with expert human verification.
Ryu’s 12-hour, multi-day interaction shows the model didn’t magically one-shot a proof; the human played verifier, corrected mistakes, and guided approaches—turning AI into a high-speed collaborator rather than an oracle.
Long, consistent reasoning is the core skill math trains in both humans and models.
They argue math rewards correctness across entire chains: one small error can invalidate everything, so models must learn self-correction and coherence over extended reasoning, properties expected to generalize to other scientific domains (see the sketch below).
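To make the “one mistake kills the whole argument” property concrete, here is a minimal machine-checked statement in Lean 4 (my illustration, not from the episode). A proof checker accepts a proof only if every step type-checks; a single wrong step anywhere causes the whole theorem to be rejected.

```lean
-- Lean 4, core library only. The kernel verifies every inference step;
-- if any step were wrong, the entire proof would be rejected, which is
-- exactly the "one error kills the argument" property of mathematics.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This is also why the speakers flag proof checking as an upside: formal verification can offload the verifier role that Ryu played by hand.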
“AGI time” is a useful lens for progress: seconds → minutes → hours → days → weeks.
Bubeck frames capability not just as IQ-like performance but as the duration of sustained competent work; the automated-researcher goal is to push this horizon to weeks or months to enable deeper breakthroughs and experimental loops.
Erdős-problem wins illustrate two distinct superpowers: cross-literature connection and original discovery.
Early successes were sometimes “deep literature search” (finding answers in distant fields and translating them), which sparked controversy when framed as solving “open problems”; later, they claim, models produced genuinely new, publishable combinatorics results.
WORDS WORTH SAVING
5 quotes
Today, two years later, the models are able to help Fields Medalists in their day-to-day work.
— Sébastien Bubeck
And that's how this 42-year-old open problem got resolved.
— Ernest Ryu
If at some point in your chain of reasoning there is a mistake, this will kill the entire argument.
— Sébastien Bubeck
So you can have AGI seconds, minutes, hours, days, and so on.
— Sébastien Bubeck
I'm worried about potentially having a shallower understanding of things because we rely too much on the tool.
— Sébastien Bubeck