What happens now that AI is good at math? — the OpenAI Podcast Ep. 17

Math is one of the clearest ways to see how far AI has come in a short span. OpenAI researchers Sébastien Bubeck and Ernest Ryu join host Andrew Mayne to explain what changed and what it could mean for the future of research. They reflect on how Ernest used ChatGPT to help solve a 42-year-old open problem, the difference between deep literature search and original mathematical discovery, and what changes when AI can work over longer timelines.

Chapters
01:27 The surprising progress of AI’s math capabilities
03:01 Solving an open problem with ChatGPT
06:57 How models went from basic math to research level
11:32 Why math matters for AGI
14:26 AI and the Erdős problems
21:26 Building an automated researcher
28:19 The role of humans as models improve
33:52 Verifying proofs with AI
36:00 The risk of shallow understanding
41:19 Advice for learning math with ChatGPT

Andrew Mayne (host), Sébastien Bubeck (guest), Ernest Ryu (guest)

Apr 28, 2026 · 43m

CHAPTERS

From “laughable at math” to Fields Medal assistance
Andrew Mayne opens with Sébastien Bubeck and Ernest Ryu on how quickly AI math capabilities have advanced, from pre-reasoning models to systems that can meaningfully assist working mathematicians. They frame mathematics as an unexpectedly revealing domain for measuring reasoning progress and discuss why the pace has surprised even experts.
A 42-year-old open problem solved with ChatGPT (and a human verifier)
Ernest recounts using ChatGPT to tackle a genuinely open optimization-theory question about Nesterov’s accelerated gradient method. Over about 12 hours across three evenings, he iterated with the model, corrected mistakes, guided approaches, and verified the final argument—resulting in a correct resolution of a decades-old question.
Calibrating progress: from everyday arithmetic failures to IMO gold
The group contrasts early model limitations (splitting expenses, time-zone scheduling) with sudden improvements that culminated in top human-level International Math Olympiad performance. They distinguish competition math (short, “canned” solutions) from research math, while noting the practical threshold: for most STEM users, today’s models cover nearly all needed mathematics—with caution and checks.
What changed: beyond scaling, toward reasoning systems
Sébastien argues the “scaling alone” framing misses the point: multiple innovations progressed together, not one silver bullet. He situates the progress historically (e.g., Minerva era) to show how quickly expectations have shifted, and emphasizes that modern models can often solve problems directly (not just via calculator tools).
Why math matters for AGI: long, consistent chains of thought
Math is presented as more than “cool”—it demands long-horizon, error-intolerant reasoning where a single mistake collapses the whole argument. That property makes it an ideal training/measurement ground for reasoning that should transfer to other domains, mirroring why humans learn math for disciplined thinking.
Erdős problems, literature search breakthroughs, and a communication trap
Sébastien explains Paul Erdős, the culture around his questions, and the online catalog of open problems. They describe early successes where models didn’t invent new proofs but performed deep “literature search + translation,” connecting results across fields—followed by controversy when those results were misunderstood as brand-new solutions.
From rediscovery to genuinely new combinatorics results
They describe rapid acceleration from “finding answers that already exist” to producing novel, publishable solutions. This raises deeper questions about what scientific creativity is—mere recombination plus reasoning, or rare sparks of genius—and whether AI can continuously extend human knowledge without bound.
The automated researcher and ‘AGI time’ (seconds → weeks → months)
The conversation shifts to building systems that can work autonomously over long horizons, not just within a single chat session. Sébastien introduces “AGI time” as the duration an AI can sustain human-like research thinking, arguing that the key frontier is extending this from days to weeks and beyond—an open research problem tied to the “automated researcher” vision.
Context limits, persistent workspaces, and the Codex analogy for math
Ernest highlights that typical chat context is roughly the size of a ~50-page paper, which is insufficient for deep breakthroughs that require far more thinking than the final write-up. He points to tools like Codex that operate over large codebases with persistent artifacts as a model for how math research agents could maintain long-running notes, summaries, and evolving work products.
Science acceleration in practice: lowering friction and expanding who can do what
Andrew shares a hands-on example of generating a benchmark dataset mid-workflow in minutes—something that would otherwise derail progress. Sébastien ties this to ‘science acceleration’ and emphasizes a two-way effect: mathematicians gain easy access to coding/experiments, and scientists in other fields gain access to advanced math.
Humans’ role as models surpass researchers: direction, meaning, and priorities
Sébastien predicts continued progress: systems that think for weeks, then years, plus agents that find mistakes in papers and even propose valuable new questions. He argues the human role becomes more about setting goals and ensuring science serves human needs (health, control over environment), since AI has no intrinsic stake in those outcomes.
Verification, shallow understanding risk, and how to learn math with ChatGPT
They discuss the double-edged nature of AI: it can accelerate proof checking and improve trust, but overreliance can erode deep expertise and produce confident nonsense, especially from non-experts attempting grand proofs. They close with practical learning advice: use ChatGPT as an adaptive tutor, ask it for problems at your level, iterate socially, but still do the hard work of understanding and verifying.

