Codex and the future of coding with AI — the OpenAI Podcast Ep. 6

What happens when AI becomes a true coding collaborator? OpenAI co-founder Greg Brockman and Codex engineering lead Thibault Sottiaux talk about the evolution of Codex, from the first glimpses of AI writing code to today’s GPT-5 Codex agents that can work for hours on complex refactorings. They discuss building “harnesses,” the rise of agentic coding, code review breakthroughs, and how AI may transform software development in the years ahead.

Chapters

1:15 – The first sparks of AI coding with GPT-3
4:00 – Why coding became OpenAI’s deepest focus area
7:20 – What a “harness” is and why it matters for agents
11:45 – Lessons from GitHub Copilot and latency tradeoffs
16:10 – Experimenting with terminals, IDEs, and async agents
22:00 – Internal tools like 10x and Codex code review
27:45 – Why GPT-5 Codex can run for hours on complex tasks
33:15 – The rise of refactoring and enterprise use cases
38:50 – The future of agentic software engineers
45:00 – Safety, oversight, and aligning agents with human intent
51:30 – What coding (and compute) may look like in 2030
57:40 – Advice: why it’s still a great time to learn to code

Andrew Mayne (host), Greg Brockman (guest), Thibault Sottiaux (guest)

Sep 14, 2025 · 50m

WHAT IT’S REALLY ABOUT

GPT-5 Codex ushers in agentic coding, refactoring, and oversight challenges

The conversation traces AI coding from early GPT-3 docstring-to-function “sparks” to today’s Codex as an agentic collaborator embedded in terminals, IDEs, and GitHub.
A central theme is the “harness”: the tools, agent loop, integrations, and UX that let a model reliably act in real environments—often as important as raw model intelligence.
They highlight lessons from GitHub Copilot (latency as a product constraint), and why different interfaces fit different model speeds—fast autocomplete vs slower but more capable agents.
GPT-5 Codex is positioned as tightly coupled to its tool harness, which enables higher reliability, fast responses on small tasks, and sustained multi-hour effort (up to ~7 hours) on complex refactors. The conversation also covers the growing emphasis on safety and scalable oversight, and the prospect of compute scarcity by 2030.

IDEAS WORTH REMEMBERING

5 ideas

Coding success depends on co-evolving model intelligence and the harness.

They argue you don’t get useful agentic coding from a strong model alone; you need execution, tools, looping, context access, and UX that make code “come to life” in real workflows.
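To make the "tools plus agent loop" idea concrete, here is a minimal sketch in Python. It is not how Codex is built: the tool names (`read_file`, `run_tests`), the `Step` type, and the scripted stand-in for the model are all hypothetical, chosen only to show a loop in which the model picks an action, the harness executes it, and the observation is fed back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: the names and canned results are illustrative only.
def read_file(arg: str) -> str:
    files = {"app.py": "def add(a, b):\n    return a - b\n"}
    return files.get(arg, "<missing>")


def run_tests(arg: str) -> str:
    return "1 failed: test_add expected 3, got -1"


TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}


@dataclass
class Step:
    tool: str  # a tool name, or "done" to stop the loop
    arg: str   # tool argument, or the final answer when tool == "done"


def scripted_model(history: List[str]) -> Step:
    """Stand-in for a real model: chooses the next step from the history length."""
    plan = [
        Step("run_tests", ""),
        Step("read_file", "app.py"),
        Step("done", "change '-' to '+' in add"),
    ]
    return plan[min(len(history), len(plan) - 1)]


def run_agent(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    """The harness: ask the model for a step, execute it, record the observation."""
    history: List[str] = []
    for _ in range(max_steps):
        step = model(history)
        if step.tool == "done":
            return step.arg, history
        tool = tools.get(step.tool)
        observation = tool(step.arg) if tool else f"unknown tool: {step.tool}"
        history.append(f"{step.tool}({step.arg!r}) -> {observation}")
    return "step limit reached", history


answer, history = run_agent(scripted_model, TOOLS)
```

The step limit is one simple guard a harness might add; in this sketch the model only ever sees what the harness chooses to put in `history`.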

Latency is a first-class feature that shapes what product you can build.

Copilot revealed that autocomplete has a tight latency budget (about 1,500 ms), which forces smaller, faster models; slower, smarter models can still win if the interface shifts to async or delegated work.
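The tradeoff can be sketched as a simple routing rule. Only the 1,500 ms figure comes from the episode; the model names, latencies, and surface labels below are hypothetical placeholders.

```python
from dataclasses import dataclass

# Autocomplete budget quoted in the episode (fifteen hundred milliseconds).
AUTOCOMPLETE_BUDGET_MS = 1500


@dataclass(frozen=True)
class ModelProfile:
    name: str                # hypothetical label, not a real model name
    typical_latency_ms: int  # assumed typical response time


def choose_surface(model: ModelProfile, budget_ms: int = AUTOCOMPLETE_BUDGET_MS) -> str:
    """Pick an interface the model's latency can support."""
    if model.typical_latency_ms <= budget_ms:
        return "inline autocomplete"
    # Too slow for inline use: the interface shifts to delegated, asynchronous work.
    return "async delegated task"


small_fast = ModelProfile("small-fast", typical_latency_ms=300)
large_slow = ModelProfile("large-slow", typical_latency_ms=45_000)

surfaces = {m.name: choose_surface(m) for m in (small_fast, large_slow)}
```

The point of the sketch is that the interface, not only the model, is the variable: a model that fails the inline budget is not discarded, it is routed to a surface where waiting is acceptable.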

Agentic coding emerged from users pushing context limits in chat.

Developers kept pasting more code, traces, and logs; the natural inversion was letting the model fetch context and drive debugging itself rather than the user orchestrating every step.

Form factor experimentation is still ongoing; “one agent, many surfaces” is the goal.

They describe terminals, IDEs, GitHub @mentions, and cloud computers as complementary. The long-term vision is a single coding collaborator that moves across these contexts seamlessly.

agents.md is a practical bridge toward agent memory and preference alignment.

It helps agents navigate a repo efficiently and follow non-obvious conventions (tests here, style there). They also note current agents lack durable memory, making this a valuable stopgap.

WORDS WORTH SAVING

5 quotes

As soon as you saw that, you knew this is going to work, this is going to be big.

— Greg Brockman

For coding… this text comes to life… you realize that the harness is almost like equally part of how you make this model usable as the intelligence.

— Greg Brockman

Latency was a product feature… fifteen hundred milliseconds… Anything that's slower… no one wants to sit around waiting for it.

— Greg Brockman

Think about it… the harness being your body and the model being your brain.

— Thibault Sottiaux

We've seen it work internally up to seven hours for… very complex refactorings.

— Thibault Sottiaux

Early GPT-3 coding capabilities and “vibe coding” origins
Why OpenAI went unusually deep on coding metrics/data
Harness definition: tools + agent loop + integrations
Copilot lessons: latency budgets and interface design
Terminal/IDE/async cloud agent form factors
agents.md as instruction + navigation compression
Codex code review, refactoring, migrations, security patching
Long-running GPT-5 Codex tasks and reliability gains
Scalable oversight, sandboxing, permissions, alignment
2030 outlook: abundance, compute scarcity, GPUs near users
Learning to code with AI; fundamentals still matter

High quality AI-generated summary created from speaker-labeled transcript.
