Codex and the future of coding with AI — the OpenAI Podcast Ep. 6

What happens when AI becomes a true coding collaborator? OpenAI co-founder Greg Brockman and Codex engineering lead Thibault Sottiaux talk about the evolution of Codex, from the first glimpses of AI writing code to today’s GPT-5 Codex agents that can work for hours on complex refactorings. They discuss building “harnesses,” the rise of agentic coding, code review breakthroughs, and how AI may transform software development in the years ahead.

Chapters
1:15 – The first sparks of AI coding with GPT-3
4:00 – Why coding became OpenAI’s deepest focus area
7:20 – What a “harness” is and why it matters for agents
11:45 – Lessons from GitHub Copilot and latency tradeoffs
16:10 – Experimenting with terminals, IDEs, and async agents
22:00 – Internal tools like 10x and Codex code review
27:45 – Why GPT-5 Codex can run for hours on complex tasks
33:15 – The rise of refactoring and enterprise use cases
38:50 – The future of agentic software engineers
45:00 – Safety, oversight, and aligning agents with human intent
51:30 – What coding (and compute) may look like in 2030
57:40 – Advice: why it’s still a great time to learn to code

Andrew Mayne (host), Greg Brockman (guest), Thibault Sottiaux (guest)

Sep 15, 2025 · 50m

CHAPTERS

Why AI coding feels inevitable: from GPT-3 docstrings to “daily driver” Codex
Andrew Mayne opens the conversation with Greg Brockman and Codex engineering lead Thibault Sottiaux on how quickly AI-assisted coding has progressed. Greg recalls early GPT-3 moments where docstrings reliably turned into working functions, an immediate signal that coding would be a major application area.
Why OpenAI went unusually deep on coding (despite the ‘G’ in AGI)
Greg explains that OpenAI typically pushes general capability, but programming became an exception where they built specialized data, metrics, and evaluation programs. The team learned that coding demanded distinct investment to measure and improve real-world usefulness, not just benchmark wins.
The “harness”: why tooling and agent loops matter as much as raw intelligence
They introduce the idea that code is ‘text that comes to life’—it must run, interact with tools, and affect real environments. Thibault defines the harness as the integration layer (tools + agent loop) that lets a model act, comparing it to a body for the model’s brain.
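To make the "tools + agent loop" idea concrete, here is a minimal sketch of what a harness might look like. It is not Codex's actual implementation: the tool names (`read_file`, `run_tests`), the `scripted_model` stand-in, and the history format are all hypothetical, chosen only to show a model proposing actions while the loop executes them and feeds the results back.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: a real harness would touch the file system and a test runner.
def read_file(arg: str) -> str:
    files = {"app.py": "print('hello')"}  # in-memory stand-in for a repository
    return files.get(arg, f"error: no such file {arg}")

def run_tests(arg: str) -> str:
    return "2 passed, 0 failed"  # canned result for illustration

TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}

def scripted_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for a real model: picks the next action from a fixed script,
    based on how many observations it has already seen."""
    script = [("read_file", "app.py"), ("run_tests", ""), ("done", "tests pass")]
    steps_taken = sum(1 for entry in history if entry.startswith("observation:"))
    return script[min(steps_taken, len(script) - 1)]

def run_agent(
    task: str,
    model: Callable[[List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    """Agent loop: ask the model for an action, run the tool, record the result."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "done":
            return arg, history
        tool = tools.get(action)
        observation = tool(arg) if tool else f"error: unknown tool {action}"
        history.append(f"action: {action}({arg})")
        history.append(f"observation: {observation}")
    return "stopped: step limit reached", history

result, history = run_agent("fix the bug", scripted_model, TOOLS)
```

In this sketch the model is the "brain" and everything else (the tool registry, the loop, the step limit) is the "body" that lets it act on an environment.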
GitHub Copilot lessons: latency budgets and interface co-evolution
Greg reflects on Copilot as the first time many developers felt an AI embedded in their workflow. The key product revelation was that latency is a feature: autocomplete-style experiences require sub-second responses, forcing tradeoffs between speed and intelligence and motivating different interfaces for slower, smarter models.
From copy-paste debugging to agents that fetch their own context
Thibault describes observing developers stuffing more context into ChatGPT (code snippets, traces) until interactions became unwieldy. That pressure suggested flipping the paradigm: let the model drive, gather context itself, and debug with less human micromanagement.
Form-factor experiments: terminal, IDE, cloud async agents, and internal “10x”
They detail prototypes across terminal and remote/async setups, including an internal terminal tool called “10x.” The team explored letting agents run at scale (close laptop, keep working) while also recognizing the practicality of local workflows and the need to meet developers where they already are.
Convenience vs intelligence: integrations can be ‘transformative’ even without smarter models
Greg emphasizes two axes—intelligence and convenience (latency, cost, integration)—and a moving “acceptance region” where users adopt tools. They cite terminal-context integrations that eliminated copy-paste as a step-change in productivity, illustrating how harness improvements can rival model upgrades.
Choosing where to use Codex: terminal, IDE, GitHub @mentions, and Agents.md
The team describes today’s ‘experimentation phase’ across interfaces: terminal power workflows, IDE for controlled edits/undo, and GitHub @mentions for delegated tasks. They introduce Agents.md as a lightweight way to encode navigation hints and team preferences so the agent can operate efficiently and consistently.
Enterprise ‘killer’ work: refactoring, migrations, patching, and tool creation
Greg argues massive refactoring and migrations (e.g., COBOL modernization) remain largely unsolved but economically pivotal. They discuss automating painful work like library migrations and security patching, and the longer-term flywheel where agents build new tools (like modern Unix utilities) to amplify productivity.
Codex code review: crossing the threshold from ‘bot noise’ to trusted safety net
Thibault describes an internal breakthrough: high-signal PR review that checks intention/contract against implementation, traces dependencies, and surfaces deep issues. They note a ‘threshold effect’—below it, auto-review is ignored; above it, teams rely on it and feel pain when it’s unavailable.
What’s new in GPT-5 Codex: harness-optimized reliability and multi-hour ‘grit’
They present GPT-5 Codex as a GPT-5 variant optimized for the Codex harness—tighter coupling between model and tools for reliability. A standout capability is persistence: it can work for hours on complex refactors, while still responding quickly on simple requests.
The agentic future: millions of supervised agents, permissions, and scalable oversight
Thibault forecasts cloud populations of agents producing economic value under human steering. Both highlight the core safety challenge: humans can’t read every line, so systems need sandboxing, permissioning, escalation paths, and scalable oversight methods to maintain trust and alignment with intent.
2030 outlook: abundance of creation, scarcity of compute, and security endgames
They predict AI will enable far easier creation (digital and physical), but compute will remain scarce and strategically important. Discussion includes the security arms race and a possible ‘endgame’ via formal verification, plus the need to bring GPUs closer to users to reduce latency in tool-heavy agent loops.
Still learn to code—now with AI: fundamentals, faster learning, and new leverage
Both guests argue it’s an excellent time to learn programming, with AI accelerating language acquisition and problem solving. They stress that the most successful AI coders still understand fundamentals—architecture, structure, and correctness—using AI to avoid reinventing wheels and to surface questions novices don’t know to ask.
