
OpenAI’s head of platform engineering on the next 12-24 months of AI | Sherwin Wu
Sherwin Wu (guest), Lenny Rachitsky (host)
In this episode of Lenny's Podcast, Lenny Rachitsky interviews Sherwin Wu, OpenAI's head of platform engineering, about what the next 12–24 months of AI will mean for engineers, managers, startups, and enterprises.
AI agents reshape engineering, management, startups, and enterprise automation rapidly
Wu reports that Codex is deeply embedded at OpenAI: ~95% of engineers use it daily and 100% of PRs are reviewed by it, with heavy users opening far more PRs.
He argues the engineer role is shifting from writing code to “managing fleets of agents,” requiring new skills in prompting, context management, and preventing agent drift—like “sorcery” with real consequences.
For companies, he warns that many AI deployments likely have negative ROI because they rely on top-down mandates without bottom-up champions; he recommends “tiger teams” of internal power users to drive adoption and best practices.
He predicts major near-term shifts: longer-horizon agents (multi-hour tasks) and big gains in audio/multimodal models, enabling business process automation and potentially a boom of micro-startups supporting “one-person billion-dollar” outcomes.
Key Takeaways
AI is already the default authoring and review layer at OpenAI.
Wu says nearly all code is AI-generated first, ~95% of engineers use Codex daily, and 100% of PRs are reviewed by Codex—making human effort more about steering and verification than typing.
High performers compound faster with AI, widening productivity gaps.
Codex-heavy engineers open ~70% more PRs, and Wu expects the gap to grow as power users learn better workflows and trust models more.
The new core engineering skill is agent management, not syntax.
Engineers run many parallel threads, supervise multiple agent tasks, and must prevent “Sorcerer’s Apprentice” failure modes where agents go off the rails without sufficient guidance.
When agents fail, missing context—not “model stupidity”—is often the root cause.
An internal experiment with a 100% Codex-written codebase shows teams must encode tribal knowledge into repos (docs, comments, structure, …).
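The kind of repo artifact Wu is pointing at can be sketched as an `AGENTS.md` file, the convention Codex reads for repo-level guidance. The specific contents below are hypothetical examples, not taken from the episode:

```markdown
# AGENTS.md — context an agent reads before touching this repo

## Setup (hypothetical commands)
- Run `make bootstrap` once; run tests with `make test`.

## Conventions
- API handlers live in `api/handlers/`, one file per resource.
- Never edit generated files under `gen/`; regenerate with `make codegen`.

## Tribal knowledge
- The billing service is eventually consistent; retry on 409 rather than "fixing" it.
- Feature flags are defined in `flags.yaml` and must default to off.
```

The point is that knowledge which previously lived in Slack threads and senior engineers' heads gets written down where an agent can find it, instead of being discovered only when a human rolls up their sleeves.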
AI can shrink the pain of code review and CI if you automate the boring parts first.
Wu notes Codex review can cut review time dramatically (e.g., …).
Many enterprise AI rollouts fail because they’re mandate-driven, not adoption-driven.
Top-down “AI-first” directives without bottom-up evangelists lead to confused users and poor ROI; Wu recommends a dedicated internal tiger team to experiment, document, and teach workflows.
In AI product building, customer requests can lock you into dead-end scaffolding.
Because models improve quickly, customers may ask for optimizations of today’s workaround (vector stores, agent frameworks, skills files), but “the models will eat your scaffolding for breakfast,” making those bets obsolete.
Build products for where models will be, not where they are today.
Wu’s heuristic: target capabilities that are ~80% there; as models improve, the product “clicks” and becomes dramatically better without rebuilding from scratch.
A ‘one-person billion-dollar startup’ implies a broader boom in micro-SaaS and bespoke tools.
Wu argues the second/third-order effect is many small startups selling narrowly tailored software that lets ultra-lean companies outsource functions like support and ops—potentially a golden age for B2B SaaS.
Two near-term capability leaps will reshape products: longer-running agents and better audio.
He expects coherent multi-hour tasks to become common within 12–18 months, changing how you design feedback/guardrails, and believes audio is underrated given how much business happens via calls and speech.
Notable Quotes
“Ninety-five percent of engineers use Codex. One hundred percent of our PRs are reviewed by Codex.”
— Sherwin Wu
“Engineers are becoming tech leads. They're managing fleets and fleets of agents.”
— Sherwin Wu
“It literally feels like we're wizards casting all these spells.”
— Sherwin Wu
“This team doesn't have that escape hatch.”
— Sherwin Wu
“The models will eat your scaffolding for breakfast.”
— Sherwin Wu
Questions Answered in This Episode
On the “95% use Codex” stat: what counts as “use” (drafting, refactoring, tests, review), and how do you measure it reliably?
What are the most common failure patterns you see when engineers run 10–20 parallel agent threads, and what guardrails reduce “Sorcerer’s Apprentice” outcomes?
From the 100% Codex-written codebase experiment: what specific repo artifacts (e.g., agents.md, patterns docs, code comments) produced the biggest improvement in agent success?
You said context is usually the issue when an agent stalls—what’s your step-by-step debugging playbook to diagnose missing/incorrect context?
How do you decide which PRs are safe for “Codex-only” review versus requiring a human reviewer, and what risk signals trigger escalation?
Transcript Preview
Ninety-five percent of engineers use Codex. One hundred percent of our PRs are reviewed by Codex.
For engineers, I don't know what job has changed more in the past couple years.
Engineers are becoming tech leads. They're managing fleets and fleets of agents. It literally feels like we're wizards casting all these spells, and these spells are kinda like going out and doing things for you.
What do you think people aren't pricing in yet?
The second or third order effects of the one-person billion-dollar startup. To enable a one-person billion-dollar startup, there might be a hundred other small startups building bespoke software. So I think we might actually enter into a golden age of B2B SaaS.
I've been hearing more and more there's this stress people feel when their agents aren't working.
There's a team that's actually doing an experiment right now within OpenAI, where they are maintaining a one hundred percent Codex-written code base. They run into the exact problems that you're describing, and so usually you're like, "All right, I'll roll up my sleeves and figure it out." This team doesn't have that escape hatch.
You've shared that listening to customers is not always the right strategy in AI.
The field and the models themselves are just changing so, so quickly. They tend to, like, disrupt themselves. The models will eat your scaffolding for breakfast.
What's your advice to folks that are like, "Okay, I don't wanna miss the boat?"
Make sure you're building for where the models are going and not where they are today. There's a quote from Kevin Weil, our VP of Science here, and he likes saying: "This is the worst the models will ever be."
[upbeat music] Today, my guest is Sherwin Wu, head of engineering for OpenAI's API and developer platform. Considering that essentially every AI startup integrates with OpenAI's APIs, Sherwin has an incredibly unique and broad view into what is going on and where things are heading. Let's get into it after a short word from our wonderful sponsors.

Today's episode is brought to you by DX, the developer intelligence platform designed by leading researchers. To thrive in the AI era, organizations need to adapt quickly, but many organization leaders struggle to answer pressing questions like: Which tools are working? How are they being used? What's actually driving value? DX provides the data and insights that leaders need to navigate this shift. With DX, companies like Dropbox, Booking.com, Adyen, and Intercom get a deep understanding of how AI is providing value to their developers and what impact AI is having on engineering productivity. To learn more, visit DX's website at getdx.com/lenny. That's getdx.com/lenny.

Applications break in all kinds of ways: crashes, slowdowns, regressions, and the stuff that you only see once real users show up. Sentry catches it all. See what happened, where, and why, down to the commit that introduced the error, the developer who shipped it, and the exact line of code all in one connected view. I've definitely tried the five tabs and Slack thread approach to debugging. This is better. Sentry shows you how the request moved, what ran, what slowed down, and what users saw. Seer, Sentry's AI debugging agent, takes it from there. It uses all of that Sentry context to tell you the root cause, suggest a fix, and even opens a PR for you. It also reviews your PRs and flags any breaking changes with fixes ready to go. Try Sentry and Seer for free at sentry.io/lenny, and use code Lenny for one hundred dollars in Sentry credits. That's S-E-N-T-R-Y.io/lenny.

[upbeat music] Sherwin, thank you so much for being here, and welcome to the podcast.