Lenny's Podcast
OpenAI Codex lead on the new shape of product work | Andrew Ambrosino
CHAPTERS
- 0:00 – 2:42
Why Codex is becoming OpenAI’s default “home base” app
Andrew opens with adoption and ambition: Codex is used by nearly everyone at OpenAI and is aiming to be the “best desktop app that has ever existed.” They discuss the quality bar required for an app to become the instinctive place you start work—like opening a browser tab.
- •Codex usage is widespread internally, not just among engineers
- •Vision: Codex as the default, hesitation-free app you open to do work
- •Quality bar and reliability as the primary product challenge
- •Codex expanding beyond coding into general knowledge work
- 2:42 – 6:23
The product-work inversion: implementation is cheap, curation is expensive
Andrew explains how AI flips classic product development: building is no longer the bottleneck. Teams can spin up many competing implementations quickly, so the hard part becomes selecting, framing, and integrating the best work.
- •Anyone can build almost anything quickly with frontier models
- •Explosion of parallel explorations ("90 attempts")
- •Old assumption (implementation is expensive) no longer holds
- •New bottleneck: curation, coherence, and deciding what matters
- 6:23 – 10:26
Documents vs. prototypes: choosing the right medium in an AI world
They push back on simplistic takes like “PRDs are dead.” Andrew argues that because both docs and prototypes are now cheap to create, teams must be deliberate about which medium drives clarity versus prematurely anchoring decisions.
- •Prototypes can over-anchor teams (“primal mark” problem)
- •Docs still matter when the goal is product clarity in vague spaces
- •Prototypes matter when stress-testing interactions with real use
- •Medium no longer reliably signals maturity or readiness to ship
- 10:26 – 12:06
What “taste” really means (and why it’s the new bottleneck)
Andrew unpacks “taste” beyond aesthetics—into systems thinking, context, strategy, and coherence. In a world of abundant implementation, taste is the ability to pick the right goal, shape, and integration path.
- •Taste isn’t just visual style; it includes judgment and systems fit
- •Covers framing, prioritization, and how work fits broader themes
- •Includes interaction semantics (e.g., motion matching meaning)
- •Core question: if we can build anything, what should we build?
- 12:06 – 16:50
Why AI is still bad at design (grading, novelty, and abstraction)
Andrew outlines why design lags coding: it’s harder to evaluate, less tied to labs’ research flywheels, and more culturally dependent. He also highlights deeper design challenges like abstraction and semantic consistency across a codebase.
- •Design is harder to train because feedback loops are subjective
- •Labs prioritize capabilities that accelerate research (coding over design)
- •Design needs novelty; copying patterns isn’t “good design”
- •Hardest layer: abstraction/semantics, not just pixels
- 16:50 – 21:11
Is the design process dead? What changes when everything can be “production-like”
They critique the traditional “case study factory” design process, which assumed building is costly and you must get it right before implementation. AI makes production-quality artifacts easy, so teams must decouple artifact polish from process stage and intent.
- •Classic process overvalued ritual and implied quality via steps
- •Polished artifacts no longer mean “late stage” or de-risked
- •Design process isn’t gone, but its tools and signals have changed
- •Need explicit clarity on what stage an artifact represents
- 21:11 – 24:01
How the Codex team actually works: role overlap, dogfooding, and “average of your work”
Andrew describes a pragmatic, fluid collaboration model where roles overlap and people are defined by what they do most, not strict boundaries. Dogfooding is central—even when it’s uncomfortable—because using Codex reveals what Codex must become.
- •More role “collapse” on Codex than in many orgs, but roles still exist
- •People are the “average” of their work over time (PM/eng/design blend)
- •Dogfooding loop drives product: use it even when it’s not best yet
- •Improving the product sometimes takes priority over optimizing process
- 24:01 – 27:22
Why eliminating roles is risky: specialties, best practices, and the “builder” trap
They explore whether functions like PM, design, and engineering will disappear. Andrew argues that removing rigid lanes is good, but deleting disciplines is harmful because specialties encode hard-won practices and depth that tools can’t replace overnight.
- •Some companies overreact: “get rid of PMs, everyone builds”
- •Roles as disciplines matter because they contain best practices
- •Tools reduce gatekeeping, making role-switching easier
- •Not everyone can (or wants to) do everything; depth still matters
- 27:22 – 30:14
Team structure on Codex: ‘10 to a few thousand,’ agency-first hiring, and zone defense PM
Andrew explains the unusual scale boundaries: a small core team but broad dependency on research and platform groups across OpenAI. Product leadership becomes “zone defense”—spreading coverage to steer chaos, fill gaps, and curate coherence.
- •Core team is a double-digit number of engineers, a smaller design group, and a few PMs
- •Codex as the culmination of many orgs’ work (models, infra, CUA [computer-using agent], etc.)
- •Zone defense: PMs avoid clustering; seek coverage and gap-filling
- •Hiring emphasis: agency + taste; product-minded engineers
- 30:14 – 31:38
IC vs. management in the agent era: everyone is ‘managing’ something
They reframe the IC/manager divide: ICs increasingly manage agents and orchestrate work rather than write every character of code. Management remains essential, but the difference is granularity—what you’re coordinating and at what scale.
- •AI shifts IC work from typing to supervising/orchestrating
- •Management isn’t disappearing; it’s changing shape
- •Key competency: filtering signal from noise when output is effectively unlimited
- •Taste becomes critical to prevent “slop” from shipping
- 31:38 – 35:05
Roadmaps when the models move: hazy long-term plans and feature timing sensitivity
Andrew shares their planning philosophy: detailed near-term, intentionally vague long-term to avoid false precision. Model capability jumps can fully change whether a product succeeds, even if the product shape stays constant.
- •Short horizon = high detail; long horizon = intentionally hazy
- •False precision wastes time in fast-moving applied AI
- •Approach: prototype ideas, wait, and retest as models improve
- •The Codex app could have failed had it launched months earlier, solely because of model quality
- 35:05 – 39:18
Building features that don’t work yet—and the danger of being too ‘AGI-pilled’
They discuss why teams should build and archive “not-ready-yet” features as test artifacts for future model upgrades. Andrew contrasts overly autonomous early Codex concepts with more interactive, model-appropriate designs that succeeded.
- •Code artifacts can be future testbeds even if not shippable today
- •Re-releasing the same idea with a more capable model changes outcomes
- •Product must match current model limits (ask questions vs. overpromise)
- •Balance: fix paper cuts while also funding disruptive exploration
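The idea of keeping not-ready-yet features around as test artifacts can be sketched roughly as follows. This is a minimal illustration, not how the Codex team does it: `ArchivedPrototype`, `retest`, and the two toy models are hypothetical names, and each "model" is just a function from prompt to text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a model: prompt in, completion out.
Model = Callable[[str], str]


@dataclass
class ArchivedPrototype:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # judges whether the output is good enough to ship


def retest(prototypes: List[ArchivedPrototype],
           models: Dict[str, Model]) -> Dict[str, List[str]]:
    """Return, per model version, the names of prototypes whose check passes."""
    return {
        version: [p.name for p in prototypes if p.passes(model(p.prompt))]
        for version, model in models.items()
    }


# Toy models (assumptions for illustration): the newer one asks a
# clarifying question, the older one claims to be finished.
def older_model(prompt: str) -> str:
    return "Done."


def newer_model(prompt: str) -> str:
    return "Which branch should I target?"


prototypes = [
    ArchivedPrototype(
        name="clarifying-questions",
        prompt="Refactor the billing module",
        passes=lambda out: out.endswith("?"),
    )
]

results = retest(prototypes, {"older": older_model, "newer": newer_model})
print(results)
```

Re-running the same archived prototype against each new model version turns "wait and retest" into a repeatable check rather than a memory exercise.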
- 39:18 – 42:05
Frontier workflows: loops, autonomous development, and the ‘delete code’ problem
Andrew talks about the next frontier: supervised vs. unsupervised code generation, harnesses, and autonomous maintenance. A key blocker is that models tend to increase complexity, making fully autonomous improvement loops risky today.
- •The meaningful metric shifts to supervised vs. unsupervised generation
- •Explorations: overnight refactors, cleanup, and autonomous maintenance
- •Key limitation: models often add complexity instead of simplifying
- •Not yet at ‘agent reads Twitter/Slack and improves app’—but pushing toward it
- 42:05 – 46:52
How Andrew uses Codex to run his job: automated briefs, Slack triage, and release coordination
Andrew explains how his Codex usage evolved with his role—from writing the app with the app to managing discovery, alignment, and releases. He describes creating scheduled tasks that monitor Slack, generate daily briefs, and can be coached iteratively.
- •Personal dogfooding loop: fix what blocks your own work
- •Automations: daily brief across many Slack channels and workstreams
- •“Vibe coordinated” releases: auto-collect PR/Slack updates into trackers
- •Key UX challenge: powerful setup exists, but needs to be effortless for non-builders
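A daily brief of the kind described above could look something like this sketch. It assumes the messages have already been collected from Slack by some other means; `Message`, `build_daily_brief`, the channel names, and the keyword list are all hypothetical, and no real Slack API is called.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Message:
    channel: str  # hypothetical channel name, e.g. "#release"
    author: str
    text: str


def build_daily_brief(messages: Iterable[Message], keywords: List[str]) -> str:
    """Group messages that mention any keyword by channel and render a text brief."""
    lowered = [k.lower() for k in keywords]
    by_channel = defaultdict(list)
    for msg in messages:
        if any(k in msg.text.lower() for k in lowered):
            by_channel[msg.channel].append(msg)

    lines = []
    for channel in sorted(by_channel):
        lines.append(f"{channel} ({len(by_channel[channel])} flagged)")
        for msg in by_channel[channel]:
            lines.append(f"  - {msg.author}: {msg.text}")
    return "\n".join(lines) if lines else "Nothing flagged today."


# Made-up sample data for illustration only.
sample = [
    Message("#release", "dana", "Release candidate is blocked on signing"),
    Message("#design", "lee", "New icon set uploaded"),
    Message("#infra", "sam", "Deploy finished; release notes pending"),
]

brief = build_daily_brief(sample, ["blocked", "release"])
print(brief)
```

In practice the keyword filter would likely be replaced by a model prompt that can be coached over time, which is closer to what the episode describes; the plain filter here only shows the shape of the task.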
- 46:52 – 52:06
Browser automation and ‘apps inside Codex’: connectors vs. in-app browser vs. computer use
They break down the different ways Codex can operate on external systems, including taking over the computer to click through tedious UIs. Andrew describes product trade-offs in making a browser a first-class surface, including security, latency, and keyboard shortcut conflicts.
- •Three modes: connectors, in-app browser, and full computer-use clicking
- •Computer use shines when connectors don’t exist (e.g., cloud consoles)
- •Security and tech stack choices (Electron vs. Owl/Atlas browser stack) matter
- •Hard UX problems: tabs, shortcuts, and muscle memory across many apps
- 52:06 – 59:25
The long-term vision: Codex as a desktop ‘home base’ that orchestrates specialist tools
Andrew traces Codex’s evolution from CLI to a right-sized desktop surface, then to broader knowledge-work usage—even when the UI is “hostile” to non-engineers. The goal is a flexible hub that can work inside tools (extensions/connectors) and also bring tools into Codex when useful.
- •Internal product-market fit first with engineering and research, then surprise adoption across functions
- •Attempts to move workflows elsewhere failed because people stayed in Codex
- •Vision: start/end work in Codex; hand off to best-in-class tools when needed
- •Example: Codex built a Premiere Pro extension to control editing primitives
- 59:25 – 1:09:56
Failure corner, lightning round, and post-roll: staying outcome-focused as process churns
Andrew reflects on years of startup struggle and constant micro-failures even now, emphasizing timing and fit. The lightning round reveals his current season of life (parenting), and the post-roll returns to a core lesson: don’t marry your process—commit to outcomes, adaptability, and self-awareness in the AI era.
- •Career failures: founder slog, regulated industries, repeated attempts
- •At OpenAI: fast feedback loops (and blunt internal critique) harden products
- •Lightning round: children’s books, Magic School Bus, Linear
- •Post-roll advice: don’t attach identity to tools/process; optimize for outcomes and adaptability