Lenny's Podcast

OpenAI Codex lead on the new shape of product work | Andrew Ambrosino

Andrew Ambrosino leads development of the Codex desktop app at OpenAI. Nearly 100% of OpenAI employees (not just engineers) now use Codex weekly. A lifelong builder with a background spanning engineering, design, product management, and founding companies, he is now responsible for turning the Codex desktop experience into what he calls “the best desktop app that has ever existed, full stop.”

*In our in-depth conversation, we discuss:*
1. Why AI has completely flipped the product development process
2. What “taste” really means as a professional skill, and why it is emerging as the most valuable capability in an AI-first workplace
3. Why Andrew believes the Codex app would have failed if they launched it last November (vs. in February)
4. The “zone defense” model for how product managers at OpenAI operate when everyone can build anything
5. How roles are collapsed on Andrew’s team, and why eliminating the concept of roles entirely is a big mistake
6. How Andrew uses Codex to run his own workflows
7. The vision for a home base that coordinates work across ChatGPT, Codex, and the tools people already use

*Brought to you by:*
WorkOS: Make your app enterprise-ready, with SSO, SCIM, RBAC, and more: https://workos.com/lenny
Mercury: Radically different banking, now with Command: https://mercury.com/

*Episode transcript:* https://www.lennysnewsletter.com/p/openai-codex-lead-on-the-new-shape

*Archive of all Lenny's Podcast transcripts:* https://www.dropbox.com/scl/fo/yxi4s2w998p1gvtpu4193/AMdNPR8AOw0lMklwtnC0TrQ?rlkey=j06x0nipoti519e0xgm23zsn9&st=ahz0fj11&dl=0

*Where to find Andrew Ambrosino:*
• X: https://x.com/ajambrosino
• LinkedIn: https://www.linkedin.com/in/ajambrosino
• Website: https://ambrosino.io

*Where to find Lenny:*
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

*In this episode, we cover:*
(00:00) Introduction to Andrew Ambrosino
(02:30) How AI is changing the shape of product work
(06:32) When to use documents vs. prototypes
(10:25) What “taste” actually means
(12:06) Why AI is still bad at design
(16:18) Is the design process really dead?
(21:35) What the design process looks like on the Codex team
(23:41) Are product functions disappearing?
(27:22) Team structure
(30:12) IC vs. management
(31:37) Planning roadmaps
(35:16) Building features that don’t work yet
(38:13) The ambition problem: when you’re too AGI-pilled
(39:17) The latest frontier: loops and autonomous development
(52:05) How Andrew uses Codex to automate his entire job
(46:52) The power of computer use and browser automation
(49:10) Will we run all our SaaS apps inside Codex?
(52:05) The future vision for Codex
(57:20) The videographer who built a Premiere Pro extension with Codex
(59:30) Failure corner
(1:01:50) Lightning round
(1:07:03) BTS: How our producer uses Codex for editing

*Referenced:*
• Codex: chatgpt.com/codex
• The Primal Mark: How the Beginning Shapes the End in the Development of Creative Ideas: https://www.gsb.stanford.edu/faculty-research/publications/primal-mark-how-beginning-shapes-end-development-creative-ideas
• Linear: https://linear.app
• “Taste” is not just taste in aesthetics: https://x.com/thenanyu/status/2067327619897446721
• Linear’s secret to building beloved B2B products | Nan Yu (Head of Product): https://www.lennysnewsletter.com/p/linears-secret-to-building-beloved-b2b-products-nan-yu
• Paul Graham’s website: https://paulgraham.com
• The design process is dead. Here’s what’s replacing it. | Jenny Wen (head of design at Claude): https://www.lennysnewsletter.com/p/the-design-process-is-dead
• The case study factory: https://essays.uxdesign.cc/case-study-factory
• Why humans are AI’s biggest bottleneck (and what’s coming in 2026) | Alexander Embiricos (OpenAI Codex Product Lead): https://www.lennysnewsletter.com/p/why-humans-are-ais-biggest-bottleneck
• OpenClaw: https://openclaw.ai
• OpenClaw: The complete guide to building, training, and living with your personal AI agent: https://www.lennysnewsletter.com/p/openclaw-the-complete-guide-to-building
• From skeptic to true believer: How OpenClaw changed my life | Claire Vo: https://www.lennysnewsletter.com/p/how-openclaw-changed-my-life-claire-vo
• The Codex feature that works while you sleep: https://www.lennysnewsletter.com/p/the-codex-feature-that-works-while
• The AI paradox: More automation, more humans, more work | Dan Shipper: https://www.lennysnewsletter.com/p/the-ai-paradox-dan-shipper
• Atlas: https://chatgpt.com/atlas
• Anthropic: https://www.anthropic.com

*Recommended books:*
• The Gruffalo: https://www.amazon.com/Gruffalo-Julia-Donaldson/dp/0803730470
• The Big Orange Splot: https://www.amazon.com/Big-Orange-Splot-Manus-Pinkwater/dp/0590445103

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com._

Lenny may be an investor in the companies discussed.

Andrew Ambrosino (guest), Lenny Rachitsky (host)
Jun 28, 2026 · 1h 9m

EVERY SPOKEN WORD

  1. 0:00-2:30

    Introduction to Andrew Ambrosino

    1. AA

      90% of people at OpenAI use Codex. Not 90% of engineers, that was 90% of the entire company.

    2. LR

      You had this tweet the other day where you said that you intend to make Codex the best desktop app that has ever existed.

    3. AA

      Yeah. The quality bar for Codex had to be so high that there was never, like, a hesitation that you have opening this app to do the next thing, that this was your natural choice, just like people have kind of come to open a browser tab, right?

    4. LR

      That's true. I know. There's numbers constantly coming out about the records you guys are setting for usage.

    5. AA

      I don't know. Like, we'll see. A lot of people seem to like the app.

    6. LR

      Why do you think AI and the top frontier models are just not good at design?

    7. AA

      I think design's a little bit harder to grade because the human aspect of taste is, like, part of the feedback mechanism you need. That is still feeling a little bit out of reach with the current technology.

    8. LR

      Mm-hmm. What does the shape of product team look like now versus a couple years ago?

    9. AA

      Everybody at OpenAI is very agentic, has great ideas, and so everybody's building everything. And it's not that people are doing fundamentally different roles or focusing on different things, it's that it's backwards. The implementation is actually not the expensive part anymore.

    10. LR

      Mm.

    11. AA

      It's, dare I say, taste.

    12. LR

      Do you feel like there's this collapse coming where everyone's everything, and that's just the future, or do you think we're gonna s- continue to be mostly divided up?

    13. AA

      There are some things that I'm afraid of. I've heard a lot of companies be like, "We're getting rid of the product role, and everybody's just gonna be a builder," and then what happens is-

    14. LR

      Today my guest is Andrew Ambrosino, product and engineering lead for the Codex app at OpenAI. Codex is quickly becoming people's go-to app for building products and also for non-product work, like organizing files in your computer, drafting documents, doing data analysis, reading your emails, and a lot more. If you stick around for the end of this episode, we actually have a little clip from after we stopped recording, where the producer in the room started talking about how he uses Codex in his editing work. Since this January, Codex usage has grown 6X. They currently have over 5 million weekly active users. I suspect this number is quickly going to be out of date. Internally at OpenAI, nearly 100% of their employees use Codex weekly, and that is not just the engineers. Andrew is a designer turned engineer turned product manager who's building the app that more and more of the world is using to build their own products. Before we get into it, don't forget to check out lennysproductpass.com for a year free of the hottest and most well-crafted AI products in the world, available exclusively to Lenny's newsletter subscribers. With that, I bring you Andrew Ambrosino.

  2. 2:30-6:32

    How AI is changing the shape of product work

    1. LR

      Andrew, thank you so much for being here, and welcome to the podcast.

    2. AA

      Thank you for having me.

    3. LR

      This is a, a rare in-person podcast. I rarely do this kind of thing. We'll see how it goes.

    4. AA

      We'll see.

    5. LR

      We'll see. People like these more. When we were preparing for this chat, I asked you, "What's the biggest thing you want people to get out of this conversation?" And you said that it was how AI is changing the shape of product work. You're working at maybe the most bleeding edge-

    6. AA

      Mm-hmm

    7. LR

      ... AGI-pilled software team there is.

    8. AA

      Yeah.

    9. LR

      So you have a really interesting lens into where things are heading, where other teams are gonna be in, in a year or two or, or m- or more. What does the shape of product team look like now versus a couple years ago?

    10. AA

      One of the, the hardest things to do right now as a leader building these products is just sort of the inversion of the process, in my mind, which I think a lot of people have talked about, which is that anybody can build anything, right? Like, I, I generally believe now that starting from scratch, if you talk to these models, ours, anybody else's really, um, you can stand up whatever feature you want, right? And that's not necessarily a hard part of software, but that's, like... That's really cool, and I think that has created an environment where people are making all of this, right? You give people unlimited tokens. Everybody at OpenAI, OpenAI is, is very agentic, has great ideas, and so everybody's building everything. Whereas I think, you know, you look back at product process that we've all run for a long time, and it's been a little bit opposite, right? It's been kind of research, ideation. Maybe there was some prototyping, but it was... You know, even when we got past Waterfall, it, it was still kind of flavored of, like, the implementation is expensive. And so you, what you wanna do is you wanna de-risk all implementation up front through documents, through research, through prototypes, because prototypes and designs are cheaper, was kind of the, the assumption there. Uh, and that's changed. That's, like, totally changed. And right now, I'm sure there are 90 different explorations for... There, there's this feature that we desperately need to do that I'm sure there are 90 different uncoordinated teams, like, implementing and trying, right?

    11. LR

      Mm.

    12. AA

      Um, so I guess the short answer is, like, it's, it's backwards, and it's not that people are doing fundamentally different roles or focusing on different things or that even skill sets have vanished or that roles have just disappeared. It's that it's backwards, right? The implementation is actually not the expensive part anymore.

    13. LR

      Mm.

    14. AA

      It's, dare I say, taste. Um, but it's the curation process. It's like, of those 90 attempts, like, what's good about these? What should we fold into other aspects of this, right? How should we frame this? Should it be part of this other feature, right? How many segments should be in the toggle? Um, you know, all of those things.

    15. LR

      This episode is brought to you by our season's presenting sponsor, WorkOS. What do OpenAI, Anthropic, Cursor, Vercel, Replit, Sierra, Clay, and hundreds of other winning companies all have in common? They are all powered by WorkOS. If you're building a product for the enterprise, you've felt the pain of integrating single sign-on, SCIM, RBAC, audit logs, and other features required by large companies. WorkOS turns those deal blockers into drop-in APIs with a modern developer platform built specifically for B2B SaaS. Literally every startup that I'm an investor in that starts to expand upmarket ends up working with WorkOS, and that's because they are the best. Whether you are a seed stage startup trying to land your first enterprise customer or a unicorn expanding globally, WorkOS is the fastest path to becoming enterprise-ready and unblocking growth. It's essentially Stripe for enterprise features. Visit workos.com to get started, or just hit up their Slack, where they have actual engineers waiting to answer your questions. WorkOS allows you to build faster with delightful APIs, comprehensive docs, and a smooth developer experience. Go to workos.com to make your app enterprise-ready

  3. 6:32-10:25

    When to use documents vs. prototypes

    1. LR

      today. Taste such a, such a buzzword. I wanna come back to that. Uh, this idea of 90 prototypes, so interesting. So just to make sure I understand that. So there's an idea out there floating around OpenAI. What people used to do is write docs.

    2. AA

      Yeah.

    3. LR

      Here's what we're gonna build, here's the feature, here's the strategies, PRD. Today, what you're describing, which makes all the sense, is people just create a prototype. And what you're saying is people across the company have kinda similar ideas, and now instead of a doc, they create their little prototype, and that leads to kind of 90 different things people can look at and maybe pick, "Here's the direction we wanna go down." Is that-

    4. AA

      Yeah

    5. LR

      ... the idea?

    6. AA

      There's a lot of this.

    7. LR

      Yeah.

    8. AA

      Um, and you know, it's not just happening here. Like, you've seen many product leaders say, "PRDs are dead."

    9. LR

      Yeah.

    10. AA

      "Prototypes are in." And I, I actually don't believe this at all. I think that one of the interesting things that is happening right now is that because implementation has gotten so cheap across every medium, it's very tempting to jump straight to a prototype. E- especially if you're not an engineer, right? Especially if you've never been able to write code or never been interested or never had the time, it's really tempting to say, like, "PRDs are dead. Let me just show you what I mean," right? What I've also noticed, though, is that for engineers, it's really tempting to write a lot of documents, a lot of documents that are not worth reading. Um, and this is no shade on people writing documents. It's that if implementation is abundant, then it's really important to pick the right format for the point you're trying to make. If that point is product clarity around a vague area, then it might actually be a document. If what you're trying to do is get something in people's hands to try out and to stress-test an interaction pattern, it's a prototype. But I think, like, this is kind of the funny thing now, which is that, like, it's really important to pick the medium.

    11. LR

      There's this, uh, term that a, a podcast guest shared that I think about when you say this, uh, which is s- it's called the primal mark. When a designer or a painter or an artist just creates the first mark on, on, on a painting or a piece of art, that mark is what you start to respond to, and so everything kind of trickles down from that first mark you make. And what I'm hearing you saying is sometimes the prototype is the wrong first thing to do because then you're just responding to this prototype versus a different idea versus a bigger idea. So I love hearing this. So, like, everyone's just like, "Okay, forget it. No more writing, no more docs, no more PRDs." You're saying they're actually still useful for specific use cases.

    12. AA

      Yeah. I, I, I think, too, there's this part of the previous world was that the medium implied it had baked in a lot of, a lot of signal around where in the process something was, right? So if you're seeing something that feels like the app in production, that means that it's late in the process, that assumptions have been de-risked, that, you know, design has looked at this, that this is a good business goal, right? And now those things are sort of divorced, right? And the reason it was that way is because it was hard to get resources to build the thing until it was properly de-risked, and now that's, like, just out the window, right? And so I think it's really important to start saying, "Look, we can have prototypes. We can have documents." Is it... Are we clear around what this is doing, right? 'Cause to your point, you do not wanna over-anchor on this thing that was meant to be an exploration, but now it looks so production-ready that, like, oh, visually, it's ready for prod. But it's not actually the right model of, of where the research is going or what users are asking for or what's right for the business, right? Not to overdo the taste thing, but it's like, it's once again, it's like the taste to know, like, what to work on, how to present that information, like, how to achieve the goals, what medium to use is emerging as, like, the most important thing to do. And that's it. That's,

  4. 10:25-12:06

    What “taste” actually means

    1. AA

      that's in every field.

    2. LR

      What is taste when you talk about good taste? Is it, is it what you describe deciding, "Here's the thing we're gonna invest in"? Is it also once you have a thing, is this right? Is this the thing to ship? Talk about when you think about what is good taste, good judgment, what is, what is that concretely? 'Cause people hear this word, they're like-

    3. AA

      Yeah

    4. LR

      ... "Oh, I have good taste. I know it." What does it look like in practice?

    5. AA

      Yeah. Um, it's funny. Uh, there was a tweet. I'm too online. There, there was a tweet, I think it was yesterday, from the head of product at Linear. I might be getting that wrong. Sorry to anybody. Who said people overemphasize the aesthetic part of what taste means, and they used, um, Paul Graham's great, but they used him as an example, saying, "Paul Graham clearly has great taste and wears cargo shorts," right? Like, you know, we gotta, we gotta, like, tease out what taste means a little bit. Um, and, and there's a lot of nuance here. I think it's all of the above to, to what you mentioned. It's, it's the, like, there is an aesthetic part to it, um, but there's also a, a systems thinking part of it, like how does this fit in the system? There's a where are we going and how, like, what, what theme is this part of? There's how to present this. A lot of it is wider context and, you know, obv- obviously there are parts of taste that are like, "Hey, this interaction animation doesn't fit in the semantic meaning it's supposed to," right? Like, it's too snappy for what it's actually trying to convey. And that's incredibly important, and I focus probably too much on that. Um, but there's, there's, like, the, the, like, what shou- what should this be? Like, if we can build anything, like, what's, what's the, what's the goal here and how do we, how do we get there? That I think is, like, s- actually the real taste question here.

  5. 12:06-16:18

    Why AI is still bad at design

    1. LR

      When I, when I hear things like this, I always wonder, where will human brains continue to be valuable as AI becomes stronger and better and doing, and do- and doing more of, of the work? And it feels like taste is, is a part of it. Something I think about along these lines is just AI is still very bad at actual design. Like, the output of AI is not great.

    2. AA

      Yeah.

    3. LR

      Rarely is it like, "This is it. They nailed it."

    4. AA

      Yep.

    5. LR

      And it's always like, "Oh, this is Claude design. This is Codex design." Why do you think AI and the top frontier models are just not good at design? ... today.

    6. AA

      Yeah.

    7. LR

      And do you think they'll get there? Do you think we'll get to a place of like, "Holy moly, we're done"?

    8. AA

      Yeah, I tend to, to think that there are some practical reasons why it's lagged, and also some harder problems to crack. I'm not in our research org, like I'm sure I'll get yelled at for saying this. I think design's a little bit harder to grade, um, than so- than, than software, in that, you know, creating a loop where you can train a model on like what's good design and what's bad design is just a little bit more tedious and onerous than, you know, does the code compile? Does it, you know, do, do what it's supposed to, right? It, because the human aspect of taste is, is like part of the feedback mechanism you need. I also think that, uh, the labs historically invest in making their models good at things that accelerates AI research, and that in the era, the early era of coding models, it is very clear that the model being able to write correct code would accelerate research, right? In a way that you can't really make the same case for design. Not that getting good at design isn't important, it's that it's not directly in that, that flywheel, right? Um, those are practical reasons, and I, you know, those will go away. Like, these models will get pretty good at design. There are some kind of murkier things that it is, is gonna be really tough. Um, and like I've kind of a short list of them. One is there is an aspect of, of culture to what is considered good design, in that you remember, it was probably, what, last year, where like every new website that came out was just a copy of Linear's website, right? Like Linear's website, great design, great taste. Like if a model did that, I'd be like, "Wow, this is incredible leaps here," right? If I have a model that outputs Linear's website every time, that's not the challenge here, right? Um, there's an amount of like novelty that is more important in design than it i- actually is in software engineering. Like software engineering, you almost, you almost want it to over-index on known patterns, right? Whereas design, it's like no, there's an element of randomness here, and, and novelty, right? There's also the, you know, to me, like I s- I spent a lot of time writing code or s- you know, supervising code on the early Codex app. And even as the models get good at design, there's sort of a, an abstraction layer that is an interplay between the software design and the code that's being written. Like, "This thing over here in this corner should share X, Y, and Z in the code base with this thing down here," right? And that's a little bit different than saying, "The model needs to be a better designer," especially on the like, you know, that's not visual... That is visual design, but it is significantly deeper. It's about the abstractions, in that like, oh, if tomorrow our company did a rebrand, the shallow version of this is that we have to, you know, update 263 components one by one. The deep version is like the semantics between these two things that look different, like they're both in lists that have the, like this style that convey this interaction pattern to the user. And I think like that is still feeling a little out of, out of reach with the current technology.

    9. LR

      Mm.

    10. AA

      Right? That abstraction layer. So I think, you know, as, as we've gone through this process, right, of, you know, we started the Codex app in, uh, November, and we weren't using it full time. Now we use it for everything. That's been a journey. But now it's like the things that we actually do while using it are different things. So what was the question?

    11. LR

      No, that was, that was an amazing answer.

  6. 16:18-21:35

    Is the design process really dead?

    1. LR

      Uh, speaking of design and being creative, the Codex app when it came out, it's like such a new thing-

    2. AA

      Yeah

    3. LR

      ... that nobody has, had seen before. It's like not a terminal thing. It's not an IDE thing. It's like this chat thing that codes, and you can see code.

    4. AA

      Yeah.

    5. LR

      To your point, it feels like it'd be hard for AI to be like, "Here's a whole new paradigm for how to code." And that feels like where human brains continue to be valuable for now is like creativity almost, and coming up with something new versus like patterns of things that have been done before.

    6. AA

      Yeah. I mean, I totally agree. Let's give it up for the human brain.

    7. LR

      [laughs] For now. As we were getting ready for this, you said that you were listening to the, uh, episode with Jenny-

    8. AA

      Yes

    9. LR

      ... who is just the head of design for Claude Code and Cowork and such, and she had this whole kind of thesis that the design process is dead. There's no time for design. Things are moving too fast. Uh, just build now, and design is kind of steering things as things move along. You're implying you have kind of a different perspective on the design process.

    10. AA

      We probably agree on, on a lot of this, Jenny and I. I wasn't a fan of the design, like the design process proper. I agree with, with her take that it is, it is dead. And I genuinely was not a fan of this process before AI. Like I, I think it was-

    11. LR

      Can you describe the process real quick, just when people think about the design process, what does that mean?

    12. AA

      Yeah. Yeah. So I mean, I, when I ran a startup a number of years ago, we, we would, you know, do design hiring, and there was this sort of snarky article that came out about like the case study factory. And it was, it was like mid-ZIRP era stuff, right? And it was that designers were being taught about this process and valuing that above all else, above all outcomes even, right? And if something went through this process, that two things were true. One, it would be good, and the process would guarantee quality and guarantee impact, and also that if something, the thing was good if it went through that process, even if you don't like it and nobody uses it. And, and it's like the process w- you know, of, of user research and the di- divergence and the convergence, it's the right framework. It was always a little academic, but I think, I think this is really exposing some areas where it falls down, especially because of the speed of implementation. And like once again, like that process is sort of predicated on the assumption that implementation is expensive, and that you can really only afford to build once. And so you need to fully, like exhaustively go through the problem space and the solution space before implementing, right? And that you, and, and then like k- kind of we saw with like, you know, Figma and Origami and all of these tools, that you can fast-forward some of the insights by pulling interactive prototypes earlier into the process, right? That you can, you know, simulate production and like, you know, there ended up being sort of a meme about executives just being like, "Well, can we just do a prototype," and then like expect it to, you know, work. But, but this thing was real, right? That this became part of the, the design process proper, right? We pulled prototyping into that. The, the problem now is that you can pull all of the implementation into that, and there's a mismatch between, I think, a lot of assumptions. Again, like you see this fully polished prototype that looks like it's ready to go out the door, and enough people at a company see that and they're like, "Can we, can we release this now?" But the appropriate... Like, we're actually in the early design process stage, and nobody's just saying that, right? Like, this is, this is where we are with like a-

    13. LR

      Mm-hmm

    14. AA

      ... a bunch of like multiplayer exploration, right? You know, 90 people will have this idea. It'll look really polished, but it's like, no, this is actually... That, that's the design process now, right? Tying the design process to mediums, media, uh, w- you, like that's the scary part. It's that designers have more tools now to do this process with, right? You can put stuff into the current product and you can A/B test it or just, you know, use that as a prototype. Um, many companies right now have this idea of like a baby version of the product, like baby Cursor. You've seen this on Twitter. Like, we have baby Codex, right? A dramatically simplified code base that approximates all of the interactions of the production app and therefore is a lot quicker to vibe code over, right? 'Cause you can be like, "Well, what if the sidebar worked like this?" Or, "What if a pane came in and had like a group chat here? What if X, Y, Z," right? That's like a huge tool that's part of the design process. So to say the design process is dead, I feel like it, it's both true and false, right? It's that if you are, if you are tied to the tools and the ex- like the spec- the exact like day-to-day specifics of the process, then yeah, it's dead. Like, you're not gonna have a good time. But to throw the process out completely or throw like the overlay of the process, the like, "Hey, we're at this point in the process," like that is still more important than ever.

    15. LR

      It's, it's really interesting because you have a background in every function. If people look at your LinkedIn, it's like engineer, designer, product manager, founder. Uh, now you oversee the desktop app. And I think design is not under your purview, is that right? Is there like a separate design team or are they under your-

    16. AA

      Depends on the week.

    17. LR

      Okay. [laughs]

    18. AA

      We work very closely together. Like we believe in all sitting together, being embedded. Like I, reporting line, I don't,

  7. 21:35-23:41

    What the design process looks like on the Codex team

    1. AA

      I don't-

    2. LR

      They shift, they shift weekly. What does the design process look like on, on the Codex app?

    3. AA

      Yeah. There's been a lot written about role collapse, existential role collapse. There are no roles anymore. We haven't seen that. Um, we've, we have seen more role collapse in the Codex org, um, than I think other parts of the company and other parts of the economy. I think part of this is that this was a technical product for engineers, and so our designers speak engineer, right? Our product managers speak technical language and write code. Alexander has a master's degree in computer science, which is, I do not have a master's degree in computer science. Like, so we've seen a lot of role collapse and, and I think that, you know, one of, one of the ways that we describe how the groups work together is that there's significantly more overlap in the roles than there used to be, and everybody's sort of defined less by the fence and the boundaries of where design stops and engineering starts, but more the average of where they're working, right? So like, you know, if you average up all of the things that somebody on our design team does, there's plenty of code writing things. There's plenty of things that are product work. But on average, like their dot's over here, right? If you draw, if you draw it out on a diagram. And this sort of speaks to the process, too. Um, especially because the entirety of the Codex app has been informed by the dogfooding loop. There is a desire among all of us to try to do as much as possible in the app, even when it's not the best tool, so that it can become the best tool. And so a lot of design we all work on by using the app and say, "Okay, what's broken about this?" This is a whole thing we do, which is that we o- [laughs] we often don't improve our process so that we can make the product better to do it, which is a deeply like uncomfortable place to be in. But, you know, week to week it's changing.

    4. LR

      I love this point so much, that, uh, like, what are you? Your role is the average of what you spend your time on. If most of your work is PM-y work, then okay, you're a PM for now.

    5. AA

      Yeah.

    6. LR

      If it's engineering,

  8. 23:41-27:22

    Are product functions disappearing?

    1. LR

      you're an engineer for now. I feel like, was OpenAI the first company to, uh, call people member of technical staff?

    2. AA

      No. I, I believe that this, this might have started with Xerox. First company I interned at, it was a company called UpThere, did the same thing. This has been around. But it's, it's, it's much more common now, but, you know, it is kind of a tradition in research-focused companies, right?

    3. LR

      Okay, got it. So it emerged from research. But I feel like it's such a, I don't know, sign of where things are, might be headed, this idea of, uh, we're just gonna call everyone me- member of technical staff. Your function isn't set. You're not like in this bucket of the PM org or the eng org or the design org. Do you feel like that's where we all head long term? Do you feel like functions will continue to exist? Like there's still the PM skill set and the eng skill set and the design skill set-

    4. AA

      Yeah

    5. LR

      ... and people are, "I'm a designer." Or do you think this is like, like people call it builder. Do you feel like there's this collapse coming where everyone's everything and that's just the future? Or do you think we're gonna continue to be mostly divided up in the functions?

    6. AA

      Yeah. There, there are some things that I'm afraid of, and I think that, you know, some, some companies like to be very extreme about getting onto the bandwagon of whatever people say is going to happen. And I think part of the danger in eliminating the concept of roles is that it can dangerously eliminate the idea that things are specialties with knowable best practices, right? I, I've heard a lot of companies be like, "We're getting rid of the product role," which I think is, by the way, a terrible idea. Um, and everybody's just gonna be like a builder. And then what happens is they don't... Like, this whole discipline of product that's been built up and has, like, real best practices, real things that have been tried and failed, and, like, real processes, like, that just gets abandoned because people are like, "Oh, I wrote some code," right? Like, that's not a great place to be in. I think that the boundary of, like, so and so, like, this isn't your lane, I, I welcome that part going away. But there's a balance here where it's like not everyone can work on everything, for one, both in terms of breadth and depth, right? Like, this is why managers are not gonna go away. Um, not everybody can work on e-everything. And also, like, every discipline has a skill component to it, which I think a lot of engineers are guilty of not recognizing that like, well, engineering is a skill too, and it's writing code, and, like, other roles are just people vibing. And it's like, no, that's not how it works, right? Um, like, yes, you can use Excel, but you cannot work on the finance team, right? That is, like, that kind of stuff, right?

    7. LR

      Yeah. I think there's also just, like, do you want to be doing this work?

    8. AA

      Yeah.

    9. LR

      Do I wanna be-

    10. AA

      I think more of it's that actually now-

    11. LR

      Mm

    12. AA

      ... is, like, it's easier to switch roles. It's easier to learn the best practices. It's easier to not tie your effectiveness in a role with the ability to use the exact tool, right? It's more of, like, can you get yourself into this mindset, learn which things work and which don't, and then, like, focus on it, right? Like, I, I spent so long feeling like I should not be a software engineer because I didn't care about, like, assembly language or memorizing TypeScript syntax, you know? And it's like, there have always been parts to these roles that are that sort of gatekeeping, that are like, "Well, no, this... Like, being good at this role is being good at this tool." And I think that's what's kind of starting to erode. I just don't-- I, I think people take this, they hyperbolize all of this.

  9. 27:22-30:12

    Team structure

    1. LR

      What does your team look like, um, on, on, on the Codex team? How many engineers, designers, PMs? What's kinda like the makeup of the team right now?

    2. AA

      Every time people ask me, like, how many people are on the Codex team-

    3. LR

      [laughs]

    4. AA

      Do you remember my answer to this?

    5. LR

      Uh.

    6. AA

      I, I'm like, "It's somewhere between 10 and a few thousand." I mean, uh, it's like a fake answer, but it's, it's real in that we do see this as the culmination of-

    7. LR

      Mm

    8. AA

      ... what everybody works on here. Like, everything that goes into model research, everything that goes into how, you know, models are good at CUA and browser use, everything about how m- you know, model, model personality, all of the product work around, you know, front-end infrastructure, all of the user... Like, all of it is this product. At the same time, we are not accepting PRs daily from thousands and thousands of people on whatever they want. Um, so we got a team, uh, double digits of engineers, probably half that on the design side, um, you know, a few product people, although, you know, product here is kind of more of a zone defense play. And I think one thing that is very common around, am- among everybody on the Codex side or, or on the desktop side is agency and taste, right? A lot of former founders or people who were at larger companies doing founder-shaped things, um, a lot of people with, with immense taste. At OpenAI, we let teams get very large, so we haven't said, "Hey, there's no management," but, like, the teams are quite large, right? It's mostly ICs, um, and I think that's good.

    9. LR

      You use this term zone defense for product work-

    10. AA

      Yeah

    11. LR

      ... and I think that's really interesting. It kind of maps to the design kind of shift also, just like-

    12. AA

      Yeah

    13. LR

      ... you're there to kind of manage and coordinate. Talk a little bit more about what that looks like. What does zone defense look like for a product person?

    14. AA

      Yeah, and I, I have had a lot of conversations with, with Alexander about this, this analogy, w-which is that, like, if two product people are working too closely, that's often not a good signal in that, like, you kind of want... Like, as a product org, you sort of want to do this, like, force-directed activity, where you're like, "Where are the gaps?" Especially in this new world where curation and, like, you know, steering and alignment is a lot of things, where you're like, there's a ton of chaos happening on people throwing ideas all over the place, right? The whole, like, top-down, you know, year-long planning thing, not gonna work. And so now it's like we need the tastemakers to guide things from inception to what the product should be, and that means you basically want company coverage. And so you spread out and you say, "All right, who's, like, who's best at what? Let's create some space between us so that we got full coverage," right? And that's kind of how it goes. And then you fill in the gaps, and you're like, "Look, like, we wanna hire engineers who are product-minded." Like, we don't, we don't want it to be like, you know, we've got a bunch of people writing a bunch of code that needs, like, full team reviewing it for, like, product coherence, right? Like, we want everyone to have these skills, but I think, like, what

  10. 30:12-31:37

    IC vs. management

    1. AA

      people go deep on has to change, right?

    2. LR

      This is definitely a thread I've been noticing over and over with talking to folks like you is the, the most valuable person right now, one of the most valuable, is someone that could take an idea from idea to done with the taste to know this is great.

    3. AA

      Yeah.

    4. LR

      Just, like, shepherding throughout this obsession with making it awesome, like this kind of high agency, high taste person exactly as you describe. Is that, is that kinda the way you think about here's who we're hiring, here's who's gonna do really well in this new world?

    5. AA

      Yeah. I think that that's, that's the core piece right now, and it, it also speaks to how I sort of see IC vs. management, which is that it's not that management is going away. It's not that everyone's an IC, but like everyone's kind of both now, right? If you're an IC, you're not typing code out character by character, right? Like you are managing something. You're managing agents, you're managing, you know, like you're managing work that is happening, right? That comes together to do a certain thing. If you're a manager of teams, you're doing the same thing, just at a different, like different granularity, right? I generally look for like obviously command over the discipline, but then the taste to say like, "Hey, you're gonna have unlimited tokens, and I don't... Like we can't just be doing slop." Like you need to be able to determine what's signal, what's noise, like in a world of

  11. 31:37-35:16

    Planning roadmaps

    1. AA

      just infinite content.

    2. LR

      You mentioned planning. Uh, at the pace things are moving, it's become very hard to plan roadmaps.

    3. AA

      Yeah.

    4. LR

      I imagine especially [chuckles] in your world. Uh-

    5. AA

      People are very frustrated with me all the time on this.

    6. LR

      Yes. Because things are just constantly shipping, things are changing, right? How do you plan on, on, on your team? What's kinda like, how far ahead are you thinking?

    7. AA

      Yeah.

    8. LR

      And what does a plan look like? Is it like a spreadsheet? Is it an MD file? What, what's kind of the output of a plan?

    9. AA

      Yeah. I don't, I don't think we do anything revolutionary on that. We're not clever about planning. I think like the basic gist is the shorter term something is, the more detail it needs. And then it's not that we don't plan for nine months out, it's that that just has to stay very hazy, because any amount of precision that you add to a nine-month plan right now is false precision, and like you're just gonna waste time. Like you can say stuff, right? But like nothing that we planned... I think research is different, so I'm not speaking for research here. But like on the applied side when we do product, like anything that you could have planned in November may have been true for December, but like isn't what happened, right? So it's hard. Like it is really hard to do planning. We generally need to know like what do we think models are able to do on what timeline. In my last company, I kind of saw this shift where we were starting to use the models to drive features, and the product process fell down. It basically had to be like let's list out all of the things that we think we are interested in doing for the next year or two. Let's prototype all of them, decide which things are ready now, and then just let the others sit and bake. And then every time there's like a new leap in models, let's try that thing again with it swapped out. 'Cause like the whole premise of whether features were good or not were based on whether they were smart enough, not the shape of them.

    10. LR

      Mm.

    11. AA

      So this is a great story about the Codex app. I-- Like I'm very confident that the Codex app that we released in February, if that had been ready in November, it would've absolutely failed in the market. And the, the, the only difference was the models between November and February, right? And I, I think like there's a lot to that, that this product with the exact same shape I think would've, like its, its outcomes were totally different depending on just a few months of timing.

    12. LR

      This episode is brought to you by Mercury. Radically different banking loved by over 300,000 entrepreneurs, and now with Command. I've been a customer of Mercury's for over six years. I have never once thought about leaving. Mercury is basically what happens when banking is built by product people, not by bankers. They make it so easy, dare I say fun, to send invoices, move money around, set up virtual cards for folks on my team. Does your bank have an API, a terminal native CLI, or an AI-ready MCP server? I don't think so. And just recently they launched Command, a conversational interface built directly into Mercury, which acts as your financial operator. I've been using Command to transfer money around, to figure out what categories I've been spending the most money in, analyze my cash flows. And just today I used it to find out how much I've made from a specific sponsor over the past year. I just ask, "How much have I made from X over the past year?" Ten seconds later I have an answer. It is so freaking cool. Visit mercury.com to learn more and apply online in minutes. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column NA, members FDIC.

  12. 35:16-38:13

    Building features that don’t work yet

    1. LR

      This is definitely a thread on this podcast, is build things that are not yet working that will work when the model gets better. And there's this kind of oth- other thread of ambition. Be more ambitious with the things you take on. So is this just like a way you approach things, is just like let's just build a bunch of things that may not work yet. We'll just have them around and wait for a model to catch up. Is that kind of the approach?

    2. AA

      Yeah. I think we have a lot of that. I think sometimes the challenge is like you have to be very clear again about what stage of the design process that's in. People still have this muscle memory of like, "Oh, I wrote the code for this thing, therefore we should put it out there." It's like, no, no, no. That means you have an artifact now that we can test against future models, right? Um, this happened with the in-app browser in the app that we have, right? Like we had a, a kind of a working version. I mean, go back to Atlas. We had agent working inside of Atlas and, you know, that was pretty cool. We had Operator before that in ChatGPT, right? That didn't work out. Very cool idea. Like there's some thread that you can draw between Operator, Atlas, Codex, ChatGPT, that it's like fundamentally the same feature, but the re-releasing of it with different intelligence totally changes the outcome here.

    3. LR

      Mm-hmm.

    4. AA

      And so I push people not to be stubborn about like, "No, this isn't working, so it's a bad feature." It's like, no, it might not be ready yet.

    5. LR

      Mm-hmm.

    6. AA

      Um, there's also this aspect of, especially in research, there's always a desire to be the most ambitious and to say, "Okay, but at the limit the model can just do this." And that just doesn't work on the product side. Like if you, if you go back to the original Codex release, basically what it was is it said it was Codex Web and it wasn't good for interacting with. It was like, you give the model a task, and it's gonna go off, do the task, come back to you with it finished. Like, doesn't sound that radical. The problem is, like, it didn't do the task that well. Like, it wrote code. It was, it was, it was good, but it was like that form factor was too early. And then Claude Code comes out totally local, like not hooked up to the cloud. Um, doesn't pretend to be as... It's not as AGI-pilled, right? It's, like, gonna ask you questions. It's gonna sit there. You can't just delegate your life to it. That worked way better, right? 'Cause that's the point that the models were at there, right? So we were like, we were too AGI-pilled for the moment, and I think, like... I, I think about that lesson a lot on this stuff. It used to be that, you know, failing in market told you all these things about the shape of the product, about the communication of the product, and now it's like, no, you might need to release this thing six different times before it works. And that might, like, the shape might not change at all.

    7. LR

      There's like-- It's so interesting to hear about all the variables you have to think about building product now. There's the timeline for the models and the research and how smart it gets. There's, like, people's ability to even understand this is how you could build software in the cloud and this is the future. Like, get people prepared for this new feature and then just, uh, what you can build as a team.

  13. 38:13-39:17

    The ambition problem: when you’re too AGI-pilled

    1. LR

      And I love that Codex example 'cause comes back to this idea of ambition, and I wanna hear if there's anything there for you of just this thread of just be more ambitious because these models can do so much more than you even imagine. And sometimes it's too ambitious for the market, and they're not ready for it. But do you think about that at all, just, like, pushing your team to be more ambitious because it's so much easier to just do things that maybe felt crazy hard in the past?

    2. AA

      Yeah. This is a core challenge. Once, once there's a product that exists or a feature that exists, it's really easy for people to find paper cuts and pick them out.

    3. LR

      Like optimize.

    4. AA

      And they should. And people on Twitter like to remind us of this, and I thank them for that. Like, people should be focused on the features that exist and making them more reliable and better. But this, you know, this is why we also have a culture of bottoms-up exploration here because sometimes... In the same way the Codex app came and disrupted ChatGPT in some way, right? This thing will get disrupted by a future effort, and that's, that's part of the design is that, like, you can't always, as one team, be good at both the disruptive piece and the, like, maintaining a product and its quality piece.

    5. LR

      Mm.

    6. AA

      At some point, you gotta

  14. 39:17-46:52

    The latest frontier: loops and autonomous development

    1. AA

      design a process that allows for both.

    2. LR

      Kinda zooming out a little bit, if you think about the progression that we've been on of, uh, AI impacting how we build product, it's, like, insane how far we've come from, as you said, we used to write all our code by hand, like artisanally created human code, to AI writing 100% of our code, to you actually put it this way, that now, like, coding is steering the AI. And, like, when you think about what percentage of my code is written by AI, it's almost like how many times did I have to steer it in the right direction-

    3. AA

      Yes

    4. LR

      ... is the new version of coding. And now that there's, like, agents and loops and all these things, what's kind of the latest frontier from what you've seen of how people are building? Is, is it loops? Is there something else of just, like, the most AI-pilled, AI-forward teams, here's how they operate now that people may not be aware of?

    5. AA

      Yeah. I mean, loops are so last week, man. I mean, we, we talked about this. Um, you know, one of the big questions is always, "Well, how much of the product is AI written?" And it's always hard to answer that question 'cause if you're using the goalposts from last year, it's like, well, 100% of our product right now is AI-written code. So the question is more like, well, okay, fine. Is, is the code written supervised versus unsupervised, right? And that's, like, a totally different thing. I welcome the moving of goalposts because that means we're making product progress. There have been a lot of explorations here around, like, autonomous, autonomously developed software. Um, a lot of, like, harness engineering stuff, a lot of different explorations on, like, okay, well, what if you came in overnight and did garbage collection of the code base to clean it up, right? One thing that I think all models suffer with right now is just they, they usually increase complexity. If research is listening at any company, please make the models better at deleting code. Um, but, uh, you know, th- that becomes a problem right now when you try to put development completely on autopilot. Um, and it's both on the, the human side and the code base side. So, like, feature requests, right? You know, how do you teach a model which features to build, which ones to ignore, which ones to kind of, like, group together and reframe a little bit? How do you teach a model how to build the right abstractions, right? Like, all of this is getting better. Um, I don't think we're at a place yet where we're like, we're just gonna set up a loop that's like improve the app, you know, and listens to Twitter and listens to Slack and listens to email. And, like, we're not there yet. But we are, we are, we're trying to make it happen.

    6. LR

      Do you think we'll get there? Do you think we'll get to a place where it's just, like, grow, [chuckles] like, win?

    7. AA

      Slash goal, make money. Like-

    8. LR

      [chuckles]

    9. AA

      ... make me a billion dollars.

    10. LR

      Yeah. Win, win the market.

    11. AA

      I don't know, man. Like, I-

    12. LR

      Yeah

    13. AA

      ... I am not in the business of saying-

    14. LR

      No

    15. AA

      ... never or always or whatever.

    16. LR

      Yeah. How are you using AI in your work as product leader, engine leader?

    17. AA

      Yeah.

    18. LR

      What are some ways that you use it that maybe people may not be aware they can use the app for?

    19. AA

      Yeah. I, I think I have the best job in, in the world right now. Um, but one of the things that makes it very fun is that when we were developing the original Codex app, the goal for me personally was to make it the thing that I wrote the code with, right? I was like, "I need to make this so good at development that I can build the Codex app with this." And the Codex app at that time was a development tool, right? And we did that, like, super quick dogfooding loop, 'cause you've got your personal dogfooding loop where you're like, "Oh, like, I can't do this thing. I should fix that so that I can do the thing. Now I can do the thing. Now I can do more things," right? Um, you know, we released that, and then the next challenge was, hey, people are starting to do some different shaped things with this, right? And now I, you know, need to grow this, and so I need to hire a few people and help. So then, like, my role changed at the same time that the role of the app needed to change. So I'm like, "Okay, I need to do more product discovery here. I need to figure out the right loops for seeing what everybody's working on and steering things that are off track." And so all of a sudden, that's what I started using the Codex app for, right? I did still write code. Like, I've, I've tried to align my own usage of it with the problem that we're trying to solve, right? And now I'm like, "I need to build a spreadsheet that models this out. I need to kind of do a, you know, internal deep research on all of the efforts that have gone into this area of research for the next version of this." There was a release or a series of releases in May-ish that introduced the in-app browser, computer use, and artifact creation to the Codex app. That was, I think, our "Codex is for almost everyone" release. And everybody knows the term vibe coding. I think that was like our first vibe coordinated release, where like I had a, a, you know, Notion doc somewhere with everything that needed to happen, and I was like automating, like going out to gather updates from pull requests, from Slack channels, and like updating the status tracker. And like now this is pretty commonplace, um, but at the time I felt like I was at the bleeding edge of like how to manage a product release. In short, like the way that I use the Codex app is basically like what, what has my job grown into and how do I make this thing able to do everything I need to do? Um, I will get up in the morning, I will see the daily brief that I have from like everything from the 3,000 Slack channels that I'm in, like which things need my attention. I can kind of message back and be like, "All right. Give me five questions and I'll answer them," and I can, you know, do that.

    20. LR

      How do you set that up? What's like the workflow to, for somebody to set that up?

    21. AA

      Yeah.

    22. LR

      'Cause that sounds amazing.

    23. AA

      Again, I, I think we're still in the like discovery phase on, on a lot of this. And, and so right now it's like I'm making an automation that says, or a scheduled task that says like, "Go through my Slack channels. These are the things that I care about and think are most important." And you know, so I'm still kind of defining that. Like, these are things to watch out for, different categories, like here's some context, and you know, I'll, I'll ... You know, that'll get set up as a, a automated task. And then first few, few times it runs it might need some steering. And luckily with this app, you know, I don't have to find out how to edit the instructions. I can just be like, "Hey, next time this runs, like can you please worry about this instead?" Or, "Can you de-emphasize this work stream?" Or, "Hey, this thing happened and it didn't come up in a brief. Like, can you make sure that stuff in the shape..." So I can kind of coach it along the way. It'll update the way that it notifies me, stuff like that.

    24. LR

      Amazing.

    25. AA

      I think in the future though, like this is, this has been a core problem with the, the chatbot shape, right? Is that I know how to set this up. I have time to set this up because for me it's product discovery to set it up. But if you're not working at OpenAI, not developing this, like you don't wanna have to figure out all this stuff. Like we, we, we need to like figure out that, that shape of things.

    26. LR

      Yeah. What I'm hearing here is I don't think people realize that, uh, your app can act a lot like OpenClaw.

    27. AA

      Yeah.

    28. LR

      Which was, it was what people were so excited about. Like you just talk to it, set up this thing, check on this thing for me every day, and then tell me what's going on.

    29. AA

      Exactly. Yeah.

    30. LR

      Like it's starting to become a part of all these, all the products, uh, which is amazing. So the way somebody would set this up is they just talk within the app and say, "I wanna set up an automation to do this. Look at my Slack and these are the things I me- I wanna do."

  15. 46:52-49:10

    The power of computer use and browser automation

    1. AA

      the least we can do.

    2. LR

      Yeah. A good example of this, I built this little app that filters spammy email for Bandbox. So every em- email that comes in, and I built this in Codex, uh, every email that comes in, it looks at it and decides is this unsisl- unsolicited kind of cold email-y stuff that I don't wanna look at, and labels it and puts it somewhere else. And to set that up, one of the steps was you have to go into like the Google Cloud console and set up all these pubsubhubbub API things and triggers. I don't know if you've ever used that interface. It's like so annoying and slow.

    3. AA

      Right.

    4. LR

      So I was like, "Wait, what if I ask you to do it?" And it was like, "Okay, cool. Do this for me." And it just, uh, you described computer use. I've never actually seen this happen on my computer before.

    5. AA

      Yeah.

    6. LR

      It just takes over my computer and starts going there.

    7. AA

      It's like, "I don't care if you don't have a connector, man, I'll just start clicking."

    8. LR

      Yeah, and it figures it out. It's crazy-

    9. AA

      Yeah

    10. LR

      ... just to like watch it doing its thing.

    11. AA

      Designing the decision boundary between connectors, when to use the in-app browser versus your Chrome extension that's connected versus computer use-

    12. LR

      Yeah

    13. AA

      ... was interesting, and all done through just like feeling it out.

    14. LR

      I saw a great Twitter thread the other day where they describe all these three and what you use it for.

    15. AA

      Yeah.

    16. LR

      So this person described it really well.

    17. AA

      These personal workflows are really interesting because some of them really click. Like some of, you know, people are trying all sorts of stuff. Everybody's making these personal systems. You ask everybody here what they do, and everyone's could be different. And then like certain themes arise and we're like, "You know what? That should be a first, a first-class experience in the app. Like we should take this thing that everybody seems to be setting up and just make that work." And I think memory is sort of in the shape where we've had a lot of people, and a lot of people at other companies too, are like, "Well, I set up an Obsidian base or a Notion area and I tell it how to basically build my mind palace and how to put..." It's like, eh, I don't know if everyone... Like, you shouldn't have to do that. Like, there should be a memory feature that does that for you, right, that's pretty generic. So that's, you know, that's one. But then there are other things like they're your process of your job and they're something that you're like, "Yeah, you should set that up." But I think this is, we're constantly wading through like what's working for individual people, like what should, what should enter the product versus stay like, no, that's just how you do your job, right? What should become a primitive, what shouldn't? Um-

    18. LR

      Yeah.

    19. AA

      Yeah.

    20. LR

      This is the, the taste and judgment you spoke of earlier, you know, deciding these things.

  16. 49:10-52:05

    Will we run all our SaaS apps inside Codex?

    1. LR

      I wanna talk about this browser use piece a little bit 'cause I think people don't realize how powerful this is and what it could be used for. Reminds me, I don't know if you watched, uh, when Dan Shipper was on the podcast, he had this, uh, from Every. He had this prediction that we're gonna start using Codex to run our SaaS apps inside of-

    2. AA

      Yeah

    3. LR

      ... so instead of going in Chrome-

    4. AA

      I know, he Slacks me about this every day, ask- asking for stuff.

    5. LR

      [laughs] Do you feel like this is where things go, where we're just working within the Codex app using Notion and Linear and Salesforce inside with your agent kind of helping you along? Or do you think that's a kind of a different direction?

    6. AA

      It's been, yeah, it's been really interesting because obviously we've had a few attempts at browser-shaped activity, right? Operator and ChatGPT agent mode and Atlas, and now we have the in-app browser inside a desktop app. We also have the ab- the ability to install a Chrome extension where the app connects to Chrome. Like, we've had a lot of shapes of this, and I think we've, we've learned a lot of different things. There's, there's a lot at play. There's a lot of really boring things at play, like, you know, we originally launched the app. It's an Electron app. The things that you can do with in-app browsers in there, it's, it's like kinda janky, so we have the, the, the in-app browser was for development. It was for testing your front end on development, and we were like, "It's not really for anything else, guys." Like, it's, it's a developer tool, right? And then we, we switched over to our Owl, Owl stack, which had powered the Atlas browser. And so now, you know, it's multi-tab, and we've got enterprise security so that you can actually log into all your websites if you're, you know. So, uh, we've been iterating on this. I think the tough thing has always been what should the shape of this browser be? Like, is this something that is only for the agent, right? That like you've got Chrome, you open Chrome, you, you know, do your thing in Chrome. Y- if you ask the desktop app, it opens up this browser that it can control really quickly, doesn't have the latency of Playwright, whatever. But the, you know. Or are we trying to say, this app is for everything and like we want you to use this as a browser. And those have a lot of trade-offs. It's not super, you know, well-traveled path, right? Like, most browsers are browsers at the top level. They've got browser tabs. Um, this creates a lot of really boring but tedious problems, like keyboard shortcuts, right? Are we trying to do key mapping to VS Code or to Chrome or to our own thing or to Linear? Or like, you know, we wanna have some sort of muscle memory that carries over, but got all these things that have sha- like sub-shapes of different products out on the market. What do we do?

    7. LR

      And this just highlights how extra challenging this app is, where you have to allow it to work for somebody that's never built anything.

    8. AA

      Yeah.

    9. LR

      From like the more basic user to like power, like Peter.

    10. AA

      Yeah.

    11. LR

      OpenClaw trying to code with it.

    12. AA

      I, I'm not convinced I'm gonna get Peter to use the app. I think he might be the last-

    13. LR

      He's a terminal-

    14. AA

      ... terminal holdout.

    15. LR

      Okay.

    16. AA

      But I'm gonna,

  17. 52:05–57:20

    The future vision for Codex

    1. AA

      I'm gonna keep trying.

    2. LR

      Okay, let me, uh, let me zoom out for a moment and talk about kinda the big picture of where you're taking all this. What's the, what's kinda the vision for, for Codex? Where does this go? What's it gonna look like in, I don't know, a year or two, 10 years?

    3. AA

      We had Codex as a CLI, right? And then we decided to build this app. And, you know, we were, we were a little uncertain about the app and, um, but had a lot of conviction in, in what it could be as, to start, a developer tool, right? That, you know. And it wasn't gonna be an IDE, it was gonna be this right-sized surface where it was like sort of a chatbot, but it was more than that, and you could see the code, but we weren't gonna let you edit the code. There's a really interesting thing that happened at OpenAI in January and February. It was before we actually released the Codex app, which is that we'd started to dogfood the Codex app. And what we were finding, like we were converging on some pretty clear internal PMF on engineering, right? And research workflows. They were thrilled. They were loving it. We were like, "All right, we just gotta get the quality bar up before we release it to the world. We're convinced that this will be a thing." But then at the company, we spun up a few other workflows to say, "Hey, like this Codex effort is onto something with these coding agents." And we have people from marketing, from comms, from finance, from legal, from basically every discipline who are using this Codex app even though it is actively hostile to these people, right? It is like trying to show them code. It's trying to ask for approval to run rg on the... You know, it's like it's doing all of these things that are actively not the right product surface for them. So why don't we take our other surfaces and add Codex to them? Like let's add it to the ChatGPT desktop app. Let's add it to the Atlas browser, right? And let's essentially take the lessons of Codex and make it more general, for a general knowledge work tool, right? And those efforts went for a little bit, and the, the most annoying problem happened, which is nobody would leave the Codex app for the apps that were allegedly for these other personas. 
And I think the lesson in all of this was just that like the whole developer tool versus general knowledge work tool, like there's a lot of nuance here that isn't just one or the other. And I think we really, we believe really strongly in this, in that there are certainly, in the same way that we talk about the average of your role is like what your role is now, this is true on the product side too. People who are doing Excel work don't wanna see Git repository information. We know that. But we also know that we can tell a lot from what they're doing about what kind of work they do, and we can start simple, grow the product complex as we feel is needed, right? Doesn't mean we don't have modes, right? You might want some modes for organizing your stuff and to sort of be like legible about the, the ways that you enter the experience, right? But we really believe strongly that what we've built here is the right shape to take, take on like really deep, vertically focused things, right? We work deeply with our finance team, with our team working on science, team working on legal, right? And, and we say like- If we can build the right extensibility primitives in the right general model, then you can do anything with this, right? And then our challenge is, well, how do you, you know, how do you generalize it? But this is kind of going back to, like, the best desktop app that we can build. Like, what does that look like? Um, and so, you know, was Codex the developer tool, ChatGPT, like, where is this going? This is how we think about it.

    4. LR

      It is so interesting the point you made that the Codex app was so... Like, you did such a good job getting people to be aware it existed and so good to use and fun to use that everyone's starting to use that versus the ChatGPT app. So clearly the direction is combine them so that you're not creating this confusion, which I know is, you know, things people have been talking about, this idea of com- bringing them together.

    5. AA

      Somebody called it a super app, and I wish they hadn't said that because now I have to hear about the super app all day, every day. We'll get past it. Great.

    6. LR

      Yeah. Okay. But is that, is that kind of the... Not, let's not call it a super app, but the idea is, like, one place people go to do all the things. Is that the general idea? Or TBD?

    7. AA

      Yeah. I think what we see here is that it's a great home base. It's a great place to keep track of all of the things that you have to do across different surfaces. And some of those things you do all of it in the app. Some of those things, the app opens other apps to, to do, right? The app can connect to Excel so that, you know, yes, it has a spreadsheet editor inside the app. Is that good enough for people doing financial modeling at OpenAI for raising billions of dollars? Like, probably not. Um, and so the app talks directly to the add-in in Microsoft Excel on your desktop. When it's done, you can close Excel, right? And so it's not just about, hey, we're drawing a rectangle on the screen, and everything needs to happen in that rectangle. It's this thing should be a, a home for you where you start work, you end work, you automate work, and it uses whatever you need to do,

  18. 57:20–59:30

    The videographer who built a Premiere Pro extension with Codex

    1. AA

      right? There's a, there's a great story about we had some videos that we shot in this room for the original launch of the Codex app, and our in-house DX videographer, Brent, then was tasked with editing all these videos, right? And he edited all the videos with Codex, which was one of the early, like, whoa, what are people doing with this thing, right? And the process for why he decided to start using Codex was really interesting. He started just because he was curious if Codex could edit videos. And so Codex is not a video editor per se, right? Doesn't have any of that UI in it. But, um, it was able to understand that he used Premiere Pro. It could do some edits by editing the files that were backing what was on screen in Premiere Pro, but it couldn't do everything. So naturally, what Codex then did was build itself an extension that could be installed into Premiere Pro that it could then talk to and say, "Hey, Premiere Pro extension, can you please change this marker inside of the Premiere Pro app?" That was pretty nuts when we saw that happening. Um, it's a great model, right? There are these, like, specialty tools that specialize in things, and so we're trying to do two things at once with Codex and now with ChatGPT. One is, how can we seamlessly interact with these tools that you're already using and say, like, "We don't, we don't need to build a better video editor for you," right? But, like, Codex and ChatGPT can use that video editor, right? It can interact again, it can hand stuff off to it, right? So how can we do that? And that's often through, um, connectors or computer use or even extensions in this case, right? And then there is, you know, Dan Shipper's thing, which is, "Hey, I have these web apps that you can click around and use, but I wanna be able to open these in Codex and have Codex do extra stuff with it," right? 
And so those are kind of like two models that are almost inverse of each other that we're doing a lot with both at the same time.

    2. LR

      Mm. This Premiere story is interesting to me because it's another example of just be more ambitious with these AI tools.

    3. AA

      Yeah.

    4. LR

      Like, you may not, you may not know. Maybe they could do this thing. It's almost just like, go try. Go try.

  19. 59:30–1:01:50

    Failure corner

    1. LR

      See if it figures it out. I'm gonna take us to a recurring corner on the podcast that I call Fail Corner. And so the question for you is, people see people like you just, like, killing it, just growing. Everything's winning. Codex is doing so great. This crazy career. Everything's up and to the right. People may not see the times that things didn't work out and things that you launched that were failures. And so these stories are really important for people to hear, that it's not all just a win all the time. What's a story of a time you failed in your career that taught you something really important?

    2. AA

      It's funny to hear that description played back at me. I'm like, this is perhaps the first time I've, I've not felt like I was failing. I mean, I was a startup founder for a long time. I ended up selling the company for parts essentially, right? And it was just, it was years. It was a slog. It was heavily regulated spaces. The whole thing felt like a constant failure. Um, I went to this other startup, and we were trying to do some AI tools in, in this also, like, pretty locked down, regulated industry, and that felt like just time after time of trying things and it not working. So to me, it's been like, oh, I've failed actually quite a lot. And, you know, sometimes it's just a point in time where things line up, like skill set, passion, point in the market line up. We, with this, you know, with this project to bring what we've learned with the Codex app and marry it with ChatGPT, there have been I don't know how many micro failures on this where we're like, "This is the shape it should look like," and then throw that in Slack, and there's like a 2,000 message thread about how stupid we are. And it's like, that, this is the thing I love about OpenAI is people will just tell us that, right? There's, there's no, uh, there's no holding back on, like, when we fail with product things internally. It's why the external product has been pretty great is 'cause it goes through these cycles of like-

    3. LR

      2,000 "This sucks."

    4. AA

      I failed for like- I don't know, somewhere between 10 and 15 years before getting to this point. So I'm still surprised every day that things are going well.

    5. LR

      [laughs]

    6. AA

      Like, and I know it. I, I like, but-

    7. LR

      I think this is really important for people to hear, that you can have a lot of things not work out, and then things start to work out super well, and it's just keep going and keep learning, I imagine is, is a lesson. Well, with that, we're,

  20. 1:01:50–1:07:03

    Lightning round

    1. LR

      uh, we reached our very exciting lightning round. I've got five questions for you. Are you ready?

    2. AA

      Great.

    3. LR

      [laughs] Here we go. Uh, what are two or three books that you find yourself recommending most to other people?

    4. AA

      See, man, I'm, I'm a parent now. I'm an, a parent of young kids.

    5. LR

      Yes.

    6. AA

      So I, I, I'm like, I don't know. There's one called The Gruffalo.

    7. LR

      I-

    8. AA

      I read it to my kids

    9. LR

      ... I l- oh my God, our kid just got obsessed with The Gruffalo.

    10. AA

      Yes.

    11. LR

      Uh, we have, like, a, a bedtime chart.

    12. AA

      Yeah.

    13. LR

      And now it's like pajamas, brush teeth, books, Gruffalo, uh, on blankie.

    14. AA

      Yeah.

    15. LR

      Yeah. It's so good.

    16. AA

      So other books I'm reading right now are, like, that style.

    17. LR

      Okay.

    18. AA

      I'm like, a- actually, The Gruffalo-

    19. LR

      It's a strong start

    20. AA

      ... is not a terrible one.

    21. LR

      No.

    22. AA

      I feel like there's some lessons in that one.

    23. LR

      So sweet. Yeah.

    24. AA

      It's-

    25. LR

      Although every kids' book is about death, like someone's eating someone, someone's killing. There's always, like, bad, like murder-

    26. AA

      Yeah

    27. LR

      ... and, and destruction.

    28. AA

      'Cause like violence.

    29. LR

      Yeah.

    30. AA

      Even when it doesn't feel like violence to them, like the words, th- they're just like something-

  21. 1:07:03–1:09:55

    BTS: How our producer uses Codex for editing

    1. SP

      Brent in the Premiere.

    2. AA

      Yeah.

    3. SP

      That's cool. I've actually used Codex-

    4. AA

      Oh

    5. SP

      ... to edit edits as well.

    6. AA

      There we go.

    7. SP

      It's basically simple stuff like, "Oh, could you just cut this into the three breaks?" You know, like if there's a-

    8. AA

      Yeah

    9. SP

      ... pause in conversation. Like the second I- and Codex understands it.

    10. AA

      Basically every job we feel like starts with a story like this, which is the product's not designed for it, but it's sort of a blank chatbot that can write code. So it can do everything, but, like, what-

    11. SP

      Right

    12. AA

      ... what are the useful things for it to do?

    13. SP

      I mean, people just have to have curiosity-

    14. AA

      Yeah

    15. SP

      ... about the project. They, they have to, you know, have an in- an intentional outcome and use Codex as the platform for that-

    16. AA

      Yeah

    17. SP

      ... just to see what happens. You know, 'cause there's no risk at all.

    18. AA

      Right.

    19. SP

      You know?

    20. LR

      Just a few tokens.

    21. SP

      Few tokens. But of course, if you're working for-

    22. AA

      Mm-hmm

    23. SP

      ... OpenAI, there's less risks on that regard.

    24. AA

      True.

    25. LR

      Yeah.

    26. AA

      You asked, um, one of them like what... or a lot of them actually, like what skills are important.

    27. LR

      Yeah.

    28. AA

      And like then, then you've also had conversations about like the cracked new grad versus the-

    29. LR

      Right. Yeah

    30. AA

      ... like I don't know. If you're married to the exact process you have right now, like that, I don't know what advice to ever give, but if there's one piece it's like do not get married to your exact process. Get married to like the outcomes that you are uniquely able to deliver-

Episode duration: 1:09:56

Transcript of episode P3KDebPTUrw