The Twenty Minute VC

OpenAI's Codex Lead: Why Coding as We Know It is Over

Alexander Embiricos leads product development for Codex, OpenAI’s advanced coding agent, helping shape the future of AI-assisted software engineering. Before OpenAI, he co-founded and exited Multi, a collaboration tool, and has deep experience building developer products that accelerate workflows. At OpenAI he focuses on turning AI into a proactive “software engineering teammate” that can write, review, and ship code across the entire lifecycle.
-----------------------------------------------
Timestamps:
00:00 Intro
02:12 Will AI automate coding?
03:38 The "Compression of the Talent Stack": The future of engineers, designers & PMs
05:32 The bottleneck of AGI
09:43 Building for individuals vs top-down enterprise automation
10:28 The three phases of agent development
14:30 The importance of inference speed & OpenAI’s partnership strategies
17:04 The transition from "Pair Programming" to "Delegation" with GPT-5.2 Codex
19:17 Why the Codex app isn't a traditional IDE
20:03 The importance of plan reviews and automated code reviews
22:22 Building open standards and the "agents.md" convention
27:47 Winning strategies: Compute advantage, best models, and product execution
30:18 Measuring success: Moving from Weekly Active Users to Daily Active Users
31:40 Chat vs GUIs: The enduring UI of AI interaction
33:52 Designing interfaces for agent-to-agent interaction
35:24 The data moat: Coding data vs knowledge work task data
54:12 Advice for the Next Gen of Engineers
55:51 Lessons from competitors (Claude Code)
57:26 Lessons from Dropbox and Slack
01:00:39 Quick-Fire Round
-----------------------------------------------
Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on X: https://twitter.com/HarryStebbings
Follow Alexander Embiricos on X: https://twitter.com/embirico
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact
-----------------------------------------------
#20vc #harrystebbings #alexanderembiricos #productlead #codex #ai #openai #coding

Alexander Embiricos (guest), Harry Stebbings (host)
Feb 21, 2026 · 1h 3m

EVERY SPOKEN WORD

  1. 0:00-2:12

    Intro

    1. AE

      You still need software engineers today. You still need designers. I'm a PM. Do you need PMs? You know, you can have some fun jokes about that. I don't think you need them.

    2. HS

      Today, joining us in the hot seat, we have Alexander Embiricos, Product Lead for Codex at OpenAI. This is an incredible discussion. Time to get the notebook out.

    3. AE

      For me, the most exciting future with AI is one where everyone just feels like a superhuman, like empowered by AI. And for that, we need tools that everyone feels fluent with.

    4. HS

      Your job is the success of Codex.

    5. AE

      Actually, our job is the distribution of intelligence, and this is really unintuitive, but, like, we put all this effort into training these models, and then we serve these models to our competitors.

    6. HS

      This is so difficult for me as a venture capitalist to understand. Elon said that coding's one of the first professions to be largely automated. Do you agree?

    7. AE

      For sure, I would agree that coding is one of the first domains where LLMs are really good. But what does it mean for coding to be automated? It's, like, kind of a heavy statement, right? For example-

    8. HS

      Ready to go? [upbeat music] Alex, I'm so excited for this, dude. I told you, I've been at a PE conference, and all I could think was, "Thank God I've got Alex next," 'cause this is gonna be a great one. So thank you so much for joining me, man.

    9. AE

      So excited to be here. Thank you.

    10. HS

      Now, this is a weird first start, but roll with it. You'll, you'll understand my British intricacies. I'm fascinated by people's motivations. Are you motivated more by the fear of losing or, like, the thrill and excitement of winning?

    11. AE

      I, I'm a maximalist. I'm definitely much more motivated by the idea of winning than the fear of losing, but I'll admit, I'll admit to you something. One, I was running a startup before joining OpenAI, and one of my darkest moments, and there were many dark moments while I was running the startup, was recognizing that I had spent the past few months trying to avoid losing.

    12. HS

      [laughs]

    13. AE

      And all of a sudden, I was like, "Oh my God, that is why I'm so unhappy, and that's probably why the startup isn't going well." And so when we flipped, you know, I, basically every now and then, I have to re-catch myself and, like, flip back into this idea of winning. But really what motivates me even more than that is I think I just love building things and building things for people. And man, I am so excited for this year because many amazing things that don't exist yet are gonna be built and given to a lot

  2. 2:12-3:38

    Will AI automate coding?

    1. AE

      of people.

    2. HS

      I'm diving right in. Elon said that coding's one of the first professions to be largely automated. Do you agree, given your position [laughs] and what you see day to day?

    3. AE

      I think for sure I would agree that coding is one of the first domains where LLMs are really good. You know, what does it mean for coding to be automated? It's, like, kind of a heavy statement, right? Like for example, now that we no longer write Assembly, like when that change happened and we moved to higher level languages, did we say coding is automated? Not really, right? We were just able to write much more code, and then as a result, actually, there was much more demand for code, and there were many more software engineers required. But yeah, part of what they used to do is automated in the same way that, like, do you know the origin of the word computer?

    4. HS

      No.

    5. AE

      Um, I might pronounce the location wrong, but I think it was at Bletchley Park. There were all these machines for, like, decoding German Enigma, and, like, there were humans who would, like, punch out punch cards and, like, put them into the machine and do a bunch of, like, tabulated math. I'm probably butchering this, but basically, there was an, like, intensely manual part of work, and even, like, the first spreadsheet software was kind of loosely based off this idea that you would have an office full of desks arranged in a grid and people doing tabulations and then passing their sheets to the next person. And so all these things, like those specific tasks have become automated, but every time that's happened, there's been an explosion in demand for the output, and so, like, you need many more people actually to do that kind of work, even if the specific

  3. 3:38-5:32

    The "Compression of the Talent Stack": The future of engineers, designers & PMs

    1. AE

      task has changed.

    2. HS

      So you think we'll have more engineers in five years, not less?

    3. AE

      Yeah, and I, you know, sometimes we change what terms mean, right? Like the term computer now refers to something else, but the, now we have the term software engineer. And so I definitely think we'll have many more builders. You know, something interesting that I'm observing now is, like, there's this compression of the talent stack. Like, you know, you still need software engineers today. You still need designers. I'm a PM. Do you need PMs? You know, you can have a fun, fun, some fun jokes about that. I don't think you need them. Um, but maybe, you know, maybe when you say engineer, you might be thinking of someone who's, like, much more full stack, right, than, than has been true before. Like even if you go back a few years, it was much-- you had many more places where there was, like, the back-end engineer and the front-end engineer, right? Whereas, like, now, at least if I think about the Codex team, like, there's very few-- Like, that's much less the case and things are much more full stack, right? And so I think this compress- this talent stack will compress, but we'll still have people building.

    4. HS

      Why do you think we don't need PMs in this world? You, you, you dangled the carrot.

    5. AE

      Yeah, yeah. It's, it's my fun joke. I think-- Well, first of all, I think it's incredibly dehi- d- hard to define what a PM is, what a product manager is. I kind of think of the role as, like, actually explicitly undefined, and your goal is just to adapt to whatever the team or business needs. And, you know, often if you have a bunch of people, like, say here, like trying to build as quickly as possible, then what a, what a product manager can do is spend time, like, taking a few steps back and trying to look around corners and figure out what to do, you know, collaborate with the, the folks and go to market and maybe be the, the team's, like, greatest cheerleader and quality raiser. But, like, all of those things I just described, which are maybe my current role, could be done by a really strong eng lead or a designer who thinks a lot about product. And so I think it's, like, often useful to have product managers, but you probably don't want many of them until the team is really large.

    6. HS

      I was stalking the shit out of you for the last few days, which was a very fun expedition into your writing, into your tweets, into your

  4. 5:32-9:43

    The bottleneck of AGI

    1. HS

      prior interviews, and you said that human typing speed and validation work is the key bottleneck to AGI, not model, compute, or architecture. And it kind of left there, and I was like, "Help me understand why human typing speed and validation work is the key bottleneck and, and what you really meant by that."

    2. AE

      For sure. Okay, that's a, that's a fun one. I think there are multiple bottlenecks, but that's maybe the most sort of clickbaity one. So, uh, if you don't mind, I will do this slightly Socratically. Like how many times would you say you use AI today?

    3. HS

      30-plus times a day.

    4. AE

      Okay, cool. How many times do you think, a- assuming it was, like, zero energy expenditure from you, how many times do you think AI could help you per day?

    5. HS

      I mean, I-- in, in everything. I think we'll have-

    6. AE

      Yeah

    7. HS

      ... inference running twenty-four hours a day across every single thing.

    8. AE

      Exactly. And, like, I hear things now from engineers, like, at OpenAI and also outside who are-- who are telling me, like, "You know, I constantly have Codex running. I never close my laptop, and if it's not running while I'm in a meeting, I'm, like, wasting my time. I need to make sure Codex always has work for me that it's doing." And that's, like, super cool and super exciting, but that's a lot of work, right? To, like, manage this, manage these agents and make sure they're always working. Going back to the 30 times per day thing, yeah, like, when we look at how often, uh, you know, Codex users are using Codex, it's, like, kind of this, like, tens of times kind of range. I think AI should be helping us tens of thousands of times per day, you know, compute budget permitting. We'll, and we'll get there over time. Uh, but the problem is, like, at least if I think of myself, like, I work on this stuff. I know I should be using AI for everything, but I'm too lazy to, like, type out that many prompts, and I am too uncreative to figure out all the ways that AI can help me. And so I end up kind of at a similar number as you. Um, and I'm, you know, I still am at the point where when I use AI to do something cool like prep for this conversation with you, I'm, like, kind of proud of myself. I'm like, "Oh, cool. I managed to use AI in this new way." And, but that's fine for people like you and me who are, like, really interested in this topic, right? But I don't think most people, we should expect to, like, in order to benefit from, you know, AGI should need to, like, put so much effort into how to use this tool. It should just be effortless for them. And so I think the world we wanna get to is one where to use AI, you don't really need to, like, figure out the right way to prompt. It's just super easy for you. And you don't even need to recognize that AI could help you. It's just, like, knows you, connected to your context, and chimes in helpfully.

    9. HS

      That's where I think, like, Claude has done well in terms of the packaging they've done. Like Claude for legal, Claude for Excel, where you can implement it and have a DCF model. I'm not into models, but, like, better than one could do before. Do you think it is your job then to productize the prompts and the human actions to remove that bottleneck?

    10. AE

      Yeah, totally. So I, I think that it is our job to make sure that we have the models with the, with amazing capabilities, and then eventually to get to a world where this is, like, highly productized, and so you just have this, like, magic text box or audio input or whatever, or you can just add AI to your, like, group chat, and it just starts to help. But I think there's quite an interesting in-between stage, and I think that that is actually where the most value lies right now. So h- here's what I mean. You could try to productize, like, a specific feature of AI for a specific market, and I, you know, the-- many companies are doing this, but I think it's a little bit hard to know what exactly will work, what is the right form factor. You know, I, I, someone was on your podcast earlier, and I, they said something that I thought was quite interesting about how you, you cannot adopt AI at enterprise without FDEs.

    11. HS

      Yeah. It was Matt Fitzpatrick from, um, Invisible AI. Yeah.

    12. AE

      Yeah, so, so even though I am literally hiring FDEs, and if you're an FDE, please apply for a job with me, I actually disagree with that entirely. So what I think we need to do is build tools for people. Like, you can use FDEs, you know, uh, as, as Fitzpatrick said on the podcast, like, to automate workflows, right? But then you're limited by, like, what you from your top-down perspective can do and what you from your FDE staffing can, can staff to be built, right? But for me, the most exciting future with AI is one where everyone just feels like a superhuman, just, like, empowered by AI. And for that, we need tools that are for

  5. 9:43-10:28

    Building for individuals vs top-down enterprise automation

    1. AE

      people, for individual users, and that everyone feels fluent with. I think the f- the phase that's most interesting that we're at now is building for the kind of people who are interested in figuring out how to use AI. So what we need to ship, and I think this was, like, the genius of, like, when Claude Code first shipped, what they really got right, um, was they had this tool that was super easy to use in whatever context you want, just in your terminal, and people started experimenting with where to use it. And so I think as we think about AI being used outside of coding work, one of the most important things we can do is not overly build it like, okay, this is AI capabilities, but only specifically for finance, only for specifically for this workflow, but actually build a much more open-ended tool that someone can just use for any given task creatively. Yeah.

    2. HS

      But does that not put the onus

  6. 10:28-14:30

    The three phases of agent development

    1. HS

      or the effort back on the user, back to the point of your bottleneck of human action and lack of activity on them? If you don't define the task, you put the responsibility on them for the defining the task, which humans lack the ability or inclination to do.

    2. AE

      Yeah. I think that, so that's why I think it's the bottleneck. So basically, here are the three phases in my mind. First, let's have agents work really well for software engineering and coding because LLMs happen to be good at that. Next, let's realize that for an agent to be useful more generally, it using a computer is super valuable, and also we'll realize that all agents are actually coding agents 'cause coding is just the best way for an agent to use a computer. So let's take that same super flexible idea, but make it available to anyone who's excited to explore and tinker, and we're already seeing people start to do this with, like, the Codex app. Like, people-- like, Codex app is built for soft- for builders, but we're seeing builders use it for all sorts of non-coding tasks. Then finally, once we see what's working, let's build the, that, like, productization that you were talking about, where you have highly specific features that just work immediately out of the box for people. And I think we're gonna speedrun this entire, like, one, two, three journey, um, in the next months.

    3. HS

      My challenge with what you said about kind of FDEs and implementation within enterprise is data security, sensitivity, permissioning, access provisions is really freaking hard, and people are much less intelligent and, uh, confident than we give them credit for, I think, especially in large enterprise. Sorry. Um, and I, I think you actually need an FDE to go in and custom fit a lot of the different horizontal solutions to make it work. Am I wrong?

    4. AE

      I think you're right if you're trying to go, like, all the way from zero to one and you have this, like- And I, I said I don't mean grand negatively here, but if you have like a grand vision for some like ultimate workflow automation system, then yeah, you're gonna have to clear through all of these security hurdles, all these like compliance hurdles that are really real, right? Build connections to all these data systems and like systems of record and action. Um, yeah, so you're gonna need an FDE to, to do that. What I've seen is that when we do these things top-down, we're, we end up like massively under-leveraging the potential of AI in like helping that company. Whereas if you can maybe do that in parallel, right? But if you can just give AI to the people like actually doing the work, um, they can start to like get a mental model for h- how AI can help, and then they can start pulling AI into their workflows as, at the same time. Here's just like an analogy or, or something here is like, imagine if, um, you know, you work in like a customer support role, and AI's being brought into your role and starting to automate like meaningful chunks of your work, but you've never heard of ChatGPT, nor are you allowed to use it, right? So in this wor- in that scenario, you have like no intuition for what this thing is. Whereas in a world where actually you've been using ChatGPT for work at the same time as like parts of your work are getting automated by an LLM, you have much more intuition for how this works, and, you know, I would argue you feel much more empowered about this idea that it's being accelerated, and you have some degree of control to steer like where these automations are built as opposed to like it's like this complete like ex machina kind of thing, um, that is quite disempowering. So bringing this back, like I think there is a way to do this because the data control issues you mentioned are real, right?
But at the end of the day, every tool, every feature, every workflow is for a human who is somewhere, right? An employee somewhere, and that employee's accessing that tooling via their browser or via their file system, like at the end of the day, right? And so at the end of the day, everything comes to an interface that an agent running locally on your computer can work with, and, you know, I think it's quite unusual, like at OpenAI, we're building a browser, Atlas, right? And you might wonder why, um, and there are many reasons why, but I think one of the key reasons is that by building a browser, we can build sort of s-- and by controlling it like tightly end to end, we can build like safe agentic browsing for enterprise that is a way to access things agentically that is, that are otherwise not yet built out by FDEs.

    5. HS

      There are so many questions

  7. 14:30-17:04

    The importance of inference speed & OpenAI’s partnership strategies

    1. HS

      that I have to ask you. I, I wanna go back before I lose thread. You mentioned about engineers like not closing their laptops 'cause they don't actually wanna lose productivity and time w- with building with Codex. You partnered with Cerebras, and Cerebras is the fastest provider, obviously, of inference out there. Amazing win, I think, for both, bluntly. How important is speed for developers when using Codex and in the future of AI code?

    2. AE

      It's, I mean, the s- simple answer is it's, it's super important. Um, we-

    3. HS

      And so is it like an inference monopoly, like you have it now and competitors don't?

    4. AE

      This is just my opinion, but I don't think we're gonna end up in like this kind of monopolistic world. I think there is so much competitive pressure that there'll be like multiple answers to this. But I will say that we have like news coming about, coming out about our, that partnership soon, and I'm very excited for these kinds of things to ship. It's g- it's gonna be awesome. But even so, like, you know, with, uh, GPT-5.3 Codex, that model is like significantly, uh, more efficient than prior models, and so we've, in the feedback we've heard is that people actually feel like now this is like a very competitively fast model than before. So there's a lot of things you can do just in terms of the model. There are also things you can do with like improving, uh, how you do inference. So we recently rolled out a change where in the API, like those models are served like forty percent faster, and in Codex they're served like a quarter fas- twenty-five percent faster. So I think like speed matters a lot, and we're kind of approaching it from all angles, like both the hardware, how you do inference, and the model level.

    5. HS

      Y- you mentioned earlier about kind of putting it in the hands of users, and we talked about inference there. One of my dear friends is Jason Lemkin from SaaStr, and he says that actually inference is the new sales and marketing. Instead of sales and marketing teams, you're paying for inference, so users can onboard quickly, easily, see value, and you will actually see the removal of sales and marketing teams. It's kind of like next gen of PLG.

    6. AE

      Hmm. Um, I don't know. I think I struggle with that. I think, I think like, you know, fundamentally in this new world where anyone can build and it is increasingly easy to build things, like what b- what is hard, right? I think having a good relationship with a customer, knowing what they need is as hard as ever, maybe even harder, as it's just like there's just more stuff in the market to choose from. You know, the other things that are hard are like building the right thing, having a really high-quality thing. But going back to the sales and marketing thing, like I don't think that goes away because I think that's, as like I said, I think that's just gotten harder as the, as the markets, any given market gets more competitive with more software out there.

    7. HS

      Can I ask, how much

  8. 17:04-19:17

    The transition from "Pair Programming" to "Delegation" with GPT-5.2 Codex

    1. HS

      of internal code for you today is produced by Codex? I remember like Claude Cowork, Boris said was like a hundred percent or nearly a hundred percent. How much is internal Codex used?

    2. AE

      So spe- I, I'll speak for myself and then for the team. I would say like most people that I know are basically not opening editors anymore.

    3. HS

      Hmm.

    4. AE

      Um, and this was a step function change that happened in-- it, it's been happening gradually, but I'd say the key external market touchpoint for this was like GPT-5.2 Codex, where all of a sudden the model was like way better at running for longer, at handling tasks end to end, you know, managing its, its context, um, and following instructions. And so we kind of saw this inflection point, and that's actually why, part of why we built the app. Um, so, you know, broadly, w- I think what before GPT-5.2 Codex, the, the kinds of AI features we were using to write code were like tab completion or maybe you were pair programming with the model. And in my mind, you know, you still needed to be at your laptop with your hands on the keyboard-ish, and like you might go off and do a little bit of work, but, you know, you, you kind of still need to be there and like drive. It's just like handling these small things for you. And then at the time of GPT-5.2 Codex in December, we kind of switched to like actually, I'm just gonna fully delegate this task. It's like, you know, I'm gonna have a-- do a plan with it, make sure we like the spec that it's gonna do, and then I'm just gonna let go, let it cook. And this is quite a different way of working. So it's like it's changing like literally as we speak. And so part of why we, we built this Codex app that we re-released last week is because we wanted to build like a form factor or user experience where it felt like very ergonomic to be delegating instead of pairing with an agent. Uh, and so like delegating to multiple agents at once. And so even at OpenAI, this is changing massively. Um, I don't have a percentage stat for you, but I would say like the vast majority of code is written by AI. And I would say that now probably like most people are not even like opening IDEs. May-maybe if they are opening IDEs to like maybe you wanna own the interface, right?
So you'll like help flesh out like the interface between like two modules and then like AI fills it out. Or maybe you wanna like collaborate on a plan, but then have AI fill it out. The code itself is not being written by humans anymore.

    5. HS

      Will we have IDEs as

  9. 19:17-20:03

    Why the Codex app isn't a traditional IDE

    1. HS

      a part of the stack in twenty-four months' time?

    2. AE

      Okay, so the, the formal definition, right, integrated development environment. I mean, that, that phrase is so squishy that like literally anything could be an IDE, right? So I don't think that's very useful. If that's the answer, then yes, you could even argue the Codex app is an IDE. I don't, I don't think it is. Like for me, I think of an IDE as like a, a really powerful editor. We explicitly didn't build editing into the Codex app because we wanted it to be really clear how you're meant to use it. So, you know, it has a lot of affordances for m-managing multiple agents, for delegating, um, for re- for reviewing changes. It has really prominent skills, which are an open standard that are really useful for doing non-coding work, stuff like, you know, triaging tasks, um, or monitoring deploys or something, but it doesn't have text editing.

    3. HS

      If we assume a large percentage

  10. 20:03-22:22

    The importance of plan reviews and automated code reviews

    1. HS

      is done by Codex in terms of the code produced, how do you do coding reviews, and is AI responsible for internal coding reviews?

    2. AE

      So the- there are a few things here. Um, first off, the spec for what you wanna do or the plan becomes more important than ever, right? So like think like architecturally, like how should this code work?

    3. HS

      Yeah.

    4. AE

      Um, so you know, we recently shipped like a very prominent plan mode that works a little differently than others, where you have the agent go off and like propose how it's gonna do something. It's like quite a long plan, and then it asks you questions about if you agree on how it wants to do it or, or if you wanna have input. And this is very similar to like if you had a new hire who was new to your code base and, um, you know, they had to present a sort of req- a request for comments to the rest of the team before they started doing the work. So even though that's not formally code review, I would say review of the plan is actually something that's becoming more important because we're entering more of this like delegation phase of working with agents. So that's an underrated thing. Um, then, okay, there's actual code review. I think a problem that I hear a lot of people talking about, especially in the open source world, is like a lot of AI slop. Like people will just be submitting PRs to these open source repos, and they're trash. And like maybe the user hasn't even-- the person submitting the PR hasn't even tested them or definitely hasn't reviewed the code. I think this is a problem, and so a common practice with Codex is to have Codex like review its own PR, uh, or its own change, and Codex is actually incredibly good at this. We've explicitly trained the model to be good at code review. Um, and you know, that included things like making sure it's like really good at, uh, creating like high signal feedback. So it'll like basically have few false positives of criticism, which means you can really trust when it has feedback. And so we not only do we encourage people like on the team and elsewhere like to like just ask Codex to review, you can then also set it up to just like automatically review. So like nearly all code at OpenAI is reviewed by Codex automatically whenever you push it to a Git repo. 
Actually, like one, one fun thing, uh, for people who haven't tried Codex yet or didn't try it recently is, uh, sometimes the way that people like see how good our models are is by asking Codex to review a different model's code. And, and basically they're like, "Oh, shoot, I should probably just be using Codex to write my code in general."

    5. HS

      You said something really interesting there. You said for those

  11. 22:22-27:47

    Building open standards and the "agents.md" convention

    1. HS

      that maybe haven't tried it yet or, uh, you know, are coming back to it, how do you think about retention with this category? I remember Tom Blomfield, who's a YC partner, tweeted months and months ago, but it stuck with me, weird brain, um, about the ease of transition between different providers, whether it was Cursor or Claude Code or Codex. I can't remember which one it was, to be honest. But how sticky are users, and how do you think about retention?

    2. AE

      We've taken this like kind of counterintuitive approach with Codex to just build it super openly. So like the Codex core harness is open source, and we're always trying to make it easier for people to switch. So for instance, um, when we first launched Codex last year, uh, we created like-- I mean, it's-- created is even a heavy word. It was just-- we just established convention, which is called agents.md. This is basically a file that you can put instructions for the agent in. And instead-- we didn't call it Codex.md. We just wanted it to be something that all agents can use. And pretty much every agent except Claude uses agents.md, which is awesome. And then just last week actually, uh, we helped push for putting skills, which are our standard for like giving the agent instructions and scripts. We pushed for those to be stored in a sort of a neutral named folder called Agents, um, instead of in like Codex or something. And again, everyone has jumped on it except the usual suspect. Uh, so I think it's really great for the developers to have a lot of choice. Um, and we're trying to make it even easier for people to try different things. Now, that said, I think these coding tasks, right, where you're asking an agent to write some code, they're quite hermetic. And what I mean by this is you can k- it's like or ep- maybe an analogy in TV would be like episodic, right? Like you can come in, and you've got this like open-ended like agents file that like any agent can read from. You've got these skills that any agent can use. Um, and you can ask the agent to write some code, and it produces a patch, and that patch goes into Git. So kinda like both ends of this are pretty neutral, vendor neutral, so very easy to move between for now. As agents start to do work that is not writing code, but more general work, again, for software engineers or beyond, for any builder, they're gonna need to start interfacing with other systems, right?
So as they start, maybe your agent is talking to Sentry, right, or it's talking to your Google Docs or something. Then I think these agents become much stickier, because actually deciding to, to connect an agent to that system is a sticky decision. And if you're an enterprise, really trusting that the agent is gonna have access to these tools, but there are really good secure guardrails and sandbox and, like, controls over how the agent works with these systems, I think is critically important, and that's not something that you're gonna want to, to do multiple times. And so, you know, we've been kind of building Codex knowing that this is coming, um, and so we have, like, the most conservative sandboxing approach. Sandboxing is kinda like a set of controls, OS-level controls over what the agent can do. Um, and we have-

    3. HS

      But I'm, I'm, I'm, I'm a fan of Seven Powers, this brilliant book which talks about-

    4. AE

      Mm

    5. HS

      ... kind of seven ways that businesses accrue value and sustainability, and, like, you know, your stickiness or your retention is one. If we're on the same team with Codex, how do we create retentive patterns, behaviors, programs to ensure that people stay with Codex and they don't flip to Cursor when there's a better model or Claude Code when there's a better model?

    6. AE

      Yeah, I mean, it's interesting because I think on the one hand, like, we, we think about this, obviously we're running a business, but, you know, our, our mission here is to, like, ensure that, like, we safely deliver the benefits of AGI to all humanity. And so something that's, like, unintuitive to people about, like, the Codex team-

    7. HS

      Alex, you actually-- I'm, I'm... I know, uh, but your job is the success of Codex. I get that as well.

    8. AE

      Actually, our job is the distribution of intelligence, right? Um, and so we're obviously building out Codex, and this is really unintuitive to a lot of listeners, but, like, we put all this effort into training these models, and then we serve these models to our competitors, right? And from our perspective-

    9. HS

      This is so difficult for me as a venture capitalist to understand. You are aware of this. [laughs]

    10. AE

      Yeah. I'm totally aware of it. It's like we're... OpenAI is, like, a really interesting and unusual place to work. But basically, because we're playing such a long game for us, like, if the competition gets better, we learn. It's actually helpful for us. And so we're pushing really hard at growing Codex, and by the way-

    11. HS

      Do, do you-

    12. AE

      ... Codex is growing massively

    13. HS

      ... do you learn? 'Cause if, if, if they're closed-

    14. AE

      Yeah

    15. HS

      ... and they improve, you don't learn.

    16. AE

      I don't think so. Like, for example, there are a bunch of recent launches. Like, even today, I literally just, like, quote tweeted a thing this morning about a launch from Warp. No particular affiliation, right? And there are a bunch of cool ideas in there about how they, like, framed up the way that their agent can work in the cloud at the same time as working locally. For me, that's, like, inspiring, and I think I see all these things from various companies, and, like, one of the coolest things about the space is it's like we're all kind of inevitably reaching the same conclusions together and then building things out. And so, you know, on the Codex team, I think we have some massive advantages, right? We have the massive distribution advantage with ChatGPT. We have the massive, like, capability advantage of training our own models to be good in our harness and building our harness to be good at the new models, and, like, no one else has early access to those. And so I think we're, we're playing to win, and we have a, a really big advantage or a number of advantages, but we're also playing this long game where, you know, again, we serve our models to everyone, uh, where we push for open standards so that everyone can use, like, all the things that we're pushing for as well.

    17. HS

      Can I ask you, what would be the

  12. 27:47 - 30:18

    Winning strategies: Compute advantage, best models, and product execution

    1. HS

      defining factor of winning? And I, I know I'm using venture language, and you-

    2. AE

      Yeah, sure

    3. HS

      ... you're m-brilliant and kind of much more free and open. Uh, but what is, like, the defining factor of winning? Again, if I push you, is it, like, GTM, which is, like, the biggest enterprises in the world do wanna work with OpenAI? I have many friends in your sales team. The inbound that you get from the largest brands is incredible. So GTM, because of the incredible brand, product execution and just Codex being a fricking awesome product, or compute inference speed, actual, like, compute advantage. Which one is the defining winner?

    4. AE

      Okay, so I think if we're gonna talk about it more from an OpenAI perspective, obviously this is way above my pay grade, but I would say it's compute advantage and having the best models, right? And in order to achieve that, we then need to build businesses that generate revenue, and also that something we've... that's really interesting we noticed with having the Codex team, which is, like, a, a sort of combined team of research and product, is also by building these, these successful products, we create a lot of pressure to improve the model in sort of a faster way. So that's maybe the company perspective, right? If we come to the product perspective, I think the single most important thing we can do is build a, a really good product that people want to use. And like I was saying earlier, I think we really wanna build products for individuals and then allow the, like, people to become fluent in those products and then, like, pull in automation, and I, I think that may be counterintuitive but will result in way more impact than anyone purely approaching it from, like, the enterprise workflow perspective. Um, so, you know, I think that's mostly a question of product execution, and then that works for, say, like, prosumer. When it comes to enterprise, the go-to-market side is really important. Like, something that I've learned the hard way is if we go to an enterprise and we're just like, "Hey, we're here," like, "Feel free to use the stuff," that doesn't work. There's actually quite a lot of education that needs to be done, and there's a lot of, like, configuration that we need to support and sort of, like, education of the broader team. So, like, that motion looks much more like coming in, pitching, meeting the head of developer experience or whatever, understanding how they want their team to operate, and then giving them tools to, like, propagate that mechanism of operating to the rest of the team.

    5. HS

      Y- you said the word revenue there, which is one metric to measure a business against. When you think about, like, your metric of success, which you sit down with Sam or Brad or whoever it is and say, "Hey, this is what we're optimizing for," what is the metric that you use

  13. 30:18 - 31:40

    Measuring success: Moving from Weekly Active Users to Daily Active Users

    1. HS

      as the defining north star for your progression? [AE:] It's actually not revenue. The primary is active users. [HS:] Mm.

    2. AE

      Um, which, you know-

    3. HS

      How, how do you measure active users? Like daily active users?

    4. AE

      Okay. Yeah, we, so we measure weekly active users, and it's, um, it's just like, uh, you know, did this person like actually do a turn in our product? You know, did they send a prompt?

    5. HS

      Is weekly active a frequent enough metric, do you think? Sounds nice, but if this is actually replacing the IDE, is daily active not better?

    6. AE

      I think daily active will be better soon. Yeah. We just happen to use weekly active. It's like a standard here, and I think as we were getting started it made sense, but I, I ag- I actually agree with the f- the criticism there. It's like we should probably just be at daily. Like, I think we, we need to be getting to a world where for any given task that you have, your first instinct is to ask an agent to help, right? It's kinda like, you know how like with Google Search, it's just like, okay, anything I need to do, I just like go into this text box and I can get navigated to the right location. Then you had ChatGPT. It's like for any information I need, I can go into this text box, type it out, and get information that helps me. And I think the next phase that we'll see this year is like for any task I need to do, as opposed to just get information, I go to this text box or this input and something happens that helps me, even if it's not the full task, even if it's only a small part of it.

    7. HS

      You said about chat and the interface there.

  14. 31:40 - 33:52

    Chat vs GUIs: The enduring UI of AI interaction

    1. HS

      I'm, I'm really fascinated by this because it, it is a seemingly incredibly efficient input function for, for busy humans. But I, I spoke to Anish Acharya, who's a GP at Andreessen, and it came out the other day, and he's like, "No, no, no, this was created by Sam and Elon, and it works for very efficient people, but most of the planet want browser-based discovery interactions UIs." Do you think that chat will be the enduring UI in the next wave of AI interaction with humanity?

    2. AE

      The simple answer is yes, but actually I think there's two components here. Like if we, if we just imagine the future, like just like let's think of some sci-fi movie, right? Like what does AI look like? I, I, I, I believe that sci-fi is a really good predictor of what the future should look like, and usually it's pretty simple 'cause it's a story, and I think simple is usually right. It's gonna be some just like entity that I can talk to however I want about whatever I want, right? I shouldn't have to navigate to a place where I work with like my coding AI and then I have this like different place for my like sales AI and I have to like be like, "Hey, I'm now talking to sales thing," and like do that. It's just like I'm just gonna talk to a thing and it's just gonna help. So I think what we're gonna have is that we'll have chat or voice, basically conversational interface will be sort of the, the pillar of everything that you can talk to about anything, um, and that you can add into any group chat or whatever so it can like discover how to help you. But then if you're like a power user and you're very good at a specific thing, you probably don't wanna be disintermediated by having to talk to another person. It'd be like if you had an executive assistant, but you can only work by talking to them. That's like super annoying, right? So at some point you wanna, you wanna get to the show notes and like look at them yourself and like edit them yourself, right? You wanna edit the thing yourself. So I think we'll pair chat with like functional, like graphical interfaces that are bespoke to like what someone needs. So like in my case, I will probably chat to like do my, you know, podcast prep, but when it comes to like actually looking at product and code, I probably want like the Codex app that I can go into and get deep in. Whereas maybe if we're talking to a marketer, maybe that marketer will like chat to ask questions about the product. They're not gonna download the Codex app just to ask questions about the product. But maybe they'll have like a super custom GUI for like ad analytics or something that they go into.

    3. HS

      Totally get that, and it,

  15. 33:52 - 35:24

    Designing interfaces for agent-to-agent interaction

    1. HS

      it kind of wrongly assumes on my behalf a consumer interaction at some point in that journey, and I wanna ask you, how do you think about like agent-to-agent experiences and designing experiences for agents? Like we spoke about, for example, going into large enterprises and how you can be helpful. I'm just using the most boring thing ever, expense approval. You could have agent submission of expenses on my behalf for my trip to San Francisco, and then the agent on the flip side doing approvals for that from OpenAI's compliance department. Agent to agent. How do you think about that and that paradigm shift?

    2. AE

      My like quickest answer to this is that we've noticed as we build Codex that the best interfaces for Codex to do work also tend to be the best interfaces for humans. So like when people ask like, "Oh, like how can I make my code base like more efficient for the agent to work with?" The answer is often like, "Well, have you looked at it yourself, and is it, is it easy for a human to work with?" So like a very specific example would be like running tests in a code base. Naively, if you just like set up most test runners, they just like emit all the outputs of all the tests. And so like as a human it's really annoying 'cause you have to go in and like find the one that failed, and it's like you've gotta read hundreds of thousands of lines. Turns out that's terrible for AI as well, but if you filter it down to only emit the failed tests, better for humans, also better for agents. So probably the agent-to-agent interaction points will be very similar to like if there was a human in the loop, and that's nice 'cause it means you can kind of atomically replace individual systems.
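
      The test-output point above can be sketched in Python. This is a minimal illustration under stated assumptions, not anything described in the conversation: `CaseResult` and `summarize_failures` are hypothetical names, and the input is assumed to be per-test results that some runner has already collected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """Hypothetical record of one test's outcome (not from any real runner)."""
    name: str
    passed: bool
    output: str


def summarize_failures(results: List[CaseResult]) -> str:
    """Return a short report that keeps only the failed tests' output.

    Passing tests are reduced to a single count so that a human or an
    agent does not have to scan their logs to find the failure.
    """
    failed = [r for r in results if not r.passed]
    lines = [f"{len(results) - len(failed)} passed, {len(failed)} failed"]
    for r in failed:
        lines.append(f"FAILED {r.name}")
        # Indent the captured output under the failing test's name.
        lines.extend(f"    {line}" for line in r.output.splitlines())
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        CaseResult("test_login", True, "setup\nlots of log lines\nteardown"),
        CaseResult("test_checkout", False, "AssertionError: 1 != 2"),
    ]
    print(summarize_failures(demo))
```

      The design choice is simply to collapse everything that passed into one summary line, so the report length scales with the number of failures rather than the size of the suite.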

    3. HS

      I mentioned our show on LinkedIn and,

  16. 35:24 - 54:12

    The data moat: Coding data vs knowledge work task data

    1. HS

      uh, a wonderful investor from a, a different company. It's like Harry Potter. You know, Voldemort? And it's like, you know, he who shall not be named. Um, I don't want Sam to kill me. But from another company-

    2. AE

      Okay. Okay

    3. HS

      ... was like, "You gotta-- You ask him, ask him how do you think about a coding data moat, and does Anthropic have all the data now?"

    4. AE

      I think that from what we've seen, and you know, and I'll, I would defer to my research team on this, but I feel like we, we feel like we have plenty enough data to build really good coding models. I actually think the, the place that's more interesting for getting data now is like as we get into like knowledge work tasks, that's kind of data that's like not really like available most places on the internet. And so you start to have like really interesting brainstorms for like how to help a model be good at it. Like maybe you have to like pay people, um, to like simulate doing tasks so that you can like learn these trajectories for the model. Maybe you should acquire startups, you know, that are no longer in business or that, and uh, but have, have a lot of like data, like say their Slack or something. Um, yeah, I think that, that kind of knowledge work task distribution is like much harder than coding.

    5. HS

      That's so interesting you said that about kind of the, the data that doesn't exist, so to speak. How do you think about your interactions with the data providers, your Mercors, your Turings, your Invisibles, your da, da, da, da, da, da of the world? Like, will you spend 10X there, or will you go, "We are spending too much on data. We should do it ourselves and do data acquisition"?

    6. AE

      Yeah, I mean, I think the way that we think about these things is just, like, how do we move as quickly as possible? And so, you know, getting-- becoming able to set these things up in-house is, like, very expensive in time, and we're a small team. So what I have observed so far is that if we need to run a data campaign at scale, we're usually gonna enlist help from one of these companies.

    7. HS

      On the consumer side for Codex, we've spoken about, like, enterprises and going into them, how to engage in terms of developer experience, developer relations. Do you compete with a Lovable and a Replit on a, like, low-end consumer basis in a year or two's time? Is that a business where you're like, "You know what? Codex is not for every person to create an About Me, or a small business to create their own site"? How do you think about consumer in that way?

    8. AE

      Yeah, I would say that right now it doesn't feel like we're competing super directly. Um, but, you know, I don't know if you saw our, our Super Bowl ad, uh, the tagline of which is just, "You can just build things." Um, with the app, we noticed that, like, many, many people who are less technical are starting to build things, and so the kinds of things they're building are much more hello world-y. And so I think that we will see some overlap in use cases, um, where you have, you know, people just pulling up Codex because they have it as part of their ChatGPT. Actually, like, a big announcement, um, last week was that we're now offering some Codex to people even on free ChatGPT plans or on the Go ChatGPT plan. So this is, this is massive just in terms of, like, bringing availability to everyone. Um, and so I think we're definitely gonna see people with, like, a free ChatGPT plan coming in and just, like, building simple things where they otherwise might have gone to a specialized tool.

    9. HS

      What would you most like to do differently, but for whatever reason you can't?

    10. AE

      This is an interesting one. I feel like it's been a very good few weeks for us. [chuckles] Um, so we're very-- I'm pretty jazzed about everything that's happening. But maybe the feeling that I have the most-

    11. HS

      That's really-

    12. AE

      Yeah.

    13. HS

      That's really interesting. You said it's been a very good few weeks for us, and I feel that. Does the team feel changing winds of momentum both in positive and negative cycles?

    14. AE

      Absolutely. We, we are very attuned to it, right? Like, if you look at the, the history of Codex, the first thing we launched last year was, like, this amazing idea that people were super excited about. It's like, "Hey, we're gonna give the agent its own computer in the cloud. You can have as many of them as you want work for you in parallel on tasks." Super great idea. To be honest, it didn't work as well as what we shipped later. It was not the best. Um, and then since August with GPT-5, we started pushing really hard on interactive coding, which is where most of the competition in the market is. And, you know, we went on an absolute tear. I feel like the public metric we have is, like, since August, we grew by, like, 20X, and then, like, even, like, late in the year, we, like, doubled from December to now. I forget the exact number there. But, like, you know, that was competing neck and neck. But the, the shift that we feel last week is, you know, we ha- we felt like we had the most intelligent model. That was cemented with 5.3 Codex. We had feedback around our model being slower and, like, maybe less fun to work with and, like, being less good at communicating with you while it was working. We addressed that feedback. Um, and that's true even compared to, like, the, the other competitor model that launched, like, 20 minutes before us and was like, "Maybe this is spicy." It was like SOTA for 20 minutes. SOTA means state-of-the-art. Um, and then, you know, we'd, we'd always been getting a lot of feedback on, like, the quality of the user experience in Codex. Our, our most popular surface was the IDE extension, and our CLI, which is a command line interface, was less polished. But with the app, the feedback has been, like, resounding from the market, uh, that this is, like, a really high-quality experience that's, like, simple, like, unintuitively simple, and people are just loving using. Even our biggest critics are converted. So yeah. And then we, and then we had the Super Bowl ad, and then we went to free. And so going back to your question of, like, what do I most wanna do differently, the first is I actually wanna get back to cloud. Um, when we pivoted our strategy from, like, building the cloud-- like focusing on the cloud agent last year to working interactively, the thinking was very simple. It was just... And it's kind of like what I was telling you about FDEs, actually. If you go too far ahead to workflow automation before your end user is fluent with the tooling and can get it to work simply, then there's, like, this disconnect, and you just have this pipe dream idea that's not, like, effective, uh, except for the most power users. But once you have this base where people are using your tool every day, like you said, and they're configuring it, and every time they use it, it gets better, then, like, the step up to, like, letting it run independently in the cloud is a much smaller step up, right? Um, so I think it's time for us to, like, get back to, like, building out the cloud product and making it super tightly integrated with the local product. It already is somewhat integrated. And the other thing I wanna do differently is, um, start thinking more about the bottlenecks. Like codegen, writing code has become, like, you know, basically trivial now. Um, but the hard part is, like, what you were talking about with, like, code review, right? Like, how do we know the code quality is good? How do we know we're doing the right things? And those bottlenecks, I think, are underappreciated still and underinvested in. So, like, I think we wanna get to a world where you can have an agent that is unbottlenecked, right, that you trust to, like, own an entire microsystem or internal tool or whatever, and can do the full iterative loop, including feedback from users, without having to go through human review. And that is a really hard problem to solve, both from an intelligence perspective, but also from, like, a safety perspective and a controls perspective.

    15. HS

      How much weight should we place on benchmarks and evals?

    16. AE

      I think probably, uh, this is an annoying answer for you. It's, like, some.

    17. HS

      [chuckles]

    18. AE

      Right? Like, they do tell you-- They kind of-- I-- In my mind, they give you a good measure of intelligence. Right? Um, and so you can put weight on those for intelligence, um, and especially before evals are saturated, I think you-- when you see meaningful progress in those benchmarks, it's like very, um, very helpful. Um, and then I think you have to pair that though with like what it feels like to use the model, and that's, that's a vibes thing. Like, whenever I talk to any, like even internally or even talking to like customers of our models, I'm always surprised by how vibes-based the, uh, evaluation of how it feels to work with the model is.

    19. HS

      [laughs] How vibes-based life is. People wanna work with people they like-

    20. AE

      Yeah

    21. HS

      ... is the lesson that I give to kids.

    22. AE

      Exactly. People wanna work with models they like.

    23. HS

      Like relation-- in terms of like market composition, as an investor, I have to think through how do I think about the eventual state of this given market, kind of a, a terminal state. How do you think about that? Is it like Uber and Lyft, and like the majority of the market will be on Codex or Claude Code? Or is it like a AWS, Azure, Google Cloud, and a thirty-three, thirty-three, thirty-three?

    24. AE

      I think this might end up with fewer providers that are capturing a lot of value in the long run, and here's why. Like, and maybe this is a bit spicy, but I think that we are kind of in this temporary phase where we have agents that are really good at coding, right? And, and if you look back last year, like maybe more people thought we would have agents that are good at other domains too, but that didn't happen last year. So we're-- so we only have PMF for coding agents, like in the industry overall, I would say, right? And then there's some like very narrow other use cases like customer support, et cetera. Um, but I think that's probably temporary, and then over time, I think we're gonna end up with agents that kind of can do anything for you. This is kind of what I was saying earlier, like there's just like a super assistant. You talk to it about anything. And then there is like specific like UI that you can go look at if you happen to be deep in a specific function. So in that world, I don't think you want like twelve agents at the company, and you have to like go... your employees have to go figure out the right one to talk to, because then they won't achieve fluency. And if they don't achieve fluency, then they also won't like pull automation into their roles. But if you have this one thing that you can talk to about anything, right? So your onboarding is just like, "Go talk to this thing about anything you need," then people will develop muscle memory to go to it. It'll become the center of gravity of work, and people will pull in automation. So I think that that future makes much more sense, and I think like as the people building ChatGPT we're like really well set up to deliver that. This, this is kind of a stretch, but an analogy here is I used to work at Dropbox. And for a while, this is before Slack was big, and for a while we thought, you know, I wonder-- we wondered if people should like go comment on like documents in Dropbox, or if they should like go talk about the documents in Slack. And it was like obvious that it was like more optimal for people to like put comments on the right timestamp in the video in Dropbox or like comment on the document in Dropbox. Right? So it was more optimal. However, what we saw is that Slack is just such a center of gravity of people just like talking to each other. Like, nobody wants to comment on the document. I just wanna Slack you, right? And so we saw that like there was this really big pull towards things happening in Slack, even if it was less efficient. And I think we're gonna see something similar at work, where if there is a single agent you can use for nearly anything, there will just be this giant pull, and everyone will talk about how they use that one agent for things. You know, teams will share best practices with each other. There'll be hackathons around how to use that best thing. Um, yeah, and you'll end up with just a handful of these.

    25. HS

      You said about kind of agents not really proliferating in terms of usage other than coding, and actually maybe this being the time and, and, you know, customer support is one of the examples. [laughs] My question to you is, I'm an investor today. I'm looking for companies which will accrue value over time and provide incredible products to customers. There is a belief that the durability of revenue of large SaaS companies today is zero, and that SaaS is dead because the model providers, you, Anthropic, others, are going to come for our lunch, so to speak. What would you advise me?

    26. AE

      Like, things are built for humans, like otherwise, what's the point, right? Even in, even SaaS tools are built for humans. And so for me, I think my question is like, does this SaaS company own a relationship with a human on the other end of, of things? And if it does, then I suspect it's, it's not going away. Um, you know, or does the SaaS company own some like really important system of record? It's probably not going away. Maybe those, both of those t-two things, the interaction with the human and the system of record are like more important than ever, actually. On the other hand, is the SaaS company like a kind of a glue layer, uh, but it doesn't own either of those two things? Well, I'm not the expert here, but I'm more nervous about that kind of company.

    27. HS

      So then if we take that stance, Salesforce and ServiceNow, you know, they're down twenty, thirty, forty percent. I think it's massively exaggerated. I, I think there are some companies that legitimately should be. Respectfully, I think Dropbox is in a very difficult position [laughs] . Um, and I think your, I think your monday.coms of the world, though, for the majority of SMBs and consumers who use it, which is the large majority of their market, actually, could they vibe code a to-do list? Yes. Would it be cost efficient to do so? Not really, actually, by the time you customize it and perfect it. And to be honest, a to-do list is generally pretty bland in terms of what you need to do. Add task, complete task, show historical tasks, assign to new members. It is not very difficult, and so actually, I think you just keep it. And so I think it's m-massively overblown, um, and I think that's the classic knee-jerk reaction from markets. But I do think, sorry, I do, I do think, like I think you're gonna come for customer support, and I wouldn't wanna be in that category.

    28. AE

      I think this maybe changes what kind of founder you invest in, right? Like I think there was this maybe temporary phase where... that I liked [laughs] personally as a product builder. There was this phase where you would invest in like the, the person who can just like build good product. And you could kind of ignore if they had a good thesis around a customer, or go-to-market, or distribution, or anything like that because it was so hard to build good product, right? And I think that was a b- that was an anomaly. If we look at where we are now, like, maybe that kind of founder is not the founder you should invest in 'cause it's, like, kind of relatively easier to build good product, and you need to go back to, like, investing in the founder who's, like, thought through distribution, who has a good, good domain expertise of what to build for a specific customer.

    29. HS

      So again, if you were on my team as an investor, how would you think about interesting areas for us to invest in in companies that will accrue value and not be threatened by model providers? 'Cause again, like, you're going into health, you're going into co- obviously Codex is very clear. You're going to customer support. Shit, where, where are you not going? And where is Claude Code not going?

    30. AE

      I'm tempted to just say, like, I don't know. I couldn't... I think it's a hard time to be an investor. It's, uh, the market is so dynamic, it's hard to say.

  17. 54:12 - 55:51

    Advice for the Next Gen of Engineers

    1. HS

      today?

    2. AE

      At least from when I was at Dropbox, the thing we were uniquely good at was desktop software.

    3. HS

      Uh-huh.

    4. AE

      And desktop software is... It's funny, it was never not back, but anyways, it's so back, um, basically because if you're solving for productivity and knowledge work, um, yes, there are systems of record everywhere that you need to connect with, but everything at the end of the day happens on the user's computer, either in their browser or, you know, just like locally in apps on their computer. And so I do think that the, the, the fastest way we're gonna see productivity gains from agents at work is going to be at first meeting users on their computer, working with the stuff that they have available to them, you know, without having deployed FDEs to set anything up, and then over time you'll connect in these various systems. And so if I was Dropbox, I would be thinking about how do we leverage our unique domain expertise in, like, building really good, like, desktop software, uh, and this sort of collaborative layer on top of your computer? How do we leverage that to enable productivity agents? It's a bit broad, but I think that's the angle you'd go for.

    5. HS

      No, I love it, and I really appreciate the response. Final one before we do a quick fire, I promise.

    6. AE

      Yeah.

    7. HS

      I've been brought up in a world where margin matters. Software margins are wonderful, and it's what makes software a brilliant category to invest in. We're seeing margin profiles that are very different in, in inference-heavy plays in particular. To what extent should I put that out of mind and appreciate that costs will come down, cost of tokens will come down, and actually it's about usage and customer love, margins will come? Or no, margins are actually fricking important, keep, keep that focus?

    8. AE

      I think both costs are gonna come down significantly,

  18. 55:51 - 57:26

    Lessons from competitors (Claude Code)

    1. AE

      and I also think that, you know, if this is the year of agents being deployed, like, broadly at work connected, then this is also the year where they're gonna have to be connected to all these various systems, and I think that's gonna be very sticky. And so I view this year as a race. And so I think you wanna win that race, and you should be okay tak- you know, taking some hit to margin in the meantime.

    2. HS

      Dude, quick fire round. So I say-

    3. AE

      Yeah

    4. HS

      ... a short statement, you give me your immediate thoughts. Does that sound okay?

    5. AE

      Yeah.

    6. HS

      What have you changed your mind on most in the last 12 months?

    7. AE

      When I joined OpenAI, I thought that... This was a little longer than 12 months ago, but when I joined OpenAI, I thought that we would all just be hanging out with our computer screen sharing b- within a year from there. You know, we'd have this agent that we're just talking to. Um, that was completely wrong. Um, I think the rate of, like, progress in, like, multimodal models was, like, slower than I expected. Uh, multimodal means, you know, like, models that work with, like, video and audio. So instead, what happened was that we saw that, like, agents that work with your computer through code are the way, and so for me, that's been a complete rethink in terms of, like, how we bring the benefits of AI to, like, just people generally. It's not, not through video and audio primarily.

    8. HS

      Which lesser-known competitor do you respect most, and why?

    9. AE

      The first one that came to mind was Amp. Um-

    10. HS

      Amp?

    11. AE

      I think they're building... Yeah, A-M-P.

    12. HS

      Okay.

    13. AE

      Um, it's out of, out of the folks at Sourcegraph. Um, their product has a great reputation of just being, like, you know, punching way above its weight. But I think the other thing that I really respect is that they helped initiate this whole, like, standardization around, like, agents.md and, like, .agent/skills,

  19. 57:26 - 1:00:39

    Lessons from Dropbox and Slack

    1. AE

      which are what I was saying earlier about, like, making it so that it's easier for users to manage all these different agents that they're trying. Um, you know, we obviously put out agents.md, but they put out agent.md, and basically Quinn started this all by putting out a tweet that said, "Hey, do you guys buy the domain agents.md? We'll standardize to your, your spelling." And as small as that was, that initiated this whole standardization that I think has been awesome in the community.

    2. HS

      Do you think the response to Anthropic's ads was the right response?

    3. AE

      I mean, there were so many different responses. The one that I heard obviously I think was right. The one that I heard was, "Well, one company's being pretty negative about the future, and the other company, us, OpenAI, is being really positive and just telling people they can build things and to dream." I, I thought that response was brilliant.

    4. HS

      What's the hardest product decision you've had to make since being at Codex?

    5. AE

      Well, I can tell you the most painful product decision we had to make.

    6. HS

      Great.

    7. AE

      Um, for a while, um, Codex Cloud was, like, effectively unlimited. Not free, like you needed to pay for ChatGPT, but then you had unlimited usage. And, uh, you know, we-- every day that we left it that way, we knew that it would be harder to wind back it being, like, unlimited.

    8. HS

      Yeah.

    9. AE

      But we were just so focused on competing on our other things that had more PMF that we kind of punted that decision out. And when we, when we wound back, um, that unlimited use to some, like, more reasonable limit, there was a lot of blowback from users, and it was a very small minority of users who, like, thought everything should be kind of, like, pseudo-free forever. But that blowback affected us everywhere because, like, the social chatter doesn't really distinguish, uh, between these things. So, um, I think the lesson I learned the hard way there is, like, you can't, can't make things unlimited for too long.

    10. HS

      Data set pricing, grandfathering pricing is just-

    11. AE

      Yeah

    12. HS

      ... it's such a hard thing. What do we do today in engineering or product that in five years' time you'll look back on and go, "Oh, my God, can you believe that we did that?"

    13. AE

      Well, one is just editing code by hand. Um, I think probably another one, this is maybe spicier, but another one might even be, uh, like, actually managing the deployment and monitoring of, um, systems by hand. Like, I basically think that probably big companies will take a long time to, like, deploy this, but many startups might actually kind of start building on a completely new stack that's, like, fully AI-managed. To be clear, the stack doesn't exist yet, but a fully managed AI stack where becau- like, basically it's been built to give you really strong deterministic guardrails over what the agent can do and, like, control over to, like, roll back deploys and everything like that. And so we'll get to a world where the way you start a company is you start by getting an agent and just asking it to build things, and then you get more agents than that, and then maybe eventually you add, you add your cofounders to this service that you use to work with agents. And so you end up, like, maybe your main communication tool is actually your agent communication tool, and then maybe, uh, you're not actually, like, handholding this, like, very point- painful CI and deploy process, but you're just, like, having agents do things.

    14. HS

      Weird question, but I am intrigued. Are you the one providing agent guardrails? And what I mean by that is your agents can go anywhere within an enterprise. Are you responsible for providing those guardrails,

  20. 1:00:39 - 1:02:53

    Quick-Fire Round

    1. HS

      or is there a third-party matter provider? Who is saying, "Hey, whoa, whoa, Alex, you can't go into that. That's human resources," or, "You can't go into that, that's marketing." How do you think about guardrail provisioning, and is that the role of the agent provider or a third-party provider?

    2. AE

      I think we'll probably see both. Like, we are putting a lot of effort into agent guardrails. Like I said, we have, I think the most-- w- we're basically the only company that cares about OS-level sandboxing for coding agents. For instance, there's none that exists on Windows. We're the ones building that, uh, and we're doing it in open source, so hopefully other people can use it. Uh, we think about that a lot. Um, we... ChatGPT supports connectors, so, you know, you can talk to your, like, Google Docs or something, and we put a lot of effort into guardrails around what the agent can do with your Google Docs. Um, so those are just two examples, but we think a lot about this, and I think probably, though, the way that we'll do it will not be sufficient. Like, there'll be third parties who provide, like, very bespoke things for very bespoke, you know, company needs, and there'll probably be a mix of both.

    3. HS

      Final one for you, my friend. What are you most excited about when you look forward 10 years?

    4. AE

      This is probably gonna happen in much less than 10 years, but, like, my mission sort of personally when I joined the company was I just felt like even with the models we had a year and a half ago, there was so much just capability overhang or just, like, ability for these things to be useful, but we hadn't built the right products around that. And so people like me were getting more benefit than, like, people like my grandma. And so what I'm most excited for is to get to, like, a form factor for AI that means that they're just helping everyone, regardless of whether they're in tech, and especially if they're not in tech or especially if they're older. Um, and so, you know, the concrete vision I have is, like, at some point we'll, like, add an agent to, um, like, our family WhatsApp or something, and it'll just start, like, being useful, um, to the family without anyone having to think harder about it than that. Um, there, there are many other ways that, that could happen, but I think concretely that's the most obvious thing we could do with, like, my grandma.

    5. HS

      Dude, I so appreciate you. I so appreciate you putting up with my wandering questions and my very, uh, episodic mind. You've been fantastic, man.

    6. AE

      Thanks so much. I mean, I appreciate you putting up with my wandering answers, so [laughs] all good. We're two here.

Episode duration: 1:03:18


Transcript of episode S1rQngjpUdI
