The Twenty Minute VC: Who Wins the AI Coding War? | Codex Product Lead
EVERY SPOKEN WORD
70 min read · 14,441 words
- 0:00 – 2:12
Intro
- Alexander Embiricos
You still need software engineers today. You still need designers. I'm a PM. Do you need PMs? You know, you can have some fun jokes about that. I don't think you need them.
- Harry Stebbings
[upbeat music] Today, joining us in the hot seat, we have Alexander Embiricos, product lead for Codex at OpenAI. This is an incredible discussion. Time to get the notebook out.
- Alexander Embiricos
For me, the most exciting future with AI is one where everyone just feels like a superhuman, like empowered by AI, and for that, we need tools that everyone feels fluent with.
- Harry Stebbings
Your job is the success of Codex.
- Alexander Embiricos
Actually, our job is the distribution of intelligence, and this is really unintuitive, but, like, we put all this effort into training these models, and then we serve these models to our competitors.
- Harry Stebbings
This is so difficult for me as a venture capitalist to understand. Elon said that coding is one of the first professions to be largely automated. Do you agree?
- Alexander Embiricos
For sure, I would agree that coding is one of the first domains where LLMs are really good, but what does it mean for coding to be automated? It's, like, kind of a heavy statement, right? For example-
- Speaker
Ready to go? [upbeat music]
- Harry Stebbings
Alex, I'm so excited for this, dude. I told you, I've been at a PE conference, and all I could think was, "Thank God I've got Alex next," 'cause this is gonna be a great one. So thank you so much for joining me, man.
- Alexander Embiricos
So excited to be here. Thank you.
- Harry Stebbings
Now, I- this is a weird first start, but roll with it. You'll, you'll understand my British intricacies. I'm fascinated by people's motivations. Are you motivated more by the fear of losing or, like, the thrill and excitement of winning?
- Alexander Embiricos
I, I'm a maximalist. I'm definitely much more motivated by the idea of winning than the fear of losing. But I'll admit, I'll admit to you something. When I was running a startup before joining OpenAI, and one of my darkest moments, and there were many dark moments while I was running the startup, was recognizing that I had spent the past few months trying to avoid losing.
- Harry Stebbings
[chuckles]
- Alexander Embiricos
And all of a sudden I was like: Oh, my God, that is why I'm so unhappy, and that's probably why the startup isn't going well. And so when we flipped, you know, I... Basically, every now and then, I have to re-catch myself and, like, flip back into this idea of winning. But really, what motivates me even more than that is I think I just love building things and building things for people. And man, I am so excited for this year because many amazing things that don't exist yet are gonna be built and given to a lot
- 2:12 – 3:38
Will AI automate coding?
- Alexander Embiricos
of people.
- Harry Stebbings
I'm diving right in. Elon said that coding is one of the first professions to be largely automated. Do you agree, given your position [chuckles] and what you see day to day?
- Alexander Embiricos
I think for sure I would agree that coding is one of the first domains where LLMs are really good. You know, what does it mean for coding to be automated? It's, like, kind of a heavy statement, right? Like, for example, now that we no longer write Assembly, like, when that change happened, and we moved to higher-level languages, did we say coding is automated? Not really, right? We were just able to write much more code, and then, as a result, actually, there was much more demand for code, and there were many more software engineers required. But yeah, part of what they used to do is automated. In the same way that, like, do you know the origin of the word computer?
- Harry Stebbings
No.
- Alexander Embiricos
Um, I might pronounce the location wrong, but I think it was at Bletchley Park. There were all these machines for, like, decoding German Enigma, and, like, there were humans who would, like, punch out punch cards and, like, put them into the machine and do a bunch of, like, tabulated math. I'm probably butchering this, but basically, there was an intensely manual part of work, and even, like, the first spreadsheet software was kind of loosely based off this idea that you would have an office full of desks arranged in a grid and people doing tabulations and then passing their sheets to the next person. And so all these things, like, those specific tasks have become automated, but every time that's happened, there's been an explosion in demand for the output, and so, like, you need many more people actually to do that kind of work, even if the specific
- 3:38 – 5:32
The "Compression of the Talent Stack": The future of engineers, designers & PMs
- Alexander Embiricos
task has changed.
- Harry Stebbings
So you think we'll have more engineers in five years, not less?
- Alexander Embiricos
Yeah, and I, you know, sometimes we change what terms mean, right? Like, the term computer now refers to something else, but the, now we have the term software engineer, and so I definitely think we'll have many more builders. You know, something interesting that I'm observing now is, like, there's this compression of the talent stack. Like, you know, you still need software engineers today. You still need designers. I'm a PM. Do you need PMs? You know, you can have a fun, fun, some fun jokes about that. I don't think you need them. Um, but maybe, you know, maybe when you say engineer, you might be thinking of someone who's, like, much more full stack, right, than, than has been true before. Like, even if you go back a few years, it was much- you had many more places where there was, like, the back-end engineer and the front-end engineer, right? Whereas, like, now, at least if I think about the Codex team, like, there's very few- like, that's much less the case, and things are much more full stack, right? And so I think this compress- this talent stack will compress, but we'll still have people building.
- Harry Stebbings
Why do you think we don't need PMs in this world? Y- Y- You dangled the carrot.
- Alexander Embiricos
Yeah, yeah. It's, it's my fun joke. I think... Well, first of all, I think it's incredibly dehi- d- hard to define what a PM is, what a product manager is. I kind of think of the role as, like, actually explicitly undefined, and your goal is just to adapt to whatever the team or business needs. And, you know, often, if you have a bunch of people, like, say, here, like, trying to build as quickly as possible, then what a, what a product manager can do is spend time, like, taking a few steps back and trying to look around corners and figure out what to do. You know, collaborate with the, the folks, and go to market and maybe be the, the team's, like, greatest cheerleader and quality raiser. But, like, all of those things I just described, which are maybe my current role, could be done by a really strong eng lead or a designer who thinks a lot about product. And so I think it's, like, often useful to have product managers, but you probably don't want many of them until the team is really large.
- Harry Stebbings
I was stalking the shit out of you for the last few days, which was a very fun expedition into your writing, into your tweets, into your
- 5:32 – 9:43
The bottleneck of AGI
- Harry Stebbings
prior interviews, and you said that human typing speed and validation work is the key bottleneck to AGI, not model, compute, or architecture. And it kind of left there, and I was like, help me understand why human typing speed and validation work is the key bottleneck and, and what you really meant by that.
- Alexander Embiricos
For sure. Okay, that's a, that's a fun one. I think there are multiple bottlenecks, but that's maybe the most sort of clickbaity one. So, uh, if you don't mind, I will do this slightly Socratically. Like, how many times would you say you use AI today?
- Harry Stebbings
Thirty-plus times a day.
- Alexander Embiricos
Okay, cool. ...how many times do you think, assuming it was like zero energy expenditure from you, how many times do you think AI could help you per day?
- Harry Stebbings
I mean, I- in, in everything.
- Alexander Embiricos
Yeah.
- Harry Stebbings
I think we'll have it running twenty-four hours a day across every single thing.
- Alexander Embiricos
Exactly. And like, I hear things now from engineers, like at OpenAI and also outside, who are-- they were telling me, like, "You know, I constantly have Codex running. I never close my laptop, and if it's not running while I'm in a meeting, I'm, like, wasting my time. I need to make sure Codex always has work for me that it's doing." And that's, like, super cool and super exciting, but that's a lot of work, right? To, like, manage this- manage these agents and make sure they're always working. Going back to the thirty times per day thing, yeah, like, when we look at how often, uh, you know, Codex users are using Codex, it's like kind of this, like, tens of times kind of range. I think AI should be helping us tens of thousands of times per day, you know, compute budget permitting. We'll- and we'll get there over time. Uh, but the problem is like, at least if I think of myself, like, I work on this stuff. I know I should be using AI for everything, but I'm too lazy to, like, type out that many prompts, and I am too uncreative to figure out all the ways that AI can help me, and so I end up kind of at a similar number as you. Um, and, um, you know, I still am at the point where when I use AI to do something cool, like prep for this conversation with you, I'm, like, kind of proud of myself. I'm like: "Oh, cool, I managed to use AI in this new way." And-- but that's fine for people like you and me who are, like, really interested in this topic, right? But I don't think most people we should expect to like, in order to benefit from, you know, AGI, should need to, like, put so much effort into how to use this tool. It should just be effortless for them. And so I think the world we want to get to is one where to use AI, you don't really need to, like, figure out the right way to prompt. It's just super easy for you, and you don't even need to recognize that AI could help you. It's just, like, knows you, connected to your context, and chimes in helpfully.
- Harry Stebbings
That's where I think, like, Claude has done well in terms of the packaging they've done. Like Claude for Legal, Claude for Excel, where you can implement it and have a DCF model. I'm not into models, but, like, better than one could do before. Do you think it is your job then to productize the prompts and the human actions to remove that bottleneck?
- Alexander Embiricos
Yeah, totally. So I, I think that it is our job to make sure that we have the models with the- with amazing capabilities, and then eventually, to get to a world where this is, like, highly productized, and so you just have this, like, magic text box or audio input or whatever, or you can just add AI to your, like, group chat, and it just starts to help. But I think there's quite an interesting in-between stage, and I think that that is actually where the most value lies right now. So h- here's what I mean. You could try to productize, like, a specific feature of AI for a specific market, and, you know, many companies are doing this, but I think it's a little bit hard to know what exactly will work, what is the right form factor. You know, I, I- someone was on your podcast earlier, and I- they said something that I thought was quite interesting about how you, you cannot adopt AI at enterprise without FDEs.
- Harry Stebbings
Yeah, it was Matt Fitzpatrick from, um, Invisible AI. Yeah.
- Alexander Embiricos
Yeah. So, so even though I am literally hiring FDEs, and if you're an FDE, please apply for a job with me, I actually disagree with that entirely. So what I think we need to do is build tools for people. Like, you can use FDEs, you know, uh, as, as Fitzpatrick said on the podcast, like, to automate workflows, right? But then you're limited by, like, what you, from your top-down perspective, can do and what you, from your FDE staffing, can, can staff to be built, right? But for me, the most exciting future with AI is one where everyone just feels like a superhuman, um, or a god, just, like, empowered by AI.
- 9:43 – 10:28
Building for individuals vs top-down enterprise automation
- Alexander Embiricos
And for that, we need tools that are for people, for individual users, and that everyone feels fluent with. I think the ph- the phase that's most interesting that we're at now is building for the kind of people who are interested in figuring out how to use AI. So what we need to ship, and I think this was, like, the genius of, like, when Claude Code first shipped, what they really got right, um, was they had this tool that was super easy to use in whatever context you want, just in your terminal, and people started experimenting with where to use it. And so I think as we think about AI being used outside of coding work, one of the most important things we can do is not overly build it like, okay, this is AI capabilities, but only specifically for finance, only for specifically for this workflow, but actually build a much more open-ended tool that someone can just use for any given task
- 10:28 – 14:30
The three phases of agent development
- Alexander Embiricos
creatively. Yeah.
- Harry Stebbings
But does that not put the onus or the effort back on the user, back to the point of your bottleneck of human action and lack of activity on them? If you don't define the task, you put the responsibility on them for defining the task, which humans lack the ability or inclination to do.
- Alexander Embiricos
Yeah, I think that... So that's why I think it's the bottleneck. So basically, here are the three phases in my mind. First, let's have agents work really well for software engineering and coding because LLMs happen to be good at that. Next, let's realize that for an agent to be useful more generally, it using a computer is super valuable, and also we'll realize that all agents are actually coding agents, 'cause coding is just the best way for an agent to use a computer. So let's take that same super flexible idea but make it available to anyone who's excited to explore and tinker. And we're already seeing people start to do this with, like, the Codex app. Like, people- like, Codex app is built for soft- for builders, but we're seeing builders use it for all sorts of non-coding tasks. Then finally, once we see what's working, let's build the, that, like, productization that you were talking about, where you have highly specific features that just work immediately out of the box for people. And I think we're going to speed run this entire, like, one, two, three journey, um, in the next months.
- Harry Stebbings
My challenge with what you said about kind of FDEs and implementation within enterprise is data security, sensitivity, permissioning, access provisions is really freaking hard, and people are much less intelligent and, uh, confident than we give them credit for, I think, especially in large enterprise. Sorry. Um, and I, I think you actually need an FDE to go in and custom fit a lot of the different horizontal solutions to make it work. Am I wrong?
- Alexander Embiricos
I think you're right. If you're trying to go, like, all the way from zero to one, and you have this, like-... and I said, I don't mean grand negatively here, but if you have like a grand vision for some like ultimate workflow automation system, then, yeah, you're gonna have to clear through all of these security hurdles, all these like compliance hurdles that are really real, right? Build connections to all these data systems and like systems of record and action. Um, yeah, so you're gonna need an FDE to, to do that. What I've seen is that when we do these things top-down, we're-- we end up like massively under-leveraging the potential of AI in, like, helping that company. Whereas if you can maybe do that in parallel, right? But if you can just give AI to the people, like, actually doing the work, um, they can start to, like, get a mental model for how AI can help, and then they can start pulling AI into their workflows as, at the same time. Here's just like an analogy or, or something here is like: imagine if, um, you know, you work in like a customer support role, and AI is being brought into your role and starting to automate like meaningful chunks of your work, but you've never heard of ChatGPT, nor are you allowed to use it, right? So in this wor-- in that scenario, you have like no intuition for what this thing is. Whereas in a world where actually you've been using ChatGPT for work at the same time as like parts of your work are getting automated by an LLM, you have much more intuition for how this works. And, you know, I would argue, you feel much more empowered about this idea that it's being accelerated, and you have some degree of control to steer, like, where these automations are built, as opposed to like, it's like this complete, like, ex machina kind of thing, um, that is quite disempowering. So bringing this back, like, I think there is a way to do this because the data control issues you mentioned are real, right?
But at the end of the day, every tool, every feature, every workflow is for a human who is somewhere, right? An employee somewhere. And that employee is axing that-- accessing that tooling via their browser or via their file system, like at the end of the day, right? And so at the end of the day, everything comes to an interface that an agent running locally on your computer can work with. And, you know, I think it's quite unusual, like at OpenAI, we're building a browser, Atlas, right? And you might wonder why. Um, and there are many reasons why, but I think one of the key reasons is that by building a browser, we can build sort of s-- and by controlling it, like, tightly end-to-end, we can build, like, safe agentic browsing for enterprise. That is a way to access things agentically, that is-- that are otherwise not yet built out by
- 14:30 – 17:04
The importance of inference speed & OpenAI’s partnership strategies
- Alexander Embiricos
FDEs.
- Harry Stebbings
There are so many questions that I have to ask you. I, I want to go back before I lose thread. You mentioned about engineers, like, not closing their laptops 'cause they don't actually want to lose productivity and time with, with building with Codex. You partnered with Cerebras, and Cerebras is the fastest provider, obviously, of inference out there. Amazing win, I think, for both, bluntly. How important is speed for developers when using Codex and in the future of AI code?
- Alexander Embiricos
It's-- I mean, the simple answer is it's, it's super important. Um, we-
- Harry Stebbings
And so is it like an inference monopoly? Like you have it now and competitors don't.
- Alexander Embiricos
This is just my opinion, but I don't think we're gonna end up in like this kind of monopolistic world. I think there is so much competitive pressure that there'll be like multiple answers to this. But I will say that we have, like, news coming about-- coming out about, uh, that partnership soon, and I'm very excited for these kinds of things to ship. It's, it's gonna be awesome. But even so, like, you know, with, uh, GPT-5.3 Codex, that model is like significantly, uh, more efficient than prior models, and so we've-- in the feedback we've heard is that people actually feel like now this is like a very competitively fast model than before. So there's a lot of things you can do just in terms of the model. There are also things you can do, like improving, uh, how you do inference. So we recently rolled out a change where in the API, like those models are served like forty percent faster, and in Codex, they're served like a quarter fast, twenty-five percent faster. So I think, like, speed matters a lot, and we're kind of approaching it from all angles, like both the hardware, how you do inference, and the model level.
- Harry Stebbings
Y- you mentioned earlier about kind of putting it in the hands of users, and we talked about inference there. One of my dear friends is Jason Lemkin from SaaStr, and he says that actually inference is the new sales and marketing. Instead of sales and marketing teams, you're paying for inference, so users can onboard quickly, easily, see value, and you will actually see the removal of sales and marketing teams. It's kind of like next gen of PLG.
- Alexander Embiricos
Hmm. Um, I don't know. I think I struggle with that. I think, I think like, you know, fundamentally in this new world where anyone can build, and it is increasingly easy to build things, like what, what is hard, right? I think having a good relationship with the customer, knowing what they need is as hard as ever, maybe even harder, as it's just like there's just more stuff in the market to choose from. You know, the other things that are hard are like building the right thing, having a really high-quality thing. But going back to the sales and marketing thing, like I don't think that goes away because I think that's, as like I said, I think that's just gotten harder as the, as the markets-- any given market gets more competitive with more software out there.
- 17:04 – 19:17
The transition from "Pair Programming" to "Delegation" with GPT-5.2 Codex
- Harry Stebbings
Can I ask, how much of internal code for you today is produced by Codex? I remember, like Claude for Work, Boris said, was like a hundred percent or nearly a hundred percent. How much is internal Codex used?
- Alexander Embiricos
So sp-- I, I'll speak for myself and then for the team. I would say, like, most people that I know are basically not opening editors anymore.
- Harry Stebbings
Mm.
- Alexander Embiricos
Um, and this was a step function change that happened in... It, it's been happening gradually, but I'd say the key external market touch point for this was, like, GPT-5.2 Codex, where all of a sudden, the model was, like, way better at running for longer, uh, handling tasks end-to-end, you know, managing its, its context, um, and following instructions. And so we kind of saw this inflection point, and that's actually why-- part of why we built the app. Um, so, you know, broadly, w- I think before GPT-5.2 Codex, the, the kinds of AI features we were using to write code were like tab completion, or maybe you were pair programming with the model. And in my mind, you know, you still needed to be at your laptop with your hands on the keyboard-ish. And like, you might go off and do a little bit of work, but, you know, you, you kind of still need to be there and like drive. It's just like handling these small things for you.... And then at the time of GPT-5.2 Codex in December, we kind of switched to like, actually, I'm just gonna fully delegate this task. It's like, you know, I'm gonna have a-- do a plan with it, make sure we like the spec that it's gonna do, and then I'm just gonna let go, let it cook. And this is quite a different way of working. So it's like, it's changing, like, literally as we speak. And so part of why we, we built this Codex app that we re- released last week is because we wanted to build like a form factor or user experience, where it felt like very ergonomic to be delegating instead of pairing with an agent. Uh, and so like delegating to multiple agents at once. And so, even at OpenAI, this is changing massively. Um, I don't have a percentage stat for you, but I would say like the vast majority of code is written by AI, and I would say that now probably, like most people are not even like opening IDEs. May- maybe if they are opening IDEs to like, maybe you wanna own the interface, right? 
So you'll like help flesh out like the interface between like two modules, and then like AI fills it out. Or maybe you wanna like collaborate on a plan, but then have AI fill it out. The code itself is not being written by humans anymore.
- 19:17 – 20:03
Why the Codex app isn't a traditional IDE
- Harry Stebbings
Will we have IDEs as a part of the stack in twenty-four months' time?
- Alexander Embiricos
Okay, so the, the formal definition, right, integrated development environment. I mean, that, that phrase is so squishy that like, literally anything could be an IDE, right? So I don't think that's very useful. If that's the answer, then yes, you could even argue the Codex app is an IDE. I don't, I don't think it is. Like for me, I think of an IDE as like a, a really powerful editor. We explicitly didn't build editing into the Codex app because we wanted it to be really clear how you're meant to use it. So, you know, it has a lot of affordances for m- managing multiple agents, for delegating, um, for re-- for reviewing changes. It has really prominent skills, which are an open standard that are really useful for doing non-coding work, stuff like, you know, triaging tasks, um, or monitoring deploys or something, but it doesn't have
- 20:03 – 22:22
The importance of plan reviews and automated code reviews
- Alexander Embiricos
text editing.
- Harry Stebbings
If we assume a large percentage is done by Codex in terms of the code produced, how do you do coding reviews, and is AI responsible for internal coding reviews?
- Alexander Embiricos
So the- there are a few things here. Um, first off, the spec for what you wanna do or the plan becomes more important than ever, right? So like think like architecturally, like how should this code work?
- Harry Stebbings
Yeah.
- Alexander Embiricos
Um, so, you know, we recently shipped like a very prominent plan mode that works a little differently than others, where you have the agent go off and, like, propose how it's gonna do something. It's like quite a long plan, and then it asks you questions about if you agree on how it wants to do it or, or if you wanna have input. And this is very similar to like, if you had a new hire who was new to your code base, and, um, you know, they had to present a sort of req-- a request for comments to the rest of the team before they started doing the work. So even though that's not formally code review, I would say review of the plan is actually something that's becoming more important because we're entering more of this like delegation phase of working with agents. So that's an underrated thing. Um, then, okay, there's actual code review. I think a problem that I hear a lot of people talking about, especially in the open source world, is like a lot of AI slop. Like, people will just be submitting PRs to these open source repos, and they're trash, and like maybe the user hasn't even-- the person submitting the PR hasn't even tested them or definitely hasn't reviewed the code. I think this is a problem, and so a common practice with Codex is to have Codex, like, review its own PR, uh, or its own change, and Codex is actually incredibly good at this. We've explicitly trained the model to be good at code review. Um, and you know, that included things like making sure it's like really good at, uh, creating like high signal feedback. So it'll like, basically have few false positives of criticism, which means you can really trust when it has feedback. And so we-- not only do we encourage people, like on the team and elsewhere, like to, like, just ask Codex to review, you can then also set it up to just, like, automatically review. So, like, nearly all code at OpenAI is reviewed by Codex automatically whenever you push it to a Git repo. 
Actually, like, one, one fun thing, uh, for people who haven't tried Codex yet or didn't try it recently, is, uh, sometimes the way that people, like, see how good our models are is by asking Codex to review a different model's code. And, and basically, they're like, "Oh, shoot, I should probably just be using Codex [chuckles] to write my code in general."
- 22:22 – 27:47
Building open standards and the "agents.md" convention
- Harry Stebbings
You said something really interesting there. You said, for those that maybe haven't tried it yet or, uh, yeah, are coming back to it, how do you think about retention with this category? I remember Tom Blomfield, who's a YC partner, tweeted months and months ago, but it stuck with me, a weird brain, um, about the ease of transition between different providers, whether it was Cursor or Claude Code or Codex. I can't remember which one it was, to be honest. But how sticky are users, and how do you think about retention?
- Alexander Embiricos
We've taken this, like, kind of counterintuitive approach with Codex to just build it super openly. So, like, the Codex core harness is open source, and we're always trying to make it easier for people to switch. So, for instance, um, when we first launched Codex last year, uh, we created, like-- I mean, it's-- created is even a heavy word. It was just-- we just established a convention, which is called agents.md. This is basically a file that you can put instructions for the agent in, and instead, we didn't call it codex.md, we just wanted it to be something that all agents can use. And pretty much every agent, except Claude, uses agents.md, which is awesome. And then just last week, actually, uh, we helped push for putting skills, which are our standard for, like, giving the agent instructions and scripts. We pushed for those to be stored in a sort of a neutral named folder called Agents, um, instead of in, like, Codex or something. And again, everyone has jumped on it except the usual suspect. Uh, so I think it's really great for the developers to have a lot of choice, um, and we're trying to make it even easier for people to try different things. Now, that said, I think these coding tasks, right, where you're asking an agent to write some code, they're quite hermetic. And what I mean by this is, you can k-- it's like or ep-- uh, maybe an analogy in TV would be like episodic, right? Like, you can come in, and you've got this, like, open-ended, like, agents file that, like, any agent can read from. You've got these skills that any agent can use, um, and you can ask the agent to write some code, and it produces a patch, and that patch goes into Git. So kinda like both ends of this are pretty neutral, vendor-neutral, so very easy to move between for now.... as agents start to do work that is not writing code, but more general work, again, for software engineers or beyond, for any builder, they're gonna need to start interfacing with other systems, right?
So as they start, maybe your agent is talking to Sentry, right? Or it's talking to your Google Docs or something. Then I think these agents become much stickier, because actually deciding to, to connect an agent to that system is a sticky decision. And if you're an enterprise, really trusting that the agent is gonna have access to these tools, but there are really good secure guardrails, and sandbox, and like controls over how the agent works with these systems, I think is critically important, and that's not something that you're gonna want to, to do multiple times. And so, you know, we've been kind of building Codex knowing that this is coming, um, and so we have, like, the most conservative sandboxing approach. Sandboxing is kinda like a set of controls, OS-level controls over what the agent can do. Um, and we have-
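For readers unfamiliar with the convention Embiricos describes: agents.md is just a Markdown file, usually at the repository root, that any agent can read before working in the repo. A minimal sketch follows; the section names and commands are illustrative examples, not something the convention mandates.

```markdown
# AGENTS.md

## Setup
- Install dependencies with `npm install` (assuming a Node project).

## Testing
- Run `npm test` before proposing any change.

## Conventions
- Keep patches small; include a summary of what changed and why.
- Never commit secrets or generated build artifacts.
```

Because the file name is vendor-neutral, the same repo-level instructions work whether the task is run by Codex or by another agent that supports the convention.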
- Harry Stebbings
But I'm, I'm, I'm a fan of Seven Powers, this brilliant book which talks-
- Alexander Embiricos
Mm
- Harry Stebbings
... about kind of seven ways that businesses accrue value and sustainability, and like, you know, your stickiness or your retention is one. If we're on the same team with Codex, how do we create retentive patterns, behaviors, programs to ensure that people stay with Codex and they don't flip to Cursor when there's a better model or Claude Code when there's a better model?
- Alexander Embiricos
Yeah, I mean, it's interesting because I think on the one hand, like we, we think about this, obviously, we're running a business, but, you know, our, our mission here is to, like, ensure that, like, we safely deliver the benefits of AGI to all humanity. And so something that's, like, unintuitive to people about, like, the Codex team-
- HSHarry Stebbings
Alex, you actually-- um, I'm, I know, uh, but your job is the success of Codex. I get that as well.
- AEAlexander Embiricos
Actually, our job is the distribution of intelligence, right? Um, and so we're obviously building out Codex, and this is really unintuitive to a lot of listeners, but, like, we put all this effort into training these models, and then we serve these models to our competitors, right? And from our perspective-
- HSHarry Stebbings
This is so difficult for me as a venture capitalist to understand. You are aware of this? [chuckles]
- AEAlexander Embiricos
Yeah. I'm totally aware of it. Like, we're-- OpenAI is, like, a really interesting and unusual place to work, but basically, because we're playing such a long game for us, like, if the competition gets better, we learn. It's actually helpful for us, and so we're pushing really hard at growing Codex. And by the way-
- HSHarry Stebbings
Do, do-
- AEAlexander Embiricos
Codex is growing massively
- HSHarry Stebbings
... if, if, if they're closed-
- AEAlexander Embiricos
Yeah
- HSHarry Stebbings
... and they improve, you don't learn.
- AEAlexander Embiricos
I don't think so. Like, for example, there are a bunch of recent launches, like even today, I literally just, like, quote tweeted a thing this morning about a launch from Warp. No particular affiliation, right? And there are a bunch of cool ideas in there about how they like framed up the way that their agent can work in the cloud at the same time as working locally. For me, that's like inspiring. And I think I see all these things from various companies, and like, one of the coolest things about the space is it's like we're all kind of inevitably reaching the same conclusions together and then building things out. And so, you know, on the Codex team, I think we have some massive advantages, right? We have the massive distribution advantage with ChatGPT. We have the massive, like, capability advantage of training our own models to be good in our harness and building our harness to be good at the new models, and, like, no one else has early access to those. And so I think we're, we're playing to win, and we have a, a really big advantage or a number of advantages, but we're also playing this long game where, you know, again, we serve our models to everyone, uh, where we push for open standards so that everyone can use, like, all the things that we're pushing
- 27:47 – 30:18
Winning strategies: Compute advantage, best models, and product execution
- AEAlexander Embiricos
for as well.
- HSHarry Stebbings
Can I ask you, what would be the defining factor of winning? And I, I know I'm using venture language, and you-
- AEAlexander Embiricos
Yeah, fair
- HSHarry Stebbings
... you're brilliant and kind of much more free and open. Uh, but what is like the defining factor of winning, again, if I push you? Is it GTM, which is, like, the biggest enterprises in the world do want to work with OpenAI? I have many friends in your sales team. The inbound that you get from the largest brands is incredible. So is it GTM, because of the incredible brand; product execution, and just Codex being a freaking awesome product; or compute inference speed, an actual, like, compute advantage? Which one is the defining winner?
- AEAlexander Embiricos
Okay, so I think if we're gonna talk about it more from an OpenAI perspective, obviously, this is way above my pay grade, but I would say it's compute advantage and having the best models, right? And in order to achieve that, we then need to build businesses that generate revenue. And also-- something that's really interesting that we noticed with having the Codex team, which is, like, a sort of combined team of research and product, is that by building these successful products, we create a lot of pressure to improve the model in sort of a faster way. So that's maybe the company perspective, right? If we come to the product perspective, I think the single most important thing we can do is build a, a really good product that people want to use. And like I was saying earlier, I think we really wanna build products for individuals, and then allow the, like, people to become fluent in those products, and then, like, pull in automation. And I, I think that may be counterintuitive but will result in way more impact than anyone purely approaching it from, like, the enterprise workflow perspective. Um, so you know, I think that's mostly a question of product execution, and then that works for, say, like, prosumer. When it comes to enterprise, the go-to-market side is really important. Like, something that I've learned the hard way is if we go to an enterprise and we're just like, "Hey, we're here, like, feel free to use the stuff," that doesn't work. There's actually quite a lot of education that needs to be done, and there's a lot of, like, configuration that we need to support and sort of, like, education of the broader team. So, like, that motion looks much more like coming in, pitching, meeting the head of developer experience or whatever, understanding how they want their team to operate, and then giving them tools to, like, propagate that mechanism of operating to the rest of the team.
- HSHarry Stebbings
You, you said the word revenue there, which is one metric to measure a business against. When you think about, like, your metric of success, which you sit down with Sam or Brad or whoever it is and say, "Hey, this is what we're optimizing
- 30:18 – 31:40
Measuring success: Moving from Weekly Active Users to Daily Active Users
- HSHarry Stebbings
for," what is the metric that you use as the defining north star for your progression?
- AEAlexander Embiricos
... It's actually not revenue that's the primary. The primary is active users, mm, uh, which-
- HSHarry Stebbings
How, how do you measure active users? Like daily active users?
- AEAlexander Embiricos
Okay. Yeah, we-- so we measure weekly active users, and it's, um, it's just like, uh, you know, did this person like actually do a turn in our product? You know, did they send a prompt?
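As a toy illustration of the metric being described ("did this person actually do a turn?"), here is a minimal sketch; the event data and function name are invented for the example:

```python
from datetime import date

# Each event is (user_id, day on which the user sent a prompt / "did a turn").
events = [
    ("ana", date(2025, 3, 3)),
    ("ana", date(2025, 3, 4)),
    ("ben", date(2025, 3, 3)),
    ("cyd", date(2025, 3, 7)),
]

def active_users(events, start, end):
    """Distinct users with at least one turn in the window [start, end]."""
    return {user for user, day in events if start <= day <= end}

# DAU for March 3 vs WAU for the week of March 3-9:
dau = len(active_users(events, date(2025, 3, 3), date(2025, 3, 3)))
wau = len(active_users(events, date(2025, 3, 3), date(2025, 3, 9)))
print(dau, wau)  # 2 3
```

The same helper yields DAU or WAU just by changing the window, which is why moving from weekly to daily is a reporting choice rather than a new kind of measurement.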
- HSHarry Stebbings
Is weekly active a frequent enough metric, do you think? Sounds nice, but if this is actually replacing the IDE, is daily active not better?
- AEAlexander Embiricos
I think daily active will be better soon. Yeah. We just happen to use weekly active. It's like a standard here, and I think as we were getting started, it made sense, but I, I, I actually agree with the, the, the criticism there. It's like, we should probably just be at daily. Like, I think we, we need to be getting to a world where for any given task that you have, your first instinct is to ask an agent to help, right? It's kind of like, you know how, like with Google Search, it's just like, okay, anything I need to do, I just like, go into this text box, and I can get navigated to the right location. Then you had ChatGPT. It's like, for any information I need, I can go into this text box, type it out, and get information that helps me. And I think the next phase that we'll see this year is like, for any task I need to do, as opposed to just get information, I go to this text box or this input, and something happens that helps me, even if it's not the full task, even if it's only a small
- 31:40 – 33:52
Chat vs GUIs: The enduring UI of AI interaction
- AEAlexander Embiricos
part of it.
- HSHarry Stebbings
You said about chat and the interface there. I'm, I'm really fascinated by this because it, it, it is a seemingly incredibly efficient input function for, for busy humans. But I, I spoke to Anish Acharya, who's a GP at Andreessen, and it came out the other day, and he's like, "No, no, no, this was created by Sam and Elon, and it works for very efficient people. But most of the planet want browser-based discovery, interactions, UIs." Do you think that chat will be the enduring UI in the next wave of AI interaction with humanity?
- AEAlexander Embiricos
The simple answer is yes, but actually, I think there's two components here. Like, if we, if we just imagine the future, like, let's think of some sci-fi movie, right? Like, what does AI look like? I, I believe that sci-fi is a really good predictor of what the future should look like, and usually, it's pretty simple 'cause it's a story, and I think simple is usually right. It's gonna be some, just, like, entity that I can talk to however I want about whatever I want, right? Like, I shouldn't have to navigate to a place where I work with, like, my coding AI, and then have this, like, different place for my, like, sales AI, and have to be like, "Hey, I'm now talking to the sales thing." It's just like, I'm just gonna talk to a thing, and it's just gonna help. So I think what we're gonna have is that chat or voice, basically a conversational interface, will be sort of the, the pillar of everything, that you can talk to about anything, um, and that you can add into any group chat or whatever so it can, like, discover how to help you. But then, if you're, like, a power user and you're very good at a specific thing, you probably don't wanna be disintermediated by having to talk to another person. It'd be like if you had an executive assistant, but you could only work by talking to them. That's, like, super annoying, right? So at some point, you wanna, you wanna get to the show notes and, like, look at them yourself and, like, edit them yourself, right? You wanna edit the thing yourself. So I think we'll pair chat with, like, functional, graphical interfaces that are bespoke to, like, what someone needs. So like, in my case, I will probably chat to, like, do my, you know, podcast prep. But when it comes to, like, actually looking at product and code, I probably want, like, the Codex app that I can go into and get deep in.
Whereas maybe if we're talking to a marketer, maybe that marketer will like chat to ask questions about the product. They're not gonna download the Codex app just to ask questions about the product, but maybe they'll have like a super custom GUI for like ad analytics or something that they
- 33:52 – 35:24
Designing interfaces for agent-to-agent interaction
- AEAlexander Embiricos
go into.
- HSHarry Stebbings
Totally get that, and it, it kind of wrongly assumes on my behalf, a consumer interaction at some point in that journey. And I wanna ask you, how do you think about, like, agent-to-agent experiences and designing experiences for agents? Like we spoke about, for example, going to large enterprises and how you can be helpful. I'm just using the most boring thing ever, expense approval. You could have agent submission of expenses on my behalf for my trip to San Francisco, and then the agent on the flip side, doing approvals for that from OpenAI's compliance department. Agent-to-agent is... How do you think about that and that paradigm shift?
- AEAlexander Embiricos
My, like, quickest answer to this is that, l- like, we've noticed as we built Codex that the best, like, the best interfaces for Codex to do work also tend to be the best interfaces for humans. So like, when people ask, like, "Oh, how can I make my code base, like, more efficient for the agent to work with?" the answer is often, like, "Well, have you looked at it yourself? And is it, is it easy for a human to work with?" So, like, a very specific example would be, like, running tests in a code base. Naively, if you just, like, set up most test runners, they just, like, emit all the outputs of all the tests. And so, like, as a human, it's really annoying because you have to go in and, like, find the one that failed, and it's like you've got to read hundreds of thousands of lines. Turns out that's terrible for AI as well, but if you filter it down to only emit the failed tests, better for humans, also better for agents. So probably, the agent-to-agent interaction points will be very similar to, like, if there was a human in the loop, and that's nice 'cause it means you can kind of atomically replace individual systems.
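The test-runner example above can be made concrete with a toy filter; the log format and function name are invented for illustration, but the idea is exactly what's described: emit only failures so both humans and agents read one line instead of hundreds.

```python
def failures_only(test_log: str) -> str:
    """Keep only FAIL lines from a runner's output, dropping the noise."""
    return "\n".join(
        line for line in test_log.splitlines() if line.startswith("FAIL")
    )

# A simulated run: 500 passing tests and a single failure.
raw = "\n".join(
    [f"PASS test_case_{i}" for i in range(500)]
    + ["FAIL test_checkout_total: expected 42, got 41"]
)
print(failures_only(raw))  # one line instead of 501
```

The filtered output is what you would hand to either a reviewer or an agent: the full signal, a fraction of the tokens.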
- 35:24 – 54:12
The data moat: Coding data vs knowledge work task data
- HSHarry Stebbings
I mentioned our show on LinkedIn, and, uh, a wonderful investor from a, a different company, said Harry Potter, you know, Voldemort, and it's like, you know, he who shall not be named. Um, I don't want Sam to kill me, but from another company-
- AEAlexander Embiricos
Okay, okay.
- HSHarry Stebbings
-was like: "You gotta... You ask him-- ask him, how do you think about a coding data moat, and does Anthropic have all the data now?"
- AEAlexander Embiricos
I definitely don't think they have a significant advantage in terms of data, um, on coding. I think that from what we've seen, and, you know, I would defer to my research team on this, but I feel like we, we feel like we have plenty enough data to build really good coding models. I actually think the, the place that's more interesting for getting data now is, like, as we get into, like, knowledge work tasks. That's kind of data that's, like, not really, like, available most places on the internet. And so you start to have, like, really interesting brainstorms for, like, how to help a model be good at it. Like, maybe you have to, like, pay people, um, to, like, simulate doing tasks so that you can, like, learn these trajectories for the model. ... Or maybe you should acquire startups, you know, that are no longer in business but that have a lot of, like, data, like, say, their Slack or something. Um, yeah, I think that that kind of knowledge work task distribution is, like, much harder than coding.
- HSHarry Stebbings
That's so interesting you said that about kind of the, the data that doesn't exist, so to speak. How do you think about your interactions with the data providers, your Mercors, your Turings, your Invisibles, your da, da, da, da, da, da, of the world? Like, will you spend ten X there, or will you go, "We are spending too much on data. We should do it ourselves and do data acquisition?"
- AEAlexander Embiricos
Yeah, I mean, I think the way that we think about these things is just like, how do we move as quickly as possible? And so, you know, getting-- becoming able to set these things up in-house is, like, very expensive in time, and we're a small team. So what I have observed so far is that if we need to run a data campaign at scale, we're usually gonna enlist help from one of these companies.
- HSHarry Stebbings
On the consumer side for Codex, we've spoken about, like, enterprises and going into them, how to engage in terms of developer experience, developer operations. Do you compete with a Lovable and a RapGenie on a, like, low-end consumer basis in a year or two's time? Is that a business where you're like, "You know what? Codex is not for every person to create an About Me or a small business to create their own site"? How do you think about consumer in that way?
- AEAlexander Embiricos
Yeah, I would say that right now, it doesn't feel like we're co- competing super directly. Um, but, you know, I don't know if you saw our, our Super Bowl ad, uh, the tagline of which is just, "You can just build things." Um, with the app, we noticed that, like, many, many people who are less technical are starting to build things, and so the kinds of things they're building are much more hello world-y. And so I think that we will see some overlap in use cases, um, where you have, you know, people just pulling up Codex because they have it as part of their ChatGPT. Actually, like, a big announcement, um, last week was that we're now offering some Codex to people even on free ChatGPT plans or on the Go ChatGPT plan. So this is, this is massive just in terms of, like, bringing availability to everyone. Um, and so I think we're definitely gonna see people with, like, a free ChatGPT plan coming in and just, like, building simple things where they otherwise might have gone to a specialized tool.
- HSHarry Stebbings
What would you most like to do differently, but for whatever reason, you can't?
- AEAlexander Embiricos
This is an interesting one. I feel like it's been a very good few weeks for us. [chuckles] Um, so we're very... I'm pretty jazzed about everything that's happening, but maybe the feeling that I have-
- HSHarry Stebbings
That's really-
- AEAlexander Embiricos
Yeah.
- HSHarry Stebbings
That's really interesting. You said it's been a very good few weeks for us, and I feel that. Does the team feel changing winds of momentum, both in positive and negative cycles?
- AEAlexander Embiricos
Absolutely. We, we are very attuned to it, right? Like, if you look at the, the history of Codex, the first thing we launched last year was, like, this amazing idea that people were super excited about. It's like, "Hey, we're gonna give the agent its own computer in the cloud. You can have as many of them as you want work for you in parallel on tasks." Super great idea. To be honest, it didn't work as well as what we shipped later. It was not the best. Um, and then, since August, with GPT-5, we started pushing really hard on interactive coding, which is where most of the competition in the market is. And, you know, we went on an absolute tear. I feel like the public metric we have was, like, since August, we grew by, like, twenty X, and then, like, even, like, late in the year, we, like, doubled from December to now. I forget the exact number there, but, like, you know, that was competing neck and neck. But the, the shift that we feel last week is, you know, we felt like we had the most intelligent model, and that was cemented with 5.3-Codex. We had feedback around our model being slower and, like, maybe less fun to work with and, like, being less good at communicating with you while it was working. We addressed that feedback. Um, and that's true even compared to, like, the, the other competitor model that launched, like, twenty minutes before us and-- maybe this is spicy-- was, like, SOTA for twenty minutes. SOTA means state-of-the-art. Um, and then, you know, we'd, we'd always been getting a lot of feedback on, like, the quality of the user experience in Codex. Our, our most popular surface was the IDE extension, and our CLI, which is a command line interface, was less polished. But with the app, the feedback has been, like, resounding from the market, uh, that this is, like, a really high-quality experience. It's, like, simple, like, unintuitively simple, and people are just loving using it. Even our biggest critics are converted. 
So yeah, a- and then we-- and then we had the Super Bowl ad, and then we went to free. And so going back to your question of, like, what do I most wanna do differently, the first is I actually wanna get back to cloud. Um, when we pivoted our strategy from, like, building the cloud-- like, focusing on the cloud agent last year to working interactively, the thinking was very simple. It was just-- And it's kind of like what I was telling you about FDEs, actually. If you go too far ahead to workflow automation before your end user is fluent with the tooling and can get it to work simply, then there's, like, this disconnect, and you just have this pipe dream idea that's not, like, effective, uh, for-- except for the most power users. But once you have this base where people are using your tool every day, like you said, and they're configuring it, and every time they use it, it gets better, then, like, the step up to, like, letting it run independently in the cloud is a much smaller step up, right? Um, so I think it's time for us to, like, get back to, like, building out the cloud product and making it super tightly integrated with the local product. It already is somewhat integrated. And the other thing I wanna do differently is, um, start thinking more about the bottlenecks. Like, codegen, writing code, has become, like, you know, basically trivial now. Um, but the hard part is, like, what you were talking about with, like, code review, right? Like, how do we know the code quality is good? How do we know we're doing the right things? And those bottlenecks, I think, are under, underappreciated still and underinvested in. So, like, I think we wanna get to a world where you can have an agent that is unbottlenecked, right? That you trust to, like, own an entire microsystem or internal tool or whatever, and can do the full iterative loop, including feedback from users, without having to go through human review. 
And that is a really hard problem to solve, both from an intelligence perspective, but also from, like, a safety perspective and a controls perspective.
- HSHarry Stebbings
How much weight should we place on benchmarks and evals?
- AEAlexander Embiricos
I think-... probably, uh, this is an annoying answer for you. It's like some, [chuckles] right? Like, they do tell you-- they kind of, uh, in my mind, they give you a good measure of intelligence, right? Um, and so you can put weight on those for intelligence. Um, and especially before evals are saturated, I think you-- when you see meaningful progress in those benchmarks, it's like very, um, very helpful. Um, and then I think you have to pair that, though, with, like, what it feels like to use the model, and that's, that's a vibes thing. Like, whenever I talk to any, like, even internally or even talking to like customers of our models, I'm always surprised by how vibes-based the, uh, evaluation of how it feels to work with the model is.
- HSHarry Stebbings
[chuckles] How vibes-based life is. "People wanna work with people they like-"
- AEAlexander Embiricos
Yeah.
- HSHarry Stebbings
-is the lesson that I give to kids.
- AEAlexander Embiricos
Exactly.
- HSHarry Stebbings
Like relationship-
- AEAlexander Embiricos
People wanna work with models they like.
- HSHarry Stebbings
Yeah, relationships matter. Um, g- can I ask you? I think that Cursor will lose half of their revenue this year, and it will go from a billion to five hundred million. It's a bold statement. Agree or disagree?
- AEAlexander Embiricos
Oof! [chuckles] Can I just, like, no comment?
- HSHarry Stebbings
[laughing] Uh, yeah, you totally can.
- AEAlexander Embiricos
I don't know. I, I think it's really hard to say. Like, the more serious answer here is just, like, I think they've built a really successful business. We see them a lot more in enterprise.
- HSHarry Stebbings
Do you?
- AEAlexander Embiricos
I think one of the-- Yeah.
- HSHarry Stebbings
Or is it just Claude Code? 'Cause I, I don't know anyone that has-
- AEAlexander Embiricos
No, I see Cursor a lot more than Claude Code, and it makes sense to me. Like, my sort of narrative for this is that you, you have to meet people where they're at, right? And so for most people, like they're used to using an IDE. They're-- they've been used to using tab completion even before there was AI, right? Like, tab completion existed pre-AI, and then AI just made it better. And so I think what's like coolest about Cursor, from my perspective, is that it meets developers exactly where they are, and it's a sort of a switch. It's like you used to be using VS Code or something, switch to Cursor, almost nothing is broken about your workflow. Everything works, just certain aspects got better. And obviously, VS Code-- I, I still use VS Code. There's like reasons you might like it more. Um, and, you know, they're improving rapidly as well. But I think that pitch from Cursor lands well with a lot of people. And so, you know, the bet on Cursor, I think, is that they can, like, continue meeting people where they are and then, like, ladder into these more ad- advanced agentic features. Um, so you know, that, that relationship with the customer is valuable, and it's hard to-- I don't think that goes, you know, goes away.
- 54:12 – 55:51
Advice for the Next Gen of Engineers
- AEAlexander Embiricos
roles.
- HSHarry Stebbings
I'm a CS student, okay? And I'm at Stanford, I'm at Imperial, I'm at Cambridge, I'm wherever, ETH, great institution. What would you advise me, knowing all that you know now, that would help me navigate the next five years of my career? I want to be valuable to the AI ecosystem environment as an engineer entering the workforce in the next year.
- AEAlexander Embiricos
Basically, there's actually never been a better time to be an engineer... because you have incredible tooling available to you to get an incredible amount done, and your ability to, like, ramp into, like, a complex code base that you might be hired into has never been faster, because you can go ask AI, like, a ton of questions about the code base, and you can ask it to plan out changes that would otherwise take you, like, days to research maybe. So I think first off, I would say, like, you should be, like, very optimistic. But then, of course, that's about your abilities once you're at the job. So now the question is: how do you get the job? I think that because it's never been, like, easier to build things, the thing that becomes scarcer is, like, agency, taste, and, like, quality. And so I would urge you to, like, just build things and, you know, demonstrate your agency and your taste around what you build, and, like, build things that are of high quality, and then share those things. Like, you know, we get a lot of inbound from folks, um, you know, both applying for jobs through the careers page or also on social. This is just me, but when someone writes to me with, like, some interesting thoughts and, like, a link to an interesting project, that gets my attention much more than, like, a normal resume
- 55:51 – 57:26
Lessons from competitors (Claude Code)
- AEAlexander Embiricos
does.
- HSHarry Stebbings
Final question before we do a quick fire. What has Claude Code done well that you sit back and you learn from?
- AEAlexander Embiricos
Number of things. Um, so like I was saying, I think, uh, way back last year, they made something that was really easy to use and just, like, worked with all your tools with zero setup by running it locally in your terminal. And, um, when we, um, started investing much more in the Codex CLI and, you know, shipped great models for it, like GPT-5, our growth exploded. And so I think that idea of just, like, meet people where they're at, give them something easy to use, let them ramp from there, and, like, figure out how to use it, has been awesome. So, um, that's probably the biggest learning we've had from them.
- HSHarry Stebbings
What mistake do you think they made that you've also learned from, having had the benefit of seeing them make it?
- AEAlexander Embiricos
I think that they over-indexed on their initial success with their command line interface tool. Um, I think at the end of the day, it's, like, not the friendliest UI, and it makes it hard to extend beyond, like, pure builders, and it makes it, um, difficult to, like, truly delegate to agents because, like, effectively, to delegate through that kind of interface, you have to be, like, kind of a power user of, like, your terminal or Tmux or something. And so that's, that's why we built the app, and I think the market reception around the app, to me, like, it was kind of a risk when we started, but it makes me, like, really feel good about that decision because it's-- the app-- the Codex app is, like, a much more intuitive, uh, simple interface to, like, get started with. It's, like, less scary, but then it naturally leads you to this idea of, like, "I'm gonna take my hands off the keyboard and, like, delegate to the agent."
- HSHarry Stebbings
You mentioned Dropbox earlier. It's-- the,
- 57:26 – 1:00:39
Lessons from Dropbox and Slack
- HSHarry Stebbings
the alumni from Dropbox is incredible. I mean, really, like, amazing to see the talent that's come out of Dropbox. What is your single biggest lesson from Dropbox that has shaped some of your thinking now with OpenAI?
- AEAlexander Embiricos
Oh, I don't, I don't need to think about that one. That, that's kind of the thing I was telling you about earlier, right? Like, I think when you're building tooling for people, like, for end users, you have to think about, like, that tooling as a system of engagement, right? If people don't want to use your tool, if it doesn't, like, naturally feel like the easiest way to get something done, then people just won't use it, right? And so, like, again, I learned that from watching how Slack just absolutely took off. Um, and so I think about that a lot now when we're building these agents. I'm like, "If we build our agent purely as like, you know, workflow automation, then it's always gonna be like pulling teeth to get that thing started, right?" You're gonna need to hire Accenture or someone to come in. They're going to need to deploy FDEs. It's gonna be tough. But if you can build a system that, like, people just love using, even if they only use it for partial tasks, over time, they'll get better and better at using it. And then the-- you'll get connected to the tools you want over time, and then you can start laddering in automation. Obviously, these aren't mutually exclusive.
- HSHarry Stebbings
How on earth do you reinvigorate growth at Dropbox today?
- AEAlexander Embiricos
At least from when I was at Dropbox, the thing we were uniquely good at was desktop software.
- HSHarry Stebbings
Uh-huh.
- AEAlexander Embiricos
And desktop software is-- it's funny, it was never not back, but anyways, it's so back. Um, basically because if you're solving for productivity and knowledge work, um, yes, there are systems of record everywhere that you need to connect with, but everything, at the end of the day, happens on the user's computer, either in their browser or, you know, just, like, locally in apps on their computer. And so I do think that the f-- the, the fastest way we're gonna see productivity gains from agents at work is going to be, at first, meeting users on their computer, working with the stuff that they have available to them, you know, without having deployed FDEs to set anything up, and then over time, you'll connect in these various systems. And so if I was Dropbox, I'd be thinking about: How do we leverage our unique domain expertise in, like, building really good, like, desktop software, uh, and this sort of collaborative layer on top of your computer? How do we leverage that to enable productivity agents? It's a bit broad, but I think that's the angle you go for.
- HSHarry Stebbings
No, I love it, and I really appreciate the response. Final one before we do a quick-fire promise.
- AEAlexander Embiricos
Yeah.
- HSHarry Stebbings
I've been brought up in a world where margin matters. Software margins are wonderful, and it's what makes software a brilliant category to invest in. We're seeing margin profiles that are very different in i- inference-heavy players in particular. To what extent should I put that out of mind and appreciate that costs will come down, cost of tokens will come down, and actually, it's about usage and customer love, margins will come? Or, no, margins are actually fricking important, keep, keep that focus.
- AEAlexander Embiricos
I think both costs are gonna come down significantly, and I also think that, you know, if this is the year of agents being deployed, like, broadly at work, connected, then this is also the year where they're gonna have to be connected to all these various systems, and I think that's gonna be very sticky. And so I view this year as a race, and so I think you wanna win that race, and you should be okay tak-- you know, taking some hit to margin in the meantime.
- 1:00:39 – 1:07:39
Quick-Fire Round
- HSHarry Stebbings
Dude, quick fire round.
- AEAlexander Embiricos
Yeah.
- HSHarry Stebbings
So I say a short statement, you give me your immediate thoughts. Does that sound okay?
- AEAlexander Embiricos
Yeah.
- HSHarry Stebbings
... What have you changed your mind on most in the last twelve months?
- AEAlexander Embiricos
When I joined OpenAI (this was a little longer than twelve months ago), I thought that within a year we would all just be hanging out with our computers, screen sharing with an agent we're just talking to. That was completely wrong. The rate of progress in multimodal models, meaning models that work with video and audio, was slower than I expected. So instead, what happened was that we saw that agents that work with your computer through code are the way. And so for me, that's been a complete rethink of how we bring the benefits of AI to people generally. It's not through video and audio, primarily.
- HSHarry Stebbings
Which lesser-known competitor do you respect most, and why?
- AEAlexander Embiricos
The first one that came to mind was Amp. Um-
- HSHarry Stebbings
Amp?
- AEAlexander Embiricos
I think they're really-- Yeah, A-M-P.
- HSHarry Stebbings
Okay.
- AEAlexander Embiricos
It's out of the folks at Sourcegraph. Their product has a great reputation for punching way above its weight. But the other thing I really respect is that they helped initiate this whole standardization around agents.md and .agent/skills, which is what I was saying earlier about making it easier for users to manage all these different agents they're trying. We obviously put out agents.md, but they put out agent.md, and basically Quinn started this all by putting out a tweet that said, "Hey, if you guys buy the domain agents.md, we'll standardize to your spelling." And as small as that was, it initiated this whole standardization that I think has been awesome in the community.
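[Editor's note: for readers unfamiliar with the convention being discussed, AGENTS.md is a plain Markdown file, usually at the repository root, that coding agents read for project-specific instructions. The sections and commands below are an illustrative sketch, not a required schema.]

```markdown
# AGENTS.md

## Project overview
A TypeScript monorepo; packages live under `packages/`.

## Setup
- Install dependencies with `pnpm install`.
- Run the full test suite with `pnpm test` before proposing changes.

## Conventions
- Follow the existing ESLint config; do not add new lint rules.
- Keep each change scoped to a single package.
```

Because it is plain Markdown with no required schema, any agent (Codex, Amp, or others) can read the same file, which is what made standardizing on a single filename valuable.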
- HSHarry Stebbings
Do you think the response to Anthropic's ads was the right response?
- AEAlexander Embiricos
I mean, there were so many different responses. The one that resonated with me was: well, one company's being pretty negative about the future, and the other company, us, OpenAI, is being really positive and just telling people they can build things and to dream. I thought that response was brilliant.
- HSHarry Stebbings
I mean, we saw him write an essay. [chuckles] Do you think it was a good response?
- AEAlexander Embiricos
I think so. One of the things I love about OpenAI is that people are very unapologetically and authentically themselves.
- HSHarry Stebbings
Hmm.
- AEAlexander Embiricos
And so for me, that was just a very authentic response, and I like that we do that.
- HSHarry Stebbings
What's the hardest product decision you've had to make since being at Codex?
- AEAlexander Embiricos
Well, I can tell you the most painful product decision we had to make.
- HSHarry Stebbings
Great.
- AEAlexander Embiricos
For a while, Codex Cloud was effectively unlimited. Not free, like you needed to pay for ChatGPT, but then you had unlimited usage. And every day that we left it that way, we knew it would be harder to wind back.
- HSHarry Stebbings
Yeah.
- AEAlexander Embiricos
But we were just so focused on competing on our other things that had more PMF that we kind of punted on that decision. And when we wound back that unlimited use to a more reasonable limit, there was a lot of blowback from users. It was a very small minority of users who thought everything should be kind of pseudo-free forever, but that blowback affected us everywhere, because the social chatter doesn't really distinguish between these things. So the lesson I learned the hard way there is: you can't make things unlimited for too long.
- HSHarry Stebbings
Sunsetting pricing, grandfathering pricing is just-
- AEAlexander Embiricos
Yeah
- HSHarry Stebbings
... it's such a hard thing. What do we do today in engineering or product that in five years' time you'll look back on and go, "Oh, my God, can you believe that we did that?"
- AEAlexander Embiricos
Well, one is just editing code by hand. Another one, and this is maybe spicier, might even be managing the deployment and monitoring of systems by hand. I basically think big companies will take a long time to deploy this, but many start-ups might actually start building on a completely new stack that's fully AI-managed. To be clear, that stack doesn't exist yet, but it would be built to give you really strong deterministic guardrails over what the agent can do, and control to roll back deploys and everything like that. And so we'll get to a world where the way you start a company is you start by getting an agent and just asking it to build things, then you get more agents after that, and then maybe eventually you add your co-founders to this service that you use to work with agents. So maybe your main communication tool is actually your agent communication tool, and you're not handholding this very painful CI and deploy process; you're just having agents do things.
- HSHarry Stebbings
Weird question, but I'm intrigued. Are you the one providing agent guardrails? What I mean by that is: your agents can go anywhere within the enterprise. Are you responsible for providing those guardrails, or is there a third-party provider who is saying, "Hey, whoa, Alex, you can't go into that, that's human resources," or, "You can't go into that, that's marketing"? How do you think about guardrail provisioning, and is that the role of the agent provider or a third-party provider?
- AEAlexander Embiricos
I think we'll probably see both. We are putting a lot of effort into agent guardrails. Like I said, we're basically the only company that cares about OS-level sandboxing for coding agents. For instance, none exists on Windows; we're the ones building that, and we're doing it in open source, so hopefully other people can use it. ChatGPT supports connectors, so you can talk to, say, your Google Docs, and we put a lot of effort into guardrails around what the agent can do with your Google Docs. Those are just two examples, but we think a lot about this. Probably, though, what we do will not be sufficient. There'll be third parties who provide very bespoke things for very bespoke company needs, and there'll probably be a mix of both.
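[Editor's note: as a toy illustration of the guardrail idea, here is a minimal, hypothetical path-allowlist check an agent runner might apply to file operations. This is not how Codex's OS-level sandboxing actually works; all names below are invented for illustration.]

```python
from pathlib import Path

# Hypothetical guardrail: the agent may only touch files under
# explicitly approved roots, e.g. the project workspace.
ALLOWED_ROOTS = [Path("/workspace/project")]

def is_path_allowed(path: str) -> bool:
    """Return True if `path` normalizes to a location inside an approved root."""
    # resolve() collapses ".." segments, so traversal tricks like
    # "/workspace/project/../../etc/passwd" are rejected.
    target = Path(path).resolve()
    return any(target.is_relative_to(root.resolve()) for root in ALLOWED_ROOTS)

assert is_path_allowed("/workspace/project/src/main.py")
assert not is_path_allowed("/etc/passwd")
assert not is_path_allowed("/workspace/project/../../etc/passwd")
```

True OS-level sandboxing (e.g. seccomp on Linux or Seatbelt on macOS) enforces restrictions beneath the process rather than trusting the runner's own checks, which is why it is the stronger guarantee.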
Episode duration: 1:08:04