Lenny's Podcast | Alexander Embiricos: How Codex grew 20x by moving into IDEs
How Codex left the asynchronous cloud for IDE extensions and terminal sandboxes; reviewing agent-written code is the new bottleneck, not compute.
EVERY SPOKEN WORD
150 min read · 30,073 words
- 0:00 – 5:13
Introduction to Alexander Embiricos
- LRLenny Rachitsky
You lead work on Codex.
- AEAlexander Embiricos
Codex is OpenAI's coding agent. We think of Codex as just the beginning of a software engineering teammate. It's a bit like this really smart intern that refuses to read Slack, doesn't check Datadog unless you ask it to.
- LRLenny Rachitsky
I remember Karpathy tweeted the gnarliest bugs that he runs into, that he just spends hours trying to figure out, that nothing else can solve, he gives it to Codex, lets it run for an hour and it solves it.
- AEAlexander Embiricos
Starting to see glimpses of the future where we're actually starting to have Codex be on call for its own training. Codex writes a lot of the code that helps, like, manage its training run, the key infrastructure, and so we have, uh, Codex code review is, like, catching a lot of mistakes. It's actually caught some, like, pretty interesting configuration mistakes. One of the most mind-blowing examples of acceleration is the Sora Android app, like a fully new app. We built it in 18 days and then 10 days later, so 28 days total, we went to the public.
- LRLenny Rachitsky
How do you think you win in the space?
- AEAlexander Embiricos
One of our major goals with Codex is to get to proactivity. If we're going to build a super assistant, it has to be able to do things. One of the learnings over the past year is that for models to do stuff, they are much more effective when they can use a computer. It turns out the best way for models to use computers is simply to write code. And so we're kind of getting to this idea where if you want to build any agent, maybe you should be building a coding agent.
- LRLenny Rachitsky
When you think about progress on Codex, I imagine you have a bunch of evals and there's all these public benchmarks.
- AEAlexander Embiricos
A few of us are, like, constantly on Reddit. You know, there's, uh, there's praise up there and there's a lot of complaints. What we can do is, we as a product team just try to always think about how are we building a tool so that it feels like we're maximally accelerating people, rather than building a tool that makes it more unclear what you should do as the human being.
- LRLenny Rachitsky
At OpenAI, I can't not ask about how far you think we are from AGI.
- AEAlexander Embiricos
The current underappreciated limiting factor is literally human typing speed or human multitasking speed.
- LRLenny Rachitsky
Today, my guest is Alexander Embiricos, product lead for Codex, OpenAI's incredibly popular and powerful coding agent. In the words of Nick Turley, head of ChatGPT and former podcast guest, "Alex is one of my all-time favorite humans I've ever worked with, and bringing him and his company into OpenAI ended up being one of the best decisions we've ever made." Similarly, Kevin Weil, OpenAI CPO, said, "Alex is simply the best." In our conversation, we chat about what it's truly like to build product at OpenAI, how Codex allowed the Sora team to ship the Sora app, which became the number one app in the App Store in under one month, also the 20X growth Codex is seeing right now and what they did to make it so good at coding, why his team is now focused on making it easier to review code, not just write code, his AGI timelines, his thoughts on when AI agents will actually be really useful, and so much more. A huge thank you to Ed Bayes, Nick Turley, and Dennis Yang for suggesting topics for this conversation. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. And if you become an annual subscriber of my newsletter, you get a year free of 19 incredible products, including a year free of Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatGPT, Mobbin, PostHog, and Stripe Atlas. Head on over to lennysnewsletter.com and click Product Pass. With that, I bring you Alexander Embiricos after a short word from our sponsors. Here's a puzzle for you. What do OpenAI, Cursor, Perplexity, Vercel, Plaid, and hundreds of other winning companies have in common? The answer is they're all powered by today's sponsor, WorkOS. If you're building software for enterprises, you've probably felt the pain of integrating single sign-on, SCIM, RBAC, audit logs, and other features required by big customers. 
WorkOS turns those deal blockers into drop-in APIs with a modern developer platform built specifically for B2B SaaS. Whether you're a seed stage startup trying to land your first enterprise customer or a unicorn expanding globally, WorkOS is the fastest path to becoming enterprise ready and unlocking growth. They're essentially Stripe for enterprise features. Visit workos.com to get started or just hit up their Slack support where they have real engineers in there who answer your questions super fast. WorkOS allows you to build like the best with delightful APIs, comprehensive docs, and a smooth developer experience. Go to workos.com to make your app enterprise-ready today. This episode is brought to you by Fin, the number one AI agent for customer service. If your customer support tickets are piling up, then you need Fin. Fin is the highest performing AI agent on the market with a 65% average resolution rate. Fin resolves even the most complex customer queries. No other AI agent performs better. In head-to-head bake-offs with competitors, Fin wins every time. Yes, switching to a new tool can be scary, but Fin works on any help desk with no migration needed, which means you don't have to overhaul your current system or deal with delays in service for your customers. And Fin is trusted by over 6,000 customer service leaders and top companies like Anthropic, Shutterstock, Synthesia, Clay, Vanta, Lovable, monday.com, and more. Because Fin is powered by the Fin AI engine, which is a continuously improving system that allows you to analyze, train, test, and deploy with ease, Fin can continuously improve your results too. So if you're ready to transform your customer service and scale your support, give Fin a try for only 99 cents per resolution. Plus, Fin comes with a 90-day money back guarantee. Find out how Fin can work for your team at fin.ai/lenny. That's fin.ai/lenny.
- 5:13 – 11:34
The speed and ambition at OpenAI
- LRLenny Rachitsky
Alexander, thank you so much for being here and welcome to the podcast.
- AEAlexander Embiricos
Thank you so much. I've been following you for ages and I'm excited to be here.
- LRLenny Rachitsky
I'm even more excited. I really appreciate that. I want to start with your time at OpenAI. So you joined OpenAI about a year ago. Before that you had your own startup for about five years. Before that you were a product manager at Dropbox. I imagine OpenAI is very different from every other place you've worked. Let me just ask you this. What is most different about how OpenAI operates and what's something that you've learned there that you think you're going to take with you wherever you go, assuming you ever leave?
- AEAlexander Embiricos
By far I would say the speed and ambition of working at OpenAI are just, like, dramatically more than what I can imagine. And, you know, I guess it's kind of an embarrassing thing to say because you, you know, everyone who's a startup founder thinks like, "Oh yeah, my startup moved super fast and the talent bar is super high and we're super ambitious." But-I have to say, like, working at OpenAI just kind of, like, made me reimagine what e- what that even means.
- LRLenny Rachitsky
We hear this a lot about, you know. It feels like every AI company is just like, "Oh my God, I can't believe how fast they're moving." Is there an example of just, like, "Wow, that wouldn't have happened this quickly anywhere else"?
- AEAlexander Embiricos
The most obvious thing that comes to mind is just, like, the- the explosive growth of Codex itself. I think it's been a while since we bumped our external number, but like, you know, it's like the- the 10x-ing of Codex's scale was just, like, super fast in a matter of months, and it's, like, well more since then. And you know, like, once you've lived through that, or at least speaking for myself, like, having lived through that now, I feel like any time I'm gonna spend my time on, like, you know, building tech product, there's that kind of, that speed and scale that I now need to- to- to meet. If I think of, like, what I was doing in my startup, it moved, like, way slower. And I, you know, there's always this balance with startups of, like, how much do you commit to an idea that you have versus, like, find out that it's not working, uh, and then pivot. But I think one thing I've realized at OpenAI is, like, the- the amount of impact that we can have and, in fact, need to have to do a good job is so high that it- it's- I have to be, like, way more ruthless with how I spend my time now.
- LRLenny Rachitsky
Before we get to Codex, is there a way that they've structured the org or, I don't know, the way that OpenAI operates that allows the team to move this quickly? Because everyone- everyone wants to move super fast. I imagine there's a structural, uh, approach to allowing this to happen.
- AEAlexander Embiricos
I mean, so one thing is just the technology that we're building with has, like, just transformed so many things, you know, from like, both how we build but also, like, what kinds of things we can enable, uh, for users. And, you know, we spend most of our time talking about, like, the sort of improvements in the foundation models. But I- I believe that even if we had no more progress today with models, which is absolutely not the case, but if- even if we had no more progress, we are way behind on product. There's so much more product to build.
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
So, I think like- just like the moment is ripe, if that makes sense.
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
But I think there's a lot of sort of counterintuitive things that surprised me when I arrived as far as, like, how things are structured. One example that comes to mind is, like, when I was working on my startup and- and before that when I was at Dropbox, it was, like, very important, you know, especially as a PM, to, like, always kind of rally the ship. And it was kind of like, make sure you're pointed in the right direction and then you can, like, accelerate in that direction. But here, I think because we don't exactly know, like, what capabilities will even come up soon and we don't know what's going to work, uh, technically, and then we also don't know what's gonna land even if it works technically, it's much more important for us to be very, like, humble and learn a lot more empirically and just try things quickly. And so like, the org is- is set up in that way to- to be incredibly bottoms-up. You know, this is again one of those things that like, as you were saying, everyone wants to move fast. I think everyone likes to say that they're bottoms-up, or at least a lot of people do. But OpenAI is, like, truly, truly bottoms-up and that's, like, been a learning experience for me. That now, like, it's- it'll be interesting if I ever work at, like... I don't think it'll ever th- th- it'll even make sense to work at a non-AI company in the future. I don't even know what that means. But if I were to imagine it or go back in time, I think I would, like, run things t- totally differently.
- LRLenny Rachitsky
What I'm hearing is kind of this, uh, ready, fire, aim, uh, is the approach more than ready, aim, fire. And there's something, uh, a- and as you process that (laughs), uh, 'cause that may not come across well, but I actually have heard this a lot at AI companies is, because you don't know, and Nick Turley shared I think the same sentiment, because you don't know how people will use it, it doesn't make sense to spend a lot of time making it perfect. It's better to just get it out there in a primordial way, see how people use it, and then go big on that use case.
- AEAlexander Embiricos
Yeah. It's like to... Okay, to use this analogy a little bit, I feel like there- there is an aim component, but the aim component is much fuzzier. You know, it's kind of like roughly what do we think can happen? Like, someone I've learned a ton from working with here is, uh, a research lead, and he likes to say that like, at OpenAI we can- can have really good conversations about something that's like a year plus from now, and you know, there's a lot of ambiguity in what will happen, but- but like that's a ripe sort of timeline. And then we can have really good conversations about what's happening like in like low months or low... or weeks. But there's kind of this like awkward middle ground which is like as you start approaching a year but you're not at a year where it's, like, very difficult to reason about, right? And so as far as, like, aiming, I think we want to know like, okay, what are some of the futures that we're trying to build towards? And like a lot of the problems we're dealing with in AI, like such as alignment, are problems you need to be thinking out like really far out into the future. So, we're kind of aiming fuzzily there. But when it comes down to the more tactically like, "Oh yeah, like what product will we build and therefore how will people use that product?" That's the place where we're much more like, "Let's find out empirically."
- LRLenny Rachitsky
That's a good way of putting it. Something else that when people hear this, they... People sometimes hear companies like yours saying, "Okay, we're gonna be bottoms-up. We're gonna try a bunch of stuff. We're not gonna have exactly a plan of where it's going in the next few months." The key is you all hire the best people in the world. And so that feels like a really key ingredient in order to be this successful at bottoms-up work. (laughs)
- AEAlexander Embiricos
It just super resonates, basically. Um, I was just like, again, surprised or even shocked when I arrived at, like, the level of, like, individual, like, drive and, like, autonomy that everyone here has. So I think, like, the way that OpenAI runs, like many... Y- you can't, like, read this or be on... listen to a podcast and be like, "I'm- I'm just gonna deploy this to my company." Um, you know, maybe this is a harsh thing to say but I think like, yeah, very few companies have the talent caliber to be able to do that. So, it might need to be, like, adjusted if you were gonna
- 11:34 – 15:43
Codex: OpenAI’s coding agent
- AEAlexander Embiricos
implement this.
- LRLenny Rachitsky
Okay, so let's talk Codex. You lead work on Codex. How is Codex going? What numbers can you share? Is there anything you can share there? Also just not everyone knows exactly what Codex is. Explain what Codex is.
- AEAlexander Embiricos
Totally, yeah. So, uh, I have the very lucky job of- of living in the future and leading products on Codex. Um, and Codex is OpenAI's coding agent. So super concretely, that means it's an IDE extension, like a VS Code extension, uh, that you can install, or a terminal tool that you can install. And when you do so, you can then basically pair with Codex to answer questions about code, write code,... uh, you know, run tests, execute code, and do a bunch of the work in sort of that, like, thick middle section of the software development life cycle, which is all about, uh, you know, writing code that you're gonna get into production. Uh, more broadly, we think of Codex as, like, it's c- that what it currently is, is just the beginning of a software engineering teammate. And so, you know, when we, when you, when we use a big word like teammate, like, some of the things we're imagining are that it's not only able to, to write code but actually it participates, like, early on in, like, the ideation and planning phases of writing software and then further downstream in terms of, like, validation, deploying, and, like, maintaining code. To make that a little more fun, like, one thing I like to imagine is, like, if you think of what Codex is today, it's a bit like this, like, really smart intern that, like, refuses to read Slack and, like, doesn't check Datadog or, like, Sentry unless you ask it to. And so, like, no matter how smart it is, like, how much are you gonna trust it to write code without you also working with it, right? So that's how people use it mostly today is they pair with it. But we wanna get to the point where, you know, it can work, like, just like a new intern that you hire you don't only ask them to write code, but you ask them to participate across the cycle. And so you know that, like, even if they don't get something right the first try, they're eventually gonna be able to iterate their way there.
- LRLenny Rachitsky
I thought the way, uh...
- AEAlexander Embiricos
Um-
- LRLenny Rachitsky
I thought the point about not reading Slack and Datadog was it's just not distracted, it's just constantly focused and is always in flow. But I get what you're saying there is it doesn't have all the context on everything-
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
... that's going on.
- AEAlexander Embiricos
And, like, that's not only true when it's performing a task, but again, if you think of, like, the best teammate and teammates, like, you don't tell them what to do, right? Like, maybe when you first hire them you have, like, a couple meetings and you're like, "Hey," like, you kinda learn, like, "Okay, this is, this, these prompts work for this teammate. These prompts don't," right? "This is how to communicate with this person." Then eventually you give them some starter tasks, you delegate a few tasks. But then eventually you just say like, "Hey, great, okay, you're working with this set of people in this area of the code base. You know, feel free to work with other people in other parts of the code base too even. And, uh, yeah, you tell me what you think makes sense to be done," right? And so, you know, we think of this as, like, proactivity and, like, one of our major goals with Codex is to, like, get to proactivity. I think this is... this is, like, critically important to, like, achieve the mission of OpenAI, which is to deliver the benefits of AGI to all humanity. You know, I like to joke today that, like, AI products, and it's, it's a half joke, they're actually, like, really hard to use because you have to, like, be very thoughtful about when it could help you. And if you're not prompting a model to help you, it's probably not helping you at that time. And if you think of how many times, like, the average user is prompting an AI today, it's probably like tens of times, but if you think of how many times people could actually get benefit from a really intelligent entity, it's thousands of times per day. And so a lar- a large part of our, our goal with Codex is to figure out, like, what is the shape of an actual teammate agent that is sort of helpful by default?
- LRLenny Rachitsky
When people think about Cursor and, uh, even Claude Code, it, it's like an IDE that helps you code and kind of autocompletes code and maybe does some agentic work. What I'm hearing here is the vision is, is different, which is it's a teammate, it's like a remote teammate building code for you that you talk to and ask to do things, and it also does I- IDE, autocomplete, and things like that. Is that, is that a, kind of a differentiator in the way you think about Codex?
- AEAlexander Embiricos
It's basically this idea that, like, we want the way, like... If you're a developer and you're trying to get something done, we want you to just feel like you have superpowers and you're able to move much, much faster, but we don't think that in order for you to reap those benefits you need to be sitting there constantly thinking about, like, "How can I invoke AI at this point to do this thing?" We want you to be able to sort of, like, plug it in to the way that you work and have it just start to do stuff without you having to
- 15:43 – 24:59
Codex’s explosive growth
- AEAlexander Embiricos
think about it.
- LRLenny Rachitsky
Okay. And I have a lot of questions along those lines, but, uh, just how's it going? Is there any stats, any numbers you can share about how Codex is doing?
- AEAlexander Embiricos
Yeah, it's been... Codex has been growing, like, absolutely explosively, um, since the launch of GPT-5 back in August. Um, there's some, definitely some interesting, like, product insights to talk about as to, like, how we unlock that growth if you're interested. But, you know, and the la- the last stat we shared there was, like, we, we were like well over 10X since August. In fact, it's been like 20X since then. Um, also, the Codex models are each serving many trillions of tokens a week now, and it's basically, like, our most served coding model. Um, one of the really cool things that we've seen is that the way that we decided to set up the Codex team, uh, was to build a, you know, really tightly integrated product and research team that are iterating on the model and the harness together. And it turns out that lets you just do a lot more and try many more experiments as to how these things will work together. And so we were just training these models for use in our first-party harness that we were very opinionated about. And then what we've started to see more recently actually is that other major sort of API coding customers are now starting to adopt these models as well, and so we've reached the point where actually the Codex model is the most served coding model in the API as well.
- LRLenny Rachitsky
You, uh, hinted at this, uh, w- what unlocked this growth. I am extremely interested in hearing that. It felt like before, I don't know, maybe this was before you joined the team, it just felt like Claude Code was killing it, just everyone was sitting on top of Claude Code. It was by far the best way to code. And then all of a sudden Codex comes around. I remember Karpathy tweeted that he just, like, has never seen a model like this. He f- uh, I think the tweet was the gnarliest bugs that he runs into that he just spends hours trying to figure out, that nothing else can solve, he gives it to Codex, lets it run for an hour, and it solves it. What di- what did you guys do?
- AEAlexander Embiricos
We have this strong sort of mission here at OpenAI to, you know, basically to build AGI. Um, and so we s- we think a lot about what... how can we shape the product so that it can scale, right? You know, o- earlier I was mentioning like, "Hey, like, if you're an engineer you should be getting help from an, from AI like thousands of times per day," right? And so we thought a lot about the primitives for that when we launched our first version of Codex, uh, which was Codex Cloud. And that was basically a product that had its own computer, it lived in the cloud, you could delegate to it, and, you know, the, sort of the coolest part about that was you could run many, many tasks in parallel. But some of the challenges that we saw are that it's a little bit harder...... to set that up, both in terms of, like, environment configuration, like giving the model the tools it needs to validate its changes, and to learn how to prompt in that way. And sort of my min- my analogy for this is going back to this teammate analogy. It's like if you hired a teammate but you're never allowed to get on a call with them, and you can only go back and forth, you know, asynchronously over time. Like, that works for some teammates, and eventually that's actually how you wanna spend most of your time, so that's still the future. But it's hard to initially adopt, uh, th- and so we still have that vision of, like, that's what we're trying to get you to, a teammate that you delegate to and that is proactive, and we're seeing that growing. But the key unlock is actually first you need to land with users in a way that's, like, much more intuitive and, like, trivial to get value from. So, the way that most people discover, like the vast majority of users discover Codex today is either they download an IDE extension or they run it in their CLI, and the agent works there with you on your computer interactively. 
And, um, it works within a sandbox, which is actually, like, a really cool piece of tech to s- to, to help that be safe and secure. But it has access to all those dependencies, so if the agent needs to do something, like it needs to run a command, it can do so within the sandbox so you don't have to set up any environment. And if it's a command that doesn't work in the sandbox, it can just ask you, and so you can get into this, like, really strong feedback loop using the model. And then over time, like, our team's job is to, like, help turn that feedback loop into you sort of, as a byproduct of using the product, configuring it so that you can then be delegating to it down the line. And again, analogy, you know, keep going back to it, but, like, if you hire a teammate and you ask them to do work but they... you just give them, like, a fresh computer from the store, it's gonna be hard for them to do their job, right?
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
But if as you work with them side by side you can be like, "Oh, you don't have a password for this service we use. Like, here's the password for this service. You know, yeah, don't worry, feel free to run this command," then it's, like, much easier for them to then go off and do work for hours without you.
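The sandbox-with-fallback loop described above can be sketched in a few lines of Python. To be clear, this is an illustrative sketch only: the allowlist policy, the `approve()` callback, and the function names are assumptions for the example, not Codex's actual sandbox implementation.

```python
# Sketch of "run commands in a sandbox; escalate to the human when the
# sandbox says no." Policy, names, and the approve() hook are illustrative.
import shlex
import subprocess

SANDBOX_ALLOWED = {"ls", "cat", "git", "pytest"}  # illustrative allowlist


def run_in_sandbox(command: str, approve=lambda cmd: False) -> str:
    """Run a command if sandbox policy permits it; otherwise ask the user."""
    argv = shlex.split(command)
    if argv and argv[0] in SANDBOX_ALLOWED:
        # Inside the sandbox: no human in the loop needed.
        return subprocess.run(argv, capture_output=True, text=True).stdout
    # Outside the sandbox: escalate to the human, like the agent asking
    # "feel free to run this command?" in the conversation above.
    if approve(command):
        return subprocess.run(argv, capture_output=True, text=True).stdout
    return f"blocked: {command!r} needs approval"


print(run_in_sandbox("make deploy"))  # → blocked: 'make deploy' needs approval
```

The design point the guest is making is the feedback loop: each approval is a chance to teach the harness about your environment, so delegation gets safer over time.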
- LRLenny Rachitsky
So, what I'm hearing is the initial version of Codex was almost too far in the future. It's like a remote, in the cloud, uh, agent that's coding for you asynchronously. And what you did is, okay, let's actually s- come back a little bit. Let's integrate into the way engineers already integrate into IDEs and locally, and help them kinda on-ramp to this new world.
- AEAlexander Embiricos
Totally. And if this was... it was quite interesting because we, we dogfood the product a ton at OpenAI. So, you know, dogfood can be as in we use our own product. And so Codex has been accelerating OpenAI over the course of the entire year, and the cloud product was a massive accelerant to the company as well. Um, it just turns out that this was one of those places where the signal we got from dogfooding is a little bit different from the signal you get from, like, the general market. Because at OpenAI, you know, we train reasoning models all day, and so we're very used to this kind of prompting and, like, you know, think up front, run things massively in parallel, and, uh, you know, it would take some time and then come back to it later asynchronously. And so, you know, now when we build, we still get a t- a ton of signal from dogfooding internally but, uh, you know, we're also very cognizant of, like, the different ways that different audiences use the product.
- LRLenny Rachitsky
That's really funny. It's like, live in the future but maybe not too far in the future, and I could see how everyone at OpenAI is living very far in the future, and sometimes that won't, that won't (laughs) work for everyone.
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
What about just, like, uh, intelligence training data? I don't know, is there something else that helped Codex accelerate its ability to actually code? Is it, like, better cleaner data? Is it more just models advancing? Is there anything else that really helped accelerate it?
- AEAlexander Embiricos
Yeah, so there's, like, a few components here. Um, I guess, you know, you were mentioning models, and the models have improved a ton. In fact, um, just last Wednesday we shipped GPT-5.1-Codex-Max, a very, you know, accurately named model. (laughs) Uh, that is, that is awesome. It is awesome both because it is, um, for any given task that you were using GPT-5.1-Codex for, it's like, you know, roughly, uh, 30% faster at accomplishing that task. But also it unlocks a ton of intelligence, so if you use it at our higher reasoning levels, it's just, like, even smarter. Um, and, you know, that, that feedback they... or that tweet you were saying that Karpathy made about, like, "Hey, give us your gnarliest bugs," like, you know, obviously there's a, a ton going on in the market right now but, like, Codex Max is definitely, like, carrying that mantle of, uh, you know, tackling the hardest bugs. Um, so that is, that is super cool. But I will say it's, like, some of wa- how we're thinking about this is evolving a little bit from being like, "Yeah, we're just gonna think about the model and, like, let's just, like, train the best model," to really thinking about, like, what is an agent actually overall, right? And, you know, I'm not gonna try to define agent exactly, but at least the stack that we think of it as having is it's like you have this model, really smart reasoning model, that knows how to do a specific kinda task really well, so we can talk about how we make that possible. But then actually we need to serve that model through an API into a harness, and both of those things also have a really big role here. So for instance, one of the things, uh, that we're really proud of is you can have GPT-5.1-Codex-Max work for really long periods of time. That's not, like, normal but you can set it up to do that or that might happen, but now routinely we'll hear about people saying like, "Yeah, it ran, like, overnight," or, "It ran for 24 hours."
- LRLenny Rachitsky
Hmm.
- AEAlexander Embiricos
And so, you know, for a model to work continuously for that amount of time, it's gonna exceed its context window, and so we have a solution for that which we call compaction. Um, but compaction is actually s- a feature that uses, like, all three layers of that stack, so you need to have a model that has a concept of compaction and knows, like, "Okay, as I start to approach this context window, I might be asked to, like, prepare to be run in a new context window." And then at the API layer you need an API that, like, understands this concept and, like, has an endpoint that you can hit to do this change, and at the harness layer you need a harness that can, like, prepare the payload for this to be done. And so, like, shipping this compaction feature that now just, like, made this behavior possible to, like, anyone using Codex, actually meant working across all three things, and I think that's, like, increasingly gonna be true. Another maybe, like, underappreciated version of this is, is if you think about all the different coding products out there, they all have, like, very different tool harnesses with, like, very different opinions on how the model should work.And so if you wanna train a model to be good at, like, all the different ways, uh, it could work, like, you know, maybe you have a strong opinion that it should work using semantic search, right? Maybe you have a strong opinion that it should, like, call bespoke tools. Or maybe you have, like in our case, a strong opinion that it should just use, like, the shell, work in the terminal. You know, you can be much, you can move much faster if you're just optimizing for one of those worlds, right?
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
And so the way that we built Codex is that it just uses the shell, but in order to make that, like, safer and secure, we, uh, have a sandbox that the model is used to operating in. So I think one of the biggest accelerants, to go all the way back to your original question, is just like, we're building all three things in parallel and like kind of tuning each one and, um, you know, constantly experimenting with how those things work with like a tightly integrated product and research team.
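The compaction behavior described a little earlier, where an agent approaching its context window summarizes its own history and continues in a fresh context, can be sketched roughly as follows. The token counter, the 80% threshold, and the `summarize()` step here are stand-ins invented for the sketch; as the guest notes, the real feature spans the model, the API, and the harness.

```python
# Minimal sketch of context-window compaction for a long-running agent.
# All numbers and helpers below are illustrative stand-ins.

CONTEXT_LIMIT = 100       # illustrative token budget
COMPACT_THRESHOLD = 0.8   # compact when 80% of the window is used


def count_tokens(messages):
    # Crude stand-in for a real tokenizer: one token per word.
    return sum(len(m.split()) for m in messages)


def summarize(messages):
    # Stand-in for asking the model to compact its own history.
    return "SUMMARY(" + str(len(messages)) + " messages)"


def append_with_compaction(history, new_message):
    """Append a turn; if the transcript nears the limit, replace it
    with a summary so the run can continue in a fresh context."""
    history = history + [new_message]
    if count_tokens(history) > CONTEXT_LIMIT * COMPACT_THRESHOLD:
        history = [summarize(history)]  # fresh context seeded with a summary
    return history


history = []
for step in range(50):
    history = append_with_compaction(history, f"step {step}: ran tests, edited files")
print(len(history))  # the transcript stays small no matter how long the run is
```

The point of the three-layer framing is visible even in this toy: the summarization quality is a model problem, triggering and transporting the summary is an API problem, and assembling the new payload is a harness problem.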
- 24:59 – 33:11
The future of AI and coding agents
- LRLenny Rachitsky
How do you think you win in this space? Do you think it, it'll event- it'll always be this kind of, like, race with other models constantly kind of leapfrogging each other? Do you think there's a world where someone just, like, runs away with it and no one else can ever catch up? Is there, like, a path to just, we win?
- AEAlexander Embiricos
Again, comes back to this idea of like building a teammate and not just a teammate that, you know, uh, participates in team planning and prioritization. Not just a teammate that, you know, really tests its code and like helps you maintain and deploy it. But even a teammate, you know, like if you think, again, an engineering teammate, they can also like schedule a calendar invite, right? Or move stand up or do whatever, right? And so in my mind, if we just imagine that every day or every week some like crazy new capability is just gonna be deployed by a research lab, it's just impossible for us, like, you know, as humans to keep up and like use all this technology. And so I think we need to get to this world where you kind of just have like an AI teammate or super assistant that you just talk to and it just knows how to be helpful, like on its own, right? And so you don't, you don't have to be like reading the latest tips for how to use it. You're just like, you plugged it in and it just provides help. And so that's kind of the shape of what I think we're building and I think that will be like a very sticky, like winning product if we can do so. So the shape that in, in my head at least I have is that we build, you know, maybe a c- fun topic is like, is chat the right interface for AI? I actually think chat is a very good interface when you don't know what you're supposed to use it for. Uh, in the same way that if I think of like, I'm like on MS Teams or in Slack with a teammate, chat is pretty good. I can ask for whatever I want, right? It's like, it's kind of the, the common denominator for everything. So you can chat with the super assistant about whatever topic you want, whether it be coding or not. And then if you are like a functional expert in a specific domain such as coding, there's like a GUI that you can pull up to go really deep and like look at the code and like work with the code.
So I think like what we need to build as OpenAI is basically this idea of like, you have chat, ChatGPT, and that is a tool that's like ubiquitously available to like everyone. You start using it even like outside of work, right? To just help you, you become very comfortable with the idea of being accelerated with AI. And so then you get to work and you just can naturally just, yeah, I'm just gonna ask it for this and I don't need to know about all the connectors or like all the different features. I'm just gonna ask it for help and it'll surface to me the, the best way that it can help at this point in time. And maybe even chime in when I didn't ask it for help. Um, so in my mind, if we can get to that, I think that's, you know, that's how we, we really build like the winning product.
- LRLenny Rachitsky
This is so interesting because in my chat with Nick Turley, the head of ChatGPT, I think he shared that the original name for ChatGPT was Super Assistant or something like that.
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
And it's interesting that there's like that approach to the Super Assistant and then there's this Codex approach. It's almost like the B2C version and the B2B version. And what I'm hearing is the idea here is okay, you start with coding and building and then it's doing all this other stuff for you. Scheduling meetings, I don't know, probably posting in Slack. Uh, I don't know, shipping designs. I don't know. Is that, is the idea there this is like the, the business version of ChatGPT in a sense? Or is there, or, or is there something else there?
- AEAlexander Embiricos
Yeah, so you know, so we're getting to the like, uh, the like one year time horizon conversation.
- LRLenny Rachitsky
(laughs)
- AEAlexander Embiricos
A lot of this might happen sooner, but in terms of fuzziness, I think we're at the one year, so I'll give you like a contention and like a plausible way we get there. But as for how it happens, who knows?
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
So basically if we're gonna build a super assistant, it has to be able to do things, right? So like, we're gonna have a model and it's gonna be able to do stuff affecting your world. And one of the learnings I think we've seen over the past year or so is that for models to do stuff, they are much more effective when they can use a computer. Right? Okay. So now we're like, okay, we need the super assistant that can use a computer, right? Or many computers. And now the question is, okay, well how should it use the computer, right? And there's lots of ways to use a computer. Uh, you know, you could try to hack the OS and like use accessibility APIs. Maybe a bit easier is you could point and click. That's a little slow, you know, and, uh, unpredictable sometimes. Um, and another way it turns out the best way for models to use computers is simply to write code, right? And so we're kind of getting to this idea where like, well if you wanna build any agent, maybe you should be building a coding agent. And maybe to the user, a non-technical user, they won't even know they're using a coding agent the same way that no one thinks about are they using the internet or not? It's just, they're more just like, is wifi on, right? So I think that what we're doing with Codex is we're building a software engineering teammate and as part of that we're kind of building an agent that can use, uh, a computer by writing code. And so we're already seeing like some pull for this. It's like quite early, but we're starting to see people like who are using Codex for like coding adjacent produ- purposes. And so as that develops, I think we'll just naturally see that like, oh, it turns out like we should just always have the agent write code if there is a coding way to solve a problem instead of, you know, even if you're doing it in financial analysis, right? Like, maybe write some code for that. 
So basically like, you know, you were like, hey, is this like the two ends of, of, uh, of this product for the super assistant, right? Of ChatGPT. In my mind, like just coding is a core competency of any agent, including ChatGPT. And so like what really what we think we're building is like that competency. But, so here's, here's like the really cool thing about agents writing code is that you can import code, right? Code is like composable.... interoperable, right? 'Cause if, if we, you know, one very reductive view we could have for an agent is it's just gonna be given a computer and it's just gonna, like, point and click and, and, you know, go around. But, you know, that is the future and then how we get there i- is difficult to, sort of, chart a path because a lot of the questions around building agents aren't like, "Can the agent do it?" But it's more about, "Well, how can we help the agent understand the context that it's working in?" And, like, the team that's using it, you know, probably has a way that they like to do things. They have guidelines. They probably want certain deterministic guarantees about what the agent can or cannot do, or they wanna know that the agent understands, sort of, this detail. Like, an example would be, you know, if we're looking at a crash reporting tool, hitting a connector for it, every sub-team is probably... has a different meta prompt for, like, how they want the crashes to be analyzed, right? And so we start to get to this thing where, like, yeah, we have this agent sitting in front of a computer, but we need to make that configurable for the team or for the user, right? And let them, like... Stuff that the agent does often, we probably just wanna, like, build in as a competency that this agent has that it can do. So I think we end up with this generalizable thing that you were saying of, like, an agent that can just write its own scripts for whatever it wants to do. 
But I think that the, the, the really key part here is can we make it so that everything that the agent has to do often or that it does well, we can just, like, remember and store so that the agent doesn't have to write a script for that again, right? Or maybe, like, if I just joined a team and you are already on the same team as me, I can just, like, use all those scripts that the agents have written already.
- LRLenny Rachitsky
Yeah. That's like, if this is our teammate, uh, we can... They can share things that it's learned from working with other people at the company. It just makes-
- AEAlexander Embiricos
Right.
- LRLenny Rachitsky
... sense as a metaphor.
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
It feels like you're in the, uh, Karpathy camp of agents today are not that great and mostly slop and maybe in the future they'll be awesome. Does that resonate?
- AEAlexander Embiricos
I think... So I think coding agents are pretty great. I think, uh-
- LRLenny Rachitsky
That, that feels right.
- AEAlexander Embiricos
... we're seeing a ton of value there.
- LRLenny Rachitsky
That feels right. Yep.
- AEAlexander Embiricos
And then I think, like, agents si- outside of coding, it's still, like, very early. And, you know, this is just my opinion, but I think they're gonna get a whole lot better once they can use coding too in the, in a composable way.
- LRLenny Rachitsky
Hm.
- AEAlexander Embiricos
This is, it's kind of the fun part of, like, when you're building for software engineers, like, I... At my startup we were building for software engineers too for a lot of that journey and they're just such a fun audience to build for because, you know, they also like building for themselves and are often, like, even more creative than we are in thinking about how to use the technology. Um, and so, like, by building for software engineers you get to just observe a ton of emergent behaviors and, like, things that you should do and build into the product.
- LRLenny Rachitsky
I love how you, you say that 'cause a lot of people building for engineers get really annoyed 'cause the engineers are so... They're just always complaining about stuff. They're like, "Ah, this sucks. Why'd you build it this way?" Uh, I love that you enjoy it, but I think it's probably because you're building such an amazing tool for engineers that can actually solve problems, uh, and just, you know, code
- 33:11 – 44:08
The impact of AI on engineering
- LRLenny Rachitsky
for them. Um, kind of along those lines, you know, there's always this talk of what will happen with jobs, engineers, coding, do you have to learn coding, all these things. Uh, clearly the way you're describing it is it's a teammate, it's gonna work with you and make you more superhuman, it's not gonna replace you. What's the way you think about the impact on the field of engineering of having, uh, this super intelligent engineering teammate?
- AEAlexander Embiricos
I think there's, there's two sides to it, but the one we were just talking about is this idea that maybe every agent should actually use code and be a coding agent. And in my mind that's just, like, a small part of this, like, broader idea that like, hey, as we make code even more ubiquitous... I mean, you could probably claim it's ubiquitous today even pre-AI, right? But as we make code even more ubiquitous, it's actually just going to be used for many more purposes. And so there's just gonna be a ton more need for people with this... like, humans with this competency. So, so that's my view. I think this is, like, quite a complex topic so in... You know, it's something we talk about a lot and we have to kind of see how it pans out, but I think what we can do... What we can do basically as a product team building in this space is just try to always think about how are we building a tool so that it feels like we're, like, maximally accelerating, uh, people. You know, rather than building a tool that makes it, like, more unclear what you should do as the human, right? Like, I think, like, to, to, you know, give an example right now, like, nowadays when you work with a coding agent, um, it writes a ton of code, but it turns out writing code is actually one of the most fun parts of software engineering for many software engineers. And so then you end up reviewing AI code, right? And that's often a less fun part of the job for many software engineers, right? And so I actually think, like, we see that, like, this, this comes up, plays out all the time in, like, a ton of micro decisions and so we as a product team are always thinking about like, "Okay, how do we make this more fun? How do we make you feel more empowered? Where is this not working?" And I, I would argue that, like, reviewing agent-written code is, like, a place that today is, like, less fun. And so, you know, then I think, "Okay, what can we do about that?" 
Well, we can ship a code review feature that, like, helps you build confidence in the AI-written code. Okay, cool. You know, another thing we could do is we could make it so that the agent's, like, better able to validate its work. And, you know, it, it gets all the way down into, like, micro decisions. Like, if you're gonna have the... an agent capability to validate w- work and let's say you have, like, a... I'm thinking of Codex Web right now, like, you have a p- a pane that sort of reflects the work the agent did, what do you see first? Do you see the diff or do you see the image preview of the code it wrote? Right? And, you know, I think if you're thinking about this from the perspective of, like, how do I empower the human, how do I make them feel like as, as accelerated as possible, like, you obviously see the image first, right? You shouldn't be reviewing the code unless first, you know, you've seen the image, unless it's maybe it's been, like, reviewed by an AI and now it's time for you to take a look.
- LRLenny Rachitsky
When I had, uh, Michael Truell, the CEO of Cursor, on the podcast, he, he had this kind of vision of us moving to something beyond code. And I've seen this rise of something called spec-driven development where you kind of just write the spec and then the code, you know, the AI writes code for you, and so you kind of w- start working at this higher abstraction level. Is that something you see where we're going, just like engineers not having to actually write code or look at code and there's gonna be this higher level of abstraction that we focus on?
- AEAlexander Embiricos
Yeah. I mean, I think, I think there's, like, constantly these levels of abstraction and they're actually already played out today.... right? Like, today, like, coding agents mostly it's, like, prompts to patch, right? We're starting to see people doing, like, spec-driven development or, like, plan-driven development. That's actually one of the ways when people ask, like, "Hey, how do you run Codex on a really long task?" Well, it's, like, often collaborate with it first to write, like, a plan.md, like, a Markdown file that's your plan. And once you're happy with that, then you ask it could- to go off and do work. And if that plan has verifiable steps, it'll, like, work for much longer. Um, so we're totally seeing that. I think spec-driven development is, like, an interesting idea. It's not clear to me that it'll work out that way 'cause a lot of people don't write, like, don't like writing specs either, but it, it seems plausible that some, some people will work that way. You know, like, a, a bit of a joke idea though is, like, if you think of, like, um, the way that many teams work today they're- they often, like, don't necessarily have specs, but the team is just really self-driven and so stuff just gets done. And so almost that is, like, I'm coming up with this on the spot, so it's, you know, not a good name, but, like, chatter-driven development, where it's just, like, stuff is happening, you know, on social media and, like, in your team communications tools. And then as a result, like, code gets written and deployed, right? So yeah, I think I'm a little bit more oriented in that way of, you know, I don't even necessarily wanna have to write a spec. Like, sometimes I want to only if I like writing specs, right? Uh, other times I might just want to say, like, "Hey, here's, like, the customer, you know, service channel and, like, tell me what's interesting to know. But if it's a small bug, just fix it." I don't have to write a spec for that, right?
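[Editor's note: an example of the kind of plan.md he describes, a plan with verifiable steps that the agent can work against for much longer. The project, file names, and commands are invented for illustration.]

```markdown
# Plan: add rate limiting to the API gateway

## Steps (each independently verifiable)
1. Add a `RateLimiter` module with a token-bucket implementation.
   - Verify: `pytest tests/test_rate_limiter.py` passes.
2. Wire the limiter into the request middleware behind a feature flag.
   - Verify: with the flag off, all existing integration tests still pass.
3. Enable the flag in staging config only.
   - Verify: `grep rate_limit config/staging.yaml` shows the new key.
```

The "verify" lines are the load-bearing part: each step gives the agent a concrete check it can run before moving on.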
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
I have this sort of, uh, hypothetical future, uh, that I like to share sometimes with people as a provocation, which is, like, in a world where we have, like, truly amazing agents, like, what does it look like to be a solopreneur? Um, and, uh, you know, one terrible idea for how it could look is that it's actually a mobile app and, um, every idea that it- the agent has is just, like, vertical video on your phone. And then you can, like, swipe left if you think it's a bad idea and you can, like, swipe right if it's a good idea, and, like, you can press and hold and, like, speak to your phone if you wanna give feedback on the idea before you swipe, you know? And in this world, like, basically your job is just to, like, plug in this app into, like, every single, like, signal system, you know, system of record, and then you just sort of sit back and, like, swipe. I don't know.
- LRLenny Rachitsky
I love this. This is like Tinder meets TikTok meets Codex and s-
- AEAlexander Embiricos
It's pretty terrible.
- LRLenny Rachitsky
No, this is great. (laughs) So the idea here is this thing is this agent is watching and, right, listening to you, paying attention to the market, your users, and it's like, "Well, I hear something I should do." It's like a proactive engineer, just like, "Here, we should build this feature and fix this thing."
- AEAlexander Embiricos
Exactly.
- LRLenny Rachitsky
I think that's a really good idea.
- AEAlexander Embiricos
Exactly. It's c- communicating with you in, like, the lowest, like, effort way-
- LRLenny Rachitsky
Yeah, yeah, and like the-
- AEAlexander Embiricos
... for you to consume and decide.
- LRLenny Rachitsky
... the Gen Z, like, the, the modern way, way to communicate. (laughs)
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
Swipe left or right and, uh, and vertical feed and then the Sora video. Okay, so I see how this all connects now. I see.
- AEAlexander Embiricos
Yeah. To be clear, we're not building that but, like, you know, it's a fun idea. I mean, 'cause you know, like, in this example though, like, one of the things that it's doing is it's consuming external signals, right? I think the other really interesting thing is, like, if we think about, like, what is the most successful, like, AI product to date, um, I would argue, um... It's funny, actually not to confuse things at all, but, like, w- the first time we used the n- the, the brand Codex at OpenAI was actually the model powering GitHub Copilot. This is, like, way back in the day years ago. Uh, and so we decided to reuse that, that brand recently, um, 'cause it's just so good, you know, Codex, code execution. But I think actually, like, auto-completion in IDEs is, like, one of the most successful AI products to date. And part of what's so magical about it is that when, uh, the- it can surface, like, ideas for helping you really rapidly. When it's right, you're accelerated. When it's wrong, it's not, like, that annoying. It can be annoying, but it's not that annoying, right? And so you can create this, like, mixed initiative system that's, like, contextually responding to, like, what you're attempting to do. And so in my mind, this is, like, a really interesting thing for us as OpenAI as we're building. So for instance, you know, when I think about launching a browser, which we did with Atlas, right? Like, in my mind, one of the really interesting things we can then do is we can then, like, contextually surface, like, ways that we can help you as you're going about your day, right? And so we break out of this, like, you know, "We're just looking at code," or, "We're just in your terminal," um, into this idea that, like, hey, like, a real teammate is dealing with a lot more than just code, right? They're dealing with a lot of things that are web content. So, like, you know, how can we help you with that?
- LRLenny Rachitsky
Man, there's so much there, and I love this. Okay, so autocomplete on the web with the browser. That's so interesting. Just, like, here's all the things that we can help you with as you're browsing and going about your day. I wanna talk about Atlas. I'll come back to that. Uh, Codex, code execution, did not know that. That's really clever. I, I get it now. Okay, and then this chatter, what is it? Chatter-driven development? Uh, I had a... No, this is a really good idea. But it reminds me, I had, uh, Dhanji Prasanna on the podcast, CTO of Block, and they, they have this product called Goose, which is their own internal agent thing, and he talked about an engineer at Block who just, uh, has Goose watch, like, his screen and listen to every meeting and proactively do work that he should pro- probably wanna do. So it ships a PR, sends an email, drafts a Slack message. So he's doing exactly what you're describing in, in kind of a very early way.
- AEAlexander Embiricos
Yeah, that's super interesting. And, you know, I bet you the... So if we go- if we went and asked them what the bottleneck to that productivity is, did- did they share what it is?
- LRLenny Rachitsky
Uh, probably looking at it, just making sure this is the right, the right thing to do.
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
Yeah.
- AEAlexander Embiricos
Yeah. So, like, we see this now, like, we have a Slack integration for Codex. People love, you know, if there's, like, something that you need to do quickly, people will just, like, @mention Codex, like, "Why do you think this bug is happening," right? Doesn't have to be an engineer, even, like, maybe, you know, data scientists here are often using Codex a ton to just, like, answer questions like, "Why do you think this metric moved? What happened?" So questions, you know, you get the answer right back in Slack. It's amazing. Super useful. But as for when it's wi- writing code, then you have to go back and look at the code, right? And so... the real, like, s- I think bottleneck right now is, like, validating that the code worked and, like, doing code review. So, in my mind, if we wanted to get to something like, uh, you know, that world of the friend you were talking about, I think we- we really need to figure out how to get people to configure their coding agents to be much more autonomous on those later stages of the work.
- LRLenny Rachitsky
It makes sense. Like you said, writing code. I used to be an engineer. I was an engineer for 10 years. Really fun to write code, really fun to just get in the flow, build, architect, test. Not so fun to look at everyone else's code and just have to go through and be on the hook if it is doing something dumb that's gonna take down production. And now that building has become easier, what I've always heard from companies that are really at the cutting edge of this is the bottleneck is now, like, figuring out what to build, and then it's at the end of, like, "Okay, we have all this- all 100 PRs that you have to review. Who's gonna go through all that?"
- AEAlexander Embiricos
Right.
- LRLenny Rachitsky
Yeah. This episode is brought to you by Jira Product Discovery. The hardest part of building products isn't actually building products, it's everything else. It's proving that the work matters, managing stakeholders, trying to plan ahead. Most teams spend more time reacting than learning, chasing updates, justifying roadmaps, and constantly unblocking work to keep things moving. Jira Product Discovery puts you back in control. With Jira Product Discovery, you can capture insights and prioritize high impact ideas. It's flexible, so it adapts to the way your team works, and helps you build a roadmap that drives alignment, not questions. And because it's built on Jira, you can track ideas from strategy to delivery, all in one place. Less chasing, more time to think, learn, and build the right thing. Get Jira Product Discovery for free at atlassian.com/lenny. That's atlassian.com/lenny.
- 44:08 – 45:40
How Codex has impacted the way PMs operate
- LRLenny Rachitsky
What has the impact of Codex been on the way you operate as a product person, as a PM? It's clear how engineering is impacted. Uh, code is written for you. What has it done to the way you operate and the way PMs operate at- at OpenAI?
- AEAlexander Embiricos
Yeah, I mean, I think mostly I just feel, like, much more empowered. Um, I've always been sort of more technical-leaning PM, and especially when I'm working on products for engineers, I feel like it's necessary to, like, you know, dogfood the product. But even beyond that, I- I- I just feel like I can do much, much more as a PM. And, uh, you know, Scott Belsky talks about this idea of, like, compressing the talent stack. I'm not sure if I phrased that right. But it's basically this idea that, like, maybe the boundaries between these roles are a little bit, like, less needed than before because people can just do much more, and every time you- someone can do more, you can, like, skip one communication boundary and make the team, like, that much more efficient, right? So I think- I think we see it, you know, in a bunch of functions now, but I guess since you asked about, like, product specifically, uh, you know, now, like, answering questions, much, much easier. You can just ask Codex for thoughts on that. Uh, a lot of, like, PM type work, understanding what's changing, again, just ask Codex for help with that. Um, prototyping is often faster than writing specs. This is something that a lot of people have talked about. I think something that I don't think is super surprising, but something that's slightly surprising, is, like, we see... Like, we're mostly building Codex for- to write code that's gonna be deployed to production, but actually we see
- 45:40 – 47:10
Throwaway code and ubiquitous coding
- AEAlexander Embiricos
a lot of throwaway code written with Codex now. That's kind of going back to this idea of, like, you know, ubiquitous code. So you'll see, uh, you know, someone wants to do an analysis, like if I want to understand something, it's like, okay, just give Codex a bunch of data but then ask it to build, like, an interactive, like, data viewer for this data, right? That's just, like, too annoying to do in the past, but now it's just, like, totally worth the time of just getting an agent to go do something. Um, similarly, I've seen, like, some pretty cool prototypes on our design team about, like, if you want to... Well, like a designer basically wanted to build an animation, and this is the coin animation in Codex, and it was like normally it'd be too annoying to program this animation so they just vibe coded an animation editor, and then they used the animation editor to build the animation which they then checked into their repo. Actually, our designers are- there's a ton of acceleration there and, like, speaking of compressing the talent stack, I think our designers are very PM-y. So, you know, they- they do a d- a ton of product work and, like, they actually have, like, an entire, like, vibe-coded sort of side prototype of the Codex app. And so a lot of how we talk about things is, like, we'll have, like, a really quick jam 'cause there's like 10,000 things going on and then the designer will, like, go think about how this should work but instead of, like, talking about it again they'll just, like, vibe code a prototype of that in their, like, standalone prototype. We'll play with it. If we like it, they'll vibe code that prototype into- or vibe engineer that prototype into an actual PR to land and then depending on their comfort with the code base, like, Codex CLI is in Rust, which is a little harder, maybe they'll like land it themselves or they'll like get close and then an engineer can help them like land the PR. Um,
- 47:10 – 49:01
Shipping the Sora Android app
- AEAlexander Embiricos
you know, we recently shipped the Sora Android app, um, and, uh, that was one of the most sort of mind-blowing examples of acceleration actually 'cause the usage of- of Codex internally at OpenAI is obviously really, really high but it's been growing, uh, over the course of the year both in terms of, like, now it's basically, like, all technical staff use it, uh, but even, like, the intensity and know-how of how to make the most of coding agents has gone up by a ton. And so the Sora Android app, right, like a fully new app, we built it in 18 days. It went from like zero to launched to employees and then 10 days later, so 28 days total, we went to just like GA, to the public. And that was done just like with the help of Codex. So pretty insane velocity. I would say it was like a little bit... hm, I don't want to say easy mode but there is one thing that Codex is really good at if you're a company that's like building software on multiple platforms, so you've already figured out like some of the underlying, like, APIs or systems, asking Codex to s- like to port things over is really effective 'cause it has like something it can go look at. And so the engineers on that team, uh, were basically having Codex go look at the iOS app, produce plans of work that needed to be done, and then go implement those. And it was kind of looking at iOS and Android at the same time and so, you know, basically it was like two weeks to launch to employees, four weeks total. Insanely fast.
- LRLenny Rachitsky
What makes that even more insane is it- was the- it became the number one app in the App Store.... I don't-
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
This just boggles the mind. Okay, so- (laughs)
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
... 20 days, yeah. (laughs)
- AEAlexander Embiricos
So imagine building number one app on the App Store-
- LRLenny Rachitsky
Yeah.
- AEAlexander Embiricos
... with, like, a handful of engineers, uh, I think it was, like, two or three possibly, uh, in a handful of weeks. Yeah.
- LRLenny Rachitsky
This is absurd.
- AEAlexander Embiricos
So-
- LRLenny Rachitsky
Wow.
- AEAlexander Embiricos
Yeah. So that's a really fun, um, example of, uh, of acceleration.
- 49:01 – 53:34
Building the Atlas browser
- AEAlexander Embiricos
And then, like, Atlas is the other one that I think, um... Ben did a podcast, the, um, engineer on Atlas, uh, sharing a little bit about how we built there. You know, many... Atlas is, is actually... I mean, it's a, it's a browser, right? And building a browser is really hard. Um, and so we, uh, had to build a lot of difficult systems in order to do that. And basically we got to the point where that team has a ton of power users of Codex right now. And, um, you know, got to the point where they, they basically were... We, we know, we were talking to them about it 'cause a lot of those engineers are people I used to work with, uh, before my startup. And so they'd say, you know, before this would've taken us like two to three weeks for two to three engineers. And now it's like one engineer, one week. Um, so massive acceleration there as well. And what's quite cool is that, uh, you know, we, we shipped Atlas on, on Mac first, but now we're working on the Windows version, you know, that... So the team now is, like, ramping up on Windows and they're helping us make Codex better on Windows too, which is ad- admittedly earlier. Like, just the model we w- we shipped last week is the first model that natively understands PowerShell. So, you know, PowerShell being, uh, the native, like, shell language on Windows. So yeah, it's been, it's been really awesome to see, like, the whole company getting accelerated by Codex, like, from eng- engi- you know, most obviously also research and, like, improving how quickly we train models and how well we do it. And then even, like, uh, design as we talked about and, and marketing. Like, actually we're at this point now where, uh, my product marketer is often also making string changes just directly from Slack or, like, updating docs directly from Slack.
- LRLenny Rachitsky
These are amazing examples. You guys are living at the bleeding edge of what is possible, and this is how other companies are gonna work. Uh, just shipping, again, what became the number one app in the App Store, and it just, like, took over the world for (laughs) at least a week. Built, you said, in 28 days total, and, like, 18 days just to get the core of it working?
- AEAlexander Embiricos
Yeah. So it's, like, 18 days we had a thing that employees were playing with.
- LRLenny Rachitsky
Yeah.
- AEAlexander Embiricos
And then 10 days later we were out.
- LRLenny Rachitsky
And w- you said just a couple engineers.
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
Two or three. Okay. And then Atlas, you said it took a week to build?
- AEAlexander Embiricos
No, no, no.
- LRLenny Rachitsky
Okay.
- AEAlexander Embiricos
So Atlas was not just one week; Atlas was, like, a really meaty project.
- LRLenny Rachitsky
Yeah.
- AEAlexander Embiricos
Um, and so I was talking to one of the engineers on Atlas, um, about, like, you know, what they use Codex for, and it's basically like, "We use Codex for absolutely everything." I was like, "Okay."
- LRLenny Rachitsky
(laughs)
- AEAlexander Embiricos
Well, like, you know, how would you measure the acceleration? And so basically the-
- LRLenny Rachitsky
Mm-hmm.
- AEAlexander Embiricos
... the answer I got back was, "Previously it would've taken two to three weeks for two to three engineers, and now it's, like, one engineer, one week."
- LRLenny Rachitsky
Do you think this eventually moves to non-engineers doing this sort of thing? Like, does it have to be an engineer building this thing? Could it have been built by, I don't know, a PM or designer?
- AEAlexander Embiricos
I think we will very much get to the point where... well, basically where the boundaries are a little bit blurred, right? Like, I think you're gonna want someone who, like, understands the details of what they're building, but what details those are will evolve. Kinda like how now, if you're writing Swift, you don't have to speak Assembly. You know, there's a handful of people in the world who, like, speak Assembly, and it's really important that they exist. Uh, maybe more than a handful, right? But that's, like, a specialized function that most companies don't need to have. So I think we're just gonna naturally see, like, an increase in layers of abstraction. And then the cool thing is now we're entering, like, the language layer of abstraction, like natural language. And natural language itself is really flexible, right? Like, you could have engineers talking about, like, a plan, and then you could have engineers talking about a spec, and then you could have engineers talking about just, uh, you know, a product or an idea. So I think we can also, like, start moving up those layers of abstraction as well. But, you know, I do think this is gonna be gradual. I don't think it's gonna be like, all of a sudden nobody ever writes any code and it's just specs. I think it's gonna be much more like, okay, we've set up our coding agent to be really good at, like, previewing the build, or, like, uh, running tests. Maybe that's the first part, right, that most people have set up. And it's like, okay, now we've set it up so it can, like, execute the build and it can, like, see the results of its own changes. But, you know, we haven't yet built a good integration harness so that it can, like, in the case of Atlas... Like, by the way, I don't know if they've done any of this or not. I think they've done a lot of this.
But, you know, maybe the next stage is, like, enabling it to load a few sample pages to see how well those work, right? So then, okay, now we're gonna, like, set it up to do that. And I think for some time at least, we're gonna have humans kind of curating, like, which of these connectors or systems or components the agent needs to be good at talking to. And then, you know, in the future there will be an even greater unlock where Codex tells you how to set it up, or maybe sets itself up in a repo.
- 53:34 – 55:35
Codex’s impact on productivity
- LRLenny Rachitsky
What a wild time to be alive. (clears throat) Wow. I'm curious about the second-order effects of this sort of thing, of just how quick it is to build stuff. What does that do? Does that mean distribution becomes much, much more important? Does it mean, uh, ideas are just worth a lot more? It's interesting to think about how that changes.
- AEAlexander Embiricos
Uh, I'm curious what you think. I still don't think ideas are worth as much as maybe a lot of people think. I still think execution is really hard, right? Like, you can build something fast, but you still need to execute well on it. It still needs to make sense and be a coherent thing overall. Um, yeah, and distribution is massive.
- LRLenny Rachitsky
Yeah. Just feels like everything else is now more important, everything that isn't the building piece, which is-
- AEAlexander Embiricos
Yeah.
- LRLenny Rachitsky
... coming up with an idea, getting to market, profit, all that kind of stuff.
- AEAlexander Embiricos
Yeah. I think we might have been in this weird temporary phase where, you know, for a while, it was so hard to build product that you mostly just had to be really good at building product, and then maybe it didn't matter if you, like, had an intimate understanding of a specific customer. Um, but now I think we're getting to this point where actually, like, if I could only choose one thing to understand, it would be, like, a really meaningful understanding of the problems that a certain customer has, right? If I could only go in with one, like, core competency. So I think that that's ultimately still what's gonna matter most, right? Like, if you're starting a new company today and you have, like, a really good understanding and, like, network of customers that are currently underserved by AI tools, I think you're, like, set, right? Whereas if you're, like, good at building, you know, websites but you don't have any specific customer to build for, I think you're in for a much harder time.
- LRLenny Rachitsky
Bullish on vertical AI startups is what I'm hearing. (laughs) Yeah. I completely agree. There's, like, you know, the general thing that can solve a lot of problems, and then there's, like, "We're gonna solve presentations incredibly well, and we're gonna understand the presentation problem, uh, better than anyone. And we're gonna, uh, plug into your workflows, and all these other things that matter for a very specific problem." Okay.
- 55:35 – 58:09
Measuring progress on Codex
- LRLenny Rachitsky
Incredible. When you think about progress on Codex, I imagine you have a bunch of evals and there's all these public benchmarks. What's something you look at to tell you, "Okay, we're making really good progress"? I imagine it's not gonna be the one thing, but what do you focus on? What's, like, something you're trying to push? What's, like, a KPI or two?
- AEAlexander Embiricos
One of the things that I'm constantly reminding myself of is that a tool like Codex is naturally a tool that you would, you know, become a power user of, right? And so we can accidentally spend a lot of our time thinking about features that are, like, very deep in the user adoption journey. Um, and so we can kind of end up over-solving for that. And so I think it's just critically important to, like, go look at your, like, D7 retention, right? Just go try the product. Like, sign up from scratch again. Um, I have a few too many, like, ChatGPT Pro accounts that I've, in order to maximally correctly dogfood, like, signed up for on my Gmail, and they charge me, like, 200 bucks a month. I need to expense those. But, uh, you know, like, I think just the feeling of being a user and the early retention stats are still, like, super important for us, because, you know, as much as this category is taking off, I think we're still in the very early days of, like, people using these tools. Um, another thing that we do... I think we might be the most, like, user-feedback-and-social-media-filled team out there in this space. 'Cause, like, a few of us are, like, constantly on Reddit and Twitter, and, uh, you know, there's praise up there and there's a lot of complaints, but we take the complaints, like, very seriously and look at them. And I think that, again, because you can use, like, a coding agent for so many different things, um, it often is, like, kind of broken in many sorts of ways for, like, specific behaviors. Um, and so we actually monitor what the vibes are on social media pretty often. Especially, I think, for Twitter/X, um, it's a little bit more hypey, and then Reddit is a little more negative but real, actually. Um, so I've started increasingly paying attention to, like, how people are talking about using Codex on Reddit.
- LRLenny Rachitsky
This is, uh, important for people to know. Hmm, which of the subreddits do you check most? Is there, like, an r/Codex or...
- AEAlexander Embiricos
I mean, the algorithm's pretty good at surfacing stuff, but like, r/Codex is, is there.
- LRLenny Rachitsky
Okay. I'll take it. Very interesting. And then, uh, if people tag you on Twitter, you still see that, but maybe it's not as powerful as seeing it on Reddit.
- AEAlexander Embiricos
Well, yeah. And the interesting... Well, the thing with Twitter is it's a little bit more one-to-one, even if it's, like, in public. Whereas, like, with Reddit, there's, like, really good upvoting mechanics, and, like, maybe most people are still not bots. Unclear. Um, so you get, like, good signal on what matters and what other people
- 58:09 – 1:01:58
Why they are building a web browser
- AEAlexander Embiricos
think.
- LRLenny Rachitsky
So, uh, interestingly, uh, Atlas. I wanna talk about that briefly. Uh, you guys launched Atlas. I tweeted, actually, that I tried Atlas and I don't love the AI-only, uh, search experience. Like, I just want Google sometimes, or whatever, instead of just waiting for AI to give me an answer. And there was no way to switch, so I just tweeted, "Hey, um, I'm switching back. It's not great." And I feel like I made some PMs at OpenAI sad, and I saw someone tweet, "Okay, we have this now." Which I imagine was always part of the plan. It's probably an example of: we just gotta ship stuff, see how people use it, and then we figure it out. Uh, so I guess one is that... I don't know. Is there anything there? And two, I'm just curious, why are you guys building a web browser?
- AEAlexander Embiricos
So I worked on Atlas for a bit. Um, I don't work on it now. Um, but, you know, a bit of the narrative here for me, just to tell my story a bit, was: I was working on this, like, screen-sharing, pair-programming startup, right? And then we joined OpenAI. And so the idea was really to build a contextual desktop assistant, and the reason I believe that's so important is because I think that it's really annoying to have to give all your context to an assistant and then to figure out how it can help you, right? Whereas if it could just, like, understand what you were trying to do, then it could maximally accelerate you. Um, and I still think of Codex actually as, like, a contextual assistant, um, from a little bit of a different angle, like, starting with coding tasks. But some of the thinking, at least for me personally, I can't speak for the whole product, was that a lot of work is done in the web, and if we could build a browser, then we could be contextual for you, but in a much more first-class way. We wouldn't be hacking other desktop software, which has, like, very varied support for what content it renders to the accessibility tree. Uh, we wouldn't be relying on screenshots, which are a little bit slower and unreliable. Instead, we could, like, be in the rendering engine, right, and, like, extract whatever we needed to help you. Um, and also, I like to think of, like, you know, video games. Like, I don't know if you've played, say, Halo, right? Like, you walk up to an object. I mean, this is true for many games. You press... Man, it's been a long time. This is embarrassing. Press X (laughs) and it just does the right thing, right? And I was one of those guys who always read the instruction manual for every video game that I bought.
And I remember the first time I read about a contextual action, and I just thought it was, like, this really cool idea. And, uh, you know, the thing about a contextual action is we need to know what you are attempting to do. We need to have a little bit of context, and then we can help. Uh, and I think this is critically important because, you know, imagine this world that we reach, right? Where we have agents that are helping you thousands of times per day. Um, imagine if the only way we could tell you that we helped you was if we could, like, push-notify you. So you get a thousand push notifications a day of an AI saying, like, "Hey, I did this thing. Do you like it?" It'd be super annoying, right? Whereas, imagine going back to software engineering: like, I was looking at a dashboard and I noticed some, like, key metric had gone down. And, you know, at that point in time, maybe the agent could go take a look and then surface the fact that it has an opinion on why this metric went down, and maybe a fix, right there, right when I'm looking at the dashboard, right? That would much more keep me in flow and enable the agent to take action on, like, many more things. So in my mind, like, part of why I'm excited for us to have a browser is that I think we then have, like, much more context around what we should help with. Users have much more control over what they want us to look at. It's like, "Hey, if you want us to, like, take action on something, you can open it in your AI browser. If you don't, then you can open it in your other browser." Right? So, like, really clear control and boundaries. And then we have the ability to build UX that's, like, mixed-initiative, so that we can surface contextual actions to you at the times they're helpful, as opposed to just, like, randomly notifying
- 1:01:58 – 1:02:53
Non-engineering use cases for Codex
- AEAlexander Embiricos
you.
- LRLenny Rachitsky
Hearing the vision for Codex being the super assistant, it's not just there to code for you. It's trying to do a lot for you as a teammate, as this kind of super teammate, and that makes you awesome at work. So, I get this. Speaking of that, are there other non-engineering common use cases for Codex, just ways that non-engineers... We talked about it, you know, designers, prototyping, building stuff. Are there any, I don't know, fun or unexpected ways people are using Codex that aren't engineers?
- AEAlexander Embiricos
I mean, there's a load of unexpected ways. But I think, like, most of where we're seeing real traction with people using it is still, for now, very, like, coding-adjacent or, like, sort of tech-oriented: places where there's, like, a mature ecosystem, um, or, you know, maybe you're doing data analysis or something like that. I personally am expecting that we're gonna see a lot more of that over time. Um, but for now, like, we're keeping the team very focused on just coding, 'cause there's so much
- 1:02:53 – 1:04:49
Codex’s capabilities
- AEAlexander Embiricos
more work to do.
- LRLenny Rachitsky
For people that are thinking about trying out Codex, is there, like, a... Does it work for all kinds of code bases? Uh, what code does it support? If you're, like, I don't know, at SAP, can you add Codex and start building things? What's kind of, like, the sweet spot? Where does it start to not be amazing yet?
- AEAlexander Embiricos
So, I'm really glad you asked this question, actually, because the best way to try Codex is to give it your hardest tasks, which is a little different than some of the other coding agents. Like, you know, with some tools you might think, "Okay, let me, like, start easy, or just, like, you know, live-code something random and decide if I like the tool." Whereas, like, we're really building Codex to be the, like, professional tool that you can give your hardest problems to, um, and, you know, that writes, like, high-quality code in your, like, enormous code base that is, in fact, not perfect right now. So yeah, I think if you're gonna try Codex, you wanna try it on, like, a real task that you have, and not necessarily, like, dumb that task down to something that's, like, trivial. A good one would be, like, you have a hard bug and you don't know what's causing that bug, and you ask Codex to, like, help figure that out or, like, to implement the fix.
- LRLenny Rachitsky
I love that answer. (laughs) Just give it your hardest problem.
- AEAlexander Embiricos
I will say, like, you know, if you're like, "Hey, okay, well, the hardest problem I have is that I need to build, like, a new unicorn business," like, obviously that, you know, it's not gonna work.
- LRLenny Rachitsky
Yeah.
- AEAlexander Embiricos
Uh, not yet. So, I think it's like: give it the hardest problem, but something that is still, like, one question, right, or one task, um, to start. That's if you're testing. And then over time, you can learn how to use it for, like, bigger things.
- LRLenny Rachitsky
Yeah. What languages does it support?
- AEAlexander Embiricos
Basically, the way we've trained Codex is, like, there's a distribution of languages that we support, and it's, like, fairly aligned with the frequency of these languages in the world. So, unless you're writing some, like, very esoteric language or, like, some private language, it should do fine in your language.
- LRLenny Rachitsky
If someone was just getting started, is there a tip you could share to help them be successful? Like, if you could just whisper a little tip to someone just setting up
- 1:04:49 – 1:05:37
Tips for getting started with Codex
- LRLenny Rachitsky
Codex for the first time to help them have a really good time, what's something you would whisper?
- AEAlexander Embiricos
I might say, "Try a few things in parallel," right? So try giving it a hard task, um, maybe ask it to understand the code base, uh, formulate a plan with it around an idea that you have, and kinda build your way up from there. And, like, sort of the meta idea here is, again, it's like you're building trust with a new teammate, right? And so, like, you wouldn't go to a new teammate and just say, "Hey, do this thing. Uh, here's zero context." You would start by, like, first making sure they understand the code base, and then you would, like, maybe align on an approach, and then you would have them go off and do it bit by bit. Right? And I think if you use Codex in that way, you'll just sort of naturally start to understand, like, the different ways of prompting it, because it is... It's a super powerful, like, agent and model, but it is a little bit different to prompt Codex
- 1:05:37 – 1:10:36
Skills to lean into in the AI age
- AEAlexander Embiricos
than other models.
- LRLenny Rachitsky
Just a couple more questions. One, we touched on this a little bit: as AI does more and more coding, there's always this question of, should I learn to code, and why should I spend time doing this sort of thing? For people that are trying to figure out what to do with their career, especially if they're into software engineering or computer science, do you think there's specific elements of computer science that are more and more important to lean into, maybe things they don't need to worry about? Like, what do you think people should be leaning into, skill-wise, as this becomes more and more of a thing in our workplace?
Episode duration: 1:25:12
Transcript of episode z1ISq9Ty4Cg