EVERY SPOKEN WORD
20 min read · 4,097 words
- 0:00 – 1:10
The origins and evolution of Claude Code
- BCBoris Cherny
When we first released Claude Code, it was like a little video, and I remember posting it to Slack, and there was like two people that gave like the reactions. [laughs]
- CWCat Wu
[laughs]
- BCBoris Cherny
And like people were like excited.
- CWCat Wu
Yeah, I thought it was really cool, especially for my very easy engineering tasks. It was quite good at it.
- BCBoris Cherny
That's like a really nice way to say that it wasn't really good. [laughs]
- CWCat Wu
[laughs] [upbeat music] I can't believe it's only been a year since we first launched Claude Code.
- BCBoris Cherny
It's hard to remember what, what that was like. Like, it, it is, it's so different than what we're doing today. Like, now I just have, like, armies of agents that are doing stuff. Like, I'm prompting one agent, or I have, like, an agent that's, like, prompting agents that's prompting agents-
- CWCat Wu
Yeah
- BCBoris Cherny
... and it's like a tree of, like, thousands of agents. But it, I think it's just, like, the most important idea when working on this stuff is, like, every single time Claude makes a mistake, I don't tell Claude to do it differently, I tell it to write it to the Claude MD or to, like, make a skill or, or something to do it differently. And if you can do this, then Claude can just, like, run forever. And I, I think the other thing that we kind of realized is the verification
- 1:10 – 3:14
How to make Claude good at verification
- BCBoris Cherny
is, is really important. Like, we didn't realize that.
- CWCat Wu
I hear this come up a lot with developers and enterprises that we meet with. Um, what are your tips for making a really good, making Claude Code really good at verification?
- BCBoris Cherny
I sort of feel like this is this thing that just, like, everyone misunderstands, 'cause whenever we talk about verification, people are thinking like unit tests or they're thinking like lint or like type check. These are the things that are obviously really easy to automate, and these are the things that were already automated. But actually, when we talk about verification for agents, it's something slightly different. It's like, can the agent run the thing? It takes a little bit of, uh, mental work to figure out how exactly do you do this, 'cause it's often not straightforward, and I think that's like, that, that's one of the challenges. I remember with, uh, with Opus 4, Claude tested itself, and we, we, we just, like, hooked it up to Opus 4, and I was like, "Claude, build a feature and then test yourself in, like, bash," and it opened a little Claude CLI and tested its own feature, and I was just like, "Whoa." [laughs] It's crazy. Like, now, now we're so used to it. Like, now, you know, now-
- CWCat Wu
Yeah
- BCBoris Cherny
... now we have these loops going for, you know, like, the iOS simulator and the Android simulator and, like, computers for desktop. Like, it's, it's not surprising, but back then, that was crazy. How, how are... Like, how, how are you doing it?
- CWCat Wu
So I've been mainly hacking on the desktop app these days, and one of the engineers on the team actually added this desktop development skill that teaches Claude how to run the local desktop app.
- BCBoris Cherny
Mm.
- CWCat Wu
And I've been having it use it, and it still runs into issues or, like, bugs with the staging environment sometimes. And so what I have it do is, in those cases, I have it read Slack and understand, hey, is, is staging down right now, or is there, has someone else already hit this? Um, and then when it debugs the whole issue, I tell it to update the desktop development skill. What this skill does is Claude actually spins up a local desktop app, and it uses computer use to click around on it. And so when I add a new UX, it clicks around to invoke the new UX. It also tests edge cases, and when there's an issue, it fixes it, saves it, and rechecks.
- BCBoris Cherny
This is, like, honestly,
- 3:14 – 4:48
Roles merging: Claude Code beyond engineers
- BCBoris Cherny
one of my favorite things about this team is everyone codes. I, I, I have never been on a team where, like, m- like, the, my, my PM would code, and it's, like, crazy. And, like, your code is, like, really good. Like-
- CWCat Wu
You're too nice. [laughs]
- BCBoris Cherny
[laughs] But I, I also just feel like it's, it's also just becoming easier 'cause it's like, essentially, like, Claude writes the code, and so what matters a little more is, like, what, uh, what's the idea that you have? And I, I feel like if you're a person that has, like, the product context and the business context, and you're thinking about the design and the user, you're just gonna come up with, with better ideas.
- CWCat Wu
It's kind of like all the roles are merging.
- BCBoris Cherny
I remember seeing Megan, our designer's, PRs, and I was just horrified at the beginning. I was like, "Oh my God. Why is Megan putting up PRs?" And then she was like, "Yeah, yeah, I'm just, like, I'm fixing the button." And I was like-
- CWCat Wu
[laughs]
- BCBoris Cherny
... "Okay. All right. Well, the code looks good," so maybe it's, maybe it's fine, and I, I feel like now it's just, like, it's totally normal.
- CWCat Wu
Yeah. And we see this across all the enterprises we talk with. Like, it's the engineers adopt Claude Code first-
- BCBoris Cherny
Mm
- CWCat Wu
... and then the, the eng adjacent roles look over their shoulder, and they're like, "Whoa, this thing is very powerful. Let me try it out." And we found, it's crazy, we found that, like, our designers are more productive making prototypes and making changes directly in the app instead of paying an engineer. PMs are making changes in the app.
- BCBoris Cherny
Mm.
- CWCat Wu
Like, our finance team runs in Claude Code. They do their projections there.
- BCBoris Cherny
Mm.
- CWCat Wu
Um, data science, uh, like, if you talk with our data scientists, it's so cool. It's just like everyone just has Claude Codes up on their screens.
- BCBoris Cherny
Yeah.
- CWCat Wu
Um, I feel like it's, it's remarkably versatile for, for different roles.
- BCBoris Cherny
What do you feel like nowadays
- 4:48 – 6:43
Using routines for CI, code review, and more
- BCBoris Cherny
are, like, the use cases that are, that are pushing the limits?
- CWCat Wu
One that I'm super excited about is routines. There's one engineer on our team who launched voice mode across all of our products, and, um, he has this routine set up that just listens for every ticket that come, every GitHub issue, every bug report about voice mode, and his Claude just picks it up, proactively puts up a fix, and then pings the PR to him. And when he got that working for voice mode, he thought, "Okay, we're getting a lot of other feedback that isn't being responded to." So, uh, he also set up a routine to listen for that. So I shipped this, uh, small feature, and there was, like, an edge case in it that I didn't see. And so someone filed a bug for it, and I was gonna get it, get to the bug that night, and as my Claude was working, it said, "Wait a second. Another Claude has already fixed this."
- BCBoris Cherny
[laughs]
- CWCat Wu
And I was like, "How is this possible?" Like, I've never talked to him about this feature before. And so I pinged him, and I was like, "How did you fix this so quickly?" And he said he has another routine that just looks for bug reports that haven't been responded to in five hours, and puts up a fix, and he merges the ones that are easy to verify.
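The selection step of the routine described here (find bug reports that have gone five hours without a response) could be sketched roughly as follows. This is only an illustration under stated assumptions: `BugReport`, `first_response_at`, and `stale_reports` are hypothetical names, and nothing here reflects how the actual routine is configured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BugReport:
    # Hypothetical record shape; a real tracker would have its own schema.
    id: int
    filed_at: datetime
    first_response_at: Optional[datetime] = None


def stale_reports(
    reports: List[BugReport],
    now: datetime,
    threshold: timedelta = timedelta(hours=5),
) -> List[BugReport]:
    """Return reports with no response that are at least `threshold` old."""
    return [
        r
        for r in reports
        if r.first_response_at is None and now - r.filed_at >= threshold
    ]


now = datetime(2025, 1, 1, 12, 0)
reports = [
    BugReport(1, filed_at=now - timedelta(hours=6)),  # unanswered and old
    BugReport(2, filed_at=now - timedelta(hours=1)),  # unanswered but recent
    BugReport(3, filed_at=now - timedelta(hours=8), first_response_at=now),  # answered
]
stale_ids = [r.id for r in stale_reports(reports, now)]
```

In the workflow described, each selected report would then be handed to an agent to propose a fix, with a person merging only the fixes that are easy to verify.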
- BCBoris Cherny
Mm. Claude tells me this, like, all the time now.
- CWCat Wu
That someone else has already fixed it?
- BCBoris Cherny
There's always, like, another person's Claude that's working on it.
- CWCat Wu
[laughs]
- BCBoris Cherny
It's like, yeah, that's been one of the changes. I, I feel like we're, um- A while ago, we were trying to figure out, like, how to use routines, and I feel like just, like, the agent SDK was this first idea that we could use Claude Code programmatically. But I feel like at the beginning, it, it, it just wasn't obvious how do we use it? What do we use it for? And I, I think routines are the first really obvious application. And, um, I don't know, like, it, it just does, like, all the code review. It, it babysits, like, every PR. You remember back in the day, you used to actually have to, like, respond to code review comments. You used to have to, like, fix CI. You used to have to rebase. [chuckles]
- CWCat Wu
Yeah.
- BCBoris Cherny
Like, I, I haven't done that in a long time.
- CWCat Wu
Yeah. When you're in the CLI,
- 6:43 – 8:10
Boris' go-to feature: auto mode
- CWCat Wu
and you're synchronously working with Claude, what are the, your go-to features?
- BCBoris Cherny
Okay, what they used to be is plan mode. I don't use that anymore. I use-
- CWCat Wu
What do you use instead?
- BCBoris Cherny
Auto mode.
- CWCat Wu
Auto mode.
- BCBoris Cherny
It's the best.
- CWCat Wu
Instead of plan mode?
- BCBoris Cherny
Instead of plan mode, yeah, 'cause I-- the newer models, they don't actually need, like, a planning step anymore. I think this was really important for, like, Opus 4 through maybe 4.5. Then I think starting with 4.6 and definitely with 4.7, it just doesn't need that planning step. I think some people still use it. They like to have that artifact. I don't, I don't use it. And I just do auto mode for everything because then I, I start my Claude, it starts to work, and then I just, like, move on to the next Claude. And I don't have to sit there and watch it. But from the very early stages, we had this, like, permission prompts model for Claude Code, right? Like, it runs a tool, and then it asks you like, "Hey, are you okay running this tool?" And you had to say yes or no. And at the time, that was kind of the best we had a year and a half ago because we didn't have, you know, classifiers. The model was not as well aligned as it is today. So auto mode was just such a, it was such a big step up because act-actually you don't want to read most of these requests. Just routing it to a different model and having it check for security works so much better.
- CWCat Wu
Yeah.
- BCBoris Cherny
And if a thing, like, is a little sus or, you know, this isn't a command that you think you want to run or it's not safe, the model will just deny it, and then you can go back, and you can allow it later. I think this has been one of those, like, steps we just-- we, we just-- there's no way we could have done this a year and a half ago.
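The flow described here (route each tool request to a checker, deny anything that looks unsafe, and let the user allow it later) could be sketched roughly as follows. This is an illustrative sketch only, not Claude Code's implementation: `PermissionGate`, `toy_classifier`, and `allow_later` are hypothetical names, and the keyword check is a stand-in for the separate model the speakers mention.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


def toy_classifier(command: str) -> bool:
    """Stand-in for a model-based safety check (assumption, not a real API).

    Returns True when the command looks safe to run.
    """
    risky_fragments = ("rm -rf", "curl ")
    return not any(fragment in command for fragment in risky_fragments)


@dataclass
class PermissionGate:
    classify: Callable[[str], bool]
    denied: List[str] = field(default_factory=list)
    user_allowed: Set[str] = field(default_factory=set)

    def request(self, command: str) -> bool:
        """Approve a command if the user allowed it or the classifier passes it."""
        if command in self.user_allowed or self.classify(command):
            return True
        # Record the denial so the user can review it afterwards.
        if command not in self.denied:
            self.denied.append(command)
        return False

    def allow_later(self, command: str) -> None:
        """User reviews a denied command and explicitly allows it."""
        self.user_allowed.add(command)
        if command in self.denied:
            self.denied.remove(command)


gate = PermissionGate(classify=toy_classifier)
first = gate.request("ls -la")           # passes the stand-in check
second = gate.request("rm -rf build/")   # denied and recorded
denied_snapshot = list(gate.denied)
gate.allow_later("rm -rf build/")
third = gate.request("rm -rf build/")    # now approved by the user override
```

The design point from the conversation is that the person only reviews the short `denied` list instead of approving every request one by one.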
- CWCat Wu
It's just human nature when you accept ninety-nine percent of requests that your eyes just
- 8:10 – 10:24
Securing auto mode: red teaming and evals
- CWCat Wu
glaze over when you read it. And so actually we feel that auto mode is more safe than reading every single permission prompt because it means that you're only paying attention to the most important thing and not, like, being spammed a bunch of things that are just ninety-nine percent yes.
- BCBoris Cherny
I think security is one of these things, like, you can talk about it, and then it, it's a totally different thing to actually do it correctly because it just doesn't always look the way that you think it's gonna look. And it's just all about, like, always red teaming, always pen testing, always looking, you know, always having a threat model and then using that to figure out, you know, how is this thing gonna get attacked? How are people gonna get prompt injected?
- CWCat Wu
Exactly.
- BCBoris Cherny
And I, I just feel like, like, the team is just, like, obsessed with this. And it, it, it's so important because as a result, I just trust the agent to run, and I can move on, and I can just have, like, a second agent. And if I didn't trust it, then I just wouldn't have been able to do that.
- CWCat Wu
And internally, um, to actually get auto mode out to our users, we needed to really trust it first. And so what we did was we collected thousands of transcripts of, like, an entire agent trajectory and a permission prompt and had auto mode classify whether or not it was safe. And it was extremely good at this, so then we got red teamers, and we asked them to try to prompt inject and try to hack, uh, the code base. And we used this to create evals and made sure that all of these were denied. And then we had our own internal teams try to prompt inject and hack Claude Code, uh, Claude Code's auto mode. And then we improved auto mode to make sure that we caught all of these. So it's not only just protecting you against the vulnerabilities that are out there in the wild today, but the, the most intelligent attacks that we can construct.
- BCBoris Cherny
Yeah, I mean, it's like, it's honestly, like, a weird approach. I, I, I feel like there's, like, all these features the last year where the first time someone pitched it, I was like, "Ah, no, no way. That's not gonna work." And I feel like over time, I've just learned, like, I'm actually wrong, like, so often now because, like, building on the model is so weird.
- CWCat Wu
Yeah.
- BCBoris Cherny
It's just, like, all this, like, engineering stuff that I've learned over the years, like, so much of it, I just have to, like, throw out. And this is just, like, part of what the job is now. Like, we're building on a new thing, and we just have to relearn it. And I-- auto mode was definitely one of these. I was like, the first time I heard it, I was like, "Route the prompt to a model? No way. That's not gonna work." And then it actually turns
- 10:24 – 11:06
Why loop is the next leap
- BCBoris Cherny
out empirically it works really, really well.
- CWCat Wu
I heard you also love loop.
- BCBoris Cherny
Um, I, yeah, I, I love loop.
- CWCat Wu
How do you use it?
- BCBoris Cherny
I think for loop, there's this transition that we went through, like, a year and a half ago where we were like, all right, there's source code, but actually the thing an engineer should interact with, maybe it's not the source code. Maybe it's the agent. And so we made this leap of, like, I don't write the source code. I talk to an agent, and the agent writes the source code for me. And I think right now what's happening is we're making the next leap. I don't talk to an agent anymore. I talk to a loop, or I talk to a routine, and it prompts Claude for me. And it's just, it's crazy. I mean, it's been like, it's a year and a half, and
- 11:06 – 13:30
How engineering orgs and responsibilities are changing
- BCBoris Cherny
this was like two big leaps. [chuckles]
- CWCat Wu
If you take, like, a step back, how are you seeing entire engineering orgs change?
- BCBoris Cherny
I'm gonna put, I'm gonna put on my business cat hat. I, I, I have this, like, favorite case study. This is, like, a Harvard Business Review from the nineties, and they were talking about, like, computers are here. Why are we not seeing the productivity benefits? And it's just this, like, amazing snapshot into, like, what it actually felt like at the time because, like, you know, people used to use mainframes. At some point, com-companies switched to personal computers. It was sort of a new thing, and companies were trying to figure out how to, how to use it, the same way they're trying to figure out how to use AI right now. And it turned out that to get the productivity benefits from computers, what you had to do isn't like you have your paper filing cabinet and your, like, paper and pen pro-business process, and then there's, like, a computer on the side that does something. Actually, what you have to do is you throw out the filing cabinet. You have to throw out all your paper and all your pens, and then you put a computer in the center, and everything has to run through the computer. It has to be at the center of every business process. And I, I feel like at Anthropic, we do this thing where when you onboard, you don't ask people questions. Like, no one asks me questions when they onboard. You probably have the same thing. They ask Claude. And this is kind of weird, like, this is the first company I've been at like that. And I feel like for us, Claude is just at the center of everything. Whenever I have a question, I ask Claude. Whenever I write code, I use Claude. Whenever I need a code review, Claude does it. Uh, whenever I need a security review, Claude does it. Whenever I need to, you know, fill out a form or something, Cowork does it. So it's just like Claude is at the center of everything, and I, I feel like the companies that are really figuring it out, and there's a bunch of them now, they're just putting Claude at the center of it.
- CWCat Wu
And I think for computers, the transition took 10 to 15 years. But actually for AI, because so much of our work is already, already digitized, and Claude can use a computer, and it can write code and run code, this transition's happening a lot faster.
- BCBoris Cherny
I think it's just, like, really... It's just really exciting. Like, I feel like now I don't have to bug people anymore, and when I interact with people, it's because it's, like, fun, and I get to collaborate with them on stuff, and we get to create something together. It's not that, like, I need them, I need something f- you know, from them, 'cause, like, Claude can actually do a lot of that stuff now. And I also feel like as an engineer, I've just never had this much fun doing engineering, 'cause the, like, the tedious part I don't have to do. Like, I'm just coming up with ideas. I'm talking-
- CWCat Wu
Yeah
- BCBoris Cherny
... to customers, and every idea, I don't, like, I don't have a to-do list anymore. Like, Claude just builds everything, and so my job is to come up with these ideas, and it's just so fun.
- 13:30 – 14:20
Is the future product or engineering?
- BCBoris Cherny
Okay, so here's a question. Is the future product or engineering? Like, is everyone gonna be a PM, or is everyone gonna be an engineer?
- CWCat Wu
Everyone's gonna be both. I, I feel pretty strongly that these roles are merging.
- BCBoris Cherny
Mm-hmm.
- CWCat Wu
Like, when we look at our team, our product team all writes code. Our DevRel team all writes code. Our design team all writes code. And then we look at our engineers, and a lot of them ship products end to end. They have an idea for what to build, they build it, they work with legal and marketing to figure out how we communicate this to the world and make sure it's safe and with security too, and a lot of times they just see through this whole process end to end. So I think right now, AI really benefits people who have a lot of curiosity, have a lot of product taste, who love to have this, like, end-to-end ownership. And now a lot of people are running,
- 14:20 – 16:05
Working with hundreds of agents: using agent view, voice mode, and Remote Control
- CWCat Wu
like, hundreds of agents. What are the products that you think people should be adopting as they transition from single to multiple to hundreds?
- BCBoris Cherny
Until recently, the way that I wrote code was I had, like, six terminal tabs with six Git checkouts of the same repo, and then I would just, like, tab between them. Now it's pretty different. I have, like, one tab. I use the new agent view that we just shipped. It's, like, so good, and I'm so glad that we took a while to iterate on it to make that really good. And I also use the desktop app because I don't have to fiddle with checkouts that way. It just, like, you know, it, it does the work tree cloning. Like, it, it creates the work trees for me. And the thing that I would not have expected six months ago is probably half my engineering now I do on my phone. So I just have [laughs] like, I have so many agents running that I just start from my phone. I use Remote Control, which is, like, amazing now, and, like, I'll start something on my computer, and then I'll just Remote Control in from my phone. And I'll just, like, walk around. I'll, like, get coffee, and then I'll check in on my agents, and maybe I'll start another agent. And sometimes I'm, like, talking to someone, and we come up with a new idea. I'll just start an agent on the spot. I'll, like, talk to it with voice mode and just have it build something, and I don't even have to go back to my computer anymore.
- CWCat Wu
I remember when you started doing this because you would actually leave work, have your computer on your desk open, plugged in, screen locked, and I just thought you would, like, come back to the office at some point to get your computer.
- BCBoris Cherny
[laughs]
- CWCat Wu
But then it would be, like, pretty late, and I was like, "Hmm, maybe he just, like, left it here by accident." And then it happened again the next day, [laughs] and it happened again the next day, and I was like, "Wait, this is so weird, 'cause you're landing PRs, so your computer is right next to me." And I remember you responding, and you're like, "Yeah, I'm coding from my couch."
- BCBoris Cherny
Yeah. [laughs]
- CWCat Wu
[laughs]
- BCBoris Cherny
That was the week that Remote Control got really good.
- CWCat Wu
Yeah.
- 16:05 – 17:17
From context engineering to context minimalism
- CWCat Wu
So another thing that users are asking about all the time is, how do you do context engineering, especially in a large enterprise?
- BCBoris Cherny
Right. This is a thing. You know, people used to talk about prompt engineering. They used to talk about context engineering. This is sort of matching where the model was at the time. Back in the days of Sonnet 3.5, you had to prompt engineer. Back in the days of Opus 4, you had to context engineer. But with the models of today, you don't do any of this. You give it the minimal possible system prompt, the minimal possible tools, and then you let the model figure it out. Like, you just have to give the model some way to pull in the context. I think that's the most important thing. How do you think about it?
- CWCat Wu
I see things very similarly. I'm a context minimalist, so my general philosophy is tell the model only what it needs to know and let it figure out the rest of it. Um, I think when you give the model too much context, it's kind of like you're micromanaging it, and sometimes the model knows a better way to get to the same outcome, and I personally prefer to give the model that freedom to do that. Um, and then in general, we're also making our harness more lean so that you have more room for your own prompts, um, and so that it follows your prompts better.
- BCBoris Cherny
There's all these different ways to use Claude now,
- 17:17 – 18:06
What's next for Claude Code
- BCBoris Cherny
but I feel like in a year it's gonna be a totally new set of things, and it's gonna be so surprising [laughs] if it's still these same things. 'Cause I, I think, like, we're seeing these giant trends happening right now. Agents are running for longer. They're more autonomous. Very rarely am I running one agent at a time. It's usually, like, a few agents or dozens or hundreds or thousands, and so, like, the form factor for that, it's gonna be really different than what came before. And I don't know what it's gonna be, and I, I think in a large part it's gonna be up to the team to figure it out. And this is, um, this is why I'm, like, so happy we run the team the, the way that we do, where everyone just comes up with ideas and everyone is able to think about the product. Everyone talks to users all the time. Because I don't think these ideas are gonna come from us. It's gonna come from the team.
- CWCat Wu
Totally, and from everyone in our community building with us. [upbeat music]
Episode duration: 18:07
Transcript of episode Hth_tLaC2j8
