Beyond the basics with Claude Code

The mechanics that separate basic Claude Code use from real leverage: CLAUDE.md done well, wiring tools in with MCP, packaging team knowledge as skills, and using auto mode safely.

May 22, 2026 · 47m

EVERY SPOKEN WORD

40 min read · 8,047 words

0:00 – 0:20
Intro
1. Speaker
  [upbeat music]
0:20 – 3:22
From agentic programming to agentic software engineering (talk framing)
1. Speaker
  Hey, everyone. Um, super excited to hang out with you for the next forty-five minutes. My name is Daisy Holman. Uh, I'm an engineer on the Claude Code team. Um, and I, uh, am gonna talk about beyond the basics with Claude Code. This is really, uh, a kind of, uh, next step. I really wanna talk about, more about agentic software engineering than agentic programming, if that makes any sense. This is more a talk targeted at software engineering environments and the kind of constraints that we run into in terms of customizing agents in those environments. So, uh, yeah, like I said, I work on Claude Code. I got the super awesome opportunity to get involved pretty early and have gotten to be involved in some really, really cool efforts, including plugins and agent teams. Um, and, uh, yeah, I come from a background in programming languages. I was once a chair on the C++ committee. Um, and I, I think a lot of the things that I was very interested in about programming languages really apply to agentic harness design. I think I'm very interested in making it easy for people to make, uh, their ideas into production regardless of how technical they are. Um, and I want software engineers to be a part of that too, that software engineering to be a, a part of a thing about ideas, and I think that, um, I think that Claude Code is one of the first times we've really been able to, to start doing that at scale. All right. So where we're headed. Um, we're mostly gonna be talking about ways to customize Claude Code. I think Claude Code itself works pretty well out of the box for very simple, uh, what I would call programming tasks. But, um, as you ratchet up the complexity and as you approach things that I would call software engineering tasks, you need to give it some knobs and whistles and, and some customization to make it work the way you want to. 
Um, we'll talk about how to think about the context window and, and how there's this analogy to software packaging and how that-- where that breaks down and, and how that works. Um, we'll talk about some of the key abstractions and plugins, and focus on which ones of them scale up to real large scale software engineering environments. Um, environments where you have hundreds or thousands or tens of thousands of engineers working on the same code base, and you need to disseminate information efficiently without filling up your context too quickly. Uh, and finally, I'm gonna, uh, if I have time, hopefully I will, uh, run through a few of the, the new things that we're doing with, um, Claude Code, the ways that we're starting to use it internally to develop Claude Code with Claude Code. Um, and, like, where we're, where we see the next year going or the next three months. The next year is who, who knows? Um, so
3:22 – 6:24
Why customization matters: access, knowledge, and tooling as the core needs
1. Speaker
  first let's talk about why would you need to customize an agentic harness in general. And I, I do wanna-- I do say agentic harness in general, right? I mean, obviously I work on Claude Code. I hope you use Claude Code. I hope you like Claude Code. But I really am also interested in this as an academic question as to how do you customize the generic idea of an agentic harness with information, with connectivity, et cetera. Um, so there's three things that-- three categories of things you really need, um, access, knowledge, and tooling, and I'm gonna kind of break down each of these. This is, if there's one thesis of this whole talk that I want you to take away here, it's that if Claude can't do everything you can do, it can't do your job with you, right? Your job as a software engineer at this point is to make little clones of yourself, so you can scale up your abilities and scale up your work, um, in, you know, across many agents, right? And if Claude can't get to the things that you can get to, and I'm not just talking about source code, I'm talking about Slack messages, um, et cetera. I have another slide about this afterwards, right? I'm talking about emails. I'm talking about understanding the why of your tasks and not just understanding the what, right? The s- the source code doesn't often explain the motivation for why you need to make a change, a- and typing that out in a prompt and is, is not something you always wanna do because that information's already somewhere. So out of the box, Claude just sees a repo and a shell, right? This is-- This works okay, fine for, like, zero to one projects that don't have any conventions, that don't have any built up, uh, technical debt over time, that don't have, um, a wide range of stakeholders that, that Claude needs to understand stakeholder concerns for. Um, and, and, you know, like, when Claude can own everything, it's not all that important to, to customize, to bring in information from different sources. 
But, you know, this kind of vanilla Claude Code is, is rarely enough to do high quality software engineering at very large scales. I think sometimes you can get away with it if you're working on, like, very leaf software. But especially if anyone depends on you, especially if you have external stakeholders of any kind, you need to give Claude the tools to understand those, those concerns, right? They're not always in the source code. Um, they're not always in the documentation, right? Um, and, and most of the work in professional software engineering, um, especially at very large scales, doesn't live in the actual source code. I've said this several times, right? We do design documents. We write emails to each other. We talk on Slack, right? This is a, a very important thing to keep in mind. This is why these zero to one projects work fine with no customization, but full-scale software engineering needs a lot of information. Um,
6:24 – 8:56
Access: connect Claude to the places decisions and signals live
1. Speaker
  so let's talk about how this-- the-- what, what you might need to give Claude access to. Team chat, very first one, right? Where are your decisions being made? If you can see the entire conversation, um, in a, in a Slack thread about why you decided to implement this thing, Claude can figure out, you know, why this strategy might not work or why this might be better than that, right? Um, I think often when people get frustrated that Claude is taking the wrong direction, there's some information in your brain that you got from somewhere that Claude can't access. And if Claude can access that, it's much, much more likely to, to jive with your brain, right? Um, CI and CD, absolutely critical. You should not be fixing CI failures yourself at this point in time. Uh, agents are very, very good at that and will very likely continue to be in the future. Uh, dashboards, when something goes down in production, you need to be able to pull in a lot of information very quickly, and the reality is that you're going to compete-- be competing with companies who are doing this agentically. And so you need to be able to do it efficiently, agentically, um, with accuracy, right? And, and in a way that you can trust. And, and Claude needs to be able to see the why, right? This comes back to Claude being able to see the why. Internal documents, uh, design docs, run books, all kinds of other things. I-- We have largely started recording our meetings or transcribing our meetings, and I will go right after the meeting and feed the meeting notes into Claude and say, "Is there any ho- low-hanging fruit from this meeting that you can address?" And I'll get two or three PRs per meeting. I strongly suggest you do that. Claude needs to know why you want to do things in order to choose the best pathway, in order to, to work with you as a colleague. Here's the tip that I give. If you try doing a full day of work without leaving the Claude Code terminal or the desktop or whatever you use, right? 
Every time you have to reach for another tool, every time you have to Alt + Tab to something else and copy paste into Claude, that's something Claude is missing. Write it down on a piece of paper, and then at the end of the day, make s- try and find ways to connect Claude to all of those things. It will work a lot, lot better than you think. Like, the, the gap is much bigger than you notice until you make all of the connections. Um, knowledge is another reason why we need to cu- customize
8:56 – 11:29
Knowledge: why fine-tuning isn’t the answer and ICL is the main lever
1. Speaker
  Claude. Um, like, we can't train your code base's conventions into the model, right? We can't train institutional memory into the model. Um, things that changed last week or things that are just yours, your internal vocabulary, your internal APIs. Um, fine-tuning doesn't really-- I always get questions about fine-tuning when it comes to this. Fine-tuning doesn't really work very well for this. I can talk a little bit more offline about why that is and what we're learning. There's some papers on this from late twenty twenty-five you might wanna look at about how, like, fine-tuning on specialized information can lead to more hallucinations. Um, and also, it's just not cost efficient, right? Frontier models are churning so quickly that by the time your company is done fine-tuning for something small-- Even if you're a big company, right, even if you're a very big company, um, fine-tuning on a model, the frontier model just isn't cost efficient at this point in time. So you need to do this all via, um, in context memory or in context learning, sorry. Um, you'll hear, hear the term ICL. Who's heard the term ICL before? Okay. Yeah. ICL means-- I-ICL is like a fancy word for when you want people to think you're smart, but you're actually just talking about text files. Um, I use it all the time. It's great. [laughs] Um, but seriously, right, uh, at scale, because of the bitter lesson, if you're familiar with the bitter lesson, right, general AI wins out over specialized AI in the long run. And we're really seeing that with frontier models now. And we can't realistically train anything into the model that's specific to your job or specific to your code base, right? So you have skills, you have tools, you have CLAUDE.md. You have, um, a lot of tools there to get better results, but that's all you have. You don't have a way to affect the weights of the model. So, uh, I mean, there's good and bad there, right? 
Like, you don't need to understand anything about the weights of the model in order to customize, uh, its behavior. Um, and, like, all of the things that you can do with the model are just text files. So it's, like, really, really easy to, to get started, and people kind of leave it there. They're like, "Well, that's probably good enough." But there's a lot of optimization you can do with in-context learning. Um, yeah. Uh, tooling. Tooling is the other thing I like to think about. Like, what does an IDE
11:29 – 14:31
Tooling: building the agentic equivalent of IDE feedback loops
1. Speaker
  for Claude look like, right? For humans, like, if you remember back two years ago when you used to, like, write code by hand, um, probably-- Is there anyone here who wants to admit they did not use syntax highlighting? I mean, that's, right, like, uh, all of us who have written code before professionally use syntax highlighting. Most of us probably use some sort of LSP, probably use some sort of code completion, um, all of those kinds of things. Claude has none of those out of the box, right? It has an edit tool, and it literally has to write the string it wants to replace verbatim and write the text it wants to replace it with, right? This is like we haven't even invented Vim yet. This is ed. Um, I hope none of you have ever had to use ed except by choice, but if you have, you know how hard it is to do any kind of real, uh, text editing. Um, what you want to be thinking about in these kinds of customizations is what does the agentic version of VS Code look like? What does the agentic version of code completion look like? What does the agentic version of red squigglies, I like to say, the, the, the little squiggly red lines when you type a variable wrong, right? Or when you give the wrong number of arguments to a function. The red squigglies, think about what they do to your brain, right? They kind of... They, they nudge you in a direction without completely stopping you, right? So you're like, "Wait, should I think about that again? Oh, no, I know that that variable is correct. I just haven't defined the function yet. I'm gonna go up and define it later," right? Um, and you can ignore them, but it reminds you to think twice, and we want something similar to that for agents. Post tool use hooks are perfect for this. I'm gonna get into hooks a little bit later for those of you not familiar, but, like, this is, this is the, this is the red squigglies for your agent, right? You can run linters. 
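[Editor's sketch] The "red squiggly" idea, an overridable nudge rather than a hard block, might look like the following in Python. It is a minimal illustration, not anything from Claude Code itself: the function name `advisory_check` is hypothetical, and a syntax check via the built-in `compile` stands in for whatever linter a team already runs.

```python
from typing import Optional


def advisory_check(path: str, source: str) -> Optional[str]:
    """Return a non-blocking reminder if the edited Python file fails to parse.

    Hypothetical helper: it never raises and never rejects the edit. It only
    produces text that could be shown to the agent after the edit happened.
    """
    try:
        # compile() parses the source without executing it.
        compile(source, path, "exec")
    except SyntaxError as err:
        return (
            f"Reminder: {path} does not parse "
            f"(line {err.lineno}: {err.msg}). "
            "Double-check the edit, or carry on if this is intentional."
        )
    # Nothing to say: no text is added to the conversation.
    return None


if __name__ == "__main__":
    print(advisory_check("ok.py", "x = 1\n"))
    print(advisory_check("bad.py", "def f(:\n    pass\n"))
```

The design choice that matters is the return type: a message or nothing. The edit itself always goes through, which is what makes this a nudge the agent can override.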
You can do-- We have-- We support LSPs in, uh, Claude Code, LSP connections. Um, all of that kind of feedback, like reminders to Claude. Here's another, a great example, a generated file, right? Like, you don't want to hard block Claude from editing a generated file, right? Maybe you do in your code base, um, but, like, certainly at the harness level we can't. You probably want to remind it that it's a generated file because maybe it has a good reason for editing the generated file just to, like, try something out and see if it works. And as long as it gets a reminder that this is a generated file and you shouldn't commit it to the code base and you shouldn't-- it's very likely to remember to revert those changes and put them in the right place or if it has forgotten, then it will stop and undo it usually, right? It's a red squiggly. It's exactly what we want, right? The, the fastest way to make your agent better at your code base isn't a smarter model, it's a tighter feedback loop. And, and the key here is that,
14:31 – 16:33
Tools that scale with intelligence vs. tools that compensate for it
1. Speaker
  like, most of these are scripts you already have because you had to set up environments for developers, right? You had to have a way for humans to edit your code base. So you already have all of the things you need. You just need to hook them up the right way. You don't need to reinvent the wheel here. You just need to give Claude the tools. Um, I also like to think of this in terms of two kinds of tools. Um, there's tools that compensate for a lack of intelligence, and there's tools that scale with intelligence, right? I think the re-- think of the red squigglies as the second one, right? They're a nudge, a reminder of something you might have forgotten, but they're an overridable nudge, right? If you know something special that means that the red squiggly is incorrect, you can keep going, right? Whereas if you were to, like, hard block Claude from ever writing a variable that's not, you know, ever using a variable that's not defined somewhere else, right? That would, I guess, in theory, lead to fewer mistakes, but that doesn't really scale up well with intelligence, right? It kind of makes Claude write code in a very specific order, and that isn't, uh, you know, isn't gonna lead to good results. I actually asked Claude for, uh, [chuckles] good examples of, uh, tools that we shouldn't give it, ones that compensate for lack of intelligence, and it said, "Oh, I don't like it when you take away my tools." [chuckles] Which I thought was cute. Uh, yeah, so I mean, to me, the, the AGI pilled approach here, right, is to think about tools that scale with intelligence. As the models get better, these tools get more useful to the model. Um, so I wanna thank-- I wanna go through a little bit of kind of the mechanics of how this, these customizations work, and I want to ground it in thinking about this context window. Um, so I-- this
16:33 – 19:36
Context window engineering: fixed budgets and the ‘Arduino running npm’ analogy
1. Speaker
  is a kind of Beyond the basics talk. I, I assume most of you are familiar with the concept of a context window. Um, yeah, I'm seeing a lot of nods. Good. Um, in Claude, uh, Opus four point seven, it's like one million tokens usually. Interesting thing here is that context windows aren't really growing. Like, if you look at the leading frontier models from a year ago, they were mostly one million token context models. There were some two hundred thousand token context models, and there's a lot fewer of those now, and there's a lot more of the million token. But, like, otherwise, like, the frontier of context windows hasn't changed and the models have gotten way better, right? The models from now versus the models from a year ago are a lot more-- there's a lot more change than just the size of the context windows, right? And the size of the context windows are remaining relatively constant. So you kind of have a fixed target for how you need to do your context engineering, right? Uh, like I said earlier, right, your tool that you have to make your output better is in context learning, and that means that everything, every customization you put into the model has to go into this context window in some form. We're gonna talk about more scalable ways to do that, right? You can't just dump your whole code base, you can't dump your whole wiki in there, right? You can't dump all of your internal docs into the, into the context window. You need to figure out better ways to get the right information in at the right time. Um, so yeah, the other way I like to think about this is that, like-- We have a really constrained amount of space to put a lot of information into, and it's kind of a unique problem. I like to say it's like trying to run npm on an Arduino, right? 
You've got a, a tiny bit of memory, and you've gotta figure out the very most important things to put in there, and you wanna put the smallest version of it that you can in there in order to leave enough room to, to do real work, right? If you, if you were to install packages willy-nilly on an Arduino, you're not gonna le- leave any room for your own code. It's kind of the same idea, right? You have to be very intentional about what you put into your context window. Um, don't pay for what you don't use, uh, which is, like, a, I think, a famous, uh, originally C++ quote, right? It's a zero overhead abstraction principle. It's not just a nice-to-have here, right? It's not like, well, we'll throw more compute power at it. Like, we are fundamentally at a limit or what looks like a limit, I mean, I might eat my words, but what looks like a limit of context window size, and it's not getting bigger, right? So, so you... The only way to get more efficient at putting information into the model is to get better at not paying for what you don't use. Um, there's one more thing I wanna talk about that kind of makes this even more complicated
19:36 – 21:37
Why caching changes the rules: KV cache and stable-vs-volatile prompt layout
1. Speaker
  and kind of makes the analogy fall apart actually. Um, like, the first thing I thought of when I saw this was like... this, this problem space was like, "Oh, well, we already know how to make caches," right? Like, y- the L1 cache doesn't hold all of memory 'cause it, it's very constrained, and, like, we just evict things we haven't used recently, right? Um, uh, the problem is that there's this other constraint that we have called the KV cache, and the KV cache pretty heavily determines how expensive it is to calculate the next token, right? So if you go and change something really early in the prompt, you're gonna end up paying for uncached tokens, which cost 10 times as much, for all of the rest of your context window, uh, after that change, right? So if you wanted to, like, only include a certain number of tools and then, like, evict one that hasn't been used and replace it with one that needs to be used sooner, you can't do that, right? You, you can't actually take anything out of this tools block without invalidating the entire rest of the cache. Um, and that's, that's actually a really hard constraint to work around, right? I think some of the early approaches to agentic customization did take a very, like, LRU cache approach, and I think it, um, I think it made more sense when we had 32,000 token context windows maybe and, and, like, all of the tokens were expensive no matter what and, like, KV caching wasn't as efficient. But, like, none of that's true anymore, right? These-- You should think of these tokens as cheap and these tokens as expensive, right? And you're gonna pay a whole lot for a lot of expensive tokens, uh, just to, like, save some context windows. So you, you really have to think about putting stable shared stuff at the v- very front and volatile, like, per task information closer
21:37 – 22:38
Plugin primitives under scale: evaluating MCP, skills, hooks, and agents for monorepos
1. Speaker
  to the end, right, so that you can evict it without too much cost. There's a, there's a lot of complexity here, and I don't think we're even close to, like, solving this problem. Um, but we spend a lot of time thinking about it. I mean, these are the kinds of things we think about day-to-day on the Claude Code team. Um, so let's look at these plugin abstractions. I really wanna look at this in the context of large scale software engineering, of monorepos where, like, each one of these I want you to ask, "What happens if I have 10,000 of them? What happens if I have 100,000 of them?" There are people, there are companies out there right now with tens of thousands or hundreds of thousands of skills in their monorepo, and they're really hitting a scaling boundary because of that, and I'm gonna talk about why. Um, but... Uh-oh. Is, is my... Oh, there we go. Slides are okay. We're good. Um, so yeah. The four plugin primitives that I really wanna examine in this light are, uh, MCP,
22:38 – 27:44
MCP at scale: great for public integrations, heavy for developer-internal workflows
1. Speaker
  skills, hooks, and agents. Um, if you've ever written a plugin, you may be familiar with some of these, uh, not all of them. There are other customization points in the plugin spec, um, that I'm not gonna talk about. A lot of this carries over. Um, a lot of this thinking carries over between these kinds of, um, customizations. But anyway, let's dive into MCP. Probably if you're in an advanced workshop, I would guess that you've-- you're familiar with MCP. You've heard of it before. It's been around for a while. Um, the biggest thing to know about it is that it was, like, designed in an era where, like, agents were much simpler or LLMs were much simpler, right? It was designed primarily, initially at least, to work with chatbots, right? And your chatbot is usually running serverless, um, or is usually not running in a container. It doesn't have access to files on your computer. It can't run commands. It can't use a CLI. Um, and it, it's a way to kind of inject more tools into the, into the context. Um, it has some nice properties, like it's transport agnostic. It mostly handles auth for you. It's, it's meant to be the thing that if your company wants to ship an integration with Claude, that integration that's shipped to the public from your company should probably be an MCP server, or at least at first should probably be an MCP server. There's probably some other things you can add on, but this is, like, the general public version. Um, but we're, in this talk, talking about professional software engineering environments, we're talking about large scale monorepos. We're talking about, uh, developers working together on the same piece of code, and how do we share customizations and share information across agents. Um, and like there's a... Oh, yeah, I said this about all the properties. Um, but like it assumes that it doesn't have a shell. Claude Code does have a shell, right? 
So if you already have a CLI, it doesn't make a whole lot of sense to wrap that CLI in MCP unless you're shipping it to non-technical customers. I think that's, that's kind of the rule of thumb, right? Uh, usually a skill that just tells Claude how to use the CLI is much easier to write up. And I think often when you're talking about like developer experience for your developers at a large company with lots of, of code that needs to interact with each other, um, you're almost always gonna be shipping skills, um, or customizing with skills and not MCP servers. Now, you still need to use other people's MCP servers, right? You still are gonna need to use an MCP server to connect to Slack at, at this point. You're still gonna need to use MCP servers to connect to email and all of those things I talked about at the very beginning. Um, let's talk about this. Does it scale? What happens if you have ten thousand of these and you wanna put them all... You know, you wanna have them all available to your agent? Um, it has to put the name, the description, and the schema in the system prompt. So for each tool so that Claude knows how to call the tool, right? Um, if you have even twenty servers with fifteen tools each, your-- most of your context windows starts to be, um, tool definitions. So it doesn't scale without help. Um, we have a new kind of approach to this called tool search that mostly works. It does exactly what you think. We put just the names in the system prompt, and then we tell Claude that it has a, it has a tool that it can use to search for tools to use, um, later down in the, the transcript. And if we-- if it finds a tool, it will give it the description and the schema at that point, so it's kind of lazy loaded. The problem is that, like, unless it's something very specific like Slack and the user mentions Slack, Claude's not necessarily gonna know that it needs to search for a tool. 
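[Editor's sketch] The arithmetic behind "twenty servers with fifteen tools each" can be made concrete with a small Python model that compares loading every tool definition up front against the tool-search approach, where only names sit in the system prompt. The per-tool token sizes below are invented placeholders, not measurements, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCost:
    """Assumed token sizes for one tool definition (placeholders, not measured)."""
    name_tokens: int = 10
    description_tokens: int = 150
    schema_tokens: int = 400


def eager_tokens(servers: int, tools_per_server: int, cost: ToolCost) -> int:
    """Every tool's name, description, and schema is in the system prompt."""
    per_tool = cost.name_tokens + cost.description_tokens + cost.schema_tokens
    return servers * tools_per_server * per_tool


def tool_search_tokens(
    servers: int, tools_per_server: int, cost: ToolCost, tools_looked_up: int
) -> int:
    """Only names are loaded up front; a description and schema are added
    later in the transcript for each tool that was actually searched for."""
    names_only = servers * tools_per_server * cost.name_tokens
    on_demand = tools_looked_up * (cost.description_tokens + cost.schema_tokens)
    return names_only + on_demand


if __name__ == "__main__":
    cost = ToolCost()
    print("eager:", eager_tokens(20, 15, cost))
    print("tool search, 5 tools used:", tool_search_tokens(20, 15, cost, 5))
```

Under these assumed sizes, 300 tools cost 168,000 tokens when loaded eagerly and 5,750 tokens when five of them are looked up on demand. The model leaves out the trade-off the talk describes: the less description text is loaded up front, the less likely the agent is to search for the tool at all.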
So things like edit tool, things like bash tool that are very generic, we end up usually having to put those in the system prompt with their schema directly. Um, and so it-- And the other thing is that, like, the more of the description you put into the system prompt, you can, like, collapse down this description, right? You can do various forms of this description, how much you wanna put or how little you wanna put. And the more of the description you put into the system prompt, the more likely Claude is to search for the tool. So it's like there's, there's not a free lunch here. It is a slightly less expensive lunch. Um, it also doesn't fix the problems with like setting up auth and process life cycle and all of the other things that go with MCP. If you're developing code in your source-- Like, if, if, if your user is a developer within
27:44 – 30:17
Skills: ‘lazy system prompts’ that help—but still carry token and triggering costs
1. Speaker
  your company and they already have access to your source code, right? Like, you probably can do most of the things already. You don't need to set up this whole auth life cycle to make sure the MCP works everywhere and all of the other things that are involved in that. So like using the CLI with a skill is, is a great way to tell Claude how to do things, um, especially for scripts that you already have. So speaking of skills, uh, who is familiar with skills? Like, who says thinks they could... Okay, everyone's familiar with it. You-- Um, the, the term I used to use, um, when describing it when it first came out is that it's like a lazy system prompt, right? It is a CLAUDE.md file with a miniature version that tells Claude when it should read that file, right? Um, there's a one-line description that goes in the front matter that ends up getting put into the system prompt. And, um, Claude has a tool that it can use to load the, the full SKILL.md and, like, get to all of the scripts and other resources that are in the directory. But fundamentally, a skill is just a folder. It's a folder with a markdown file in it that happens to have some sort of summary associated with it. Um, and so it's like really easy to set this kind of thing up in your repository. That can be good and can be bad. You definitely need to be careful about figuring out how to control the quality of skills in your monorepo, right? Because, because it's so easy to create a new one. Um, let's talk about whether it scales, right? Um, the, the body is pay per use, which is a good thing, right? But the description is always loaded, so you're always paying some fraction, some small fraction of that body in your system prompt. So it's not quite zero overhead in terms of abstraction. Um, reliably trigging-- triggering skills still sometimes takes up to a paragraph, something like three hundred, four hundred tokens sometimes. 
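[Editor's sketch] The "lazy system prompt" split, where the description is always loaded and the body is loaded only on use, can be shown in a few lines of Python. This is a simplified model, not Claude Code's loader: skills are held in a dict rather than read from folders, the front matter is parsed by hand, and every function name is hypothetical.

```python
from typing import Dict


def front_matter_description(skill_md: str) -> str:
    """Pull the `description:` value out of a leading `---` front matter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip()
    return ""


def always_loaded_index(skills: Dict[str, str]) -> str:
    """The part paid for in every session: one line per skill."""
    return "\n".join(
        f"{name}: {front_matter_description(md)}"
        for name, md in sorted(skills.items())
    )


def load_body(skills: Dict[str, str], name: str) -> str:
    """The pay-per-use part: the markdown body after the front matter."""
    md = skills[name]
    parts = md.split("---", 2)
    if md.startswith("---") and len(parts) == 3:
        return parts[2].strip()
    return md.strip()


if __name__ == "__main__":
    skills = {
        "js-style": "---\ndescription: Use when writing JavaScript.\n---\n# JS conventions\nPrefer const.",
        "rust-style": "---\ndescription: Use when writing Rust.\n---\n# Rust conventions\nRun clippy.",
    }
    print(always_loaded_index(skills))
    print(load_body(skills, "rust-style"))
```

The index grows by one line for every skill in the repository whether or not a session uses it, which is why the speaker calls skills "not quite zero overhead".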
Um, and like, again, the more you cut out of that description, the less likely it is to reliably trigger without the user explicitly saying something, right? Um, and no, skills don't have a defined way to do hierarchy yet. Um, you can't lazily expose sub-skills. We are working on this. Uh, stay tuned next couple of weeks. Hopefully, it'll be a really cool announcement about that. Um-- Yeah. So it kind
30:17 – 33:20
Hooks: true zero-token-overhead gating and the best ‘red squiggles’ primitive
1. Speaker
  of scales. Um, I think we had hoped when we started down that trail, that pathway, that it would scale better, but we also didn't really think ahead to this time period where monorepos would have 100,000 skills. It's just such a massive amount of information, and you really need actual zero overhead abstractions. Um, so speaking of which, hooks. Hooks can't do everything. They're not perfect, but they are an actual zero overhead abstraction here, right? We give you a bunch of different event types to, um, trigger on, and then we just call this script. We call a script that you give us, um, in a special way. There's like, uh, there's a, a JSON format that you pa- you can pass to it, and there's a JSON format that it'll pass back in order to determine whether or not it needs to insert something into the context window. You can look up all of that on the website. Also, Claude knows how to use them and create them very well. Um, fundamentally, no, they're not complicated, right? Something happens in the agentic loop, and it triggers something on your computer to run, and that thing runs and decides if it wants to insert something into the context window or not. So you can have 100,000 of these, and if you have a big enough computer and 99,995 of them, uh, don't trigger, don't match, or don't return any text to put into the context window, your only constraint is your computer, right? You've taken a very constrained resource and kind of blown it out into a much less constrained resource. And when you think about systems like this, that's, that's really what you wanna be looking for, the property you wanna be looking for, right? It runs outside the context window, so there's zero token cost. 
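[Editor's sketch] A hook of the kind described above, one that takes JSON in, optionally returns JSON, and returns nothing when it does not match, might look like this in Python. It uses the generated-file reminder from earlier in the talk as its example. The JSON field names (`tool_input`, `file_path`, `hookSpecificOutput`, `additionalContext`) are assumptions about the hook format and should be checked against the Claude Code hooks documentation; the path markers and function names are hypothetical.

```python
import json
from typing import Optional

# Hypothetical path fragments that mark a file as generated in this repo.
GENERATED_MARKERS = ("/generated/", "_pb2.py", ".pb.go")


def reminder_for(event: dict) -> Optional[dict]:
    """Decide whether anything should be inserted into the context window."""
    tool_input = event.get("tool_input") or {}
    file_path = tool_input.get("file_path", "")
    if not any(marker in file_path for marker in GENERATED_MARKERS):
        return None  # no match: nothing is returned, so no tokens are spent
    return {
        # Assumed response shape; verify against the hooks reference.
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": (
                f"{file_path} is a generated file. Edit its source instead, "
                "or revert this change before committing."
            ),
        }
    }


def handle(raw_event: str) -> str:
    """Map the hook's stdin text to its stdout text (empty means no output)."""
    response = reminder_for(json.loads(raw_event))
    return "" if response is None else json.dumps(response)


if __name__ == "__main__":
    sample = {"tool_name": "Edit", "tool_input": {"file_path": "api/generated/client.py"}}
    print(handle(json.dumps(sample)))
```

As an actual hook script, the last step would be something like `sys.stdout.write(handle(sys.stdin.read()))`. The early `return None` carries the scaling argument: all the matching work happens on the developer's machine, and a non-matching hook adds nothing to the context window.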
Um, I mean, again, like, if you have a JavaScript script skill and you're writing Rust, you still pay for this little description in the front that says, like, um, you know, "Use this skill when you're writing JavaScript," and then Claude has to, like, ignore that little description. Um, but if you have a hook that, like, type checks your JavaScript code and you're writing Rust, the hook runs, sees that it's not a JavaScript file, and then stops and doesn't return anything, right? Like, you don't pay for what you don't use. Um, it doesn't work for everything, right? Like, it's not the most AGI pilled thing, right? You, you end up doing things like parsing individual words or regexes out of the, um, out of the, uh, uh, commands or out of the tool calls or whatever. Um, and there are some limitations there. You can use, like, subagents to kind of decide whether or not you wanna inject something, but that starts to get expensive from a token perspective. So there's, there's, like, a lot of trade-offs here. Again, no free lunch, but, like, maybe a little cheaper. Um, this is where our red squigglies live, like I was talking about earlier, right? Um,
33:20 – 37:26
Subagents and ‘what not to do’: avoid unconditional CLAUDE.md injection; memory is different
1. SPSpeaker
  subagents, I'm gonna breeze through this a little bit 'cause a lot of the concerns are pretty similar. But again, the thing you wanna think about here, uh, subagents are structured as a description that goes into the system prompt, and then a, a system prompt for the subagent or a, a set of text, in context learning, um, for the, uh, for the subagent so that it can perform a specific task. And you only pay for those tokens in a separate context, right? So all you're paying for is the tool call of, i- in the main context, and the result from the subagent. Um, but the, the system prompt of the subagent goes into a different context, right? And by, by pay, I mean, like, in terms of cost in your context window, right? Like, obviously tokens aren't just free because you're using a subagent. But I, I'm talking much more about, uh, the challenge of, you know, splitting up this one context window that an agent can have. Um, right. So, like, an, an agent can read 50 files so that the main loop doesn't have to. So they're scalable in that way, but they still have the same problem that each agent's description still sits in the parent prompt and still has the same, like, one-liner text. So if you have 100,000 of these in a monorepo, you're still paying for, like, 100,000 one-liner descriptions. And we can sort of start to do better than that, just like with skills. We're kind of experimenting with a number of ways of doing this. Um, but it's, it's not perfect, right? Um, what are some things that aren't in this list? CLAUDE.md. So one of the first and still most frequent requests I get for plugins is, "Why can't I provide a CLAUDE.md file for my plugin? Why can't I provide a system prompt, a piece of the system prompt that goes unconditionally into when-- into the user's context when the plugin is enabled?" And, like, after this discussion, I think you can all see why I've pushed back on that so much, right? 
It-- not only is it an extremely expensive abstraction, but it looks super cheap, right? Like, if we allowed plugins to provide a CLAUDE.md file, every single plugin would provide one. Like, pretty much every single plugin would provide one. They'd be like, "Hey, you're also using this plugin," and then, like, a little bit of text. And that doesn't scale. It really doesn't scale. And it looks like it does, right? It, it looks so cheap 'cause it's just a single file. So what we do is that, like, if you really, really wanna do this, then you can return some text from a session start hook. And in that case, like, it's very clear that you are making the user pay something unconditionally every time. And it's, like, a super roundabout and kind of annoying way of doing this, but I think it's actually the right abstraction for building scalable ecosystems of plugins.
Um, memory is a different kind of animal here, right? I really want s- you to come away thinking about plugins as a context engineering primitive, right? Which is another way of saying text file, but, like, with more, like, funding. I don't know. Um, context engineering primitives are iterated on, they are evaluated, they are not things that are made on the fly by, uh, an agent in the background, right? And memory has its place. It's kind of low quality, low cost, um, short-lived information, right? Uh, whereas these plugins, we want you to think of as a way to manipulate the context into giving you better results. So memory doesn't really fit into this, this, uh, category either. Okay. That was a little bit of a whirlwind. Um, but I hope that part was helpful. I'm gonna dive into a few more things here about, like, where we see all of this going, and, like, how do we use Claude Code on the Claude Code team, and, and what do we see happening
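The scaling concern raised in this chapter for one-liner descriptions and unconditional CLAUDE.md text can be put into rough numbers. A minimal sketch, where every per-item token count is a made-up assumption chosen only to show the shape of the comparison, not a measured figure:

```python
def always_on_tokens(count, tokens_each):
    """Tokens that sit in the main context whether or not they are used."""
    return count * tokens_each


def hook_tokens(outputs):
    """Hooks only cost what they actually return; silent hooks cost nothing."""
    return sum(outputs)


# Assumed figures, for illustration only.
N = 100_000                 # skills, subagents, or plugins in a large monorepo
DESCRIPTION_TOKENS = 20     # one-liner description per skill or subagent
CLAUDE_MD_TOKENS = 200      # "a little bit of text" per plugin, injected unconditionally

descriptions = always_on_tokens(N, DESCRIPTION_TOKENS)
unconditional = always_on_tokens(N, CLAUDE_MD_TOKENS)
# 99,995 hooks stay silent; 5 return roughly 50 tokens each.
hooks = hook_tokens([0] * (N - 5) + [50] * 5)

print(f"one-liner descriptions: {descriptions:,} tokens")
print(f"unconditional text:     {unconditional:,} tokens")
print(f"hooks that matched:     {hooks:,} tokens")
```

Under these assumed numbers the always-on options grow linearly with the size of the repository, while the hook total depends only on how many hooks actually return text.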
37:26 – 42:00
How Anthropic uses Claude Code: asynchrony, parallelism, and worktrees as the foundation
1. SPSpeaker
  going forward. Um, the one big theme of all of this is gonna be, like... Well, I guess two big themes of all of this are gonna be asynchrony and parallelism, right? Asynchrony, where you can walk away from the computer, let it work for a while, and come back, and, um, parallelism, where you, you really wanna be doing multiple of these things at a time. And the combination of those two things means that you just really are gonna have to get good at context switching. And I hate it, and I know a lot of software engineers hate it. I was the programmer's programmer. I used to give talks on C++ meta programming, right? Template meta programming. And I loved to get into flow state for eight hours, and, like, look up, you know, at the clock later and be like, "Oh my gosh, it's after midnight. I can't believe it." But that's just, like, if you want to do, uh, high quality, high efficiency engineering these days, it's-- your workdays are likely not gonna look like that. Um, and so, like, figuring out ways to get yourself to be efficient in your context switching is a really important part of this. Worktrees are one of the simplest ways, like baseline ways, to onboard into this. If you're not familiar with Git Worktrees, which I think a lot more people are at this point because of agentic coding, um, they're basically just different checkouts of the same repo on your machine. There's some special fanciness in there where it, like, Git does some, like, lazy symlinking, and then the symlink gets replaced when you edit or something like that, so that you don't use up too much disk. But basically, they're just different checkouts in different folders, uh, of the same repository. Uh, and this-- If you put a different Claude Code instance on each, um... You put a different Claude Code instance on each Worktree, then you get, get them to not step on each other, right? This is just like the way that you work with colleagues back in the day when you wrote code by hand. 
You each had a different checkout. You were working on different things. Um, Worktrees are the same, but you are now one level up as a technical lead of multiple Claudes. Um, one thing that I find that helps me with context switching, uh, is to rename my sessions and change the color. Um, color actually does trigger memory pretty efficiently, right? I kind of think of this as the, like, syntax highlighting for humans in the agentic era. Um, slash color is a, a super efficient way for me to very quickly kind of click my brain into what I was doing in that session. Um, rename also helps with that, especially if you're color blind. Um, rename is-- will, will get you some of that too. Um, but it's a really easy way to quickly remember what you were doing as you switch between sessions. Like, the, the more you can cut down on that context switching time, the more efficient you're gonna be. So this is what my actual setup looks like. I usually... I have a whole bunch of kind of permanent long-lived, um, Worktrees that all track upstream main. Um, and then, like, these are the two Anthropic monorepos, because two monorepos is the way that monorepos work at most companies. Um, and then here's the Claude Code repos, uh, because Claude Code is now not in the monorepos because monorepo engineering is hard. Um, anyway, but I have a whole bunch of checkouts, and I have persistent agents. I've just recently switched to doing, like, even longer-lived agents that kind of own their own directories. Um, the one that make, made this presentation was very-- wanted to identify itself as, as Agent N. Um, but they're all checked out as separate Worktrees. They all track upstream main, right? But because they're different Worktrees, they have to have different names. That's just how Git Worktrees work. I don't know why. But, um, they're, they're differently named branches that all track the upstream main. I found this workflow to be, like, really efficient. 
You don't have to-- If you-- If these Worktrees are long-lived, you don't have to, like, run NPM init or Cargo init or whatever you have to do at the very beginning of checking out the Worktree. You don't have to make all the symlinks. It's just kind of long-lived. Um, yeah. Uh, and keeping, uh, each agent keeping its identity is kind of important here. Um,
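The long-lived worktree setup described above can be sketched as a small helper that builds one `git worktree add` command per agent. The repository name, folder layout, and branch naming scheme here are hypothetical; `git worktree add`, `-b`, and `--track` are standard Git. Each worktree gets its own branch name because Git will not check out the same branch in two worktrees at once.

```python
import subprocess


def worktree_add_command(agent, repo="claude-code", upstream="origin/main"):
    """Build the git command for one long-lived worktree on its own branch."""
    branch = f"{agent}-main"           # hypothetical naming scheme
    path = f"../{repo}-{agent}"        # sibling folder next to the main checkout
    # `-b` creates a new branch starting at the upstream; `--track` sets its upstream.
    return ["git", "worktree", "add", "--track", "-b", branch, path, upstream]


def create_worktrees(agents, dry_run=True):
    """Return the commands; run them only when dry_run is False."""
    commands = [worktree_add_command(agent) for agent in agents]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in create_worktrees(["agent-a", "agent-b", "agent-n"]):
        print(" ".join(command))
```

The default `dry_run=True` only prints the commands, so the sketch can be tried outside a repository; running one Claude Code instance per resulting folder is what keeps the sessions from stepping on each other.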
42:00 – 46:33
Multi-Claude coordination and automation: send-message, /loop, permissions mode, agents view, remote control
1. SPSpeaker
  Claudes that talk to each other. So this is something we released in some form back in January. Um, and we are working on improving it more and more. We're giving-- We're relaxing the constraints on send message tool literally, like, as we speak. Um, Claudes can send messages to other Claudes. Eventually, it should be any other Claude. Uh, with your permission, any other Claude, uh, running on the same account, uh, should be able to talk to each other. And this is really, really helpful if you have one that's working on something and you need to get it to explain, um, something to another one, right? Remember how I was saying, like, all of the places you work need to be accessible to Claude? Like, one of those places that you work now is another Claude. And so the Claude that's over here working on something needs to have access to the information from that conversation in some form. Again, probably with your permission. There are very many valid re- re- reasons to keep these things separate if you wanna do redundancy, if you wanna do testing or all kinds of other things. But by default, they should be able to talk to each other. Um, yeah. So it... Yeah, the, the send message tool. Um, /loop we recently launched, is super, super helpful. Um, it'll just-- it literally just runs a prompt every fixed interval of time. So every 10 minutes it'll run this prompt. Um, and, uh, yeah, uh, Claude has the ability to turn it off when it knows the prompt is no longer relevant. It... Literally the internal name for the tool is cron tool, and /loop is just a text command that tells Claude to use cron tool. Um, babysitting PRs with /loop, super, super useful. Like, really, really helps you, um, pipeline a lot better. It helps you parallelize your work a lot better. You kind of... Once it gets to CI, even if your CI takes two hours to run, you can just leave it for the next, you know, day and a half, and it'll fix all of the CI bugs. Uh, it's really been a game changer for us. 
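The behavior described for /loop (run a prompt on a fixed interval, stop once it is no longer relevant) can be sketched as a plain scheduling loop. This is only a conceptual model, not how the cron tool is implemented; `run_loop`, `still_relevant`, and the simulated PR are names invented for the example.

```python
import time


def run_loop(run_prompt, still_relevant, interval_seconds, max_runs, sleep=time.sleep):
    """Run `run_prompt` every `interval_seconds` until it is no longer relevant.

    Returns the number of times the prompt ran.
    """
    runs = 0
    while runs < max_runs and still_relevant():
        run_prompt()
        runs += 1
        sleep(interval_seconds)
    return runs


if __name__ == "__main__":
    # Simulated PR whose CI needs two fixes before it goes green.
    failing_checks = ["lint", "unit-tests"]

    def babysit_pr():
        print("fixing", failing_checks.pop(0))

    # `sleep` is stubbed out so the demo finishes immediately;
    # a 10-minute interval would be interval_seconds=600 with the real time.sleep.
    total = run_loop(babysit_pr, lambda: bool(failing_checks), 600, max_runs=10,
                     sleep=lambda seconds: None)
    print("ran", total, "times")
```

The `still_relevant` check plays the role of Claude turning the loop off, and `max_runs` is an extra safety cap added for the sketch.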
Um, yes. Permissions mode. Who uses auto mode? Okay. Should all be using auto mode. Uh, unless you're using dangerously skip permissions, which I'm definitely not recommending. But this is basically, uh, not dangerously skip permissions, right? It's, um, it, it has a whole bunch of other infrastructure around it. I think we put out a blog post about it, but it's, there's basically a, um, a classifier agent, and then another agent that's, like, adversarially checks the tool call to make sure that there's nothing, um, bad happening. It has a lot of instructions. It's basically no more per- permission prompts. This is what makes loop usable. This is what makes agent teams usable. This is what makes overnight work usable. It's a little expensive. Uh, it can be on the order of 30% to 40% more because you are, like, using quite a lot of extra tokens. Um, I don't actually know the number off the top of my head, so don't quote me on that. It could be less than that now. Uh, we're working on getting it down. Um, Claude Agents we just launched. I know I'm running out of time, but give me, like, two more minutes. Um, Claude Agents is one place where you can see all of your agents that are running. It has a little classifier that, that, like, moves them around as they get into different states, right? Um, and it'll show which ones are working, show which ones are blocked. You can send prompts directly to it from this one session. You can hit enter to jump into the session. You can peek in the session. You can send prompts to start a new session, all from this one view. Um, it's actually really, really, um, impressive. It works really well. The engineer that put this together went through, like, 1,000 PRs in the past month, um, using this to, to build on itself basically. Um, so, like, there's, there's some really high quality speed ups here, uh, in terms of your context switch- switching latency. Remote control. 
If you're not using remote control, you should absolutely use remote control. It's fantastic. It shows things on your phone. It also sh- shows things on Claude Code Desktop. Um, it is a great way to do a 30-second check-in after dinner to make sure that your agents are still running overnight and aren't stuck
46:33 – 47:18
Closing takeaways: give access, mind the box, and choose abstractions that scale
1. SPSpeaker
  on something dumb. Um, so three take-homes. Uh, give it access. Mind the box. I didn't come up with that, Claude did. Um, but yeah, I mean, think about your context window. Um, and pick abstractions that scale. Think about what your plug-ins are going to look like when you have 100,000 lines, 100 million lines in your monorepo, 100,000 skills. Um, and I'll take questions. I'm over time, so I'm gonna take questions off stage. But I just wanna thank, uh, everyone for coming and everyone for listening and, uh, hope you enjoy the rest of your, uh, Code with Claude. [upbeat music]

Episode duration: 47:19


Transcript of episode tuY2ChJIx48


