Stop babysitting your agents

Routines turn Claude Code into a proactive teammate that reads your repo and opens a PR before you've opened your laptop. You'll see one built end to end, learn the trigger, context, and steering decisions behind any routine, and leave one /schedule command away from your first.

May 20, 2026 · 37m

EVERY SPOKEN WORD

  1. SP

[on hold music] Good afternoon, everybody. My name is Sid Bundusaria, and I'm one of the founding engineers of Claude Code. Today I'm excited to talk to you about how you can stop babysitting your agents. As models have gotten smarter, I've noticed that we spend an increasingly large share of our time staring at the screen waiting for Claude to finish its work, or acting as a glorified QA tester for Claude. That can be quite unsatisfying, and it's an inefficient use of your time. My goal for this talk is to give you strategies to take back some of that time so that you can manage your agents better. You can also think of this as a more advanced Claude Code talk, a Claude Code 301 university class. Because of that, there are some prerequisites, some table stakes that everyone here should have at least heard about, if not implemented for your own projects. The first is a very high-quality CLAUDE.md file. This is the single highest-leverage thing you can do to improve your Claude Code experience, so if you haven't done it yet, I highly encourage you to try it. The second is connecting your tools to Claude Code. A good rule of thumb is that if a tool is useful to you day to day, it will also be useful for Claude: Slack, Asana, Linear, Datadog, BigQuery. All of these help Claude stitch together much richer context for itself, and it performs much better when you give it access to them. The third is setting up your remote environment on Claude Code on the web. This decouples the compute that runs your Claude Code sessions from your laptop.
So you can close your laptop, your laptop could die, you could spill water on it, and your Claude Code sessions will keep going because they're running in the cloud. I'd love to see a show of hands. How many people use Claude Code every day? Okay, that's almost everyone. How many people have done the first two things here, a high-quality CLAUDE.md and connected tools? Okay, about fifty percent, I'd say. And how many have done all three? Okay. If you haven't raised your hand at all, don't worry, you'll still get value out of this talk, but I'd encourage you to start with these three things. So why does your tooling need to change? Most software tooling so far was built with humans in mind. Linters, IDEs, Prettier, type checkers, even compilers were mostly written to make humans and human teams faster. The problem now is that humans aren't writing most of our code anymore; agents are. So we have to step back, zoom out, and reconsider our tooling. When you do that, there's some good news and some bad news. The good news is that many of the tools we've built for ourselves translate pretty well to agents. Claude and other agents can use things like Prettier, linters, and symbol servers quite effectively, and those tools serve them well. The bad news is that we also have blind spots. As human beings, we make assumptions about our tooling and our toolchain that Claude doesn't share. That's why it's important to ask: what does an agent need from your codebase that a human takes for granted?
I'd love for you to keep that question in mind for the rest of the talk, because it frames the goal of not babysitting your agents much more clearly. Here's our roadmap for today. We'll talk about three distinct things that build on top of each other. Taken together, they become incredibly powerful and give you a set of tools that let you work in a way we as human beings just haven't worked before. First, verification: how to teach Claude to check its own work. Once Claude can check its own work and is more reliable, we can run many Claudes at the same time and be confident they'll do the right thing, so second, we'll talk about strategies for multi-Clauding, or parallelizing your work. Finally, we'll end with background loops. Background loops take your keyboard completely out of the hot path: your keyboard is no longer the bottleneck, and Claude keeps running in the background in a loop, doing useful work for you. I'd like to start the verification section with a minute-long brainstorm. Think about the last software project or feature you worked on. While you were working on it, how did you check your own work? I don't just mean how you checked the final output, but also how you iterated in a way that gave you confidence you'd end up where you expected to go. Let's take 30 seconds. If you have pen and paper in front of you, feel free to jot this down.
If you have a laptop and want to put it in your notes, let's take 30 seconds together and think about your last project and how you verified your work there. Okay, I see the typing slowing down, so hopefully you've had a chance to think about it a little, and it's okay if you haven't completely. I've found that most software engineering tasks can be broken down into the series of steps you see on the screen; some combination, sequence, or subset of them lets you check your own work and build software. You start by designing and writing code. Then you usually build your code, running your compilers, type checkers, and so on. If they fail, you go back, change your code, run them again, and repeat in a loop. Then you might run your executable, whether that's a Docker container, a CLI application, or a web server. Then you might check for side effects. If you're running a web server, you might open your browser and see whether the UI elements show up in the right place. You might look through your logs to see whether a specific log line you expect is present, or check the database to see whether state has been updated correctly. Hopefully you then run your unit tests to make sure you haven't introduced regressions and your feature hasn't broken some other feature, and hopefully you also add a new unit test for the thing you're working on. Finally, you deploy to staging, or, if you're really brave, straight to production. That's usually how humans verify their work and build software.
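
The human playbook above is essentially an ordered series of checks in which an early failure sends you back to the code. As a minimal sketch, assuming each stage can be expressed as a callable that returns whether it passed plus a short message, it might look like this; the stage names and messages are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A stage returns (passed, detail). In a real project each stage would wrap a
# command such as a build, a type check, or a test run (hypothetical here).
StageFn = Callable[[], Tuple[bool, str]]


@dataclass
class Stage:
    name: str
    run: StageFn


@dataclass
class PipelineResult:
    passed: bool
    failed_stage: Optional[str]
    detail: str
    completed: List[str]


def run_pipeline(stages: List[Stage]) -> PipelineResult:
    """Run stages in order and stop at the first failure, as a human would."""
    completed: List[str] = []
    for stage in stages:
        ok, detail = stage.run()
        if not ok:
            # Name the failing stage so the next code change can target it.
            return PipelineResult(False, stage.name, detail, completed)
        completed.append(stage.name)
    return PipelineResult(True, None, "all stages passed", completed)


if __name__ == "__main__":
    stages = [
        Stage("build", lambda: (True, "compiled")),
        Stage("run", lambda: (True, "server started")),
        Stage("check side effects", lambda: (False, "expected log line missing")),
        Stage("unit tests", lambda: (True, "all passed")),
    ]
    result = run_pipeline(stages)
    print(result.passed, result.failed_stage, result.completed)
    # prints: False check side effects ['build', 'run']
```

In a real setup, each stage might wrap `subprocess.run(...)` and map the return code to `passed`; which commands to run depends entirely on your project.
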
What's interesting is that Claude can use the exact same playbook quite effectively to verify its own work and build software. So for the rest of this presentation, it helps to think about teaching Claude to do things the way you would do them. The only requirement is giving Claude the right tools and instructions to make that possible. So we've talked about how humans do verification and how Claude should, in theory, do verification. But loops are what make the whole thing go round, and this is arguably the most important slide in this presentation. If you haven't been paying attention yet, this is a good time to start. A loop is essentially an autonomous circuit that you complete for Claude, and it lets Claude hill climb on a given task or success criterion. Think of it as giving Claude access to tools that let it write code and verify its own work. Claude writes some code and checks whether there's a failure. If there is, it debugs that failure and writes more code, and it keeps doing that again and again until it reaches a success state. When it finally gets there, you can be confident that the PR it's sending you is higher quality and will actually work. In the image on the screen, I recently hit an issue where the sign-up button on my personal website stopped working. I told Claude to make the sign-up button work, and this is roughly what it did. There are more steps, but for brevity's sake: it started writing some code, built my app, opened a browser, clicked my sign-up button, and saw that clicking it didn't really do anything.
It didn't take you anywhere. So it decided to read some logs and found the problem. It fixed the code, reloaded the app, and kept going until it reached a successful state. What it finally came up with was a PR that indeed worked. The most important takeaway from this slide is that, wherever possible, our goal now is to get Claude into a loop by giving it the tools and instructions it needs to work effectively. Verification comes in many flavors. We talked about UX verification, but there's also back-end verification, and you may want to verify your entire app end to end, including infrastructure. The core concept stays the same: give Claude the tools and instructions to get it into a loop. Once you figure that piece out, all three flavors essentially merge into one. You don't have to be very specific in your instructions; as long as Claude has the right tools and instructions, it will be able to verify all of these things. We've talked a lot about theory, hypotheticals, and jargon, so I wanted this slide to be more concrete. What does it actually mean to give Claude the instructions and tools to go in a loop? It usually boils down to four things, and I'll go through the front-end, or UX, section of the slide. The first is to run your application. For a front-end verification loop, this might mean running your dev server, with npm run start or whatever your dev server command is. Once the dev server is up, you want Claude to actually use the app, and it does that by opening a browser.
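
The write, check, fix cycle described above can be sketched as a bounded loop. This is a minimal illustration, not how Claude Code implements it: `check` stands in for whatever verification tools the agent has (build, browser, logs), `attempt_fix` stands in for the agent editing code, and the `max_fixes` cap is an assumption added so the loop cannot run forever.

```python
from typing import Callable, List, Optional


def verification_loop(
    check: Callable[[], Optional[str]],
    attempt_fix: Callable[[str], None],
    max_fixes: int = 10,
) -> List[str]:
    """Alternate check -> fix until `check` reports success.

    `check` returns None on success, or a description of the failure.
    Returns the failures that were addressed, in order. Raises RuntimeError
    if success is not reached within `max_fixes` fix attempts.
    """
    history: List[str] = []
    failure = check()
    while failure is not None:
        if len(history) >= max_fixes:
            raise RuntimeError(f"still failing after {max_fixes} fixes: {failure}")
        attempt_fix(failure)  # debug the failure and write more code
        history.append(failure)
        failure = check()  # re-verify after every change
    return history


if __name__ == "__main__":
    # Simulated app with two bugs between us and a working sign-up button.
    bugs = ["sign-up click handler is missing", "redirect URL is wrong"]
    print(verification_loop(
        check=lambda: bugs[0] if bugs else None,
        attempt_fix=bugs.remove,
    ))
```

Re-running `check` after every fix is the important design choice: a change only counts once it has been verified, which is what lets the loop end in a success state rather than a guess.
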
My personal MCP tool of choice for this is the Claude in Chrome MCP tool. You can access it with /chrome if you're using Claude Code. You can also use Playwright or one of the many other browser-control MCPs. Once Claude can drive your browser, the next step is to prove that something works. If it's working on a fix, you want it to take a screenshot before and after the fix and make sure the app ends up in the right state. The last step is unblocking it. If you've ever tried to build a verification loop for a production app, you'll quickly find that you run into blockers. Common ones are auth and state. Auth means giving Claude an identity it can use to log in to your web application so it can actually use your app. State means you may want to pre-configure some data. For example, if you have an e-commerce store, you may want to populate the store's inventory so Claude can use your app meaningfully. This isn't very novel. In traditional software engineering, when you write end-to-end tests, state setup scripts like these are quite common. The only difference is that you want to give Claude access to these scripts and make them dynamic. You don't want to be too prescriptive about what they do, and that lets Claude do a much wider variety of things than you could with static scripts. So now we know what a verification loop is and how to write one. How do you package it? How do you distribute it to your colleagues, your coworkers, even your future self? One of the best ways is to use a skill.
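
The state setup scripts mentioned above work best for Claude when they take parameters instead of hard-coding one scenario. Here is a minimal sketch, assuming a hypothetical e-commerce schema stored in SQLite; the table names, columns, and test-user email are illustrative, not from the talk. Because item count, categories, and stock level are arguments, Claude can set up an empty store, a sold-out store, or a large catalog without a new script for each case.

```python
import sqlite3
from typing import Sequence


def seed_store(
    conn: sqlite3.Connection,
    n_items: int,
    categories: Sequence[str] = ("books", "games"),
    stock: int = 10,
    test_user: str = "claude-test@example.com",
) -> int:
    """Seed a hypothetical store: one login identity plus n_items products.

    Every knob is a parameter, so one script covers an empty store (n_items=0),
    a sold-out store (stock=0), or a large catalog. Returns products inserted.
    """
    if not categories:
        raise ValueError("need at least one category")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT NOT NULL,
            stock INTEGER NOT NULL
        );
        """
    )
    # Auth: an identity the agent can log in as (password handling omitted).
    conn.execute("INSERT OR IGNORE INTO users (email) VALUES (?)", (test_user,))
    # State: spread products across categories round-robin.
    rows = [(f"item-{i}", categories[i % len(categories)], stock) for i in range(n_items)]
    conn.executemany("INSERT INTO products (name, category, stock) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed_store(conn, n_items=5, stock=0)  # a sold-out store
    print(conn.execute("SELECT COUNT(*), SUM(stock) FROM products").fetchone())
    # prints: (5, 0)
```
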
You can think of a skill as a way to store arbitrary context about a specific topic, and in this case that topic happens to be a verification loop. Another interesting thing about skills is that you can make them self-improving. If you add instructions to your skill telling Claude to improve the skill every time it hits a blocker, you end up with a self-documenting, self-improving skill that everyone on your team can contribute to, not just you, and that makes it really powerful. This is actually how we do verification on the Claude Code team. We have a single verification skill, and it's explicitly told to keep documenting itself. Every time someone runs into a blocker, the skill goes back and edits itself, so the next time you or a colleague run into the same issue, it's not a problem. Okay, we're going to jump into a demo next, but first I want to talk about the application I'm going to use. There's a typing test application called MonkeyType. How many of you have heard of MonkeyType? Okay, I thought so; it's a niche community. It's basically a typing tester: it shows you a bunch of words, as you can see, and you have to type them as accurately and as fast as possible, while the application tracks your stats for you. I like it as a demo app because it's representative of a real-world full-stack app. It's written in TypeScript, with an Express back end and MongoDB and Redis as persistence layers. And it's open source, so you can go to monkeytype.com right now, and you can even check out the source code if you want. In this demo, we'll create a verification loop live. We'll tell Claude to spin up a new dev server.
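
One way to picture the self-improving skill described above: the skill file carries a section of known blockers, and whenever Claude works around a new blocker it appends an entry, so the next run starts with that knowledge. In practice Claude edits the file itself; this Python sketch only illustrates the bookkeeping. The section heading, the entry format, and the sample skill text are assumptions, not the Claude Code team's actual skill.

```python
BLOCKERS_HEADING = "## Known blockers"


def record_blocker(skill_text: str, blocker: str, fix: str) -> str:
    """Return skill_text with a blocker/fix entry under BLOCKERS_HEADING.

    The heading is created if missing, and an identical entry is never added
    twice, so repeated runs that hit the same blocker keep the file stable.
    Assumes the blockers section is the last section of the file.
    """
    entry = f"- {blocker}: {fix}"
    if entry in skill_text.splitlines():
        return skill_text  # already documented
    if BLOCKERS_HEADING not in skill_text:
        skill_text = skill_text.rstrip("\n") + f"\n\n{BLOCKERS_HEADING}\n"
    return skill_text.rstrip("\n") + f"\n{entry}\n"


if __name__ == "__main__":
    # Hypothetical skill text with YAML frontmatter and one step.
    skill = (
        "---\nname: demo-verification\n"
        "description: Verify changes in the local app\n---\n\n"
        "1. Bring up the stack.\n"
    )
    skill = record_blocker(skill, "login wall", "sign in as the seeded test user")
    skill = record_blocker(skill, "login wall", "sign in as the seeded test user")
    print(skill)  # the entry appears once
```
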
We'll tell it to use the Chrome MCP to check some of its work. Then, once we've created the verification skill, we'll build a new feature and ask Claude to use the verification skill to verify itself. So let's get started with the demo. Let's switch over to my laptop screen. Okay, this is a brand-new Claude Code session. I've already done the homework of setting up MonkeyType locally. I've also installed some dependencies and curated a CLAUDE.md, because I didn't want to do that in front of you and waste your time. So let's tell Claude to spin up the dev server. Okay, it says the dev server is already running, and that's right, because I started it right before our talk. Let's check out what's on the front end. If we go here, MonkeyType opens up. I can start typing, and a little timer shows up. I'm not very good at typing, so there are a lot of typos, but it's essentially what I'd expect. Let's also check out the back-end link. This just returns some JSON, which basically means the back end is up and running, which is good. Next, I'm going to make sure my Chrome MCP is enabled, and the way you do that is with /chrome. As you can see, it says "Status enabled, extension installed," which is exactly what we're looking for. If you don't have it installed, it'll take you to the setup guide, and you can install it yourself. Now I'm going to say, "Use the Chrome MCP to make sure that the front end is working. Make it quick, please." Okay. What we should see now is that this is the tab Claude is using, and it should call the Chrome MCP tool. If we go back here, we can see two Chrome MCP tool calls. I can press Ctrl+O and see exactly what it did.
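
Before handing the browser to Claude, it helps to confirm the stack is actually up, which is what the back-end JSON check in the demo does by hand. A minimal sketch of a readiness poll using only the standard library: the URL in the example is a placeholder (the demo doesn't show the back-end address), and `fetch`, `sleep`, and `clock` are injectable so the polling logic can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str, timeout: float = 2.0) -> Optional[object]:
    """Return the decoded JSON body, or None if the server isn't answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, URLError, and timeouts;
        # ValueError covers a body that isn't valid JSON (yet).
        return None


def wait_until_ready(
    url: str,
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    fetch: Callable[[str], Optional[object]] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> object:
    """Poll `url` until it returns JSON; raise TimeoutError after deadline_s."""
    start = clock()
    while True:
        body = fetch(url)
        if body is not None:
            return body
        if clock() - start >= deadline_s:
            raise TimeoutError(f"{url} not ready after {deadline_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulate a server that starts answering on the third poll.
    responses = iter([None, None, {"message": "ok"}])
    print(wait_until_ready(
        "http://localhost:PORT/",  # placeholder address
        fetch=lambda _url: next(responses),
        sleep=lambda _s: None,
    ))
```

A check like this is a natural first step to record in a verification skill, so that browser-driven checks never start against a stack that is still booting.
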
It navigated to localhost:3000 and then looked at the contents of the tab, which is great. But we want to do something more exciting; just looking at a static web page isn't very helpful. So let's say, can you... actually, before I do that, I'm going to resize these windows so you can see what's happening in the background. Okay: "Can you try typing and make sure everything works?" Okay, so Claude apparently isn't very good at typing either, but it typed something in, and it says typing works. That's great. Let's do one more thing: "Can you also use the settings and change something?" Okay, it navigated to the settings page and is changing the difficulty to expert. Not a good idea, based on how it performed. Okay, and it claims the setting persisted and that it was able to verify that. So that's great. What we've done so far is hold Claude's hand and tell it exactly what to do: "Spin up the dev server. Go do these two or three things we care about." And that's basically verification. What I can do next is tell Claude to take all the learnings from this session and put them into a skill file. So I'll say, "Take everything we learned and put it into a skill file in dot claude demo verification." I didn't have to give it the full path, but I chose to anyway. Okay, let's see. It wants to create a new directory. Now it's writing a fairly large SKILL.md file. If we skim through it real quick: step one is to bring up the stack, which is basically what we did, and it has some commands for that, such as docker compose and so on. Then it loads the Chrome MCP tools, because that's what we told it to do next. Finally, there's a smoke test where it uses the browser tools to actually check its own work.
So I'm going to go ahead and say yes. Great. That must have looked quite simple, and it really is: creating a verification loop is simple. A few blockers came up along the way when I was setting up this demo. We don't have to talk about those right now, but I'm sure that if you did this yourself, you could get it up and running within five to ten minutes. Next, because both Claude and I are so bad at typing, I'm going to tell Claude to show a confetti animation every time I mistype, and then use the verification skill we just created to verify its own work. So let's say, "Every time I mistype, please show me a confetti animation, and use the skill that we just created to verify your work." Okay, it's going to do its thing and figure out where to write this code, and hopefully the demo gods will be with us. Okay, it wants to write some files. I'm going to switch on auto mode so it doesn't have to ask me about every file edit. Okay, this is interesting. It created the feature and then realized there were a couple of lint errors; you can see it reported two. Then it fixed those errors and verified itself again. So you can see the verification loop in action: it wrote some code, encountered some issues, fixed those issues by writing more code, and went around that circle until it reached a good state. Let's test it out ourselves as well. Okay, it's still doing something; let's let it stop. Okay, we do see the confetti showing up. It left us on expert mode, which is why it keeps disappearing on me. But effectively, Claude was able to do the job and fix its own lint errors. We're running short on time, so I'm not going to let this finish.
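
The confetti feature Claude built in the demo boils down to comparing each keystroke with the expected character and firing an effect on a mismatch. MonkeyType itself is written in TypeScript and Claude's actual change isn't shown in the talk, so this is only an illustrative Python sketch of that logic; `MistypeDetector` and the `on_mistype` callback (standing in for the confetti animation) are hypothetical names.

```python
from typing import Callable

# Callback receives (index, expected_char, typed_char).
MistypeHook = Callable[[int, str, str], None]


class MistypeDetector:
    """Compare each keystroke with the expected text and report mistakes."""

    def __init__(self, expected: str, on_mistype: MistypeHook):
        self.expected = expected
        self.on_mistype = on_mistype  # hypothetical hook, e.g. trigger confetti
        self.position = 0

    def key(self, char: str) -> bool:
        """Process one typed character; return True if it matched."""
        # Past the end of the test there is no expected character, so any key misses.
        want = self.expected[self.position] if self.position < len(self.expected) else ""
        index = self.position
        self.position += 1
        if char == want:
            return True
        self.on_mistype(index, want, char)  # fire the effect for every wrong key
        return False


if __name__ == "__main__":
    events = []
    detector = MistypeDetector("the quick", lambda i, want, got: events.append((i, want, got)))
    for ch in "thw quick":
        detector.key(ch)
    print(events)  # prints: [(2, 'e', 'w')]
```

Keeping the detection separate from the animation is also what makes this kind of feature easy to verify: a check can assert on the recorded events without watching the confetti.
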
But hopefully that gives you a taste of how powerful a verification loop can be, and how Claude can keep hill climbing on a task if you give it the right instructions and tools. Let's switch back to the slides. The key takeaway is that you should hold Claude's hand and show it how to do verification. Once you've taught it, it can easily summarize those learnings into a skill file, which you can then package and distribute to your future self and your teammates. Now that we've mastered verification, we can graduate to multi-Clauding, or parallelizing our work more effectively. The problem when you try to run too many Claude instances at the same time is that they all eat at your attention, and your attention is a scarce resource. Personally, I find that having more than four or five sessions open at once takes a big toll on my brain, and I can't really function beyond that. So what strategies can we use to scale that and multi-Claude more effectively? We'll talk about four today. There's the Claude Code desktop app, which gives you a GUI and makes it easier to manage multiple sessions. There's agent view: if you love the terminal like I do and want to stay in it, claude agents gives you some of the same benefits as the desktop app inside the terminal. You can also run Claude in the cloud: if you run it on our website, Claude runs in our cloud instead of on your laptop. And finally, there's remote control, which is my favorite feature; we'll talk more about it when we get to it. So this is a screenshot of what the desktop app looks like.
On the left there's a sidebar with all your sessions across all surfaces: sessions running locally in the terminal, sessions running in the cloud, and sessions across all your Git repos. So it becomes the central control plane for working with Claude and your sessions. You can also pin sessions, rename them, and color them differently. All of these features are really solving the problem of grabbing your attention. If you rename a session to something memorable, you know what it was doing when you come back to it. So these are all ways to protect your attention. If you love the terminal, this used to be how you would multi-Claude, or at least how I used to. I had a tmux window with four panes, and each pane worked in a different worktree. Honestly, this works, but it's a lot to manage. Who here knows what tmux is? Okay, great, that's a lot of people. And who knows what worktrees are? Great, about fifty percent. So you have to manage worktrees in tmux yourself, which works, and I think I'm used to it now, but it's not the most convenient thing. We can do better, and what we arrived at was claude agents. This is a feature we released, I think, about a week ago, maybe a little more. You access it by typing claude agents instead of claude, and it opens this view, which is very similar to the desktop sidebar we saw before. This view lists all the sessions running on your local computer.
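
The tmux-plus-worktrees setup described above gives each parallel session its own checkout, so edits from different sessions don't collide. A minimal sketch that builds one `git worktree add -b <branch> <path>` command per task; the task names, the `claude/` branch prefix, and the sibling-directory layout are assumptions for illustration. By default it only returns the commands, so you can review them before running them.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def worktree_commands(repo: Path, tasks: Sequence[str], prefix: str = "claude/") -> List[List[str]]:
    """Build one `git worktree add` command per task (hypothetical naming scheme)."""
    commands = []
    for task in tasks:
        branch = f"{prefix}{task}"
        # Place each worktree next to the repo: ../<repo>-<task>
        path = repo.parent / f"{repo.name}-{task}"
        commands.append(["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)])
    return commands


def create_worktrees(repo: Path, tasks: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Return the commands, and run them only when dry_run is False."""
    commands = worktree_commands(repo, tasks)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    for cmd in create_worktrees(Path("/work/app"), ["signup-fix", "docs"]):
        print(" ".join(cmd))
```

Each tmux pane (or session) then starts in its own worktree directory, and `git worktree remove <path>` cleans one up once its branch has merged.
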
It also sorts them by how much attention they need. If a session needs your immediate attention because it's blocked on, say, a permission prompt, a question, or some other input from you, it shows up right at the top. Sessions that are still running or have reached their success state sit further down. You can also customize it: again, you can pin, rename, and reorder sessions. And again, this is just a way to manage your workload and your attention a little better. Next, Claude Code on the web. We've talked about this a little already, but the main goal here is decoupling your Claude Code sessions from your laptop. I find it quite annoying that when I'm walking from meeting to meeting, I have to keep my laptop open and walk around like this everywhere. When I'm driving home, I'm also annoyed, because there's no internet and I can't leave my laptop open in my car. So having your sessions run in the cloud is really nice; you don't have to worry about the compute they're actually running on. If you haven't given Claude Code on the web a shot, just go to claude.ai/code; it's pretty easy to get started. And finally, remote control. As I said earlier, this is my favorite feature. Remote control lets you control any session running on any surface from your phone. To get started, go to wherever you're running your Claude Code session and type /remote-control. Once you do that, the session pops up in your mobile app, and it will also send you notifications.
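
The attention-based ordering in claude agents can be pictured as a sort key: sessions waiting on you first, then running ones, then finished ones. This is a hypothetical model for illustration only; the status names and the tie-breaking rule (most recently updated first) are assumptions based on the description in the talk, not the tool's actual implementation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Status(IntEnum):
    # Lower value = needs more attention (hypothetical ordering).
    NEEDS_INPUT = 0  # blocked on a permission prompt, question, or other input
    RUNNING = 1
    COMPLETED = 2


@dataclass
class Session:
    name: str
    status: Status
    updated_at: float  # seconds since epoch


def attention_order(sessions: List[Session]) -> List[Session]:
    """Sessions needing input first; within a group, most recently updated first."""
    return sorted(sessions, key=lambda s: (s.status, -s.updated_at))


if __name__ == "__main__":
    sessions = [
        Session("docs refresh", Status.COMPLETED, 300.0),
        Session("signup fix", Status.RUNNING, 200.0),
        Session("confetti", Status.NEEDS_INPUT, 100.0),
    ]
    print([s.name for s in attention_order(sessions)])
    # prints: ['confetti', 'signup fix', 'docs refresh']
```
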
So if Claude needs help or input from you, your phone buzzes, and whether you're in your car or doing whatever you want, you can just give Claude the input it needs. I'm running short on time, so unfortunately I'm going to skip this demo. I was going to show you claude agents as part of it, so if you haven't given claude agents a try, give it a shot. Okay. We've talked about how to make Claude more reliable by giving it the skills to verify its own work. We've also talked about how to multi-Claude more effectively. But even that isn't quite satisfying. You still have to spin up a new session and have a goal in mind, whether you're on the desktop app, the terminal, or the web. How do you remove yourself from the loop even more? That's what this next section is about. As software engineers, we have a lot of different tasks, and not all of them are writing code for a new feature or a bug fix. A lot of it is bookkeeping, in a way. Personally, I now spend a lot of my time babysitting my PRs. I think we all have a lot more PRs now that we can generate them with the help of Claude and AI, and these PRs need to merge. But before merging, you need to get through review comments, merge conflicts, and CI failures. A lot goes on, and if you have twenty or thirty PRs you're trying to merge in a day, you can easily spend hours babysitting them. Updating docs is another good example. As we increase our velocity of shipping features and fixes, we also need to keep up with the docs.
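
The PR babysitting described above is mostly a triage over a few signals per PR: conflicts, CI status, review comments, approval. A minimal sketch of that triage as a pure function; the `PullRequest` fields are hypothetical and would in practice be filled from your code host (for example through its API or CLI, which isn't shown here), and the priority order is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PullRequest:
    number: int
    unresolved_comments: int
    has_conflicts: bool
    ci_failed: bool
    approved: bool


def triage(prs: List[PullRequest]) -> List[Tuple[int, str]]:
    """Return (pr_number, next_action) pairs, one action per PR."""
    actions = []
    for pr in prs:
        # Assumed priority: conflicts first, since resolving them reruns CI anyway.
        if pr.has_conflicts:
            actions.append((pr.number, "resolve merge conflicts"))
        elif pr.ci_failed:
            actions.append((pr.number, "fix CI failure"))
        elif pr.unresolved_comments:
            actions.append((pr.number, "address review comments"))
        elif pr.approved:
            actions.append((pr.number, "ready to merge"))
        else:
            actions.append((pr.number, "wait for review"))
    return actions


if __name__ == "__main__":
    open_prs = [
        PullRequest(101, 0, False, True, False),
        PullRequest(102, 3, False, False, True),
        PullRequest(103, 0, False, False, True),
    ]
    for number, action in triage(open_prs):
        print(number, action)
```

A prompt like "babysit my open PRs" essentially asks Claude to run this triage and then act on each result, which is why it works well as a recurring job.
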
Similarly, triaging, monitoring feedback, and keeping CI green in general are all things you need to do every day. But they don't necessarily need you in the loop; they just need to run in some sort of loop. That's where the /loop command comes in. /loop is a way to run a prompt at a specific interval in Claude Code. You can say /loop ten minutes and babysit my open PRs, and the session running the slash command will wake up every ten minutes and run that prompt. If your CLAUDE.md files and your tools are set up correctly, it will figure out what to do by itself, so you don't have to babysit and monitor your PRs manually. Next, routines. Routines are basically /loop, but running remotely. We talked about Claude Code on the web and how it uses a remote container to run your sessions; routines live and run in the same containers. You set up routines in the web app or the desktop app, where you'll see a routines tab, and you can create a new routine quite easily. You can give it a time-based trigger or an event-based trigger, and either trigger opens a new Claude Code session with a specified prompt. For example, the Claude Code team has a routine that updates our docs every day. We also have a routine that looks at incoming issues and feedback and posts to our Slack channel every six hours. This can be quite useful for routine tasks that don't necessarily require you in the loop. Cool. So once you stack these three skills together, you end up with a system that can do a lot of work without you manually sitting at your keyboard.
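
To make the trigger model concrete: a routine pairs a prompt with either a time-based or an event-based trigger, and /loop is roughly the local, interval-only case. This is a hypothetical model, not the routines API: the `Routine` class, its fields, the `tick` function, and the `pr_opened` event name are invented for illustration, and "starting a session" is reduced to returning the prompt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Routine:
    prompt: str
    every_s: Optional[float] = None  # time-based trigger, in seconds
    on_event: Optional[str] = None  # event-based trigger (hypothetical event name)
    last_run: Optional[float] = None

    def due(self, now: float) -> bool:
        """Time-based: due if never run or the interval has elapsed."""
        if self.every_s is None:
            return False
        return self.last_run is None or now - self.last_run >= self.every_s

    def matches(self, event: str) -> bool:
        return self.on_event is not None and self.on_event == event


def tick(routines: List[Routine], now: float, events: List[str]) -> List[str]:
    """Return the prompts of sessions to start at time `now`, one per triggered routine."""
    started = []
    for r in routines:
        if r.due(now) or any(r.matches(e) for e in events):
            r.last_run = now
            started.append(r.prompt)
    return started


if __name__ == "__main__":
    hour = 60 * 60
    routines = [
        Routine("Update the docs", every_s=24 * hour),
        Routine("Summarize new issues and feedback; post to Slack", every_s=6 * hour),
        Routine("Babysit the newly opened PR", on_event="pr_opened"),
    ]
    print(tick(routines, now=0.0, events=[]))
    print(tick(routines, now=1.0 * hour, events=["pr_opened"]))
    print(tick(routines, now=6.0 * hour, events=[]))
```
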
And that really is the ultimate goal: you can spend your attention and your time on the tasks you care about, and everything else can be delegated to Claude with high reliability and a high degree of confidence. Cool, that's all I have for you. Thank you so much, and I hope you enjoyed the talk. [upbeat music]

Episode duration: 37:07

Transcript of episode wI0ptqCSL0I
