Ship your first Managed Agent

Build and ship a working incident-investigator agent on Anthropic's Managed Agents platform: define an Agent, Environment, and Session, stream events, and wire up custom tools, all in six functions. You'll leave with a running agent, the mental model for the server-side loop, and a roadmap to production features like subagents, vaults, and memory.

May 26, 202637mWatch on YouTube ↗

EVERY SPOKEN WORD

35 min read · 7,178 words

0:00 – 0:19
Intro
1. SPSpeaker
  [upbeat music]
0:19 – 1:49
Workshop goals and roadmap: shipping an incident-response agent
1. SPSpeaker
  All right. Hello, everyone. It's great to see you all here today for our session on Shipping Your First Managed Agent. Let's go ahead and get started. My name is Isabella Hee. I'm a member of Technical Staff Anthropic on the Applied AI team. The Applied AI team at Anthropic sits at the intersection of products, research, and our customers, which means that I get to contribute internally to products at Anthropic, like Claude Code and our Claude harnesses, as well as work externally with our customers that are building on top of Claude and on top of our harnesses. So my goal today is to get you all hands-on with actually building on top of managed agents, understanding how the harness works under the hood, and getting you ready to actually ship your first incident response management. So as a quick overview of today's agenda, we're gonna cover first a quick refresher of Claude Managed Agents. I want to talk to you for a little bit about how this harness works under the hood and what makes it so special. Our team put a lot of thought into the architectural design of Claude Managed Agents to make sure that it runs ready and reliably for production-ready agents. So I want to talk to you for a little bit about how that works, so that then when we transition to the second portion here, which is the hands-on workshop, you'll actually understand what each of the primitives you're building actually mean for your agents under the hood. So for the majority of today's session, I want you all to actually have your laptops open, building alongside me, actually working inside of a repository and getting you ready to actually spin up a working incident response agent. Lastly, we'll talk a little bit about beyond the basics. Today's session is the first session of a couple of other ones that will build on
1:49 – 3:51
Why Managed Agents: from Messages API to Agent SDK to a managed harness
1. SPSpeaker
  top of this on Claude Managed Agents. Specifically, right after this one, I think there's another session on dreaming, which is one of my favorite new features with Claude Managed Agents for self-improving agents and memory built into the harness. So encourage everyone to dive in a little bit deeper into what else is in the box after we set you all up for success today with a quick introduction. So let's first touch a little bit about how we got here with Claude Managed Agents. When we first released the very first Claude back in twenty twenty-three, we released the Messages API alongside access to Claude. This provided raw model access to all Claude models. This became the very first way that people could programmatically build on top of Claude and essentially gave a way for people to access tokens in and tokens out via our Claude models. This also meant that for everyone building on top of Claude models, they had to implement all the various primitives themselves. Things like context management, the actual agent loop, compaction, et cetera, all the primitives that come alongside making an agent work. When models were less intelligent back in the early days of, let's say, twenty twenty-three, some of these primitives were much simpler because agents could simply do less. But as we evolved into now with higher model intelligence, and as agents are able to take on more complex tasks and actually take actions within environments and come to actually do entire tasks for humans, the primitives that come alongside context management and managing an agent's ability to execute API calls and tool calls becomes much more complex. So that's when we moved to the agent SDK, which became a harness that allows you to programmatically call Claude Code, one of our favorite agents at Anthropic. So Claude Code is something that an agent has access to a computer and takes actions within the file system, so the agent SDK became a way for you to make Claude much more powerful by leveraging the power of Claude Code within harness. The main thing here, though, is that with the agent SDK, developers still had to manage hosting and scaling on their own and making sure that the agent SDK would be safe to run within
3:51 – 4:21
What Managed Agents provide: production components handled for you
1. SPSpeaker
  their containers. That's when that evolved into Claude Managed Agents, which is the first harness to be able to handle scaling and production-ready components for you by Anthropic, providing things like a purpose-built harness, sandboxing, observability, tool runtime, all within a managed infrastructure system. This means that developers can focus on task and agent configuration, custom tool logic, the things that actually matter for bringing domain expertise and customizability to your agents, where you're handing off
4:21 – 5:52
Harnesses must evolve with models: ‘context anxiety’ example
1. SPSpeaker
  the rest of all the primitives and core compute and primitives of essentially managing the basics of agent running to Anthropic. So that brings me to managed agents as the fastest way to build production-ready agents with Claude. We've seen people build ten to fifteen times faster to production with Claude Managed Agents by leveraging our purpose-built harness. Part of the reason why we built Claude Managed Agents is becau-because harnesses should evolve alongside your agents. For example, back when we were building ourselves on top of models like Sonnet four point five, we noticed that Sonnet four point five emitted a particular behavior called context anxiety. This meant that with Sonnet four point five, Claude started wrapping up tasks early, even when it still had room to spare in its context window. To manage that in our harness, we then added some mitigations to, uh, combat against this early stopping behavior. But when Opus four point five then came out, we actually saw this behavior go away, making all that work we had done inside of the harness essentially obsolete because Claude had evolved beyond that behavior that we had built into the harness to manage. So the takeaway there is that it's a lot of work to maintain harnesses and make sure that they actually evolve alongside your agents, which is why with Claude Managed Agents, we want to make it really easy for Claude and Anthropic to handle all the complexities that come with compash-compaction, caching, things like context anxiety, all these various primitives that come with actually making an agent production ready and getting the most out of Claude. So again, you can focus on the tasks, tools, and things that actually matter for
5:52 – 7:22
Core primitives: Agent (brain), Environment (hands), Session (binding + streaming)
1. SPSpeaker
  building agents on Claude. So three primary resources go into building on Claude Managed Agents. First is the agent's endpoint, which is the persona and capabilities. This is the core system prompt that powers your agent.Essentially here, you're defining the model, the MCP servers, the skills, the various components that your agent can actually leverage when it's able to run in that agent loop. The next is the environments. You can think of this as the hands of the agent, where the previous one is the brain of the agent, where the agent is thinking through what to execute, and then it's using an environment to actually have a space and a container to actually take action on your behalf. Sessions are next the way to tie together agents and environments. A single session has-- is spun up on an agent instance within an environment, so you can connect the two together and actually stream events back to your user and start to take action on behalf of your humans as part of a Claude-powered agent. A key thing here, as I alluded to briefly before, Claude Managed Agent has the agent loop run server side. This means that a lot of the complexities that come with managing hosting and scaling are abstracted away. And when you close your laptop or you hit hard refresh on your agent that you're building on Claude Managed Agents, everything is maintained, and you don't have to worry about durability, reliability, all these various aspects that usually come to bite you when you're trying to turn your agent from a prototype into production. And lastly
7:22 – 9:23
Key architecture choice: decoupling agent loop from tool execution
1. SPSpeaker
  here, before we dive into the hands-on portion, is I want to talk you through a key design decision that went into Claude Managed Agents. Previously, with a lot of agent harnesses, we saw the agent loop coupled tightly with tool execution. This design pattern made sense and still makes sense for some agents because you want to give the agent powerful abilities to actually take action with an environment. For instance, with Claude Code, we want the agent to be able to access various files on your computer, take action within a file system, and therefore it makes sense for the agent to have access to all those tools spun up on every container. But we also realized there are some constraints for this, especially with some agents where you essentially want to be able to decouple the hands from the brains of the agents. For instance, credentials and, um, credentials and security became a huge concern. With the ability to have the agent access your file system, you can actually add very distinct sandboxing by app-- decoupling these two components where the agent is no longer able to access the actual credentials without encryption by decoupling the hands from the sandbox of the agent. The other aspect here is actually you can see huge benefits by doing this decoupling on things like time to first token and latency. Previously, with the agent loop and tool execution in the same box, you had to spin up containers for every single session that you're spinning up an agent, which contributed to additional latency from a time to first, time to first token perspective. But with this now decoupled, our teams actually saw reductions in time to first token along the lines of over ninety percent reduction in TTFT for our P95 metrics on latency. So here you can start to see the power of this design decision coming through from the perspective of safety, reliability, latency, and everything else that you care about when it comes to building production-ready agents. All right, so now it's time for the exciting part of today's session, which is where I want you all to open up your laptops and go to this URL
9:23 – 10:54
Hands-on setup: clone repo, configure API key, run Streamlit app
1. SPSpeaker
  here to actually clone a repository, and let's start to actually feel the magic of everything that I just talked through. So I'm going to give everyone a second to just go over to that URL there and just spin up the repository that we have ready for you. All right, so here's some additional commands that I want you all to run to make sure this is all set up on your computers. So the first step many of you might have done already, but just take that repository, hit the URL, Git clone it, and then I want you to CD into the specific repository for the session, which is ship your first managed agent. And then if you're on Mac, you'll see those two commands on the side, the Python and the source. Um, there's a command there for Windows as well. And you'll just do the rest there where you want to install the requirements, copy over the environment key into your .env file. Um, here you'll put in the Anthropic API key that hopefully all of you also received from the QR code for free credits earlier. And lastly, we'll just run the app. All right, let's go ahead and dive in. But as I mentioned before, let me just show everyone where these instructions are. If you go into the repository and the link and then go to ship your first managed agents, you scroll down on the README, you'll see all the setup instructions here. So feel free to do this, um, as we go along or even in your own time later today and continue playing around with it. But as I mentioned before, everything will be also shown on the screen to follow along with. So do not worry if you did not have time to fully get it set up on your laptop. Without further ado, let's go ahead and dive in. So once you run Streamlit run app.py, you should be able to see a URL that looks
10:54 – 11:54
Scenario framing: building an SRE incident-response agent
1. SPSpeaker
  like this and a page that looks like this. What we're doing here is we're going to be simulating an agent, um, interaction here where we have an incident that's going to come up. A lot of you who might be software engineers in the room will be intimately familiar with the pain that comes alongside incident response. If you are a software engineer, you might be woken up at, let's say, three AM in the morning, two in the morning, when you're out around on, on vacation as you're on call, and this is usually a very painful portion of a software engineer's life, uh, because when you're on call, it means that if a server goes down or service goes down, you have to be immediately the one there to respond and tackle the incident. Usually for a human, this means diving into metrics and logs and deployments so you can actually investigate what's going on. And so what we're going to do is we're going to now have an agent run on Claude Managed Agents to do all this for us so that when we get woken up at three AM, we can hand it off to an agent, or maybe we don't even get woken up at all if Claude is able to do everything for us. Okay, so let's now go ahead and dive into
11:54 – 13:55
Implementing the Agent definition: model, system prompt, and tools
1. SPSpeaker
  the code here. What we're going to open up here is we have the agent.py file on the left and the agent complete on the right. If you want to challenge yourself, you can of course try to implement everything yourself here or with Claude. Um, but what we're going to do just for simplicity's sake is just copy over various elements from the completed file onto the incomplete file one by one, so you can see how these primitives compose our agent one piece at a time.So let's go ahead and start off with this very first part, which is the agent. We mentioned before that the agent is the one that defines the persona and the capabilities of the agent here. So that's model, the system prompts, and the tools in our case for our agent here. So let me go ahead and copy over what we see there on the screen, and you can see here that we're defining the SRE agent. We're gonna use Claude Opus four point seven here, and I've pre-configured a system prompts and tools for the agent. We can actually take a quick look into what that system prompt and tool looks like here. For the system prompt, you can see that it's actually extremely simple for the agent that we're defining today. You can, of course, add more complexity and constraints here, but we actually see a very simple prompt working for our agent that we're building today. We're just telling it that it's an SRE agent, it's responsible for coming in and debugging incidents, and it has access to various tools like metrics, recent deployments, get diff. These are tools that you would want as a developer if you're actually managing an incident response as well, like the ability to actually fetch logs so you can see exactly what's going wrong. So we're gonna give those same tools and the same instructions over to our agent. So now that we've configured this on the screen, and feel free for those of you who are able to spin it up on your own laptops to just follow along with exactly what I'm doing, which is copying over this portion from the right onto the left here. And then when we flip back over to the screen, what we'll see is this wasn't there until I just added that there, but we can now actually have a unique identifier attached to the agent that we're building. Okay, so
13:55 – 15:26
Configuring the Environment: networking controls, MCP tunnels, and BYO compute
1. SPSpeaker
  that's step one. Now let's go ahead and move our step two, which is the environment where the agent is going to actually do work in. All of you here, we're very lucky for those of you who were able to come yesterday as well to Code with Claude London, we actually just released yesterday the ability to bring your own containers and your own compute to Claude Managed Agents, which means that you can actually execute the agent for the tools and the actual ability for the agent's actions to work within your own infrastructure and not just Anthropic's managed infrastructure. So that's an exciting update that just came to Code with Claude London. Um, but for today's purposes, you can actually see if we copy over this environment configuration here. We're defining our SRE agent to work within the Anthropic cloud, and here we're just giving it unrestricted access from a networking perspective. We've made Claude Managed Agents very composable and very customizable, so this networking list here is actually an allow list. If you want your agent to only be able to access specific sites and URLs, you can restrict this down as much as you would like. We also released, um, Claude MCP tunnels, which actually also gives you the ability to run MCP servers within a private environment instead of on the public network as well. So again, just offering various components to help you make sure that your agents are as production-ready and as secure as possible. So now that we've defined this environment here, let's flip back over, and we just saw that environment piece come into our agent as well. So here we have a unique identifier for an agent and an environment, and that will next help us as
15:26 – 16:27
Adding context via files: uploading logs/metrics for agent analysis
1. SPSpeaker
  we go along with setting up the rest of our agents as we start to get into session definitions here. The next thing that we have to do is actually give our agent the ability to look at logs. With Claude Code, that is one of the first times where we realized the power of giving the agent access to files and a file system. Here with Claude Managed Agents, we're leveraging essentially the files API by uploading the metrics and logs to the agent, so the agent can start to run code and process through those files. So here we've attached the log here as a file for our agent, so we just also saw that populate and come through. Again, here, the key takeaway is as much data as you're able to give the agent, um, as possible is what makes it so powerful. Context engineering is a huge portion that comes to actually making an agent powerful, and this is where we see the developers spending the majority of their time working on top of primitives like Claude Managed Agents is managing context and managing what types of files are uploaded, how the agent processes those files. These are components that you compose yourself
16:27 – 18:00
Creating Sessions and streaming ‘events’ (not just token responses)
1. SPSpeaker
  and are very customizable on top of Claude Managed Agents to make it work as far and as wide as you want it to. Okay, so now let's go ahead and start to define the session that we have here. The session is going to-- Oops. The session is going to bind the agent and the environment and also mount the log here. So you can see we're passing in the agent ID, the environment ID, and the resources that we're giving to the agent. And this is going to give it the ability to start to actually act and interact with me as a user. Let's go ahead and just complete the rest of this here so that we can actually start to run our agent. What we want to do is now also give the ability for the agent to come in and stream responses to me as we go along. There we go. Okay, and the key portion here is that when our Claude Managed Agents runs within a single session, instead of responding in tokens in and tokens out, it actually works in units of events. Events here are things like user messages or agent tool calls, agent responses, so that every event can be logged from an observability perspective as well as streamed back to the user for the user to see the agent responding as it calls tools and as it starts to populate responses. This is crucial for both a user experience perspective, so user starts to see things as they come through and not just when Claude finishes an entire task, and also from an observability perspective, and Claude Managed Agents actually has a very neat console built in for looking at everything the agent is doing
18:00 – 20:04
Wiring local tool execution + session deletion for security and lifecycle control
1. SPSpeaker
  and a lot of observability features built into Claude Managed Agents. Okay, the last step here of just being able to put our agent together, you can start to see that our agent is actually starting to come together. We can start to create sessions, and we can start to do things. Um, what we're actually going to see here, though, is that if I send something like "Hi" to the agent, it can respond, um, but it doesn't actually have the ability to be able to call the various tools that we want it to yet because we haven't connected that locally to what we want the agent to do when it calls tools like Get Metrics.So the agent is ready. The agent is actually defined on the server side already. The missing piece here is just to finally give it our local tools so the agent can start to take action here on my computer or my infrastructure. Okay, so now that we have that copied over, the agent is going to be able to start to call, get metrics, get recent deploys, get diffs, so it can truly start to take action in terms of helping us debug this incident. The last thing I'm going to do here is also just to make sure I give my agent the ability to delete sessions so that when I come in, I can start to hit this delete button and delete sessions as I compose my agent. And this is also crucial from a security perspective. If you wanna make sure that, you know, nothing is being retained for sessions that you don't want on the cloud or on your infrastructure, you can actually just come in and proactively manage how sessions are deleted. And once they're deleted, they will be also removed from every single log aspect here so that you can truly make sure that whatever data you want managed is managed actively and proactively via Cloud Managed Agents. Okay, so with that all set up, let's go ahead and give our agent a test run here. I'm going to click the New Session here, and I'm going to just go ahead and, and ask the agent to debug my incident for me. You
20:04 – 24:17
Live run: diagnosing the incident and mapping demo tools to real systems
1. SPSpeaker
  can see here that because we gave the agent access to tools like sandboxing and bash and get recent deploys, the agent is starting to really take powerful actions on my behalf here. It's come in, it's run the sandbox command. We can open this up and see what this looks like. Um, we can see that it's actually coming in and looking at what the logs were added to. It's then come in and called this tool called Get Recent Deploys, which is coming in and returning results like what the recent deployments look like, the metrics. We can see this from a user perspective if you click on the tabs here, but this is essentially the data that's actually being passed into the agent via these local tools that we've defined. And again, we can start to see the magic of that streaming that we implemented come through as well because we saw these tools come in as they were being called from the agent. We saw the user prompts come in as soon as I prompted it to the agent. And the agent is actually streaming responses to me as it comes through with more token response and outputs, as well as as it calls more tools as it goes along as well. Okay, so what we're gonna start to see is the agent being able to help us actually debug what's going on here, which we can see here that the incident is that there's something going wrong with our P99 latency that seems to be ten times above baseline. The agent is coming in and debugging everything for us. It's, like, taking another second there. So some of the major design decisions that come in here when you're designing a real site reliability, uh, on-site incident response management agent for your systems is to think deeply about the various components that go in and the various MCP servers and skills that you wanna give your agent. Here we defined, of course, a very, very simple agent, but for lots of the SRE agents that we build, we actually also think about things like how can we give the agent a skill to actually execute and run runbooks? Runbooks are things where as teams debug incidents, they note down and document how they debug that incident so that they can do it again for a future session or a future incident. You wanna give the agent same access to the materials that you would have as a human developer. So something like a runbook skill where the agent is actually able to look at example runbooks or fetch other postmortems from other incident responses. That is something that is very powerful for the agent to be able to understand how to work within your systems and debug incidents successfully. Okay, let's go ahead and take a look at the agent here. Let's see. I'm going to go ahead and just start a new session here to make sure everything is working well. All right. Let's say, "Debug my incident for me." Okay. So this one works. Is everyone able to get it working on their laptops better than I have on the screen? Okay, we got some success in the room, so hopefully this will work as it goes along. Okay. Looks like we are streaming. We're getting everything in. Where'd the agent go? Okay, agent is checking logs, debugging everything. So if we just also look through some of the data here as the agent is working, the data that's actually being passed in for our agent here is all local just for our sakes of our purposes for our demo and our workshop that we're running today. But with the ability for you to run your agents within a container and infrastructure, you can start to see how things like your get metrics tool that are currently pulling from JSON can be easily moved to something like Datadog or other production systems for your infrastructure from that perspective. So everything that you see here that is currently local can be something that's easily movable into infrastructure as well via Cloud Managed Agents. Okay. Let's all cross our fingers and see if this run works. Oh,
24:17 – 37:07
Results, persistence, and ‘beyond basics’: states, webhooks, subagents, memory, outcomes, vaults
1. SPSpeaker
  there we go. Success. Okay. So the agent has come in. You can see here that as we scroll through all the tool calls, everything is persisted in the cloud. From the logs perspective, all of this will also be logged in the observability console. And then the agent has come back to us with the incident response here. It says that this seems to be caused by a database pool exhaustion. Seems like a commit that someone added here from Alice to refactor the order summary builder.Introduced a query that then caused the pool resources to be exhausted. So it's looking at and giving us the exact everything that went wrong from all the metrics they were able to call. It ruled out various other causes, and it's also giving us recommended actions to take. Another key component here in a lot of other incident response management agents that we built is actually giving the agent to actually go ahead and fix everything that it's been able to find. By giving the agent then access to something like Cloud Code, for instance, you can actually imagine this agent can then go into your code base, suggest fixes, put up a PR, and essentially do everything that it needs to do to help you go from initial incident all the way to fixing a root cause. So again, here for demo purposes, we're stopping at just the agent giving us the recommended actions, but I want you all to imagine the possibilities of where this can go if we give our agent more tools, more ability to take actions, access to your code base, ability to put up PRs, ability to fix incidents, so that you as a human developer can just become the oversight and watch over the agents as they take action, and you no longer have to go through and do manual steps like actually following the agent's instructions here to fix the root cause of the incident. So another key component of what we've built here on Cloud Managed Agents is session persistence. So when I come in and hit hard refresh on the screen, we're seeing that the agent is listing the sessions, and everything is retained from all the sessions that we just ran. We also have the previous sessions that we ran all retained in the cloud. Looks like this one actually came back to us as well. Um, and the previous sessions where we just said hi, everything is retained in the cloud, and we didn't have to deal with things like database and deployment of our agent and moving it from our laptops to production. Everything is already maintained server side. We can also see the ability to delete sessions come in, so I've run that delete, and now we have that, um, running the session here. Now we have that removed from our list here. Another thing that I want you to take a note of, which we'll talk through a little bit in just a second, is the states of the session. Here we can see that the sessions are now idle. Just now as they were running, they were in a running state. We have the sessions managed by state here as part of that same durability and maintenance and reliability of the session. So when I come in and ask the agent something else, like, "Who are you?" It's able to easily resume the session and execute as it goes along within that same session window. So state management here is really important to how Managed Agents works under the hood. All right. So now as if we just take a quick step back and look through everything we were able to accomplish, we started with an empty agent here, just built on a couple of primitives on Cloud Managed Agents. We then went and defined the agent's definition, the persona, the capabilities. We gave the agent an environment. We gave the agent data and context to operate over. We then gave the agent sessions, combining the agent definitions to an environment so the agent can think through which tools to call from an agent loop perspective, and then it can actually call those tools and take action on our behalf. We then came in and streamed the responses to the user into our logs, implemented some local tools, as well as the ability to delete sessions. And within this Streamlit app here, we saw how that actually affected from a front-end perspective how our agent was actually able to be presented to our users by adding all of these primitives together. So now let's go ahead and move back over to the slides to do a quick recap and talk through some of the lessons of what we learned about how Cloud Managed Agents works under the hood. But hopefully, for all of you who are able to actually build on your laptops, you all were able to just build a site reliability agent, so congrats to you all. But let's go ahead and dive in a little bit here into understanding what actually happened when we put all those pieces together. The first thing we saw is that we saw sessions speak in events and not responses in and tokens in, um, tokens out from a request-response perspective like we see typical with things like message API or other APIs that we see. With Cloud Managed Agents, instead of just having a request response, we actually have events appended to logs. Again, this is a huge portion of why Cloud Managed Agents is so reliable and secure because events are coming through and just added into an existing session logs so that it's easy to then resume a session and kick back off where you left off, and it's easy to then come in and look at everything from a log perspective. This is also really important from a reliability perspective when we separate the hands from the brain of the agent, that if a container goes down, we can just spin that container back up again, and we don't have to restart the entire agent loop alongside that container. The next thing here is that we saw the ability to implement local tools, and we implemented in our workshop these local tools defined in JSON and loading them in via our local files here. We were then actually able to see how with our Cloud Managed Agents harness, the execution of the agent is completely separate from the agent loop. We defined everything that executed locally on our laptops and our scripts, um, and our agent loop ran on the cloud inside of Anthropic's managed infrastructure. Again here, especially with what we just released with bring your own compute and bring your own sandbox in here, you can swap out where you want that agent to execute its tools in your own infrastructure or on Anthropic managed infrastructure, but within your own environments and your own containers as well as you spin them up. Moving from things like loading our tools in from JSON into anywhere you want to have your tools run, like a Datadog client using the same wire protocol, making it very easy to then go from initially building the agents for Cloud Managed Agents to then actually producing it and deploying it on production-ready infrastructure. Next thing we saw here as we thought about how our sessions are being streamed into our users and what we see from a front-end perspective is that we saw when our events were being able to be streamed to our users, these were in the forms of actually things we care about as a user. We saw events come in, and we saw the agent's ability to actually log everything to its observability console.And another key thing here is that as we think about how sessions are controlled in Claude Managed Agents, you can actually think about the state as being something very powerful when you can start to take action on behalf of events. What that means is that we saw a couple of key states for sessions in CMA or Claude Managed Agents. We went from idle to running, rescheduling if the agent needs to retry anything, or terminated if any of the sessions fail. And so the agent is able to restart from a reliability perspective, a resumability perspective, but also can actually do some very powerful things. For instance, you can actually have a webhook run, and when an event happens from a webhook, the agent receives that webhook in and can then do something like resume a session or kickstart a specific state based on external events. So again, this powerful form of having events and sessions be the core concepts of how Claude Managed Agents runs means that you can make it very, very easy to compose your agent however you want it to, and have the agent listen for things that happen both internally and externally via webhooks to take actions or resume your agent as you desire. And lastly here, something that we saw come through, through the agent that we all built for the site reliability agent, is that everything lives in the cloud from the Agent Boots perspective. The conversation is persisted. When we hard refresh the page, we saw the same sessions were maintained, and we saw that if we were able to, let's say, exit out of the agent and come back, we didn't have to manage anything from a database perspective or wire up where the agent is stored. We were just able to have all of that persisted in the cloud, again, making it very, very easy to go to production-ready agents. And lastly here, I just want to talk you through, we just built the very basic form of Claude Managed Agents. We saw what was possible with just the very, very simple primitives that we all built with, the basic level of what you can do with Claude Managed Agents. And already there we were able to have something that would usually take us a lot of time to spin up from a production perspective, all of compaction, caching, tool calling, all of that was handled for us there's-- via Claude Managed Agents. And even if we wanted to go beyond that to make our agent much, much more powerful, we could do things like add in skills, add in subagents, add in memory, add in outcomes. These are all core components that we offer to developers out of the box from Claude Managed Agents. I'll just briefly talk you through a couple of the key components, but want to encourage everyone to check out our documentation, what's publicly available on Claude Managed Agents. Attend the session after this one on dreaming to dive in deeper onto these topics. Subagents or multi-agents is a way for you to have an orchestrator agent, um, spin up context with other agents so that you can manage it from a context engineering perspective, where subagents can then handle tasks and have their own context windows and contribute back to the main agent, making it much more powerful from a parallelization perspective as well as the ability for context management. Memory is something that's always very important as you're building agents. I hear a lot of questions about how you can build self-improving agents or agents that learn from user corrections, agents that start to remember user preferences. That's where we're offering memory and a dreaming service for Claude Managed Agents out of the box. What dreaming means for Managed Agents is that Claude can actually come in and also look through its own memory logs and determine what to keep and determine how it can actually start to memorize and manage context for its own memory. So it can actually be able to really accurately remember which parts of your user preferences matter and which part of user corrections you want to retain for future sessions you run on that same agent. Outcomes is another one of my favorites where for Claude Managed Agents, this means that you can actually define a rubric for your agent outcomes. So you can start to think of your agent's tasks as something where you want the agent to reach a desired outcome instead of just executing calls and doing things on your behalf, but not associating that to a result that you want. So with outcomes, you can define a rubric of exactly what you want the agent to produce, and it'll figure out along the way which tool calls and what it needs to do to execute towards that final result. Vaults is something else that I hear come up a lot as of interest for Claude Managed Agents because managing user credentials is something that's very painful from an access management perspective, making sure that your agents are secure and safe to run. So for vaults in Claude Managed Agents, there's actually an encryption that happens between where the credentials are stored on a separate endpoint and what the agent is actually able to access. So you can manage these credentials on a per user, per session basis, all very safely and securely. And this relies in large part due to that architecture that I described earlier of how the brains and the hands of the agent are separated so that credentials can be stored very securely in these vaults. This means that you don't have to set up your own sec- secret stores or your own credential stores, and you can just rely on the built-in capability here. There are a couple other things here that I won't have time to go through in depth, so again, I encourage everyone to check them out in more detail. There are things like the ability to do webhooks and really make this agent run on external events. Things like detailed and fine-grained permission policies, the MCP servers that I mentioned where we just released new MCP server controls as well. And something that I also love just to briefly touch on is the console agent builder, where we have built in a lot of capability and functionality into the default developer console, where you can start to see a beautiful observability dashboard come through and other ways for you to define Claude Managed Agents right there on your consoles. So just as a quick recap to end us off here of what we were able to accomplish today. Hopefully, everyone leaves here with a bit of a mental model about how Managed Agents actually works under the hood. And be proud of yourselves for everyone that was able to come in and build on your laptops and actually ship a site reliability agent so you can all leave here being very happy with yourselves that you were able to come in and save future developers hours of time of being woken up at three AM or two AM in the morning and being able to handle incidents for them. And next, you also learned a little bit about where to go next for how you can really start to unlock the power that comes with Managed Agents and think about how your agents can become super powered with all of these additional functionalities. So that is where I'll end off today, but thank you all so much for coming. I'll be around if you guys-

Episode duration: 37:08

Install uListen for AI-powered chat & search across the full episode — Get Full Transcript

Transcript of episode 19HDQ9HppOA

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.

iOS

Android

Claude

Chrome

Intro

Workshop goals and roadmap: shipping an incident-response agent

Why Managed Agents: from Messages API to Agent SDK to a managed harness

What Managed Agents provide: production components handled for you

Harnesses must evolve with models: ‘context anxiety’ example

Core primitives: Agent (brain), Environment (hands), Session (binding + streaming)

Key architecture choice: decoupling agent loop from tool execution

Hands-on setup: clone repo, configure API key, run Streamlit app

Scenario framing: building an SRE incident-response agent

Implementing the Agent definition: model, system prompt, and tools

Configuring the Environment: networking controls, MCP tunnels, and BYO compute

Adding context via files: uploading logs/metrics for agent analysis

Creating Sessions and streaming ‘events’ (not just token responses)

Wiring local tool execution + session deletion for security and lifecycle control

Live run: diagnosing the incident and mapping demo tools to real systems

Results, persistence, and ‘beyond basics’: states, webhooks, subagents, memory, outcomes, vaults

Get more out of YouTube videos.