
How to get to production faster with Claude Managed Agents

Building agents used to mean spending development cycles on secure infrastructure, state management, permissioning, and reworking your agent loops for every model upgrade. Managed Agents on the Claude Platform now handles that layer for you. This session covers how to build and deploy a production-grade agent at scale.

May 6, 2026 · 17m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. SP

    [on-hold music]

  2. SP

    Please welcome to the stage members of technical staff of Anthropic, Jess Yan and Lance Martin.

  3. SP

    [clapping] Woo! [upbeat music]

  4. SP

    Hey, everyone. Hope everyone's having a great time at Code with Claude. Um, I'm Jess Yan, Product for Claude Managed Agents.

  5. SP

    I'm Lance Martin, DevX team, and I do whatever Jess tells me.

  6. SP

    [laughing]

  7. SP

    That's news to me.

  8. SP

    Okay.

  9. SP

    Um, but today we have a great agenda for you guys. Uh, first we'll be starting with the AI exponential and how that has materially changed agentic development. We'll talk about the motivations for why we built Claude Managed Agents, do a deep dive into the primitives, uh, run a couple of demos to bring some sizzle to life, and then we'll walk you through our getting-started resources so that you can start building some agents of your own. All right, so we're all familiar with how our model capabilities are increasing exponentially. But as these capabilities increase, so do the task horizons and the complexity of the work that we're delegating to our agents. We're seeing that the bottleneck is increasingly infrastructure, not intelligence. So a couple of years ago, you might have had Opus write and test a single component, or, uh, debug a flaky test suite, and this would be minutes, maybe an hour of focused work. You'd be steering it heavily along the way, and you'd be correcting it as it went off course. More recently, with our latest models, we're seeing that people are running things overnight, walking away, waking up the next morning, and seeing that their entire Linear backlog has been resolved by an agent. In the not-too-distant future, we might see agents take on work that historically took teams entire quarters. Multi-agent coordinated teams will be running a full M&A pipeline end to end. And as tasks evolve from prompts to hours and days of work, we need not just prompt scaffolding, but a true agentic runtime.

  10. SP

    That's right. With Managed Agents, one of the main things we've solved is reliability and security. With long-horizon tasks, these become much greater issues. You have an agent that's working for hours, days, or weeks, and it needs to be reliable. It also needs to be secure. And new interaction modes become possible when you have agents working over long time horizons. So for example, with chatbots, interactions are pretty instantaneous and short-horizon. Long-horizon agents require something we call outcome-oriented tasks. You give an agent a task and an outcome, like a rubric that indicates what completion means. We also need the ability for agents to stop and resume over the course of a long execution, potentially to ask questions to clarify their work.

  11. SP

    Famously, stop and resume is the most human-like of interaction patterns, because there's nothing more human than procrastination.

  12. SP

    Right.

  13. SP

    So it's clear that we expect a lot of our agents.

  14. SP

    Right.

  15. SP

    And that means historically, we've pushed that burden onto you guys, the developers. We've expected a lot of you as well.

  16. SP

    Mm-hmm.

  17. SP

    In research that we conducted prior to launching Managed Agents, we found that developers were seriously struggling. One in three were struggling with context management. Context at the right time can be incredibly powerful; it's the knowledge the agent needs to do its work. But context at the wrong time can be a huge distraction. Half of our developers cited infra concerns as their number one production blocker. These are some of the concerns that Lance was talking about: credential management, security and access, keeping a human in the loop. And then lastly, a majority of our users said that their agents are running with no formal observability. These agents are running off of predictive models with probabilistic outputs, and this is very unlike traditional software development of the past. How do you know if your agent is doing something good if it is producing probabilistic outputs? So enter Claude Managed Agents. We built this platform so that you don't have to build that layer yourself. We combined infrastructure and harness, um, so tool permissioning, tool execution, automatic context management, uh, checkpointing, retries, um, with foundational building blocks, which Lance will go into a bit later, that make it easy to compose a customizable agent quickly. And then lastly, we paired it with a rich observability platform. Um, we don't want these agents to be running on vibes. You should be able to understand exactly what your agent is doing and how you can improve it.

  18. SP

    Yeah, that's right. And so using Managed Agents is actually really simple. The mental model is basically as follows: you're defining an agent. An agent you can think of as a configuration. It has a particular model, a prompt, tools, skills. You're laying that out, and then you're allowing that agent to use an environment, which you can configure. You can configure the networking and packages, and this is where the agent can, for example, write code. And any given execution of the agent is a session. Now, sessions can have resources, for example, GitHub repos. They can have something like an outcome, which we'll talk about a little more later. And these sessions emit events, which Jess will cover briefly, that you can then handle and process and use to understand what the agent's doing.

  19. SP

    So let's walk through the event topology. As agents do more and more complex tasks, the event types that are produced get more complex as well. So we've separated them into four broad categories. There are user events: you, uh, steer the agent, you guide it, you interrupt it, you define exit criteria. There are agent events: these convey, uh, what the agent is doing, what tools it's running, um, how it's compacting its context over time, who it's delegating to. There are session events, which help you track the life cycle of your work: is the agent running? Is it idle? Is it waiting for your input? And then lastly, there are span events. Um, this is broader instrumentation that lets you group related events together. So we'll pivot to an example agent that we've built. Um, we call it Pascal. It runs on a hypothetical grocery store's, uh, data set, um, a grocery store called Just in Time, and it produces rich analytics and insights in minutes, um, leveraging its container preloaded with a set of Python packages. You can see every event in console, and you can even diagnose, um, the event stream after the fact. So we'll walk through it with a bit of a demo. We're starting the agent execution right now. You can see that the events are updating in real time in console, and console supports a single pane of glass that lets you analyze the agent's configuration as well as its environment while you're looking at the, uh, events produced. Uh, Pascal has started to cook. It's starting to produce some outputs. Uh, it first starts with an analysis of the products. We're learning that bananas are really, really popular.
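The four event categories just described (user, agent, session, span) lend themselves to a simple consumer that buckets a session's event stream. This is a minimal sketch with a made-up event shape; the real events presumably carry much richer payloads.

```python
# A minimal sketch of grouping the four event categories described above.
# The flat {"category", "type", "detail"} event shape is an assumption.
from collections import defaultdict

def group_events(events: list[dict]) -> dict[str, list[dict]]:
    """Bucket a session's event stream by its top-level category."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        buckets[event["category"]].append(event)
    return dict(buckets)

# A toy stream with one or two events of each category.
stream = [
    {"category": "user", "type": "steer", "detail": "focus on produce sales"},
    {"category": "agent", "type": "tool_call", "detail": "run_python"},
    {"category": "agent", "type": "compaction", "detail": "context compacted"},
    {"category": "session", "type": "status", "detail": "running"},
    {"category": "span", "type": "group", "detail": "product analysis"},
]
grouped = group_events(stream)
# grouped["agent"] holds the two agent events; grouped["session"] shows lifecycle status.
```

A handler like this is the natural place to fan events out: user and session events to your UI, agent events to logs, span events to your tracing backend.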

  20. SP

    [laughs]

  21. SP

    Um, the second output that it'll create is an analysis of the shoppers. And we're learning that Sunday morning is peak grocery time. And lastly, my favorite output is a bit of a predictive model, where it's analyzing the reorder probability for a single customer given their demographic profile. Now that the agent has finished, uh, the full event stream is available in console, and we can analyze its performance. We offer a debug agent in console so that you can look at the event stream, analyze bottlenecks, figure out ways to improve the agent going forward, and take recommended actions. So it looks like it has identified a few bottlenecks, um, which we can then go fix directly in Claude Code.

  22. SP

    Yeah. So what you just saw is the console showing two really cool things. It's showing you the trace for everything that happened in the session, and it allows you to analyze what happened using Claude to look at the session log and give you analytics, insights, and so forth. Now, how do you practically get started? Um, first, I want to promote something I worked on quite a bit: a skill built into Claude Code that has shipped globally. In Claude Code today, if you just type /claude-api, you'll access our skill, which understands managed agents extremely well, and I use this all the time. In fact, I don't write a lot of managed agent code myself. I have Claude Code do it. It's a very nice trick, and I'll show some more nice tricks later for how it can also be used to grab session logs and so forth. It uses the CLI to grab those logs. The CLI is very powerful. It lets you configure agents as, for example, YAML files, which you can check in. It allows you to grab sessions programmatically, which is very useful for working with code agents. And we also have cookbooks.
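The "grab sessions programmatically" workflow mentioned above might look something like the sketch below. The `fetch_session_log` function and its event shape are stand-ins invented for illustration (the real CLI commands and log schema are not shown in this talk); the point is the pattern of pulling a session log and mining it for bottlenecks.

```python
# Hypothetical sketch of pulling a session log and finding bottlenecks.
# fetch_session_log and the event fields are stand-ins, not the real CLI output.
def fetch_session_log(session_id: str) -> list[dict]:
    """Stand-in for a programmatic CLI call that returns a session's events.
    Here it returns canned data so the sketch is self-contained."""
    return [
        {"type": "tool_call", "ms": 1200, "name": "run_python"},
        {"type": "tool_call", "ms": 8400, "name": "render_page"},
        {"type": "message", "ms": 300, "name": "final_report"},
    ]

def slowest_steps(log: list[dict], threshold_ms: int = 5000) -> list[str]:
    """Surface bottlenecks: names of events slower than the threshold."""
    return [e["name"] for e in log if e["ms"] > threshold_ms]

log = fetch_session_log("sess_123")
bottlenecks = slowest_steps(log)  # ["render_page"]
```

This is exactly the kind of analysis the speakers describe handing to Claude Code: pull the log, find the slow steps, then go fix them.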

  23. SP

    Artisanal code.

  24. SP

    Yes.

  25. SP

    So this is what we've been building with since we launched. Um, but I wanted to also touch on some of the more advanced capabilities that we've been shipping over the last couple of weeks. Each one extends the agentic, uh, experiences that you can offer. First, there's multi-agent orchestration. Claude can clone itself. Claude can delegate to pre-configured additional agents, and this allows complex tasks to be decomposed into smaller units that are, uh, achieved with better fidelity. We have outcomes, which we talked about a little earlier in this presentation, where Claude iterates until it satisfies predefined exit criteria. You define the goal, and Claude keeps going until it's finished. A couple of weeks ago, we launched Memory to public beta. With Memory, Claude doesn't have to start each new session fresh. Instead, it's reading and writing to persistent memory stores. And then lastly, today, my colleague Mahesh announced our dreaming platform. Here, Claude is reflecting and codifying new learnings into new memories. Agents can literally improve between every single run. So we'll show another demo now that showcases how Claude Managed Agents uses, uh, outcomes and multi-agent to produce great outputs. Previously, you saw in console our analysis of a single session. Lance will walk through how we can do this programmatically at scale over the course of multiple sessions.

  26. SP

    Yeah. So this is actually one of the most fun demos I had to build for this conference, and it came from a number of weeks ago. Angela, our head of product, actually prompted me with the question, um, "What would the AGI-pilled CEO have at his or her disposal?" So this is an interface where you just type in a question. It will query fake organizational data and render visualizations based upon the input for anything that this fake CEO wants to know. If you use, for example, the Claude app, you understand artifacts. It's basically just Claude producing SVG and rendering that, in this case, in a browser, uh, as a visualization. So we'll show this right now. Now, all I'm doing is I set up a managed agent. You can see the session. It has a sandbox, orchestration to handle things like retries. I'm giving it one custom tool, the ability to render code to a browser. And what it's gonna do is, based upon the user's input, it's gonna render a visualization, and we'll see that in a bit, which will show different, for example, graphs or tables based upon what this fake CEO wants to know. Now, this is where I used outcomes, and I wanna make sure this is really clear, because I thought this was really cool. Outcomes allows you to specify a rubric. You're passing instructions, which means the agent runs, and when the agent finishes, a separate sub-agent spins up and looks at the artifacts produced, in this case, my page. For example, in this case, I specified: produce timing, take a screenshot, do an analysis, send the analysis back to the main agent. So what was really cool here is I used outcomes to make this much faster. As you'll see shortly, the CEO will ask a question. It's gonna render visualizations, and I want it to be fast. And I used outcomes to do that. Now, one nice thing is when I kick off a run with outcomes, the managed agent will iterate against that outcome over time.
And when it finishes, in my particular case, I would look at the result, see the dashboard, and I might have feedback. I'd say, "Ah, I don't like this." What I would do is I would then tell Claude Code my feedback, and it could use our CLI to pull the session log, reflect on the session, look at the rubric, look at the agent instructions, update those, and kick off a new session. This is what I call the outer loop. These two things work together really nicely. You have an inner loop that's using outcomes with managed agents: given a rubric, cook against that rubric, produce an output. And then this outer loop is looking at that output as a user and saying, "Ah, okay, I don't like this," and allowing, for example, a code agent like Claude Code to modify the rubric or modify the instructions and kick off a new session. So these two work really nicely together. And these are actually my results, and you'll see the demo very shortly. I start with a pretty inefficient baseline, and these optimizations are all discovered autonomously by the managed agent, just using the rubric to optimize timing. It figured out how to parallelize tool calls. It figured out to use fast mode and perform prompt optimization. And for inputs that produce multiple charts, it uses multi-agent, which saves around seven seconds, going from around thirty-seven seconds down to ten seconds for rendering. All figured out autonomously by a managed agent using outcomes.
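The inner loop (iterate against a rubric within a session) and outer loop (revise the rubric or instructions between sessions) described above can be sketched as plain control flow. Everything here is a toy: the rubric, the session runner, and the millisecond numbers are invented to make the two-loop structure concrete, and none of it is the real Managed Agents API.

```python
# Toy sketch of the inner/outer loop described above. The functions, the
# speedup names, and all timing numbers are hypothetical illustrations.
def meets_rubric(render_ms: int, budget_ms: int) -> bool:
    """Toy rubric: the dashboard must render within the time budget."""
    return render_ms <= budget_ms

def run_session(instructions: list[str], budget_ms: int) -> int:
    """Inner loop: the agent iterates against the outcome, here modeled as
    applying one speedup per iteration until the rubric is satisfied."""
    render_ms = 37_000  # an inefficient baseline, echoing the demo's numbers
    speedups = {"parallelize tool calls": 12_000,
                "fast mode": 9_000,
                "multi-agent rendering": 7_000}
    for step in instructions:
        if meets_rubric(render_ms, budget_ms):
            break
        render_ms -= speedups.get(step, 0)
    return render_ms

# Outer loop: a user (or a code agent) inspects the result, doesn't like it,
# revises the instructions, and kicks off a new session.
result = run_session(["parallelize tool calls", "fast mode"], budget_ms=10_000)
if not meets_rubric(result, 10_000):
    revised = ["parallelize tool calls", "fast mode", "multi-agent rendering"]
    result = run_session(revised, budget_ms=10_000)
```

The division of labor is the point: the inner loop grinds against a fixed rubric, while the outer loop is where human (or code-agent) judgment changes the rubric itself.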

  27. SP

    So now we can see the results. Here's Boss Agent in action. We're able to analyze top-line metrics.

  28. SP

    Yep, and I'm glad my music got included here.

  29. SP

    Did Claude make that music?

  30. SP

    Yes.

Episode duration: 17:25


Transcript of episode E9gaQHrw_rg
