- SPSpeaker
[upbeat music]
- SPSpeaker
Hi, everybody. Um, I hope everybody's having a good time today. I am Michael. I'm a member of technical staff here at Anthropic working on Claude Managed Agents.
- SPSpeaker
What's up, everybody? My name is Harrison, and I'm also a member of technical staff working on Claude Managed Agents.
- SPSpeaker
A lot of members of technical staff.
- SPSpeaker
Yeah, yeah.
- SPSpeaker
Um, okay. So, uh, today we wanna talk to you about Claude Managed Agents. Um, but before we do that, we wanted to do a quick recap over the last couple of years and the exponential that we've-- I think everybody in this room has been experiencing. After that, we'll, uh, talk a little bit about the motivations behind why we built Claude Managed Agents, um, followed by a deep dive into some of the primitives that we offer with Claude Managed Agents. Um, and then afterwards, we will, uh, bring out some of the partners that we've been working with on some of the new features that we announced today. Um, and then we'll wrap it up with a little bit of a getting started. Cool. So, uh, AI capabilities over the last couple of years have been on, like, an absolute rocket ship of, like, an exponential. I think, like I said, everybody here has been kind of experiencing that. Um, we started with, like, the Claude 3 kind of family of a-- of a-- of models. Um, and even back then, like, you were starting to see the, the semblance of, of really capable things starting to happen. Um, but really you, you could only really get, like, very simple, short things, uh, going. Uh, then with Opus 4, we went on an absolute tear, um, and things like Claude Code, uh, started, like, becoming really, really prominent. Um, and then, uh, these days with some of the newer model families that we have, um, we're seeing that, like, the bottleneck towards increasing capabilities is really the infrastructure around these models and not so much the intelligence for them. So yeah, like I said, with, uh, Opus 3, you could maybe have Claude, like, generate a test function for you. Maybe you, you would steer it a lot throughout, and you were, like, approving every single tool that you were doing. And then with, uh, Opus 4 and Claude Code be-coming around, um, you were able to maybe have it drive an entire feature. Uh, it could maybe put up a PR for you, but you were still steering it a lot throughout the way. 
Um, and then with, uh, Opus 4, uh, point seven, the, the newest model that we have, uh, like Boris mentioned earlier, people are clearing their entire backlogs, um, and are waking up to, like, a bunch of merge-ready PRs, which is amazing to see. Who doesn't love waking up in the morning to a bunch of PRs that you have to review? Um, and where we think we're seeing, uh, things going in the future is entire quarters' worth of work being able to be getting accomplished within a couple of hours. Um, so you can imagine a full M&A pipeline, uh, being done end-to-end with, like, an-- a swarm of agent teams. And when these agents work for, like, a couple of hours, uh, things like prompt plus tool use are okay. But really where we start, uh-- or where we need to start get-getting going is, uh, towards, like, task completion and overall, uh, agent infrastructure pipelines.
- SPSpeaker
But in order for your agents to be able to accomplish more, they need access to more, and that's where Claude Managed Agents is here to help you manage some of the complexity. You can imagine that if you have an entire team running an M&A deal, they need access to secure credentials, internal systems. If you're making code changes, you need access to your private GitHub repositories and the credentials that, uh, allow that kind of access. And additionally, you need identity and auth for your agents. This is essentially an identifier for who they are. Like, uh, you know, I, I as an engineer have access to Slack and my email and a bunch of tools internally like that. Our agents are gonna need access to those systems as well. But additionally, there-- we're seeing more and more different conversational methodologies for interacting with our agents. The first is probably the most familiar with a lot of folks, which is you send the, the agent text, and it gives you a response conversationally. But we're seeing more of a transition towards outcome-oriented agentic activity. So this is, again, give the M&A deal that needs to happen to the agent and the agent set and have them just go off and accomplish the task, coming back to you only when they feel relatively confident that the entire activity is complete. Additionally, as an agent platform, we'd be remiss to not support other methodologies of interacting with your agents, like starting an agent and then picking it up later on, maybe weeks or months in the future when you want the agent to pick back up right where it left off.
- SPSpeaker
So it was very clear that, um, we're gonna start expecting a lot out of these agents, and, uh, our developers will as well. Um, when we were doing a bunch of research as we were starting to develop something like Claude Managed Agents, um, we saw a lot of key sticking points around infrastructure and primitive development that, um, uh, really stood out. So the first of which was, uh, figuring out things like context management and memory. Um, these things are things that work really, really well if they are working, but if you get it wrong, it can, like, completely destroy how well your agents are gonna work. Um, and infrastructure concerns was another kind of, like, big sticking point. It was actually the number one thing that was cited as preventing people from being able to, like, skate the exponential and, like, really benefit from these improved model intelligences. Um, you need things like reliability, scalability, security. Um, even latency starts mattering when you're having these things running prod. Um, and then finally, uh, no-none of this really matters if you don't have observability into what these things are doing. Um, if you can't tell whether or not your agent is succeeding, uh, or doing things successfully, uh, it doesn't really matter. Like, how do you-- can-- how can you even assess that the, the thing is good? So with Claude Managed Agents, we did all of that platform work, um, so that you don't have to, so that you can kind of pick and choose the primitives that we have available out of the box, uh, around infrastructure, agent primitives, and observability, all available on the Claude Platform, um, where you can kind of pick and choose the, the composable primitives that we have, um, and, and kind of like build your product on top of them.
- SPSpeaker
Cool. So that's a lot. How do you actually get started building with Claude Managed Agents? The first step is just to define an agent. This is essentially a bundle of configuration that identifies who your agent is and what it can do. It's a system prompt, model, skills, tools, permissions, and generally just the identity of the thing that's actually taking the action. Second, you need a con-- you need an environment in which the agent will actually run. So really helps to give Claude access to a computer. In this case, your agent needs a sandboxing environment where you can configure the network allowlist and pre-installed packages within that environment. When all that's ready to go, you can actually kick off the session, ask your agent to go and complete some piece of work, and then come back to you when it's ready to rock. And through it all, if you want to observe the agent as it's doing its thing, cooking, you can just listen to the event stream and understand what the agent is doing, why it's doing it, and generally interact with it in whatever way you see fit.
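The four steps in this turn (define an agent, configure an environment, start a session, listen to the event stream) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: every class, field, and event name below is hypothetical and chosen for illustration, the model string is a placeholder, and nothing here calls or mirrors the actual Claude Managed Agents API.

```python
from dataclasses import dataclass, field
from typing import Iterator

# Hypothetical names throughout: this models the concepts from the talk,
# not the real Claude Managed Agents API or SDK.


@dataclass
class AgentDefinition:
    """Bundle of configuration describing who the agent is and what it can do."""
    name: str
    model: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


@dataclass
class Environment:
    """Sandbox settings: which hosts are reachable and what is pre-installed."""
    network_allowlist: list[str] = field(default_factory=list)
    packages: list[str] = field(default_factory=list)


@dataclass
class Session:
    """A session pairs an agent with an environment and records an event log."""
    agent: AgentDefinition
    environment: Environment
    events: list[dict] = field(default_factory=list)

    def send(self, text: str) -> None:
        # Record the user's request, then simulate the agent doing the work.
        self.events.append({"type": "user.message", "text": text})
        self.events.append({"type": "session.status", "status": "running"})
        self.events.append({"type": "agent.message", "text": f"Finished: {text}"})
        self.events.append({"type": "session.status", "status": "idle"})

    def stream(self) -> Iterator[dict]:
        # A real event stream would be consumed incrementally over the network;
        # here we simply replay the in-memory log.
        yield from self.events


agent = AgentDefinition(
    name="Pascal",
    model="example-model",  # placeholder, not a real model identifier
    system_prompt="Analyze grocery shopping habits.",
    tools=["python"],
)
env = Environment(network_allowlist=["example.com"], packages=["pandas"])
session = Session(agent, env)
session.send("Analyze last week's orders")
for event in session.stream():
    print(event["type"])
```

Keeping the agent definition separate from the environment means the same agent can, in this toy model, be paired with different sandboxes without changing its identity.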
- SPSpeaker
So let's demystify what we mean when we're talking about this event stream. Every session that you start in Claude Managed Agents is effectively a log of events that you, um, have where you or your end users are interacting with Claude, and Claude's responding. So we kind of, like, split up the domains of events that we have, uh, within the platform so that it's easier for you to kind of understand what each event means. Um, the first of which is user events. These are things that your own end users or maybe your platform is sending to Claude Managed Agents sessions. Um, these could include text messages, um, images, documents. Um, you can interrupt your agent if you see that it's going off course and you want to steer it back onto, onto its track. Um, tool results for custom tools that you implement and, uh, execute on your end. Um, and even confirmations for human-in-the-loop controls for any tools that are executed on Anthropic servers. And then finally, we have outcome definitions, which we'll go into a little bit more detail about later. Next, we have agent events. Agent events are, uh, anything that Claude is really doing, um, on, on its side. So this could be responding to the user with a message, um, executing tools on its end, um, or coordinating with other agents, which we'll go into a little bit more detail later. Next, we have the session events. These are just like the overall life cycle of the session itself. So any descriptions around the status of the session changing from idle to running, um, error recovery, and information about the sort of-- sorts of errors that Claude is running into, and outcome processing. And then finally, we have span events, which make it really, really easy to understand when certain things are starting and ending, like Claude starting to write together a really, really long response.
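One way to picture the four event domains named in this turn (user, agent, session, span) is as a log whose entries carry a domain prefix. The sketch below is an assumption-laden illustration: the dotted event type strings, the `span_id` field, and the helper names are all invented here and are not taken from the platform's actual event schema.

```python
from collections import defaultdict
from enum import Enum


class Domain(Enum):
    """The four event domains described in the talk."""
    USER = "user"
    AGENT = "agent"
    SESSION = "session"
    SPAN = "span"


def domain_of(event: dict) -> Domain:
    # Assumes a hypothetical "<domain>.<kind>" naming scheme for event types.
    prefix = event["type"].split(".", 1)[0]
    return Domain(prefix)  # raises ValueError for an unknown prefix


def group_by_domain(events: list[dict]) -> dict[Domain, list[dict]]:
    grouped: dict[Domain, list[dict]] = defaultdict(list)
    for event in events:
        grouped[domain_of(event)].append(event)
    return dict(grouped)


def open_spans(events: list[dict]) -> list[str]:
    """Return ids of spans that have started but not yet ended."""
    open_ids: list[str] = []
    for event in events:
        if event["type"] == "span.start":
            open_ids.append(event["span_id"])
        elif event["type"] == "span.end" and event["span_id"] in open_ids:
            open_ids.remove(event["span_id"])
    return open_ids


# Hypothetical log for a single session.
log = [
    {"type": "user.message", "text": "Find the most popular items"},
    {"type": "session.status", "status": "running"},
    {"type": "span.start", "span_id": "s1"},
    {"type": "agent.tool_use", "tool": "python"},
    {"type": "user.tool_confirmation", "approved": True},
    {"type": "agent.message", "text": "Bananas are the most popular item."},
    {"type": "span.end", "span_id": "s1"},
    {"type": "session.status", "status": "idle"},
]

for domain, items in group_by_domain(log).items():
    print(domain.value, len(items))
```

Pairing start and end span events, as `open_spans` does, is what would let a dashboard show that a long response or tool run is still in progress.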
- SPSpeaker
So we know that's a ton of information, so let's make it concrete by doing a quick demo of Pascal, a fictitious agent that's responsible for understanding a little bit more about grocery shopping habits of our users. So if we jump into the demo, we're gonna-- we're gonna start by showing our dashboard that's integrated with Managed Agents. And we're kicking off an analysis run where we've clicked this Analyze button in the top right. Jumping back to the console, where we can see everything that the agent is doing in real time, we can see the list of events that are coming through the event stream, tool runs, agent events, generally understanding what's happening in real time. On the right side, you can see our agentic definition. This includes the system prompt, model, and all of the MCP tool configuration that I was talking about earlier. And as we click into the environment, we can also see our networking configuration as well as the packages that we've installed into our, uh, container. Jumping back to our application, we can see all of this shown on our surface because all of this is exposed via an API. And what's that? Claude came back, found some bits for us. Looks like bananas are super popular, I guess. Pretty cool.
- SPSpeaker
I like bananas.
- SPSpeaker
Uh, and also jumping forward, if you want to avoid the crowds, it turns out that Sunday is not the right time to go shopping for groceries. But then that's not enough for us. We want to understand more about how our shoppers are going to behave in the future. So we had our agent go ahead and kick off, uh, predictive analysis and reorder probabilistic understanding of our users. But again, that's not enough for us. We want our agent to get even better over time, and we want to improve the way that it interacts with these sessions. So we've clicked this Ask Claude button in the top right, where Claude is actually reading the transcript of our session and is offering inputs for how we can optimize the way that we've configured it. So in this case, it's a little bit small, but on the right side, we'll see that a Python script ran for over twenty seconds. This is a Python script that we uploaded to the session, and maybe there's a chance for us to op-- uh, optimize that runtime so things can feel a little bit snappier for our users. So in a nutshell, that's a, a whirlwind tour of the console and many of the features that we support out of the box with Managed Agents. So as we jump forward, as developers, you might want to go ahead and figure out how to get started. If you have Claude Code installed, which I, I hope many of you do, it's m-my favorite tool of all, uh, feel free to just jump in and use the skill that we ship with our, uh, uh, with Claude Code directly, the Claude API skill. It knows all about Managed Agents and will make getting integrated an absolute breeze. Second, we also have shipped a CLI that lets you interact with your agents and your sessions extremely seamlessly via a very simple command line interface. And lastly, if you just want to see some examples of real code, copy/paste ready, you can fit for whatever your needs might be, we have a set of cookbooks that'll show off what it looks like to integrate with all of our API surfaces.
- SPSpeaker
So we covered a lot of the basics, but I wanted to go over some of the more advanced features that we've been releasing over the last couple of weeks, um, just so that you're all kind of aware of them. So the first is probably my favorite, is multi-agent orchestration. It allows Claude to spawn other agent threads with their own context windows in order to delegate work to them. And the cool aspect of it is that Claude can pass these messages back and forth to other Claudes in order for them to do whatever specialized work they get to do. Um, Outcomes allows, uh, you as an end user, uh, to define a rubric, um, or a set of goals for Claude to be able to iterate towards getting towards. Um, so after it does its kind of first pass iteration of something, it'll start triggering its, um, outcome grading, and, uh, will keep going like that in a loop until it feels like it's satisfied with the outputs it's given you. Memory allows Claude to kind of get better over time by reading and writing from these, uh, long-lived memory stores, uh, which, uh, makes every session kind of better than the last that it's had. And then finally, we have Dreaming, which we announced in Research Preview, uh, which allows Claude to kind of like reflect and codify over like thousands of sessions all at once in order to produce new memories, edit existing ones, um, and really make sure that the memories that we're dealing with are, are top-notch.
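The Outcomes loop described in this turn (produce a first pass, grade it against a rubric, keep iterating until satisfied) has roughly the following shape. This is a toy sketch, not the platform's grading mechanism: the rubric is modeled as plain Python predicates, `revise` stands in for the agent doing another pass, and the `max_rounds` cap is an assumption added so the loop always terminates.

```python
from typing import Callable

# A rubric maps a criterion name to a check on the draft (hypothetical shape).
Rubric = dict[str, Callable[[str], bool]]


def grade(draft: str, rubric: Rubric) -> list[str]:
    """Return the names of the rubric criteria the draft fails."""
    return [name for name, check in rubric.items() if not check(draft)]


def iterate_to_outcome(
    draft: str,
    rubric: Rubric,
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 5,
) -> tuple[str, list[str]]:
    """Revise the draft until it passes the rubric or the round cap is hit.

    Returns the final draft and any criteria still failing.
    """
    failures = grade(draft, rubric)
    rounds = 0
    while failures and rounds < max_rounds:
        draft = revise(draft, failures)
        failures = grade(draft, rubric)
        rounds += 1
    return draft, failures


# Toy rubric and reviser, standing in for real grading and a real agent pass.
rubric: Rubric = {
    "has_summary": lambda d: d.startswith("Summary:"),
    "mentions_bananas": lambda d: "bananas" in d,
}


def toy_revise(draft: str, failures: list[str]) -> str:
    if "has_summary" in failures:
        draft = "Summary: " + draft
    if "mentions_bananas" in failures:
        draft += " Top item: bananas."
    return draft


final, remaining = iterate_to_outcome("Sunday is busiest.", rubric, toy_revise)
print(final, remaining)
```

Returning the remaining failures alongside the draft lets a caller tell a satisfied outcome apart from one that merely ran out of rounds.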
- SPSpeaker
And earlier this morning, you heard us announce two new features that'll make your agents even more powerful. The first is self-hosted sandboxes, effectively giving you the option to bring your own compute infrastructure to Claude if you want to run tools within your own VPC. Shout out to our partners at Cloudflare, Daytona, Modal, and Vercel, who we've been working with to sort-- support this seamlessly out of the box. And second, we shipped a feature in research preview called MCP Tunnels, which is basically a way for you to expose your private MCP servers directly to Claude via Managed Agents without ever having to expose anything over the public internet directly. And, um, just diving a little bit deeper into how self-hosted sandboxes work, um, you can bring your own kind of sandboxing infrastructure fleet, um, where you can contain your own private files, uh, your own services, packages. You get to con- kind of control how these sandboxes are provisioned, um, without using our kind of, like, native Anthropic cloud, uh, sandboxes. Inside your own perimeter, you can control things like network policies, your audit logs, um, when these, uh, sandboxes are, uh, spawned and, uh, idled. Um, and, and everything is there is kind of in your control without kind of having to cede that over to, uh, Claude Managed Agents. Um, all we'll do is just send you a signal whenever we need to, um, have a new sandbox be provisioned because Claude needs to do some, some work in there. And, um, the nice aspect of it is that you can either use your own sandboxing fleet or use one of the partners that we, um, just mentioned earlier today, um, in order to get started with all of this. And let's talk a little bit about MCP Tunnels. MCP Tunnels, again, are basically just a way for you to get your private MCPs and your network exposed to Claude Managed Agents without having to do any fancy network configuration on your side.
Essentially, all you have to do is expose a fairly basic proxy layer, uh, to our, to our, uh, MCP tunnels, enabling your network infrastructure to speak directly with Claude via secure tunnel in the middle. Okay. So in order to get a little bit deeper about sand-- private sandboxes or s- self-hosted sandboxes, um, I wanted to welcome to the stage Mike from Cloudflare, Ivan from Daytona, Akshat from Modal, and Luke from Vercel to talk a little bit, a little bit more about this. Bye, Harrison. [upbeat music] [audience applauding] Hey. Hi. Good to see you, my friends.
- SPSpeaker
Hey there.
- SPSpeaker
How are we doing?
- SPSpeaker
Hey there.
- SPSpeaker
Do you guys mind if we, like, take a selfie real quick? I think that's, like, a trend here.
- SPSpeaker
I would like to.
- SPSpeaker
I think it's mandatory, right?
- SPSpeaker
Okay. Here. Nice. We look great. Okay, cool. Um, thanks everybody for joining us. Um, you all run, uh, companies or work at companies that run sandboxing fleets for agents, but you've all built them slightly differently, and I'm curious what you are each individually kind of betting on here. Luke, do you want to go first?
- SPSpeaker
Yeah. I think one of the, um, things that's foundational to us is we build all of our infrastructure on top of, like, the same foundation. So we kind of talked about this as fluid compute. Um, internally, we've kind of called it a system of hive. Um, and that allows us to give everyone basically just, like, the full VM. Um, that's for, yeah, whether you're running a build, a sandbox, or a function. Um, and for us, that really means that all of these things interop pretty well. We can reuse, like, the same features between them. Um, so stuff like we've, uh, just made public our, like, firewall for sandboxes where you can filter traffic and, um, inject secrets and stuff like that. Well, we can build that for one thing and, and reuse it across it all. So the same primitive is really flexible from that way. Um, and I think that's really powerful for when you're building something like Claude Managed Agents, and you can call it from a function into a sandbox and vice versa.
- SPSpeaker
Yeah. Nice. Um, Akshat.
- SPSpeaker
Yeah. So Modal is a compute platform that we've built for today's use cases. Uh, and one of the things we are really good at is, um, we run a lot of compute across the world. Uh, so we can, uh, since we have our own scheduler, we can spin up, uh, hundreds of thousands of sandboxes in, uh, in order of minutes, which, uh, we're betting on the fact that people need a lot of scale, and they want to get to that scale really quickly for use cases like RL. Uh, we also run in every region, uh, that you can imagine, so you can get low latency if you care about low latency from US West or US. Um, and, uh, we were really designed around flexibility. Uh, so one of the things we have is, uh, we, we think a lot about persistent volumes. Like, how can you customize that? How can you customize your images? Um, and also we have GPUs, uh, and GPU sandboxes are, are actually a pretty big growing use case for us.
- SPSpeaker
Nice. Um, Ivan?
- SPSpeaker
Yeah. So for us-- So Daytona, we are a company that started out building sandboxes from scratch, and so we had one definite principle that we were thinking about when we were building this or insight. It was that agents will need what humans need.
- SPSpeaker
Mm.
- SPSpeaker
And so when we think about when we started this at that time, a lot of people were just doing code execution boxes for, for, for the most part. But what we sort of understood was that agents like humans will need like different sort of size and specs, CPU, RAM, memory, different operating systems, CPUs, GPUs, like everything that humans need for agents, they-- uh, the humans need to get their job done, agents will need as well. Um, but the difference is you have to have that at insane amount of speed and scale and things like, you know, pausing, resuming, forking so that agents can try multiple outcomes. And so when we were building out Daytona, we're like, "Oh, nothing that exists in the market today can offer that from an infrastructure perspective." And similar to you guys, we built our own scheduler to basically enable this. So yeah.
- SPSpeaker
Nice. That's amazing. Um, Mike?
Episode duration: 29:03
Transcript of episode zenIB7XLZxQ
