Ship your first Managed Agent

Build and ship a working incident-investigator agent on Anthropic's Managed Agents platform: define an Agent, Environment, and Session, stream events, and wire up custom tools, all in six functions. You'll leave with a running agent, the mental model for the server-side loop, and a roadmap to production features like subagents, vaults, and memory.

May 26, 2026 · 37 min

CHAPTERS

0:19 – 1:49
Workshop goals and roadmap: shipping an incident-response agent
Isabella Hee (Anthropic Applied AI) sets the objective: get everyone hands-on with Claude Managed Agents and ship a first production-shaped agent. She previews the flow: a refresher, harness internals, a laptop-driven build of an incident-response agent, and then what comes next beyond the basics.
- Session goal: build and ship a first Managed Agent end-to-end
- Agenda: managed agents refresher → harness design → hands-on build → advanced features
- Focus on understanding primitives so you can reason about production behavior
- Outcome: a working incident-response/SRE-style agent
1:49 – 3:51
Why Managed Agents: from Messages API to Agent SDK to a managed harness
The talk traces Anthropic’s progression: raw model access via the Messages API required developers to implement agent loops, context management, and tool orchestration themselves. As models and agent capabilities grew, these primitives became more complex, motivating the Agent SDK and ultimately Claude Managed Agents for production scaling and operations.
- Messages API: tokens in/out, developers build all agent primitives themselves
- Growing model capability increases complexity of context + tool execution
- Agent SDK harness enabled more powerful agents (e.g., Claude Code), but still required self-hosting
- Managed Agents shifts operational burden (scaling, reliability) to Anthropic
3:51 – 4:21
What Managed Agents provide: production components handled for you
Managed Agents are positioned as the fastest path to production-ready agents by bundling a purpose-built harness, sandboxing, observability, and tool runtime in managed infrastructure. Developers focus on domain configuration—tasks, tools, and policies—while the platform handles durability and core agent-running mechanics.
- Managed infrastructure: scaling, reliability, sandboxing, tool runtime
- Built-in observability and operational readiness
- Developer focus shifts to task configuration and custom tool logic
- Claimed speedup: 10–15× faster to production
4:21 – 5:52
Harnesses must evolve with models: ‘context anxiety’ example
Isabella explains why a managed harness matters: model behaviors change over time, and harness logic must adapt. She cites ‘context anxiety’ in Sonnet 4.5 (wrapping up prematurely) and notes that the mitigations became unnecessary once Opus 4.5 no longer showed the behavior. The example illustrates the ongoing maintenance cost that Managed Agents absorb.
- Model behavior drift can make harness mitigations obsolete
- Example: Sonnet 4.5 ‘context anxiety’ caused early stopping
- Harness mitigations were later unnecessary with Opus 4.5
- Managed Agents aims to handle compaction, caching, and behavioral mitigations centrally
5:52 – 7:22
Core primitives: Agent (brain), Environment (hands), Session (binding + streaming)
The foundational building blocks are introduced: the agent endpoint defines persona/capabilities, environments provide an execution container, and sessions tie the two together while streaming events back to users. Server-side agent loop execution provides durability even if the client disconnects.
- Agent endpoint: model, system prompt, MCP servers, skills/tools
- Environment: where actions occur (containerized execution)
- Session: binds agent+environment; streams events back to UI
- Server-side agent loop enables durability across refresh/disconnects
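The relationship between the three primitives can be sketched as plain data structures. This is a local illustration only, not the platform's API: the class names, field names, ID format, and the `<opus-model-id>` placeholder are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # stand-in for server-assigned identifiers (assumption)


@dataclass(frozen=True)
class Agent:
    """The 'brain': model, system prompt, and the names of its tools."""
    model: str
    system_prompt: str
    tools: tuple[str, ...] = ()
    id: str = field(default_factory=lambda: f"agent_{next(_ids)}")


@dataclass(frozen=True)
class Environment:
    """The 'hands': where tool actions execute."""
    name: str
    id: str = field(default_factory=lambda: f"env_{next(_ids)}")


@dataclass
class Session:
    """Binds one agent to one environment and accumulates events."""
    agent_id: str
    environment_id: str
    events: list[dict] = field(default_factory=list)
    id: str = field(default_factory=lambda: f"session_{next(_ids)}")


agent = Agent(model="<opus-model-id>", system_prompt="You are an SRE assistant.")
env = Environment(name="demo")
session = Session(agent_id=agent.id, environment_id=env.id)

# Events live on the session, not on the client, which is what lets a
# reconnecting UI pick up where it left off.
session.events.append({"type": "user.message", "text": "p99 latency is spiking"})
```

Keeping the agent and environment immutable and reusable, with the session as the only mutable object, mirrors the summary's framing of the session as the binding between the other two.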
7:22 – 9:23
Key architecture choice: decoupling agent loop from tool execution
The talk highlights a pivotal design decision: separating the ‘brain’ (agent loop) from the ‘hands’ (tool execution). Keeping tool execution in a separate sandbox improves credential security, and avoiding per-session container spin-up reduces latency, which shows up as a large reduction in time-to-first-token (TTFT).
- Traditional coupling of loop+tools can be powerful but risky
- Decoupling improves credential isolation/sandboxing
- Reduced container spin-up overhead improves latency (the talk cites a roughly 90% reduction in p95 TTFT)
- Supports production concerns: safety, reliability, and responsiveness
9:23 – 10:54
Hands-on setup: clone repo, configure API key, run Streamlit app
Participants begin the workshop by cloning a provided repository, installing dependencies, setting environment variables (Anthropic API key), and running a Streamlit app. The README mirrors the on-screen steps for self-paced follow-along.
- Git clone workshop repo and enter project directory
- Create venv, install requirements, copy .env and add API key
- Run Streamlit app (app.py) to launch the UI
- README contains full setup instructions
10:54 – 11:54
Scenario framing: building an SRE incident-response agent
The Streamlit UI simulates a real on-call incident where engineers triage metrics, logs, and recent deployments. The goal is to hand off this workflow to a Managed Agent that can investigate and propose remediation, reducing 3 AM wake-ups.
- Incident response pain point: on-call triage at odd hours
- Agent will inspect metrics/logs/deployments to diagnose issues
- UI simulates incident prompt/response flow
- Vision: agent resolves issues or at least narrows root cause quickly
11:54 – 13:55
Implementing the Agent definition: model, system prompt, and tools
The workshop starts by filling in the agent definition in code (copying from a completed reference). The SRE agent uses an Opus model and a concise system prompt that grants access to debugging tools (metrics, deploys, diffs, logs).
- Define an SRE agent with a chosen Claude model (Opus)
- System prompt kept intentionally simple but effective
- Tool access mirrors what an SRE needs (metrics, deploy history, diffs, logs)
- Agent gets a unique identifier after creation
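An agent definition along these lines might be assembled as a plain dictionary before being sent to the platform. The tool shape below follows the `name` / `description` / `input_schema` convention used for Messages API tool definitions; whether Managed Agents accepts exactly this shape is an assumption, as are the tool names, the agent name, the system prompt text, and the `<opus-model-id>` placeholder.

```python
import json


def make_tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build one custom tool definition with a JSON Schema for its input."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


SERVICE = {"service": {"type": "string", "description": "Service name"}}

# Hypothetical tool set mirroring what the summary lists: metrics,
# deploy history, diffs, and logs.
SRE_TOOLS = [
    make_tool("get_metrics", "Fetch recent metrics for a service.", SERVICE, ["service"]),
    make_tool("get_deploys", "List recent deployments for a service.", SERVICE, ["service"]),
    make_tool(
        "get_diff",
        "Show the code diff for one deployment.",
        {"deploy_id": {"type": "string"}},
        ["deploy_id"],
    ),
    make_tool("get_logs", "Fetch recent log lines for a service.", SERVICE, ["service"]),
]

agent_definition = {
    "name": "sre-incident-investigator",
    "model": "<opus-model-id>",  # placeholder; the talk only says "an Opus model"
    "system": (
        "You are an SRE investigating a production incident. "
        "Use the available tools to find the likely root cause."
    ),
    "tools": SRE_TOOLS,
}

print(json.dumps(agent_definition, indent=2))
```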
13:55 – 15:26
Configuring the Environment: networking controls, MCP tunnels, and BYO compute
Next, the environment is configured, noting recent support for bring-your-own containers/compute. The example uses Anthropic cloud with an unrestricted network allowlist, but emphasizes that production setups should narrow access and can use private MCP tunnels.
- Environment defines where tools/actions execute (Anthropic cloud or your infra)
- Network policy is an allowlist; can be restricted for security
- Mention of MCP tunnels for private MCP servers
- Environment creation yields its own identifier for session binding
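To make the allowlist idea concrete, here is a minimal sketch of how an unrestricted policy differs from a narrowed one. It does not reflect how the platform evaluates network policy; the wildcard syntax, the function name, and the example hostnames are all assumptions.

```python
from fnmatch import fnmatchcase


def is_host_allowed(host: str, allowlist: list[str]) -> bool:
    """Return True if `host` matches any glob pattern in `allowlist`."""
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in allowlist)


# Demo-style policy: everything is reachable.
UNRESTRICTED = ["*"]

# Production-style policy: only named backends (hypothetical hostnames).
PRODUCTION = ["metrics.example.com", "*.internal.example.com"]

for host in ["metrics.example.com", "db.internal.example.com", "pastebin.example.net"]:
    print(host, is_host_allowed(host, UNRESTRICTED), is_host_allowed(host, PRODUCTION))
```

Under the narrowed policy, a tool that tries to reach an unlisted host is refused, which is the property the talk recommends for production setups.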
15:26 – 16:27
Adding context via files: uploading logs/metrics for agent analysis
The agent is given access to incident artifacts by attaching logs as files, enabling code-like processing similar to file access in Claude Code. This section frames context engineering, meaning the choice of what data to provide and how, as a major driver of agent performance.
- Use a files capability to attach logs/metrics to the agent/session
- More/better context increases agent effectiveness
- Context engineering becomes a key developer lever
- Artifacts can later map to real sources (not just local demo files)
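One simple form of context engineering is trimming an artifact before attaching it. The sketch below keeps only log records near the incident time; it is an illustration rather than anything shown in the workshop, and the JSON-lines format, the `ts` field, the sample records, and the function name are made up for the example.

```python
import json
from datetime import datetime, timedelta


def trim_logs(lines: list[str], incident_at: datetime, window: timedelta) -> list[str]:
    """Keep JSON-lines log records whose timestamp is within `window` of the incident."""
    kept = []
    for line in lines:
        record = json.loads(line)
        timestamp = datetime.fromisoformat(record["ts"])
        if abs(timestamp - incident_at) <= window:
            kept.append(line)
    return kept


# Made-up sample records, for illustration only.
raw = [
    '{"ts": "2026-01-01T02:40:00", "level": "INFO", "msg": "deploy finished"}',
    '{"ts": "2026-01-01T03:01:00", "level": "ERROR", "msg": "pool timeout"}',
    '{"ts": "2026-01-01T05:00:00", "level": "INFO", "msg": "heartbeat"}',
]
incident = datetime.fromisoformat("2026-01-01T03:00:00")

# The two records within 30 minutes of the incident are kept; the
# heartbeat two hours later is dropped.
attachment = trim_logs(raw, incident, window=timedelta(minutes=30))
```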
16:27 – 18:00
Creating Sessions and streaming ‘events’ (not just token responses)
A session binds the agent, environment, and mounted resources and enables interactive execution. Managed Agents stream event objects (messages, tool calls, intermediate outputs) rather than only tokens, which improves UX and observability and supports resumability and durability.
- Session ties agent_id + environment_id + mounted resources
- Streaming is event-based: tool calls, agent messages, user messages
- Event logs power observability and better incremental UI updates
- Event-log model supports easy resumption and robust reliability
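Why an event log makes resumption easy can be shown with a small in-memory model: the client only needs to remember how many events it has already rendered. This is a local sketch, not the platform's streaming API; the event type strings, the `seq` field, and the class name are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only list of events, each tagged with a sequence number."""
    events: list[dict] = field(default_factory=list)

    def append(self, event_type: str, **payload) -> int:
        """Store an event and return its sequence number."""
        seq = len(self.events)
        self.events.append({"seq": seq, "type": event_type, **payload})
        return seq

    def read_from(self, cursor: int) -> list[dict]:
        """Return every event with seq >= cursor."""
        return self.events[cursor:]


log = EventLog()
log.append("user.message", text="p99 latency is spiking")
log.append("agent.tool_use", name="get_metrics")

# The client has rendered two events, then the browser tab is refreshed.
cursor = 2

# The server-side loop keeps going while the client is away.
log.append("agent.tool_result", name="get_metrics")
log.append("agent.message", text="Latency rose right after the last deploy.")

# On reconnect, the client asks only for what it missed.
missed = log.read_from(cursor)
```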
18:00 – 20:04
Wiring local tool execution + session deletion for security and lifecycle control
The agent can respond, but tool calls won’t work until local tool handlers are connected. The workshop adds tool bindings (e.g., get metrics/deploys/diffs) and includes the ability to delete sessions to manage retention and sensitive data.
- Agent loop runs server-side; tool execution needs local wiring/handlers
- Connect local tools so the agent can take real actions
- Add session deletion for cleanup and data retention control
- Deletion removes session traces from logs for stronger data hygiene
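The local wiring step amounts to a dispatch table: a tool-call event arrives, the matching Python function runs, and a result goes back. The sketch below shows that pattern in isolation. The event fields (`id`, `name`, `input`), the result fields, the handler names, and the returned numbers are assumptions for illustration, not the workshop's code or the platform's wire format.

```python
import json


def get_metrics(service: str) -> dict:
    """Hypothetical local tool; a real one would query a metrics backend."""
    return {"service": service, "p99_ms": 1840}  # made-up demo value


def get_deploys(service: str) -> dict:
    """Hypothetical local tool; a real one would query deploy history."""
    return {"service": service, "deploys": ["d-101", "d-102"]}  # made-up demo values


HANDLERS = {"get_metrics": get_metrics, "get_deploys": get_deploys}


def handle_tool_call(event: dict) -> dict:
    """Run the handler named in a tool-call event and package the result."""
    handler = HANDLERS.get(event["name"])
    if handler is None:
        return {"tool_use_id": event["id"], "is_error": True,
                "content": f"unknown tool: {event['name']}"}
    try:
        result = handler(**event["input"])
    except Exception as exc:  # report failures to the agent instead of crashing
        return {"tool_use_id": event["id"], "is_error": True, "content": str(exc)}
    return {"tool_use_id": event["id"], "is_error": False, "content": json.dumps(result)}


ok = handle_tool_call({"id": "t1", "name": "get_metrics", "input": {"service": "checkout"}})
unknown = handle_tool_call({"id": "t2", "name": "restart_db", "input": {}})
```

Returning errors as results, rather than raising, lets the agent see that a call failed and choose a different step.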
20:04 – 24:17
Live run: diagnosing the incident and mapping demo tools to real systems
The agent is run against the simulated incident, calling tools and streaming results as it investigates a P99 latency spike. The discussion connects the demo’s local JSON-backed tools to production integrations (e.g., Datadog) and emphasizes designing skills/MCP servers (like runbook access) for real SRE workflows.
- Agent investigates P99 latency regression via tool calls and log inspection
- Streaming shows tool-by-tool reasoning progress in the UI
- Demo data is local, but tools can be swapped for production backends
- Real SRE agents benefit from runbook/postmortem retrieval skills
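Swapping demo data for a production backend is easier when the tool depends on an interface rather than on a file. A minimal sketch, assuming a hypothetical `MetricsBackend` interface and a nearest-rank P99 calculation; none of these names come from the workshop repository.

```python
from typing import Protocol


class MetricsBackend(Protocol):
    """Anything that can report P99 latency for a service."""

    def p99_latency_ms(self, service: str) -> float: ...


class LocalMetrics:
    """Demo backend over in-memory samples (stand-in for local JSON files)."""

    def __init__(self, samples_by_service: dict[str, list[float]]):
        self._samples = samples_by_service

    def p99_latency_ms(self, service: str) -> float:
        samples = sorted(self._samples[service])
        # Nearest-rank percentile: rank = ceil(0.99 * n), using integer math.
        rank = max(1, -(-99 * len(samples) // 100))
        return samples[rank - 1]


def make_get_metrics(backend: MetricsBackend):
    """Build the tool handler around whichever backend is supplied."""

    def get_metrics(service: str) -> dict:
        return {"service": service, "p99_ms": backend.p99_latency_ms(service)}

    return get_metrics


demo_backend = LocalMetrics({"checkout": [float(ms) for ms in range(1, 101)]})
get_metrics = make_get_metrics(demo_backend)
reading = get_metrics("checkout")
```

A production backend, for example one that queries a monitoring service, would implement the same `p99_latency_ms` method and be passed to `make_get_metrics` without touching the tool handler.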
24:17 – 37:08
Results, persistence, and ‘beyond basics’: states, webhooks, subagents, memory, outcomes, vaults
The successful run identifies a likely root cause (DB pool exhaustion after a refactor) and proposes actions, with a vision of extending to PR creation and automated fixes. The session then recaps key mechanics, namely session persistence and state transitions, and surveys advanced Managed Agent features like subagents, memory/dreaming, outcomes rubrics, vaults for credentials, and event-driven webhooks.
- Example diagnosis: DB pool exhaustion tied to a recent refactor/deploy; recommended remediation
- Session persistence survives refresh; sessions can be resumed and deleted
- Session states: idle, running, rescheduling, terminated; supports reliability and automation
- Advanced capabilities: subagents, memory + dreaming, outcomes rubrics, vaults, webhooks, policies, console builder
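The four session states named above can be modeled as a small state machine. The state names come from the talk; the transition table is a guess made for illustration and may not match the platform's actual rules.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    RESCHEDULING = "rescheduling"
    TERMINATED = "terminated"


# Assumed transitions (not taken from the talk or any documentation).
ALLOWED = {
    SessionState.IDLE: {SessionState.RUNNING, SessionState.TERMINATED},
    SessionState.RUNNING: {
        SessionState.IDLE,
        SessionState.RESCHEDULING,
        SessionState.TERMINATED,
    },
    SessionState.RESCHEDULING: {SessionState.RUNNING, SessionState.TERMINATED},
    SessionState.TERMINATED: set(),
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = SessionState.IDLE
state = transition(state, SessionState.RUNNING)       # user sends a message
state = transition(state, SessionState.RESCHEDULING)  # work is moved or retried
state = transition(state, SessionState.RUNNING)
state = transition(state, SessionState.IDLE)          # agent finishes its turn
```

A client that tracks state this way can decide when to show a spinner (running, rescheduling), when to accept input (idle), and when to stop polling (terminated).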
