How to get to production faster with Claude Managed Agents

Building agents used to mean spending development cycles on secure infrastructure, state management, and permissioning, and reworking your agent loops for every model upgrade. Managed Agents, on the Claude Platform, now handles that layer for you. This session covers the fundamentals of building and deploying a production-grade agent at scale, and you'll also hear from industry leaders on the future of agentic infrastructure.

May 21, 2026 · 29m

CHAPTERS

0:19 – 3:05
Why agent capabilities are accelerating—and why infra is now the bottleneck
The speakers zoom out on how quickly model capabilities have improved from early Claude 3 days to Opus-era workflows that can now generate substantial, review-ready engineering output. They argue the limiting factor is no longer model intelligence but the surrounding infrastructure needed to run agents reliably in production.
- Progression from simple short tasks to feature-level execution and PR creation
- “Wake up to merge-ready PRs” as the new baseline workflow
- Future vision: quarters of work compressed into hours via agent teams
- Prompt plus tool use is insufficient for long-running, high-stakes tasks
- Infrastructure and orchestration become the central constraint
3:05 – 4:19
What agents need to complete real work: access, identity, and interaction modes
To do meaningful tasks, agents need secure access to credentials, internal systems, and appropriate identities/authorization—similar to human employees. The talk also frames a shift from conversational chat toward outcome-oriented execution and resumable work that can be paused and continued later.
- Agents require secure credentials and access to internal tools (e.g., GitHub, Slack, email)
- Agent identity and auth as foundational primitives
- Transition from chat to outcome-oriented task completion
- Human-in-the-loop control and confidence-based check-ins
- Resumability: start now, pick up weeks/months later
4:19 – 5:47
Why Claude Managed Agents exists: common failure points in agent productionization
The team shares research findings on what blocks teams from shipping agents: context/memory pitfalls, infrastructure requirements, and lack of observability. Claude Managed Agents is positioned as a set of composable primitives that offload platform work so developers can focus on product logic.
- Context management and memory are easy to get wrong and degrade performance
- Production needs: reliability, scalability, security, and latency control
- Observability is required to judge success/failure and debug behavior
- Managed Agents provides infrastructure + primitives + observability on Claude Platform
- Composable building blocks rather than a single rigid framework
5:47 – 6:38
Core build steps: define the agent, provide a sandbox, run a session, observe via events
They outline the minimal workflow to build with Managed Agents: define an agent configuration, run it in a sandboxed computer environment, start a session, and monitor progress through an event stream. This establishes the mental model for how agents operate and how developers integrate them.
- Agent definition = system prompt + model + skills/tools + permissions + identity
- Runtime environment: sandbox with network allowlist and preinstalled packages
- Sessions execute tasks and return when complete
- Event stream enables real-time monitoring and interaction
- “Computer access” framing: the agent needs an environment to act
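The two pieces named above, an agent definition and a sandbox with a network allowlist, can be pictured as plain data. The sketch below is only an illustration: the class names, field names, model ID, and hosts are all hypothetical and are not the Managed Agents schema or SDK.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical shape of an agent definition (not the real schema)."""
    name: str
    model: str
    system_prompt: str
    tools: tuple[str, ...] = ()
    permissions: tuple[str, ...] = ()
    identity: str = "service-account"


@dataclass(frozen=True)
class SandboxConfig:
    """Hypothetical sandbox settings: allowed hosts and preinstalled packages."""
    network_allowlist: frozenset[str] = frozenset()
    packages: tuple[str, ...] = ()

    def allows(self, url: str) -> bool:
        # Only exact host matches pass; anything else is blocked.
        host = urlparse(url).hostname or ""
        return host in self.network_allowlist


agent = AgentDefinition(
    name="pascal",
    model="example-model-id",  # placeholder, not a real model identifier
    system_prompt="Analyze grocery orders and report trends.",
    tools=("run_python",),
    permissions=("read:orders",),
)

sandbox = SandboxConfig(
    network_allowlist=frozenset({"api.example.com"}),
    packages=("pandas",),
)

print(sandbox.allows("https://api.example.com/v1/orders"))  # True
print(sandbox.allows("https://evil.example.org/"))          # False
```

Keeping the definition and the environment as separate objects mirrors the split described in the talk: what the agent is versus where it is allowed to act.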
6:38 – 8:11
Understanding the event stream: user, agent, session, and span events
The event stream is explained as a structured log of everything that happens in a session. They break it into categories so developers can interpret actions, lifecycle state, errors, and timing boundaries for long operations.
- User events: messages, files, interrupts, tool results, confirmations, outcome definitions
- Agent events: model responses, tool executions, coordination with other agents
- Session events: lifecycle transitions, error recovery, outcome processing
- Span events: start/end boundaries for operations (e.g., long responses)
- Event stream as the backbone of observability and control
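To make the four categories concrete, here is a minimal sketch of consuming such a stream. The event format is an assumption made for illustration: the `"<category>.<name>"` type strings, the `ts_ms` timestamps, and the `span_id` field are hypothetical and not the actual wire format.

```python
from collections import Counter

# Hypothetical events; "ts_ms" is milliseconds since the session started.
EVENTS = [
    {"type": "user.message", "ts_ms": 0},
    {"type": "session.running", "ts_ms": 100},
    {"type": "span.start", "ts_ms": 200, "span_id": "s1"},
    {"type": "agent.tool_use", "ts_ms": 500},
    {"type": "user.tool_result", "ts_ms": 2500},
    {"type": "agent.message", "ts_ms": 3000},
    {"type": "span.end", "ts_ms": 3200, "span_id": "s1"},
    {"type": "session.idle", "ts_ms": 3300},
]


def category(event: dict) -> str:
    """Return the part of the type before the first dot: user, agent, session, or span."""
    return event["type"].split(".", 1)[0]


def count_by_category(events: list[dict]) -> Counter:
    """Count how many events fall into each category."""
    return Counter(category(e) for e in events)


def span_durations(events: list[dict]) -> dict[str, int]:
    """Pair span.start with span.end by span_id and return durations in ms."""
    starts: dict[str, int] = {}
    durations: dict[str, int] = {}
    for e in events:
        if e["type"] == "span.start":
            starts[e["span_id"]] = e["ts_ms"]
        elif e["type"] == "span.end" and e["span_id"] in starts:
            durations[e["span_id"]] = e["ts_ms"] - starts.pop(e["span_id"])
    return durations


print(count_by_category(EVENTS))  # 2 events in each of the four categories
print(span_durations(EVENTS))     # {'s1': 3000}
```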
8:11 – 9:10
Demo: “Pascal” agent walkthrough—real-time console, config, environment, and results
A demo shows a fictional grocery-shopping analysis agent running inside Managed Agents. They highlight the dashboard/console experience, event visibility, agent definition display, environment configuration, and how outputs surface back in the product via API.
- Kick off an analysis run from an app surface and watch events in real time
- Console shows tool runs and agent actions as they happen
- Agent definition includes system prompt, model, and MCP tool configuration
- Environment view exposes networking config and installed packages
- Demo output examples: popular items (bananas) and shopping-time insights
9:10 – 9:41
Using session transcripts to improve performance: “Ask Claude” optimization loop
They demonstrate a workflow where Claude reviews the session transcript and suggests configuration/performance improvements. A concrete example highlights identifying a long-running Python script and opportunities to reduce latency for end users.
- Claude can read the session transcript to recommend improvements
- Helps optimize agent configuration and tool/runtime behavior
- Performance/latency matters in production agent experiences
- Example: detect a Python script running 20+ seconds and target optimizations
- Reinforces observability → iteration → better UX loop
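The "script running 20+ seconds" example can also be checked mechanically. The sketch below scans a transcript for tool calls that exceed a latency threshold. The transcript format (`kind`, `call_id`, `tool`, `t` in seconds) is hypothetical, and this is a plain scan, not how the "Ask Claude" feature itself works.

```python
# Hypothetical transcript entries; "t" is seconds since the session started.
TRANSCRIPT = [
    {"kind": "tool_start", "call_id": "c1", "tool": "run_python", "t": 5},
    {"kind": "tool_end", "call_id": "c1", "t": 28},
    {"kind": "tool_start", "call_id": "c2", "tool": "search_orders", "t": 30},
    {"kind": "tool_end", "call_id": "c2", "t": 31},
]


def slow_tool_calls(transcript: list[dict], threshold_s: int = 20) -> list[tuple[str, int]]:
    """Return (tool, elapsed_seconds) for calls at or above the threshold, slowest first."""
    pending: dict[str, dict] = {}
    slow: list[tuple[str, int]] = []
    for entry in transcript:
        if entry["kind"] == "tool_start":
            pending[entry["call_id"]] = entry
        elif entry["kind"] == "tool_end" and entry["call_id"] in pending:
            start = pending.pop(entry["call_id"])
            elapsed = entry["t"] - start["t"]
            if elapsed >= threshold_s:
                slow.append((start["tool"], elapsed))
    return sorted(slow, key=lambda item: item[1], reverse=True)


print(slow_tool_calls(TRANSCRIPT))  # [('run_python', 23)]
```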
9:41 – 10:43
Getting started options: Claude Code skill, CLI, and copy/paste cookbooks
The talk closes the basics with practical on-ramps: the Claude API skill in Claude Code, a dedicated CLI, and cookbook examples. The emphasis is on quick integration and developer ergonomics.
- Claude Code includes a Claude API skill knowledgeable about Managed Agents
- CLI for interacting with agents and sessions from the command line
- Cookbooks provide real, ready-to-adapt integration examples
- Multiple entry points for different developer preferences
- Focus on accelerating time-to-production
10:43 – 11:56
Advanced capabilities: multi-agent orchestration, outcomes, memory, and “dreaming”
They introduce higher-level features released recently: delegating tasks across multiple agents with separate contexts, outcome-based iteration, long-lived memory, and a research-preview “Dreaming” feature that refines memories across many sessions.
- Multi-agent orchestration: spawn sub-agents/threads with their own context windows
- Agents coordinate by passing messages for specialized work
- Outcomes: define a rubric/goals and iterate via grading loops
- Memory: read/write long-lived stores to improve across sessions
- Dreaming (research preview): reflect across thousands of sessions to create/edit memories
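The outcomes idea, a rubric plus a grading loop, can be sketched generically. Everything below is an assumption for illustration: the rubric is a dictionary of named checks, `fake_agent` stands in for a real agent run, and none of the names come from the Managed Agents API.

```python
from typing import Callable

# A rubric maps a criterion name to a check on the agent's output.
Rubric = dict[str, Callable[[str], bool]]

RUBRIC: Rubric = {
    "has_summary": lambda text: text.startswith("Summary:"),
    "cites_data": lambda text: "Source:" in text,
}


def grade(output: str, rubric: Rubric) -> list[str]:
    """Return the names of the criteria the output fails."""
    return [name for name, check in rubric.items() if not check(output)]


def iterate_until_pass(
    attempt: Callable[[list[str]], str], rubric: Rubric, max_rounds: int = 3
) -> tuple[str, int, list[str]]:
    """Call attempt(feedback) until the rubric passes or the rounds run out.

    Returns (last output, rounds used, criteria still failing).
    """
    feedback: list[str] = []
    output = ""
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        output = attempt(feedback)
        feedback = grade(output, rubric)
        if not feedback:
            break
    return output, rounds, feedback


def fake_agent(feedback: list[str]) -> str:
    """Stand-in for an agent run: adds a source only after being told it is missing."""
    draft = "Summary: bananas are the most popular item."
    if "cites_data" in feedback:
        draft += " Source: orders.csv."
    return draft


output, rounds, failed = iterate_until_pass(fake_agent, RUBRIC)
print(rounds, failed)  # 2 []
```

The failing criteria are fed back into the next attempt, which is the core of a grading loop: the rubric both judges the result and steers the retry.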
11:56 – 12:26
New launches: self-hosted sandboxes and MCP Tunnels for private connectivity
Two major additions are announced: self-hosted sandboxes (bring your own compute/VPC) and MCP Tunnels (securely connect private MCP servers without public exposure). The section clarifies what each enables and why it matters for security and enterprise integration.
- Self-hosted sandboxes: run tools in your own infrastructure/VPC
- Partner integrations highlighted: Cloudflare, Daytona, Modal, Vercel
- MCP Tunnels (research preview): expose private MCP servers via secure tunnels
- Avoids exposing services directly on the public internet
- Aimed at stronger security posture and simpler network setup
12:26 – 14:22
How self-hosted sandboxes work in practice: control, policies, and provisioning signals
They explain operational details: customers control sandbox fleets, files, services, and policy enforcement inside their perimeter. Claude Managed Agents requests provisioning via signals while customers manage auditing, network policy, and lifecycle behavior.
- Bring and manage your own sandbox fleet or use a partner’s offering
- Keep private files/services/packages within your own perimeter
- Control network policies, audit logs, and sandbox spawn/idle behavior
- Anthropic side sends signals when Claude needs a sandbox provisioned
- Maintains enterprise governance without giving up agent functionality
14:22 – 18:52
Partner panel: different bets on sandbox infrastructure (Vercel, Modal, Daytona, Cloudflare)
Partners describe their approaches to sandboxing for agents: Vercel’s unified “fluid compute,” Modal’s global scale and GPU sandboxes, Daytona’s human-like dev environments with speed/forking, and Cloudflare’s dual primitives (MicroVMs and lightweight isolates).
- Vercel: shared foundation across functions/builds/sandboxes; reusable firewall/secrets primitives
- Modal: fast global scheduling at scale; persistent volumes, custom images, GPUs
- Daytona: “agents need what humans need” (OS/CPU/GPU) plus pause/resume/fork at speed
- Cloudflare: MicroVMs for full dev environments + isolates for millisecond-scale lightweight execution
- Common theme: scale, isolation, flexibility, and operational control
18:52 – 29:03
What people will build next—and the hard problems: assistants, RL optimization, identity/security, resumability
The panel discusses emerging use cases and the challenges that remain before agents can run broadly: agents as “chief of staff,” GPU-driven optimization loops, “human emulator” workflows to access legacy apps, continuous learning via RL, and the security/auth and resumability gaps that still need industry alignment.
- Agents shifting async work into the background; “everyone has a chief of staff”
- GPU sandboxes enabling automated benchmark hill-climbing and profiling-driven optimization
- “Human emulator” idea: agents complete end-to-end tasks including legacy/GUI systems
- Continuous learning/RL loops using fresh data to improve next-day performance
- Hard problems: identity propagation, egress controls, unified auth protocols, and pausing/resuming/multiplayer collaboration
