Memory and dreaming for self-learning agents

Name: Memory and dreaming for self-learning agents
Uploaded: 2026-05-21T00:00:00Z
Duration: 21 min 33 s
Description: Memory is introduced as a way for agents to improve from task to task by retaining and reusing learnings, rather than starting each session from a blank slate.

How memory and dreaming turn Claude Managed Agents into self-learning systems. This session walks through design considerations for memory architectures and how dreaming verifies and enriches memory between sessions.


CHAPTERS

0:13 – 1:13
Why agents need memory: learning across long-horizon tasks
Ravi frames the core challenge: agents are tackling longer, more complex work, but context management still lags. Memory is introduced as the mechanism that lets agents improve from task to task instead of repeatedly starting from scratch.
- Agents are increasingly capable on multi-hour, complex tasks
- Long-horizon context management remains a bottleneck
- Memory enables task-to-task performance improvement
- Goal: learning not just within a task, but across tasks, environments, and agents
1:13 – 2:44
Milestones leading to Memory & Dreaming (MCP → SDK → Skills → Managed Agents)
A quick timeline shows the platform primitives that made more capable agents possible. The through line is steadily extending what agents can do and how reliably they can do it, setting the stage for persistent learning.
- 2024: Model Context Protocol (MCP) for principled tool/data access
- 2025: Claude Code + Agent SDK lowered the barrier to building agents
- Skills abstraction enabled bolt-on capabilities for specific tasks
- Claude Managed Agents improved reliable production execution
2:44 – 3:44
What “learning” looks like: improving from Task 1 to Task N
The talk contrasts a “no memory” baseline with an “optimal” learning curve where performance rises over time. Memory is positioned as a way to capture strategies, avoid repeating mistakes, and transfer knowledge across agents.
- Without memory, performance stays flat across tasks
- With memory, agents improve iteratively from prior outcomes
- Capture common strategies and prior mistakes
- Transfer knowledge across tools, codebases, files, and agent swarms
3:44 – 5:21
Memory for Claude Managed Agents: product vision and early results
Ravi introduces the launched Memory feature as a frontier memory system designed to maximize intelligence out of the box. He highlights multi-agent support plus enterprise observability/control, backed by partner outcomes.
- Memory is now available for Claude Managed Agents
- Designed for multi-agent systems with enterprise governance
- Emphasis: teams build product, not memory infrastructure
- Reported outcomes: Rakuten saw a 97% decrease in first-pass errors; Wisedocs saw fewer recurring issues
5:21 – 6:22
What’s new vs. past “memory”: moving from harness tricks to model-native workflows
Memory isn’t a new concept, but the implementation philosophy has shifted. Instead of complex harness-specific patterns (e.g., CLAUDE.md or dedicated tools), the aim is to “get out of Claude’s way” with a flexible format the model already understands.
- Prior memory approaches were harness- or tool-centric
- As models improve, simpler primitives work better
- Parallel to Skills: minimal format, high flexibility
- Direction: lean on file-based workflows Claude already navigates well
6:22 – 7:23
Design principle: file-system memory and ‘let it cook’
The system is designed around what current models do well: navigating virtual file systems and using standard tools (bash, grep) to manage information. Opus 4.7 is cited as especially strong at deciding what to save and how to structure it.
- Models excel at interacting with virtual file systems
- Claude can read/update/organize with bash and grep-like tools
- Opus 4.7 improves selecting and structuring what to preserve
- Memory modeled as a file system to match model strengths
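The file-system framing above can be illustrated with a small sketch. It assumes memory is simply a directory of Markdown notes and uses Python's standard library in place of bash and grep; the helper names (`write_note`, `search_notes`) and the folder layout are hypothetical and are not part of the Memory product's API.

```python
import tempfile
from pathlib import Path


def write_note(root: Path, rel_path: str, text: str) -> Path:
    """Create or overwrite a note file, creating parent folders as needed."""
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def search_notes(root: Path, needle: str) -> list[tuple[str, int, str]]:
    """grep-like search: return (relative path, line number, line) per match."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle.lower() in line.lower():
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits


# Hypothetical memory layout, built in a temporary directory for the demo.
memory_root = Path(tempfile.mkdtemp(prefix="agent_memory_"))
write_note(
    memory_root,
    "strategies/deploys.md",
    "# Deploys\nRun the smoke tests before promoting a build.\n",
)
write_note(
    memory_root,
    "mistakes/2026-05.md",
    "# Mistakes\nForgot to run smoke tests after a config change.\n",
)

matches = search_notes(memory_root, "smoke tests")
for rel_path, lineno, line in matches:
    print(f"{rel_path}:{lineno}: {line}")
```

Plain files keep the format minimal: the agent decides which folders and notes to create, and retrieval is ordinary text search rather than a dedicated memory tool.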
7:23 – 8:24
Multi-agent sharing: scopes, hierarchy, and avoiding write conflicts
Ravi explains the requirements for shared memory across sessions and agents: simultaneous access, different permission scopes, and a scalable hierarchy. To prevent agents from overwriting each other's writes, the system uses optimistic concurrency control.
- Shared memory must work across agents and sessions
- Read-only vs read-write scopes for different needs
- Hierarchical stores (org-wide stable memory + granular task memory)
- Optimistic concurrency control prevents clobbering writes
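A minimal sketch of optimistic concurrency control, assuming each memory entry carries a version number that a writer must present when updating it. The `MemoryStore` class, its `expected_version` parameter, and the `writable` flag (standing in for a read-only vs read-write scope) are hypothetical illustrations, not the product's interface.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class ReadOnlyScope(Exception):
    """Raised when a read-only caller attempts a write."""


@dataclass
class Entry:
    content: str
    version: int


class MemoryStore:
    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def read(self, path: str) -> Entry:
        # Missing paths read as empty content at version 0.
        entry = self._entries.get(path, Entry("", 0))
        return Entry(entry.content, entry.version)  # hand back a copy

    def write(self, path: str, content: str, expected_version: int,
              *, writable: bool = True) -> int:
        if not writable:
            raise ReadOnlyScope(path)
        current = self._entries.get(path, Entry("", 0))
        if current.version != expected_version:
            raise VersionConflict(
                f"{path}: expected v{expected_version}, found v{current.version}"
            )
        self._entries[path] = Entry(content, current.version + 1)
        return current.version + 1


store = MemoryStore()
path = "incidents/checkout-latency.md"

# Two agents read the same (empty) entry at version 0.
a_view = store.read(path)
b_view = store.read(path)

# Agent A writes first and succeeds.
store.write(path, "Agent A: restarted the cache tier.\n", a_view.version)

# Agent B's write is based on a stale version, so it is rejected;
# B re-reads, merges its note, and retries against the latest version.
try:
    store.write(path, "Agent B: rolled back the deploy.\n", b_view.version)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
    latest = store.read(path)
    store.write(path, latest.content + "Agent B: rolled back the deploy.\n",
                latest.version)

final = store.read(path)
print(final.version)
print(final.content)
```

No locks are held while agents work. A conflict only surfaces at write time, and the losing writer retries against the latest content instead of silently replacing it.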
8:24 – 9:27
Enterprise-grade memory operations: auditability and standalone API
To be usable in production, Memory includes controls like versioning, diffs, attribution, and audit trails. A standalone API supports CRUD plus enterprise workflows such as exporting and redaction across diverse deployment environments.
- Version control creates an auditable trail of changes
- Diffing and attribution show what changed and by whom
- Standalone Memory API enables external management
- CRUD plus enterprise operations: export and redaction
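To make versioning, diffing, and attribution concrete, here is a small sketch built on Python's `difflib`. The `VersionedNote` and `Revision` classes and the agent names are hypothetical; the talk does not describe how the Memory API represents history, so this only illustrates the general idea of an append-only revision log.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int
    author: str          # who made the change (attribution)
    timestamp: datetime  # when it was made (audit trail)
    content: str


@dataclass
class VersionedNote:
    revisions: list[Revision] = field(default_factory=list)

    def commit(self, author: str, content: str) -> Revision:
        """Append a new revision; earlier revisions are never modified."""
        rev = Revision(len(self.revisions) + 1, author,
                       datetime.now(timezone.utc), content)
        self.revisions.append(rev)
        return rev

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two revision numbers, labeled with authors."""
        a, b = self.revisions[old - 1], self.revisions[new - 1]
        lines = difflib.unified_diff(
            a.content.splitlines(), b.content.splitlines(),
            fromfile=f"v{a.number} ({a.author})",
            tofile=f"v{b.number} ({b.author})",
            lineterm="",
        )
        return "\n".join(lines)

    def rollback(self, author: str, to: int) -> Revision:
        """Restore old content as a new revision so the trail stays intact."""
        return self.commit(author, self.revisions[to - 1].content)


note = VersionedNote()
note.commit("agent-triage-1", "Check the dashboard first.\n")
note.commit(
    "agent-triage-2",
    "Check the dashboard first.\n"
    "Page the database on-call if replication lag exceeds the SLO.\n",
)

change = note.diff(1, 2)
print(change)

restored = note.rollback("human-reviewer", to=1)
print(restored.number, restored.author)
```

Rolling back by committing a new revision, rather than deleting later ones, keeps every change and its author visible in the history.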
9:27 – 11:00
Memory architecture recap—and the limits of in-the-moment note-taking
Ravi summarizes the architecture (storage, structure, processing) and then focuses on processing: agents write memories as they work. At scale, this produces duplicated, fragmented, and locally optimal updates, with repeated mistakes across sessions.
- Three components: storage layer, memory structure, Claude-driven processing
- In-session memory writing resembles “taking notes”
- At scale, agents repeat mistakes and inefficiencies independently
- Problem: local optimizations cause duplication/fragmentation globally
11:00 – 11:31
Dreaming: the out-of-band feedback loop that curates memory
Dreaming is introduced as a research-preview batch process that finds patterns and mistakes across agents and sessions, then reorganizes memory accordingly. Early customer results (e.g., Harvey) suggest large improvements from this continuous refinement.
- Dreaming runs out-of-band as a batch feedback loop
- Analyzes cross-session/cross-agent patterns and errors
- Automatically curates/organizes memory for higher quality
- Reported outcome: Harvey saw a 6× increase in legal benchmark completion rates
11:31 – 12:31
How Dreaming runs: scheduling, inputs, and verified memory snapshots
The mechanics of dreaming are laid out: it can be triggered on schedules or events, is controlled via API, and inspects transcripts plus current memory. It proposes optimizations and outputs a better-organized snapshot that agents can adopt.
- Decoupled from live sessions; triggered ad hoc/nightly/hourly/on events
- Controlled via API for flexible integration
- Inputs: session transcripts + current memory state
- Output: verified, better-organized memory snapshot for adoption
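The input/output shape described above (transcripts plus current memory in, a consolidated snapshot out) can be sketched as a toy batch job. This is only an illustration: in the talk, dreaming is a model-driven process, whereas the `dream` function below swaps in simple text normalization and frequency counting. All names, the `min_support` threshold, and the sample notes are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    notes: tuple[str, ...]            # existing notes first, then promoted lessons
    source_sessions: tuple[str, ...]  # which sessions were inspected


def normalize(note: str) -> str:
    """Collapse case, whitespace, and trailing punctuation so near-duplicates match."""
    return " ".join(note.lower().split()).rstrip(".!")


def dream(current_memory: list[str], transcripts: dict[str, list[str]],
          min_support: int = 2) -> Snapshot:
    """Toy out-of-band pass: dedupe memory, promote lessons seen in several sessions."""
    support: Counter[str] = Counter()
    first_wording: dict[str, str] = {}
    for session_id in sorted(transcripts):
        seen_in_session = set()
        for lesson in transcripts[session_id]:
            key = normalize(lesson)
            first_wording.setdefault(key, lesson.strip())
            seen_in_session.add(key)
        support.update(seen_in_session)  # count each lesson once per session

    kept: dict[str, str] = {}
    for note in current_memory:
        kept.setdefault(normalize(note), note.strip())  # drop duplicate notes

    for key, count in support.most_common():
        if count >= min_support and key not in kept:
            kept[key] = first_wording[key]

    return Snapshot(tuple(kept.values()), tuple(sorted(transcripts)))


current_memory = ["Check the deploy log first.", "check the deploy log first"]
transcripts = {
    "session-1": ["CPU spike precedes the delayed alert.", "Restarted worker 3."],
    "session-2": ["cpu spike precedes the delayed alert", "Check the deploy log first."],
    "session-3": ["CPU spike precedes the delayed alert."],
}

snapshot = dream(current_memory, transcripts)
for note in snapshot.notes:
    print(note)
```

The function returns a new snapshot instead of mutating the live store, mirroring the idea that agents adopt the output only after it has been produced and checked.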
12:31 – 13:33
Why out-of-band matters: global insights, clean objectives, zero latency
Ravi argues that decoupling is central to dreaming’s value. It enables global pattern detection across agents, avoids tradeoffs between task completion and memory quality, and keeps performance fast by staying off the hot path.
- Cross-agent analysis reveals patterns single agents may miss
- Dedicated harness allows clearer optimization objectives
- No conflict with task goals; separation of concerns
- No added latency to running agents (off the hot path)
13:33 – 15:35
Putting it together: organization-scale memory and ‘raising the floor’
With shared memory and global optimization via dreaming, the system scales both the amount and quality of knowledge available to agents. Ravi compares dreaming to test-time compute: investing effort upfront yields better downstream performance.
- Shared memory across agents/environments becomes organizational knowledge
- Dreaming reconciles and improves memory quality globally
- Shared improving memory raises baseline performance for all agents
- Analogy: test-time compute, where extra work upfront improves outcomes
15:35 – 17:36
Demo: SRE agent platform using read-only org memory + task read/write memory
Ravi demonstrates an on-call/SRE workflow where agents triage alerts using multiple memory stores. Org-wide read-only memory contains stable policies and runbooks, while task-specific read/write stores capture evolving incident context.
- Agents spin up to triage and remediate incoming alerts/pages
- Read-only org store: SLOs, runbooks, on-call mappings
- Read/write task stores: incident-specific evolving knowledge
- Demonstrates practical cross-session coordination during operations
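The two-tier layout in the demo can be approximated with standard-library mappings: a read-only view for the org store and a writable dictionary per task, layered so lookups fall through from task memory to org memory. The store contents, paths, and the `open_task_view` helper are made-up placeholders, not taken from the demo.

```python
from collections import ChainMap
from types import MappingProxyType

# Org-wide store: stable knowledge, exposed through a read-only view.
# The entries below are placeholder values for illustration only.
org_store = MappingProxyType({
    "runbooks/high-cpu.md": "Scale out the worker pool, then check recent deploys.",
    "slo/checkout.md": "Placeholder SLO text for the checkout service.",
})


def open_task_view(org):
    """Return (layered view, task store) for one incident.

    Reads check the task store first and fall back to the org store;
    writes through the ChainMap always land in the task store.
    """
    task_store: dict[str, str] = {}
    return ChainMap(task_store, org), task_store


memory, task_store = open_task_view(org_store)

# The agent records evolving incident context in task memory.
memory["incidents/INC-1/notes.md"] = "CPU spike observed; alert fired late."

# Stable guidance is still readable through the same view.
runbook = memory["runbooks/high-cpu.md"]
print(runbook)

# Direct writes to the org store are rejected by the read-only view.
try:
    org_store["runbooks/high-cpu.md"] = "overwritten"
    org_write_blocked = False
except TypeError:
    org_write_blocked = True
print(org_write_blocked)
```

Keeping incident notes in a separate writable layer means a noisy or mistaken task-level note cannot alter the policies and runbooks that every other agent depends on.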
17:36 – 20:41
Demo deep dive: version history, concurrency safeguards, and Dream diffs
The demo highlights production controls (audit logs, version history, attribution) and shows a dreaming run that synthesizes patterns across recent sessions. The resulting diffs add actionable guidance (e.g., CPU spike → delayed alert pattern) and improve logs from raw chronology to higher-level insight.
- Audit logs/version history enable inspection and rollback
- Optimistic concurrency via preconditions prevents overwrite conflicts
- Dreaming run selects recent sessions and generates an output store
- Diffs capture recurring alert patterns and improve triage guidance
20:41 – 21:33
Closing: continuous self-improvement as agents run for days
Ravi closes by emphasizing that longer-running agents will require continuous learning about their environments. Memory systems—especially when paired with dreaming—are positioned as foundational to this next phase of agent capability.
- Agents will operate over longer timescales (multi-day horizons)
- Continuous world/organization understanding becomes critical
- Memory + dreaming enable ongoing self-improvement loops
- Call to action: try it and engage with the team for questions