At a glance
WHAT IT’S REALLY ABOUT
Production agent performance: prompt caching, context engineering, and advisor models
- Prompt caching is presented as the highest-impact optimization, delivering major cost reductions, faster time-to-first-token, and relief from API rate-limit pressure for repeated prompt segments.
- Context engineering is framed as an explicit discipline of controlling what enters the model context, avoiding abstractions that hide context composition and prevent optimization.
- Three production techniques—tool search, programmatic tool calling, and compaction—are shown to drastically reduce context growth while preserving capability in long-running agent loops.
- An “advisor” pattern pairs a cheaper execution model (e.g., Sonnet/Haiku) with on-demand Opus reviews to achieve near-Opus intelligence at substantially lower cost (a minimal sketch follows this list).
- The talk closes with platform additions (workload identity federation and the Ant CLI) that improve security posture and operational automation for teams running agents in production.
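A minimal sketch of that advisor loop, assuming a simple "flag your own uncertainty" escalation trigger. The model IDs are current public aliases; the UNSURE convention and the prompts are illustrative assumptions, not the talk's implementation:

```python
# Sketch of the advisor pattern: a cheaper executor model does the work,
# and an Opus "advisor" reviews only when the executor flags doubt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_step(task: str) -> str:
    draft = client.messages.create(
        model="claude-sonnet-4-5",           # cheap executor
        max_tokens=1024,
        system="Do the task. If you are not confident, end with the line UNSURE.",
        messages=[{"role": "user", "content": task}],
    ).content[0].text

    if draft.rstrip().endswith("UNSURE"):    # escalate only when flagged
        review = client.messages.create(
            model="claude-opus-4-1",         # on-demand advisor review
            max_tokens=1024,
            system="Review the draft answer and return a corrected version.",
            messages=[{"role": "user", "content": f"Task: {task}\n\nDraft:\n{draft}"}],
        )
        return review.content[0].text
    return draft
```

Because most steps never trigger the review call, the blended cost stays close to the executor's rate while hard cases still get Opus attention.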
IDEAS WORTH REMEMBERING
5 ideas
Prompt caching is the first optimization to implement for long-running agents.
Repeated tool-call loops create large shared prompt prefixes; caching those segments avoids reprocessing and can yield ~90% discounts on cached input tokens plus faster time-to-first-token.
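A minimal sketch of marking such a shared prefix with the Anthropic Python SDK's documented `cache_control` field; the system prompt and model ID here are stand-ins:

```python
# Sketch: marking a stable prompt prefix as cacheable. Everything up to and
# including the cache_control block is cached; later requests that share the
# prefix read it back at the discounted cached-input rate.
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "..."  # the large, stable instructions shared across turns

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache boundary marker
        }
    ],
    messages=[{"role": "user", "content": "Summarize the open tickets."}],
)
print(response.content[0].text)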
Cache hit rate is a production KPI you should actively monitor.
The Claude Console prompt cache dashboard surfaces real usage; if your hit rate isn’t “in the 90s,” you likely have structural prompt issues or missing cache markers.
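If you want to track the same number outside the Console, each Messages API response reports cached and uncached input tokens in its `usage` block; a rough aggregation might look like this (field names per the current API, the batching logic assumed):

```python
# Sketch: computing a prompt-cache hit rate from Messages API usage fields.
def cache_hit_rate(responses) -> float:
    """Fraction of input tokens served from cache across a batch of responses."""
    read = sum(r.usage.cache_read_input_tokens or 0 for r in responses)
    created = sum(r.usage.cache_creation_input_tokens or 0 for r in responses)
    uncached = sum(r.usage.input_tokens for r in responses)
    total = read + created + uncached
    return read / total if total else 0.0
```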
Use Claude Code’s prompt-caching skill to quickly improve cacheability.
The talk highlights an installed-by-default Claude Code capability that can suggest where to add cache-control markers and how to reorganize prompts to raise cache hit rates.
Context engineering requires visibility—avoid layers that hide what’s in context.
If frameworks or abstractions obscure the actual transcript/context, developers lose the ability to decide what belongs in context and can’t effectively optimize cost, latency, or reliability.
Tool search reduces context bloat by loading tools just-in-time.
You can declare many tools but defer injecting full tool schemas until the model needs them; customers reported meaningful token reduction (e.g., ~10%) and sometimes improved model focus.
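For teams not using the platform's built-in tool search, a hand-rolled approximation looks roughly like this; every tool name, schema, and the matching logic below is hypothetical:

```python
# Sketch of just-in-time tool loading: the model initially sees only a cheap
# search_tools meta-tool; full schemas live in a local registry and are added
# to later requests once the model asks for them.

FULL_TOOL_SCHEMAS = {
    "query_billing_db": {
        "name": "query_billing_db",
        "description": "Run a read-only SQL query against the billing database.",
        "input_schema": {"type": "object",
                         "properties": {"sql": {"type": "string"}},
                         "required": ["sql"]},
    },
    "send_slack_message": {
        "name": "send_slack_message",
        "description": "Post a message to a Slack channel.",
        "input_schema": {"type": "object",
                         "properties": {"channel": {"type": "string"},
                                        "text": {"type": "string"}},
                         "required": ["channel", "text"]},
    },
    # ...imagine hundreds more entries kept out of the context by default
}

SEARCH_TOOL = {
    "name": "search_tools",
    "description": "Find tools relevant to a task by keyword.",
    "input_schema": {"type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"]},
}

def search_tools(query: str, limit: int = 3) -> list[str]:
    """Naive keyword match over names; a real index would rank by description."""
    q = query.lower()
    return [n for n in FULL_TOOL_SCHEMAS if q in n.replace("_", " ")][:limit]

def tools_for_request(loaded: set[str]) -> list[dict]:
    """Send only the meta-tool plus schemas the model has already requested."""
    return [SEARCH_TOOL] + [FULL_TOOL_SCHEMAS[n] for n in loaded]
```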
WORDS WORTH SAVING
5 quotes
With prompt caching, if you mark which sections are common in your prompt, then we're able to compute the KV values, essentially pre-cache part of the inputs to the models in KVs, and save those.
— Brad Abrams
In fact, it's a ninety percent discount.
— Brad Abrams
One mistake I see developers doing is using abstractions over top of the platform that obscure what's in the context, and then as a developer, you don't really know what Claude's seeing in its context.
— Brad Abrams
Context engineering is really a discipline. It's the discipline of deciding what belongs in Claude's context.
— Brad Abrams
So the problem we're trying to solve with advisor is: we want Opus-level intelligence, but at Haiku-level or Sonnet-level cost.
— Brad Abrams