
The expanding toolkit

Over the last year, capabilities that used to require heavy scaffolding have moved into the model: reliable tool use, context management, writing and running code, computer use, and more. This session walks through these capabilities, shows what changed between model generations, and demonstrates how they compose into agents that finish work instead of just starting it.

May 7, 2026 · 21m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Agent scaffolding is moving into Claude itself, simplifying the development of reliable agents

  1. The talk argues that agent “scaffolding” once built around LLMs (routers, retries, validators, compaction, coordinate math) increasingly ships with Claude, reducing developer-owned reliability code.
  2. For tool use, Claude can now select relevant tools and self-recover from tool errors, making heuristic routers and manual retry wrappers often counterproductive.
  3. For long-running context, large context windows plus server-side compaction and context editing reduce the need for bespoke memory systems like summarizers, chunking, and heavy RAG just to fit windows.
  4. A hosted code execution tool collapses the write-run-fix loop into a single API turn by giving Claude a server-side sandbox for computation and debugging.
  5. Computer use improves via native-resolution screenshots and 1:1 coordinates up to 1440p, enabling more reliable UI automation; a demo shows Claude Code + Chrome performing end-to-end QA and bug fixing in a web app.
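Point 4's "single API turn" loop can be sketched as a request that enables the hosted code execution tool. This is a minimal sketch assuming Anthropic's Python SDK; the tool type string, beta flag, and model id follow the 2025 beta naming and should be checked against current docs.

```python
import os

# Hosted code execution: Claude writes, runs, and debugs code in a
# server-side sandbox, so the write-run-fix loop happens inside one
# API turn instead of round-tripping through a local executor.
CODE_EXECUTION_TOOL = {
    "type": "code_execution_20250522",
    "name": "code_execution",
}

def build_sandbox_request(prompt: str) -> dict:
    """Assemble a Messages API request that enables the hosted sandbox."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 4096,
        "tools": [CODE_EXECUTION_TOOL],
        "messages": [{"role": "user", "content": prompt}],
    }

def send(request: dict):
    """Submit the request (requires the anthropic SDK and an API key)."""
    import anthropic
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    return client.beta.messages.create(
        betas=["code-execution-2025-05-22"], **request
    )
```

Note there is no local run-and-capture loop here: any computation, stack trace, and fix happens server-side before the response comes back.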

IDEAS WORTH REMEMBERING

5 ideas

Stop writing heuristic tool routers; let the model choose tools.

Lucas argues routers are brittle “guesses about intent” that break as toolsets grow; Claude’s improved tool-selection accuracy makes pre-filtering usually worse than giving the full set and letting the model decide.
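A minimal sketch of the "no router" pattern with the Anthropic Messages API tool format: every tool goes into every call, and Claude does the selection. The tool names and model id are illustrative.

```python
# Instead of a heuristic router that pre-filters tools by keyword,
# hand Claude the full toolset and let the model choose.
TOOLS = [
    {
        "name": "search_docs",
        "description": "Full-text search over product documentation.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "get_weather",
        "description": "Current weather conditions for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]

def build_tool_request(user_message: str) -> dict:
    # No `if "weather" in message` branches: the full set ships on
    # every request, and tool choice is the model's job.
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 1024,
        "tools": TOOLS,
        "messages": [{"role": "user", "content": user_message}],
    }
```

The payoff is that adding a tool means appending one entry to `TOOLS`, not auditing a tree of intent conditionals.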

Rely on Claude’s built-in tool error recovery instead of custom retry loops.

Rather than wrapping tools with backoff and re-routing logic, the model can now interpret tool errors and re-call tools appropriately, reducing harness complexity.
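A sketch of what the thin harness looks like when recovery is left to the model: failures are not retried locally but returned as a `tool_result` block with `is_error` set, which is the Messages API's mechanism for surfacing tool failures to Claude. The helper name is hypothetical.

```python
def tool_result_block(tool_use_id: str, fn, **args) -> dict:
    """Run one tool call and wrap the outcome as an API tool_result block.

    No backoff or re-routing logic: on failure, the error message itself
    goes back to the model with is_error=True, and Claude decides whether
    to re-call, adjust arguments, or try a different tool.
    """
    try:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": str(fn(**args)),
        }
    except Exception as exc:  # surface the error instead of retrying
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
```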

Describe tool output schemas, not just inputs, to improve downstream reasoning.

Including fields like ID/title/snippet/score in the tool description lets Claude anticipate what the tool will return and act immediately (e.g., ranking by score) without extra round trips.
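As an illustration (the tool name and fields are made up for this sketch), the output contract lives in the `description`, since `input_schema` only covers arguments:

```python
# Describing the *output* shape in the description lets the model plan
# its next step (e.g., "sort by score") before the result even arrives.
SEARCH_TOOL = {
    "name": "search_kb",
    "description": (
        "Search the knowledge base. Returns a JSON list of results, each "
        "with: id (string), title (string), snippet (string, <=200 chars), "
        "and score (float, 0-1, higher means more relevant)."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
```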

Use large context + server-side compaction to replace much bespoke memory scaffolding.

With ~1M context at flat pricing plus compaction and context editing, many prior approaches (frequent summarization models, manual cache breakpoints, heavy chunking) become configuration rather than infrastructure.

Regularly clear stale tool results to save tokens without losing decisions.

Pruning bulky artifacts (screenshots, search dumps, file reads) while keeping the model’s conclusions and key reasoning reduces real-time context pressure and cost.
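A minimal local sketch of this pruning idea (the talk also describes server-side context editing; this helper and its placeholder text are illustrative): old `tool_result` payloads are blanked while assistant text blocks, which carry the conclusions, stay intact.

```python
def prune_stale_tool_results(messages, keep_last=2):
    """Blank bulky tool_result payloads, keeping only the most recent few.

    Assistant text blocks (Claude's conclusions and reasoning) are left
    untouched, so decisions survive after the raw evidence is pruned.
    """
    # Locate every tool_result block as (message index, block index).
    locations = [
        (mi, bi)
        for mi, msg in enumerate(messages)
        if isinstance(msg.get("content"), list)
        for bi, block in enumerate(msg["content"])
        if block.get("type") == "tool_result"
    ]
    stale = set(locations[:-keep_last]) if keep_last else set(locations)
    pruned = []
    for mi, msg in enumerate(messages):
        if not isinstance(msg.get("content"), list):
            pruned.append(msg)  # plain-string messages pass through
            continue
        blocks = [
            {**block, "content": "[stale tool result pruned]"}
            if (mi, bi) in stale else block
            for bi, block in enumerate(msg["content"])
        ]
        pruned.append({**msg, "content": blocks})
    return pruned
```

In practice the placeholder could also name what was pruned ("screenshot from step 3") so the model can ask to re-fetch it if needed.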

WORDS WORTH SAVING

5 quotes

The overarching theme of today's talk is that the scaffolding that you had to build last year actually ships with the model today.

Lucas

So I want you all to think of the model no longer as just an input/output LLM box, but rather as a series of tools around that model that expands its capabilities and leads to better performance.

Lucas

Routers like those are basically guesses about the user intent written in conditional if statements. They're brittle, and they're sort of the first thing that breaks when you try actually adding a new tool.

Lucas

This means that that entire loop that I just described effectively happens inside a single API turn.

Lucas

The rule that you should have in your mind is any code that you're writing that is compensating for model unreliability will have a half-life of just months. You should leave that work to us.

Lucas

Topics covered: “scaffolding” moving into the model/API; tool routing and automatic retries; output schema descriptions for tools; long-context management (1M context, server-side compaction, context editing); pruning stale tool results; hosted code execution sandbox vs. local bash; computer use (native resolution, 1:1 coordinates up to 1440p); OSWorld evaluation gains (sub-50% to ~78%); Claude Code tips (hooks, /context, /schedule, Chrome integration); strategic advice (avoid compensating for unreliability; invest in connecting the model to your world).

High quality AI-generated summary created from speaker-labeled transcript.
