The expanding toolkit

Over the last year, capabilities that used to require heavy scaffolding have moved into the model: reliable tool use, context management, writing and running code, computer use, and more. This session walks through these capabilities, shows what changed between model generations, and demonstrates how they compose into agents that finish work instead of just starting it.

May 8, 2026 · 21m

CHAPTERS

  1. Talk premise: the model is now an “expanding toolkit,” not an I/O box

    Lucas frames the core idea: the scaffolding developers used to build around LLMs has increasingly moved into the model/API itself. The session is structured as “before vs. after” comparisons showing how common agent patterns look now versus a year ago.

  2. From DIY agent scaffolding to built-in reliability

    He recalls what building an agent used to entail: routers, retries, validators, context compaction, and even coordinate math for UI control. The key claim is not that this work disappears, but that developers no longer need to own it because it’s increasingly embedded in the platform.

  3. Tool use upgrade: routing and retries are handled by the model

    Lucas contrasts brittle heuristic routers with modern tool-selection capabilities where the model can choose relevant tools directly. He also notes improved recovery: when tools fail, Claude can often interpret errors and retry without external backoff wrappers.
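A minimal sketch of the recovery pattern described here: rather than wrapping tool calls in external retry/backoff logic, report a failure back to the model as a `tool_result` with `is_error` set, and let the model interpret the error and decide whether to retry. The block follows the Messages API tool-result format; the helper name is invented for illustration.

```python
# Build a tool_result content block for the Messages API.
# On failure, is_error=True signals the model to interpret the error
# itself (and possibly retry) instead of an external backoff wrapper.
def tool_result_block(tool_use_id: str, output, failed: bool = False) -> dict:
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": str(output),
    }
    if failed:
        block["is_error"] = True
    return block
```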

  4. Practical tool tip: describe tool outputs (schemas), not just inputs

    He recommends providing Claude with an explicit output schema in the tool description so it knows what fields will come back. This helps Claude plan downstream steps (like ranking results) without extra round trips or guesswork.
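As a concrete illustration of this tip, here is a hypothetical tool definition in the Messages API shape, with the output schema spelled out in the description text. The tool, its fields, and the catalog domain are invented for the example.

```python
# Hypothetical tool definition: input_schema is the API's structured
# field; the *output* schema is documented in the description so the
# model can plan downstream steps (e.g. ranking) without guesswork.
search_products = {
    "name": "search_products",
    "description": (
        "Search the product catalog by keyword.\n"
        "Returns a JSON array of objects with these fields:\n"
        "  - sku (string): unique product identifier\n"
        "  - title (string): display name\n"
        "  - price_usd (number): current price\n"
        "  - in_stock (boolean): availability\n"
        "Results are unranked; rank by relevance yourself."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keyword query"},
            "limit": {"type": "integer", "description": "Max results", "default": 10},
        },
        "required": ["query"],
    },
}
```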

  5. Claude Code tip: pre/post tool-use hooks for control and observability

    Lucas highlights Claude Code’s ability to run custom logic before or after specific tool calls via settings hooks. This enables policy enforcement (blocking certain tool calls) and structured logging/analysis of tool outputs.
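A sketch of what such a hook script might look like. Claude Code hooks receive a JSON payload on stdin, and a PreToolUse hook can block a call by exiting with code 2 (its stderr is shown to Claude). The payload field names and exit-code semantics below reflect the documented behavior but should be verified against your Claude Code version; the policy itself is just an example.

```python
"""Sketch of a Claude Code PreToolUse hook script.

Register it under "hooks" in your Claude Code settings; the exact
settings wiring and stdin payload fields are assumptions to check
against the Claude Code hooks documentation.
"""
import json
import sys

def decide(payload: dict) -> int:
    tool = payload.get("tool_name", "")
    tool_input = payload.get("tool_input", {})
    # Example policy: block shell commands that touch production config.
    if tool == "Bash" and "prod.env" in tool_input.get("command", ""):
        print("Blocked: prod.env is off-limits", file=sys.stderr)
        return 2  # exit code 2 blocks the tool call
    return 0  # exit code 0 allows it

# As a real hook this would run:
#   sys.exit(decide(json.load(sys.stdin)))
```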

  6. Context management: from custom memory systems to built-in long-context features

    He describes how long-running agents previously required chunking, RAG, frequent summarization, and manual cache breakpoints. The “after” story is a combination of very large context (1M tokens) plus server-side compaction and context editing, turning much of that into configuration.

  7. Context tip: prune stale tool results while keeping decisions

    He recommends periodically clearing tool outputs (screenshots, search dumps, file reads) to save tokens while preserving the model’s reasoning and conclusions. The idea is to keep what matters—the decisions informed by tools—without carrying all raw artifacts forever.
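One way to implement this pruning client-side (as opposed to the server-side context editing mentioned in the previous chapter) is a pass over the conversation history that stubs out older `tool_result` blocks while leaving the assistant's reasoning turns untouched. A sketch, assuming the Messages API message shape:

```python
# Replace stale tool results with a short stub to save tokens.
# Assistant messages (the decisions informed by those tools) are
# left intact; only the raw tool outputs are dropped.
def prune_tool_results(messages: list[dict], keep: int = 3) -> list[dict]:
    # Indices of user messages carrying tool_result blocks, oldest first.
    result_idxs = [
        i for i, m in enumerate(messages)
        if m["role"] == "user"
        and isinstance(m["content"], list)
        and any(b.get("type") == "tool_result" for b in m["content"])
    ]
    stale = set(result_idxs[:-keep]) if keep else set(result_idxs)
    pruned = []
    for i, m in enumerate(messages):
        if i in stale:
            m = dict(m)  # shallow copy; don't mutate the original
            m["content"] = [
                {**b, "content": "[result pruned to save context]"}
                if b.get("type") == "tool_result" else b
                for b in m["content"]
            ]
        pruned.append(m)
    return pruned
```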

  8. Claude Code tip: use /context to visualize what fills the context window

    Lucas points to a built-in command that shows a live, color-coded breakdown of context usage. This helps developers see how much space is taken by messages, tool outputs, system prompts, and MCP definitions, along with optimization suggestions.

  9. Code execution: collapsing the write–run–fix loop into a single API turn

    He explains how developers used to provision sandboxes/VMs, run model-generated code, parse tracebacks, and feed errors back repeatedly. With a hosted code-execution tool, that iterative loop can happen server-side within one API interaction.
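To make the "one API turn" idea concrete, here is a hedged sketch of a request payload using a hosted code-execution tool. The tool version string, beta flag, and model id are assumptions to verify against current Anthropic documentation; the payload is only constructed here, not sent.

```python
# Sketch of a single-turn request with a server-side code execution tool.
# Identifiers marked "assumed" are placeholders, not confirmed values.
request = {
    "model": "claude-opus-4",  # placeholder model id
    "max_tokens": 4096,
    "tools": [{
        "type": "code_execution_20250522",  # assumed tool version string
        "name": "code_execution",
    }],
    "messages": [{
        "role": "user",
        "content": "Load the attached CSV, compute the median of the "
                   "'latency_ms' column, and report it.",
    }],
}
# With the Python SDK this would be roughly:
#   client.beta.messages.create(**request, betas=["code-execution-2025-05-22"])
# The write -> run -> fix loop then happens server-side within this one call.
```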

  10. Mental model: Claude’s sandbox vs. your local bash

    Lucas suggests thinking of code execution as giving Claude its own separate computer for scratch work. Claude can do stateless compute and install libraries without touching your local machine, and only uses your real environment when it needs local assets like a repo or specific setup.

  11. Claude Code tip: /schedule for cron-like autonomous runs

    He introduces scheduling autonomous Claude Code runs, enabling recurring tasks without manual prompting. This extends the self-iteration loop (write/run/fix) into time-triggered automation.

  12. Computer use: native-resolution screenshots and 1:1 coordinates up to 1440p

    Lucas describes the previous pain of downscaling screenshots and rescaling click coordinates to match model limits. Opus 4.7 can now accept native-resolution screenshots and return precise pixel coordinates up to 1440p, removing scaling math and improving reliability.

  13. Computer use performance + implementation advice (formats, 4K downscaling)

    He cites OSWorld as the headline benchmark: Claude's score rose from below 50% to roughly 78% with Opus 4.7. He advises experimenting with resolutions and image formats (JPEG, PNG, WebP), and still downscaling screenshots in 4K scenarios.
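The remaining 4K case comes down to a small amount of scaling math: downscale the screenshot so it fits within the model's native-coordinate ceiling (taken here as 2560×1440, i.e. 1440p, which is an assumption of this sketch), then map the returned click coordinates back to real screen pixels.

```python
# Downscale-and-remap sketch for screens above 1440p.
# At or below 1440p no scaling is needed and scale == 1.0.
def fit_to_1440p(width: int, height: int) -> tuple[int, int, float]:
    """Return (scaled_w, scaled_h, scale) so the image fits within 2560x1440."""
    scale = min(2560 / width, 1440 / height, 1.0)
    return round(width * scale), round(height * scale), scale

def to_screen(x: int, y: int, scale: float) -> tuple[int, int]:
    """Map model coordinates (in the downscaled image) back to screen pixels."""
    return round(x / scale), round(y / scale)
```

For a 4K display (3840×2160) the scale factor is 2/3, so a model click at (1280, 720) maps back to (1920, 1080) on the real screen.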

  14. Demo: Claude Code + Claude in Chrome closes the test–debug–fix loop

    A pre-recorded demo shows Claude reproducing UI bugs in a project management dashboard via browser control, then editing code to fix them. Claude verifies card creation, discovers a drag-and-drop issue, patches it, and re-tests end-to-end before summarizing changes.

  15. Closing message: reliability scaffolding has a short half-life; world-connecting code compounds

    Lucas concludes with a rule of thumb: code that compensates for model unreliability will quickly become obsolete as the platform absorbs it. The enduring value is in connecting models to your unique world—tools, data, auth, and context—because the model can’t absorb what it can’t see.
