CHAPTERS
- 0:26 – 2:28
Expanding toolkit: the scaffolding now ships with the model
Lucas frames the talk’s core idea: what used to be extensive agent scaffolding code is increasingly built into Claude and its surrounding capabilities. He previews a “before vs. after” tour of how common agent tasks look now, emphasizing simpler development and better reliability.
- •Main thesis: stop thinking of the model as I/O only—think of an expanding toolkit around it
- •Talk structure: “before and after” comparisons (last year vs. 2026)
- •Less time spent on wrappers/retries; more time spent on outcomes
- •Coverage preview: tool use, context management, code execution, computer use
- •Includes practical tips plus Claude Code-specific tips
- 2:28 – 2:58
From DIY agent scaffolding to built-in reliability
He describes the old reality of building agents: routers, retries, validators, context compaction, and even coordinate math for UI automation. The message is not that the work disappears, but that the platform now owns much of it.
- •Old agents required large scaffolding codebases before product logic
- •Common components: routers, retry loops, output validators, context compaction
- •Computer-use required manual coordinate scaling and verification
- •Shift: scaffolding moved into the model/API—developers no longer “own” it
- •Outcome: simpler agent development and maintenance
- 2:58 – 4:29
Tool use: retiring brittle routers and external retry loops
Lucas contrasts heuristic tool routing (string matching, intent guesses) with the new paradigm where the model selects tools directly. He argues that pre-filtering often harms performance now, and Claude can increasingly recover from tool errors itself.
- •Old approach: limit toolset, route via heuristics (e.g., keyword ‘SQL’)
- •Routers are brittle and break as tools are added/changed
- •New approach: model searches/chooses relevant tools autonomously
- •Tool selection accuracy improved; routers can make results worse
- •Claude can observe tool errors and retry/recover without custom decorators
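To make the brittleness concrete, here is a minimal sketch of the kind of keyword router the talk describes as the old approach. The tool names and the keyword table are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical keyword table; neither the keywords nor the tool names
# come from the talk.
KEYWORD_ROUTES = {
    "sql": "run_sql_query",
    "search": "search_docs",
}

def route(user_message: str) -> Optional[str]:
    """Return the first tool whose keyword appears in the message, else None."""
    text = user_message.lower()
    for keyword, tool_name in KEYWORD_ROUTES.items():
        if keyword in text:
            return tool_name
    return None

# A request that names the keyword is routed as intended.
print(route("Run this SQL against the orders table"))  # run_sql_query
# A paraphrase with the same intent matches no keyword and gets no tool.
print(route("How many orders shipped last week?"))     # None
```

Every new tool or phrasing means another entry in the table, which is the maintenance burden that goes away when the model sees the full toolset and chooses for itself.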
- 4:29 – 6:01
Tooling tip: document tool outputs (schemas), not just inputs
A practical prompt/tooling improvement: include the output schema in the tool description. This helps Claude reason about downstream steps (like ranking results) without extra back-and-forth.
- •Most devs only describe tool parameters; also describe return fields
- •Example: docs search returns id, title, snippet, score
- •Helps Claude plan post-processing (e.g., ranking by score) immediately
- •Can reduce round trips and make tool use more efficient
- •Leads to more predictable, intelligent tool-driven workflows
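As a sketch of this tip, a tool definition can state its return fields in the description. The `name` / `description` / `input_schema` layout follows the Anthropic Messages API tool format; the tool itself, the field types, and the wording of the description are assumptions made for illustration.

```python
# Hypothetical docs-search tool. Only the four return field names
# (id, title, snippet, score) come from the talk; their types and
# meanings below are assumed.
docs_search_tool = {
    "name": "search_docs",
    "description": (
        "Search the product documentation.\n"
        "Returns a JSON list of results, each with:\n"
        "- id (string): document identifier\n"
        "- title (string): document title\n"
        "- snippet (string): matching excerpt\n"
        "- score (number): relevance score, higher is more relevant"
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
        },
        "required": ["query"],
    },
}
```

With the `score` field documented up front, the model can plan to rank results by it before the first call returns.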
- 6:01 – 6:31
Claude Code tip: pre/post tool hooks for control and observability
He highlights a Claude Code feature: pre- and post-tool hooks in settings. These allow teams to enforce policies (block calls) and collect logs/analysis after tool invocations.
- •Define hooks that run before or after specific tool calls
- •Use cases: block tools under certain conditions
- •Use cases: automatically analyze and log tool outputs
- •Adds governance and instrumentation without rewriting agent logic
- •Useful for productionizing Claude Code workflows
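A pre-tool hook that blocks calls might look roughly like the sketch below. It assumes the hook command receives a JSON payload with `tool_name` and `tool_input` fields, that a Bash call keeps its command under `tool_input["command"]`, and that exit code 2 blocks the call. These details and the blocked-command list are assumptions; check them against the Claude Code hooks documentation before relying on them.

```python
import json
import sys
from typing import TextIO, Tuple

# Hypothetical policy: block shell commands containing these substrings.
BLOCKED_SUBSTRINGS = ("rm -rf", "git push --force")

def decide(payload: dict) -> Tuple[int, str]:
    """Return (exit_code, message) for one pre-tool hook payload.

    Assumed payload shape: {"tool_name": ..., "tool_input": {...}}.
    """
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            # Assumed convention: exit code 2 blocks the tool call.
            return 2, f"Blocked by policy: command contains {needle!r}"
    return 0, ""

def run(stream: TextIO) -> int:
    """Read one JSON payload from the stream and return the exit code."""
    code, message = decide(json.load(stream))
    if message:
        print(message, file=sys.stderr)
    return code
```

A script registered as the hook command would end with `sys.exit(run(sys.stdin))`. A post-tool hook for logging could follow the same shape, writing the payload to a log instead of returning a blocking code.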
- 6:31 – 8:02
Context management: from bespoke memory systems to 1M context + compaction
Long-running agents used to require RAG, chunking, periodic summarization, and manual cache breakpoints. Now a large context window, flat pricing, server-side compaction, and context editing reduce most of that to configuration.
- •Old approach: memory systems (chunking/RAG) and summarization loops
- •Manual cache breakpoints to control cost and reuse prior turns
- •New: one-million-token context reduces window pressure
- •Server-side compaction and context editing simplify long-run sessions
- •Result: closer to an ‘infinite context window’ feel via API features
- 8:02 – 9:02
Context tip: prune stale tool results while keeping decisions
He recommends periodically clearing tool outputs (like screenshots, search dumps, large file reads) to save tokens. The key is to keep the decisions and reasoning they enabled, while removing bulky artifacts.
- •Every N turns: clear out old tool results from context
- •Targets: screenshots, search results, large file reads
- •Keep: the decisions and actions in the transcript that those results informed
- •Real-time token savings without losing the narrative of work
- •Makes long sessions cheaper and more stable
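For a harness that manages its own message list, the pruning idea can be sketched as follows. It assumes Messages-API-style dicts in which tool outputs are content blocks of type `tool_result`; the placeholder text and the `keep_last` parameter are illustrative choices, not something from the talk.

```python
import copy

PLACEHOLDER = "[tool result cleared to save tokens]"

def prune_tool_results(messages: list, keep_last: int = 2) -> list:
    """Return a copy of messages with all but the newest tool results cleared.

    Text blocks, which carry the decisions and reasoning, are left untouched.
    """
    pruned = copy.deepcopy(messages)
    # Collect every tool_result block in conversation order.
    results = [
        block
        for message in pruned
        if isinstance(message.get("content"), list)
        for block in message["content"]
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]
    # Clear everything except the most recent keep_last results.
    stale = results[:-keep_last] if keep_last > 0 else results
    for block in stale:
        block["content"] = PLACEHOLDER
    return pruned
```

Calling this every N turns keeps the record that a tool was called and what was concluded from it, while dropping the bulky payload.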
- 9:02 – 9:32
Claude Code tip: /context visualization to see what’s filling the window
A hands-on optimization tool: the /context command shows a colored breakdown of context usage. It helps developers quickly identify what’s consuming space and where to optimize.
- •Use /context to view a live grid of context composition
- •Shows relative size of messages, tool results, system prompts, MCP defs
- •Makes token usage tangible for faster iteration
- •Includes optimization suggestions
- •Useful during active Claude Code sessions
- 9:32 – 11:04
Code execution: collapsing the write-run-fix loop into one API turn
Lucas explains how developers previously had to orchestrate VMs, run code, parse errors, and iterate. The code execution tool now provides a hosted sandbox so Claude can run and debug within a single API interaction.
- •Old loop: provision VM/sandbox, run model code, capture tracebacks, feed back
- •Multiple harness round trips until success
- •New: code execution tool provides a server-side hosted sandbox
- •Execution/debug iteration happens inside one API turn
- •Simplifies integration and improves iteration speed
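The old loop can be sketched in a few lines to show where the round trips come from. Here `fake_model` is a hypothetical stand-in for a model call, and a plain subprocess stands in for the VM or sandbox; a subprocess is not real isolation.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple

def run_snippet(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a fresh Python subprocess; return (succeeded, output)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr  # the traceback to feed back

def write_run_fix(generate: Callable[[Optional[str]], str],
                  max_rounds: int = 3) -> Optional[str]:
    """Ask for code, run it, and feed tracebacks back until it succeeds."""
    feedback = None
    for _ in range(max_rounds):
        code = generate(feedback)       # one model round trip per attempt
        ok, output = run_snippet(code)
        if ok:
            return output
        feedback = output
    return None

def fake_model(feedback: Optional[str]) -> str:
    # Stand-in for a model call: the first attempt has a bug, the retry fixes it.
    return "print(1 / 0)" if feedback is None else "print(1 / 1)"

print(write_run_fix(fake_model))
```

Each pass through the loop is a separate request from the harness. With a hosted code execution tool, the same iteration happens on the server within a single turn.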
- 11:04 – 12:05
Mental model: Claude’s sandbox computer vs your local bash
He suggests thinking of code execution as giving Claude its own computer (like a powerful calculator) for stateless compute and analysis. Claude switches to local bash only when it needs machine-specific assets such as your repos or local environments.
- •Sandbox is separate from your local machine; avoids cluttering local FS
- •Great for stateless compute, data analysis, installing libraries
- •Claude chooses between sandbox and local bash based on what’s needed
- •Local bash used for repo access or local environment dependencies
- •Encourages clean separation between analysis and local integration
- 12:05 – 12:35
Claude Code tip: /schedule for cron-triggered autonomous runs
He introduces scheduled, autonomous Claude Code runs. This enables recurring maintenance, checks, or iterative tasks to happen automatically on a timer.
- •Use /schedule to set up cron-like autonomous executions
- •Turns iterative loops into scheduled automation
- •Useful for recurring tasks (checks, reports, upkeep)
- •Reduces manual triggering and oversight
- •Fits into an agentic workflow mindset
- 12:35 – 14:06
Computer use: native-resolution screenshots and 1:1 coordinates up to 1440p
Computer use previously required downscaling screenshots to fit size limits, rescaling click coordinates back to native pixels, and wrapping it all in retries and verification. Opus 4.7 removes most of that by supporting native-resolution screenshots and returning 1:1 pixel coordinates up to 1440p.
- •Old: heavy ‘image glue’—downscale, track factors, rescale clicks, wrap in retries
- •Pain point: reliable clicking required custom coordinate math
- •New: native screenshots + 1:1 pixel coordinates up to 1440p
- •Scaling math largely eliminated; simpler and more reliable automation
- •Marks rapid progress in practical computer-use usability
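The coordinate math that the old approach required looks roughly like this sketch. The function names and the 1280-pixel limit are hypothetical; the point is that every click had to be mapped back through the same factor used to shrink the screenshot.

```python
from typing import Tuple

def downscale_size(width: int, height: int,
                   max_long_edge: int) -> Tuple[int, int, float]:
    """Return (new_width, new_height, factor) so the long edge fits the limit."""
    factor = min(1.0, max_long_edge / max(width, height))
    return round(width * factor), round(height * factor), factor

def to_native(x: int, y: int, factor: float) -> Tuple[int, int]:
    """Map a click on the downscaled screenshot back to native pixels."""
    return round(x / factor), round(y / factor)

# Hypothetical numbers: a 2560x1440 screen sent as a 1280-pixel-wide image.
w, h, f = downscale_size(2560, 1440, max_long_edge=1280)
print(w, h, f)                 # 1280 720 0.5
print(to_native(640, 360, f))  # (1280, 720)
```

With 1:1 coordinates, the factor is always 1.0 up to 1440p and both helpers become unnecessary; per the guidance later in the talk, something like this is still needed for 4K screens.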
- 14:06 – 15:37
Computer-use performance and practical guidance: OSWorld gains, test formats/resolutions
He cites major improvements on OSWorld (from <50% to ~78%) as evidence computer use is nearing broad usability. He advises experimenting with resolutions and image formats, and still downscaling for 4K scenarios.
- •OSWorld eval tracks completion of tasks in professional/consumer software
- •In less than 12 months: from below 50% to ~78% with Opus 4.7
- •Positioning: computer use is at the cusp of broad usability
- •Guidance: experiment with resolutions up to 1440p for best results
- •Try JPEG/PNG/WebP; compression artifacts can affect UI automation
- 15:37 – 19:12
Demo: Claude Code + Claude in Chrome closes the QA-and-fix loop
A pre-recorded demo shows an agent debugging a project management dashboard by reproducing UI issues in a real browser, fixing code, and retesting. The sequence demonstrates how computer use enables human-like testing and faster end-to-end iteration.
- •Claude opens the app in Chrome to reproduce the ‘New button’ bug
- •Tests live UI interactions, then edits code to fix card creation
- •Checks additional behavior (drag-and-drop) and identifies a second bug
- •Implements a drag-and-drop fix and retests from the start
- •Value: software built for humans can be tested by Claude like a human would
- 19:12 – 21:22
Closing message: reliability scaffolding has a short half-life—connect Claude to your world
Lucas concludes that code written to compensate for model unreliability (routers, retries, planners, verification loops) will quickly become obsolete as it gets absorbed into the model. The durable work is integrating Claude with unique tools, data, and context—building the ‘front door’ to your systems.
- •Rule of thumb: unreliability-compensation code becomes outdated in months
- •Expect retries/routers/planners/verification to be absorbed into the model
- •Focus on compounding work: custom tools, data, auth, domain context
- •Model can’t absorb what it can’t see—your integrations create durable value
- •Future outlook: software will expose ‘agent front doors’; differentiation is behind them
