How we Claude Code

See how Anthropic engineers actually use Claude Code day to day. You'll configure a real repo the way we do internally: project context files, custom commands, hooks, and subagents. Then run the same task stock versus tuned and see the difference for yourself.

May 23, 2026 · 31m

CHAPTERS

0:20 – 2:09
Workshop kickoff: goals, setup, and what you’ll build
Arno welcomes attendees, gauges familiarity with Claude Code, and explains the hands-on workshop format. He points participants to a QR code and repo they’ll use throughout three phases of exercises.
- •Workshop is interactive with credits/QR code setup
- •Repo provided to clone and follow along
- •Three phases: prompting, HTML specs, verification framework
- •Support available in-room for technical issues
2:09 – 3:09
Why this talk now: longer-running agents demand better workflows
The workshop frames a shift in development habits driven by more capable models and agents. As agents run longer on more complex tasks, the cost of ambiguity and wrong turns rises, making better up-front alignment essential.
- •Agents can run longer as models improve
- •Longer runs increase token burn risk if misdirected
- •Need to change human habits and process with agents
- •Front-load clarity to avoid expensive mistakes
3:09 – 3:40
From Markdown to HTML: richer specs and verification up front
Arno introduces the idea (inspired by Tariq’s ‘Unreasonable Effectiveness of HTML Files’) that HTML can replace long Markdown specs. HTML enables denser, more ergonomic review and can embed verification-related context more naturally.
- •Talk based on Tariq’s recent SF version and blog post
- •Markdown specs get too long to read and review effectively
- •HTML can be more information-dense and scannable
- •Goal: move verification thinking earlier into the artifact
3:40 – 6:43
Prompting philosophy: ‘The Bitter Lesson’ and letting Claude interview you
Using Sutton’s ‘Bitter Lesson’ as an analogy, Arno argues you should resist over-constraining capable models. Instead, let Claude draw out latent requirements through iterative questioning, since users often cannot state what they want up front but ‘know it when they see it.’
- •Avoid hard-coding constraints that limit model capability
- •Claude can extract requirements better than you can pre-specify
- •Requirements are latent—elicitation is part of the process
- •Use interactive back-and-forth to remove ambiguity
6:43 – 7:43
What good prompting looks like (and what doesn’t)
Arno contrasts vague prompts like “make it better” with prompts that define domains and evaluation criteria without over-specifying solutions. The aim is to guide Claude to ask clarifying questions and converge iteratively.
- •“Make it better” is an anti-pattern
- •Specify domains/areas of focus (audience, goals, constraints)
- •Keep outcomes open-ended to enable exploration
- •Design prompts to trigger iterative interviewing
7:43 – 8:44
Claude Code workflow tips: auto mode, fast mode, and effort settings
Arno demonstrates key Claude Code interaction modes and settings that impact speed and quality. He encourages auto mode for smoother iteration and discusses effort configuration and when fast mode is useful.
- •Auto mode (Shift+Tab) is strongly recommended
- •Fast mode can speed iteration (with higher cost)
- •Use /effort to control model effort (often x-high)
- •Practical command patterns for workshop flow
8:44 – 10:22
Live example: bill-splitting app requirements via ‘ask user question’
Arno walks through prompting Claude to interview him about a bill-splitting app, using the ‘ask user question’ tool to collect requirements. Claude then turns answers into a spec and plan, reducing the need for perfect upfront articulation.
- •Bill-splitting app as a simple running example
- •Use a prompt that explicitly invokes ‘ask user question’
- •Tab through the Q&A to refine requirements quickly
- •Outputs: structured spec and implementation plan
10:22 – 10:52
Why HTML specs win: faster human review and better feedback loops
Arno explains that HTML mock/spec files make it easier to validate intent than long Markdown documents. HTML supports richer artifacts (layout, density, screenshots) and enables more direct visual feedback to Claude.
- •HTML is more ergonomic than 200+ line Markdown specs
- •Easier to validate “is this what I meant?” quickly
- •Screenshots can be incorporated for higher-fidelity feedback
- •Sets the stage for richer agent + human collaboration
10:52 – 12:53
Design exploration: generating multiple HTML design directions
Using Opus, Arno shows how to generate several distinct UI directions as separate HTML variants. He demonstrates how comparing visuals accelerates decision-making and gives Claude clearer feedback than text descriptions alone.
- •Prompt Claude to generate 4 design directions in HTML
- •Examples: brutalist, Tokyo/fintech styles, etc.
- •Visual comparison improves feedback quality and speed
- •Encourages screenshot-based iteration (especially for UI)
12:53 – 14:25
Verification as the main event: making artifacts agent-verifiable
The workshop shifts to embedding verification into the product artifact so agents can run checks reliably. Arno previews a framework using Storybook fixtures, schemas, invariants, DOM-emitted state, and Playwright MCP for a React app.
- •Verification is distinct from traditional testing in this context
- •Goal: verification steps usable by humans and agents
- •Tools/primitives: fixtures, schemas, invariants, DOM attributes
- •Playwright MCP enables browser-driven agent verification
14:25 – 15:56
Repo walkthrough: the three phases and the phase-3 verification demo app
Arno points to the workshop repo and outlines its three phases: interactive requirements prompting, HTML design generation, and a deeper verification framework. Phase three uses a React to-do app to demonstrate embedded verification.
- •Repo location shared (Claude Code workshops)
- •Phase 1: requirement interrogation prompt
- •Phase 2: generate multiple HTML designs
- •Phase 3: React to-do app with built-in verification framework
15:56 – 18:29
How the to-do app exposes state: publishing contracts to the DOM
Arno demonstrates the to-do app and shows how component state is emitted into the DOM via structured attributes. This avoids brittle DOM scraping and gives agents a stable contract to read and verify against.
- •To-do app actions: add, complete, clear completed
- •DOM includes ‘data verify’ units and totals (done/active)
- •State changes update emitted contract data in real time
- •Agent reads explicit contracts rather than inferring UI structure
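As a rough illustration of the idea in this chapter, the sketch below derives contract attributes from to-do state and parses them back, which is roughly what an agent would do instead of counting list items in the markup. The attribute names (`data-verify-total`, `data-verify-done`, `data-verify-active`) and the `Todo` shape are assumptions made for this example; the workshop repo may name and structure them differently.

```typescript
// Hypothetical shapes and attribute names; the workshop repo may differ.
interface Todo {
  title: string;
  done: boolean;
}

type ContractAttrs = Record<string, string>;

// Derive the attributes a component could spread onto its root element.
function toContractAttrs(items: Todo[]): ContractAttrs {
  const done = items.filter((t) => t.done).length;
  return {
    "data-verify-total": String(items.length),
    "data-verify-done": String(done),
    "data-verify-active": String(items.length - done),
  };
}

// Parse explicit values back out, failing loudly if a field is absent.
function readContract(attrs: ContractAttrs): { total: number; done: number; active: number } {
  const num = (key: string): number => {
    const raw = attrs[key];
    if (raw === undefined) throw new Error(`missing contract field: ${key}`);
    return Number(raw);
  };
  return {
    total: num("data-verify-total"),
    done: num("data-verify-done"),
    active: num("data-verify-active"),
  };
}

const todos: Todo[] = [
  { title: "Clone the repo", done: true },
  { title: "Run phase 1", done: false },
  { title: "Run phase 2", done: false },
];
const attrs = toContractAttrs(todos);
const contract = readContract(attrs);
console.log(attrs, contract);
```

Because the reader depends only on attribute names, the visual markup can change freely as long as the emitted fields stay the same.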
18:29 – 23:19
Running the verification dashboard: schemas, invariants, probes, and a forced failure
Arno runs verification from a human-readable dashboard showing checks and detailed results. A deliberately failing invariant illustrates how probes push beyond the happy path and how evidence can be inspected per check.
- •Dashboard lists schemas, fixtures/known states, invariants
- •Can run checks individually or all at once
- •One invariant is intentionally wrong to demonstrate failure
- •Probes help catch edge cases beyond standard flows
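One way to picture invariants, probes, and a forced failure is the minimal sketch below. All names here (`Totals`, `Invariant`, `runCheck`, `runAll`) and the specific invariants are hypothetical and are not taken from the workshop's framework; the deliberately wrong invariant only mirrors the idea of the demo.

```typescript
// Hypothetical names throughout; a sketch of the idea, not the repo's framework.
interface Totals {
  total: number;
  done: number;
  active: number;
}

interface CheckResult {
  id: string;
  passed: boolean;
  evidence: string; // what a dashboard row could show when expanded
}

interface Invariant {
  id: string;
  holds: (t: Totals) => boolean;
}

const invariants: Invariant[] = [
  { id: "counts-are-non-negative", holds: (t) => t.done >= 0 && t.active >= 0 },
  { id: "done-never-exceeds-total", holds: (t) => t.done <= t.total },
  // Deliberately wrong: any state with a completed item violates it.
  { id: "nothing-is-ever-done", holds: (t) => t.done === 0 },
];

// Run a single check against a single state and keep the evidence.
function runCheck(inv: Invariant, state: Totals): CheckResult {
  return { id: inv.id, passed: inv.holds(state), evidence: JSON.stringify(state) };
}

// "Run all" is every invariant against every state, probes included.
function runAll(states: Totals[]): CheckResult[] {
  const out: CheckResult[] = [];
  for (const state of states) {
    for (const inv of invariants) out.push(runCheck(inv, state));
  }
  return out;
}

const happyPath: Totals = { total: 3, done: 1, active: 2 };
// A probe: an edge state a normal click-through would rarely reach.
const emptyListProbe: Totals = { total: 0, done: 0, active: 0 };

const results = runAll([happyPath, emptyListProbe]);
const failures = results.filter((r) => !r.passed);
console.log(failures);
```

Keeping the evidence string on each result is what lets a reviewer, human or agent, inspect why a particular check failed rather than only seeing a red mark.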
23:19 – 25:57
Breaking the contract (not the app): why contracts matter for agent workflows
Arno intentionally deletes a DOM contract field to show how verification fails even when the UI still appears functional. This highlights the importance of maintaining the agent-readable contract as part of the artifact.
- •Demonstration: remove a contract field and rerun verification
- •Multiple checks fail due to contract mismatch
- •Shows separation between app behavior and verifiable contract
- •Reinforces ‘embed verification into the artifact’ principle
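The cascade of failures from one deleted field can be sketched as follows. This assumes, hypothetically, that each check declares which contract fields it needs; the field names and check ids are invented for the example and are not the repo's.

```typescript
// Hypothetical field names and check ids, for illustration only.
type Attrs = Record<string, string>;

const requiredFields: string[] = [
  "data-verify-total",
  "data-verify-done",
  "data-verify-active",
];

interface Check {
  id: string;
  needs: string[]; // contract fields this check reads
}

const checks: Check[] = [
  { id: "schema", needs: requiredFields },
  { id: "totals-add-up", needs: requiredFields },
  { id: "done-within-total", needs: ["data-verify-total", "data-verify-done"] },
  { id: "total-is-non-negative", needs: ["data-verify-total"] },
];

// A check cannot pass if any field it reads is missing from the contract.
function failingChecks(attrs: Attrs): string[] {
  return checks
    .filter((c) => c.needs.some((field) => !(field in attrs)))
    .map((c) => c.id);
}

const intact: Attrs = {
  "data-verify-total": "3",
  "data-verify-done": "1",
  "data-verify-active": "2",
};

// Same app state, but the "done" field is no longer emitted.
const broken: Attrs = {
  "data-verify-total": "3",
  "data-verify-active": "2",
};

console.log(failingChecks(intact)); // no failures
console.log(failingChecks(broken)); // several checks fail together
```

Nothing about the underlying to-do state differs between the two objects, which is the point of the demonstration: the app can look fine while the agent-readable contract is broken.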
25:57 – 30:31
Agent-driven diagnosis with Playwright MCP + Claude (and recording evidence)
Arno lets Claude run verification headlessly via Playwright MCP and diagnose the failing check. He also explains how runs can be recorded as video clips and stored/shared (e.g., S3) as verification evidence.
- •Claude uses browser automation to locate failed invariant
- •Example failure: totals don’t add up (3+4 ≠ 10)
- •Recording runs produces shareable clips as proof/evidence
- •Three surfaces: human UI, agent-in-browser, and CI/headless
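The example failure above can be expressed as a small check that returns an evidence record. This is a sketch under assumptions: the `Evidence` shape, the `clip` field, the sample clip path, and the `exitCode` helper are all hypothetical, chosen to show how one result could serve a dashboard, an agent, and a CI job alike.

```typescript
// Hypothetical record shape; the clip path is a made-up placeholder.
interface Evidence {
  check: string;
  passed: boolean;
  detail: string;
  clip?: string; // pointer to a recorded run, if one exists
}

// Verify that done + active equals the reported total, and explain any gap.
function checkTotals(done: number, active: number, total: number, clip?: string): Evidence {
  const sum = done + active;
  const passed = sum === total;
  const detail = passed
    ? `${done} + ${active} = ${total}`
    : `${done} + ${active} = ${sum}, but total reports ${total}`;
  const evidence: Evidence = { check: "totals-add-up", passed, detail };
  if (clip !== undefined) evidence.clip = clip;
  return evidence;
}

// A headless or CI surface could reduce the same records to an exit code.
function exitCode(records: Evidence[]): number {
  return records.every((r) => r.passed) ? 0 : 1;
}

const report: Evidence[] = [
  checkTotals(3, 4, 10, "runs/totals-add-up.webm"), // the failing case from the demo
  checkTotals(1, 2, 3),
];

for (const r of report) {
  console.log(`${r.passed ? "PASS" : "FAIL"} ${r.check}: ${r.detail}`);
}
console.log("exit code:", exitCode(report));
```

The `detail` string is what makes the failure diagnosable without rerunning anything: it states the computed sum next to the reported total.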
30:31 – 31:42
Wrap-up recommendations: Opus for vision, fast mode for iteration, HTML isn’t wasteful
Arno closes with practical guidance: use Opus for stronger vision-based feedback, consider fast mode for quick spec iteration, and don’t worry that HTML is token-inefficient. The richer artifact reduces iteration cycles and improves outcomes over time.
- •Prefer Opus 4.7 for vision-heavy workflows
- •Fast mode can accelerate spec iteration despite higher cost
- •HTML specs often reduce total iteration (and overall tokens)
- •Encouragement to explore repo docs and experiment with changes