Claude

How we Claude Code

See how Anthropic engineers actually use Claude Code day to day. You'll configure a real repo the way we do internally: project context files, custom commands, hooks, and subagents. Then run the same task stock versus tuned and see the difference for yourself.

May 23, 2026 · 31 min

CHAPTERS

  1. Workshop setup: goals, QR code, and repo overview

    Arno welcomes attendees, checks familiarity with Claude Code, and gets everyone set up via a QR code and a clonable repo. He previews that the workshop is hands-on and structured into three phases, with helpers available for troubleshooting.

  2. Why workflows must change as agents run longer

    The talk frames a shift in habits: as models improve, agents can take on longer, more complex tasks, which raises the cost of mistakes. The response is to front-load clarity and verification so the agent doesn’t burn tokens heading in the wrong direction.

  3. From markdown to “the unreasonable effectiveness of HTML files”

    Arno explains the motivation behind using HTML specs instead of long markdown documents. HTML is presented as a denser, more ergonomic artifact for humans (and eventually agents), making review and feedback easier before implementation runs long.

  4. Bitter Lesson applied: resist over-constraining the model

    Drawing on Richard Sutton’s “The Bitter Lesson,” Arno argues that as models become more capable, over-specifying can be counterproductive. Instead, let Claude extract requirements through interaction—similar to how product teams learn what users want.

  5. Prompting tactics: avoid “make it better,” guide the interview

    Arno contrasts weak prompts (“make it better”) with prompts that specify domains and evaluation dimensions without dictating the final solution. The key technique is instructing Claude to ask targeted questions, iterating toward a complete spec.

  6. Claude Code settings that matter: auto mode, fast mode, effort

    He demonstrates operational settings inside Claude Code: fast mode, auto mode (Shift+Tab), and the effort parameter. The recommendation is to use auto mode broadly and run higher effort (often “xhigh”) for stronger outcomes.

  7. Live example: bill-splitting app spec via interactive questioning

    Arno shows Claude interviewing him to define requirements for a simple bill-splitting app. Because the prompt explicitly invokes the AskUserQuestion tool, Claude produces a spec that is more complete than a one-shot description.

  8. HTML design directions: compare UI concepts faster than markdown

    Using Opus, Arno generates multiple HTML “design directions” (e.g., brutalist, Tokyo, fintech) for the same app. He highlights that clickable HTML mockups enable clearer feedback and faster convergence than text-only documents.

  9. Phase 3 focus: embed verification into the artifact (agent-native)

    The workshop shifts from planning to verification: making verification “native” to the app so agents can drive it. The approach combines familiar tools (fixtures, Storybook patterns, Playwright MCP) in a new arrangement to support agent-driven validation.

  10. Repo structure and the three phases

    Arno points attendees to the workshop repo and explains each phase. Phase 1 covers interview-style prompting, Phase 2 generates HTML design directions, and Phase 3 implements the verification framework (demonstrated on a React to-do app).

  11. To-do app demo: publishing component state to the DOM for agents

    In the React to-do app, components publish their state through DOM data attributes, which serve as a contract between the app and the agent. This lets an agent read state directly without brittle DOM scraping or relying on React internals.
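
A minimal sketch of this idea in TypeScript, kept free of React and the DOM so it runs anywhere. The state shape, the attribute names (`data-todo-total` and so on), and both function names are assumptions made for illustration; the workshop repo may use different ones. A component would spread the result of `toDataAttributes` onto its root element, and an agent-side check would call `readContract` on the attributes it reads back.

```typescript
// Hypothetical state shape for a to-do list component (not from the workshop repo).
interface TodoListState {
  total: number;
  completed: number;
  filter: "all" | "active" | "completed";
}

// Serialize component state into data-* attributes for the root element.
function toDataAttributes(state: TodoListState): Record<string, string> {
  return {
    "data-todo-total": String(state.total),
    "data-todo-completed": String(state.completed),
    "data-todo-filter": state.filter,
  };
}

// Parse the attributes back into typed state, failing loudly if the contract is not met.
function readContract(attrs: Record<string, string | undefined>): TodoListState {
  const total = Number(attrs["data-todo-total"]);
  const completed = Number(attrs["data-todo-completed"]);
  const filter = attrs["data-todo-filter"];
  if (!Number.isInteger(total) || !Number.isInteger(completed)) {
    throw new Error("contract violation: counts must be integers");
  }
  if (filter !== "all" && filter !== "active" && filter !== "completed") {
    throw new Error("contract violation: unknown filter value");
  }
  return { total, completed, filter };
}
```

The design choice here is that the reader never inspects component internals: everything it needs is in named attributes, so the markup around them can change freely.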

  12. Verification dashboard: schemas, fixtures, invariants, probes

    Arno walks through the human-readable verification dashboard where checks can be run individually or as a matrix. The system includes schemas, known states/fixtures, invariants that must always hold, and probes that push beyond the happy path.
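
The pieces named above can be sketched as plain data and functions. This is an illustrative arrangement, not the workshop's actual framework: the `AppState` shape, the fixture names, the invariant names, and `runMatrix` are all assumptions. Fixtures are known states, invariants are predicates that must hold for every state, and a probe derives an edge-case state from a fixture.

```typescript
// Hypothetical app state; the real workshop schema is not shown in the source.
interface Todo { id: number; title: string; done: boolean }
interface AppState { todos: Todo[]; activeCount: number; completedCount: number }

type Invariant = { name: string; holds: (s: AppState) => boolean };
type Probe = { name: string; apply: (s: AppState) => AppState };

// Invariants: properties that must hold in every state.
const invariants: Invariant[] = [
  { name: "counts sum to total", holds: s => s.activeCount + s.completedCount === s.todos.length },
  { name: "completed count matches done items", holds: s => s.completedCount === s.todos.filter(t => t.done).length },
  { name: "ids are unique", holds: s => new Set(s.todos.map(t => t.id)).size === s.todos.length },
];

// Fixtures: known states. "brokenSum" is deliberately inconsistent.
const fixtures: Record<string, AppState> = {
  empty: { todos: [], activeCount: 0, completedCount: 0 },
  mixed: {
    todos: [{ id: 1, title: "a", done: true }, { id: 2, title: "b", done: false }],
    activeCount: 1,
    completedCount: 1,
  },
  brokenSum: { todos: [{ id: 1, title: "a", done: false }], activeCount: 2, completedCount: 0 },
};

// Probes: push a state beyond the happy path; invariants should still hold afterwards.
const probes: Probe[] = [
  {
    name: "complete everything",
    apply: s => ({
      todos: s.todos.map(t => ({ ...t, done: true })),
      activeCount: 0,
      completedCount: s.todos.length,
    }),
  },
];

// Run every invariant against every state; return the failing invariant names per state.
function runMatrix(states: Record<string, AppState>): Record<string, string[]> {
  const report: Record<string, string[]> = {};
  for (const [stateName, state] of Object.entries(states)) {
    report[stateName] = invariants.filter(inv => !inv.holds(state)).map(inv => inv.name);
  }
  return report;
}
```

Calling `runMatrix(fixtures)` yields an empty failure list for `empty` and `mixed` and flags `brokenSum`; probe results can be checked the same way by passing `{ probed: probes[0].apply(fixtures.mixed) }`.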

  13. Failing cases and contract breaks: distinguishing app bugs from contract issues

    A deliberate invariant failure (e.g., sums not matching) demonstrates how the framework surfaces errors. Arno also breaks the contract by renaming a DOM attribute without breaking the app itself; verification fails anyway, which highlights the importance of stable agent-facing contracts.
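
One way to keep the two failure modes apart is to check the contract first and the invariants second. The sketch below assumes hypothetical attribute names and a hypothetical `verify` function; it is not the workshop's implementation. A missing attribute is reported as a contract break, while attributes that are present but inconsistent are reported as an invariant failure, which points at an app bug.

```typescript
// Three possible outcomes of a verification run (names are illustrative).
type Verdict =
  | { kind: "pass" }
  | { kind: "contract-break"; missing: string[] }
  | { kind: "invariant-failure"; detail: string };

// Attributes the agent-facing contract promises (hypothetical names).
const REQUIRED_ATTRIBUTES: string[] = [
  "data-todo-total",
  "data-todo-active",
  "data-todo-completed",
];

function verify(attrs: Record<string, string | undefined>): Verdict {
  // Step 1: contract check. A renamed or missing attribute fails here,
  // even if the app itself still works for human users.
  const missing = REQUIRED_ATTRIBUTES.filter(name => attrs[name] === undefined);
  if (missing.length > 0) {
    return { kind: "contract-break", missing };
  }

  // Step 2: invariant check. The contract is intact, so a mismatch is an app bug.
  const total = Number(attrs["data-todo-total"]);
  const active = Number(attrs["data-todo-active"]);
  const completed = Number(attrs["data-todo-completed"]);
  if (active + completed !== total) {
    return { kind: "invariant-failure", detail: `${active} + ${completed} !== ${total}` };
  }
  return { kind: "pass" };
}
```

Ordering the checks this way means a renamed attribute is never misreported as a wrong sum.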

  14. Agent-driven diagnosis with Playwright MCP + recording verification evidence

    Arno runs verification headlessly via Claude Code using Playwright MCP and has Claude diagnose the failure. He explains recording verification runs into video clips as evidence that can be shared/stored (e.g., S3) and used in team workflows.

  15. Wrap-up: why HTML specs and embedded verification scale with better models

    Arno closes by reiterating that richer specs (HTML) reduce iteration even if they sometimes cost more tokens upfront. He recommends Opus (better vision), suggests fast mode for spec iteration, and emphasizes that the key innovation is remixing existing tools into an agent-first workflow.
