
Live coding session with Boris Cherny and Jarred Sumner

Ever wonder how the Claude Code team uses Claude Code themselves? Watch Boris Cherny, Head of Claude Code, and Jarred Sumner, creator of Bun, livestream their everyday workflows together.

Jarred Sumner (guest) · Boris Cherny (host)
May 6, 2026 · 32 min · YouTube

CHAPTERS

  1. 1:02 – 2:08

    Stage setup: a live, developer-first coding session

    The hosts set expectations for a hands-on session focused on coding rather than slides. Jarred frames the talk around how Bun’s team uses Claude Code in real production workflows.

    • Developer conference format: mostly live coding
    • Jarred introduces using Claude Code to build/maintain Bun
    • They immediately start agents to fix real GitHub issues
  2. 2:08 – 3:11

    RoboBun workflow: auto-reproduce GitHub issues and open PRs with tests

    Jarred shows Bun’s repo automation where every new issue triggers a bot run to reproduce the bug. If reproduced, RoboBun generates a pull request that includes a test and must demonstrate fail-before/pass-after behavior.

    • Every issue triggers an automated reproduction attempt
    • Bot submits PRs automatically when it can reproduce
    • PRs must include tests as a hard requirement
    • Bot validates that tests fail on the old version and pass with the fix
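A minimal sketch of the fail-before/pass-after gate, assuming a hypothetical `run_test(build)` callable that runs the PR's new test against a given build and returns whether it passed. This is not RoboBun's actual code; `Verdict`, `check_fail_before_pass_after`, and the build names are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    fails_before: bool
    passes_after: bool

    @property
    def accepted(self) -> bool:
        # A regression test only proves something if it catches the bug:
        # it must fail on the old build and pass on the fixed one.
        return self.fails_before and self.passes_after


def check_fail_before_pass_after(
    run_test: Callable[[str], bool], old_build: str, new_build: str
) -> Verdict:
    """run_test(build) returns True if the PR's new test passes against that build."""
    return Verdict(
        fails_before=not run_test(old_build),
        passes_after=run_test(new_build),
    )


if __name__ == "__main__":
    # Hypothetical stand-in for actually running the test against two builds.
    results = {"bun-old": False, "bun-fixed": True}
    verdict = check_fail_before_pass_after(results.__getitem__, "bun-old", "bun-fixed")
    print(verdict.accepted)  # True
```

A test that already passes on the old build is rejected, because it would not have caught the regression it claims to cover.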
  3. 3:11 – 4:10

    From debugging to merging: shifting the bottleneck to review confidence

    With reproduction and initial fixes automated, the main human work becomes deciding whether the proposed fix is correct and safe to merge. They discuss how this changes engineering effort and attention.

    • Automation moves work from debugging to decision-making
    • Main question becomes: “Is this the right fix?”
    • Test-based gating is the core trust mechanism
  4. 4:10 – 6:19

    Multi-agent code review: RoboBun + CodeRabbit back-and-forth

    They describe an emergent workflow where automated reviewers interact: CodeRabbit leaves comments and RoboBun responds and resolves threads. Claude-based review is positioned as better at deep contextual issues, while CodeRabbit handles style and conventions.

    • Automated reviewers iterate in comment threads and resolve them
    • CodeRabbit: style/lint/conventions; Claude review: subtle context-heavy edge cases
    • Reduces human context switching (checkout branch, fix lint, push, repeat)
  5. 6:19 – 8:00

    Why this works well for Bun (and how it generalizes to support tickets)

    Bun’s CLI nature makes verification easier because tests don’t require browser automation. Jarred then generalizes the pattern: many companies could feed customer support tickets into the same reproduce→fix→review pipeline.

    • CLI/system software lends itself to reliable repro and verification
    • Browser testing could be added via screenshots/video if needed
    • Generalizable idea: customer support ticket → bot repro → PR → automated review
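The ticket → repro → PR → review idea can be sketched as a gated pipeline in which any stage may halt the run, for example when a ticket cannot be reproduced. The stage functions and ticket fields below are hypothetical stand-ins for what would really be agent runs and GitHub calls.

```python
from typing import Callable, Optional

# A stage takes the current artifact and returns the next one, or None to halt.
Stage = Callable[[dict], Optional[dict]]


def run_pipeline(ticket: dict, stages: list[tuple[str, Stage]]) -> tuple[str, Optional[dict]]:
    """Run stages in order; return ("done", artifact) or (name_of_halting_stage, None)."""
    artifact = ticket
    for name, stage in stages:
        result = stage(artifact)
        if result is None:
            return name, None  # e.g. the bug could not be reproduced
        artifact = result
    return "done", artifact


# Hypothetical stages; real ones would be agent runs and GitHub API calls.
def reproduce(t: dict) -> Optional[dict]:
    return {**t, "repro": t["steps"]} if t.get("steps") else None


def open_pr(t: dict) -> Optional[dict]:
    return {**t, "pr": f"fix: {t['title']}"}


def review(t: dict) -> Optional[dict]:
    return {**t, "reviewed": True}


STAGES = [("reproduce", reproduce), ("open_pr", open_pr), ("review", review)]

if __name__ == "__main__":
    print(run_pipeline({"title": "segfault in bun install", "steps": "bun install"}, STAGES)[0])  # done
    print(run_pipeline({"title": "it feels slow"}, STAGES)[0])  # reproduce
```

Halting early is the point of the design: a ticket that cannot be reproduced never turns into a speculative PR.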
  6. 8:00 – 10:05

    CLAUDE.md and ‘compound engineering’: documenting every repeated lesson

    Jarred emphasizes that automation only becomes scalable when the environment and conventions are written down. Their CLAUDE.md captures build/test commands, repo layout, prior pitfalls, and even how to surface the best error messages for agents.

    • CLAUDE.md is critical to prevent low-quality/unmergeable PRs
    • Document build/run/test workflows and where tests belong
    • Rule of thumb: anything repeated should go into CLAUDE.md
    • Optimize logs/errors so agents can see the most informative failures
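One way to act on the last point is to distill a long build or test log into the few distinct error lines an agent should read first. The regex and the `summarize_failures` helper are assumptions for illustration; a real repo would tune the patterns to its own test runner's output.

```python
import re

# Hypothetical patterns; tune these to the repo's actual compiler/test output.
ERROR_PATTERN = re.compile(r"(error|fail|panic|assert)", re.IGNORECASE)


def summarize_failures(log: str, max_lines: int = 20) -> list[str]:
    """Keep the first occurrence of each distinct error-looking line, in order."""
    seen: set[str] = set()
    picked: list[str] = []
    for line in log.splitlines():
        text = line.strip()
        if ERROR_PATTERN.search(text) and text not in seen:
            seen.add(text)
            picked.append(text)
            if len(picked) >= max_lines:
                break  # cap the summary so it stays cheap for the agent to read
    return picked
```

Deduplicating and capping keeps a flood of repeated failures from burying the first, usually most informative, one.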
  7. 10:05 – 12:10

    Closing the loop: agents that run tests, read CI, and arrive “merge-ready”

    They outline the goal of full-cycle agents that write code, run tests, monitor CI, and interpret build logs before a human ever looks. The aim is to present changes with enough evidence to merge confidently.

    • Agent should execute full loop: edit → test → CI → interpret errors
    • “Set up for success” so humans only evaluate correctness/intent
    • Reducing uncertainty requires better surfaced proof and artifacts
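The edit → test → CI → interpret loop can be sketched as a bounded retry loop that feeds each failure summary back into the next attempt. `propose_edit` and `run_checks` are hypothetical callables standing in for the agent and the test/CI run; the attempt cap is an assumed safeguard, not something described in the session.

```python
from typing import Callable, Optional


def close_the_loop(
    propose_edit: Callable[[Optional[str]], str],
    run_checks: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> tuple[bool, list[str]]:
    """Retry edit -> test/CI until checks pass or attempts run out.

    propose_edit(feedback) returns a patch; feedback is None on the first try.
    run_checks(patch) returns None when everything is green, else an error summary.
    """
    feedback: Optional[str] = None
    history: list[str] = []
    for _ in range(max_attempts):
        patch = propose_edit(feedback)
        history.append(patch)
        feedback = run_checks(patch)  # interpret test/CI output into feedback
        if feedback is None:
            return True, history  # merge-ready: hand to a human with the evidence
    return False, history  # out of attempts: escalate instead of looping forever


if __name__ == "__main__":
    patches = iter(["patch-1", "patch-2"])
    ok, tried = close_the_loop(
        lambda fb: next(patches),
        lambda p: None if p == "patch-2" else "FAIL test/foo.test.ts",
    )
    print(ok, tried)  # True ['patch-1', 'patch-2']
```

Returning the full history alongside the result gives the human reviewer the trail of attempts, not just the final patch.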
  8. 12:10 – 12:56

    Adoption curve and model inflection point: ‘47’ makes it practical

    Boris polls the audience about their development workflows and notes many are moving toward higher abstraction. They attribute the practical viability of this workflow to recent model improvements—efficient enough for daily use without heavy scaffolding.

    • Audience snapshot: manual terminal juggling vs more automated loops
    • Claimed inflection: model ‘47’ enables reliable day-to-day automation
    • Earlier approaches required heavy scaffolding and token-heavy prompting
  9. 12:56 – 15:40

    Live PR triage: judging plausibility, style alignment, and deeper control-flow risks

    They inspect bot-generated PRs and discuss how Jarred decides whether to wait for automated review or merge quickly. Claude review is valued for tracing control flow beyond the diff, with a relatively high signal-to-noise ratio compared to older tools.

    • Reviewing bot PRs: look for correct patterns and performance implications
    • Wait for Claude code review to catch non-obvious control-flow edge cases
    • Reported error rate: roughly 10% of review concerns may turn out to be wrong
  10. 15:40 – 17:53

    How long has this been possible? Scaling commit volume and shifting bottlenecks

    They discuss the timeline of RoboBun’s effectiveness and how commit volume has increased as models improved. The limiting factor is no longer generating code—it’s confidence, verification, and CI throughput.

    • Automation effectiveness is recent and improving rapidly
    • Bottleneck shifts from code generation to merge confidence
    • CI and full test execution become gating constraints
  11. 17:53 – 20:22

    Large-feature development with Claude: HTTP/3, image processing, and benchmarks

    Jarred shares examples of major work done with Claude Code, including HTTP/3 support and a built-in image processing library. They highlight benchmark-driven iteration and performance targets as a productive way to let the model “hill climb.”

    • Examples: HTTP/3 server, HTTP/2 work, fetch support, image API, exploratory Rust rewrite
    • Claude ran benchmarks on a Linux machine with a clear target (e.g., faster than Sharp)
    • Iterative optimization guided by measurement and verification
  12. 20:22 – 22:14

    Hill climbing explained: metrics + verification enable autonomous iteration

    Boris connects Jarred’s benchmark workflow to an internal concept: hill climbing. With a measurable objective and a way to verify results, the model can iterate repeatedly in auto mode until it meets the target.

    • Provide a metric/goal and a verification loop
    • Model iterates improvements autonomously when allowed to run
    • Framed as an underutilized capability that’s now strong in ‘47’
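As a toy analogy only, the loop below is greedy random local search over a single number: propose a change, reject it if verification fails, keep it only if the metric improves, and stop at the target or when the budget runs out. In the session the "change" is a code edit and the metric is a benchmark; here both are stand-in functions, and `hill_climb` and its parameters are hypothetical.

```python
import random
from typing import Callable


def hill_climb(
    start: float,
    metric: Callable[[float], float],  # lower is better, e.g. a benchmark time
    verify: Callable[[float], bool],   # correctness gate: rejects "fast but wrong"
    target: float,
    step: float = 1.0,
    budget: int = 1000,
    seed: int = 0,
) -> tuple[float, float]:
    """Greedy random local search; assumes `start` already passes `verify`."""
    rng = random.Random(seed)
    best, best_score = start, metric(start)
    for _ in range(budget):
        if best_score <= target:
            break  # target met: stop iterating
        candidate = best + rng.uniform(-step, step)  # propose a small change
        if not verify(candidate):
            continue  # never accept an unverified improvement
        score = metric(candidate)
        if score < best_score:
            best, best_score = candidate, score  # keep only measured gains
    return best, best_score


if __name__ == "__main__":
    # Toy objective: minimum of 10.0 at x = 3; verification requires x >= 0.
    x, score = hill_climb(0.0, lambda v: (v - 3) ** 2 + 10, lambda v: v >= 0, target=10.01)
    print(f"x={x:.3f} score={score:.4f}")
```

Both ingredients from the chapter matter: without `metric` there is no direction to climb, and without `verify` the loop would happily accept a faster but incorrect result.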
  13. 22:14 – 25:03

    Claude Code UX and autonomy: auto mode, permissions, no-flicker CLI, and monitoring loops

    They get tactical about day-to-day ergonomics: auto mode avoids stalls from permission prompts, and no-flicker improves terminal usability via a rewritten renderer. Jarred also shows an agent monitoring PR status on a sleep/wake loop.

    • Auto mode reduces idle time caused by approval prompts
    • Discussion of risky settings vs safer automation
    • No-flicker mode: virtualized terminal rendering for smooth scrolling and constant resource use
    • Agents can monitor PRs/CI on scheduled loops (sleep/wake)
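A minimal sketch of a sleep/wake monitoring loop, assuming a hypothetical `get_status()` callable (for instance a wrapper around the GitHub API) and made-up status names. The `sleep` function is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable

TERMINAL = {"merged", "closed", "ci_failed"}  # hypothetical status names


def watch_pr(
    get_status: Callable[[], str],
    interval_s: float = 300.0,
    max_polls: int = 48,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the PR reaches a terminal state or the poll budget runs out."""
    status = get_status()
    polls = 1
    while status not in TERMINAL and polls < max_polls:
        sleep(interval_s)  # the agent "sleeps", then wakes to check again
        status = get_status()
        polls += 1
    return status


if __name__ == "__main__":
    fake = iter(["pending", "pending", "merged"])
    print(watch_pr(lambda: next(fake), sleep=lambda s: None))  # merged
```

The poll cap keeps a stuck PR from holding the session open indefinitely; the last observed status is returned either way.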
  14. 25:03 – 28:45

    What’s next: deeper verification, planning/taste, and when to trust auto-merge

    They reflect on remaining bottlenecks: proving correctness, rollback confidence, and eventually higher-level planning about what should be built. They discuss feature requests, engineering taste, and how AI-generated PRs change social dynamics by making “no-merge” easier.

    • Next bottlenecks: deeper verification and then planning/selection of work
    • Need better proof artifacts and rollback strategies for confidence
    • RoboBun can do features when prompted, but shouldn’t implement everything by default
    • AI PRs become “suggestions”: declining one hurts no one's feelings, which makes a higher merge bar easier to hold
  15. 28:45 – 32:00

    Wrap-up: watching the fourth PR land and the vision of iterative bottleneck automation

    They wait for one more PR to submit, emphasizing that the entire run came from a single prompt in auto mode. The session ends with a broader point: engineering is trending toward continuous experimentation and systematically automating each newly exposed bottleneck.

    • Fourth PR submission as a live demonstration of sustained autonomy
    • Auto mode enables long-running sessions without permission dead-ends
    • Theme: identify bottlenecks → automate them → repeat
