Live coding session with Boris Cherny and Jarred Sumner

Ever wonder how the Claude Code team uses Claude Code themselves? Watch Boris Cherny, Head of Claude Code, and Jarred Sumner, creator of Bun, livestream their everyday workflows together.

Jarred Sumner (guest), Boris Cherny (host)

May 5, 2026 · 32m

WHAT IT’S REALLY ABOUT

Bun’s RoboBun pipeline shows agents reproducing issues and turning them into mergeable PRs

Bun runs an automated Claude-powered bot (“RoboBun”) on every new GitHub issue to reproduce the problem and often open a PR with a failing-then-passing test as a merge gate.
The workflow shifts engineering effort from manual debugging to evaluating whether an automated fix is correct and aligns with project intent, with CI and review as the main remaining bottlenecks.
Jarred combines multiple agents/tools—Claude Code for deeper contextual review and CodeRabbit for style/lint conventions—creating an “adversarial” back-and-forth that resolves many review comments automatically.
A well-maintained CLAUDE.md file (spoken as “Claude MD” in the session; a project-specific playbook) is positioned as the prerequisite for scalable agent autonomy, capturing repeated lessons about builds, tests, repo layout, and CI/log interpretation.
They argue current model improvements (e.g., “Opus 4.7”) make closed-loop iteration and “hill climbing” feasible in day-to-day engineering, enabling parallel work and long-running auto-mode sessions.

IDEAS WORTH REMEMBERING

5 ideas

Automate issue triage by making reproduction the default, not the exception.

RoboBun attempts to reproduce every incoming Bun issue and converts successful repros into PRs, saving maintainers from spending time just getting to a minimal failing case.

Require a verification loop: tests must fail before the fix and pass after.

Jarred describes a hard requirement that the bot can’t open a PR unless it adds tests and demonstrates the regression in the prior version and success in the patched branch.
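The fail-before, pass-after requirement can be sketched as a small gate function. This is a minimal illustration only, not RoboBun's actual implementation: the names `regression_gate`, `GateResult`, and the `run_test` callable (which takes a git ref and a test ID and returns whether the test passed) are hypothetical, and how tests are really executed against each revision is not described in the session.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical runner signature: (git_ref, test_id) -> True if the test passes.
# How the tests are actually run per revision is an assumption left to the caller.
TestRunner = Callable[[str, str], bool]


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: str


def regression_gate(
    run_test: TestRunner,
    base_ref: str,
    patched_ref: str,
    new_tests: Sequence[str],
) -> GateResult:
    """Allow a PR only if every new test fails on base_ref and passes on patched_ref."""
    if not new_tests:
        return GateResult(False, "no new tests were added")
    for test_id in new_tests:
        # A test that already passes on the old code does not demonstrate the bug.
        if run_test(base_ref, test_id):
            return GateResult(False, f"{test_id} already passes on {base_ref}")
        # A test that still fails on the patched code means the fix does not work.
        if not run_test(patched_ref, test_id):
            return GateResult(False, f"{test_id} still fails on {patched_ref}")
    return GateResult(True, "all new tests fail before the fix and pass after it")
```

Passing the runner in as a parameter keeps the gate logic independent of any particular test framework or CI system, so the same check can be exercised with a stub in unit tests.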

Use specialized reviewers: style bots for hygiene, Claude for deep edge cases.

CodeRabbit is framed as useful for formatting/conventions, while Claude’s code review is valued for tracing control flow and surfacing subtle context-dependent bugs with a higher signal-to-noise ratio.

Treat CLAUDE.md (“Claude MD”) as the scaling mechanism for running many agents safely.

They repeatedly stress that every recurring instruction (how to build correctly, where tests live, common pitfalls, reading CI logs) should be written down so agents produce mergeable work consistently.

Auto mode removes the “permission prompt” stall that breaks long-running workflows.

Both note that agents previously got stuck awaiting approvals; auto mode enables hours-long runs and makes “one prompt → 30 minutes of productive work” practical.

WORDS WORTH SAVING

5 quotes

every time somebody submits an issue, uh, we have a Claude bot automatically run and try to reproduce the issue.

— Jarred Sumner

we can see that Robobun is now a bigger contributor to Bun than I am.

— Jarred Sumner

basically, the, the pattern here is like every time that you find yourself repeating something, it, it should probably go in Claude MD.

— Jarred Sumner

this entire thing was one prompt, and, and that just ran for thirty minutes.

— Boris Cherny

it started, like, writing code was the bottleneck, and now it's no longer the bottleneck. And then, like, v-verification and running tests, that was, like, the bottleneck, and that's no longer the bottleneck.

— Boris Cherny

RoboBun: issue reproduction → PR automation
Test requirements: fail-on-old/pass-on-new gating
Multi-agent code review (Claude + CodeRabbit)
Claude MD as operational documentation
Auto mode vs permission prompts; autonomy tradeoffs
Verification bottlenecks: CI, proof, rollback
Hill-climbing via metrics and benchmarks

High-quality AI-generated summary created from speaker-labeled transcript.
