Stop babysitting your agents

Routines turn Claude Code into a proactive teammate that reads your repo and opens a PR before you've opened your laptop. You'll see one built end to end, learn the trigger, context, and steering decisions behind any routine, and leave one /schedule command away from your first.

May 20, 2026 · 37 min

WHAT IT’S REALLY ABOUT

Build verification loops to scale agents without constant supervision

The talk argues that as agents write more code, developers waste time waiting and QA’ing, so tooling and workflows must adapt for agent-driven development.
It introduces verification loops—giving Claude tools and instructions to build, run, observe failures, debug, and repeat until success—to raise output reliability.
It shows how to package verification behavior into reusable, self-improving “skills” that document blockers and get better over time across a team.
It covers strategies for managing multiple concurrent agent sessions (desktop app, terminal Agent view, cloud sessions, and phone-based remote control) without overwhelming your attention.
It explains background automation using /loop and cloud “routines” to continuously handle maintenance tasks like PR babysitting, CI health, and docs updates while you are away from the keyboard.

IDEAS WORTH REMEMBERING

5 ideas

Treat agent reliability as a verification problem, not a prompting problem.

Instead of manually QA’ing every change, give the agent the same checks you’d use—build/typecheck, run the app, inspect side effects, run tests—so it can prove progress and catch regressions autonomously.

A loop is the core primitive for “hands-off” agent work.

When Claude can write code, detect failure (lint/test/UI), debug, and retry repeatedly, it can hill-climb toward a clear success criterion and deliver higher-confidence PRs with less human oversight.
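The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how Claude Code implements it: `run_checks`, `verification_loop`, and the `attempt_fix` callback are hypothetical names, and `attempt_fix` stands in for whatever the agent does to debug and edit code after seeing a failure.

```python
import subprocess
from typing import Callable, Sequence

Check = Sequence[str]  # one command with its arguments, e.g. ["npm", "test"]


def run_checks(checks: Sequence[Check]) -> tuple[bool, str]:
    """Run each check in order; stop at the first failure and return its output."""
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            report = f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}"
            return False, report
    return True, ""


def verification_loop(
    checks: Sequence[Check],
    attempt_fix: Callable[[str], None],  # hypothetical: the agent's debug-and-edit step
    max_attempts: int = 5,
) -> bool:
    """Check, fix, and re-check until every check passes or attempts run out."""
    for _ in range(max_attempts):
        ok, failure = run_checks(checks)
        if ok:
            return True
        attempt_fix(failure)  # hand the failing output back so the next try is informed
    # One last verification after the final fix attempt.
    return run_checks(checks)[0]
```

The success criterion is simply "every command exits with status 0", which is what lets the loop run unattended: the checks, not a person, decide when the work is done.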

UX verification becomes practical when the agent can drive a real browser.

Tools like the Chrome MCP (/chrome) let Claude navigate, type, change settings, and capture before/after evidence (e.g., screenshots), turning subjective UI checks into repeatable steps.

Expect real-world blockers—solve them once, then codify them.

Auth and state setup are the common “E2E test” pain points; providing credentials/identities and state seeding scripts (inventory, test accounts, etc.) keeps loops from stalling and makes verification reusable.
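As a rough sketch of what a state seeding script might look like, the example below resets a small SQLite database to a known starting point. Everything specific here is an assumption made for illustration: the `accounts` and `inventory` tables, the test emails, and the SKUs are hypothetical and do not come from the talk.

```python
import sqlite3

# Hypothetical fixtures: the state the agent should always start from.
TEST_ACCOUNTS = [("qa-buyer@example.test", "buyer"), ("qa-admin@example.test", "admin")]
INVENTORY = [("SKU-001", 10), ("SKU-002", 0)]


def seed(conn: sqlite3.Connection) -> None:
    """Create the tables if needed and reset the fixture rows to known values."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS accounts (email TEXT PRIMARY KEY, role TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS inventory (sku TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
        """
    )
    # INSERT OR REPLACE keeps the script idempotent: rerunning it restores the
    # fixture rows instead of failing on duplicate keys.
    conn.executemany("INSERT OR REPLACE INTO accounts (email, role) VALUES (?, ?)", TEST_ACCOUNTS)
    conn.executemany("INSERT OR REPLACE INTO inventory (sku, quantity) VALUES (?, ?)", INVENTORY)
    conn.commit()
```

Making the script safe to rerun matters more than the schema: an agent in a retry loop will call it many times, and each run should land on the same starting state.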

Package verification as a team asset via Skills.

Turning a working loop into a skill file creates a shareable runbook; instructing the skill to update itself when it hits new blockers makes it self-documenting, so it improves for the whole team over time.
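In the talk the skill is told, in plain instructions, to update itself. The sketch below only illustrates the bookkeeping that implies: append each new blocker and its workaround to the runbook once, and skip duplicates. The function name `record_blocker`, the "## Known blockers" section title, and the assumption that this section sits at the end of the file are all hypothetical.

```python
from pathlib import Path

SECTION = "## Known blockers"  # hypothetical section title, assumed to be last in the file


def record_blocker(skill_file: Path, blocker: str, workaround: str) -> bool:
    """Append a blocker and its workaround to the runbook; return False if already recorded."""
    entry = f"- {blocker}: {workaround}"
    text = skill_file.read_text(encoding="utf-8") if skill_file.exists() else ""
    if entry in text:
        return False  # already documented, keep the runbook free of duplicates
    if SECTION not in text:
        # First blocker ever recorded: start the section after any existing content.
        prefix = text.rstrip("\n") + "\n\n" if text.strip() else ""
        text = prefix + SECTION + "\n"
    elif not text.endswith("\n"):
        text += "\n"
    skill_file.write_text(text + entry + "\n", encoding="utf-8")
    return True
```

Because the file lives in the repository, each recorded blocker reaches teammates through the normal review process, which is where the team-wide improvement comes from.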

WORDS WORTH SAVING

5 quotes

As models have been getting smarter, I've noticed that we're increasingly spending a larger percentage of our time staring at the screen waiting for Claude to finish its work or just acting as a glorified QA tester for Claude.

— Sid Bundusaria

But the problem now is that humans aren't writing most of our code anymore. It's agents.

— Sid Bundusaria

What does an agent need from your code base that a human takes for granted?

— Sid Bundusaria

A loop essentially is an autonomous circuit that you can complete for Claude, and it allows Claude to hill climb on a given task or a given success criteria.

— Sid Bundusaria

So /loop is a way to run a prompt at a specific interval in Claude Code.

— Sid Bundusaria

Why existing dev tooling assumes humans, not agents
Agent blind spots vs human assumptions in codebases
Verification loops and hill-climbing to a success state
Browser-driven UX verification (Chrome MCP, Playwright)
Unblocking loops: auth and state setup
Skills as reusable, self-improving runbooks
Multi-session management: desktop, agents view, cloud, remote control
Background automation: /loop and routines for recurring chores

High quality AI-generated summary created from speaker-labeled transcript.
