Stop babysitting your agents

Routines turn Claude Code into a proactive teammate that reads your repo and opens a PR before you've opened your laptop. You'll see one built end to end, learn the trigger, context, and steering decisions behind any routine, and leave just one /schedule command away from your first routine.

May 20, 2026 · 37m

CHAPTERS

0:13 – 1:14
Why “babysitting agents” is wasting your time (and what this talk aims to fix)
Sid Bundusaria frames a common pain: engineers spending too much time waiting on or QA’ing agents instead of doing higher-leverage work. He sets the goal of teaching practical strategies to manage agents so they can work more autonomously.
- Agents are getting smarter, but humans often become “glorified QA testers”
- The objective is to reclaim attention and time through better agent management
- Positioned as an advanced, “Claude Code 301”-style talk
1:14 – 3:17
Prerequisites: CLAUDE.md, tool connections, and running sessions in the cloud
Before moving to advanced workflows, Sid outlines three foundational setup steps that significantly improve outcomes. An audience poll shows that many people haven’t implemented all of these basics yet.
- Write a high-quality CLAUDE.md; it is the single highest-leverage improvement
- Connect day-to-day tools (Slack, Linear, Datadog, BigQuery, etc.) so Claude has richer context
- Use Claude Code on the web (remote environments) so sessions keep running independently of your laptop
- The poll reveals only partial adoption, so Sid encourages starting here
3:17 – 4:50
Rethinking developer tooling for agents, not humans
Traditional tooling was built to accelerate humans, but increasingly agents write the code. Sid explains that while many tools transfer well, agent workflows expose new blind spots.
- Most existing tooling (linters, IDEs, compilers) assumes a human operator
- Good news: many tools still help agents (linters, formatters, symbol servers)
- Bad news: humans assume context that agents don’t have
- Key framing question: what does an agent need that humans take for granted?
4:50 – 5:55
Roadmap: Verification → Multi-Clauding → Background loops
Sid outlines the three-layer strategy that builds toward autonomous, high-throughput development. Verification makes agents reliable, multi-Clauding scales output, and loops remove the keyboard from the critical path.
- Verification: teach Claude to check its own work
- Parallelization (multi-Clauding): run multiple Claudes once reliability is higher
- Background loops: keep Claude doing useful work without constant supervision
5:55 – 9:10
How humans verify work—and how to transfer that playbook to Claude
The talk decomposes typical engineering verification into repeatable steps. Sid argues Claude can follow the same iterative process if given the right tools and instructions.
- Human verification steps: write code → build/typecheck → run → inspect side effects (logs, DB) → run tests → deploy
- These iteration loops build confidence throughout development, not just at the end
- Agents can replicate this process if equipped with the right tooling and explicit instructions
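
The human playbook above is essentially an ordered list of checks in which any failure sends you back to the code. A minimal sketch of that ordering, assuming each step is a shell command whose exit code signals pass or fail; `run_checks`, `StepResult`, and the example commands are hypothetical illustrations, not part of Claude Code:

```python
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    ok: bool
    output: str


def run_checks(steps: list[tuple[str, list[str]]]) -> list[StepResult]:
    """Run (name, argv) checks in order and stop at the first failure."""
    results: list[StepResult] = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        ok = proc.returncode == 0
        # Keep stdout and stderr together: on failure this is the debugging evidence.
        results.append(StepResult(name, ok, proc.stdout + proc.stderr))
        if not ok:
            break  # later steps (tests, deploy) are pointless if an earlier one fails
    return results


# Hypothetical commands for a Node/TypeScript project; adjust to your repo's scripts.
EXAMPLE_STEPS = [
    ("typecheck", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npm", "run", "lint"]),
    ("test", ["npm", "test"]),
]
```

Returning the captured output of the failing step is the important design choice: it is exactly the evidence an agent needs for its next debugging pass.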
9:10 – 12:12
The core idea: put Claude into a self-correcting verification loop
Sid introduces loops as the key mechanism for autonomous progress. Claude writes code, checks for failures, debugs, and repeats until success criteria are met—producing higher-quality PRs with less human oversight.
- A loop is an autonomous circuit that enables “hill climbing” toward success
- Cycle: implement → verify → detect failure → debug → re-verify
- Personal example: diagnosing and fixing a broken signup button using the browser and logs
- Goal: give Claude the tools and instructions to reach a known-success state
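
The cycle above can be sketched as a bounded retry around implement and verify, where each failure message becomes the input to the next attempt. This is an illustration under stated assumptions, not how Claude Code works internally: `implement` and `verify` are hypothetical callbacks standing in for Claude editing code and running checks, and `max_iterations` is an assumed safety cap.

```python
from typing import Callable, Optional


def verification_loop(
    implement: Callable[[Optional[str]], None],
    verify: Callable[[], Optional[str]],
    max_iterations: int = 5,
) -> bool:
    """Implement, verify, and feed each failure back in until verify() passes.

    `verify` returns None on success, or a description of the failure.
    """
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        implement(feedback)   # first pass gets None; later passes debug the last failure
        feedback = verify()   # re-verify after every change
        if feedback is None:
            return True       # known-success state reached
    return False              # budget exhausted: hand the last failure to a human
```

The cap matters: a loop that cannot reach its success criteria should stop and surface the failure rather than spin indefinitely.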
12:12 – 14:15
Making loops concrete: run app, drive browser, prove results, unblock auth/state
Sid breaks “verification loop” creation into practical building blocks, especially for UX/front-end checks. He emphasizes planning for common blockers like authentication and test data/state.
- Start the app (e.g., a dev server) so verification can happen continuously
- Use browser automation (e.g., /chrome, Playwright, or other MCPs)
- Prove correctness with before/after evidence (screenshots, observed behavior)
- Unblock common issues: auth identities and pre-seeded state or test data
- Prefer dynamic, agent-friendly setup over overly rigid scripts
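
Starting the app is the first building block, and an agent needs a reliable signal that the app is actually up before it drives a browser against it. A minimal readiness check using only the Python standard library, assuming the app exposes some HTTP endpoint that returns 200 once it is ready; `wait_until_ready` and the URL in the comment are hypothetical, not MonkeyType's actual port or route:

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, reset, timeout, or non-2xx status
        time.sleep(interval)
    return False


# Hypothetical usage before browser checks:
# wait_until_ready("http://localhost:3000/health")
```

Polling with a deadline, rather than sleeping a fixed number of seconds, is the kind of dynamic setup the talk favors over rigid scripts.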
14:15 – 15:48
Packaging verification as a reusable, self-improving “skill”
To share loops across a team (and future you), Sid recommends encoding them as skills. Skills can be made self-documenting by instructing Claude to update the skill whenever it hits a blocker.
- Skills store reusable context and instructions, which makes them ideal for verification loops
- Add “self-improving” instructions so the skill updates itself when Claude gets blocked
- Team-wide benefit: collective contributions reduce repeated troubleshooting
- The Claude Code team uses a single, evolving verification skill internally
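
A self-improving skill mostly needs a stable place where fixes accumulate. The sketch below shows one possible shape, assuming a hypothetical "Known blockers" section in the skill file. In practice Claude edits the file itself when instructed to; `record_blocker` only illustrates the update it should make (file each fix once, never duplicate it):

```python
SECTION = "## Known blockers"  # hypothetical section heading


def record_blocker(skill_md: str, blocker: str, fix: str) -> str:
    """Return skill text with a `blocker: fix` note filed under SECTION.

    Creates the section at the end if it is missing and skips exact duplicates,
    so repeated runs do not bloat the skill.
    """
    entry = f"- {blocker}: {fix}"
    lines = skill_md.rstrip("\n").split("\n")
    if entry in lines:
        return skill_md
    if SECTION not in lines:
        lines += ["", SECTION]
    start = lines.index(SECTION)
    # The section ends at the next Markdown heading (or end of file).
    end = start + 1
    while end < len(lines) and not lines[end].startswith("#"):
        end += 1
    # Step back over blank lines so the new entry joins the existing list.
    while end > start + 1 and lines[end - 1] == "":
        end -= 1
    lines.insert(end, entry)
    return "\n".join(lines) + "\n"
```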
15:48 – 17:19
Demo setup: MonkeyType as a realistic full-stack testbed
Sid introduces the demo application and explains why it’s a good representative system for verification loops. MonkeyType includes a TypeScript frontend, Express backend, and multiple persistence layers.
- MonkeyType is an open-source typing-test app used for the live demo
- It represents real-world complexity: TypeScript + Express + MongoDB + Redis
- Demo goal: build a verification loop, then ship a feature using it
17:19 – 22:38
Live demo: driving the app with Chrome MCP and generating a verification skill file
In a fresh Claude Code session, Sid confirms the dev server is running and uses /chrome to let Claude navigate and interact with the UI. He then asks Claude to summarize those steps into a SKILL.md file for repeatable verification.
- Confirm dev server and backend health, then enable the Chrome MCP via /chrome
- Claude navigates to localhost, types in the UI, changes settings, and verifies that they persist
- The human-guided steps become the seed for an automated loop
- Claude writes a structured SKILL.md capturing the bring-up and smoke-test steps
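
The skill Claude wrote in the demo captured bring-up and smoke-test steps. A sketch of rendering such a file, assuming the SKILL.md layout Claude Code skills use (YAML frontmatter with `name` and `description`, followed by Markdown instructions); `render_skill`, the skill name, and the step wording are hypothetical paraphrases of the demo, not the file Claude produced:

```python
def render_skill(name: str, description: str, steps: list[str]) -> str:
    """Render a minimal SKILL.md: YAML frontmatter followed by numbered steps."""
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",
        "",
        f"# {name}",
        "",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


SMOKE_TEST = render_skill(
    "verify-monkeytype",  # hypothetical skill name
    "Bring up MonkeyType locally and smoke-test it in Chrome. Use after any UI change.",
    [
        "Start the dev server and confirm the backend health check passes.",
        "Run /chrome, then open the app on localhost.",
        "Type a few words in the test and confirm the input registers.",
        "Change a setting, reload the page, and confirm it persisted.",
    ],
)
```

The `description` line matters most, since it is what tells Claude when the skill applies.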
22:38 – 26:42
Feature demo: implement confetti-on-mistype and watch the loop fix lint/test issues
Sid assigns a new UI feature and explicitly requires Claude to use the newly created verification skill. Claude iterates: makes changes, encounters lint errors, fixes them, and re-verifies—showing the loop in action.
- Task: show a confetti animation when the user mistypes
- Claude uses the verification skill rather than relying on manual checking
- Observed iteration: code changes → lint errors → fixes → re-verification
- Result: the feature works in the UI, demonstrating autonomous hill climbing
26:42 – 33:15
Multi-Clauding without attention overload: desktop app, agents view, web, remote control
Sid explains that parallel sessions can overwhelm attention and proposes four approaches to manage many sessions. He highlights tooling that centralizes session management and surfaces what needs human input first.
- Attention becomes the bottleneck beyond roughly 4–5 simultaneous sessions
- The desktop app provides a control plane: a sidebar across surfaces, plus pinning, renaming, and color-coding sessions
- Claude Agents brings a similar session list to the terminal and sorts it by attention needed
- Claude Code on the web decouples sessions from your laptop and your availability
- Remote control enables phone-based prompts and approvals, plus notifications
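
Sorting sessions by attention needed amounts to a priority ordering over session states. A hypothetical sketch of that idea; the status names, their ranking, and the tie-break on waiting time are assumptions for illustration, not how the desktop app or the terminal agents view actually orders sessions:

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical states; a lower value means "needs a human sooner".
    NEEDS_INPUT = 0       # blocked on a question or permission prompt
    FAILED = 1            # stopped with an error
    READY_FOR_REVIEW = 2  # finished, waiting for a human look
    RUNNING = 3           # making progress, leave it alone


@dataclass
class Session:
    name: str
    status: Status
    waiting_minutes: int  # time spent in the current status


def triage(sessions: list[Session]) -> list[Session]:
    """Most urgent status first; within a status, longest-waiting first."""
    return sorted(sessions, key=lambda s: (s.status, -s.waiting_minutes))
```

Putting running sessions last is the point: anything making progress on its own should not compete for your attention.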
33:15 – 36:19
Background autonomy: /loop and “routines” for ongoing engineering chores
Sid targets repetitive engineering overhead—PR babysitting, CI failures, doc updates, and feedback triage. He introduces /loop for periodic prompts and routines for remote, scheduled or event-driven execution in cloud containers.
- Many recurring tasks don’t require constant human involvement
- /loop runs a prompt at intervals (e.g., “every 10 minutes, babysit my open PRs”)
- Routines are /loop-like automation that runs remotely in cloud containers
- Triggers can be time-based or event-based; each trigger launches a session with a prompt
- Examples: daily doc updates; periodic feedback and issue summaries posted to Slack
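
Routines, as described, pair a trigger (time-based or event-based) with a prompt that starts a new session. A minimal model of that dispatch decision, assuming hypothetical `Routine` fields and a hypothetical `ci_failed` event name; it does not reflect how Claude Code actually configures or runs routines:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Routine:
    prompt: str
    every: Optional[timedelta] = None  # time-based trigger
    on_event: Optional[str] = None     # event-based trigger (hypothetical event name)
    last_run: Optional[datetime] = None


def due(routines: list[Routine], now: datetime, events: set[str]) -> list[str]:
    """Return the prompts whose triggers fired; each would start a fresh remote session."""
    fired = []
    for r in routines:
        timed = r.every is not None and (r.last_run is None or now - r.last_run >= r.every)
        evented = r.on_event is not None and r.on_event in events
        if timed or evented:
            r.last_run = now
            fired.append(r.prompt)
    return fired


# Hypothetical routines modeled on the talk's examples (doc updates, CI failures).
docs = Routine("Update the docs to match yesterday's merged changes.", every=timedelta(days=1))
ci = Routine("Investigate the failing CI run and open a fix PR.", on_event="ci_failed")
```

Each fired prompt starts a fresh session rather than resuming an old one, which keeps routines independent of whatever you happen to have open.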
36:19 – 37:07
Putting it all together: reliable agents, parallel throughput, and hands-off workflows
Sid summarizes the stacked system: verification makes agents trustworthy, multi-Clauding scales output, and loops/routines keep work moving without a keyboard in the hot path. He closes with the goal of spending attention only where it matters most.
- Verification loops increase correctness and confidence in agent output
- Multi-session management tools reduce attention fragmentation
- Loops and routines automate continuous chores and long-running work
- End state: delegate more to Claude with higher reliability and focus on high-value tasks
