CHAPTERS
Why you’re “babysitting” agents (and why it’s a problem)
Sid Bundusaria frames the core issue: as models get smarter, developers spend more time waiting, monitoring, and QA’ing agent output. The talk’s goal is to reclaim attention and time by changing how we run and manage Claude-driven work.
Table stakes: CLAUDE.md, tool connections, and remote environments
Before advanced workflows, Sid emphasizes three prerequisites that dramatically improve agent performance and reliability. These setup steps give Claude better instructions, richer context, and resilience beyond your laptop session.
Rethinking developer tooling for a world where agents write code
Most tooling was built to make humans faster, but agents now produce much of the code. Some tools transfer well, but the gaps come from assumptions humans make that agents don’t.
Roadmap overview: Verification → Multi-Clauding → Background loops
Sid lays out a three-part system that compounds: teach Claude to verify its own work, then safely parallelize multiple sessions, then remove the keyboard from the critical path via background automation.
How humans verify work—and how to teach Claude the same playbook
Sid asks the audience to reflect on how they validated their last project, then maps common engineering verification steps. The insight: Claude can follow the same iterative build-run-check-test-deploy cycle if given the right tools and instructions.
The core concept: “loops” as autonomous hill-climbing for reliability
The talk’s most important idea is creating an autonomous loop where Claude writes code, checks failures, debugs, and repeats until success criteria are met. This produces higher-quality PRs and reduces the need for human babysitting.
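A minimal Python sketch of the loop described here, not code from the talk. The names run_until_green, Check, and Reviser are hypothetical, and the revise callback stands in for whatever the agent does to fix its work. It shows only the control flow: check, collect failures, revise, repeat until every check passes or a round limit is hit.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a check inspects a candidate and returns (passed, feedback).
Check = Callable[[str], Tuple[bool, str]]
# Hypothetical reviser: takes the candidate plus failure feedback, returns a new candidate.
Reviser = Callable[[str, List[str]], str]


def run_until_green(
    candidate: str,
    checks: List[Check],
    revise: Reviser,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Return (final candidate, number of revisions) once all checks pass."""
    for revisions in range(max_rounds + 1):
        # Gather feedback from every failing check in this round.
        failures = []
        for check in checks:
            passed, feedback = check(candidate)
            if not passed:
                failures.append(feedback)
        if not failures:
            return candidate, revisions
        if revisions == max_rounds:
            break
        # Feed the failures back in and try again.
        candidate = revise(candidate, failures)
    raise RuntimeError(f"still failing after {max_rounds} revisions: {failures}")
```

The round limit matters: without it, a loop that cannot satisfy its own success criteria would run indefinitely instead of handing the problem back to a human.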
Building verification loops in practice (front-end example)
Sid makes verification concrete by breaking a UX verification loop into four parts: run the app, control a browser, prove the behavior, and unblock auth/state issues. The same pattern extends to backend, E2E, and infra verification.
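The four parts can be pictured as an ordered checklist that stops at the first blocker. The sketch below is illustrative only: Step and run_ux_verification are hypothetical names, and each action is a placeholder for real tooling such as starting the dev server or driving a browser. The order simply follows the summary above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True when the step succeeded


def run_ux_verification(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the first failing step's name, or None if all pass."""
    for step in steps:
        if not step.action():
            # Stop early: later steps depend on earlier ones succeeding.
            return step.name
    return None


def build_steps(
    run_app: Callable[[], bool],
    control_browser: Callable[[], bool],
    prove_behavior: Callable[[], bool],
    unblock_auth_state: Callable[[], bool],
) -> List[Step]:
    """Assemble the four parts named in the talk; the callables are assumptions."""
    return [
        Step("run the app", run_app),
        Step("control a browser", control_browser),
        Step("prove the behavior", prove_behavior),
        Step("unblock auth/state issues", unblock_auth_state),
    ]
```

Returning the name of the failing step gives the agent a specific blocker to work on, rather than a bare pass or fail.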
Packaging verification as a reusable, self-improving “skill”
To share and reuse verification loops, Sid recommends encoding them as skills—saved context/instructions Claude can apply repeatedly. Skills can be designed to update themselves whenever a new blocker is found, becoming team-wide, self-documenting assets.
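One way to picture the self-updating behavior is a helper that appends each newly discovered blocker and its workaround to the skill file. This is a sketch under assumptions, not the mechanism shown in the talk: record_blocker, the "## Known blockers" heading, and the one-line entry format are all hypothetical, and the blockers section is assumed to be the last section of the file.

```python
from pathlib import Path

# Hypothetical section name; assumed to be the final section of the skill file.
BLOCKERS_HEADING = "## Known blockers"


def record_blocker(skill_file: Path, blocker: str, workaround: str) -> bool:
    """Append a blocker entry to the skill file; return False if already recorded."""
    text = skill_file.read_text(encoding="utf-8") if skill_file.exists() else ""
    entry = f"- {blocker}: {workaround}"
    if entry in text.splitlines():
        return False  # avoid duplicate entries across sessions
    if text and not text.endswith("\n"):
        text += "\n"
    if BLOCKERS_HEADING not in text:
        # First blocker: create the section at the end of the file.
        text += f"\n{BLOCKERS_HEADING}\n"
    text += entry + "\n"
    skill_file.write_text(text, encoding="utf-8")
    return True
```

Because the entries accumulate in a file the whole team shares, each blocker only needs to be discovered once.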
Demo setup: MonkeyType as a realistic full-stack target
Sid introduces MonkeyType, a typing-test app, as the demo project because it resembles real production stacks. It is written in TypeScript with an Express backend plus MongoDB and Redis, and it is open source, so attendees can explore it themselves.
Live demo: using Chrome MCP to validate UI behavior
In a fresh Claude Code session, Sid shows Claude checking the running dev server and validating UI actions through the Chrome MCP. Claude navigates the site, types into the test, and changes settings to confirm behavior.
Turning the demo actions into a formal verification skill file
Sid asks Claude to summarize the verification steps into a skill file so it can be reused. The generated skill captures bringing up the stack, enabling browser tooling, and running smoke tests via automated interactions.
Feature demo: confetti on mistypes + self-verification loop in action
Sid requests a new feature (a confetti animation on mistypes) and instructs Claude to verify it using the new skill. Claude implements the feature, hits lint errors, fixes them, and re-verifies, demonstrating iterative hill-climbing without manual QA.
Multi-Clauding without attention overload (desktop, agents, web, remote control)
With verification improving reliability, Sid turns to running many Claude sessions in parallel while protecting human attention. He covers GUI and terminal tools, cloud-hosted sessions, and phone-based remote control for approvals and prompts.
Background loops: /loop and Routines to remove the keyboard from the hot path
Sid closes with automation patterns for recurring engineering chores like PR babysitting, CI health, doc updates, and feedback triage. /loop runs prompts on an interval inside a session, while Routines trigger remote sessions on schedules or events.
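The interval idea behind /loop can be illustrated with a small Python analogue. This is not how /loop is implemented and it does not show the command's syntax. The run_on_interval function and its parameters are hypothetical, and the task stands in for a recurring prompt such as checking CI status.

```python
import time
from typing import Any, Callable, List


def run_on_interval(
    task: Callable[[], Any],
    interval_seconds: float,
    max_runs: int,
    should_stop: Callable[[Any], bool] = lambda result: False,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Run task repeatedly with a pause between runs; return all results."""
    results = []
    for run in range(max_runs):
        result = task()
        results.append(result)
        if should_stop(result):
            break  # e.g. CI went green, nothing left to babysit
        if run < max_runs - 1:
            sleep(interval_seconds)  # injectable so tests need not wait
    return results
```

The stop condition and the run cap reflect the point of the pattern: the loop does the waiting, and the human is only pulled back in when there is a result worth looking at.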
