At a glance
WHAT IT’S REALLY ABOUT
A year of Claude Code: agents, loops, and safer autonomy
- Claude Code’s biggest productivity unlock came from turning repeated mistakes into reusable skills (e.g., updating Claude.md) so agents improve and can run longer with less supervision.
- “Verification” for agents is less about traditional unit tests and more about the agent’s ability to actually run workflows end-to-end (apps, simulators, environments) and self-check results.
- Routines operationalize automation by continuously monitoring inputs (issues, bug reports) and generating fixes/PRs proactively, reducing human time spent on CI, code review, and maintenance chores.
- Auto mode replaces constant permission prompting with model-based security classification, supported by transcripts, red teaming, and evals to earn enough trust for unattended execution.
- The organization-wide impact is role convergence (PM/design/finance coding) and a shift from prompt/context engineering toward “context minimalism,” as newer models need less scaffolding.
IDEAS WORTH REMEMBERING
5 ideasTreat failures as productizable skills, not one-off corrections.
Instead of telling Claude “do it differently” each time it errs, they update Claude.md or create a skill so the fix becomes durable and compounds over time, enabling longer autonomous runs.
Agent verification means “can it run and observe reality,” not just “did tests pass.”
They emphasize end-to-end execution—spinning up apps, using simulators, clicking through UI with computer-use, and validating behavior—because agent work often fails in environment and workflow steps beyond unit tests.
Routines are the first “obvious” programmatic use of agents at scale.
By continuously listening to GitHub issues/bug reports and generating candidate fixes and PRs, routines convert reactive maintenance into a background process, often fixing problems before the original author responds.
Auto mode increases throughput by removing humans from 99% of approvals.
Boris prefers auto mode over plan mode because newer models don’t need explicit planning artifacts, and auto mode lets him start work and switch to other agents without watching tool prompts.
Security for autonomy must be empirical and adversarial.
They earned trust in auto mode by labeling thousands of real trajectories, using red teamers to attempt prompt injection/hacks, converting attacks into evals, and iterating until suspicious actions are reliably denied.
WORDS WORTH SAVING
5 quotesLike, now I just have, like, armies of agents that are doing stuff. Like, I'm prompting one agent, or I have, like, an agent that's, like, prompting agents that's prompting agents-
— Boris Cherny
I think it's just, like, the most important idea when working on this stuff is, like, every single time Claude makes a mistake, I don't tell Claude to do it differently, I tell it to write it to the Claude MD or to, like, make a skill or, or something to do it differently.
— Boris Cherny
It's just human nature when you accept ninety-nine percent of requests that your eyes just glaze over when you read it. And so actually we feel that auto mode is more safe than reading every single permission prompt because it means that you're only paying attention to the most important thing and not, like, being spammed a bunch of things that are just ninety-nine percent yes.
— Cat Wu
I don't write the source code. I talk to an agent, and the agent writes the source code for me.
— Boris Cherny
But with the models of today, you don't do any of this. You give it the minimal possible system prompt, the minimal possible tools, and then you let the model figure it out.
— Boris Cherny
High quality AI-generated summary created from speaker-labeled transcript.
