Claude
How Spotify runs agents across 20M+ lines of code, with Niklas Gustavsson
At a glance
WHAT IT’S REALLY ABOUT
Spotify scales agentic coding and verification across massive monorepos
- Spotify built agent-driven automation because its codebase grew far faster than its engineering headcount, making manual migrations and maintenance unsustainably slow.
- The internal platform Honk evolved from deterministic, script-based “fleet management” to Claude Agent SDK-powered agents that can execute tasks, run CI, and safely automate PRs at scale.
- Verification and test automation are positioned as the critical enabling layer for closed-loop agent development, allowing auto-merges without requiring every owning team to review each change.
- Spotify reports measurable productivity gains (a 75%-plus improvement in PR frequency attributed to AI tooling, and roughly 73% of PRs AI-authored) while aiming to keep quality metrics neutral through ongoing reliability investments.
- Beyond engineering, Spotify is expanding agent-enabled prototyping so designers, PMs, and executives can build and share working app prototypes quickly via an internal prototype “app store.”
IDEAS WORTH REMEMBERING
Agents became viable once Spotify stopped treating changes as one-shot prompts.
Early attempts failed when they simply fed code to a model and requested a full transformation; success rates improved with decomposition, iterative workflows, and (initially) a judging/verification step on the resulting PRs.
Model quality reduced the need for an explicit “judge,” but not for verification.
Honk previously used a judge to lift PR success rates from roughly 20–30% to about 80%; as models improved, Spotify removed the judge, yet it still relies heavily on CI and tests as the core correctness gate.
Test automation is the price of safely auto-merging agent-generated changes.
Spotify’s shift from team-reviewed changes to auto-merged automated PRs forced stronger component-level tests, because ownership teams are no longer guaranteed to be in the loop for every change.
Standardization improves agent performance as much as it improves human productivity.
Consistent frameworks, patterns, and tooling reduce ambiguity, letting agents “learn by example” from nearby code in large monorepos instead of encountering many divergent ways to do the same thing.
Large monorepos can be a strength for agents when retrieval and code reuse work well.
Despite concerns about indexing and size, Niklas reports Claude performs well in Spotify’s 20M+ LOC monorepo, often leveraging existing internal patterns to implement new changes more reliably.
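The first three ideas above (iterate instead of one-shot prompting, keep CI as the correctness gate, auto-merge only what passes) can be sketched as a small closed loop. This is a minimal illustration under stated assumptions, not Honk's implementation: `run_closed_loop`, `can_auto_merge`, `fake_propose`, and `fake_verify` are hypothetical names, the agent call and the CI run are replaced by stand-in functions, and the retry limit of 3 is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    patch: str      # the change the agent proposed
    passed: bool    # whether verification (e.g. CI) accepted it
    feedback: str   # failure details fed into the next attempt


def run_closed_loop(
    task: str,
    propose: Callable[[str, str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> List[Attempt]:
    """Propose a patch, verify it, and retry with the failure feedback."""
    history: List[Attempt] = []
    feedback = ""
    for _ in range(max_attempts):
        patch = propose(task, feedback)
        passed, feedback = verify(patch)
        history.append(Attempt(patch, passed, feedback))
        if passed:
            break
    return history


def can_auto_merge(history: List[Attempt]) -> bool:
    """Hypothetical policy: merge without human review only if the last attempt passed."""
    return bool(history) and history[-1].passed


def fake_propose(task: str, feedback: str) -> str:
    # Stand-in for an agent call: it "fixes" the patch once it sees failure feedback.
    return f"{task}: fixed" if feedback else f"{task}: draft"


def fake_verify(patch: str) -> Tuple[bool, str]:
    # Stand-in for CI: only patches marked "fixed" pass.
    if patch.endswith("fixed"):
        return True, ""
    return False, "unit tests failed"


history = run_closed_loop("migrate logging API", fake_propose, fake_verify)
print(len(history), can_auto_merge(history))  # prints "2 True"
```

The point of the sketch is that the merge decision depends only on the verification result, which is why the strength of the test suite decides how much can safely be auto-merged.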
WORDS WORTH SAVING
I found myself not using an IDE anymore. And like the, the way that I was working had completely changed. It changed that I had not seen in the 30 years that I've been doing this type of work.
— Niklas Gustavsson
Claude works amazingly well in those repositories and, um, I think one of the things we found is how good Claude is looking at other code, uh, in the repository to get, I guess, inspiration for the problem you're trying to solve.
— Niklas Gustavsson
We make something like 4,500 production deployments every day.
— Niklas Gustavsson
We're seeing a 75%-plus improvement in PR frequency, for example, uh, that we can directly attribute to AI tooling, and I think by now 73-ish percent of PRs are directly attributed to being AI authored.
— Niklas Gustavsson
Those types of things are, were unimaginable a year ago, and now we're doing them every day.
— Niklas Gustavsson
High quality AI-generated summary created from speaker-labeled transcript.