How Stripe deploys 1,300 AI-written PRs per week

How I AI · Mar 25, 2026 · 41m

Steve Kaliski (guest), Claire Vo (host)

Topics: Minions concept and branding; Activation energy vs execution in large orgs; Cloud devboxes and parallel agent workstreams; Agent loop, system prompts, and one-shot PR creation; Goose agent harness (forked/customized); CI/testing, blue-green deploys, and review at scale; Machine-to-machine payments and agent receipts; API-first, ephemeral businesses selling to agents

In this episode of How I AI, host Claire Vo talks with Steve Kaliski, a software engineer at Stripe, about how the company lands about 1,300 AI-written pull requests per week with no human assistance besides review.

Stripe’s Slack-triggered AI “Minions” ship 1,300 reviewed PRs weekly

Stripe uses Slack-activated AI “Minions” that provision isolated cloud dev environments, run an agent loop, and often deliver a ready-to-review pull request from a single prompt.

The biggest benefit is reduced “activation energy” in large organizations: work can start from natural collaboration surfaces (Slack/Jira/Docs) rather than requiring an engineer to manually set up context in an editor.

Minions ride on top of strong developer experience investments—hosted devboxes, internal docs, CI/testing, and standardized workflows—which increase agent success rates in a massive codebase.

Stripe handles high volume (about 1,300 AI-written PRs/week) by relying on robust CI, test coverage, and safe deployment practices, shifting human effort from authoring to review and product judgment.

A second demo shows agents as economic actors via a “machine payment protocol,” where Claude spends small amounts on third-party APIs (browser sessions, search, mail) and provides an itemized receipt plus a Stripe Climate offset.
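
The Minion flow summarized above (an emoji reaction starts a run, each run gets its own isolated environment, and many runs proceed in parallel) can be sketched roughly as follows. This is a minimal illustration, not Stripe's implementation: every name here (`TRIGGER_EMOJI`, `task_from_reaction`, `run_minion`, `run_in_parallel`) is hypothetical, a temporary directory stands in for a cloud devbox, and the agent loop is reduced to a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional
import tempfile

TRIGGER_EMOJI = "robot_face"  # hypothetical; the episode does not name the emoji


@dataclass
class MinionResult:
    task: str
    workspace: str  # stand-in for an isolated cloud devbox
    pr_title: str   # stand-in for the pull request the agent would open


def task_from_reaction(emoji: str, message_text: str) -> Optional[str]:
    """Turn a reacted-to message into a task prompt, or ignore it."""
    return message_text.strip() if emoji == TRIGGER_EMOJI else None


def run_minion(task: str) -> MinionResult:
    """Hypothetical one-shot run: provision, do the work, report a PR."""
    # A fresh temporary directory plays the role of an isolated devbox.
    with tempfile.TemporaryDirectory(prefix="devbox-") as workspace:
        # Placeholder for the agent loop (model calls, tools, tests).
        Path(workspace, "TASK.md").write_text(task)
        return MinionResult(task, workspace, f"[minion] {task}")


def run_in_parallel(tasks: List[str], max_workers: int = 4) -> List[MinionResult]:
    """Run several independent tasks at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_minion, tasks))
```

Because each run owns its workspace, the runs share no state, which is the property that makes it safe to fan them out in parallel.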

Key Takeaways

Lowering activation energy can matter more than faster execution.

Stripe starts work where ideas already live (Slack/Jira/Docs) and triggers agents with an emoji reaction, reducing the friction between a suggestion and a concrete code change.

Great DX is a prerequisite for reliable agents in large codebases.

Internal docs, “blessed paths,” and standardized workflows make it likelier an agent can find the right place to change code without blowing the context window or wandering.

Cloud environments unlock true multi-threaded agentic engineering.

Instead of running multiple worktrees locally, Stripe provisions many isolated devboxes in parallel, allowing multiple agents to work concurrently without laptop constraints.

One-shot prompting is feasible when the harness and tools do the heavy lifting.

The visible system prompt is intentionally minimal (“Implement this task completely…”) because the agent loop, tool access (MCP servers), and prebuilt workflows supply structure and context.

Review scalability depends more on CI confidence than on who wrote the code.

Stripe emphasizes strong test coverage, synthetics, and safe rollout/rollback (e. ...

High agent output shifts bottlenecks to review, ideas, and distribution.

If code generation becomes cheap, organizations must invest in review processes, prioritization, and coordination so increased throughput produces real product value.

Agents will increasingly pay for capabilities, not just consume tokens.

The birthday-party demo shows an agent purchasing micro-access to services (BrowserBase, Parallel, PostalForm) and producing a receipt, foreshadowing agent-native monetization for API-first businesses.
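
The idea of an agent paying per capability and then producing an itemized receipt can be illustrated with a small spending ledger. This is a sketch under stated assumptions, not the machine payment protocol from the demo: the `Ledger` class, its budget cap, and the amounts in the usage lines are all hypothetical, and only the service names come from the episode.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Tuple


@dataclass
class Ledger:
    """Hypothetical per-task spending record with a hard budget cap."""
    budget: Decimal
    items: List[Tuple[str, Decimal]] = field(default_factory=list)

    @property
    def total(self) -> Decimal:
        return sum((amount for _, amount in self.items), Decimal("0"))

    def charge(self, service: str, amount: str) -> None:
        """Record a purchase, refusing any charge that would exceed the budget."""
        cost = Decimal(amount)
        if self.total + cost > self.budget:
            raise ValueError(f"{service} would exceed the {self.budget} budget")
        self.items.append((service, cost))

    def receipt(self) -> str:
        """Render one line per purchase, followed by the total."""
        lines = [f"{service:<12} ${amount:.2f}" for service, amount in self.items]
        lines.append(f"{'TOTAL':<12} ${self.total:.2f}")
        return "\n".join(lines)


# Usage with placeholder amounts (not the figures from the demo).
ledger = Ledger(budget=Decimal("6.00"))
ledger.charge("BrowserBase", "1.00")
ledger.charge("Parallel", "0.50")
ledger.charge("PostalForm", "2.00")
print(ledger.receipt())
```

Using `Decimal` rather than floats keeps small charges exact, and checking the cap before recording a charge means the agent cannot overspend and reconcile later.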

Notable Quotes

At Stripe, we're landing about thirteen hundred PRs that have no human assistance besides review per week.

Steve Kaliski

I don't remember the last time I started work in the text editor.

Steve Kaliski

When you're in larger organizations, there's so much friction that can come between a good idea and getting it into the world.

Claire Vo

Whether the text has been written by Steve or the text has been written by Steve's robot, you still want that CI environment that's providing confidence.

Steve Kaliski

You know you're doing something wrong if I have to load environmental variables to celebrate someone's birthday.

Claire Vo

Questions Answered in This Episode

What specific toolchain steps does a Minion run end-to-end (provisioning, code search, edits, tests, commit, PR creation), and where are the most common failure points?

Stripe reports ~1,300 AI-written PRs/week—what portion are trivial changes (docs/tests) vs production logic, and how do you gate riskier categories?

How is Goose customized or forked inside Stripe, and what capabilities were missing from off-the-shelf commercial agents?

What does the reviewer experience look like for Minion PRs—are there special diff summaries, provenance signals, or automated checklists generated by the agent?

How do you prevent parallel Minions from stepping on each other (conflicting changes, shared dependencies, flaky tests) when many devboxes run concurrently?

Transcript Preview

Steve Kaliski

At Stripe, we're landing about thirteen hundred PRs that have no human assistance besides review per week. A lot of where our work begins is it could be in a Google Doc as we're planning a new feature, or maybe a Jira ticket comes in, or we're talking about something in Slack. I can click an emoji, and then the Minion will sort of attempt to one-shot resolving that prompt using all the tools that are available at Stripe.

Claire Vo

When you're in larger organizations, there's so much friction that can come between a good idea and getting it into the world.

Steve Kaliski

Not only can I have one of these, but I could have many, many of these running in parallel in isolated environments, making isolated changes all at the same time.

Claire Vo

How are you getting all this code review done?

Steve Kaliski

Whether the text has been written by Steve or the text has been written by Steve's robot, you still want that CI environment that's providing confidence that the code that's being changed is safe, and that as it rolls out, we're having blue/green deployments, so you can roll back, too. All that is super critical, independent of the nature of the authoring of it.

Claire Vo

No matter how juiced these laptops are, you get three or four worktrees in, and, like, it starts to sound like an airplane taking off. It's no good. And so I do think on this multi-threading agentic engineering work, cloud environments and virtual environments are so important to unlock velocity.

[upbeat music] Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today we have Steve Kaliski, a software engineer at Stripe, and he's gonna show us how the Stripe team deploys a bunch of Minions to do their engineering work. We'll also watch an agent spend a little bit over five dollars to plan a birthday party, all in Claude Code. Let's get to it.

This episode is brought to you by Optimizely. Most marketing teams aren't short on ideas, but what they are short on is time, and that's exactly what Optimizely Opal gives you back. With AI agents that handle real marketing workflows, you know, like creating content and checking compliance, generating experiment variations, personalizing user experiences, analyzing pages for GEO, even tasks like approvals and reporting. It's your AI agent orchestration platform for marketing and digital teams, plugging seamlessly into the tools you already use, handling the boring busy work, and keeping everything on brand. That leaves marketers with more time to do your actual job. See what Opal can automate for your team by signing up for a free enterprise agentic AI workshop with Optimizely. Find out more at optimizely.com/howiai. Attend live, and you'll get a free pair of Ray-Ban Meta AI glasses.

Steve, I'm so excited to have you on How I AI because I saw the Stripe Minions on the timeline, and one, exceptional branding, don't sue us, and two, I just love the idea that you and your colleagues in the team at Stripe have created not just one agent, but minions all across the company that can help with development work. And I'm so excited for you to show us how that helps you in your day-to-day here. So welcome to How I AI.
