At a glance
WHAT IT’S REALLY ABOUT
Cursor’s cloud agents: onboarding, computer-use autonomy, and a self-improving workflow loop
- Cursor’s core thesis is that as models improve, the bottleneck shifts from intelligence to providing tools, context, and objectives that let agents operate autonomously.
- They built a cloud onboarding agent that explores a repo specifically to figure out how to run it (services, env vars, permissions) and returns an interactive demo for developers to review.
- To reduce repeated cloud setup friction, they created DevEx infrastructure (e.g., an Anydev CLI) so agents can start services, wait reliably, check status, and handle common tasks like test accounts and third-party sign-ins.
- They argue “computer use” (pixels in, mouse/keyboard out) is a foundational autonomy primitive, enabling end-to-end GUI navigation and higher-bandwidth validation via recorded demos.
- Cursor evolved toward “building the system that builds the system”: agents report workflow pain (“Work On the Factory”), issues are triaged as technical, permission, or ignorance, and fixes are validated via multi-agent evaluation to increase trust and adoption.
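The triage step above can be sketched as a simple classifier. This is a hypothetical illustration of bucketing agent-reported friction into the three categories named in the talk (technical, permission, ignorance); the keyword rules are illustrative, not Cursor's implementation.

```python
# Hypothetical triage sketch: bucket an agent's friction report as
# "permission" (access is missing), "technical" (the tool is broken),
# or "ignorance" (the agent lacked context or documentation).
def triage(report: str) -> str:
    text = report.lower()
    if any(k in text for k in ("denied", "forbidden", "unauthorized", "permission")):
        return "permission"
    if any(k in text for k in ("crash", "timeout", "broken", "error", "fails")):
        return "technical"
    return "ignorance"

print(triage("403 Forbidden when reading secrets"))    # permission
print(triage("dev server crashes on startup"))         # technical
print(triage("could not find how to seed test data"))  # ignorance
```

A real pipeline would likely use a model rather than keywords, but the contract is the same: every report lands in exactly one bucket so the right fix (tooling change, access grant, or documentation) can be routed.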
IDEAS WORTH REMEMBERING
5 ideas
Treat agent autonomy like human onboarding: start by giving them a real “computer.”
Cursor mirrors employee onboarding: environment setup, documentation, and the ability to run the app/services, because sight-reading code without execution creates bottlenecks and frustration.
Optimize the cloud dev environment because every inefficiency multiplies across runs.
Cloud agents restart from scratch each time, so missing “waits,” status checks, and service-management tooling causes widespread idle time; investing in DevEx triggers a positive feedback loop of more agent usage and value.
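The “waits” mentioned above follow a standard pattern: poll a readiness probe with a timeout instead of sleeping for a fixed interval, so agents neither hang on a dead service nor race one that is still starting. This is a minimal sketch of that pattern; the function names and the fake probe are illustrative assumptions, not the Anydev CLI's actual API.

```python
import time

def wait_until_ready(probe, timeout_s=60.0, interval_s=0.5):
    """Poll probe() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False

# Illustrative probe: a service that becomes ready on the third poll.
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until_ready(fake_probe, timeout_s=5.0, interval_s=0.01))  # True
```

Because every cloud run rebuilds its environment from scratch, shaving seconds of blind sleeping out of each startup compounds across thousands of parallel agent runs.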
“Eyes” are a prerequisite for autonomy and debugging at scale.
Agents need visibility into what humans see (running app state, changes made during testing, even other agents’ chats) to prevent blind spots and reduce back-and-forth.
Computer-use capability is less about clicking accuracy and more about navigation intelligence.
GUI work resembles a video game with partial observability, one-way doors, and failure states, requiring metacognition and backtracking—skills they highlight as strengths of Claude-family computer-use models.
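The video-game framing above can be made concrete: with one-way doors and game-over states, navigation needs checkpoints and backtracking, not just accurate clicks. The sketch below is a hypothetical illustration using a tiny page graph; real computer-use agents operate on pixels and actions, but the control structure is analogous.

```python
# Hypothetical navigation sketch: search a small GUI "page graph" for a
# goal, treating "game_over" as an unrecoverable failure state. Each
# stack entry is a checkpoint the search can restore and backtrack to.
PAGES = {
    "home":     {"delete_account": "game_over", "open_settings": "settings"},
    "settings": {"save": "done"},
}

def navigate(start: str, goal: str):
    """Return a list of actions reaching goal, or None if unreachable."""
    stack = [(start, [])]            # saved checkpoints: (page, actions so far)
    seen = {start}
    while stack:
        page, path = stack.pop()     # restore the most recent checkpoint
        if page == goal:
            return path
        if page == "game_over":
            continue                 # failure state: abandon this branch
        for action, nxt in PAGES.get(page, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [action]))
    return None

print(navigate("home", "done"))  # ['open_settings', 'save']
```

The metacognition the talk highlights is exactly this: knowing when a branch is dead, abandoning it, and resuming from an earlier known-good state rather than pressing forward.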
Recorded, end-to-end demos make agent output reviewable before reading code.
When many cloud agents run in parallel, a demo provides a high-bandwidth sanity check that the feature works, reducing the cost of context switching and code-heavy review.
WORDS WORTH SAVING
5 quotes
Models are getting really good. And for more and more work, the bottleneck is no longer the model intelligence. The bottleneck is humans giving the models the tools and the context and the increasingly ambitious tasks and objectives to go flex their potential.
— Alexi Robbins
So instead of spending your day hand-holding agents from task A to D, you take that time to build the system that can solve for A to Z.
— Alexi Robbins
The foundational primitive we believe for agent autonomy is computer use.
— Alexi Robbins
If coding is like chess, where you can see all the pieces out on the board, navigating these GUIs is more like a video game, where you can only see a little slice at a time. There are one-way doors. There are game over states that you can get into.
— Alexi Robbins
Work On the Factory is the idea that when something is annoying, broken, or confusing, you take a moment to report it, so we can improve the tools and workflows rather than just grinding through.
— Alexi Robbins
High-quality AI-generated summary created from a speaker-labeled transcript.