
Giving coding agents their own computers: How Cursor built cloud agents

Local agents hit a ceiling: they compete for your machine's resources, can't verify their own work, and bottleneck at one or two tasks at a time. Alexi Robbins, Head of Engineering for Cursor's async agents, shares how Cursor gave each agent its own isolated VM so agents can write code, spin up browsers, test their own changes, and deliver merge-ready PRs in parallel. Cloud agents are now behind 30%+ of Cursor's internal merged PRs.

Alexi Robbins (guest)
May 8, 2026 · 14m · Watch on YouTube ↗

CHAPTERS

  1. Why the bottleneck shifted from model intelligence to tooling and context

    Alexi Robbins frames the current moment: models are strong enough that the limiting factor is humans supplying tools, context, and well-scoped objectives. Cursor’s goal becomes “setting agents free” safely so they can tackle larger tasks with less hand-holding.

  2. Three stages of building autonomous agents: from A→D hand-holding to A→Z systems

    Cursor’s evolution moves from making agents more autonomous, to updating development patterns to exploit better models, to building a system that improves itself. The emphasis shifts from supervising individual tasks to engineering compounding infrastructure.

  3. Onboarding agents like new hires: give them a computer and an environment

    Cursor mirrors human onboarding: new developers receive a machine, a working dev environment, and documentation. In contrast, models were previously “thrown into” a large codebase without the ability to run/test, making effective work surprisingly difficult.

  4. Cloud onboarding agent: exploring repos to learn how to run the system

    Cursor built a cloud onboarding agent (cursor.com/onboard) that inspects a repository to determine how to run it rather than immediately making code changes. It navigates dependencies, services, environment variables, and permissions via an interactive loop with the developer.

  5. Optimizing cloud DevEx: Anydev CLI for starting, waiting, and diagnosing services

    Because cloud agents start from scratch every run, startup and coordination overhead becomes a major time sink. Cursor built an “Anydev” CLI as a Swiss Army knife to start services, wait for readiness, check status, and perform setup tasks like creating accounts.
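The Anydev CLI itself is Cursor-internal, but the core pattern it wraps, launching a service and blocking until a readiness check passes so the agent doesn't race ahead of a half-started system, can be sketched generically. Here is a minimal Python sketch of that pattern; the command, health URL, and helper names are illustrative assumptions, not Anydev's real interface:

```python
import subprocess
import time
import urllib.request

def wait_for_ready(url, timeout=60.0, interval=0.5):
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # service not accepting connections yet; keep polling
        time.sleep(interval)
    return False

def start_service(cmd, health_url, timeout=60.0):
    """Launch a service process and block until its health check passes.

    `cmd` and `health_url` are placeholders for whatever the repo's
    onboarding docs specify; Anydev's actual commands are not public.
    """
    proc = subprocess.Popen(cmd)
    if not wait_for_ready(health_url, timeout):
        proc.terminate()
        raise TimeoutError(f"{cmd[0]} not ready within {timeout}s")
    return proc
```

Wrapping this start-then-wait dance in one command matters because cloud agents pay the cold-start cost on every run; a single reliable entry point removes a whole class of flaky "service wasn't up yet" failures.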

  6. Giving agents the same documentation (and simplified “runbooks”) humans use

    To reduce stalls in edge cases, Cursor provided agent-friendly versions of internal documentation. This ensures agents can resolve tricky scenarios without constant human intervention and increases reliability across repeated runs.

  7. Principles of autonomy: give agents eyes and the tools you have (with security constraints)

    Robbins distills autonomy into two fundamentals: visibility and capability. Agents must see what developers see (apps, state changes, other chats) and be able to do what developers do (run apps, access services) within appropriate security boundaries.

  8. Computer use as the next foundational primitive beyond coding

    Cursor views “computer use” (pixels in, mouse/keyboard out) as a key step toward broader autonomy, with Claude models performing well. GUI navigation is compared to a video game: partial observability, one-way doors, and failure states demand backtracking and metacognition.
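The "pixels in, mouse/keyboard out" loop can be made concrete with a small sketch. Everything here is an assumption for illustration: `capture_screen`, `plan_action`, and `apply_action` stand in for the real screenshot, model, and input-injection layers, which the talk does not show. The loop keeps a history of past actions precisely because of the partial observability and backtracking the video-game comparison describes:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "done"
    payload: Optional[str] = None  # coordinates or text, serialized

def run_gui_agent(capture_screen: Callable[[], bytes],
                  plan_action: Callable[[bytes, List[Action]], Action],
                  apply_action: Callable[[Action], None],
                  max_steps: int = 50) -> List[Action]:
    """Observe-act loop: screenshot in -> model decides -> input injected out."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = capture_screen()              # partial observability: pixels only
        action = plan_action(frame, history)  # history lets the model backtrack
        if action.kind == "done":
            break
        apply_action(action)                  # mouse/keyboard out
        history.append(action)
    return history
```

The `max_steps` cap is one simple guard against the failure states the talk mentions: a GUI agent stuck behind a one-way door should run out of budget rather than loop forever.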

  9. Agent-generated demos for high-bandwidth review and end-to-end validation

    Robbins shows an example where an agent implements a feature and records a demo of the working result. Demos allow developers to validate behavior before diving into code, which becomes especially valuable when juggling many parallel cloud agents.

  10. Scaling work: from “prompt instead of ticket” to long-running autonomous projects

    With more reliable agents, teams shift task intake: small bugs and chores can go directly into prompts rather than accumulating in trackers. Separately, agents can take on larger projects that run longer, expanding the unit of work delegated to the cloud.

  11. Security through freedom: why the cloud made development more enjoyable

    Cloud execution provides isolation and reduces developer anxiety around secrets, environment variables, and resource management. This “security through freedom” encourages more experimentation and delegation, and reduces context-switching burdens for developers.

  12. Compounding reliability: treat failures as system bugs and invest in fixes

    Robbins emphasizes learning from failures: when agents break in repeatable ways, it’s worth debugging and implementing fixes that benefit every future run. Improvements build trust, enabling one-shot success on bigger tasks and driving more investment in the system.

  13. Agent Experience (AX): agents iteratively improving their own workflows

    Cursor formalizes “Agent Experience” as a first-class concern, analogous to Developer Experience. Agents are instructed to report issues as they work, creating a feedback pipeline where issues are collected, triaged, and fixed by humans and agents with the goal of reducing human involvement over time.

  14. Work On the Factory (WCF) and robust fixes via agent-driven validation

    The key meta-skill is WCF (“Work On the Factory”): when something is annoying/broken/confusing, the agent reports it to improve tools rather than brute-forcing. To solve flaky DevEx problems, Cursor has agents validate fixes by launching multiple cloud agent runs as an evaluation set before sending PRs for human review.
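The validate-before-review step, launching many cloud agent runs as an evaluation set and only sending a PR if enough of them succeed, can be sketched as below. `run_cloud_agent` is a hypothetical stand-in for launching one cloud agent run against the proposed fix; Cursor's actual harness is internal:

```python
import concurrent.futures

def validate_fix(run_cloud_agent, runs=10, required_pass_rate=0.9):
    """Launch `runs` independent agent runs in parallel; gate on the pass rate.

    `run_cloud_agent(i)` should return True if run `i` succeeded. For a flaky
    DevEx problem, a single green run proves little; a pass rate across many
    runs is the signal worth sending to human review.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=runs) as pool:
        results = list(pool.map(run_cloud_agent, range(runs)))
    pass_rate = sum(results) / runs
    return pass_rate >= required_pass_rate, pass_rate
```

Treating the fleet of runs as an eval set is what makes the fix "robust" in the talk's sense: the fix is judged statistically, the way the flakiness itself manifests.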
