a16z
Aaron Levie and Steven Sinofsky on the AI-Worker Future
CHAPTERS
From chat UI to background “workers”: what AI agents are becoming
The discussion opens by reframing the “talking to a chatbot” form factor as a temporary phase. The panel argues the endpoint is autonomous, background-running software that does real work with minimal user intervention.
- •Agents as background tasks (the Linux shell “&” metaphor)
- •Agentic-ness measured by how much work happens without user involvement
- •Today’s agents still feel like “bad interns,” but are improving
- •Shift from conversational UI to autonomous execution
Defining agency vs autonomy: long-running tasks and self-feedback
They separate autonomy (running for a long time) from true agency (taking outputs and feeding them back as new inputs). This introduces the technical and safety constraints of closed-loop behavior and why check-ins are needed.
- •Long-running inference is easy; self-directed feedback loops are harder
- •Agency as iterative self-guidance (output becomes input)
- •Need for intermediate user validation to prevent wasted work
- •Distribution shift risk when agents consume their own outputs
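The closed loop described above can be sketched in a few lines of Python. This is only an illustration, not anything the panel presented: `run_agent_loop`, `step`, `approve`, and `check_in_every` are hypothetical names, and the toy `step` function stands in for a real model call.

```python
from typing import Callable


def run_agent_loop(
    step: Callable[[str], str],
    approve: Callable[[int, str], bool],
    initial_input: str,
    max_steps: int = 10,
    check_in_every: int = 3,
) -> list[str]:
    """Feed each output back in as the next input, pausing for validation."""
    history: list[str] = []
    current = initial_input
    for i in range(1, max_steps + 1):
        current = step(current)  # the output becomes the next input
        history.append(current)
        # Periodic check-in: a reviewer (human or automated) can halt the run
        # before more work is spent on a result that has drifted.
        if i % check_in_every == 0 and not approve(i, current):
            break
    return history


# Toy stand-ins (assumptions): the "agent" appends a character each step,
# and the reviewer rejects anything that has grown to 10 characters or more.
def toy_step(text: str) -> str:
    return text + "!"


def toy_approve(step_number: int, output: str) -> bool:
    return len(output) < 10


history = run_agent_loop(toy_step, toy_approve, "draft")
print(len(history), history[-1])
```

In this toy run the first check-in passes and the second one stops the loop, so the work wasted on an off-track result is bounded by the check-in interval.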
Why multi-agent decomposition is winning (and the return of Unix-style tools)
Rather than one monolithic AGI-like system, they see a practical architecture emerging: many specialized agents orchestrated together. Smaller, scoped tasks reduce drift and increase reliability, echoing the Unix philosophy of small composable tools.
- •Task subdivision reduces “getting lost” and improves outcomes
- •Specialized “expert” agents per task/domain
- •Orchestration becomes a distinct capability from deep expertise
- •AGI discourse shifting from monolith to systems of agents
Stop anthropomorphizing AI: clarifying AGI and the economics of impact
They argue AGI talk often imports human/robot narratives that distort economic reality. Even very capable systems don’t automatically imply job destruction or immediate equilibrium shifts—costs, incentives, and deployment constraints still matter.
- •Anthropomorphization drives unhelpful fear/hope cycles
- •AGI as a vague term that ‘does infinite work’ in debates
- •Economic feasibility and equilibrium determine real impact
- •AI already boosts human productivity without removing humans from the loop
Predictions, timelines, and exponential progress: why “by 2027” is a trap
The group critiques date-based forecasting, noting that exponential improvement breaks intuition and makes metrics contentious. Instead, they suggest focusing on capability drivers like compute, data, and model/tool integration.
- •Timeline predictions become arguments over definitions/metrics
- •Exponential curves are hard to forecast yet keep compounding
- •Historical cycles: AI winters, then breakthroughs (vision, translation, NLP)
- •Practical lens: compute, data, and applied capability evolution
Recursive self-improvement: feedback loops are real, but not magic
They unpack “recursive self-improvement” as a slogan that hides difficult control-theory questions. Feedback loops can converge, diverge, or asymptote; improvement doesn’t imply runaway superintelligence, especially without well-defined distributions and constraints.
- •Box-and-arrow diagrams oversimplify nonlinear dynamics
- •Control theory: stability of adaptive feedback loops is hard
- •Anthropomorphic leaps turn ‘improve’ into sci‑fi escalation
- •Systems may improve yet plateau or require external guidance
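The point that a feedback loop can converge, oscillate, or diverge can be seen in a toy model. The sketch below assumes a simple linear update, x ← x + gain · (target − x), chosen purely for illustration; it is not a model of any real AI system, and `iterate_feedback` and `gain` are hypothetical names.

```python
def iterate_feedback(
    gain: float, target: float = 1.0, start: float = 0.0, steps: int = 50
) -> list[float]:
    """Repeatedly correct x toward target; the error shrinks or grows by |1 - gain|."""
    x = start
    trajectory = [x]
    for _ in range(steps):
        x = x + gain * (target - x)  # the correction is fed back into the state
        trajectory.append(x)
    return trajectory


converging = iterate_feedback(0.5)   # error halves each step, then plateaus at target
oscillating = iterate_feedback(2.0)  # overshoots by exactly the error, forever
diverging = iterate_feedback(2.5)    # each overshoot is larger than the last

print(converging[-1], oscillating[-3:], diverging[-1])
```

The same loop structure gives three different behaviors depending on one parameter, which is why a box-and-arrow diagram alone says little about whether "self-improvement" runs away or levels off.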
Hallucinations to verification culture: enterprise adoption is maturing
Enterprise attitudes have moved from initial excitement, through concern about hallucinations, to a more nuanced operational stance. As model quality improves and tooling (RAG, context handling) matures, companies adopt AI for more critical tasks, paired with systematic review.
- •Hallucination rates improving plus better mitigation patterns (e.g., RAG)
- •Organizations learning that AI is probabilistic, not deterministic
- •Value measured as ‘verify time vs do-it-yourself time’
- •Code review analogy: verification was always part of professional workflows
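The "verify time vs do-it-yourself time" comparison can be written as a small expected-cost calculation. The sketch below rests on a simplifying assumption that is not from the discussion: when the AI output fails review, the person redoes the task by hand. The function names and the numbers are hypothetical.

```python
def expected_minutes_with_ai(
    verify_minutes: float, diy_minutes: float, error_rate: float
) -> float:
    """Review time is always paid; a failed review adds a manual redo."""
    return verify_minutes + error_rate * diy_minutes


def ai_is_worth_it(diy_minutes: float, verify_minutes: float, error_rate: float) -> bool:
    """True when reviewing AI output is cheaper on average than doing the task."""
    return expected_minutes_with_ai(verify_minutes, diy_minutes, error_rate) < diy_minutes


# A quick review with a low failure rate pays off; a slow review with a
# high failure rate does not.
quick_review = ai_is_worth_it(diy_minutes=60, verify_minutes=10, error_rate=0.2)
slow_review = ai_is_worth_it(diy_minutes=60, verify_minutes=45, error_rate=0.5)
print(quick_review, slow_review)
```

Under this assumption the break-even condition is verify time < (1 − error rate) × do-it-yourself time, so falling hallucination rates and faster verification both widen the set of tasks worth delegating.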
Experts get supercharged: tool mastery, prompting, and ‘formal language’ returning
They argue AI amplifies experts first because experts can ask better questions and detect errors. Prompting isn’t disappearing; it’s becoming more like jargon/formal language—efficient communication among domain experts—yielding better outputs with richer instructions.
- •Expert users tolerate ‘slot machine’ iteration for large productivity gains
- •Non-experts risk deploying wrong outputs due to weak evaluation ability
- •Prompts are getting longer and more structured, not shorter
- •Natural language evolves into formalized jargon for efficient precision
Workflows invert: tools don’t just automate work—work adapts to tools
A core theme is the moment when people stop forcing new tech into old processes and instead redesign processes around the new capability. They draw analogies to phones losing keypads, expense reporting evolving from forms to receipts, and email wiping out formatted agendas.
- •Early phases mimic old workflows (anthropomorphizing work)
- •Over time workflows reconfigure to match new tool strengths
- •Examples: expense reports, meeting agendas, early internet ‘jammed into Office’
- •AI adoption shifting from centralized platform teams to individual usage patterns
Abdicating logic vs reducing work: platform shifts and lost control
They debate whether using LLMs means apps are ‘abdicating logic’ to third parties, in contrast to prior shifts that mostly abstracted resources (cloud) or devices (drivers). The broader point: each platform shift changes both user interaction and what developers build against.
- •LLMs can externalize decision-making previously hard-coded in apps
- •Historical parallels: print drivers/clipboard moving into the OS broke incumbents
- •The web forced developers to give up rich UI control (gray ‘Submit’ era)
- •Platform shifts change abstractions for both users and developers
Parallel work via background agents: PR-level control and context-rot constraints
They explore why senior engineers run many background coding agents and review at the pull-request layer. The driver is practical: context windows degrade (“context rot”), so partitioning work across scoped agents (often aligned to microservices) improves reliability and throughput.
- •Many scoped agents outperform one general agent for large codebases
- •PR-level interaction becomes a managerial interface for senior engineers
- •Microservice-per-agent pattern with dedicated READMEs/instructions
- •Counter-narrative: more agents + narrower tasks + more complex prompts
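As a rough illustration of the microservice-per-agent pattern, the sketch below fans tasks out to scoped workers in parallel and collects their proposals for review in one place. Everything here is hypothetical: `run_scoped_agent` is a placeholder for a real coding-agent call, and the service names, instructions, and tasks are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Proposal:
    """A PR-like unit of work that a senior engineer reviews."""
    service: str
    summary: str


def run_scoped_agent(service: str, instructions: str, task: str) -> Proposal:
    # Placeholder (assumption): a real agent would see only this service's
    # code and instructions, which keeps its context small.
    return Proposal(service, f"[{service}] {task} (following: {instructions})")


def dispatch(tasks: dict[str, str], instructions: dict[str, str]) -> list[Proposal]:
    """Run one scoped agent per service in parallel; return proposals in task order."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(run_scoped_agent, service, instructions[service], task)
            for service, task in tasks.items()
        ]
        return [future.result() for future in futures]


instructions = {"billing": "see billing/README", "auth": "see auth/README"}
tasks = {"billing": "add invoice export", "auth": "rotate signing keys"}

proposals = dispatch(tasks, instructions)
for proposal in proposals:
    print(proposal.summary)
```

The design choice mirrors the chapter's argument: each worker gets a narrow task and its own instructions, and the human interacts only with the resulting proposals rather than with each agent's intermediate steps.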
Division of labor accelerates: agents reshape org design and task serialization
Agents enable parallelization of work that was previously serialized by human bandwidth and tooling constraints. They forecast a shift where individuals orchestrate many sub-agents across workstreams (events, legal matters, etc.), with new ‘AI productivity’ roles emerging.
- •Tooling historically forced linear workflows; agents unlock parallelism
- •Humans become managers of task queues and gating decisions
- •Agents surface dependencies (“I need a logo/date/venue”) earlier
- •New org roles: AI workflow designer / productivity operator
Verticalization and applied AI: why domain-specific agents create thousands of companies
They argue the future is highly vertical: agents that do specific jobs deeply (payroll specialist, signing, niche workflows). As pretraining’s broad generalization gives way to post-training, RL, and enterprise-private data, applied companies gain durable advantage.
- •Post-training and domain data access drive differentiation
- •Private enterprise data + permissions favor applied vendors
- •‘1,000 workflows’ thesis: agents per function, dept, and vertical
- •Analogy: APIs became companies (Auth, PubSub); agents may repeat this pattern
Platform competition and the application layer: why model providers won’t eat everything
They push back on fears that foundation model companies will subsume all apps, citing historical overestimation of incumbents’ ability to dominate every category. Aggressive ‘Sherlocking’ chills ecosystems, and it’s operationally hard to go deep in dozens of verticals—leaving room for specialists.
- •Big-platform fear historically exceeds reality; specialists often win
- •Ecosystem chilling effect if model providers aggressively subsume apps
- •Execution depth across many verticals is difficult for model companies
- •Coding tools are a special competitive zone; most other domains remain open