a16z

Aaron Levie and Steven Sinofsky on the AI-Worker Future

What exactly is an AI agent, and how will agents change the way we work? In this episode, a16z general partners Erik Torenberg and Martin Casado sit down with Aaron Levie (CEO, Box) and Steven Sinofsky (a16z board partner; former Microsoft exec) to unpack one of the hottest debates in AI right now.

They cover:
- Competing definitions of an “agent,” from background tasks to autonomous interns
- Why today’s agents look less like a single AGI and more like networks of specialized sub-agents
- The technical challenges of long-running, self-improving systems
- How agent-driven workflows could reshape coding, productivity, and enterprise software
- What history, from the early PC era to the rise of the internet, tells us about platform shifts like this one

The conversation moves from deep technical questions to big-picture implications for founders, enterprises, and the future of work.

Timecodes:
0:00 Introduction: The Evolution of AI Agents
0:36 Defining Agency and Autonomy
1:39 Long-Running Agents and Feedback Loops
4:27 Specialization and Task Division in AI
6:04 Anthropomorphizing AI and Economic Impact
9:10 Predictions, Progress, and Platform Shifts
11:31 Recursive Self-Improvement and Technical Challenges
13:13 Hallucinations, Verification, and Expert Productivity
16:16 The Role of Experts and Tool Adoption
22:14 Changing Workflows: Agents Reshaping Work Patterns
45:55 Division of Labor, Specialization, and New Roles
48:47 Verticalization, Applied AI, and the Future of Agents
54:44 Platform Competition and the Application Layer

Resources:
Find Aaron on X: https://x.com/levie
Find Martin on X: https://x.com/martin_casado
Find Steven on X: https://x.com/stevesi

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details, please see a16z.com/disclosures.

Erik Torenberg (host), Martin Casado (host), Steven Sinofsky (guest)
Aug 25, 2025 · 56m

CHAPTERS

  1. From chat UI to background “workers”: what AI agents are becoming

    The discussion opens by reframing the “talking to a chatbot” form factor as a temporary phase. The panel argues the endpoint is autonomous, background-running software that does real work with minimal user intervention.

    • Agents as background tasks (the Linux “&” metaphor)
    • Agentic-ness measured by how much work happens without user involvement
    • Today’s agents still feel like “bad interns,” but are improving
    • Shift from conversational UI to autonomous execution
  2. Defining agency vs autonomy: long-running tasks and self-feedback

    They separate autonomy (running for a long time) from true agency (taking outputs and feeding them back as new inputs). This introduces the technical and safety constraints of closed-loop behavior and why check-ins are needed.

    • Long-running inference is easy; self-directed feedback loops are harder
    • Agency as iterative self-guidance (output becomes input)
    • Need for intermediate user validation to prevent wasted work
    • Distribution shift risk when agents consume their own outputs
  3. Why multi-agent decomposition is winning (and the return of Unix-style tools)

    Rather than one monolithic AGI-like system, they see a practical architecture emerging: many specialized agents orchestrated together. Smaller, scoped tasks reduce drift and increase reliability, echoing the Unix philosophy of small composable tools.

    • Task subdivision reduces “getting lost” and improves outcomes
    • Specialized “expert” agents per task/domain
    • Orchestration becomes a distinct capability from deep expertise
    • AGI discourse shifting from monolith to systems of agents
  4. Stop anthropomorphizing AI: clarifying AGI and the economics of impact

    They argue AGI talk often imports human/robot narratives that distort economic reality. Even very capable systems don’t automatically imply job destruction or immediate equilibrium shifts; costs, incentives, and deployment constraints still matter.

    • Anthropomorphization drives unhelpful fear/hope cycles
    • AGI as a vague term that ‘does infinite work’ in debates
    • Economic feasibility and equilibrium determine real impact
    • AI already boosts human productivity without removing humans from the loop
  5. Predictions, timelines, and exponential progress: why “by 2027” is a trap

    The group critiques date-based forecasting, noting that exponential improvement breaks intuition and makes metrics contentious. Instead, they suggest focusing on capability drivers like compute, data, and model/tool integration.

    • Timeline predictions become arguments over definitions/metrics
    • Exponential curves are hard to forecast yet keep compounding
    • Historical cycles: AI winters, then breakthroughs (vision, translation, NLP)
    • Practical lens: compute, data, and applied capability evolution
  6. Recursive self-improvement: feedback loops are real, but not magic

    They unpack “recursive self-improvement” as a slogan that hides difficult control-theory questions. Feedback loops can converge, diverge, or asymptote; improvement doesn’t imply runaway superintelligence, especially without well-defined distributions and constraints.

    • Box-and-arrow diagrams oversimplify nonlinear dynamics
    • Control theory: stability of adaptive feedback loops is hard
    • Anthropomorphic leaps turn ‘improve’ into sci‑fi escalation
    • Systems may improve yet plateau or require external guidance
  7. Hallucinations to verification culture: enterprise adoption is maturing

    Enterprise attitudes have moved from initial excitement, through concern about hallucinations, to a more nuanced operational stance. As model quality improves and tooling (RAG, context handling) matures, companies adopt AI for more critical tasks and pair it with systematic review.

    • Hallucination rates improving plus better mitigation patterns (e.g., RAG)
    • Organizations learning AI is probabilistic, not deterministic
    • Value measured as ‘verify time vs do-it-yourself time’
    • Code review analogy: verification was always part of professional workflows
  8. Experts get supercharged: tool mastery, prompting, and ‘formal language’ returning

    They argue AI amplifies experts first because experts can ask better questions and detect errors. Prompting isn’t disappearing; it’s becoming more like jargon or a formal language, an efficient way for domain experts to communicate, and richer instructions yield better outputs.

    • Expert users tolerate ‘slot machine’ iteration for large productivity gains
    • Non-experts risk deploying wrong outputs due to weak evaluation ability
    • Prompts are getting longer and more structured, not shorter
    • Natural language evolves into formalized jargon for efficient precision
  9. Workflows invert: tools don’t just automate work—work adapts to tools

    A core theme is the moment when people stop forcing new tech into old processes and instead redesign processes around the new capability. They draw analogies to phones losing keypads, expense reporting evolving from forms to receipts, and email wiping out formatted agendas.

    • Early phases mimic old workflows (anthropomorphizing work)
    • Over time workflows reconfigure to match new tool strengths
    • Examples: expense reports, meeting agendas, early internet ‘jammed into Office’
    • AI adoption shifting from centralized platform teams to individual usage patterns
  10. Abdicating logic vs reducing work: platform shifts and lost control

    They debate whether using LLMs means apps are ‘abdicating logic’ to third parties, contrasting with prior shifts that mostly abstracted resources (cloud) or devices (drivers). The broader point: each platform shift changes both user interaction and what developers build against.

    • LLMs can externalize decision-making previously hard-coded in apps
    • Historical parallels: print drivers/clipboard moving into the OS broke incumbents
    • The web forced developers to give up rich UI control (gray ‘Submit’ era)
    • Platform shifts change abstractions for both users and developers
  11. Parallel work via background agents: PR-level control and context-rot constraints

    They explore why senior engineers run many background coding agents and review at the pull-request layer. The driver is practical: context windows degrade (“context rot”), so partitioning work across scoped agents (often aligned to microservices) improves reliability and throughput.

    • Many scoped agents outperform one general agent for large codebases
    • PR-level interaction becomes a managerial interface for senior engineers
    • Microservice-per-agent pattern with dedicated READMEs/instructions
    • Counter-narrative: more agents + narrower tasks + more complex prompts
  12. Division of labor accelerates: agents reshape org design and task serialization

    Agents enable parallelization of work that was previously serialized by human bandwidth and tooling constraints. They forecast a shift where individuals orchestrate many sub-agents across workstreams (events, legal matters, etc.), with new ‘AI productivity’ roles emerging.

    • Tooling historically forced linear workflows; agents unlock parallelism
    • Humans become managers of task queues and gating decisions
    • Agents surface dependencies (“I need a logo/date/venue”) earlier
    • New org roles: AI workflow designer / productivity operator
  13. Verticalization and applied AI: why domain-specific agents create thousands of companies

    They argue the future is highly vertical: agents that do specific jobs deeply (payroll specialist, signing, niche workflows). As pretraining’s broad generalization gives way to post-training, RL, and enterprise-private data, applied companies gain durable advantage.

    • Post-training and domain data access drive differentiation
    • Private enterprise data + permissions favor applied vendors
    • ‘1,000 workflows’ thesis: agents per function, dept, and vertical
    • Analogy: APIs became companies (Auth, PubSub); agents may repeat this pattern
  14. Platform competition and the application layer: why model providers won’t eat everything

    They push back on fears that foundation model companies will subsume all apps, citing historical overestimation of incumbents’ ability to dominate every category. Aggressive ‘Sherlocking’ chills ecosystems, and it’s operationally hard to go deep in dozens of verticals—leaving room for specialists.

    • Big-platform fear historically exceeds reality; specialists often win
    • Ecosystem chilling effect if model providers aggressively subsume apps
    • Execution depth across many verticals is difficult for model companies
    • Coding tools are a special competitive zone; most other domains remain open
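
As a toy illustration of the chapter 6 point that feedback loops can converge, diverge, or plateau, the sketch below feeds a number back into a one-line update rule. The update rule, the `gain` parameter, and the function names are assumptions made for illustration; none of them come from the episode, and a real agent loop is far less tidy than a linear recurrence.

```python
def run_loop(gain, steps=50, x0=1.0):
    """Feed each output back in as the next input: x <- gain * x + 1.

    `gain` is a hypothetical knob for how strongly the loop amplifies
    its own previous output.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        x = gain * x + 1.0  # the output becomes the next input
        history.append(x)
    return history


def describe(history, tol=1e-6):
    """Classify the loop by comparing its first and last step sizes."""
    first_step = abs(history[1] - history[0])
    last_step = abs(history[-1] - history[-2])
    if last_step < tol:
        return "converges"       # steps have shrunk to (almost) nothing
    if last_step > first_step:
        return "diverges"        # steps keep growing
    return "still settling"      # shrinking, but not yet at a plateau


if __name__ == "__main__":
    for gain in (0.5, 0.99, 1.5):
        print(gain, describe(run_loop(gain)))
```

In this toy model a gain below 1 settles at a fixed plateau and a gain above 1 runs away, which is the narrow sense in which "improving on its own output" does not by itself say anything about where the loop ends up.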
