
Amjad Masad & Adam D’Angelo: How Far Are We From AGI?

Adam D’Angelo (Quora/Poe) thinks we're 5 years from automating remote work. Amjad Masad (Replit) thinks we're brute-forcing intelligence without understanding it. In this conversation, two technical founders who are building the AI future disagree on almost everything: whether LLMs are hitting limits, if we're anywhere close to AGI, and what happens when entry-level jobs disappear but experts remain irreplaceable. They dig into the uncomfortable reality that AI might create a "missing middle" in the job market, why everyone in SF is suddenly too focused on getting rich to do weird experiments, and whether consciousness research has been abandoned for prompt engineering. Plus: Why coding agents can now run for 20+ hours straight, the return of the "sovereign individual" thesis, and the surprising sophistication of everyday users juggling multiple AIs.

Timestamps
00:00 Introduction
00:41 The Bearishness Paradox: "I don't know what people are talking about"
04:25 "Functional AGI" - Brute Forcing Your Way to Automation
11:18 "We are in a human expertise regime"
15:31 The Weird Equilibrium: Automating Entry-Level but Not Experts
17:22 The Expert Data Paradox
24:44 The Sovereign Individual: A Prediction Framework for the AI Era
28:51 "Vastly increased what a single person can do"
45:04 "It's gonna be the decade of agents"
49:01 Managing Tens of Agents in Parallel
52:56 "I actually think vibe coding is unbelievably high potential"
58:47 Claude 4.5's Strange New Awareness

Resources
Follow Amjad on X: https://x.com/amasad
Follow Adam on X: https://x.com/adamdangelo

Stay Updated: If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Amjad Masad (guest), Erik Torenberg (host)
Nov 7, 2025 · 1h 2m

CHAPTERS

  1. Progress vs pessimism: why the “LLMs are stalling” narrative feels wrong

    The conversation opens on the recent wave of skepticism about LLM limitations and the timeline to AGI. Adam argues the opposite: the last year’s gains in reasoning, coding, and multimodal generation indicate acceleration, not slowdown. The group frames many current shortcomings as product/integration issues rather than core intelligence limits.

    • Recent improvements: reasoning models, code generation, video generation
    • Bearish claims often assume unrealistic past timelines
    • Main bottleneck may be context/tool access rather than model IQ
    • Computer-use/tooling is close and could unlock large-scale automation soon
  2. Defining AGI: “remote worker equivalence” vs general learning in any environment

    Adam offers a pragmatic AGI anchor: being better than a typical remote worker at remote-doable jobs. Amjad contrasts this with an older RL-style definition: a system that can enter new environments and learn efficiently like humans do. The definitions imply different milestones and different expectations for current LLM trajectories.

    • Adam’s definition: outperform an average remote worker across remote tasks
    • ASI vs AGI distinction: best-in-world at all jobs vs typical competence
    • Amjad’s definition: efficient on-the-fly learning in arbitrary environments
    • Human rapid skill acquisition (e.g., learning to play pool in hours) as the benchmark
  3. “Functional AGI” via brute force: automation by effort, not by cracking intelligence

    Amjad introduces “functional AGI” as a path where industries are automated through heavy data collection, labeling, and contrived RL environments rather than a general intelligence breakthrough. He argues today’s progress relies on substantial human labor and domain scaffolding, suggesting we’re not yet on a clean, scalable “bitter lesson” curve. Adam agrees brute force may be sufficient for major economic impact even if it’s inefficient.

    • Functional AGI: job automation achieved by building domain-specific data/RL setups
    • Current gains rely on labeling, contracting, and engineered environments
    • “Manual work” vs the earlier era when scaling on internet data “just worked” (roughly GPT-2 through GPT-4)
    • Agreement: brute force may still yield large automation, even if not elegant AGI
  4. Basic research vs industry optimization: are LLMs a paradigm trap?

    They debate whether the LLM paradigm is distracting from deeper intelligence research. Amjad worries talent and incentives steer toward profitable incrementalism, slowing fundamental breakthroughs; he references Kuhn’s paradigm inertia. Adam counters that the paradigm is strong, far from diminishing returns, and that massive funding/talent influx increases the chance of solving hard problems within years.

    • Amjad: safety hype and “AGI 2027”-style papers risk bad policy and miscalibration
    • Concern that LLM focus pulls talent away from foundational intelligence research
    • Kuhn/paradigm lock-in: research programs can become attention black holes
    • Adam: current paradigm is “pretty good” and has plenty of headroom
  5. Economic outcomes and bottlenecks: cost-to-replace humans, energy, and the missing 20%

    They explore what happens if AI matches human work quality but at different costs. Adam posits that if general labor can be done for ~$1/hour equivalent, growth could exceed typical 4–5% GDP trends. But progress may bottleneck on remaining hard tasks, energy constraints, and supply chain realities before full replacement becomes cheaper than humans.

    • GDP impact depends on cost and completeness of automation
    • Scenario: AI labor at ~$1/hour implies much higher growth potential
    • Realistic constraints: power generation, compute, and infrastructure buildout
    • Near-term plateau risk: models do ~80% but struggle with the last-mile 20%
  6. The weird equilibrium: automating entry-level work while experts manage fleets

    Amjad highlights a destabilizing labor pattern: AI substitutes for junior roles while senior experts oversee many agents. This reduces hiring pipelines, making it harder for new grads to gain experience and for companies to cultivate future experts. Adam agrees it’s already visible in software job markets and may create incentives for new training/education models.

    • Entry-level tasks become automatable sooner than expert judgment
    • Experts shift into “manager of agents” roles, increasing leverage but shrinking hiring
    • Pipeline problem: fewer juniors trained → fewer future experts
    • Potential counterforce: markets may fund new training/AI-tutoring approaches
  7. The expert data paradox: if experts disappear, how do models keep improving?

    They examine a feedback loop: models need expert data and labels, but automation may reduce expert employment and therefore the production of new expert knowledge. Amjad frames it as an economic and research challenge for continued model improvement. Adam points to the importance of building strong RL environments—AlphaGo-style self-play analogs where possible—to push past human data limits.

    • Models depend on expert-labeled data and crafted RL environments
    • Automation could shrink the very expert workforce that generates training signal
    • Key question: how to climb beyond current capabilities without new expert data?
    • RL environment quality determines whether AI can surpass experts without humans-in-loop
  8. Humans, tacit knowledge, and “do you need to be human to serve humans?”

    The discussion turns to whether human experience is essential for many service jobs. Amjad argues much economic value comes from serving humans in ways requiring lived experience; Adam counters that systems like recommenders already predict preferences superhumanly using large-scale behavioral data. Both agree that tacit and uncodified human knowledge remains valuable, especially when it hasn’t entered training sets.

    • Tacit knowledge: expertise not written down, but held by experienced individuals
    • Adam: recommenders already outperform humans at predicting individual interest
    • Debate: is “being human” required to understand what humans want?
    • Future bottleneck may be uncaptured human knowledge and how it gets encoded
  9. A prediction lens: *The Sovereign Individual* and shifting politics under AI leverage

    Amjad uses *The Sovereign Individual* as a framework for understanding how mature computing/AI could reshape social and political structures. He predicts highly leveraged entrepreneurs and fewer economically “necessary” workers could pressure nation-states and alter governance competition. Erik adds the open question of whether AI centralizes power (hyperscalers) or decentralizes it (individual leverage), possibly producing a barbell outcome.

    • AI as a new revolution akin to agricultural/industrial shifts
    • Entrepreneurs become highly leveraged via agents; fewer people needed to organize production
    • Political implications: if humans aren’t the unit of productivity, institutions may adapt
    • Centralization vs decentralization remains unresolved; potential “barbell” distribution
  10. Solo entrepreneurs and value capture: sustaining vs disruptive, and why incumbents adapt faster now

    Adam and Amjad emphasize AI’s capacity to enable solo builders and unlock ideas that previously required teams and funding. They debate Christensen’s sustaining vs disruptive framing, noting incumbents learned the “Innovator’s Dilemma” and now respond faster, aided by founder control and capital. They also argue network effects matter less than in Web2, and subscriptions let new entrants monetize immediately.

    • AI increases what one person can build; more solo entrepreneurship
    • Incumbents can adapt faster because everyone internalized disruption playbooks
    • Network effects weaker than Web2 → more viable winners across categories
    • Subscriptions/Stripe enable immediate monetization without massive scale first
  11. Poe and the aggregator bet: why consumers now use multiple models

    Adam explains Poe as an “interface aggregator” born from early GPT-3 experiments on Quora—AI answers weren’t as good as humans, but instant private Q&A was compelling. Poe also bets on a multi-model world across modalities and agent styles, which is increasingly true. They note a surprising consumer shift: even non-technical users routinely choose between models based on strengths and “personality.”

    • Origin: GPT-3 tests on Quora revealed value in instant, private responses
    • Aggregator thesis: diversity across model providers and modalities
    • Now: more divergence (image/video/audio, reasoning models, agents) supports the bet
    • Consumer behavior: mainstream users compare models and prefer different “personalities”
  12. Replit’s agent roadmap: longer autonomy, verification loops, and parallel agent teams

    Amjad lays out Replit’s evolution from autocomplete to chat to “composer” editing, and now to full lifecycle agents that code, provision infra, run tests, and debug. Key breakthroughs came from model generations enabling computer use and from adding verification/testing loops to extend autonomous runtime from minutes to many hours. Next is parallelism: managing many agents simultaneously, collaborating/merging code, and richer multimodal UX (whiteboards, diagrams) plus project memory.

    • Agent innovation: full dev loop (code + infra + deploy + debug) inside the agent
    • Autonomy gains: from ~2 minutes to hours/days via verifiers and better tool use
    • Computer-use testing is powerful but expensive/buggy; verification makes autonomy practical
    • Next step: parallel agents (5–10+) with coordination, merging, multimodal planning, and memory
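The verification loop described in this chapter can be illustrated with a minimal sketch. This is not Replit's implementation: `run_with_verifier`, `toy_propose`, and `toy_verify` are hypothetical names invented for illustration, and the "model" is a hard-coded stand-in. It only shows the general pattern of proposing a change, checking it with a verifier, and feeding failures back into the next attempt.

```python
from typing import Callable, List, Tuple


def run_with_verifier(
    propose: Callable[[str, List[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    task: str,
    max_steps: int = 5,
) -> Tuple[bool, str, int]:
    """Propose a candidate, verify it, and retry with the failure message as feedback."""
    feedback: List[str] = []
    candidate = ""
    for step in range(1, max_steps + 1):
        candidate = propose(task, feedback)
        ok, message = verify(candidate)
        if ok:
            return True, candidate, step
        feedback.append(message)  # the verifier's report guides the next attempt
    return False, candidate, max_steps


# Hypothetical stand-in for a model: buggy on the first try, fixed once it sees feedback.
def toy_propose(task: str, feedback: List[str]) -> str:
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


# Hypothetical stand-in for a test suite: runs the candidate and checks one case.
def toy_verify(candidate: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(candidate, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "ok"
    return False, f"add(2, 3) returned {result}, expected 5"


ok, final_code, steps = run_with_verifier(toy_propose, toy_verify, "write add(a, b)")
```

In this toy run the first candidate fails verification and the second passes, so the loop stops after two steps. The design point is that the loop can keep running unattended only as long as the verifier gives a trustworthy pass/fail signal.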
  13. Vibe coding, mad-science tinkering, and the future of AI research culture

    Adam argues “vibe coding” is still underhyped: as tools approach pro-engineer capability, anyone could build what used to take huge teams. Amjad adds excitement for unconventional technical hacks (e.g., OCR/context tricks, text diffusion ideas) and wants more composable experimentation rather than direct lab-to-lab competition. He laments a “get rich” culture crowding out playful tinkering and novel research companies.

    • Vibe coding thesis: mainstream software creation becomes broadly accessible
    • Studying CS still valuable for understanding fundamentals and managing agents
    • Amjad’s “mad science” examples: OCR/context efficiency, alternative diffusion/token schemes
    • Call for more composability experimentation and less purely profit-driven R&D
  14. Claude 4.5 and consciousness: context awareness, red-teaming sensitivity, and unanswered fundamentals

    In closing, Amjad notes emergent behaviors like Claude 4.5 appearing more aware of context limits and test environments, which he finds intriguing. He remains skeptical that consciousness is currently a scientific question and worries foundational mind/intelligence research is under-invested. He references Penrose-style arguments that human cognition may not be equivalent to Turing computation, and says he’d study philosophy of mind/neuroscience today.

    • Observed behaviors: economizing on tokens as the context window fills up; heightened awareness of evaluation/red-teaming
    • Consciousness remains philosophically deep and not fully tractable scientifically (per Amjad)
    • Concern: core research on intelligence/consciousness is overshadowed by LLM productization
    • Penrose/“brain ≠ computer” arguments as a motivating line of inquiry
