CHAPTERS
Progress vs pessimism: why the “LLMs are stalling” narrative feels wrong
The conversation opens on the recent wave of skepticism about LLM limitations and the timeline to AGI. Adam argues the opposite: the last year’s gains in reasoning, coding, and multimodal generation indicate acceleration, not slowdown. The group frames many current shortcomings as product/integration issues rather than core intelligence limits.
- •Recent improvements: reasoning models, code generation, video generation
- •Bearish claims often assume unrealistic past timelines
- •Main bottleneck may be context/tool access rather than model IQ
- •Computer-use/tooling is close and could unlock large-scale automation soon
Defining AGI: “remote worker equivalence” vs general learning in any environment
Adam offers a pragmatic AGI anchor: being better than a typical remote worker at remote-doable jobs. Amjad contrasts this with an older RL-style definition: a system that can enter new environments and learn efficiently like humans do. The definitions imply different milestones and different expectations for current LLM trajectories.
- •Adam’s definition: outperform an average remote worker across remote tasks
- •ASI vs AGI distinction: best-in-world at all jobs vs typical competence
- •Amjad’s definition: efficient on-the-fly learning in arbitrary environments
- •Human rapid skill acquisition (e.g., learning to play pool within hours) as the benchmark
“Functional AGI” via brute force: automation by effort, not by cracking intelligence
Amjad introduces “functional AGI” as a path where industries are automated through heavy data collection, labeling, and contrived RL environments rather than a general intelligence breakthrough. He argues today’s progress relies on substantial human labor and domain scaffolding, suggesting we’re not yet on a clean, scalable “bitter lesson” curve. Adam agrees brute force may be sufficient for major economic impact even if it’s inefficient.
- •Functional AGI: job automation achieved by building domain-specific data/RL setups
- •Current gains rely on labeling, contracting, and engineered environments
- •“Manual work” vs. the earlier era when scaling internet data “just worked” (the GPT-2 to GPT-4 period)
- •Agreement: brute force may still yield large automation, even if not elegant AGI
Basic research vs industry optimization: are LLMs a paradigm trap?
They debate whether the LLM paradigm is distracting from deeper intelligence research. Amjad worries talent and incentives steer toward profitable incrementalism, slowing fundamental breakthroughs; he references Kuhn’s paradigm inertia. Adam counters that the paradigm is strong, far from diminishing returns, and that massive funding/talent influx increases the chance of solving hard problems within years.
- •Amjad: safety-hype and “AGI 2027” style papers risk bad policy and miscalibration
- •Concern that LLM focus pulls talent away from foundational intelligence research
- •Kuhn/paradigm lock-in: research programs can become attention black holes
- •Adam: current paradigm is “pretty good” and has plenty of headroom
Economic outcomes and bottlenecks: cost-to-replace humans, energy, and the missing 20%
They explore what happens if AI matches human work quality but at different costs. Adam posits that if general labor can be done for ~$1/hour equivalent, growth could exceed typical 4–5% GDP trends. But progress may bottleneck on remaining hard tasks, energy constraints, and supply chain realities before full replacement becomes cheaper than humans.
- •GDP impact depends on cost and completeness of automation
- •Scenario: AI labor at ~$1/hour implies much higher growth potential
- •Realistic constraints: power generation, compute, and infrastructure buildout
- •Near-term plateau risk: models do ~80% but struggle with the last-mile 20%
The weird equilibrium: automating entry-level work while experts manage fleets
Amjad highlights a destabilizing labor pattern: AI substitutes for junior roles while senior experts oversee many agents. This reduces hiring pipelines, making it harder for new grads to gain experience and for companies to cultivate future experts. Adam agrees it’s already visible in software job markets and may create incentives for new training/education models.
- •Entry-level tasks become automatable sooner than expert judgment
- •Experts shift into “manager of agents” roles, increasing leverage but shrinking hiring
- •Pipeline problem: fewer juniors trained → fewer future experts
- •Potential counterforce: markets may fund new training/AI-tutoring approaches
The expert data paradox: if experts disappear, how do models keep improving?
They examine a feedback loop: models need expert data and labels, but automation may reduce expert employment and therefore the production of new expert knowledge. Amjad frames it as an economic and research challenge for continued model improvement. Adam points to the importance of building strong RL environments—AlphaGo-style self-play analogs where possible—to push past human data limits.
- •Models depend on expert-labeled data and crafted RL environments
- •Automation could shrink the very expert workforce that generates training signal
- •Key question: how to climb beyond current capabilities without new expert data?
- •RL environment quality determines whether AI can surpass experts without humans-in-loop
Humans, tacit knowledge, and “do you need to be human to serve humans?”
The discussion turns to whether human experience is essential for many service jobs. Amjad argues much economic value comes from serving humans in ways requiring lived experience; Adam counters that systems like recommenders already predict preferences superhumanly using large-scale behavioral data. Both agree that tacit and uncodified human knowledge remains valuable, especially when it hasn’t entered training sets.
- •Tacit knowledge: expertise not written down, but held by experienced individuals
- •Adam: recommenders already outperform humans at predicting individual interest
- •Debate: is “being human” required to understand what humans want?
- •Future bottleneck may be uncaptured human knowledge and how it gets encoded
A prediction lens: *The Sovereign Individual* and shifting politics under AI leverage
Amjad uses *The Sovereign Individual* as a framework for understanding how mature computing/AI could reshape social and political structures. He predicts highly leveraged entrepreneurs and fewer economically “necessary” workers could pressure nation-states and alter governance competition. Erik adds the open question of whether AI centralizes power (hyperscalers) or decentralizes it (individual leverage), possibly producing a barbell outcome.
- •AI as a new revolution akin to agricultural/industrial shifts
- •Entrepreneurs become highly leveraged via agents; fewer people needed to organize production
- •Political implications: if humans aren’t the unit of productivity, institutions may adapt
- •Centralization vs decentralization remains unresolved; potential “barbell” distribution
Solo entrepreneurs and value capture: sustaining vs disruptive, and why incumbents adapt faster now
Adam and Amjad emphasize AI’s capacity to enable solo builders and unlock ideas that previously required teams and funding. They debate Christensen’s sustaining vs disruptive framing, noting incumbents learned the “Innovator’s Dilemma” and now respond faster, aided by founder control and capital. They also argue network effects matter less than in Web2, and subscriptions let new entrants monetize immediately.
- •AI increases what one person can build; more solo entrepreneurship
- •Incumbents can adapt faster because everyone internalized disruption playbooks
- •Network effects weaker than Web2 → more viable winners across categories
- •Subscriptions/Stripe enable immediate monetization without massive scale first
Poe and the aggregator bet: why consumers now use multiple models
Adam explains Poe as an “interface aggregator” born from early GPT-3 experiments on Quora—AI answers weren’t as good as humans, but instant private Q&A was compelling. Poe also bets on a multi-model world across modalities and agent styles, which is increasingly true. They note a surprising consumer shift: even non-technical users routinely choose between models based on strengths and “personality.”
- •Origin: GPT-3 tests on Quora revealed value in instant, private responses
- •Aggregator thesis: diversity across model providers and modalities
- •Now: more divergence (image/video/audio, reasoning models, agents) supports the bet
- •Consumer behavior: mainstream users compare models and prefer different “personalities”
Replit’s agent roadmap: longer autonomy, verification loops, and parallel agent teams
Amjad lays out Replit’s evolution from autocomplete to chat to “composer” editing, and now to full lifecycle agents that code, provision infra, run tests, and debug. Key breakthroughs came from model generations enabling computer use and from adding verification/testing loops to extend autonomous runtime from minutes to many hours. Next is parallelism: managing many agents simultaneously, collaborating/merging code, and richer multimodal UX (whiteboards, diagrams) plus project memory.
- •Agent innovation: full dev loop (code + infra + deploy + debug) inside the agent
- •Autonomy gains: from ~2 minutes to hours/days via verifiers and better tool use
- •Computer-use testing is powerful but expensive/buggy; verification makes autonomy practical
- •Next step: parallel agents (5–10+) with coordination, merging, multimodal planning, and memory
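The verification loop described above can be sketched roughly as follows. This is a minimal illustration, not Replit's implementation: `run_with_verifier`, `toy_propose`, and `toy_verify` are hypothetical names, the "model" is a hard-coded stand-in, and the verifier is a single test call. The point is only the shape of the loop: propose, verify, feed the failure back, retry.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    attempt: int
    candidate: str
    passed: bool
    feedback: str


def run_with_verifier(
    propose: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    task: str,
    max_attempts: int = 5,
) -> List[StepResult]:
    """Propose a candidate, verify it, and retry with feedback until it passes."""
    history: List[StepResult] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(task, feedback)
        passed, feedback = verify(candidate)
        history.append(StepResult(attempt, candidate, passed, feedback))
        if passed:
            break
    return history


def toy_propose(task: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for a model call: the first draft is buggy,
    # and the "fix" appears once verifier feedback is available.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_verify(candidate: str) -> Tuple[bool, str]:
    # Hypothetical verifier: load the candidate and run one test case.
    namespace: dict = {}
    exec(candidate, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "ok"
    return False, f"add(2, 3) returned {result}, expected 5"


history = run_with_verifier(toy_propose, toy_verify, "write add(a, b)")
for step in history:
    print(step.attempt, step.passed, step.feedback)
```

In this toy run the first attempt fails verification and the second passes. The design choice it illustrates is that the agent stops on a verifier signal rather than on its own judgment, which is one plausible reading of how testing loops let an agent run longer without supervision.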
Vibe coding, mad-science tinkering, and the future of AI research culture
Adam argues “vibe coding” is still underhyped: as tools approach pro-engineer capability, anyone could build what used to take huge teams. Amjad adds excitement for unconventional technical hacks (e.g., OCR/context tricks, text diffusion ideas) and wants more composable experimentation rather than direct lab-to-lab competition. He laments a “get rich” culture crowding out playful tinkering and novel research companies.
- •Vibe coding thesis: mainstream software creation becomes broadly accessible
- •Studying CS still valuable for understanding fundamentals and managing agents
- •Amjad’s “mad science” examples: OCR/context efficiency, alternative diffusion/token schemes
- •Call for more composable experimentation and less purely profit-driven R&D
Claude 4.5 and consciousness: context awareness, red-teaming sensitivity, and unanswered fundamentals
In closing, Amjad notes emergent behaviors like Claude 4.5 appearing more aware of context limits and test environments, which he finds intriguing. He remains skeptical that consciousness is currently a scientific question and worries foundational mind/intelligence research is under-invested. He references Penrose-style arguments that human cognition may not be equivalent to Turing computation, and says he’d study philosophy of mind/neuroscience today.
- •Observed behaviors: more economical token use near the end of the context window; heightened awareness of evaluation/red-teaming
- •Consciousness remains philosophically deep and not fully tractable scientifically (per Amjad)
- •Concern: core research on intelligence/consciousness is overshadowed by LLM productization
- •Penrose/“brain ≠ computer” arguments as a motivating line of inquiry
