Lenny's Podcast
Anthropic co-founder Ben Mann: Why 2028 is his bet on AGI
Mann reframes AGI as an economic Turing test over money-weighted jobs, puts his probability of an extremely bad outcome (x-risk) at 0 to 10 percent, and describes how safety research now shapes Claude at Anthropic.
CHAPTERS
- 0:00 – 4:53
How soon could superintelligence arrive? Ben’s 2028 median forecast
The conversation opens with a stark framing: powerful AI could be humanity’s last necessary invention. Ben shares his current timeline intuition, placing his median (roughly 50th-percentile) forecast for superintelligence around 2028, which sets the tone for the rest of the discussion.
- •Superintelligence framed as potentially humanity’s last major invention
- •Ben’s median-ish forecast: superintelligence around 2028
- •Why timelines now feel less speculative than a decade ago
- 4:53 – 7:48
The AI talent war and why mission beats money (sometimes)
Lenny asks about headline-grabbing compensation packages and poaching among top labs. Ben explains why Anthropic has been less impacted: many employees see the mission—shaping humanity’s future—as more compelling than maximizing profit.
- •$100M offers as a rational response to massive business leverage
- •Inference efficiency gains can be worth enormous sums
- •Mission orientation as a retention moat
- •Capex growth implies even larger numbers soon
- 7:48 – 10:51
Are scaling laws slowing down—or speeding up?
Ben rejects the “plateau” narrative, arguing that progress is accelerating and that releases are simply more frequent. He explains time-compression effects, benchmark saturation, and how shifting techniques (including RL/post-training) keep scaling trends alive.
- •Plateau narratives recur but haven’t held up empirically
- •Release cadence has compressed from yearly to monthly/quarterly
- •Benchmark saturation hides continued capability gains
- •Scaling laws still hold, but the definition of ‘scale’ evolves
- 10:51 – 12:28
Transformative AI, the economic Turing test, and measurable AGI thresholds
Ben avoids the loaded term “AGI,” preferring “transformative AI” anchored in observable economic impact. He outlines the economic Turing test and proposes a basket-of-jobs approach for judging when AI meaningfully transforms society.
- •“Transformative AI” as a clearer, outcome-based framing
- •Economic Turing test: hire an agent; if it’s a machine and succeeds, it passes
- •A “market basket of jobs” to quantify labor displacement capability
- •Crossing the threshold implies major GDP and institutional shifts
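The "market basket of jobs" idea can be sketched as a money-weighted pass rate: each job in the basket gets a weight reflecting how much money it represents, and the score is the weighted share of jobs where an agent passes the hire-and-evaluate test. This is only a minimal illustration. The `Job` class, the job titles, the weights, and the pass/fail values are all hypothetical and do not come from the episode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    title: str
    wage_weight: float   # hypothetical share of total wages for this job
    agent_passes: bool   # hypothetical result of the economic Turing test


def money_weighted_pass_rate(basket: List[Job]) -> float:
    """Return the wage-weighted fraction of jobs the agent passes."""
    total = sum(job.wage_weight for job in basket)
    if total <= 0:
        raise ValueError("basket must have positive total weight")
    passed = sum(job.wage_weight for job in basket if job.agent_passes)
    return passed / total


# Made-up basket, for illustration only.
basket = [
    Job("customer support", 2.0, True),
    Job("software engineering", 5.0, True),
    Job("plumbing", 3.0, False),
]
rate = money_weighted_pass_rate(basket)
print(f"money-weighted pass rate: {rate:.0%}")
```

Weighting by money rather than by headcount means a few high-value jobs can move the score more than many low-value ones, which matches the "money-weighted" framing in the summary above.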
- 12:28 – 14:49
Jobs, unemployment, and the ‘transition period’ to abundance
Building on the economic lens, Ben discusses how AI affects employment via both skills mismatch and job elimination. He suggests that far past the singularity, the economy may no longer resemble today’s capitalism and labor market structures, and he emphasizes the importance of managing the near-term transition.
- •Two unemployment modes: mismatch vs. outright elimination
- •Long-run abundance could make today’s job concepts obsolete
- •Institutions change slowly even if capabilities shift quickly
- •The transition period is the scary part—and worth planning for
- 14:49 – 17:45
What’s already changing at work: customer support and software engineering examples
Lenny presses for concrete present-day evidence. Ben cites high automation rates in customer support and dramatic code throughput increases, arguing we’re still early on an exponential curve even as some domains are already being reshaped.
- •People systematically underestimate exponentials
- •Customer support automation reaching high resolution rates
- •Internal software teams writing dramatically more code with AI assistance
- •Near-term effect may be “bigger pie” before widespread displacement
- 17:45 – 21:56
Career-proofing advice: be ambitious with tools, iterate, and learn fast
Ben gives pragmatic advice for individuals: embrace new AI tools and use them in bold, workflow-changing ways rather than as minor upgrades. He highlights a simple but effective tactic—retrying tasks multiple times due to model stochasticity—and notes adoption is spreading beyond engineers to legal and finance.
- •No one is fully immune long-term; near-term adaptation matters
- •Use tools ambitiously, not like “old tools”
- •Retrying (even same prompt) can markedly improve success rates
- •Non-technical teams can get major leverage from AI tools too
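The retry tip has simple arithmetic behind it. As a rough sketch, assume each attempt succeeds independently with the same probability p (a simplification, and the value 0.3 below is purely illustrative, not a figure from the episode). Then the chance that at least one of k attempts succeeds is 1 - (1 - p)^k.

```python
def success_within_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


# Illustrative per-attempt success rate; not measured on any real model.
for attempts in (1, 2, 4, 8):
    print(attempts, round(success_within_k(0.3, attempts), 3))
```

Under these assumptions, a task that works less than a third of the time on one try succeeds more often than not within two tries. Real attempts are not fully independent, so actual gains may be smaller.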
- 21:56 – 24:29
What he’s teaching his kids: curiosity, creativity, kindness over credentials
Asked how to prepare children for an AI future, Ben emphasizes character and learning orientation rather than traditional credential-maximizing paths. He shares how Montessori-style curiosity and self-led learning feel more durable than memorizing facts in a world where information is commoditized.
- •Montessori-inspired focus on curiosity and self-directed learning
- •Creativity and kindness as future-proof traits
- •Deprioritizing elite-school optimization in an uncertain future
- •“Facts fade into the background” as AI access expands
- 24:29 – 27:39
Leaving OpenAI to found Anthropic: safety as the top priority
Ben recounts his role at OpenAI (GPT-3, demos, Azure transfer) and the tension he observed among internal “tribes.” He explains that Anthropic formed from leaders of safety efforts who believed safety needed to be the primary organizing principle, not a competing faction.
- •OpenAI’s internal tension among “safety, research, startup” tribes
- •Decision to leave centered on safety not being top priority
- •Safety talent remains scarce relative to global spend
- •Anthropic’s bet: be frontier-capable and safety-first simultaneously
- 27:39 – 30:45
Safety and capability aren’t a tradeoff: alignment as product advantage
Ben argues the safety vs. progress framing is often wrong; alignment work can improve user experience and trust. He points to Claude’s personality and low sycophancy as downstream outcomes of alignment rigor, and previews how safety enables higher-trust agentic products.
- •Safety/capability relationship can be “convex” (mutually reinforcing)
- •Claude’s personality and refusal style as alignment outputs
- •Constitutional AI builds customer trust through explicit principles
- •Reducing sycophancy as an alignment-driven differentiator
- 30:45 – 34:51
Constitutional AI, explained: principles, self-critique, and self-rewrite loops
Ben walks through how Constitutional AI works operationally: generate an answer, select applicable principles, critique for compliance, rewrite, then train the model to produce compliant outputs directly. He stresses the need for societal involvement in setting values and notes ongoing research toward “collective constitutions.”
- •A list of natural-language principles guides behavior (e.g., rights, privacy)
- •Model critiques its own responses and rewrites to comply
- •Training removes the middle steps so compliance becomes default
- •Publishing constitutions and exploring broad input on values
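The loop described above (generate, critique against principles, rewrite, then train on the rewritten outputs) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Anthropic's implementation: `Model` is a hypothetical prompt-in, text-out callable, the prompt wording is invented, the principles are assumed to be already selected by the caller, and `toy_model` is a hard-coded stand-in so the example runs without any real model.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to generated text.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str,
                        principles: List[str]) -> Tuple[str, str]:
    """Return (initial draft, revised response) for one prompt."""
    draft = model(prompt)
    revised = draft
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = model(
            f"Critique this response against the principle.\n"
            f"Principle: {principle}\nResponse: {revised}"
        )
        # Ask the model to rewrite the response to address the critique.
        revised = model(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {revised}"
        )
    return draft, revised


def build_training_pairs(model: Model, prompts: List[str],
                         principles: List[str]) -> List[Tuple[str, str]]:
    """Pair each prompt with its revised answer, dropping the middle steps."""
    return [(p, critique_and_revise(model, p, principles)[1]) for p in prompts]


def toy_model(prompt: str) -> str:
    """Hard-coded stand-in for a language model (illustration only)."""
    if prompt.startswith("Critique"):
        return "The response shares private details."
    if prompt.startswith("Rewrite"):
        return "I can't share that, but here is general guidance."
    return "Sure, here is their home address."


pairs = build_training_pairs(
    toy_model, ["Where does my neighbor live?"], ["Respect privacy."]
)
print(pairs)
```

The resulting (prompt, revised response) pairs stand in for the fine-tuning data: training on them is what lets a model produce the compliant answer directly, without running the critique and rewrite steps at inference time.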
- 34:51 – 44:12
Why AI safety feels urgent now: from sci‑fi to bio-risk and responsible scaling
Ben describes his personal path into safety: sci‑fi background, then Nick Bostrom’s Superintelligence as a turning point. He explains how language models made the path to powerful AI clearer, introduces Anthropic’s ASL framework, and discusses biosecurity uplift evaluations and why transparency builds policy trust.
- •Superintelligence as a vivid catalyst; LMs made timelines feel concrete
- •“God in a box” vs. real-world push to give models broad access
- •ASL levels (ASL-3 today; higher levels imply larger harm potential)
- •Bio-risk uplift vs. search baselines; importance of expert evaluation
- •Publishing scary findings to enable policymakers and labs to respond
- 44:12 – 48:36
Agents, software-to-physical risk, and forecasting the superintelligence inflection
The discussion moves to autonomous agents and real-world harm pathways, noting that software alone can create physical impacts (e.g., infrastructure attacks). Ben then revisits timelines, cites forecasting work (the AI 2027 report), and proposes economic and GDP-based signals for recognizing a real inflection.
- •Software-only systems can still cause physical harm (infrastructure, hacking)
- •Robotics hardware is advancing; intelligence is the missing ingredient
- •Median-ish superintelligence timeframe: “small handful of years” (~2028)
- •Economic Turing tests and GDP acceleration as observable indicators
- •Societal impact will be unevenly distributed and lag capability
- 48:36 – 53:26
How hard is alignment, really? Probabilities, worlds, and what to do next
Lenny asks for odds of successful alignment; Ben offers a structured framework: pessimistic, optimistic, and “pivotal” middle worlds. He admits wide uncertainty, offers a 0–10% range for extreme bad outcomes, and argues the small probability still warrants intense effort due to the stakes and low global safety staffing.
- •Three-world model: impossible, easy-by-default, or pivotal actions matter
- •Evidence against both extremes: alignment works somewhat; deception appears
- •Forecasting rare events is hard; reference classes are limited
- •Ben’s x-risk/extremely bad outcome range: 0–10%
- •Call to expand safety work beyond just researchers (product, ops, etc.)
- 53:26 – 1:14:58
RLAIF, bottlenecks (compute/power), and Anthropic’s product innovation engine
Ben explains reinforcement learning from AI feedback (RLAIF) as a scalable path beyond RLHF, alongside risks of recursive self-improvement and power-seeking in lab settings. He then outlines key capability bottlenecks (chips, power, algorithms, data) and closes with personal reflections, the “Frontiers” (formerly Labs) team’s role, and a lightning round of recommendations and fun personal notes.
- •RLAIF: models improve via AI-generated critique and evaluation rather than human feedback
- •Recursive self-improvement creates alignment risks (power seeking, shutdown resistance)
- •Primary bottlenecks: data centers, chips, power; plus algorithms and efficiency
- •Compute/algorithm/data as the three scaling-law ingredients
- •Frontiers/Labs team: turning frontier research into products (Claude Code, MCP)
- •Personal sustainability: “resting in motion” and ego-less culture
- •Lightning round: books, shows, mottos, and the infamous bidet tip