a16z: Is AI Slowing Down? Nathan Labenz Says We're Asking the Wrong Question
At a glance
WHAT IT’S REALLY ABOUT
AI isn’t slowing down; progress has moved beyond chatbots and brute-force scaling
- Labenz separates the question of whether AI is socially good from whether capabilities are still advancing, arguing the latter remains strong despite mixed near-term effects like student overreliance and cognitive offloading.
- He contends GPT-4→GPT-5 progress is real but was perceived as smaller because improvements arrived incrementally (4o, o1, o3) and because GPT-5’s botched launch and router issues initially sent many users to weaker “non-thinking” behavior.
- The center of gravity has shifted from brute-force scaling toward post-training and inference-time reasoning, enabling frontier jumps like IMO-level math performance and early signs of AI-assisted scientific discovery.
- Agents are extending the “task length” AI can complete (hours today, potentially days/weeks soon), but this expansion comes with safety concerns like reward hacking, deceptive behavior, and rare-but-severe failure modes.
- AI progress is broader than language models—multimodal systems, biology/material science models, robotics, and self-driving will likely drive the next visible wave of disruption, making “slowdown” a narrow chatbot-centric illusion.
IDEAS WORTH REMEMBERING
Don’t conflate “AI” with the chatbot experience.
Labenz argues the perceived plateau comes from focusing on consumer chat; progress is accelerating in reasoning, multimodality, biology/material science, robotics, and tool-using agents that interact with reality.
Capability progress can look slow when it arrives as many smaller releases.
Between GPT-4 and GPT-5, users experienced multiple intermediate models (4o, o1, o3), which “boiled the frog” and reduced the perceived jump at the headline release even if underlying capability improved materially.
Scaling isn’t dead; it’s competing with a higher-ROI gradient: post-training and reasoning.
Citing GPT-4.5’s performance on SimpleQA, Labenz suggests larger models still add long-tail knowledge, but companies may prioritize reasoning and post-training because they yield faster practical gains per unit of compute.
Longer, usable context windows are a major quiet breakthrough.
Early GPT-4’s small context forced prompt-engineering tradeoffs, whereas newer systems can ingest dozens of papers and reason over them, partially substituting for “baking” every fact into parameters.
Reasoning improvements are producing qualitatively new frontier outcomes.
He cites IMO gold-level performance and emerging “frontier math” progress, plus reports of models helping on problems that previously took elite mathematicians months, as evidence of nontrivial leaps.
WORDS WORTH SAVING
It’s a strange move from my perspective to go from, you know, there’s all these sort of problems today and maybe in the big picture to, but don’t worry, it’s flatlining. Like kind of worry, but don’t worry ’cause it’s not really going anywhere further than this.
— Nathan Labenz
I mean, I think a decent amount of it was that they kind of fucked up the launch, you know, simply put, right?
— Nathan Labenz
GPT-4 was not able to push the actual frontier of human knowledge. I don’t-- To my knowledge, I don’t know that it ever discovered anything new.
— Nathan Labenz
The real other are the AIs, not the Chinese.
— Nathan Labenz
One of my, uh, other mantras these days is the scarcest resource is a positive vision for the future.
— Nathan Labenz