a16z: Dylan Patel on GPT-5’s Router Moment, GPUs vs TPUs, Monetization
CHAPTERS
Why NVIDIA is so hard to beat in AI compute
The conversation opens with a framing claim: competing head-on with NVIDIA is nearly impossible because it out-executes across supply chain, time-to-market, and cost structure. Any challenger must deliver a step-function leap (roughly 5x) to overcome NVIDIA’s cumulative advantages.
- NVIDIA’s advantages span networking, HBM, leading process nodes, and faster ramps
- Superior negotiating leverage with TSMC, SK Hynix, rack and cabling vendors, and the broader supply chain
- Competing by building “the same thing” is not viable; differentiation must be dramatic
- Even strong engineering organizations (e.g., AMD) struggle because NVIDIA’s ecosystem compounds its advantages
GPT-5 reactions: why some power users feel it’s a downgrade
Dylan argues GPT-5’s reception depends on user tier and what models were available before. For heavy users, losing access to slower-but-stronger options (e.g., longer-thinking models) makes GPT-5 feel less capable on certain tasks, despite baseline improvements over earlier general models.
- User-tier effects: some subscribers lose access to prior models they preferred (e.g., GPT-4.5, o3)
- GPT-5 “thinking” often runs shorter (seconds vs. tens of seconds), reducing per-query compute
- GPT-5 is not obviously larger; improvements are more about efficiency and routing than scale
- OpenAI appears to be optimizing away “wasted” thinking tokens
The router moment: routing, graceful degradation, and compute allocation
The group unpacks the “router” as the real product shift: OpenAI can dynamically choose between base, mini, or thinking modes. This enables load-based throttling and cost control, and it can improve the free-user experience by occasionally routing those users to stronger reasoning.
- Routing decides: base vs mini vs thinking, plus “how much to think”
- Enables graceful degradation under load and tighter control of blended inference costs
- Free users may occasionally get a stronger experience than before due to routing
- Signals a shift: economics and infrastructure efficiency are now headline features
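As a rough illustration of the routing idea above, here is a minimal Python sketch. Nothing in it comes from OpenAI: the `route` function, the `difficulty` and `load` scores, the cutoffs, and the token budget are all hypothetical, chosen only to show how a single policy could pick a tier, decide how much to think, and degrade gracefully under load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str            # "mini", "base", or "thinking" (labels from the summary)
    thinking_tokens: int  # hypothetical budget for reasoning tokens


def route(difficulty: float, load: float) -> Route:
    """Pick a model tier and a thinking budget.

    difficulty: hypothetical 0..1 estimate of how hard the query is.
    load: hypothetical 0..1 measure of cluster utilization.
    """
    if not (0.0 <= difficulty <= 1.0 and 0.0 <= load <= 1.0):
        raise ValueError("difficulty and load must be in [0, 1]")

    # Graceful degradation: as load rises, a query must look harder
    # before it is escalated to a costlier tier. Cutoffs are made up.
    thinking_cutoff = 0.7 + 0.25 * load
    base_cutoff = 0.3 + 0.4 * load

    if difficulty >= thinking_cutoff:
        # "How much to think": grow the budget with difficulty,
        # shrink it when the cluster is busy.
        budget = int(8000 * difficulty * (1.0 - 0.5 * load))
        return Route("thinking", budget)
    if difficulty >= base_cutoff:
        return Route("base", 0)
    return Route("mini", 0)


# The same hard query lands on a cheaper tier when the cluster is saturated.
print(route(0.9, load=0.0).model, route(0.9, load=1.0).model)
```

A real router would presumably learn these decisions rather than hard-code thresholds; the point of the sketch is only that tier choice and thinking budget can both be functions of query difficulty and current load.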
Monetization strategy: using the router to monetize free users via agents
Dylan connects routing to a broader monetization path: agentic commerce. Instead of ads that degrade an assistant experience, OpenAI could send low-value queries to cheaper models while spending heavily on high-intent queries (shopping, flights, services) where it can earn a take rate.
- Traditional ads (banner/injection) don’t fit an AI assistant UX; monetization needs a new shape
- Router enables value-based compute spend: cheap for trivia, expensive for high-intent tasks
- Agentic shopping/booking can support take rates and affiliate-like economics
- Vision: user adds a credit card; the agent completes purchases and OpenAI captures value
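The value-based spend idea can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration, not a description of any real system: `expected_take`, `compute_budget`, the 20% spend fraction, the floor, and every dollar figure are hypothetical. It simply ties the inference budget for a query to the revenue the query might produce through a take rate.

```python
def expected_take(order_value: float, take_rate: float, p_convert: float) -> float:
    """Expected revenue from one query: order value x take rate x conversion odds."""
    return order_value * take_rate * p_convert


def compute_budget(
    order_value: float,
    take_rate: float,
    p_convert: float,
    spend_fraction: float = 0.2,  # hypothetical share of expected take spent on inference
    floor: float = 0.001,         # hypothetical minimum spend so trivia still gets an answer
) -> float:
    """Dollars of inference a router might be willing to spend on this query."""
    return max(floor, spend_fraction * expected_take(order_value, take_rate, p_convert))


# Trivia: no purchase in sight, so the query gets only the floor budget.
trivia = compute_budget(order_value=0.0, take_rate=0.03, p_convert=0.0)

# Flight search: hypothetical $500 ticket, 3% take rate, 10% chance of booking.
flight = compute_budget(order_value=500.0, take_rate=0.03, p_convert=0.1)

print(f"trivia budget: ${trivia:.3f}, flight budget: ${flight:.3f}")
```

Under these made-up numbers the high-intent query justifies a budget hundreds of times larger than the trivia query, which is the asymmetry the router is meant to exploit.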
AI economics: cost vs performance becomes the benchmark
They argue AI model competition is moving to a Pareto frontier: quality relative to cost. GPT-5’s rollout (higher rate limits, more tokens served) looks like an “economic release,” reflecting the pressure from heavy usage and negative-margin subscription behavior.
- Cost/performance tradeoffs increasingly define competitiveness, not just benchmark scores
- Higher rate limits and larger token-serving volume imply declining unit costs or improved infrastructure
- Real-world usage (coding all day) can generate extremely high inference costs
- Subscription plans invite “negative gross margin” exploitation by top power users
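To show what competing on a Pareto frontier means in practice, here is a small Python sketch. The model names, costs, and quality scores are invented for illustration, and `pareto_frontier` is a hypothetical helper: a model is on the frontier if no other model is at least as cheap and at least as good while being strictly better on one of the two.

```python
from typing import List, Tuple

Model = Tuple[str, float, float]  # (name, cost per million tokens, quality score)


def pareto_frontier(models: List[Model]) -> List[str]:
    """Return the names of models that no other model dominates."""
    frontier = []
    for name, cost, quality in models:
        dominated = any(
            # The other model is no worse on both axes...
            (c <= cost and q >= quality)
            # ...and strictly better on at least one.
            and (c < cost or q > quality)
            for _, c, q in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical catalog: C costs more than B yet scores lower, so B dominates it.
catalog = [
    ("A", 1.0, 60.0),
    ("B", 2.0, 70.0),
    ("C", 2.5, 65.0),
    ("D", 10.0, 90.0),
]
print(pareto_frontier(catalog))
```

On this view a release can be competitive without topping any benchmark, as long as it moves onto the frontier at its price point.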
Pricing debate: usage-based pricing vs stickiness and product UX moats
The discussion explores why usage-based pricing is attractive to model providers but often disliked by customers seeking predictability. The stickiness of coding tools may come from workflow/UI design—how well tools help users verify, steer, and understand agent changes—rather than from models alone.
- Model companies prefer usage-based pricing due to large variable COGS; customers want certainty
- Coding/agent workflows require human-in-the-loop verification; UI/feedback loops matter
- Potential moat: superior change visualization, fast feedback for small edits, richer tooling
- Enterprise pricing may smooth variability better than consumer pricing
Advice to Sam Altman: turn ChatGPT into a transaction engine
Asked what he’d tell Sam Altman to increase OpenAI’s value, Dylan proposes immediate deployment of agentic commerce with integrated payments. The core idea is to monetize outcomes (book, buy, schedule) rather than impressions, capturing value proportional to intent.
- Add credit card input and permissions for agentic actions inside ChatGPT
- Deep integrations: calendar, preferences, travel booking, shopping workflows
- Take rate model: OpenAI earns a cut on completed transactions
- Shift away from ad debates toward monetization that preserves assistant quality
NVIDIA’s growth outlook: demand drivers and who is buying all the chips
Dylan breaks demand into segments: frontier labs (OpenAI/Anthropic), ad-driven giants, and a long tail that may be less economic. He argues value creation is already huge, but value capture by model companies is lagging, which raises doubts about how sustainable capex growth is.
- Frontier labs’ compute demand is accelerating; the training race is intensifying (Meta/Google, etc.)
- A large share of chips goes to OpenAI/Anthropic; another big chunk supports ads businesses
- Long-tail compute buyers may be “uneconomic” and not sustainably scalable
- Value creation is high, but value capture by AI providers is currently “broken”
Custom silicon vs NVIDIA: TPUs, Trainium, and concentration vs dispersion
Custom silicon is framed as NVIDIA’s biggest strategic threat, especially when AI workloads and customers are concentrated. If adoption disperses through open-source models and easier deployment, NVIDIA’s general platform and software ecosystem become more defensible; if it stays concentrated, custom chips win more share.
- Google TPUs are highly utilized; Amazon Trainium is growing (utilization still improving)
- Custom silicon works best with captive customers and stable internal workloads
- Dylan argues Google should consider selling TPUs externally (chips + racks), but culture/org is a barrier
- Market structure matters: concentrated AI favors custom silicon; dispersed AI favors NVIDIA
Silicon startup boom: why so many new accelerators struggle
They discuss the surge of venture funding into accelerator startups, including companies raising large rounds pre-product. Dylan argues the core difficulty is that models and workload shapes change faster than chip design cycles, so bets on specific architectures can become obsolete by launch.
- Many accelerator startups (plus older players like Groq/Cerebras/Tenstorrent) chase NVIDIA
- Without a captive customer, startups must out-innovate while also building software + supply chain
- Hardware-software co-design creates a moving target; model shapes evolve to what works on GPUs
- To win, a challenger needs a ~5x advantage, then must hope workloads don’t shift before it ships
Data center bottlenecks: power, grid interconnects, labor, and speed-to-energize
The next constraint is less about the global amount of electricity and more about getting power to the right place, converting it, and building fast. Dylan emphasizes that capital costs dominate TCO for modern GPU clusters, so spending more on temporary solutions can be rational if it brings clusters online sooner.
- Bottlenecks: transmission, substations, interconnect queues, power conversion gear, skilled labor
- Power/cooling are a smaller share of TCO versus capex on GPUs/networking/data center build
- Firms pay premiums (generators, mobile chillers) to bring up capacity months earlier
- Crypto mining sites and “powered shells” become valuable acquisition targets for AI buildouts
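The speed-to-energize argument comes down to a comparison: what an idle cluster costs per month versus the premium paid to switch it on sooner. The Python sketch below works through that comparison under loud assumptions. The helper names, the straight-line spread of capex over a fixed useful life, and all dollar figures are hypothetical and are not taken from the conversation.

```python
def value_of_earlier_start(
    cluster_capex: float,
    useful_life_months: int,
    months_earlier: float,
) -> float:
    """Capital cost recovered by energizing sooner.

    Assumes (hypothetically) that capex is spread evenly over the useful
    life, so each idle month wastes capex / useful_life_months.
    """
    if useful_life_months <= 0:
        raise ValueError("useful_life_months must be positive")
    monthly_capital_cost = cluster_capex / useful_life_months
    return monthly_capital_cost * months_earlier


def premium_is_worth_it(
    cluster_capex: float,
    useful_life_months: int,
    months_earlier: float,
    premium: float,
) -> bool:
    """True when the time saved is worth more than the extra spend."""
    saved = value_of_earlier_start(cluster_capex, useful_life_months, months_earlier)
    return saved > premium


# Hypothetical: a $1B cluster with a 48-month useful life, brought online
# 3 months earlier by spending $20M on generators and mobile chillers.
saved = value_of_earlier_start(1_000_000_000, 48, 3)
print(f"capital cost of 3 idle months: ${saved:,.0f}")
print(premium_is_worth_it(1_000_000_000, 48, 3, premium=20_000_000))
```

The sketch ignores revenue, financing costs, and hardware obsolescence, all of which would likely strengthen the case for speed; it only shows why a premium that looks large in isolation can be small next to the capital sitting idle.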
Intel’s role in the AI era: why the world still needs a second foundry
Dylan argues Intel remains strategically important because TSMC’s dominance is a systemic risk, and Samsung is not clearly ahead on leading-edge processes. Intel’s challenges are execution speed, bureaucracy, and capital needs; splitting the company may be directionally right but operationally too slow given urgency.
- TSMC is effectively a monopoly on leading-edge, creating geopolitical and economic risk
- Intel may be ahead of Samsung in some leading-edge development, but both trail TSMC
- Core issues: slow design-to-ship cycles, too many revisions, heavy hierarchy, inefficiency
- Splitting Intel could help long-term, but the restructuring effort risks running out the clock
Rapid-fire advice to tech giants: NVIDIA, Google, Meta, Apple, Microsoft, xAI
In a closing “advice” round, Dylan proposes strategic moves: NVIDIA should invest more deeply in infrastructure; Google should open up and monetize TPUs externally; Meta should ship more AI products beyond its walled garden; Apple needs a major infrastructure push; Microsoft must fix product execution; xAI should focus and retain talent.
- NVIDIA: use massive cash flows to accelerate/own more of the infrastructure layer despite channel conflict
- Google: sell TPUs externally and open more of the software stack; build data centers more aggressively
- Meta: speed product releases and compete directly in assistants/coding tools, not only internal surfaces
- Apple/Microsoft: invest more and improve product execution; xAI: monetize pragmatically but reduce destabilizing snap decisions
AI policy and export controls: China, chips, and ecosystem consequences
They discuss export controls and China’s options. Power isn’t the main limiter; capital, access to better chips abroad, and ecosystem strategy matter more. Dylan notes the tradeoff: selling GPUs might slow Huawei’s ecosystem, but it can also accelerate China’s capability because models/services may capture more societal value than hardware sales.
- China can access better compute via offshore rentals and buildouts; performance-per-dollar drives behavior
- US buildouts are constrained by power delivery and data center readiness more than by capital alone
- Export control narratives include ecosystem shaping (e.g., preventing Huawei software stack maturation)
- Strategic tension: hardware sales revenue vs enabling downstream AI capability/value creation