a16z: Building the Real-World Infrastructure for AI, with Google, Cisco & a16z
CHAPTERS
AI infrastructure boom: bigger than the internet buildout
The speakers frame today’s AI infrastructure cycle as unprecedented in speed and magnitude—potentially 100x the late-’90s internet buildout. They argue the impact spans economics, national security, and geopolitics, and that infrastructure is “sexy again” because it’s central to AI progress.
CapEx planning and demand signals: utilization, turn-aways, and long lead times
The conversation shifts to how operators plan amid a multi-year CapEx cycle with long timelines for sites, power, and supply chain. A key signal: even older generations of accelerators remain fully utilized, and teams are being turned away from projects due to capacity constraints.
The power bottleneck and the ‘can’t spend fast enough’ problem
Power emerges as the binding constraint that shapes everything from procurement to site strategy. The panel highlights a mismatch between capital availability and real-world ability to deploy it, predicting constraints could persist for several years.
Data center scarcity and building where power exists (not vice versa)
The speakers discuss how data center geography is increasingly determined by where power is available. Enterprises lag hyperscalers/neo-clouds, and future builds will require rethinking rack power density and distribution across wider regions.
Networking architectures evolve: scale-up, scale-out, and ‘scale-across’ data centers
As data centers spread farther apart, networking must connect GPUs/TPUs within racks, across clusters, and even across distant sites that behave like one logical data center. The panel describes emerging “scale-across” approaches enabling inter-data-center coherence over hundreds of kilometers.
Scale-out isn’t dead: toward a reinvented hardware–software co-designed stack
Responding to comparisons with “mainframe-like” systems, the panel argues the dominant pattern remains flexible scale-out pools of accelerators rather than fixed supercomputers. However, they expect the entire computing stack—from hardware to software—to be reinvented via deep co-design, similar to Google’s earlier era of cluster-scale transformations.
Processor innovation: the golden age of specialization and faster hardware iteration
The discussion turns to accelerators and why specialization will accelerate: performance-per-watt differences can be 10–100x versus CPUs, and power is now the governing constraint. A core challenge is the long cycle time to design and deploy new silicon, making it hard to predict what workloads will matter years out.
Chips, power efficiency, and geopolitics: different regions, different constraints
They connect hardware strategy to geopolitics: manufacturing capability (e.g., node sizes), energy abundance, and engineering labor pools differ by region. That can drive divergent architectural choices, shaping competitiveness and the global diffusion of AI infrastructure.
Networking becomes the bottleneck: bandwidth, predictability, and bursty workloads
Networking is described as an increasingly primary limiter on AI performance, with bandwidth translating directly into throughput. AI traffic patterns can be more predictable than general-purpose networking, but workloads are also extremely bursty—creating utilization and design challenges at the scale of tens to hundreds of megawatts.
Building networks for AI: ephemeral peak demand and stranded capacity risk
A key unresolved problem: networks sized for rare peak training events may sit underutilized most of the year, especially across wide-area interconnects. As clusters migrate to newer sites, older networks may become “stranded,” raising questions about how to design cost-effective, flexible fabrics.
Inference architecture and the moving target of cost reduction
The panel discusses specialized inference configurations and the distinct characteristics of prefill vs. decode, which may benefit from different hardware balance points. While cost per inference is dropping rapidly, demand for higher model quality and longer reasoning loops continually consumes the gains.
AI inside large enterprises: code migration, debugging, and big-system modernization
They share internal wins: coding assistance is accelerating development, and AI is making previously infeasible migrations realistic. Google's experience includes using AI to assist an instruction-set migration across a massive codebase—motivated by past estimates that such platform migrations would take "staff millennia."
Rewiring culture for rapid AI adoption: iterate monthly, not yearly
Jeetu emphasizes that adoption is primarily a cultural reset: teams must revisit tools frequently because capabilities change fast. Rather than declaring tools “don’t work” and shelving them, engineers should reassess every few weeks and plan for where tools will be in six months.
Advice to founders and what’s next: agents, routing layers, and multimodal productivity
In closing, they advise startups to avoid thin wrappers around third-party models and instead build tighter product–model feedback loops plus intelligent routing across multiple models. They expect major progress in agent frameworks and in practical multimodal (image/video) inputs and outputs for productivity and education—not just novelty media.