a16z: Building the Real-World Infrastructure for AI, with Google, Cisco & a16z
At a glance
WHAT IT’S REALLY ABOUT
AI infrastructure boom reshapes power, chips, networks, and enterprise adoption
- The panel argues the AI infrastructure buildout is unlike prior cycles—potentially 100× the internet era—driven by economic, geopolitical, and national security stakes.
- Demand for compute is already outstripping supply, with power availability, permitting, land, and supply chain limits expected to constrain deployments for several years.
- Networking is becoming a primary performance bottleneck for AI clusters, creating new needs across scale-up, scale-out, and even “scale-across” (multi–data center logical clusters).
- Processor innovation is shifting into a “golden age of specialization,” where efficiency-per-watt and time-to-design for new accelerators become decisive competitive factors with geopolitical implications.
- Inside large enterprises, the biggest near-term AI wins are developer productivity (coding, debugging, and large migrations) and knowledge workflows (sales prep, legal review, marketing), but culture must adapt to rapid model/tool improvements.
IDEAS WORTH REMEMBERING
5 ideas

Expect a multi-year, power-limited AI infrastructure supercycle.
Both speakers describe demand overwhelming supply, with constraints dominated by power availability, permitting, land transformation, and supply chain lead times—meaning “money you can’t spend fast enough” may persist 3–5 years.
Compute demand signals are visible in utilization of older generations.
Google reports 7–8-year-old TPUs at 100% utilization, indicating scarcity is so acute that users accept older hardware and some use cases are simply turned away.
Data center location is being dictated by power, not convenience.
Because concentrated power is scarce, new capacity is increasingly built where power exists, pushing data centers farther apart and forcing new wide-area and interconnect designs.
Networking will be the force-multiplier when power and GPUs are scarce.
The panel frames network efficiency (latency/bandwidth/energy per bit) as leverage: saving watts in the network effectively reallocates power budget back to accelerators, and bandwidth directly converts to training/inference throughput.
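The leverage claim above can be made concrete with a back-of-envelope sketch. All numbers here are assumptions for illustration, not figures from the panel: an assumed 100 MW cluster, an assumed 10% network power share, and an assumed halving of network energy per bit.

```python
# Back-of-envelope sketch (illustrative numbers, not from the panel):
# if the network draws ~10% of a cluster's power budget, halving
# network energy per bit frees power to hand back to accelerators.

cluster_power_mw = 100.0          # assumed total cluster budget, MW
network_share = 0.10              # assumed fraction drawn by networking
efficiency_gain = 0.5             # assumed: network energy per bit halved

network_power = cluster_power_mw * network_share            # 10 MW
freed_power = network_power * efficiency_gain               # 5 MW
accelerator_power = cluster_power_mw * (1 - network_share)  # 90 MW

extra_accelerator_fraction = freed_power / accelerator_power
print(f"Freed for accelerators: {freed_power:.1f} MW "
      f"(+{extra_accelerator_fraction:.1%} accelerator power)")
```

Even under these modest assumptions, a network efficiency win translates into a mid-single-digit percentage boost in accelerator power, which is why the panel treats networking as a force-multiplier when power is the binding constraint.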
AI networking must handle predictable patterns and extreme burstiness.
Training communication patterns can be known ahead of time (opening optimization beyond generic packet switching), yet workloads alternate between compute and communication at tens-to-hundreds of megawatts, stressing network and grid planning.
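The burstiness point can also be sketched numerically. In synchronous training, every machine flips between compute and communication phases together, so the whole cluster's draw swings at once; the numbers below are assumptions for illustration, not figures from the panel.

```python
# Illustrative sketch (all numbers assumed, not from the panel):
# synchronized phase changes mean the grid sees the full swing at once.

compute_phase_mw = 120.0   # assumed cluster draw during compute bursts
comm_phase_mw = 70.0       # assumed draw while blocked on the network
compute_fraction = 0.6     # assumed share of each step spent computing

swing_mw = compute_phase_mw - comm_phase_mw
avg_mw = (compute_phase_mw * compute_fraction
          + comm_phase_mw * (1 - compute_fraction))

# The grid and facility must be provisioned for the peak, not the average.
print(f"Peak {compute_phase_mw:.0f} MW, average {avg_mw:.0f} MW, "
      f"synchronized swing {swing_mw:.0f} MW")
```

The gap between peak and average draw is what stresses grid planning: capacity is sized for the synchronized peak even though mean utilization is lower.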
WORDS WORTH SAVING
5 quotes

This is like the combination of the build-out of the internet, the space race, and the Manhattan Project all put into one, where there's a geopolitical implication of it, there's an economic implication, there's a national security implication, and then there's, um, just a speed implication that's pretty profound.
— Jeetu Patel
The internet in the late '90s, early 2000s was big, and we felt like, "Oh my gosh, can't believe the, uh, build-out, the rate." This makes it... I, I mean, 10X is an understatement. It's, uh, 100X what the internet was.
— Amin Vahdat
Our seven and eight-year-old TPUs have 100% utilization.
— Amin Vahdat
Five years from now, whatever the computing stack is from the hardware to the software, right, is gonna be unrecognizable.
— Amin Vahdat
The estimate from doing that migration for Google was seven staff millennia.
— Amin Vahdat
High-quality AI-generated summary created from a speaker-labeled transcript.