No Priors

No Priors Ep. 77 | With Foundry CEO and Founder Jared Quincy Davis

In this episode of No Priors, hosts Sarah and Elad are joined by Jared Quincy Davis, former DeepMind researcher and the Founder and CEO of Foundry, a new AI cloud computing provider. They discuss the research problems that led him to start Foundry, the current state of GPU cloud utilization, and Foundry's approach to improving cloud economics for AI workloads. Jared also shares his predictions for the GPU market and the thinking behind his recent paper on designing compound AI systems.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @jaredq_

Show Notes:
00:00 Introduction
02:42 Foundry background
03:57 GPU utilization for large models
07:29 Systems to run a large model
09:54 Historical value proposition of the cloud
14:45 Sharing cloud compute to increase efficiency
19:17 Foundry's new releases
23:54 The current state of GPU capacity
29:50 GPU market dynamics
36:28 Compound systems design
40:27 Improving open-ended tasks

Sarah Guo (host) · Jared Quincy Davis (guest) · Elad Gil (host)
Aug 21, 2024 · 42m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Foundry’s Jared Davis Reimagines AI Cloud, GPU Economics, And Scaling Laws

Jared Quincy Davis, CEO of Foundry, argues that modern AI breakthroughs like AlphaFold and ChatGPT are really stories of immense compute leverage, not just small heroic teams. He explains how Foundry is rebuilding the public cloud from first principles for AI workloads, claiming 12–20x better economics by improving the utilization, reliability, and elasticity of GPU clusters. Davis critiques today's AI cloud as effectively expensive, inflexible colocation, and outlines new products like SPOT and MARS that turn underused, reserved GPU capacity into elastic, preemptible compute. He also discusses a shift from monolithic giant models toward compound AI systems and verifiable, massively parallel "networks of networks" that can push frontier performance without requiring only the largest, most interconnected GPU superclusters.

IDEAS WORTH REMEMBERING

5 ideas

GPU clusters are far more underutilized than most people assume.

Even top-tier pretraining runs often see effective utilization below 80%—sometimes under 50%—because newer GPU systems are complex, failure-prone, and require 10–20% of capacity to be held back as healing buffer.
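As a back-of-the-envelope illustration of how these figures compound, the sketch below multiplies together a healing buffer, downtime, and run efficiency. The specific numbers are hypothetical, chosen from the ranges cited above, not measurements from any real cluster:

```python
# Illustrative only: effective GPU utilization once a healing buffer,
# failure downtime, and run efficiency are all accounted for.

def effective_utilization(healing_buffer, downtime_frac, efficiency):
    """Fraction of a cluster's peak FLOPs that actually trains the model.

    healing_buffer -- share of GPUs held back as spares (e.g. 0.10-0.20)
    downtime_frac  -- share of wall-clock time lost to failures/restarts
    efficiency     -- compute efficiency of the remaining healthy run
    """
    return (1 - healing_buffer) * (1 - downtime_frac) * efficiency

# A seemingly healthy configuration already lands well under 80%:
u = effective_utilization(healing_buffer=0.15, downtime_frac=0.10, efficiency=0.90)
print(f"{u:.0%}")  # prints 69%
```

The point of the multiplication is that each overhead looks modest in isolation, yet together they pull effective utilization far below the nameplate number.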

Modern AI cloud behaves more like rigid colocation than true elastic cloud.

Developers are pushed into multi‑year, fixed‑size GPU reservations (e.g., thousands of GPUs for three years), which clashes with the inherently spiky, uncertain demand of AI training and inference and forces huge upfront capital and risk.

Better GPU economics come from orchestration and elasticity, not just more hardware.

By rethinking scheduling, reliability, and interconnect from first principles and enabling elastic access to otherwise-idle reserved capacity, Foundry claims 12–20x cost improvements over many current GPU cloud offerings.

SPOT-style preemptible capacity can safely monetize idle reserved GPUs.

Foundry’s SPOT model lets others temporarily use someone’s reserved GPUs (like parking in another person’s reserved spot) with automatic, low-friction preemption, so spot users pay less, owners earn back costs, and the platform increases effective utilization.
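The parking-spot analogy can be sketched as a toy scheduler: a spot job may park on a reserved GPU only while the owner is away, and is preempted the moment the owner returns. All class and method names below are invented for illustration; this is not Foundry's actual API:

```python
# Toy model of spot-on-reserved scheduling (not Foundry's real interface):
# spot jobs borrow idle reserved GPUs and are preempted when owners return.

class ReservedGPU:
    def __init__(self, gpu_id, owner):
        self.gpu_id = gpu_id
        self.owner = owner
        self.owner_active = False  # owner is not currently using the GPU
        self.spot_job = None       # spot workload borrowing it, if any

    def claim_spot(self, job):
        """A spot job parks on the GPU only while the owner is away."""
        if self.owner_active or self.spot_job is not None:
            return False
        self.spot_job = job
        return True

    def owner_returns(self):
        """Owner reclaims the GPU; any spot job is preempted immediately."""
        preempted = self.spot_job
        self.spot_job = None
        self.owner_active = True
        return preempted  # caller would checkpoint/reschedule this job

gpu = ReservedGPU("gpu-0", owner="team-a")
gpu.claim_spot("spot-job-1")   # idle reserved capacity gets borrowed
gpu.owner_returns()            # returns "spot-job-1", now preempted
```

The design choice the analogy captures is that preemption is automatic and low-friction: the owner never negotiates with the spot user, so reserved capacity can be monetized without weakening the reservation guarantee.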

There is far more raw compute in the world than current AI clouds expose.

Davis notes Ethereum once used the equivalent of 10–20 million V100-class GPUs, and even phones like the iPhone 15 Pro now rival a V100 in FP16 flops—yet much of this global compute is not interconnected, secured, or orchestrated for AI workloads.

WORDS WORTH SAVING

5 quotes

There’s a really fundamental disconnect between people’s mental image of what GPUs are today and what they actually are.

Jared Quincy Davis

Currently AI cloud is not cloud in the originally intended sense by any means… it’s basically colocation.

Jared Quincy Davis

One of the killer ideas of the cloud was that fast was free… you could run something 1,000 times faster for the same cost, and then give the compute back.

Jared Quincy Davis (paraphrasing early Snowflake thinking)

The percentage of the world’s GPU capacity owned by major public clouds is in the basis points.

Jared Quincy Davis

We haven’t yet elucidated the principles for how to construct networks of networks… these compound AI systems where you have many, many calls, maybe external components.

Jared Quincy Davis

Genesis and mission of Foundry as an AI-specific public cloud
GPU utilization problems, hardware failure, and healing buffers in large clusters
How today's AI cloud resembles colocation and parking-lot style economics
Foundry's SPOT and MARS products to unlock idle capacity and improve reliability
Global GPU capacity vs. actual usable and utilized AI compute
Evolving GPU market dynamics and the limits of scaling via ever-larger clusters
Compound AI systems, synthetic data, and verifiable workflows as a new scaling paradigm

High quality AI-generated summary created from speaker-labeled transcript.
