No Priors Ep. 77 | With Foundry CEO and Founder Jared Quincy Davis
At a glance
WHAT IT’S REALLY ABOUT
Foundry’s Jared Davis Reimagines AI Cloud, GPU Economics, And Scaling Laws
- Jared Quincy Davis, CEO of Foundry, argues that modern AI breakthroughs like AlphaFold and ChatGPT are really stories of immense compute leverage, not just small heroic teams. He explains how Foundry is rebuilding the public cloud from first principles for AI workloads, claiming 12–20x better economics by improving utilization, reliability, and elasticity of GPU clusters. Davis critiques today’s AI cloud as effectively expensive, inflexible colocation, and outlines new products like SPOT and MARS that turn underused, reserved GPU capacity into elastic, preemptible compute. He also discusses a shift from monolithic giant models toward compound AI systems and verifiable, massively parallel “networks of networks” that can push frontier performance without requiring only the largest, most interconnected GPU superclusters.
IDEAS WORTH REMEMBERING
5 ideas
GPU clusters are far more underutilized than most people assume.
Even top-tier pretraining runs often see effective utilization below 80%—sometimes under 50%—because newer GPU systems are complex, failure-prone, and require 10–20% of capacity to be held back as healing buffer.
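The arithmetic behind that claim is simple to sketch: the healing buffer and failure-driven downtime multiply together. A minimal illustration (the function and the 90% "goodput" figure are assumptions for the example; only the 10–20% buffer and sub-80% utilization figures come from the episode):

```python
def effective_utilization(healing_buffer: float, goodput: float) -> float:
    """Fraction of purchased GPU-hours doing useful work.

    healing_buffer: share of GPUs held back as spares (e.g. 0.10-0.20).
    goodput: fraction of the remaining GPUs' time that is productive
             after failures, checkpoint restarts, and stragglers.
    """
    return (1.0 - healing_buffer) * goodput

# A 15% healing buffer combined with 90% goodput already lands at ~76.5%,
# below the 80% figure quoted above, before any other inefficiency.
print(round(effective_utilization(0.15, 0.90), 3))
```

Push the buffer to 20% or the goodput toward 50–60% and the compounding quickly explains the "sometimes under 50%" end of the range.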
Modern AI cloud behaves more like rigid colocation than true elastic cloud.
Developers are pushed into multi‑year, fixed‑size GPU reservations (e.g., thousands of GPUs for three years), which clashes with the inherently spiky, uncertain demand of AI training and inference and forces huge upfront capital and risk.
Better GPU economics come from orchestration and elasticity, not just more hardware.
By rethinking scheduling, reliability, and interconnect from first principles and enabling elastic access to otherwise-idle reserved capacity, Foundry claims 12–20x cost improvements over many current GPU cloud offerings.
SPOT-style preemptible capacity can safely monetize idle reserved GPUs.
Foundry’s SPOT model lets others temporarily use someone’s reserved GPUs (like parking in another person’s reserved spot) with automatic, low-friction preemption, so spot users pay less, owners earn back costs, and the platform increases effective utilization.
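The parking-spot mechanic above can be sketched as a tiny scheduler: spot jobs run on whatever slice of a reservation is idle, and are evicted the moment the owner reclaims capacity. Everything here (class names, eviction order, the whole interface) is a hypothetical illustration of the idea, not Foundry's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class ReservedBlock:
    """A reserved GPU allocation whose idle capacity is lent to spot jobs."""
    total_gpus: int
    owner_in_use: int = 0
    spot_jobs: dict = field(default_factory=dict)  # job_id -> gpus held

    @property
    def idle(self) -> int:
        return self.total_gpus - self.owner_in_use - sum(self.spot_jobs.values())

    def submit_spot(self, job_id: str, gpus: int) -> bool:
        """Place a preemptible job on idle reserved capacity, if any fits."""
        if gpus <= self.idle:
            self.spot_jobs[job_id] = gpus
            return True
        return False

    def owner_claim(self, gpus: int) -> list:
        """Owner reclaims capacity; preempt spot jobs until the claim fits."""
        preempted = []
        while self.idle < gpus and self.spot_jobs:
            job_id, _ = self.spot_jobs.popitem()  # evict most recently placed
            preempted.append(job_id)
        assert self.idle >= gpus, "owner claimed beyond the reservation"
        self.owner_in_use += gpus
        return preempted

block = ReservedBlock(total_gpus=8)
block.submit_spot("train-a", 6)   # spot job fills idle reserved GPUs
evicted = block.owner_claim(4)    # owner returns; "train-a" is preempted
```

The spot user trades eviction risk for a lower price, the owner recoups cost on idle hours, and the block's effective utilization rises without anyone giving up their reservation.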
There is far more raw compute in the world than current AI clouds expose.
Davis notes Ethereum once used the equivalent of 10–20 million V100-class GPUs, and even phones like the iPhone 15 Pro now rival a V100 in FP16 flops—yet much of this global compute is not interconnected, secured, or orchestrated for AI workloads.
WORDS WORTH SAVING
5 quotes
There’s a really fundamental disconnect between people’s mental image of what GPUs are today and what they actually are.
— Jared Quincy Davis
Currently AI cloud is not cloud in the originally intended sense by any means… it’s basically colocation.
— Jared Quincy Davis
One of the killer ideas of the cloud was that fast was free… you could run something 1,000 times faster for the same cost, and then give the compute back.
— Jared Quincy Davis (paraphrasing early Snowflake thinking)
The percentage of the world’s GPU capacity owned by major public clouds is in the basis points.
— Jared Quincy Davis
We haven’t yet elucidated the principles for how to construct networks of networks… these compound AI systems where you have many, many calls, maybe external components.
— Jared Quincy Davis
High quality AI-generated summary created from speaker-labeled transcript.