No Priors Ep. 77 | With Foundry CEO and Founder Jared Quincy Davis

No Priors · Aug 22, 2024 · 42m

Sarah Guo (host), Elad Gil (host), Jared Quincy Davis (guest)

Genesis and mission of Foundry as an AI-specific public cloud
GPU utilization problems, hardware failure, and healing buffers in large clusters
How today's AI cloud resembles colocation and parking-lot style economics
Foundry's SPOT and MARS products to unlock idle capacity and improve reliability
Global GPU capacity vs. actual usable and utilized AI compute
Evolving GPU market dynamics and the limits of scaling via ever-larger clusters
Compound AI systems, synthetic data, and verifiable workflows as a new scaling paradigm

In this episode of No Priors, hosts Sarah Guo and Elad Gil talk with Jared Quincy Davis, CEO and founder of Foundry, about rebuilding the public cloud for AI workloads, GPU economics, and new scaling paradigms.

Foundry’s Jared Davis Reimagines AI Cloud, GPU Economics, And Scaling Laws

Jared Quincy Davis, CEO of Foundry, argues that modern AI breakthroughs like AlphaFold and ChatGPT are really stories of immense compute leverage, not just small heroic teams. He explains how Foundry is rebuilding the public cloud from first principles for AI workloads, claiming 12–20x better economics by improving utilization, reliability, and elasticity of GPU clusters. Davis critiques today’s AI cloud as effectively expensive, inflexible colocation, and outlines new products like SPOT and MARS that turn underused, reserved GPU capacity into elastic, preemptible compute. He also discusses a shift from monolithic giant models toward compound AI systems and verifiable, massively parallel “networks of networks” that can push frontier performance without requiring only the largest, most interconnected GPU superclusters.

Key Takeaways

GPU clusters are far more underutilized than most people assume.

Even top-tier pretraining runs often see effective utilization below 80%—sometimes under 50%—because newer GPU systems are complex, failure-prone, and require 10–20% of capacity to be held back as healing buffer.
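A back-of-envelope calculation shows how a healing buffer and failure-induced downtime compound into the effective-utilization figures above. The specific numbers here are hypothetical, chosen from the ranges the episode cites:

```python
# Illustrative arithmetic only: how a healing buffer plus goodput losses
# push effective utilization well below nominal capacity.
total_gpus = 1000
healing_buffer = 0.15     # 10-20% of nodes held back to swap in for failures
goodput_fraction = 0.85   # fraction of time not lost to failures/restarts/checkpoints

schedulable = total_gpus * (1 - healing_buffer)   # 850 GPUs available to jobs
effective = schedulable * goodput_fraction        # 722.5 GPU-equivalents of useful work

utilization = effective / total_gpus
print(f"Effective utilization: {utilization:.0%}")  # prints: Effective utilization: 72%
```

Even with fairly optimistic assumptions, two modest overheads multiply into a cluster doing useful work with barely 70% of its nominal capacity.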

Modern AI cloud behaves more like rigid colocation than true elastic cloud.

Developers are pushed into multi‑year, fixed‑size GPU reservations (e. ...

Better GPU economics come from orchestration and elasticity, not just more hardware.

By rethinking scheduling, reliability, and interconnect from first principles and enabling elastic access to otherwise-idle reserved capacity, Foundry claims 12–20x cost improvements over many current GPU cloud offerings.

SPOT-style preemptible capacity can safely monetize idle reserved GPUs.

Foundry’s SPOT model lets others temporarily use someone’s reserved GPUs (like parking in another person’s reserved spot) with automatic, low-friction preemption, so spot users pay less, owners earn back costs, and the platform increases effective utilization.
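The "reserved parking spot" mechanic can be sketched as a toy allocator: spot tenants may occupy a reserved block while its owner is idle, and are preempted the moment the owner returns. All names and the API shape here are illustrative, not Foundry's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class ReservedBlock:
    """Toy model of a reserved GPU block that lends idle capacity to spot jobs."""
    owner: str
    gpus: int
    owner_active: bool = False
    spot_jobs: list = field(default_factory=list)

    def submit_spot(self, job: str) -> bool:
        # Spot jobs are only admitted while the owner is away.
        if self.owner_active:
            return False
        self.spot_jobs.append(job)
        return True

    def owner_returns(self) -> list:
        # Owner reclaims the block; all spot jobs are preempted
        # (in practice they would be checkpointed and requeued elsewhere).
        self.owner_active = True
        preempted, self.spot_jobs = self.spot_jobs, []
        return preempted

block = ReservedBlock(owner="team-a", gpus=64)
assert block.submit_spot("train-small")      # runs while owner is idle
preempted = block.owner_returns()            # ["train-small"] is preempted
assert not block.submit_spot("too-late")     # owner is back; spot rejected
```

The economics follow from the mechanics: spot users pay a discount for preemptibility, owners recoup cost on capacity they were paying for anyway, and the platform's effective utilization rises.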

There is far more raw compute in the world than current AI clouds expose.

Davis notes Ethereum once used the equivalent of 10–20 million V100-class GPUs, and even phones like the iPhone 15 Pro now rival a V100 in FP16 flops—yet much of this global compute is not interconnected, secured, or orchestrated for AI workloads.

Scaling AI is shifting from ever-bigger monolithic models to compound systems.

Examples like Phi‑3, LLaMA‑3. ...

Verifiable tasks enable “networks of networks” that dramatically boost accuracy.

In Davis’s compound AI paper, repeatedly calling models, then using a verifier/judge to select the best answer, yielded 10x gains on prime factorization and ~3% MMLU improvements—bigger than typical SOTA gaps—suggesting many future systems will rely on such best‑of‑K, parallelizable pipelines.
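The best-of-K pattern described above can be sketched minimally: sample K candidate answers, then let a cheap verifier pick one that checks out. The factorization task and the hard-coded "model guesses" here are toy stand-ins, not the paper's actual setup:

```python
def best_of_k(candidates, verify):
    """Return the first candidate the verifier accepts, else None.

    If a single call succeeds with probability p and the verifier is
    reliable, K independent samples succeed with probability 1-(1-p)**K,
    which is why the pattern parallelizes so well.
    """
    for c in candidates:
        if verify(c):
            return c
    return None

# Toy verifiable task: factor n. Checking a proposed factorization
# (one multiplication) is far cheaper than finding it.
n = 91
candidates = [(3, 30), (9, 10), (7, 13), (13, 7)]  # K "model guesses"

def verify(pair):
    a, b = pair
    return a * b == n and a > 1 and b > 1

print(best_of_k(candidates, verify))  # prints: (7, 13)
```

The key prerequisite is verifiability: when checking an answer is much cheaper than producing it, extra parallel model calls convert directly into accuracy.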

Notable Quotes

There’s a really fundamental disconnect between people’s mental image of what GPUs are today and what they actually are.

Jared Quincy Davis

Currently AI cloud is not cloud in the originally intended sense by any means… it’s basically colocation.

Jared Quincy Davis

One of the killer ideas of the cloud was that fast was free… you could run something 1,000 times faster for the same cost, and then give the compute back.

Jared Quincy Davis (paraphrasing early Snowflake thinking)

The percentage of the world’s GPU capacity owned by major public clouds is in the basis points.

Jared Quincy Davis

We haven’t yet elucidated the principles for how to construct networks of networks… these compound AI systems where you have many, many calls, maybe external components.

Jared Quincy Davis

Questions Answered in This Episode

How far can compound AI systems and verifiable best‑of‑K pipelines push performance before we truly need another 10x in raw compute?


What technical and contractual barriers must be overcome to safely and securely expose more of the world’s idle GPU and non‑GPU compute to AI workloads?

How might GPU financial instruments (futures, options, secondary markets) change risk management and business models for AI companies?

In practice, how should engineers decide when to train a new model versus composing multiple existing models and tools into a compound system?

What new abstractions and developer tools are needed so typical teams—not just elite research labs—can design, debug, and reliably run large “networks of networks” AI pipelines?

Transcript Preview

Sarah Guo

(music plays) Hi, listeners. Welcome to No Priors. Today we're talking to Jared Quincy Davis, the founder and CEO of Foundry. Jared worked at DeepMind and was doing his PhD with Matei Zaharia at Stanford before he began his mission to orchestrate compute with Foundry. We're excited to have him on to talk about GPUs and the future of the cloud. Welcome, Jared.

Jared Quincy Davis

Thanks, Sarah, and great to see you. Thanks a lot as well.

Elad Gil

Yeah, great seeing ya.

Sarah Guo

The mission at Foundry is directly related to some problems that you, uh, had seen in, in research and at DeepMind. Can you talk a little bit about the genesis?

Jared Quincy Davis

A couple of the most inspiring events I've witnessed in my career so far were the release of AlphaFold 2 and also ChatGPT. Um, you know, I think that one of the things that was so remarkable to me about AlphaFold 2 is initially it was a really small team, you know, three and then later 18 people or so, and they solved what was kind of a 50-year grand challenge in biology, which is a pretty remarkable fact that, you know, every university, every pharma company hadn't solved. And similarly with, with ChatGPT, a pretty small team, OpenAI was 400 people at the time, you know, released a system that really shook up the entire global business landscape. You know, that's a pretty remarkable thing, you know, and I think it's kind of intriguing to think about what would need to happen for those types of events to be a lot more common in the world. And, you know, although those events are really amazing 'cause of the small numbers of people working on them, I think, you know, it's not quite the David and Goliath story, neither are quite the David and Goliath story that they appear to be when you, when you double-click. In OpenAI's case, you know, there are only 400 people, but had $13 billion worth of compute, you know, which is, uh, quite a bit of computational scale there. And in DeepMind's case, it was a small team, you know, but obviously they were standing on the shoulders of giants in some sense with Google, right? And the leverage that they had via Google. And so, one thing I think that, you know, we thought about is, you know, what can we do to make the type of computational leverage and tools that are currently exclusively the domain of OpenAI and DeepMind kind of available to a much broader class of people? And so that's a lot of what we worked on with Foundry, saying, "C- can we build a public cloud, y- built specifically for AI workloads, where we reimagine a lot of the components that constitute the cloud end-to-end for first, from first principles? 
And in doing that, can we make things that currently cost a billion dollars cost 100 million, then 10 million, you know, over time?" And that'd be a pretty massive contribution. I think it would increase the frequency of events like AlphaFold 2 by 10X, 100X, or maybe even more super linearly. Um, and we're already starting to see the early signs of that, um, but quite a, you know, quite a lot of room left to push this agenda. So, really exciting. Um, so that's kind of maybe an initial introduction, preamble to how we thought about it, and I can trace that line of reasoning a bit more, but, um, that's kind of part of what we've done and...
