No Priors Ep. 3 | With Stability AI’s Emad Mostaque

AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models: Stability AI. Stability builds open AI tools with a mission to improve humanity. It is best known for Stable Diffusion, the model that generates images from a natural-language prompt, but it is also advancing models in natural language, voice, video, and biology. This week on the podcast, Emad Mostaque joins Sarah Guo and Elad Gil to talk about how this barely one-year-old, London-based company has changed the AI landscape, along with scaling laws, progress in different modalities, frameworks for AI safety, and why the future of AI is open.

00:00 - Introduction
02:00 - Emad’s background as one of the largest investors in video games and artificial intelligence
07:24 - Open-source efforts in AI
13:09 - Stability.AI as the only independent multimodal AI company in the world
15:28 - Computational biology, medical information and medical models
23:29 - Pace of Adoption
26:31 - AGI versus intelligence augmentation
31:38 - Stability.AI’s business model
37:44 - AI Safety

Sarah Guo (host), Emad Mostaque (guest), Elad Gil (host)

May 3, 2023 · 45m

CHAPTERS

0:00 – 2:15
From hedge funds to AI-for-good: autism, COVID, and the “moral imperative”
Emad shares how his computer science background and hedge fund career shifted toward applied AI after his son’s autism diagnosis and later COVID work with global institutions. He frames his motivation as making powerful AI capabilities broadly accessible rather than concentrated in a few labs.
- Was a major investor in video games and AI while working in hedge funds
- Used AI-driven literature review and biomolecular pathway analysis to help guide autism treatment
- Launched education tablet efforts for refugees and underserved communities
- COVID catalyzed an AI initiative with groups like the World Bank/UNESCO, revealing institutional barriers
- Concludes that superhuman narrow AI should be available to everyone
2:15 – 4:31
Backing early open communities: EleutherAI, compute grants, and the image-model spark
The conversation turns to how Emad began supporting open AI communities—especially EleutherAI—after struggling to access closed models for COVID research. He explains how text-guided image generation (CLIP + early generative models) convinced him open image models could unlock massive creative and economic value.
- Closed-model access issues pushed him toward supporting open alternatives
- Supported EleutherAI’s language-model wave while focusing heavily on image generation
- Personal angle: aphantasia and the appeal of externalized visualization via generative models
- Early experimentation (VQGAN+CLIP era) led to surprising real-world value (NFT sale + donation)
- Funded compute, jobs, and grants across the early image ecosystem (e.g., notebooks and tooling)
4:31 – 5:59
Why Stability started: coordinating open source, DAO attempt, and ‘commercial open source’
Emad describes Stability’s origin as an attempt to coordinate fast-growing, chaotic communities into a shared “common good.” After abandoning the DAO concept, he lands on commercial open source as a scalable way to align incentives while keeping core infrastructure accessible.
- Initial idea: coordinate communities via a DAO/DAO-of-DAOs; quickly found it impractical
- Shifted to commercial open source software to create aligned, widely available tech
- Sees scaling laws and private-lab dominance driving brain drain from academia
- Motivation: provide a third path between under-resourced academia and big tech labs
- Goal: make frontier capabilities broadly available while still enabling a viable business
5:59 – 9:10
Open models as infrastructure: the Stable Diffusion adoption shock and ‘national models’
Emad argues that foundation models are a new programming primitive: expensive pre-compute produces cheap, ubiquitous inference. He predicts open base models will become critical public infrastructure, enabling countries and institutions to build “Japan models,” “India models,” and more.
- Model training is done by small teams with major compute, so open source here differs from classic OSS
- Stable Diffusion as a deterministic ‘string-to-image’ infrastructure artifact (small weight file, huge impact)
- Explains rapid ecosystem adoption (GitHub momentum) as a sign of a new computing primitive
- Predicts closed ecosystems eventually get commoditized when open alternatives arrive
- Positions governments as key stakeholders in foundation-model infrastructure spending
9:10 – 11:37
Compute strategy and the future split: open foundation models vs private instruct/fine-tuning
Elad presses on whether capital intensity will force AI to stay closed; Emad counters with access to national supercomputers and public infrastructure. He outlines a phased model: open deep-learning base models, then private-sector strength in RLHF/instruct, fine-tuning, and clinical or enterprise deployment.
- Claims access to significant public/national compute and involvement in exascale efforts
- Views foundation models as broadly open infrastructure during the ‘deep learning phase’
- Describes successive phases: base model → instruct/RLHF → domain fine-tuning → human-in-the-loop practice
- Uses PaLM → FLAN-PaLM → Med-PaLM as an example of performance gains via instruction + domain data
- Argues model-combination strategies may beat one-size-fits-all multilingual monoliths
11:37 – 14:41
Inside Stability’s scope: independent multimodal AI and “customizable, editable” models
Emad positions Stability as an independent multimodal player spanning image, audio, language, code, and more. He emphasizes optimization, distillation, and deployability (including on-device) over simply scaling model size.
- Claims Stability is the only independent multimodal AI company (vs big integrated labs)
- Works across modalities: audio, language, code, and open equivalents of closed/private models
- Stable Diffusion optimized for accessible hardware; roadmap includes fast, edge-capable variants
- Focus on customization/editability and efficiency rather than only bigger models
- Highlights data quality and optimization cycles as key levers (references Chinchilla-style thinking)
14:41 – 17:31
Computational biology and medical models: OpenFold, DNA diffusion, and a ‘Med Stable GPT’
The discussion drills into biology and medicine as a major future impact area. Emad lays out plans for open medical language models and ongoing work in protein folding and molecular/chemical modeling, aiming to standardize and align incentives across the ecosystem.
- Plans to release an open medical model (“Med Stable GPT”)
- Key involvement in OpenFold with speed/ablation improvements relative to AlphaFold approaches
- Mentions DNA diffusion for predicting outcomes from sequences
- BioLM efforts for chemical reactions and broader biotech modeling
- Sees a role for Stability in aligning incentives and creating shared foundations across bio stakeholders
17:31 – 19:42
Why healthcare adoption has lagged: infrastructure barriers, trust, and building ‘from the bottom up’
Elad raises the historical gap between technical capability and real-world medical adoption. Emad argues that entrenched infrastructure and incentive structures block deployment, and proposes greenfield systems—especially in underserved regions—as the fastest path to meaningful integration.
- Healthcare tech adoption historically slow despite early expert-system successes (e.g., MYCIN)
- Open source complex systems may lower barriers vs purely private deployments
- Advocates human-in-the-loop deployment where appropriate, but wants systemic redesign
- Highlights deploying education/health stacks in low-infrastructure environments as a wedge
- Believes competitive pressure from new systems can force modernization in Western healthcare
19:42 – 22:36
Pace of adoption and leapfrogging: Asia/Africa, schools reacting to ChatGPT, and ROI dynamics
Sarah and Emad compare AI’s adoption curve to mobile leapfrogging in East Asia. Emad describes governments and schools scrambling to respond, arguing emerging markets may embrace generative AI faster due to greenfield opportunity and favorable cost/ROI profiles.
- Analogy: mobile leapfrogging suggests emerging markets may leapfrog to generative/personalized AI
- Anecdote: school leaders calling about students using ChatGPT for homework; assessment methods shifting
- Contrasts defensive reactions (bans/pen-and-paper) with proactive adoption strategies in parts of Asia
- Emphasizes training costs falling and high ROI for “ChatGPT for everyone” style deployment
- Cites early education results in refugee contexts as proof of rapid compounding gains
22:36 – 25:43
The future of media: making visual communication and creativity universally accessible
Emad explains why media is a strategic edge: image/video/audio generation changes who can create and how. He ties the opportunity to human wellbeing (expression, art therapy) and to massive production efficiencies already visible in studios.
- Frames visual communication as historically ‘hard’ compared to text; AI changes that dramatically
- Personal background: film awards involvement and major video game investing informs his focus
- Predicts UGC shifts from one shared “metaverse” to many shareable personal worlds
- Notes production tooling improvements (editing, asset generation, new 3D methods)
- Claims studios are already saving substantial costs by integrating Stable Diffusion workflows
25:43 – 27:41
AGI vs intelligence augmentation: multimodality, model swarms, and human alignment
Sarah asks about AGI; Emad deprioritizes it except for safety risk, focusing on augmenting humans. He challenges the “one giant model” approach and suggests a future of many specialized models combined—potentially more diverse and more aligned.
- Stability’s mission framed as “activate humanity’s potential” rather than AGI for its own sake
- Belief that multimodality matters more than some AGI roadmaps admit
- Two paradigms: scale monoliths vs combine many specialized models
- Observation: combining many fine-tuned diffusion models yields emergent quality improvements
- Argues a plural “hive” of models may be more human-aligned than a single forced-to-comply system
27:41 – 31:06
Language-model strategy: open ChatGPT equivalents, efficiency breakthroughs, and smaller models
Emad outlines Stability’s language efforts: popular open LMs, RLHF/instruct tooling, and training up to ~100B parameters. He stresses that optimization, instruction, and data quality can outperform brute-force scaling—mirroring what happened with Stable Diffusion inference speed.
- References adoption of GPT-Neo, GPT-J, and GPT-NeoX and ongoing training of larger open models
- CarperAI lab + trlX/instruct framework aimed at making models more helpful/aligned
- Argues Chinchilla-style efficiency and instruction can reduce the need for extreme parameter counts
- Notes evidence that small instructed models can beat much larger base models on key tasks
- Predicts a major optimization cycle for LMs similar to SD’s rapid inference speedups
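The “Chinchilla-style efficiency” point can be made concrete with a back-of-the-envelope sketch. It is not from the episode: it assumes the commonly cited approximations that training compute is about 6 FLOPs per parameter per token and that a compute-optimal run uses about 20 training tokens per parameter. Both constants and the function names are illustrative assumptions, not Stability’s method.

```python
import math

# Assumed rules of thumb (not stated in the episode):
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ~ 6 * params * tokens
TOKENS_PER_PARAM = 20.0       # compute-optimal tokens ~ 20 * params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a dense model."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Split a compute budget into (params, tokens) under the two assumptions above.

    Solving C = 6 * N * D with D = 20 * N gives N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 1.2e22  # hypothetical compute budget in FLOPs
    n, d = compute_optimal(budget)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions, a fixed budget is better spent on a smaller model trained on more tokens than on a much larger model trained on fewer, which is the intuition behind the argument against extreme parameter counts.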
31:06 – 32:35
Business model: open benchmarks plus private, regulated, and licensed models (media + enterprise)
Elad asks how Stability monetizes while keeping foundations open. Emad describes a hybrid: open base models as benchmarks and infrastructure, alongside bespoke training for nations, large corporates, and licensed media datasets (e.g., region-specific video/music models).
- Exclusive licensing strategy for certain media datasets (e.g., Bollywood video models)
- Partnership approach to enable customers to train on their own data (e.g., cloud/enterprise workflows)
- Core promise: help entities that lack the expertise to train and operate large models safely
- Open models serve as optimized reference points; private models address on-prem and regulated needs
- Sees complementarity with big-tech APIs: convenience vs. control, privacy, and customization
32:35 – 36:44
Democracy, trust, and ‘models as fiction’: AI as a filter layer over information
Emad explores AI’s role in translating complex information for citizens and potentially enabling more direct, informed participation. He also warns that compressed models can preserve patterns while losing factual fidelity, so society must treat them carefully as decision-making “coordinators.”
- Envisions national/public interfaces that help citizens understand issues and reduce manipulation
- Personalization: models can adapt explanations to different ages/contexts and user preferences
- Discusses multi-model systems (e.g., diplomacy agents) as a path to better coordination
- Warns LMs should often be treated as creative/fictional generators, not truth engines
- Highlights the risk of ceding authority to opaque optimizers without robust safeguards
36:44 – 44:26
AI safety and regulation: large-model registration, data rights, deepfakes, and defense risks
The closing segment tackles safety across alignment, content harms, political bias, and national security. Emad advocates regulation focused on very large training runs, shared evaluation expertise, opt-out/opt-in data mechanisms, and provenance standards to combat misinformation and deepfakes.
- Rejects the idea that technology is neutral; models reflect embedded biases and choices
- Supports regulation: register/monitor models above a compute (FLOPs) threshold, akin to dual-use regimes
- Calls for international expert teams to define tests, standards, and evaluation methods
- Advocates opt-out/opt-in and attribution mechanisms for training data (artists, creators, sensitive sources)
- Deepfakes/misinformation: favors metadata/provenance standards (e.g., authenticity initiatives) and warns of military autonomy risks
44:26 – 45:02
Controversial five-year prediction: edge ‘hive’ models beat giant models
Sarah asks for a bold prediction to close. Emad predicts smaller, combined models will outperform giant single models, and that ChatGPT-level capability will run locally on smartphones within five years.
- Small/medium models + combination strategies will outperform large monoliths
- Edge deployment becomes mainstream: powerful assistants on-device
- Continues the theme of accessibility, customization, and efficiency over raw scale
