No Priors Ep. 96 | With Modal CEO and Founder Erik Bernhardsson

Today on No Priors, Elad chats with Erik Bernhardsson, founder and CEO of Modal Labs, a platform simplifying ML workflows by providing a serverless infrastructure designed to streamline deployment, scaling, and development for AI engineers. Erik talks about his early work on Spotify’s ML algorithms, what Modal offers today, and his vision for building an end-to-end solution for AI engineers. They dive into GPU trends, cloud vs on-premise setups, and when to train custom models vs use off-the-shelf solutions. Erik also shares his thoughts on the evolving role of AI in fields like coding, physics, and music.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Bernhardsson

Show Notes:
0:00 Introduction
0:22 Erik's early interest in ML infra
1:22 Founding Modal Labs
4:17 State of GPU use today and what’s to come
7:14 Modal's end-to-end vision
9:00 Differentiating amongst competition
10:20 Cloud vs on-premise
12:35 Popular AI models
13:20 Gaps in AI infrastructure
14:55 Insights on vector databases
16:48 Training models vs off-the-shelf models
17:47 AI’s impact on coding and physics
22:14 AI's impact on music

Elad Gil (host) | Erik Bernhardsson (guest) | Sarah Guo (host)

Jan 9, 2025 | 23m

CHAPTERS

0:00 – 1:03
Erik’s background: Spotify recommender systems and early ML infrastructure
Elad introduces Erik Bernhardsson and his work leading Spotify’s machine learning efforts. Erik reflects on how little off-the-shelf data infrastructure existed in the late 2000s and how that pushed him to build internal tools.
- Early days at Spotify (starting 2008) building recommendation systems
- Lack of mature data/ML infrastructure at the time (Hadoop era)
- Built foundational internal tools like Luigi (workflow scheduler)
- Built an early vector database effort at Spotify
- Pattern of repeatedly needing to build infra to get ML done
1:03 – 1:28
Why Modal: developer productivity and making cloud feel like local
Erik explains the motivation to found Modal after time at Better.com and a period of experimentation during the pandemic. The core insight: cloud development is powerful but frustrating, and ML teams need tighter feedback loops.
- Better.com experience shaped focus on developer productivity
- Pandemic “hacking time” clarified what he wanted to build
- Cloud workflows feel annoying compared to local iteration
- Goal: fast feedback loops and ergonomics for ML/data workflows
- Modal’s genesis as an infra-first company for AI/ML
1:28 – 2:28
Building the foundation: why Modal ditched Kubernetes and built its own stack
Erik describes Modal’s early engineering decision to avoid the standard Docker/Kubernetes approach, which could not deliver the developer experience they wanted. The first years were spent building core primitives like a filesystem, scheduler, and container runtime to reduce friction and cold-start latency.
- Couldn’t hit desired UX/performance with Docker + Kubernetes
- Built a custom filesystem early on
- Built a custom scheduler for running workloads
- Built a custom container runtime for fast startup
- Two-year focus on foundational infra to enable the product vision
2:28 – 4:16
What Modal offers today: serverless GPU/CPU + Python SDK, traction via Stable Diffusion
Modal is positioned as infrastructure-as-a-service with a large multi-tenant compute pool and a Python SDK that turns functions into serverless cloud workloads. Adoption accelerated when Stable Diffusion drove demand for easy, on-demand GPU inference without provisioning overhead.
- Operates a large multi-tenant pool of GPUs/CPUs
- Can provide large GPU bursts (e.g., ~100 GPUs) quickly
- Python SDK abstracts containerization and infra management
- Stable Diffusion became an early “killer app” for serverless genAI
- Expanding beyond images into audio/music; Suno as an example customer
4:16 – 6:57
GPU flexibility and utilization: why on-demand matters most for inference
Elad and Erik discuss why GPU procurement has drifted toward long-term commitments due to scarcity, creating waste and planning pain—especially for startups. Erik argues inference workloads are volatile and benefit from usage-based pricing and pooled capacity to avoid over/under-provisioning.
- GPU scarcity pushed the market toward long-term commitments
- Startups often can’t predict demand; commitments are a poor fit
- Training can be planned; inference demand is bursty and uncertain
- Usage-based billing: pay only when containers run
- Multi-tenant pooling smooths utilization across customers
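The trade-off between a fixed commitment and usage-based billing can be illustrated with a little arithmetic. The sketch below is not Modal's pricing model; the hourly rate, the demand trace, and the function names are all hypothetical, chosen only to show why bursty inference demand tends to favor paying per use.

```python
# Hypothetical numbers throughout: illustrative only, not real prices.

def reserved_cost(peak_gpus: int, hours: int, rate_per_gpu_hour: float) -> float:
    """Cost of reserving enough GPUs to cover peak demand for every hour."""
    return peak_gpus * hours * rate_per_gpu_hour


def usage_cost(demand_per_hour: list[int], rate_per_gpu_hour: float) -> float:
    """Cost when billed only for the GPU-hours actually used."""
    return sum(demand_per_hour) * rate_per_gpu_hour


def utilization(demand_per_hour: list[int], peak_gpus: int) -> float:
    """Fraction of the reserved GPU-hours that did useful work."""
    return sum(demand_per_hour) / (peak_gpus * len(demand_per_hour))


# A bursty day: mostly quiet, with a four-hour spike to 100 GPUs.
demand = [2] * 20 + [100] * 4
rate = 2.0  # hypothetical dollars per GPU-hour, same for both options

peak = max(demand)
reserved = reserved_cost(peak, len(demand), rate)
on_demand = usage_cost(demand, rate)

print(f"reserved for peak: ${reserved:.2f}")
print(f"usage-based:       ${on_demand:.2f}")
print(f"utilization of reserved capacity: {utilization(demand, peak):.1%}")
```

The example assumes the same hourly rate for both options. In practice on-demand capacity usually costs more per hour than committed capacity, so the comparison depends on how spiky the workload really is.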
6:57 – 7:44
Beyond inference: pre-processing today and bursty training tomorrow
Erik highlights how customers already use Modal for batch preprocessing (sometimes GPU-based feature extraction) even if training happens elsewhere. Modal is interested in expanding toward training, focusing on shorter, experimental runs rather than massive frontier-model training.
- Modal’s historical center of gravity: inference
- Customers use Modal for large-scale batch preprocessing/feature extraction
- Common workflow: preprocess on Modal, train elsewhere, infer on Modal
- Training product interest: short, bursty, experimental runs
- Frontier-scale, months-long training is seen as a different market
7:44 – 9:12
End-to-end platform vision: serving the full ML lifecycle for “high-code” teams
Erik lays out Modal’s ambition to support the full ML lifecycle—from data preprocessing to training to inference and feedback loops—aimed at engineers writing custom code. The emphasis is on enabling flexible, programmable workflows rather than narrow, single-model products.
- Platform goal: cover ML lifecycle end-to-end
- Target audience: ML engineers building/customizing models and workflows
- Includes pipelines, nightly batch jobs, and feedback loops
- Positioned against point solutions focused only on LLMs or inference
- Platform breadth enabled by a maturing compute layer
9:12 – 10:22
How Modal differentiates: cloud-native multi-tenancy and general-purpose custom code
Erik argues Modal’s core differentiation is being cloud-maximalist and fully multi-tenant, enabling better capacity management and fast GPU bursts. A second differentiator is generality: safely running arbitrary user code with fast cold starts required deep infra investment.
- Cloud-native, multi-tenant design enables superior capacity management
- Instant access to large GPU quantities supports bursty workloads
- General-purpose platform for custom code (not just single-purpose inference)
- Hard problems: safe code execution, cold starts, fast container boot
- Custom infra (scheduler/runtime/filesystem) as a strategic moat
10:22 – 12:39
Cloud vs hyperscalers vs enterprise constraints: security, egress, and adoption tailwinds
Elad raises enterprise concerns: existing hyperscaler commitments, latency, and security reviews. Erik compares the shift to cloud and Snowflake’s model, arguing multi-tenancy will continue to gain acceptance, with improvements in app-layer security and strategies to reduce bandwidth/egress friction.
- Enterprises prefer staying within existing AWS/Azure/GCP environments
- Challenges: compliance/security approvals, latency, and data movement costs
- Analogy: early skepticism about cloud and later mainstream adoption
- Snowflake-like infra-as-a-service precedent for multi-tenant acceptance
- Tailwinds: app-layer security, lower bandwidth costs, zero-egress options (e.g., R2)
12:39 – 13:24
Popular models and modalities: Flux, and the underexplored opportunity in audio
Erik notes a perceived shift toward more proprietary models, while calling out Flux as a model getting attention. He’s particularly interested in audio as a modality with significant room for open-source innovation, even if he hasn’t seen the breakout yet.
- Recent popularity: Flux cited as drawing attention
- Trend: tilt toward proprietary or semi-open models
- Audio viewed as underexplored relative to text/image
- Open-source opportunity remains large in audio
- Expectation of meaningful new model development in the space
13:24 – 14:59
What’s missing in AI infrastructure: custom workflows, storage evolution, and less bandwidth-hungry training
Erik argues a major gap is a truly ergonomic way for engineers to run custom AI/data code without heavy platform overhead. He also points to open questions in vector storage and training infrastructure, especially reducing networking intensity so distributed training can be more flexible across locations.
- Need for better infra to run custom code and bespoke workflows easily
- Outside LLMs, many applications still require training/running custom models
- Unresolved storage questions: vector DB evolution and training data storage efficiency
- Training today often requires expensive networking setups (e.g., InfiniBand)
- If training becomes less bandwidth hungry, it could reshape training infrastructure
14:59 – 16:52
Vector databases debate: pgvector vs specialized systems, and an AI-native rethink of storage
The conversation turns to whether standalone vector databases are necessary versus extending relational databases like Postgres. Erik suggests the bigger opportunity is an “AI-native” storage interface where the system handles embedding and multimodal inputs directly, implying current DB metaphors may be limiting.
- Ongoing debate: specialized vector DBs vs Postgres + pgvector
- Erik is uncertain the long-term answer is “a database” as we know it
- Potential shift: the storage system itself generates embeddings
- AI-native storage could accept text/images/video directly for search
- Expectation: the space may take 5–10 years to clarify
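To make the idea concrete: a vector search is at its core a nearest-neighbor lookup over embeddings, and the "AI-native" variant discussed here would have the store compute those embeddings itself. The following is a toy sketch of that interface, not pgvector or any real vector database API. `toy_embed` is a hypothetical stand-in (character-bigram counts) for a real embedding model, and `TextStore` is an invented name.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: character-bigram counts."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TextStore:
    """Accepts raw text and embeds it internally, so callers never handle vectors."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, toy_embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Brute-force scan: rank every stored item by similarity to the query.
        q = toy_embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TextStore()
store.add("serverless GPU inference")
store.add("weather simulation with deep learning")
store.add("AI-generated music")

top = store.search("GPU inference serving")
print(top)
```

The brute-force scan is the part that specialized systems and extensions such as pgvector replace with an index; the sketch only shows the shape of an interface where text goes in and the embedding step is hidden.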
16:52 – 17:51
Build vs buy models: when training your own becomes the moat
Elad asks for heuristics on training custom models versus using off-the-shelf options. Erik argues that for companies where model quality is central, especially beyond LLMs in audio/video/image, owning a superior model can be the most defensible moat; otherwise, differentiation must come from elsewhere in the stack.
- If model quality is core, relying only on third-party models can weaken defensibility
- Owning a clearly better model can serve as a durable moat
- If not training, the moat must come from another layer (product, data, workflow, distribution)
- More obvious need to train in non-LLM modalities (audio/video/image)
- Modal’s customers often skew toward teams building their own models
17:51 – 19:08
AI and developer productivity: compilers-to-cloud continuum and rising demand for engineers
Erik frames AI coding tools as another step in a long history of productivity improvements in software engineering. He predicts increased productivity will unlock more software demand rather than reduce the need for engineers.
- AI coding seen as part of ongoing tooling evolution (compilers, languages, frameworks, cloud)
- Historical pattern: productivity jumps don’t reduce headcount; demand grows
- “Latent demand” for software expands as building becomes cheaper
- Bullish view on long-term software engineering demand
- AI as an accelerant rather than a replacement
19:08 – 22:15
Physics, simulation, weather, and biotech: where ML meets scientific compute
Elad and Erik discuss opportunities for AI in physics-adjacent simulation and scientific workloads. Erik points to meteorology as a domain where deep learning could help with hard-to-model phenomena like turbulence, and notes increasing Modal usage in computational biology and medical imaging pipelines.
- Discussion of physics and simulation as promising AI application areas
- Weather/turbulence prediction highlighted as a good DL fit
- References to published work from NVIDIA/Google in weather simulation
- Biotech/computational biology cited as a thriving area (e.g., protein-related advances)
- Modal usage examples: medical imaging and computer vision over microscopy data
22:15 – 23:36
AI-generated music as a frontier product: why Suno feels uniquely new
Erik highlights AI-generated music as a compelling example of new human-impact products enabled by modern generative models. He notes rapid iteration is improving quality and positions music as historically an early proving ground for technology shifts.
- Suno as an exciting example of large-scale genAI inference on Modal
- Current outputs still show “uncanny valley,” but improving quickly
- Music often demonstrates tech shifts early (from piracy to streaming to new formats)
- AI music represents a product category that couldn’t exist five years ago
- Closing remarks and wrap-up
