No Priors Ep. 6 | With Daphne Koller from Insitro

Life-saving therapeutics continue to grow more costly to discover. At the same time, recent advances in using machine learning for the life sciences and medicine are extraordinary. Are we on the verge of a paradigm shift in biotech? This week, AI pioneer Daphne Koller joins Sarah Guo and Elad Gil to help us explore that question. Daphne is the CEO and founder of insitro, a company that applies machine learning to pharma discovery and development, specifically by leveraging “induced pluripotent stem cells.” We explain insitro’s approach, why they’re focused on generating their own data, why you can’t cure schizophrenia in mice, and how to design a culture that supports both research and engineering. Daphne was previously a computer science professor at Stanford, and co-founder and co-CEO of edtech company Coursera.

00:00 - Introduction
01:49 - How Daphne combined her biology and tech interests and ran a bifurcated lab at Stanford
04:34 - Why Daphne resigned an endowed chair at Stanford to build Coursera
14:14 - How insitro approaches target identification problems and training data
18:33 - What are pluripotent stem cells and how insitro identifies individual neurons
24:08 - How insitro operates as an engine for drug discovery and partners to create the drugs themselves
26:48 - Role of regulations, clinical trials and disease progression in drug delivery
33:19 - Building a team and workplace culture that can bridge both bio and computer sciences
39:50 - What Daphne is paying attention to in the so-called golden age of machine learning
43:12 - Advice for leading a startup in edtech and healthtech

Sarah Guo (host) · Daphne Koller (guest) · Elad Gil (host)

May 19, 2023 · 46m

CHAPTERS

0:00 – 1:25
From ML research to biology: finding richer datasets and running a “two-worlds” Stanford lab
Daphne explains how her initial attraction to biology came from the technical richness of emerging genomic datasets in the mid-90s. She describes building a bifurcated Stanford lab where one half published core ML work and the other half published top-tier biology, often with colleagues unaware of her dual identity.
- Biology offered more technically interesting datasets than classic ML benchmarks at the time
- Early genomics enabled measuring gene activity across the whole genome
- Her Stanford lab split between core ML and core biology research
- Publishing in both CS venues and Nature/Cell/Science created a “bifurcated existence”
- Cross-disciplinary work was fun but culturally segmented
1:25 – 3:08
Probabilistic graphical models vs. deep learning: the pendulum swings back toward reasoning and causality
She reflects on the rise of probabilistic graphical models as a bridge from symbolic AI to data-driven ML, and how deep learning later overshadowed interpretability-focused approaches. Daphne argues the field is now moving toward synthesis—combining deep pattern recognition with causal reasoning and interpretability requirements, especially in clinical settings.
- PGMs helped shift AI toward numerical, statistical machine learning
- Deep learning reduced emphasis on explicit, interpretable representations
- Modern needs include causality, reasoning, and explainability
- Clinical adoption pressures models to justify decisions
- Future paradigm is a hybrid of deep learning and structured reasoning
3:08 – 4:34
Leaving Stanford for Coursera: urgency to create direct real-world impact
Daphne recounts feeling increasing urgency to affect real people beyond academia. The early Stanford MOOCs revealed massive potential impact, motivating her to take a leave to co-found Coursera and “make sure it was done right.”
- Desire for personal, direct societal impact beyond training students
- Stanford’s early MOOCs demonstrated transformative scale
- Decision to step out of academia to build a real product/company
- Initial plan was a temporary leave, not a permanent exit
- Impact and execution quality were key motivators
4:34 – 6:04
Resigning an endowed chair and resurfacing in a transformed ML landscape
She describes how Stanford’s strict leave policy forced a dramatic choice—resigning an endowed chair to remain at Coursera. After ~5 years, she looked up to see the 2012+ ML revolution had reshaped tech, but life sciences had not benefited nearly as much, creating a compelling new mission.
- Stanford leave limits led to a forced decision point
- Resigned an endowed chair to stay in industry
- While building Coursera, ML underwent a major revolution
- Noted ML transforming many sectors but lagging in life sciences
- Opportunity: rare ability to speak both ML and biology languages
6:04 – 10:09
Calico to insitro: learning biopharma realities and choosing a platform path
Daphne explains seeking advice from Art Levinson, which led her to Calico for an intense learning period. She left after realizing she wanted to build a broad drug-discovery platform rather than focus on a single biology area like aging—where human longitudinal data is especially slow and scarce.
- Sought mentorship from Art Levinson; joined Calico
- Calico offered exposure to elite biotech leadership and practices
- Key insight: biopharma translation from insight to therapy is old-fashioned and data-poor
- Aging is important but hard due to long timelines and limited human cohorts
- Founded insitro to build a scalable platform for drug discovery
10:09 – 13:52
Where AI can move the needle most: reducing failure by fixing target selection
Daphne and Elad map AI opportunities across drug discovery—biology/target discovery, molecular design, and clinical enablement. She argues the biggest cost driver is the 95% failure rate, largely due to wrong targets/indications/patient populations, making target identification the highest-leverage (and hardest) place to focus.
- AI could help across target discovery, molecule design, and trials/endpoints
- Drug development is slow, expensive, and high-failure
- Most failures come from modulating the wrong biology, not just molecule issues
- Improving target/indication/patient selection can reduce downstream failures
- High-leverage focus: target identification despite limited direct labels
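The link between failure rate and cost can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes every program costs the same and succeeds independently, and the function names, the improved success rate, and the unit cost are hypothetical rather than figures from the episode. Only the 95% failure rate comes from the summary above.

```python
def programs_per_approval(success_rate: float) -> float:
    """Expected number of programs started per approved drug."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def cost_per_approval(cost_per_program: float, success_rate: float) -> float:
    """Expected spend per approval, assuming every program costs the same."""
    return cost_per_program * programs_per_approval(success_rate)


# A 95% failure rate (from the summary) means a 5% success rate.
baseline_success = 0.05
# Hypothetical improvement from better target selection; not a figure from the episode.
improved_success = 0.10
# Hypothetical cost per program, in arbitrary units.
unit_cost = 1.0

print("programs per approval at baseline:", programs_per_approval(baseline_success))
ratio = cost_per_approval(unit_cost, improved_success) / cost_per_approval(unit_cost, baseline_success)
print("relative cost per approval after improvement:", ratio)
```

Under these simplifying assumptions, cost per approval scales with the inverse of the success rate, which is why a modest gain in target selection has an outsized effect on total spend.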
13:52 – 16:55
insitro’s two-pronged data strategy: human “experiments of nature” plus engineered cellular perturbations
She details the core challenge: target ID lacks obvious training labels because clinical outcomes arrive only at the end of trials. insitro combines human genetic/phenotypic data with large-scale wet-lab perturbation datasets, using ML to model genotype-to-phenotype mappings and build cellular systems predictive of human outcomes.
- Target ID is hard because clinical outcome labels come late
- Use human genetics as “experiments of nature” (genotype → phenotype)
- Generate proprietary perturbation data via genome editing in cells
- Apply ML to high-content modalities: imaging, transcriptomics, proteomics
- Goal: predictive human-relevant cellular models to reduce reliance on animal models
16:55 – 18:35
Choosing therapeutic areas: unmet need + differentiated data availability (neuro, metabolism, oncology)
Daphne explains how insitro selects domains where current tools are ineffective and where their platform offers unique advantage. Neuroscience stands out due to poor animal-model translatability and strong potential for stem-cell-derived neuron models; metabolism and oncology benefit from abundant high-content clinical data collected in standard care.
- Selection criteria: big unmet need and unique platform advantage
- Neuroscience: huge unmet need; animal models often fail to translate
- iPSCs can be differentiated into neurons with disease-relevant phenotypes
- Human data availability matters (e.g., brain MRIs)
- Metabolism and oncology offer rich, routine, disease-relevant datasets
18:35 – 21:31
Primer on iPSCs and neuron creation: reprogramming, pluripotency, and in-vitro ‘A/B tests’
In accessible terms for a tech audience, Daphne describes making patient-derived neurons from blood or skin cells via reprogramming into induced pluripotent stem cells (iPSCs). She explains how genome editing enables controlled comparisons—like A/B tests—to isolate effects of disease-linked variants across genetic backgrounds.
- Start from blood or skin cells; reprogram into iPSCs (Nobel-recognized tech)
- Pluripotent cells can become neurons, liver cells, heart cells, etc.
- Patient genetics are preserved in derived cellular models
- Genome editing introduces/removes variants for controlled comparisons
- Framing: in-vitro case/control is analogous to an A/B test
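For readers from a tech background, the A/B-test framing can be sketched as a two-group comparison: one cell line carries the variant, the other is the unedited control, and the question is whether a measured phenotype differs between them. The example below uses a permutation test as one simple way to make that comparison. It is not a description of insitro's actual analysis, and the readout values and names are synthetic and hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(control, edited, n_permutations=5_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(edited) - mean(control))
    pooled = list(control) + list(edited)
    n_control = len(control)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign measurements to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic phenotype readouts in arbitrary units, invented for illustration.
unedited_line = [1.02, 0.98, 1.05, 0.97, 1.01, 0.99]
variant_line = [1.21, 1.18, 1.25, 1.15, 1.22, 1.19]

p = permutation_p_value(unedited_line, variant_line)
print(f"p = {p:.4f}")
```

Because the two lines share the same genetic background apart from the edit, a difference that survives this kind of test can be attributed to the variant rather than to donor-to-donor variation, which is the point of the A/B analogy.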
21:31 – 24:04
Pragmatism about disease complexity: when single-lineage models work, and when to wait for organoids/organs-on-chips
Daphne addresses limitations of reductionist models for complex diseases like neurodegeneration and multi-system metabolic disorders. insitro often starts where single-lineage models capture enough signal (sometimes by stressing cells to mimic disease conditions), while tracking emerging technologies like organoids and organs-on-chips to expand scope later.
- Many diseases involve multiple cell types and system interactions
- Single-lineage (one cell type) models can still be informative if designed carefully
- Cells can be pushed into disease-like states with environmental factors
- Organoids and organs-on-chips are promising but still maturing
- Strategy: deliver value now; expand as enabling tech improves
24:04 – 26:49
Platform-to-program strategy: partnering, licensing, and accelerating via existing assets
Elad asks how far insitro goes in drug development versus partnering. Daphne explains insitro is an “engine” for targets and insights, allowing flexible partnering or out-licensing without emptying the pipeline, and sometimes collaborating around existing drugs repurposed to the right indication/patient population to save years of development time.
- insitro views itself as an engine that continuously generates targets/programs
- Partnerships and out-licensing are viable without undermining the company’s core
- Sometimes the best path uses an existing drug aimed at a newly identified context
- Leveraging existing safety/biomarker work can shave 2–5 years
- Pragmatic objective: maximize patient impact rather than own everything
26:49 – 30:10
Biomarkers and patient stratification: why genetics + ML can double clinical success odds
They dig into biomarkers as a key lever for faster, more successful trials. Daphne cites evidence that biomarkers and human-genetics-backed targets each roughly double success probabilities, argues biomarkers often fall out of human-data-centric discovery, and emphasizes that “all-comer” trials can mask true efficacy—illustrated by the Herceptin example.
- Biomarkers are strongly associated with higher clinical success rates
- Human genetics support also correlates with better program success
- Human clinical data analysis can yield stratification signals and biomarkers
- All-comer trials can fail even when a drug works for a subset
- Herceptin’s success depended on selecting HER2+ patients
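The way an all-comer trial can mask efficacy comes down to a weighted average. The sketch below assumes a drug that helps only biomarker-positive patients and computes the average effect across an unselected population. The responder fraction and effect size are hypothetical numbers chosen for illustration, not Herceptin data or figures from the episode.

```python
def all_comer_effect(responder_fraction: float,
                     effect_in_responders: float,
                     effect_in_others: float = 0.0) -> float:
    """Average treatment effect across an unselected trial population."""
    if not 0 <= responder_fraction <= 1:
        raise ValueError("responder_fraction must be in [0, 1]")
    return (responder_fraction * effect_in_responders
            + (1 - responder_fraction) * effect_in_others)


# Hypothetical: 20% of enrolled patients carry the biomarker.
fraction_biomarker_positive = 0.2
# Hypothetical effect size in biomarker-positive patients, in arbitrary units.
effect_if_positive = 0.5

diluted = all_comer_effect(fraction_biomarker_positive, effect_if_positive)
stratified = all_comer_effect(1.0, effect_if_positive)
print("all-comer average effect:", diluted)
print("biomarker-selected average effect:", stratified)
```

With these assumed numbers the all-comer average is a fifth of the effect seen in the selected group, so a trial sized for the larger effect could easily miss it.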
30:10 – 33:56
Regulation, trial timelines, and the limits of speed: learning from COVID without ignoring biology
Elad asks whether ML-first approaches can dramatically shorten drug development or if regulation dominates timelines. Daphne argues acceleration requires early, proactive regulator engagement and robust real-world biomarker collection, but notes that some timelines are constrained by biology and disease progression (e.g., Alzheimer’s requires long readouts), making proxies imperfect and patient heterogeneity critical.
- Regulators have valid concerns: robustness and reproducibility of biomarkers in the wild
- Engaging regulators early can enable faster paths than end-stage negotiations
- COVID sped trials due to high case counts and rapid disease progression
- Chronic diseases have intrinsic timeline constraints for clinical endpoints
- Proxy biomarkers (e.g., amyloid) may work for subsets; stratification remains key
33:56 – 40:58
Building a culture that bridges engineers and scientists: mindsets, processes, and behavioral norms
Daphne details how difficult—and defensible—it is to build an organization spanning computer science and biology. She contrasts engineering predictability with biological variability, highlights differing approaches to patterns vs. outliers, and explains how insitro uses both product and project management while anchoring culture in explicit norms: open, constructive, respectful engagement.
- Bridging biology and engineering requires a strong learning mindset on both sides
- Biology is variable and messy; experiments can fail for unexpected reasons
- Engineers seek general patterns; scientists often chase outliers for discovery
- Use appropriate processes: agile for platform work, traditional timelines for long wet-lab cycles
- Codified cultural norm: engage openly, constructively, and with respect
40:58 – 46:57
The ‘golden age’ of AI applied to real-world benefit: bio + tech beyond drug discovery, plus founder advice
Daphne shares what excites her beyond insitro: applying AI to domains that tangibly improve lives. She points to the underappreciated revolution in biological toolkits (CRISPR, stem cells, microscopy) and their combination with AI across agriculture, environment, energy, biomaterials, food tech, and education—then closes with advice to founders to prioritize differentiated ideas and meaningful impact even in tougher sectors like edtech and healthtech.
- Personal compass: deploy AI to improve lives, not just optimize ads
- Biology’s toolkit is advancing rapidly (CRISPR, iPSCs, microscopy)
- AI+bio opportunities extend to agriculture, environment, energy, materials, food tech
- Education remains ripe for personalization beyond superficial uses like essay-writing bots
- Founder advice: strong differentiation + clear impact can attract capital; optimize for life value, not only certainty of returns
