Daphne Koller: Biomedicine and Machine Learning | Lex Fridman Podcast #93

Daphne Koller is a professor of computer science at Stanford University, a co-founder of Coursera with Andrew Ng and Founder and CEO of insitro, a company at the intersection of machine learning and biomedicine.

Support this podcast by signing up with these sponsors:
- Cash App - use code "LexPodcast" and download:
- Cash App (App Store): https://apple.co/2sPrUHe
- Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:
Daphne's Twitter: https://twitter.com/daphnekoller
Daphne's Website: https://ai.stanford.edu/users/koller/index.html
Insitro: http://insitro.com

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
2:22 - Will we one day cure all disease?
6:31 - Longevity
10:16 - Role of machine learning in treating diseases
13:05 - A personal journey to medicine
16:25 - Insitro and disease-in-a-dish models
33:25 - What diseases can be helped with disease-in-a-dish approaches?
36:43 - Coursera and education
49:04 - Advice to people interested in AI
50:52 - Beautiful idea in deep learning
55:10 - Uncertainty in AI
58:29 - AGI and AI safety
1:06:52 - Are most people good?
1:09:04 - Meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Lex Fridman (host), Daphne Koller (guest)
May 5, 2020, 1h 12m

CHAPTERS

  1. 0:00 – 2:30

    Why Daphne Koller moved from Coursera to ML for human health

    Lex introduces Daphne Koller’s background—Stanford professor, Coursera co-founder, and CEO of insitro—and frames the conversation as the early days of using machine learning to transform biomedicine. He also notes the recording predates COVID-19, underscoring the relevance of scalable drug discovery and health technologies.

    • Daphne’s roles: Stanford CS, Coursera co-founder, insitro founder/CEO
    • Machine learning as a data-driven lever for drug discovery and treatment development
    • Context note: recorded before COVID-19 but broadly relevant to pandemics
    • Lex sets a philosophical tone about disease, longevity, and impact
  2. 2:30 – 6:31

    Can we cure all diseases? How little we understand many mechanisms

    Daphne pushes back on sweeping predictions while explaining why “curing” disease is fundamentally hard: by the time of diagnosis, damage is often extensive, and a true cure would require regeneration. She highlights that for many major diseases our mechanistic understanding may be close to zero, and that several conditions are actually heterogeneous clusters rather than a single disease.

    • Avoiding hubris in long-term predictions; “one day” is a very long time
    • Cures are rare; treatments are more common than true cures
    • Many diseases have near-zero mechanistic understanding today
    • Alzheimer’s/schizophrenia likely heterogeneous rather than single mechanisms
  3. 6:31 – 10:10

    Longevity as overlapping biology: aging, healthspan, and cellular wear-and-tear

    The discussion shifts to longevity and how aging overlaps with disease risk, especially after ~40 as risks rise sharply for many conditions. Daphne emphasizes healthspan—staying healthy and functional longer—over immortality, and describes cellular processes that contribute to both aging and disease.

    • Disease risk often increases exponentially with age after ~40
    • Longevity and disease mechanisms partially overlap, but aren’t identical
    • Healthspan as a more meaningful target than immortality
    • Cellular contributors: DNA damage, misfolded proteins, inflammation
  4. 10:10 – 13:05

    What ML has been missing in biomedicine: the right datasets (and insitro’s approach)

    Daphne argues ML has had limited impact in biology largely because the necessary large, high-quality datasets didn’t exist. She describes insitro’s strategy: deliberately design and generate datasets using modern bioengineering tools so ML can build predictive models that meaningfully improve human health.

    • Historically: insufficient scale/quality of biological datasets for powerful ML
    • Recent changes: new technologies enabling data generation at scale
    • insitro flips the workflow: build datasets explicitly for predictive modeling
    • Data is a means; improving health is the end goal
  5. 13:05 – 16:25

    Personal motivation: early bio interest and a family tragedy shaping drug discovery focus

    Lex asks where Daphne’s health focus came from, and she recounts early dissatisfaction with uninspiring ML datasets and a pull toward biologically meaningful problems. She also shares how her father’s death from an autoimmune lung condition—where treatment options were limited—deepened her interest in better, safer, more targeted therapeutics.

    • Early 2000s: biology datasets felt more meaningful than classic ML benchmarks
    • Growing interest in biology beyond ML-centric work
    • Father’s autoimmune disease and limited treatment options (prednisone)
    • Belief we can now make safer/more effective drugs, but need better biology understanding
  6. 16:25 – 20:02

    Disease-in-a-dish vs animal models: why translation fails and what iPS cells enable

    Daphne explains traditional animal models, often mice, and why they frequently fail: the induced phenotype may not share the human disease mechanism. She contrasts this with disease-in-a-dish models enabled by induced pluripotent stem cells (iPSCs), where a person’s cells can be reprogrammed and differentiated to relevant human cell types to study genetics-driven disease phenotypes.

    • Animal models can mimic phenotype without matching human mechanism
    • Mice don’t naturally develop many key human diseases (Alzheimer’s, schizophrenia, etc.)
    • iPSCs: reprogram adult cells to pluripotency, then differentiate into target cell types
    • Goal: observe disease phenotypes in human-genetics-derived cells and test interventions
  7. 20:02 – 23:50

    Scaling and realism: Yamanaka reprogramming, CRISPR perturbations, and variability

    The conversation dives into practical constraints: iPSC creation is increasingly industrialized but not yet at massive population scale. Daphne describes using CRISPR to introduce specific mutations for clean comparisons, and discusses sources of variability—genetic differences and differences in differentiation quality across lines.

    • Yamanaka-factor-style reprogramming is now more reliable but not perfect
    • Global iPSC line counts are still limited; scale is improving but constrained
    • CRISPR enables controlled “healthy vs mutated” comparisons in the same background
    • Variability: genetics plus differences in differentiation robustness across individuals
  8. 23:50 – 29:37

    From genetics to cells to data: polygenic risk and measuring ‘squishy’ biology digitally

    Daphne defines disease burden and polygenic risk scores as ways to quantify genetic predisposition across many small-effect variants. She argues cell-derived phenotypes can be closer to clinical outcomes than genetics alone, especially given limited biological understanding. They then discuss measurement revolutions that turn cells into rich digital datasets.

    • Disease burden as aggregate genetic predisposition; polygenic risk scores
    • Genetics contains signal, but cellular phenotypes may be more predictive/meaningful
    • High-throughput measurement: microarrays → single-cell RNA-seq
    • Advanced microscopy (including super-resolution) yields quantitative subcellular structure data
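As an illustration of the polygenic risk score idea in this chapter, here is a minimal sketch assuming the common textbook form: a weighted sum of risk-allele counts, one weight per variant. The function name, the allele counts, and the effect sizes are made up for the example and do not come from the episode.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Return the weighted sum of risk-allele counts.

    allele_counts: copies of the risk allele per variant (0, 1, or 2).
    effect_sizes:  per-variant weights (hypothetical values here; in
                   practice they would come from an association study).
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Three illustrative variants, each with a small effect.
score = polygenic_risk_score([0, 1, 2], [0.5, 0.25, 0.125])
print(score)
```

Each variant contributes little on its own; the score only becomes informative when many variants are summed, which matches the "many small variants" framing above.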
  9. 29:37 – 33:24

    How ML drives drug discovery in this paradigm: subtypes, perturbations, and prediction

    Daphne outlines multiple ways to use cell data: hypothesis-driven backward reasoning, forward perturbation screens, and insitro’s pattern-finding ML approach. A central aim is discovering molecular subtypes and identifying interventions (drugs or gene edits) that shift diseased cells toward healthy states, then validating downstream.

    • Three approaches: backward biological inference, forward perturbation tests, ML pattern discovery
    • Identify molecular subtypes hidden behind similar clinical labels
    • Search for interventions that revert cellular disease signatures toward normal
    • Less hypothesis-bound discovery can uncover non-obvious therapeutic avenues
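The idea of searching for interventions that move diseased cells back toward a healthy state can be sketched in a toy form. This is not insitro's method, which the summary does not describe; it assumes each cell state is summarized as a small numeric feature vector and ranks hypothetical interventions by Euclidean distance to a healthy reference profile. All names and numbers are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_interventions(healthy_profile, treated_profiles):
    """Sort interventions by how close the treated cells end up to healthy.

    treated_profiles maps a (hypothetical) intervention name to the
    feature vector measured after applying it to diseased cells.
    """
    scored = [
        (name, distance(profile, healthy_profile))
        for name, profile in treated_profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


healthy = [1.0, 0.0]
treated = {"drug_a": [1.0, 1.0], "drug_b": [1.0, 0.5]}
ranking = rank_interventions(healthy, treated)
print(ranking[0][0])  # name of the intervention closest to the healthy profile
```

A real pipeline would work with high-dimensional measurements and learned representations rather than raw two-number vectors, and any top-ranked candidate would still need the downstream validation mentioned above.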
  10. 33:24 – 36:43

    Which diseases fit disease-in-a-dish today: genetic basis, tractable cell types, organoids

    Lex asks which diseases this can help; Daphne is cautious about promises and instead lists characteristics that make success more likely. She highlights genetically driven diseases, robust and reproducible cellular models, and diseases localized to a small number of cell types, then points to organoids and multi-organ systems as the next frontier.

    • Better fit: strong genetic basis and clear cellular phenotype
    • Need reproducible, low-noise in vitro models at workable scale
    • Harder: systemic, multi-organ diseases that are difficult to recreate in a dish
    • Organoids (brain/liver/kidney/gut) and emerging multi-organoid connections expand tractability
  11. 36:43 – 41:51

    The Coursera/MOOC origin story: Stanford experiments to global scale

    The conversation pivots to education and the birth of MOOCs at Stanford, where Daphne and Andrew Ng explored online teaching for both quality and reach. The viral adoption of early Stanford MOOCs (100k+ learners) convinced them the demand was too important to ignore, leading to Coursera’s launch in 2012.

    • Late 2000s Stanford initiatives: online teaching quality + scale
    • Andrew Ng’s Stanford Engineering Everywhere and Daphne’s interactive modular approach
    • 2011 Stanford MOOCs explode in popularity with minimal marketing
    • Decision process: Stanford effort vs nonprofit vs for-profit → Coursera
  12. 41:51 – 49:04

    What makes online learning work: short modules, fast feedback, and flipped classrooms

    Daphne distills lessons from Coursera’s experimentation: shorter is better at every level, and learners need flexibility and immediate feedback. She explains micro-quizzes, auto-graded assessments, and the flipped classroom model, noting these approaches demand far more instructor preparation than traditional lecturing.

    • Brevity matters: 5–7 minute videos often outperform longer lectures
    • Courses should be modular with natural completion points
    • Engagement via micro-quizzes and rapid feedback improves persistence
    • Flipped classroom: content before class, deeper problem-solving in-person; higher prep burden
  13. 49:04 – 55:08

    Advice for learning AI and the most beautiful deep learning ideas

    Daphne advises aspiring ML practitioners to build strong foundations (math, stats, programming), then learn ML and practice on real problems, ideally with collaborators. She names end-to-end learning and representation/transfer learning as especially powerful ideas, while reflecting on how data scale changed what’s possible in high-dimensional learning.

    • Start learning, but don’t skip fundamentals (math/stats/programming)
    • Move from theory to practice via real workplace problems or Kaggle
    • End-to-end training as a foundational deep learning paradigm
    • Transfer learning/representation learning as underutilized and human-like in spirit
  14. 55:08 – 1:06:35

    Uncertainty, robustness, and AI safety: calibrated confidence and risks from ‘dumb’ systems

    Daphne discusses the dangers of poorly calibrated ML confidence, especially in medicine and autonomous driving, and surveys techniques like Bayesian approaches and ensembles. She then tackles AGI: she views it as far away, but emphasizes real near-term risks from complex, poorly understood systems, misuse, and societal deployment choices.

    • Neural networks can be confidently wrong; calibration is critical
    • Medical diagnosis and self-driving illustrate high-stakes failure modes
    • Methods: Bayesian deep learning, Gaussian processes, ensembles; still open research
    • AGI seen as distant; nearer risks: system complexity, robustness testing, misuse (surveillance, weapons), and parallels to CRISPR dangers
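Two of the ideas named in this chapter, ensembles and calibration, can be made concrete with a small sketch. It assumes each ensemble member outputs a probability for the same input, and it measures calibration with expected calibration error (ECE), one standard metric; the summary does not say which metric or setup Daphne has in mind, and the function names and numbers are illustrative.

```python
from statistics import mean, pstdev


def ensemble_predict(member_probs):
    """Combine one input's probabilities from several models.

    Returns the mean prediction and the spread across members; a large
    spread is a rough signal that the ensemble disagrees.
    """
    return mean(member_probs), pstdev(member_probs)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and observed accuracy.

    Predictions are grouped into equal-width confidence bins; each bin's
    |mean confidence - accuracy| is weighted by its share of the samples.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = mean(conf for conf, _ in bucket)
        accuracy = mean(1.0 if ok else 0.0 for _, ok in bucket)
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece


avg, spread = ensemble_predict([0.2, 0.4, 0.6])
# A model that says 90% but is right 3 times out of 4 is overconfident.
gap = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(avg, spread, gap)
```

An ECE near zero means stated confidence tracks observed accuracy; a confidently wrong model shows a large gap in its high-confidence bins.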
  15. 1:06:35 – 1:12:03

    Human nature, social norms, and a personal meaning-of-life framework

    Lex asks whether people are fundamentally good; Daphne is broadly optimistic but warns that social norms can drift toward rewarding harmful behavior. She closes on meaning and purpose: “making a dent in the universe,” improving the world—especially as a responsibility of privilege—and teaching that ethos to her children.

    • Most people mean well, but societies can incentivize harmful behavior
    • Importance of social norms aligning status with doing good
    • Meaning of life as leaving the world better than you found it
    • Privilege increases responsibility to contribute positively
