Lex Fridman Podcast
Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153
CHAPTERS
- 0:00 – 2:00
Why viruses are terrifying and beautiful (and who Dmitry Korkin is)
Lex frames viruses as both a threat and a window into the deepest questions about life. He introduces Dmitry Korkin and sets up the conversation as a blend of virology, protein science, evolution, and AI.
- •Dmitry Korkin’s background in bioinformatics and computational biology
- •Viruses as both civilization-level risk and scientific beauty
- •Motivation: understanding viruses to understand life (and maybe extraterrestrial life)
- •Conversation roadmap: proteins → viruses → evolution → AI
- 2:00 – 4:55
Proteins as modular machines: domains as the real building blocks
Korkin challenges the simplistic idea that proteins are the basic building blocks of life, arguing that protein domains are the key functional and evolutionary units. He describes domains as “beads on a string” and explains why this view matters for understanding function and evolution.
- •Protein domains as modular functional units within proteins
- •Domain shuffling as a driver of evolutionary novelty
- •Why many proteins are multi-domain and multifunctional
- •Historical reason domains are under-emphasized: early structures were mostly single-domain proteins
- 4:55 – 6:53
How we see protein structure: X-ray, NMR, and the cryo-EM revolution
The conversation shifts to structural biology methods and why cryo-EM changed the game for large complexes. Cryo-EM enabled rapid early structural insight into SARS‑CoV‑2 proteins, especially the spike.
- •X-ray crystallography and NMR as traditional 3D structure tools
- •Cryo-electron microscopy enables structure determination for larger molecules and complexes
- •Spike protein as an early cryo-EM success story in COVID
- •Methodological advances reshape what biology can measure
- 6:53 – 13:28
Spike protein deep dive: trimers, receptor-binding dynamics, and hidden regions
Lex and Korkin focus on the spike protein’s complexity: multiple domains, trimer formation, and dynamic “arms” that bind ACE2. They discuss how only part of the spike was resolved early and why its membrane-embedded region is hard to characterize.
- •Spike functions as a homo-trimer (three copies)
- •Receptor-binding domains (RBDs) move asynchronously
- •Mutations can increase the likelihood that multiple RBD ‘arms’ open at once
- •Early cryo-EM structures resolved roughly two-thirds of the spike; membrane-embedded and unstable regions remain challenging
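The point about mutations opening more ‘arms’ can be made concrete with a toy calculation. This sketch assumes, purely for illustration, that each of the three RBDs in the trimer is ‘up’ independently with the same probability `p_up`. Real RBDs may move cooperatively, and the `p_up` values below are hypothetical, not measured. Under that assumption, the number of open arms follows a binomial distribution. For small `p_up`, the chance of two or more open arms grows roughly with the square of `p_up`, so a modest per-RBD shift matters more for multi-arm states.

```python
from math import comb


def open_arm_distribution(p_up: float, n_rbd: int = 3) -> list[float]:
    """P(exactly k RBDs up) for k = 0..n_rbd, assuming independent, identical RBDs."""
    return [comb(n_rbd, k) * p_up**k * (1 - p_up) ** (n_rbd - k)
            for k in range(n_rbd + 1)]


def prob_at_least(k_min: int, p_up: float, n_rbd: int = 3) -> float:
    """Probability that at least k_min RBD 'arms' are open at the same time."""
    return sum(open_arm_distribution(p_up, n_rbd)[k_min:])


# Hypothetical per-RBD 'up' probabilities, e.g. before and after a mutation.
for p in (0.2, 0.3):
    print(f"p_up={p:.1f}  P(>=2 arms open)={prob_at_least(2, p):.3f}")
```

In this model, raising `p_up` from 0.2 to 0.3 (a factor of 1.5) roughly doubles the probability of two or more open arms.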
- 13:28 – 14:27
Coronavirus architecture: capsid, envelope, and the four structural proteins
Korkin explains the physical organization of the virion and defines key terms like capsid and nucleocapsid. He outlines the structural proteins (S, E, M, N), their copy numbers, oligomeric states, and how M proteins may form a lattice shaping the virus.
- •Capsid/nucleocapsid: RNA protected by N protein; structural role beyond shielding
- •Four structural proteins: S (trimer), E (pentamer), M (dimer), plus N
- •Copy numbers per virion: many M dimers (~1000), fewer E (2–3), dozens of spikes (50–90)
- •Hypothesis: M dimer lattice self-organizes viral envelope shape
- 14:27 – 20:46
Therapeutic ‘attack surfaces’: from spike antibodies to targeting the M protein shell
They connect structural understanding to intervention strategies. Korkin discusses nanoparticles that compete for ACE2, small molecules that disrupt protein function, and the strategic appeal of targeting M protein because it’s abundant and more evolutionarily stable than spike.
- •Structural biology guides vaccine/drug design and mechanistic understanding
- •Nanoparticles mimicking virion/spike to block ACE2 receptors
- •Small-molecule disruption of key viral proteins
- •M protein as a promising target: abundant, structural, and more evolutionarily stable than spike
- 20:46 – 27:16
Mutations and variants: cross-species jumps, selective pressure, and what we still don’t know
Lex asks about the UK and South Africa variants and which mutations matter. Korkin emphasizes mutation as the engine of viral evolution, especially during host jumps, but cautions that we still can’t confidently predict which changes will increase pathogenic efficiency.
- •Mutations enable adaptation and species-to-species transmission
- •Host jumps (including human↔animal) can introduce novel mutation sets
- •Vaccine-driven selective pressure is plausible but evidence is still limited
- •Many proteins/functions could be optimized further; key effects remain uncertain
- 27:16 – 37:02
Protein evolution beyond sequences: domains, linkers, and alternative splicing
Zooming out, Korkin presents modularity as the central theme of protein evolution. He explains why domains tend to stay intact, how flexible linkers and termini add functional complexity, and how alternative splicing produces multiple proteins from one gene through the interplay between exons and domains.
- •Domains are conserved evolutionary building blocks; splitting them often breaks function
- •Linkers are flexible, adaptive interaction mediators with different evolutionary constraints
- •Termini (N/C) can be critical for specific protein interactions
- •Alternative splicing: exon/intron architecture yields multiple functional protein products
- •Exon boundaries often (not always) align with domain boundaries
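The exon–domain interplay can be illustrated by a small bookkeeping exercise. For each splice isoform, ask whether every domain is kept whole, dropped entirely, or cut in two. The last case is the one the bullets above flag as usually breaking function. The gene, exon, and domain coordinates below are hypothetical, and they are given directly in residue coordinates. Real exons are defined on the genome, and skipping one can also shift the reading frame, which this sketch ignores. `Segment`, `domain_status`, and `skip_exon` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A named span of residues, 1-based and inclusive on both ends."""
    name: str
    start: int
    end: int

    def residues(self) -> set[int]:
        return set(range(self.start, self.end + 1))


def domain_status(domain: Segment, kept_exons: list[Segment]) -> str:
    """Classify a domain as 'intact', 'broken', or 'absent' in one isoform."""
    covered = set().union(*(exon.residues() for exon in kept_exons))
    overlap = len(domain.residues() & covered)
    if overlap == len(domain.residues()):
        return "intact"
    return "absent" if overlap == 0 else "broken"


# Hypothetical gene: coordinates are invented for illustration only.
EXONS = [Segment("E1", 1, 40), Segment("E2", 41, 120),
         Segment("E3", 121, 200), Segment("E4", 201, 260)]
DOMAINS = [Segment("D1", 41, 120),   # boundaries coincide with exon E2
           Segment("D2", 130, 250)]  # spans the E3/E4 exon boundary


def skip_exon(name: str) -> list[Segment]:
    """Exons kept in the isoform that skips one exon (frame assumed preserved)."""
    return [exon for exon in EXONS if exon.name != name]


for skipped in ("E2", "E3"):
    isoform = skip_exon(skipped)
    print(skipped, {d.name: domain_status(d, isoform) for d in DOMAINS})
```

D1's boundaries match exon E2, so skipping E2 removes D1 cleanly and leaves D2 whole. Skipping E3 instead cuts D2 in half, which is the outcome that tends to break function.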
- 37:02 – 44:32
Self-replicating code and evolving software agents: biology as inspiration for CS
A detour into computer science explores quines (self-replicating programs), code golf, and the idea of software ecosystems that replicate, mutate, and evolve. They discuss intelligent agents, evolutionary algorithms, and broader nature-inspired computation.
- •Quines as a playful analogue to biological self-copying
- •Code golf culture: shortest programs and unusual languages
- •Speculation: autonomous agents that interact, exchange information, and evolve
- •Evolutionary algorithms as partial precursor; recombination vs mutation as search operators
- •Other bio-inspired methods (e.g., slime mold algorithms) suggest more to borrow from nature
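A quine must reproduce every character of its own source, so comments cannot go inside it. The sketch below therefore stores a classic two-line Python quine as a string and checks the self-copying property by running it and comparing the output with the source. `QUINE_SOURCE` and `run_and_capture` are illustrative names, not anything from the episode.

```python
import contextlib
import io

# A classic two-line Python quine, kept as text so it can be checked.
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(source: str) -> str:
    """Execute `source` in a fresh namespace and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = run_and_capture(QUINE_SOURCE)
print(output, end="")
print("Is a quine:", output == QUINE_SOURCE)
```

The trick is the `%r` placeholder. It inserts the string’s own `repr`, quotes and escapes included, so the program prints the very literal that defines it. Meanwhile `%%` collapses to the single `%` in the second line.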
- 44:32 – 52:02
Origin of life and the Drake equation: how rare is biology?
Lex asks about the probability of life emerging and what molecules might underpin alien life. Korkin references the discovery of glycine in comet dust and argues life might be rare but likely not unique, while acknowledging huge uncertainty about how life begins.
- •Drake equation parameter: the probability that life emerges on a habitable world
- •Evidence hint: glycine found in comet dust (building blocks can be extraterrestrial)
- •Open question: would alien life be protein-based, RNA-based, or something else?
- •Korkin’s intuition: probability likely lower than 1%, but not vanishingly small
- •Rare Earth hypothesis vs “life is everywhere” perspective
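The Drake equation is a plain product of factors: N = R* · f_p · n_e · f_l · f_i · f_c · L. Here R* is the rate of star formation, f_p the fraction of stars with planets, and n_e the number of potentially habitable planets per system. f_l is the fraction of those on which life emerges (the parameter discussed here), f_i the fraction that develop intelligence, and f_c the fraction that emit detectable signals. L is how long such signals last. Because the terms multiply, N scales linearly with f_l. Moving f_l from 1 to "below 1%" therefore cuts N by at least a factor of 100, whatever the other values are. All parameter values below are placeholders chosen for easy arithmetic, not estimates.

```python
from math import prod


def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years):
    """Expected number of detectable civilizations (Drake equation)."""
    return prod((r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years))


# Placeholder values only; every factor except f_l is held fixed.
baseline = dict(r_star=1.0, f_p=0.5, n_e=2.0, f_i=0.1, f_c=0.1, lifetime_years=10_000)

optimistic = drake(f_l=1.0, **baseline)   # life arises wherever it can
cautious = drake(f_l=0.01, **baseline)    # life arises 1% of the time

print(f"optimistic N = {optimistic:.2f}, cautious N = {cautious:.2f}")
print(f"ratio = {optimistic / cautious:.0f}")
```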
- 52:02 – 53:52
Life in our solar system: Venus, Mars, Europa—and why discovery would be a turning point
They discuss prospects for microbial life on Venus, Mars, and icy moons like Europa. Korkin hopes we find life and expects it would be carbon-based, emphasizing that even single-cell evidence would transform science.
- •Potential biosignatures on Venus and historical prospects on Mars
- •Europa and other moons as candidates for subsurface life
- •Expectation: extraterrestrial microbes would likely be carbon-based
- •Discovery would be a major scientific turning point
- •Similarity vs difference: the ‘how different?’ question is as important as ‘is it there?’
- 53:52 – 1:03:01
Joshua Lederberg and DENDRAL: early AI for chemistry (born from space curiosity)
Korkin tells a personal story about receiving advice from Nobel laureate Joshua Lederberg on a computational chemistry forum. They explore DENDRAL’s core idea: using symbolic AI/expert systems to infer molecular structure from mass spectrometry—originally motivated by analyzing extraterrestrial molecules.
- •Korkin’s early PhD work in symbolic machine learning for chemoinformatics
- •A surprise reply from Joshua Lederberg sparked deeper interest in DENDRAL
- •DENDRAL: inferring candidate molecules from mass spectrometry data plus chemical knowledge
- •Original motivation tied to studying molecules from space
- •1960s-era cross-disciplinary vision as a foundation for modern bio/cheminformatics
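DENDRAL is usually described as a plan-generate-test system: enumerate candidates consistent with the data, then prune them with chemical rules. The sketch below mimics only a formula-level version of the generate-and-test step and is not a reconstruction of DENDRAL itself. Given a hypothetical nominal molecular mass, it generates every C/H/N/O formula with that mass. It then discards candidates that fail one valence rule: a neutral, even-electron molecule needs a whole, non-negative number of rings plus double bonds, RDBE = C − H/2 + N/2 + 1. Function names are illustrative.

```python
from fractions import Fraction

NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def hill_formula(counts: dict[str, int]) -> str:
    """Hill order (C, H, then others alphabetically); omit 0 counts and '1'."""
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in order if counts.get(e, 0) > 0)


def rdbe(c: int, h: int, n: int) -> Fraction:
    """Rings plus double bonds for C_c H_h N_n O_o (oxygen does not contribute)."""
    return Fraction(2 * c + 2 + n - h, 2)


def candidate_formulas(mass: int) -> list[str]:
    """Generate C/H/N/O formulas with the given nominal mass, then test them."""
    m = NOMINAL_MASS
    found = []
    for c in range(1, mass // m["C"] + 1):
        for n in range(0, (mass - m["C"] * c) // m["N"] + 1):
            for o in range(0, (mass - m["C"] * c - m["N"] * n) // m["O"] + 1):
                h = mass - m["C"] * c - m["N"] * n - m["O"] * o  # H fills the rest
                r = rdbe(c, h, n)
                # Chemical-knowledge filter: whole, non-negative unsaturation.
                if r >= 0 and r.denominator == 1:
                    found.append(hill_formula({"C": c, "H": h, "N": n, "O": o}))
    return found


print(candidate_formulas(46))  # ['CH2O2', 'CH6N2', 'C2H6O']
```

For mass 46, the survivors correspond to familiar molecules: formic acid (CH2O2), methylhydrazine (CH6N2), and ethanol or dimethyl ether (C2H6O). Distinguishing isomers like the last pair is where fragmentation patterns and richer expert rules come in.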
- 1:03:01 – 1:05:12
Why expert systems ‘failed’ (or succeeded too well) and echoes in modern ML
Korkin explains two interpretations: expert systems underdelivered, or they became so embedded they transformed into modern tools. He argues successful AI—including AlphaFold—often hard-codes domain knowledge, a direct continuation of the expert-systems philosophy.
- •Two historical narratives: expert systems weren’t good enough vs. they became ubiquitous and were rebranded
- •Expert knowledge embedded in modern ML architectures and pipelines
- •Biology as a domain where structured expertise matters
- •Transition from symbolic systems to data-driven systems as evolution, not replacement
- 1:05:12 – 1:33:48
AlphaFold2 and CASP: what ‘solved protein folding’ really means
They unpack CASP as the benchmark competition and clarify protein folding as predicting 3D structure from sequence. Korkin highlights that AlphaFold’s leap is real and useful, but ‘solved’ applies mainly to compact single-domain (or small) proteins, not the full multi-domain complexity of biology.
- •CASP: blinded evaluation using withheld experimental structures
- •Protein folding remains mechanistically mysterious despite predictable end states
- •AlphaFold’s key advance: contact map/distance constraints + ML innovations
- •Structure often more conserved than sequence; multiple sequence alignments provide powerful signals
- •‘Solved’ is limited: CASP targets are mostly compact; many real proteins are multi-domain and harder
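The bullet about multiple sequence alignments rests on covariation. If two residues touch in the folded structure, a substitution at one position tends to be compensated at the other, so their alignment columns change together. A minimal way to quantify this is the mutual information between columns. That is a classic textbook signal, not AlphaFold’s method, and raw mutual information also mixes direct contacts with indirect correlations. Direct-coupling methods and deep-learning models try to disentangle those. The alignment below is invented for illustration.

```python
from collections import Counter
from math import log


def column(msa: list[str], i: int) -> list[str]:
    """Residues at alignment position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pairs = Counter(zip(column(msa, i), column(msa, j)))
    fi, fj = Counter(column(msa, i)), Counter(column(msa, j))
    return sum((c / n) * log((c / n) / ((fi[a] / n) * (fj[b] / n)))
               for (a, b), c in pairs.items())


# Toy alignment: column 1 is fully conserved, column 2 varies independently,
# and columns 0 and 3 change together (A pairs with E, G pairs with R).
MSA = ["AKLE", "AKVE", "GKLR", "GKVR"]

for j in (1, 2, 3):
    print(f"MI(0,{j}) = {mutual_information(MSA, 0, j):.3f}")
```

Only the co-varying pair (columns 0 and 3) scores above zero. Conservation alone (column 1) or independent variation (column 2) carries no pairwise signal.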
- 1:33:48 – 1:38:17
Beyond single proteins: multi-domain folding, protein complexes, and interaction prediction (CAPRI)
Lex asks about multi-protein folding and complexes, and Korkin explains it blends folding with protein–protein interaction challenges. He discusses flexible scaffold proteins like PSD95 and notes that interaction prediction has its own benchmark ecosystem (CAPRI), where machine learning can learn scoring functions for docking quality.
- •Complexity jump: multi-domain proteins and flexible scaffolds (e.g., PSD95)
- •Many proteins don’t adopt one stable structure; function can depend on flexibility
- •Protein complexes require understanding both intramolecular folding and intermolecular binding
- •CAPRI benchmark focuses on macromolecular interactions (protein–protein and protein–DNA)
- •ML can learn docking scoring functions and interaction determinants
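Learning a docking scoring function can be sketched as supervised learning. Describe each candidate pose (decoy) by interface features, label known near-native and incorrect poses, and fit weights so that near-native poses score higher. This minimal example assumes three hypothetical features (buried surface area, hydrogen bonds, steric clashes), each scaled to [0, 1]. The decoy values are synthetic, and logistic regression trained by plain gradient descent stands in for whatever model a real method would use. None of this is prescribed by CAPRI.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def train_logistic(features, labels, lr=1.0, epochs=2000):
    """Fit weights w and bias b by full-batch gradient descent on log-loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(pose, w, b):
    """Learned score: higher means 'more likely near-native'."""
    return sum(wi * xi for wi, xi in zip(w, pose)) + b


# Synthetic decoys: (buried_area, h_bonds, clashes), invented for illustration.
NEAR_NATIVE = [(0.9, 0.8, 0.1), (0.8, 0.9, 0.0), (0.85, 0.7, 0.2)]
INCORRECT = [(0.2, 0.1, 0.9), (0.3, 0.2, 0.7), (0.1, 0.3, 0.8)]

X = NEAR_NATIVE + INCORRECT
Y = [1] * len(NEAR_NATIVE) + [0] * len(INCORRECT)
w, b = train_logistic(X, Y)
ranked = sorted(X, key=lambda pose: score(pose, w, b), reverse=True)
print("learned weights:", [round(wi, 2) for wi in w])
print("top-ranked decoys:", ranked[:3])
```

The design choice worth noting is that the weights are learned from labeled examples rather than hand-tuned. Once trained, the scoring function can rank new decoys that a docking program generates.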
- 1:38:17 – 1:51:31
Nobel prizes, protein design, and biosecurity: can AI engineer dangerous viruses?
They discuss whether AlphaFold-like tools could contribute to Nobel-worthy discoveries and how computational methods are increasingly recognized. The conversation then turns to protein design (Baker/Rosetta) and the dual-use risk of using ML to predict pathogenicity or potentially guide harmful engineering—tempered by experimental difficulty and regulation.
- •Nobel prizes evolving toward cross-disciplinary computational contributions
- •Historical precedent: computational biophysics recognized by the 2013 Nobel Prize in Chemistry (Karplus, Levitt, Warshel) for multiscale models of complex chemical systems
- •Protein design as an established field (Rosetta, de novo design)
- •ML for pathogenicity prediction could help forecast dangerous mutations
- •Dual-use concerns: engineering for harm vs barriers (limited data, experimental hurdles, regulation, gain-of-function debates)
- 1:51:31 – 1:55:32
What 2020 taught about science: speed of sequencing, structure, and global collaboration
Reflecting on the pandemic year, Korkin is impressed by the scientific community’s unprecedented pace. He contrasts SARS-era timelines with COVID-era months, highlighting massive sequencing efforts that enable fine-grained evolutionary tracking across geography and hosts.
- •Structural characterization accelerated from years (SARS) to months (COVID)
- •Unprecedented volume of viral sequencing enables precise evolutionary tracing
- •Viral evolution is host–pathogen co-evolution, influenced by human genotypes
- •Geographic anchoring of sequences improves epidemiological interpretation
- •AI forecasting efforts (e.g., XPRIZE) remain data-limited but promising
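Fine-grained evolutionary tracing starts from something simple. Compare each sequenced genome with a reference and list its substitutions in the common reference-base, position, alternate-base style (for example, a hypothetical T6C). Mutations shared by a group of samples can then mark a lineage. This sketch assumes the sequences are already aligned and gap-free, and it skips ambiguous `N` calls. Real pipelines must also handle insertions, deletions, sequencing errors, and phylogenetic inference. The sequences and function names are invented for illustration.

```python
def point_mutations(reference: str, sample: str) -> list[str]:
    """List substitutions as REF<position>ALT with 1-based positions.

    Assumes both sequences are aligned and gap-free, so position i in one
    corresponds to position i in the other.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{ref}{pos}{alt}"
            for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
            if ref != alt and alt != "N"]  # 'N' = ambiguous base, not a call


def shared_mutations(reference: str, samples: dict[str, str]) -> set[str]:
    """Mutations present in every sample, e.g. candidate lineage markers."""
    calls = [set(point_mutations(reference, seq)) for seq in samples.values()]
    return set.intersection(*calls) if calls else set()


# Invented toy sequences; real viral genomes are ~30 kb and need alignment.
REFERENCE = "ATGTTTGTTTTTCTTGTT"
SAMPLES = {
    "sample_1": "ATGTTCGTTTTTCTTGTA",
    "sample_2": "ATGTTCGTTTTNCTTGTT",
}

for name, seq in SAMPLES.items():
    print(name, point_mutations(REFERENCE, seq))
print("shared:", shared_mutations(REFERENCE, SAMPLES))
```

Attaching collection location and host to each sample, as the bullets above describe, turns these mutation lists into the geographic and host-level picture used in epidemiology.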
- 1:55:32 – 2:12:36
Books, family, and a Russian poem: personal influences and closing reflections
Korkin recommends influential books spanning Russian literature (including a political allegory set in a cancer hospital) and von Neumann’s reflections on brains and computers, plus a deeply personal science memoir audiobook. He shares a New Year resolution about family time and ends by reciting a Russian poem that evokes winter, longing, and love.
- •Book recommendations: Bulgakov’s *Master and Margarita*; Solzhenitsyn’s *Cancer Ward*
- •Von Neumann’s short book *The Computer and the Brain* as a dense late-life synthesis
- •Honorable mention audiobook: Hope Jahren’s *Lab Girl* (author-narrated, deeply personal)
- •Resolution: prioritize family and time with kids even after normalcy returns
- •Poem recitation in Russian and discussion of winter’s ‘magic’