No Priors
Biohub: The Future of Biology is Open-Source with Mark Zuckerberg, Priscilla Chan, and Alex Rives
CHAPTERS
Open tools for biology: the core thesis (cold open)
The episode opens with the central idea behind Biohub: build powerful biology tools and share them broadly, accelerating the entire scientific ecosystem. The guests frame success not as personally “curing diseases” but as enabling many scientists to move faster with open, generalizable models.
How “cure all disease” became a serious plan: origins of Biohub
Mark and Priscilla recount early conversations with leading scientists who laughed at the ambition, forcing them to ask what structurally slows biology. The key blockers they heard were silos, poor sharing, and missing durable tooling—prompting a tool-centric philanthropic strategy.
Biohub’s model: long-term tool development across hubs and universities
They describe the original Biohub approach: engineers and scientists working across institutions to build tools with long time horizons. Over time, the effort expanded geographically and became the primary philanthropic focus, unified by a “virtual biology” initiative.
Frontier AI + frontier biology: why biology needs new data generation
Unlike language models, biology lacks abundant “internet-scale” training data; much of the most valuable data doesn’t exist yet. The team explains why progress requires inventing new experimental methods—imaging, sensors, and cellular engineering—to create datasets that models can learn from.
From single-cell sequencing to community corpora: the tooling flywheel
Priscilla traces a through-line from early single-cell sequencing funding to the Human Cell Atlas and annotation tools like CELLxGENE. The broader point: shared datasets and software can seed communities that contribute far beyond the initial funder’s efforts, turning “stamp collecting” into a foundation for modeling.
Micro-to-macro modeling: hierarchical world models from proteins to cells to systems
The group debates whether biology can be modeled “end-to-end” at higher levels or must be built up layer by layer. Their strategy is hierarchical: protein interactions enable cellular models, which enable tissue/immune-system-level understanding, with targeted experiments creating connective tissue across levels.
Mechanistic interpretability for protein language models: opening the black box
Alex explains applying mechanistic interpretability—popular in LLMs—to protein language models trained on billions of sequences. The aim is to extract biological insight from representations that capture latent structure/function “grammar,” connecting unknown proteins to known biology.
Why a nonprofit and why open-source: scale, time horizon, and ecosystem leverage
They argue a nonprofit structure better fits the ambition: generating new biological methods and datasets is not a simple pay-to-produce pipeline, and the work benefits from long horizons and open dissemination. Open-source also mobilizes broader talent and enables work on rare and niche diseases that markets may ignore.
What “understanding biology” means in practice: individualized mechanistic chains
Priscilla reframes disease impact as building mechanistic chains from genetic variants to proteins to phenotypes, enabling bespoke interventions. Today’s medicine often relies on coarse cohort analogies; the goal is to treat people as individuals with causal, testable understanding.
Timelines and early leverage points: systems like inflammation and immunity
On timelines for “curing all disease,” they emphasize dynamic complexity and uncertainty, but express increased optimism due to AI acceleration. Rather than picking single diseases, they highlight system-level targets—like inflammation and immune function—as high-leverage foundations others can translate into therapies.
From bench to bedside: what must change in clinical research and deployment
Priscilla notes that the path to clinical translation is less clear than the path to faster research; clinical research and safe deployment methods must evolve. They reference work like CRISPR Cures and examples such as rapid, targeted interventions in carefully chosen contexts (e.g., liver-delivered therapies) as early pathways.
ESMFold2 launch: a fast open “world model” of protein biology and design
Alex details ESMFold2 as a general protein model trained on billions of sequences that predicts atomic-resolution structures quickly and accurately. Beyond folding, its generality enables protein and antibody design digitally, followed by small-batch lab validation—illustrating the open discovery engine concept.
Validation, edge cases, and de-risking drug development: off-target effects and rare disease cohorts
They discuss pairing model-driven design with wet-lab confirmation (cell assays, cryo-EM) and using broader biological models to predict off-target effects (e.g., receptor expression in unexpected cell types). They also highlight patient-led rare disease registries and opt-in trial participation as accelerators for “edge case” learning and faster iteration.
What’s next: agentic pipelines, the “virtual cell,” and defining success in 5 years
Looking forward, they describe early integrations of ESMFold2 with agentic systems to automate design loops. The major research agenda is the virtual cell—models spanning genetic/transcriptomic/proteomic layers to phenotype with generalization to unseen interventions—while success is defined by world-class, uniquely better models that catalyze ecosystem-wide idea generation.