No Priors Ep. 103 | With Vevo Therapeutics and the Arc Institute
At a glance
WHAT IT’S REALLY ABOUT
Tahoe 100 launches virtual cell era, redefining AI-driven drug discovery
- The episode features leaders from Vevo Therapeutics and the Arc Institute announcing Tahoe 100, a 100-million-cell single-cell RNA sequencing dataset. Paired with Arc's 230-million-cell SC Basecamp, it forms a 330-million-cell Virtual Cell Atlas.
- They argue that biology is entering its “virtual cell” moment, analogous to ImageNet and GPT in vision and language, enabling models that predict how cells respond to genetic and chemical perturbations rather than just protein structures.
- The discussion covers why prior single‑cell data were too small, noisy, and observational, how large perturbational datasets unlock causal and predictive modeling, and why open‑sourcing Tahoe 100 is strategically important.
- They outline how accurate virtual cell models could transform drug discovery, improve target selection, and reduce clinical failure rates. They also discuss how these models could reshape biotech companies, platforms, and global competition, including competition with China.
IDEAS WORTH REMEMBERING
Perturbational single-cell data is the missing foundation for causal models in biology.
Most historic single‑cell data are small, observational, and focused on healthy tissue, which limits models to correlations; Tahoe 100 massively expands drug-perturbed, disease-relevant data, enabling models that learn how interventions cause state changes in cells.
Virtual cell models complement, not replace, protein structure and language models.
Protein models capture binding and structural biology, but many drug failures stem from targeting the wrong pathways in complex cellular contexts. Virtual cell models aim to learn the "system-level" transcriptomic response of cells, bridging from molecular binding to organism-level outcomes.
Data quality and diversity matter as much as raw scale for training foundation models.
Early single‑cell foundation models barely degraded when trained on only ~1% of prior public data, revealing redundancy and narrow biological coverage; Tahoe 100 focuses on rich perturbations across 50 cancer models and 1,200 drugs with minimal batch effects to maximize information content.
Open-sourcing Tahoe 100 is a strategic move to amplify impact with a small team.
By making the dataset public and combining it with Arc’s SC Basecamp, Vevo and Arc catalyze an ecosystem of external researchers building virtual cell models, effectively multiplying their R&D capacity without scaling headcount.
AI agents can already perform valuable “plumbing” for biology by cleaning and unifying data.
Arc’s SC Basecamp uses an AI agent to crawl the Sequence Read Archive, re‑process heterogeneous datasets with uniform pipelines, and reduce analytical batch effects, demonstrating how agents can automate dry‑lab workflows and create higher‑quality inputs for models.
WORDS WORTH SAVING
“Tahoe 100 is the world's biggest single-cell RNA sequencing dataset… We actually think it's the first dataset that's going to enable machine learning in this space.”
— Johnny (Vevo Therapeutics)
“This is the domain we are talking about… the language of systems biology. The first thing you should be doing is to try out the things that worked in the other domains in this domain.”
— Nima (Vevo Therapeutics)
“In biology we have treated humans as the foundation models that ingest information and come up with hypotheses… now we actually want to go beyond that.”
— Nima (Vevo Therapeutics)
“How do we go from a discipline that primarily respects experiments today to something more like physics, where theory drives a lot of progress? These virtual cell models are a core wedge in making that happen.”
— Patrick Hsu (Arc Institute)
“I think it's morning in bio… We should be playing a different kind of game here.”
— Nima (Vevo Therapeutics)
High quality AI-generated summary created from speaker-labeled transcript.