The Future of AI Molecular Discovery

During last month’s NeurIPS 2025 conference, YC’s Ankit Gupta sat down with Ellen Zhong, an assistant professor of computer science at Princeton, to discuss how machine learning is reshaping structural biology. They explore how proteins aren’t static structures but dynamic molecular machines, and how techniques like cryo-electron microscopy combined with ML are revealing protein motion beyond traditional structure prediction. The conversation also dives into inverse problems, noisy experimental data, and what’s next for AI-driven scientific discovery.

Chapters
00:11 - Introduction
00:55 - From Supercomputers to Cryo-EM
02:43 - The Rise of Cryo-EM
03:30 - Proteins as Dynamic Systems
04:30 - Inverse Problems in Biology
05:31 - Lessons from DeepMind, Industry and Academia
07:35 - Why Protein Dynamics Remain Unsolved
08:29 - Collaborating with Experimental Scientists
09:28 - What’s Overhyped and Underhyped in AI-Driven Biology
10:51 - The Future of AI-Driven Biology

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs

Ankit Gupta (host), Ellen Zhong (guest)

Jan 23, 2026 · 11 min

WHAT IT’S REALLY ABOUT

AI meets cryo-EM to reveal protein motion and drive discovery

Zhong’s lab applies machine learning to inverse problems in structural biology, especially reconstructing protein structures and dynamics from noisy cryo-EM images.
Cryo-EM’s recent surge parallels deep learning’s rise, with improved instrumentation enabling near-atomic resolution and creating new computational reconstruction challenges.
A central theme is that proteins are dynamic molecular machines, so focusing only on static sequence-to-structure prediction misses functionally critical conformational changes.
Experience across D. E. Shaw (reproducibility and simulation), DeepMind/AlphaFold (clean objectives and optimization), and academia (open-ended problems) shapes how Zhong frames research problems.
Future progress will require not just better models but new experimental data sources and tight collaboration with experimentalists to bridge molecular insights to human health.

IDEAS WORTH REMEMBERING

5 ideas

Cryo-EM turns structural biology into a data-driven inverse problem.

Rather than directly “seeing” a single 3D structure, cryo-EM produces noisy 2D projections of many particle states; ML helps infer the underlying 3D structures and distributions of conformations.
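To make the inverse-problem framing concrete, here is a minimal toy sketch in plain Python. It is not Zhong's method or any real cryo-EM pipeline: the 4x4x4 density grid, the noise level, and every function name (`project_z`, `add_noise`, `average_images`, `rms_error`) are assumptions made up for illustration. It shows only the forward direction (a 3D density projected to a 2D image, then corrupted by noise) and the simplest possible recovery step, averaging many noisy images of the same view. Real reconstruction is far harder because each particle's orientation and conformational state are unknown.

```python
import random


def project_z(volume):
    """Sum a 3D density grid volume[x][y][z] along z to get a 2D image."""
    return [[sum(column) for column in plane] for plane in volume]


def add_noise(image, sigma, rng):
    """Return a copy of the image with independent Gaussian noise per pixel."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def average_images(images):
    """Pixel-wise mean of a list of equally sized 2D images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equally sized 2D images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, rb_rows(b)) for x, y in zip(ra, rb)]
    return (sum(diffs) / len(diffs)) ** 0.5


def rb_rows(image):
    """Identity helper kept separate so rms_error reads as a row-wise zip."""
    return image


rng = random.Random(0)  # fixed seed so the toy run is repeatable

# Hypothetical toy "particle": a dense 2x2 column in the middle of a 4x4x4 grid.
volume = [
    [[1.0 if (1 <= x <= 2 and 1 <= y <= 2) else 0.0 for z in range(4)]
     for y in range(4)]
    for x in range(4)
]

clean = project_z(volume)   # noise-free 2D projection
sigma = 5.0                 # noise larger than the signal, as in real micrographs
noisy = [add_noise(clean, sigma, rng) for _ in range(2000)]

single_error = rms_error(noisy[0], clean)
averaged_error = rms_error(average_images(noisy), clean)
print(f"RMS error, one image: {single_error:.2f}")
print(f"RMS error, average of {len(noisy)} images: {averaged_error:.2f}")
```

A single image is dominated by noise, while the average of many images lands much closer to the clean projection. That gap is the intuition behind collecting very large particle datasets. Inferring orientations and a distribution of conformations on top of this is where the ML methods discussed in the interview come in.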

Protein function often lives in motion, not a single structure.

Zhong emphasizes that proteins “jiggle” and shift between conformations to perform work, so understanding dynamics is essential for mechanistic biology and next-generation discovery.

Experimental measurements address a key weakness of pure simulation.

Compared with molecular dynamics, cryo-EM is compelling because it is grounded in observed data, reducing (though not eliminating) the burden of validating purely simulated motions.

Defining the objective is easy for some tasks and messy for others.

Structure reconstruction can often be framed with clearer likelihood/objective formulations, while design problems (e.g., protein design) can be harder to validate and may not reduce cleanly to a single metric.

Scaling from “folding solved” to real biology requires new methods and new data.

Even if sequence-to-static-structure prediction is strong, many real cellular machines are massive multi-component assemblies whose structural space is complex and poorly characterized, and mapping it will likely need new experimental technologies plus ML.

WORDS WORTH SAVING

5 quotes

I think one major, uh, advance in structural biology over the last, you know, couple years is, okay, we- it's so hard to get a single structure, and we think of it as just this, like, static object, but in reality, like, everything is jiggling, everything is moving in order to actually perform functions that lead to life.

— Ellen Zhong

So on the machine learning side, the class of problems that our group works on are all these inverse problems. We have the experimental measurements, but they're actually extremely incomplete, right? So you have noisy 2D projection images, and somehow from this data you wanna infer the 3D structure.

— Ellen Zhong

That was one of the main reasons why I decided to go down the academic path, because I think there's a lot of really long-term just, like, uh, research directions and problems that we don't understand, and protein dynamics is one of those, right? We don't really have a good grasp on the motions of proteins in general, a way of describing it.

— Ellen Zhong

Will machine learning alone be able to do that? No.

— Ellen Zhong

I think to really bridge that gap, there's going to need to be new experimental technologies, and then I, I'm excited in the future about kind of how we can collaborate with the, kind of collaborate with experimentalists and develop new kind of machine learning enabled models for, um, you know, doing science.

— Ellen Zhong

Cryo-EM reconstruction from noisy 2D projections
Protein dynamics vs static structures
Inverse problems and physics-inspired ML
Molecular dynamics simulations and validation limits
Lessons from D. E. Shaw and AlphaFold/DeepMind
Problem formulation, objectives, and reproducibility
Collaboration with structural biologists and chemists

High quality AI-generated summary created from speaker-labeled transcript.
