YC Root Access

The Future of AI Molecular Discovery

During last month’s NeurIPS 2025 conference, YC’s Ankit Gupta sat down with Ellen Zhong, an assistant professor of computer science at Princeton, to discuss how machine learning is reshaping structural biology. They explore how proteins aren’t static structures but dynamic molecular machines, and how techniques like cryo-electron microscopy combined with ML are revealing protein motion beyond traditional structure prediction. The conversation also dives into inverse problems, noisy experimental data, and what’s next for AI-driven scientific discovery.

Chapters
00:11 Introduction
00:55 From Supercomputers to Cryo-EM
02:43 The Rise of Cryo-EM
03:30 Proteins as Dynamic Systems
04:30 Inverse Problems in Biology
05:31 Lessons from DeepMind, Industry and Academia
07:35 Why Protein Dynamics Remain Unsolved
08:29 Collaborating with Experimental Scientists
09:28 What’s Overhyped and Underhyped in AI-Driven Biology
10:51 The Future of AI-Driven Biology

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs

Ankit Gupta (host), Ellen Zhong (guest)
Jan 24, 2026 · 11m

EVERY SPOKEN WORD

  1. 0:00-0:11

    Intro

    1. AG

      [upbeat music]

  2. 0:11-0:55

    Introduction

    1. AG

      I'm Ankit from YC. We're here at NeurIPS at the after party we're hosting with The Arc Prize. I'm here with Ellen Zhong, assistant professor at Princeton who focuses on machine learning and structural biology. Really excited to have you here.

    2. EZ

      Yeah, thanks for having me.

    3. AG

      Could you tell us a bit about your research and what you're working on?

    4. EZ

      Yeah. So our group wor- works on various problems in molecular machine learning with a focus on kind of scientific discovery of protein dynamics from cryo-electron microscopy or cryo-EM, and also now small molecule structure elucidation.

    5. AG

      Okay, cool.

    6. EZ

      Yeah.

    7. AG

      Um, I wanna dive into that, but before that, I'd love to hear a bit about the backstory. How did you arrive at this field? What was the set of things you did before this-

    8. EZ

      Mm-hmm

    9. AG

      ... that got you there?

    10. EZ

      Yeah. So cryo-EM is a super cool, I guess, way of imaging

  3. 0:55-2:43

    From Supercomputers to Cryo-EM

    1. EZ

      protein structures, and I think the original kind of entry of, uh, my research into protein structures was an accident. It was, um, at D. E. Shaw Research.

    2. AG

      Mm.

    3. EZ

      So this, uh, you know, billionaire-funded non-profit research institute, uh, building supercomputers for folding, for protein folding. So I think I cut my teeth there, spent a couple years working there in New York, and then eventually went to do my PhD. Uh, for my PhD, I was in the computational systems biology PhD program at MIT, and my goal was just to, like, learn something new, right? I'd done MD simulations for a couple years, and I was like, "Okay, there are so much more interesting things in biology, uh, that's now made possible by AI," or at least computation at that time. You know, after exploring for a year in, like, different areas, neuroscience, mass spec, you know, RNA stuff, uh, learned about cryo-EM.

    4. AG

      Mm.

    5. EZ

      Which you take pictures of proteins to solve their 3D structures, and similar to molecular dynamics, you can get the dynamics of proteins from these electron microscope images, but it's an experimental technique. So instead of, like, these simulations where you're making predictions about the motions, now we actually are just looking at the data. That was super compelling because I think one of the shortcomings of molecular dynamics is that you still need to validate since it's still just this, like, simulation-based approach.

    6. AG

      And so that, that original approach was, the molecular dynamics approach was-

    7. EZ

      Yeah

    8. AG

      ... a very computationally heavy simulation-based approach.

    9. EZ

      Yeah.

    10. AG

      This was what DESRES was doing with-

    11. EZ

      Yeah

    12. AG

      ... giant supercomputers, and here you're saying this was a relatively new method of being able-

    13. EZ

      Yeah

    14. AG

      ... to actually measure these, as opposed to X-ray crystallography-

    15. EZ

      Mm-hmm

    16. AG

      ... you know, the method that's been used for 3D structure-

    17. EZ

      Right

    18. AG

      ... measurement for-

    19. EZ

      Yeah

    20. AG

      ... several decades now.

    21. EZ

      Right. And like cryo-EM, I guess the rise of cryo-EM actually mirrors the rise of deep learning, right? So there

  4. 2:43-3:30

    The Rise of Cryo-EM

    1. EZ

      was this period in like 2012, 2013, where suddenly things just started working. Um, and by started working, like, there were just new technologies that enabled, uh, better images from the electron microscopes. And so now we could suddenly get atomic resolution structures of proteins. And then from the computer science perspective, it's like, okay, amazing, we can, like, study these proteins, but there's this interesting reconstruction problem. So how do we actually analyze these extremely noisy imaging data to infer the 3D atomic coordinates or the movies, right, the motions of these proteins?

    2. AG

      And so does cryo-EM give you a static image, or does it give you multiple sets of images that give you the sort of sense of motion, or how exactly does that work?

    3. EZ

      Yeah. So it's a static image in the sense that you take a single picture, but the picture is of an

  5. 3:30-4:30

    Proteins as Dynamic Systems

    1. EZ

      ensemble of different snapshots of the protein. And so that's the inference problem is how do you actually combine all these different snapshots into the different kind of conformations of the proteins? 'Cause you collect the data. You know, I think one major, uh, advance in structural biology over the last, you know, couple years is, okay, we- it's so hard to get a single structure, and we think of it as just this, like, static object, but in reality, like, everything is jiggling, everything is moving in order to actually perform functions that lead to life. These are machines that do things, right? And so if we can actually see the different conformations and the motions of these molecular machines, we can better understand how they work.
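
Editor's illustration (not part of the spoken conversation): the idea of sorting an ensemble of snapshots into conformations can be sketched with a toy example. Everything here is an assumption made for illustration: the two 1D "density profiles" `STATE_A` and `STATE_B`, the noise level, and the use of plain 2-means clustering, which is only a stand-in and is not the method used by the speaker's group.

```python
import random

# Two hypothetical conformations of a toy "protein", each a 1D density profile.
STATE_A = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
STATE_B = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]

def snapshot(rng, sigma=0.2):
    """One noisy image of a randomly chosen conformation (the label is discarded)."""
    state = rng.choice([STATE_A, STATE_B])
    return [v + rng.gauss(0.0, sigma) for v in state]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(images):
    """Element-wise average of a list of profiles."""
    k = len(images)
    return [sum(col) / k for col in zip(*images)]

def two_means(images, iterations=10):
    """Plain 2-means clustering, a stand-in for real heterogeneity analysis."""
    c0 = images[0]
    # Start the second center at the image farthest from the first one.
    c1 = max(images, key=lambda img: sq_dist(img, c0))
    for _ in range(iterations):
        group0 = [img for img in images if sq_dist(img, c0) <= sq_dist(img, c1)]
        group1 = [img for img in images if sq_dist(img, c0) > sq_dist(img, c1)]
        c0, c1 = mean(group0), mean(group1)
    return c0, c1

rng = random.Random(1)
images = [snapshot(rng) for _ in range(200)]

pooled = mean(images)                 # treats the ensemble as one static structure
center0, center1 = two_means(images)  # separates the ensemble into two states
```

In this sketch, `pooled` blurs the two states into a single average, while `center0` and `center1` each approximate one of the underlying conformations. Real cryo-EM data is far noisier, is 2D projections of a 3D object, and has unknown viewing directions, so this only conveys the shape of the problem.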

    2. AG

      A- and so where does that now tie into machine learning, where-

    3. EZ

      Yeah

    4. AG

      ... you know, here you've described so far an experimental method to-

    5. EZ

      Mm-hmm

    6. AG

      ... just look at what proteins are doing-

    7. EZ

      Mm-hmm

    8. AG

      ... and infer what-

    9. EZ

      Yeah

    10. AG

      ... the motion might look like. Where does this now become a prediction problem, where you can do-

    11. EZ

      Mm

    12. AG

      ... modeling and use that-

    13. EZ

      Mm-hmm

    14. AG

      ... to do something new and interesting?

    15. EZ

      Yeah. So on the machine learning side, the class of problems that our group works on are all

  6. 4:30-5:31

    Inverse Problems in Biology

    1. EZ

      these inverse problems. We have the experimental measurements, but they're actually extremely incomplete, right? So you have noisy 2D projection images, and somehow from this data you wanna infer the 3D structure. And so that's where machine learning comes in. So, uh, we're kind of using, you know, physics-inspired machine learning models to analyze and combine the data to be able to infer this distribution, like learn these complex distributions of structures from the imaging data.
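
Editor's illustration (not part of the spoken conversation): a minimal sketch of the forward model behind this kind of inverse problem, assuming a toy 3D density on a small grid, a projection that simply sums along one axis, and additive Gaussian noise. The grid size, blob positions, and noise level are all hypothetical, and the viewing direction is assumed known, which is not the case in real cryo-EM.

```python
import random

def make_volume(n=8):
    """Toy 3D density: an n x n x n grid with two bright voxels (hypothetical)."""
    vol = [[[0.0 for _ in range(n)] for _ in range(n)] for _ in range(n)]
    vol[2][2][3] = 1.0
    vol[5][4][3] = 2.0
    return vol

def project(vol, axis):
    """Sum the density along one axis to form a 2D projection image."""
    n = len(vol)
    img = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                v = vol[x][y][z]
                if axis == 0:
                    img[y][z] += v
                elif axis == 1:
                    img[x][z] += v
                else:
                    img[x][y] += v
    return img

def noisy_image(vol, axis, sigma, rng):
    """Forward model: clean projection plus independent Gaussian pixel noise."""
    clean = project(vol, axis)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in clean]

def average_images(images):
    """Pixel-wise average of a stack of equally sized images."""
    n, k = len(images[0]), len(images)
    return [[sum(img[i][j] for img in images) / k for j in range(n)] for i in range(n)]

def rms_error(a, b):
    """Root-mean-square pixel difference between two images."""
    n = len(a)
    return (sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)) ** 0.5

rng = random.Random(0)
vol = make_volume()
clean = project(vol, axis=2)

# A single image is dominated by noise; averaging many aligned images is not.
single = noisy_image(vol, 2, sigma=5.0, rng=rng)
stack = [noisy_image(vol, 2, sigma=5.0, rng=rng) for _ in range(400)]
avg = average_images(stack)

err_single = rms_error(single, clean)
err_avg = rms_error(avg, clean)
```

The sketch only shows why combining many noisy measurements helps: `err_avg` comes out much smaller than `err_single`. Going from 2D projections back to a 3D structure, with unknown orientations and multiple conformations, is the harder inference step that the learned models described here are meant to address.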

    2. AG

      So you've worked at a, a few different places that must have somewhat different cultures. There was DESRES, famous at the time for-

    3. EZ

      Mm-hmm

    4. AG

      ... this high-performance computing on large, I think basically custom computers.

    5. EZ

      Yeah.

    6. AG

      You worked at DeepMind, where-

    7. EZ

      Yeah

    8. AG

      ... um, it's a somewhat different culture-

    9. EZ

      Mm

    10. AG

      ... much more driven by machine learning. Uh, and now you're at Princeton-

    11. EZ

      Mm-hmm

    12. AG

      ... um, in an academic setting.

    13. EZ

      Mm-hmm.

    14. AG

      And I'm curious what you draw from your past experiences-

    15. EZ

      Mm

    16. AG

      ... that now affect how you-

    17. EZ

      Yeah

    18. AG

      ... uh, run your lab and run your teaching.

    19. EZ

      Yeah. I think I've ta- taken some, like, core lessons from all these different places, and it has

  7. 5:31-7:35

    Lessons from DeepMind, Industry and Academia

    1. EZ

      been super interesting, and I'm extremely privileged to have these different experiences, uh, both at DESRES, at Google working with the AlphaFold team during the release of AlphaFold 2, which was, like, so crazy, and the, that environment is, like, a, a very unique environment.

    2. AG

      I can imagine. Yeah.

    3. EZ

      Yeah. Very exciting times. And then also- In academia, both as a grad student, you're just kind of like a singular person, a singular student just, like, chugging away doing science. I think in all of these places, though, there was a focus on the science, like what is the problem? What are you trying to solve? And then the lessons that I learned from each are just, like, different ways of, um, I guess structuring the problem, uh, a focus on reproducibility. That was, like, huge at D. E. Shaw Research. The problem is definitely, like, a huge focus of DeepMind and how they do things, right? It's just, like, a single objective function that if you can cleanly describe your problem like that, then you can optimize. That's very much true for some of the problems in cryo-EM. Other problems in machine learning for structural biology become, like, a lot messier, so like protein design. There, it's unclear how you validate, um-

    4. AG

      Right, it's not a singular problem in that case.

    5. EZ

      Yeah.

    6. AG

      Yeah.

    7. EZ

      It's more of a design problem. One of the really interesting things about academia and research right now is how do you define these problems in terms of, like, a machine learning, you know, optimization objective typically, or maybe that's not exactly even how we wanna frame it anymore. But at a high level, uh, one of the interesting challenges and all these different experiences gave me, like, different perspectives for thinking about structuring the research problem.

    8. AG

      Yeah, interesting. And, and you know, when I think about a lot of work that's happened in protein design, a lot of it is focused on these, um, static ways of representing proteins and predicting their folds, as in predicting the ability for a sequence to become a particular static structure. To your point earlier, though, that's not a complete picture of-

    9. EZ

      Yeah

    10. AG

      ... what a protein's actually doing.

    11. EZ

      Yeah.

    12. AG

      Does that inform your work now? Do you have... How do you think about that in terms of-

    13. EZ

      Yeah

    14. AG

      ... what types of machine learning do you think will be impactful here?

    15. EZ

      That was one of the main reasons why I decided to go down the academic path,

  8. 7:35-8:29

    Why Protein Dynamics Remain Unsolved

    1. EZ

      because I think there's a lot of really long-term just, like, uh, research directions and problems that we don't understand, and protein dynamics is one of those, right? We don't really have a good grasp on the motions of proteins in general, a way of describing it. Um, and so if we wanna study these molecular machines, uh, you know, we want to be able to focus on just, uh, like, looking at these things with cryo-EM. Um, will machine learning alone be able to do that? No. And so I think one of the kind of nice things about academia is that you can collaborate with, like, lots of different research groups and, like, figure out what exactly is the science to be doing next.

    2. AG

      Does that mean that there's a, a lot of experimental work that's also happening at the same time as the computational work to make that happen, or what types of collaborations are you referring to there?

    3. EZ

      Yeah. Our group does a lot of collaborations with both structural biologists and now chemists, because what

  9. 8:29-9:28

    Collaborating with Experimental Scientists

    1. EZ

      sparks joy for me is being able to directly contribute to this discovery process. And so in this kind of class of inverse problems that we work on, it's always working with experimentalists who have the data, who have the expertise to know, like, what's a discovery or not, and then developing methods that can help process to, like, automate or, uh, kind of reveal more information from the data.

    2. AG

      I mean, the space you're in is really exciting right now. I mean, the Nobel Prize last year went to-

    3. EZ

      Yeah

    4. AG

      ... work related to protein design, um, some of the work you contributed to.

    5. EZ

      Mm-hmm.

    6. AG

      There's, there's a sense that, okay, in many ways, protein folding has been solved, people say.

    7. EZ

      Mm-hmm.

    8. AG

      Um, I'm, I'm curious if you see any aspects of-

    9. EZ

      Mm-hmm

    10. AG

      ... machine learning in-

    11. EZ

      Mm-hmm

    12. AG

      ... in protein design or machine learning in structural biology broadly as either over-hyped or under-hyped, and maybe that can be kind of one of the last things we talk about here.

    13. EZ

      Yeah. I think, I mean, maybe this, like, specific problem of, uh, genetically encoded

  10. 9:28-10:51

    What’s Overhyped and Underhyped in AI-Driven Biology

    1. EZ

      sequence to relatively static protein structure prediction has been solved.

    2. AG

      Right.

    3. EZ

      But there's, like, so much more beyond that, so the dynamics that we've talked a lot about already. Um, but then, you know, one of the humbling things about working with the experimentalists, and specifically cryo-EM structural biologists, is that a lot of these proteins are gigantic.

    4. AG

      Right.

    5. EZ

      They're like, you know, these machines composed of tens or, like, hundreds of different complexes all together that are somehow assembled in cells and, you know, perform, like, do all these complex motions, right? And so I think there's so much more that we don't know. Uh, there's a lot of, you know, we have a good idea of kind of alpha helices and beta sheets, these, like, ca- canonical motifs in protein structures, but there's certainly lots of structural space that we still don't know even how to describe. So, you know, we've solved this extremely fundamental base problem, and now we can work on, like, this entire other ambient space of additional problems.

    6. AG

      Do those additional problems look like taking the methods that have worked for the base problems and basically composing them into bigger ones, or is ... or do you think it's gonna be basically, like, a fundamentally different set of methods that it takes because those don't scale for whatever reason to these more complex or, you know, multi, uh-

    7. EZ

      Interesting. I think some of them will, uh, but I do think for, like, truly new

  11. 10:51-11:51

    The Future of AI-Driven Biology

    1. EZ

      discoveries, it's gonna be a combination of new experimental data sources, right? Like, right now, the machine learning and, like, AI models have ingested all the data that we have, and there's, like, interesting things we can do just by distilling, like, whatever knowledge is in this, in these models. Um, but I think there's so much we don't understand about biology, right? Like, there's such a gap between, uh, molecular biology and, like, human health.

    2. AG

      Right.

    3. EZ

      So I think to really bridge that gap, there's going to need to be new experimental technologies, and then I, I'm excited in the future about kind of how we can collaborate with the, kind of collaborate with experimentalists and develop new kind of machine learning enabled models for, um, you know, doing science.

    4. AG

      Awesome. I think that's a really exciting future.

    5. EZ

      Yeah.

    6. AG

      Really excited you're working on that. Thanks so much.

    7. EZ

      Yeah. Thank you. [outro music]

Episode duration: 11:52

Transcript of episode s5w7pbpeGZ0