EVERY SPOKEN WORD
10 min read · 2,232 words
- 0:00 – 0:11
Intro
- AGAnkit Gupta
[upbeat music]
- 0:11 – 0:55
Introduction
- AGAnkit Gupta
I'm Ankit from YC. We're here at NeurIPS at the after party we're hosting with The Arc Prize. I'm here with Ellen Zhong, assistant professor at Princeton who focuses on machine learning and structural biology. Really excited to have you here.
- EZEllen Zhong
Yeah, thanks for having me.
- AGAnkit Gupta
Could you tell us a bit about your research and what you're working on?
- EZEllen Zhong
Yeah. So our group wor- works on various problems in molecular machine learning with a focus on kind of scientific discovery of protein dynamics from cryo-electron microscopy or cryo-EM, and also now small molecule structure elucidation.
- AGAnkit Gupta
Okay, cool.
- EZEllen Zhong
Yeah.
- AGAnkit Gupta
Um, I wanna dive into that, but before that, I'd love to hear a bit about the backstory. How did you arrive at this field? What was the set of things you did before this-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... that got you there?
- EZEllen Zhong
Yeah. So cryo-EM is a super cool, I guess, way of imaging
- 0:55 – 2:43
From Supercomputers to Cryo-EM
- EZEllen Zhong
protein structures, and I think the original kind of entry of, uh, my research into protein structures was an accident. It was, um, at D. E. Shaw Research.
- AGAnkit Gupta
Mm.
- EZEllen Zhong
So this, uh, you know, billionaire-funded non-profit research institute, uh, building supercomputers for folding, for protein folding. So I think I cut my teeth there, spent a couple years working there in New York, and then eventually went to do my PhD. Uh, for my PhD, I was in the computational systems biology PhD program at MIT, and my goal was just to, like, learn something new, right? I'd done MD simulations for a couple years, and I was like, "Okay, there are so much more interesting things in biology, uh, that's now made possible by AI," or at least computation at that time. You know, after exploring for a year in, like, different areas, neuroscience, mass spec, you know, RNA stuff, uh, learned about cryo-EM.
- AGAnkit Gupta
Mm.
- EZEllen Zhong
Which you take pictures of proteins to solve their 3D structures, and similar to molecular dynamics, you can get the dynamics of proteins from these electron microscope images, but it's an experimental technique. So instead of, like, these simulations where you're making predictions about the motions, now we actually are just looking at the data. That was super compelling because I think one of the shortcomings of molecular dynamics is that you still need to validate since it's still just this, like, simulation-based approach.
- AGAnkit Gupta
And so that, that original approach was, the molecular dynamics approach was-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... a very computationally heavy simulation-based approach.
- EZEllen Zhong
Yeah.
- AGAnkit Gupta
This was what DESRES was doing with-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... giant supercomputers, and here you're saying this was a relatively new method of being able-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... to actually measure these, as opposed to X-ray crystallography-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... you know, the method that's been used for 3D structure-
- EZEllen Zhong
Right
- AGAnkit Gupta
... measurement for-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... several decades now.
- EZEllen Zhong
Right. And like cryo-EM, I guess the rise of cryo-EM actually mirrors the rise of deep learning, right? So there
- 2:43 – 3:30
The Rise of Cryo-EM
- EZEllen Zhong
was this period in like 2012, 2013, where suddenly things just started working. Um, and by started working, like, there were just new technologies that enabled, uh, better images from the electron microscopes. And so now we could suddenly get atomic resolution structures of proteins. And then from the computer science perspective, it's like, okay, amazing, we can, like, study these proteins, but there's this interesting reconstruction problem. So how do we actually analyze these extremely noisy imaging data to infer the 3D atomic coordinates or the movies, right, the motions of these proteins?
- AGAnkit Gupta
And so does cryo-EM give you a static image, or does it give you multiple sets of images that give you the sort of sense of motion, or how exactly does that work?
- EZEllen Zhong
Yeah. So it's a static image in the sense that you take a single picture, but the picture is of an
- 3:30 – 4:30
Proteins as Dynamic Systems
- EZEllen Zhong
ensemble of different snapshots of the protein. And so that's the inference problem is how do you actually combine all these different snapshots into the different kind of conformations of the proteins? 'Cause you collect the data. You know, I think one major, uh, advance in structural biology over the last, you know, couple years is, okay, we- it's so hard to get a single structure, and we think of it as just this, like, static object, but in reality, like, everything is jiggling, everything is moving in order to actually perform functions that lead to life. These are machines that do things, right? And so if we can actually see the different conformations and the motions of these molecular machines, we can better understand how they work.
- AGAnkit Gupta
A- and so where does that now tie into machine learning, where-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... you know, here you've described so far an experimental method to-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... just look at what proteins are doing-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... and infer what-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... the motion might look like. Where does this now become a prediction problem, where you can do-
- EZEllen Zhong
Mm
- AGAnkit Gupta
... modeling and use that-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... to do something new and interesting?
- EZEllen Zhong
Yeah. So on the machine learning side, the class of problems that our group works on are all
- 4:30 – 5:31
Inverse Problems in Biology
- EZEllen Zhong
these inverse problems. We have the experimental measurements, but they're actually extremely incomplete, right? So you have noisy 2D projection images, and somehow from this data you wanna infer the 3D structure. And so that's where machine learning comes in. So, uh, we're kind of using, you know, physics-inspired machine learning models to analyze and combine the data to be able to infer this distribution, like learn these complex distributions of structures from the imaging data.
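[Editor's sketch] The inverse problem described above can be illustrated with a toy forward model: a 3D density is summed along one axis to give a 2D projection, noise is added, and many noisy images of the same view are averaged to recover the clean projection. This is only a minimal sketch under strong assumptions: the volume, the noise level, and every function name here are hypothetical, the viewing direction is known and fixed, and there is a single conformation. Actual cryo-EM reconstruction also has to infer unknown orientations and a distribution of structures, which this example does not attempt.

```python
import random


def make_toy_volume(size=4):
    """Build a hypothetical 'molecule': a small dense blob in an empty box.

    The volume is a nested list indexed as volume[z][y][x].
    """
    volume = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for z in (1, 2):
        for y in (1, 2):
            volume[z][y][1] = 1.0
    return volume


def project_along_z(volume):
    """Forward model: sum the 3D density along z to form a 2D image."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [sum(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def add_noise(image, sigma, rng):
    """Corrupt every pixel with independent Gaussian noise."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def average_images(images):
    """Pixel-wise mean of a list of equally sized 2D images."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[y][x] for img in images) / n for x in range(width)]
        for y in range(height)
    ]


def rms_error(image, reference):
    """Root-mean-square pixel difference between two images."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(image, reference)
        for a, b in zip(row_a, row_b)
    ]
    return (sum(diffs) / len(diffs)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
volume = make_toy_volume()
clean = project_along_z(volume)

# Noise is much larger than the signal, as a stand-in for very noisy images.
noisy = [add_noise(clean, sigma=5.0, rng=rng) for _ in range(2000)]

single_error = rms_error(noisy[0], clean)
averaged_error = rms_error(average_images(noisy), clean)
print("RMS error of one noisy image:", single_error)
print("RMS error after averaging:   ", averaged_error)
```

A single image is dominated by noise, while the average of many images sits much closer to the clean projection. The hard part that the sketch leaves out is that real images come from unknown orientations and different conformations, so they cannot simply be averaged together.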
- AGAnkit Gupta
So you've worked at a, a few different places that must have somewhat different cultures. There was DESRES, famous at the time for-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... this high-performance computing on large, I think basically custom computers.
- EZEllen Zhong
Yeah.
- AGAnkit Gupta
You worked at DeepMind, where-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... um, it's a somewhat different culture-
- EZEllen Zhong
Mm
- AGAnkit Gupta
... much more driven by machine learning. Uh, and now you're at Princeton-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... um, in an academic setting.
- EZEllen Zhong
Mm-hmm.
- AGAnkit Gupta
And I'm curious what you draw from your past experiences-
- EZEllen Zhong
Mm
- AGAnkit Gupta
... that now affect how you-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... uh, run your lab and run your teaching.
- EZEllen Zhong
Yeah. I think I've ta- taken some, like, core lessons from all these different places, and it has
- 5:31 – 7:35
Lessons from DeepMind, Industry and Academia
- EZEllen Zhong
been super interesting, and I'm extremely privileged to have these different experiences, uh, both at DESRES, at Google working with the AlphaFold team during the release of AlphaFold 2, which was, like, so crazy, and the, that environment is, like, a, a very unique environment.
- AGAnkit Gupta
I can imagine. Yeah.
- EZEllen Zhong
Yeah. Very exciting times. And then also- In academia, both as a grad student, you're just kind of like a singular person, a singular student just, like, chugging away doing science. I think in all of these places, though, there was a focus on the science, like what is the problem? What are you trying to solve? And then the lessons that I learned from each are just, like, different ways of, um, I guess structuring the problem, uh, a focus on reproducibility. That was, like, huge at D. E. Shaw Research. The problem is definitely, like, a huge focus of DeepMind and how they do things, right? It's just, like, a single objective function that if you can cleanly describe your problem like that, then you can optimize. That's very much true for some of the problems in cryo-EM. Other problems in machine learning for structural biology become, like, a lot messier, so like protein design. There, it's unclear how you validate, um-
- AGAnkit Gupta
Right, it's not a singular problem in that case.
- EZEllen Zhong
Yeah.
- AGAnkit Gupta
Yeah.
- EZEllen Zhong
It's more of a design problem. One of the really interesting things about academia and research right now is how do you define these problems in terms of, like, a machine learning, you know, optimization objective typically, or maybe that's not exactly even how we wanna frame it anymore. But at a high level, uh, one of the interesting challenges and all these different experiences gave me, like, different perspectives for thinking about structuring the research problem.
- AGAnkit Gupta
Yeah, interesting. And, and you know, when I think about a lot of work that's happened in protein design, a lot of it is focused on these, um, static ways of representing proteins and predicting their folds, as in predicting the ability for a sequence to become a particular static structure. To your point earlier, though, that's not a complete picture of-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... what a protein's actually doing.
- EZEllen Zhong
Yeah.
- AGAnkit Gupta
Does that inform your work now? Do you have... How do you think about that in terms of-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... what types of machine learning do you think will be impactful here?
- EZEllen Zhong
That was one of the main reasons why I decided to go down the academic path,
- 7:35 – 8:29
Why Protein Dynamics Remain Unsolved
- EZEllen Zhong
because I think there's a lot of really long-term just, like, uh, research directions and problems that we don't understand, and protein dynamics is one of those, right? We don't really have a good grasp on the motions of proteins in general, a way of describing it. Um, and so if we wanna study these molecular machines, uh, you know, we want to be able to focus on just, uh, like, looking at these things with cryo-EM. Um, will machine learning alone be able to do that? No. And so I think one of the kind of nice things about academia is that you can collaborate with, like, lots of different research groups and, like, figure out what exactly is the science to be doing next.
- AGAnkit Gupta
Does that mean that there's a, a lot of experimental work that's also happening at the same time as the computational work to make that happen, or what types of collaborations are you referring to there?
- EZEllen Zhong
Yeah. Our group does a lot of collaborations with both structural biologists and now chemists, because what
- 8:29 – 9:28
Collaborating with Experimental Scientists
- EZEllen Zhong
sparks joy for me is being able to directly contribute to this discovery process. And so in this kind of class of inverse problems that we work on, it's always working with experimentalists who have the data, who have the expertise to know, like, what's a discovery or not, and then developing methods that can help process to, like, automate or, uh, kind of reveal more information from the data.
- AGAnkit Gupta
I mean, the space you're in is really exciting right now. I mean, the Nobel Prize last year went to-
- EZEllen Zhong
Yeah
- AGAnkit Gupta
... work related to protein design, um, some of the work you contributed to.
- EZEllen Zhong
Mm-hmm.
- AGAnkit Gupta
There's, there's a sense that, okay, in many ways, protein folding has been solved, people say.
- EZEllen Zhong
Mm-hmm.
- AGAnkit Gupta
Um, I'm, I'm curious if you see any aspects of-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... machine learning in-
- EZEllen Zhong
Mm-hmm
- AGAnkit Gupta
... in protein design or machine learning in structural biology broadly as either over-hyped or under-hyped, and maybe that can be kind of one of the last things we talk about here.
- EZEllen Zhong
Yeah. I think, I mean, maybe this, like, specific problem of, uh, genetically encoded
- 9:28 – 10:51
What’s Overhyped and Underhyped in AI-Driven Biology
- EZEllen Zhong
sequence to relatively static protein structure prediction has been solved.
- AGAnkit Gupta
Right.
- EZEllen Zhong
But there's, like, so much more beyond that, so the dynamics that we've talked a lot about already. Um, but then, you know, one of the humbling things about working with the experimentalists, and specifically cryo-EM structural biologists, is that a lot of these proteins are gigantic.
- AGAnkit Gupta
Right.
- EZEllen Zhong
They're like, you know, these machines composed of tens or, like, hundreds of different complexes all together that are somehow assembled in cells and, you know, perform, like, do all these complex motions, right? And so I think there's so much more that we don't know. Uh, there's a lot of, you know, we have a good idea of kind of alpha helices and beta sheets, these, like, ca- canonical motifs in protein structures, but there's certainly lots of structural space that we still don't know even how to describe. So, you know, we've solved this extremely fundamental base problem, and now we can work on, like, this entire other ambient space of additional problems.
- AGAnkit Gupta
Do those additional problems look like taking the methods that have worked for the base problems and basically composing them into bigger ones, or is ... or do you think it's gonna be basically, like, a fundamentally different set of methods that it takes because those don't scale for whatever reason to these more complex or, you know, multi, uh-
- EZEllen Zhong
Interesting. I think some of them will, uh, but I do think for, like, truly new
- 10:51 – 11:51
The Future of AI-Driven Biology
- EZEllen Zhong
discoveries, it's gonna be a combination of new experimental data sources, right? Like, right now, the machine learning and, like, AI models have ingested all the data that we have, and there's, like, interesting things we can do just by distilling, like, whatever knowledge is in this, in these models. Um, but I think there's so much we don't understand about biology, right? Like, there's such a gap between, uh, molecular biology and, like, human health.
- AGAnkit Gupta
Right.
- EZEllen Zhong
So I think to really bridge that gap, there's going to need to be new experimental technologies, and then I, I'm excited in the future about kind of how we can collaborate with the, kind of collaborate with experimentalists and develop new kind of machine learning enabled models for, um, you know, doing science.
- AGAnkit Gupta
Awesome. I think that's a really exciting future.
- EZEllen Zhong
Yeah.
- AGAnkit Gupta
Really excited you're working on that. Thanks so much.
- EZEllen Zhong
Yeah. Thank you. [outro music]
Episode duration: 11:52
Transcript of episode s5w7pbpeGZ0