CHAPTERS
NeurIPS meetup: Ellen Zhong’s lab focus—protein dynamics from cryo-EM and small-molecule structure elucidation
The conversation opens at a NeurIPS afterparty, introducing Ellen Zhong (Princeton) and her group’s agenda in molecular machine learning. She frames the lab around scientific discovery problems: inferring protein dynamics from cryo-EM and extending similar ideas to small-molecule structure elucidation.
Career path: D. E. Shaw supercomputers and molecular dynamics as the starting point
Zhong describes entering protein-structure work somewhat accidentally through D. E. Shaw Research. That experience grounded her in computationally intensive molecular dynamics (MD) simulations and in the high-performance computing culture built around questions related to protein folding.
Switch to experimental structure: discovering cryo-EM during an MIT PhD exploration phase
During her PhD at MIT’s Computational & Systems Biology program, Zhong explored diverse biological measurement modalities before landing on cryo-EM. Cryo-EM appealed because it can ground questions of motion and structure in real experimental data rather than purely simulation outputs.
Why cryo-EM ‘took off’: a 2012–2013 inflection similar to deep learning’s rise
Zhong connects cryo-EM’s recent success to a technological inflection point—improvements in instruments and imaging that enabled atomic-resolution structures. That new data quality created a rich computational challenge: reconstructing 3D structure (and motion) from extremely noisy measurements.
Proteins aren’t static: ensembles, conformations, and molecular machines
The discussion shifts from structure as a single snapshot to proteins as dynamic systems. Cryo-EM images capture ensembles of particles in different states, so the central inference task becomes recovering multiple conformations and their relationships—key to understanding function.
Machine learning as inverse-problem solving: from noisy 2D projections to 3D distributions
Zhong explains where ML fits: cryo-EM analysis is fundamentally an inverse problem. The measurements are incomplete (noisy 2D projections), so learning requires physics-informed models that combine evidence across many images to infer 3D structures and distributions over states.
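As a rough illustration of why combining evidence across many images matters, the toy sketch below projects a tiny 3D volume to 2D, adds heavy Gaussian noise, and compares the error of a single noisy image with the error of an average over many. It is not the lab's method: the 4x4x4 volume, the noise level, and all function names are assumptions made for illustration, and it assumes every image shares one known viewing direction, whereas real cryo-EM must also infer unknown orientations and conformational states.

```python
import random


def project_z(volume):
    """Sum a 3D volume (nested lists indexed [z][y][x]) along z into a 2D image."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [sum(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def add_noise(image, sigma, rng):
    """Return a copy of the image with independent Gaussian noise on each pixel."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def average_images(images):
    """Pixel-wise mean of a list of equally sized 2D images."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[y][x] for img in images) / n for x in range(width)]
        for y in range(height)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equally sized 2D images."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return (sum(diffs) / len(diffs)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical toy "molecule": a 2x2x2 block of density inside a 4x4x4 grid.
volume = [
    [
        [1.0 if all(1 <= c <= 2 for c in (x, y, z)) else 0.0 for x in range(4)]
        for y in range(4)
    ]
    for z in range(4)
]

clean = project_z(volume)  # the noise-free 2D projection
noisy = [add_noise(clean, sigma=5.0, rng=rng) for _ in range(400)]

single_error = rms_error(noisy[0], clean)
averaged_error = rms_error(average_images(noisy), clean)
print(f"single image RMS error:  {single_error:.2f}")
print(f"400-image average error: {averaged_error:.2f}")
```

With independent noise, averaging N images shrinks the noise by roughly a factor of the square root of N, which is why a single image can be nearly uninformative while the full dataset still constrains the structure.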
Lessons from D. E. Shaw, DeepMind (AlphaFold2 era), and academia: objectives, rigor, and problem structure
Reflecting on her time across industry and academia, Zhong highlights recurring themes: prioritize the scientific question, enforce reproducibility, and define problems cleanly when possible. She contrasts crisp objective-driven settings (e.g., AlphaFold-style tasks) with messier design problems where validation is harder.
Why protein dynamics remains unsolved—and why academia is a good home for it
Zhong argues that while static sequence-to-structure prediction has made major strides, protein dynamics lacks a general, practical description and remains a long-horizon research challenge. She positions academia as enabling deeper collaboration and exploratory work that blends computation with new science questions.
Collaboration model: pairing ML with structural biology and chemistry to turn data into discoveries
Zhong emphasizes that impactful progress comes from close partnerships with experimentalists who generate data and define what constitutes a real discovery. Her lab aims to build methods that automate analysis pipelines and extract additional information from complex experimental measurements.
What’s overhyped vs. underhyped: folding ‘solved’ for static structures, but biology is far bigger
In the closing segment, Zhong distinguishes between the success of static structure prediction and the broader landscape of unsolved problems. She notes that many proteins are massive multi-component machines and that large regions of structural and functional space remain poorly characterized.
Future of AI-driven biology: new experimental technologies + ML to bridge molecular biology and health
Zhong forecasts that major advances will require not only better models but also new experimental modalities that expand what data exists. With current AI largely trained on existing datasets, she argues that bridging molecular understanding to human health will depend on tighter ML–experiment co-design.