Diffusion Models for Protein Structure Design: From Backbone Generation to Atomic-Resolution Enzyme Design
Abstract
The field of protein structure modeling has been revolutionized by the introduction of deep learning methods, particularly AlphaFold2, which has achieved near-experimental accuracy in predicting protein structures from amino acid sequences. This dissertation explores the application of diffusion models to create general solutions to protein design tasks. We introduce RFdiffusion, a model that generates protein structures as a series of backbone frames and achieves state-of-the-art performance on unconditional generation, motif scaffolding, and protein-protein binder design. We then use a broadened molecular vocabulary to predict general biomolecular structures, including nucleic acids, small molecules, post-translational modifications, metals, and ions, with RoseTTAFoldAA. Using the RoseTTAFoldAA architecture, we fine-tune a diffusion model capable of generating proteins that bind small molecules. Finally, we present RFdiffusion2, a flow-matching model trained from randomly initialized weights that is capable of unindexed atomic motif scaffolding, enabling the design of enzymes with complex active sites. In all cases, we validate the design capabilities of the models in vitro. Our work demonstrates the potential of diffusion models to advance the field of protein design and opens new avenues for enzyme engineering.
Related Publications