Publications
Machine learning methods for efficient antibody discovery, engineering and optimization
Abstract
Antibodies have emerged as one of the most important classes of biopharmaceuticals, with transformative outcomes in the treatment of various diseases including cancer, autoimmune disorders, and infectious diseases. Despite their success, the discovery, engineering and optimization of therapeutic antibodies remain limited by experimental bottlenecks along the entire development pipeline that substantially increase the cost of bringing an antibody therapeutic to patients. Traditional in vivo discovery campaigns generate high-affinity antibodies through in vivo maturation, and these antibodies possess favorable developability properties compared with those from in vitro methods. However, in vivo discovery relies heavily on animal immunization and experimental screening of B cells, and developability optimization is constrained by low-throughput experimental assays; both are therefore costly and labor-intensive. Computational advances, such as machine learning (ML), have the potential to transform this field, but are equally constrained by limited data availability. In this thesis, we address key challenges in antibody discovery, affinity engineering and developability optimization through three complementary studies. First, we generated a unique dataset of single-cell transcriptomes and antibody repertoires from immunized mice labeled for antigen specificity. We investigated predictive patterns in transcriptomes and antibody amino acid sequences and demonstrated that gene expression-based ML models outperform sequence-based approaches in predicting antigen specificity within an antigen cohort. This work highlights the potential of single-cell gene expression patterns for in vivo antibody discovery. Second, we developed a workflow for ML-guided affinity engineering of an antigen-specific antibody variant. Using antibody repertoires from immunized mice, we built a computational workflow to select a set of antigen-binding variants.
The amino acid sequences and their experimentally measured affinities were used to train ML regression models, which accurately predicted continuous affinity values. This approach enabled the ML-guided design of eight synthetic antibody variants, of which seven exhibited the desired affinities when experimentally validated. These findings highlight the feasibility of leveraging small datasets (
Product Used
Genes
Related Publications