Publications
Thesis, Jan 2020

Mining the sequence space of antibody repertoires to predict and design antigen-specific antibodies

Friedensohn, S
Product Used
Genes
Abstract
The mammalian adaptive immune system is able to identify specific molecular structures on foreign pathogens. Specificity to these epitopes is achieved through a group of receptors belonging to the immunoglobulin superfamily: B cell receptors (BCRs), their secreted form (antibodies), and T cell receptors (TCRs). Each of these receptors carries highly variable regions, which facilitate antigen recognition and which are generated during progenitor cell development (and thus are thought to be unique clones or clonal lineages). The current estimate for the theoretical diversity of unique naïve BCR sequences is around 5 × 10¹³ clonal combinations for humans and at least 10¹² for mice. The diverse population of BCRs, antibodies, or TCRs in a given individual is referred to as the immune repertoire. Immune repertoire sequencing (AIRR-Seq, Ig-Seq) utilizes deep sequencing to access and analyze this vast diversity in different immunological compartments and immune cell subsets. This wealth of information has generated novel insights in the fields of antibody engineering, immunodiagnostics, vaccine design, and basic immunology. In Chapter 1 of this thesis, I review the current trends in immune repertoire sequencing and the efforts taken to improve existing protocols with respect to the accuracy and quality of the sequencing data. I highlight several major challenges in the field, such as the difficulty of obtaining paired variable region (e.g., variable heavy and variable light) sequences and the limited accuracy of sequencing data. For example, sequencing library preparation and deep sequencing platforms can introduce errors and biases, which can compromise immunological interpretations. This is especially confounding in the context of B cells that undergo somatic hypermutation, a natural process that introduces mutations in antibody variable regions.
In Chapter 2, I describe an experimental and computational method we have developed, based on synthetic standards and molecular barcoding, to achieve highly accurate antibody repertoire sequencing. We show how this conceptually simple procedure allows us to significantly reduce error rates across the whole sequencing region. By applying this technique to human B cell samples, we demonstrate that it can improve the measurements of antibody repertoires across various dimensions. Although it is now possible to produce high-quality Ig-Seq datasets, linking sequence to antigen specificity remains an immensely challenging task. In Chapter 3, I provide an introduction to the concept of modeling the large sequence space of immune repertoires in order to extract deterministic sequence motifs that correlate with antigen exposure and specificity. I review various classes of statistical and machine learning algorithms that can be used to model sequence generation. In Chapter 4, I develop a novel approach to identify antigen-specific sequence patterns in antibody repertoires based on deep generative models. To model the underlying process of BCR generation, variational autoencoders (VAEs) were used, under the assumption that data generation follows a Gaussian mixture model (GMM) in latent space. This provided both a latent embedding and cluster labels that group similar sequences together, which revealed a multitude of convergent, antigen-associated sequence patterns. These antigen-associated sequence patterns were predictive of immunological history and represent antigen-binding antibodies. Finally, I demonstrate how these sequence patterns can be used to generate further antigen-specific antibodies in silico, which are experimentally verified to retain antigen specificity.
