Publications
Thesis, Jan 2025

Computational methods to characterize the role of genomic repeats and duplicated genes in gene regulation

Morrissey, A
Product Used
Oligo Pools
Abstract
Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard multi-mapped reads, which align equally well to multiple genomic locations. Because multi-mapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. In this dissertation, we describe a new approach, Allo, to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in the assignment of multi-mapped reads. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions. To demonstrate the utility of Allo, this dissertation also contains sections with applications across a broad range of biological contexts. Allo is used to study transposable element binding and expression in CCR4-NOT-mediated transcriptional regulation, stress erythropoiesis, and the maintenance of pluripotency in mouse embryonic stem cells (ESCs). Finally, Allo is applied, along with the new Telomere-to-Telomere genome assembly, to better study the gene regulatory networks of segmentally duplicated genes through the discovery of thousands of previously uncharacterized transcription factor binding sites.
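The general idea of probabilistically allocating a multi-mapped read can be illustrated with a small sketch. This is not Allo's implementation: it omits the convolutional neural network entirely and assumes a simple rule in which each candidate locus is weighted by the number of uniquely mapped reads nearby. The function names, the `pseudocount` parameter, and the toy loci are all hypothetical.

```python
import random
from collections import Counter


def allocation_weights(candidates, unique_counts, pseudocount=1.0):
    """Return one probability per candidate locus.

    Assumed rule (illustrative only): weight each locus by the number of
    uniquely mapped reads nearby, plus a pseudocount so that loci with no
    unique support can still receive reads.
    """
    scores = [unique_counts.get(locus, 0) + pseudocount for locus in candidates]
    total = sum(scores)
    return [score / total for score in scores]


def allocate_read(candidates, unique_counts, rng, pseudocount=1.0):
    """Assign one multi-mapped read to a single locus by weighted sampling."""
    weights = allocation_weights(candidates, unique_counts, pseudocount)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy data: one read aligns equally well to two hypothetical loci.
candidates = ["chr1:1000", "chr5:2000"]
unique_counts = {"chr1:1000": 30, "chr5:2000": 0}

weights = allocation_weights(candidates, unique_counts)

# Allocate 1000 such reads with a fixed seed for reproducibility.
rng = random.Random(0)
assigned = Counter(
    allocate_read(candidates, unique_counts, rng) for _ in range(1000)
)
```

In this toy case the locus with strong unique-read support receives most of the multi-mapped reads, while the pseudocount leaves a small probability for the unsupported locus.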
