Publications
Thesis, Jan 2025

Development and Application of Stochastic Methods for Modeling DNA-Binding Protein Specificity and Dynamics

Farhat, A
Product Used
Genes
Abstract
Gene regulation has been of profound interest to the scientific community since the 1900s, when chromosomes were identified as a physical carrier of genes. Genes are molecular entities, and thus the chemical interactions of genes and their regulators have profound implications for the study of biology. However, gene regulation is a vast interconnected web. Despite many advances in biochemistry, studying all of the implications of even a single regulatory system is difficult to do directly. Here, I present my graduate work, where I computationally analyze these complex sets of interactions.

In eukaryotes, genes are often regulated by modifications of histones, proteins which help condense DNA and whose effects can be passed down epigenetically, without changing any underlying genes. To help understand how histone modifications can lead to gene regulation, I worked with experimental collaborators to study the gene-silencing, histone-binding protein Swi6 in fission yeast. Swi6 binds histones to silence genetic loci, preferring histones marked by H3K9me3. The silencing mechanism for Swi6 involves binding other Swi6 molecules on histones, and the ensemble silences gene loci. This prevents gene expression and keeps structural DNA regions like the pericentromeric region from being expressed as though they were genes. I created an algorithm which used my collaborators' single-molecule binding data on Swi6 to infer the in vivo chemical rate constants of Swi6's transitions to different states in the nucleus. With these rate constants, we drew two conclusions. First, while Swi6 proteins bind both nonspecific and H3K9me3 histones, their inherent preference for H3K9me3 histones, their ability to self-associate around H3K9me3, and stronger competition effects from binding nucleic acid around nonspecific histones all help to give them the specificity to silence H3K9me3 gene loci. Second, the rate constants I inferred indicated a key discrepancy: while individual Swi6 proteins can change states on the order of seconds, the epigenetic effects they modulate last generations. In conjunction with other data, we concluded that it is the ensemble of Swi6 phase condensation that is likely the silencing mode in heterochromatin, not any particular Swi6 molecule. I also extended the use of this algorithm to analyze data from some of Swi6's many binding partners, showing that the histones these proteins bind are active participants in their own regulation.

In a different project, I analyzed sequence-specific genetic regulation using a Reversible Jump Metropolis-Hastings algorithm. Sequence-specific gene regulation is a common pattern in cells responding to stimuli, so a general algorithm for inferring which sequences are responsible for particular regulatory binding modes would rapidly improve our understanding of genetic regulation in many contexts. Thus, I created The Algorithm for Reversible Jump Inference of Motifs, or TARJIM, which uses DNA-protein binding data with sequence information to infer a sequence motif set that induces gene regulation. I showed that TARJIM can identify the binding modes of individual transcription factors and, further, that it can unmix binding modes when they are blended, opening the possibility of using TARJIM to identify multiple binding modes for promiscuous transcription factors and to unmix DNA binding data where there are many proteins of interest. Overall, my graduate work is a collection of analyses of gene regulation from many biological angles, providing a deep view into some of the most complex aspects of genetic regulation.
