Publications
Thesis, Jan 2025

High-Throughput Approaches for Engineering Precise Transcriptional Control via Genetic and Epigenetic Mechanisms

Herschl, MH
Product Used
Variant Libraries
Abstract
Transcription, the process that converts DNA to RNA, is fundamental to all life as we know it. This process is the first step in converting our genetically encoded DNA into functional proteins and is tightly regulated by cells to ensure that genes are expressed in the correct contexts and dosages. As we have learned about this fundamental process, we have also learned to control it for a variety of downstream applications including basic research, therapeutics, and biomanufacturing. However, there is still much to learn and more control to be gained. This dissertation presents two complementary high-throughput screening approaches that increase our understanding of transcriptional regulation and provide new tools for its precise control. The first approach focuses on the epigenetic regulation of transcription, which often involves the coordinated interplay of diverse proteins. To systematically explore combinations of proteins that regulate the epigenome, we developed COMBINE (combinatorial interaction exploration), a high-throughput platform that tests over 50,000 pairs of epigenetic effector domains up to 2,094 amino acids in length for their ability to modulate endogenous human gene transcription. COMBINE revealed diverse synergistic interactions between epigenetic effector domains, including a potent KRAB-L3MBTL3 fusion that enhanced gene silencing up to 34-fold in dose-limited conditions and enabled robust dual-directional CRISPR perturbation. Inducible screening showed that DNA methylation modifiers are essential for epigenetic memory, with distinct combinations driving long-term repression and activation. Notably, we identified TET1-based combinations that induce hit-and-run upregulation for up to 35 days, demonstrating long-term transcriptional activation. This systematic analysis provides a rich resource for understanding epigenetic crosstalk and developing next-generation epigenome editing tools.
Moving from endogenous to synthetic contexts, we explored how regulatory sequences can be engineered to control gene expression. The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types remains challenging. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. We investigated transfer learning strategies for modeling promoter-driven expression, proposing various pretraining tasks, transfer approaches, and model architectures. Through two benchmarks reflecting data-constrained and large-dataset settings, we found that pretraining followed by transfer learning improves performance by 24-27% in data-limited scenarios. The methods identified are broadly applicable for modeling promoter-driven expression in understudied cell types and guide the selection of models for designing promoters in gene delivery applications.
Building on these modeling insights, we developed a comprehensive framework for designing cell-type-specific promoters using model-based optimization (MBO) in a data-efficient manner. While previous MBO approaches have focused on markedly different cell types with distinct regulatory features, we emphasized discovering promoters for closely related cell types that share similar regulatory environments, a more challenging task. By implementing conservative objective models that minimize adversarial designs and incorporating practical considerations for sequence diversity and uncertainty estimation, we generated promoters tailored for three leukemia cell lines (Jurkat, K562, and THP-1). Experimental validation confirmed the effectiveness of this approach, with designed sequences showing improved cell-type specificity. For K562 cells specifically, we discovered a promoter with 75.85% higher cell-type specificity than the best promoter from the initial dataset used to train the models.
By advancing both molecular tools and computational frameworks, these approaches collectively represent a significant contribution to our ability to precisely control transcription, with applications spanning from basic research to therapeutic development.
