System-wide extraction of cis-regulatory rules from sequence-to-function models in human neural development
ABSTRACT
The genomic cis-regulatory code (CRC) underlies the spatiotemporal specificity of gene expression. While sequence-to-function (S2F) models can accurately encode the CRC of transcriptional enhancers, decoding these models into human-interpretable rules remains a major challenge. Here we tackle this challenge in human neural development, for which we generate two new single-cell multiome atlases, one from a human embryo and one from neural tube organoids. We use this comparative framework to robustly extract combinations of transcription factor (TF) binding sites that are necessary and sufficient to design enhancers. In this way, we extract cis-regulatory rules for dorsal-ventral progenitors, neural crest, mesenchyme, and neurons. To enable this, we develop a new strategy and computational package, called TF-MINDI, to embed, cluster, and annotate candidate TF binding sites, and to extract combinatorial rules for each cell type. We evaluate rule-based models in conjunction with black-box S2F models through simulations, evolutionary comparisons with zebrafish, topic modeling, and enhancer reporter assays. Our findings show robust and interpretable rule extraction and constitute a step forward in deciphering, explaining, and formalizing the CRC. TF-MINDI is available at https://github.com/aertslab/TF-MINDI.