Publications
Structure-based discovery and definition of RiPP recognition elements
Abstract
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a large class of natural products with wide-ranging structural and functional diversity. Central to many RiPP biosynthetic pathways is the RiPP Recognition Element (RRE), a structurally conserved peptide-binding domain that enables class-independent genome mining. Bioinformatic tools such as RRE-Finder have leveraged this domain to identify novel RiPPs, but their accuracy has been limited by high false-positive rates. To improve accuracy, we assessed whether structure-based searching of the AlphaFold database with the rapid tertiary structure alignment tool Foldseek could reduce false-positive rates and identify previously unretrievable, sequence-divergent RREs. We used these divergent RREs to build 11 new Foldseek-derived Hidden Markov Models (HMMs) and refined existing models through improved seed alignments, domain excision, bit score thresholds, and Pfam filters. The improved precision-mode HMMs retrieved nearly twice as many RREs from the UniProt database as the original models, including novel domain fusions. In total, the updated workflow identified >90,000 high-confidence RREs. To further characterize these RREs and assess their functional relevance, we used a combined bioinformatic and AlphaFold 3 approach to predict over 8,000 RRE-peptide complexes, which enabled the mapping of 13 distinct recognition sequences across known RiPP classes. We further used binding assays to validate AlphaFold's ability to predict interactions between precursor peptides and their cognate RRE domains, an approach that streamlines the identification of recognition sequences and putative substrates. Together, these improvements enhance the accuracy and scope of RRE-Finder and improve access to previously hidden RRE-dependent biosynthetic pathways.
Genome mining relies heavily on sequence similarity searches, which severely limit the discovery potential for sequence-divergent proteins. To mitigate this challenge for RRE domain discovery, we used Foldseek structure-based alignments to predict sequence-divergent RREs. The newly identified RRE domains were then used to build new HMMs for RRE-Finder. This process identified 5,000 previously unidentified but high-confidence RRE domains. Representatives of this sequence-divergent group retain the canonical RRE fold but display new domain fusions, offering additional bioinformatic handles for genome mining. In parallel, AlphaFold 3 modeling of RRE-precursor peptide interactions enabled the identification of 13 distinct recognition sequence motifs spanning many RiPP biosynthetic pathways. These approaches have significantly expanded the known landscape of RRE-dependent RiPP biosynthesis.
Product Used
Oligo Pools