Publications
bioRxiv, Jan 2025
DOI: 10.1101/2025.10.24.684421

MillionFull enables massive, full-length enzyme sequence-fitness data collection at low cost for machine learning-guided enzyme engineering

Li, J; Erichsen, B; Krarup, SR; Yuan, S; Jijakli, K; Karst, S
Product Used
Genes
Abstract
Machine learning holds great promise for accelerating enzyme optimization, but its power is fundamentally constrained by the limited availability of sequence-fitness data. Here, we introduce MillionFull, a low-cost method that enables high-throughput full-length sequence-fitness mapping for enzymes of arbitrary length. Each run yields on the order of 10⁵ to 10⁷ data points, capturing sequence-function relationships at unprecedented scale. By overcoming the data bottleneck, MillionFull provides a foundation for dramatically advancing AI-driven enzyme engineering.
