What is the definition of a low complexity region for target enrichment?

We currently use a proprietary model to predict whether a probe will be difficult to detect by Illumina sequencing. The model's inputs are GC content; global complexity, measured as the number of distinct k-mers in the probe sequence; local complexity, measured as the number of distinct k-mers within a limited region of the sequence; and the presence of homopolymers.
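The sketch below shows, in Python, how the four input features described above could be computed for a single probe. The proprietary model's exact definitions are not given here. The k-mer length (k=3), the local window size (20 nt), scoring local complexity as the minimum distinct k-mer count over all windows, and all function names are illustrative assumptions, not the model's actual parameters.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def distinct_kmers(seq: str, k: int) -> int:
    """Global complexity: number of distinct k-mers in the sequence (0 if shorter than k)."""
    seq = seq.upper()
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


def local_complexity(seq: str, k: int, window: int) -> int:
    """Local complexity: lowest distinct k-mer count across all windows of `window` nt.

    Using the minimum (the least complex window) is an assumption for illustration.
    """
    if len(seq) <= window:
        return distinct_kmers(seq, k)
    return min(distinct_kmers(seq[i:i + window], k)
               for i in range(len(seq) - window + 1))


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def probe_features(seq: str, k: int = 3, window: int = 20) -> dict:
    """Collect the four feature values for one probe (k and window are hypothetical defaults)."""
    return {
        "gc_content": gc_content(seq),
        "global_complexity": distinct_kmers(seq, k),
        "local_complexity": local_complexity(seq, k, window),
        "longest_homopolymer": longest_homopolymer(seq),
    }


if __name__ == "__main__":
    # Prints {'gc_content': 0.4, 'global_complexity': 5, 'local_complexity': 1, 'longest_homopolymer': 6}
    print(probe_features("AAAAAAGCGC", k=3, window=5))
```

Low values for global or local complexity indicate repetitive sequence. A short window makes the local score sensitive to brief repeats that barely affect the global count.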

As a general rule, probes are difficult to sequence if they have GC content above 85%, homopolymers longer than 20 nt, or di- or trinucleotide tandem repeats longer than 30 nt.
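The rule of thumb above can be applied as a simple screen, sketched below in Python. It uses only the three thresholds stated in this article and is not the proprietary model. The function names, the flag labels, and the way repeats are detected are assumptions. A tandem repeat is measured as the longest stretch in which every base equals the base 2 (or 3) positions earlier. Because a homopolymer also satisfies that check, a very long homopolymer can raise both flags.

```python
from typing import List

# Thresholds from the rule of thumb in this article.
MAX_GC_FRACTION = 0.85     # flag if GC content is above 85%
MAX_HOMOPOLYMER_NT = 20    # flag if a single-base run is longer than 20 nt
MAX_TANDEM_REPEAT_NT = 30  # flag if a di-/trinucleotide repeat is longer than 30 nt


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def longest_tandem_repeat(seq: str, unit: int) -> int:
    """Longest stretch with period `unit` (e.g. ATATAT for unit=2), in nt; 0 if none."""
    best = run = 0
    for j in range(unit, len(seq)):
        run = run + 1 if seq[j] == seq[j - unit] else 0
        best = max(best, run)
    return best + unit if best else 0


def difficulty_flags(seq: str) -> List[str]:
    """Return the (hypothetical) flag names for each rule the probe violates."""
    seq = seq.upper()
    flags = []
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    if gc > MAX_GC_FRACTION:
        flags.append("high_gc")
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER_NT:
        flags.append("long_homopolymer")
    repeat_nt = max(longest_tandem_repeat(seq, 2), longest_tandem_repeat(seq, 3))
    if repeat_nt > MAX_TANDEM_REPEAT_NT:
        flags.append("long_tandem_repeat")
    return flags


if __name__ == "__main__":
    print(difficulty_flags("CAG" * 11))  # 33-nt trinucleotide repeat -> ['long_tandem_repeat']
```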
