All Clonal Gene sequences between 300 bp and 5000 bp that are scored as “Standard” complexity by our screener are eligible for the Express service.
Our scoring system uses a machine learning model that analyzes and combines multiple sequence parameters (e.g., overall GC percent, maximum homopolymer length, maximum repeat length, sequence length, and repeat density). This model was trained on our historical manufacturing data and provides a more precise estimate of production success for genes and antibody products.
Complexity issues are driven mainly by repetitive structures and extreme GC content. Guidelines for sequence design are available online and below:
- Avoid repeats of ≥ 20 bp or with Tm ≥ 60 °C
- Global GC content must be between 25% and 65%
- Avoid extreme differences in GC content within a gene (i.e. the difference in GC content between the highest and lowest 50 bp stretch should be no greater than 52%)
- Minimize homopolymers
- Minimize the number/length of small repeats scattered throughout the sequence
- For HIS tags, use a combination of CAC and CAT codons (e.g., CACCAT)
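The guidelines above can be approximated with a quick pre-check before submitting a sequence. The following Python sketch is illustrative only: it is not our scoring algorithm (which is a trained model), all function names are hypothetical, and it assumes that "repeat" means an exact direct repeat. It does not evaluate repeat Tm or reverse-complement repeats, and because the guidelines give no numeric homopolymer limit, the longest run is only reported, not flagged.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_window_spread(seq: str, window: int = 50) -> float:
    """Highest minus lowest GC fraction over all sliding windows."""
    seq = seq.upper()
    if len(seq) < window:
        return 0.0
    is_gc = [base in "GC" for base in seq]
    count = sum(is_gc[:window])
    lo = hi = count
    for i in range(window, len(seq)):
        # Slide the window one base: add the new base, drop the oldest.
        count += is_gc[i] - is_gc[i - window]
        lo, hi = min(lo, count), max(hi, count)
    return (hi - lo) / window


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single base."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def has_repeat(seq: str, k: int = 20) -> bool:
    """True if any exact k-mer occurs more than once (direct repeats only)."""
    seq = seq.upper()
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def his_tag(n_codons: int = 6) -> str:
    """His tag built by alternating CAC and CAT codons."""
    return "".join("CAC" if i % 2 == 0 else "CAT" for i in range(n_codons))


def check_guidelines(seq: str) -> dict:
    """Collect the metrics above and list any guideline that is violated."""
    gc = gc_fraction(seq)
    spread = gc_window_spread(seq)
    issues = []
    if not 0.25 <= gc <= 0.65:
        issues.append("global GC content outside 25%-65%")
    if spread > 0.52:
        issues.append("GC difference between 50 bp windows exceeds 52%")
    if has_repeat(seq, 20):
        issues.append("contains an exact repeat of 20 bp or more")
    return {"gc": gc, "gc_spread": spread,
            "max_homopolymer": max_homopolymer(seq), "issues": issues}
```

Calling `check_guidelines(sequence)` returns the computed metrics together with a list of flagged guidelines. An empty list means only that none of these simple checks fired, not that the sequence will be scored as “Standard”.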
Limiting repeats and reducing GC content improves the likelihood of sequence acceptance. For example, a gene may score as “Not Accepted” with 66% overall GC content, but when reduced to 65% or less, may be accepted as a “Complex” sequence. Further lowering GC content to ~60% or less may improve the complexity score to “Standard”, at which point the sequence would be eligible for the Express service. The same holds true for eliminating repeats, reducing homopolymer stretches, and evening out extreme variations in GC windows. The more such structures can be avoided or reduced in a gene, the more likely it is to qualify for the Express service.
For additional information on our scoring algorithm and sequence design, please visit our FAQs.