Genetics in Medicine Open, Mar 2023 | 1(1), 100525
DOI: 10.1016/j.gimo.2023.100525

P478: Simultaneous analysis of 50+ different repeat expansions with Twist target enrichment and PacBio HiFi sequencing

Han, Tina; Corbitt, Holly; Dolzhenko, Egor; Gonzaludo, Nina; Kingan, Sarah; Tung, Po-Yuan; Toro, Esteban; Locklear, Chad
Product Used: Genes
Abstract
Introduction
Expansions of unstable genomic short tandem repeats (STRs) have been identified as the causal DNA mutation in more than 30 Mendelian diseases. Short-read sequencing, which is frequently used to analyze STR expansions, cannot determine the exact length, sequence composition, or methylation status of long expansions. Targeted resequencing allows high-resolution characterization of hundreds to thousands of gene regions at a scale and cost more accessible than whole genome sequencing. Here we describe a method that combines Twist target enrichment panels with PacBio HiFi long-read sequencing to accurately measure STRs.

Methods
Using a proprietary algorithm, we design gene panels targeting various STRs. Our long-read capture protocol starts with 200 nanograms of fragmented gDNA. After end-repair, A-tailing, and adapter ligation, unique dual indices for sample barcoding are added during PCR. Multiple samples can be pooled for an overnight hybridization. The post-capture libraries then undergo SMRTbell library preparation and sequencing on a PacBio Sequel II or Revio system. Depending on target size, up to 400 samples can be sequenced on one SMRT Cell, with HiFi read lengths of 5-10 kilobases.

Results
We benchmarked the target enrichment workflow on reference samples with known repeat expansion alleles using the Tandem Repeat Genotyping Tool (TRGT). This tool was previously applied to whole genome sequencing datasets and shown to accurately characterize tandem repeat regions and expansions up to tens of kilobases in length. TRGT outputs a VCF file reporting the repeat motif, motif count, and mean methylation level for each repeat allele, along with a TRVZ plot that visualizes the motifs and their counts across the repeat region. We demonstrate that this method efficiently provides comprehensive coverage of 50+ different repeat expansions.
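To make the per-allele output described above concrete, here is a minimal sketch that reads one TRGT-style VCF data line and extracts motif counts and mean methylation. It assumes the FORMAT keys `GT`, `MC` (motif counts per allele, with counts for multiple motifs joined by `_`) and `AM` (mean methylation per allele), and INFO keys `TRID` and `MOTIFS`; these field names reflect our understanding of TRGT's output and should be checked against the VCF header of the TRGT version actually used. The function name `parse_trgt_record` and the example record (locus name, coordinates, values) are hypothetical and not taken from the study.

```python
def parse_trgt_record(line: str) -> dict:
    """Parse one tab-separated VCF data line that has a single sample column.

    Assumption: FORMAT contains MC (motif counts, one entry per allele,
    motifs separated by '_') and AM (mean methylation, one value per allele).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    # INFO column: key=value pairs separated by ';' (flags have no '=').
    info = dict(
        item.split("=", 1) if "=" in item else (item, True)
        for item in fields[7].split(";")
    )
    # Pair FORMAT keys with the sample's values.
    fmt_keys = fields[8].split(":")
    sample = dict(zip(fmt_keys, fields[9].split(":")))
    # One list of motif counts per allele, skipping missing ('.') entries.
    motif_counts = [
        [int(c) for c in allele.split("_")]
        for allele in sample.get("MC", "").split(",")
        if allele and allele != "."
    ]
    # Mean methylation per allele; '.' means no value was reported.
    methylation = [
        None if v == "." else float(v)
        for v in sample.get("AM", "").split(",")
        if v
    ]
    return {
        "chrom": chrom,
        "pos": pos,
        "trid": info.get("TRID"),
        "motifs": info.get("MOTIFS", "").split(","),
        "genotype": sample.get("GT"),
        "motif_counts": motif_counts,
        "mean_methylation": methylation,
    }


# Hypothetical record: made-up locus and values, for illustration only.
record = (
    "chrN\t1000\t.\tN\t.\t.\t.\t"
    "TRID=example_locus;END=1093;MOTIFS=TGC;STRUC=(TGC)n\t"
    "GT:AL:MC:AM\t0/0:93,93:31,31:0.12,0.10"
)
parsed = parse_trgt_record(record)
print(parsed["motif_counts"], parsed["mean_methylation"])
```

Keeping the parser to plain string splitting avoids assuming any particular VCF library; a production pipeline would more likely use an established VCF reader and take field definitions from the header.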
Using trio samples, we consistently detected an ATXN1 repeat spanning 31 copies of the TGC motif, as well as three adjacent repeats of CAGG, CAGA, and CA motifs in the CNBP region. Clinical validation with confirmed repeat expansions is ongoing.

Conclusion
The demonstrated method enables scalable, cost-efficient hybrid capture of 50+ different repeat expansions with long reads, minimizing coverage bias and maximizing accuracy to fully capture all variant types. This includes structural variation and haplotype phasing information, both of which are inaccessible to short-read and Sanger sequencing.
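The ATXN1 and CNBP observations above describe repeats as runs of motif copies: 31 consecutive TGC units in one case, and adjacent CAGG, CAGA, and CA runs in the other. The sketch below illustrates that idea with a naive greedy scan that counts consecutive copies of each motif in a fixed order. It is not TRGT's annotation algorithm, it tolerates no sequencing errors or motif interruptions, and the function name `count_motif_runs` and the synthetic sequences (including the CNBP-like copy numbers) are hypothetical.

```python
def count_motif_runs(seq: str, motifs: list[str]) -> tuple[list[int], int]:
    """Greedily count consecutive exact copies of each motif, in the given order.

    Returns the per-motif copy counts and the index where scanning stopped;
    a stop index smaller than len(seq) means trailing sequence was unexplained.
    """
    counts = []
    pos = 0
    for motif in motifs:
        n = 0
        # Consume exact, back-to-back copies of the current motif.
        while seq.startswith(motif, pos):
            n += 1
            pos += len(motif)
        counts.append(n)
    return counts, pos


# Synthetic ATXN1-like run: 31 exact TGC copies.
print(count_motif_runs("TGC" * 31, ["TGC"]))
# Synthetic CNBP-like structure with made-up copy numbers.
cnbp_like = "CAGG" * 3 + "CAGA" * 2 + "CA" * 4
print(count_motif_runs(cnbp_like, ["CAGG", "CAGA", "CA"]))
```

The greedy, fixed-order scan only works because each motif run here is pure and the order of motifs is known in advance; real reads contain interruptions and errors, which is why dedicated tools align reads to a repeat structure rather than matching exact copies.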