Twist Bioscience
June 26, 2023

Where Target Capture Meets Long-Read Sequencing


 

The human genome is rich in variation, from small base changes to expansive tandem repeats and whole-gene duplications. Studying these variations and their physiological effects has given us crucial insight into the genetic basis of human evolution and disease. Yet these insights have been incomplete, with a substantial amount of genetic variation lying beyond the reach of many next-generation sequencing technologies.

 

For decades, the majority of DNA sequencing has been done using short-read technology, which requires DNA to be broken into smaller, more manageable pieces for sequencing. While valuable, short-read sequencing struggles to resolve complex portions of the genome, such as pseudogenes or tandem repeats. As a result, many sequencing efforts, from population genomics studies to clinical diagnostics, are unable to meaningfully examine the more structurally complex portions of the genome.

 

Sequencing structurally complex regions is important for many reasons, not least because as many as 400 clinically relevant genes fall into this category. Twist Bioscience has recently developed new tools to help researchers reduce the cost of interrogating these regions through targeted long-read sequencing.

 

Long-Read vs Short-Read Sequencing

 

One of the significant drawbacks of short-read sequencing is that it produces short fragments that lack the sequence diversity of larger DNA fragments. For this data to be useful, the fragments have to be pieced together to reassemble the original DNA sequence. This is no small task, particularly when the genome contains structurally complex regions. Underscoring this point, recent evidence suggests that human reference genomes generated with short-read sequencing were missing thousands of structural variants, many of which overlapped with protein-coding genes. As a result, the human reference genome, which guides the reassembly of sequencing fragments, remained incomplete for nearly 20 years, with only 92% of it resolved. The remaining 8% was obscured by its complexity [1, 2].

 

But where short-read sequencing falls short, long-read sequencing excels.

 

As its name implies, long-read sequencing makes it possible to process fragments of DNA that are thousands of nucleotides long. This means that genomes can be broken down into fewer fragments that are easier to reassemble—just as it’s easier to assemble a 500-piece vs 10,000-piece puzzle. Long-read sequencing is particularly well suited for interrogating complex portions of the genome because long fragments can span the length of a complex area, reducing the guesswork required for reassembly.
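The puzzle analogy can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the genome size, read lengths, repeat size, and the `anchor_bp` flank requirement are assumed round numbers, not figures from this article or from any particular sequencing platform.

```python
import math

def reads_for_coverage(genome_bp: int, read_len_bp: int, depth: float = 1.0) -> int:
    """Number of reads whose combined length equals depth x genome size."""
    return math.ceil(depth * genome_bp / read_len_bp)

def can_span(read_len_bp: int, region_bp: int, anchor_bp: int = 100) -> bool:
    """True if a single read could cover a complex region plus unique
    flanking sequence (anchor_bp, a hypothetical requirement) on both sides."""
    return read_len_bp >= region_bp + 2 * anchor_bp

GENOME_BP = 3_000_000_000   # assumed round figure for a human genome
SHORT_READ_BP = 150         # assumed short-read length
LONG_READ_BP = 15_000       # assumed long-read length
REPEAT_BP = 5_000           # hypothetical tandem repeat

print("short reads needed at 1x:", reads_for_coverage(GENOME_BP, SHORT_READ_BP))
print("long reads needed at 1x: ", reads_for_coverage(GENOME_BP, LONG_READ_BP))
print("short read spans repeat: ", can_span(SHORT_READ_BP, REPEAT_BP))
print("long read spans repeat:  ", can_span(LONG_READ_BP, REPEAT_BP))
```

Under these assumptions, the long reads cover the genome with 100 times fewer pieces, and only the long read is able to bridge the repeat and anchor it in unique sequence on either side.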

 

Using long-read sequencing technology, researchers have found that most individuals carry upwards of 20,000 structural variants in their genome, the majority of which are unresolvable using short-read technology. Importantly, some of these variants intersect with clinically relevant genes [2].

 

The advantages of long-read sequencing reached a global audience in 2022 when researchers from the Telomere-to-Telomere (T2T) consortium announced the publication of a truly complete human genome sequence. Using long-read sequencing technology, the T2T consortium was able to reliably sequence the entirety of the human genome, including the 8% that had been obscured in previous reference genomes. This effort added nearly 200 million base pairs of new DNA sequence, including 99 genes that are expected to code for proteins, and corrected many structural errors that had plagued previous versions of the human reference genome. Producing such a complete genome was only possible thanks to long-read sequencing [1].

 

Benefits of Targeted Long-read Sequencing

 

Whether using short-read or long-read sequencing, researchers rarely wish to sequence the entire genome. Instead, resources can be conserved by using a target enrichment strategy in which select portions of the genome are isolated. Doing so allows for greater depth and more samples to be run in parallel, greatly improving the cost efficiencies of sequencing efforts.
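The depth and multiplexing benefit follows from simple arithmetic: a fixed sequencing yield spread over a smaller target gives more coverage per base. The sketch below assumes hypothetical values throughout (run yield, panel size, on-target fraction, and required depth are placeholders, not specifications of any Twist panel or sequencing instrument).

```python
def mean_depth(yield_bp: float, target_bp: float, on_target_fraction: float = 1.0) -> float:
    """Average coverage over the target, given total sequenced bases and
    the fraction of those bases that land on the target."""
    return yield_bp * on_target_fraction / target_bp

def samples_per_run(yield_bp: float, target_bp: float,
                    required_depth: float, on_target_fraction: float = 1.0) -> int:
    """How many samples can share one run while each still reaches required_depth."""
    return int(mean_depth(yield_bp, target_bp, on_target_fraction) // required_depth)

RUN_YIELD_BP = 30e9   # hypothetical yield of one sequencing run
GENOME_BP = 3e9       # assumed round figure for a human genome
PANEL_BP = 2e6        # hypothetical size of a targeted panel
ON_TARGET = 0.5       # hypothetical fraction of bases landing on target
DEPTH_NEEDED = 30     # hypothetical per-sample depth requirement

print("whole-genome depth:", mean_depth(RUN_YIELD_BP, GENOME_BP))
print("panel depth:       ", mean_depth(RUN_YIELD_BP, PANEL_BP, ON_TARGET))
print("samples per run:   ", samples_per_run(RUN_YIELD_BP, PANEL_BP, DEPTH_NEEDED, ON_TARGET))
```

With these placeholder numbers, the same run that would cover a whole genome about ten times over could instead cover a small panel thousands of times, or be divided among many samples.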

 

Target enrichment for long-read sequencing can be uniquely challenging, however, especially when targeting genes that are marred by complexity. In addition to all of the normal considerations for target capture—such as designing probes to uniquely bind their target sequence, and to capture targets uniformly—designing capture probes for long-read applications requires careful probe placement to preserve the integrity of long fragments during the capture process. And, when targeting complex regions, probe promiscuity may be a problem.

 

To help researchers overcome these challenges, Twist partnered with PacBio and applied our significant probe design expertise to create enrichment panels for long-read sequencing.

 

The Twist Alliance Dark Genes Panel was developed in partnership with PacBio and Fritz Sedlazeck, an Associate Professor at Baylor College of Medicine, to enable the sequencing of 389 clinically relevant genes, each of which is difficult, if not impossible, to sequence with short-read technology. With this panel, researchers can begin to interrogate the wide range of genetic variation that occurs in genes such as survival of motor neuron 1 (SMN1). Though mutations in this gene have been linked to spinal muscular atrophy, knowledge of the gene’s structure and regulation has been limited owing to frequent duplications and deletions that undermine sequencing efforts [3].

 

Similarly, the Twist Alliance Long-Read PGx Panel is designed to help researchers interrogate 49 complex genes that influence drug metabolism and pharmacodynamics, including CYP2D6. The genes captured by this panel are often challenging to sequence using short-read technology due to the presence of pseudogenes. Targeted long-read sequencing presents an efficient and effective way to survey these pharmacogenes for clinically relevant variations and potentially uncover insights that will one day guide therapeutic decisions.

 

Both of these long-read panels are designed with probes that have been optimized for high uniformity and sequencing efficiency. What’s more, these panels can be customized to fit specific needs and sequencing platforms.

 

Collectively, Twist’s long-read panels make it possible for researchers to fill the gaps left by short-read sequencing and shine light on the human genome’s structural complexity.

 

References

 

  1. Nurk, Sergey, et al. “The Complete Sequence of a Human Genome.” Science, vol. 376, no. 6588, Apr. 2022, pp. 44–53, https://doi.org/10.1126/science.abj6987.
  2. Ho, Steve S., et al. “Structural Variation in the Sequencing Era.” Nature Reviews Genetics, vol. 21, no. 3, 15 Nov. 2019, pp. 171–189, https://doi.org/10.1038/s41576-019-0180-9.
  3. “SMN1 Survival of Motor Neuron 1, Telomeric [Homo Sapiens (Human)] - Gene - NCBI.” www.ncbi.nlm.nih.gov, www.ncbi.nlm.nih.gov/gene/6606.
