Analysis of Error Profiles of Indels and Structural Variants in Deep Sequencing Data
Abstract
Background: Accurate detection of low-frequency mutations is critically important in the study of genetic heterogeneity, for example in detecting minimal residual disease (MRD) for cancer prognosis. Prior work has achieved successful computational error suppression for substitutions (SNVs). However, the error profiles of small insertions/deletions (Indels) and structural variants (SVs) remain elusive.
Results: Using conditional probability theory, we hypothesize that Indels and SVs can have lower error rates than SNVs. To test this hypothesis, we generated ultra-deep (~10,000,000X) sequencing data using previously established dilution models (COLO829/COLO829BL) on known somatic Indels (n=23) and SVs (n=17). We discovered that the error rate [...] 1000-fold lower than that of SNVs, although Indels from repetitive regions (repeat-Indels) have error rates as high as 1%. This error pattern was recapitulated in our analysis of 309 Indels and 1,063 SVs discovered from a relapsed B-ALL cohort of 103 patients, using whole-genome sequencing data (aggregated depth of ~50,000X) from 1,662 healthy donors, where repeat-Indels have error rates as high as 1%. To strengthen our observation on repeat-Indels, we further analyzed 11,378 repeat-Indels in 339 cancer driver genes. Our data indicated that the number of repeat units is highly predictive of the error rate of repeat-Indels (R²>0.9, P[...] 0.1%, the detection rate dropped to 16% for MRD less than 0.1%, indicating the difficulty in recovering mutant molecules when their frequencies are very low.
Conclusions: Overall, we established Indel and SV error profiles in deep next-generation sequencing data that enabled superior tumor detection performance at very low burdens, which has a significant impact on the clinical diagnosis and monitoring of human cancers and beyond. Our data also suggest future research directions to improve the recovery of mutant reads in ultra-deep sequencing applications.
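To illustrate why a lower background error rate matters for detecting low-burden variants, the following is a minimal sketch and not the authors' method. It assumes that sequencing errors at a site follow a simple binomial model, and it uses a hypothetical SNV-like error rate of 1e-3 (the abstract does not state one) together with a rate 1000-fold lower. The depth of 50,000 echoes the aggregated depth mentioned in the abstract. The function names and the choice of 5 supporting reads are illustrative assumptions.

```python
import math


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), via log-gamma for large n."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail.

    Summing the tail itself (rather than 1 - CDF) keeps very small
    probabilities from being lost to floating-point cancellation.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if i > n * p and term <= 1e-17 * total:
            break
    return min(total, 1.0)


if __name__ == "__main__":
    depth = 50_000          # aggregated depth, as in the abstract
    supporting_reads = 5    # hypothetical number of variant-supporting reads
    snv_like_error = 1e-3   # ASSUMPTION: illustrative SNV error rate
    lower_error = snv_like_error / 1000  # 1000-fold lower, per the abstract

    for label, rate in [("SNV-like", snv_like_error), ("1000-fold lower", lower_error)]:
        p_noise = upper_tail(supporting_reads, depth, rate)
        print(f"{label:>16}: P(>= {supporting_reads} error reads) = {p_noise:.3g}")

    # The same tail probability gives the chance of sampling enough true
    # mutant reads when the variant allele frequency (VAF) is very low.
    for vaf in (1e-3, 1e-4, 1e-5):
        print(f"VAF {vaf:g}: P(>= {supporting_reads} mutant reads) = "
              f"{upper_tail(supporting_reads, depth, vaf):.3g}")
```

Under these assumptions, a handful of supporting reads is indistinguishable from noise at the higher error rate but very unlikely to be noise at the lower one. The second loop shows the complementary limit: at very low frequencies, too few mutant molecules may be sampled at all.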
Product Used
Variant Libraries