Publications
The fifth international hackathon for developing computational cloud-based tools and resources for pan-structural variation and genomics
Abstract
The TykeVar package has been tested with Python >3.10. The TykeVar workflow can be broadly split into three parts. First, the TykeVarSimulator takes an aligned BAM file, a reference, and several parameters (such as the range of VAFs and variant sizes) to generate a set of simulated mosaic SVs and SNVs. It does so by choosing a random location and a VAF from the given range and then evaluating whether that location has sufficient coverage for the desired VAF. If that condition is met, the variant is added to the output VCF file. Second, the TykeVarEditor inserts the simulated variants into the query sequences from the original dataset to generate modified reads with the mosaic variants built in. For each variant, it fetches the overlapping reads from the BAM file, subsamples the reads to get the coverage that satisfies the desired VAF, and uses pysam (0.21.0) to traverse the CIGAR string, query, and reference sequences of each alignment to find the exact location at which to insert the variant. Once a modified read is created, it is written to a FASTQ file. Note that all new bases (SNVs or insertions) are assigned a q-score of 60. Parsing and traversal of the VCF, BAM, and reference files are performed using APIs from pysam, Biopython's SeqIO, and NumPy. Lastly, the TykeVarMerger reintroduces the modified reads into the original dataset using minimap2 (v2.24-r1122) [52] and bwa-mem2 (v2.2.1) [59]. Additional non-standard dependencies include NumPy (1.25.2) and Biopython (1.81), all of which are available through the pip package management system.
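The read-editing step described above can be illustrated with a minimal sketch. This is not the TykeVar source code: the function names (`ref_to_query_index`, `apply_snv`, `pick_reads_for_vaf`) are hypothetical, and the rounding rule used to turn a VAF into a read count is an assumption. The CIGAR operation codes follow the SAM specification, which is also the numbering pysam uses for `cigartuples`, so the sketch runs on plain tuples without pysam installed.

```python
import random

# CIGAR operation codes from the SAM specification (same numbering as pysam's cigartuples).
M, I, D, N, S, H, P, EQ, X = range(9)
CONSUMES_QUERY = {M, I, S, EQ, X}
CONSUMES_REF = {M, D, N, EQ, X}


def ref_to_query_index(cigartuples, ref_start, ref_pos):
    """Walk the CIGAR and return the query index aligned to ref_pos.

    Returns None if ref_pos lies in a deletion/skip or outside the alignment.
    (Hypothetical helper, not part of TykeVar or pysam.)
    """
    q, r = 0, ref_start
    for op, length in cigartuples:
        in_query, in_ref = op in CONSUMES_QUERY, op in CONSUMES_REF
        if in_query and in_ref:
            # Aligned block: query and reference advance together.
            if r <= ref_pos < r + length:
                return q + (ref_pos - r)
            q += length
            r += length
        elif in_query:
            # Insertion or soft clip: only the query advances.
            q += length
        elif in_ref:
            # Deletion or skip: only the reference advances.
            if r <= ref_pos < r + length:
                return None
            r += length
        # Hard clips and padding consume neither sequence.
    return None


def apply_snv(seq, quals, query_index, alt_base, new_qual=60):
    """Substitute one base and give the new base a q-score of 60."""
    new_seq = seq[:query_index] + alt_base + seq[query_index + 1:]
    new_quals = quals[:query_index] + [new_qual] + quals[query_index + 1:]
    return new_seq, new_quals


def pick_reads_for_vaf(read_names, vaf, rng):
    """Subsample reads so that roughly vaf * coverage reads carry the variant.

    Rounding to the nearest read count is an assumption of this sketch.
    """
    n_alt = round(vaf * len(read_names))
    return rng.sample(read_names, n_alt)


# Example: 2 soft-clipped bases, 3 matches, 1 insertion, 2 matches, 2 deleted, 2 matches.
cigar = [(S, 2), (M, 3), (I, 1), (M, 2), (D, 2), (M, 2)]
idx = ref_to_query_index(cigar, ref_start=100, ref_pos=101)
seq, quals = apply_snv("ACGTACGTAC", [30] * 10, idx, "G")
```

With real data, the `(operation, length)` tuples and the alignment start would come from each pysam alignment record rather than from hand-written lists, and insertions or deletions would need their own editing functions alongside `apply_snv`.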