MindTheGap
MindTheGap detects DNA insertion variants in next-generation sequencing (NGS) read datasets and assembles their sequences, including insertions longer than the paired-end insert size.
Key Features:
- Detection of Insertion Variants: Uses a k-mer-based method to pinpoint insertion sites in a reference genome, whether the inserted sequence is novel or duplicated in the donor genome.
- Assembly of Insertions: Assembles donor reads to reconstruct the full sequence of each insertion.
- Versatility Across Variant Types: Calls novel and duplicated insertions as well as homozygous and heterozygous events in the donor genome.
- Performance on Complex Datasets: Shows high recall and precision on simulated datasets and has been applied to Caenorhabditis elegans and human NA12878, detecting and assembling insertions greater than 1 kilobase.
- Resource Efficiency: Used at most 14 GB of memory on the Caenorhabditis elegans and human NA12878 datasets.
Scientific Applications:
- Genome evolution studies: Provides reconstructed insertion sequences to support analyses of genetic diversity and evolutionary processes.
- Structural variant discovery in complex genomes: Enables detection and reconstruction of long insertion variants, including those that exceed the paired-end insert size.
Methodology:
MindTheGap uses a k-mer-based algorithm to locate insertion sites in a reference genome, then assembles each insertion sequence from the donor reads. It identifies novel and duplicated insertions, both homozygous and heterozygous, and handles insertions larger than the paired-end insert size.
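As a rough illustration of the k-mer idea described above, the following Python sketch works through a toy example. It is not MindTheGap's implementation: it assumes error-free, single-strand reads, ignores k-mer abundance and reverse complements, handles only a homozygous insertion with no repeated sequence at the junction, and extends the insertion greedily instead of exploring a full assembly graph. It relies on the observation that, under these assumptions, such an insertion removes from the donor exactly the k-1 reference k-mers that span the insertion site. All function names, sequences, and the value of k are hypothetical.

```python
def kmer_set(sequences, k):
    """Collect every k-mer found in a list of sequences (here: donor reads)."""
    return {seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)}


def find_insertion_sites(reference, donor_kmers, k):
    """Return reference positions p such that the k-1 reference k-mers
    spanning the junction between p-1 and p are all absent from the donor."""
    sites = []
    run_start = None
    n_kmers = len(reference) - k + 1
    for i in range(n_kmers + 1):  # the extra iteration closes a trailing run
        absent = i < n_kmers and reference[i:i + k] not in donor_kmers
        if absent and run_start is None:
            run_start = i
        elif not absent and run_start is not None:
            if i - run_start == k - 1:  # gap size expected for a clean site
                sites.append(run_start + k - 1)
            run_start = None
    return sites


def assemble_insertion(donor_kmers, left_flank, right_flank, max_steps=10000):
    """Greedy walk: extend the left flanking k-mer one base at a time through
    donor k-mers until the right flanking k-mer is reached.
    Returns None if the path branches, dead-ends, or gets too long."""
    k = len(left_flank)
    contig = current = left_flank
    for _ in range(max_steps):
        successors = [current[1:] + base for base in "ACGT"
                      if current[1:] + base in donor_kmers]
        if len(successors) != 1:  # ambiguous or broken path: give up
            return None
        current = successors[0]
        contig += current[-1]
        if current == right_flank and len(contig) >= 2 * k:
            return contig[k:-k]  # strip both flanks, keep the insertion
    return None


# Toy data (hypothetical): a 20 bp reference and a donor carrying an 8 bp
# insertion at reference position 10.
K = 4
reference = "ACGTACCTGAAGTCCGATTG"
insertion = "GGCTAATC"
donor = reference[:10] + insertion + reference[10:]
reads = [donor[i:i + 12] for i in range(len(donor) - 11)]  # error-free reads

donor_kmers = kmer_set(reads, K)
sites = find_insertion_sites(reference, donor_kmers, K)
for p in sites:
    seq = assemble_insertion(donor_kmers, reference[p - K:p], reference[p:p + K])
    print(p, seq)  # prints: 10 GGCTAATC
```

In this toy model a heterozygous insertion would not produce such a gap, since reads from the unaltered allele still contain the spanning k-mers; detecting it requires a different signal that the sketch does not cover.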
Details
- Maturity: Emerging
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Linux, Windows, Mac
- Added: 1/21/2015
- Last Updated: 11/24/2024
Operations
- Variant calling
Publications
Rizk G, Gouin A, Chikhi R, Lemaitre C. MindTheGap: integrated detection and assembly of short and long insertions. Bioinformatics. 2014;30(24):3451-3457. doi:10.1093/bioinformatics/btu545. PMID:25123898. PMCID:PMC4253827.