MindTheGap

MindTheGap detects insertion variants in next-generation sequencing (NGS) read datasets and assembles their sequences, including insertions longer than the paired-end insert size.


Key Features:

  • Detection of Insertion Variants: Uses a k-mer-based method to pinpoint insertion sites relative to a reference genome, whether the inserted sequence is novel or duplicated in the donor genome.
  • Assembly of Insertions: Assembles insertion sequences from donor reads to reconstruct the full inserted sequence.
  • Versatility Across Variant Types: Calls novel and duplicated insertions as well as homozygous and heterozygous events in the donor genome.
  • Performance on Complex Datasets: Shows high recall and precision on simulated datasets and has been applied to Caenorhabditis elegans and human NA12878, detecting and assembling insertions greater than 1 kilobase.
  • Resource Efficiency: Processes large datasets using at most 14 GB of memory.
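The k-mer-based detection idea in the first feature above can be illustrated with a toy sketch. It is not MindTheGap's implementation: the function names, the example sequences, and the rule "a run of exactly k-1 reference k-mers missing from the donor reads marks a candidate site" are simplifying assumptions that cover only an error-free, homozygous insertion outside repeats.

```python
def kmers(seq, k):
    """Return all k-mers of seq, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def donor_kmer_set(reads, k):
    """Collect every k-mer seen in the donor reads (hypothetical helper)."""
    seen = set()
    for read in reads:
        seen.update(kmers(read, k))
    return seen


def find_missing_runs(reference, donor_kmers, k):
    """Return (start, length) for each run of consecutive reference k-mers
    that are absent from the donor k-mer set."""
    runs, run_start = [], None
    ref_kmers = kmers(reference, k)
    for i, kmer in enumerate(ref_kmers):
        if kmer not in donor_kmers:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(ref_kmers) - run_start))
    return runs


def candidate_insertion_sites(reference, donor_kmers, k):
    """A junction is spanned by exactly k-1 reference k-mers, so a run of
    k-1 missing k-mers starting at s suggests an insertion after the first
    s + k - 1 reference bases (toy rule, assumed for illustration)."""
    return [start + k - 1
            for start, length in find_missing_runs(reference, donor_kmers, k)
            if length == k - 1]


# Made-up example: the donor carries "TATATATC" after reference base 10.
K = 5
reference = "ACGTTGCAAGGCTTACCGAT"
donor = reference[:10] + "TATATATC" + reference[10:]
# Simulate error-free overlapping reads of length 12 tiling the donor.
reads = [donor[i:i + 12] for i in range(0, len(donor) - 12 + 1, 4)]

sites = candidate_insertion_sites(reference, donor_kmer_set(reads, K), K)
print(sites)
```

On real data the picture is less clean: sequencing errors, repeats, and heterozygous events (where the reference allele is still present in the reads) all blur the missing-k-mer signal, so this sketch only conveys the intuition.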

Scientific Applications:

  • Genome evolution studies: Provides reconstructed insertion sequences to support analyses of genetic diversity and evolutionary processes.
  • Structural variant discovery in complex genomes: Enables detection and reconstruction of long insertion variants, including those that methods limited by the paired-end insert size cannot recover.

Methodology:

Employs a k-mer-based detection algorithm to locate insertion sites in a reference genome, then assembles each insertion sequence from donor reads. The approach identifies novel, duplicated, homozygous, and heterozygous insertions and handles insertions larger than the paired-end insert size.
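The assembly step can be pictured as a walk through a graph of overlapping donor k-mers, starting at the k-mer just left of the insertion site and ending at the k-mer just right of it. The sketch below is a minimal illustration under that assumption, not the tool's actual algorithm: the function names, the breadth-first search, the `max_len` cutoff, and the example sequences are all hypothetical.

```python
from collections import deque


def read_kmers(reads, k):
    """Set of all k-mers present in the donor reads (hypothetical helper)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def assemble_insertion(reads, k, left_kmer, right_kmer, max_len=1000):
    """Walk from left_kmer to right_kmer through overlapping donor k-mers.

    Each step extends the current path by one base if the resulting k-mer
    occurs in the reads. Returns the bases strictly between the two flanking
    k-mers for the first path found, or None if the right k-mer is not
    reached within max_len bases.
    """
    kmer_set = read_kmers(reads, k)
    queue = deque([left_kmer])   # each entry is the sequence spelled so far
    visited = {left_kmer}        # visit each k-mer once (collapses repeats)
    while queue:
        path = queue.popleft()
        last = path[-k:]
        if last == right_kmer and len(path) >= 2 * k:
            return path[k:len(path) - k]
        if len(path) >= max_len:
            continue
        for base in "ACGT":
            nxt = last[1:] + base
            if nxt in kmer_set and nxt not in visited:
                visited.add(nxt)
                queue.append(path + base)
    return None


# Made-up example: donor = left flank + insertion + right flank.
K = 5
left_flank, right_flank = "ACGTTGCAAG", "GCTTACCGAT"
donor = left_flank + "TCATGACA" + right_flank
# Simulate error-free overlapping reads of length 12 tiling the donor.
reads = [donor[i:i + 12] for i in range(0, len(donor) - 12 + 1, 4)]

insertion = assemble_insertion(reads, K, left_flank[-K:], right_flank[:K])
print(insertion)
```

Visiting each k-mer only once keeps the search small but collapses repeated k-mers inside an insertion, so a repetitive insert would come out shortened; a real assembler has to handle such branches and repeats explicitly.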

Details

Maturity:
Emerging
Cost:
Free of charge
Tool Type:
command-line tool
Operating Systems:
Linux, Windows, Mac
Added:
1/21/2015
Last Updated:
11/24/2024

Operations

Variant calling

Publications

Rizk G, Gouin A, Chikhi R, Lemaitre C. MindTheGap: integrated detection and assembly of short and long insertions. Bioinformatics. 2014;30(24):3451-3457. doi:10.1093/bioinformatics/btu545. PMID:25123898. PMCID:PMC4253827.
