DNABarcodes
DNABarcodes implements error-correcting algorithms to detect and correct insertions, deletions, and substitutions in DNA barcode sequences for accurate sample identification in high-throughput sequencing experiments.
Key Features:
- Error Correction Capabilities: Implements an adaptation of Levenshtein codes to correct nucleotide insertions, deletions, and substitutions and to recover corrupted barcode sequences.
- Adaptation for DNA Context: Adapts classical Levenshtein and Hamming concepts to barcodes embedded in a continuous DNA sequence, where word boundaries are not fixed, by redefining the word length when an insertion or deletion is detected.
- Customizable Barcode Generation: Generates barcode sets with user-definable parameters including sequence filtering, number of correctable mutations, and barcode length.
- Simulation-Validated Performance: In simulations, the adapted codes correct a predefined number of insertions, deletions, and substitutions, recover the new length of a corrupted codeword, and correct more random mutations on average than traditional codes.
- Error Sources Addressed: Targets barcode errors arising during synthesis, primer ligation, DNA amplification, and sequencing.
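The error-correction idea behind the features above can be illustrated with a minimal sketch. It uses the classical Levenshtein distance, not the adapted code that the package implements, and the function names (`levenshtein`, `correctable_errors`, `decode`) and the three toy barcodes are hypothetical, chosen only for illustration. The sketch relies on the standard relation that a code with minimum distance d can correct up to floor((d - 1) / 2) errors.

```python
def levenshtein(a: str, b: str) -> int:
    """Classical Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def correctable_errors(min_distance: int) -> int:
    """Number of errors a code with the given minimum distance can correct."""
    return (min_distance - 1) // 2


def decode(word: str, barcodes: list[str], max_errors: int):
    """Return the closest barcode, or None if it is more than max_errors away."""
    distance, best = min((levenshtein(word, bc), bc) for bc in barcodes)
    return best if distance <= max_errors else None


# Toy barcode set (hypothetical) with pairwise Levenshtein distance of at least 3.
barcodes = ["ACACAC", "GTGTGT", "AGAGAG"]
k = correctable_errors(3)             # one correctable error
print(decode("ACACC", barcodes, k))   # one deleted base -> ACACAC
print(decode("TTTTTT", barcodes, k))  # too far from every barcode -> None
```

The sketch assumes the corrupted word has already been cut out of the read, which is exactly the assumption that stops holding once an indel shifts the barcode boundary.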
Scientific Applications:
- Multiplexed Sequencing: Enables accurate multiplexing by correcting barcode errors when sequencing multiple samples together, applicable to small genomes or fractions of larger genomes.
- Improved Sample Identification: Increases the number of correctly identified samples by correcting barcodes that indels or substitutions would otherwise cause to be misassigned.
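As a rough illustration of sample assignment when the barcode sits at the start of a longer read, the sketch below compares each barcode against every prefix of the read up to a small window, so a barcode shortened or lengthened by an indel can still be matched and trimmed. This is one simple way to model a word length that is not fixed; it is not claimed to be the package's algorithm. The names `best_prefix_match` and `assign_read`, the sample labels, and the reads are all hypothetical.

```python
def best_prefix_match(barcode: str, read: str, max_errors: int):
    """Return (distance, prefix_length) for the read prefix closest to barcode."""
    # Only prefixes up to len(barcode) + max_errors can lie within max_errors edits.
    window = read[: len(barcode) + max_errors]
    prev = list(range(len(window) + 1))  # empty barcode vs. each window prefix
    for i, cb in enumerate(barcode, start=1):
        curr = [i]
        for j, cr in enumerate(window, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cb != cr)))
        prev = curr
    # prev[j] is now the edit distance between the whole barcode and window[:j].
    # Ties are broken in favour of the shortest prefix.
    return min((d, j) for j, d in enumerate(prev))


def assign_read(read: str, barcodes: dict[str, str], max_errors: int):
    """Return (sample, trimmed_read), or (None, read) if no barcode is close enough."""
    scored = [(*best_prefix_match(bc, read, max_errors), sample)
              for sample, bc in barcodes.items()]
    distance, length, sample = min(scored)
    if distance > max_errors:
        return None, read
    return sample, read[length:]


barcodes = {"sample_1": "ACACAC", "sample_2": "GTGTGT"}
# Barcode ACACAC with one deleted base, followed by the insert GGTT.
print(assign_read("ACACCGGTT", barcodes, max_errors=1))  # ('sample_1', 'GGTT')
```

A barcode set designed only for classical Levenshtein distance is not guaranteed to decode uniquely under this kind of prefix comparison, because bases from the adjacent sequence can mask an indel; handling that case is what an adaptation to the DNA context has to address.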
Methodology:
The package implements an adaptation of Levenshtein codes that redefines word lengths dynamically to account for insertions and deletions. Barcode sets are generated from user-defined parameters: sequence filtering, number of correctable mutations, and barcode length. Performance is evaluated in simulations that test correction of a predefined number of indels and substitutions and recovery of the length of corrupted codewords.
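Parameterized barcode set generation can be sketched as a greedy search: enumerate candidates of the requested length, discard those that fail the sequence filters, and keep a candidate only if it is far enough from every barcode already chosen. This is a simplified stand-in under stated assumptions, not the package's generator. It uses classical Levenshtein distance rather than the adapted code, and the function names, the GC-content range, and the homopolymer limit are hypothetical defaults.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Classical Levenshtein distance via a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def passes_filters(seq: str, gc_range=(0.4, 0.6), max_homopolymer=2) -> bool:
    """Sequence filter: GC fraction within range and no long single-base runs."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        return False
    run = 1
    for prev_base, base in zip(seq, seq[1:]):
        run = run + 1 if base == prev_base else 1
        if run > max_homopolymer:
            return False
    return True


def generate_barcodes(length: int, correctable: int) -> list[str]:
    """Greedily build a set that can correct `correctable` errors."""
    min_distance = 2 * correctable + 1  # distance needed to correct that many errors
    chosen: list[str] = []
    for bases in product("ACGT", repeat=length):  # lexicographic enumeration
        candidate = "".join(bases)
        if not passes_filters(candidate):
            continue
        if all(levenshtein(candidate, bc) >= min_distance for bc in chosen):
            chosen.append(candidate)
    return chosen


barcode_set = generate_barcodes(length=5, correctable=1)
print(len(barcode_set), barcode_set[:3])
```

Greedy selection is simple but does not guarantee the largest possible set; exhaustive enumeration also becomes slow as barcode length grows, since the number of candidates is 4 to the power of the length.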
Details
- License: GPL-2.0
- Tool Type: command-line tool, library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R
- Added: 1/17/2017
- Last Updated: 1/10/2019
Publications
Buschmann T, Bystrykh LV. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics. 2013;14(1):272. doi:10.1186/1471-2105-14-272. PMID:24021088. PMCID:PMC3853030.