LoRDEC
LoRDEC performs hybrid error correction of long reads from third-generation sequencing (PacBio single molecule real-time, SMRT) using high-accuracy second-generation short reads. Correcting the numerous insertion and deletion (indel) errors in long reads improves downstream analyses such as read mapping and de novo genome assembly.
Key Features:
- Hybrid error correction: Uses high-accuracy second-generation short reads to correct sequencing errors in long reads. Unlike mapping-based hybrid correctors, it does not align the short reads onto the long reads.
- Succinct de Bruijn graph construction: Builds a succinct, memory-efficient de Bruijn graph from the k-mers of the short reads, which serves as the reference sequence information for correction.
- Graph-based correction via traversal: Traverses the succinct de Bruijn graph to identify corrective sequences for erroneous regions in long reads.
- Targeted indel correction: Specifically addresses the high insertion and deletion error rates characteristic of PacBio SMRT long reads.
- Performance: Reported to achieve accuracy comparable to mapping-based hybrid correctors (which can eliminate up to 99% of errors) while running at least six times faster and requiring at least 93% less memory or disk space than available tools.
- Implementation: Implemented in C++ and tested on Linux platforms.
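To make the graph-construction step concrete, here is a minimal Python sketch under simplifying assumptions: the de Bruijn graph is stored as a plain set of k-mers rather than a succinct structure, a k-mer becomes a node only if it occurs at least `min_count` times in the short reads, and reverse complements are ignored. The names `build_kmer_graph`, `successors`, `solid_positions`, `k`, and `min_count` are hypothetical and are not part of LoRDEC. Long-read k-mers that appear in the graph are called solid; runs of non-solid (weak) k-mers mark likely error regions.

```python
from collections import Counter


def build_kmer_graph(short_reads, k, min_count):
    """Return the nodes of a plain de Bruijn graph: k-mers seen >= min_count times.

    Edges are implicit: k-mer u links to k-mer v when u[1:] == v[:-1].
    Reverse complements are ignored in this simplified sketch.
    """
    counts = Counter(
        read[i:i + k]
        for read in short_reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, c in counts.items() if c >= min_count}


def successors(graph, kmer):
    """List the graph k-mers that extend `kmer` by one base (its outgoing edges)."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


def solid_positions(long_read, graph, k):
    """Start positions i where long_read[i:i+k] is a graph node (a 'solid' k-mer)."""
    return [i for i in range(len(long_read) - k + 1) if long_read[i:i + k] in graph]


if __name__ == "__main__":
    shorts = ["ACGTACGGA", "CGTACGGAT", "ACGTACGGA"]
    graph = build_kmer_graph(shorts, k=4, min_count=2)
    print(sorted(graph))  # ['ACGG', 'ACGT', 'CGGA', 'CGTA', 'GTAC', 'TACG']
    # The long read carries an extra T after "ACGT"; the k-mers covering it are weak.
    print(solid_positions("ACGTTACGGA", graph, k=4))  # [0, 4, 5, 6]
```

The k-mer `GGAT` occurs only once, so the count threshold drops it; in real data this filtering is what removes most short-read sequencing errors from the graph.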
Scientific Applications:
- Long-read error correction: Corrects PacBio SMRT long reads to improve base-level accuracy.
- Read mapping preprocessing: Produces corrected reads that facilitate more accurate read mapping.
- De novo genome assembly preprocessing: Produces corrected reads that improve the quality of de novo genome assemblies.
Methodology:
Constructs a succinct de Bruijn graph from the short reads. For each long read, it then identifies solid k-mers (k-mers present in the graph) and traverses graph paths between the solid k-mers flanking each erroneous region to find a corrective sequence, which replaces that region in the long read.
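The correction step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not LoRDEC's implementation: the graph is a plain Python set rather than a succinct structure, reverse complements and read ends (bases before the first or after the last solid k-mer) are not handled, and the search bound (`slack` extra bases) and the rule of keeping the path with the smallest edit distance to the original segment are assumed choices for this sketch. All function names (`build_kmer_graph`, `edit_distance`, `bridge`, `correct_read`) are hypothetical.

```python
from collections import Counter


def build_kmer_graph(short_reads, k, min_count):
    """Nodes of a plain de Bruijn graph: k-mers seen >= min_count times in short reads."""
    counts = Counter(r[i:i + k] for r in short_reads for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def edit_distance(a, b):
    """Levenshtein distance between a and b, using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def bridge(graph, source, target, region, max_len):
    """Depth-first search for graph paths from k-mer `source` to k-mer `target`.

    Returns the spelled path sequence with the smallest edit distance to `region`
    (the long-read segment from source to target), or None if no path of at most
    max_len bases exists. The length cap keeps the search finite on cyclic graphs.
    """
    k = len(source)
    best = None  # (distance, sequence)
    stack = [source]  # each entry is a sequence spelled starting from `source`
    while stack:
        seq = stack.pop()
        if len(seq) > k and seq.endswith(target):
            d = edit_distance(seq, region)
            if best is None or d < best[0]:
                best = (d, seq)
            continue  # stop extending once the target k-mer is reached
        if len(seq) >= max_len:
            continue
        node = seq[-k:]
        for base in "ACGT":
            if node[1:] + base in graph:
                stack.append(seq + base)
    return None if best is None else best[1]


def correct_read(long_read, graph, k, slack):
    """Replace each weak region flanked by two solid k-mers with its best bridge.

    Regions before the first or after the last solid k-mer are left untouched.
    """
    solid = [i for i in range(len(long_read) - k + 1) if long_read[i:i + k] in graph]
    pieces, last = [], 0  # `last`: first long-read index not yet emitted
    for a, b in zip(solid, solid[1:]):
        if b - a == 1:
            continue  # adjacent solid k-mers: no weak region between them
        region = long_read[a:b + k]
        path = bridge(graph, long_read[a:a + k], long_read[b:b + k],
                      region, len(region) + slack)
        if path is None:
            continue  # no bridge found: keep the original bases
        if a >= last:
            pieces.append(long_read[last:a])
            pieces.append(path)
        else:
            # The start of `path` overlaps bases a previous bridge already emitted.
            pieces.append(path[last - a:])
        last = b + k
    pieces.append(long_read[last:])
    return "".join(pieces)


if __name__ == "__main__":
    graph = build_kmer_graph(["ACGTACGGA", "CGTACGGAT", "ACGTACGGA"], k=4, min_count=2)
    print(correct_read("ACGTTACGGA", graph, k=4, slack=2))  # ACGTACGGA (extra T removed)
    print(correct_read("ACGTCGGA", graph, k=4, slack=2))    # ACGTACGGA (missing A restored)
```

Capping the path length at the region length plus a small slack is one simple way to bound the search on cyclic graphs (the toy graph contains the cycle `ACGT → CGTA → GTAC → TACG → ACGT`); comparing candidate paths by edit distance lets the same search repair both insertions and deletions.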
Details
- Maturity: Mature
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: C++
- Added: 8/3/2017
- Last Updated: 11/25/2024
Operations
Sequencing error detection
Publications
Salmela L, Rivals E. LoRDEC: accurate and efficient long read error correction. Bioinformatics. 2014;30(24):3506-3514. doi:10.1093/bioinformatics/btu538. PMID:25165095. PMCID:PMC4253826.