LoRDEC

LoRDEC performs hybrid error correction of long reads from third-generation sequencing (PacBio single-molecule real-time, SMRT). It uses high-accuracy second-generation short reads to reduce the indel-rich errors in these long reads, which improves downstream analyses such as read mapping and de novo genome assembly.


Key Features:

  • Hybrid error correction: Uses high-accuracy second-generation short reads to correct sequencing errors in long reads.
  • Succinct de Bruijn graph construction: Builds a compact de Bruijn graph from short reads to represent sequence information for correction.
  • Graph-based correction via traversal: Traverses the succinct de Bruijn graph to identify corrective sequences for erroneous regions in long reads.
  • Targeted indel correction: Specifically addresses the high insertion and deletion error rates characteristic of PacBio SMRT long reads.
  • Performance: Reported to reduce errors by up to 99%, run at least six times faster than comparable methods, and require approximately 93% less memory or disk space.
  • Implementation: Implemented in C++ and tested on Linux platforms.
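
The graph construction feature listed above can be illustrated with a minimal Python sketch. It is not LoRDEC's implementation: a plain Python set stands in for the succinct graph representation, reverse complements are ignored, and the names `count_kmers`, `build_graph`, `successors`, and `min_abundance` are hypothetical. The sketch assumes that k-mers seen at least `min_abundance` times in the short reads are kept as graph nodes, and that edges are implied by a (k-1)-base overlap.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the short reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(reads, k, min_abundance=2):
    """Return the k-mers seen at least min_abundance times.

    These k-mers are the nodes of the de Bruijn graph. Edges are not
    stored; they are implied by (k-1)-base overlaps between nodes.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_abundance}


def successors(graph, kmer):
    """List graph nodes that extend kmer by one base."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


# Toy short reads (illustrative data only).
short_reads = ["ACGTACGA", "CGTACGAT", "ACGTACGA"]
graph = build_graph(short_reads, k=4, min_abundance=2)
```

Here `CGAT` occurs only once in the toy reads, so it is filtered out as a likely error, while `successors(graph, "TACG")` shows a branch in the graph.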

Scientific Applications:

  • Long-read error correction: Corrects PacBio SMRT long reads to improve base-level accuracy.
  • Read mapping preprocessing: Produces corrected reads that facilitate more accurate read mapping.
  • De novo genome assembly preprocessing: Produces corrected reads that improve the quality of de novo genome assemblies.

Methodology:

Constructs a succinct de Bruijn graph from the short reads, then traverses the graph to find a corrective sequence for each erroneous region of a long read and applies it, as part of a hybrid error correction strategy.
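
The traversal step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the tool's actual code: the graph is a hand-written set of k-mers, the erroneous region is assumed to start and end with k-mers present in the graph, candidate paths are enumerated by a length-bounded depth-first search, and the candidate closest to the original region by edit distance is chosen. The names `edit_distance`, `bridge_paths`, `correct_region`, and `slack` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def bridge_paths(graph, source, target, max_len):
    """Spell out graph paths from source to target, at most max_len bases."""
    results = []
    stack = [(source, source)]
    while stack:
        kmer, seq = stack.pop()
        if kmer == target and len(seq) > len(source):
            results.append(seq)
            continue
        if len(seq) >= max_len:
            continue  # length bound keeps the search finite on cycles
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt in graph:
                stack.append((nxt, seq + base))
    return results


def correct_region(graph, region, k, slack=3):
    """Replace region by the closest graph path, or keep it if none exists."""
    source, target = region[:k], region[-k:]
    if source not in graph or target not in graph:
        return region
    candidates = bridge_paths(graph, source, target, len(region) + slack)
    if not candidates:
        return region
    return min(candidates, key=lambda seq: edit_distance(seq, region))


# Toy graph: the 4-mers of the error-free sequence ACGTAGGCTTCA.
graph = {"ACGT", "CGTA", "GTAG", "TAGG", "AGGC", "GGCT", "GCTT", "CTTC", "TTCA"}
noisy = "ACGTAGTGCTTCA"  # long-read region with one inserted T
corrected = correct_region(graph, noisy, k=4)
```

In this toy case the graph contains a single path between the flanking k-mers, so the inserted base is removed. With real data many paths may exist, which is why the sketch bounds path length and ranks candidates by edit distance.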

Details

Maturity: Mature
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C++
Added: 8/3/2017
Last Updated: 11/25/2024

Operations

Sequencing error detection

Publications

Salmela L, Rivals E. LoRDEC: accurate and efficient long read error correction. Bioinformatics. 2014;30(24):3506-3514. doi:10.1093/bioinformatics/btu538. PMID:25165095. PMCID:PMC4253826.
