CUDA-EC

CUDA-EC corrects sequencing errors in high-throughput short-read DNA data, producing corrected reads for de novo genome assembly and downstream genomic analyses.


Key Features:

  • Scalable parallel algorithm: Performs error correction across large short-read data sets with a parallel algorithm written for the Compute Unified Device Architecture (CUDA) programming model.
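  • Spectral alignment: Detects and corrects errors by comparing each read against the spectrum of solid k-mers (k-mers that occur at least a threshold number of times in the data set) and modifying reads that contain infrequent, weak k-mers so that their k-mers become solid.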
  • CUDA texture memory utilization: Uses CUDA texture memory, whose accesses are cached, to speed up memory lookups during error correction.
  • Space-efficient Bloom filter: Incorporates a space-efficient Bloom filter data structure for spectrum membership queries.

Scientific Applications:

  • Graph-based short-read assembly support: Provides corrected reads to graph-based short-read assembly tools to improve the accuracy of de novo genome assembly.
  • Illumina sequencing data processing: Applicable to real and simulated Illumina sequencing data across varying read lengths, error rates, and input sizes.

Methodology:

Implements a scalable parallel algorithm based on spectral alignment using the CUDA programming model, with CUDA texture memory and a space-efficient Bloom filter for spectrum membership queries. It was tested on real and simulated Illumina data sets, with reported speedups of 12-84× for the parallelized error correction and, compared with the Euler-SR program, 3-63× for sequential preprocessing plus parallelized error correction.

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: C
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Shi H, Schmidt B, Liu W, Müller-Wittig W. A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware. Journal of Computational Biology. 2010;17(4):603-615. doi:10.1089/cmb.2009.0062. PMID:20426693.