VCAKE

VCAKE is a de novo assembler for short reads. It builds contigs by k-mer extension and uses high-depth coverage to verify the consensus as it extends, which lets it tolerate sequencing errors.


Key Features:

  • Error correction via high-depth coverage: Uses high sequencing depth to identify and correct the errors typical of short-read technologies such as Illumina (Solexa) sequencing (≈30 bp reads).
  • K-mer extension-based consensus verification: Constructs assemblies by extending k-mers and verifying consensus sequences through repeated observations across high-coverage reads.
  • Modification of prior k-mer extension algorithms: Implements a simple modification of earlier k-mer extension approaches (e.g., SSAKE) to improve assembly accuracy in the presence of errors.
  • Validated on simulated and experimental data: Shows improved assembly results, relative to the earlier k-mer extension approach it modifies, on both simulated and experimental datasets that contain sequencing errors.
  • Targeted for small-genome assembly: Optimized for de novo assembly of organisms with small genomes using large numbers of short reads.

Scientific Applications:

  • De novo genome assembly (small genomes): Produces accurate assemblies from short, high-coverage reads for small-genome sequencing projects.
  • Evolutionary biology: Enables generation of reliable genome assemblies for comparative and evolutionary analyses.
  • Microbiology: Supports assembly of microbial genomes from short-read sequencing data with high error rates.
  • Genetics: Provides accurate sequence reconstructions for downstream genetic analyses and variant discovery in small-genome contexts.

Methodology:

Uses a modified k-mer extension algorithm (building on SSAKE) with consensus verification by repeated observation across high-depth reads to identify and correct sequencing errors.
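
Consensus verification by repeated observation can be pictured as a per-base vote among the reads that overlap the end of a growing contig. The Python sketch below is only an illustration of that idea under stated assumptions, not VCAKE's actual Perl/C implementation: the function name extend_seed, its parameters (min_overlap, min_votes, min_ratio, max_len), the toy genome, and the 8 bp reads are all hypothetical and chosen for the example.

```python
from collections import Counter


def extend_seed(seed, reads, min_overlap=4, min_votes=2, min_ratio=0.6, max_len=1000):
    """Extend `seed` to the right one base at a time by majority vote.

    Illustrative sketch only; names and thresholds are assumptions,
    not VCAKE's real interface or defaults.
    """
    contig = seed
    while len(contig) < max_len:
        votes = Counter()
        for read in reads:
            # Find the longest prefix of the read (at least min_overlap bases)
            # that matches the current end of the contig. The read must still
            # have one more base to offer, hence len(read) - 1.
            longest = min(len(read) - 1, len(contig))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    votes[read[k]] += 1  # this read votes for its next base
                    break
        if not votes:
            break  # no read overlaps the contig end
        base, count = votes.most_common(1)[0]
        total = sum(votes.values())
        # Accept the base only if it was observed often enough and clearly
        # dominates the alternatives; otherwise stop extending.
        if count < min_votes or count / total < min_ratio:
            break
        contig += base
    return contig


# Toy data (hypothetical): every 8 bp window of a 25 bp genome at 3x depth,
# plus one read carrying a single substitution error.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
read_len = 8
reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)] * 3
reads.append("CGTACGAT")  # erroneous copy of genome[4:12] ("CGTACGTT")

contig = extend_seed(genome[:read_len], reads)
print(contig == genome)
```

In this toy run the erroneous read casts a single vote for the wrong base at one position and is outvoted by the correct reads covering the same position. An extender that trusted one overlapping read at a time would have no such check, which is why this kind of verification depends on high coverage.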

Details

License:
GPL-3.0
Maturity:
Legacy
Tool Type:
command-line tool
Operating Systems:
Linux, Mac
Programming Languages:
Perl, C
Added:
1/13/2017
Last Updated:
11/25/2024

Publications

Jeck WR, Reinhardt JA, Baltrus DA, Hickenbotham MT, Magrini V, Mardis ER, Dangl JL, Jones CD. Extending assembly of short DNA sequences to handle error. Bioinformatics. 2007;23(21):2942-2944. doi:10.1093/bioinformatics/btm451. PMID:17893086.