SHARCGS

SHARCGS assembles short-read high-throughput sequencing data de novo to produce accurate contigs for genomic analysis.


Key Features:

  • High Throughput Capability: Designed for the large datasets produced by next-generation sequencing technologies, which can reach gigabase scale.
  • Short-Read Assembly: Specifically addresses assembly of short reads (25–40-mers), including outputs from Illumina's 1G sequencer.
  • Accuracy and Speed: Reported to outperform the assembly algorithms available at the time of publication (2007) in both speed and accuracy.
  • Robustness Against Errors: Manages missing reads and incorrect base calls to produce high-quality assemblies under suboptimal conditions.

Scientific Applications:

  • Eukaryotic Genomes: Tested on BAC inserts from Drosophila and Arabidopsis, achieving N50 sizes greater than 20 kbp in simulations that include missing reads and incorrect base calls.
  • Yeast Chromosomes: Demonstrated efficacy in assembling yeast chromosomes.
  • Bacterial Genomes: Applied to bacterial genomes including Haemophilus influenzae and Escherichia coli.

Methodology:

SHARCGS assembles short-read data into contigs. In the published evaluation, it produced 949,974 contigs longer than 50 bp, nearly all of which aligned error-free to the reference sequences. It also assembled 36-mer reads from Helicobacter acinonychis into contigs covering 98% of the genome, with an N50 of 3.7 kbp and minimal discrepancies.
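
The N50 values quoted on this page are not defined here. As a rough illustration, the sketch below computes N50 under the commonly used definition: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The function name n50 and the example lengths are hypothetical, invented for illustration; they are not part of SHARCGS and are not taken from the publication.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of a collection of contig lengths.

    Assumes the common definition: the length L such that contigs of
    length >= L together hold at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, invented for illustration only.
example_lengths = [8000, 5000, 3700, 2500, 1200, 600]
print(n50(example_lengths))  # prints 5000
```

In this made-up example the total is 21,000 bp, and the two longest contigs (8,000 + 5,000 bp) already cover more than half of it, so the N50 is 5,000 bp. A higher N50 generally indicates that more of the assembly sits in long contigs.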

Details

  • License: GPL-3.0
  • Maturity: Mature
  • Cost: Free of charge
  • Tool Type: command-line tool
  • Operating Systems: Linux
  • Programming Languages: Perl
  • Added: 1/13/2017
  • Last Updated: 11/24/2024

Publications

Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Research. 2007;17(11):1697-1706. doi:10.1101/gr.6435207. PMID:17908823. PMCID:PMC2045152.
