SHARCGS
SHARCGS performs de novo assembly of short reads from high-throughput sequencing, producing accurate contigs for genomic analysis.
Key Features:
- High Throughput Capability: Designed for the large datasets produced by next-generation sequencers, which generate gigabase-scale read sets.
- Short-Read Assembly: Designed specifically for short reads (25–40-mers), such as those produced by Illumina's 1G sequencer.
- Accuracy and Speed: Demonstrates superior speed and accuracy relative to existing assembly algorithms.
- Robustness Against Errors: Tolerates missing reads and incorrect base calls, producing high-quality assemblies from imperfect data.
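This entry does not describe SHARCGS's internal algorithm. As a rough illustration of how a short-read assembler can tolerate base-call errors and avoid misjoins, the Python sketch below uses a generic technique, not SHARCGS's documented method: reads seen fewer than `min_count` times are discarded as likely errors, and contigs are grown by greedy exact-overlap extension that stops when the overlapping reads disagree. All names and parameters (`filter_reads`, `extend_right`, `extend_left`, `assemble`, `min_count`, `min_overlap`) are hypothetical, and the sketch assumes single-strand reads with no reverse complements or quality scores.

```python
from collections import Counter

# Hypothetical toy assembler for illustration only; not SHARCGS's implementation.


def filter_reads(reads, min_count=2):
    """Keep distinct reads seen at least min_count times.

    A read carrying a random base-call error is unlikely to be sequenced
    identically more than once, so rare reads are treated as errors.
    """
    counts = Counter(reads)
    return {read for read, n in counts.items() if n >= min_count}


def extend_right(contig, pool, min_overlap):
    """Greedily extend contig to the right using reads from pool.

    Each step uses the longest suffix/prefix overlap (>= min_overlap).
    Extension stops when no read overlaps or when overlapping reads
    disagree on the next bases. Consumed reads are removed from pool
    (the set is modified in place).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    while pool:
        max_ov = min(len(contig), max(len(r) for r in pool) - 1)
        hits, ov = set(), 0
        for ov in range(max_ov, min_overlap - 1, -1):
            suffix = contig[-ov:]
            hits = {r for r in pool if len(r) > ov and r.startswith(suffix)}
            if hits:
                break
        if not hits:
            break  # no overlapping read: end of contig
        tails = {r[ov:] for r in hits}
        if len(tails) != 1:
            break  # conflicting extensions: stop instead of guessing
        contig += tails.pop()
        pool.difference_update(hits)
    return contig


def extend_left(contig, pool, min_overlap):
    """Extend leftwards by running extend_right on reversed sequences."""
    rev_pool = {r[::-1] for r in pool}
    extended = extend_right(contig[::-1], rev_pool, min_overlap)
    # Drop from the forward pool every read consumed on the left side.
    pool.intersection_update({r[::-1] for r in rev_pool})
    return extended[::-1]


def assemble(reads, min_count=2, min_overlap=5):
    """Toy seed-and-extend assembly of same-strand reads into contigs."""
    pool = filter_reads(reads, min_count)
    contigs = []
    while pool:
        seed = min(pool)  # deterministic seed choice for reproducibility
        pool.remove(seed)
        contig = extend_right(seed, pool, min_overlap)
        contig = extend_left(contig, pool, min_overlap)
        contigs.append(contig)
    return contigs


genome = "AACGTTGCATCCGAGT"
# 8-bp reads tiled every 2 bp, each sequenced twice
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 2)] * 2
reads.append("AACGTAGC")  # one read with a base-call error (T -> A)
print(assemble(reads))  # ['AACGTTGCATCCGAGT']
```

Stopping at conflicting extensions trades contig length for accuracy: if two reads sharing the same overlap disagree on the next bases, the sketch ends the contig rather than risk a misjoin, and the single error-bearing read is filtered out before extension begins.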
Scientific Applications:
- Eukaryotic Genomes: Tested on BAC inserts from Drosophila and Arabidopsis, achieving N50 sizes greater than 20 kbp in simulations that model missing reads and incorrect base calls.
- Yeast Chromosomes: Demonstrated efficacy in assembling yeast chromosomes.
- Bacterial Genomes: Applied to bacterial genomes including Haemophilus influenzae and Escherichia coli.
Methodology:
Assembles short-read data into contigs. In the published evaluation, SHARCGS produced 949,974 contigs longer than 50 bp, nearly all of which aligned error-free to the reference sequences. It also assembled 36-mer reads from the Helicobacter acinonychis genome into contigs covering 98% of the genome, with an N50 of 3.7 kbp and minimal discrepancies from the reference.
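The N50 and genome-coverage figures above summarize how contiguous and complete an assembly is. A minimal Python sketch of the conventional N50 definition follows, together with a helper for the fraction of a reference covered by aligned contigs. The function names and the half-open `(start, end)` alignment intervals are hypothetical inputs, and the entry does not state whether SHARCGS's evaluation computed N50 over the total assembly length (as here) or over the reference genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def covered_fraction(intervals, genome_length):
    """Fraction of reference positions covered by at least one contig.

    intervals: hypothetical half-open (start, end) alignment coordinates
    of contigs on the reference; producing them requires an aligner,
    which is outside this sketch. Overlapping intervals are merged so
    no position is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


print(n50([2, 3, 4, 5, 6, 8, 10]))  # 6
print(covered_fraction([(0, 50), (40, 90)], 100))  # 0.9
```

In the N50 example the total is 38 bases; the three longest contigs (10 + 8 + 6 = 24) are the first to reach at least half of that, so the N50 is 6.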
Details
- License: GPL-3.0
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Perl
- Added: 1/13/2017
- Last Updated: 11/24/2024
Publications
Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Research. 2007;17(11):1697-1706. doi:10.1101/gr.6435207. PMID:17908823. PMCID:PMC2045152.