MaSuRCA
MaSuRCA assembles whole genomes by transforming paired-end reads into super-reads and combining de Bruijn graph and Overlap-Layout-Consensus (OLC) methods to integrate Illumina, 454, and Sanger sequencing data.
Key Features:
- Hybrid Assembly Approach: Combines the computational speed of de Bruijn graph methods with the flexibility of Overlap-Layout-Consensus (OLC) strategies.
- Super-Reads Transformation: Transforms large numbers of paired-end reads into a smaller set of longer super-reads to enable integration of short reads with longer reads.
- Versatility in Data Handling: Assembles datasets composed of only short reads or combinations of short and long reads from technologies including Illumina, 454, and Sanger.
- Performance Evaluation: Performed on par with or better than Allpaths-LG and outperformed SOAPdenovo2 in evaluations against high-quality reference sequences of Rhodobacter sphaeroides and mouse chromosome 16.
- Enhanced Assembly with Long Reads: Improves assembly quality by augmenting short-read data with long reads.
Scientific Applications:
- Microbial Genomics: Reconstruction of microbial genomes from mixed read-length datasets.
- Model Organism Genomics: Assembly of chromosomes and genomes such as mouse chromosome 16 using mixed sequencing technologies.
- General Genome Reconstruction Projects: Whole-genome assembly tasks that combine short-read and long-read data from Illumina, 454, and Sanger platforms.
Methodology:
Uses a de Bruijn graph built from the paired-end reads to condense them into a much smaller set of longer super-reads, then assembles the super-reads, together with any longer reads, using an Overlap-Layout-Consensus (OLC) approach.
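The super-read idea can be illustrated with a toy sketch. The cited publication describes extending each read forward and backward, base by base, for as long as the extension is unique. The Python below mimics that on plain strings, assuming error-free reads and a single strand (no reverse complements). The function names (`kmer_set`, `extend`, `super_reads`) and the example sequences are hypothetical and are not part of MaSuRCA, whose actual implementation is considerably more involved.

```python
def kmer_set(reads, k):
    """Collect every k-mer that occurs in the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def extend(seq, kmers, k, forward=True):
    """Extend seq one base at a time while exactly one k-mer continues it."""
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    while True:
        # The (k-1)-mer at the end being extended.
        edge = seq[-(k - 1):] if forward else seq[:k - 1]
        options = [b for b in "ACGT"
                   if (edge + b if forward else b + edge) in kmers]
        if len(options) != 1:
            break  # dead end or branch: the extension is not unique
        base = options[0]
        kmer = edge + base if forward else base + edge
        if kmer in seen:
            break  # cycle guard: this k-mer was already used
        seen.add(kmer)
        seq = seq + base if forward else base + seq
    return seq


def super_reads(reads, k):
    """Extend every read in both directions and drop duplicates."""
    if k < 2:
        raise ValueError("k must be at least 2")
    kmers = kmer_set(reads, k)
    return {extend(extend(r, kmers, k, True), kmers, k, False) for r in reads}


# Three overlapping reads from a made-up 13-base sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(super_reads(reads, 4))  # {'ATGGCGTACCTGA'}
```

In this example all three reads extend to the same string, so they collapse into a single super-read. This is the data reduction that makes the later overlap-based stage tractable for large short-read datasets.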
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Added: 8/3/2017
- Last Updated: 4/22/2021
Publications
Zimin AV, et al. The MaSuRCA genome assembler. Bioinformatics. 2013; 29:2669-77. doi: 10.1093/bioinformatics/btt476
PMID: 23990416