MaSuRCA

MaSuRCA assembles whole genomes by transforming paired-end reads into super-reads and combining de Bruijn graph and Overlap-Layout-Consensus (OLC) methods to integrate Illumina, 454, and Sanger sequencing data.


Key Features:

  • Hybrid Assembly Approach: Combines the computational efficiency of de Bruijn graph methods with the flexibility of Overlap-Layout-Consensus (OLC) strategies.
  • Super-Reads Transformation: Transforms large numbers of paired-end reads into a smaller set of longer super-reads to enable integration of short reads with longer reads.
  • Versatility in Data Handling: Assembles datasets composed of only short reads or combinations of short and long reads from technologies including Illumina, 454, and Sanger.
  • Performance Evaluation: Performed on par with or better than Allpaths-LG and outperformed SOAPdenovo2 when evaluated against high-quality reference sequences of Rhodobacter sphaeroides and mouse chromosome 16.
  • Enhanced Assembly with Long Reads: Improves assembly quality by augmenting short-read data with long reads.

Scientific Applications:

  • Microbial Genomics: Reconstruction of microbial genomes from mixed read-length datasets.
  • Model Organism Genomics: Assembly of chromosomes and genomes such as mouse chromosome 16 using mixed sequencing technologies.
  • General Genome Reconstruction Projects: Whole-genome assembly tasks that require integrating short reads (Illumina) with longer reads (454, Sanger).

Methodology:

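Builds a de Bruijn graph from the k-mers in the paired-end reads and extends each read along the graph, base by base, for as long as the extension is unique. The resulting super-reads, far fewer and longer than the original reads, are then assembled with an Overlap-Layout-Consensus (OLC) approach, which allows them to be combined with longer reads.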
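
The super-read idea can be illustrated with a toy sketch. This is not MaSuRCA's code: the function names (kmer_set, extend_right, extend_left, super_read) are hypothetical, and the sketch assumes error-free reads on a single strand, ignoring reverse complements, sequencing errors, and mate-pair information. Each read is extended one base at a time while exactly one next k-mer exists in the k-mer table.

```python
def kmer_set(reads, k):
    """Collect every k-mer that occurs in any read (toy k-mer table)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def extend_right(seq, kmers, k):
    """Append bases while exactly one next k-mer exists. Assumes k >= 2, len(seq) >= k."""
    # Track visited k-mers so that a cycle in the graph cannot loop forever.
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    while True:
        suffix = seq[-(k - 1):]
        candidates = [b for b in "ACGT" if suffix + b in kmers]
        if len(candidates) != 1 or suffix + candidates[0] in seen:
            return seq  # dead end, branch, or cycle: stop extending
        seen.add(suffix + candidates[0])
        seq += candidates[0]


def extend_left(seq, kmers, k):
    """Mirror image of extend_right: prepend bases while the choice is unique."""
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    while True:
        prefix = seq[:k - 1]
        candidates = [b for b in "ACGT" if b + prefix in kmers]
        if len(candidates) != 1 or candidates[0] + prefix in seen:
            return seq
        seen.add(candidates[0] + prefix)
        seq = candidates[0] + seq


def super_read(read, kmers, k):
    """Extend a read in both directions as far as the extension is unique."""
    return extend_left(extend_right(read, kmers, k), kmers, k)


if __name__ == "__main__":
    reads = ["ACGTTGC", "GTTGCATG", "GCATGGA"]  # made-up overlapping reads
    kmers = kmer_set(reads, 4)
    # Reads that extend to the same sequence collapse into one super-read.
    print({super_read(r, kmers, 4) for r in reads})
```

In this made-up example all three reads extend to the same sequence, so three reads reduce to one super-read. That reduction in read count is what makes a subsequent overlap-based assembly tractable.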

Details

Tool Type: command-line tool
Operating Systems: Linux
Added: 8/3/2017
Last Updated: 4/22/2021

Publications

Zimin AV, et al. The MaSuRCA genome assembler. Bioinformatics. 2013; 29:2669-77. doi: 10.1093/bioinformatics/btt476

PMID: 23990416
