MEGAHIT

MEGAHIT is a de novo assembler for next-generation sequencing (NGS) metagenomic reads. It uses a succinct de Bruijn graph to assemble large and complex metagenomic datasets.


Key Features:

  • De novo NGS metagenomic assembly: Performs de novo assembly of next-generation sequencing (NGS) metagenomic reads.
  • Succinct de Bruijn graph: Uses a succinct de Bruijn graph approach to reduce memory footprint during assembly.
  • Low memory operation: Keeps memory usage low enough to assemble large datasets on a single computing node.
  • No preprocessing required: Assembles large datasets without requiring partitioning or normalization preprocessing.
  • GPU-accelerated and CPU-only modes: Runs with GPU acceleration or on CPU alone; both modes were demonstrated on the same soil dataset.
  • Scalability demonstration: Assembled a 252 gigabase pair (Gbp) soil metagenomics dataset on a single computing node in 44.1 hours with a GPU and in 99.6 hours without one.
  • Improved assembly quality: On the soil dataset, produced an assembly three times larger than those of previous methods, with longer contig N50 and average contig length.
  • Increased read alignment: 55.8% of reads aligned to the assembly of the demonstrated dataset, a fourfold improvement over earlier techniques.
  • Environmental sample suitability: Suited to large metagenomic datasets from complex environments such as soil.
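
The runtime figures above can be turned into throughput and speedup numbers with simple arithmetic. The following is a minimal sketch that uses only the quoted values (252 Gbp, 44.1 h, 99.6 h); the function and variable names are illustrative and are not part of MEGAHIT.

```python
# Figures quoted in the feature list (soil dataset, single computing node).
DATASET_GBP = 252.0
HOURS_WITH_GPU = 44.1
HOURS_WITHOUT_GPU = 99.6


def throughput_gbp_per_hour(dataset_gbp: float, hours: float) -> float:
    """Average gigabase pairs processed per hour of wall-clock time."""
    return dataset_gbp / hours


def speedup(slow_hours: float, fast_hours: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_hours / fast_hours


gpu_rate = throughput_gbp_per_hour(DATASET_GBP, HOURS_WITH_GPU)
cpu_rate = throughput_gbp_per_hour(DATASET_GBP, HOURS_WITHOUT_GPU)
gpu_speedup = speedup(HOURS_WITHOUT_GPU, HOURS_WITH_GPU)

print(f"With GPU:    {gpu_rate:.2f} Gbp/h")     # 5.71 Gbp/h
print(f"Without GPU: {cpu_rate:.2f} Gbp/h")     # 2.53 Gbp/h
print(f"GPU speedup: {gpu_speedup:.2f}x")       # 2.26x
```

These are averages over the whole run for one dataset; they say nothing about how runtime scales on other data or hardware.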

Scientific Applications:

  • Large-scale metagenome assembly: Assembly of large and complex metagenomic datasets, including soil metagenomes.
  • Comparative assembly benchmarking: Comparing assembly size, contig N50, and average contig length against previous methods.
  • Read recruitment enhancement: Increasing the fraction of reads that align to assembled metagenomes (e.g., 55.8% in the demonstrated dataset).
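
The benchmarking metrics named above have simple definitions. The sketch below computes contig N50, average contig length, and the fraction of aligned reads from plain Python values. It assumes the contig lengths and read counts have already been extracted from an assembly and an alignment; the function names and the example numbers are hypothetical and only illustrate the definitions.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all assembled bases."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def average_length(contig_lengths: Sequence[int]) -> float:
    """Mean contig length, or 0.0 for an empty assembly."""
    if not contig_lengths:
        return 0.0
    return sum(contig_lengths) / len(contig_lengths)


def aligned_fraction(aligned_reads: int, total_reads: int) -> float:
    """Fraction of reads that align back to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return aligned_reads / total_reads


# Hypothetical toy values, not taken from any real assembly.
lengths = [100, 200, 300, 400, 500]
print(n50(lengths))                  # 400
print(average_length(lengths))       # 300.0
print(aligned_fraction(558, 1000))   # 0.558
```

In the toy example the total is 1,500 bases; the two longest contigs (500 + 400) already cover at least half of that, so N50 is 400.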

Methodology:

Performs de novo assembly of NGS metagenomic reads using a succinct de Bruijn graph approach and can run with GPU acceleration or in CPU-only mode.
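
To illustrate the general idea of de Bruijn graph assembly, here is a minimal sketch that builds a graph from overlapping reads and walks a non-branching path to recover a contig. It stores the graph in an ordinary Python dictionary; it is not the succinct representation MEGAHIT uses, it ignores sequencing errors, reverse complements, and coverage, and the function names and reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unique_path(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig from `start` while the current node has exactly one unvisited successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads drawn from the sequence ATGGCGTACC.
reads = ["ATGGCG", "GGCGTA", "CGTACC"]
graph = build_de_bruijn(reads, k=4)
print(walk_unique_path(graph, "ATG"))  # ATGGCGTACC
```

A dictionary of strings like this grows quickly with the number of distinct k-mers, which is the memory cost a succinct graph representation is meant to reduce.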

Details

Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: C++
Added: 8/3/2017
Last Updated: 11/24/2024

Publications

Li D, Liu C, Luo R, Sadakane K, Lam T. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674-1676. doi:10.1093/bioinformatics/btv033. PMID:25609793.
