Mix

Mix combines two or more draft genome assemblies, without a reference genome, into a single assembly with less contig fragmentation and a greater cumulative contig length.


Key Features:

  • Integration of Multiple Draft Assemblies: Combines two or more draft assemblies into a single output without relying on a reference genome.
  • Reduction of Contig Fragmentation: Extends contigs along alignments between their extremities, so that fragments from different draft assemblies are joined into longer contigs.
  • Extension Graph Algorithm: Represents contig extremities as vertices and alignments between extremities as edges to guide contig extension.
  • Maximization of Cumulative Contig Length: Computes a set of paths in the extension graph that maximizes cumulative contig length to produce a more contiguous assembly.

Scientific Applications:

  • GAGE-B evaluation: Evaluated on bacterial next-generation sequencing (NGS) data from the GAGE-B study.
  • Mycoplasma genome assembly: Applied to newly sequenced Mycoplasma genomes, for which the authors report a significant improvement in overall assembly quality.
  • De novo genome projects: Reported to give better overall results even when the choice of assemblies is guided solely by standard assembly statistics, as is the case in de novo projects where no reference is available.
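
Standard assembly statistics can be computed from contig lengths alone. The sketch below is illustrative only: the function name assembly_stats is hypothetical, and contig count, cumulative length, longest contig, and N50 are assumed here as typical statistics, since this entry does not specify which ones Mix reports.

```python
def assembly_stats(contig_lengths):
    """Summarize an assembly from its contig lengths (illustrative helper, not part of Mix)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum, taken from the
    # longest contig downward, first reaches half of the cumulative length.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_length": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }


if __name__ == "__main__":
    print(assembly_stats([400, 300, 200, 100]))
```

Comparing these values before and after merging gives a quick, reference-free indication of whether fragmentation went down.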

Methodology:

Mix constructs an extension graph in which vertices represent contig extremities and edges represent alignments between those extremities. It then selects a set of paths through this graph that maximizes cumulative contig length; each path corresponds to an extended contig in the output assembly, which reduces fragmentation.
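
The idea can be illustrated with a small sketch. This is not Mix's implementation: it assumes alignments are already given as tuples (contig, end, contig, end, overlap), uses hypothetical function names, extends each seed in one direction only, and replaces the length-maximizing path selection with a simple greedy heuristic.

```python
from collections import defaultdict


def build_extension_graph(alignments):
    """Vertices are contig extremities (name, end); edges are alignments between them."""
    graph = defaultdict(list)
    for a, a_end, b, b_end, overlap in alignments:
        graph[(a, a_end)].append(((b, b_end), overlap))
        graph[(b, b_end)].append(((a, a_end), overlap))
    return graph


def other_end(end):
    return "tail" if end == "head" else "head"


def greedy_paths(lengths, alignments):
    """Greedy stand-in for path selection: seed with the longest unused contig,
    then repeatedly follow the alignment edge that adds the most new sequence."""
    graph = build_extension_graph(alignments)
    used = set()
    paths = []
    for seed in sorted(lengths, key=lengths.get, reverse=True):
        if seed in used:
            continue
        used.add(seed)
        path, total = [seed], lengths[seed]
        current, exit_end = seed, "tail"  # simplification: extend from the tail only
        while True:
            # Gain of a candidate is its length minus the overlap shared with the current contig.
            candidates = [
                (lengths[name] - overlap, name, end)
                for (name, end), overlap in graph[(current, exit_end)]
                if name not in used
            ]
            if not candidates:
                break
            gain, nxt, entry_end = max(candidates)
            used.add(nxt)
            path.append(nxt)
            total += gain
            # Leave the next contig through the extremity opposite to the one entered.
            current, exit_end = nxt, other_end(entry_end)
        paths.append((path, total))
    return paths


if __name__ == "__main__":
    lengths = {"A": 1000, "B": 800, "C": 500}
    alignments = [("A", "tail", "B", "head", 100), ("B", "tail", "C", "head", 50)]
    print(greedy_paths(lengths, alignments))
```

In this toy input the three contigs chain into a single path, so the cumulative length is the sum of their lengths minus the two overlaps. A greedy choice can miss the best set of paths, which is why the sketch should be read as an illustration of the graph model rather than of the optimization itself.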

Details

License: MIT
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Python
Added: 10/29/2015
Last Updated: 11/25/2024

Publications

Soueidan H, Maurier F, Groppi A, Sirand-Pugnet P, Tardy F, Citti C, Dupuy V, Nikolski M. Finishing bacterial genome assemblies with Mix. BMC Bioinformatics. 2013;14(Suppl 15):S16. doi:10.1186/1471-2105-14-S15-S16. PMID:24564706. PMCID:PMC3851838.