GAML

GAML assembles genomes by maximizing the likelihood of candidate assemblies using a probabilistic model that accounts for sequencing error rates and insert lengths across Illumina, 454, and PacBio datasets.


Key Features:

  • Probabilistic modeling: Evaluates candidate assemblies with a model that captures the sequencing error rates and insert lengths specific to each sequencing technology.
  • Likelihood maximization: Searches for assembly configurations that maximize the likelihood under the probabilistic model.
  • Integration of diverse sequencing data: Combines Illumina and 454 reads of various insert sizes with PacBio reads in a single assembly framework.
  • Repeat resolution and scaffolding: Uses the likelihood-based evaluation to resolve repeats and to scaffold shorter contigs.
  • Comparative assembly performance: Reported N50 sizes and error rates are comparable to those of established assemblers such as ALLPATHS-LG and Cerulean.
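
The idea of scoring candidate assemblies by likelihood can be illustrated with a minimal sketch. It assumes a toy model that is not taken from GAML itself: reads contain substitution errors only, every start position in the assembly is equally likely, and reverse complements are ignored. The function names and the example sequences are hypothetical.

```python
import math

def read_log_prob(read, assembly, error_rate):
    """Log-probability of one read under a toy substitution-only model.

    Assumption: the read starts at a uniformly random position and each
    base is miscalled independently with probability error_rate.
    """
    n, m = len(assembly), len(read)
    if m > n:
        return float("-inf")  # the read cannot come from this assembly
    total = 0.0
    for start in range(n - m + 1):
        window = assembly[start:start + m]
        mismatches = sum(a != b for a, b in zip(read, window))
        total += (error_rate ** mismatches) * ((1 - error_rate) ** (m - mismatches))
    # Average over all start positions (uniform prior on the position).
    return math.log(total / (n - m + 1)) if total > 0 else float("-inf")

def assembly_log_likelihood(reads, assembly, error_rate):
    """Sum of per-read log-probabilities, treating reads as independent."""
    return sum(read_log_prob(r, assembly, error_rate) for r in reads)

# Hypothetical data: three reads and two candidate assemblies.
reads = ["ACGTA", "GTACG", "CGTTT"]
candidates = ["ACGTACGTTT", "ACGTACGAAA"]
best = max(candidates, key=lambda a: assembly_log_likelihood(reads, a, 0.01))
```

Here `best` is the candidate that explains the reads with the fewest implied sequencing errors. A real model would also need to handle insertions, deletions, both strands, and paired reads.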

Scientific Applications:

  • Multi-platform genome assembly: Combines data from multiple sequencing platforms (Illumina, 454, PacBio) to produce a single genome assembly from heterogeneous datasets.
  • Complex genome reconstruction: Supports projects requiring repeat resolution and contig scaffolding to improve assembly continuity and accuracy.

Methodology:

GAML searches for the assembly configuration that maximizes likelihood under a probabilistic model that explicitly accounts for dataset-specific error profiles and insert lengths.
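
How dataset-specific insert lengths can enter such a search is sketched below for one narrow case: choosing the gap between two contigs. The sketch assumes that each library's insert length follows a normal distribution with its own mean and standard deviation; this distribution, the library names, and all numbers are illustrative assumptions rather than GAML's actual implementation.

```python
import math

def normal_log_pdf(x, mean, std):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def gap_log_likelihood(gap, pairs, libraries):
    """Log-likelihood of a gap size given read pairs that span the gap.

    pairs: list of (library_name, span_without_gap), where span_without_gap
    is the implied insert length if the two contigs were adjacent.
    libraries: maps library_name to (mean, std) of its insert length.
    """
    total = 0.0
    for library, span in pairs:
        mean, std = libraries[library]
        total += normal_log_pdf(span + gap, mean, std)
    return total

def best_gap(pairs, libraries, candidate_gaps):
    """Exhaustive search for the candidate gap with the highest likelihood."""
    return max(candidate_gaps, key=lambda g: gap_log_likelihood(g, pairs, libraries))

# Hypothetical libraries: a short-insert and a long-insert dataset.
libraries = {"short": (300, 30), "long": (3000, 300)}
pairs = [("short", 200), ("short", 210), ("long", 2900)]
gap = best_gap(pairs, libraries, range(0, 501, 5))
```

Because each pair is weighted by its own library's variance, the tightly distributed short-insert pairs influence the chosen gap far more than the long-insert pair does.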

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: C++
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Boža V, Brejová B, Vinař T. GAML: genome assembly by maximum likelihood. Algorithms for Molecular Biology. 2015;10(1). doi:10.1186/s13015-015-0052-6. PMID:26042154. PMCID:PMC4454275.

Funding: VEGA 1/0719/14, 1/1085/12