Flye

Flye assembles genomes de novo from single-molecule long reads (PacBio and Oxford Nanopore Technologies) to produce polished contigs for genomic analyses.


Key Features:

  • Long-Read Assembly: Assembles PacBio and Oxford Nanopore Technologies (ONT) long reads to resolve repetitive regions and improve contiguity.
  • Repeat Graph Construction: Generates disjointigs from error-prone long reads and constructs a repeat graph from them to resolve repeats and increase assembly quality; the original evaluation reports nearly double the NGA50 of a human genome assembly compared with existing assemblers (Kolmogorov et al. 2019).
  • Hybrid Assembly Workflows: Long-read data can be combined with Illumina short reads in hybrid assembly workflows to enhance accuracy and completeness, as evaluated on bacterial genomes including Enterobacteriaceae (De Maio et al. 2019).
  • Polishing and Output: Processes raw PacBio/ONT reads and outputs polished contigs as the final assembly product.
  • Automation and Efficiency: Provides an automated pipeline for de novo assembly of raw long reads into contigs.
  • Metagenomic Assembly (metaFlye): Extends Flye to long-read metagenomic datasets, addressing uneven bacterial composition and intra-species heterogeneity to reconstruct complete or near-complete genomes.
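
The contiguity metric mentioned above (NGA50) can be illustrated with a minimal Python sketch. It is not part of Flye: the function name `ngx` and its parameters are hypothetical, and the sketch assumes the usual definition in which NG50 measures block lengths against a reference genome length and NGA50 applies the same rule to aligned blocks (contigs broken at misassembly breakpoints).

```python
from typing import Iterable, Optional


def ngx(lengths: Iterable[int],
        reference_length: Optional[int] = None,
        x: float = 50.0) -> int:
    """Return the length L such that blocks of length >= L cover x% of the target.

    Hypothetical helper for illustration only.
    - reference_length=None: the target is the total block length (N50-style).
    - reference_length given: the target is the reference length (NG50-style).
    NGA50 uses the same rule, applied to aligned blocks instead of raw contigs.
    """
    ordered = sorted(lengths, reverse=True)
    total = reference_length if reference_length is not None else sum(ordered)
    target = total * x / 100.0

    covered = 0
    for length in ordered:
        covered += length
        if covered >= target:
            return length
    return 0  # the blocks never reach the requested fraction of the target


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]
    print(ngx(contigs))                        # N50-style
    print(ngx(contigs, reference_length=400))  # NG50-style
```

Because the target is fixed by the reference length, NG50 and NGA50 values stay comparable across assemblies of the same genome, which a plain N50 does not guarantee.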

Scientific Applications:

  • Bacterial Genome Assembly: Reconstructs high-quality bacterial genomes for studies of complex genetic structures and antimicrobial resistance, including applications to Enterobacteriaceae.
  • Human and Microbiome Genomics: Improves contiguity in human genome assemblies and enables resolution of full-length biosynthetic gene clusters in microbiome studies.
  • Metagenomic Studies: Assembles complex microbial communities from long-read metagenomic datasets using metaFlye to recover complete or nearly complete bacterial genomes.

Methodology:

Flye first generates disjointigs from error-prone long reads, then constructs a repeat graph from them and uses it to resolve repeats, and finally outputs polished contigs. Long-read data can also be combined with Illumina short reads in hybrid assembly workflows. metaFlye adapts the approach to the uneven composition and intra-species heterogeneity of metagenomic datasets.
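
As a rough illustration of how the pipeline above is driven, the Python sketch below assembles a `flye` command line without running it. The helper `build_flye_command` and the file names are hypothetical; the options used (`--pacbio-raw`, `--pacbio-hifi`, `--nano-raw`, `--nano-hq`, `--out-dir`, `--threads`, `--meta`) are taken from the Flye command-line interface, but the set available in a given installation should be checked with `flye --help`.

```python
from typing import List, Sequence

# Read-type options of the flye command line covered by this sketch.
READ_TYPE_FLAGS = {
    "pacbio-raw": "--pacbio-raw",
    "pacbio-hifi": "--pacbio-hifi",
    "nano-raw": "--nano-raw",
    "nano-hq": "--nano-hq",
}


def build_flye_command(reads: Sequence[str],
                       out_dir: str,
                       read_type: str = "nano-raw",
                       threads: int = 4,
                       meta: bool = False) -> List[str]:
    """Build (but do not execute) a flye invocation as an argument list.

    Hypothetical helper for illustration; it only arranges known options.
    """
    if read_type not in READ_TYPE_FLAGS:
        raise ValueError(f"unsupported read type: {read_type}")
    if not reads:
        raise ValueError("at least one read file is required")

    cmd = ["flye", READ_TYPE_FLAGS[read_type], *reads,
           "--out-dir", out_dir,
           "--threads", str(threads)]
    if meta:
        # metaFlye mode, intended for metagenomes with uneven coverage.
        cmd.append("--meta")
    return cmd


if __name__ == "__main__":
    # File and directory names here are placeholders.
    print(" ".join(build_flye_command(["reads.fastq"], "flye_out", meta=True)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the assembly. The polished contigs are then expected in `assembly.fasta` inside the output directory.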

Details

License: BSD-3-Clause
Cost: Free of charge
Tool Type: command-line tool, workflow
Operating Systems: Mac, Linux
Programming Languages: C++, Python, C
Added: 11/14/2019
Last Updated: 6/18/2025

Publications

Kolmogorov M, Bickhart DM, Behsaz B, Gurevich A, Rayko M, Shin SB, Kuhn K, Yuan J, Polevikov E, Smith TPL, Pevzner PA. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nature Methods. 2020;17(11):1103-1110. doi:10.1038/s41592-020-00971-x. PMID:33020656. PMCID:PMC10699202.

Funding: National Science Foundation (1715911)

De Maio N, Shaw LP, Hubbard A, George S, Sanderson ND, Swann J, Wick R, AbuOun M, Stubberfield E, Hoosdally SJ, Crook DW, Peto TEA, Sheppard AE, Bailey MJ, Read DS, Anjum MF, Walker AS, Stoesser N. Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microbial Genomics. 2019;5(9). doi:10.1099/mgen.0.000294. PMID:31483244. PMCID:PMC6807382.

Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology. 2019;37(5):540-546. doi:10.1038/s41587-019-0072-8. PMID:30936562.