CANU

Canu performs de novo genome assembly from noisy single-molecule long reads, such as those generated by Pacific Biosciences (PacBio) and Oxford Nanopore instruments, and produces high-quality reference assemblies.

Key Features:

Support for Long-Read Sequencing: Optimized for long-read data from Pacific Biosciences (PacBio) and Oxford Nanopore for improved reconstruction of reference genomes.
Error Rate Management: Implements algorithmic strategies to handle high error rates in single-molecule long reads while resolving large repeats and closely related haplotypes.
Adaptive Overlapping Strategy: Uses a tf-idf weighted MinHash adaptive overlapping strategy to improve overlap detection accuracy and assembly continuity.
Sparse Assembly Graph Construction: Constructs sparse assembly graphs to prevent collapse of diverged repeats and haplotypes for more accurate assemblies.
Reduced Depth-of-Coverage Requirements: Lowers depth-of-coverage requirements by approximately half compared to Celera Assembler 8.2.
Improved Runtime Efficiency: Reduces runtime by about an order of magnitude on large genomes relative to Celera Assembler 8.2.
High Assembly Continuity and Quality: Produces complete microbial genomes and near-complete eukaryotic chromosomes, reaching a contig NG50 above 21 Mbp on both human and Drosophila melanogaster PacBio datasets.
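
The NG50 figure quoted above can be made concrete with a short calculation. The following is a generic sketch of the metric, not code taken from Canu; the function name ng50 and the example contig lengths are made up for illustration.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the
    (estimated) genome size. Returns 0 if the contigs never reach
    that threshold.
    """
    half_genome = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half_genome:
            return length
    return 0


# Hypothetical toy assembly: five contigs, genome size of 200 bases.
example_lengths = [50, 40, 30, 20, 10]
print(ng50(example_lengths, 200))  # prints 30 (50 + 40 + 30 = 120 >= 100)
```

Unlike N50, which is computed against the total assembly length, NG50 uses the genome size as the denominator, so it does not improve when an assembly is simply incomplete.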

Scientific Applications:

Complex genome assembly: Reconstruction of microbial and eukaryotic genomes from long-read sequencing data.
Reference-quality genome generation: Automated production of reference-quality genome assemblies for downstream genomic analyses.
Graph-based integration: Outputs assembly graphs in GFA format for integration with phasing and scaffolding techniques.
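
As a rough illustration of how a downstream tool might consume a GFA assembly graph, the sketch below reads segment (S) and link (L) records from GFA 1 text. It is a minimal example under stated assumptions: the function name parse_gfa and the sample records are hypothetical, only the S and L record types are handled, and nothing here is specific to the files Canu writes.

```python
def parse_gfa(lines):
    """Parse GFA 1 segment and link records from an iterable of lines.

    Returns (segments, links) where segments maps a segment name to its
    length (or None if unknown) and links is a list of
    (from_name, from_orient, to_name, to_orient, overlap) tuples.
    """
    segments = {}
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "S" and len(fields) >= 3:
            name, sequence = fields[1], fields[2]
            # "*" means the sequence is not stored in the file.
            length = None if sequence == "*" else len(sequence)
            # An optional LN:i:<n> tag gives the length explicitly.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[len("LN:i:"):])
            segments[name] = length
        elif record_type == "L" and len(fields) >= 6:
            links.append(tuple(fields[1:6]))
        # Other record types (H, P, ...) are ignored in this sketch.
    return segments, links


# Hypothetical sample records, not real assembler output.
sample = [
    "H\tVN:Z:1.0",
    "S\ttig1\tACGT",
    "S\ttig2\t*\tLN:i:1500",
    "L\ttig1\t+\ttig2\t-\t0M",
]
segments, links = parse_gfa(sample)
print(segments)
print(links)
```

For a real file, the same function can be given an open file handle, since it only iterates over lines.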

Methodology:

Canu combines two main algorithmic ideas. An adaptive overlapping strategy based on tf-idf weighted MinHash down-weights uninformative, repetitive k-mers when detecting overlaps between high-error reads. Sparse assembly graph construction then avoids collapsing diverged repeats and haplotypes.
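
To give a feel for the overlapping idea, the sketch below estimates read similarity with MinHash while giving rarer k-mers more influence through an idf-style weight. This is a simplified teaching example, not Canu's implementation: the weighting formula, the trick of hashing each k-mer several times according to its integer weight, and all function names and toy reads are assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def idf_weights(reads, k):
    """Integer weight per k-mer: k-mers seen in fewer reads weigh more.

    Assumed formula: 1 + round(ln(N / df)), so every k-mer counts at
    least once and k-mers present in every read get weight 1.
    """
    doc_freq = Counter()
    for read in reads:
        doc_freq.update(kmers(read, k))
    n = len(reads)
    return {km: 1 + round(math.log(n / df)) for km, df in doc_freq.items()}


def _hash(kmer, seed, copy):
    """Deterministic 64-bit hash of (seed, copy, kmer)."""
    data = f"{seed}:{copy}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(seq, k, weights, num_hashes=64):
    """Weighted MinHash sketch: a k-mer of weight w gets w hash draws."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [
        min(_hash(km, seed, c) for km in ks for c in range(weights.get(km, 1)))
        for seed in range(num_hashes)
    ]


def similarity(sketch_a, sketch_b):
    """Fraction of matching minima; estimates weighted Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


# Toy reads (made up): the first two overlap, the third is unrelated.
reads = [
    "ACGTTGCAAGCTTAGGCTAACGT",
    "AGCTTAGGCTAACGTTTGACCA",
    "GGGGGGCCCCCCGGGGGGCC",
]
w = idf_weights(reads, k=5)
sketches = [sketch(r, 5, w) for r in reads]
print(similarity(sketches[0], sketches[1]))
print(similarity(sketches[0], sketches[2]))
```

Pairs of reads whose sketches share many minima would be candidates for a more careful alignment; pairs that share none can be skipped cheaply.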

Topics

Genomics

Details

Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: Shell, Perl
Added: 11/27/2017
Last Updated: 11/24/2024

Operations

De-novo assembly

Publications

Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. 2017;27(5):722-736. doi:10.1101/gr.215087.116. PMID:28298431. PMCID:PMC5411767.

Funding:

National Institutes of Health: HSHQDC-07-C-00020
National Science Foundation: NSF IOS-1237993

Documentation

General

https://github.com/marbl/canu/blob/master/README.md
