CANU
CANU performs de novo assembly of genomes from noisy single-molecule long reads generated by Pacific Biosciences (PacBio) and Oxford Nanopore sequencers, producing high-quality reference assemblies.
Key Features:
- Support for Long-Read Sequencing: Optimized for long reads from Pacific Biosciences (PacBio) and Oxford Nanopore platforms to improve reconstruction of reference genomes.
- Error Rate Management: Implements algorithmic strategies to handle high error rates in single-molecule long reads while resolving large repeats and closely related haplotypes.
- Adaptive Overlapping Strategy: Uses an adaptive overlapping strategy based on tf-idf weighted MinHash to improve overlap detection accuracy and assembly continuity.
- Sparse Assembly Graph Construction: Constructs sparse assembly graphs to prevent collapse of diverged repeats and haplotypes for more accurate assemblies.
- Reduced Depth-of-Coverage Requirements: Requires roughly half the depth of coverage needed by Celera Assembler 8.2.
- Improved Runtime Efficiency: Runs substantially faster than earlier versions, by an order of magnitude on large genomes.
- High Assembly Continuity and Quality: Can produce complete microbial genomes and near-complete eukaryotic chromosomes, with contig NG50 greater than 21 Mbp on human and Drosophila melanogaster PacBio datasets.
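NG50, the continuity metric cited above, is the contig length L such that contigs of length L or longer together cover at least half of the estimated genome size (unlike N50, which uses the total assembly size). The sketch below is a minimal illustration, assuming contig lengths and a genome size estimate are already available; the function name `ng50` is hypothetical and not part of CANU.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if the contigs
    cover less than half of the estimated genome size."""
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        # Stop at the first contig that brings coverage to >= 50% of the genome.
        if covered * 2 >= genome_size:
            return length
    return None


print(ng50([50, 30, 20, 10], 200))  # 20
```

Because the denominator is the genome size rather than the assembly size, NG50 cannot be inflated by an assembly that is larger than the genome it represents.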
Scientific Applications:
- Complex genome assembly: Reconstruction of microbial and eukaryotic genomes from long-read sequencing data.
- Reference-quality genome generation: Automated production of reference-quality genome assemblies for downstream genomic analyses.
- Graph-based integration: Outputs assembly graphs in GFA format for integration with phasing and scaffolding techniques.
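As a rough illustration of consuming GFA output downstream, the sketch below reads GFA 1 segment (`S`) and link (`L`) records into Python structures. It is a minimal parser, assuming tab-separated GFA 1 text; it ignores headers, optional tags, and all other record types, and the function names are hypothetical rather than part of CANU or any scaffolding tool.

```python
from collections import defaultdict


def parse_gfa(text):
    """Parse GFA 1 segment (S) and link (L) records from a string."""
    segments = {}
    links = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # S <name> <sequence, or '*' if not stored> [optional tags]
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap CIGAR>
            links.append(tuple(fields[1:6]))
    return segments, links


def adjacency(links):
    """Map each oriented segment to the oriented segments it links to.
    The implied reverse-complement edge of each link is not added here."""
    graph = defaultdict(list)
    for src, src_orient, dst, dst_orient, _overlap in links:
        graph[(src, src_orient)].append((dst, dst_orient))
    return dict(graph)


gfa = "H\tVN:Z:1.0\nS\ttig1\tACGT\nS\ttig2\t*\tLN:i:500\nL\ttig1\t+\ttig2\t-\t0M\n"
segments, links = parse_gfa(gfa)
print(adjacency(links))  # {('tig1', '+'): [('tig2', '-')]}
```

A real phasing or scaffolding pipeline would use a dedicated GFA library and would also handle tags such as segment lengths; this sketch only shows how the graph structure is laid out in the file.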
Methodology:
Combines new overlapping and assembly algorithms: an adaptive overlapping strategy based on tf-idf weighted MinHash, and sparse assembly graph construction. Together these handle the high error rates of single-molecule reads and avoid collapsing diverged repeats and haplotypes.
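The idea behind tf-idf weighted MinHash can be sketched in a few lines: each k-mer in a read gets a weight (term frequency in the read times inverse "document" frequency across reads), so k-mers shared by many reads, such as repeats, are less likely to be sampled into a read's sketch. This is not CANU's code; the smoothed idf formula, the exponential-race weighted sampling, k = 8, the sketch size of 64, and all function names are illustrative assumptions.

```python
import hashlib
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq (repeats included)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unit_hash(kmer, seed):
    """Deterministic pseudo-random value strictly inside (0, 1) for (kmer, seed)."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    top52 = int.from_bytes(digest, "big") >> 12  # 52 bits, so (x + 0.5) is exact in a double
    return (top52 + 0.5) / 2**52


def idf_table(reads, k):
    """Smoothed inverse document frequency: k-mers present in many reads get low weight."""
    df = Counter()
    for read in reads:
        df.update(set(kmers(read, k)))
    n = len(reads)
    return {km: math.log((1 + n) / (1 + c)) + 1.0 for km, c in df.items()}


def weighted_sketch(read, k, idf, num_hashes=64):
    """Weighted MinHash sketch: one sampled k-mer per hash seed."""
    tf = Counter(kmers(read, k))
    if not tf:
        raise ValueError("read is shorter than k")
    sketch = []
    for seed in range(num_hashes):
        # Exponential race: -ln(u)/w is an Exp(w) variable, so the minimum lands
        # on each k-mer with probability proportional to its tf-idf weight w.
        def score(km):
            return -math.log(unit_hash(km, seed)) / (tf[km] * idf.get(km, 1.0))
        sketch.append(min(tf, key=score))
    return sketch


def sketch_similarity(a, b):
    """Fraction of hash positions where both sketches sampled the same k-mer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


base = "ACGTTGCAAGGCTTACCGATGACTGGTACCATGCA"
left, right = base[:25], base[5:30]  # two overlapping "reads"
idf = idf_table([left, right, "G" * 30], 8)
print(sketch_similarity(weighted_sketch(left, 8, idf), weighted_sketch(right, 8, idf)))
```

Reads that overlap share k-mers and therefore agree at more sketch positions than unrelated reads, which is what lets a sketch comparison flag candidate overlaps cheaply before any alignment is computed.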
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: Shell, Perl
- Added: 11/27/2017
- Last Updated: 11/24/2024
Publications
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. 2017;27(5):722-736. doi:10.1101/gr.215087.116. PMID:28298431. PMCID:PMC5411767.