HiCanu
HiCanu is a modification of the Canu assembler that assembles PacBio HiFi (high-fidelity) long reads, which exceed 10 kbp in length with per-base accuracy above 99.9%, into accurate, contiguous genome assemblies that resolve complex regions such as segmental duplications, satellites, and allelic variants.
Key Features:
- Homopolymer Compression: HiCanu collapses each run of identical bases to a single base (for example, AAAA becomes A), which masks errors in homopolymer run length and improves assembly accuracy and computational efficiency.
- Overlap-Based Error Correction: HiCanu uses overlap-based error correction to reduce sequencing errors while maintaining assembly contiguity.
- Aggressive False Overlap Filtering: HiCanu applies aggressive false-overlap filtering to minimize incorrect read overlaps and protect assembly integrity in complex regions.
- Hierarchical Assembly Pipeline: HiCanu operates a hierarchical pipeline that includes detecting overlaps in high-noise sequences using MHAP.
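As a rough illustration of the homopolymer compression listed above, the Python sketch below collapses each run of identical bases to one base. It is a toy example, not HiCanu's implementation (HiCanu itself is written in C++, C, and Perl); the function names `homopolymer_compress` and `run_lengths` and the two example reads are hypothetical.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical bases to a single base."""
    return "".join(base for base, _run in groupby(seq))


def run_lengths(seq: str) -> list[tuple[str, int]]:
    """Return (base, run length) pairs for each homopolymer run."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


# Two made-up reads that differ only in homopolymer run lengths.
read_a = "ACGTTTTAAC"
read_b = "ACGTTTAAAC"

# After compression the reads are identical, so a run-length
# discrepancy no longer looks like a mismatch between them.
assert homopolymer_compress(read_a) == homopolymer_compress(read_b) == "ACGTAC"
```

Keeping the run lengths, as in `run_lengths`, is one way the original sequence could be restored after working in compressed space.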
Scientific Applications:
- Assembly of complex genomic regions: HiCanu reconstructs segmental duplications, satellite DNAs, and allelic variants in genomes with high complexity and heterogeneity.
- Diploid human genome benchmarking: At 30× PacBio HiFi coverage, HiCanu achieves high accuracy and allele recovery in diploid human genomes.
- Haploid reference assembly (CHM13): On the effectively haploid CHM13 human cell line, HiCanu produced an assembly with an NG50 contig size of 77 Mbp and a per-base consensus accuracy of 99.999% (QV50).
- Validation of segmental duplications: HiCanu correctly resolved 337 of 341 validation BACs from known segmental duplications.
- Centromere assembly: HiCanu produced preliminary assemblies for nine complete human centromeric regions.
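The two metrics quoted above can be made concrete with a short Python sketch. It assumes the standard Phred definition, QV = -10 * log10(error rate), and the usual NG50 definition: the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the estimated genome size. The function names and the toy contig lengths are hypothetical and are not HiCanu output.

```python
import math


def phred_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0..1) to a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def ng50(contig_lengths: list[int], genome_size: int) -> int:
    """Contig length at which the cumulative sum of contigs, longest first,
    reaches half of the estimated genome size (0 if never reached)."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0


# 99.999% accuracy is one error per 100,000 bases, i.e. about QV50.
print(round(phred_qv(0.99999)))

# Toy contigs for a made-up genome of size 120: 50 + 30 >= 60, so NG50 is 30.
print(ng50([50, 30, 20, 10], genome_size=120))
```

Unlike N50, which uses the total assembly length, NG50 is normalized by the genome size, so values are comparable across assemblies of the same genome.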
Methodology:
Computational methods include homopolymer compression, overlap-based error correction, aggressive false-overlap filtering, and a hierarchical assembly pipeline that detects overlaps in high-noise sequences using MHAP.
Details
- Tool Type: command-line tool
- Programming Languages: C++, C, Perl
- Added: 1/18/2021
- Last Updated: 1/30/2021
Publications
Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. bioRxiv. 2020. doi:10.1101/2020.03.14.992248.