HiCanu

HiCanu assembles PacBio HiFi (high-fidelity) long reads, which exceed 10 kilobases in length and have per-base accuracy above 99.9%, into accurate, contiguous genome assemblies that resolve complex regions such as segmental duplications, satellites, and allelic variants.


Key Features:

  • Homopolymer Compression: HiCanu collapses each run of identical bases to a single base (for example, AAAA becomes A), which masks homopolymer-length errors and improves assembly accuracy and computational efficiency.
  • Overlap-Based Error Correction: HiCanu uses overlap-based error correction to reduce sequencing errors while maintaining assembly contiguity.
  • Aggressive False Overlap Filtering: HiCanu applies aggressive false-overlap filtering to minimize incorrect read overlaps and protect assembly integrity in complex regions.
  • Hierarchical Assembly Pipeline: HiCanu runs a hierarchical pipeline that includes overlap detection in high-noise sequences using MHAP (MinHash Alignment Process).
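
The homopolymer compression feature listed above can be illustrated with a minimal Python sketch. This is not HiCanu's own implementation (the tool is written in C++, C, and Perl); the function name compress_homopolymers and the example reads are hypothetical and chosen only to show the idea: two reads that differ solely in homopolymer run length become identical once compressed.

```python
from itertools import groupby


def compress_homopolymers(seq: str) -> str:
    """Collapse every run of identical bases into a single base.

    Illustrative helper only; the name is an assumption, not a HiCanu API.
    """
    # groupby yields one (base, run) pair per maximal run of equal characters.
    return "".join(base for base, _run in groupby(seq))


# Two hypothetical reads of the same locus that disagree only on
# homopolymer lengths (GGG vs GG, TTTT vs TTT).
read_a = "ACGGGTTTTAC"
read_b = "ACGGTTTAC"

compressed_a = compress_homopolymers(read_a)
compressed_b = compress_homopolymers(read_b)

print(compressed_a)                  # ACGTAC
print(compressed_a == compressed_b)  # True
```

In compressed space the two reads match exactly, so a run-length discrepancy no longer looks like a sequence difference when overlaps are computed.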

Scientific Applications:

  • Assembly of complex genomic regions: HiCanu reconstructs segmental duplications, satellite DNAs, and allelic variants in genomes with high complexity and heterogeneity.
  • Diploid human genome benchmarking: At 30× PacBio HiFi coverage, HiCanu achieves high accuracy and allele recovery in diploid human genomes.
  • Haploid reference assembly (CHM13): On the CHM13 cell line, HiCanu produced assemblies with an NG50 contig size of 77 Mbp and a per-base consensus accuracy of 99.999% (QV50).
  • Validation of segmental duplications: HiCanu correctly resolved 337 of 341 validation BACs from known segmental duplications.
  • Centromere assembly: HiCanu produced preliminary assemblies for nine complete human centromeric regions.
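
The two metrics quoted above, consensus quality (QV) and NG50, follow standard definitions that a short Python sketch can make concrete. The function names accuracy_to_qv and ng50 and the toy contig lengths are hypothetical; the code assumes the usual Phred-scale relation QV = -10 * log10(error rate) and the usual NG50 definition (the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half the genome size). It is not taken from HiCanu or its evaluation scripts.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred-scale QV."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def ng50(contig_lengths, genome_size):
    """Return the NG50 contig length, or None if the contigs cover
    less than half of the genome size."""
    half_genome = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return None


# 99.999% accuracy corresponds to roughly QV50 (one error per 100,000 bases).
print(round(accuracy_to_qv(0.99999), 2))

# Toy assembly: contigs of 40, 30, 20 and 10 units for a genome of 120 units.
# 40 alone covers less than 60; 40 + 30 = 70 reaches it, so NG50 is 30.
print(ng50([40, 30, 20, 10], 120))  # 30
```

Unlike N50, NG50 is normalized by the genome size rather than the assembly size, so assemblies of different total length can be compared against the same target.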

Methodology:

Computational methods include homopolymer compression, overlap-based error correction, aggressive false-overlap filtering, and a hierarchical assembly pipeline that detects overlaps in high-noise sequences using MHAP.

Details

Tool Type: command-line tool
Programming Languages: C++, C, Perl
Added: 1/18/2021
Last Updated: 1/30/2021

Publications

Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. bioRxiv. 2020. doi:10.1101/2020.03.14.992248.