PECAT
PECAT performs haplotype-aware error correction and phased assembly of diploid genomes from long noisy reads to retain heterozygous alleles and produce haplotype-specific contigs.
Key Features:
- Haplotype-aware error correction: Corrects sequencing errors while preserving heterozygous alleles.
- Corrected-read and raw-read SNP callers: Utilizes both a corrected-read SNP caller and a raw-read SNP caller to enhance variant detection.
- Inconsistent-overlap identification: Uses SNP information to identify inconsistent overlaps within the string graph structure.
- Read grouping by haplotype: Groups reads into distinct haplotype groups to guide haplotype-specific assembly.
- Haplotype-specific contig generation: Produces haplotype-specific contigs with improved contiguity and specificity.
- Long-read technology support: Supports Nanopore R9, Nanopore R10, and PacBio CLR long-read sequencing technologies.
- Comparative performance: In the authors' evaluation, generates more contiguous haplotype-specific contigs than other assemblers.
- Empirical results — B. taurus: Nearly achieved haplotype-resolved assembly on B. taurus (Bison×Simmental) using Nanopore R9 reads.
- Empirical results — human HG002: Achieved phase block NG50 metrics of 59.4/58.0 Mb for human sample HG002 using Nanopore R10 reads.
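The idea behind haplotype-aware error correction can be illustrated with a toy consensus. The following is a minimal sketch and not PECAT's implementation: it assumes reads are already aligned to the same coordinates as equal-length strings, and that heterozygous site indices are known in advance. The names `select_support` and `correct` are hypothetical.

```python
from collections import Counter

def select_support(target, candidates, het_sites):
    """Keep candidates that match the target at every heterozygous site.

    Assumption: all reads are pre-aligned strings of equal length.
    """
    return [r for r in candidates if all(r[i] == target[i] for i in het_sites)]

def correct(target, candidates, het_sites):
    """Majority-vote consensus using only same-haplotype supporting reads."""
    support = select_support(target, candidates, het_sites)
    corrected = []
    for i, base in enumerate(target):
        # The target base is listed first, so ties resolve in its favor.
        column = [base] + [r[i] for r in support]
        corrected.append(Counter(column).most_common(1)[0][0])
    return "".join(corrected)

# Toy data: haplotype A carries G at index 2, haplotype B carries T.
# The target read is from haplotype A and has a sequencing error at index 4.
target = "ACGTTCGA"
candidates = ["ACGTACGA"] * 2 + ["ACTTACGA"] * 4

haplotype_aware = correct(target, candidates, het_sites=[2])
naive = correct(target, candidates, het_sites=[])  # no filtering: uses all reads

print(haplotype_aware)  # ACGTACGA: error fixed, heterozygous G kept
print(naive)            # ACTTACGA: error fixed, but the G allele is overwritten
```

In this toy case the naive consensus replaces the heterozygous allele with the majority allele from the other haplotype, which is the failure mode that haplotype-aware correction is meant to avoid.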
Scientific Applications:
- Diploid genome assembly: Construction of phased diploid genome assemblies from long noisy reads.
- Haplotype-resolved assembly and phasing: Generation and evaluation of haplotype-specific contigs and phase block metrics (e.g., NG50).
- Complex eukaryotic genome assembly: Assembly of complex diploid genomes such as B. taurus (Bison×Simmental) using Nanopore or PacBio long reads.
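For readers unfamiliar with the NG50 metric mentioned above, here is a minimal sketch of how it is commonly computed: NG50 is the length of the shortest block such that blocks of that length or longer cover at least half of the estimated genome size. The function name `ng50` and the toy block lengths are illustrative assumptions, not part of PECAT or its evaluation scripts.

```python
def ng50(block_lengths, genome_size):
    """Return NG50 for a set of block (or contig) lengths.

    Blocks are accumulated from longest to shortest until they cover
    at least half of genome_size. Returns 0 if they never reach it.
    """
    half = genome_size / 2
    total = 0
    for length in sorted(block_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0

# Toy example: five phase blocks against an assumed genome size of 200.
blocks = [50, 40, 30, 20, 10]
print(ng50(blocks, genome_size=200))  # prints 30
```

Unlike N50, which uses the total assembled length as the denominator, NG50 uses the genome size, so values are comparable across assemblies of the same genome.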
Methodology:
Implements haplotype-aware error correction, applies corrected-read and raw-read SNP callers, identifies inconsistent overlaps in a string graph using SNP information, groups reads by haplotype, and assembles haplotype-specific contigs.
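The overlap-filtering and read-grouping steps can be sketched on toy data. This is an illustrative simplification under stated assumptions, not PECAT's algorithm: each read is represented by a hypothetical dictionary of SNP position to observed allele, an overlap is treated as inconsistent when the two reads disagree at too many shared SNP sites, and reads are grouped by connected components over the remaining overlaps. All function names and thresholds here are invented for the example.

```python
from collections import defaultdict

def is_inconsistent(alleles_a, alleles_b, min_shared=2, max_mismatch_frac=0.3):
    """Flag an overlap whose reads disagree at too many shared SNP sites."""
    shared = alleles_a.keys() & alleles_b.keys()
    if len(shared) < min_shared:
        return False  # too little evidence: keep the overlap
    mismatches = sum(alleles_a[p] != alleles_b[p] for p in shared)
    return mismatches / len(shared) > max_mismatch_frac

def filter_overlaps(overlaps, read_alleles):
    """Drop overlaps that likely join reads from different haplotypes."""
    return [(a, b) for a, b in overlaps
            if not is_inconsistent(read_alleles[a], read_alleles[b])]

def group_reads(reads, overlaps):
    """Group reads into connected components (union-find) over overlaps."""
    parent = {r: r for r in reads}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in overlaps:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for r in reads:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())

# Toy data: r1 and r2 share one haplotype, r3 and r4 the other.
read_alleles = {
    "r1": {100: "A", 200: "C", 300: "G"},
    "r2": {200: "C", 300: "G", 400: "T"},
    "r3": {100: "T", 200: "G", 300: "A"},
    "r4": {200: "G", 300: "A", 400: "C"},
}
overlaps = [("r1", "r2"), ("r1", "r3"), ("r3", "r4"), ("r2", "r4")]

kept = filter_overlaps(overlaps, read_alleles)
groups = group_reads(read_alleles, kept)
print(kept)    # [('r1', 'r2'), ('r3', 'r4')]
print(groups)  # [['r1', 'r2'], ['r3', 'r4']]
```

Removing cross-haplotype overlaps before grouping is what keeps the two haplotypes from collapsing into a single component in this sketch.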
Details
- Tool Type: command-line tool
- Added: 4/19/2024
- Last Updated: 11/24/2024
Publications
Nie F, Ni P, Huang N, Zhang J, Wang Z, Xiao C, Luo F, Wang J. De novo diploid genome assembly using long noisy reads. Nature Communications. 2024;15(1). doi:10.1038/s41467-024-47349-7. PMID:38580638. PMCID:PMC10997618.
Funding:
- United States Department of Agriculture | National Institute of Food and Agriculture: 2023-70029-41309
- NSF | BIO | Division of Biological Infrastructure: ABI-1759856