RecoverY
RecoverY: k-mer-based Y Chromosome Read Classification and Assembly Optimization
RecoverY enhances assembly of the haploid mammalian Y chromosome by applying k-mer-based read classification to identify and select Y-specific reads from datasets characterized by high repeat content and low sequencing depth.
Key Features:
- Automated Parameter Selection: Automatically determines the k-mer abundance threshold defining Y-specific k-mers, eliminating manual parameter tuning.
- Integration of Prior Knowledge: Incorporates Y chromosome information from related species or known Y transcript sequences to improve Y-specific read identification accuracy.
- Robust Performance Across Datasets: Validated on simulated and real human and gorilla genome data, demonstrating stability across parameter variations.
- Improved Assembly Metrics: Achieves a 33% increase in assembly size and a 20% improvement in NG50 compared to read or contig filtering strategies.
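For readers unfamiliar with the NG50 metric cited above: it is the length of the contig at which the longest contigs, added in decreasing order of length, first cover at least half of the estimated genome (here, chromosome) size. The following is a minimal sketch of that calculation. The function name `ng50` and the sample contig lengths are illustrative assumptions and are not taken from RecoverY or its evaluation.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it covers < 50% of genome_size.

    contig_lengths: iterable of contig lengths (hypothetical example input).
    genome_size: estimated size of the target genome or chromosome.
    """
    target = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    # The assembly is too small to reach half of the estimated size.
    return None


# Toy example: six contigs against an assumed 300 bp target size.
example_lengths = [80, 70, 50, 40, 30, 20]
example_ng50 = ng50(example_lengths, genome_size=300)
```

Unlike N50, which uses the total assembly length as its reference, NG50 uses a fixed estimated size, so assemblies of different total size can be compared on the same scale.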
Scientific Applications:
- Y-Chromosome Genomics: Supports accurate assembly for studies of genetic diversity, evolutionary biology, and sex chromosome-associated disease.
Methodology:
RecoverY classifies sequencing reads using k-mer abundance profiles to identify Y-specific sequences. Automatic threshold selection, guided by prior Y chromosome knowledge, refines discrimination of Y-derived reads within complex genomic datasets prior to assembly.
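The general idea of abundance-based read selection can be sketched in a few lines of Python. This is a toy illustration only, not RecoverY's implementation. It assumes that k-mers with abundance at or above a threshold are treated as Y-specific and that a read is kept when a minimum fraction of its k-mers are in that set. The names `select_reads`, `threshold`, and `min_fraction` are hypothetical, and the threshold is passed in by hand here, whereas RecoverY selects it automatically.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def select_reads(reads, k, threshold, min_fraction):
    """Keep reads whose k-mers are mostly high-abundance.

    threshold and min_fraction are hypothetical parameters for this sketch.
    """
    counts = count_kmers(reads, k)
    # Assumption: k-mers seen at least `threshold` times are Y-specific.
    y_kmers = {km for km, c in counts.items() if c >= threshold}
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k, nothing to score
        hits = sum(1 for km in read_kmers if km in y_kmers)
        if hits / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


# Toy input: two reads sharing abundant k-mers and one unrelated read.
toy_reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
selected = select_reads(toy_reads, k=4, threshold=2, min_fraction=0.5)
```

In this toy input the first two reads share k-mers that occur at least twice and are kept, while the third read contains only singleton k-mers and is discarded. Real data would call for a dedicated k-mer counter rather than an in-memory `Counter`.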
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: Python
- Added: 6/24/2018
- Last Updated: 11/25/2024
Publications
Rangavittal S, Harris RS, Cechova M, Tomaszkiewicz M, Chikhi R, Makova KD, Medvedev P. RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly. Bioinformatics. 2017;34(7):1125-1131. doi:10.1093/bioinformatics/btx771. PMID:29194476. PMCID:PMC6030959.