RecoverY

RecoverY: k-mer-based Y Chromosome Read Classification and Assembly Optimization

The haploid mammalian Y chromosome is typically under-represented in genome assemblies because of its high repeat content and low sequencing depth. RecoverY improves Y chromosome assembly by applying k-mer-based read classification to identify and select Y-specific reads before assembly.
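The following minimal Python sketch makes the k-mer abundances behind this classification concrete. It counts k-mers across a set of reads and tabulates the abundance histogram, which records how many distinct k-mers occur exactly n times for each abundance n. The sketch rests on three assumptions: reads are plain DNA strings, windows containing non-ACGT bases are skipped, and each k-mer is counted together with its reverse complement ("canonical" k-mers). Canonical counting is a common convention and is not taken from RecoverY. The function names are hypothetical and are not part of RecoverY's interface.

```python
from collections import Counter

# Translation table for complementing a DNA string (assumes ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(read: str, k: int):
    """Yield canonical k-mers of a read, skipping windows with non-ACGT bases."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


def count_kmers(reads, k: int) -> Counter:
    """Count how often each canonical k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def abundance_histogram(counts: Counter) -> Counter:
    """Map abundance n -> number of distinct k-mers seen exactly n times."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTT", "AAACGT"]  # toy reads, not real data
    counts = count_kmers(reads, k=3)
    print(sorted(abundance_histogram(counts).items()))  # [(2, 3), (8, 1)]
```

Canonical counting merges the two strands. A k-mer read from the forward strand and its reverse complement from the other strand therefore add to the same count. Without this, per-sequence abundances would be split across the two strands.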


Key Features:

  • Automated Parameter Selection: Automatically determines the k-mer abundance threshold defining Y-specific k-mers, eliminating manual parameter tuning.
  • Integration of Prior Knowledge: Incorporates Y chromosome information from related species or known Y transcript sequences to improve Y-specific read identification accuracy.
  • Robust Performance Across Datasets: Validated on simulated and real human and gorilla genome data, demonstrating stability across parameter variations.
  • Improved Assembly Metrics: Achieves 33% increase in assembly size and 20% improvement in NG50 compared to read or contig filtering strategies.
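The two metrics in the last bullet can be stated precisely. Assembly size is the total length of all contigs. NG50 is the largest length L such that contigs of length at least L together cover at least half of the estimated genome size. This resembles N50, but it is measured against the genome size instead of the assembly size, so assemblies of different total size can be compared fairly. The following Python sketch computes both metrics. The helper names are illustrative, and the contig lengths and genome sizes in the example are toy values that do not come from the RecoverY evaluation.

```python
def assembly_size(contig_lengths):
    """Total number of assembled bases."""
    return sum(contig_lengths)


def ng50(contig_lengths, genome_size):
    """Length L such that contigs of length >= L cover at least half of genome_size.

    Returns 0 if the whole assembly covers less than half of the genome.
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


if __name__ == "__main__":
    contigs = [50, 30, 20, 10]               # toy contig lengths
    print(assembly_size(contigs))            # 110
    print(ng50(contigs, genome_size=200))    # 20
    print(ng50(contigs, genome_size=110))    # 30 (equals N50 when genome_size is the assembly size)
```

NG50 uses a fixed genome-size denominator. For this reason, adding correct contigs to an assembly can only raise NG50 or leave it unchanged, which makes it a natural companion to total assembly size.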

Scientific Applications:

  • Y-Chromosome Genomics: Supports accurate assembly for studies of genetic diversity, evolutionary biology, and sex chromosome-associated disease.

Methodology:

RecoverY classifies sequencing reads using k-mer abundance profiles to identify Y-specific sequences. An automatically selected abundance threshold, optionally combined with prior Y chromosome knowledge, sharpens the separation of Y-derived reads from other reads in complex genomic datasets before assembly.
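The following sketch shows one way to implement the classification step under stated assumptions; it is not RecoverY's actual algorithm. Threshold selection here picks the first local minimum ("valley") of the k-mer abundance histogram. This is a generic stand-in, and RecoverY's own automatic selection may work differently. Prior knowledge is modeled as a set of trusted Y k-mers that is merged into the set of abundant k-mers. A read is labeled Y-specific when the fraction of its k-mers found in that set reaches a hypothetical `min_fraction` cutoff. All function and parameter names are hypothetical, and k-mers are not canonicalized, for brevity.

```python
from collections import Counter


def kmers(read, k):
    """Yield all k-mers of a read (no reverse-complement merging, for brevity)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def valley_threshold(counts):
    """Return the abundance at the first local minimum of the abundance histogram.

    Hypothetical stand-in for automatic threshold selection; returns None when
    the histogram is empty or has no interior minimum.
    """
    hist = Counter(counts.values())
    if not hist:
        return None
    for n in range(2, max(hist)):
        if hist.get(n, 0) < hist.get(n - 1, 0) and hist.get(n, 0) <= hist.get(n + 1, 0):
            return n
    return None


def y_kmer_set(counts, threshold, trusted=()):
    """K-mers at or above the abundance threshold, plus any trusted (prior) Y k-mers."""
    abundant = {kmer for kmer, c in counts.items() if c >= threshold}
    return abundant | set(trusted)


def is_y_read(read, k, y_kmers, min_fraction=0.5):
    """Label a read Y-specific if at least min_fraction of its k-mers are Y k-mers."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return False
    hits = sum(1 for km in read_kmers if km in y_kmers)
    return hits / len(read_kmers) >= min_fraction


if __name__ == "__main__":
    # Toy k-mer counts: a low-abundance group and a high-abundance group.
    counts = {"AAA": 1, "AAC": 1, "AAG": 1, "AAT": 1,
              "ACA": 2, "CCC": 5, "CCG": 5, "CGG": 6}
    t = valley_threshold(counts)
    y_abundance_only = y_kmer_set(counts, t)
    y_with_prior = y_kmer_set(counts, t, trusted={"AAC", "ACA"})
    print(t)                                         # 3
    print(is_y_read("CCCGG", 3, y_abundance_only))   # True
    print(is_y_read("AAACA", 3, y_abundance_only))   # False
    print(is_y_read("AAACA", 3, y_with_prior))       # True
```

The last two lines show how prior knowledge can rescue a read whose k-mers are individually too rare to pass the abundance threshold.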

Details

Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: Python
Added: 6/24/2018
Last Updated: 11/25/2024

Publications

Rangavittal S, Harris RS, Cechova M, Tomaszkiewicz M, Chikhi R, Makova KD, Medvedev P. RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly. Bioinformatics. 2017;34(7):1125-1131. doi:10.1093/bioinformatics/btx771. PMID:29194476. PMCID:PMC6030959.

Funding: NSF DBI-1356529, IIS-1453527, IIS-1421908, CCF-1439057 and DBI-ABI 0965596

Documentation