misFinder

misFinder identifies and corrects mis-assemblies in genome assemblies by integrating reference-genome comparison and paired-end read alignment to distinguish assembly errors from structural variations.


Key Features:

  • Unbiased Error Identification: Integrates reference-genome comparison and aligned paired-end reads to distinguish mis-assemblies from true structural variations.
  • High Accuracy in Error Detection: Leverages a reference genome (or closely related references) together with coverage data and insert-distance consistency features derived from paired-end reads to make high-confidence error calls.
  • Reduction of False Positives/Negatives: Aims to reduce both false-positive and false-negative mis-assembly calls, improving the reliability of downstream genomic analyses.
  • Performance Superiority: In the authors' published evaluation on simulated and real paired-end read datasets, misFinder detected more true mis-assemblies with fewer false calls than QUAST and REAPR.
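The coverage and insert-distance consistency features mentioned above can be illustrated with a small sketch. It assumes read pairs have already been aligned to a contig and summarized as (leftmost base, rightmost base) coordinates of each fragment on that contig. The names `pair_features`, `mean_insert` and `sd_insert`, and the ±3 standard-deviation tolerance, are hypothetical choices for illustration. They are not misFinder's parameters or code, and misFinder itself is written in C.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical summary of one read pair on a contig:
# (leftmost aligned base, rightmost aligned base), 0-based, inclusive.
Pair = Tuple[int, int]


@dataclass
class Features:
    spanning: int       # fragments whose span covers the position
    concordant: int     # spanning fragments with an insert size within tolerance
    consistency: float  # concordant / spanning (0.0 when nothing spans)


def pair_features(pairs: List[Pair], pos: int,
                  mean_insert: float, sd_insert: float,
                  k: float = 3.0) -> Features:
    """Count fragments spanning `pos` and how many have a plausible insert size.

    Assumption for illustration only: a pair is concordant when its insert
    size lies within mean_insert +/- k * sd_insert.
    """
    lo = mean_insert - k * sd_insert
    hi = mean_insert + k * sd_insert
    spanning = 0
    concordant = 0
    for start, end in pairs:
        if start <= pos <= end:
            spanning += 1
            insert = end - start + 1
            if lo <= insert <= hi:
                concordant += 1
    consistency = concordant / spanning if spanning else 0.0
    return Features(spanning, concordant, consistency)
```

For example, `pair_features([(0, 299), (50, 349), (100, 1099)], 200, 300.0, 20.0)` finds three spanning fragments, of which two have the expected ~300 bp insert. A position that few fragments span, or where most spanning fragments have implausible insert sizes, is the kind of signal the features above associate with a possible mis-assembly.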

Scientific Applications:

  • Variant Detection: Improves accuracy of variant calling by correcting mis-assemblies that could confound variant detection.
  • Gene Annotation: Supports reliable gene annotation by ensuring contiguous and correctly assembled gene loci.
  • Comparative Genomics: Enables accurate comparative genomics by distinguishing structural variation from assembly artifacts.

Methodology:

Combines reference-genome comparison with paired-end read alignment. Coverage consistency and insert-distance features derived from the aligned read pairs are then used to identify and correct mis-assembled positions and to discriminate structural variations from assembly errors.
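The two-stage idea described above (reference comparison to find candidate positions, then read-pair evidence to decide what they are) can be sketched under clearly stated assumptions. In this sketch, contig-to-reference alignments are given as blocks. A candidate breakpoint is any point where consecutive blocks of a contig switch reference sequence, flip strand, or jump by more than a tolerance. A candidate is labeled a structural variation only when read pairs support the contig's own layout across it, and a mis-assembly otherwise. The `Block` layout, the function names and the thresholds (`max_jump`, `min_spanning`, `min_consistency`) are all hypothetical. This is not misFinder's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One contig-to-reference alignment block (hypothetical layout)."""
    contig_start: int
    contig_end: int
    ref_name: str
    ref_start: int
    ref_end: int
    forward: bool  # True if aligned to the reference forward strand


def candidate_breakpoints(blocks: List[Block], max_jump: int = 1000) -> List[int]:
    """Contig positions where consecutive blocks stop being colinear on the reference."""
    ordered = sorted(blocks, key=lambda b: b.contig_start)
    points: List[int] = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.ref_name != prev.ref_name or cur.forward != prev.forward:
            points.append(prev.contig_end)  # translocation-like or inversion-like jump
            continue
        contig_gap = cur.contig_start - prev.contig_end
        if cur.forward:
            ref_gap = cur.ref_start - prev.ref_end
        else:
            # On the reverse strand, later contig bases map to lower reference coordinates.
            ref_gap = prev.ref_start - cur.ref_end
        if abs(ref_gap - contig_gap) > max_jump:
            points.append(prev.contig_end)  # large indel-like or relocation-like jump
    return points


def classify(spanning: int, consistency: float,
             min_spanning: int = 5, min_consistency: float = 0.5) -> str:
    """Label one candidate from paired-end support measured across it on the contig."""
    if spanning >= min_spanning and consistency >= min_consistency:
        # Read pairs back the contig's own layout, so the reference difference is likely real.
        return "structural_variation"
    # Read pairs do not back the layout the contig claims at this point.
    return "misassembly"
```

Here `spanning` and `consistency` stand for a per-candidate read-pair summary, such as the number of fragments crossing the position and the fraction of them with a plausible insert size. How misFinder actually weighs these signals and chooses thresholds should be taken from its publication and source code rather than from this sketch.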

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C
Added: 8/3/2017
Last Updated: 11/25/2024

Operations

Sequence assembly

Publications

Zhu X, Leung HCM, Wang R, Chin FYL, Yiu SM, Quan G, Li Y, Zhang R, Jiang Q, Liu B, Dong Y, Zhou G, Wang Y. misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads. BMC Bioinformatics. 2015;16(1). doi:10.1186/s12859-015-0818-3. PMID:26573684. PMCID:PMC4647709.

Funding:

  • National Natural Science Foundation of China: 61173085, 61571152, 81402054
  • National High-Tech Research and Development Program (863) of China: 2012AA020409, 2012AA02A601, 2012AA02A604, 2012AA02A616
  • Hong Kong GRF: HKU 7111/12E, HKU 719611E, HKU 719709E
  • Shenzhen Basic Research Project: JCYJ20120618143038947
  • Outstanding Researcher Award: 102009124
  • Heilongjiang Province and China Postdoctoral Projects: 2014M560272, LBH-Z14138
  • Natural Science Foundation of Heilongjiang Province: 41400298-9-15057
  • Natural Science Foundation of Heilongjiang Province (CN): F201214
  • Scientific Research Fund of Heilongjiang Provincial Education Department: 12541240