SMRT

SMRT calls SNPs and assembles haplotypes from Single Molecular Sequencing (SMS) data. It uses a progressive computational approach to produce phased variant calls despite the higher error rate of SMS reads.


Key Features:

  • Data type: Processes Single Molecular Sequencing (SMS) reads that can cover approximately 90% of chromosomal positions.
  • Progressive approach: Implements a progressive computational method tailored to mitigate SMS's higher error rate for SNP identification and haplotype construction.
  • Scalability: Reported to process large datasets, e.g., >200 million non-N bases on Chromosome 1, covered by millions of reads and split into >100 blocks.
  • Block processing: Handles blocks that can contain upwards of 2 million bases and average around 3,000 SNP sites per block.
  • SNP density per sample: Reported average SNP sites per block are 3,378 for NA12878 and 5,736 for NA24385.
  • Performance metrics: Reported false discovery rates are ~15.7% (NA12878) and ~16.5% (NA24385), false negative rate ~11.0%, and switch errors 7.26 (NA12878) and 5.21 (NA24385).
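
The rates in the last bullet can be illustrated with a minimal sketch. It assumes the conventional definitions FDR = FP / (TP + FP) and FNR = FN / (TP + FN); the source entry does not spell out the exact definitions used in the publication, and the class name, method names, and counts below are hypothetical and not part of SMRT.

```java
// Illustrative only: conventional FDR/FNR definitions, not taken from SMRT's code.
final class CallMetrics {
    private CallMetrics() {}

    /** Fraction of called SNPs that are absent from the truth set: FP / (TP + FP). */
    static double falseDiscoveryRate(long truePositives, long falsePositives) {
        long called = truePositives + falsePositives;
        if (called == 0) {
            throw new IllegalArgumentException("no SNPs were called");
        }
        return (double) falsePositives / called;
    }

    /** Fraction of true SNPs that were not called: FN / (TP + FN). */
    static double falseNegativeRate(long truePositives, long falseNegatives) {
        long truth = truePositives + falseNegatives;
        if (truth == 0) {
            throw new IllegalArgumentException("truth set is empty");
        }
        return (double) falseNegatives / truth;
    }

    public static void main(String[] args) {
        // Hypothetical counts for a single block, chosen only for illustration.
        long tp = 850, fp = 150, fn = 100;
        System.out.println("FDR = " + falseDiscoveryRate(tp, fp));
        System.out.println("FNR = " + falseNegativeRate(tp, fn));
    }
}
```

Both rates share the true-positive count but use different denominators: FDR is relative to what was called, FNR to what should have been called.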

Scientific Applications:

  • SNP calling: Produces variant calls from high-error-rate SMS reads for single-nucleotide polymorphism analysis.
  • Haplotype assembly: Generates phased haplotypes from SMS data for downstream phasing analyses.
  • Genetic diversity and ancestry studies: Provides phased variant and haplotype information suitable for analyses of genetic diversity and ancestry.
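
Phasing quality in haplotype assembly is commonly summarized by switch errors. The sketch below counts them under a common convention: a switch occurs wherever the predicted haplotype changes between agreeing and disagreeing with the true haplotype at consecutive heterozygous sites. This is an assumption for illustration; the entry does not state how the reported switch-error figures are defined or normalized, and the class and method names are hypothetical.

```java
// Illustrative only: a common switch-error convention, not SMRT's implementation.
final class SwitchErrors {
    private SwitchErrors() {}

    /**
     * Counts switch errors within one block. Both haplotypes are strings of
     * '0'/'1' alleles over the same ordered heterozygous sites.
     */
    static int count(String truth, String predicted) {
        if (truth.length() != predicted.length()) {
            throw new IllegalArgumentException("haplotypes must cover the same sites");
        }
        int switches = 0;
        for (int i = 1; i < truth.length(); i++) {
            boolean previousAgrees = truth.charAt(i - 1) == predicted.charAt(i - 1);
            boolean currentAgrees = truth.charAt(i) == predicted.charAt(i);
            // A change in agreement means the phase flipped between sites i-1 and i.
            if (previousAgrees != currentAgrees) {
                switches++;
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        // Hypothetical six-site block: the phase flips once after the third site.
        System.out.println(SwitchErrors.count("010101", "010010"));
    }
}
```

Because haplotype labels are arbitrary, a prediction that is the exact complement of the truth yields zero switches under this convention, while a single isolated wrong site counts as two.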

Methodology:

SMRT implements a progressive computational approach tailored to SMS data. It is reported to handle large-scale inputs, e.g., >200 million non-N bases on Chromosome 1, covered by millions of reads and split into >100 blocks.

Details

Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: Java
Added: 6/30/2018
Last Updated: 11/25/2024

Publications

Guo F, Wang D, Wang L. Progressive approach for SNP calling and haplotype assembly using single molecular sequencing data. Bioinformatics. 2018;34(12):2012-2018. doi:10.1093/bioinformatics/bty059. PMID:29474523.

Funding:

  • Research Grants Council of the Hong Kong Special Administrative Region, China: CityU 11256116
  • NSFC: 61373048, 61772362
  • Tianjin Research Program of Application Foundation and Advanced Technology: 16JCQNJC00200
