SMRT
SMRT uses a progressive computational approach for SNP calling and haplotype assembly from Single Molecular Sequencing (SMS) data, producing phased variant calls despite the high error rate of SMS reads.
Key Features:
- Data type: Processes Single Molecular Sequencing (SMS) reads; in the reported data, these reads covered approximately 90% of chromosomal positions.
- Progressive approach: Uses a progressive computational method designed to tolerate the higher error rate of SMS reads during SNP identification and haplotype construction.
- Scalability: Reported to process large datasets, e.g., Chromosome 1 with >200 million non-N bases, millions of reads, and >100 blocks.
- Block processing: Handles blocks that can span upwards of 2 million bases and contain several thousand SNP sites on average.
- SNP density per sample: The reported average number of SNP sites per block is 3,378 for NA12878 and 5,736 for NA24385.
- Performance metrics: Reported false discovery rates of ~15.7% (NA12878) and ~16.5% (NA24385), a false negative rate of ~11.0%, and switch errors of 7.26 (NA12878) and 5.21 (NA24385).
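The false discovery and false negative rates quoted above follow the usual definitions: FDR = FP / (TP + FP) and FNR = FN / (TP + FN), where TP, FP and FN count correctly called, spurious and missed SNPs relative to a truth set. The Java sketch below is a hypothetical helper, not part of SMRT; it assumes calls and truth are compared by genomic position only (alleles are ignored), and the class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not part of SMRT) that computes FDR and FNR from
 * sets of called and truth SNP positions. Positions are compared by
 * coordinate only; alleles are ignored to keep the sketch short.
 */
final class SnpCallMetrics {

    private SnpCallMetrics() {}

    /** False discovery rate: FP / (TP + FP), i.e. the share of calls absent from the truth set. */
    static double falseDiscoveryRate(Set<Long> called, Set<Long> truth) {
        if (called.isEmpty()) {
            return 0.0; // no calls, so no false discoveries
        }
        long falsePositives = called.stream().filter(p -> !truth.contains(p)).count();
        return (double) falsePositives / called.size();
    }

    /** False negative rate: FN / (TP + FN), i.e. the share of truth SNPs that were not called. */
    static double falseNegativeRate(Set<Long> called, Set<Long> truth) {
        if (truth.isEmpty()) {
            return 0.0; // nothing to miss
        }
        long falseNegatives = truth.stream().filter(p -> !called.contains(p)).count();
        return (double) falseNegatives / truth.size();
    }

    public static void main(String[] args) {
        Set<Long> truth = new HashSet<>(Set.of(100L, 250L, 400L, 575L, 900L));
        Set<Long> called = new HashSet<>(Set.of(100L, 250L, 400L, 610L));
        // Locale.ROOT keeps the decimal point as "." regardless of system locale
        System.out.println(String.format(Locale.ROOT, "FDR = %.2f", falseDiscoveryRate(called, truth))); // FDR = 0.25
        System.out.println(String.format(Locale.ROOT, "FNR = %.2f", falseNegativeRate(called, truth))); // FNR = 0.40
    }
}
```

In the example, one of four calls (position 610) is absent from the truth set, giving FDR = 0.25, and two of five truth SNPs (575 and 900) are missed, giving FNR = 0.40.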
Scientific Applications:
- SNP calling: Produces variant calls from high-error-rate SMS reads for single-nucleotide polymorphism analysis.
- Haplotype assembly: Generates phased haplotypes from SMS data for downstream phasing analyses.
- Genetic diversity and ancestry studies: Provides phased variant and haplotype information suitable for analyses of genetic diversity and ancestry.
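Switch errors, one of the phasing metrics listed under Key Features, are commonly counted by walking along consecutive heterozygous sites in a phased block and noting each point where the predicted haplotype flips between matching the truth haplotype and matching its complement. The Java sketch below illustrates that common definition under simplifying assumptions (every site heterozygous and phased, no missing calls, one block at a time); it is not SMRT's evaluation code, the names are hypothetical, and the source does not state how its switch error values are normalized.

```java
/**
 * Hypothetical sketch (not SMRT's code) of the common switch-error count for
 * one phased block. Each haplotype is given as 0/1 alleles at heterozygous
 * sites; the second haplotype is assumed to be the complement of the first.
 */
final class SwitchErrorCounter {

    private SwitchErrorCounter() {}

    /**
     * Counts the number of places where the predicted phase flips relative to
     * the truth between two consecutive heterozygous sites.
     */
    static int countSwitchErrors(int[] predicted, int[] truth) {
        if (predicted.length != truth.length) {
            throw new IllegalArgumentException("haplotypes must cover the same sites");
        }
        int switches = 0;
        for (int i = 1; i < predicted.length; i++) {
            // "cis" means the site agrees with the truth haplotype; otherwise it matches the complement
            boolean prevCis = predicted[i - 1] == truth[i - 1];
            boolean currCis = predicted[i] == truth[i];
            if (prevCis != currCis) {
                switches++;
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        int[] truth     = {0, 1, 1, 0, 1, 0};
        int[] predicted = {0, 1, 0, 1, 0, 0};
        System.out.println("switch errors = " + countSwitchErrors(predicted, truth)); // switch errors = 2
    }
}
```

In the example, sites 2 to 4 are phased in the opposite orientation, so entering and leaving that stretch gives 2 switch errors. A prediction that is the exact complement of the truth gives 0, because a global flip of both haplotypes is not a phasing error.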
Methodology:
Implements a progressive computational approach specifically tailored to the high error rate of SMS data.
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: Java
- Added: 6/30/2018
- Last Updated: 11/25/2024
Publications
Guo F, Wang D, Wang L. Progressive approach for SNP calling and haplotype assembly using single molecular sequencing data. Bioinformatics. 2018;34(12):2012-2018. doi:10.1093/bioinformatics/bty059. PMID:29474523.
Funding:
- Research Grants Council of the Hong Kong Special Administrative Region, China: CityU 11256116
- NSFC: 61373048, 61772362
- Tianjin Research Program of Application Foundation and Advanced Technology: 16JCQNJC00200