SEQSpark
SEQSpark is a software tool for analyzing large-scale sequence-based studies using parallel processing built on Apache Spark. It increases the speed and efficiency of data quality control, annotation, and association analysis. Its versatility and speed were demonstrated by analyzing whole-genome sequence data from the UK10K project and testing for associations with waist-to-hip ratio. An exome-wide significant association with CCDC62 was observed using several rare variant aggregate association methods. In performance comparisons with Variant Association Tools and PLINK/SEQ, SEQSpark was always faster, in some situations reducing computation time to a hundredth of that of the other tools. SEQSpark will help large sequence-based epidemiological studies quickly elucidate the genetic variation involved in the etiology of complex traits.
Topic
Genetic variation;Exome sequencing;Whole genome sequencing
Detail
Operation: Genetic variation analysis
Software interface: Command-line user interface
Language: Java;Scala;Perl
License: Apache License 2.0
Cost: Free
Version name: -
Credit: National Human Genome Research Institute.
Input: -
Output: -
Contact: sleal@bcm.edu
Collection: -
Maturity: -
Publications
- SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data.
- Zhang D, et al. SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data. Am J Hum Genet. 2017; 101:115-122. doi: 10.1016/j.ajhg.2017.05.017
- https://doi.org/10.1016/j.ajhg.2017.05.017
- PMID: 28669402
- PMC: PMC5501866
Download and documentation
Documentation: https://github.com/statgenetics/seqspark/tree/master/docs
Home page: https://github.com/statgenetics/seqspark