SEQSpark

SEQSpark is a software tool that facilitates the analysis of large-scale sequence-based studies by implementing parallel processing based on Spark. It increases the speed and efficiency of data quality control, annotation, and association analysis. Its versatility and speed were demonstrated by analyzing whole-genome sequence data from the UK10K to test for associations with waist-to-hip ratio. An exome-wide significant association was observed with CCDC62 using several rare variant aggregate association methods. In performance comparisons with Variant Association Tools and PLINK/SEQ, SEQSpark was always faster, in some situations reducing computation time to a hundredth. SEQSpark will help large sequence-based epidemiological studies quickly elucidate the genetic variation involved in the etiology of complex traits.

Topic

Genetic variation;Exome sequencing;Whole genome sequencing

Detail

Operation: Genetic variation analysis
Software interface: Command-line user interface
Language: Java;Scala;Perl
License: Apache License 2.0
Cost: Free
Version name: -
Credit: National Human Genome Research Institute.
Input: -
Output: -
Contact: sleal@bcm.edu
Collection: -
Maturity: -

Publications

SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data.
Zhang D, et al. SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data. Am J Hum Genet. 2017; 101:115-122. doi: 10.1016/j.ajhg.2017.05.017
https://doi.org/10.1016/j.ajhg.2017.05.017
PMID: 28669402
PMC: PMC5501866

Download and documentation

Source: https://github.com/statgenetics/seqspark
Documentation: https://github.com/statgenetics/seqspark/tree/master/docs
Home page: https://github.com/statgenetics/seqspark
