SEQSpark

SEQSpark is a software tool that facilitates the analysis of large-scale sequence-based studies by running its computations in parallel on Apache Spark. It speeds up data quality control, annotation, and association analysis. SEQSpark's versatility and speed were demonstrated by analyzing whole-genome sequence data from the UK10K and testing for associations with waist-to-hip ratios. An exome-wide significant association was observed with CCDC62 using several rare variant aggregate association methods. When compared with Variant Association Tools and PLINK/SEQ, SEQSpark was faster in every case, and in some situations it cut computation time to a hundredth. SEQSpark will help large sequence-based epidemiological studies to quickly elucidate the genetic variation involved in the etiology of complex traits.

Topic

Genetic variation;Exome sequencing;Whole genome sequencing

Detail

  • Operation: Genetic variation analysis

  • Software interface: Command-line user interface

  • Language: Java;Scala;Perl

  • License: Apache License 2.0

  • Cost: Free

  • Version name: -

  • Credit: National Human Genome Research Institute.

  • Input: -

  • Output: -

  • Contact: sleal@bcm.edu

  • Collection: -

  • Maturity: -

Publications

  • Zhang D, et al. SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data. Am J Hum Genet. 2017; 101:115-122. doi: 10.1016/j.ajhg.2017.05.017
  • https://doi.org/10.1016/j.ajhg.2017.05.017
  • PMID: 28669402
  • PMC: PMC5501866

Download and documentation
