Crossbow

Crossbow is a Hadoop-based pipeline that combines Bowtie for short-read alignment with SOAPsnp for single nucleotide polymorphism (SNP) calling, running both steps in parallel on high-throughput DNA sequencing data.


Key Features:

  • Integration of Bowtie and SOAPsnp: Uses Bowtie for efficient short-read alignment and SOAPsnp for SNP calling.
  • Hadoop-based parallel processing: Uses the Hadoop framework to distribute computation across cloud-based clusters.
  • High-throughput sequencing support: Processes short-read data from high-throughput DNA sequencing at scale.
  • Combined alignment and genotyping workflow: Runs alignment and SNP calling as a single pipeline for genotyping analyses.
  • Performance demonstration: Analyzed a 38-fold coverage human genome in approximately 3 hours on a 320-CPU cloud cluster.
  • Cost example: The run described above cost approximately $85.
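
The performance and cost figures above can be put on a per-CPU-hour basis with simple arithmetic. The sketch below uses only the three reported numbers (320 CPUs, about 3 hours, about $85). The function names are illustrative, and the result is a rough estimate: it assumes all CPUs were billed for the full wall-clock time, which the source does not state.

```python
# Figures as reported in this entry; the time and cost are approximate.
CPUS = 320
WALL_HOURS = 3.0
TOTAL_COST_USD = 85.0


def cpu_hours(cpus: int, hours: float) -> float:
    """Total CPU-hours, assuming every CPU is busy for the whole run."""
    return cpus * hours


def cost_per_cpu_hour(total_cost: float, cpus: int, hours: float) -> float:
    """Average cost of one CPU-hour under the same assumption."""
    return total_cost / cpu_hours(cpus, hours)


if __name__ == "__main__":
    total = cpu_hours(CPUS, WALL_HOURS)
    rate = cost_per_cpu_hour(TOTAL_COST_USD, CPUS, WALL_HOURS)
    print(f"CPU-hours consumed: {total:.0f}")
    print(f"Approximate cost per CPU-hour: ${rate:.3f}")
```

Under these assumptions the run consumed roughly 960 CPU-hours, at a little under 9 cents per CPU-hour.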

Scientific Applications:

  • Large-scale genomic studies: Enables rapid processing of whole-genome sequencing datasets such as human genome analyses.
  • Genotyping and SNP discovery: Performs genome-wide single nucleotide polymorphism detection and genotyping from short-read data.
  • High-coverage variant analysis: Applicable to analysis of high-coverage sequencing datasets for variant detection and characterization.

Methodology:

Reads are aligned with Bowtie and SNPs are called with SOAPsnp, with the Hadoop framework distributing both steps across a cloud-based cluster. In the reported example, 320 CPUs analyzed a 38-fold coverage human genome in about 3 hours at a cost of about $85.
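
The division of labor described here follows the usual MapReduce shape: an alignment step that can run independently per read, a grouping step by genomic position, and a calling step that can run independently per position. The following is a minimal, in-process Python sketch of that shape only. It does not use Hadoop, Bowtie, or SOAPsnp. The toy aligner (ungapped, at most one mismatch) and the majority-vote caller are deliberately simplistic stand-ins, and every name in it (`REFERENCE`, `map_align`, `shuffle`, `reduce_call`, `run_pipeline`) is hypothetical.

```python
from collections import Counter, defaultdict

# Toy reference sequence (an assumption for illustration only).
REFERENCE = "GATTACAGGCTCAATG"


def map_align(read, max_mismatches=1):
    """Map step (stand-in for Bowtie): place a read at its best ungapped
    position and emit one (position, base) pair per aligned base."""
    best = None  # (position, mismatch count)
    for pos in range(len(REFERENCE) - len(read) + 1):
        window = REFERENCE[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    if best is None:
        return []  # unaligned reads contribute nothing
    return [(best[0] + i, base) for i, base in enumerate(read)]


def shuffle(pairs):
    """Shuffle step: group emitted bases by reference position."""
    pileup = defaultdict(list)
    for pos, base in pairs:
        pileup[pos].append(base)
    return pileup


def reduce_call(pos, bases, min_depth=2):
    """Reduce step (stand-in for SOAPsnp): report a SNP when a strict
    majority of the bases at a position differ from the reference."""
    if len(bases) < min_depth:
        return None
    base, count = Counter(bases).most_common(1)[0]
    if base != REFERENCE[pos] and count > len(bases) / 2:
        return (pos, REFERENCE[pos], base)
    return None


def run_pipeline(reads):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = [pair for read in reads for pair in map_align(read)]
    pileup = shuffle(pairs)
    calls = (reduce_call(pos, bases) for pos, bases in sorted(pileup.items()))
    return [call for call in calls if call is not None]


# Three reads carry a C-to-T change at reference position 5; one matches exactly.
READS = ["TTATAG", "ATAGGC", "GATTAT", "GCTCAA"]

if __name__ == "__main__":
    for pos, ref_base, alt_base in run_pipeline(READS):
        print(f"position {pos}: {ref_base} -> {alt_base}")
```

On this toy input the sketch reports a single call at position 5 (C to T). In a real deployment each step would be spread across cluster nodes by the framework, and the alignment and calling would be done by the actual tools rather than by these simplified functions.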

Details

Maturity: Mature
Tool Type: command-line tool
Programming Languages: Perl
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. Genome Biology. 2009;10(11):R134. doi:10.1186/gb-2009-10-11-r134. PMID:19930550. PMCID:PMC3091327.
