CloudBurst

CloudBurst is a parallel read-mapping tool that aligns short reads from next-generation DNA sequencing technologies to reference genomes. It is designed for scalable alignment in large-scale analyses such as SNP discovery, genotyping, and personal genomics.


Key Features:

  • Parallelization with MapReduce: CloudBurst uses the Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes, so throughput scales with both data size and available computational resources.
  • Performance Efficiency: On a 24-processor-core configuration, CloudBurst runs up to 30 times faster than RMAP running on a single core, while producing an identical set of alignments.
  • Scalability: Running time scales linearly with the number of reads mapped, and speedup is near-linear as processors are added. On a 96-core remote compute cloud, reported performance improved by more than 100-fold, reducing running time from hours to minutes for typical jobs that map millions of short reads to the human genome.
  • Flexibility in Alignment Reporting: Modeled after the short-read mapper RMAP, CloudBurst reports either all alignments or only the unambiguous best alignment for each read, allowing any number of mismatches or differences (mismatches plus insertions and deletions).
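The two error models and two reporting modes above can be illustrated with a minimal Java sketch. It assumes that "mismatches" means Hamming distance (substitutions only), that "differences" means edit (Levenshtein) distance (substitutions, insertions, and deletions), and that a best alignment is "unambiguous" only when exactly one candidate has the fewest differences. The class, record, and method names (ReportingModes, Hit, allWithin, unambiguousBest) are hypothetical and are not part of CloudBurst; the code needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of RMAP-style reporting modes; not CloudBurst's actual code. */
final class ReportingModes {

    /** A candidate alignment of one read: reference offset and its difference count. */
    record Hit(int refPos, int diffs) {}

    /** k-mismatch model: substitutions between equal-length strings (Hamming distance). */
    static int mismatches(String a, String b) {
        if (a.length() != b.length()) throw new IllegalArgumentException("lengths differ");
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) d++;
        }
        return d;
    }

    /** k-difference model: substitutions plus insertions/deletions (Levenshtein distance). */
    static int differences(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    /** "All alignments" mode: keep every hit with at most k differences. */
    static List<Hit> allWithin(List<Hit> hits, int k) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) if (h.diffs() <= k) kept.add(h);
        return kept;
    }

    /** "Unambiguous best" mode: the unique minimum-difference hit, else an empty list. */
    static List<Hit> unambiguousBest(List<Hit> hits, int k) {
        Hit best = null;
        boolean tied = false;
        for (Hit h : allWithin(hits, k)) {
            if (best == null || h.diffs() < best.diffs()) {
                best = h;        // strictly better hit resets any earlier tie
                tied = false;
            } else if (h.diffs() == best.diffs()) {
                tied = true;     // a second hit at the minimum makes the read ambiguous
            }
        }
        return (best == null || tied) ? List.of() : List.of(best);
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(100, 1), new Hit(5000, 2), new Hit(9000, 1));
        System.out.println(allWithin(hits, 2).size());               // 3
        System.out.println(unambiguousBest(hits, 2));                // [] (two hits tie at 1)
        System.out.println(unambiguousBest(hits.subList(0, 2), 2));  // [Hit[refPos=100, diffs=1]]
    }
}
```

In this sketch the read in `main` is reported in "all" mode at three positions but is dropped in "best" mode because two positions tie; removing one of the tied hits makes the best alignment unambiguous.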

Scientific Applications:

  • Reference Genome Mapping: Mapping next-generation sequence data to the human genome and other reference genomes.
  • SNP Discovery: Identifying single nucleotide polymorphisms within large sequencing datasets.
  • Genotyping: Determining genetic variants across individuals or populations.
  • Personal Genomics: Facilitating analyses of individual genomic data for personalized medicine applications.

Methodology:

CloudBurst uses the Hadoop implementation of MapReduce to distribute read-mapping tasks across compute nodes, and its alignment reporting options are modeled on RMAP.
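How read mapping fits the MapReduce model can be sketched in a single Java process using a seed-and-extend strategy. By the pigeonhole principle, if a read of length m aligns with at most k mismatches, then splitting it into k + 1 non-overlapping seeds of length s = floor(m / (k + 1)) guarantees that at least one seed matches the reference exactly. In this sketch the map phase emits (seed sequence, location) pairs for every reference position and for each read's k + 1 seeds, a HashMap stands in for Hadoop's shuffle, and the reduce phase extends each shared seed into a full-length alignment. This is an illustrative assumption, not CloudBurst's implementation: it uses no Hadoop APIs, counts mismatches only (no indels), assumes all reads share one length, and the names (SeedAndExtendSketch, mapReads, Seed, Alignment) are hypothetical. Java 16 or later is required for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical single-process seed-and-extend sketch; not CloudBurst's actual code. */
final class SeedAndExtendSketch {

    /** A seed occurrence: readId == -1 marks a reference position, otherwise a read offset. */
    record Seed(int readId, int offset) {}

    /** Reported alignment: read id, 0-based reference start, and mismatch count. */
    record Alignment(int readId, int refStart, int mismatches) {}

    private static final Comparator<Alignment> BY_READ_THEN_POS =
            Comparator.comparingInt(Alignment::readId).thenComparingInt(Alignment::refStart);

    static Set<Alignment> mapReads(String reference, List<String> reads, int k) {
        int m = reads.get(0).length();
        for (String r : reads) {
            if (r.length() != m) throw new IllegalArgumentException("sketch assumes equal read lengths");
        }
        int s = m / (k + 1); // pigeonhole seed length
        if (s == 0) throw new IllegalArgumentException("k too large for read length");

        // Map phase: emit (seed sequence, location) pairs for the reference and the reads.
        List<Map.Entry<String, Seed>> emitted = new ArrayList<>();
        for (int p = 0; p + s <= reference.length(); p++) {
            emitted.add(Map.entry(reference.substring(p, p + s), new Seed(-1, p)));
        }
        for (int r = 0; r < reads.size(); r++) {
            for (int j = 0; j <= k; j++) { // k + 1 non-overlapping seeds per read
                int off = j * s;
                emitted.add(Map.entry(reads.get(r).substring(off, off + s), new Seed(r, off)));
            }
        }

        // Shuffle phase: group all values that share a seed sequence.
        Map<String, List<Seed>> groups = new HashMap<>();
        for (Map.Entry<String, Seed> e : emitted) {
            groups.computeIfAbsent(e.getKey(), key -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce phase: pair reference and read seeds in each group, then extend.
        Set<Alignment> out = new TreeSet<>(BY_READ_THEN_POS); // also removes duplicate hits
        for (List<Seed> seeds : groups.values()) {
            for (Seed ref : seeds) {
                if (ref.readId() != -1) continue;
                for (Seed rd : seeds) {
                    if (rd.readId() == -1) continue;
                    String read = reads.get(rd.readId());
                    int start = ref.offset() - rd.offset(); // implied alignment start
                    if (start < 0 || start + read.length() > reference.length()) continue;
                    int mm = 0;
                    for (int i = 0; i < read.length() && mm <= k; i++) {
                        if (read.charAt(i) != reference.charAt(start + i)) mm++;
                    }
                    if (mm <= k) out.add(new Alignment(rd.readId(), start, mm));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> reads = List.of("CCCCGGGA", "GGGGTTTT", "ACGTACGT");
        System.out.println(mapReads("AAAACCCCGGGGTTTT", reads, 1));
        // [Alignment[readId=0, refStart=4, mismatches=1], Alignment[readId=1, refStart=8, mismatches=0]]
    }
}
```

In the example, read 1 is found through both of its seeds, and the sorted set collapses the duplicate; read 2 shares no seed with the reference and is not reported. In a real MapReduce job, each reduce call would handle one seed group independently, which is what allows the work to be spread across nodes.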

Details

Maturity: Legacy
Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Java
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Schatz MC. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics. 2009;25(11):1363-1369. doi:10.1093/bioinformatics/btp236. PMID:19357099. PMCID:PMC2682523.
