CloudBurst

CloudBurst maps short reads from next-generation DNA sequencing technologies to reference genomes. It provides scalable, parallel alignment for large-scale analyses such as SNP discovery, genotyping, and personal genomics.

Key Features:

Parallelization with MapReduce: CloudBurst leverages the Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes, enabling scaling with data size and computational resources.
Performance Efficiency: In a 24-processor core configuration, CloudBurst achieves up to a 30-fold increase in speed compared to single-core execution of RMAP while producing an identical set of alignments.
Scalability: Running time scales linearly with the number of reads mapped, and speedup is near-linear as processors are added. On a 96-core remote compute cloud, the reported performance improvement exceeds 100-fold, reducing running time from hours to minutes for jobs that map millions of short reads.
Flexibility in Alignment Reporting: Modeled after RMAP, CloudBurst can report either all possible alignments or the unambiguous best alignment per read and accommodate any number of mismatches or differences.
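
The two reporting modes described above can be illustrated with a minimal sketch. This is not CloudBurst's code: the function name `report_alignments`, the `(read_id, position, mismatches)` tuple layout, and the example values are all hypothetical. It also assumes that "unambiguous best" means a read has exactly one alignment with strictly fewer mismatches than every other candidate; the tool's actual tie-breaking rules may differ.

```python
from collections import defaultdict

def report_alignments(alignments, mode="all"):
    """Filter (read_id, position, mismatches) tuples by reporting mode.

    mode="all"  -> every alignment is reported.
    mode="best" -> only reads with a single strictly-best alignment are
                   reported (assumed meaning of "unambiguous best").
    """
    if mode == "all":
        return sorted(alignments)
    if mode != "best":
        raise ValueError("mode must be 'all' or 'best'")

    # Group candidate alignments by read.
    by_read = defaultdict(list)
    for read_id, position, mismatches in alignments:
        by_read[read_id].append((mismatches, position))

    best = []
    for read_id, candidates in by_read.items():
        candidates.sort()  # fewest mismatches first
        # Keep the read only if its best score is not tied.
        if len(candidates) == 1 or candidates[0][0] < candidates[1][0]:
            mismatches, position = candidates[0]
            best.append((read_id, position, mismatches))
    return sorted(best)

# Hypothetical input: r1 has one clearly best hit, r2 has two tied hits.
alignments = [("r1", 7, 0), ("r1", 40, 1), ("r2", 12, 1), ("r2", 90, 1)]
print(report_alignments(alignments, "all"))
print(report_alignments(alignments, "best"))
```

In this sketch the "best" mode keeps only r1's zero-mismatch alignment and drops r2 entirely, because its two candidates tie.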

Scientific Applications:

Reference Genome Mapping: Mapping next-generation sequence data to the human genome and other reference genomes.
SNP Discovery: Identifying single nucleotide polymorphisms within large sequencing datasets.
Genotyping: Determining genetic variants across individuals or populations.
Personal Genomics: Facilitating analyses of individual genomic data for personalized medicine applications.

Methodology:

CloudBurst uses the Hadoop implementation of MapReduce to distribute read-mapping tasks across compute nodes and implements alignment reporting behavior modeled on RMAP.
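
To make the map, shuffle, and reduce roles concrete, here is a small in-memory sketch of how a read-mapping job could be phrased in MapReduce terms. It is an illustration under stated assumptions, not CloudBurst's implementation: it does not use Hadoop, it assumes a seed-and-extend scheme with exact k-mer seeds and mismatch-only scoring, and every name (`map_phase`, `shuffle`, `reduce_phase`, `map_reads`) and the example sequences are hypothetical.

```python
from collections import defaultdict

def map_phase(reference, reads, k):
    """Emit (seed, record) pairs: every reference k-mer, and the
    non-overlapping k-mers of each read."""
    for pos in range(len(reference) - k + 1):
        yield reference[pos:pos + k], ("ref", pos)
    for read_id, read in reads.items():
        for offset in range(0, len(read) - k + 1, k):
            yield read[offset:offset + k], ("read", read_id, offset)

def shuffle(pairs):
    """Group records by seed, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def reduce_phase(groups, reference, reads, max_mismatches):
    """For each shared seed, extend to the full read and keep alignments
    with at most max_mismatches mismatches."""
    hits = set()  # a set removes duplicates found via different seeds
    for records in groups.values():
        ref_positions = [r[1] for r in records if r[0] == "ref"]
        read_seeds = [r[1:] for r in records if r[0] == "read"]
        for read_id, offset in read_seeds:
            read = reads[read_id]
            for pos in ref_positions:
                start = pos - offset
                if start < 0 or start + len(read) > len(reference):
                    continue
                mismatches = hamming(read, reference[start:start + len(read)])
                if mismatches <= max_mismatches:
                    hits.add((read_id, start, mismatches))
    return sorted(hits)

def map_reads(reference, reads, k, max_mismatches):
    """Run the three phases sequentially in one process."""
    groups = shuffle(map_phase(reference, reads, k))
    return reduce_phase(groups, reference, reads, max_mismatches)

# Hypothetical data: r1 matches exactly, r2 matches with one mismatch.
reference = "ACGTACGTTAGCCGATAGGCTA"
reads = {"r1": "TTAGCCGA", "r2": "CGATAGTC"}
hits = map_reads(reference, reads, k=4, max_mismatches=1)
print(hits)  # [('r1', 7, 0), ('r2', 12, 1)]
```

Because each seed group is processed independently in the reduce step, the groups could be spread across compute nodes, which is the property a MapReduce framework exploits. Choosing k no larger than the read length divided by (max_mismatches + 1) guarantees that at least one non-overlapping seed of a valid alignment matches exactly.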

Topics

Genotype and phenotype, Personalised medicine

Details

Maturity: Legacy
Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Java
Added: 1/13/2017
Last Updated: 11/25/2024

Operations

Read mapping

Publications

Schatz MC. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics. 2009;25(11):1363-1369. doi:10.1093/bioinformatics/btp236. PMID:19357099. PMCID:PMC2682523.

Documentation

General

https://sourceforge.net/p/cloudburst-bio/wiki/CloudBurst/
