CloudBurst
CloudBurst maps short reads from next-generation DNA sequencing technologies to reference genomes, providing scalable, parallel alignment for large-scale analyses such as SNP discovery, genotyping, and personal genomics.
Key Features:
- Parallelization with MapReduce: CloudBurst leverages the Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes, enabling scaling with data size and computational resources.
- Performance Efficiency: In a 24-core configuration, CloudBurst runs up to 30 times faster than RMAP executing on a single core, while producing an identical set of alignments.
- Scalability: Running time scales linearly with the number of reads mapped, and speedup is near-linear as processors are added. On a 96-core remote compute cloud, reported performance improved more than 100-fold, reducing running time from hours to minutes for jobs mapping millions of short reads.
- Flexibility in Alignment Reporting: Modeled after RMAP, CloudBurst can report either all possible alignments or the unambiguous best alignment for each read, allowing any number of mismatches or differences.
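The two reporting modes in the last feature can be illustrated with a small Python sketch. This is not CloudBurst's code: the function names (`find_alignments`, `unambiguous_best`) are hypothetical, the scan is a naive single-process loop, and only mismatches are counted (no insertions or deletions).

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_alignments(read, reference, max_mismatches):
    """'All alignments' mode: every (position, mismatches) within the budget."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[pos:pos + len(read)])
        if d <= max_mismatches:
            hits.append((pos, d))
    return hits


def unambiguous_best(hits):
    """'Best alignment' mode: the single lowest-mismatch hit.

    Returns None when there are no hits or when the best score is tied,
    because a tied read cannot be placed unambiguously.
    """
    if not hits:
        return None
    best = min(d for _, d in hits)
    winners = [h for h in hits if h[1] == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical toy reference; "ACGT" occurs twice, "TTGA" once.
reference = "ACGTACGTTTGACCA"
repeat_hits = find_alignments("ACGT", reference, max_mismatches=1)
unique_hits = find_alignments("TTGA", reference, max_mismatches=1)
```

Here `repeat_hits` contains two exact matches, so `unambiguous_best(repeat_hits)` returns `None`, while `unique_hits` contains a single hit that is returned as the best alignment.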
Scientific Applications:
- Reference Genome Mapping: Mapping next-generation sequence data to the human genome and other reference genomes.
- SNP Discovery: Identifying single nucleotide polymorphisms within large sequencing datasets.
- Genotyping: Determining genetic variants across individuals or populations.
- Personal Genomics: Facilitating analyses of individual genomic data for personalized medicine applications.
Methodology:
CloudBurst uses the Hadoop implementation of MapReduce to distribute read-mapping tasks across compute nodes and implements alignment reporting behavior modeled on RMAP.
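As a rough illustration of how read mapping can be phrased as map, shuffle, and reduce steps, the following Python sketch runs a seed-and-extend search in a single process. It is an assumption-laden toy, not CloudBurst's Java/Hadoop implementation: all function names are hypothetical, all reads are assumed to have the same length, only mismatches are handled, and the reducer looks sequences up in in-memory dictionaries instead of receiving them through the framework.

```python
from collections import defaultdict


def map_reference(ref_id, ref, seed_len):
    """Map step for the reference: emit every seed-length substring with its offset."""
    for off in range(len(ref) - seed_len + 1):
        yield ref[off:off + seed_len], ("ref", ref_id, off)


def map_read(read_id, read, seed_len):
    """Map step for a read: emit its non-overlapping seeds with their offsets."""
    for off in range(0, len(read) - seed_len + 1, seed_len):
        yield read[off:off + seed_len], ("read", read_id, off)


def shuffle(pairs):
    """Group emitted values by seed, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values, reads, refs, max_mismatches):
    """Reduce step: extend each shared seed to a full-length mismatch check."""
    ref_hits = [v for v in values if v[0] == "ref"]
    read_hits = [v for v in values if v[0] == "read"]
    for _, read_id, read_off in read_hits:
        read = reads[read_id]
        for _, ref_id, ref_off in ref_hits:
            start = ref_off - read_off
            if start < 0:
                continue  # read would hang off the start of the reference
            window = refs[ref_id][start:start + len(read)]
            if len(window) != len(read):
                continue  # read would hang off the end of the reference
            d = sum(a != b for a, b in zip(read, window))
            if d <= max_mismatches:
                yield read_id, ref_id, start, d


def run(reads, refs, max_mismatches):
    """Drive the map, shuffle, and reduce steps sequentially."""
    read_len = len(next(iter(reads.values())))
    # With k mismatches and k + 1 disjoint seeds, at least one seed matches exactly.
    seed_len = read_len // (max_mismatches + 1)
    pairs = []
    for ref_id, ref in refs.items():
        pairs.extend(map_reference(ref_id, ref, seed_len))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read, seed_len))
    results = set()  # a set removes duplicates found through different seeds
    for values in shuffle(pairs).values():
        results.update(reduce_seed(values, reads, refs, max_mismatches))
    return sorted(results)


# Hypothetical toy inputs.
refs = {"chr": "ACGTACGTTTGACCA"}
reads = {"r1": "TTGACC", "r2": "ACGAAC"}
alignments = run(reads, refs, max_mismatches=1)
```

Each tuple in `alignments` is (read id, reference id, start position, mismatches). On a real cluster the map and reduce functions would run on different nodes, which is what lets the work grow with the number of processors.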
Details
- Maturity: Legacy
- Tool Type: command-line tool
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Java
- Added: 1/13/2017
- Last Updated: 11/25/2024
Publications
Schatz MC. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics. 2009;25(11):1363-1369. doi:10.1093/bioinformatics/btp236. PMID:19357099. PMCID:PMC2682523.