HAlign
HAlign performs large-scale multiple sequence alignment of homologous DNA and RNA sequences using a center-star strategy implemented in Java and optimized for distributed computation.
Key Features:
- Center-star multiple sequence alignment: Selects a single central reference sequence and aligns all other homologous DNA/RNA sequences against it to reduce computational complexity.
- Trie trees for acceleration: Uses trie tree data structures to accelerate alignment, reducing expected time complexity from quadratic to linear.
- Hadoop parallelization: Integrates Hadoop-based distributed processing to parallelize alignment across multiple nodes for improved scalability.
- Reported performance: The associated publication reports improvements in running time, sum-of-pairs score, and scalability on large datasets.
- Implementation platform: Packaged as multi-platform Java software for deployment on distributed computing environments.
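The center-star idea from the feature list can be illustrated with a small, self-contained Java sketch. It uses the textbook selection rule: the center is the sequence with the smallest total edit distance to all the others. This is an assumption made for illustration and is not taken from HAlign's source code. The class and method names (CenterStarSketch, editDistance, pickCenter) are hypothetical. The all-against-all comparison shown here is the quadratic step that a trie-based approach aims to avoid.

```java
import java.util.List;

// Illustrative sketch only: textbook center selection, not HAlign's actual code.
final class CenterStarSketch {

    // Levenshtein edit distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertOrDelete = Math.min(prev[j], curr[j - 1]) + 1;
                curr[j] = Math.min(substitution, insertOrDelete);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the index of the sequence whose summed distance to all
    // other sequences is smallest; ties go to the lowest index.
    static int pickCenter(List<String> sequences) {
        int best = -1;
        long bestSum = Long.MAX_VALUE;
        for (int i = 0; i < sequences.size(); i++) {
            long sum = 0;
            for (int j = 0; j < sequences.size(); j++) {
                if (i != j) {
                    sum += editDistance(sequences.get(i), sequences.get(j));
                }
            }
            if (sum < bestSum) {
                bestSum = sum;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up example sequences.
        List<String> sequences = List.of("ACGTACGT", "ACGTTCGT", "ACGTACGA", "TCGTTCGT");
        System.out.println("center index: " + pickCenter(sequences));
    }
}
```

Once a center is chosen, each remaining sequence is aligned to it independently, which is what makes the strategy amenable to parallel execution.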
Scientific Applications:
- Genomic research: Facilitates alignment of homologous sequences across species or large genomes for comparative genomics analyses.
- Evolutionary biology: Enables study of evolutionary relationships by aligning DNA/RNA sequences from multiple organisms.
- Functional genomics: Assists identification of conserved regions and functional elements through large-scale multiple sequence alignment.
Methodology:
Implements a center-star MSA by selecting one sequence as the central reference, employs trie trees to achieve expected linear time complexity, and uses Hadoop for distributed parallel processing.
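As a rough illustration of how a trie can speed up matching against the center sequence, the Java sketch below splits a center sequence into fixed-length segments, stores them in a trie, and scans another sequence for exact occurrences of those segments. The segmentation scheme, the fixed segment length, and all names (SegmentTrieSketch, findMatches) are assumptions made for this example, not details confirmed by the HAlign source. For a fixed segment length, the scan does work proportional to the length of the scanned sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a segment trie over a DNA alphabet.
final class SegmentTrieSketch {
    private static final String ALPHABET = "ACGT";

    private static final class Node {
        final Node[] next = new Node[ALPHABET.length()];
        int segmentIndex = -1; // >= 0 marks the end of a stored segment
    }

    private final Node root = new Node();
    private final int segmentLength;

    // Stores the non-overlapping segments of the center sequence.
    SegmentTrieSketch(String center, int segmentLength) {
        if (segmentLength <= 0) {
            throw new IllegalArgumentException("segmentLength must be positive");
        }
        this.segmentLength = segmentLength;
        int index = 0;
        for (int start = 0; start + segmentLength <= center.length(); start += segmentLength) {
            Node node = root;
            for (int k = 0; k < segmentLength && node != null; k++) {
                int c = ALPHABET.indexOf(center.charAt(start + k));
                if (c < 0) {
                    node = null; // skip segments containing non-ACGT symbols
                } else {
                    if (node.next[c] == null) {
                        node.next[c] = new Node();
                    }
                    node = node.next[c];
                }
            }
            if (node != null && node.segmentIndex < 0) {
                node.segmentIndex = index; // keep the first occurrence of a repeated segment
            }
            index++;
        }
    }

    // Returns {queryPosition, segmentIndex} pairs for every exact segment hit.
    List<int[]> findMatches(String query) {
        List<int[]> hits = new ArrayList<>();
        for (int pos = 0; pos + segmentLength <= query.length(); pos++) {
            Node node = root;
            for (int k = 0; k < segmentLength && node != null; k++) {
                int c = ALPHABET.indexOf(query.charAt(pos + k));
                node = (c < 0) ? null : node.next[c];
            }
            if (node != null && node.segmentIndex >= 0) {
                hits.add(new int[] {pos, node.segmentIndex});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sequences; the center splits into ACGT, TGCA, ACGG.
        SegmentTrieSketch trie = new SegmentTrieSketch("ACGTTGCAACGG", 4);
        for (int[] hit : trie.findMatches("TTACGTAATGCA")) {
            System.out.println("query position " + hit[0] + " matches center segment " + hit[1]);
        }
    }
}
```

The matched segments act as anchors, so only the regions between them would still need dynamic-programming alignment. This sketch restarts the trie walk at every query position; an implementation tuned for speed would likely avoid that, for example with Aho-Corasick failure links.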
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Java
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Zou Q, Hu Q, Guo M, Wang G. HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy. Bioinformatics. 2015;31(15):2475-2481. doi:10.1093/bioinformatics/btv177. PMID:25812743.