HAlign

HAlign performs large-scale multiple sequence alignment of homologous DNA and RNA sequences using a center-star strategy implemented in Java and optimized for distributed computation.


Key Features:

  • Center-star multiple sequence alignment: Selects a single central reference sequence and aligns every other homologous DNA/RNA sequence against it, so the final alignment is assembled from n - 1 pairwise alignments to the center rather than from all sequence pairs.
  • Trie trees for acceleration: Uses trie tree data structures to accelerate alignment, reducing expected time complexity from quadratic to linear.
  • Hadoop parallelization: Integrates Hadoop-based distributed processing to parallelize alignment across multiple nodes for improved scalability.
  • Performance: The publication reports improved running time, sum-of-pairs scores, and scalability on large datasets.
  • Implementation platform: Packaged as multi-platform Java software for deployment on distributed computing environments.
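
As an illustration of the trie acceleration listed above, the following is a minimal sketch rather than HAlign's actual code. It assumes the center sequence is cut into fixed-length, non-overlapping segments that are stored in a trie, and that another sequence is then scanned for exact occurrences of those segments. The class name SegmentTrieSketch, the segLen parameter, and the matches method are hypothetical names chosen for this example. With a fixed segment length, the scan does a bounded amount of work per query position, which is the intuition behind the linear expected cost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not HAlign's implementation. All names are hypothetical.
class SegmentTrieSketch {

    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        int centerStart = -1; // offset in the center where this segment starts, -1 if none
    }

    final Node root = new Node();
    final int segLen;

    // Cuts the center into non-overlapping segments of length segLen and stores them.
    SegmentTrieSketch(String center, int segLen) {
        if (segLen <= 0) throw new IllegalArgumentException("segLen must be positive");
        this.segLen = segLen;
        for (int start = 0; start + segLen <= center.length(); start += segLen) {
            Node n = root;
            for (int i = start; i < start + segLen; i++) {
                n = n.next.computeIfAbsent(center.charAt(i), ch -> new Node());
            }
            if (n.centerStart < 0) n.centerStart = start; // keep the first occurrence
        }
    }

    // Returns {centerStart, queryStart} for every exact segment occurrence in query.
    List<int[]> matches(String query) {
        List<int[]> hits = new ArrayList<>();
        for (int q = 0; q + segLen <= query.length(); q++) {
            Node n = root;
            for (int i = q; i < q + segLen && n != null; i++) {
                n = n.next.get(query.charAt(i));
            }
            if (n != null) hits.add(new int[] { n.centerStart, q });
        }
        return hits;
    }

    public static void main(String[] args) {
        SegmentTrieSketch trie = new SegmentTrieSketch("ACGTTGCA", 4);
        for (int[] hit : trie.matches("GGACGTAATGCA")) {
            System.out.println("center offset " + hit[0] + " matches query offset " + hit[1]);
        }
    }
}
```

In a scheme like this, the matched segments act as anchors, and only the regions between anchors would need full dynamic-programming alignment.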

Scientific Applications:

  • Genomic research: Facilitates alignment of homologous sequences across species or large genomes for comparative genomics analyses.
  • Evolutionary biology: Enables study of evolutionary relationships by aligning DNA/RNA sequences from multiple organisms.
  • Functional genomics: Assists identification of conserved regions and functional elements through large-scale multiple sequence alignment.

Methodology:

HAlign implements center-star MSA in three parts: it selects one sequence as the central reference, uses trie trees to bring the expected time complexity of alignment down to linear, and distributes the work across nodes with Hadoop.
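
The center-star strategy described above can be sketched in a few dozen lines. This is a textbook-style illustration, not HAlign's code: it assumes unit-cost edit distance for both center selection and pairwise alignment, computes every pairwise distance with full dynamic programming (the quadratic step that the trie acceleration is meant to avoid), and merges the pairwise results with the usual "once a gap, always a gap" rule. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative center-star sketch; not HAlign's code. All names are hypothetical.
class CenterStarSketch {

    // Edit-distance table with unit mismatch and gap costs (an assumed scoring scheme).
    static int[][] table(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j], d[i][j - 1]) + 1);
            }
        }
        return d;
    }

    // The center is the sequence with the smallest summed distance to all others.
    static int centerIndex(List<String> seqs) {
        int best = 0;
        long bestSum = Long.MAX_VALUE;
        for (int i = 0; i < seqs.size(); i++) {
            long sum = 0;
            for (String other : seqs) {
                sum += table(seqs.get(i), other)[seqs.get(i).length()][other.length()];
            }
            if (sum < bestSum) { bestSum = sum; best = i; }
        }
        return best;
    }

    // Global pairwise alignment by traceback; returns {alignedCenter, alignedOther}.
    static String[] pairwise(String c, String s) {
        int[][] d = table(c, s);
        StringBuilder ac = new StringBuilder(), as = new StringBuilder();
        int i = c.length(), j = s.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && d[i][j] == d[i - 1][j - 1] + (c.charAt(i - 1) == s.charAt(j - 1) ? 0 : 1)) {
                ac.append(c.charAt(--i)); as.append(s.charAt(--j));
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                ac.append(c.charAt(--i)); as.append('-');
            } else {
                ac.append('-'); as.append(s.charAt(--j));
            }
        }
        return new String[] { ac.reverse().toString(), as.reverse().toString() };
    }

    // Aligns every sequence to the center, then merges using the largest gap run
    // seen in front of each center position ("once a gap, always a gap").
    static List<String> align(List<String> seqs) {
        String c = seqs.get(centerIndex(seqs));
        int[] maxGap = new int[c.length() + 1];
        List<String[]> pairs = new ArrayList<>();
        for (String s : seqs) {
            String[] p = pairwise(c, s);
            pairs.add(p);
            int k = 0, run = 0;
            for (char ch : p[0].toCharArray()) {
                if (ch == '-') { run++; }
                else { maxGap[k] = Math.max(maxGap[k], run); k++; run = 0; }
            }
            maxGap[k] = Math.max(maxGap[k], run);
        }
        List<String> rows = new ArrayList<>();
        for (String[] p : pairs) {
            StringBuilder row = new StringBuilder();
            int k = 0, run = 0;
            for (int x = 0; x < p[0].length(); x++) {
                if (p[0].charAt(x) == '-') { row.append(p[1].charAt(x)); run++; continue; }
                for (int g = run; g < maxGap[k]; g++) row.append('-'); // pad to shared width
                row.append(p[1].charAt(x));
                k++; run = 0;
            }
            for (int g = run; g < maxGap[k]; g++) row.append('-');
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : align(Arrays.asList("ACGT", "ACT", "AGT", "ACGGT"))) {
            System.out.println(row);
        }
    }
}
```

Every output row has the same length, and removing the gap characters from a row recovers the corresponding input sequence. Because each sequence is aligned to the center independently, that loop is the natural unit to distribute across nodes.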

Details

Tool Type:
command-line tool
Operating Systems:
Linux
Programming Languages:
Java
Added:
8/3/2017
Last Updated:
11/25/2024

Publications

Zou Q, Hu Q, Guo M, Wang G. HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy. Bioinformatics. 2015;31(15):2475-2481. doi:10.1093/bioinformatics/btv177. PMID:25812743.
