ALICO

ALICO generates randomized versions of an input multiple sequence alignment. The randomized alignments preserve the dependence structure, the average pairwise percent identity (PID), and the per-sequence k-mer composition of the input, and serve as alignment-constrained null models for the statistical validation of sequence analyses.


Key Features:

  • Preservation of Crucial Alignment Features: Randomized alignments retain the dependence structure present in the original multiple sequence alignment.
  • Percent Identity (PID) Maintenance: The method preserves average pairwise percent identities between input sequences.
  • K-mer Composition Resemblance: Per-sequence k-mer composition is maintained so that the randomized sequences resemble the genomic training data.
  • Pairwise Alignment Training Data Requirement: The approach requires only pairwise alignment training data, not multiple-alignment training sets.
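
To make the preserved quantities concrete, the following Python sketch computes an average pairwise PID and per-sequence k-mer counts for a small alignment. It is not part of ALICO and does not reproduce its code. The function names, the toy alignment, and the PID convention (identity counted only over columns where neither sequence has a gap) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity over columns where neither row has a gap.

    This is one common PID convention, assumed here for illustration.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def average_pairwise_pid(alignment):
    """Mean PID over all unordered pairs of aligned rows."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    pids = [percent_identity(a, b) for a, b in combinations(alignment, 2)]
    return sum(pids) / len(pids)


def kmer_counts(row, k, gap="-"):
    """Count overlapping k-mers in one sequence after removing gaps."""
    seq = row.replace(gap, "")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy alignment, invented for this example.
alignment = ["ACGT-A", "ACGTTA", "AC-TTC"]
avg_pid = average_pairwise_pid(alignment)
per_sequence_2mers = [kmer_counts(row, 2) for row in alignment]
```

Comparing these statistics between an input alignment and its randomized versions is one way to check that the listed features were retained.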

Scientific Applications:

  • Homology-aware Finders: Provides null models for homology-aware motif finders such as PhyloCon, MEME with a conservation prior, and PRIORITY-C.
  • Naive Finders: Supplies randomized alignments for naive motif finders such as GibbsMarkov and for datasets such as the MacIsaac orthologous yeast data.
  • Combining Results from Multiple Finders: P-values derived from ALICO sampling allow the outputs of multiple sequence analysis tools to be combined.

Methodology:

ALICO systematically randomizes the input sequences while maintaining the dependence structure, average PID, and k-mer composition of the original alignment. The resulting samples form an alignment-constrained null model from which p-values are computed for statistical validation.
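
As a rough illustration of the last step, the Python sketch below turns scores obtained on null samples into an empirical p-value. It does not call ALICO. The function name and the example scores are hypothetical, and the add-one correction is a common Monte Carlo convention assumed here rather than something this entry specifies.

```python
def empirical_p_value(observed, null_scores):
    """Estimate P(null score >= observed) from sampled null scores.

    Uses the (r + 1) / (n + 1) estimator so the estimate is never exactly 0.
    Assumes that higher scores are more significant.
    """
    n = len(null_scores)
    if n == 0:
        raise ValueError("need at least one null sample")
    at_least = sum(score >= observed for score in null_scores)
    return (at_least + 1) / (n + 1)


# Hypothetical scores: in practice each null score would come from running
# the same analysis on one randomized alignment.
null_scores = [1.0, 2.0, 3.0, 4.0]
p_value = empirical_p_value(3.5, null_scores)
```

With only a handful of samples the smallest attainable p-value is 1 / (n + 1), so the number of randomized alignments limits the resolution of the estimate.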

Details

License: Unlicense
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Perl
Added: 8/3/2017
Last Updated: 11/24/2024

Publications

Ng P, Keich U. Alignment Constrained Sampling. Journal of Computational Biology. 2011;18(2):155-168. doi:10.1089/cmb.2010.0220. PMID:21314455.
