ALICO
ALICO generates randomized versions of input multiple sequence alignments while preserving dependence structure, average percent identities (PID), and per-sequence k-mer composition to produce alignment-constrained null models for statistical validation of sequence analyses.
Key Features:
- Dependence Structure Preservation: Randomized alignments retain the dependence structure of the original multiple sequence alignment.
- Percent Identity (PID) Maintenance: The method preserves average pairwise percent identities between input sequences.
- K-mer Composition Preservation: Per-sequence k-mer composition is maintained so that sampled sequences mirror the characteristics of the genomic training data.
- Pairwise Alignment Training Data Requirement: The approach operates using only pairwise alignment training data rather than multiple-alignment training sets.
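To check whether a randomized alignment resembles its input in the two numeric properties listed above, one can compute the average pairwise PID and each row's k-mer counts. The sketch below is illustrative only and is not part of ALICO: it assumes alignments are Python lists of equal-length strings, that "-" is the gap character, and that PID is counted over columns where neither row has a gap (one common convention; ALICO's own definition may differ). The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

GAP = "-"  # assumed gap character

def pairwise_pid(a: str, b: str) -> float:
    """Percent identity of two aligned rows, counted only over columns
    where neither row has a gap (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP or y == GAP:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def average_pid(rows: list[str]) -> float:
    """Mean PID over all unordered pairs of rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_pid(a, b) for a, b in pairs) / len(pairs)

def kmer_composition(row: str, k: int) -> Counter:
    """Overlapping k-mer counts of one sequence with gaps removed."""
    seq = row.replace(GAP, "").upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

alignment = ["ACGTA", "ACCTA", "TC-TA"]
print(average_pid(alignment))
print(kmer_composition(alignment[2], 2))
```

Running the same functions on an input alignment and on a sampled one gives a quick comparison of average PID and per-row k-mer counts. The dependence structure between sequences is not measured by this sketch.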
Scientific Applications:
- Homology-aware Motif Finders: Provides null models for homology-aware motif finders such as PhyloCon, MEME with a conservation prior, and PRIORITY-C.
- Naive Motif Finders: Supplies randomized alignments for naive (non-homology-aware) motif finders such as GibbsMarkov, and for datasets such as the MacIsaac orthologous yeast data.
- Combining Results from Multiple Finders: Enables the outputs of several motif finders to be combined through p-values estimated from ALICO samples.
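The combination rule is not spelled out here, so the following is a minimal sketch of one standard option, assumed rather than taken from ALICO: every finder scores the real alignment and the same set of null samples; each score becomes a rank-based empirical p-value within its finder; the combined statistic is the smallest per-finder p-value; and that minimum is itself calibrated against the minima of the null samples. The dictionary layout and function names are hypothetical.

```python
def rank_pvalues(scores: list[float]) -> list[float]:
    """For each score x, the fraction of all scores (x itself included)
    that are >= x, i.e. a right-tailed empirical p-value."""
    n = len(scores)
    return [sum(1 for s in scores if s >= x) / n for x in scores]

def combined_pvalue(observed: dict[str, float],
                    null: dict[str, list[float]]) -> float:
    """observed[f]: finder f's score on the real alignment.
    null[f][i]: finder f's score on the i-th null sample; sample i must be
    the same alignment for every finder."""
    finders = list(observed)
    sizes = {len(null[f]) for f in finders}
    if len(sizes) != 1:
        raise ValueError("every finder must score the same null samples")
    # Pool index 0 is the real alignment, indices 1..n are the null samples.
    per_finder = {f: rank_pvalues([observed[f]] + list(null[f])) for f in finders}
    pool_size = sizes.pop() + 1
    # Combined statistic: the smallest per-finder p-value of each pool member.
    mins = [min(per_finder[f][j] for f in finders) for j in range(pool_size)]
    # Calibrate: how often is a pool member's minimum at least as extreme?
    return sum(1 for m in mins if m <= mins[0]) / pool_size

observed = {"finderA": 10.0, "finderB": 1.0}
null = {"finderA": [1.0, 2.0, 3.0, 4.0], "finderB": [5.0, 6.0, 7.0, 8.0]}
print(combined_pvalue(observed, null))  # 0.4
```

Here finderA alone would report 0.2, but taking the best of two finders is easier to achieve by chance, and the recalibration raises the combined p-value to 0.4 accordingly.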
Methodology:
ALICO builds alignment-constrained null models by randomizing the input sequences while maintaining the alignment's dependence structure, its average PID, and each sequence's k-mer composition. The resulting samples serve as a null distribution from which p-values are computed for statistical validation.
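Once a set of sampled alignments is available, a p-value for any statistic can be estimated by comparing the statistic on the real alignment with its values on the samples. The sketch below assumes the samples have already been loaded as lists of aligned strings (ALICO's output format is not assumed here) and that larger statistic values are more significant; `statistic` stands for a hypothetical scoring function, such as a motif finder's best score.

```python
from typing import Callable, Sequence

Alignment = list[str]  # one aligned row per sequence

def empirical_pvalue(observed: float, null_scores: Sequence[float]) -> float:
    """Right-tailed empirical p-value with the usual +1 correction:
    p = (1 + #{null >= observed}) / (1 + N)."""
    if len(null_scores) == 0:
        raise ValueError("need at least one null sample")
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))

def pvalue_from_samples(statistic: Callable[[Alignment], float],
                        alignment: Alignment,
                        samples: Sequence[Alignment]) -> float:
    """Score the real alignment and every null sample with the same
    statistic, then compare."""
    return empirical_pvalue(statistic(alignment), [statistic(s) for s in samples])
```

With N samples the smallest attainable p-value is 1/(N + 1), so the number of sampled alignments bounds how small a reported p-value can be.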
Details
- License: Unlicense
- Maturity: Mature
- Cost: Free of charge
- Tool Type: Command-line tool
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Perl
- Added: 8/3/2017
- Last Updated: 11/24/2024
Publications
Ng P, Keich U. Alignment Constrained Sampling. Journal of Computational Biology. 2011;18(2):155-168. doi:10.1089/cmb.2010.0220. PMID:21314455.