Tuiuiu
Tuiuiu is a filter for local similarities and approximate repeats in biological sequences. It tolerates insertions, deletions, and substitutions, and it excludes sequences that cannot contain such repeats, which enables efficient preprocessing for multiple sequence alignment, repeats inference, and downstream phylogenetic analyses, particularly at high error rates.
Key Features:
- Handling High Error Rates: Manages error rates exceeding 10% of the repeat length while tolerating insertions, deletions, and substitutions.
- Preprocessing Capability: Serves as a preprocessing filter for multiple alignment or repeats inference by excluding sequences unlikely to contain approximate repeats, thereby reducing input size for downstream methods.
- Multiple Versions with Varying Sensitivity: Version 1 extends existing necessary-condition criteria from the literature to multiple sequences. Version 2 implements a stronger condition that increases filtering efficiency without substantial additional computational time. Version 3 adds a further condition that enhances sensitivity, at the cost of increased computation in certain scenarios, and is particularly beneficial for large error rates.
- Rapid Verification of Necessary Conditions: Quickly verifies several strong necessary conditions, so that sequences failing them can be excluded early from approximate-repeat analysis.
Scientific Applications:
- Preprocessing for Multiple Sequence Alignment: Reduces combined filtering-plus-alignment time by a factor of 63 on average, and by up to 530, compared to direct alignment, often yielding higher-quality alignments.
- Repeats Inference and Phylogenetic Analysis: Facilitates scalable inference of repeats and improves downstream phylogenetic analyses by removing sequences unlikely to contain approximate repeats from consideration.
Methodology:
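Tuiuiu quickly verifies several strong necessary conditions that any sequence containing an approximate repeat must satisfy. A sequence that fails one of these conditions can be confidently excluded, while a sequence that passes is kept for downstream analysis. The choice of conditions balances computational efficiency against sensitivity.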
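The idea of a necessary-condition filter can be illustrated with the classic q-gram lemma: if two strings are within edit distance d, they must share at least max(|a|, |b|) - q + 1 - q*d substrings of length q. The Python sketch below applies that lemma to a single pair of strings. It is a simplified illustration only, not Tuiuiu's implementation: Tuiuiu's conditions are stronger and apply to multiple sequences, and all function names here are hypothetical.

```python
from collections import Counter


def qgram_threshold(length: int, max_edits: int, q: int) -> int:
    """Minimum number of shared q-grams implied by the q-gram lemma."""
    return length - q + 1 - q * max_edits


def qgram_profile(seq: str, q: int) -> Counter:
    """Multiset of all overlapping substrings of length q in seq."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-grams common to both strings, counted with multiplicity."""
    return sum((qgram_profile(a, q) & qgram_profile(b, q)).values())


def may_be_within_distance(a: str, b: str, max_edits: int, q: int) -> bool:
    """Necessary condition only (hypothetical helper, not part of Tuiuiu).

    False: the edit distance is certainly greater than max_edits.
    True:  the pair survives the filter and still needs verification.
    """
    threshold = qgram_threshold(max(len(a), len(b)), max_edits, q)
    if threshold <= 0:
        return True  # the lemma gives no information, so keep the pair
    return shared_qgrams(a, b, q) >= threshold


if __name__ == "__main__":
    # One substitution apart: the pair passes the filter.
    print(may_be_within_distance("ACGTACGTAC", "ACGTTCGTAC", max_edits=1, q=3))
    # No q-gram in common: the pair is safely excluded.
    print(may_be_within_distance("AAAAAAAAAA", "CCCCCCCCCC", max_edits=1, q=3))
```

For a repeat of length 100 with 10 allowed errors (a 10% error rate) and q = 6, the lemma requires at least 100 - 6 + 1 - 60 = 35 shared q-grams. Because the condition is only necessary, a surviving pair may still be a false positive and must be checked by the downstream method.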
Details
- Tool Type: api
- Operating Systems: Linux, Windows, Mac
- Added: 8/3/2015
- Last Updated: 11/25/2024
Publications
Peterlongo P, Sacomoto GAT, do Lago AP, Pisanti N, Sagot M. Lossless filter for multiple repeats with bounded edit distance. Algorithms for Molecular Biology. 2009;4(1). doi:10.1186/1748-7188-4-3. PMID:19183438. PMCID:PMC2661881.