FASTA

FASTA compares protein and DNA sequences to identify regions of local and global similarity for inferring homology and evaluating evolutionary relationships.

Key Features:

Initial identity seeding: Uses amino-acid identity-based seeding to accelerate searches and improve selectivity.
Local and global detection: Detects both local and global similarities in protein and in DNA sequence comparisons.
PAM250 rescoring: Applies the PAM250 substitution matrix to score regions with numerous identities and to rescore alignments for increased sensitivity.
Joining of similarity regions: Joins initial regions of similarity to combine separated conserved segments, accommodating variable-length loops and improving detection of related sequences such as G-protein-coupled receptors.
Initial pairwise scoring: Calculates an initial pairwise similarity score in which multiple regions of similarity can be joined, raising the score of related sequences.
Large-library searching: Searches large sequence libraries quickly on modest hardware; for example, the NBRF protein sequence library (2.5 million residues) can be searched in under 20 minutes on an IBM-PC microcomputer.
Sensitivity versus selectivity: Trades a small amount of sensitivity for greater selectivity and speed compared with programs based on the Needleman-Wunsch-Sellers (NWS) algorithm, which are slower but sometimes more sensitive.
RDF2 program: Evaluates significance of similarity scores using a shuffling method that preserves local sequence composition.
LFASTA program: Identifies all regions of local similarity above a threshold and presents them as graphic matrix plots or individual alignments.
Lineage from FASTP: Developed from the predecessor FASTP into more sensitive variants including the present FASTA implementations.
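
The identity-based seeding described above can be illustrated with a minimal sketch. It assumes a word (k-tuple) lookup table built from the query and counts identical words on each diagonal of the comparison; the function names and the default word size of 2 are hypothetical choices for illustration and are not taken from the FASTA source code.

```python
from collections import defaultdict


def ktuple_index(seq, ktup=2):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def diagonal_hits(query, subject, ktup=2):
    """Count identical k-tuples on each diagonal (query_pos - subject_pos)."""
    index = ktuple_index(query, ktup)
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        # .get avoids creating empty entries for words absent from the query.
        for i in index.get(subject[j:j + ktup], ()):
            hits[i - j] += 1
    return dict(hits)


def best_diagonals(query, subject, ktup=2, top=10):
    """Return up to `top` (diagonal, hit_count) pairs, densest diagonals first."""
    hits = diagonal_hits(query, subject, ktup)
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Diagonals with many identical words are the candidates that a later step would rescore with a substitution matrix. A larger word size makes the scan faster and more selective at some cost in sensitivity.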

Scientific Applications:

Homology inference: Infers homologous relationships and evaluates evolutionary relationships among protein and DNA sequences.
Gene family identification: Identifies members of gene families, including distantly related family members while limiting false positives.
Functional annotation support: Supports inference of functional relationships through sequence similarity evidence.
Analysis of complex protein families: Detects conserved segments separated by variable-length loops in families such as G-protein-coupled receptors.
Statistical validation: Uses RDF2 shuffling to assess the statistical significance of similarity scores.
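
The shuffling approach to significance can be sketched as follows. This is an illustration of the general idea, not the RDF2 implementation: the window size, the number of shuffles, the toy identity score, and all function names are assumptions. Residues are shuffled only within consecutive windows so that each window keeps its local composition, and the observed score is compared with the distribution of scores against shuffled sequences.

```python
import random
import statistics


def local_shuffle(seq, window=10, rng=random):
    """Shuffle residues within consecutive windows; each window keeps its composition."""
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


def identity_score(a, b):
    """Toy score (assumption): most identities found on any ungapped diagonal."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1 for j in range(len(b))
            if 0 <= j + offset < len(a) and a[j + offset] == b[j]
        )
        best = max(best, matches)
    return best


def shuffle_z_value(query, subject, score=identity_score,
                    n_shuffles=100, window=10, seed=0):
    """Return (observed, mean, sd, z) for the score against locally shuffled subjects."""
    rng = random.Random(seed)
    observed = score(query, subject)
    shuffled = [score(query, local_shuffle(subject, window, rng))
                for _ in range(n_shuffles)]
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    # z is undefined when every shuffled score is identical.
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, sd, z
```

A high z value suggests the observed similarity is unlikely to arise from composition alone. In practice the toy score would be replaced by the similarity score actually being evaluated.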

Methodology:

The method proceeds through the following computational steps: amino-acid identity-based initial seeding; calculation of initial pairwise similarity scores; joining of multiple regions of similarity; rescoring of joined regions with the PAM250 matrix; significance evaluation by RDF2, which shuffles sequences while preserving local composition; and identification and plotting of local similarity regions by LFASTA. Searches can be run against large sequence libraries.
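
Two of these steps, matrix-based rescoring and joining of regions, can be sketched as below. The sketch makes several assumptions: `toy_score` is a stand-in for a real PAM250 lookup (its values are invented for illustration), the join penalty of 20 is an arbitrary example value, and the function names are hypothetical. Rescoring finds the best-scoring ungapped segment on one diagonal; joining chains non-overlapping regions and charges a penalty for each join.

```python
def toy_score(a, b):
    """Stand-in for a PAM250 lookup (assumed values): +5 identity, -2 otherwise."""
    return 5 if a == b else -2


def rescore_diagonal(query, subject, diagonal, score=toy_score):
    """Best-scoring ungapped segment on one diagonal (query_pos - subject_pos).

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    best = (0, 0, 0)
    running, start = 0, None
    for i in range(max(0, diagonal), min(len(query), len(subject) + diagonal)):
        j = i - diagonal
        if running <= 0:
            # A non-positive prefix cannot help; restart the segment here.
            running, start = 0, i
        running += score(query[i], subject[j])
        if running > best[0]:
            best = (running, start, i + 1)
    return best


def join_regions(regions, join_penalty=20):
    """Best total score from chaining regions, charging join_penalty per join.

    Each region is (score, query_start, query_end, diagonal).
    """
    regions = sorted(regions, key=lambda r: r[1])
    best_ending = []
    for k, (score, q_start, q_end, diag) in enumerate(regions):
        best = score
        for p in range(k):
            _, _, p_qe, p_diag = regions[p]
            # The earlier region must end before this one starts in both sequences.
            if p_qe <= q_start and p_qe - p_diag <= q_start - diag:
                best = max(best, best_ending[p] + score - join_penalty)
        best_ending.append(best)
    return max(best_ending, default=0)
```

Joining only pays off when the combined score minus the penalty exceeds the best single region, which is how separated conserved segments can raise the score of related sequences without rewarding chance matches.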

Topics

Sequence composition, complexity and repeats; Gene transcripts; Sequence analysis; Sequencing

Collections

FASTA

Details

Tool Type: web application
Operating Systems: Linux, Windows, Mac
Added: 1/29/2015
Last Updated: 11/25/2024

Publications

Pearson WR. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods in Enzymology. 1990;183:63-98. doi:10.1016/0076-6879(90)83007-v. PMID:2156132.

Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences. 1988;85(8):2444-2448. doi:10.1073/pnas.85.8.2444. PMID:3162770. PMCID:PMC280013.