FASTX-Toolkit
FASTX-Toolkit performs preprocessing and translation-based sequence comparisons of short reads in FASTA/FASTQ format. It detects protein-coding regions and frameshifts by aligning translated DNA sequences against protein databases.
Key Features:
- Sequence comparison programs: The suite includes FASTX, FASTY, TFASTX, and TFASTY for comparative analysis between DNA and protein sequences.
- Translation and alignment capabilities: FASTX and FASTY translate a DNA sequence in three reading frames and align the translations against a protein database, allowing gaps and frameshifts. TFASTX and TFASTY translate the sequences of a DNA database in six frames and align them to a protein sequence, again allowing gaps and frameshifts.
- Frameshift and substitution handling: FASTX and TFASTX permit frameshifts only between codons, whereas FASTY and TFASTY permit substitutions and frameshifts within codons.
- Performance evaluation: FASTX and FASTY have been evaluated with different penalties for gap openings, gap extensions, frameshifts, and nucleotide substitutions; the two programs perform equivalently when query sequences contain up to 10% errors.
- Statistical accuracy: FASTX and FASTY provide statistical estimates that are generally accurate but can be less reliable when out-of-frame translation yields a low-complexity protein sequence.
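The penalties named in the features above (gap opening, gap extension, frameshift) can be pictured as additive costs on an alignment. The following is a minimal sketch, not the toolkit's scoring code: the `Penalties` class, the `alignment_penalty` function, the numeric values, and the convention that a gap of length n costs `gap_open + n * gap_extend` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Penalties:
    # Hypothetical values chosen for illustration; not the programs' defaults.
    gap_open: int = 12
    gap_extend: int = 2
    frameshift: int = 20


def alignment_penalty(gap_lengths, frameshift_count, p=Penalties()):
    """Sum the affine gap costs and the frameshift costs of one alignment.

    Assumed convention: a gap of length n costs gap_open + n * gap_extend.
    """
    total = 0
    for length in gap_lengths:
        if length <= 0:
            raise ValueError("gap lengths must be positive")
        total += p.gap_open + p.gap_extend * length
    total += p.frameshift * frameshift_count
    return total
```

Under these assumed values, an alignment with gaps of length 1 and 3 plus one frameshift would be charged `alignment_penalty([1, 3], 1)`. Raising `frameshift` relative to `gap_open` makes the aligner prefer explaining a mismatch with a gap rather than a change of reading frame.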
Scientific Applications:
- Protein-coding gene identification: The toolkit is used to detect and characterize protein-coding regions and to identify putative coding sequences in genomic data.
- Genome-wide scanning and boundary correction: It has been applied to Mycoplasma genitalium, Haemophilus influenzae, and Methanococcus jannaschii, identifying at least nine new protein-coding genes and discovering at least 35 genes with potentially incorrect boundaries.
Methodology:
DNA sequences are translated in multiple reading frames (three for FASTX/FASTY, six for TFASTX/TFASTY), and the translations are aligned against protein sequences. The alignment allows gaps and frameshifts, and results were evaluated across penalties for gap openings, gap extensions, frameshifts, and nucleotide substitutions.
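The translation step can be sketched in a few lines of Python. This is an illustrative sketch only, assuming the standard genetic code and treating any incomplete or ambiguous codon as `X`; the function names (`translate`, `three_frame`, `six_frame`, `reverse_complement`) are hypothetical and are not part of the toolkit.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame(dna):
    """Translate the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


def six_frame(dna):
    """Translate three forward frames, then three reverse-complement frames."""
    return three_frame(dna) + three_frame(reverse_complement(dna))
```

For example, `three_frame("ATGGCCTAA")` gives one translation per forward frame, with `*` marking a stop codon. A frameshift-aware aligner can then switch between these translations partway through an alignment instead of committing to a single frame.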
Details
- License: AGPL-3.0
- Tool Type: web application, workflow
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Shell, C++, C
- Added: 1/17/2017
- Last Updated: 11/24/2024
Publications
Pearson WR, Wood T, Zhang Z, Miller W. Comparison of DNA Sequences with Protein Sequences. Genomics. 1997;46(1):24-36. doi:10.1006/geno.1997.4995. PMID:9403055.
Links
Repository
https://github.com/agordon/fastx_toolkit
Issue tracker
https://github.com/agordon/fastx_toolkit/issues
Related Tools
cshl_fastx_artifacts_filter
Relation: includes
cshl_fastx_clipper
Relation: includes
cshl_fastx_collapser
Relation: includes
cshl_fastx_nucleotides_distribution
Relation: includes
cshl_fastx_quality_statistics
Relation: includes
cshl_fastx_renamer
Relation: includes
cshl_fastx_reverse_complement
Relation: includes
cshl_fastx_trimmer
Relation: includes
cshl_princeton_fastx_barcode_splitter
Relation: includes