RSAT oligo-analysis
RSAT oligo-analysis computes k-mer frequencies and detects statistically over-represented and under-represented oligonucleotides and oligopeptides. These words help identify regulatory motifs and sequence signals such as transcription factor binding sites, restriction sites, RNA termination and polyadenylation signals, and replication origins.
Key Features:
- K-mer frequency computation: Computes and analyzes frequencies of oligonucleotides (k-mers) and oligopeptides across nucleotide and protein sequences.
- Motif discovery: Identifies DNA binding sites for transcription factors by extracting and analyzing upstream regulatory sequences from families of coregulated genes.
- Statistical rigor: Tests oligonucleotides exhaustively and defines motif significance relative to the frequencies observed in non-coding sequences of the yeast genome.
- Over-/under-representation detection: Detects motifs that are significantly over-represented or under-represented in input sequence sets.
- Polyadenylation signal analysis: Analyzes oligonucleotide composition downstream of stop codons to identify over-represented words associated with polyadenylation efficiency and positioning, including their positional distribution around 35 bp after the stop codon.
- Oligopeptide frequency analysis: Computes and analyzes oligopeptide frequencies in protein sequences to reveal sequence patterns.
- Restriction site identification: Detects motifs indicative of restriction sites within bacterial genomes by analyzing motif frequency patterns.
- Comparative genomics and genome-scale pattern matching: Supports comparative genomics and regulatory variation analyses and performs genome-scale pattern matching.
- Data integration: Integrates data from fully sequenced genomes with updates from GenBank for background and comparative analyses.
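The k-mer counting step listed above can be sketched as follows. This is a minimal illustration in Python, not RSAT's Perl implementation: the function names `reverse_complement` and `count_kmers` and the `both_strands` parameter are hypothetical, and the choice to group each word with its reverse complement is an assumption about how strand-insensitive counting could be done.

```python
from collections import Counter

# Translation table used to complement a DNA word.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of an upper-case DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def count_kmers(sequences, k, both_strands=True):
    """Count overlapping k-mers across a collection of DNA sequences.

    Words containing characters other than A, C, G, T are skipped.
    When both_strands is True, a word and its reverse complement are
    counted under a single key (the alphabetically smaller of the two).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) - set("ACGT"):
                continue  # ambiguous or masked position
            if both_strands:
                word = min(word, reverse_complement(word))
            counts[word] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}
```

For example, `count_kmers(["ACGT"], 2)` groups `GT` with its reverse complement `AC` and returns counts of 2 for `AC` and 1 for `CG`, while `both_strands=False` keeps the three dinucleotides separate.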
Scientific Applications:
- Transcription factor binding motif discovery: Identification of putative transcription factor binding sites from coregulated gene upstream regions.
- Restriction site identification: Mapping and detection of restriction motifs in bacterial genomic sequences.
- RNA processing signal detection: Detection of RNA termination and polyadenylation signals by analyzing oligonucleotide composition downstream of stop codons.
- Replication origin analysis: Analysis of oligonucleotide frequency patterns to investigate replication origin sequences.
- Protein sequence analysis: Examination of oligopeptide frequency distributions to study protein sequence features and evolution.
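For the RNA processing application, the positional side of the analysis can be illustrated with a small histogram of where a word occurs downstream of stop codons. This is only a sketch under stated assumptions: the function name `positional_counts` and the `window` parameter are hypothetical, each input sequence is assumed to start immediately after a stop codon, and the example word is arbitrary.

```python
from collections import Counter


def positional_counts(downstream_seqs, word, window=10):
    """Histogram of start positions of `word` in fixed-width windows.

    Each sequence is assumed to begin right after a stop codon, so a
    start index of 35 means 35 bp downstream of the stop codon.
    Overlapping occurrences are all counted.
    """
    word = word.upper()
    bins = Counter()
    for seq in downstream_seqs:
        seq = seq.upper()
        start = seq.find(word)
        while start != -1:
            # Key each bin by the position of its left edge.
            bins[(start // window) * window] += 1
            start = seq.find(word, start + 1)
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    # Toy input: not real yeast downstream sequences.
    toy = ["GGGGGGGGGGTATATAGG", "TATATA"]
    print(positional_counts(toy, "TATA", window=10))
```

A peak in one window relative to the others would suggest a position-specific signal; judging whether such a peak is significant requires a separate statistical test.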
Methodology:
- Computes and analyzes k-mer frequencies.
- Performs exhaustive statistical analysis, defining motif significance from observed frequencies in non-coding sequences of the yeast genome.
- Identifies over-represented and under-represented motifs.
- Analyzes oligonucleotide composition downstream of stop codons, noting enrichment around 35 bp.
- Extracts and analyzes upstream regulatory sequences from coregulated genes.
- Analyzes oligopeptide frequencies in protein sequences.
- Performs genome-scale pattern matching and comparative genomics analyses.
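One common way to turn observed and background frequencies into a significance score is a binomial test followed by a multiple-testing correction. The sketch below illustrates that idea and is not taken from the RSAT source code: the function names, the parameter names, and the convention of reporting the score as the negative base-10 logarithm of the E-value are assumptions made for illustration.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def over_representation(observed, total_positions, background_freq, tested_words):
    """Score one word for over-representation.

    observed        -- occurrences of the word in the input sequences
    total_positions -- number of word positions scanned in the input
    background_freq -- expected relative frequency of the word (0 < f < 1),
                       e.g. estimated from non-coding sequences
    tested_words    -- number of distinct words tested (multiple-testing factor)

    Returns (p_value, e_value, sig), where sig = -log10(e_value).
    """
    p_value = binomial_tail(total_positions, observed, background_freq)
    e_value = p_value * tested_words
    sig = -math.log10(e_value) if e_value > 0 else math.inf
    return p_value, e_value, sig
```

Under this convention a positive score means the word is seen more often than expected by chance even after accounting for the number of words tested. Under-representation could be scored the same way using the lower tail, P(X <= k).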
Details
- License: AFL-3.0
- Maturity: Mature
- Cost: Free of charge
- Tool Type: API, command-line tool, web application
- Operating Systems: Linux, Mac
- Programming Languages: Perl
- Added: 3/24/2016
- Last Updated: 11/24/2024
Operations
- Sequence motif discovery
Publications
van Helden J, André B, Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology. 1998;281(5):827-842. doi:10.1006/jmbi.1998.1947. PMID:9719638.
van Helden J. Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Research. 2000;28(4):1000-1010. doi:10.1093/nar/28.4.1000. PMID:10648794. PMCID:PMC102588.
Medina-Rivera A, Defrance M, Sand O, Herrmann C, Castro-Mondragon JA, Delerce J, Jaeger S, Blanchet C, Vincens P, Caron C, Staines DM, Contreras-Moreira B, Artufel M, Charbonnier-Khamvongsa L, Hernandez C, Thieffry D, Thomas-Chollier M, van Helden J. RSAT 2015: Regulatory Sequence Analysis Tools. Nucleic Acids Research. 2015;43(W1):W50-W56. doi:10.1093/nar/gkv362. PMID:25904632. PMCID:PMC4489296.
van Helden J. Regulatory Sequence Analysis Tools. Nucleic Acids Research. 2003;31(13):3593-3596. doi:10.1093/nar/gkg567. PMID:12824373. PMCID:PMC168973.
Downloads
- Binaries: http://rsat.eu/
- Source code: http://rsat.eu/