53 Free DNA Sequence Analysis Tools - Software and Resources

General DNA, RNA sequence Analysis Tools

  1. ALP
    • Description : ALP (Ascending Ladder Program) is a tool to calculate the statistical parameters in the modified Gumbel distribution for BLAST. The ALP algorithm computes E-values for random local DNA-DNA and protein-protein alignments, gap costs, and character abundances for any substitution matrix. FALP (Frameshift Ascending Ladder Program) is a tool for comparable tasks for frameshifting DNA-protein alignments. The tools are available as a library or a standalone implementation.
  2. Arioc
    • Description : Arioc is a set of tools to align short bisulfite-treated DNA sequences (BS-seq reads) to long reference DNA sequences. The Arioc algorithm runs on both GPU and CPU and uses parallel sort and reduction routines to identify the locations of likely alignments.
  3. BioWord
    • Description : BioWord is an add-on for the Microsoft Word 2007 and 2010 word processors to manipulate DNA sequences. BioWord editing functions include reverse-complementing, translating, sequence searching, pairwise sequence alignment, motif discovery, generation of consensus logos representing multiple sequence alignments (MSA), and FASTA formatting.
  4. BuddySuite
    • Description : BuddySuite is a collection of four related tools:
      1. SeqBuddy is a tool to handle FASTA, GenBank, and NEXUS sequence file formats. SeqBuddy includes functions to manipulate and analyze sequence data using 50+ separate tool modules.
      2. AlignBuddy: 30 separate tool modules to read, write, analyze, and manipulate PHYLIP, Stockholm, and NEXUS sequence alignment files.
      3. PhyloBuddy: Consists of 18 tool modules to manage and manipulate phylogenetic trees in NEXUS, Newick, and NeXML formats.
      4. DatabaseBuddy: Contains functions to search the NCBI, UniProt, and Ensembl databases. The DatabaseBuddy algorithms can sort and filter the search results.
  5. CorGen
    • Description : CorGen is a web-based tool to measure long-range correlations in DNA sequences, which are characterized by a power-law decay of the autocorrelation function of the GC-content. CorGen can also generate random DNA sequences, either with user-specified parameters or with parameters obtained from another DNA sequence.
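To make the quantity in this description concrete, the following is a minimal sketch of a GC-content autocorrelation function. It is not CorGen's code: the function name `gc_autocorrelation` and the per-base 0/1 encoding of GC-content are assumptions made for illustration.

```python
def gc_autocorrelation(seq, max_lag):
    """Normalized autocorrelation of a per-base GC indicator for lags 1..max_lag.

    Assumption: GC-content is encoded as 1.0 for G or C and 0.0 otherwise.
    """
    x = [1.0 if base in "GC" else 0.0 for base in seq.upper()]
    n = len(x)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        raise ValueError("GC indicator is constant; autocorrelation is undefined")
    result = []
    for lag in range(1, max_lag + 1):
        # Average product of mean-centered values that are `lag` bases apart.
        cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n - lag)
        result.append(cov / var)
    return result
```

For a long-range correlated sequence, the returned values would be expected to fall off roughly as a power law of the lag; fitting that decay is outside the scope of this sketch.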
  6. cpgplot
    • Description : cpgplot is a tool for plotting and identifying CpG islands in nucleotide sequences. The cpgplot algorithm evaluates overlapping windows along a sequence and by default defines a CpG island as a region where the percent G + C is more than 50% and the observed vs. expected CpG ratio is over 0.6. The region must be at least 200 bases long and span at least 10 windows.
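The per-window quantities behind these thresholds can be sketched as follows. This is an illustration, not the EMBOSS implementation: the function names are hypothetical, and the expected CpG count is taken here as (number of C × number of G) / window length. The sketch checks a single window only and does not apply the 200-base or 10-window conditions.

```python
def cpg_window_stats(window):
    """Return (percent G+C, observed/expected CpG ratio) for one window."""
    seq = window.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # Assumed definition: expected CpG count from the C and G counts in the window.
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_percent, ratio


def passes_thresholds(window, min_gc=50.0, min_ratio=0.6):
    """Check one window against the default thresholds quoted above."""
    gc_percent, ratio = cpg_window_stats(window)
    return gc_percent > min_gc and ratio > min_ratio


print(cpg_window_stats("CGCGCGCG"))  # (100.0, 2.0)
```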
  7. cpgplot_(EBI)
    • Description : Cpgplot or "EMBOSS Cpgplot" is a web-based tool at EBI to recognize and plot CpG islands in nucleotide sequences. See also Cpgplot
  8. DAMBE7
    • Description : DAMBE7 is a tool for genomic and phylogenetic sequence data analysis. The DAMBE7 package includes functions for 1. Sequence alignment, 2. Molecular phylogenetics, 3. Position weight matrix to analyze sequence motifs, 4. Perceptron for classification of sequence motifs, 5. Gibbs sampler, 6. Hidden Markov models, 7. Secondary structure prediction, 8. tRNA anticodon identification, 9. Codon usage bias, 10. Computation of isoelectric point, and 11. Peptide mass fingerprinting. DAMBE7 works with a variety of well-known sequence formats.
  9. DIAL
    • Description : DIAL (dihedral alignment) is a web-based tool for structure-based RNA sequence alignment. The DIAL algorithm does not require a reference genome and utilizes nucleotide sequence, dihedral angle, and nucleotide base-pairing similarity. DIAL includes functions for Needleman-Wunsch (global), Smith-Waterman (local), and motif search (global-semiglobal) alignments, and a viewer for 3-dimensional superposition of query and target.
  10. IgDiscover
    • Description : IgDiscover is a tool for the analysis of antibody repertoires. The IgDiscover algorithm identifies new V genes in heavy chains and in kappa and lambda light chains to discover VH, VK, and VL genes.
  11. Orchid
    • Description : Orchid is a machine learning framework tool to manage, annotate, and analyze cancer mutations to support the knowledge of tumor genetic data. Example of usage: Sub-typing aggressive vs. non-aggressive prostate cancer using mutational profiles in tumor sequence data. NOTE: Orchid requires code or data under separate licenses or copyrights restricting the usage to non-commercial activities.
  12. PyBamView
    • Description : PyBamView is a tool to visualize sequence alignments from BAM files, optionally with a FASTA-formatted reference genome. The PyBamView algorithm renders single-nucleotide polymorphisms (SNPs), insertions, and deletions and provides an export function for the creation of publication-ready figures.
  13. SPARSE
    • Description : SPARSE (Sparsified Prediction and Alignment of RNAs based on their structure Ensembles) is a tool to align RNA sequences based on structural properties of RNA ensembles. The SPARSE algorithm uses a Sankoff-style algorithm and runs in quadratic time without heuristics.
  14. supermatcher
    • Description : supermatcher is a tool to compute approximate alignments between search sequences and the target sequences, for example, sequences in a database. The supermatcher algorithm determines likely matching sequences utilizing word matches and produces the sequence alignments using the Smith-Waterman local alignment method.
  15. unitas
    • Description : unitas is a tool to annotate small non-coding RNA datasets generated by high-throughput sequencing. The unitas algorithm uses the latest reference sequences from public online databases for the annotation.
  16. WebSat
    • Description : WebSat is a web-based tool to predict molecular markers, visualize microsatellites, and design primers for them. WebSat accepts user-defined search parameters and provides a simple way to export the results.
  17. wordmatch
    • Description : wordmatch is a tool to find all identical matches between two nucleotide sequences.
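The idea of finding identical word matches between two sequences can be sketched with a simple k-mer index. This is a minimal illustration, not the EMBOSS code: the function name `word_matches` and the default word size of 4 are assumptions, and overlapping hits are reported individually rather than merged into longer regions.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, word_size=4):
    """Return (pos_a, pos_b) pairs where both sequences share an identical word."""
    # Index every word of seq_a by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - word_size + 1):
        index[seq_a[i:i + word_size]].append(i)
    # Scan seq_b and look each word up in the index.
    hits = []
    for j in range(len(seq_b) - word_size + 1):
        for i in index.get(seq_b[j:j + word_size], ()):
            hits.append((i, j))
    return hits


print(word_matches("ACGTAC", "TTACGT"))  # [(0, 2)]
```

Tools such as supermatcher, listed above, use word matches of this kind as a fast first pass before running a full alignment.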

Repeat Analysis Tools

  1. ATRHunter
    • Description : ATRHunter is a tool to find approximate tandem repeats in DNA sequences. The ATRHunter algorithm uses a statistical model allowing a variety of definitions of tandem repeats.
  2. CENSOR
    • Description : CENSOR is a tool to mask a sequence given a reference collection of sequences. CENSOR also reports all masked sequences. Note that EBI has retired this tool.
  3. CHOPCHOP
    • Description : CHOPCHOP is a web-based tool to select target sites for CRISPR/Cas9- or TALEN-directed mutagenesis. The CHOPCHOP algorithm predicts off-target binding of single-guide RNAs (sgRNAs) and TALENs using efficient sequence alignment methods and visualizes the candidate target sites, restriction sites, and primer candidates, together with color-coded quality scores.
  4. CRISPRCasFinder
    • Description : CRISPRCasFinder is a tool to find CRISPR (clustered regularly interspaced short palindromic repeats) arrays and detect Cas proteins. The CRISPRCasFinder algorithm aids validation using a rating system, predicts the orientation of CRISPRs, and detects and types Cas proteins based on the latest classification.
  5. CRISPRcompar
    • Description : CRISPRcompar is a web-based tool to assist biologists in using CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) for comparative and evolutionary analyses of closely related bacterial strains.

      The CRISPRs web server contains several tools for CRISPR analyses: CRISPRdb, BLAST CRISPRdb, CRISPRcompar, CRISPRtionary, FlankAlign, CRISPRs finder, and CRISPRs Utilities.
  6. CRISPRdb
    • Description : CRISPRdb is a web-based database for the identification and analysis of CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) structures.
  7. CRISPRFinder
    • Description : CRISPRFinder is a web-based tool for the discovery of CRISPRs, the definition of direct repeats (DRs), the extraction of spacers, the retrieval of flanking sequences from the GenBank database, and the examination of DRs in prokaryotic genomes in general. See also CRISPRcompar.
  8. detectIR
    • Description : detectIR is a tool to find perfect and imperfect repeats and inverted repeats in DNA sequences. The detectIR algorithm uses vector calculation of complex numbers.
  9. Dfam
    • Description : Dfam is a web-based database containing transposable element DNA sequence alignments (interspersed repeats), hidden Markov models (HMMs), consensus sequences, and genome annotations. The Dfam database represents the transposable element alignments as families together with annotations.
  10. einverted
    • Description : einverted is a tool for finding inverted repeats, or stem-loops, in nucleotide sequences. The einverted algorithm uses dynamic programming for the local alignments.
  11. equicktandem
    • Description : equicktandem is a tool to detect nucleotide sequence sections that potentially contain repeats in tandem. The equicktandem algorithm identifies segments of sequences where bases match elsewhere in the sequence without gaps. The equicktandem algorithm scores each match with +1 and each mismatch with -1.
  12. etandem
    • Description : etandem is a tool to find tandem repeats in DNA sequences. The etandem algorithm computes consensus sequences and scores them using +1 for each match and -1 for each mismatch. The tool accepts A, C, G, T, and N characters as input and uses non-overlapping windows to search for putative repeated sequence stretches.
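The +1/-1 scoring scheme shared by equicktandem and etandem can be illustrated with a short sketch. It assumes a candidate region has already been cut into equal-length copies; the function name `consensus_and_score` and the majority-vote consensus are illustrative assumptions, not the EMBOSS implementation.

```python
from collections import Counter


def consensus_and_score(region, unit_len):
    """Build a column-wise majority consensus over repeat copies and score them.

    Each base that agrees with the consensus adds +1, each disagreement adds -1.
    A trailing partial copy is ignored.
    """
    copies = [region[i:i + unit_len]
              for i in range(0, len(region) - unit_len + 1, unit_len)]
    # Most frequent base in each column across all copies.
    consensus = "".join(Counter(column).most_common(1)[0][0]
                        for column in zip(*copies))
    score = sum(1 if base == cons else -1
                for copy in copies
                for base, cons in zip(copy, consensus))
    return consensus, score


print(consensus_and_score("ACGACGACTACG", 3))  # ('ACG', 10)
```

In the example, four copies of a 3-base unit give 11 matches and 1 mismatch against the consensus, hence a score of 10.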
  13. Genomon-ITDetector
    • Description : Genomon-ITDetector is a tool to discover regions duplicated in tandem, specifically in cancer genome sequencing data sets. Genomon-ITDetector depends on the following external tools and files: 1. blat (Ver. 34x13), 2. bedtools (Ver. 2.14.3), 3. CAP3 (Ver.Date: 12/21/07), 4. fasta36 (Ver. 3.5c), 5. SAMtools (Ver. 0.1.18), and 6. refGene.txt, knownGene.txt, ensGene.txt, and simpleRepeat.txt from the UCSC.
  14. GREAM
    • Description : GREAM (Genomic Repeat Element Analyzer for Mammals) is a tool to select, screen, and analyze genomic repeats in mammals that are likely to be important.

      GREAM offers the following listings and analyses:
      1. A categorized list of a wide range of statistically over- or under-represented repeated elements and of specific types, for example, transposons and retrotransposons.
      2. Enrichment within a specified region of a chromosome.
      3. Comparative distribution across the neighborhood of orthologous genes.
  15. HipSTR
    • Description : HipSTR (Haplotype inference and phasing for Short Tandem Repeats) is a tool to genotype and phase short tandem repeats (STRs) and to analyze and validate de novo STR mutations genome-wide. HipSTR also includes a function to visualize the supporting reads. The HipSTR algorithm uses an EM algorithm to learn locus-specific PCR stutter models, a hidden Markov model (HMM) to align reads to candidate alleles while avoiding STR artifacts, and phased SNP haplotypes for genotyping and phasing.
  16. Kmer-SSR
    • Description : Kmer-SSR is a tool to detect simple sequence repeats (SSRs) in genomic sequences. The Kmer-SSR algorithm has an option for an exhaustive search.
  17. lobSTR
    • Description : lobSTR is a tool to align and genotype short tandem repeat profiles from high-throughput sequencing data. The lobSTR algorithm uses concepts from signal processing and statistical learning methods to avoid gapped alignment and to filter noise.
  18. LTR_Finder
    • Description : LTR_Finder (Long Terminal Repeat Finder) is a web-based tool to find full-length LTR retrotransposons in genome sequences.
  19. mreps
    • Description : mreps is a tool to identify tandem repeats in DNA sequences.
  20. palindrome
    • Description : palindrome is a tool to find inverted repeats (palindromes, stem-loops) in nucleotide sequences. The palindrome algorithm detects all inverted repeats given a minimum and a maximum length, a maximum gap, and a maximum number of mismatches.
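How these four parameters interact can be shown with a brute-force sketch. It is an illustration only, not the EMBOSS algorithm: the function name, the default values, and the output tuple layout are assumptions. For every left-arm end position and gap size, it extends both arms outward while the bases are complementary, tolerating a limited number of mismatches.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def inverted_repeats(seq, min_len=4, max_len=20, max_gap=3, max_mismatches=0):
    """Return (left_start, left_end, right_start, right_end) tuples, 0-based inclusive."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for left_end in range(n):              # last base of the left arm
        for gap in range(max_gap + 1):     # unpaired bases between the arms
            right_start = left_end + gap + 1
            arm = 0
            mismatches = 0
            while arm < max_len and left_end - arm >= 0 and right_start + arm < n:
                pairs = COMPLEMENT.get(seq[left_end - arm]) == seq[right_start + arm]
                if not pairs:
                    if mismatches == max_mismatches:
                        break
                    mismatches += 1
                arm += 1
            if arm >= min_len:
                hits.append((left_end - arm + 1, left_end,
                             right_start, right_start + arm - 1))
    return hits


# Stem CCGA / TCGG separated by a 2-base loop.
print(inverted_repeats("CCGATTTCGG", min_len=4, max_gap=2))  # [(0, 3, 6, 9)]
```

The sketch reports only the longest arm for each (position, gap) pair and treats any character outside A, C, G, T as a mismatch.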
  21. PLOTREP
    • Description : PLOTREP is a web-based tool to visualize dispersed genomic repeats. The PLOTREP algorithm merges similar repeat copies and visualizes the results in a way similar to dot plots.
  22. Repbase
    • Description : Repbase is a web-based database consisting of eukaryotic transposable elements (TEs) and repeat sequence elements. Some sections of the service require registration.
  23. Repeat Enrichment Estimator
    • Description : Repeat Enrichment Estimator is a web-based tool to estimate the enrichment of repetitive elements from short-read sequencing data. The Repeat Enrichment Estimator processes simple paired ChIP-control datasets as well as large datasets. The web site contains assemblies for Human hs36, Mouse mm9, and Drosophila dm3.
  24. RepeatAnalyzer
    • Description : RepeatAnalyzer is a tool to store, manage, and analyze short sequence repeats (SSRs) to identify strains. RepeatAnalyzer uses Anaplasma marginale as a model species, but the tool can analyze any SSRs in any species. The RepeatAnalyzer algorithm uses regional genetic diversity as part of its analyses and has functions for visualizing genotype and SSR distributions.
  25. RepeatExplorer
    • Description : RepeatExplorer is a pipeline tool for the identification and characterization of DNA repeats in plant and animal genomes using high-throughput data sets. The RepeatExplorer algorithm utilizes graph-based clustering.
  26. RepeatMasker
    • Description : RepeatMasker is a tool to detect repeats and low-complexity DNA sequences. RepeatMasker can use nhmmer, cross_match, ABBlast, WUBlast, RMBlast, and Decypher for repeat detection, as well as the Dfam and Repbase libraries. RepeatMasker outputs an annotation and a FASTA file with the repeats masked, i.e., replaced by Ns by default.
  27. RepeatModeler
    • Description : RepeatModeler is a tool to find transposable elements and consists of three programs to compute repeat boundaries and classify family relationships from DNA sequence data sets.
  28. RepeatRunner
    • Description : RepeatRunner is a pipeline tool to identify repeated sequences in DNA sequences. The RepeatRunner algorithm combines RepeatMasker searches of nucleotide libraries of known repeats with BLASTX searches.
  29. REPET
    • Description : REPET is a pipeline tool to detect, annotate, and analyze repeats in genomic DNA sequences. REPET has two separate pipelines, specifically designed for the analysis of transposable elements (TEs):
      1. TEdenovo compares a genome to itself using BLASTER and clusters the results using GROUPER, RECON, and PILER, which are clustering tools specific to interspersed repeated sequences. It builds a multiple sequence alignment for each cluster and classifies each cluster by specific transposable element features to create a non-redundant library of consensus sequences.
      2. TEannot uses BLASTER, RepeatMasker, and CENSOR to annotate the genome with the library that TEdenovo created. TEannot also annotates short, simple repeats (SSRs) using TRF, RepeatMasker, and MREPS.
  30. REPuter
    • Description : REPuter is a tool to study repetitive DNA on a genomic scale. The REPuter algorithm detects various types of repeats, reports statistical significance, and has interactive visualization.
  31. ReUPred
    • Description : ReUPred (Repetitive Units Predictor) is a tool to predict and classify repeat units. The ReUPred algorithm uses Structure Repeat Unit Library (SRUL) derived from RepeatsDB.
  32. Satellog
    • Description : Satellog is a database to identify and dynamically prioritize repeats by using various characteristics, for example, repeat unit, repeat length percentile rank, class, period, total length, genomic coordinates, UniGene polymorphism profile, proximity to or presence within gene regions, such as CDS, UTR, location upstream.
  33. SBARS
    • Description : SBARS (Spectral-Based Approach for Repeats Search) is a tool to identify various types of repeats. The SBARS algorithm uses spectral methods to profile nucleotide sequences on multiple scales to decrease the running time for creating dot plots.
  34. STRScan
    • Description : STRScan is a tool to profile short tandem repeats (STRs) in high-throughput sequencing data sets and is useful for human identity testing.
  35. TRAP
    • Description : TRAP (the Tandem Repeats Analysis Program) is a tool to select, classify, quantify, and automatically annotate sequences repeated in tandem. The TRAP algorithm utilizes the results from the Tandem Repeats Finder to analyze the satellite content of DNA sequences.
  36. TRF
    • Description : Tandem Repeats Finder is a tool to find tandem repeats in DNA sequences. The Tandem Repeats Finder algorithm uses k-tuples for matching to speed up the computation and computes consensus sequences.
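The role of k-tuple matching in tandem repeat detection can be sketched as follows. This is a simplified illustration and not the Tandem Repeats Finder algorithm itself, which adds statistical criteria and alignment on top. The function name `candidate_periods` and its default parameters are assumptions. The idea is that when the same k-tuple recurs at a fixed distance many times, that distance is a candidate repeat period.

```python
from collections import Counter


def candidate_periods(seq, k=3, max_period=50):
    """Count distances between consecutive occurrences of each k-tuple."""
    last_seen = {}         # k-tuple -> start position of its latest occurrence
    distances = Counter()  # distance -> number of k-tuple pairs at that distance
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            d = i - last_seen[word]
            if d <= max_period:
                distances[d] += 1
        last_seen[word] = i
    return distances


counts = candidate_periods("ACGTT" * 4)
print(counts.most_common(1))  # [(5, 13)]
```

Because only a dictionary lookup is needed per position, this kind of pass is linear in the sequence length, which is why k-tuple matching is a common way to narrow down candidates before more expensive alignment.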

