Part 4: Introduction to Information Theory and Its Applications to DNA and Protein Sequence Alignments


33 Software Tools To Generate Sequence Logos

Online Tools

1. kpLogo (k-mer probability logo), an integrated framework for sensitive detection and visualization of position-specific ultra-short motifs in either weighted or unweighted sequences, aimed especially at high-throughput studies such as in vitro selection. The source code is freely available for download. kpLogo is also available as a Galaxy tool in the Galaxy Tool Shed.

Publication: Xuebing Wu and David P. Bartel (2017). "kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences." Nucleic Acids Res. 2017 Jul 3; 45(Web Server issue): W534–W538. PMC

2. Gene Slider visualizes entropy and conservation of long DNA and protein sequences. Its zoom function shows either fine details or an entire sequence. It is also available as an app on the http://araport.org website, implemented in JavaScript and Processing.js. In addition, the source code is available at SourceForge under the GNU GPLv2 license.

Publication: Waese J, Pasha A, Wang TT, van Weringh A, Guttman DS, Provart NJ. (2016). "Gene Slider: sequence logo interactive data-visualization for education and research." Bioinformatics. 2016 Dec 1;32(23):3670-3672. Epub 2016 Aug 13.

3. The iceLogo web server and SOAP service. Instead of assuming fixed background probabilities (0.25 for each nucleotide, 0.05 for each amino acid), iceLogo compares peptide sequences against a reference sequence set and visualizes over- and underrepresented residues. The software is free for all users.

Publication: Davy Maddelein, Niklaas Colaert, Iain Buchanan, Niels Hulstaert, Kris Gevaert, and Lennart Martens. (2015). "The iceLogo web server and SOAP service for determining protein consensus sequences." Nucleic Acids Res. 2015 Jul 1; 43(Web Server issue): W543–W546. PMC

Publication: Niklaas Colaert, Kenny Helsens, Lennart Martens, Joël Vandekerckhove & Kris Gevaert. (2009). "Improved visualization of protein consensus sequences by iceLogo." Nature Methods volume 6, pages 786–787 (2009)

4. LogOddsLogo is based on the WebLogo 3 source code and uses per-observation multiple-alignment log-odds scores for DNA and protein sequences. It aims for simple usage. The source code is freely available for download at the FTP site.

Publication: Yu YK, Capra JA, Stojmirovic A, Landsman D, Altschul SF. (2015). "Log-odds sequence logos". Bioinformatics 31:324-31. PubMed

5. pLogo, a probabilistic approach to visualizing sequence motifs, displays motifs in which residue heights are proportional to their statistical significance. License: Free for academic and non-profit organizations.

Publication: Joseph P O’Shea, Michael F Chou, Saad A Quader, James K Ryan, George M Church & Daniel Schwartz. (2013). "pLogo: a probabilistic approach to visualizing sequence motifs." Nat Methods 10, 1211-1212. doi:10.1038/nmeth.2646

6. BlockLogo visualizes continuous and discontinuous protein sequence motifs. A user can select motif positions. In addition to a sequence logo, the program outputs a table of motif frequencies.

Publication: Lars Rønn Olsen, Ulrich Johan Kudahl, Christian Simon, Jing Sun, Christian Schönbach, Ellis L. Reinherz, Guang Lan Zhang, Vladimir Brusic. (2013). "BlockLogo: Visualization of peptide and sequence motif conservation." J Immunol Methods. 2013 Dec 31;400-401:37-44. doi: 10.1016/j.jim.2013.08.014. Epub 2013 Aug 31. PubMed

7. Seq2Logo for visualization of amino acid binding motifs. Specific features: sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. The software is also available for download.

Publication: Martin Christen Frolund Thomsen; Morten Nielsen. (2012). "Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion." Nucleic Acids Research 2012; 40 (W1): W281-W287.

8. The RILogo web server creates logos for two interacting RNAs, downloadable in SVG, EPS, and PNG formats. The server accepts input in FASTA, Stockholm, and CLUSTAL formats. The source code is covered by the GNU LGPL 3 license and can be downloaded from GitHub.

Publication: Peter Menzel, Stefan E. Seemann, Jan Gorodkin. (2012). "RILogo: visualizing RNA–RNA interactions." Bioinformatics, Volume 28, Issue 19, 1 October 2012, Pages 2523–2526, https://doi.org/10.1093/bioinformatics/bts461

9. The MHC Motif Viewer visualizes MHC binding motifs. Clickable motifs are available for Human, Swine, Murine, Gorilla, Chimpanzee, and Macaque alleles.

Publication: Nicolas Rapin, Ilka Hoof, Ole Lund, and Morten Nielsen. (2010). "The MHC Motif Viewer: A Visualization Tool for MHC Binding Motifs." Current Protocols in Immunology. doi:10.1002/0471142735.im1817s88

10. The MEME Suite: motif-based sequence analysis tools.

Publication: Timothy L. Bailey, Mikael Bodén, Fabian A. Buske, Martin Frith, Charles E. Grant, Luca Clementi, Jingyuan Ren, Wilfred W. Li, William S. Noble. (2009). "MEME SUITE: tools for motif discovery and searching", Nucleic Acids Research, 37:W202-W208, 2009

11. CorreLogo: An online server for 3D sequence logos of RNA and DNA alignments. The server produces output in VRML and JVX formats, which require a VRML viewer or JavaView, respectively. Both are freely downloadable.

Publication: Eckart Bindewald, Thomas D. Schneider and Bruce A. Shapiro. (2006). "CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments." Nucleic Acids Research, 2006, Vol. 34, Web Server issue W405–W411 doi:10.1093/nar/gkl269, PDF

12. Two Sample Logo calculates and visualizes differences between two sets of aligned samples of protein or DNA sequences.

Publication: Vacic V., Iakoucheva L.M., and Radivojac P. (2006). "Two Sample Logo: A Graphical Representation of the Differences between Two Sets of Sequence Alignments." Bioinformatics, 22(12): 1536-1537. PMC

13. enoLOGOS: A versatile web tool for energy normalized sequence logos. The server did not respond at the time of writing.

Publication: Christopher T. Workman, Yutong Yin, David L. Corcoran, Trey Ideker, Gary D. Stormo, and Panayiotis V. Benos. (2005). "enoLOGOS: a versatile web tool for energy normalized sequence logos." Nucleic Acids Res. 2005 Jul 1; 33(Web Server issue): W389–W392. PMC

14. WebLogo 3 builds on the original sequence logo method of Tom Schneider and Mike Stephens and constructs sequence logos of DNA and RNA sequences, with a few additional features and enhancements. The package is available for download under an open-source license.

Publication: Crooks GE, Hon G, Chandonia JM, Brenner SE. (2004). "WebLogo: A sequence logo generator." Genome Research, 14:1188-1190, (2004)

15. Skylign, an online tool for creating logos of both sequence alignments and hidden Markov model profiles. Skylign also accepts POST and GET requests for remote access. For installing the server on your site, see installation instructions. License: Creative Commons Attribution 3.0 Unported License.

Publication: Benjamin Schuster-Böckler, Jörg Schultz and Sven Rahmann. (2004). "HMM Logos for visualization of protein families." BMC Bioinformatics 2004, 5:7 https://doi.org/10.1186/1471-2105-5-7

16. RNA Structure Logo extends the sequence logo of Schneider and Stephens by incorporating a prior nucleotide distribution and mutual information content, and by allowing gaps in the alignment.

Publication: J. Gorodkin, L. J. Heyer, S. Brunak and G. D. Stormo. (1997). "Displaying the information contents of structural RNA alignments: the structure logos." Comput. Appl. Biosci., Vol. 13, no. 6 pp 583-586, 1997.

17. Protein Sequence Logos is a variation of RNA Structure Logo using relative entropy. It has two display modes: (a) the height of a letter is proportional to its frequency, and (b) the height of a letter is proportional to the ratio of its observed frequency to its expected (a priori) frequency.

Publication: J. Gorodkin, L. J. Heyer, S. Brunak and G. D. Stormo. (1997). "Displaying the information contents of structural RNA alignments: the structure logos." Comput. Appl. Biosci., Vol. 13, no. 6 pp 583-586, 1997.

18. TINYRAY web-logo creates sequence logos based on Position Weight Matrices (PWMs). The PWM, introduced by Gary Stormo, is a common way to represent motifs or patterns in both DNA and protein sequences. Other names are position-specific weight matrix (PSWM) and position-specific scoring matrix (PSSM). TINYRAY also offers a POST request service that can be used by embedding a client-side application in your website.
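To make the PWM idea concrete, here is a minimal Python sketch of how a log-odds matrix might be built from a handful of aligned sites. It is not the method used by TINYRAY or any other tool listed here: the function names, the uniform background, the pseudocount of 1, and the example sites are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def position_weight_matrix(seqs, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds PWM (in bits) from equal-length aligned sequences.

    Assumptions for this sketch: a uniform background distribution and a
    simple additive pseudocount to avoid taking the logarithm of zero.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    background = 1.0 / len(alphabet)
    total = len(seqs) + pseudocount * len(alphabet)

    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        column = {}
        for symbol in alphabet:
            freq = (counts[symbol] + pseudocount) / total
            column[symbol] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[symbol] for column, symbol in zip(pwm, site))


# Hypothetical aligned sites, used only to exercise the functions.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = position_weight_matrix(sites)
print(score(pwm, "TATAAT"), score(pwm, "GGGGGG"))
```

A site that resembles the aligned examples scores higher than one that does not; a logo-drawing tool then turns such per-position values into letter heights.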

Other Software To Create Sequence Logos

1. motifStack, an R/Bioconductor package for plotting DNA/RNA sequence motifs, affinity logos, and amino acid sequence motifs.

Publication: Ou J, Wolfe SA, Brodsky MH, Zhu LJ. (2018). "motifStack for the analysis of transcription factor binding site evolution." Nature Methods, 15, 8-9. doi: 10.1038/nmeth.4555

2. CircularLogo, a web-based application that visualizes position-specific nucleotide consensus and diversity and displays intra-motif dependencies. It is implemented in JavaScript and Python based on the Django web framework. The source code is freely available at http://circularlogo.sourceforge.net. The web server at http://bioinformaticstools.mayo.edu did not respond at the time of writing.

Publication: Zhenqing Ye, Tao Ma, Michael T. Kalmbach, Surendra Dasari, Jean-Pierre A. Kocher and Liguo Wang. (2017). "CircularLogo: A lightweight web application to visualize intra-motif dependencies." BMC Bioinformatics 2017, 18:269 https://doi.org/10.1186/s12859-017-1680-2

3. ggseqlogo, an R package for generating publication-ready sequence logos using ggplot2.

Publication: Wagih, Omar. (2017). "ggseqlogo: a versatile R package for drawing sequence logos". Bioinformatics 33, no. 22 (2017): 3645-3647. https://doi.org/10.1093/bioinformatics/btx469 PMID: 29036507

4. Logolas, an R package for Enrichment Depletion Logo plots with string symbols, an extension to the seqLogo package. It has an adaptive scaling of position-weight matrices (PWMs).

Publication: Dey, K.K., Xie, D. and Stephens, M., 2017. "A new sequence logo plot to highlight enrichment and depletion." bioRxiv doi:10.1101/226597.

5. DiffLogo, an R package showing differences between DNA and protein motifs. It produces publication-ready visualizations.

Publication: Martin Nettling, Hendrik Treutler, Jan Grau, Jens Keilwagen, Stefan Posch and Ivo Grosse. (2015). "DiffLogo: a comparative visualization of sequence motifs." BMC Bioinformatics 2015, 16:387 https://doi.org/10.1186/s12859-015-0767-x

6. SequenceLogoVis aims to reduce perception difficulties by introducing a glyph-based approach. It is an easily customizable JavaScript library. The results can also be exported as scalable vector graphics (SVG) for publication. The source code is freely downloadable at GitHub.

Publication: E. Maguire, P. Rocca-Serra, S.-A. Sansone, and M. Chen. (2014). "Redesigning the sequence logo with glyph-based approaches to aid interpretation." In Proceedings of EuroVis 2014, Short Paper PDF

8. The JProfileGrid, an alternative to sequence logos. The software is free and available from http://www.ProfileGrid.org.

Publication: Alberto I Roca. (2014). "ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos." BMC Proc. 2014; 8(Suppl 2): S6. Published online 2014 Aug 28. doi: 10.1186/1753-6561-8-S2-S6

9. CodonLogo is based on WebLogo 3 and uses the same heuristics but treats codons as inseparable units of a 64-letter alphabet. The source code is available for download at http://recode.ucc.ie/CodonLogo and at the Galaxy Tool Shed.

Publication: Virag Sharma, David P. Murphy, Gregory Provan, and Pavel V. Baranov. (2012). "CodonLogo: a sequence logo-based viewer for codon patterns." Bioinformatics. 2012 Jul 15; 28(14): 1935–1936. Published online 2012 May 17. doi: 10.1093/bioinformatics/bts295

10. HOMER (Hypergeometric Optimization of Motif EnRichment), a suite of tools for Motif Discovery and next-gen sequencing analysis. A collection of command line programs in Perl and C++. Freely downloadable.

Publication: Heinz S, Benner C, Spann N, Bertolino E et al. (2010) "Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities." Mol Cell 2010 May 28;38(4):576-589. PMID: 20513432

11. LogoBar is a Java application to display protein sequence logos. Freely downloadable.

Publication: Pérez-Bercoff, Å., Koch, J. and Bürglin, T.R. (2006). "LogoBar: bar graph visualization of protein logos with gaps." Bioinformatics 22, 112-114.

12. berrylogo, a sequence logo alternative that plots log frequencies relative to a background frequency derived from GC content. The source is available for download at GitHub.

Publication: Charles Berry, Sridhar Hannenhalli, Jeremy Leipzig, and Frederic D Bushman. (2006). "Selection of Target Sites for Mobile DNA Integration in the Human Genome." PLoS Comput Biol. 2006 Nov; 2(11): e157. Published online 2006 Nov 24. doi: 10.1371/journal.pcbi.0020157

13. DNAlogo, a mini application written in Visual Basic. The exe file is freely downloadable at GitHub.

Publication: Yabin Guo. "DNAlogo: a smart mini application for generating DNA sequence logos." bioRxiv, The Preprint Server for Biology, https://doi.org/10.1101/096933

14. sequence_motifs.js, a jQuery plugin for making sequence motifs. Demo: jsfiddle.

15. RWebLogo, a wrapper for the WebLogo python package (BSD license)

Summary

Information is relative: it is measured as the decrease in uncertainty, \(H_{Before}-H_{After}\). Any decrease in entropy requires energy, and according to the second law of thermodynamics, the entropy of a closed system can only stay constant or increase.
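As a minimal sketch of the \(H_{Before}-H_{After}\) measure, the Python snippet below computes the information at a single alignment column. It assumes that \(H_{Before}\) is the maximum entropy of the alphabet and that \(H_{After}\) is the Shannon entropy of the observed column. The function names are illustrative, and the small-sample correction that sequence logo programs typically apply is left out.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_content(column, alphabet_size=4):
    """Information gained at one column: H_before - H_after.

    H_before is taken as log2(alphabet_size), i.e. complete uncertainty;
    H_after is the entropy of the observed column.
    """
    h_before = math.log2(alphabet_size)
    h_after = shannon_entropy(column)
    return h_before - h_after


# Hypothetical DNA alignment columns, one string per column.
for column in ("AAAA", "AACC", "ACGT"):
    print(column, information_content(column))
```

A fully conserved column yields the full two bits, while a column in which all four bases are equally frequent yields none.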

If the probabilities of the symbols in a message or any other sequence are not known, we can calculate the maximum entropy by taking the base-two logarithm of the number of distinct symbols that can occur. When the individual probabilities are taken into account, the entropy is lower than this maximum, except when all the symbols are equally likely.
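A short worked example may help here. The four-symbol alphabet and the probabilities below are made up for illustration. With \(n\) possible symbols the maximum entropy is

\[
H_{max} = \log_2 n , \qquad \text{so for } n = 4: \quad H_{max} = \log_2 4 = 2 \text{ bits.}
\]

If the symbols instead occur with probabilities \(p = (0.5,\ 0.25,\ 0.125,\ 0.125)\), the Shannon entropy is

\[
H = -\sum_{i=1}^{n} p_i \log_2 p_i = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.125 \cdot 3 + 0.125 \cdot 3 = 1.75 \text{ bits} < H_{max} .
\]

Only the uniform case \(p_i = 1/4\) reaches the full 2 bits.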

DNA can carry at most two bits of information per position, and protein sequences about 4.3 bits. In sequence alignments, however, the scoring systems in use usually yield a lower information content than the maximum possible.
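These maxima follow directly from the alphabet sizes, which a few lines of Python can confirm. The helper name is illustrative. The 64-letter codon alphabet is the one CodonLogo uses, as described in the tool list above.

```python
import math


def max_bits_per_position(alphabet_size):
    """Upper bound on information per position: log2 of the alphabet size."""
    return math.log2(alphabet_size)


# Alphabet sizes: 4 nucleotides, 20 amino acids, 64 codons.
for name, size in (("DNA", 4), ("protein", 20), ("codon", 64)):
    print(f"{name}: {max_bits_per_position(size):.2f} bits")
```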

References

Data Never Sleeps 5.0 domo.com

Szilard L. (1929). "Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen." Z. Phys. 53, 840–856. doi:10.1007/BF01341281.

Fisher R.A. (1935) "The logic of inductive inference." J. R. Stat. Soc. 98, 39–82. doi:10.2307/2342435.

Kullback S. (1959). "Information theory and statistics." New York, NY: Wiley.

Shannon C.E., Weaver W. (1962). "The mathematical theory of communication." Urbana, IL: The University of Illinois Press.

Jaynes E.T. (2003). "Probability theory. The logic of science." Cambridge, UK: Cambridge University Press.

Karnani M., Pääkkönen K., Annila A. (2009). "The physical character of information." Proc. R. Soc. A. 465 (2107): 2155–75. doi:10.1098/rspa.2009.0063.

Cargill Gilston Knott (1911). "Quote from undated letter from Maxwell to Tait". Life and Scientific Work of Peter Guthrie Tait. Cambridge University Press. pp. 213–215.

Shannon, Claude E. (July 1948). "A Mathematical Theory of Communication". Bell System Technical Journal. 27 (3): 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.

Shannon, Claude E. (October 1948). "A Mathematical Theory of Communication". Bell System Technical Journal. 27 (4): 623–666. doi:10.1002/j.1538-7305.1948.tb00917.x

Robert C Edgar, "Optimizing substitution matrix choice and gap parameters for sequence alignment." BMC Bioinformatics 2009, 10:396. doi:10.1186/1471-2105-10-396

S Henikoff and J G Henikoff, "Empirical determination of effective gap penalties for sequence comparison." Bioinformatics. 2002 Nov;18(11):1500-7. PubMed

Schneider TD; Stephens RM (1990). "Sequence Logos: A New Way to Display Consensus Sequences." Nucleic Acids Res. 18 (20): 6097–6100. doi:10.1093/nar/18.20.6097

Schneider TD; Stormo GD (1986). "Information content of binding sites on nucleotide sequences." Journal of Molecular Biology. 188 (3): 415–431. doi:10.1016/0022-2836(86)90165-8

Schneider TD (2002). "Consensus Sequence Zen." Appl Bioinform. 1 (3): 111–119. PMC 1852464

Anzaldi LJ; Muñoz-Fernández D; Erill I. (2012). "BioWord: a sequence manipulation suite for Microsoft Word." BMC Bioinformatics. 13 (124): 124. doi:10.1186/1471-2105-13-124
