CoMSA is a compression and decompression tool for multiple sequence alignment (MSA) files in FASTA and Stockholm formats. Its algorithm is based on a generalization of the positional Burrows-Wheeler transform to non-binary alphabets. The authors report that it is significantly faster than gzip and achieves much better compression: for example, it compresses a 41.6 GB Stockholm file to 1.74 GB, whereas gzip produces a 5.6 GB file. In addition to the source code, precompiled binaries are available for Windows and Linux.
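To illustrate the general idea behind a positional Burrows-Wheeler transform over an arbitrary alphabet, here is a minimal Python sketch. It is not CoMSA's implementation, and the function names pbwt_columns and pbwt_inverse are hypothetical. The sketch assumes all rows of the alignment have the same length. Each column is emitted with its rows ordered by the reversed prefix seen so far, which tends to place identical symbols next to each other and so makes the column easier to compress; the inverse shows that the original alignment can be recovered.

```python
def pbwt_columns(rows):
    """Permute each alignment column by the order of the reversed row prefixes."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")
    order = list(range(len(rows)))  # current ordering of row indices
    out = []
    for k in range(width):
        # Emit column k with rows arranged in the current order.
        out.append("".join(rows[i][k] for i in order))
        # Stable sort by the symbol in column k: rows end up sorted by
        # their reversed prefix (columns k, k-1, ..., 0).
        order.sort(key=lambda i: rows[i][k])
    return out


def pbwt_inverse(columns):
    """Rebuild the original rows from the permuted columns."""
    if not columns:
        return []
    n = len(columns[0])
    rows = [[] for _ in range(n)]
    order = list(range(n))
    for col in columns:
        # Position pos in the permuted column belongs to row order[pos].
        for pos, i in enumerate(order):
            rows[i].append(col[pos])
        # Repeat the same stable sort the forward transform performed.
        order.sort(key=lambda i: rows[i][-1])
    return ["".join(r) for r in rows]


alignment = ["AKV", "BRI", "AKV", "BRI"]
transformed = pbwt_columns(alignment)
print(transformed)                # ['ABAB', 'KKRR', 'VVII']
print(pbwt_inverse(transformed))  # ['AKV', 'BRI', 'AKV', 'BRI']
```

In this toy alignment the second and third columns read KRKR and VIVI in the original row order, but become KKRR and VVII after the transform. Longer runs of this kind are what a subsequent entropy or run-length coder can exploit.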
Deorowicz S, Walczyszyn J, Debudaj-Grabysz A. "CoMSA: compression of protein multiple sequence alignment files." Bioinformatics. 2019 Jan 15;35(2):227-234. https://doi.org/10.1093/bioinformatics/bty619