CoMSA is a compression and decompression tool for multiple sequence alignment (MSA) files in FASTA and Stockholm format. Its algorithm relies on a generalization of the positional Burrows-Wheeler transform (PBWT) to non-binary alphabets. The authors report that it is significantly faster than gzip. For example, it can compress a 41.6 Gb Stockholm file into 1.74 Gb, whereas gzip produces a 5.6 Gb file. Besides the source code, CoMSA is distributed as binaries for Windows and Linux.
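To show what a PBWT over a non-binary alphabet does, here is a minimal sketch. It is not CoMSA's implementation, and it assumes that aligned sequences are rows and that the transform runs column by column. At each column it emits the symbols in the current row order, then stably re-sorts the rows by that column's symbol, so rows sharing a (reversed) prefix end up adjacent. The function names `pbwt_columns` and `inverse_pbwt` are hypothetical.

```python
from typing import Dict, List


def _stable_regroup(order: List[int], symbol_of) -> List[int]:
    """Stable bucket sort of row indices by one symbol each (any alphabet)."""
    buckets: Dict[str, List[int]] = {}
    for i in order:  # iterating in current order keeps the sort stable
        buckets.setdefault(symbol_of(i), []).append(i)
    return [i for sym in sorted(buckets) for i in buckets[sym]]


def pbwt_columns(rows: List[str]) -> List[str]:
    """Hypothetical sketch: PBWT-permuted columns of an alignment of equal-length rows."""
    if not rows:
        return []
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same length")
    order = list(range(len(rows)))  # initial order: rows as given
    out = []
    for k in range(n_cols):
        # Emit column k with rows permuted by the order derived from columns < k.
        out.append("".join(rows[i][k] for i in order))
        # Re-sort by the symbol in column k (stable), extending the sort key.
        order = _stable_regroup(order, lambda i, k=k: rows[i][k])
    return out


def inverse_pbwt(cols: List[str]) -> List[str]:
    """Rebuild the original rows; the order depends only on already-decoded columns."""
    if not cols:
        return []
    rows: List[List[str]] = [[] for _ in range(len(cols[0]))]
    order = list(range(len(rows)))
    for col in cols:
        for pos, i in enumerate(order):  # col[pos] belongs to row order[pos]
            rows[i].append(col[pos])
        order = _stable_regroup(order, lambda i: rows[i][-1])
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    msa = ["AC-T", "GCAT", "AC-T"]
    cols = pbwt_columns(msa)
    print(cols)  # ['AGA', 'CCC', '--A', 'TTT']
    assert inverse_pbwt(cols) == msa
```

In the example, the third column changes from `-A-` to `--A`, which is a longer run of identical symbols. Runs like this are what a downstream coding stage would exploit. Which stages CoMSA applies after the transform is described in the paper cited below, not in this sketch.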
Data management
Deorowicz S, Walczyszyn J, Debudaj-Grabysz A "CoMSA: compression of protein multiple sequence alignment files." Bioinformatics. 2019 Jan 15;35(2):227-234. https://doi.org/10.1093/bioinformatics/bty619
PMID: 30010777