MiPepid
MiPepid is a machine learning tool for identifying micropeptides, small proteins of at most 100 amino acids, from DNA sequences. Short open reading frames (sORFs) that could encode micropeptides were traditionally ignored because of technical difficulties and the lack of experimentally confirmed small peptides. In recent years, however, many micropeptides have been shown to play significant roles in vital biological activities.
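Micropeptides are encoded by short ORFs, so working from DNA sequences starts with locating candidate sORFs. This entry does not say how MiPepid itself enumerates ORFs (strands, alternative start codons, nested ORFs). The following Python sketch is only an illustration under stated assumptions. It scans the forward strand for ATG-initiated ORFs that end at an in-frame stop codon. For each stop codon it keeps the most upstream ATG, and it applies the 100-amino-acid cutoff from the definition above. The function name `find_short_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_short_orfs(seq, max_aa=100):
    """Hypothetical helper (not part of MiPepid).

    Return 0-based, half-open (start, end) coordinates of forward-strand ORFs
    that begin with ATG, end with an in-frame stop codon (included in `end`),
    and encode at most `max_aa` amino acids (the stop codon is not counted).
    Assumption: only the forward strand and ATG starts are considered.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Scan forward in the same frame for the first stop codon.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream, so no further ORF in this frame
            if (j - i) // 3 <= max_aa:
                orfs.append((i, j + 3))
            # Resume after the stop codon: nested ATGs sharing this stop are
            # skipped, so each stop codon yields at most one (the longest) ORF.
            i = j + 3
    return sorted(orfs)


print(find_short_orfs("CCATGAAATAGGG"))  # [(2, 11)]
```

Keeping only the longest ORF per stop codon is one common convention. A real pipeline might instead report every in-frame ATG, or also scan the reverse complement.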
Existing tools for classifying ORFs as coding or noncoding were built on datasets that treated "normal-sized" proteins as positives and short ORFs as noncoding. Because the functional and biophysical constraints on small peptides differ from those on normal-sized proteins, MiPepid was trained specifically on short ORFs to predict whether they are translated into micropeptides.
MiPepid uses logistic regression on 4-mer features and requires only the sequence of an ORF to predict whether it encodes a micropeptide. The tool was trained on carefully cleaned data from existing databases and achieves 96% accuracy on a blind test set of high-confidence micropeptides. It also correctly classifies newly discovered micropeptides that were not included in the training or blind test data.
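This entry does not detail MiPepid's feature encoding or training procedure. As a rough illustration of the general technique, logistic regression over 4-mer features computed from the ORF sequence alone, the pure-Python sketch below makes two assumptions. First, the features are relative frequencies of overlapping 4-mers. Second, an unregularized model is fitted by batch gradient descent. The function names, hyperparameters and toy sequences are hypothetical, and this is not MiPepid's code or its trained model.

```python
import math
from itertools import product

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 DNA 4-mers
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(orf):
    """Relative frequencies of overlapping 4-mers in an ORF (256 values).
    Assumption: k-mers containing non-ACGT symbols are skipped."""
    orf = orf.upper()
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(orf) - K + 1):
        idx = KMER_INDEX.get(orf[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that the ORF with feature vector x encodes a micropeptide."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label  # d(log-loss)/dz
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy, made-up training ORFs: label 1 = micropeptide-coding, 0 = noncoding.
train_orfs = ["ATG" + "GCC" * 20 + "TAA", "ATG" + "TTA" * 20 + "TGA"]
labels = [1, 0]
w, b = train_logreg([kmer_features(s) for s in train_orfs], labels)
p = predict_proba(w, b, kmer_features("ATG" + "GCC" * 15 + "TAG"))
```

A linear model over a fixed 256-dimensional k-mer vector is cheap to train and to apply to millions of short ORFs, which is one reason this kind of model suits sORF screening. For actual predictions, use the released command-line tool and its trained model as described in the project README.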
Topic
Functional, regulatory and non-coding RNA;Machine learning;Small molecules;Gene transcripts;Biophysics
Detail
Operation: Coding region prediction;Regression analysis;miRNA expression analysis
Software interface: Command-line user interface
Language: Python
License: Not stated
Cost: Free of charge
Version name: -
Credit: -
Input: -
Output: -
Contact: Michael Gribskov mgribsko@purdue.edu
Collection: -
Maturity: -
Publications
- MiPepid: MicroPeptide identification tool using machine learning.
- Zhu M and Gribskov M. MiPepid: MicroPeptide identification tool using machine learning. BMC Bioinformatics. 2019; 20:559. doi: 10.1186/s12859-019-3033-9
- https://doi.org/10.1186/s12859-019-3033-9
- PMID: 31703551
- PMC: PMC6842143
Download and documentation
Documentation: https://github.com/MindAI/MiPepid/blob/master/README.md
Home page: https://github.com/MindAI/MiPepid
Data: https://github.com/MindAI/MiPepid/blob/master/datasets.tar.gz