
Title

SMOTE-GBM: An Improved Classification Model for Early Folding Residues During Protein Folding

Author

Isra Al-Turaiki

Citation

Vol. 20, No. 3, pp. 217-222

Abstract

Proteins are fundamental molecules that play important roles in the cell. The function and behavior of a protein are determined by its native structure. However, the protein folding process is not well understood. Machine learning algorithms have been widely used to solve bioinformatics problems, and building predictive models from early folding residues (EFRs) has recently been investigated. However, the datasets used suffer from class imbalance, which makes the classification task difficult. In this paper, we address the class imbalance problem in an EFR dataset using the synthetic minority oversampling technique (SMOTE). We trained an ensemble model, the gradient boosted machine (GBM), on the balanced dataset. We then compared the performance of our trained model with that of other models in the literature. Our experimental results indicate that better classification performance is obtained when oversampling is used to overcome the class imbalance problem. In particular, the improvement was observed in the precision, recall, and F-measure values.
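The oversampling step and the reported metrics can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `smote` and `precision_recall_f1` are hypothetical, the data are made-up toy points rather than the EFR dataset, and the GBM training step is omitted. It only shows the core SMOTE idea (each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours) and how precision, recall, and F-measure are computed.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced data (illustrative values only, not the EFR dataset).
majority = [(float(x), float(y)) for x in range(5) for y in range(4)]  # 20 points
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5), (12.0, 11.0)]    # 4 points

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

In practice, the balanced samples would then be passed to a gradient boosted classifier, and the metrics would be computed on held-out data that was not oversampled. Existing libraries provide maintained versions of both steps (for example, `SMOTE` in imbalanced-learn and `GradientBoostingClassifier` in scikit-learn); whether the paper used these particular tools is not stated in the abstract.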

Keywords

Early folding residue (EFR), Machine learning, Synthetic minority oversampling technique (SMOTE), Ensemble, Gradient boosted machine (GBM)

URL

http://paper.ijcsns.org/07_book/202003/20200328.pdf