Title

A Compromise between N-gram Length and Classifier Characteristics for Protein Classification

Author

Faouzi Mhamdi, Ricco Rakotomalala, Mourad Elloumi

Citation

Vol. 6, No. 4, pp. 82-87

Abstract

Many scientific works address the protein classification problem, using a variety of learning methods and descriptors. In this paper, we systematically analyze how learning algorithms behave depending on the features extracted from the primary description of proteins. We use n-gram descriptors and test the interaction between the n-gram length n and the characteristics of the supervised learning methods. The main conclusion is that n-grams of moderate length (n = 2 or n = 3, ...) combined with a linear support vector machine (SVM) classifier give the best compromise. However, a thorough analysis of the results puts this conclusion into perspective: the main characteristic influencing the accuracy of the classifier seems to be the dimensionality of the representation space.
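
To illustrate why the n-gram length drives the dimensionality of the representation space, the following is a minimal sketch, not the authors' implementation. It assumes a 20-letter amino acid alphabet and overlapping n-grams; the function names `ngram_counts` and `max_dimensionality` and the toy sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: the 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_counts(sequence, n):
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    # range() is empty when the sequence is shorter than n, giving an empty Counter.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def max_dimensionality(n, alphabet=AMINO_ACIDS):
    """Upper bound on the number of distinct n-gram features: |alphabet| ** n."""
    return len(alphabet) ** n


# Hypothetical toy sequence, not taken from any real protein.
toy_sequence = "MKVLAAGMKV"

bigrams = ngram_counts(toy_sequence, 2)
trigrams = ngram_counts(toy_sequence, 3)

for n in (1, 2, 3, 4):
    print(n, max_dimensionality(n))
```

Under these assumptions the feature space grows by a factor of 20 with each increment of n, which is consistent with the abstract's remark that dimensionality, rather than the classifier alone, may explain the observed accuracy.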

Keywords

Data mining, Protein Classification, n-grams, KNN, SVM, CART

URL

http://paper.ijcsns.org/07_book/200604/200604A15.pdf