A Comparative Study of Effective Supervised Learning Methods on Arabic Text Classification


Rachid Sammouda


Vol. 17  No. 12  pp. 130-133


Arabic Text Classification (ATC) is attracting researchers’ attention in many fields, including text mining, web search, social media, and security. Text Classification or Categorization (TC) is the process of assigning text documents to appropriate categories based on their contents. Few studies have compared supervised learning (SL) methods on ATC; this paper therefore presents such a comparison for Arabic documents. The approach adopted for this comparative study consists of three steps: (i) a document pre-processing step, in which Arabic stop words, punctuation, diacritics, and common prefixes and suffixes (using an Arabic light stemmer) are removed from the documents; (ii) a document filtering step, in which word strings are converted into numeric word vectors using the term frequency transform (TFT) technique, the inverse document frequency transform (IDFT) technique, or both; and (iii) a classification step, in which eight well-known, effective SL methods are compared on ATC. The impact of using TFT, IDFT, or both on the effectiveness of these SL methods is also studied. The results show that the LSVM classifier with the IDFT technique achieves the highest accuracy under 10-fold cross-validation among the SL methods used in this study. This outcome can serve as guidance for developers of ATC applications.
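
As an illustration only, steps (i) and (ii) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the stop-word, prefix, and suffix lists are tiny hypothetical samples, the stemming rule (strip one prefix and one suffix while at least three characters remain) is a simplification, and the weighting formulas log(1 + f) for TFT and f · log(N / n) for IDFT are assumed from the conventions of Weka's StringToWordVector filter. The function names `light_stem`, `preprocess`, and `vectorize` are introduced here for illustration.

```python
import math
import re
from collections import Counter

# Arabic diacritics (U+064B..U+0652) plus the tatweel elongation mark (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither a word character nor whitespace is treated as punctuation.
PUNCTUATION = re.compile(r"[^\w\s]")

# Hypothetical sample lists; a real system would use far more complete ones.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word):
    """Strip at most one prefix and one suffix, keeping at least 3 characters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text):
    """Step (i): remove diacritics, punctuation, and stop words, then stem."""
    text = DIACRITICS.sub("", text)     # delete marks without splitting words
    text = PUNCTUATION.sub(" ", text)   # punctuation becomes a token boundary
    return [light_stem(w) for w in text.split() if w not in STOP_WORDS]


def vectorize(docs, use_tf=False, use_idf=False):
    """Step (ii): turn documents into word vectors, optionally TF/IDF weighted."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    doc_freq = {t: sum(1 for c in counts if t in c) for t in vocab}
    vectors = []
    for c in counts:
        row = []
        for term in vocab:
            value = float(c[term])
            if use_tf:                  # assumed TFT: log(1 + f)
                value = math.log(1.0 + value)
            if use_idf:                 # assumed IDFT: f * log(N / n)
                value *= math.log(n_docs / doc_freq[term])
            row.append(value)
        vectors.append(row)
    return vocab, vectors
```

Calling `vectorize(docs)` yields raw counts, while `use_tf=True`, `use_idf=True`, or both reproduce the three weighting variants whose effect on the classifiers is compared. The resulting vectors could then be passed to any SL method evaluated with 10-fold cross-validation, which is outside the scope of this sketch.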


Text Classification of Arabic Documents, Supervised Learning Methods, Arabic Light Stemmer, Weka Tool