Effect of Pruning on Feature Ranking Metrics in Highly Skewed Datasets in Text Classification


Muhammad Nabeel Asim, Abdur Rehman and Muhammad Idrees


Vol. 17  No. 10  pp. 135-144


A variety of feature ranking (FR) algorithms are available for selecting appropriate features from text data for a classification task. To improve the feature selection process, the data is preprocessed to remove terms that are too frequent or too rare, a step called pruning. Although not required for non-text data, pruning has become an essential step for simplifying the feature selection of text data, which results in improved overall classification performance. In this paper, we study the effect of pruning on eight well-known feature selection metrics, namely NDM, IG, ODDS, CHI, DFS, POIS, GINI and ACC2. The FR metrics are evaluated using the micro and macro F1 measures with an SVM classifier. Experimental results on five benchmark datasets, WAP, RE0, RE1, K1a and K1b, show that pruning adversely affects three feature ranking algorithms, IG, DFS and ACC2, for which it reduces the overall efficiency of classification, whereas pruning improves classification performance for the remaining five FR metrics.
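
As an illustration of the pruning step described above, the following minimal sketch drops terms by document frequency. It is not the authors' implementation: the function name `prune_vocabulary`, the thresholds `min_df` and `max_df_ratio`, the whitespace tokenizer and the toy documents are all assumptions made for this example, and the abstract does not state the thresholds used in the paper.

```python
from collections import Counter


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.5):
    """Return the terms kept after document-frequency pruning.

    min_df and max_df_ratio are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term at most once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return {
        term
        for term, count in doc_freq.items()
        # Drop terms that are too rare or too frequent.
        if count >= min_df and count / n_docs <= max_df_ratio
    }


# Toy corpus, invented for this example.
docs = [
    "the market rallied today",
    "the market fell sharply",
    "the team won the match",
    "the team lost today",
]
kept = prune_vocabulary(docs, min_df=2, max_df_ratio=0.5)
print(sorted(kept))  # ['market', 'team', 'today']
```

Here "the" is removed as too frequent (it appears in every document) and terms such as "rallied" are removed as too rare (one document each). A feature ranking metric would then be applied only to the terms that remain.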


Text Classification, ranking algorithms