Title

A Feature Selection Method Based on Concept Extraction and SOM Text Clustering Analysis

Author

Lin Wang, Minghu Jiang, Shasha Liao, Yinghua Lu

Citation

Vol. 6  No. 1  pp. 20~28

Abstract

Feature selection is an important part of automatic classification. In this paper, we use HowNet to extract concept attributes from words to build a feature set. However, because a concept definition is sometimes too weak in expressive power, we set a shielded level in the sememe tree, filter out the concept attributes that cannot give enough information for classification, and retain the word itself when its definition is too weak. In this way, we build a feature set composed of both HowNet sememes and Chinese words. When extracting sememes from a word, we also assign them different values according to their expressive ability and their relation to the word. After comparing weighting theories and classification precision, we propose the CHI-MCOR weighting method, which is derived from two standard methods. We then use the Self-Organizing Map (SOM) to perform automatic text clustering. The experimental results show that extracting the sememes properly not only reduces the feature dimension but also improves classification precision. The combined weighting method strikes a good balance between fuzzy words with a high occurrence and discriminating words with a middle or low occurrence, and its classification precision is higher than that of the other weighting methods. SOM can be used for large-scale text clustering, and the clustering results are good when concept features are selected. The between-cluster distance of texts represented by concept features is larger than that of texts represented by word features, although the word-feature data nevertheless exhibit some clusters.
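
As a rough illustration of the kind of term weighting the abstract refers to, the sketch below computes the standard chi-square (CHI) statistic for a single term and category from document counts. It is not the CHI-MCOR method itself, whose formula is not given here; the function name `chi_square` and the four count parameters are assumptions made for this example.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for one term/category pair.

    Hypothetical parameter names (not taken from the paper):
      a: documents in the category that contain the term
      b: documents outside the category that contain the term
      c: documents in the category that lack the term
      d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal means the term (or category) never varies,
        # so it carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


if __name__ == "__main__":
    # A term concentrated in one category scores higher than a term
    # spread evenly across categories.
    print(chi_square(10, 0, 0, 10))
    print(chi_square(5, 5, 5, 5))
```

A higher score suggests a stronger association between the term and the category, which is why terms with low scores are candidates for removal during feature selection.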

Keywords

Concept Attributes, Self-Organizing Map, Clustering, Text Classification

URL

http://paper.ijcsns.org/07_book/200601/200601A04.pdf