Title

Hybrid Clustering Approach for Concept Generation

Author

K. Thammi Reddy, M. Shashi, L. Pratap Reddy

Citation

Vol. 7, No. 4, pp. 62-69

Abstract

Information retrieval is a major research area owing to the accumulation of huge amounts of information in digital form. Many information retrieval techniques rest on the observation that the terms present in a document, together with their frequencies of occurrence, signify the semantics of the document. Recent attempts to find the documents relevant to a context represent each document in a Latent Semantic Indexing (LSI) model as a document-term vector holding a term weight for every index term in that document. Because the number of index terms is enormous, this leads to a high-dimensionality problem. The dimensionality can be reduced based on the observation that groups of terms associated with related concepts tend to occur together in a document, or to be absent together, depending on whether the document is relevant to that concept. Such a group of terms is identified as a Concept and can be viewed as a single dimension in a rough-set-based information retrieval system. In this paper we present a hybrid clustering approach for forming equivalence classes of terms associated with related concepts. It uses the outcome of hierarchical clustering to provide seed points for the Incremental K-means algorithm. Owing to the sparsity of the term vectors, the cosine similarity estimate is found to be less effective for term clustering. Another promising proximity estimate generally used in information retrieval is the Euclidean distance, but it is biased towards changes in term frequencies in larger documents when the term weights are represented by term frequency-inverse document frequency (tf-idf) estimates. In this paper we propose a new term weight estimate, namely term probability-inverse document frequency (tp-idf), for representing a term as a vector before clustering the terms.
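
The seeding idea described in the abstract (hierarchical clustering supplies the starting points for a K-means pass) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the centroid-linkage merge rule, the toy term vectors, and the use of plain batch K-means in place of the paper's Incremental K-means are all assumptions made for illustration. The term weights are arbitrary numbers, not tp-idf values, since the abstract does not give the tp-idf formula.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def hierarchical_seeds(vectors, k):
    """Agglomerative clustering (centroid linkage, an assumed choice)
    merged down to k clusters; returns the k cluster centroids."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans(vectors, seeds, max_iter=100):
    """Plain batch K-means started from the given seeds
    (a stand-in for the paper's Incremental K-means)."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: euclidean(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

def hybrid_cluster(term_vectors, k):
    """Hierarchical clustering provides seeds; K-means refines them."""
    seeds = hierarchical_seeds(term_vectors, k)
    return kmeans(term_vectors, seeds)

# Hypothetical term vectors: one row per term, one weight per document.
terms = {
    "cluster":   [0.9, 0.8, 0.0, 0.1],
    "kmeans":    [0.8, 0.9, 0.1, 0.0],
    "retrieval": [0.0, 0.1, 0.9, 0.8],
    "query":     [0.1, 0.0, 0.8, 0.9],
}
labels, centers = hybrid_cluster(list(terms.values()), 2)
concepts = {}
for term, label in zip(terms, labels):
    concepts.setdefault(label, []).append(term)
```

In this toy run, terms that co-occur in the same documents end up in the same group, which corresponds to the notion of a Concept as a single dimension. The pairwise merge loop is quadratic per step and is only suitable for small vocabularies.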

Keywords

Hierarchical clustering, Partitional clustering, Text mining, Dimensionality reduction, Proximity estimate.

URL

http://paper.ijcsns.org/07_book/200704/20070408.pdf