Title

A Hybrid Stemmer of Punjabi Shahmukhi Script

Author

Abdul Mateen, M. Kamran Malik, Zubair Nawaz, H. M. Danish, M. Hassan Siddiqui, Qaiser Abbas

Citation

Vol. 17, No. 8, pp. 90-97

Abstract

Stemming is a heuristic process that chops off the end of a word, and sometimes appends additional letters, to obtain the basic meaningful form of a surface word. The basic goal of stemming is to reduce the inflectional forms of words to their root words using multiple techniques. In this paper, a hybrid approach is used for stemming Punjabi words. No stemmer has previously been reported for the Punjabi Shahmukhi script. We used a database lookup approach together with rule-based stemming for the Punjabi stemmer. Our dataset consists of 2.5 million tokens, divided into three parts of 1,500,000, 500,000 and 500,000 tokens and used for training, development and testing respectively. Using 63 rules, our stemmer achieved 86.01% accuracy when tested on the above dataset.
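
The lookup-then-rules flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the lookup table, the suffix rules, the minimum stem length and the function names are all hypothetical, and the toy entries are English words standing in for Shahmukhi forms, since the paper's 63 rules are not listed here.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table (surface form -> root). Toy English entries,
# not taken from the paper's database.
LOOKUP: Dict[str, str] = {"children": "child", "went": "go"}

# Hypothetical suffix rules: (suffix to strip, letters to append),
# ordered longest suffix first. These are NOT the paper's 63 rules.
RULES: List[Tuple[str, str]] = [("ies", "y"), ("ing", ""), ("es", ""), ("s", "")]

# Assumed guard so that very short words are not stripped to nothing.
MIN_STEM_LEN = 2


def stem(word: str) -> str:
    """Return a root form: try the lookup table first, then the suffix rules."""
    token = word.strip().lower()  # simple stand-in for a normalization step
    if token in LOOKUP:
        return LOOKUP[token]
    for suffix, replacement in RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            # Chop the suffix and, where the rule says so, add letters back.
            return token[: len(token) - len(suffix)] + replacement
    return token  # no rule applied; treat the token as its own root


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (word, expected root) pairs that the stemmer gets right."""
    correct = sum(1 for word, root in pairs if stem(word) == root)
    return correct / len(pairs)
```

Checking the lookup table before the rules lets irregular forms bypass suffix stripping, which is one plausible reason to combine the two approaches; the accuracy helper mirrors the kind of evaluation reported above, although the paper's exact scoring procedure is not described here.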

Keywords

Rule-based stemmer, morphology, lookup approach, root words, hybrid stemmer, affixes, normalization.

URL

http://paper.ijcsns.org/07_book/201708/20170813.pdf