Performing Natural Language Processing on Roman Urdu Datasets


Zareen Sharf and Dr Saif Ur Rahman


Vol. 18  No. 1  pp. 141-148


This work is a precursor to a larger task that requires discourse-based sentiment analysis on Roman Urdu datasets. To perform this task, we first collected a large Roman Urdu corpus from social media websites. Next, we cleaned the raw data, lexically normalized it to obtain a standard representation of words, performed POS tagging so that the words could be tokenized meaningfully, and finally identified the presence or absence of a discourse element. Having completed these tasks, we are now ready to perform neural-network-based sentiment analysis on the Roman Urdu dataset, taking discourse into consideration, as our future work.
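
The preprocessing steps named in the abstract (cleaning, lexical normalization, and discourse detection) might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the variant map, the discourse marker list, and all function names are hypothetical, and POS tagging is omitted because the abstract does not say which tagger was used.

```python
import re

# Hypothetical variant map: illustrative spellings only, not the authors' lexicon.
NORMALIZATION_MAP = {
    "nhi": "nahi",
    "nahin": "nahi",
    "achha": "acha",
    "accha": "acha",
}

# Hypothetical discourse markers (roughly "but", "but", "because").
DISCOURSE_MARKERS = {"lekin", "magar", "kyunke"}

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean(text):
    """Lowercase, drop URLs and non-letter characters, split on whitespace."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return text.split()


def normalize(tokens):
    """Map known spelling variants to one standard form; keep other tokens."""
    return [NORMALIZATION_MAP.get(tok, tok) for tok in tokens]


def has_discourse_marker(tokens):
    """Report whether any token is in the assumed discourse marker list."""
    return any(tok in DISCOURSE_MARKERS for tok in tokens)


def preprocess(text):
    """Run cleaning and normalization, then flag discourse presence."""
    tokens = normalize(clean(text))
    return tokens, has_discourse_marker(tokens)
```

For example, calling preprocess("Film achha tha, LEKIN ending nhi pasand ayi") would return the normalized tokens together with a flag showing that a discourse marker ("lekin") is present. A dictionary lookup is the simplest form of lexical normalization; a real Roman Urdu corpus would likely need a much larger lexicon or a phonetic matching scheme.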


Natural Language Processing, POS Tagging, Discourse units, Roman Urdu Data