All Published Papers Search Service

Title

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

Author

Muhammad Badruddin Khan

Citation

Vol. 21, No. 2, pp. 229-237

Abstract

Our modern ‘information-hungry’ age demands the delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from all walks of life in a number of ways. As the world has become a global village, the volume and speed of news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. Urdu is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables this community to improve its understanding of the world and make its decisions. This paper uses available online Urdu news data to train machines to automatically categorize news items. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The highest accuracy achieved on the dataset was 94.278% by the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for text categorization.
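To make the headline-classification idea concrete, here is a minimal sketch of a Multinomial Naïve Bayes classifier with stop-word removal, written in plain Python. It is not the paper's implementation: the stop-word list, the four toy headlines, the category labels, and the function names (`tokenize`, `train_multinomial_nb`, `predict`) are all assumptions invented for illustration, and add-one (Laplace) smoothing is an assumed choice.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only (not the paper's list).
STOP_WORDS = {"کی", "کا", "نے", "میں", "کو"}

# Toy training headlines invented for this sketch (not the paper's dataset).
TRAIN = [
    ("ٹیم نے میچ جیت لیا", "sports"),
    ("کرکٹ ٹیم کا اعلان", "sports"),
    ("حکومت نے بجٹ پیش کیا", "politics"),
    ("وزیراعظم کا حکومت سے خطاب", "politics"),
]

def tokenize(headline):
    """Split a headline on whitespace and drop stop words."""
    return [tok for tok in headline.split() if tok not in STOP_WORDS]

def train_multinomial_nb(examples):
    """Count headlines per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def predict(model, headline):
    """Return the class with the highest log-posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(headline):
            if tok not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothed word likelihood
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_multinomial_nb(TRAIN)
```

Calling `predict(model, headline)` on a new headline returns one of the toy labels. A Bernoulli variant would instead model the presence or absence of each vocabulary word rather than its count.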

Keywords

Text categorization, Machine learning, Naïve Bayes, Support vector machine, Logistic regression, Word Cloud, Urdu language

URL

http://paper.ijcsns.org/07_book/202102/20210227.pdf