

A Model for Processing Arabic Text on Twitter


Mohamed Osman Hegazi, Yasser Al-Dossari, Abdullah Al-Yahya, Abdulaziz Al-Sumari, and Anwer Hilal


Vol. 20  No. 5  pp. 150-157


This paper proposes a model that can serve as a framework for preprocessing Arabic text from Twitter for data analysis and information extraction. The model collects Arabic text from Twitter online and stores it in a structured database. The source data are then preprocessed to derive clean, meaningful Arabic text from which information can be extracted. The paper presents new methods and algorithms for preprocessing unstructured Arabic text on social media and offers solutions to the difficulties of working with such text, including unclean, informal, and dialectal language. The preprocessed text is stored in structured database tables, yielding a data set to which information selection and data analysis algorithms can be applied. Implementing the model produces a full-featured dataset in which the text is available as the source data, the cleaned data, and separate Arabic words with their stems, roots, and morphologies, among other forms. In addition, the model shows how information can be selected and extracted from this dataset.
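The pipeline summarized in the abstract (collect, clean, split into words, store in structured tables) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cleaning rules, the function names `clean_tweet` and `store_tweets`, and the two-table SQLite schema are all assumptions made for illustration, and the stemming, root, and morphology steps mentioned in the abstract are left out.

```python
import re
import sqlite3

# Assumed cleaning rules; the abstract does not specify the paper's actual rules.
URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+")
DIACRITICS = re.compile(r"[\u064B-\u0652]")        # fathatan through sukun
TATWEEL = "\u0640"                                 # elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")     # keep Arabic letters and whitespace


def clean_tweet(text: str) -> str:
    """Return a simplified, normalized version of one tweet (hypothetical rules)."""
    text = URL_OR_MENTION.sub(" ", text)           # drop links and @mentions
    text = DIACRITICS.sub("", text)                # strip short-vowel marks
    text = text.replace(TATWEEL, "")               # strip elongation
    text = ALEF_VARIANTS.sub("\u0627", text)       # map alef variants to bare alef
    text = NON_ARABIC.sub(" ", text)               # drop Latin text, digits, '#', emoji
    return " ".join(text.split())                  # collapse whitespace


def store_tweets(conn: sqlite3.Connection, tweets: list[str]) -> None:
    """Store source text, cleaned text, and individual words (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, source_text TEXT, clean_text TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "tweet_id INTEGER REFERENCES tweets(id), position INTEGER, word TEXT)"
    )
    for raw in tweets:
        cleaned = clean_tweet(raw)
        cur = conn.execute(
            "INSERT INTO tweets (source_text, clean_text) VALUES (?, ?)",
            (raw, cleaned),
        )
        conn.executemany(
            "INSERT INTO words (tweet_id, position, word) VALUES (?, ?, ?)",
            [(cur.lastrowid, i, w) for i, w in enumerate(cleaned.split())],
        )
    conn.commit()
```

Calling `store_tweets(sqlite3.connect(":memory:"), tweets)` on a list of raw tweet strings fills both tables, after which ordinary SQL queries over `words` can stand in for the information selection step. Keeping the source text next to the cleaned text mirrors the abstract's description of a dataset that exposes both forms.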


Information Retrieval, Natural Language Processing, Database, Data Analysis, Text Mining, Arabic Text.