Title

Towards A New Token Based Framework for Record Linkage in Arabic Data Set

Author

Hesham H. Abdel Ghafour, Ali El-Bastawissy, Abdelfatah A. Hegazy

Citation

Vol. 11, No. 6, pp. 146-151

Abstract

Record linkage is the process of identifying whether two records represent the same real-world entity. It is one of the most important and most investigated issues in the data quality literature. Most current research has been applied in an English context and does not mention the modifications required to make it applicable in other contexts, such as Arabic. Applying record linkage algorithms in an Arabic context is a challenging task because of the unique morphological and orthographic characteristics of the Arabic language. This paper proposes a token-based framework for record linkage in Arabic data sets. The framework uses a new technique for Arabic name tokenization and a new approach to similarity computation.

Keywords

Arabic Data Cleaning, Data Quality, Duplicate Detection, Data Warehouse, Entity Resolution, Record Linkage, Object Identification, String Similarity

URL

http://paper.ijcsns.org/07_book/201106/20110622.pdf