



Classification of documents based on contents using the n-gram method of MNB model


Junaina Jamil Najim Aldin AL-Bayati


Vol. 15  No. 10  pp. 17-21


Large collections of documents increasingly need to be classified by content, not only by document name. This research focuses on Arabic documents because they are more complex to process. The study aims to apply content-based classification to file management in order to improve the organization and retrieval of files. Several approaches were used to improve the performance of the multinomial naïve Bayes (MNB) classifier; in particular, the MNB model was improved by using n-grams. Document data were represented as pairs of consecutive keywords (bi-grams), as three consecutive keywords (tri-grams), or as four consecutive keywords (4-grams). Using n-grams increased classification performance. In this system, bi-grams were found to be the most efficient n-gram choice for the MNB model.


Document classification, MNB model, n-gram, training document, testing document, recall and precision
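
The n-gram representation described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the names word_ngrams and MultinomialNB, the add-one (Laplace) smoothing, the choice to skip n-grams unseen in training, and the English toy documents are all assumptions made for illustration, and the paper's Arabic preprocessing is not modelled.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial naive Bayes over word n-grams (illustrative sketch).

    Assumption: add-one (Laplace) smoothing; the abstract does not say
    which smoothing the paper uses.
    """

    def fit(self, docs, labels, n=2):
        self.n = n
        self.class_docs = Counter(labels)           # documents per class
        self.gram_counts = defaultdict(Counter)     # n-gram counts per class
        self.vocab = set()                          # all n-grams seen in training
        for tokens, label in zip(docs, labels):
            grams = word_ngrams(tokens, n)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, tokens):
        # Assumption: n-grams never seen in training are ignored.
        grams = [g for g in word_ngrams(tokens, self.n) if g in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_docs.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)    # log prior
            for g in grams:
                score += math.log((counts[g] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set (already tokenized).
train_docs = [
    ["the", "team", "won", "the", "match"],
    ["the", "team", "lost", "the", "match"],
    ["the", "bank", "raised", "interest", "rates"],
    ["the", "bank", "cut", "interest", "rates"],
]
train_labels = ["sports", "sports", "finance", "finance"]

model = MultinomialNB().fit(train_docs, train_labels, n=2)  # n=2 gives bi-grams
print(model.predict(["the", "team", "won"]))
```

Changing n to 3 or 4 in the call to fit switches the same classifier to tri-grams or 4-grams, which is the comparison the abstract reports. Recall and precision would then be computed per class on a held-out set of testing documents.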