Title

An Empirical Study of the Effectiveness of N-gram Modeling in Probabilistic Chinese Word Segmentation

Author

Zhiming Xu, Qiang Wang, Jonathan J. Webster, Chunyu Kit

Citation

Vol. 6, No. 1, pp. 81-86

Abstract

This paper presents an empirical study of the effectiveness of n-gram models in Chinese word segmentation. A number of experiments were carried out on the corpora used for the First International Chinese Word Segmentation Bakeoff (ISWB-1) [1]. The research investigates how factors such as model order, training method, and smoothing technique affect the performance of statistical word segmentation. Experimental results show that (1) supervised training with manually segmented corpora outperforms, as expected, self-supervised training with unsegmented corpora, and (2) higher-order n-gram models, unexpectedly, do not improve segmentation performance. The negative effect of higher model order is particularly significant with self-supervised training.
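
The abstract does not describe how the segmenter is implemented, so the following is only a minimal sketch of what n-gram based probabilistic segmentation can look like. It assumes a bigram (order 2) model trained on a manually segmented corpus, add-k smoothing, and a Viterbi search over candidate segmentations. The function names, the toy corpus, the value of k, and the maximum word length are all illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter

def train_bigram(segmented_sentences):
    """Count unigrams and bigrams from manually segmented sentences (supervised)."""
    unigrams, bigrams = Counter(), Counter()
    for words in segmented_sentences:
        padded = ["<s>"] + list(words)  # "<s>" marks the sentence start
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(prev, word, unigrams, bigrams, k=0.1):
    """Add-k smoothed log P(word | prev); k = 0.1 is an arbitrary choice."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    return math.log((bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size))

def segment(text, unigrams, bigrams, max_len=4):
    """Return the highest-scoring segmentation of text under the bigram model."""
    # best[i][w] = (score, back-pointer) for the best split of text[:i] ending in w
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Multi-character candidates must be known words; single
            # characters are always allowed so that every input can be split.
            if len(word) > 1 and word not in unigrams:
                continue
            for prev, (score, _) in best[start].items():
                cand = score + log_prob(prev, word, unigrams, bigrams)
                if word not in best[end] or cand > best[end][word][0]:
                    best[end][word] = (cand, (start, prev))
    # Pick the best final word, then follow back-pointers to the start.
    word = max(best[len(text)], key=lambda w: best[len(text)][w][0])
    pos, words = len(text), []
    while word != "<s>":
        words.append(word)
        _, (start, prev) = best[pos][word]
        pos, word = start, prev
    return words[::-1]

# Toy segmented corpus, invented for illustration only.
corpus = [["我", "喜欢", "北京"], ["北京", "大学"], ["我", "喜欢", "大学"]]
unigrams, bigrams = train_bigram(corpus)
print(segment("我喜欢北京大学", unigrams, bigrams))
```

Raising the model order in a sketch like this means conditioning each word on more preceding words. That multiplies the number of parameters to estimate from the same amount of data, so sparse counts become more likely, which is one plausible reading of why the choice of smoothing technique is studied alongside model order.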

Keywords

Chinese word segmentation, n-gram, language model