Title
An Empirical Study of the Effectiveness of N-gram Modeling in Probabilistic Chinese Word Segmentation

Authors
Zhiming Xu, Qiang Wang, Jonathan J. Webster, Chunyu Kit

Citation
Vol. 6, No. 1, pp. 81-86

Abstract
This paper presents an empirical study of the effectiveness of n-gram models in Chinese word segmentation. A series of experiments was carried out on the corpora of the First International Chinese Word Segmentation Bakeoff (ISWB-1) [1]. The study investigates how model order, training method, and smoothing technique affect the performance of statistical word segmentation. Experimental results show that (1) supervised training on manually segmented corpora outperforms, as expected, self-supervised training on unsegmented corpora, and (2) higher-order n-gram models, unexpectedly, do not improve segmentation performance. The negative effect of higher model order is particularly significant under self-supervised training.
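
The core idea the abstract relies on, an n-gram word model used to choose among candidate segmentations, can be illustrated with a minimal Python sketch. It assumes a word bigram model (order fixed at 2), supervised training from a manually segmented corpus, and an illustrative smoothing choice (linear interpolation of a maximum-likelihood bigram estimate with an add-one unigram estimate). The class name `BigramSegmenter`, the weight `lam`, and the cap `max_word_len` are hypothetical and are not taken from the paper. Segmentation is a Viterbi-style dynamic program over character positions that keeps the best-scoring path for each possible last word.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


class BigramSegmenter:
    """Hypothetical sketch: word-bigram segmenter trained on a segmented corpus.

    The smoothing (interpolated bigram + add-one unigram) is an illustrative
    assumption, not the configuration used in the paper.
    """

    def __init__(self, sentences: Iterable[Sequence[str]],
                 lam: float = 0.7, max_word_len: int = 4) -> None:
        self.lam = lam                    # weight of the bigram estimate
        self.max_word_len = max_word_len  # longest candidate word, in characters
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for words in sentences:
            padded = ["<s>"] + list(words)  # sentence-start marker as first history
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 reserves mass for unseen words

    def log_prob(self, prev: str, word: str) -> float:
        # Add-one (Laplace) unigram estimate: never zero
        p_uni = (self.unigrams[word] + 1) / (self.total + self.vocab_size)
        # Maximum-likelihood bigram estimate: zero for an unseen history or pair
        hist = self.unigrams[prev]
        p_bi = self.bigrams[(prev, word)] / hist if hist else 0.0
        return math.log(self.lam * p_bi + (1 - self.lam) * p_uni)

    def segment(self, text: str) -> List[str]:
        n = len(text)
        # best[j][w] = (log-prob of best path covering text[:j] ending in word w, backpointer)
        best: List[Dict[str, Tuple[float, Optional[Tuple[int, str]]]]] = [
            {} for _ in range(n + 1)
        ]
        best[0]["<s>"] = (0.0, None)
        for i in range(n):
            for prev, (score, _) in best[i].items():
                for j in range(i + 1, min(n, i + self.max_word_len) + 1):
                    word = text[i:j]
                    # Multi-character candidates must be known words; single characters
                    # are always allowed so that every input has at least one path.
                    if j - i > 1 and word not in self.unigrams:
                        continue
                    cand = score + self.log_prob(prev, word)
                    if word not in best[j] or cand > best[j][word][0]:
                        best[j][word] = (cand, (i, prev))
        # Trace back from the highest-scoring final word
        words: List[str] = []
        pos, word = n, max(best[n], key=lambda w: best[n][w][0])
        while pos > 0:
            words.append(word)
            pos, word = best[pos][word][1]
        return words[::-1]


if __name__ == "__main__":
    corpus = [["我们", "喜欢", "自然", "语言"],
              ["我们", "研究", "语言", "模型"],
              ["自然", "语言", "处理"]]
    seg = BigramSegmenter(corpus)
    print(seg.segment("我们喜欢语言模型"))  # ['我们', '喜欢', '语言', '模型']
```

Raising the model order in a sketch like this would widen the dynamic-programming state from the last word to the last n-1 words and increase the number of sparse contexts the smoothing must cover; the sketch does not reproduce the paper's experiments or its self-supervised training.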

Keywords
Chinese word segmentation, n-gram, language model