Collecting SMT Language Model Training Data for Low Source Language


Mamtily Nighmat and Izumi Yamamoto


Vol. 17  No. 11  pp. 103-107


Statistical machine translation (SMT) systems rely fundamentally on a parallel corpus [1]. Unlike the rule-based machine translation (RBMT) approach, the capability of an SMT system depends almost entirely on the size of the corpus. The quality of the corpus has therefore become a key to building a better SMT system. In this work, a parallel corpus [2] in three languages was translated into Uyghur one by one, manually evaluated, and used as training data for a Uyghur language model. In conclusion, the paper compares parallel corpora from languages with a different grammatical structure against those from languages with a similar structure.


Machine Translation, SMT, Parallel Corpus.