Title

An Evaluation on Performance different metrics on extraction of Persian-English Parallel sentences

Author

Amin Keshavarzi, Marziyeh Homayouni

Citation

Vol. 16  No. 7  pp. 167-171

Abstract

Machine translation (MT) is the automated translation of text by computer software from one natural language (such as English) into another (such as Persian). MT systems depend heavily on the amount of available training data. In recent years, various methods have been proposed for extracting parallel sentences from the web or from existing corpora. In this paper, we present a method for building a Persian-English comparable corpus from Wikipedia articles and extracting parallel sentences from it. To build the comparable corpus, we use WordNet to classify and extract similar Wikipedia articles. We also evaluate the performance of different classification algorithms in extracting Persian-English parallel sentences. Experimental results show the efficiency of the proposed approach in comparison with other state-of-the-art methods. The approach is language independent and can be applied to other language pairs that have sufficient Wikipedia resources.

Keywords

Parallel sentences, Comparable Corpus, Wikipedia, Information Retrieval, Statistical Machine Translation

URL

http://paper.ijcsns.org/07_book/201607/20160719.pdf