IJCSNS - International Journal of Computer Science and Network Security

Title	An Evaluation on Performance different metrics on extraction of Persian-English Parallel sentences
Author	Amin Keshavarzi, Marziyeh Homayouni
Citation	Vol. 16 No. 7 pp. 167-171
Abstract	Machine translation (MT) is the process by which computer software translates text from one natural language (such as English) into another (such as Persian). MT systems depend heavily on the amount of available training data. Over the years, different methods have been proposed to extract parallel sentences from the web or from available corpora. In this paper we present a method for creating a Persian-English comparable corpus from Wikipedia articles and extracting parallel sentences from it. To create the comparable corpus, we use WordNet to classify and extract similar articles in Wikipedia. We also evaluate the performance of different classification algorithms in extracting Persian-English parallel sentences. Experimental results show that the proposed approach is efficient compared with other state-of-the-art methods. The approach is language independent and could be applied to other language pairs that have sufficient Wikipedia content.
Keywords	Parallel sentences, Comparable Corpus, Wikipedia, Information Retrieval, Statistical Machine Translation
URL	http://paper.ijcsns.org/07_book/201607/20160719.pdf