An Evaluation on Performance different metrics on extraction of Persian-English Parallel sentences


Amin Keshavarzi, Marziyeh Homayouni


Vol. 16, No. 7, pp. 167-171


Machine translation (MT) is automated translation: the process by which computer software translates a text from one natural language (such as English) into another (such as Persian). MT systems depend heavily on the amount of available training data. In recent years, various methods have been proposed to extract parallel sentences from the web or from existing corpora. In this paper we present a method for building a Persian-English comparable corpus from Wikipedia articles and extracting parallel sentences from it. To build the comparable corpus, we use WordNet to classify Wikipedia articles and extract similar ones. We also evaluate the performance of different classification algorithms in extracting Persian-English parallel sentences. Experimental results show the efficiency of the proposed approach compared with other state-of-the-art methods. The approach is language-independent and could be applied to other language pairs that have sufficient Wikipedia resources.
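The abstract does not specify which features or classifiers are used. As a minimal sketch, assuming candidate sentence pairs have already been drawn from matched Wikipedia articles, the snippet below treats parallel-sentence extraction as binary classification. It uses two simple features, a sentence-length ratio and dictionary-based word coverage, and feeds them to a logistic-regression classifier. The toy lexicon, the features, the training pairs, and the helper names (`features`, `train_logistic`, `is_parallel`) are hypothetical illustrations, not the authors' method.

```python
import math
import re

# Illustrative only: the lexicon, features and classifier are assumptions,
# not the method described in the paper.
# Hypothetical toy Persian->English lexicon; a real system would need a large one.
LEXICON = {
    "این": {"this"},
    "کتاب": {"book"},
    "خوب": {"good"},
    "است": {"is"},
    "خانه": {"house", "home"},
    "بزرگ": {"big", "large"},
    "من": {"i", "me"},
}


def tokenize(text):
    """Lowercase and split on Unicode word characters (covers Persian script)."""
    return re.findall(r"\w+", text.lower())


def features(fa_sentence, en_sentence):
    """Return [bias, length ratio, lexical coverage] for a candidate pair."""
    fa, en = tokenize(fa_sentence), tokenize(en_sentence)
    if not fa or not en:
        return [1.0, 0.0, 0.0]
    ratio = min(len(fa), len(en)) / max(len(fa), len(en))
    en_set = set(en)
    # Fraction of Persian tokens with at least one dictionary translation on the English side.
    covered = sum(1 for w in fa if LEXICON.get(w, set()) & en_set)
    return [1.0, ratio, covered / len(fa)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=1.0, epochs=1000):
    """Batch gradient descent on the mean logistic loss."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(xs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def is_parallel(w, fa_sentence, en_sentence):
    """Classify a candidate pair as parallel if P(parallel) >= 0.5."""
    x = features(fa_sentence, en_sentence)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5


# Tiny hand-labelled training set (1 = parallel, 0 = not parallel).
TRAIN = [
    ("این کتاب خوب است", "This book is good", 1),
    ("خانه بزرگ است", "The house is big", 1),
    ("این کتاب خوب است", "The weather was cold yesterday", 0),
    ("خانه بزرگ است", "I like music", 0),
]

weights = train_logistic([features(fa, en) for fa, en, _ in TRAIN],
                         [y for _, _, y in TRAIN])
```

For example, `is_parallel(weights, "این خانه بزرگ است", "This house is big")` should accept the pair, while an unrelated pair such as `("کتاب من", "Happy birthday")` should be rejected. The abstract reports comparing several classification algorithms. In a setup like this one, other classifiers (for instance SVMs or decision trees) could be trained on the same feature vectors and compared.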


Keywords: Parallel Sentences, Comparable Corpus, Wikipedia, Information Retrieval, Statistical Machine Translation