English-Persian Parallel Corpus as a Translation Aid

In recent years, corpus linguistics has opened up new possibilities for many applications in the language sciences.
The translator's workplace has changed dramatically over the last ten years or so, and today the computer is undoubtedly the single most important tool of the trade for a translator, regardless of whether he or she is a literary translator working for a small publisher, a technical translator working for a translation agency, or a legal translator.
A corpus can be defined simply as a large collection of linguistic evidence, mainly naturally occurring data, either written texts or transcriptions of recorded speech. Recently, large monolingual, comparable and parallel corpora have played a crucial role in solving various problems in computational linguistics as well as in translation. Before the present stage of ICT (Information and Communication Technology) development, corpora were hardly available to translators as a source of information about language, content, and translation practices. Now, however, with the continuous overload of translation work and the massive production of translated texts, the corpus resources available to translators have aroused increased interest in their construction and use.


Constructing and Using a Bilingual Parallel Corpus

To examine the advantages of corpus-based language analysis over traditional methods, and to compare two kinds of translation resources, namely bilingual parallel corpora and conventional bilingual dictionaries, we had to construct our own English-Persian parallel corpus.
Work on compiling bilingual corpora for high-density languages such as English or French is extensive, and the results are very encouraging, because texts in these languages are easily accessible in digital form, including on websites. However, when a low- or medium-density language such as Persian is one of the languages involved in a bilingual corpus, the task is much more difficult because of the shortage of digitally stored material and of detectable parallel pages on the World Wide Web.
Our developmental English-Persian parallel corpus consists of about three million words (more than 50,000 corresponding sentences in the two languages). It is an ongoing, open corpus to which more material can be added as the need arises.
One of the main outcomes of building such a corpus is the development of software for parallel concordancing, in which a user can enter a search string in one language and see all citations for that string in the search language together with the corresponding sentences in the target language.
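The idea of parallel concordancing can be illustrated with a minimal Python sketch. It is not the software described here: the names `SentencePair`, `parallel_concordance` and `TOY_CORPUS` are hypothetical, the three sentence pairs are invented for illustration, and the search is a plain case-insensitive substring match rather than an indexed, tokenized lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    source: str  # English sentence
    target: str  # aligned Persian sentence


def parallel_concordance(corpus, query):
    """Return every aligned pair whose source sentence contains the query.

    Matching is a case-insensitive substring test, the simplest possible
    search; a real concordancer would tokenize and index the text.
    """
    needle = query.lower()
    return [pair for pair in corpus if needle in pair.source.lower()]


# Invented toy data, not taken from the corpus described in this article.
TOY_CORPUS = [
    SentencePair("Heavy rain fell all night.", "تمام شب باران شدید بارید."),
    SentencePair("She made a decision yesterday.", "او دیروز تصمیم گرفت."),
    SentencePair("The heavy rain stopped at dawn.", "باران شدید سحرگاه بند آمد."),
]

if __name__ == "__main__":
    # Show each English citation next to its aligned Persian sentence.
    for hit in parallel_concordance(TOY_CORPUS, "heavy rain"):
        print(hit.source, "|", hit.target)
```

Searching the toy corpus for "heavy rain" returns the first and third pairs, each English citation shown with its aligned Persian sentence, which is the kind of display a translator would consult.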

Different Applications of Parallel Corpora

Although the range of applications of aligned parallel corpora in the language sciences is wide, this article deals only with some of their main applications within the field of human translation.
One of the main applications of parallel corpora is finding the different possible equivalents of particular words or collocations. Aligned translation units are simply displayed on the screen, offering the translator a range of similar contexts from a corpus of past translations. Finding an appropriate and natural equivalent for different types of collocations is usually a difficult task, especially in a non-native language, and parallel corpora can be of great help here.
In this connection, we prepared a set of one hundred English collocations and tried to find their appropriate Persian equivalents in the corpus. Since bilingual dictionaries in most cases provide translation equivalents for single words rather than for collocations, bilingual corpora are of great help in this respect. That is, a parallel corpus can be used to confirm the translation equivalent of a given collocation when the majority of instances offer the same rendering.
In some cases we may refer to a parallel corpus to verify, reject or supplement the equivalent(s) provided by bilingual dictionaries, since parallel corpora are believed to provide information that bilingual dictionaries do not usually contain. When using a bilingual dictionary to select a translation equivalent, the translator decides on the appropriateness of the different possible equivalents based on their definitions or on the few examples given by the dictionary, whereas a parallel corpus offers the best possible translation equivalent based on natural evidence drawn from past translations.
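The majority criterion mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function `confirm_equivalent`, the "more than half" threshold, the candidate list and the toy sentence pairs are all hypothetical, and candidates are supplied by hand rather than extracted by word alignment.

```python
from collections import Counter


def confirm_equivalent(pairs, collocation, candidates):
    """Count how often each candidate equivalent appears on the target
    side of the pairs whose source side contains the collocation.

    Returns (best_candidate, share_of_hits, counts). best_candidate is
    None unless one candidate occurs in more than half of the hits
    (the threshold is an assumption made for this sketch).
    """
    hits = [tgt for src, tgt in pairs if collocation.lower() in src.lower()]
    counts = Counter()
    for tgt in hits:
        for cand in candidates:
            if cand in tgt:
                counts[cand] += 1
    if not counts:
        return None, 0.0, counts
    best, n = counts.most_common(1)[0]
    share = n / len(hits)
    return (best if share > 0.5 else None), share, counts


# Invented toy data, not taken from the corpus described in this article.
TOY_PAIRS = [
    ("Heavy rain fell all night.", "تمام شب باران شدید بارید."),
    ("The heavy rain stopped at dawn.", "باران شدید سحرگاه بند آمد."),
    ("Heavy rain will fall tomorrow.", "فردا باران سنگین خواهد بارید."),
    ("She made a decision yesterday.", "او دیروز تصمیم گرفت."),
]

# Hypothetical candidate equivalents proposed by the translator.
CANDIDATES = ["باران شدید", "باران سنگین"]

if __name__ == "__main__":
    best, share, counts = confirm_equivalent(TOY_PAIRS, "heavy rain", CANDIDATES)
    print(best, round(share, 2), dict(counts))
```

In the toy data, two of the three instances of "heavy rain" use the first candidate, so it is the one the sketch reports as confirmed; with real data the translator would still inspect the contexts before accepting it.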

Conclusion

Using parallel corpora to find translation equivalents for collocations not only greatly improves the quality of translations produced by human translators, but can also be applied directly in machine translation systems. The other main potential of such corpora lies in searching for units above word level, such as collocations and phraseological units, in order to extract correspondences between languages and build terminological databases. This further task can be realized by constructing specialized monolingual and bilingual corpora. It is hoped that our translators will become more familiar with the valuable potential of different types of corpora in their work. The suggestion here is that modern technologies such as corpora and concordancing software should find their proper place in translator workbenches, and this ideal can be achieved provided that more corpus resources are made accessible to translators. Practical courses introduced by translation trainers at universities can be of great help in this respect.

10.12.08

Tayebeh Mosavi Miangah
English Language Department
Payame Noor University of Yazd, Yazd, Iran


 
Copyright © EURAC 2010