
The for Irish language from 700 AD to the present day - part II 

Terminology work and corpora in Irish

NTD (National Terminology Database): Online terminology and LSP database, incorporating a subcorpus of legal terminology and text at www.focal.ie

As noted above, we do not yet have corpora compiled specifically for terminological purposes. However, as part of the project described here, there are plans to develop a subcorpus of technical texts covering all areas except legal terminology. Legal terminology may be found in the parallel texts of the Acts of the Irish Government at www.achtanna.ie.

This National Terminology Database project was initiated in 2003 as a joint venture between Foras na Gaeilge and FIONTAR (Dublin City University). The terminology stock was provided in the main by Foras na Gaeilge, and the subcorpus of legal material was provided by the Government Translation Section. FIONTAR provided the technical expertise and designed and implemented the database. The database was officially launched in 2006. Editorial work is ongoing and further technical developments are planned for the resource.

The aim of the project was to establish an online database of all readily available authoritative terminology in the Irish language as a resource to support the educational sector, the media, translators, public bodies and the general public, as well as to assist terminologists working in the Foras na Gaeilge terminology section in terminology management and provision.

The source material was in various formats at the outset of the project. Ten of the dictionaries had already been published in book format and online (see www.acmhainn.ie). A further ten dictionaries had been available in book format only and were scanned before the project began. This scanned material was in need of editing at the outset. Eight other collections were at earlier stages of development, consisting of draft lists which were either being assessed by subcommittees within the Foras na Gaeilge terminology section or had been compiled with a view to having them assessed by subcommittees at a future time. Five other dictionaries were acquired for the database with the permission of their editors. Other sources include lists of unpublished terminology which had been compiled for textbooks.
A significant amount of material consisted of lists of miscellaneous terminology based on terminological enquiries dealt with in the terminology office, the largest of which was a compilation from the years 1985-2000 which had already been edited and issued online on the acmhainn.ie website. Further miscellaneous lists were now included in the new database. The contents of an earlier database, started in the early 1990s and containing some 15,000 terms, were also included.

The result was that, while it was very helpful to have all these sources centralised for the first time, a huge harmonization task emerged and is still ongoing. The harmonization issues include synonymy and polysemy and require a great deal of disambiguation work. For instance, the various meanings of the English verb 'perform' are not covered by a single verb in Irish, so different verbs are used in different contexts. We have therefore disambiguated them so that in the domain of the Arts, we find (of artistic endeavour) gníomhaigh, cuir i ngníomh; (drama) cuir i láthair, léirigh; (music) seinn, buail; (performing arts in general) taibhigh; (poem, literary piece) aithris, reic, léigh; (recite, relate story) aithris, ársaigh, eachtraigh, inis, reic, trácht ar; while in Sport we find (of level of achievement) cruthaigh, déan; in Administration (duty, legal requirement) comhlíon; (do, carry out task) feidhmigh, déan; in Education (of level of achievement) cruthaigh, déan; (of language) léirigh; (of role) gníomhaigh; in Mechanics (of machine, car, computer) oibrigh, feidhmigh, rith; and in Economics (of economy) feidhmigh.
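
To make the shape of such a disambiguated entry concrete, here is a minimal Python sketch. It is not the actual structure of the database at www.focal.ie: the tuple layout, the names ENTRIES, build_index and lookup, and the choice of an in-memory index are all assumptions made for illustration. Only the Irish equivalents and their domain labels are taken from the 'perform' example above.

```python
from collections import defaultdict

# Hypothetical layout (an assumption, not the real database schema):
# (English headword, domain, sense note, Irish equivalents).
# The values are copied from the 'perform' example in the text.
ENTRIES = [
    ("perform", "Arts", "drama", ["cuir i láthair", "léirigh"]),
    ("perform", "Arts", "music", ["seinn", "buail"]),
    ("perform", "Administration", "duty, legal requirement", ["comhlíon"]),
    ("perform", "Education", "of level of achievement", ["cruthaigh", "déan"]),
    ("perform", "Mechanics", "of machine, car, computer", ["oibrigh", "feidhmigh", "rith"]),
    ("perform", "Economics", "of economy", ["feidhmigh"]),
]


def build_index(entries):
    """Group senses by headword, then by domain, keeping the sense notes."""
    index = defaultdict(lambda: defaultdict(list))
    for headword, domain, note, equivalents in entries:
        index[headword][domain].append((note, equivalents))
    return index


def lookup(index, headword, domain=None):
    """Return all senses of a headword, or only those in one domain."""
    senses = index.get(headword, {})
    if domain is not None:
        return {domain: senses.get(domain, [])}
    return dict(senses)


if __name__ == "__main__":
    index = build_index(ENTRIES)
    for domain, senses in lookup(index, "perform").items():
        for note, equivalents in senses:
            print(f"{domain} ({note}): {', '.join(equivalents)}")
```

Keeping the domain and the sense note as separate fields means that the same Irish verb, such as feidhmigh, can appear under several domains without being merged into a single ambiguous entry.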
 
Other issues range from inadequate metadata to irregularities in spelling and grammar. Some grammatical questions are proving to be quite challenging due to the rapidly changing state of the spoken language and lack of consensus concerning certain grammatical points in standard reference books. This difficult and labour-intensive work is being carried out by the FIONTAR team in consultation with the Terminology Committee at Foras na Gaeilge. In the meantime, additional sources are being input which at times entail further harmonization work.

With a view to assisting users through seeing terminology in context, it has been decided to develop a link between the terminology database and the NCI (New Corpus for Ireland). This part of the project is still in the planning stage. The proposal is to insert hyperlinks beside all terms in the database which can be found in the corpus so that the user can click and find a range of examples from the corpus. For instance, while the verb 'cruthaigh' generally means to create or to prove, the following example - taken from the corpus - translates as 'Is it not a wonder that he didn't perform better at school?': "Nach é an t-iontas é nár chruthaigh sé níos fearr ar scoil?", while the following, a line from a song addressed to a well-loved boat, "Is nár mhaith a chruthaigh tú i gceann trí chúrsa" translates as 'And didn't you perform well after three circuits'.

Since the corpus does not contain parallel texts and the Irish-language section contains only Irish, the hyperlinks will be placed beside Irish terms only in the database, but added value will accrue to both resources since not only will a bridge be built from one resource to the other but also from one language to the other. Users will be in a position to access the database through entering a term in English and, if an equivalent Irish term appears with a hyperlink, they will then be able to see examples from the corpus. For example, the user will be able to enter 'perform' in the database, see the various verbs associated with it and then click on any one of those verbs to view examples such as those quoted above in relation to the verb 'cruthaigh'.
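
The proposed link between the database and the corpus can be sketched roughly as follows. This is only an illustration of the idea, since that part of the project is still at the planning stage: the names TERM_DB, CORPUS, delenite, examples_for and lookup_with_examples are hypothetical, and the real NCI has its own query interface, which is not modelled here. The two corpus sentences are the ones quoted above. Because Irish words change at the start (cruthaigh appears as chruthaigh in both examples), the sketch undoes initial lenition in a crude way; other inflected forms would need proper lemmatisation.

```python
import re

# Consonants that take a following 'h' when lenited in Irish.
LENITABLE = "bcdfgmpst"

# Hypothetical stand-ins for the two resources (assumptions for illustration).
TERM_DB = {"perform": ["cruthaigh", "comhlíon", "seinn"]}
CORPUS = [
    "Nach é an t-iontas é nár chruthaigh sé níos fearr ar scoil?",
    "Is nár mhaith a chruthaigh tú i gceann trí chúrsa",
]


def delenite(word):
    """Undo initial lenition, e.g. 'chruthaigh' -> 'cruthaigh' (crude heuristic)."""
    if len(word) > 2 and word[0] in LENITABLE and word[1] == "h":
        return word[0] + word[2:]
    return word


def tokens(sentence):
    """Lower-case the sentence, split it into runs of letters, and delenite each word."""
    return [delenite(w) for w in re.findall(r"[^\W\d_]+", sentence.lower())]


def examples_for(term, corpus):
    """Return corpus sentences containing a single-word term."""
    return [s for s in corpus if term in tokens(s)]


def lookup_with_examples(english, term_db, corpus):
    """Map each Irish equivalent that occurs in the corpus to its example sentences.

    Terms with no corpus hits are omitted, mirroring the idea that only
    terms found in the corpus receive a hyperlink.
    """
    results = {}
    for irish in term_db.get(english, []):
        hits = examples_for(irish, corpus)
        if hits:
            results[irish] = hits
    return results


if __name__ == "__main__":
    for irish, hits in lookup_with_examples("perform", TERM_DB, CORPUS).items():
        print(irish)
        for sentence in hits:
            print("   ", sentence)
```

In this sketch a user entering 'perform' would see examples for cruthaigh only, since the other two verbs do not occur in the two sample sentences.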

It is expected that not all Irish terms in the database currently occur in the corpus. In order to increase the likelihood of finding examples of more technical terms, it is proposed to develop a subcorpus of technical texts, building on what already exists and adding further technical material.
Enquiries about this project may be sent to tearmai@forasnagaeilge.ie.

21.11.07

Fidelma Ní Ghallchobhair


 
   


The for Irish language from 700 AD to the present day - part I

 
 