Gymn@zilla 

Gymn@zilla supports browsing a local document repository and the Internet by dynamically creating annotated copies of HTML and PDF documents with the help of open dictionary resources. Gymn@zilla is written in Perl and runs as an online application on a Linux web server, an architecture that lets it draw on free and powerful modules. Its main submodules handle (1) the mirroring of web pages, (2) linguistic processing, (3) the processing and selection of images and (4) the generation of exercises.
Web pages are mirrored using Perl's LWP modules. Hyperlinks in a mirrored page are rewritten to point to Gymn@zilla's own URL, with the original URL encoded as a CGI parameter, so that the user can continue browsing through Gymn@zilla. Links to multimedia documents such as audio, video and graphics files are left unchanged.
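The link rewriting can be illustrated with a small Python sketch (Gymn@zilla itself is written in Perl, so this is not its actual code). The endpoint `http://gymnazilla.example/cgi-bin/gym.cgi`, the CGI parameter name `url`, the list of multimedia file extensions and the function `rewrite_links` are hypothetical, and a regular expression stands in for a proper HTML parser:

```python
import re
from urllib.parse import urlencode, urljoin

# Hypothetical proxy endpoint and CGI parameter name (illustrative assumptions).
PROXY = "http://gymnazilla.example/cgi-bin/gym.cgi"
# Hypothetical list of multimedia extensions whose links keep pointing at the original site.
MEDIA_EXT = (".mp3", ".wav", ".ogg", ".avi", ".mpg", ".mp4", ".gif", ".jpg", ".jpeg", ".png")

# Matches the href attribute of an <a> tag; a regex is a simplification of real HTML parsing.
HREF_RE = re.compile(r"""(<a\b[^>]*?\bhref\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_links(page_html: str, base_url: str) -> str:
    """Route ordinary hyperlinks through the proxy; keep multimedia links direct."""

    def repl(m: re.Match) -> str:
        prefix, q, href = m.groups()
        absolute = urljoin(base_url, href)  # resolve relative links against the original page
        path = absolute.split("?", 1)[0].lower()
        if path.endswith(MEDIA_EXT):
            target = absolute  # multimedia document: link to the original resource
        else:
            target = PROXY + "?" + urlencode({"url": absolute})  # original URL as CGI parameter
        return f"{prefix}{q}{target}{q}"

    return HREF_RE.sub(repl, page_html)


page = '<a href="/news.html">News</a> <a href="clip.mp3">Clip</a>'
print(rewrite_links(page, "http://www.example.org/index.html"))
```

Relative links are resolved against the page's original URL with `urljoin`, because once the page is served from the proxy, a relative multimedia link would otherwise point to the wrong host.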
Once a document has been converted, its language is guessed and the best-matching support language (L1) is selected. The text is then segmented into tokens, which is not trivial for East Asian languages such as Chinese or Japanese, whose writing systems do not mark word boundaries with spaces. To annotate inflected word forms, Gymn@zilla reduces them to their stems using pattern-matching techniques. Depending on the user's preferences, the text is then annotated with translations and terminological information. The annotation is done by inserting <a> tags with advanced JavaScript link titles; these contain the information that appears when the user moves the mouse over the annotated word.
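The segmentation, stemming and annotation steps can be sketched as follows. This is an illustration under stated assumptions, not Gymn@zilla's implementation: forward maximum matching is used here as one common segmentation method for Chinese, the suffix rules and toy dictionaries are hypothetical, and a plain `title` attribute stands in for the JavaScript link titles:

```python
import html
import re

# Toy dictionaries (hypothetical data, for illustration only).
ZH_DICT = {"我": "I", "是": "to be", "中国": "China", "中国人": "Chinese person", "人": "person"}
EN_DICT = {"city": "Stadt", "walk": "gehen"}

# Hypothetical suffix patterns, tried in order: (pattern, replacement).
EN_SUFFIX_RULES = [
    (re.compile(r"ies$"), "y"),
    (re.compile(r"es$"), ""),
    (re.compile(r"s$"), ""),
    (re.compile(r"ed$"), ""),
    (re.compile(r"ing$"), ""),
]


def max_match(text, lexicon):
    """Forward maximum matching: take the longest dictionary word at each position."""
    longest = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # unknown characters become 1-char tokens
                tokens.append(piece)
                i += size
                break
    return tokens


def lookup(word, lexicon, rules=()):
    """Return (lemma, gloss), stripping suffixes by pattern if the form is not listed."""
    w = word.lower()
    if w in lexicon:
        return w, lexicon[w]
    for pattern, replacement in rules:
        candidate = pattern.sub(replacement, w)
        if candidate != w and candidate in lexicon:
            return candidate, lexicon[candidate]
    return None


def annotate(tokens, lexicon, rules=()):
    """Wrap known tokens in <a> tags whose title carries the dictionary information."""
    out = []
    for tok in tokens:
        hit = lookup(tok, lexicon, rules)
        if hit is None:
            out.append(html.escape(tok))
        else:
            lemma, gloss = hit
            out.append(f'<a title="{html.escape(lemma + ": " + gloss)}">{html.escape(tok)}</a>')
    return "".join(out)


print(annotate(max_match("我是中国人", ZH_DICT), ZH_DICT))
print(annotate(re.findall(r"\w+|\W+", "Cities walked"), EN_DICT, EN_SUFFIX_RULES))
```

The order of the suffix rules matters: `ies$` must be tried before `s$`, otherwise "cities" would be reduced to "citie" and missed.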
The dictionaries included in Gymn@zilla are mostly taken from the Internet (e.g. the Chinese CEDICT dictionary) or provided by our research partners (e.g. the Russian dictionary from the Laboratory of Computational Linguistics at IPPI, Russian Academy of Sciences). All dictionaries are converted into an XML structure that holds the lemma and, optionally, grammatical indicators, the translation, pronunciation features and notes.
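As an example of this conversion, the following sketch turns one line of a CEDICT-style file (`Traditional Simplified [pinyin] /gloss/gloss/`) into an XML entry. The element names (`entry`, `lemma`, `pronunciation`, `translation`, `note`) are hypothetical stand-ins chosen to mirror the fields listed above, not Gymn@zilla's actual schema:

```python
import re
import xml.etree.ElementTree as ET

# CEDICT line layout: Traditional Simplified [pinyin] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def cedict_to_entry(line):
    """Convert one CEDICT line to a hypothetical <entry>; return None for non-entry lines."""
    m = CEDICT_LINE.match(line)
    if m is None:  # e.g. '#' comment lines in the file header
        return None
    traditional, simplified, pinyin, glosses = m.groups()
    entry = ET.Element("entry")
    ET.SubElement(entry, "lemma").text = simplified
    ET.SubElement(entry, "pronunciation").text = pinyin
    for gloss in glosses.split("/"):
        if gloss:
            ET.SubElement(entry, "translation").text = gloss
    if traditional != simplified:
        ET.SubElement(entry, "note").text = "traditional: " + traditional
    return entry


entry = cedict_to_entry("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/")
print(ET.tostring(entry, encoding="unicode"))
```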
To improve annotation quality, future work will attempt to classify documents by comparing the character n-grams of a document with those of specific dictionaries. Part-of-speech tagging and word sense disambiguation will also be explored to avoid systematically incorrect annotations.
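The planned n-gram classification could look roughly like the sketch below, which compares character trigram frequency profiles using cosine similarity. Trigrams, cosine similarity, the function names and the toy word lists are all assumptions made for illustration; the text above does not specify a measure:

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams, padding with spaces so word edges are captured."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_dictionary(document, dictionaries, n=3):
    """Return the name of the dictionary whose n-gram profile is closest to the document."""
    doc_profile = char_ngrams(document, n)
    scores = {
        name: cosine(doc_profile, char_ngrams(" ".join(words), n))
        for name, words in dictionaries.items()
    }
    return max(scores, key=scores.get)


toy_dictionaries = {  # hypothetical miniature dictionaries
    "de": ["der", "die", "und", "nicht", "haus"],
    "en": ["the", "and", "not", "house", "with"],
}
print(best_dictionary("the house and the garden", toy_dictionaries))
```

In practice the dictionary profiles would be computed once and cached, since they do not change between documents.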
Each Gymn@zilla user is associated with a session. The session is used to maintain private, editable word lists stored as simple XML documents, from which XSLT transformations generate quizzes for training.
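The word-list and quiz workflow can be approximated as follows. Gymn@zilla uses XSLT for the quiz step; to stay within the Python standard library, this sketch builds a multiple-choice quiz with ElementTree instead, so it illustrates the data flow rather than the XSLT technique itself. The in-memory `SESSIONS` store, the element names and the quiz format are hypothetical:

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical in-memory store: session id -> that user's word-list document.
SESSIONS = {}


def wordlist_for(session_id):
    """Return the private word list for a session, creating it on first use."""
    if session_id not in SESSIONS:
        SESSIONS[session_id] = ET.Element("wordlist", session=session_id)
    return SESSIONS[session_id]


def add_word(wordlist, lemma, translation):
    """Append one <word> entry to a word list."""
    item = ET.SubElement(wordlist, "word")
    ET.SubElement(item, "lemma").text = lemma
    ET.SubElement(item, "translation").text = translation


def make_quiz(wordlist, n_choices=3, seed=None):
    """Turn a word list into multiple-choice questions (stand-in for an XSLT stylesheet)."""
    rng = random.Random(seed)
    words = [(w.findtext("lemma"), w.findtext("translation")) for w in wordlist.findall("word")]
    quiz = ET.Element("quiz")
    for lemma, answer in words:
        distractors = [t for _, t in words if t != answer]
        options = rng.sample(distractors, min(n_choices - 1, len(distractors))) + [answer]
        rng.shuffle(options)
        question = ET.SubElement(quiz, "question", prompt=lemma, answer=answer)
        for option in options:
            ET.SubElement(question, "option").text = option
    return quiz


wl = wordlist_for("session-42")
for lemma, translation in [("Haus", "house"), ("Baum", "tree"), ("Hund", "dog")]:
    add_word(wl, lemma, translation)
print(ET.tostring(make_quiz(wl, seed=1), encoding="unicode"))
```

Distractors are drawn from the user's own word list, so each quiz only tests vocabulary the user has chosen to collect.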

For more in-depth information on the technical aspects of Gymn@zilla, please refer to our scientific publications on the project.
 

last update 16.10.2008



Contacts: Judith Knapp, Oliver Streiter, Mathias Stuflesser

 
 
Copyright © EURAC 2008