18.3.2010
The learners' language 

Corpus linguistics is now a well-established discipline within the general framework of linguistic studies. By collecting large amounts of empirical data that shed light on various aspects of language varieties, it contributes to a number of fields, from general language studies to lexicography.
Moreover, over the last fifteen years corpus linguistics has been enriched by a particular type of data collection dealing with the language of a specific group of speakers: learners.
Learner corpora result from applying the methods and technologies of traditional corpus linguistics to the study of interlanguage, i.e. the particular variety of language, set between the first language and the foreign language, that learners speak.

Why should we build a learner corpus?
The objectives of a learner corpus vary according to who builds it, how it is built and what materials it is based on. Depending on whether it is compiled for research purposes by a university, for teacher training by an institution, by a publisher specialising in teaching materials or by an international organisation, it may serve different aims and therefore have different structures.
The three main user groups of a learner corpus are language teachers, linguists and applied linguists. For linguists, learner corpora are a source of authentic data on a particular variety of language that can be compared with other varieties; for instance, the simplified language of learners can be compared with the informal written variety of native speakers. Moreover, a part-of-speech-tagged corpus makes it easier to trace a corpus-based sequence of the morphosyntactic development of a particular language feature, comparing students with different mother tongues and at different proficiency levels.
Applied linguists can then develop more effective syllabi, with teaching materials and exercises specifically designed to practise the most difficult features of a language; learner corpora also open new prospects in computer-assisted self-study.
Learner corpora are now popular and well known among teachers, especially in the field of English as a Foreign Language, but much remains to be done to introduce them into the teaching of Italian as a foreign language.

The example of VALICO
VALICO (Varietà di Apprendimento della Lingua Italiana: Corpus Online, i.e. 'Online Corpus of the Learning Varieties of the Italian Language') is meant to serve all kinds of users. In particular, one of its main goals is to involve teachers actively in building the corpus, not only by collecting texts but also by trying out corpus-based exercises.
Since one of the possible outcomes of a learner corpus, from both a theoretical and a practical point of view, is the detection of learners' most frequent errors, it is important to rely on a set of sociolinguistic data to determine the learners' profiles. The architecture of VALICO therefore enables the user to query the texts by selecting learner variables (age, gender, proficiency in Italian, knowledge of other languages, mother tongue…). It also contains information about the text type and, above all, makes it possible to identify the group each student belongs to.
In order to compare learners from different countries, it is necessary to elicit their texts with a common stimulus. In this way researchers can measure one student's proficiency against another's, starting from a common and partly predictable basis.

As far as its contents are concerned, VALICO is a twofold corpus: on the one hand, it is a general survey of what teachers of Italian around the world ask their students to write; on the other hand, it contains a more structured collection of data elicited by means of comic strips.
So, for instance, we can query the corpus by part of speech, e.g. the use of the imperfect tense, in 20-year-old German learners who have been studying Italian for three years, who also speak English and who wrote a text describing a comic strip without using a dictionary. We can then run the same query with a different mother tongue and compare the two groups.
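The article does not describe VALICO's query interface or data format, so the following is only a minimal Python sketch of the kind of profile-filtered comparison described above: select texts by learner variables and task, then count a tagged feature in each group. The record fields (`mother_tongue`, `years_of_study`, `used_dictionary`, etc.), the functions `select` and `rate_per_1000`, the imperfect-tense tag `VER:impf` and the two toy texts are all hypothetical assumptions, not VALICO's actual schema, tagset or data.

```python
from dataclasses import dataclass

# Hypothetical tag for the imperfect tense (an assumption, not VALICO's tagset).
IMPERFECT = "VER:impf"

# Hypothetical record layout: every field name is an illustrative assumption.
@dataclass
class LearnerText:
    mother_tongue: str
    age: int
    years_of_study: int
    other_languages: set[str]
    task: str                      # e.g. "comic_strip" or "free"
    used_dictionary: bool
    tokens: list[tuple[str, str]]  # (word form, hypothetical POS/morph tag)

def select(texts, mother_tongue, age=20, years_of_study=3,
           also_speaks="English", task="comic_strip", used_dictionary=False):
    """Keep only texts whose learner profile matches every criterion."""
    return [t for t in texts
            if t.mother_tongue == mother_tongue
            and t.age == age
            and t.years_of_study == years_of_study
            and also_speaks in t.other_languages
            and t.task == task
            and t.used_dictionary == used_dictionary]

def rate_per_1000(texts, tag):
    """Occurrences of `tag` per 1,000 tokens across the selected texts."""
    total = sum(len(t.tokens) for t in texts)
    hits = sum(1 for t in texts for _, pos in t.tokens if pos == tag)
    return 1000 * hits / total if total else 0.0

# Two invented toy texts, one per mother tongue, for illustration only.
corpus = [
    LearnerText("German", 20, 3, {"English"}, "comic_strip", False,
                [("il", "DET:def"), ("cane", "NOM"),
                 ("correva", "VER:impf"), ("e", "CON")]),
    LearnerText("Spanish", 20, 3, {"English"}, "comic_strip", False,
                [("il", "DET:def"), ("cane", "NOM"),
                 ("è", "VER:pres"), ("corso", "VER:pper")]),
]

# Same query, changing only the mother tongue, then compare the two groups.
for l1 in ("German", "Spanish"):
    group = select(corpus, l1)
    print(l1, len(group), rate_per_1000(group, IMPERFECT))
```

Normalising to a rate per 1,000 tokens, rather than comparing raw counts, is one common way to keep groups of different sizes comparable; on the toy data above the loop prints `German 1 250.0` and `Spanish 1 0.0`.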
Building VALICO involved a number of university students who helped upload the data. They specialised in transcribing the handwritten texts and therefore became aware of the kinds of errors a learner can make; they were also prompted to think about the causes of those errors and were invited to reflect on variation in vocabulary use, syntactic structures, avoidance strategies, and the concepts of linguistic norm and acceptability. This was an extremely useful training activity for future teachers of Italian as a foreign language: once they are on the other side of the desk, they will be able to handle phenomena they have already encountered.

28.01.09

Elisa Corino, University of Torino, Italy


 
Copyright © EURAC 2010 Send page Print page Top of page