19.3.2010

The African varieties of Portuguese

Portuguese is the official language of five African countries (Cape Verde, Guinea-Bissau, S. Tome and Principe, Angola and Mozambique), although it is mostly spoken there as a second language.
In Cape Verde, Guinea-Bissau and S. Tome and Principe, Creole languages emerged and are widely used, which explains why Portuguese is spoken only by a minority. In Angola and Mozambique, by contrast, there is a great diversity of Bantu languages and no Creole, and Portuguese has established itself as a language that ensures national unity.

These African varieties of Portuguese have properties that differ from European Portuguese, although these properties are not systematic across speakers and still belong to emerging grammars.

Compared with the many empirical studies of European and Brazilian Portuguese based on corpora (electronic collections of written texts and orthographically transcribed recordings of speech, representative of a language or language variety) or on lexicons, studies of the Portuguese spoken in Africa are scarce, mostly because similar resources have been lacking.
Mozambique is an exception, however: a spoken corpus had already been compiled there, and several studies had been carried out and published.

Recently, five comparable corpora, one for each of the five African varieties of Portuguese, have been compiled and made available for online query. Together they make up the Africa Corpus. Each corpus contains around 640,000 words, with the same proportion of spoken and written material (c. 25,000 spoken words and c. 615,000 written words).
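
The corpus figures above can be checked with simple arithmetic. The sketch below derives, from the approximate counts stated in the text only, the size of each corpus, its spoken and written shares, and the combined size of the five corpora. The function name `corpus_summary` is an illustrative assumption, not part of any Africa Corpus tool.

```python
# Approximate figures stated in the text for each of the five corpora.
SPOKEN_WORDS = 25_000
WRITTEN_WORDS = 615_000
N_CORPORA = 5


def corpus_summary(spoken: int, written: int, n_corpora: int) -> dict:
    """Derive per-corpus size, spoken/written shares and the combined size."""
    per_corpus = spoken + written
    return {
        "per_corpus": per_corpus,
        "spoken_share": spoken / per_corpus,
        "written_share": written / per_corpus,
        "africa_corpus_total": per_corpus * n_corpora,
    }


summary = corpus_summary(SPOKEN_WORDS, WRITTEN_WORDS, N_CORPORA)
print(summary["per_corpus"])              # 640000
print(f"{summary['spoken_share']:.1%}")   # 3.9%
print(summary["africa_corpus_total"])     # 3200000
```

Since the input counts are themselves approximate ("c."), the derived totals are approximate too.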

The written subpart is divided into newspapers (50%), literature (20%) and miscellaneous texts (26%). The five corpora are thus comparable in size, in chronology and in broad text types and genres, with the exception of the miscellaneous subpart, which includes texts of very different types, such as literary or social magazines, computer policies and tourism information.

The spoken subpart of the corpus consists of interview recordings that were later orthographically transcribed into text files. The corpora have been automatically annotated with information on word category and inflection (noun, verb, tense, etc.) and with the lemma of each word (an arbitrarily chosen form among the possible inflected forms of the word, e.g. the masculine singular for adjectives).
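
To illustrate what this annotation adds, the sketch below models one annotated token as a record with its form, word category, inflection and lemma, and shows how the lemma groups several inflected forms of an adjective under its masculine singular. The tag values ("ADJ", "f.pl", etc.) and the `Token` class are hypothetical; they do not reproduce the actual Africa Corpus tag set or file format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str        # word form as it occurs in the text
    category: str    # word category, e.g. "N", "V", "ADJ" (hypothetical tag set)
    inflection: str  # inflectional features, e.g. "f.pl" (hypothetical notation)
    lemma: str       # citation form standing for all inflected forms of the word


# Hypothetical annotated tokens: four inflected forms of one adjective.
tokens = [
    Token("bonito", "ADJ", "m.sg", "bonito"),
    Token("bonita", "ADJ", "f.sg", "bonito"),
    Token("bonitos", "ADJ", "m.pl", "bonito"),
    Token("bonitas", "ADJ", "f.pl", "bonito"),
]

form_counts = Counter(t.form for t in tokens)    # 4 distinct forms, 1 occurrence each
lemma_counts = Counter(t.lemma for t in tokens)  # all grouped under the masculine singular
print(len(form_counts), lemma_counts["bonito"])  # 4 4
```

Counting by lemma rather than by form is what allows all inflected forms of a word to be treated as one lexical item.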

Analysis of the corpus showed that, in most written texts, the African varieties of Portuguese differ little from the European norm, whereas the transcriptions of the spoken register differ in several respects. The most obvious one is the absence of agreement between subject and verb and between the elements of the noun phrase.
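
On annotated data, the missing agreement described above can in principle be detected by comparing the number features of related words. The sketch below assumes each word carries a hypothetical number feature ("sg"/"pl"); it checks number only (person is ignored), and the examples are constructed for illustration, not taken from the Africa Corpus.

```python
from typing import NamedTuple


class Word(NamedTuple):
    form: str
    pos: str     # hypothetical tags: "DET", "N", "PRON", "V"
    number: str  # "sg" or "pl"


def np_agrees(noun_phrase: list[Word]) -> bool:
    """True if all elements of the noun phrase carry the same number."""
    return len({w.number for w in noun_phrase}) <= 1


def subject_verb_agrees(subject: Word, verb: Word) -> bool:
    """Number agreement only; person is ignored in this simplified sketch."""
    return subject.number == verb.number


# Constructed examples, not taken from the Africa Corpus.
standard_np = [Word("os", "DET", "pl"), Word("livros", "N", "pl")]     # 'the books'
nonstandard_np = [Word("os", "DET", "pl"), Word("livro", "N", "sg")]  # plural on the determiner only

print(np_agrees(standard_np))     # True
print(np_agrees(nonstandard_np))  # False
print(subject_verb_agrees(Word("eles", "PRON", "pl"), Word("fala", "V", "sg")))  # False
```

A real check would also need to identify noun phrases and subjects reliably, which automatic annotation of spoken transcriptions does not guarantee.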

For the lexical analysis, five lexicons were extracted (one from each corpus), then compared and processed statistically into contrastive lists with frequency and distribution data, which are also available for online query.
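
A contrastive list of this kind can be sketched as follows: one frequency lexicon per variety, then one row per word form giving its frequency in each variety and its distribution, here taken to mean the number of varieties in which it occurs. This is an assumed reading of "frequency and distribution", and the token lists are invented for two hypothetical varieties.

```python
from collections import Counter


def build_lexicons(corpora: dict[str, list[str]]) -> dict[str, Counter]:
    """One frequency lexicon (word form -> frequency) per variety."""
    return {variety: Counter(tokens) for variety, tokens in corpora.items()}


def contrastive_list(lexicons: dict[str, Counter]) -> list[tuple[str, dict[str, int], int]]:
    """For every word form: its frequency in each variety and its distribution,
    i.e. the number of varieties in which it occurs at least once."""
    varieties = sorted(lexicons)
    vocabulary = set().union(*lexicons.values())
    rows = []
    for word in sorted(vocabulary):
        # Counter returns 0 for a missing key, so absent words get frequency 0.
        freqs = {v: lexicons[v][word] for v in varieties}
        distribution = sum(1 for f in freqs.values() if f > 0)
        rows.append((word, freqs, distribution))
    return rows


# Invented token lists for two hypothetical varieties.
toy_corpora = {
    "variety A": ["a", "casa", "a", "escola"],
    "variety B": ["a", "casa", "mercado"],
}
for word, freqs, distribution in contrastive_list(build_lexicons(toy_corpora)):
    print(word, freqs, distribution)
```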


Both the vocabulary common to the five corpora and the vocabulary specific to each variety have been extracted, although a word's absence from one variety should be confirmed once larger corpora become available. The lexicon common to all five varieties makes up a relatively small share of the vocabulary (26%), but it contains high-frequency words and therefore accounts for 91.75% of all word forms in the corpus.
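
The contrast between a small common vocabulary and its large coverage can be illustrated by computing two different shares. The sketch below assumes that the 26% figure is a share of distinct word forms (types) and the 91.75% figure a share of running words (tokens); the frequency lexicons are invented, and the function names are hypothetical.

```python
from collections import Counter


def common_vocabulary(lexicons: dict[str, Counter]) -> set[str]:
    """Word forms that occur in every variety."""
    return set.intersection(*(set(lex) for lex in lexicons.values()))


def type_share(common: set[str], lexicons: dict[str, Counter]) -> float:
    """Common word forms as a share of all distinct word forms (types)."""
    all_types = set().union(*lexicons.values())
    return len(common) / len(all_types)


def token_coverage(common: set[str], lexicons: dict[str, Counter]) -> float:
    """Occurrences of common word forms as a share of all running words (tokens)."""
    total = sum(sum(lex.values()) for lex in lexicons.values())
    covered = sum(lex[w] for lex in lexicons.values() for w in common)
    return covered / total


# Invented frequency lexicons for two hypothetical varieties.
lexicons = {
    "variety A": Counter({"a": 5, "casa": 2, "escola": 1}),
    "variety B": Counter({"a": 4, "casa": 3, "mercado": 1}),
}
common = common_vocabulary(lexicons)
print(sorted(common))                    # ['a', 'casa']
print(type_share(common, lexicons))      # 0.5
print(token_coverage(common, lexicons))  # 0.875
```

Because the shared words are the frequent ones, their share of tokens is much higher than their share of types, which is the pattern the figures above describe.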

The words occurring in only one of the corpora (37%) have low frequencies. Many of them are not attested in European Portuguese (as checked against dictionaries), but they result from word-formation processes using stems and affixes that are available in the European variety (e.g. verbs formed with the prefix des- 'un-': destrabalhar 'to unwork', desinventar 'to uninvent').
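
One way to shortlist such formations is sketched below: first collect the word forms that occur in exactly one variety, then keep those that consist of des- plus a base found in a reference word list while being absent from that list themselves. The reference list stands in for the dictionary check mentioned above; this heuristic, the function names and all data are hypothetical, and any candidate would still need manual validation.

```python
from collections import Counter


def variety_specific(lexicons: dict[str, Counter]) -> dict[str, set[str]]:
    """Word forms that occur in exactly one variety, grouped by variety."""
    n_varieties = Counter(w for lex in lexicons.values() for w in set(lex))
    return {v: {w for w in lex if n_varieties[w] == 1} for v, lex in lexicons.items()}


def des_candidates(words: set[str], reference: set[str]) -> set[str]:
    """Hypothetical heuristic: words formed with the prefix des- on a base found
    in a reference (European Portuguese) word list, while the derived word
    itself is absent from that list."""
    return {
        w for w in words
        if w.startswith("des") and w[3:] in reference and w not in reference
    }


# Invented data for two hypothetical varieties and a toy reference list.
lexicons = {
    "variety A": Counter({"a": 3, "trabalhar": 2, "destrabalhar": 1, "desfazer": 1}),
    "variety B": Counter({"a": 2, "trabalhar": 1, "inventar": 1, "desinventar": 1}),
}
reference = {"a", "trabalhar", "inventar", "fazer", "desfazer"}

specific = variety_specific(lexicons)
only_one = specific["variety A"] | specific["variety B"]
print(sorted(des_candidates(only_one, reference)))  # ['desinventar', 'destrabalhar']
```

Here desfazer occurs in only one variety but is excluded because it is already in the reference list.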

Verb complementation, at the lexicon-syntax interface, is another area where the corpus data diverge from the European standard. For example, complements that are indirect or prepositional objects in European Portuguese occur as direct objects (leading to double object constructions, which European Portuguese does not allow), and vice versa; verb complements are introduced by different prepositions; and pronominal constructions occur as non-pronominal ones.

These corpora have made it possible to observe that the general tendencies already described for the Portuguese spoken in Mozambique are also present in the other African varieties, although each variety shows some idiosyncrasies. Most of the properties in which the African varieties of Portuguese differ from the European norm are still emerging and show strong variation. Compiling more resources and analysing them over time will make it possible to identify more stable tendencies of language change, both across the varieties and within each one.

14.01.09

Amália Mendes, Centro de Linguística da Universidade de Lisboa


 
Copyright © EURAC 2010