

Digital Natives - Digital Immigrants. Writing on Social Network Sites: a corpus-based observation of the current language use in South Tyrol, with particular consideration of the writers' age

Description of the project

In the DiDi project we analysed the linguistic strategies employed by users of social network sites (SNS). The data analysis focused on South Tyrolean users and investigated how they communicate with each other. In regions of the German-speaking area where dialect is frequently used in different communicative contexts, regional and social codes are often also used in written computer-mediated communication (CMC). Another interesting but more general aspect of the new media concerns the emerging linguistic and social practices (new literacy). One of the main research questions in DiDi was whether people of different ages use language on SNS in a similar way or in an age-specific manner.

The purpose of the study was:

  1. to record the contemporary language use of South Tyrolean German in the new media (cf. the DiDi Corpus)
  2. to describe the everyday language use of South Tyrolean SNS users with L1 German, with respect to their choice of languages and varieties and their use of specific CMC phenomena.

Please see the publications for detailed descriptions of the project and its results.

The project was financed by the Autonome Provinz Bozen - Südtirol, Abteilung Bildungsförderung, Universität und Forschung, Landesgesetz vom 13. Dezember 2006, Nr. 14 "Forschung und Innovation" / Provincia autonoma di Bolzano - Alto Adige, Ripartizione Diritto allo studio, università e ricerca scientifica, Legge provinciale 13 dicembre 2006, n. 14 "Ricerca e innovazione".

DiDi Corpus

The DiDi corpus has an overall size of around 650,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 private messages. All messages were written by the participants in the course of 2013. Please read the full description of the corpus for further details, and also consider the description of the method of data collection and the full description of the DiDi project and its research questions.

As every participant could offer his/her private messages, his/her texts on the wall, or both, the corpus comprises wall posts and wall comments from 130 profiles and private messages from 56 profiles; 50 participants granted access to both types of data. The wall posts and comments are freely accessible. For privacy reasons, access to the private messages is restricted: it can be granted for scientific research only, after signing a non-disclosure agreement. If you are interested in the data for scientific purposes, please contact the research team.
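The figures quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable and function names are our own and are not part of the corpus or its tooling. It adds up the three text types and applies inclusion-exclusion to the profile counts, which should reproduce the 136 participants mentioned above.

```python
# Figures taken from the corpus description; all names are illustrative only.
WALL_POSTS = 11_102
WALL_COMMENTS = 6_507
PRIVATE_MESSAGES = 22_218

PROFILES_WITH_WALL_TEXTS = 130
PROFILES_WITH_PRIVATE_MESSAGES = 56
PROFILES_WITH_BOTH = 50


def total_texts() -> int:
    """All texts in the corpus, regardless of access level."""
    return WALL_POSTS + WALL_COMMENTS + PRIVATE_MESSAGES


def freely_accessible_texts() -> int:
    """Wall posts and wall comments, i.e. the part with free access."""
    return WALL_POSTS + WALL_COMMENTS


def total_participants() -> int:
    """Inclusion-exclusion: profiles that granted both types are counted once."""
    return (PROFILES_WITH_WALL_TEXTS
            + PROFILES_WITH_PRIVATE_MESSAGES
            - PROFILES_WITH_BOTH)


if __name__ == "__main__":
    print("texts in total:", total_texts())
    print("freely accessible texts:", freely_accessible_texts())
    print("participants:", total_participants())
```

Under these figures, the freely accessible part covers 17,609 of the 39,827 texts, and the participant count matches the 136 users stated in the corpus description.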

All texts were anonymised in order to guarantee that the participants' identity cannot be inferred from the texts. The anonymisation covered personal names, group names, geographical names and adjectival references, institution names, hyperlinks, mail addresses, phone numbers, bank account numbers, servers, postal codes and other private information. Please read the anonymisation document for the anonymisation keys.

The corpus offers a vast range of research opportunities for linguists who are interested in CMC in general and, more specifically, in multilingual language use, the use of regional varieties, code switching, code shifting and code mixing phenomena, etc.

Access to the DiDi corpus: https://commul.eurac.edu/annis/didi


Please consider the following documents regarding the DiDi corpus:

Description of the DiDi corpus (pdf)

Description of the anonymisation tags (pdf)

Description of the annotation layers (pdf)

Description of the annotations on the layer cmc (pdf)

Description of the metadata (pdf)


Please also consider the following documents regarding the DiDi project:

Description of the DiDi project and its research questions (pdf)

Description of the method of data collection (pdf)

Publications, Presentations, Events


Glaznieks, Aivars & Egon W. Stemle (2014): Challenges of building a cmc corpus for analyzing writer's style by age: The DiDi project. In: Journal of Language Technology and Computational Linguistics 29 (2), 31-57, Special Issue on Building and Annotating Corpora of Computer-Mediated Communication: Issues and Challenges at the Interface of Corpus and Computational Linguistics, Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel (eds). (pdf)

Frey, Jennifer-Carmen, Egon W. Stemle & Aivars Glaznieks (2014): Collecting language data of non-public social media profiles. In: Workshop Proceedings of the 12th Edition of the KONVENS Conference, Gertrud Faaß & Josef Ruppenhofer (eds.), Hildesheim: Universitätsverlag, 11-15. (pdf)

Frey, Jennifer-Carmen, Aivars Glaznieks & Egon W. Stemle (2015): The DiDi Corpus of South Tyrolean CMC Data. In: 2nd Workshop of the Natural Language Processing for Computer-Mediated Communication / Social Media. Proceedings of the Workshop, University of Duisburg-Essen, September 28, 2015, Michael Beißwenger & Torsten Zesch (eds.), 1-6. (pdf)

Frey, Jennifer-Carmen, Aivars Glaznieks & Egon W. Stemle (2016): The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts. In: Proceedings of the Third Italian Conference on Computational Linguistics CLiC-it 2016, 5-6 December 2016, Napoli, Anna Corazza, Simone Montemagni & Giovanni Semeraro (eds.), Torino: Academia University Press, 157-161. (pdf)

Glaznieks, Aivars (2017): Think Global, Write Local – Patterns of Writing Dialect on SNS. In: Proceedings of the 5th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora17), 3-4 October 2017, Bolzano, Italy. (pdf)

Glaznieks, Aivars & Jennifer-Carmen Frey (accepted): Dialekt als Norm? Zum Sprachgebrauch Südtiroler Jugendlicher auf Facebook. In: Jugendsprachen. Aktuelle Perspektiven internationaler Forschung, Arne Ziegler (ed.). Berlin: de Gruyter.



Glaznieks, Aivars (2017): Think global, write local – patterns of writing dialect on SNS. Keynote Talk at 5th CMC Corpora Conference, 3-4 October 2017, Eurac Research Bolzano.

Glück, Alexander & Aivars Glaznieks (2017): Geschriebener Dialekt in Südtiroler Facebooktexten. 8. Kolloquium Forum Sprachvariation der Internationalen Gesellschaft für Dialektologie des Deutschen (IGDD) und 6. Nachwuchskolloquium des Vereins für niederdeutsche Sprachforschung (VndS), 4-6 October 2017, University of Hamburg.

Glaznieks, Aivars (2016): Dialekt als Norm? Zum Sprachgebrauch Südtiroler Jugendlicher auf Facebook. International conference „Jugendsprachen 2016: Variation – Dynamik – Kontinuität“, 26.-28. May 2016, University of Graz.

Stemle, Egon W. (2015): The DiDi Project: Collecting, Annotating, and Analysing South Tyrolean Data of Computer-mediated Communication. International Research Days "Social Media and CMC Corpora for the eHumanities", 23.-24. October 2015, Université Rennes 2.

Frey, Jennifer-Carmen, Aivars Glaznieks & Egon W. Stemle (2015): The DiDi Corpus of South Tyrolean CMC Data. 2. workshop NLP 4 CMC at the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL 2015), 29. September 2015, University of Duisburg-Essen.

Glaznieks, Aivars & Jennifer-Carmen Frey (2015): Variation und Konsistenz der Dialektschreibung in der Südtiroler Facebook-Kommunikation. 50. Linguistisches Kolloquium "Sprache verstehen, verwenden, übersetzen", 3.-5. September 2015, University of Innsbruck.

Glaznieks, Aivars & Jennifer-Carmen Frey (2015): "Bitte deutsch schreiben!" Multilingual and diglossic – a linguistic description of South Tyrolean Facebook users. International conference "Multilingualism in the Digital Age", 19. June 2015, University of Reading.

Glaznieks, Aivars (2015): The DiDi Project: Collecting and Analyzing German data of Computer-mediated Communication. Digital Humanities Seminar, 29. January 2015, Fondazione Bruno Kessler Povo.

Stuckey, Nicole & Jennifer-Carmen Frey (2014): Code-Switching on Facebook Wall Posts of Bilingual German-Speaking South Tyroleans. Workshop "Empirische Methoden der Variationslinguistik" at ÖLT (41. Österreichische Linguistiktagung), 7. December 2014, Vienna University of Economics and Business.

Frey, Jennifer-Carmen, Aivars Glaznieks & Egon W. Stemle (2014): Collecting language data of non-public social media profiles. 1. workshop NLP 4 CMC at KONVENS 2014, 6. October 2014, University of Hildesheim.

Glaznieks, Aivars & Jennifer-Carmen Frey (2014): Wie schreibt Südtirol auf Facebook? 1. Linguistic Colloquium LRI "Lingua, regione e identità nella comunicazione mediata dal computer", 13.-14. June 2014, Villa San Marco, Merano.

Glaznieks, Aivars, Egon W. Stemle & Jennifer-Carmen Frey (2014): Nützlichkeit der Normalisierung von Dialekt für automatische Verarbeitungsschritte. Erfahrungen aus dem Projekt DiDi. 7. workshop of the scientific DFG network Empirikom "Social Media Corpora for the eHumanities: Standards, Challenges, and Perspectives", 19.-21. February 2014, TU Dortmund.

Glaznieks, Aivars & Egon W. Stemle (2013): Herausforderungen bei der automatischen Verarbeitung von dialektalen IBK-Daten. Workshop "Verarbeitung und Annotation von Sprachdaten aus Genres internetbasierter Kommunikation" at the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL 2013), 23. September 2013, TU Darmstadt.

Abel, Andrea & Aivars Glaznieks (2013): Schulaufsätze der Generation Facebook: Schreiben wie man postet? Conference "Generation Facebook - Sprache und Soziale Medien", 1. March 2013, EURAC Bolzano.

Glaznieks, Aivars & Egon W. Stemle (2013): The Project DIDI. Writing on Social Network Sites – A Corpus-based Observation of the Current Language Use in South Tyrol, with Particular Consideration of the Writers’ Age. Workshop "Building Corpora of Computer-Mediated Communication: Issues, Challenges, and Perspectives", 13.-15. February 2013, TU Dortmund.



Round table discussion "Jung und Alt - Schreiben in Zeiten von Facebook".
Participants: Dr. Eva Cescutti, Dr. Aivars Glaznieks, Anne-Bärbel Köhle and Carla Thuile
Moderator: Monika Obrist
10 November 2015, 20:00
Biblioteca Provinciale Dr. Friedrich Teßmann, Armando-Diaz-Straße 8, Bozen

Seminar: "Sprache, Region und Identität in der computervermittelten Kommunikation", 13.-14. June 2014, Villa San Marco, Merano. Organised by the Institute for Specialised Communication and Multilingualism of EURAC, the Center for Language Studies of the Free University of Bolzano and the Graduate School Language & Literature of the Ludwig Maximilian University of Munich.

Conference: "Generation Facebook - Sprache und Soziale Medien", 1. March 2013, EURAC Bolzano. Organised by the Sprachstelle im Südtiroler Kulturinstitut, the Institute for Specialised Communication and Multilingualism of EURAC, the Katholischer Südtiroler Lehrerbund and the Gesellschaft für Deutsche Sprache (Bolzano).

Contacts

Aivars Glaznieks
+39 0471 055 139
Jennifer-Carmen Frey
+39 0471 055 136
Egon W. Stemle
+39 0471 055 129
Drususallee 1/Viale Druso 1
Tel. +39 0471 055 100
Fax. +39 0471 055 199


