DIDI

Digital Natives - Digital Immigrants. Writing on Social Network Sites: a corpus-based observation of the current language use in South Tyrol, with particular consideration of the writers' age

  • Project duration: -
  • Project status: finished
  • Funding:
    Provincial P.-L.P. 14. Research projects (Province BZ funding /Project)
  • Total project budget: €200,392.20
  • Institute: Institute for Applied Linguistics

In the DiDi project we analysed the linguistic strategies employed by users of social network sites (SNS). The data analysis focused on South Tyrolean users and investigated how they communicate with each other. In regions of the German-speaking area where dialect is frequently used in different communicative contexts, regional and social codes are often also used in written computer-mediated communication (CMC). Another interesting but more general aspect of the new media concerns the emerging linguistic and social practices (new literacy). One of the main research questions in DiDi was whether people of different ages use language on SNS in a similar way or in an age-specific manner.

The purpose of the study was:
1. to record the contemporary language use of South Tyrolean German in the new media (cf. the DiDi Corpus)
2. to describe the everyday language use of South Tyrolean SNS users with L1 German, with respect to their choice of languages and varieties and to their use of specific CMC phenomena.

Please see the "Publications" for detailed descriptions of the project and its results.

The DiDi Corpus

The DiDi corpus has an overall size of around 650,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 private messages. All messages were written by the participants during the year 2013. Please read the full description of the corpus for further details. Please also consult the description of the data collection method and the full description of the DiDi project and its research questions.

As every participant could offer their private messages, their texts on the wall, or both, the corpus comprises wall posts and wall comments from 130 profiles and private messages from 56 profiles; 50 participants granted access to both types of data. The wall posts and comments are freely accessible. Due to privacy issues, access to the private messages is restricted: it can be granted for scientific research only, after signing a non-disclosure agreement. If you are interested in the data for scientific purposes, please contact the research team.
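The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers given in this description; the variable names are illustrative and are not part of the corpus documentation.

```python
# Figures quoted in the corpus description above.
wall_profiles = 130     # profiles contributing wall posts and wall comments
message_profiles = 56   # profiles contributing private messages
both = 50               # participants who granted access to both data types

# Inclusion-exclusion: count every participant exactly once.
participants = wall_profiles + message_profiles - both

texts = {
    "wall posts": 11_102,
    "wall comments": 6_507,
    "private messages": 22_218,
}
total_texts = sum(texts.values())
freely_accessible = texts["wall posts"] + texts["wall comments"]

print(participants)       # 136
print(total_texts)        # 39827
print(freely_accessible)  # 17609
```

The participant count matches the 136 users stated for the corpus. Of the 39,827 texts, 17,609 (wall posts and comments) fall under free access.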

All texts were anonymised to ensure that the participants' identity cannot be inferred from the texts. The anonymisation covered person names, group names, geographical names and adjectival references, institution names, hyperlinks, mail addresses, phone numbers, bank account numbers, servers, postal codes and other private information. Please read the anonymisation document for the anonymisation keys.
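To illustrate what placeholder-based anonymisation can look like for the pattern-like categories listed above (hyperlinks, mail addresses, phone numbers), here is a minimal sketch. It is not the procedure used in the DiDi project: the placeholder keys `[LINK]`, `[MAIL]` and `[PHONE]`, the regular expressions and the sample sentence are all assumptions made for illustration. The actual keys are defined in the anonymisation document, and categories such as person or place names cannot be handled reliably by patterns like these.

```python
import re

# Hypothetical placeholder keys and patterns (assumptions, not the DiDi keys).
PATTERNS = [
    (re.compile(r"https?://\S+"), "[LINK]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[MAIL]"),
    (re.compile(r"\+?\d[\d /-]{6,}\d"), "[PHONE]"),
]


def anonymise(text: str) -> str:
    """Replace each pattern match with its placeholder, in list order."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Invented sample text, not taken from the corpus.
sample = (
    "Schreib mir an max@example.com oder ruf an: +39 0471 123456, "
    "Infos auf https://example.com/info"
)
print(anonymise(sample))
# Schreib mir an [MAIL] oder ruf an: [PHONE], Infos auf [LINK]
```

Hyperlinks are replaced first so that digits or `@` signs inside a URL are not picked up by the later patterns.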

The corpus offers a wide range of research opportunities for linguists who are interested in CMC in general and, more specifically, in multilingual language use, the use of regional varieties, and code-switching, code-shifting and code-mixing phenomena.

Access to the DiDi corpus via ANNIS: https://commul.eurac.edu/annis/didi

Corpus download via Eurac Research Clarin Centre: https://clarin.eurac.edu/

Publications
Das DiDi‐Korpus: Internetbasierte Kommunikation aus Südtirol
Glaznieks A, Frey JC (2020)
Contribution in book
Deutsch in Sozialen Medien

https://doi.org/10.1515/9783110679885-019

https://hdl.handle.net/10863/15720

Using Data Mining to Repurpose German Language Corpora. An evaluation of data-driven analysis methods for corpus linguistics
Frey J (2020)
PhD thesis

https://hdl.handle.net/10863/17321

DIDI - The DiDi Corpus of South Tyrolean CMC 1.0.0
Frey JC, Glaznieks A, Stemle EW (2019)
Database

More information: http://hdl.handle.net/20.500.12124/7

How FAIR are CMC Corpora?
König A, Frey JC, Stemle EW (2019)
Presentation/Speech

Conference: 7th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora19) | Cergy-Pontoise | 9.9.2019 - 10.9.2019

https://hdl.handle.net/10863/11295

Comparison of Automatic vs. Manual Language Identification in Multilingual Social Media Texts
Frey JC, Stemle E, Doğruöz AS (2019)
Contribution in book
Building computer-mediated communication corpora for socio-linguistic analysis

https://hdl.handle.net/10863/10130

How FAIR are CMC corpora?
Frey JC, König A, Stemle E (2019)
Conference proceedings article

Conference: 7th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora19) | Cergy-Pontoise | 9.9.2019 - 10.9.2019

More information: https://cmccorpora19.sciencesconf.org/data/pages/proceedings ...

https://hdl.handle.net/10863/11294

Das DiDi-Korpus: internetbasierte Kommunikation aus Südtirol
Frey J, Glaznieks A (2019)
Presentation/Speech

Conference: 55. Jahrestagung des Instituts für Deutsche Sprache | Mannheim | 12.3.2019 - 14.3.2019

https://hdl.handle.net/10863/13382

The myth of the Digital Native? Analysing language use of different generations in Facebook
Frey JC, Glaznieks A (2018)
Conference proceedings article
Der plurilinguale Sprecher in Facebook. Neue Medien und Pluriliteracy in Südtirol
Frey JC (2018)
Presentation/Speech

Conference: 4th LRI Workshop for young academics "Language Policy - Language Use - Language Standard" | Meran | 7.6.2018 - 8.6.2018

Becoming a multilingual speaker. New Media and pluriliteracy in South Tyrol
Frey JC (2018)
Presentation/Speech

Conference: Round table "Social Net(work)s in Education and Language Sciences" | Heidelberg | 15.6.2018 - 15.6.2018

Pluriliteracy on Social Media. The Multilingual Practices of South Tyroleans on Facebook
Frey JC (2018)
Presentation/Speech

Conference: Language, Identity and Education in Multilingual Contexts | Dublin | 2.2.2018 - 4.2.2018

The myth of the Digital Native: Analysing language use of different generations on Facebook
Frey JC, Glaznieks A (2018)
Presentation/Speech

Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

Sociolinguistic research using the DiDi corpus of South Tyrolean CMC: From corpus-based research designs to computational linguistic challenges
Frey CF, Stemle EW, Glaznieks A (2018)
Presentation/Speech

Conference: 44. Österreichische Linguistiktagung 2018 (ÖLT2018) | Innsbruck | 26.10.2018 - 28.10.2018

Experteninterview: Wie viel "Emojion" verträgt unsere Sprache?
Abel A, Frey JC (2018)
Newspaper
Zett: Die Zeitung am Sonntag
Dialekt als Norm? Zum Sprachgebrauch Südtiroler Jugendlicher auf Facebook
Glaznieks A, Frey JC (2018)
Contribution in book
Jugendsprachen/Youth Languages: Aktuelle Perspektiven internationaler Forschung/Current Perspectives of International Research

https://doi.org/10.1515/9783110472226-038

https://hdl.handle.net/10863/7699

The Myth of the Digital Native: Analysing language use of different generations on Facebook
Frey JC, Glaznieks A (2018)
Conference proceedings article

Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

More information: https://www.uantwerpen.be/images/uantwerpen/container49896/f ...

https://hdl.handle.net/10863/8093

Think Global, Write Local – Patterns of Writing Dialect on SNS
Glaznieks A (2017)
Presentation/Speech
Geschriebener Dialekt in Südtiroler Facebooktexten
Glück A, Glaznieks A (2017)
Presentation/Speech
A data mining approach to digital age
Frey J (2017)
Forlì
Presentation/Speech

Conference: DIT Postgraduate Research Workshop | Forlì | 6.7.2016 - 6.7.2016

Think Global, Write Local: Patterns of Writing Dialect on SNS
Glaznieks A (2017)
Conference proceedings article

https://doi.org/10.5281/zenodo.1041851

https://hdl.handle.net/10863/7939

Proceedings of the 5th Conference on CMC and Social Media Corpora for the Humanities
Stemle E, Wigham C (2017)
Bolzano: Eurac Research
Edited book

More information: https://zenodo.org/record/1040875

https://doi.org/10.5281/zenodo.1040875

https://hdl.handle.net/10863/6510

Connecting Resources: Which Issues have to be Solved to Integrate CMC Corpora from Heterogeneous Sources and for Different Languages?
Beißwenger M, Wigham CR, Etienne C, Fišer D, Suárez HG, Herzberg L, Hinrichs E, Horsmann T, Karlova-Bourbonus N, Lemnitzer L, Longhi J, Lüngen H, Ho-Dac L, Parisse C, Poudat C, Schmidt T, Stemle E, Storrer A, Zesch T (2017)
Bolzano, Italy
Conference proceedings article
Proceedings of the 5th Conference on CMC and Social Media Corpora for the Humanities

More information: https://zenodo.org/record/1041877

https://doi.org/10.5281/zenodo.1041877

https://hdl.handle.net/10863/7942

DiDi Corpus
Stemle EW (2017)
Duisburg, Germany
Presentation/Speech

Conference: Integrating a new type of language resource into the Digital Humanities landscape: French-German colloquium on standards for corpora of computer-mediated communication | Duisburg | 19.6.2017 - 20.6.2017

More information: https://sites.google.com/view/dhcmc2017/

https://hdl.handle.net/10863/9186

Mehrsprachigkeit auf Südtirols Social-Media-Profilen
Frey J (2016)
Bozen/Bolzano
Presentation/Speech

Conference: Work in Progress Linguistics Colloquium Eurac Research/Free University of Bolzano | Bozen | 11.6.2015 - 11.6.2015

The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts
Frey J, Glaznieks A, Stemle EW (2016)
Naples
Presentation/Speech

Conference: Third Italian Conference on Computational Linguistics (CliC-it 2016) | Naples | 5.12.2016 - 6.12.2016

DiDi: A multilingual corpus of non-public South Tyrolean computer-mediated communication
Frey J (2016)
Lancaster
Presentation/Speech

Conference: UCREL Summer School in corpus-based NLP | Lancaster | 10.7.2016 - 15.7.2016

The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts
Frey J, Glaznieks A, Stemle EW (2016)
Naples
Conference proceedings article

Conference: Third Italian Conference on Computational Linguistics (CliC-it 2016) | Naples | 5.12.2016 - 6.12.2016

More information: http://ceur-ws.org/Vol-1749/paper27.pdf

https://hdl.handle.net/10863/8949

"Bitte deutsch schreiben!" Multilingual and diglossic - a linguistic description of South Tyrolean Facebook users
Glaznieks A, Frey JC (2015)
Presentation/Speech

Conference: Multilingualism in the Digital Age | Reading | 19.6.2015 - 19.6.2015

The DiDi Corpus of South Tyrolean CMC Data
Frey J, Glaznieks A, Stemle EW (2015)
Essen
Presentation/Speech

Conference: 2nd Workshop of the Natural Language Processing for Computer-Mediated Communication / Social Media: NLP4CMC at GSCL 2015 | Essen | 28.9.2015 - 29.9.2015

The DiDi Project: Collecting, Annotating, and Analysing South Tyrolean Data of Computer-mediated Communication.
Stemle EW (2015)
Rennes
Presentation/Speech

Conference: ird-cmc-rennes: International Research Days: Social Media and CMC Corpora for the eHumanities | Rennes | 23.10.2015 - 24.10.2015

More information: http://ird-cmc-rennes.sciencesconf.org/

https://hdl.handle.net/10863/9187

The DiDi Corpus of South Tyrolean CMC Data
Frey J, Glaznieks A, Stemle EW (2015)
Essen
Conference proceedings article

Conference: 2nd Workshop of the Natural Language Processing for Computer-Mediated Communication / Social Media: NLP4CMC at GSCL 2015 | Essen | 28.9.2015 - 29.9.2015

https://hdl.handle.net/10863/8928

Zum Projekt DiDi - Digital Natives - Digital Immigrants
Frey J (2014)
Bozen/Bolzano
Radio-TV
Wie schreibt Südtirol auf Facebook?
Frey JC (2014)
Presentation/Speech

Conference: 1. LRI Workshop "Sprache - Region - Identität in der computervermittelten Kommunikation" | Meran | 13.6.2014 - 14.6.2014

Code-Switching on Facebook Wall Posts of Bilingual German-speaking South Tyroleans
Stuckey N, Frey J (2014)
Vienna
Presentation/Speech

Conference: 41. Österreichische Linguistiktagung (ÖLT 2014), Universität Wien | Vienna | 6.12.2014 - 8.12.2014

Collecting language data of non-public social media profiles
Frey J, Glaznieks A, Stemle EW (2014)
Hildesheim
Presentation/Speech

Conference: Workshop “NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media” at the 12th edition of KONVENS | Hildesheim | 8.10.2014 - 10.10.2014

Collecting language data of non-public social media profiles
Frey J, Stemle EW, Glaznieks A (2014)
Hildesheim: Universitätsverlag Hildesheim, Germany
Conference proceedings article

Conference: Workshop “NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media” at the 12th edition of KONVENS | Hildesheim | 8.10.2014 - 10.10.2014

More information: http://www.uni-hildesheim.de/konvens2014/data/konvens2014-wo ...

https://hdl.handle.net/10863/8891

The Project DIDI. Writing on Social Network Sites – A Corpus-based Observation of the Current Language Use in South Tyrol, with Particular Consideration of the Writers' Age
Glaznieks A, Stemle EW (2013)
Dortmund
Presentation/Speech

Conference: International Workshop "Building Corpora of Computer-Mediated Communication: Issues, Challenges, and Perspectives" | Dortmund | 14.2.2013 - 15.2.2013

Herausforderungen bei der automatischen Verarbeitung von dialektalen IBK-Daten
Glaznieks A, Stemle EW (2013)
Darmstadt
Presentation/Speech

More information: https://www.researchgate.net/publication/259344920_Herausfor ...

Our partners
  • Südtiroler Kulturinstitut

Project Team
Nicole Stuckey

Team Member
