DIDI

Digital Natives - Digital Immigrants. Writing on Social Network Sites: A Corpus-based Observation of the Current Language Use in South Tyrol, with Particular Consideration of the Writers' Age

Project duration: May 2013 - December 2019
Project status: finished
Funding: Provincial P.-L.P. 14. Research projects (Province BZ funding / Project)
Total project budget: €200,392.20
Institute: Institut für Angewandte Sprachforschung

The research project was devoted to writing in the private sphere. Its aim was to observe the strategies of users of online networks from a perspective restricted to the province of South Tyrol, and to investigate how users of new media communicate in writing in a relatively informal, fast and at times almost simultaneous manner. In dialect-shaped regions of the German-speaking area, the use of dialect in a medium reserved for the standard language is a striking aspect, but not the only interesting one. The main focus of the study was the question of the extent to which people of different ages communicate over the Internet in the same way. Based on texts produced on Social Network Sites (SNS), the project examined how South Tyrolean users employ German, in both its standard and dialectal varieties, in written form for communicative purposes, and which particular features characterise language use in the new media. The goals were (a) to document the current use of German in new media in South Tyrol in a freely accessible corpus (https://commul.eurac.edu/annis/didi) and (b) to address the competences of the writers.

The DiDi Corpus

The DiDi corpus comprises around 650,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 private messages, all written by the participants during 2013. For further details, please read the full description of the corpus, the description of the data collection method, and the full description of the DiDi project and its research questions.

Because each participant could contribute private messages, wall texts, or both, the corpus comprises wall posts and wall comments from 130 profiles and private messages from 56 profiles; 50 participants granted access to both types of data. The wall posts and comments are freely accessible. For privacy reasons, access to the private messages is restricted: it can be granted for scientific research only, after signing a non-disclosure agreement. If you are interested in the data for scientific purposes, please contact the research team.
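The participant and text counts can be cross-checked with simple arithmetic. The following minimal Python sketch uses only the figures quoted in the corpus description; the variable names are illustrative and not part of the corpus documentation.

```python
# Text counts as quoted in the corpus description.
wall_posts = 11_102
wall_comments = 6_507
private_messages = 22_218
total_texts = wall_posts + wall_comments + private_messages

# Profile counts as quoted in the corpus description.
wall_profiles = 130      # profiles contributing wall posts and comments
message_profiles = 56    # profiles contributing private messages
both_types = 50          # profiles contributing both types of data

# Inclusion-exclusion: profiles counted in both groups are subtracted once.
participants = wall_profiles + message_profiles - both_types
wall_only = wall_profiles - both_types
messages_only = message_profiles - both_types

print(total_texts)    # 39827
print(participants)   # 136
print(wall_only)      # 80
print(messages_only)  # 6
```

The result of 136 matches the number of participants stated for the corpus: 80 contributed wall texts only, 6 contributed private messages only, and 50 contributed both.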

All texts were anonymised to guarantee that the participants' identity cannot be inferred from the texts. The anonymisation covered personal names, group names, geographical names and adjectival references, institution names, hyperlinks, mail addresses, phone numbers, bank account numbers, servers, postal codes and other private information. Please read the anonymisation document for the anonymisation keys.
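As a rough illustration of what pattern-based masking of two of these categories could look like, here is a minimal Python sketch for hyperlinks and mail addresses. It is not the project's actual procedure: the placeholder tags `[URL]` and `[MAIL]`, the regular expressions and the function name are assumptions made for this example, and the real keys are those listed in the anonymisation document. Categories such as personal or geographical names cannot be handled by simple patterns like these.

```python
import re

# Hypothetical placeholder tags (assumption); the real anonymisation keys
# are defined in the project's anonymisation document.
URL_TAG = "[URL]"
MAIL_TAG = "[MAIL]"

# Deliberately simple patterns for illustration only.
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def mask_links_and_mails(text: str) -> str:
    """Replace mail addresses and hyperlinks with placeholder tags."""
    text = MAIL_RE.sub(MAIL_TAG, text)  # mask mail addresses first
    text = URL_RE.sub(URL_TAG, text)    # then mask hyperlinks
    return text


example = "Schau mal: https://example.org/x und schreib an anna@example.org"
print(mask_links_and_mails(example))
# Schau mal: [URL] und schreib an [MAIL]
```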

The corpus offers a wide range of research opportunities for linguists who are interested in CMC in general and, more specifically, in multilingual language use, the use of regional varieties, and code-switching, code-shifting and code-mixing phenomena, among others.

Access to the DiDi corpus via ANNIS: https://commul.eurac.edu/annis/didi

Corpus download via Eurac Research Clarin Centre: https://clarin.eurac.edu/

Publications

Das DiDi‐Korpus: Internetbasierte Kommunikation aus Südtirol
Glaznieks A, Frey JC (2020)
Book chapter
Deutsch in Sozialen Medien

https://doi.org/10.1515/9783110679885-019

https://hdl.handle.net/10863/15720

Using Data Mining to Repurpose German Language Corpora. An evaluation of data-driven analysis methods for corpus linguistics
Frey J (2020)
Doctoral thesis (PhD)

https://hdl.handle.net/10863/17321

DIDI - The DiDi Corpus of South Tyrolean CMC 1.0.0
Frey JC, Glaznieks A, Stemle EW (2019)
Database

Further information: http://hdl.handle.net/20.500.12124/7

How FAIR are CMC Corpora?
König A, Frey JC, Stemle EW (2019)
Presentation

Conference: 7th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora19) | Cergy-Pontoise | 9.9.2019 - 10.9.2019

https://hdl.handle.net/10863/11295

Comparison of Automatic vs. Manual Language Identification in Multilingual Social Media Texts
Frey JC, Stemle E, Doğruöz AS (2019)
Book chapter
Building computer-mediated communication corpora for socio-linguistic analysis

https://hdl.handle.net/10863/10130

How FAIR are CMC corpora?
Frey JC, König A, Stemle E (2019)
Contribution in conference proceedings

Conference: 7th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora19) | Cergy-Pontoise | 9.9.2019 - 10.9.2019

Further information: https://cmccorpora19.sciencesconf.org/data/pages/proceedings ...

https://hdl.handle.net/10863/11294

Das DiDi-Korpus: internetbasierte Kommunikation aus Südtirol
Frey J, Glaznieks A (2019)
Presentation

Conference: 55. Jahrestagung des Instituts für Deutsche Sprache | Mannheim | 12.3.2019 - 14.3.2019

https://hdl.handle.net/10863/13382

The myth of the Digital Native? Analysing language use of different generations in Facebook
Frey JC, Glaznieks A (2018)
Contribution in conference proceedings

Der plurilinguale Sprecher in Facebook. Neue Medien und Pluriliteracy in Südtirol
Frey JC (2018)
Presentation

Conference: 4th LRI Workshop for young academics "Language Policy - Language Use - Language Standard" | Meran | 7.6.2018 - 8.6.2018

Becoming a multilingual speaker. New Media and pluriliteracy in South Tyrol
Frey JC (2018)
Presentation

Conference: Round table "Social Net(work)s in Education and Language Sciences" | Heidelberg | 15.6.2018 - 15.6.2018

Pluriliteracy on Social Media. The Multilingual Practices of South Tyroleans on Facebook
Frey JC (2018)
Presentation

Conference: Language, Identity and Education in Multilingual Contexts | Dublin | 2.2.2018 - 4.2.2018

The myth of the Digital Native: Analysing language use of different generations on Facebook
Frey JC, Glaznieks A (2018)
Presentation

Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

Sociolinguistic research using the DiDi corpus of South Tyrolean CMC: From corpus-based research designs to computational linguistic challenges
Frey CF, Stemle EW, Glaznieks A (2018)
Presentation

Conference: 44. Österreichische Linguistiktagung 2018 (ÖLT2018) | Innsbruck | 26.10.2018 - 28.10.2018

Experteninterview: Wie viel "Emojion" verträgt unsere Sprache?
Abel A, Frey JC (2018)
Newspaper
Zett: Die Zeitung am Sonntag

Dialekt als Norm? Zum Sprachgebrauch Südtiroler Jugendlicher auf Facebook
Glaznieks A, Frey JC (2018)
Book chapter
Jugendsprachen/Youth Languages: Aktuelle Perspektiven internationaler Forschung/Current Perspectives of International Research

https://doi.org/10.1515/9783110472226-038

https://hdl.handle.net/10863/7699

The Myth of the Digital Native: Analysing language use of different generations on Facebook
Frey JC, Glaznieks A (2018)
Contribution in conference proceedings

Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

Further information: https://www.uantwerpen.be/images/uantwerpen/container49896/f ...

https://hdl.handle.net/10863/8093

Think Global, Write Local – Patterns of Writing Dialect on SNS
Glaznieks A (2017)
Presentation

Geschriebener Dialekt in Südtiroler Facebooktexten
Glück A, Glaznieks A (2017)
Presentation

A data mining approach to digital age
Frey J (2017)
Forlì
Presentation

Conference: DIT Postgraduate Research Workshop | Forlì | 6.7.2016 - 6.7.2016

Think Global, Write Local: Patterns of Writing Dialect on SNS
Glaznieks A (2017)
Contribution in conference proceedings

https://doi.org/10.5281/zenodo.1041851

https://hdl.handle.net/10863/7939

Proceedings of the 5th Conference on CMC and Social Media Corpora for the Humanities
Stemle E, Wigham C (2017)
Bolzano: Eurac Research
Book (editor)

Further information: https://zenodo.org/record/1040875

https://doi.org/10.5281/zenodo.1040875

https://hdl.handle.net/10863/6510

Connecting Resources: Which Issues have to be Solved to Integrate CMC Corpora from Heterogeneous Sources and for Different Languages?
Beißwenger M, Wigham CR, Etienne C, Fišer D, Suárez HG, Herzberg L, Hinrichs E, Horsmann T, Karlova-Bourbonus N, Lemnitzer L, Longhi J, Lüngen H, Ho-Dac L, Parisse C, Poudat C, Schmidt T, Stemle E, Storrer A, Zesch T (2017)
Bolzano, Italy
Contribution in conference proceedings
Proceedings of the 5th Conference on CMC and Social Media Corpora for the Humanities

Further information: https://zenodo.org/record/1041877

https://doi.org/10.5281/zenodo.1041877

https://hdl.handle.net/10863/7942

DiDi Corpus
Stemle EW (2017)
Duisburg, Germany
Presentation

Conference: Integrating a new type of language resource into the Digital Humanities landscape: French-German colloquium on standards for corpora of computer-mediated communication | Duisburg | 19.6.2017 - 20.6.2017

Further information: https://sites.google.com/view/dhcmc2017/

https://hdl.handle.net/10863/9186

Mehrsprachigkeit auf Südtirols Social-Media-Profilen
Frey J (2016)
Bozen/Bolzano
Presentation

Conference: Work in Progress Linguistics Colloquium Eurac Research/Free University of Bolzano | Bozen | 11.6.2015 - 11.6.2015

The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts
Frey J, Glaznieks A, Stemle EW (2016)
Naples
Presentation

Conference: Third Italian Conference on Computational Linguistics (CliC-it 2016) | Naples | 5.12.2016 - 6.12.2016

DiDi: A multilingual corpus of non-public South Tyrolean computer-mediated communication
Frey J (2016)
Lancaster
Presentation

Conference: UCREL Summer School in corpus-based NLP | Lancaster | 10.7.2016 - 15.7.2016

The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts
Frey J, Glaznieks A, Stemle EW (2016)
Naples
Contribution in conference proceedings

Conference: Third Italian Conference on Computational Linguistics (CliC-it 2016) | Naples | 5.12.2016 - 6.12.2016

Further information: http://ceur-ws.org/Vol-1749/paper27.pdf

https://hdl.handle.net/10863/8949

"Bitte deutsch schreiben!" Multilingual and diglossic - a linguistic description of South Tyrolean Facebook users
Glaznieks A, Frey JC (2015)
Presentation

Conference: Multilingualism in the Digital Age | Reading | 19.6.2015 - 19.6.2015

The DiDi Corpus of South Tyrolean CMC Data
Frey J, Glaznieks A, Stemle EW (2015)
Essen
Presentation

Conference: 2nd Workshop of the Natural Language Processing for Computer-Mediated Communication / Social Media: NLP4CMC at GSCL 2015 | Essen | 28.9.2015 - 29.9.2015

The DiDi Project: Collecting, Annotating, and Analysing South Tyrolean Data of Computer-mediated Communication.
Stemle EW (2015)
Rennes
Presentation

Conference: ird-cmc-rennes | International Research Days: Social Media and CMC Corpora for the eHumanities | Rennes | 23.10.2015 - 24.10.2015

Further information: http://ird-cmc-rennes.sciencesconf.org/

https://hdl.handle.net/10863/9187

The DiDi Corpus of South Tyrolean CMC Data
Frey J, Glaznieks A, Stemle EW (2015)
Essen
Contribution in conference proceedings

Conference: 2nd Workshop of the Natural Language Processing for Computer-Mediated Communication / Social Media: NLP4CMC at GSCL 2015 | Essen | 28.9.2015 - 29.9.2015

https://hdl.handle.net/10863/8928

Zum Projekt DiDi - Digital Natives - Digital Immigrants
Frey J (2014)
Bozen/Bolzano
Radio/TV broadcast

Wie schreibt Südtirol auf Facebook?
Frey JC (2014)
Presentation

Conference: 1. LRI Workshop "Sprache - Region - Identität in der computervermittelten Kommunikation" | Meran | 13.6.2014 - 14.6.2014

Code-Switching on Facebook Wall Posts of Bilingual German-speaking South Tyroleans
Stuckey N, Frey J (2014)
Vienna
Presentation

Conference: 41. Österreichische Linguistiktagung (ÖLT 2014), Universität Wien | Vienna | 6.12.2014 - 8.12.2014

Collecting language data of non-public social media profiles
Frey J, Glaznieks A, Stemle EW (2014)
Hildesheim
Presentation

Conference: Workshop “NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media” at the 12th edition of KONVENS | Hildesheim | 8.10.2014 - 10.10.2014

Collecting language data of non-public social media profiles
Frey J, Stemle EW, Glaznieks A (2014)
Hildesheim: Universitätsverlag Hildesheim, Germany
Contribution in conference proceedings

Conference: Workshop “NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media” at the 12th edition of KONVENS | Hildesheim | 8.10.2014 - 10.10.2014

Further information: http://www.uni-hildesheim.de/konvens2014/data/konvens2014-wo ...

https://hdl.handle.net/10863/8891

The Project DIDI. Writing on Social Network Sites – A Corpus-based Observation of the Current Language Use in South Tyrol, with Particular Consideration of the Writers' Age
Glaznieks A, Stemle EW (2013)
Dortmund
Presentation

The Project DIDI. Writing on Social Network Sites – A Corpus-based Observation of the Current Language Use in South Tyrol, with Particular Consideration of the Writers’ Age. Talk at the international workshop "Building Corpora of Computer-Mediated Communi
Glaznieks A, Stemle EW (2013)
Dortmund
Presentation

Conference: International Workshop "Building Corpora of Computer-Mediated Communication: Issues, Challenges, and Perspectives" | Dortmund | 14.2.2013 - 15.2.2013

Herausforderungen bei der automatischen Verarbeitung von dialektalen IBK-Daten
Glaznieks A, Stemle EW (2013)
Darmstadt
Presentation

Further information: https://www.researchgate.net/publication/259344920_Herausfor ...

Our partners

Südtiroler Kulturinstitut

Project Team

Aivars Glaznieks

Project Manager

Nicole Stuckey

Team Member

Egon W. Stemle

Team Member

Jennifer-Carmen Frey

Team Member

Andrea Abel

Team Member

Projects

Project

ITACA

Text coherence in Italian academic language

Duration: September 2020 - December 2024
Funding: Provincial P.-L.P. 14. Research ...

Project

ConsTerm 2.0

Scientific support in the field of terminology

Duration: May 2019 - December 2024
Funding: Public institutions (Other projects ...

Project

Zeit.shift

Zeit.shift – digitally into yesterday's future: preservation, indexing and dissemination of the cultural ...

Duration: September 2020 - June 2023
Funding: Italy-Austria 2014-2020 (EUTC / EU ...

Project

LCI

Learner corpus infrastructure

Duration: December 2015 - December 2024
Funding: Internal funding EURAC (Project)

Project

SSL

Translations and terminology work in the field of occupational safety

Duration: September 2013 - September 2020
Funding: Public institutions (Other projects ...

Project

enetCollect

European network for combining language learning with crowdsourcing techniques

Duration: March 2017 - September 2021
Funding: COST (EU funding / Project)

Project

SMS 2.0

Sprachenvielfalt macht Schule 2.0

Duration: December 2018 - December 2024
Funding: Internal funding EURAC (Project)

Project

MT@BZ

Machine translation at South Tyrolean institutions (pilot study)

Duration: May 2021 - December 2023
Funding: Internal funding EURAC (Project)
