Data Mining in Corpus Linguistics

  • Project duration: September 2016 - December 2021
  • Project status: In preparation

In her PhD project, “Data-Mining in Corpus Linguistics”, Jennifer-Carmen Frey aims to bridge the fields of computer science and linguistics by exploring recent data-mining methods and their value for corpus linguistic research. In an exploratory case study, state-of-the-art machine-learning-based approaches to data analysis are examined for their applicability to corpus linguistics and evaluated via prototypical implementations on existing corpus research. The central questions, namely whether data-mining methods can a) reproduce (and thereby verify) existing research results and b) lead the linguist to further linguistically interesting patterns emerging from the data, are addressed in a couple of case studies on available non-standard corpora. The results of the work, an evaluation and discussion of the potential and the limitations of corpus-driven data-mining approaches together with the adapted implementations provided as ready-to-use plug-ins for widely used corpus software, will show whether and how data-mining techniques can serve general corpus linguistic research.

Using Data Mining to Repurpose German Language Corpora. An evaluation of data-driven analysis methods for corpus linguistics
Frey J (2020)
PhD thesis

Lexikalische Komplexität im Kontext holistischer Textbewertungen
Frey JC (2020)

Conference: Mehrsprachigkeit und Lernerkorpora | Bolzano | 13.2.2020 - 13.2.2020


Comparison of Automatic vs. Manual Language Identification in Multilingual Social Media Texts
Frey JC, Stemle E, Doğruöz AS (2019)
Contribution in book
Building computer-mediated communication corpora for socio-linguistic analysis


The myth of the Digital Native? Analysing language use of different generations in Facebook
Frey JC, Glaznieks A (2018)
Conference proceedings article

The Myth of the Digital Native: Analysing language use of different generations on Facebook
Frey JC, Glaznieks A (2018)
Conference proceedings article

Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

More information: https://www.uantwerpen.be/images/uantwerpen/container49896/f ...


Was wir bewerten, wenn wir Schülertexte bewerten: Menschliche Bewertungen und digitale Zugänge zu ihren empirischen Spuren
Frey JC (2018)

Conference: Expertenworkshop MIT.Qualität | Mannheim | 18.6.2018 - 19.6.2018

The myth of the Digital Native: Analysing language use of different generations on Facebook
Frey JC, Glaznieks A (2018)

Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

Sociolinguistic research using the DiDi corpus of South Tyrolean CMC: From corpus-based research designs to computational linguistic challenges
Frey JC, Stemle EW, Glaznieks A (2018)

Conference: 44. Österreichische Linguistiktagung 2018 (ÖLT2018) | Innsbruck | 26.10.2018 - 28.10.2018

Measuring Text Quality in the Digital Age: The Project “MIT.Qualität”
Glaznieks A, Linthe M, Frey JC (2018)

Conference: 1st Literary Summit | Porto | 1.11.2018 - 3.11.2018

A data mining approach to digital age
Frey J (2017)

Conference: DIT Postgraduate Research Workshop | Forlì | 6.7.2016 - 6.7.2016

DiDi: A multilingual corpus of non-public South Tyrolean computer-mediated communication
Frey J (2016)

Conference: UCREL Summer School in corpus-based NLP | 10.7.2016 - 15.7.2016

