Data Mining in Corpus Linguistics
01.10.2016 - 30.09.2019
In her PhD project, “Data Mining in Corpus Linguistics”, Jennifer-Carmen Frey aims to bridge the fields of computer science and linguistics by exploring recent data-mining methods and their value for corpus linguistic research. In an exploratory approach, state-of-the-art machine-learning-based methods of data analysis are examined for their applicability to corpus linguistics and evaluated via prototypical implementations on existing corpus research. The central questions of the project, namely whether data-mining methods can a) reproduce (and thereby verify) existing research results and b) lead the linguist to further linguistically interesting patterns emerging from the data, are addressed in a couple of case studies on available non-standard corpora. The results of the work, an evaluation and discussion of the potential and the limitations of corpus-driven data-mining approaches together with the adapted implementations provided as ready-to-use plug-ins for widely used corpus software, will show whether and how data-mining techniques can serve general corpus linguistic research.