Digital language infrastructure for South Tyrol
- Project duration: June 2022 - December 2024
- Project status: Approval by the Scientific Committee
- Funding: Internal funding EURAC (Project)
All over the world, language data and technologies are produced daily by universities, schools, GLAM institutions (galleries, libraries, archives and museums), news agencies, non-academic organizations, administrative bodies and others. While these resources typically serve individual needs, if made accessible in standardized formats they could also benefit much larger initiatives. The Institute for Applied Linguistics seeks to establish a language services research hub and observatory, the linguaXlab, to maximize access to and (re)use of the linguistic data and technologies produced by, and relating to, South Tyrol, and to facilitate their exchange between different user groups across the region. Drawing on the expertise of the entire Institute, the linguaXlab will offer, among other linguistic services, support with data archiving, search functionality, terminological databases, machine learning, information extraction and data FAIRification (making data Findable, Accessible, Interoperable and Reusable) to stakeholders from different backgrounds (e.g., the cultural heritage sector, public administration or the medical field), so as to collectively advance language research and development for South Tyrol and beyond.
How well-resourced are South Tyrolean languages? Are these linguistic resources FAIR, actually in use, and sufficiently representative of regional diversity?
Following on from the now-concluded project Digital Infrastructure for the ecosystem of South Tyrolean language data and services (DI-ÖSS), the linguaXlab will assess and document the data and technology landscape of South Tyrol, evaluate the FAIRness of relevant resources and offer expertise in the form of linguistic services (e.g., data storage and/or linking) to maximize their use for research and policymaking. These services will interact with, and raise awareness of, Eurac’s existing CLARIN infrastructures and knowledge centers, and will encourage collaboration between local stakeholders with a view to adequately supporting language research in South Tyrol.
As its first use case, the linguaXlab will resume work on Korpus Südtirol, the reference corpus of South Tyrolean German. This corpus of written natural language will cover different genres, topics and time periods, and will be redesigned in accordance with the FAIR principles for scientific data management and stewardship. Korpus Südtirol will not only enable linguistic research on South Tyrolean German but also fill a gap in the current coverage of regional varieties and dialects, thus extending the scope for comparative studies on the German language as a whole. To this end, the linguaXlab will interview stakeholders to better understand the needs of those who produce and use linguistic data in this regional variety, and will conduct a survey of relevant datasets to map what is available and suitable for inclusion in the reference corpus. These tasks will help the linguaXlab define and analyze data and infrastructure requirements, and prioritize the development of services going forward.
Coherence in academic Italian. Duration: September 2020 - December 2021
Scientific support for terminology issues. Duration: May 2019 - December 2021
Zeit.shift – On a digital journey into yesterday's future: preserving Tyrol's cultural text heritage ... Duration: September 2020 - December 2021
Learner Corpus Infrastructure. Duration: December 2015 - December 2021
Translation and terminology work in the domain of occupational health and safety. Duration: September 2013 - December 2021
European Network for Combining Language Learning with Crowdsourcing Techniques. Duration: March 2017 - December 2021
One School, Many Languages 2.0. Duration: December 2018 - December 2021
Machine Translation at South Tyrolean Institutions (pilot study). Duration: May 2021 - December 2021
Common Language Resources and Technology Infrastructure for Italy at Eurac Research. Duration: March 2017 - December 2021