Common Language Resources and Technology Infrastructure, Consortium FIN-CLARIAH
FIN-CLARIAH is the premier Finnish digital research infrastructure (RI) for Social Sciences and Humanities (SSH), comprising two components: FIN-CLARIN and DARIAH-FI. In their first common development project, the FIN-CLARIAH components seek to significantly broaden their joint scope of digital SSH infrastructural support by consolidating and enhancing their resources in three major data-oriented directions:
1. to reach beyond processing of spoken standard Finnish into colloquial speech;
2. to cater to a broad range of SSH research needs for processing unstructured text;
3. to facilitate research based on metadata.
In Finland, digitization of materials for SSH research is well underway, but one of the main problems from a research perspective is that the data is scattered. Researchers therefore face incompatible formats and interfaces, and there is a danger of duplicated effort when developing tools to manage and process the datasets. The primary concern of the RI project is thus to ensure that both data and functionality are consolidated under a unified national RI that runs on the computing infrastructure provided by CSC, which currently serves as the technical host of the Language Bank of Finland.
The FIN-CLARIAH ecosystem has already made notable achievements, for example creating unified processes for negotiating research rights to materials and developing unified access mechanisms for the resulting datasets. Using these, FIN-CLARIAH already makes available large collections of textual and multi-modal resources as well as tools for analysing and enriching them. However, recent advances in neural network technology, the availability of supercomputing, and large digital SSH datasets have created clear opportunities and needs for further development of a common SSH data and tools infrastructure.
The development steps needed for processing unstructured data, as well as standard and colloquial speech, require a world-class HPC environment and support for researchers. The role of CSC is to provide this integration and to enable optimal use of the infrastructure.
This project has received funding from the Research Council of Finland.