Language technology paving the way for digital humanities

Computational methods are nothing new in digital humanities and social sciences, but language research has a special tool that puts it ahead of the pack: language technology, a field that combines linguistics and computer science.

Computational linguistics, the core of language technology, first appeared in the 1950s, well before bioinformatics, for example. Ever since, language technology has been an essential branch of the study of natural language. Natural languages are those that people use in everyday communication and that have developed over the course of millennia; constructed languages and programming languages, for example, are not natural languages. Computational linguistics studies the structure and function of language using the methods and theories of electronic data processing.

The Language Bank of Finland serves language technology in Finland. It is a set of services and resources for language research and other digital humanities and social sciences, presently coordinated by the national FIN-CLARIN consortium, which is formed by Finnish universities and language research institutes and led by the University of Helsinki. CSC is responsible for the Language Bank's technical infrastructure and computational resources.

Since it was founded in the late 1990s, the Language Bank has offered its services not only for research but also for teaching. One of the first things new language technology students do is log in to the Language Bank Rights application, based on the REMS (Resource Entitlement Management System) technology developed at CSC, and apply for access rights to the Language Bank's resources. They can also use CSC's supercomputers. The Language Bank's experts are there for students just as they are for any other customers.

Other humanities and social sciences lack an established cross-disciplinary dimension akin to language technology or bioinformatics. There is no such thing as "history technology", for example. The importance of computational methods in the humanities is recognized these days, but it can be a challenge to find a feasible starting point and perspective.

In the last few years, the Language Bank has been actively expanding its networks towards other digital humanities and social sciences. Natural language contains important data for many fields beyond the language research disciplines proper. Language resources, i.e. language corpora and tools, are potentially useful for anybody whose subject can be encoded in words.

For example, we have been thinking about how our present tools would work for research questions that do not target language itself, and what kind of brand new services we could develop for such needs. One tool that is already widely applicable as such is the Korp interface, in which you can, for example, query the contents of the Suomi 24 discussion forum. Proof of this is that last year both thesis awards given by the Rajapinta association went to researchers using the Suomi 24 corpus.

Language research has traditionally been the most internationally oriented of the fields supported by CSC. 10% of the Language Bank's users have been from outside Finland. We have also participated in building the CLARIN federation that connects the language banks of Europe since 2007. Presently, the Language Bank of Finland is the core of the FIN-CLARIN consortium that represents Finland in CLARIN.

Other digital humanities and social sciences have their own similar infrastructure called DARIAH. Finland is not yet a member, but preparations for joining are underway, and the Language Bank is also involved. Our goal is to bring together the digital humanities and social sciences communities and offer them our services.

The Language Bank of Finland CLARIN   DARIAH

The Language Bank of Finland - Apply for access rights

REMS   Korp

Tero Aalto

The author is a language technologist and works with the Language Bank of Finland.