The Finnish News Agency opens up its digital news archive to the Language Bank of Finland
The Finnish News Agency (STT) has released its digital news archive for use by researchers at Kielipankki (Language Bank). The Finnish-language news archive that has been deposited in the Language Bank contains articles from 1992–2018, and most of the over 2.7 million articles are news stories of varying lengths.
The Language Bank serves researchers working on different language materials and the development of language technology.
"We hope to benefit from the university research, especially in research projects related to language technology, machine learning and artificial intelligence. STT's mission is to develop automation and robotics for content production in order to serve the Finnish media industry in general. If research teams do not have access to media-produced material, it is of course impossible to build applications based on that material," said Kimmo Pietinen, CEO of STT.
Until now, the Language Bank's newspaper collections have consisted mainly of older material, so the STT archive broadens the sample available to researchers.
"This is the freshest news material currently out there. This archive is a great addition to the Language Bank's selection," said Mietta Lennes, Project Planner at the Language Bank.
The Language Bank serves researchers
The Language Bank of Finland is a key service entity of the FIN-CLARIN consortium, through which language materials and tools suitable for processing them are made available to researchers.
The STT news archive that is now available in the Language Bank can be downloaded in its entirety as raw material. STT will evaluate and approve all research plans before access to the archive is granted.
News material in the Language Bank will be made available to researchers in a more structured form during the autumn. At that time, researchers will be able to access the material through the Korp service, where browsing is easier.
The FIN-CLARIN consortium is made up of Finnish universities, the CSC – IT Center for Science and The Institute for the Languages of Finland. The University of Helsinki is responsible for the acquisition and receipt of materials provided through the Language Bank, as well as the development of tools and training activities. CSC is responsible for the technical maintenance of the Language Bank.
STT's materials have already been used in research
STT's archival material has already been made available through the Language Bank to be used by the international Embeddia research project. Six European universities, STT and three other media companies are participating in the three-year Embeddia project, which started this year. The University of Helsinki is the Finnish university taking part.
The European research and innovation project aims to produce robotics components for news. These components can be scaled across language barriers and could support media companies in various ways, from automatic text production to comment moderation. The focus is on small language areas such as Finland, which have not been able to use all the technologies that have already been developed, for example, in the English-speaking world.
With the help of the archive material, the NLP group at the University of Turku has been developing its Finnish language model and has created a separate model for ‘STT Finland’. Their aim is to develop a news reporter application that uses artificial intelligence and machine learning to write news based on various forms of data in Finnish.
STT's news article: STT vauhdittaa kieliteknologian tutkimusta antamalla uutisarkistonsa tutkijoiden käyttöön (only in Finnish)