
oulu

Oulu corpus

Description

The Oulu Corpus is a body of research material on standard Finnish of the 1960s. Its collection was led by Professor Pauli Saukkonen. In 1997, the material was converted into SGML format by the Research Institute for the Languages of Finland.

The aim of the corpus project was to create a representative sample of the Finnish used in 1960s media. Language used on television is not included.

Home Page:

Version and Size

Version: 1997

Size: The corpus contains altogether 5,800 short samples, comprising 429,058 words and approximately 29,000 sentences.
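As a rough consistency check, these totals give the following averages (the sentence count is approximate, so the averages are approximate too):

```latex
\frac{429\,058\ \text{words}}{5\,800\ \text{samples}} \approx 74.0\ \text{words per sample},\qquad
\frac{29\,000\ \text{sentences}}{5\,800\ \text{samples}} \approx 5\ \text{sentences per sample},\qquad
\frac{429\,058\ \text{words}}{29\,000\ \text{sentences}} \approx 14.8\ \text{words per sentence}.
```

About five sentences and roughly 74 words per sample agrees with the sampling rule described under Content and Structure.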

Content and Structure

The corpus is divided into the following genres:

  1. fiction published in 1961-1967
  2. radio talks broadcast between 29 September 1968 and 26 May 1969
  3. newspapers and journals from 1967
  4. non-fiction published in 1961-1967

Each genre is further divided into subgenres. In addition, the corpus contains some free-form interviews in standard Finnish.

The samples were selected from the texts by random sampling. Each sample is five sentences long or contains at least 60 words. The samples were kept short because the research was restricted to clauses, words, morphs and phonemes; the original purpose was not to study whole texts.

More information is available in the text file /kielipankki/oulu/lue.minut (the file name is Finnish for "read me").

Directory on the software server hippu.csc.fi

/kielipankki/oulu

Directory Listing

drwxr-x--- 3 ling oulu 4096 29. tammi 2002 .
drwxr-xr-x 28 ling csc 4096 16. syys 15:50 ..
drwxr----- 2 ling oulu 4096 29. tammi 2002 cqp
-r--r----- 1 ling oulu 1012 28. syys 2000 korpoulu.dtd
-r--r----- 1 ling oulu 10169 28. syys 2000 lue.minut
-r--r----- 1 ling oulu 383442 28. syys 2000 text01.sgml
-r--r----- 1 ling oulu 82408 28. syys 2000 text01.txt
-r--r----- 1 ling oulu 381483 28. syys 2000 text02.sgml
-r--r----- 1 ling oulu 84997 28. syys 2000 text02.txt
...
-r--r----- 1 ling oulu 315958 28. syys 2000 text81.sgml
-r--r----- 1 ling oulu 87269 28. syys 2000 text81.txt
-r--r----- 1 ling oulu 578663 28. syys 2000 text90.sgml
-r--r----- 1 ling oulu 115335 28. syys 2000 text90.txt

Sample

/kielipankki/oulu/text77.txt:

<!DOCTYPE KORPOULU SYSTEM "korpoulu.dtd">
<korpoulu>
<header>
</header>
<text>
<s txt=77 extr=1 num=1 wnum=7 clnum=2 >
kuulimme miltä vanha linna näytti matkam päästä /
</s>
<s txt=77 extr=1 num=2 wnum=9 clnum=2 >
se on samanvärinen kuin kallio jolle se on rakennettu /
</s>
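The plain-text (.txt) files keep only the sentence elements and their attributes. In the excerpt above, wnum matches the number of words in each sentence and clnum matches the number of clauses shown in the SGML version; other attribute meanings (txt, extr, num) are not documented here. The following Python sketch splits such a file into sentences using regular expressions. It is based only on the excerpt, treats "/" as a boundary marker rather than a word (an assumption), and assumes ISO-8859-1 encoding when reading files from disk (also an assumption; check lue.minut). The names parse_txt, load and SAMPLE are hypothetical.

```python
import re
from dataclasses import dataclass, field

# One <s ...> ... </s> block; DOTALL lets the sentence body span several lines.
S_BLOCK = re.compile(r"<s\s+([^>]*)>(.*?)</s>", re.DOTALL)
# Unquoted SGML attributes such as txt=77 or wnum=7.
ATTR = re.compile(r"(\w+)=(\S+)")


@dataclass
class Sentence:
    attrs: dict = field(default_factory=dict)  # attribute name -> value (strings)
    words: list = field(default_factory=list)  # word tokens, "/" markers removed


def parse_txt(text: str) -> list:
    """Split the .txt variant into sentences with their <s> attributes."""
    sentences = []
    for match in S_BLOCK.finditer(text):
        attr_str, body = match.groups()
        # Assumption: "/" marks a boundary and is not counted as a word.
        words = [tok for tok in body.split() if tok != "/"]
        sentences.append(Sentence(dict(ATTR.findall(attr_str)), words))
    return sentences


def load(path: str, encoding: str = "latin-1") -> str:
    # Assumption: the files are ISO-8859-1; change the encoding if needed.
    with open(path, encoding=encoding) as f:
        return f.read()


SAMPLE = """<s txt=77 extr=1 num=1 wnum=7 clnum=2 >
kuulimme miltä vanha linna näytti matkam päästä /
</s>
<s txt=77 extr=1 num=2 wnum=9 clnum=2 >
se on samanvärinen kuin kallio jolle se on rakennettu /
</s>"""

if __name__ == "__main__":
    for s in parse_txt(SAMPLE):
        # Print sentence number, declared word count, actual count, and text.
        print(s.attrs["num"], s.attrs["wnum"], len(s.words), " ".join(s.words))
```

On the real data, comparing wnum with len(words) for every sentence is a quick way to test whether the word-count interpretation holds beyond this excerpt.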

/kielipankki/oulu/text77.sgml:

<!DOCTYPE KORPOULU SYSTEM "korpoulu.dtd">
<korpoulu>
<header> </header>
<text>
<s txt=77 extr=1 num=1 wnum=7 clnum=2 >
<cl num=1 hier=main >
<w> kuulimme <g> cat=v mod=ind tns=pret pers=pl1 </g> </w>
</cl>
<cl num=2 hier=sub type=q fn=obj >
<w> miltä <g> </g> </w>
<w> vanha <g> cat=a </g> </w>
<w> linna <g> </g> </w>
<w> näytti <g> cat=v mod=ind tns=pret pers=sg3 </g> </w>
<w> matkam <g> </g> </w>
<w> päästä <g> </g> </w>
<pu>/</pu>
</cl>
</s>
<s txt=77 extr=1 num=2 wnum=9 clnum=2 >
<cl num=1 hier=main >
<w> se <g> cat=pron:dem </g> </w>
<w> on <g> cat=v mod=ind tns=pres pers=sg3 </g> </w>
...
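The SGML files add clause elements (`<cl>`, with attributes such as hier, type and fn) and word elements (`<w>`), with morphological features written as name=value pairs inside `<g>`; values may contain a colon, as in cat=pron:dem. Because the attribute values are unquoted, the markup is not well-formed XML and an XML parser will reject it as-is. The Python sketch below extracts the sentence, clause and word structure with regular expressions instead. It is based only on the excerpt above, assumes that `<cl>` elements are not nested, skips `<pu>` punctuation, and has not been checked against korpoulu.dtd. The names parse_sgml and SAMPLE are hypothetical.

```python
import re

# Assumption: <cl> elements are not nested (true in the excerpt; not
# verified against korpoulu.dtd).
S_BLOCK = re.compile(r"<s\s+([^>]*)>(.*?)</s>", re.DOTALL)
CL_BLOCK = re.compile(r"<cl\s+([^>]*)>(.*?)</cl>", re.DOTALL)
# A word token followed by its (possibly empty) <g> feature string.
WORD = re.compile(r"<w>\s*(\S+)\s*<g>(.*?)</g>\s*</w>", re.DOTALL)
# name=value pairs, used for both tag attributes and <g> features.
PAIR = re.compile(r"(\w+)=(\S+)")


def pairs(s: str) -> dict:
    return dict(PAIR.findall(s))


def parse_sgml(text: str) -> list:
    """Return sentences as nested dicts: sentence -> clauses -> (word, features)."""
    sentences = []
    for sm in S_BLOCK.finditer(text):
        s_attrs, s_body = sm.groups()
        clauses = []
        for cm in CL_BLOCK.finditer(s_body):
            c_attrs, c_body = cm.groups()
            # <pu> elements do not match WORD, so punctuation is skipped.
            words = [(tok, pairs(feats)) for tok, feats in WORD.findall(c_body)]
            clauses.append({"attrs": pairs(c_attrs), "words": words})
        sentences.append({"attrs": pairs(s_attrs), "clauses": clauses})
    return sentences


SAMPLE = """<s txt=77 extr=1 num=1 wnum=7 clnum=2 >
<cl num=1 hier=main >
<w> kuulimme <g> cat=v mod=ind tns=pret pers=pl1 </g> </w>
</cl>
<cl num=2 hier=sub type=q fn=obj >
<w> miltä <g> </g> </w>
<w> vanha <g> cat=a </g> </w>
<w> linna <g> </g> </w>
<w> näytti <g> cat=v mod=ind tns=pret pers=sg3 </g> </w>
<w> matkam <g> </g> </w>
<w> päästä <g> </g> </w>
<pu>/</pu>
</cl>
</s>"""

if __name__ == "__main__":
    for sent in parse_sgml(SAMPLE):
        for cl in sent["clauses"]:
            print(cl["attrs"])
            for word, feats in cl["words"]:
                print("   ", word, feats)
```

Regular expressions were chosen only to keep the sketch dependency-free; for the full corpus, an SGML-aware tool driven by korpoulu.dtd would be more robust.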

Access Rights and Conditions

The right to use the texts of the Oulu Corpus covers the use of the texts for scientific research purposes, and the right to use linguistic features discovered by studying the texts (e.g. statistical figures, grammatical rules and semantic descriptions of words), as well as short example quotations, also in commercial language technology or similar applications that do not infringe copyright law.

Unix User Group with Access to the Resource: oulu
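The directory listing above shows that the data files are readable only by their owner and by the oulu group (mode -r--r-----), so access depends on group membership. A minimal Python sketch for checking this from a script on a Unix system follows; has_group_access is a hypothetical helper, and the standard shell command `id -nG` shows the same information interactively.

```python
import grp
import os


def has_group_access(group_name: str = "oulu") -> bool:
    """Return True if the current process belongs to group_name."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine (e.g. outside hippu).
        return False
    # Supplementary groups plus the effective group of the process.
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("member of oulu:", has_group_access())
```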

References

Citing the Material:

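Oulun korpus. Oulun yliopiston suomen ja saamen kielen laitoksen koostama ja Kotimaisten kielten tutkimuskeskuksen konvertoima korpus. 1982-1997. [The Oulu Corpus. A corpus compiled by the Department of Finnish and Saami Language, University of Oulu, and converted by the Research Institute for the Languages of Finland. 1982-1997.]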

Other References

Pauli Saukkonen. 1982. Oulun korpus. 1960-luvun suomen yleiskielen tutkimusmateriaali. [The Oulu Corpus: research material on 1960s standard Finnish.] Oulun yliopiston suomen ja saamen kielen laitoksen tutkimusraportteja 1. Oulu.

Pauli Saukkonen, Marjatta Haipus, Antero Niemikorpi and Helena Sulkala. 1979. Suomen kielen taajuussanasto. [Frequency dictionary of Finnish.] WSOY.

Field of science:
Language research
Available:
  • hippu
License:
A