susanne
The SUSANNE corpus of English
Description
The SUSANNE Corpus is a digital research resource for modern English. It was created in a project that aimed to develop a taxonomy of grammatical units for language technology applications. The taxonomy was meant to be practical rather than theoretically optimal or psychologically realistic. Its purpose was to support the systematic classification of grammatical units occurring in genuine texts. With the taxonomy, language technology researchers can exchange empirical data and annotations with a much smaller risk of using different terms and definitions for those annotations.
The language bank contains the syntactically annotated part of the Susanne Corpus.
The name “SUSANNE” stands for “Surface and underlying structural analysis of natural English”.
Home Page
http://www.grsampson.net/Resources.html
Version and Size
Release 4, 1994.11.07
The SUSANNE Corpus consists of 64 files (apart from this documentation file), each containing an annotated version of one 2000+ word text from the Brown Corpus. Files average about 83 kilobytes in size, thus the entire Corpus totals about 5.3 megabytes.
Content and Structure
The corpus is a subset of the Brown Corpus and comprises 130,000 words. Sixteen texts of more than 2,000 words each were selected from each of the four Brown Corpus genres listed below.
- A press reportage
- G belles lettres, biography, memoirs
- J learned (mainly scientific and technical) writing
- N adventure and Western fiction
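The file names in the directory listing (A01, N18 and so on) appear to begin with the genre letter of the text, so the genre can be looked up from the name. The following is a minimal sketch under that assumption; `GENRES` and `genre_of` are hypothetical names introduced here for illustration and are not part of the corpus distribution.

```python
# Genre letters and descriptions as given in the list above.
GENRES = {
    "A": "press reportage",
    "G": "belles lettres, biography, memoirs",
    "J": "learned (mainly scientific and technical) writing",
    "N": "adventure and Western fiction",
}


def genre_of(filename: str) -> str:
    """Return the genre description for a text file name such as 'A01'.

    Assumption: the first character of the file name is the genre letter.
    """
    letter = filename[:1].upper()
    if letter not in GENRES:
        raise ValueError(f"unknown genre letter in file name: {filename!r}")
    return GENRES[letter]


if __name__ == "__main__":
    for name in ("A01", "N18"):
        print(name, "->", genre_of(name))
```

Files such as README or SUSANNE.doc are documentation rather than corpus texts, so they should be filtered out before calling the helper.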
Directory
/kielipankki/susanne
Directory Listing
drwxr-xr-x 2 ling csc 4096 28. syys 2000 ./
drwxr-xr-x 28 ling csc 4096 16. syys 15:50 ../
-r--r--r-- 1 ling csc 83671 28. syys 2000 A01
-r--r--r-- 1 ling csc 84248 28. syys 2000 A02
-r--r--r-- 1 ling csc 83410 28. syys 2000 A03
-r--r--r-- 1 ling csc 81892 28. syys 2000 A04
-r--r--r-- 1 ling csc 81739 28. syys 2000 A05
-r--r--r-- 1 ling csc 82573 28. syys 2000 A06
..
-r--r--r-- 1 ling csc 82536 28. syys 2000 N15
-r--r--r-- 1 ling csc 80949 28. syys 2000 N18
-r--r--r-- 1 ling csc 2110 28. syys 2000 README
-r--r--r-- 1 ling csc 52541 28. syys 2000 SUSANNE.doc
-r--r--r-- 1 ling csc 52541 28. syys 2000 SUSANNE.2.doc
-r--r--r-- 1 ling csc 8374 28. syys 2000 00_PLEASE_PARTICIPATE_IN_OUR_SURVEY.html
Sample
A sample from /kielipankki/susanne/A01:
A01:0010a - YB <minbrk> - [Oh.Oh]
A01:0010b - AT The the [O[S[Nns:s.
A01:0010c - NP1s Fulton Fulton [Nns.
A01:0010d - NNL1cb County county .Nns]
A01:0010e - JJ Grand grand .
A01:0010f - NN1c Jury jury .Nns:s]
A01:0010g - VVDv said say [Vd.Vd]
A01:0010h - NPD1 Friday Friday [Nns:t.Nns:t]
A01:0010i - AT1 an an [Fn:o[Ns:s.
A01:0010j - NN1n investigation investigation .
A01:0020a - IO of of [Po.
A01:0020b - NP1t Atlanta Atlanta [Ns[G[Nns.Nns]
A01:0020c - GG +<apos>s - .G]
A01:0020d - JJ recent recent .
A01:0020e - JJ primary primary .
A01:0020f - NN1n election election .Ns]Po]Ns:s]
A01:0020g - VVDv produced produce [Vd.Vd]
A01:0020h - YIL <ldquo> - .
A01:0020i - ATn +no no [Ns:o.
A01:0020j - NN1u evidence evidence .
A01:0020k - YIR +<rdquo> - .
A01:0020m - CST that that [Fn.
A01:0030a - DDy any any [Np:s.
A01:0030b - NN2 irregularities irregularity .Np:s]
A01:0030c - VVDv took take [Vd.Vd]
A01:0030d - NNL1c place place [Ns:o.Ns:o]Fn]Ns:o]Fn:o]S]
A01:0030e - YF +. - .O]
A01:0030f - YB <minbrk> - [Oh.Oh]
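Each line of the sample carries six whitespace-separated fields. The sketch below reads such lines into named records. The field labels (`reference`, `status`, `wordtag`, `word`, `lemma`, `parse`) are assumptions inferred from the sample, not taken from this page, and the code assumes that no field contains embedded whitespace. `SusanneToken` and `parse_line` are hypothetical names.

```python
from typing import NamedTuple


class SusanneToken(NamedTuple):
    """One corpus line; field labels are assumed, see the note above."""
    reference: str  # e.g. "A01:0010b" (file name and position)
    status: str     # "-" on every line of the sample
    wordtag: str    # word-level tag, e.g. "AT"
    word: str       # token as it appears in the text
    lemma: str      # base form, or "-" when none is given
    parse: str      # bracketed structure with "." marking the token


def parse_line(line: str) -> SusanneToken:
    """Split one line into its six fields (tabs or spaces both work)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return SusanneToken(*fields)


# Three lines copied from the sample above.
SAMPLE = """\
A01:0010b - AT The the [O[S[Nns:s.
A01:0010c - NP1s Fulton Fulton [Nns.
A01:0010d - NNL1cb County county .Nns]
"""

tokens = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]

if __name__ == "__main__":
    for token in tokens:
        print(token.reference, token.word, token.wordtag)
```

A named tuple keeps the records immutable and lets later code refer to `token.lemma` rather than a positional index, which is easier to read when filtering by tag or lemma.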
Access Rights and Conditions
According to Geoffrey Sampson: "The SUSANNE Corpus is freely available without formalities for use by researchers anywhere..." and "So far as I am concerned, anyone is welcome to take copies of these resources and to use them for any purpose; and as far as I am able to check, I am legally entitled to make that offer. (If this is not legally watertight enough for you, you will have to go into the legalities yourself.) Naturally, if you do anything public with some of these materials, Sussex University and I would appreciate an acknowledgement (and, in the case of SUSANNE, CHRISTINE, and LUCY, so would the Economic and Social Research Council (UK), which sponsored their creation)."
References
Making Bibliographical Reference to the Material:
Geoffrey Sampson (Ed.). 1995. Susanne Corpus. School of Cognitive & Computing Sciences, University of Sussex.
Bugs
Known Bugs
Release 5 is available. Other related corpora are available.
Field of science:
Language research
Available:
- hippu