
pc-parse

A set of tools for morphological and syntactic analysis

Description

PC-Parse is a collection of tools for morphological and syntactic analysis. It includes AMPLE, an Item-and-Arrangement analyzer; PC-KIMMO, a two-level morphological analyzer and generator; PC-PATR, a unification-based syntactic parser; and tools to help with translation.

Parts:

  • pc-kimmo - Two-level morphological analyzer
  • pc-patr - Syntactic parser
  • ktext - Text analysis with PC-KIMMO parser
  • ktagger - Part-of-speech tagger based on PC-KIMMO
  • ample - Morphological Item-and-Arrangement parser for linguistic exploration
  • anadiff
  • intergen
  • stamp - Morphological transfer and synthesis for adapting text to a related language
  • tonegen - Allows modeling autosegmental tonology with STAMP
  • tonepars - Allows modeling autosegmental tonology with AMPLE
  • convlex
  • xample

Version and Copyright Information

version: v. 20051207

  • PC-Kimmo version 2.1.13
  • PC-PATR version 1.3.13
  • KTEXT version 2.1.8
  • KTAGGER version 1.0.10
  • AMPLE version 3.10.1
  • ANADIFF version 1.0.6
  • INTERGEN version 2.2.0
  • STAMP version 2.2.1
  • ToneGen version 1.0b20
  • TonePars version 1.0.19

copyright:

Usage

GETTING STARTED WITH PC-KIMMO

Here are instructions for trying out PC-KIMMO with Englex, a PC-KIMMO description of English morphology (rules, lexicon, grammar). Englex is in the pckimmo/test/eng subdirectory.

After getting the englex archive and unpacking it, go to the englex subdirectory and edit the file englex.tak. Fix the file paths for your local system using either absolute or relative paths. Here is one strategy: move the englex.tak file out of the Englex subdirectory into the directory just above it and modify the paths like this:

load rules englex/english.rul
load lexicon englex/english.lex
load grammar englex/english.grm
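The path edit described above can also be scripted. The following is a minimal sketch in Python, not part of PC-Parse: the function name repoint_take_file is hypothetical, the starting contents of englex.tak are assumed for illustration, and the sketch assumes each load command sits on its own line with a path that contains no spaces.

```python
import posixpath
import re

# Matches "load rules|lexicon|grammar <path>", optionally with the "kimmo"
# keyword that PC-PATR take files use. Assumption: one command per line and
# no spaces inside the path.
LOAD_LINE = re.compile(
    r"^(\s*load\s+(?:kimmo\s+)?(?:rules|lexicon|grammar)\s+)(\S+)\s*$",
    re.IGNORECASE,
)


def repoint_take_file(text, new_dir):
    """Return take-file text with each load path moved into new_dir."""
    out = []
    for line in text.splitlines():
        m = LOAD_LINE.match(line)
        if m:
            # Keep the file name, replace the directory part.
            filename = posixpath.basename(m.group(2))
            line = m.group(1) + posixpath.join(new_dir, filename)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical starting contents; the shipped englex.tak may differ.
    original = (
        "load rules english.rul\n"
        "load lexicon english.lex\n"
        "load grammar english.grm\n"
    )
    print(repoint_take_file(original, "englex"), end="")
```

Lines that are not load commands pass through unchanged, so any other commands in the take file are preserved.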

Launch PC-KIMMO and at the prompt type:

take englex

(The Take command expects .tak as the default file extension.)

Now use the Recognize command to recognize (parse) words. For example, at the prompt type:

recognize foxes

The command keyword "recognize" can be shortened to "r". Better, at the prompt type "recognize" (or "r") and press return. A special "recognizer" prompt will appear. Now you can keep typing words without repeating the "recognize" command. Note: use only lower case letters and no punctuation (except apostrophe and hyphen).

GETTING STARTED WITH PC-PATR

Here are instructions for trying out PC-PATR with the supplied toy English grammar.

The directory doc/pcpatr/english contains a toy English sentence grammar and lexicon. This grammar can also be used with Englex, a morphological description of English for PC-KIMMO (see above). Englex provides a morphological parse of words on the fly, building up a word lexicon as you parse sentences. Note that you do not need the stand-alone PC-KIMMO executable in order to use Englex with PC-PATR; PC-KIMMO is built into PC-PATR.

Go to the doc/pcpatr/english subdirectory and start PC-PATR:

 
% cd doc/pcpatr/english
% pcpatr
PC-PATR>take english

(The Take command expects .tak as the default file extension.)

Now use the Parse command to parse sentences. For example, at the prompt type:

PC-PATR>parse uther stormed cornwall

The command keyword "parse" can be shortened to "p". Better, at the prompt type "parse" (or "p") and press return. A special "parse" prompt will appear. Now you can keep typing sentences without repeating the "parse" command. Note: use only lower case letters and no punctuation (except apostrophes and hyphens inside words). Try sentences such as these (but remember that you are limited to words that are in the lexicon):

uther sleeps
the knights sleep
uther storms cornwall
the brave knights have stormed cornwall
i sleep
he sleeps
he was sleeping
he slept
he has slept
i see him
he sees me
i was seen
i was seen by him
i was seen by him clearly
i saw the man with a telescope
the tall man on the hill saw me with a telescope
i saw uther before he stormed cornwall
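If the sentences come from ordinary running text, they first have to be brought into the form the note above asks for. The following is a minimal sketch in Python, not part of PC-PATR: the function name normalize_sentence is hypothetical, and it assumes that "letters" means the ASCII letters a to z, so digits and accented characters are dropped. Whether the result parses still depends on the words being in the lexicon.

```python
import re


def normalize_sentence(text):
    """Lowercase text and drop punctuation other than word-internal ' and -."""
    words = []
    for token in text.lower().split():
        # Keep only ASCII lower case letters, apostrophes and hyphens
        # (assumption: anything else counts as punctuation).
        token = re.sub(r"[^a-z'-]", "", token)
        # Apostrophes and hyphens are allowed only inside words, so trim
        # them from both ends of the token.
        token = token.strip("'-")
        if token:
            words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    print(normalize_sentence("Uther stormed Cornwall."))
    print(normalize_sentence("I saw the man, with a telescope!"))
```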

USING PC-PATR WITH ENGLEX

Obtain Englex from the URL given above and install it. Go to the doc/pcpatr/english subdirectory under PC-PATR and edit the file english2.tak so that its file paths match your local system.

% emacs english2.tak
or
% vi english2.tak

Fix the first three lines starting with "load kimmo" so that they point to the Englex files. For example, if the englex subdirectory is on the same level as the doc and src subdirectories, the default paths are already correct:

 
load kimmo rules ../../../englex/english.rul
load kimmo lexicon ../../../englex/english.lex
load kimmo grammar ../../../englex/english.grm

Note that the "kimmo mapping" file is found in the doc/pcpatr/english directory:

load kimmo mapping english2.map
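Before starting PC-PATR it can help to confirm that every file named in the take file is actually where the paths say. The following is a minimal sketch in Python, not part of PC-PATR: the function name missing_load_files is hypothetical, and it assumes that relative paths are resolved against the directory in which PC-PATR is started (doc/pcpatr/english in the steps above) and that each load command occupies one line.

```python
import os
import re
import tempfile

# Matches "load <kind> <path>" and "load kimmo <kind> <path>".
# Assumption: one command per line, no spaces inside the path.
LOAD_PATH = re.compile(r"^\s*load\s+(?:kimmo\s+)?\w+\s+(\S+)\s*$", re.IGNORECASE)


def missing_load_files(take_text, base_dir="."):
    """Return the load paths in take_text that do not exist under base_dir."""
    missing = []
    for line in take_text.splitlines():
        m = LOAD_PATH.match(line)
        # os.path.join leaves absolute paths untouched.
        if m and not os.path.isfile(os.path.join(base_dir, m.group(1))):
            missing.append(m.group(1))
    return missing


if __name__ == "__main__":
    # Demonstration in a scratch directory that holds only the mapping file.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "english2.map"), "w").close()
        take_text = (
            "load kimmo rules ../../../englex/english.rul\n"
            "load kimmo mapping english2.map\n"
        )
        for path in missing_load_files(take_text, scratch):
            print("not found:", path)
```

To check a real take file, read english2.tak into a string and pass the directory from which PC-PATR will be launched as base_dir.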

Now you can parse sentences as described above. The difference is that any words you use which are not in the word lexicon (i.e. english.lex) are parsed by Englex. When you are done parsing sentences, you can save the modified word lexicon using the "Save Lexicon" command.

Field of science:
Language research
Available:
  • hippu
License:
A