mirror of
https://github.com/duszekjk/jezykiformalne.git
synced 2024-11-23 23:55:28 +01:00
42 lines
1.9 KiB
Plaintext
42 lines
1.9 KiB
Plaintext
|
Use regular expressions to mark Polish first-person masculine forms.
|
||
|
|
||
|
You should handle the following types of expressions:
|
||
|
|
||
|
* first-person masculine past forms of verbs ("zrobiłem", "pisałem", etc.),
|
||
|
* first-person singular masculine forms of the verb "być" ("be") combined
|
||
|
with singular masculine nominative forms of adjectives ("wysoki", "sprytny", etc.),
|
||
|
assuming that the form of the verb "być" is to the left of the adjective, not
|
||
|
more than 3 other words,
|
||
|
* the verb "będę" combined with the past participle (i.e. 3rd person
|
||
|
masculine imperfect form, e.g. "robił", pisał"), assuming
|
||
|
that "będę" is to the left of the adjective, not
|
||
|
more than 3 other words to the left of the participle OR directly
|
||
|
to the right of the participle ("robił będę").
|
||
|
|
||
|
The first-person masculine forms should be marked with curly brackets.
|
||
|
You should mark only the masculine form. Do not mark the form of "być"
|
||
|
(unless it clearly a masculine form, i.e. for "byłem").
|
||
|
|
||
|
The match should be case-insensitive.
|
||
|
|
||
|
The PoliMorf dictionary of inflected forms should be applied:
|
||
|
http://zil.ipipan.waw.pl/PoliMorf?action=AttachFile&do=get&target=PoliMorf-0.6.7.tab.gz
|
||
|
|
||
|
Suggested steps:
|
||
|
|
||
|
1. Extract all the needed forms from the PoliMorf dictionary:
|
||
|
|
||
|
* 1st person masculine past forms of verbs, unfortunately
|
||
|
this form is not directly present in the lexicon, you need
|
||
|
to add "em" to the 3rd person masculine form ("zrobił" => "zrobiłem")
|
||
|
* singular masculine nominative forms of adjectives
|
||
|
* masculine past participle (3rd person masculine imperfect forms of verbs)
|
||
|
|
||
|
You could do this using grep/cut commands — to obtain a simple text files
|
||
|
with a word in each line. You can do this once and commit the 3 files to your repository.
|
||
|
|
||
|
2. In your `run` script/program, read the 3 files and create a large
|
||
|
expression with alternatives. Use a regexp library based on DFAs (determintistic
|
||
|
finite-automatons).
|
||
|
|