djfz-2020/TaskG01/description.txt
2021-01-02 19:07:30 +01:00

45 lines
1.9 KiB
Plaintext

Use regular expressions to mark Polish first-person masculine forms.
You should handle the following types of expressions:
* first-person masculine past forms of verbs ("zrobiłem", "pisałem", etc.),
* first-person singular masculine forms of the verb "być" ("be") combined
with singular masculine nominative forms of adjectives ("wysoki", "sprytny", etc.),
assuming that the form of the verb "być" is to the left of the adjective, not
more than 3 other words,
* the verb "będę" combined with the past participle (i.e. 3rd person
masculine imperfect form, e.g. "robił", pisał"), assuming
that "będę" is to the left of the adjective, not
more than 3 other words to the left of the participle OR directly
to the right of the participle ("robił będę").
The first-person masculine forms should be marked with curly brackets.
You should mark only the masculine form. Do not mark the form of "być"
(unless it clearly a masculine form, i.e. for "byłem").
The match should be case-insensitive.
The PoliMorf dictionary of inflected forms should be applied:
http://zil.ipipan.waw.pl/PoliMorf?action=AttachFile&do=get&target=PoliMorf-0.6.7.tab.gz
Suggested steps:
1. Extract all the needed forms from the PoliMorf dictionary:
* 1st person masculine past forms of verbs, unfortunately
this form is not directly present in the lexicon, you need
to add "em" to the 3rd person masculine form ("zrobił" => "zrobiłem")
* singular masculine nominative forms of adjectives
* masculine past participle (3rd person masculine imperfect forms of verbs)
You could do this using grep/cut commands — to obtain a simple text files
with a word in each line. You can do this once and commit the 3 files to your repository.
2. In your `run` script/program, read the 3 files and create a large
expression with alternatives. Use a regexp library based on DFAs (determintistic
finite-automatons).
POINTS: 4
DEADLINE: 2021-01-20 15:30:00
REMAINDER: 0/2