jezykiformalne/TaskG01/description.txt

Use regular expressions to mark Polish first-person masculine forms.

You should handle the following types of expressions:

* first-person masculine past forms of verbs ("zrobiłem", "pisałem", etc.),
* first-person singular masculine forms of the verb "być" ("be") combined
  with singular masculine nominative forms of adjectives ("wysoki", "sprytny", etc.),
  assuming that the form of the verb "być" is to the left of the adjective, with
  at most 3 other words between them,
* the verb "będę" combined with the past participle (i.e. 3rd person
  masculine imperfect form, e.g. "robił", "pisał"), assuming
  that "będę" is either to the left of the participle, with at most
  3 other words between them, OR directly
  to the right of the participle ("robił będę").

The first-person masculine forms should be marked with curly brackets.
You should mark only the masculine form. Do not mark the form of "być"
(unless it is clearly a masculine form, as in "byłem").

The match should be case-insensitive.
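
The three rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference solution: the word lists are tiny hard-coded samples, the set of first-person forms of "być" ("jestem", "byłem", "będę") is assumed, any run of non-word characters is treated as a word separator, and the names `mark`, `alt` and `GAP` are made up for this example. Python's `re` module is a backtracking engine, so it only stands in for a DFA-based library here.

```python
import re

# Toy word lists for illustration; the real ones would come from PoliMorf.
PAST_1SG = ["zrobiłem", "pisałem", "byłem"]
ADJECTIVES = ["wysoki", "sprytny"]
PARTICIPLES = ["robił", "pisał"]
BYC_1SG = ["jestem", "byłem", "będę"]  # assumed 1st-person forms of "być"

FLAGS = re.IGNORECASE  # the match should be case-insensitive


def alt(words):
    """Escape the words and join them into a regex alternation."""
    return "|".join(re.escape(w) for w in words)


# At most 3 intervening words, separated by non-word characters (assumption).
GAP = r"(?:\W+\w+){0,3}\W+"

# One pattern finds every candidate; the named group records which list it is from.
TARGET = re.compile(
    rf"\b(?:(?P<past>{alt(PAST_1SG)})"
    rf"|(?P<adj>{alt(ADJECTIVES)})"
    rf"|(?P<part>{alt(PARTICIPLES)}))\b",
    FLAGS,
)
# Context checks, applied to the text before or after a candidate.
BYC_ON_LEFT = re.compile(rf"\b(?:{alt(BYC_1SG)}){GAP}\Z", FLAGS)
BEDE_ON_LEFT = re.compile(rf"\bbędę{GAP}\Z", FLAGS)
BEDE_ON_RIGHT = re.compile(r"\s+będę\b", FLAGS)


def mark(text):
    """Wrap first-person masculine forms in curly brackets."""

    def repl(m):
        before, after = text[: m.start()], text[m.end():]
        if m.lastgroup == "past":
            hit = True  # past forms are marked unconditionally
        elif m.lastgroup == "adj":
            hit = BYC_ON_LEFT.search(before) is not None
        else:  # participle: "będę" within range on the left or directly on the right
            hit = (BEDE_ON_LEFT.search(before) is not None
                   or BEDE_ON_RIGHT.match(after) is not None)
        return "{" + m.group(0) + "}" if hit else m.group(0)

    return TARGET.sub(repl, text)
```

With these sample lists, `mark("Jestem bardzo wysoki i sprytny.")` returns `Jestem bardzo {wysoki} i {sprytny}.`: only the adjectives are marked, not "Jestem". Checking the left context per candidate, instead of matching verb and adjective in one pattern, lets several adjectives share one form of "być".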

The PoliMorf dictionary of inflected forms should be applied:
http://zil.ipipan.waw.pl/PoliMorf?action=AttachFile&do=get&target=PoliMorf-0.6.7.tab.gz

Suggested steps:

1. Extract all the needed forms from the PoliMorf dictionary:

* 1st person masculine past forms of verbs; unfortunately,
  this form is not directly present in the lexicon, so you need
  to add "em" to the 3rd person masculine form ("zrobił" => "zrobiłem")
* singular masculine nominative forms of adjectives
* masculine past participle (3rd person masculine imperfect forms of verbs)

You could do this using grep/cut commands to obtain simple text files
with one word per line. You can do this once and commit the 3 files to your repository.
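
As an alternative to grep/cut, the extraction can be sketched in Python. The sketch assumes that each dictionary line holds tab-separated columns with the inflected form first and the tag third, and that tags look like `praet:sg:m1.m2.m3:imperf` or `adj:sg:nom.voc:m1.m2.m3:pos` (colon-separated fields, dot-separated alternative values). Check this against the actual PoliMorf file before relying on it. Reading "imperfect" as the aspect value `imperf` is also an interpretation, and `extract_forms` is a made-up name.

```python
MASCULINE = {"m1", "m2", "m3"}  # assumed masculine gender values in the tagset


def extract_forms(lines):
    """Return (past_1sg, adjectives, participles) as sets of word forms."""
    past_1sg, adjectives, participles = set(), set(), set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip malformed or empty lines
        form, tag = cols[0], cols[2]
        # Each colon-separated field may list alternatives joined by dots.
        fields = [f.split(".") for f in tag.split(":")]
        pos = fields[0][0]
        if pos == "praet" and len(fields) >= 4:
            number, gender, aspect = fields[1], fields[2], fields[3]
            if "sg" in number and MASCULINE & set(gender):
                # Rule from the description: 3rd person form + "em".
                # Irregular stems (e.g. "mógł" vs "mogłem") are not handled.
                past_1sg.add(form + "em")
                if "imperf" in aspect:
                    participles.add(form)
        elif pos == "adj" and len(fields) >= 4:
            number, case, gender = fields[1], fields[2], fields[3]
            if "sg" in number and "nom" in case and MASCULINE & set(gender):
                adjectives.add(form)
    return past_1sg, adjectives, participles
```

Each returned set can then be sorted and written to its own file, one word per line.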

2. In your `run` script/program, read the 3 files and create a large
expression with alternatives. Use a regexp library based on DFAs (deterministic
finite automata).
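
A minimal sketch of step 2, assuming one word per line in UTF-8 files. The helper names `load_words`, `build_alternation` and `compile_word_pattern` are made up for this example. The sketch uses the standard `re` module so that it runs without extra dependencies, but `re` is a backtracking engine, not a DFA-based one. With tens of thousands of alternatives, a DFA-based engine such as RE2 would be the better fit, and the same pattern string could be handed to it if its syntax is compatible.

```python
import re
from pathlib import Path


def load_words(path):
    """Read one word per line, dropping blank lines and duplicates."""
    text = Path(path).read_text(encoding="utf-8")
    return sorted({line.strip() for line in text.splitlines() if line.strip()})


def build_alternation(words):
    """Join escaped words into one non-capturing alternation group."""
    # Longest first, so a leftmost-alternative engine tries "pisałem" before "pisał".
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return "(?:" + "|".join(escaped) + ")"


def compile_word_pattern(words):
    """Compile a case-insensitive pattern matching any listed word as a whole word."""
    return re.compile(r"\b" + build_alternation(words) + r"\b", re.IGNORECASE)
```

Escaping each word keeps dictionary entries that contain characters such as "." or "-" from being read as regex syntax.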