F00-F05
parent
e22e8a731a
commit
dc34bdeecc
19
TaskF00/description.txt
Normal file
@ -0,0 +1,19 @
Write a program that reads consecutive lines from standard input
and analyzes every line (without the newline character). You should
use regular expressions to the greatest extent possible (e.g. you
cannot use negation as an operation of the programming language if it is
possible to express the same in the regular expression itself). Wherever possible,
use a single regular expression.

Replace every 4-digit string with a 4-character string.
In the replacement string, "0" should change to "a", "1" should change to "b", "2" should change to "c", etc.
E.g. "1162" should change to "bbgc".
In this task, a digit means the [0-9] class.
50000
TaskF00/polish_wiki_excerpt.exp
Normal file
File diff suppressed because one or more lines are too long
0
TaskF00/polish_wiki_excerpt.in
Normal file
0
TaskF00/polish_wiki_excerpt.out
Normal file
26
TaskF00/run.py
Normal file
@ -0,0 +1,26 @@
import re
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]


def substitute_digits(match):
    # Map each digit to a letter: "0" -> "a", "1" -> "b", ..., "9" -> "j".
    return ''.join(chr(ord('a') + int(digit)) for digit in match.group())


def analyze_line(line):
    # The task defines a digit as the [0-9] class; in Python 3, \d would
    # also match non-ASCII Unicode digits.
    return re.sub(r'[0-9]{4}', substitute_digits, line)


if __name__ == "__main__":
    with open(inFile, 'r', encoding='utf-8') as inputFile, open(outFile, 'w', encoding='utf-8') as outputFile:
        for line in inputFile:
            # Strip only the newline so that trailing spaces are preserved.
            line = line.rstrip('\n')
            modified_line = analyze_line(line)
            outputFile.write(modified_line + '\n')
            print(modified_line)
3
TaskF00/simple.exp
Normal file
@ -0,0 +1,3 @
dece 34 dfd gfd 5
f33sdfsdbcdedsfsdf
3r
3
TaskF00/simple.in
Normal file
@ -0,0 +1,3 @
3424 34 dfd gfd 5
f33sdfsd1234dsfsdf
3r
3
TaskF00/simple.out
Normal file
@ -0,0 +1,3 @
dece 34 dfd gfd 5
f33sdfsdbcdedsfsdf
3r
22
TaskF01/description.txt
Normal file
@ -0,0 +1,22 @
Write a program that reads consecutive lines from standard input
and analyzes every line (without the newline character). You should
use regular expressions to the greatest extent possible (e.g. you
cannot use negation as an operation of the programming language if it is
possible to express the same in the regular expression itself). Wherever possible,
use a single regular expression.

For each word with at least one lowercase letter and one capital letter,
change every lowercase letter to a capital letter and every capital
letter to lowercase. In this task, a word means a string of "\w" metacharacters,
a lowercase letter is the [a-ząćęłńóśźż] class,
a capital letter is the [A-ZĄĆĘŁŃÓŚŹŻ] class.

POINTS: 2
DEADLINE: 2020-12-18 23:59:59
50000
TaskF01/polish_wiki_excerpt.exp
Normal file
File diff suppressed because one or more lines are too long
0
TaskF01/polish_wiki_excerpt.in
Normal file
0
TaskF01/polish_wiki_excerpt.out
Normal file
26
TaskF01/run.py
Normal file
@ -0,0 +1,26 @@
import re
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]


def swap_case(match):
    return match.group().swapcase()


def analyze_line(line):
    # The two lookaheads require a lowercase and a capital letter anywhere
    # in the word; they need not be adjacent (e.g. "a1B" also qualifies).
    pattern = r'\b(?=\w*[a-ząćęłńóśźż])(?=\w*[A-ZĄĆĘŁŃÓŚŹŻ])\w+'
    return re.sub(pattern, swap_case, line)


if __name__ == "__main__":
    with open(inFile, 'r', encoding='utf-8') as inputFile, open(outFile, 'w', encoding='utf-8') as outputFile:
        for line in inputFile:
            # Strip only the newline so that trailing spaces are preserved.
            line = line.rstrip('\n')
            modified_line = analyze_line(line)
            outputFile.write(modified_line + '\n')
            print(modified_line)
3
TaskF01/simple.exp
Normal file
@ -0,0 +1,3 @
ala mA KOTa
lallaa
żUK
3
TaskF01/simple.in
Normal file
@ -0,0 +1,3 @
ala Ma kotA
lallaa
Żuk
3
TaskF01/simple.out
Normal file
@ -0,0 +1,3 @
ala mA KOTa
lallaa
żUK
23
TaskF02/description.txt
Normal file
@ -0,0 +1,23 @
Write a program that reads consecutive lines from standard input
and analyzes every line (without the newline character). You should
use regular expressions to the greatest extent possible (e.g. you
cannot use negation as an operation of the programming language if it is
possible to express the same in the regular expression itself). Wherever possible,
use a single regular expression.

For each line write 4 numbers separated by spaces, "A B C D", where
A is the number of lowercase letters, B is the number of
capital letters, C is the number of digits, and
D is the number of all remaining characters, excluding the newline.
In this task, a lowercase letter is the [a-ząćęłńóśźż] class,
a capital letter is the [A-ZĄĆĘŁŃÓŚŹŻ] class.

POINTS: 1
DEADLINE: 2020-12-18 23:59:59
50000
TaskF02/polish_wiki_excerpt.exp
Normal file
File diff suppressed because it is too large
0
TaskF02/polish_wiki_excerpt.in
Normal file
0
TaskF02/polish_wiki_excerpt.out
Normal file
28
TaskF02/run.py
Normal file
@ -0,0 +1,28 @@
import re
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]


def analyze_line(line):
    lower_case_count = len(re.findall(r'[a-ząćęłńóśźż]', line))
    upper_case_count = len(re.findall(r'[A-ZĄĆĘŁŃÓŚŹŻ]', line))
    digit_count = len(re.findall(r'\d', line))
    # List the Polish letters explicitly: the ranges ą-ż and Ą-Ż miss ó and Ó
    # and cover letters outside the classes defined by the task.
    other_count = len(re.findall(r'[^a-ząćęłńóśźżA-ZĄĆĘŁŃÓŚŹŻ\d\n]', line))
    return lower_case_count, upper_case_count, digit_count, other_count


if __name__ == "__main__":
    with open(inFile, 'r', encoding='utf-8') as inputFile, open(outFile, 'w', encoding='utf-8') as outputFile:
        for line in inputFile:
            # Strip only the newline: trailing spaces count as "other" characters.
            line = line.rstrip('\n')
            counts = analyze_line(line)
            result = ' '.join(str(count) for count in counts)
            outputFile.write(result + '\n')
            print(result)
3
TaskF02/simple.exp
Normal file
@ -0,0 +1,3 @
7 2 0 2
6 0 0 0
6 1 1 2
3
TaskF02/simple.in
Normal file
@ -0,0 +1,3 @
ala Ma kotA
lallaa
Mam 2 żuki
3
TaskF02/simple.out
Normal file
@ -0,0 +1,3 @
7 2 0 2
6 0 0 0
6 1 1 2
24
TaskF03/description.txt
Normal file
@ -0,0 +1,24 @
Write a program that reads consecutive lines from standard input
and analyzes every line (without the newline character). You should
use regular expressions to the greatest extent possible (e.g. you
cannot use negation as an operation of the programming language if it is
possible to express the same in the regular expression itself). Wherever possible,
use a single regular expression.

For each line write 2 numbers separated by a space, "A B", where
A is the number of words starting with a lowercase letter and
B is the number of words starting with a capital letter.
In this task, a word means a string of "\w" metacharacters,
a lowercase letter is the [a-ząćęłńóśźż] class,
a capital letter is the [A-ZĄĆĘŁŃÓŚŹŻ] class.

POINTS: 1
DEADLINE: 2020-12-18 23:59:59
50000
TaskF03/polish_wiki_excerpt.exp
Normal file
File diff suppressed because it is too large
0
TaskF03/polish_wiki_excerpt.in
Normal file
0
TaskF03/polish_wiki_excerpt.out
Normal file
25
TaskF03/run.py
Normal file
@ -0,0 +1,25 @@
import re
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]


def analyze_line(line):
    # \b before the first letter ensures the match starts a word, so a word
    # such as "3daniowy" is counted in neither group.
    lowercase_count = len(re.findall(r'\b[a-ząćęłńóśźż]\w*\b', line))
    uppercase_count = len(re.findall(r'\b[A-ZĄĆĘŁŃÓŚŹŻ]\w*\b', line))
    return lowercase_count, uppercase_count


if __name__ == "__main__":
    with open(inFile, 'r', encoding='utf-8') as inputFile, open(outFile, 'w', encoding='utf-8') as outputFile:
        for line in inputFile:
            line = line.rstrip('\n')
            lowercase_count, uppercase_count = analyze_line(line)
            outputFile.write(f'{lowercase_count} {uppercase_count}' + '\n')
            print(f'{lowercase_count} {uppercase_count}')
2
TaskF03/simple.exp
Normal file
@ -0,0 +1,2 @
2 1
1 0
2
TaskF03/simple.in
Normal file
@ -0,0 +1,2 @
Żmija i żuk.
3daniowy obiad
2
TaskF03/simple.out
Normal file
@ -0,0 +1,2 @
2 1
1 0
20
TaskF04/description.txt
Normal file
@ -0,0 +1,20 @
Write a program that reads consecutive lines from standard input
and analyzes every line (without the newline character). You should
use regular expressions to the greatest extent possible (e.g. you
cannot use negation as an operation of the programming language if it is
possible to express the same in the regular expression itself). Wherever possible,
use a single regular expression.

Write the input line with the second string of digits deleted.
A digit is the [0-9] class.

POINTS: 1
DEADLINE: 2020-12-18 23:59:59
50000
TaskF04/polish_wiki_excerpt.exp
Normal file
File diff suppressed because one or more lines are too long
0
TaskF04/polish_wiki_excerpt.in
Normal file
17
TaskF04/run.py
Normal file
@ -0,0 +1,17 @@
import re
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]


def analyze_line(line):
    # Group 1 keeps everything up to the second digit string: leading
    # non-digits, the whole first digit string, and the non-digits after it.
    # Only the second digit string is dropped; later ones are untouched.
    return re.sub(r'^([^0-9]*[0-9]+[^0-9]+)[0-9]+', r'\1', line)


with open(inFile, 'r', encoding='utf-8') as inputFile, open(outFile, 'w', encoding='utf-8') as outputFile:
    for line in inputFile:
        line = line.rstrip('\n')
        res = analyze_line(line)
        outputFile.write(res + '\n')
3
TaskF04/simple.exp
Normal file
@ -0,0 +1,3 @
Mam 2 jabłka i banananów.
Mam 2 jabłka i banananów oraz 20 gruszek.
Widziałem 2 bociany.
3
TaskF04/simple.in
Normal file
@ -0,0 +1,3 @
Mam 2 jabłka i 35 banananów.
Mam 2 jabłka i 35 banananów oraz 20 gruszek.
Widziałem 2 bociany.
3
TaskF04/simple.out
Normal file
@ -0,0 +1,3 @
Mam 2 jabłka i banananów.
Mam 2 jabłka i banananów oraz 2 gruszek.
Widziałem 2 bociany.
22
TaskF05/description.txt
Normal file
@ -0,0 +1,22 @
Write a program that reads consecutive lines from standard input
and analyzes every line (without the newline character). You should
use regular expressions to the greatest extent possible (e.g. you
cannot use negation as an operation of the programming language if it is
possible to express the same in the regular expression itself). Wherever possible,
use a single regular expression.

Write the input line with the third word replaced by an "xxx" string.
The number of "x" characters in the "xxx" string should be the same as
the number of characters in the replaced word.
In this task, a word means a string of "\w" metacharacters.

POINTS: 2
DEADLINE: 2020-12-18 23:59:59
50000
TaskF05/polish_wiki_excerpt.exp
Normal file
File diff suppressed because one or more lines are too long
0
TaskF05/polish_wiki_excerpt.in
Normal file
28
TaskF05/run.py
Normal file
@ -0,0 +1,28 @@
import re
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]


def analyze_line(line):
    # Group 1 holds the first two words with their surrounding separators;
    # group 2 is the third word, replaced by one "x" per character.
    # Punctuation and spacing are kept because only group 2 is rewritten.
    # Lines with fewer than three words do not match and stay unchanged.
    return re.sub(r'^(\W*\w+\W+\w+\W+)(\w+)',
                  lambda m: m.group(1) + 'x' * len(m.group(2)),
                  line)


with open(inFile, 'r', encoding='utf-8') as inputFile, open(outFile, 'w', encoding='utf-8') as outputFile:
    for line in inputFile:
        line = line.rstrip('\n')
        outputFile.write(analyze_line(line) + '\n')
2
TaskF05/simple.exp
Normal file
@ -0,0 +1,2 @
Mam 2 xxxxxx i 35 banananów.
Widziałem 2 xxxxxxx.
2
TaskF05/simple.in
Normal file
@ -0,0 +1,2 @
Mam 2 jabłka i 35 banananów.
Widziałem 2 bociany.
2
TaskF05/simple.out
Normal file
@ -0,0 +1,2 @
Mam 2 xxxxxx i 35 banananów
Widziałem 2 xxxxxxx
20
TaskG00/description.txt
Normal file
@ -0,0 +1,20 @
Use regular expressions to extract lines containing Polish surnames.

Download the lists of Polish male and female surnames from here:

* https://dane.gov.pl/pl/dataset/1681,nazwiska-osob-zyjacych-wystepujace-w-rejestrze-pesel/resource/35279/table?page=1&per_page=20&q=&sort=
* https://dane.gov.pl/pl/dataset/1681,nazwiska-osob-zyjacych-wystepujace-w-rejestrze-pesel/resource/22817/table?page=1&per_page=20&q=&sort=

Extract the lines from stdin that contain any of the surnames.
Look only for surnames in lowercase.
A surname does not have to be surrounded by spaces or any other special characters.
Don't search for declined forms of surnames.

Check both an NFA engine (e.g. the Python re library) and a DFA engine (Google re2) and compare their run speed.

Submit the solution based on the DFA library.

NOTE: You could extract the Polish surnames list, save it to a file, then commit the file to your repository.
NOTE: You may set max_mem to a higher value than the default in the re2 library.
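The extraction step described in TaskG00 could be sketched roughly as follows. This is only an illustration under stated assumptions: the surname list is a tiny hypothetical sample rather than the downloaded data, the helper names `build_pattern` and `matching_lines` are made up for this sketch, and it uses Python's backtracking `re` module so that it runs without extra dependencies. The task itself asks for the submitted solution to use a DFA-based library such as re2.

```python
import re

# Hypothetical sample; the real list would come from the downloaded surname files.
SURNAMES = ["nowak", "kowalski", "wiśniewska"]


def build_pattern(surnames):
    """Join the escaped surnames into a single alternation."""
    return re.compile("|".join(re.escape(s) for s in surnames))


def matching_lines(lines, pattern):
    """Yield the lines that contain any surname as a substring (lowercase only)."""
    for line in lines:
        if pattern.search(line):
            yield line


sample = ["pan kowalski wrócił", "Pan Kowalski wrócił", "brak nazwisk"]
found = list(matching_lines(sample, build_pattern(SURNAMES)))
print(found)
```

Because the pattern is not anchored and has no word boundaries, a surname embedded in a longer token also counts, which matches the requirement that it need not be surrounded by spaces. In a real run, `sample` would be replaced by `sys.stdin`.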
44401
TaskG00/polish_wiki_excerpt.exp
Normal file
File diff suppressed because one or more lines are too long
0
TaskG00/polish_wiki_excerpt.in
Normal file
41
TaskG01/description.txt
Normal file
@ -0,0 +1,41 @
Use regular expressions to mark Polish first-person masculine forms.

You should handle the following types of expressions:

* first-person masculine past forms of verbs ("zrobiłem", "pisałem", etc.),
* first-person singular masculine forms of the verb "być" ("be") combined
  with singular masculine nominative forms of adjectives ("wysoki", "sprytny", etc.),
  assuming that the form of the verb "być" is to the left of the adjective, separated
  from it by no more than 3 other words,
* the verb "będę" combined with the past participle (i.e. the 3rd person
  masculine imperfect form, e.g. "robił", "pisał"), assuming
  that "będę" is either to the left of the participle, separated from it by
  no more than 3 other words, OR directly
  to the right of the participle ("robił będę").

The first-person masculine forms should be marked with curly brackets.
You should mark only the masculine form. Do not mark the form of "być"
(unless it is clearly a masculine form, i.e. "byłem").

The match should be case-insensitive.

The PoliMorf dictionary of inflected forms should be used:
http://zil.ipipan.waw.pl/PoliMorf?action=AttachFile&do=get&target=PoliMorf-0.6.7.tab.gz

Suggested steps:

1. Extract all the needed forms from the PoliMorf dictionary:

   * 1st person masculine past forms of verbs; unfortunately,
     this form is not directly present in the lexicon, so you need
     to add "em" to the 3rd person masculine form ("zrobił" => "zrobiłem")
   * singular masculine nominative forms of adjectives
   * masculine past participles (3rd person masculine imperfect forms of verbs)

   You could do this using grep/cut commands to obtain simple text files
   with one word per line. You can do this once and commit the 3 files to your repository.

2. In your `run` script/program, read the 3 files and create a large
   expression with alternatives. Use a regexp library based on DFAs (deterministic
   finite automata).
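The marking rules above can be sketched with one alternation and a replacement function. This is a minimal illustration, not the reference solution: the three word lists are tiny hypothetical samples standing in for the files extracted from PoliMorf, all names (`PATTERN`, `mark`, `mark_forms`) are invented here, and it uses the standard `re` module instead of the DFA-based library the task recommends.

```python
import re

# Hypothetical miniature word lists; the task derives the real ones from PoliMorf.
PAST_1SG_M = ["ugotowałem", "byłem", "pisałem"]
ADJ_SG_NOM_M = ["wysoki", "sprytny", "zielony"]
PARTICIPLE_M = ["pisał", "śpiewał"]


def alt(words):
    """Escape the words and join them into a non-capturing alternation."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


# A separator, preceded by at most 3 intervening words.
GAP = r"(?:\W+\w+){0,3}?\W+"

PATTERN = re.compile(
    rf"\b(?P<be>jestem|byłem)(?P<gap1>{GAP})(?P<adj>{alt(ADJ_SG_NOM_M)})\b"
    rf"|\b(?P<fut>będę)(?P<gap2>{GAP})(?P<part1>{alt(PARTICIPLE_M)})\b"
    rf"|\b(?P<part2>{alt(PARTICIPLE_M)})(?P<gap3>\W+będę)\b"
    rf"|\b(?P<past>{alt(PAST_1SG_M)})\b",
    re.IGNORECASE,
)


def mark(match):
    """Wrap only the masculine forms of a match in curly brackets."""
    g = match.groupdict()
    if g["adj"]:
        be = g["be"]
        if be.lower() == "byłem":  # "byłem" is itself masculine, "jestem" is not
            be = "{" + be + "}"
        return be + g["gap1"] + "{" + g["adj"] + "}"
    if g["part1"]:
        return g["fut"] + g["gap2"] + "{" + g["part1"] + "}"
    if g["part2"]:
        return "{" + g["part2"] + "}" + g["gap3"]
    return "{" + g["past"] + "}"


def mark_forms(line):
    return PATTERN.sub(mark, line)


print(mark_forms("aaaa byłem aaa bbb ccc zielony ddd"))
```

The context alternatives come first so that an adjective or participle is marked only when its trigger word is within range; the plain past-form alternative acts as the fallback, which is why "byłem" is still marked when the adjective is too far away. One known limitation of this sketch: past forms that fall inside a matched gap are not marked.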
25
TaskG01/simple.exp
Normal file
@ -0,0 +1,25 @
Tu nic nie ma.
Wczoraj {ugotowałem} ziemniaki.
{Jechałem}, {jechałem} i {jechałem}, a potem się {zatrzymałem}.
{Umyłem} się mydłem.
Jestem {wysoki}.
Jest wysoki.
Mówią, że jestem od zawsze niezwykle {sprytny}.
aaaa {byłem} aaa {zielony} ddd
aaaa {byłem} aaa bbb {zielony} ddd
aaaa {byłem} aaa bbb ccc {zielony} ddd
aaaa {byłem} aaa bbb ccc ddd zielony ddd
aaaa był aaa bbb zielony ddd
aaaa byłam aaa bbb zielony ddd
aaaa byłam aaa bbb zielona ddd
teraz będę {pisał} książkę
będę teraz {pisał} książkę
będę teraz dla ciebie {pisał} książkę
teraz dla ciebie {pisał} będę księżkę
będę i on napisał książkę
aaa będę {śpiewał} bbb
aaa będę ccc {śpiewał} bbb
aaa będę ccc ddd {śpiewał} bbb
aaa będę ccc ddd eee {śpiewał} bbb
aaa będę ccc ddd eee fff śpiewał bbb
{pływałem} i {biegałem}
25
TaskG01/simple.in
Normal file
@ -0,0 +1,25 @
Tu nic nie ma.
Wczoraj ugotowałem ziemniaki.
Jechałem, jechałem i jechałem, a potem się zatrzymałem.
Umyłem się mydłem.
Jestem wysoki.
Jest wysoki.
Mówią, że jestem od zawsze niezwykle sprytny.
aaaa byłem aaa zielony ddd
aaaa byłem aaa bbb zielony ddd
aaaa byłem aaa bbb ccc zielony ddd
aaaa byłem aaa bbb ccc ddd zielony ddd
aaaa był aaa bbb zielony ddd
aaaa byłam aaa bbb zielony ddd
aaaa byłam aaa bbb zielona ddd
teraz będę pisał książkę
będę teraz pisał książkę
będę teraz dla ciebie pisał książkę
teraz dla ciebie pisał będę księżkę
będę i on napisał książkę
aaa będę śpiewał bbb
aaa będę ccc śpiewał bbb
aaa będę ccc ddd śpiewał bbb
aaa będę ccc ddd eee śpiewał bbb
aaa będę ccc ddd eee fff śpiewał bbb
pływałem i biegałem
40
TaskG02/description.txt
Normal file
@ -0,0 +1,40 @
Use regular expressions to mark Polish first-person feminine forms.

You should handle the following types of expressions:

* first-person feminine past forms of verbs ("zrobiłam", "pisałam", etc.),
* first-person singular feminine forms of the verb "być" ("be") combined
  with singular feminine nominative forms of adjectives ("wysoka", "sprytna", etc.),
  assuming that the form of the verb "być" is to the left of the adjective, separated
  from it by no more than 3 other words,
* the verb "będę" combined with the past participle (i.e. the 3rd person
  feminine imperfect form, e.g. "robiła", "pisała"), assuming
  that "będę" is either to the left of the participle, separated from it by
  no more than 3 other words, OR directly
  to the right of the participle ("robiła będę").

The first-person feminine forms should be marked with curly brackets.
You should mark only the feminine form. Do not mark the form of "być"
(unless it is clearly a feminine form, i.e. "byłam").

The match should be case-insensitive.

The PoliMorf dictionary of inflected forms should be used:
http://zil.ipipan.waw.pl/PoliMorf?action=AttachFile&do=get&target=PoliMorf-0.6.7.tab.gz

Suggested steps:

1. Extract all the needed forms from the PoliMorf dictionary:

   * 1st person feminine past forms of verbs; unfortunately,
     this form is not directly present in the lexicon, so you need
     to add "m" to the 3rd person feminine form ("zrobiła" => "zrobiłam")
   * singular feminine nominative forms of adjectives
   * feminine past participles (3rd person feminine imperfect forms of verbs)

   You could do this using grep/cut commands to obtain simple text files
   with one word per line. You can do this once and commit the 3 files to your repository.

2. In your `run` script/program, read the 3 files and create a large
   expression with alternatives. Use a regexp library based on DFAs (deterministic
   finite automata).
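Step 1 of the suggested approach, deriving the first-person forms that the lexicon lacks, might look like the sketch below. The input list is a hypothetical sample of 3rd person feminine forms (the real one would be extracted from PoliMorf), the function names are invented for illustration, and the standard `re` module stands in for a DFA-based library.

```python
import re


def first_person_forms(third_person_forms, suffix):
    """Append the first-person suffix ("m" for feminine, "em" for masculine)."""
    return [form + suffix for form in third_person_forms]


# Hypothetical sample of 3rd person feminine past forms.
third_person_feminine = ["zrobiła", "pisała", "ugotowała"]
first_person_feminine = first_person_forms(third_person_feminine, "m")

# One large alternation of whole words, matched case-insensitively.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in first_person_feminine) + r")\b",
    re.IGNORECASE,
)


def mark_past_forms(line):
    """Wrap every derived first-person past form in curly brackets."""
    return pattern.sub(lambda m: "{" + m.group() + "}", line)


print(mark_past_forms("Wczoraj ugotowałam ziemniaki."))
```

Building the alternation from a dictionary, rather than matching any word ending in "łam", is what keeps words such as "złam" from being marked.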
25
TaskG02/simple.exp
Normal file
@ -0,0 +1,25 @
Tu nic nie ma.
Wczoraj {ugotowałam} ziemniaki.
{Jechałam}, {jechałam} i {jechałam}, a potem się {zatrzymałam}.
{Umyłam} się mydłem i bam, złam się.
Jestem {wysoka}.
Jest {wysoka}.
Mówią, że jestem od zawsze niezwykle {sprytna}.
aaaa {byłam} aaa {zielona} ddd
aaaa {byłam} aaa bbb {zielona} ddd
aaaa {byłam} aaa bbb ccc {zielona} ddd
aaaa {byłam} aaa bbb ccc ddd zielona ddd
aaaa była aaa bbb zielona ddd
aaaa byłem aaa bbb zielona ddd
aaaa byłem aaa bbb zielony ddd
teraz będę {pisała} książkę
będę teraz {pisała} książkę
będę teraz dla ciebie {pisała} książkę
teraz dla ciebie {pisała} będę księżkę
będę i ona napisała książkę
aaa będę {śpiewała} bbb
aaa będę ccc {śpiewała} bbb
aaa będę ccc ddd {śpiewała} bbb
aaa będę ccc ddd eee {śpiewała} bbb
aaa będę ccc ddd eee fff śpiewała bbb
{pływałam} i {biegałam}
25
TaskG02/simple.in
Normal file
@ -0,0 +1,25 @
Tu nic nie ma.
Wczoraj ugotowałam ziemniaki.
Jechałam, jechałam i jechałam, a potem się zatrzymałam.
Umyłam się mydłem i bam, złam się.
Jestem wysoka.
Jest wysoka.
Mówią, że jestem od zawsze niezwykle sprytna.
aaaa byłam aaa zielona ddd
aaaa byłam aaa bbb zielona ddd
aaaa byłam aaa bbb ccc zielona ddd
aaaa byłam aaa bbb ccc ddd zielona ddd
aaaa była aaa bbb zielona ddd
aaaa byłem aaa bbb zielona ddd
aaaa byłem aaa bbb zielony ddd
teraz będę pisała książkę
będę teraz pisała książkę
będę teraz dla ciebie pisała książkę
teraz dla ciebie pisała będę księżkę
będę i ona napisała książkę
aaa będę śpiewała bbb
aaa będę ccc śpiewała bbb
aaa będę ccc ddd śpiewała bbb
aaa będę ccc ddd eee śpiewała bbb
aaa będę ccc ddd eee fff śpiewała bbb
pływałam i biegałam
20
TaskG03/description.txt
Normal file
@ -0,0 +1,20 @
Use regular expressions to extract lines containing Polish surnames. CASE INSENSITIVE

Download the lists of Polish male and female surnames from here:

* https://dane.gov.pl/pl/dataset/1681,nazwiska-osob-zyjacych-wystepujace-w-rejestrze-pesel/resource/35279/table?page=1&per_page=20&q=&sort=
* https://dane.gov.pl/pl/dataset/1681,nazwiska-osob-zyjacych-wystepujace-w-rejestrze-pesel/resource/22817/table?page=1&per_page=20&q=&sort=

Extract the lines from stdin that contain any of the surnames.
Look for surnames regardless of letter case (case insensitive).
A surname does not have to be surrounded by spaces or any other special characters.
Don't search for declined forms of surnames.

Check both an NFA engine (e.g. the Python re library) and a DFA engine (Google re2) and compare them.

Submit the solution based on the better method.

NOTE: You could extract the Polish surnames list, save it to a file, then commit the file to your repository.
NOTE: You may set max_mem to a higher value than the default in the re2 library.
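For the case-insensitive variant, a rough sketch with the standard `re` module is shown below. The surname list is again a hypothetical sample and `contains_surname` is an illustrative name; which engine ends up being "better" is left to the comparison the task asks for.

```python
import re

# Hypothetical sample; the real list would come from the downloaded surname files.
SURNAMES = ["nowak", "kowalski", "wiśniewska"]

# With str patterns, re.IGNORECASE also folds non-ASCII letters such as ś/Ś.
pattern = re.compile("|".join(re.escape(s) for s in SURNAMES), re.IGNORECASE)


def contains_surname(line):
    """Return True if the line contains any surname, ignoring letter case."""
    return pattern.search(line) is not None


print(contains_surname("Jan KOWALSKI"))
```

Compiling the flag into the pattern, instead of lowercasing each input line in Python, keeps the case handling inside the regular expression.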
44696
TaskG03/polish_wiki_excerpt.exp
Normal file
File diff suppressed because one or more lines are too long
0
TaskG03/polish_wiki_excerpt.in
Normal file
8
TaskG04/description.txt
Normal file
@ -0,0 +1,8 @
The input consists of the bytes of a UTF-8 text written out in plain text form
(so we handle an input stream of the characters 0 and 1, not a stream of bits).

Convert the file to UTF-8 text.

POINTS: 2
DEADLINE: 2021-01-15 23:59:59
REMAINDER: 0/2
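A possible reading of this task is sketched below. It assumes, without confirmation from the description, that each byte is written as exactly 8 characters "0" or "1" and that any separators between bytes (spaces, newlines) can be ignored; `bits_to_text` is a name chosen for this sketch.

```python
import re


def bits_to_text(bit_text):
    """Decode a string of '0'/'1' characters (8 per byte) as UTF-8 text."""
    # Assumption: bytes are groups of exactly 8 binary digits; anything
    # between the groups is treated as a separator and skipped.
    octets = re.findall(r"[01]{8}", bit_text)
    return bytes(int(octet, 2) for octet in octets).decode("utf-8")


# Build a sample input from a known string, then decode it back.
encoded = " ".join(format(byte, "08b") for byte in "żuk".encode("utf-8"))
print(encoded)
print(bits_to_text(encoded))
```

Decoding the collected bytes in one call, rather than byte by byte, is what lets multi-byte characters such as "ż" come out intact.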
0
TaskG04/polish_wiki_excerpt.exp
Normal file