Diachronic normalisation of Polish texts
Transform old Polish texts into modern spelling.
The evaluation metric is CharMatch: an F-score computed over the expected corrections, i.e. the changes between the input text and the expected output.
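To illustrate the idea of scoring corrections rather than whole strings, here is a minimal sketch. It is not the official CharMatch implementation: deriving corrections with `difflib`, the function names `corrections` and `correction_f_score`, and the `beta` parameter are all assumptions made for this example, and the official scorer may count corrections differently.

```python
from difflib import SequenceMatcher


def corrections(source: str, target: str) -> set:
    """Edits turning source into target, as (start, end, replacement) tuples.

    Assumption: corrections are approximated by difflib's character-level
    opcodes; the official metric may define them differently.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return {
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def correction_f_score(source: str, expected: str, actual: str,
                       beta: float = 1.0) -> float:
    """F-score of the corrections made in `actual` against those in `expected`.

    `beta` is a hypothetical parameter; the weighting used by the
    challenge is not stated in this README.
    """
    wanted = corrections(source, expected)
    made = corrections(source, actual)
    if not wanted and not made:
        return 1.0  # nothing to correct and nothing changed
    hits = len(wanted & made)
    if hits == 0:
        return 0.0
    precision = hits / len(made)
    recall = hits / len(wanted)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With this sketch, a system that leaves the input unchanged scores 0 whenever the reference contains at least one correction, so simply copying the input is not rewarded.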
Directory structure
- `README.md`: this file
- `config.txt`: configuration file
- `dev-0/`: directory with dev (test) data
- `dev-0/in.tsv`: input text for the dev set
- `dev-0/expected.tsv`: reference text for the dev set
- `dev-1/`: directory with another dev (test) set
- `dev-1/in.tsv`: input text for the dev set
- `dev-1/expected.tsv`: reference text for the dev set
- `test-A/`: directory with test data
- `test-A/in.tsv`: input data for the test set
- `test-A/expected.tsv`: reference text for the test set
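A minimal sketch of how one split could be loaded in Python. It assumes that `in.tsv` and `expected.tsv` are UTF-8 files aligned line by line (line N of one corresponds to line N of the other); the helper name `load_pairs` is hypothetical and not part of this repository.

```python
from pathlib import Path


def load_pairs(split_dir):
    """Return a list of (input_line, expected_line) pairs for one split.

    Assumption: both files are UTF-8 and aligned line by line.
    """
    split = Path(split_dir)
    with open(split / "in.tsv", encoding="utf-8") as fin:
        inputs = [line.rstrip("\n") for line in fin]
    with open(split / "expected.tsv", encoding="utf-8") as fexp:
        expected = [line.rstrip("\n") for line in fexp]
    if len(inputs) != len(expected):
        raise ValueError(
            f"{split}: {len(inputs)} input lines but {len(expected)} expected lines"
        )
    return list(zip(inputs, expected))
```

For example, `load_pairs("dev-0")` would return the pairs for the first dev set. If `in.tsv` turns out to contain more than one tab-separated column, each input line would need to be split on `"\t"` first.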