2023-10-12 11:53:23 +02:00


Diachronic normalisation of Polish texts
========================================
Transform old Polish texts into modern spelling.
The evaluation metric is CharMatch, an F-score over the expected corrections,
i.e. the changes between the input text and the expected output.
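
To make "F-score over expected corrections" concrete, here is a simplified sketch of an edit-based F-score. It is not the reference CharMatch implementation: extracting character edits with Python's `difflib`, micro-averaging over lines, treating "no edits" as a perfect score, and the `beta` parameter are all assumptions made for illustration. The names `char_edits` and `char_match_f` are hypothetical.

```python
from difflib import SequenceMatcher


def char_edits(source: str, target: str) -> set[tuple[int, int, str]]:
    """Character-level edits that turn `source` into `target`.

    Each edit is (start, end, replacement): source[start:end] is replaced
    by `replacement` (empty for a deletion; start == end for an insertion).
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return {
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def char_match_f(inputs: list[str], outputs: list[str], expected: list[str],
                 beta: float = 1.0) -> float:
    """Micro-averaged F-score of system edits against expected edits.

    Hypothetical helper: a simplified illustration, not the official metric.
    """
    if not len(inputs) == len(outputs) == len(expected):
        raise ValueError("inputs, outputs and expected must be aligned line by line")
    tp = n_out = n_exp = 0
    for src, out, exp in zip(inputs, outputs, expected):
        system = char_edits(src, out)  # changes the system actually made
        gold = char_edits(src, exp)    # changes the reference expects
        tp += len(system & gold)
        n_out += len(system)
        n_exp += len(gold)
    # Assumed convention: a side with no edits at all counts as perfect.
    precision = tp / n_out if n_out else 1.0
    recall = tp / n_exp if n_exp else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With the made-up inputs `["iest dobrze", "Marya"]` and references `["jest dobrze", "Maria"]`, copying the input unchanged scores 0.0 (none of the expected corrections is made), while fixing only `iest` to `jest` gives precision 1.0, recall 0.5 and an F1 of about 0.67. Because `difflib` aligns each pair independently, the same correction can occasionally be aligned differently in the two comparisons, so this sketch may under-count matches.
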

Directory structure
-------------------
* `README.md` — this file
* `config.txt` — configuration file
* `dev-0/` — directory with dev (test) data
* `dev-0/in.tsv` — input text for the dev set
* `dev-0/expected.tsv` — reference text for the dev set
* `dev-1/` — directory with another dev (test) set
* `dev-1/in.tsv` — input text for the dev set
* `dev-1/expected.tsv` — reference text for the dev set
* `test-A/` — directory with test data
* `test-A/in.tsv` — input text for the test set
* `test-A/expected.tsv` — reference text for the test set
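
Each split pairs an `in.tsv` with an `expected.tsv`. The sketch below loads one split as (input, reference) pairs. It assumes one example per line, UTF-8 encoding, and that line *n* of `in.tsv` corresponds to line *n* of `expected.tsv`; the README does not describe the column layout, so whole lines are kept. The helpers `read_lines` and `load_split` are hypothetical.

```python
from pathlib import Path


def read_lines(path: Path) -> list[str]:
    # Strip only the trailing newline so tabs and spaces inside the text survive.
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_split(split_dir: str) -> list[tuple[str, str]]:
    """Pair input lines with reference lines for one split, e.g. "dev-0"."""
    base = Path(split_dir)
    inputs = read_lines(base / "in.tsv")
    expected = read_lines(base / "expected.tsv")
    # Assumed: both files are aligned line by line, so counts must match.
    if len(inputs) != len(expected):
        raise ValueError(
            f"{base}: {len(inputs)} input lines vs {len(expected)} reference lines"
        )
    return list(zip(inputs, expected))
```

For example, `load_split("dev-0")` would return the dev pairs; the same call works for `dev-1` and `test-A`.
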