Diachronic normalisation of Polish texts
========================================

Transform old Polish texts into modern spelling.

Evaluation uses the CharMatch metric, i.e. an F-score computed over the expected corrections
(the changes between the input text and the expected output).
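
To make the idea of an F-score over corrections concrete, here is a minimal sketch in Python. It is not the official scorer: the function names, the use of `difflib` to extract character-level edits, the exact-match comparison of edits, and the default `beta` are all assumptions made for illustration, and the sample strings are invented rather than taken from the data.

```python
from difflib import SequenceMatcher


def corrections(source, target):
    """Return the character-level edits that turn `source` into `target`.

    Each edit is a tuple (start, end, replacement) with positions in `source`.
    This edit extraction is an assumption of the sketch, not the official one.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return {
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def correction_f_score(source, expected, output, beta=1.0):
    """F-score of the edits made in `output` against the edits in `expected`.

    `beta` is a hypothetical parameter; the value used by the real metric
    is not stated in this README.
    """
    expected_edits = corrections(source, expected)
    output_edits = corrections(source, output)
    if not expected_edits and not output_edits:
        return 1.0  # nothing to correct and nothing changed
    true_positives = len(expected_edits & output_edits)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(output_edits)
    recall = true_positives / len(expected_edits)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Invented example: one of the two expected corrections was made.
print(round(correction_f_score("iest krol", "jest król", "jest krol"), 3))  # 0.667
```

In this sketch an output that leaves the input untouched scores 0 whenever corrections were expected, and an output identical to the reference scores 1.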

Directory structure
-------------------

* `README.md` — this file
* `config.txt` — configuration file
* `dev-0/` — directory with dev (test) data
* `dev-0/in.tsv` — input text for the dev set
* `dev-0/expected.tsv` — reference text for the dev set
* `dev-1/` — directory with another dev (test) set
* `dev-1/in.tsv` — input text for the dev set
* `dev-1/expected.tsv` — reference text for the dev set
* `test-A/` — directory with test data
* `test-A/in.tsv` — input data for the test set
* `test-A/expected.tsv` — reference text for the test set
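
Given the layout above, pairing inputs with references for one set could look like the following sketch. It assumes that the files are UTF-8 encoded and that line *n* of `in.tsv` corresponds to line *n* of `expected.tsv`; neither is stated in this README. `read_lines` and `read_pairs` are hypothetical helper names, and whole lines are returned without splitting on tabs.

```python
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of lines, without line terminators."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_pairs(challenge_dir, split="dev-0"):
    """Return (input, expected) line pairs for one set, e.g. dev-0 or dev-1.

    Assumes the two files are line-aligned (an assumption of this sketch).
    """
    split_dir = Path(challenge_dir) / split
    inputs = read_lines(split_dir / "in.tsv")
    expected = read_lines(split_dir / "expected.tsv")
    if len(inputs) != len(expected):
        raise ValueError(
            f"{split}: {len(inputs)} input lines but {len(expected)} expected lines"
        )
    return list(zip(inputs, expected))
```

For example, `read_pairs(".", "dev-1")` would load the second dev set when run from the repository root.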