Sane words challenge
Guess whether a given word is a correct Polish word in a given domain. The reported frequency of the word in the source texts is also provided.
Each entry in the training data set has the form: Sane (0 or 1), Domain, Word, Frequency. The evaluation metric is the F2-score.
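As a rough illustration of the metric, the sketch below computes an F-beta score with beta = 2, which weights recall more heavily than precision. It assumes that Sane = 1 is the positive class; the function name `f_beta` is made up for this example, and the challenge's own evaluator may handle edge cases differently.

```python
def f_beta(expected, predicted, beta=2.0):
    """F-beta score for binary labels, assuming 1 (sane) is the positive class."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)  # true positives
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)  # false positives
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: score is defined as 0 here (an assumption).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 2, a missed sane word (false negative) costs more than a wrongly accepted one (false positive), so a model tuned for this metric would typically favour recall.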
Directory structure
README.md - this file
config.txt - configuration file
train/ - directory with training data
train/train.tsv - train set
dev-0/ - directory with dev (test) data
dev-0/in.tsv - input data for the dev set
dev-0/expected.tsv - expected (reference) data for the dev set
test-A/ - directory with test data
test-A/in.tsv - input data for the test set
test-A/expected.tsv - expected (reference) data for the test set
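A minimal sketch of reading the training entries is shown below. It assumes that train/train.tsv is tab-separated, has no header row, and stores its columns in the order Sane, Domain, Word, Frequency; none of this is confirmed beyond the description above. The names `Entry` and `read_entries` are hypothetical.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    sane: int       # 1 if the word is correct in the domain, 0 otherwise
    domain: str
    word: str
    frequency: str  # kept as text; its numeric format is not specified here


def read_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per well-formed line (assumed column order, no header)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        sane, domain, word, frequency = row
        yield Entry(int(sane), domain, word, frequency)
```

The function takes any iterable of lines, so it could be used as `read_entries(open("train/train.tsv", encoding="utf-8", newline=""))`, assuming the file is UTF-8 encoded.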