Sane words challenge

Guess whether a given word is a correct Polish word in a given domain. Additionally, you are given the reported frequency of the word in the source texts.

Each entry in the training set has the form: Sane (0 or 1), Domain, Word, Frequency. The evaluation metric is the F2-score.
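The F2-score is the F-beta score with beta = 2, i.e. F2 = 5·P·R / (4·P + R), where P is precision and R is recall. It weights recall more heavily than precision, so missing a sane word costs more than accepting a wrong one. Below is a minimal sketch of the metric, assuming that label 1 (sane) is the positive class. The official evaluator may differ in edge-case handling. The function name `f_beta` is hypothetical.

```python
from typing import Sequence


def f_beta(expected: Sequence[int], predicted: Sequence[int], beta: float = 2.0) -> float:
    """F-beta score for binary labels, treating 1 (sane word) as the positive class (assumption)."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)  # sane words correctly accepted
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)  # wrong words accepted as sane
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)  # sane words rejected
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so the score is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    gold = [1, 1, 1, 0]
    pred = [1, 1, 1, 1]  # accepts everything: P = 0.75, R = 1.0
    print(f_beta(gold, pred))          # F2 = 0.9375
    print(f_beta(gold, pred, beta=1))  # F1 is about 0.857
```

The example shows why the metric matters for model tuning: a predictor that leans towards answering 1 loses little under F2, so a decision threshold below 0.5 may be worth trying.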

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • train/ — directory with training data
  • train/train.tsv — train set
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — input data for the dev set
  • dev-0/expected.tsv — expected (reference) data for the dev set
  • test-A/ — directory with test data
  • test-A/in.tsv — input data for the test set
  • test-A/expected.tsv — expected (reference) data for the test set
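A minimal sketch for loading `train/train.tsv`. It assumes the file is tab-separated, UTF-8, has no header row, and lists the columns in the order given above (Sane, Domain, Word, Frequency). Frequency is parsed as a float because its exact type is not documented. The names `Entry` and `parse_train` are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Entry:
    sane: int         # 1 if the word is a correct Polish word in the domain, else 0
    domain: str
    word: str
    frequency: float  # reported frequency in the source texts (type assumed)


def parse_train(lines: Iterable[str]) -> List[Entry]:
    """Parse tab-separated train lines (assumed column order: Sane, Domain, Word, Frequency)."""
    # QUOTE_NONE keeps quote characters inside words from being treated as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        sane, domain, word, freq = row
        entries.append(Entry(int(sane), domain, word, float(freq)))
    return entries


if __name__ == "__main__":
    with open("train/train.tsv", encoding="utf-8", newline="") as f:
        train = parse_train(f)
    print(len(train), train[:3])
```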