
wmt-2020-pl-en

Translate from Polish to English.

This challenge is based on the WMT20 translation task (http://www.statmt.org/wmt20/translation-task.html). The train set is created from the Europarl WMT pl-en training data. The dev and test sets are created from the WMT pl-en development data.

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • train/ — directory with training data
  • train/train.tsv — sample parallel corpus (Polish text in the first column, English text in the second one)
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — Polish input text for the dev set
  • dev-0/expected.tsv — English reference translation for the dev set
  • test-A/ — directory with test data
  • test-A/in.tsv — Polish input data for the test set
  • test-A/expected.tsv — English reference translation for the test set
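
The layout above can be read with a few lines of Python. This is a minimal sketch, not part of the repository: the helper names `read_parallel_tsv` and `read_lines` are hypothetical, and it assumes the files are UTF-8, tab-separated, have no header row, and hold one sentence (or sentence pair) per line.

```python
import csv


def read_parallel_tsv(path):
    """Return a list of (polish, english) pairs from a two-column TSV file.

    Assumption: column 1 is Polish and column 2 is English, as in train/train.tsv.
    """
    pairs = []
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            pairs.append((row[0], row[1]))
    return pairs


def read_lines(path):
    """Return one sentence per line from a single-column file such as in.tsv."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

For example, `read_parallel_tsv("train/train.tsv")` would give the training pairs, while `read_lines("dev-0/in.tsv")` and `read_lines("dev-0/expected.tsv")` would give the dev inputs and their references in matching order.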