Guess a word in a gap
=======================================

Give a probability distribution for a word in a gap in the corpus of the book "Lalka".
This is a challenge for language models.

The metric is log-loss calculated on 10-bit fingerprints generated with the
MurmurHash3 hash function.

Example
-------

For instance, you are expected to guess the word in the gap marked with a token here:

> Wokulski słuchał go uważnie do Pruszkowa .
> Za Pruszkowem zmęczony i jednostajny głos pana
> Tomasza zaczął go męczyć . Za to coraz
> wyraźniej wpadała mu w ucho rozmowa panny
> Izabeli ze , prowadzona po angielsku .
> Usłyszał nawet kilka zdań , które go zainteresowały ,
> i zadał sobie pytanie : czy nie należałoby ostrzec
> ich , że on rozumie po angielsku ?

(And the correct expected word here is *rolnej*.)

Directory structure
-------------------

* `README.md` — this file
* `config.txt` — GEval configuration file
* `train/` — directory with training data
* `dev-0/` — directory with dev (test) data
* `dev-0/in.tsv` — input data for the dev set
* `dev-0/expected.tsv` — expected (reference) data for the dev set
* `test-A/` — directory with test data
* `test-A/in.tsv` — input data for the test set
* `test-A/expected.tsv` — expected (reference) data for the test set (hidden)

Format of the output files
--------------------------

For each input line, a probability distribution for the word in the gap must be given:

    word1:logprob1 word2:logprob2 ... wordN:logprobN :logprob0

where *logprobi* is the logarithm of the probability for *wordi* and *logprob0*
is the logarithm of the probability mass for all the other words (it will be
spread evenly between all 1024 fingerprint values, i.e. every possible 10-bit value).
If the respective probabilities do not sum up to 1, they will be normalised with softmax.
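
The output format described above can be produced with a few lines of Python. This is only a minimal sketch, not part of the challenge tooling: the function name `format_distribution`, the candidate words and their probabilities are made up for illustration, and the use of the natural logarithm is an assumption (this README does not state the base).

```python
import math


def format_distribution(word_probs, rest_mass):
    """Render one output line: 'word1:logprob1 ... wordN:logprobN :logprob0'.

    word_probs -- dict mapping candidate words to positive probabilities
    rest_mass  -- positive probability mass reserved for all other words
    """
    if rest_mass <= 0 or any(p <= 0 for p in word_probs.values()):
        raise ValueError("all probabilities must be positive")
    if any(" " in w or ":" in w or not w for w in word_probs):
        raise ValueError("words must be non-empty and contain no spaces or colons")

    # Renormalise so the written probabilities sum to 1.
    total = sum(word_probs.values()) + rest_mass

    # Assumption: natural logarithm.
    fields = [f"{w}:{math.log(p / total):.4f}" for w, p in word_probs.items()]
    # The entry with an empty word carries the mass for all remaining words.
    fields.append(f":{math.log(rest_mass / total):.4f}")
    return " ".join(fields)


# Hypothetical guesses for a single gap.
line = format_distribution({"panny": 0.5, "pana": 0.25}, 0.25)
print(line)
```

Writing one such line per row of `in.tsv` yields an output file in the expected shape. Keeping a non-trivial mass in the final `:logprob0` entry seems a sensible precaution, since log-loss penalises heavily any distribution that assigns almost no probability to the correct word.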