improve sample challenge for LogLossHashed

2018-05-15 08:14:52 +02:00 · 9fc4beaba1
commit 9fc4beaba1
parent 06fd093349
1 changed file with 29 additions and 0 deletions
--- a/src/GEval/CreateChallenge.hs
+++ b/src/GEval/CreateChallenge.hs
@@ -113,6 +113,35 @@ The metric is average log-loss calculated for 10-bit hashes.
 The train file is just a text file (one utterance per line).
 In an input file, left and right contexts (TAB-separated) are given.
 In an expected file, the word to be guessed is given.
 Format of the output files
 --------------------------
 For each input line, a probability distribution for words in a gap
 must be given:
    word1:logprob1 word2:logprob2 ... wordN:logprobN :logprob0
 where *logprobi* is the logarithm of the probability for *wordi* and
 *logprob0* is the logarithm of the probability mass for all the other
 words (it will be spread between all 1024 fingerprint values). If the
 respective probabilities do not sum up to 1, they will be normalised with
 softmax.
 Note: the separator here is space, not TAB!
 ### Probs
 Probabilities can be given instead of logprobs:
  * if **all** values look like probs and **at least one value** is positive, we treat
    the values as probs rather than logprobs (a single value 0.0 is treated
    as a logprob, i.e. probability 1.0!);
  * if their sum is greater than 1.0, then we normalize simply by dividing by the sum;
  * if the sum is smaller than 1.0 and there is no entry for all the other words,
    we add such an entry for the missing probability mass;
  * if the sum is smaller than 1.0 and there is an entry for all the other words,
    we normalize by dividing by the sum.
 |] ++ (commonReadmeMDContents testName)
 readmeMDContents CharMatch testName = [i|
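
The normalisation rules described in the README text above can be illustrated with a small sketch. This is not GEval's actual implementation: the names `Entry`, `parseLine`, `looksLikeProbs`, `normaliseProbs`, `softmax` and `toDistribution` are hypothetical, and the sketch assumes that "look like probs" means every value lies in [0, 1] and that the value follows the last colon of each token.

```haskell
-- Illustrative sketch only; all names below are hypothetical.

-- | One entry of an output line. 'Nothing' stands for the ":value"
-- entry that covers all the other words.
type Entry = (Maybe String, Double)

-- | Parse a space-separated line such as "cat:0.6 dog:0.2 :0.1".
-- Assumption: the value follows the last colon of each token.
parseLine :: String -> [Entry]
parseLine = map parseToken . words
  where
    parseToken tok =
      let (revVal, revRest) = break (== ':') (reverse tok)
          word = reverse (drop 1 revRest)
      in (if null word then Nothing else Just word, read (reverse revVal))

-- | Assumed reading of "look like probs": all values lie in [0, 1]
-- and at least one of them is strictly positive.
looksLikeProbs :: [Entry] -> Bool
looksLikeProbs es = all (\v -> v >= 0 && v <= 1) vs && any (> 0) vs
  where vs = map snd es

-- | Apply the three rules for values given as probabilities.
normaliseProbs :: [Entry] -> [Entry]
normaliseProbs es
  | total > 1.0                = scaled                          -- divide by the sum
  | total < 1.0 && not hasRest = es ++ [(Nothing, 1.0 - total)]  -- add missing mass
  | total < 1.0                = scaled                          -- divide by the sum
  | otherwise                  = es
  where
    total   = sum (map snd es)
    hasRest = any ((== Nothing) . fst) es
    scaled  = [(w, v / total) | (w, v) <- es]

-- | Softmax over logprobs: exponentiate, then divide by the total.
softmax :: [Entry] -> [Entry]
softmax es = [(w, exp v / z) | (w, v) <- es]
  where z = sum (map (exp . snd) es)

-- | Decide between probs and logprobs, then normalise accordingly.
toDistribution :: String -> [Entry]
toDistribution line
  | looksLikeProbs es = normaliseProbs es
  | otherwise         = softmax es
  where es = parseLine line
```

Under these assumptions, `toDistribution "cat:0.5 dog:0.25"` appends an entry for all the other words carrying the remaining 0.25, while `toDistribution "cat:0.0"` falls through to the logprob branch and yields probability 1.0 for `cat`.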