diff --git a/README.md b/README.md
index fb1b280..a553bef 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ Let's step into the repo and run GEval (I assume you added `geval`
 path to `$PATH`, so that you could just use `geval` instead of
 `/full/path/to/geval`):
 
-    cd submission-01229
+    cd wmt-2017
     geval
 
 Well, something apparently went wrong:
@@ -94,18 +94,6 @@ After a moment, you'll see the results:
     WER 0.55201
     Accuracy 0.01660
 
-Ah, we forgot about the tokenization, in order to properly calculate
-BLEU (or GLEU) the way it was done within the official WMT-2017
-challenge, you need to tokenize the expected output and the actual
-output of your system using the right tokenizer:
-
-    geval -t dev-0 --metric GLEU --metric WER --metric Accuracy --tokenizer 13a
-
-    BLEU 0.26901
-    WER 0.58858
-    GLEU 0.30514
-    Accuracy 0.01660
-
 The results do not look good anyway and I'm not talking about
 Accuracy, which, even for a good MT (or even a human), will be low
 (as it measures how many translations are exactly the same as the golden
@@ -213,6 +201,12 @@ and run GEval for one of the submissions (UEdin-NMT):
 where `-i` stands for the input file, `-o` — output file, `-e` — file
 with expected (reference) data.
 
+Note the tokenization: in order to calculate BLEU (or GLEU) the way it
+was done in the official WMT-2017 challenge, you need to tokenize the
+expected output and the actual output of your system with the right
+tokenizer. (The test sets packaged for the Gonito.net challenge were
+already tokenized.)
+
 Let's evaluate another system:
 
     geval --metric BLEU --precision 4 --tokenizer 13a \
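For readers of this patch: the last hunk references GEval's out-of-repo mode (`-i`, `-o`, `-e`) together with the `--tokenizer 13a` option. A minimal sketch of how such an invocation might look, with hypothetical file names standing in for the real ones (only the flags appear in the hunks above):

    # hypothetical file names; flags as shown in the README hunks above
    geval --metric BLEU --precision 4 --tokenizer 13a \
          -i source.txt -o system-output.txt -e reference.txt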