update README

2018-09-01 14:43:35 +02:00 · 081b2507f3
commit 081b2507f3
parent e2c3102cc4
1 changed file with 45 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -208,21 +208,47 @@ Then let Gonito pull them and evaluate your results.
 ## `geval` options

 ```
-Usage: geval ([--init] | [-l|--line-by-line] | [-d|--diff OTHER-OUT])
-             ([-s|--sort] | [-r|--reverse-sort]) [--out-directory OUT-DIRECTORY]
+geval - stand-alone evaluation tool for tests in Gonito platform
+
+Usage: geval ([--init] | [-v|--version] | [-l|--line-by-line] |
+             [-w|--worst-features] | [-d|--diff OTHER-OUT] |
+             [-m|--most-worsening-features ARG] | [-j|--just-tokenize] |
+             [-S|--submit]) ([-s|--sort] | [-r|--reverse-sort])
+             [--out-directory OUT-DIRECTORY]
             [--expected-directory EXPECTED-DIRECTORY] [-t|--test-name NAME]
             [-o|--out-file OUT] [-e|--expected-file EXPECTED]
             [-i|--input-file INPUT] [-a|--alt-metric METRIC]
             [-m|--metric METRIC] [-p|--precision NUMBER-OF-FRACTIONAL-DIGITS]
+             [-T|--tokenizer TOKENIZER] [--gonito-host GONITO_HOST]
+             [--token TOKEN]
  Run evaluation for tests in Gonito platform

 Available options:
  -h,--help                Show this help text
  --init                   Init a sample Gonito challenge rather than run an
                           evaluation
+  -v,--version             Print GEval version
  -l,--line-by-line        Give scores for each line rather than the whole test
                           set
-  -d,--diff OTHER-OUT      compare results
+  -w,--worst-features      Print a ranking of worst features, i.e. features that
+                           worsen the score significantly. Features are sorted
+                           using p-value for Mann-Whitney U test comparing the
+                           items with a given feature and without it. For each
+                           feature the number of occurrences, average score and
+                           p-value are given.
+  -d,--diff OTHER-OUT      Compare results of evaluations (line by line) for two
+                           outputs.
+  -m,--most-worsening-features ARG
+                           Print a ranking of the "most worsening" features,
+                           i.e. features that worsen the score the most when
+                           comparing outputs from two systems.
+  -j,--just-tokenize       Just tokenize standard input and print out the tokens
+                           (separated by spaces) on the standard output, rather
+                           than do any evaluation. The --tokenizer option must
+                           be given.
+  -S,--submit              Submit current solution for evaluation to an external
+                           Gonito instance specified with the --gonito-host
+                           option. Optionally, specify --token.
  -s,--sort                When in line-by-line or diff mode, sort the results
                           from the worst to the best
  -r,--reverse-sort        When in line-by-line or diff mode, sort the results
@@ -245,11 +271,23 @@ Available options:
  -a,--alt-metric METRIC   Alternative metric (overrides --metric option)
  -m,--metric METRIC       Metric to be used - RMSE, MSE, Accuracy, LogLoss,
                           Likelihood, F-measure (specify as F1, F2, F0.25,
+                           etc.), multi-label F-measure (specify as
+                           MultiLabel-F1, MultiLabel-F2, MultiLabel-F0.25,
                           etc.), MAP, BLEU, NMI, ClippEU, LogLossHashed,
                           LikelihoodHashed, BIO-F1, BIO-F1-Labels or CharMatch
  -p,--precision NUMBER-OF-FRACTIONAL-DIGITS
                           Arithmetic precision, i.e. the number of fractional
                           digits to be shown
+  -T,--tokenizer TOKENIZER Tokenizer applied to expected and actual output
+                           before running evaluation (makes sense mostly for
+                           metrics such as BLEU); the minimalistic, 13a and v14
+                           tokenizers are implemented so far. Will also be used
+                           for tokenizing text into features when in
+                           --worst-features and --most-worsening-features modes.
+  --gonito-host GONITO_HOST
+                           Submit ONLY: Gonito instance location.
+  --token TOKEN            Submit ONLY: Token for authorization with Gonito
+                           instance.
 ```

 If you need another metric, let me know, or do it yourself!
@@ -261,3 +299,7 @@ Apache License 2.0
 ## Authors

 Filip Graliński
+
+## Contributors
+
+Piotr Halama
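
The `--worst-features` help text in this diff describes ranking features by the p-value of a Mann-Whitney U test that compares items with a given feature against items without it. The sketch below illustrates that idea only; it is not GEval's implementation. The function names, the `(tokens, score)` item layout, the two-sided test and the normal approximation without tie or continuity correction are all assumptions made for the example.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(with_feature, without_feature):
    """Two-sided p-value from the normal approximation of the U statistic.

    Assumption: no tie correction and no continuity correction are applied.
    """
    n1, n2 = len(with_feature), len(without_feature)
    ranks = average_ranks(list(with_feature) + list(without_feature))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / std
    return math.erfc(abs(z) / math.sqrt(2))


def rank_features(items):
    """Rank features (tokens) by p-value, smallest first.

    `items` is a list of (tokens, score) pairs (a hypothetical layout).
    Each returned row is (feature, occurrences, average score, p-value).
    """
    by_feature = defaultdict(set)
    for index, (tokens, _score) in enumerate(items):
        for token in tokens:
            by_feature[token].add(index)

    rows = []
    for feature, present in by_feature.items():
        with_scores = [items[i][1] for i in sorted(present)]
        without_scores = [
            score for i, (_tokens, score) in enumerate(items) if i not in present
        ]
        if not without_scores:
            continue  # the feature occurs in every item, so nothing to compare
        p_value = mann_whitney_p(with_scores, without_scores)
        average = sum(with_scores) / len(with_scores)
        rows.append((feature, len(with_scores), average, p_value))
    return sorted(rows, key=lambda row: row[3])
```

Because the p-value in this sketch is two-sided, a feature that strongly improves the score ranks as high as one that strongly worsens it; the average-score column is what tells the two apart.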