Merge remote-tracking branch 'origin/master' into improvement/#25-issue-when-build-under-linux

# Conflicts:
#	README.md
Mateusz Hinc 2019-12-11 10:20:46 +01:00
commit 3528fcbfca
19 changed files with 280 additions and 24 deletions

@ -1,4 +1,13 @@
## 1.22.1.0
* Add "Mean/" meta-metric (for the time being working only with MultiLabel-F-measure)
* Add :S flag
## 1.22.0.0
* Add SegmentAccuracy
## 1.21.0.0 ## 1.21.0.0
* Add Probabilistic-MultiLabel-F-measure * Add Probabilistic-MultiLabel-F-measure

@ -2,7 +2,7 @@
GEval is a Haskell library and a stand-alone tool for evaluating the GEval is a Haskell library and a stand-alone tool for evaluating the
results of solutions to machine learning challenges as defined in the results of solutions to machine learning challenges as defined in the
[Gonito](https://gonito.net) platform. Also could be used outside the [Gonito](https://gonito.net) platform. Also, could be used outside the
context of Gonito.net challenges, assuming the test data is given in context of Gonito.net challenges, assuming the test data is given in
simple TSV (tab-separated values) files. simple TSV (tab-separated values) files.
@ -14,6 +14,29 @@ The official repository is `git://gonito.net/geval`, browsable at
## Installing ## Installing
### The easy way: just download the fully static GEval binary
(Assuming you have a 64-bit Linux.)
wget https://gonito.net/get/bin/geval
chmod u+x geval
./geval --help
#### On Windows
For Windows, you should use Windows PowerShell.
wget https://gonito.net/get/bin/geval
Next, go to the folder where you downloaded `geval` and right-click the `geval` file.
Go to `Properties` and, in the `Security` section, grant full access to the folder.
Alternatively, use `icacls "folder path to geval" /grant USER:<username>`.
This is a fully static binary; it should work on any 64-bit Linux or 64-bit Windows.
### Build from scratch
You need [Haskell Stack](https://github.com/commercialhaskell/stack). You need [Haskell Stack](https://github.com/commercialhaskell/stack).
You could install Stack with your package manager or with: You could install Stack with your package manager or with:
@ -36,6 +59,8 @@ order to run `geval` you need to either add `$HOME/.local/bin` to
PATH="$HOME/.local/bin" geval ... PATH="$HOME/.local/bin" geval ...
On Windows, you should add a new global variable named 'geval', with the same path as above.
### Troubleshooting ### Troubleshooting
If you see a message like this: If you see a message like this:
@ -64,15 +89,32 @@ In case the `lzma` package is not installed on your Linux, you need to run (assu
sudo apt-get install pkg-config liblzma-dev libpq-dev libpcre3-dev libcairo2-dev libbz2-dev sudo apt-get install pkg-config liblzma-dev libpq-dev libpcre3-dev libcairo2-dev libbz2-dev
### Plan B — just download the GEval binary #### Windows issues
(Assuming you have a 64-bit Linux.) If you see this message on Windows while executing the `stack test` command:
wget https://gonito.net/get/bin/geval In the dependencies for geval-1.21.1.0:
chmod u+x geval     unix needed, but the stack configuration has no specified version
./geval --help In the dependencies for lzma-0.0.0.3:
    lzma-clib needed, but the stack configuration has no specified version
This is a fully static binary, it should work on any 64-bit Linux. You should replace `unix` with `unix-compat` in the `geval.cabal` file,
because the `unix` package is not supported on Windows.
You should also add `lzma-clib-5.2.2` and `unix-compat-0.5.2` to the `extra-deps` section of the `stack.yaml` file.
If you see a message about missing pkg-config on Windows, you should download two packages from the site:
http://ftp.gnome.org/pub/gnome/binaries/win32/dependencies/
These packages are:
- pkg-config (the newest version)
- gettext-runtime (the newest version)
Extract the `pkg-config.exe` file into a directory on the Windows PATH.
Extract the `intl.dll` file from gettext-runtime.
You should also download the glib package from http://ftp.gnome.org/pub/gnome/binaries/win32/glib/2.28
and extract the `libglib-2.0-0.dll` file.
Put all these files in, for example, the `C:\MinGW\bin` directory.
## Quick tour ## Quick tour
@ -189,7 +231,7 @@ But why were double quotes so problematic in German-English
translation?! Well, look at the second-worst feature — `&apos;&apos;` translation?! Well, look at the second-worst feature — `&apos;&apos;`
in the _output_! Oops, it seems like a very stupid mistake with in the _output_! Oops, it seems like a very stupid mistake with
post-processing was done and no double quote was correctly generated, post-processing was done and no double quote was correctly generated,
which decreased the score a little bit for each sentence in which the which decreased the score a little for each sentence in which the
quote was expected. quote was expected.
When I fixed this simple bug, the BLEU metric increased from 0.27358 When I fixed this simple bug, the BLEU metric increased from 0.27358
@ -502,9 +544,9 @@ submitted. The suggested way to do this is as follows:
`test-A/expected.tsv` added. This branch should be accessible by `test-A/expected.tsv` added. This branch should be accessible by
Gonito platform, but should be kept “hidden” for regular users (or Gonito platform, but should be kept “hidden” for regular users (or
at least they should be kindly asked not to peek there). It is at least they should be kindly asked not to peek there). It is
recommended (though not obligatory) that this branch contain all recommended (though not obligatory) that this branch contains all
the source codes and data used to generate the train/dev/test sets. the source codes and data used to generate the train/dev/test sets.
(Use [git-annex](https://git-annex.branchable.com/) if you have really big files there.) (Use [git-annex](https://git-annex.branchable.com/) if you have huge files there.)
Branch (1) should be the parent of the branch (2), for instance, the Branch (1) should be the parent of the branch (2), for instance, the
repo (for the toy “planets” challenge) could be created as follows: repo (for the toy “planets” challenge) could be created as follows:
@ -567,7 +609,7 @@ be nice and commit also your source codes.
git push mine master git push mine master
Then let Gonito pull them and evaluate your results, either manually clicking Then let Gonito pull them and evaluate your results, either manually clicking
"submit" at the Gonito web site or using `--submit` option (see below). "submit" at the Gonito website or using `--submit` option (see below).
### Submitting a solution to a Gonito platform with GEval ### Submitting a solution to a Gonito platform with GEval

@ -1,5 +1,5 @@
name: geval name: geval
version: 1.21.1.0 version: 1.22.1.0
synopsis: Machine learning evaluation tools synopsis: Machine learning evaluation tools
description: Please see README.md description: Please see README.md
homepage: http://github.com/name/project homepage: http://github.com/name/project

@ -4,11 +4,12 @@
module GEval.Annotation module GEval.Annotation
(parseAnnotations, Annotation(..), (parseAnnotations, Annotation(..),
parseObtainedAnnotations, ObtainedAnnotation(..), parseObtainedAnnotations, ObtainedAnnotation(..),
matchScore, intSetParser) matchScore, intSetParser, segmentAccuracy, parseSegmentAnnotations)
where where
import qualified Data.IntSet as IS import qualified Data.IntSet as IS
import qualified Data.Text as T import qualified Data.Text as T
import Data.Set (intersection, fromList)
import Data.Attoparsec.Text import Data.Attoparsec.Text
import Data.Attoparsec.Combinator import Data.Attoparsec.Combinator
@ -17,11 +18,12 @@ import GEval.Common (sepByWhitespaces, (/.))
import GEval.Probability import GEval.Probability
import Data.Char import Data.Char
import Data.Maybe (fromMaybe) import Data.Maybe (fromMaybe)
import Data.Either (partitionEithers)
import GEval.PrecisionRecall(weightedMaxMatching) import GEval.PrecisionRecall(weightedMaxMatching)
data Annotation = Annotation T.Text IS.IntSet data Annotation = Annotation T.Text IS.IntSet
deriving (Eq, Show) deriving (Eq, Show, Ord)
data ObtainedAnnotation = ObtainedAnnotation Annotation Double data ObtainedAnnotation = ObtainedAnnotation Annotation Double
deriving (Eq, Show) deriving (Eq, Show)
@ -52,6 +54,36 @@ obtainedAnnotationParser = do
parseAnnotations :: T.Text -> Either String [Annotation] parseAnnotations :: T.Text -> Either String [Annotation]
parseAnnotations t = parseOnly (annotationsParser <* endOfInput) t parseAnnotations t = parseOnly (annotationsParser <* endOfInput) t
parseSegmentAnnotations :: T.Text -> Either String [Annotation]
parseSegmentAnnotations t = case parseAnnotationsWithColons t of
Left m -> Left m
Right annotations -> if areSegmentsDisjoint annotations
then (Right annotations)
else (Left "Overlapping segments")
areSegmentsDisjoint :: [Annotation] -> Bool
areSegmentsDisjoint = areIntSetsDisjoint . map (\(Annotation _ s) -> s)
areIntSetsDisjoint :: [IS.IntSet] -> Bool
areIntSetsDisjoint ss = snd $ foldr step (IS.empty, True) ss
where step _ w@(_, False) = w
step s (u, True) = (s `IS.union` u, s `IS.disjoint` u)
-- unfortunately, attoparsec does not seem to back-track properly
-- so we need a special function if labels can contain colons
parseAnnotationsWithColons :: T.Text -> Either String [Annotation]
parseAnnotationsWithColons t = case partitionEithers (map parseAnnotationWithColons $ T.words t) of
([], annotations) -> Right annotations
((firstProblem:_), _) -> Left firstProblem
parseAnnotationWithColons :: T.Text -> Either String Annotation
parseAnnotationWithColons t = if T.null label
then Left "Colon expected"
else case parseOnly (intSetParser <* endOfInput) position of
Left m -> Left m
Right s -> Right (Annotation (T.init label) s)
where (label, position) = T.breakOnEnd ":" t
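
The colon handling above relies on splitting each token at its last colon, so that a label may itself contain colons. A minimal sketch of that one step, assuming only the `text` package; `splitLabel` is a hypothetical name introduced here for illustration and is not part of GEval:

```haskell
import qualified Data.Text as T

-- Split a token such as "N:1-4" at its LAST colon.
-- T.breakOnEnd keeps the separator at the end of the first component,
-- so an empty first component means that no colon was found at all.
splitLabel :: T.Text -> Either String (T.Text, T.Text)
splitLabel t
  | T.null labelWithColon = Left "Colon expected"
  | otherwise             = Right (T.init labelWithColon, position)
  where
    (labelWithColon, position) = T.breakOnEnd (T.pack ":") t
```

With this definition, `splitLabel (T.pack "time:12:00:1-5")` yields the label `time:12:00` and the position string `1-5`, while a token without any colon, such as `N1-4`, is rejected with `Left "Colon expected"`.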
annotationsParser :: Parser [Annotation] annotationsParser :: Parser [Annotation]
annotationsParser = sepByWhitespaces annotationParser annotationsParser = sepByWhitespaces annotationParser
@ -70,3 +102,7 @@ intervalParser = do
startIx <- decimal startIx <- decimal
endIx <- (string "-" *> decimal <|> pure startIx) endIx <- (string "-" *> decimal <|> pure startIx)
pure $ IS.fromList [startIx..endIx] pure $ IS.fromList [startIx..endIx]
segmentAccuracy :: [Annotation] -> [Annotation] -> Double
segmentAccuracy expected output = (fromIntegral $ length matched) / (fromIntegral $ length expected)
where matched = (fromList expected) `intersection` (fromList output)
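
To make the definition of `segmentAccuracy` concrete, here is a self-contained sketch of the same computation on a toy line. `Segment` and the `segment` helper are simplified stand-ins introduced for this example (GEval's own type is `Annotation`, with a `Text` label); only the `containers` package is assumed.

```haskell
import qualified Data.IntSet as IS
import qualified Data.Set as S

-- A segment: a label plus the set of character positions it covers
-- (simplified stand-in for GEval's Annotation).
data Segment = Segment String IS.IntSet
  deriving (Eq, Ord, Show)

-- Hypothetical helper: build a segment from a label and an inclusive range.
segment :: String -> Int -> Int -> Segment
segment label from to = Segment label (IS.fromList [from .. to])

-- Fraction of expected segments that appear in the output with exactly
-- the same label and exactly the same positions.
segmentAccuracy :: [Segment] -> [Segment] -> Double
segmentAccuracy expected output =
  fromIntegral (S.size matched) / fromIntegral (length expected)
  where
    matched = S.fromList expected `S.intersection` S.fromList output

expectedLine, outputLine :: [Segment]
expectedLine = [segment "N" 1 4, segment "V" 6 7, segment "A" 9 13]
outputLine   = [segment "N" 1 4, segment "V" 6 7, segment "A" 9 12]
```

Here `segmentAccuracy expectedLine outputLine` evaluates to 2/3: the `A` segment is off by one character and therefore does not count. Note that, with this formula, an empty expected list gives 0/0, i.e. `NaN`.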

@ -492,6 +492,23 @@ gevalCoreOnSources CharMatch inputLineSource = helper inputLineSource
gevalCoreOnSources (LogLossHashed nbOfBits) _ = helperLogLossHashed nbOfBits id gevalCoreOnSources (LogLossHashed nbOfBits) _ = helperLogLossHashed nbOfBits id
gevalCoreOnSources (LikelihoodHashed nbOfBits) _ = helperLogLossHashed nbOfBits logLossToLikehood gevalCoreOnSources (LikelihoodHashed nbOfBits) _ = helperLogLossHashed nbOfBits logLossToLikehood
gevalCoreOnSources (Mean (MultiLabelFMeasure beta)) _
= gevalCoreWithoutInputOnItemTargets (Right . intoWords)
(Right . getWords)
((fMeasureOnCounts beta) . (getCounts (==)))
averageC
id
noGraph
where
-- repeated as below, as it will be refactored into dependent types soon anyway
getWords (RawItemTarget t) = Prelude.map unpack $ selectByStandardThreshold $ parseIntoProbList t
getWords (PartiallyParsedItemTarget ts) = Prelude.map unpack ts
intoWords (RawItemTarget t) = Prelude.map unpack $ Data.Text.words t
intoWords (PartiallyParsedItemTarget ts) = Prelude.map unpack ts
gevalCoreOnSources (Mean _) _ = error $ "Mean/ meta-metric defined only for MultiLabel-F1 for the time being"
-- only MultiLabel-F1 handled for JSONs for the time being... -- only MultiLabel-F1 handled for JSONs for the time being...
gevalCoreOnSources (MultiLabelFMeasure beta) _ = gevalCoreWithoutInputOnItemTargets (Right . intoWords) gevalCoreOnSources (MultiLabelFMeasure beta) _ = gevalCoreWithoutInputOnItemTargets (Right . intoWords)
(Right . getWords) (Right . getWords)
@ -706,6 +723,13 @@ gevalCoreOnSources TokenAccuracy _ = gevalCoreWithoutInput intoTokens
| otherwise = (h, t + 1) | otherwise = (h, t + 1)
hitsAndTotalsAgg = CC.foldl (\(h1, t1) (h2, t2) -> (h1 + h2, t1 + t2)) (0, 0) hitsAndTotalsAgg = CC.foldl (\(h1, t1) (h2, t2) -> (h1 + h2, t1 + t2)) (0, 0)
gevalCoreOnSources SegmentAccuracy _ = gevalCoreWithoutInput parseSegmentAnnotations
parseSegmentAnnotations
(uncurry segmentAccuracy)
averageC
id
noGraph
gevalCoreOnSources MultiLabelLogLoss _ = gevalCoreWithoutInput intoWords gevalCoreOnSources MultiLabelLogLoss _ = gevalCoreWithoutInput intoWords
(Right . parseIntoProbList) (Right . parseIntoProbList)
(uncurry countLogLossOnProbList) (uncurry countLogLossOnProbList)
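
The `Mean/` clause above computes the F-measure separately for each item and then averages the values (`averageC`). The sketch below contrasts this with an F-measure computed on counts pooled over all items, which is assumed here to be what the plain metric does. All names (`countsFor`, `fScore`, `pooledF`, `meanF`) are hypothetical, labels are assumed to be unique within an item, and the zero-match case is simply scored as 0, which may differ from GEval's own edge-case handling.

```haskell
import Data.List (intersect)

-- (true positives, number of expected labels, number of output labels)
type Counts = (Int, Int, Int)

countsFor :: ([String], [String]) -> Counts
countsFor (expected, output) =
  (length (expected `intersect` output), length expected, length output)

-- F-beta from counts; an item with no matches is scored 0 in this sketch.
fScore :: Double -> Counts -> Double
fScore beta (tp, nExpected, nOutput)
  | tp == 0   = 0
  | otherwise = (1 + b2) * p * r / (b2 * p + r)
  where
    b2 = beta * beta
    p  = fromIntegral tp / fromIntegral nOutput
    r  = fromIntegral tp / fromIntegral nExpected

-- F-measure on counts summed over all items.
pooledF :: Double -> [([String], [String])] -> Double
pooledF beta items = fScore beta (foldr add (0, 0, 0) (map countsFor items))
  where add (a, b, c) (x, y, z) = (a + x, b + y, c + z)

-- Mean of the per-item F-measures.
meanF :: Double -> [([String], [String])] -> Double
meanF beta items =
  sum (map (fScore beta . countsFor) items) / fromIntegral (length items)

sampleItems :: [([String], [String])]
sampleItems = [ (["a", "b", "c", "d"], ["a", "b", "c", "d"]), (["e"], ["f"]) ]
```

On `sampleItems`, `meanF 1 sampleItems` is 0.5 (one perfect item, one completely wrong item), whereas `pooledF 1 sampleItems` is 0.8, because the four-label item dominates the pooled counts.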

@ -55,6 +55,7 @@ createFile filePath contents = do
writeFile filePath contents writeFile filePath contents
readmeMDContents :: Metric -> String -> String readmeMDContents :: Metric -> String -> String
readmeMDContents (Mean metric) testName = readmeMDContents metric testName
readmeMDContents GLEU testName = readmeMDContents BLEU testName readmeMDContents GLEU testName = readmeMDContents BLEU testName
readmeMDContents BLEU testName = [i| readmeMDContents BLEU testName = [i|
GEval sample machine translation challenge GEval sample machine translation challenge
@ -297,6 +298,19 @@ in the expected file (but not in the output file).
|] ++ (commonReadmeMDContents testName) |] ++ (commonReadmeMDContents testName)
readmeMDContents SegmentAccuracy testName = [i|
Segment a sentence and tag with POS tags
========================================
This is a sample, toy challenge for SegmentAccuracy.
For each sentence, give a sequence of POS tags, each one with
its position (1-indexed). For instance, `N:1-10` means a noun
starting from the beginning (the first character) up to the tenth
character (inclusive).
|] ++ (commonReadmeMDContents testName)
readmeMDContents (ProbabilisticMultiLabelFMeasure beta) testName = readmeMDContents (MultiLabelFMeasure beta) testName readmeMDContents (ProbabilisticMultiLabelFMeasure beta) testName = readmeMDContents (MultiLabelFMeasure beta) testName
readmeMDContents (MultiLabelFMeasure beta) testName = [i| readmeMDContents (MultiLabelFMeasure beta) testName = [i|
Tag names and their component Tag names and their component
@ -400,6 +414,7 @@ configContents schemes precision testName = unwords (Prelude.map (\scheme -> ("-
precisionOpt (Just p) = " --precision " ++ (show p) precisionOpt (Just p) = " --precision " ++ (show p)
trainContents :: Metric -> String trainContents :: Metric -> String
trainContents (Mean metric) = trainContents metric
trainContents GLEU = trainContents BLEU trainContents GLEU = trainContents BLEU
trainContents BLEU = [hereLit|alussa loi jumala taivaan ja maan he mea hanga na te atua i te timatanga te rangi me te whenua trainContents BLEU = [hereLit|alussa loi jumala taivaan ja maan he mea hanga na te atua i te timatanga te rangi me te whenua
ja maa oli autio ja tyhjä , ja pimeys oli syvyyden päällä a kahore he ahua o te whenua , i takoto kau ; he pouri ano a runga i te mata o te hohonu ja maa oli autio ja tyhjä , ja pimeys oli syvyyden päällä a kahore he ahua o te whenua , i takoto kau ; he pouri ano a runga i te mata o te hohonu
@ -473,6 +488,9 @@ B-firstname/JOHN I-surname/VON I-surname/NEUMANN John von Nueman
trainContents TokenAccuracy = [hereLit|* V N I like cats trainContents TokenAccuracy = [hereLit|* V N I like cats
* * V * N I can see the rainbow * * V * N I can see the rainbow
|] |]
trainContents SegmentAccuracy = [hereLit|Art:1-3 N:5-11 V:12-13 A:15-19 The student's smart
N:1-6 N:8-10 V:12-13 A:15-18 Mary's dog is nice
|]
trainContents (ProbabilisticMultiLabelFMeasure beta) = trainContents (MultiLabelFMeasure beta) trainContents (ProbabilisticMultiLabelFMeasure beta) = trainContents (MultiLabelFMeasure beta)
trainContents (MultiLabelFMeasure _) = [hereLit|I know Mr John Smith person/3,4,5 first-name/4 surname/5 trainContents (MultiLabelFMeasure _) = [hereLit|I know Mr John Smith person/3,4,5 first-name/4 surname/5
Steven bloody Brown person/1,3 first-name/1 surname/3 Steven bloody Brown person/1,3 first-name/1 surname/3
@ -494,6 +512,7 @@ trainContents _ = [hereLit|0.06 0.39 0 0.206
|] |]
devInContents :: Metric -> String devInContents :: Metric -> String
devInContents (Mean metric) = devInContents metric
devInContents GLEU = devInContents BLEU devInContents GLEU = devInContents BLEU
devInContents BLEU = [hereLit|ja jumala sanoi : " tulkoon valkeus " , ja valkeus tuli devInContents BLEU = [hereLit|ja jumala sanoi : " tulkoon valkeus " , ja valkeus tuli
ja jumala näki , että valkeus oli hyvä ; ja jumala erotti valkeuden pimeydestä ja jumala näki , että valkeus oli hyvä ; ja jumala erotti valkeuden pimeydestä
@ -540,6 +559,9 @@ Mr Jan Kowalski
devInContents TokenAccuracy = [hereLit|The cats on the mat devInContents TokenAccuracy = [hereLit|The cats on the mat
Ala has a cat Ala has a cat
|] |]
devInContents SegmentAccuracy = [hereLit|John is smart
Mary's intelligent
|]
devInContents (ProbabilisticMultiLabelFMeasure beta) = devInContents (MultiLabelFMeasure beta) devInContents (ProbabilisticMultiLabelFMeasure beta) = devInContents (MultiLabelFMeasure beta)
devInContents (MultiLabelFMeasure _) = [hereLit|Jan Kowalski is here devInContents (MultiLabelFMeasure _) = [hereLit|Jan Kowalski is here
I see him I see him
@ -558,6 +580,7 @@ devInContents _ = [hereLit|0.72 0 0.007
|] |]
devExpectedContents :: Metric -> String devExpectedContents :: Metric -> String
devExpectedContents (Mean metric) = devExpectedContents metric
devExpectedContents GLEU = devExpectedContents BLEU devExpectedContents GLEU = devExpectedContents BLEU
devExpectedContents BLEU = [hereLit|a ka ki te atua , kia marama : na ka marama devExpectedContents BLEU = [hereLit|a ka ki te atua , kia marama : na ka marama
a ka kite te atua i te marama , he pai : a ka wehea e te atua te marama i te pouri a ka kite te atua i te marama , he pai : a ka wehea e te atua te marama i te pouri
@ -604,6 +627,9 @@ O B-firstname/JAN B-surname/KOWALSKI
devExpectedContents TokenAccuracy = [hereLit|* N * * N devExpectedContents TokenAccuracy = [hereLit|* N * * N
N V * N N V * N
|] |]
devExpectedContents SegmentAccuracy = [hereLit|N:1-4 V:6-7 A:9-13
N:1-4 V:6-7 A:9-19
|]
devExpectedContents (ProbabilisticMultiLabelFMeasure beta) = devExpectedContents (MultiLabelFMeasure beta) devExpectedContents (ProbabilisticMultiLabelFMeasure beta) = devExpectedContents (MultiLabelFMeasure beta)
devExpectedContents (MultiLabelFMeasure _) = [hereLit|person/1,2 first-name/1 surname/2 devExpectedContents (MultiLabelFMeasure _) = [hereLit|person/1,2 first-name/1 surname/2
@ -624,7 +650,9 @@ devExpectedContents _ = [hereLit|0.82
|] |]
testInContents :: Metric -> String testInContents :: Metric -> String
testInContents GLEU = testInContents BLEU testInContents (Mean metric) = testInContents metric
testInContents GLEU = [hereLit|Alice has a black
|]
testInContents BLEU = [hereLit|ja jumala kutsui valkeuden päiväksi , ja pimeyden hän kutsui yöksi testInContents BLEU = [hereLit|ja jumala kutsui valkeuden päiväksi , ja pimeyden hän kutsui yöksi
ja tuli ehtoo , ja tuli aamu , ensimmäinen päivä ja tuli ehtoo , ja tuli aamu , ensimmäinen päivä
|] |]
@ -672,6 +700,9 @@ No name here
testInContents TokenAccuracy = [hereLit|I have cats testInContents TokenAccuracy = [hereLit|I have cats
I know I know
|] |]
testInContents SegmentAccuracy = [hereLit|Mary's cat is old
John is young
|]
testInContents (ProbabilisticMultiLabelFMeasure beta) = testInContents (MultiLabelFMeasure beta) testInContents (ProbabilisticMultiLabelFMeasure beta) = testInContents (MultiLabelFMeasure beta)
testInContents (MultiLabelFMeasure _) = [hereLit|John bloody Smith testInContents (MultiLabelFMeasure _) = [hereLit|John bloody Smith
Nobody is there Nobody is there
@ -690,7 +721,7 @@ testInContents _ = [hereLit|0.72 0 0.007
|] |]
testExpectedContents :: Metric -> String testExpectedContents :: Metric -> String
testExpectedContents GLEU = testExpectedContents BLEU testExpectedContents (Mean metric) = testExpectedContents metric
testExpectedContents BLEU = [hereLit|na ka huaina e te atua te marama ko te awatea , a ko te pouri i huaina e ia ko te po testExpectedContents BLEU = [hereLit|na ka huaina e te atua te marama ko te awatea , a ko te pouri i huaina e ia ko te po
a ko te ahiahi , ko te ata , he ra kotahi a ko te ahiahi , ko te ata , he ra kotahi
|] |]
@ -738,6 +769,9 @@ O O O
testExpectedContents TokenAccuracy = [hereLit|* V N testExpectedContents TokenAccuracy = [hereLit|* V N
* V * V
|] |]
testExpectedContents SegmentAccuracy = [hereLit|N:1-6 N:8-10 V:12-13 A:15-17
N:1-4 V:6-7 A:9-13
|]
testExpectedContents (ProbabilisticMultiLabelFMeasure beta) = testExpectedContents (MultiLabelFMeasure beta) testExpectedContents (ProbabilisticMultiLabelFMeasure beta) = testExpectedContents (MultiLabelFMeasure beta)
testExpectedContents (MultiLabelFMeasure _) = [hereLit|person/1,3 first-name/1 surname/3 testExpectedContents (MultiLabelFMeasure _) = [hereLit|person/1,3 first-name/1 surname/3
@ -753,10 +787,13 @@ bar:1/50,50,1000,1000
testExpectedContents ClippEU = [hereLit|3/0,0,100,100/10 testExpectedContents ClippEU = [hereLit|3/0,0,100,100/10
1/10,10,1000,1000/10 1/10,10,1000,1000/10
|] |]
testExpectedContents GLEU = [hereLit|Alice has a black cat
|]
testExpectedContents _ = [hereLit|0.11 testExpectedContents _ = [hereLit|0.11
17.2 17.2
|] |]
gitignoreContents :: String gitignoreContents :: String
gitignoreContents = [hereLit| gitignoreContents = [hereLit|
*~ *~

@ -6,8 +6,8 @@ import GEval.Metric
import Text.Regex.PCRE.Heavy import Text.Regex.PCRE.Heavy
import Text.Regex.PCRE.Light.Base (Regex(..)) import Text.Regex.PCRE.Light.Base (Regex(..))
import Data.Text (Text(..), concat, toLower, toUpper, pack, unpack) import Data.Text (Text(..), concat, toLower, toUpper, pack, unpack, words, unwords)
import Data.List (intercalate, break) import Data.List (intercalate, break, sort)
import Data.Either import Data.Either
import Data.Maybe (fromMaybe) import Data.Maybe (fromMaybe)
import qualified Data.ByteString.UTF8 as BSU import qualified Data.ByteString.UTF8 as BSU
@ -16,7 +16,7 @@ import qualified Data.ByteString.UTF8 as BSU
data EvaluationScheme = EvaluationScheme Metric [PreprocessingOperation] data EvaluationScheme = EvaluationScheme Metric [PreprocessingOperation]
deriving (Eq) deriving (Eq)
data PreprocessingOperation = RegexpMatch Regex | LowerCasing | UpperCasing | SetName Text data PreprocessingOperation = RegexpMatch Regex | LowerCasing | UpperCasing | Sorting | SetName Text
deriving (Eq) deriving (Eq)
leftParameterBracket :: Char leftParameterBracket :: Char
@ -39,6 +39,8 @@ readOps ('l':theRest) = (LowerCasing:ops, theRest')
readOps ('u':theRest) = (UpperCasing:ops, theRest') readOps ('u':theRest) = (UpperCasing:ops, theRest')
where (ops, theRest') = readOps theRest where (ops, theRest') = readOps theRest
readOps ('m':theRest) = handleParametrizedOp (RegexpMatch . (fromRight undefined) . ((flip compileM) []) . BSU.fromString) theRest readOps ('m':theRest) = handleParametrizedOp (RegexpMatch . (fromRight undefined) . ((flip compileM) []) . BSU.fromString) theRest
readOps ('S':theRest) = (Sorting:ops, theRest')
where (ops, theRest') = readOps theRest
readOps ('N':theRest) = handleParametrizedOp (SetName . pack) theRest readOps ('N':theRest) = handleParametrizedOp (SetName . pack) theRest
readOps s = ([], s) readOps s = ([], s)
@ -70,6 +72,7 @@ instance Show PreprocessingOperation where
show (RegexpMatch (Regex _ regexp)) = parametrizedOperation "m" (BSU.toString regexp) show (RegexpMatch (Regex _ regexp)) = parametrizedOperation "m" (BSU.toString regexp)
show LowerCasing = "l" show LowerCasing = "l"
show UpperCasing = "u" show UpperCasing = "u"
show Sorting = "S"
show (SetName t) = parametrizedOperation "N" (unpack t) show (SetName t) = parametrizedOperation "N" (unpack t)
parametrizedOperation :: String -> String -> String parametrizedOperation :: String -> String -> String
@ -82,4 +85,5 @@ applyPreprocessingOperation :: PreprocessingOperation -> Text -> Text
applyPreprocessingOperation (RegexpMatch regex) = Data.Text.concat . (map fst) . (scan regex) applyPreprocessingOperation (RegexpMatch regex) = Data.Text.concat . (map fst) . (scan regex)
applyPreprocessingOperation LowerCasing = toLower applyPreprocessingOperation LowerCasing = toLower
applyPreprocessingOperation UpperCasing = toUpper applyPreprocessingOperation UpperCasing = toUpper
applyPreprocessingOperation Sorting = Data.Text.unwords . sort . Data.Text.words
applyPreprocessingOperation (SetName _) = id applyPreprocessingOperation (SetName _) = id
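
As an illustration of what the new `Sorting` operation does to a line, the following sketch applies the same composition outside GEval. `sortTokens` is a hypothetical name used only here; the `text` package is assumed.

```haskell
import Data.List (sort)
import qualified Data.Text as T

-- Split on whitespace, sort the tokens, and join them with single spaces,
-- so that two lines differing only in token order become identical.
sortTokens :: T.Text -> T.Text
sortTokens = T.unwords . sort . T.words
```

For example, `sortTokens (T.pack "surname/3 person/1,3 first-name/1")` gives `first-name/1 person/1,3 surname/3`. Because `T.words` drops repeated whitespace, runs of spaces are also collapsed to a single space.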

@ -26,9 +26,14 @@ import Data.Attoparsec.Text (parseOnly)
data Metric = RMSE | MSE | Pearson | Spearman | BLEU | GLEU | WER | Accuracy | ClippEU data Metric = RMSE | MSE | Pearson | Spearman | BLEU | GLEU | WER | Accuracy | ClippEU
| FMeasure Double | MacroFMeasure Double | NMI | FMeasure Double | MacroFMeasure Double | NMI
| LogLossHashed Word32 | CharMatch | MAP | LogLoss | Likelihood | LogLossHashed Word32 | CharMatch | MAP | LogLoss | Likelihood
| BIOF1 | BIOF1Labels | TokenAccuracy | LikelihoodHashed Word32 | MAE | SMAPE | MultiLabelFMeasure Double | BIOF1 | BIOF1Labels | TokenAccuracy | SegmentAccuracy | LikelihoodHashed Word32 | MAE | SMAPE | MultiLabelFMeasure Double
| MultiLabelLogLoss | MultiLabelLikelihood | MultiLabelLogLoss | MultiLabelLikelihood
| SoftFMeasure Double | ProbabilisticMultiLabelFMeasure Double | ProbabilisticSoftFMeasure Double | Soft2DFMeasure Double | SoftFMeasure Double | ProbabilisticMultiLabelFMeasure Double
| ProbabilisticSoftFMeasure Double | Soft2DFMeasure Double
-- it would be better to avoid infinite recursion here
-- `Mean (Mean BLEU)` is not useful, but as it would mean
-- a larger refactor, we will postpone this
| Mean Metric
deriving (Eq) deriving (Eq)
instance Show Metric where instance Show Metric where
@ -67,13 +72,18 @@ instance Show Metric where
show BIOF1 = "BIO-F1" show BIOF1 = "BIO-F1"
show BIOF1Labels = "BIO-F1-Labels" show BIOF1Labels = "BIO-F1-Labels"
show TokenAccuracy = "TokenAccuracy" show TokenAccuracy = "TokenAccuracy"
show SegmentAccuracy = "SegmentAccuracy"
show MAE = "MAE" show MAE = "MAE"
show SMAPE = "SMAPE" show SMAPE = "SMAPE"
show (MultiLabelFMeasure beta) = "MultiLabel-F" ++ (show beta) show (MultiLabelFMeasure beta) = "MultiLabel-F" ++ (show beta)
show MultiLabelLogLoss = "MultiLabel-Logloss" show MultiLabelLogLoss = "MultiLabel-Logloss"
show MultiLabelLikelihood = "MultiLabel-Likelihood" show MultiLabelLikelihood = "MultiLabel-Likelihood"
show (Mean metric) = "Mean/" ++ (show metric)
instance Read Metric where instance Read Metric where
readsPrec p ('M':'e':'a':'n':'/':theRest) = case readsPrec p theRest of
[(metric, theRest)] -> [(Mean metric, theRest)]
_ -> []
readsPrec _ ('R':'M':'S':'E':theRest) = [(RMSE, theRest)] readsPrec _ ('R':'M':'S':'E':theRest) = [(RMSE, theRest)]
readsPrec _ ('M':'S':'E':theRest) = [(MSE, theRest)] readsPrec _ ('M':'S':'E':theRest) = [(MSE, theRest)]
readsPrec _ ('P':'e':'a':'r':'s':'o':'n':theRest) = [(Pearson, theRest)] readsPrec _ ('P':'e':'a':'r':'s':'o':'n':theRest) = [(Pearson, theRest)]
@ -118,6 +128,7 @@ instance Read Metric where
readsPrec _ ('B':'I':'O':'-':'F':'1':'-':'L':'a':'b':'e':'l':'s':theRest) = [(BIOF1Labels, theRest)] readsPrec _ ('B':'I':'O':'-':'F':'1':'-':'L':'a':'b':'e':'l':'s':theRest) = [(BIOF1Labels, theRest)]
readsPrec _ ('B':'I':'O':'-':'F':'1':theRest) = [(BIOF1, theRest)] readsPrec _ ('B':'I':'O':'-':'F':'1':theRest) = [(BIOF1, theRest)]
readsPrec _ ('T':'o':'k':'e':'n':'A':'c':'c':'u':'r':'a':'c':'y':theRest) = [(TokenAccuracy, theRest)] readsPrec _ ('T':'o':'k':'e':'n':'A':'c':'c':'u':'r':'a':'c':'y':theRest) = [(TokenAccuracy, theRest)]
readsPrec _ ('S':'e':'g':'m':'e':'n':'t':'A':'c':'c':'u':'r':'a':'c':'y':theRest) = [(SegmentAccuracy, theRest)]
readsPrec _ ('M':'A':'E':theRest) = [(MAE, theRest)] readsPrec _ ('M':'A':'E':theRest) = [(MAE, theRest)]
readsPrec _ ('S':'M':'A':'P':'E':theRest) = [(SMAPE, theRest)] readsPrec _ ('S':'M':'A':'P':'E':theRest) = [(SMAPE, theRest)]
readsPrec _ ('M':'u':'l':'t':'i':'L':'a':'b':'e':'l':'-':'L':'o':'g':'L':'o':'s':'s':theRest) = [(MultiLabelLogLoss, theRest)] readsPrec _ ('M':'u':'l':'t':'i':'L':'a':'b':'e':'l':'-':'L':'o':'g':'L':'o':'s':'s':theRest) = [(MultiLabelLogLoss, theRest)]
@ -154,11 +165,13 @@ getMetricOrdering Likelihood = TheHigherTheBetter
getMetricOrdering BIOF1 = TheHigherTheBetter getMetricOrdering BIOF1 = TheHigherTheBetter
getMetricOrdering BIOF1Labels = TheHigherTheBetter getMetricOrdering BIOF1Labels = TheHigherTheBetter
getMetricOrdering TokenAccuracy = TheHigherTheBetter getMetricOrdering TokenAccuracy = TheHigherTheBetter
getMetricOrdering SegmentAccuracy = TheHigherTheBetter
getMetricOrdering MAE = TheLowerTheBetter getMetricOrdering MAE = TheLowerTheBetter
getMetricOrdering SMAPE = TheLowerTheBetter getMetricOrdering SMAPE = TheLowerTheBetter
getMetricOrdering (MultiLabelFMeasure _) = TheHigherTheBetter getMetricOrdering (MultiLabelFMeasure _) = TheHigherTheBetter
getMetricOrdering MultiLabelLogLoss = TheLowerTheBetter getMetricOrdering MultiLabelLogLoss = TheLowerTheBetter
getMetricOrdering MultiLabelLikelihood = TheHigherTheBetter getMetricOrdering MultiLabelLikelihood = TheHigherTheBetter
getMetricOrdering (Mean metric) = getMetricOrdering metric
bestPossibleValue :: Metric -> MetricValue bestPossibleValue :: Metric -> MetricValue
bestPossibleValue metric = case getMetricOrdering metric of bestPossibleValue metric = case getMetricOrdering metric of
@ -166,18 +179,21 @@ bestPossibleValue metric = case getMetricOrdering metric of
TheHigherTheBetter -> 1.0 TheHigherTheBetter -> 1.0
fixedNumberOfColumnsInExpected :: Metric -> Bool fixedNumberOfColumnsInExpected :: Metric -> Bool
fixedNumberOfColumnsInExpected (Mean metric) = fixedNumberOfColumnsInExpected metric
fixedNumberOfColumnsInExpected MAP = False fixedNumberOfColumnsInExpected MAP = False
fixedNumberOfColumnsInExpected BLEU = False fixedNumberOfColumnsInExpected BLEU = False
fixedNumberOfColumnsInExpected GLEU = False fixedNumberOfColumnsInExpected GLEU = False
fixedNumberOfColumnsInExpected _ = True fixedNumberOfColumnsInExpected _ = True
fixedNumberOfColumnsInInput :: Metric -> Bool fixedNumberOfColumnsInInput :: Metric -> Bool
fixedNumberOfColumnsInInput (Mean metric) = fixedNumberOfColumnsInInput metric
fixedNumberOfColumnsInInput (SoftFMeasure _) = False fixedNumberOfColumnsInInput (SoftFMeasure _) = False
fixedNumberOfColumnsInInput (ProbabilisticSoftFMeasure _) = False fixedNumberOfColumnsInInput (ProbabilisticSoftFMeasure _) = False
fixedNumberOfColumnsInInput (Soft2DFMeasure _) = False fixedNumberOfColumnsInInput (Soft2DFMeasure _) = False
fixedNumberOfColumnsInInput _ = True fixedNumberOfColumnsInInput _ = True
perfectOutLineFromExpectedLine :: Metric -> Text -> Text perfectOutLineFromExpectedLine :: Metric -> Text -> Text
perfectOutLineFromExpectedLine (Mean metric) t = perfectOutLineFromExpectedLine metric t
perfectOutLineFromExpectedLine (LogLossHashed _) t = t <> ":1.0" perfectOutLineFromExpectedLine (LogLossHashed _) t = t <> ":1.0"
perfectOutLineFromExpectedLine (LikelihoodHashed _) t = t <> ":1.0" perfectOutLineFromExpectedLine (LikelihoodHashed _) t = t <> ":1.0"
perfectOutLineFromExpectedLine BLEU t = getFirstColumn t perfectOutLineFromExpectedLine BLEU t = getFirstColumn t
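
The way the `Mean/` prefix is read and shown can be tried out on a cut-down metric type. The sketch below uses a toy `Metric` with only two base metrics; it follows the same pattern as the instances in this file but is not GEval's actual definition.

```haskell
-- Toy metric type: two base metrics plus the recursive Mean wrapper.
data Metric = Accuracy | BLEU | Mean Metric
  deriving (Eq)

instance Show Metric where
  show Accuracy = "Accuracy"
  show BLEU     = "BLEU"
  show (Mean m) = "Mean/" ++ show m

instance Read Metric where
  -- Strip the "Mean/" prefix and parse the remainder recursively.
  readsPrec p ('M':'e':'a':'n':'/':rest) =
    [(Mean m, rest') | (m, rest') <- readsPrec p rest]
  readsPrec _ ('A':'c':'c':'u':'r':'a':'c':'y':rest) = [(Accuracy, rest)]
  readsPrec _ ('B':'L':'E':'U':rest)                 = [(BLEU, rest)]
  readsPrec _ _                                      = []
```

With these instances, `read "Mean/BLEU" :: Metric` is `Mean BLEU`, and `show` round-trips it back to `Mean/BLEU`. Nothing prevents nesting, so `Mean/Mean/BLEU` also parses, which is the situation the comment in the data declaration warns about.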

@ -48,6 +48,7 @@ listOfAvailableMetrics = [RMSE,
MultiLabelFMeasure 1.0, MultiLabelFMeasure 1.0,
MultiLabelFMeasure 2.0, MultiLabelFMeasure 2.0,
MultiLabelFMeasure 0.25, MultiLabelFMeasure 0.25,
Mean (MultiLabelFMeasure 1.0),
ProbabilisticMultiLabelFMeasure 1.0, ProbabilisticMultiLabelFMeasure 1.0,
ProbabilisticMultiLabelFMeasure 2.0, ProbabilisticMultiLabelFMeasure 2.0,
ProbabilisticMultiLabelFMeasure 0.25, ProbabilisticMultiLabelFMeasure 0.25,
@ -63,6 +64,7 @@ listOfAvailableMetrics = [RMSE,
BIOF1,
BIOF1Labels,
TokenAccuracy,
SegmentAccuracy,
SoftFMeasure 1.0,
SoftFMeasure 2.0,
SoftFMeasure 0.25,
@ -93,6 +95,8 @@ isMetricDescribed :: Metric -> Bool
isMetricDescribed (SoftFMeasure _) = True
isMetricDescribed (Soft2DFMeasure _) = True
isMetricDescribed (ProbabilisticMultiLabelFMeasure _) = True
isMetricDescribed GLEU = True
isMetricDescribed SegmentAccuracy = True
isMetricDescribed _ = False
getEvaluationSchemeDescription :: EvaluationScheme -> String
@ -118,8 +122,26 @@ where calibration measures the quality of probabilities (how well they are calib
if we have 10 items with probability 0.5 and 5 of them are correct, then the calibration
is perfect.
|]
getMetricDescription GLEU =
[i|For the GLEU score, we record all sub-sequences of
1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
compute a recall, which is the ratio of the number of matching n-grams
to the number of total n-grams in the target (ground truth) sequence,
and a precision, which is the ratio of the number of matching n-grams
to the number of total n-grams in the generated output sequence. Then
GLEU score is simply the minimum of recall and precision. This GLEU
score's range is always between 0 (no matches) and 1 (all match) and
it is symmetrical when switching output and target. According to
the article, GLEU score correlates quite well with the BLEU
metric on a corpus level but does not have its drawbacks for our per
sentence reward objective.
see: https://arxiv.org/pdf/1609.08144.pdf
|]
getMetricDescription SegmentAccuracy =
[i|Accuracy counted for segments, i.e. labels with positions.
The percentage of labels in the ground truth retrieved in the actual output is returned.
Accuracy is calculated separately for each item and then averaged.
|]
outContents :: Metric -> String
outContents (SoftFMeasure _) = [hereLit|inwords:1-4
@ -132,6 +154,11 @@ outContents (ProbabilisticMultiLabelFMeasure _) = [hereLit|first-name/1:0.8 surn
surname/1:0.4
first-name/3:0.9
|]
outContents GLEU = [hereLit|Alice has a black
|]
outContents SegmentAccuracy = [hereLit|N:1-4 V:5-6 N:8-10 V:12-13 A:15-17
N:1-4 V:6-7 A:9-13
|]
expectedScore :: EvaluationScheme -> MetricValue
expectedScore (EvaluationScheme (SoftFMeasure beta) [])
@ -146,6 +173,10 @@ expectedScore (EvaluationScheme (ProbabilisticMultiLabelFMeasure beta) [])
= let precision = 0.6569596940847289
recall = 0.675
in weightedHarmonicMean beta precision recall
expectedScore (EvaluationScheme GLEU [])
= 0.7142857142857143
expectedScore (EvaluationScheme SegmentAccuracy [])
= 0.875
helpMetricParameterMetricsList :: String
helpMetricParameterMetricsList = intercalate ", " $ map (\s -> (show s) ++ (case extraInfo s of
@ -194,7 +225,15 @@ the form LABEL:PAGE/X0,Y0,X1,Y1 where LABEL is any label, page is the page numbe
formatDescription (ProbabilisticMultiLabelFMeasure _) = [hereLit|In each line a number of labels (entities) can be given. A label probability
can be provided with a colon (e.g. "foo:0.7"). By default, 1.0 is assumed.
|]
formatDescription GLEU = [hereLit|Each line contains a sentence given as space-separated words.
|]
formatDescription SegmentAccuracy = [hereLit|Labels can be any strings (without spaces), whereas the position part is a list of
1-based indexes or spans separated by commas (spans are inclusive
ranges, e.g. "10-14"). For instance, "foo:bar:2,4-7,10" is a
label "foo:bar" for positions 2, 4, 5, 6, 7 and 10. Note that no
overlapping segments can be returned (evaluation will fail in
such a case).
|]
scoreExplanation :: EvaluationScheme -> Maybe String
scoreExplanation (EvaluationScheme (SoftFMeasure _) [])
@ -206,6 +245,17 @@ As far as the second item is concerned, the total area that covered by the outpu
Hence, recall is 247500/902500=0.274 and precision - 247500/(20000+912000+240000)=0.211. Therefore, the F-score
for the second item is 0.238 and the F-score for the whole set is (0 + 0.238)/2 = 0.119.|]
scoreExplanation (EvaluationScheme (ProbabilisticMultiLabelFMeasure _) []) = Nothing
scoreExplanation (EvaluationScheme GLEU [])
= Just [hereLit|To find the GLEU score, we first count the number of tp (true positives), fp (false positives) and fn (false negatives).
We have 4 matching unigrams ("Alice", "has", "a", "black"), 3 bigrams ("Alice has", "has a", "a black"), 2 trigrams ("Alice has a", "has a black") and 1 tetragram ("Alice has a black"),
so tp=10. We have no fp, therefore fp=0. There are 4 fn ("cat", "black cat", "a black cat", "has a black cat").
Now we have to calculate precision and recall:
Precision is tp / (tp+fp) = 10/(10+0) = 1,
recall is tp / (tp+fn) = 10 / (10+4) = 10/14 =~ 0.71428...
The GLEU score is min(precision,recall)=0.71428 |]
scoreExplanation (EvaluationScheme SegmentAccuracy [])
= Just [hereLit|Out of 4 segments in the expected output for the first item, 3 were retrieved correctly (accuracy is 3/4=0.75).
The second item was retrieved perfectly (accuracy is 1.0). Hence, the average is (0.75+1.0)/2=0.875.|]
pasteLines :: String -> String -> String
pasteLines a b = printf "%-35s %s\n" a b
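
The GLEU description and score explanation added in this file can be checked with a small stand-alone sketch. This is not GEval's own implementation: the names `ngrams` and `gleu` are hypothetical, tokenization is assumed to be plain whitespace splitting, and the expected sentence "Alice has a black cat" is inferred from the false negatives listed in the explanation.

```haskell
import Data.List (tails, (\\))

-- All contiguous n-grams of a token list for n = 1..4.
ngrams :: [String] -> [[String]]
ngrams ws = [ take n t | n <- [1 .. 4], t <- tails ws, length t >= n ]

-- Sentence-level GLEU: the minimum of n-gram precision and n-gram recall.
-- Arguments: expected sentence, then output sentence.
gleu :: String -> String -> Double
gleu expected out
  | null expGrams || null outGrams = 0
  | otherwise = min precision recall
  where
    expGrams = ngrams (words expected)
    outGrams = ngrams (words out)
    -- Size of the multiset intersection: every n-gram that (\\) removes
    -- from the output list has a counterpart in the expected list.
    matching = length outGrams - length (outGrams \\ expGrams)
    precision = fromIntegral matching / fromIntegral (length outGrams)
    recall = fromIntegral matching / fromIntegral (length expGrams)
```

With expected "Alice has a black cat" and output "Alice has a black", there are 10 matching n-grams, 10 output n-grams and 14 expected n-grams, so `gleu` returns min(10/10, 10/14), in line with the 0.7142857142857143 given in `expectedScore`.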

View File

@ -127,6 +127,8 @@ main = hspec $ do
runGEvalTest "accuracy-simple" `shouldReturnAlmost` 0.6
it "with probs" $
runGEvalTest "accuracy-probs" `shouldReturnAlmost` 0.4
it "sorted" $
runGEvalTest "accuracy-on-sorted" `shouldReturnAlmost` 0.75
describe "F-measure" $ do
it "simple example" $
runGEvalTest "f-measure-simple" `shouldReturnAlmost` 0.57142857
@ -146,6 +148,9 @@ main = hspec $ do
describe "TokenAccuracy" $ do
it "simple example" $ do
runGEvalTest "token-accuracy-simple" `shouldReturnAlmost` 0.5
describe "SegmentAccuracy" $ do
it "simple test" $ do
runGEvalTest "segment-accuracy-simple" `shouldReturnAlmost` 0.4444444
describe "precision count" $ do
it "simple test" $ do
precisionCount [["Alice", "has", "a", "cat" ]] ["Ala", "has", "cat"] `shouldBe` 2
@ -323,6 +328,9 @@ main = hspec $ do
runGEvalTest "multilabel-f1-with-probs" `shouldReturnAlmost` 0.615384615384615
it "labels given with probs and numbers" $ do
runGEvalTest "multilabel-f1-with-probs-and-numbers" `shouldReturnAlmost` 0.6666666666666
describe "Mean/MultiLabel-F" $ do
it "simple" $ do
runGEvalTest "mean-multilabel-f1-simple" `shouldReturnAlmost` 0.5
describe "MultiLabel-Likelihood" $ do
it "simple" $ do
runGEvalTest "multilabel-likelihood-simple" `shouldReturnAlmost` 0.115829218528827
@ -342,6 +350,11 @@ main = hspec $ do
it "just parse" $ do
parseAnnotations "foo:3,7-10 baz:4-6" `shouldBe` Right [Annotation "foo" (IS.fromList [3,7,8,9,10]),
Annotation "baz" (IS.fromList [4,5,6])]
it "just parse with colons" $ do
parseSegmentAnnotations "foo:x:3,7-10 baz:4-6" `shouldBe` Right [Annotation "foo:x" (IS.fromList [3,7,8,9,10]),
Annotation "baz" (IS.fromList [4,5,6])]
it "just parse with colons (overlapping segments)" $ do
parseSegmentAnnotations "foo:x:3,7-10 baz:2-6" `shouldBe` Left "Overlapping segments"
it "just parse 2" $ do
parseAnnotations "inwords:1-3 indigits:5" `shouldBe` Right [Annotation "inwords" (IS.fromList [1,2,3]),
Annotation "indigits" (IS.fromList [5])]
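
The two `parseSegmentAnnotations` cases above differ only in whether the position sets overlap. A minimal sketch of such a check follows, assuming segments have already been parsed into a label and a list of positions. The `Segment` alias and `checkSegments` are hypothetical names for illustration, not GEval's API, and how `parseSegmentAnnotations` actually detects overlaps is not shown in this diff.

```haskell
import Data.List (intersect, tails)

-- A parsed segment: a label and the positions it covers.
type Segment = (String, [Int])

-- Reject a list of segments if any two of them share a position.
checkSegments :: [Segment] -> Either String [Segment]
checkSegments segs
  | any overlaps pairs = Left "Overlapping segments"
  | otherwise = Right segs
  where
    -- Every unordered pair of distinct segments.
    pairs = [ (a, b) | (a : rest) <- tails segs, b <- rest ]
    overlaps ((_, xs), (_, ys)) = not (null (xs `intersect` ys))
```

For the inputs used in the tests, "foo:x" on 3,7-10 and "baz" on 4-6 share no position and pass, while "baz" on 2-6 covers position 3 and is rejected.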

View File

@ -0,0 +1,4 @@
foo baz bar
xyz aaa
2 a:1 3

View File

@ -0,0 +1 @@
--metric Accuracy:S

View File

@ -0,0 +1,4 @@
bar baz foo
xyz
a:1 2 3

View File

@ -0,0 +1,4 @@
foo bar baz
uuu
foo bar baz
qqq aaa

View File

@ -0,0 +1 @@
--metric Mean/MultiLabel-F1

View File

@ -0,0 +1,4 @@
foo bar baz
foo
qqq qqq

View File

@ -0,0 +1,3 @@
foo:0 baq:1-2 baz:3
aaa:0-1
xyz:0 bbb:x:1

View File

@ -0,0 +1 @@
--metric SegmentAccuracy

View File

@ -0,0 +1,3 @@
foo:0 bar:1-2 baz:3
aaa:0-2
xyz:0 bbb:x:1 ccc:x:2
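
To see how the `segment-accuracy-simple` fixtures lead to the 0.4444444 expected in the spec, here is a rough stand-alone sketch of SegmentAccuracy as described in the metric description. It rests on several assumptions: the first fixture file is the output and the second the expected data, a segment counts as retrieved only when both label and positions match exactly, and every expected line is non-empty. All names (`parseSegment`, `itemAccuracy`, `segmentAccuracy`) are hypothetical, not GEval's own.

```haskell
import Data.List (nub, sort)

-- A parsed segment: a label and the sorted positions it covers.
type Segment = (String, [Int])

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Parse "label:positions"; the label may itself contain colons,
-- so the split happens at the last colon.
parseSegment :: String -> Segment
parseSegment token = (label, sort (nub (concatMap expand (splitOn ',' positions))))
  where
    (revPositions, revRest) = break (== ':') (reverse token)
    positions = reverse revPositions
    label = reverse (drop 1 revRest)
    expand :: String -> [Int]
    expand item = case splitOn '-' item of
      [a, b] -> [read a .. read b] -- inclusive span such as "10-14"
      _ -> [read item] -- single index

-- Fraction of expected segments found verbatim in the output line
-- (assumes the expected line has at least one segment).
itemAccuracy :: String -> String -> Double
itemAccuracy expected out = fromIntegral hits / fromIntegral (length expSegs)
  where
    expSegs = map parseSegment (words expected)
    outSegs = map parseSegment (words out)
    hits = length (filter (`elem` outSegs) expSegs)

-- Per-item accuracies averaged over all items.
segmentAccuracy :: [String] -> [String] -> Double
segmentAccuracy expected out = sum scores / fromIntegral (length scores)
  where
    scores = zipWith itemAccuracy expected out
```

On the fixtures, the per-item accuracies are 2/3, 0 and 2/3 (the span `aaa:0-1` does not match `aaa:0-2`), so the average is 4/9, roughly 0.4444.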