Merge remote-tracking branch 'origin/master' into improvement/#25-issue-when-build-under-linux
# Conflicts:
#   README.md
commit 3528fcbfca
@@ -1,4 +1,13 @@
+## 1.22.1.0
+
+* Add "Mean/" meta-metric (for the time being working only with MultiLabel-F-measure)
+* Add :S flag
+
+## 1.22.0.0
+
+* Add SegmentAccuracy
+
 ## 1.21.0.0
 
 * Add Probabilistic-MultiLabel-F-measure
README.md (64 lines changed)
@@ -2,7 +2,7 @@
 GEval is a Haskell library and a stand-alone tool for evaluating the
 results of solutions to machine learning challenges as defined in the
-[Gonito](https://gonito.net) platform. Also could be used outside the
+[Gonito](https://gonito.net) platform. Also, could be used outside the
 context of Gonito.net challenges, assuming the test data is given in
 simple TSV (tab-separated values) files.
 
@@ -14,6 +14,29 @@ The official repository is `git://gonito.net/geval`, browsable at
 
 ## Installing
 
+### The easy way: just download the fully static GEval binary
+
+(Assuming you have a 64-bit Linux.)
+
+    wget https://gonito.net/get/bin/geval
+    chmod u+x geval
+    ./geval --help
+
+#### On Windows
+
+For Windows, you should use Windows PowerShell.
+
+    wget https://gonito.net/get/bin/geval
+
+Next, go to the folder where you downloaded `geval` and right-click the `geval` file.
+Go to `Properties` and, in the `Security` section, grant full access to the folder.
+
+Or you can use `icacls "folder path to geval" /grant USER:<username>`
+
+This is a fully static binary; it should work on any 64-bit Linux or 64-bit Windows.
+
+### Build from scratch
+
 You need [Haskell Stack](https://github.com/commercialhaskell/stack).
 You could install Stack with your package manager or with:
 
@@ -36,6 +59,8 @@ order to run `geval` you need to either add `$HOME/.local/bin` to
 
     PATH="$HOME/.local/bin" geval ...
 
+On Windows, you should add a new global variable named 'geval' whose path is the same as above.
+
 ### Troubleshooting
 
 If you see a message like this:
@@ -64,15 +89,32 @@ In case the `lzma` package is not installed on your Linux, you need to run (assu
 
     sudo apt-get install pkg-config liblzma-dev libpq-dev libpcre3-dev libcairo2-dev libbz2-dev
 
-### Plan B — just download the GEval binary
+#### Windows issues
 
-(Assuming you have a 64-bit Linux.)
+If you see this message on Windows while executing the `stack test` command:
 
-    wget https://gonito.net/get/bin/geval
-    chmod u+x geval
-    ./geval --help
+    In the dependencies for geval-1.21.1.0:
+      unix needed, but the stack configuration has no specified version
+    In the dependencies for lzma-0.0.0.3:
+      lzma-clib needed, but the stack configuration has no specified version
 
-This is a fully static binary, it should work on any 64-bit Linux.
+you should replace `unix` with `unix-compat` in the `geval.cabal` file,
+because the `unix` package is not supported on Windows.
+
+You should also add `lzma-clib-5.2.2` and `unix-compat-0.5.2` to the `extra-deps` section of the `stack.yaml` file.
+
+If you see a message about missing pkg-config on Windows, you should download two packages from
+http://ftp.gnome.org/pub/gnome/binaries/win32/dependencies/:
+
+- pkg-config (the newest version)
+- gettext-runtime (the newest version)
+
+Extract the `pkg-config.exe` file to a directory on the Windows PATH.
+Extract the `intl.dll` file from gettext-runtime.
+
+You should also download the glib package from http://ftp.gnome.org/pub/gnome/binaries/win32/glib/2.28
+and extract the `libglib-2.0-0.dll` file.
+
+Put all these files in one directory, for example `C:\MinGW\bin`.
 
 ## Quick tour
 
@@ -189,7 +231,7 @@ But why were double quotes so problematic in German-English
 translation?! Well, look at the second-worst feature — `''`
 in the _output_! Oops, it seems like a very stupid mistake with
 post-processing was done and no double quote was correctly generated,
-which decreased the score a little bit for each sentence in which the
+which decreased the score a little for each sentence in which the
 quote was expected.
 
 When I fixed this simple bug, the BLEU metric increased from 0.27358
@@ -502,9 +544,9 @@ submitted. The suggested way to do this is as follows:
 `test-A/expected.tsv` added. This branch should be accessible by
 Gonito platform, but should be kept “hidden” for regular users (or
 at least they should be kindly asked not to peek there). It is
-recommended (though not obligatory) that this branch contain all
+recommended (though not obligatory) that this branch contains all
 the source codes and data used to generate the train/dev/test sets.
-(Use [git-annex](https://git-annex.branchable.com/) if you have really big files there.)
+(Use [git-annex](https://git-annex.branchable.com/) if you have huge files there.)
 
 Branch (1) should be the parent of the branch (2), for instance, the
 repo (for the toy “planets” challenge) could be created as follows:
@@ -567,7 +609,7 @@ be nice and commit also your source codes.
     git push mine master
 
 Then let Gonito pull them and evaluate your results, either manually clicking
-"submit" at the Gonito web site or using `--submit` option (see below).
+"submit" at the Gonito website or using `--submit` option (see below).
 
 ### Submitting a solution to a Gonito platform with GEval
 
@@ -1,5 +1,5 @@
 name: geval
-version: 1.21.1.0
+version: 1.22.1.0
 synopsis: Machine learning evaluation tools
 description: Please see README.md
 homepage: http://github.com/name/project
@@ -4,11 +4,12 @@
 module GEval.Annotation
   (parseAnnotations, Annotation(..),
    parseObtainedAnnotations, ObtainedAnnotation(..),
-   matchScore, intSetParser)
+   matchScore, intSetParser, segmentAccuracy, parseSegmentAnnotations)
   where
 
 import qualified Data.IntSet as IS
 import qualified Data.Text as T
+import Data.Set (intersection, fromList)
 
 import Data.Attoparsec.Text
 import Data.Attoparsec.Combinator
@@ -17,11 +18,12 @@ import GEval.Common (sepByWhitespaces, (/.))
 import GEval.Probability
 import Data.Char
 import Data.Maybe (fromMaybe)
+import Data.Either (partitionEithers)
 
 import GEval.PrecisionRecall(weightedMaxMatching)
 
 data Annotation = Annotation T.Text IS.IntSet
-                  deriving (Eq, Show)
+                  deriving (Eq, Show, Ord)
 
 data ObtainedAnnotation = ObtainedAnnotation Annotation Double
                           deriving (Eq, Show)
@@ -52,6 +54,36 @@ obtainedAnnotationParser = do
 parseAnnotations :: T.Text -> Either String [Annotation]
 parseAnnotations t = parseOnly (annotationsParser <* endOfInput) t
 
+parseSegmentAnnotations :: T.Text -> Either String [Annotation]
+parseSegmentAnnotations t = case parseAnnotationsWithColons t of
+  Left m -> Left m
+  Right annotations -> if areSegmentsDisjoint annotations
+                       then (Right annotations)
+                       else (Left "Overlapping segments")
+
+areSegmentsDisjoint :: [Annotation] -> Bool
+areSegmentsDisjoint = areIntSetsDisjoint . map (\(Annotation _ s) -> s)
+
+areIntSetsDisjoint :: [IS.IntSet] -> Bool
+areIntSetsDisjoint ss = snd $ foldr step (IS.empty, True) ss
+  where step _ w@(_, False) = w
+        step s (u, True) = (s `IS.union` u, s `IS.disjoint` u)
+
+-- unfortunately, attoparsec does not seem to back-track properly,
+-- so we need a special function if labels can contain colons
+parseAnnotationsWithColons :: T.Text -> Either String [Annotation]
+parseAnnotationsWithColons t = case partitionEithers (map parseAnnotationWithColons $ T.words t) of
+  ([], annotations) -> Right annotations
+  ((firstProblem:_), _) -> Left firstProblem
+
+parseAnnotationWithColons :: T.Text -> Either String Annotation
+parseAnnotationWithColons t = if T.null label
+                              then Left "Colon expected"
+                              else case parseOnly (intSetParser <* endOfInput) position of
+                                     Left m -> Left m
+                                     Right s -> Right (Annotation (T.init label) s)
+  where (label, position) = T.breakOnEnd ":" t
+
 annotationsParser :: Parser [Annotation]
 annotationsParser = sepByWhitespaces annotationParser
 
@@ -70,3 +102,7 @@ intervalParser = do
   startIx <- decimal
   endIx <- (string "-" *> decimal <|> pure startIx)
   pure $ IS.fromList [startIx..endIx]
+
+segmentAccuracy :: [Annotation] -> [Annotation] -> Double
+segmentAccuracy expected output = (fromIntegral $ length matched) / (fromIntegral $ length expected)
+  where matched = (fromList expected) `intersection` (fromList output)
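A quick way to exercise the new helpers is a GHCi session over the segment-accuracy fixtures added further down in this commit. This is only a sketch, not part of the commit: prompt, imports and the exact `Show` output are assumed, and `OverloadedStrings` is needed for the `Text` literals.

```haskell
ghci> :set -XOverloadedStrings
ghci> parseSegmentAnnotations "foo:0 bar:1-2 baz:3"
Right [Annotation "foo" (fromList [0]),Annotation "bar" (fromList [1,2]),Annotation "baz" (fromList [3])]
ghci> let Right expected = parseSegmentAnnotations "foo:0 bar:1-2 baz:3"
ghci> let Right got      = parseSegmentAnnotations "foo:0 baq:1-2 baz:3"
ghci> segmentAccuracy expected got      -- 2 of the 3 expected segments match
0.6666666666666666
ghci> parseSegmentAnnotations "foo:x:3,7-10 baz:2-6"   -- positions overlap, so parsing fails
Left "Overlapping segments"
```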
@@ -492,6 +492,23 @@ gevalCoreOnSources CharMatch inputLineSource = helper inputLineSource
 gevalCoreOnSources (LogLossHashed nbOfBits) _ = helperLogLossHashed nbOfBits id
 gevalCoreOnSources (LikelihoodHashed nbOfBits) _ = helperLogLossHashed nbOfBits logLossToLikehood
 
+
+gevalCoreOnSources (Mean (MultiLabelFMeasure beta)) _
+  = gevalCoreWithoutInputOnItemTargets (Right . intoWords)
+                                       (Right . getWords)
+                                       ((fMeasureOnCounts beta) . (getCounts (==)))
+                                       averageC
+                                       id
+                                       noGraph
+  where
+    -- repeated as below, as it will be refactored into dependent types soon anyway
+    getWords (RawItemTarget t) = Prelude.map unpack $ selectByStandardThreshold $ parseIntoProbList t
+    getWords (PartiallyParsedItemTarget ts) = Prelude.map unpack ts
+    intoWords (RawItemTarget t) = Prelude.map unpack $ Data.Text.words t
+    intoWords (PartiallyParsedItemTarget ts) = Prelude.map unpack ts
+
+gevalCoreOnSources (Mean _) _ = error $ "Mean/ meta-metric defined only for MultiLabel-F1 for the time being"
+
 -- only MultiLabel-F1 handled for JSONs for the time being...
 gevalCoreOnSources (MultiLabelFMeasure beta) _ = gevalCoreWithoutInputOnItemTargets (Right . intoWords)
                                                                                     (Right . getWords)
@@ -706,6 +723,13 @@ gevalCoreOnSources TokenAccuracy _ = gevalCoreWithoutInput intoTokens
              | otherwise = (h, t + 1)
         hitsAndTotalsAgg = CC.foldl (\(h1, t1) (h2, t2) -> (h1 + h2, t1 + t2)) (0, 0)
 
+gevalCoreOnSources SegmentAccuracy _ = gevalCoreWithoutInput parseSegmentAnnotations
+                                                             parseSegmentAnnotations
+                                                             (uncurry segmentAccuracy)
+                                                             averageC
+                                                             id
+                                                             noGraph
+
 gevalCoreOnSources MultiLabelLogLoss _ = gevalCoreWithoutInput intoWords
                                                                (Right . parseIntoProbList)
                                                                (uncurry countLogLossOnProbList)
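The Mean/ meta-metric evaluates the wrapped metric separately on every item and then averages the per-item scores (in contrast to plain MultiLabel-F, which aggregates counts globally). Below is a hand-worked sketch against the mean-multilabel-f1-simple fixture added at the bottom of this commit; the per-item F1 values are my own arithmetic, not tool output.

```haskell
-- Pairing the two 4-line fixture files line by line:
--   "foo bar baz" vs "foo bar baz" -> F1 = 1.0
--   "uuu"         vs ""            -> F1 = 0.0
--   "foo bar baz" vs "foo"         -> F1 = 0.5   (precision and recall are 1/3 and 1)
--   "qqq aaa"     vs "qqq qqq"     -> F1 = 0.5   (precision and recall are both 1/2)
meanMultiLabelF1Sketch :: Double
meanMultiLabelF1Sketch = sum perItem / fromIntegral (length perItem)
  where perItem = [1.0, 0.0, 0.5, 0.5]   -- averages to 0.5, the value expected in the new Spec.hs test
```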
@@ -55,6 +55,7 @@ createFile filePath contents = do
   writeFile filePath contents
 
 readmeMDContents :: Metric -> String -> String
+readmeMDContents (Mean metric) testName = readmeMDContents metric testName
 readmeMDContents GLEU testName = readmeMDContents BLEU testName
 readmeMDContents BLEU testName = [i|
 GEval sample machine translation challenge
@@ -297,6 +298,19 @@ in the expected file (but not in the output file).
 
 |] ++ (commonReadmeMDContents testName)
 
+readmeMDContents SegmentAccuracy testName = [i|
+Segment a sentence and tag with POS tags
+========================================
+
+This is a sample, toy challenge for SegmentAccuracy.
+
+For each sentence, give a sequence of POS tags, each one with
+its position (1-indexed). For instance, `N:1-10` means a noun
+starting from the beginning (the first character) up to the tenth
+character (inclusively).
+
+|] ++ (commonReadmeMDContents testName)
+
 readmeMDContents (ProbabilisticMultiLabelFMeasure beta) testName = readmeMDContents (MultiLabelFMeasure beta) testName
 readmeMDContents (MultiLabelFMeasure beta) testName = [i|
 Tag names and their component
@@ -400,6 +414,7 @@ configContents schemes precision testName = unwords (Prelude.map (\scheme -> ("-
         precisionOpt (Just p) = " --precision " ++ (show p)
 
 trainContents :: Metric -> String
+trainContents (Mean metric) = trainContents metric
 trainContents GLEU = trainContents BLEU
 trainContents BLEU = [hereLit|alussa loi jumala taivaan ja maan he mea hanga na te atua i te timatanga te rangi me te whenua
 ja maa oli autio ja tyhjä , ja pimeys oli syvyyden päällä a kahore he ahua o te whenua , i takoto kau ; he pouri ano a runga i te mata o te hohonu
@@ -473,6 +488,9 @@ B-firstname/JOHN I-surname/VON I-surname/NEUMANN John von Nueman
 trainContents TokenAccuracy = [hereLit|* V N I like cats
 * * V * N I can see the rainbow
 |]
+trainContents SegmentAccuracy = [hereLit|Art:1-3 N:5-11 V:12-13 A:15-19 The student's smart
+N:1-6 N:8-10 V:12-13 A:15-18 Mary's dog is nice
+|]
 trainContents (ProbabilisticMultiLabelFMeasure beta) = trainContents (MultiLabelFMeasure beta)
 trainContents (MultiLabelFMeasure _) = [hereLit|I know Mr John Smith person/3,4,5 first-name/4 surname/5
 Steven bloody Brown person/1,3 first-name/1 surname/3
@@ -494,6 +512,7 @@ trainContents _ = [hereLit|0.06 0.39 0 0.206
 |]
 
 devInContents :: Metric -> String
+devInContents (Mean metric) = devInContents metric
 devInContents GLEU = devInContents BLEU
 devInContents BLEU = [hereLit|ja jumala sanoi : " tulkoon valkeus " , ja valkeus tuli
 ja jumala näki , että valkeus oli hyvä ; ja jumala erotti valkeuden pimeydestä
@@ -540,6 +559,9 @@ Mr Jan Kowalski
 devInContents TokenAccuracy = [hereLit|The cats on the mat
 Ala has a cat
 |]
+devInContents SegmentAccuracy = [hereLit|John is smart
+Mary's intelligent
+|]
 devInContents (ProbabilisticMultiLabelFMeasure beta) = devInContents (MultiLabelFMeasure beta)
 devInContents (MultiLabelFMeasure _) = [hereLit|Jan Kowalski is here
 I see him
@@ -558,6 +580,7 @@ devInContents _ = [hereLit|0.72 0 0.007
 |]
 
 devExpectedContents :: Metric -> String
+devExpectedContents (Mean metric) = devExpectedContents metric
 devExpectedContents GLEU = devExpectedContents BLEU
 devExpectedContents BLEU = [hereLit|a ka ki te atua , kia marama : na ka marama
 a ka kite te atua i te marama , he pai : a ka wehea e te atua te marama i te pouri
@@ -604,6 +627,9 @@ O B-firstname/JAN B-surname/KOWALSKI
 devExpectedContents TokenAccuracy = [hereLit|* N * * N
 N V * N
 |]
+devExpectedContents SegmentAccuracy = [hereLit|N:1-4 V:6-7 A:9-13
+N:1-4 V:6-7 A:9-19
+|]
 devExpectedContents (ProbabilisticMultiLabelFMeasure beta) = devExpectedContents (MultiLabelFMeasure beta)
 devExpectedContents (MultiLabelFMeasure _) = [hereLit|person/1,2 first-name/1 surname/2
 
@@ -624,7 +650,9 @@ devExpectedContents _ = [hereLit|0.82
 |]
 
 testInContents :: Metric -> String
-testInContents GLEU = testInContents BLEU
+testInContents (Mean metric) = testInContents metric
+testInContents GLEU = [hereLit|Alice has a black
+|]
 testInContents BLEU = [hereLit|ja jumala kutsui valkeuden päiväksi , ja pimeyden hän kutsui yöksi
 ja tuli ehtoo , ja tuli aamu , ensimmäinen päivä
 |]
@@ -672,6 +700,9 @@ No name here
 testInContents TokenAccuracy = [hereLit|I have cats
 I know
 |]
+testInContents SegmentAccuracy = [hereLit|Mary's cat is old
+John is young
+|]
 testInContents (ProbabilisticMultiLabelFMeasure beta) = testInContents (MultiLabelFMeasure beta)
 testInContents (MultiLabelFMeasure _) = [hereLit|John bloody Smith
 Nobody is there
@@ -690,7 +721,7 @@ testInContents _ = [hereLit|0.72 0 0.007
 |]
 
 testExpectedContents :: Metric -> String
-testExpectedContents GLEU = testExpectedContents BLEU
+testExpectedContents (Mean metric) = testExpectedContents metric
 testExpectedContents BLEU = [hereLit|na ka huaina e te atua te marama ko te awatea , a ko te pouri i huaina e ia ko te po
 a ko te ahiahi , ko te ata , he ra kotahi
 |]
@@ -738,6 +769,9 @@ O O O
 testExpectedContents TokenAccuracy = [hereLit|* V N
 * V
 |]
+testExpectedContents SegmentAccuracy = [hereLit|N:1-6 N:8-10 V:12-13 A:15-17
+N:1-4 V:6-7 A:9-13
+|]
 testExpectedContents (ProbabilisticMultiLabelFMeasure beta) = testExpectedContents (MultiLabelFMeasure beta)
 testExpectedContents (MultiLabelFMeasure _) = [hereLit|person/1,3 first-name/1 surname/3
 
@@ -753,10 +787,13 @@ bar:1/50,50,1000,1000
 testExpectedContents ClippEU = [hereLit|3/0,0,100,100/10
 1/10,10,1000,1000/10
 |]
+testExpectedContents GLEU = [hereLit|Alice has a black cat
+|]
 testExpectedContents _ = [hereLit|0.11
 17.2
 |]
 
 
 gitignoreContents :: String
 gitignoreContents = [hereLit|
 *~
@@ -6,8 +6,8 @@ import GEval.Metric
 
 import Text.Regex.PCRE.Heavy
 import Text.Regex.PCRE.Light.Base (Regex(..))
-import Data.Text (Text(..), concat, toLower, toUpper, pack, unpack)
-import Data.List (intercalate, break)
+import Data.Text (Text(..), concat, toLower, toUpper, pack, unpack, words, unwords)
+import Data.List (intercalate, break, sort)
 import Data.Either
 import Data.Maybe (fromMaybe)
 import qualified Data.ByteString.UTF8 as BSU
@@ -16,7 +16,7 @@ import qualified Data.ByteString.UTF8 as BSU
 data EvaluationScheme = EvaluationScheme Metric [PreprocessingOperation]
   deriving (Eq)
 
-data PreprocessingOperation = RegexpMatch Regex | LowerCasing | UpperCasing | SetName Text
+data PreprocessingOperation = RegexpMatch Regex | LowerCasing | UpperCasing | Sorting | SetName Text
   deriving (Eq)
 
 leftParameterBracket :: Char
@@ -39,6 +39,8 @@ readOps ('l':theRest) = (LowerCasing:ops, theRest')
 readOps ('u':theRest) = (UpperCasing:ops, theRest')
   where (ops, theRest') = readOps theRest
 readOps ('m':theRest) = handleParametrizedOp (RegexpMatch . (fromRight undefined) . ((flip compileM) []) . BSU.fromString) theRest
+readOps ('S':theRest) = (Sorting:ops, theRest')
+  where (ops, theRest') = readOps theRest
 readOps ('N':theRest) = handleParametrizedOp (SetName . pack) theRest
 readOps s = ([], s)
 
@@ -70,6 +72,7 @@ instance Show PreprocessingOperation where
   show (RegexpMatch (Regex _ regexp)) = parametrizedOperation "m" (BSU.toString regexp)
   show LowerCasing = "l"
   show UpperCasing = "u"
+  show Sorting = "S"
   show (SetName t) = parametrizedOperation "N" (unpack t)
 
 parametrizedOperation :: String -> String -> String
@@ -82,4 +85,5 @@ applyPreprocessingOperation :: PreprocessingOperation -> Text -> Text
 applyPreprocessingOperation (RegexpMatch regex) = Data.Text.concat . (map fst) . (scan regex)
 applyPreprocessingOperation LowerCasing = toLower
 applyPreprocessingOperation UpperCasing = toUpper
+applyPreprocessingOperation Sorting = Data.Text.unwords . sort . Data.Text.words
 applyPreprocessingOperation (SetName _) = id
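The new :S flag (the Sorting operation) sorts the tokens of each line before the metric is applied; this is what the `Accuracy:S` configuration in the accuracy-on-sorted fixture below relies on. A GHCi sketch, with `OverloadedStrings` assumed for the `Text` literals:

```haskell
ghci> :set -XOverloadedStrings
ghci> applyPreprocessingOperation Sorting "foo baz bar"
"bar baz foo"
ghci> applyPreprocessingOperation Sorting "2 a:1 3"
"2 3 a:1"
```

With this preprocessing, three of the four lines in the accuracy-on-sorted fixture match their counterparts in the other file, giving the 0.75 checked in the new Spec.hs test.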
@@ -26,9 +26,14 @@ import Data.Attoparsec.Text (parseOnly)
 data Metric = RMSE | MSE | Pearson | Spearman | BLEU | GLEU | WER | Accuracy | ClippEU
               | FMeasure Double | MacroFMeasure Double | NMI
               | LogLossHashed Word32 | CharMatch | MAP | LogLoss | Likelihood
-              | BIOF1 | BIOF1Labels | TokenAccuracy | LikelihoodHashed Word32 | MAE | SMAPE | MultiLabelFMeasure Double
+              | BIOF1 | BIOF1Labels | TokenAccuracy | SegmentAccuracy | LikelihoodHashed Word32 | MAE | SMAPE | MultiLabelFMeasure Double
               | MultiLabelLogLoss | MultiLabelLikelihood
-              | SoftFMeasure Double | ProbabilisticMultiLabelFMeasure Double | ProbabilisticSoftFMeasure Double | Soft2DFMeasure Double
+              | SoftFMeasure Double | ProbabilisticMultiLabelFMeasure Double
+              | ProbabilisticSoftFMeasure Double | Soft2DFMeasure Double
+              -- it would be better to avoid infinite recursion here
+              -- `Mean (Mean BLEU)` is not useful, but as it would mean
+              -- a larger refactor, we will postpone this
+              | Mean Metric
               deriving (Eq)
 
 instance Show Metric where
@@ -67,13 +72,18 @@ instance Show Metric where
   show BIOF1 = "BIO-F1"
   show BIOF1Labels = "BIO-F1-Labels"
   show TokenAccuracy = "TokenAccuracy"
+  show SegmentAccuracy = "SegmentAccuracy"
   show MAE = "MAE"
   show SMAPE = "SMAPE"
   show (MultiLabelFMeasure beta) = "MultiLabel-F" ++ (show beta)
   show MultiLabelLogLoss = "MultiLabel-Logloss"
   show MultiLabelLikelihood = "MultiLabel-Likelihood"
+  show (Mean metric) = "Mean/" ++ (show metric)
 
 instance Read Metric where
+  readsPrec p ('M':'e':'a':'n':'/':theRest) = case readsPrec p theRest of
+    [(metric, theRest)] -> [(Mean metric, theRest)]
+    _ -> []
   readsPrec _ ('R':'M':'S':'E':theRest) = [(RMSE, theRest)]
   readsPrec _ ('M':'S':'E':theRest) = [(MSE, theRest)]
   readsPrec _ ('P':'e':'a':'r':'s':'o':'n':theRest) = [(Pearson, theRest)]
@@ -118,6 +128,7 @@ instance Read Metric where
   readsPrec _ ('B':'I':'O':'-':'F':'1':'-':'L':'a':'b':'e':'l':'s':theRest) = [(BIOF1Labels, theRest)]
   readsPrec _ ('B':'I':'O':'-':'F':'1':theRest) = [(BIOF1, theRest)]
   readsPrec _ ('T':'o':'k':'e':'n':'A':'c':'c':'u':'r':'a':'c':'y':theRest) = [(TokenAccuracy, theRest)]
+  readsPrec _ ('S':'e':'g':'m':'e':'n':'t':'A':'c':'c':'u':'r':'a':'c':'y':theRest) = [(SegmentAccuracy, theRest)]
   readsPrec _ ('M':'A':'E':theRest) = [(MAE, theRest)]
   readsPrec _ ('S':'M':'A':'P':'E':theRest) = [(SMAPE, theRest)]
   readsPrec _ ('M':'u':'l':'t':'i':'L':'a':'b':'e':'l':'-':'L':'o':'g':'L':'o':'s':'s':theRest) = [(MultiLabelLogLoss, theRest)]
@@ -154,11 +165,13 @@ getMetricOrdering Likelihood = TheHigherTheBetter
 getMetricOrdering BIOF1 = TheHigherTheBetter
 getMetricOrdering BIOF1Labels = TheHigherTheBetter
 getMetricOrdering TokenAccuracy = TheHigherTheBetter
+getMetricOrdering SegmentAccuracy = TheHigherTheBetter
 getMetricOrdering MAE = TheLowerTheBetter
 getMetricOrdering SMAPE = TheLowerTheBetter
 getMetricOrdering (MultiLabelFMeasure _) = TheHigherTheBetter
 getMetricOrdering MultiLabelLogLoss = TheLowerTheBetter
 getMetricOrdering MultiLabelLikelihood = TheHigherTheBetter
+getMetricOrdering (Mean metric) = getMetricOrdering metric
 
 bestPossibleValue :: Metric -> MetricValue
 bestPossibleValue metric = case getMetricOrdering metric of
@@ -166,18 +179,21 @@ bestPossibleValue metric = case getMetricOrdering metric of
   TheHigherTheBetter -> 1.0
 
 fixedNumberOfColumnsInExpected :: Metric -> Bool
+fixedNumberOfColumnsInExpected (Mean metric) = fixedNumberOfColumnsInExpected metric
 fixedNumberOfColumnsInExpected MAP = False
 fixedNumberOfColumnsInExpected BLEU = False
 fixedNumberOfColumnsInExpected GLEU = False
 fixedNumberOfColumnsInExpected _ = True
 
 fixedNumberOfColumnsInInput :: Metric -> Bool
+fixedNumberOfColumnsInInput (Mean metric) = fixedNumberOfColumnsInInput metric
 fixedNumberOfColumnsInInput (SoftFMeasure _) = False
 fixedNumberOfColumnsInInput (ProbabilisticSoftFMeasure _) = False
 fixedNumberOfColumnsInInput (Soft2DFMeasure _) = False
 fixedNumberOfColumnsInInput _ = True
 
 perfectOutLineFromExpectedLine :: Metric -> Text -> Text
+perfectOutLineFromExpectedLine (Mean metric) t = perfectOutLineFromExpectedLine metric t
 perfectOutLineFromExpectedLine (LogLossHashed _) t = t <> ":1.0"
 perfectOutLineFromExpectedLine (LikelihoodHashed _) t = t <> ":1.0"
 perfectOutLineFromExpectedLine BLEU t = getFirstColumn t
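The new `Read` case is what makes `--metric Mean/MultiLabel-F1` (used in the fixture config below) work: it strips the `Mean/` prefix and delegates to the reader of the inner metric. A GHCi sketch with the exact output assumed; note that `show` prints the beta explicitly, so the rendering is `Mean/MultiLabel-F1.0` rather than the input string:

```haskell
ghci> read "Mean/MultiLabel-F1" :: Metric
Mean/MultiLabel-F1.0
ghci> read "SegmentAccuracy" :: Metric
SegmentAccuracy
```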
@@ -48,6 +48,7 @@ listOfAvailableMetrics = [RMSE,
                           MultiLabelFMeasure 1.0,
                           MultiLabelFMeasure 2.0,
                           MultiLabelFMeasure 0.25,
+                          Mean (MultiLabelFMeasure 1.0),
                           ProbabilisticMultiLabelFMeasure 1.0,
                           ProbabilisticMultiLabelFMeasure 2.0,
                           ProbabilisticMultiLabelFMeasure 0.25,
@@ -63,6 +64,7 @@ listOfAvailableMetrics = [RMSE,
                           BIOF1,
                           BIOF1Labels,
                           TokenAccuracy,
+                          SegmentAccuracy,
                           SoftFMeasure 1.0,
                           SoftFMeasure 2.0,
                           SoftFMeasure 0.25,
@@ -93,6 +95,8 @@ isMetricDescribed :: Metric -> Bool
 isMetricDescribed (SoftFMeasure _) = True
 isMetricDescribed (Soft2DFMeasure _) = True
 isMetricDescribed (ProbabilisticMultiLabelFMeasure _) = True
+isMetricDescribed GLEU = True
+isMetricDescribed SegmentAccuracy = True
 isMetricDescribed _ = False
 
 getEvaluationSchemeDescription :: EvaluationScheme -> String
@@ -118,8 +122,26 @@ where calibration measures the quality of probabilities (how well they are calib
 if we have 10 items with probability 0.5 and 5 of them are correct, then the calibration
 is perfect.
 |]
+getMetricDescription GLEU =
+  [i|For the GLEU score, we record all sub-sequences of
+1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
+compute a recall, which is the ratio of the number of matching n-grams
+to the number of total n-grams in the target (ground truth) sequence,
+and a precision, which is the ratio of the number of matching n-grams
+to the number of total n-grams in the generated output sequence. Then
+GLEU score is simply the minimum of recall and precision. This GLEU
+score's range is always between 0 (no matches) and 1 (all match) and
+it is symmetrical when switching output and target. According to
+the article, GLEU score correlates quite well with the BLEU
+metric on a corpus level but does not have its drawbacks for our per
+sentence reward objective.
+see: https://arxiv.org/pdf/1609.08144.pdf
+|]
+getMetricDescription SegmentAccuracy =
+  [i|Accuracy counted for segments, i.e. labels with positions.
+The percentage of labels in the ground truth retrieved in the actual output is returned.
+Accuracy is calculated separately for each item and then averaged.
+|]
 
 outContents :: Metric -> String
 outContents (SoftFMeasure _) = [hereLit|inwords:1-4
@@ -132,6 +154,11 @@ outContents (ProbabilisticMultiLabelFMeasure _) = [hereLit|first-name/1:0.8 surn
 surname/1:0.4
 first-name/3:0.9
 |]
+outContents GLEU = [hereLit|Alice has a black
+|]
+outContents SegmentAccuracy = [hereLit|N:1-4 V:5-6 N:8-10 V:12-13 A:15-17
+N:1-4 V:6-7 A:9-13
+|]
 
 expectedScore :: EvaluationScheme -> MetricValue
 expectedScore (EvaluationScheme (SoftFMeasure beta) [])
@@ -146,6 +173,10 @@ expectedScore (EvaluationScheme (ProbabilisticMultiLabelFMeasure beta) [])
   = let precision = 0.6569596940847289
         recall = 0.675
     in weightedHarmonicMean beta precision recall
+expectedScore (EvaluationScheme GLEU [])
+  = 0.7142857142857143
+expectedScore (EvaluationScheme SegmentAccuracy [])
+  = 0.875
 
 helpMetricParameterMetricsList :: String
 helpMetricParameterMetricsList = intercalate ", " $ map (\s -> (show s) ++ (case extraInfo s of
@@ -194,7 +225,15 @@ the form LABEL:PAGE/X0,Y0,X1,Y1 where LABEL is any label, page is the page numbe
 formatDescription (ProbabilisticMultiLabelFMeasure _) = [hereLit|In each line a number of labels (entities) can be given. A label probability
 can be provided with a colon (e.g. "foo:0.7"). By default, 1.0 is assumed.
 |]
+formatDescription GLEU = [hereLit|In each line there is a space-separated sentence of words.
+|]
+formatDescription SegmentAccuracy = [hereLit|Labels can be any strings (without spaces), whereas the position part is a list of
+1-based indexes or spans separated by commas (spans are inclusive
+ranges, e.g. "10-14"). For instance, "foo:bar:2,4-7,10" is a
+label "foo:bar" for positions 2, 4, 5, 6, 7 and 10. Note that no
+overlapping segments can be returned (evaluation will fail in
+such a case).
+|]
 
 scoreExplanation :: EvaluationScheme -> Maybe String
 scoreExplanation (EvaluationScheme (SoftFMeasure _) [])
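The `"foo:bar:2,4-7,10"` example from the SegmentAccuracy format description can be checked against the new colon-aware parser; again a GHCi sketch with the exact `Show` output assumed:

```haskell
ghci> :set -XOverloadedStrings
ghci> parseSegmentAnnotations "foo:bar:2,4-7,10"
Right [Annotation "foo:bar" (fromList [2,4,5,6,7,10])]
```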
@@ -206,6 +245,17 @@ As far as the second item is concerned, the total area that covered by the outpu
 Hence, recall is 247500/902500=0.274 and precision - 247500/(20000+912000+240000)=0.211. Therefore, the F-score
 for the second item is 0.238 and the F-score for the whole set is (0 + 0.238)/2 = 0.119.|]
 scoreExplanation (EvaluationScheme (ProbabilisticMultiLabelFMeasure _) []) = Nothing
+scoreExplanation (EvaluationScheme GLEU [])
+  = Just [hereLit|To find out the GLEU score, we first count the number of tp (true positives), fp (false positives) and fn (false negatives).
+We have 4 matching unigrams ("Alice", "has", "a", "black"), 3 bigrams ("Alice has", "has a", "a black"), 2 trigrams ("Alice has a", "has a black") and 1 tetragram ("Alice has a black"),
+so tp=10. We have no fp, therefore fp=0. There are 4 fn ("cat", "black cat", "a black cat", "has a black cat").
+Now we have to calculate precision and recall:
+precision is tp / (tp+fp) = 10/(10+0) = 1,
+recall is tp / (tp+fn) = 10 / (10+4) = 10/14 =~ 0.71428...
+The GLEU score is min(precision, recall) = 0.71428 |]
+scoreExplanation (EvaluationScheme SegmentAccuracy [])
+  = Just [hereLit|Out of 4 segments in the expected output for the first item, 3 were retrieved correctly (accuracy is 3/4=0.75).
+The second item was retrieved perfectly (accuracy is 1.0). Hence, the average is (0.75+1.0)/2=0.875.|]
 
 pasteLines :: String -> String -> String
 pasteLines a b = printf "%-35s %s\n" a b
|
13
test/Spec.hs
13
test/Spec.hs
@ -127,6 +127,8 @@ main = hspec $ do
|
|||||||
runGEvalTest "accuracy-simple" `shouldReturnAlmost` 0.6
|
runGEvalTest "accuracy-simple" `shouldReturnAlmost` 0.6
|
||||||
it "with probs" $
|
it "with probs" $
|
||||||
runGEvalTest "accuracy-probs" `shouldReturnAlmost` 0.4
|
runGEvalTest "accuracy-probs" `shouldReturnAlmost` 0.4
|
||||||
|
it "sorted" $
|
||||||
|
runGEvalTest "accuracy-on-sorted" `shouldReturnAlmost` 0.75
|
||||||
describe "F-measure" $ do
|
describe "F-measure" $ do
|
||||||
it "simple example" $
|
it "simple example" $
|
||||||
runGEvalTest "f-measure-simple" `shouldReturnAlmost` 0.57142857
|
runGEvalTest "f-measure-simple" `shouldReturnAlmost` 0.57142857
|
||||||
@ -146,6 +148,9 @@ main = hspec $ do
|
|||||||
describe "TokenAccuracy" $ do
|
describe "TokenAccuracy" $ do
|
||||||
it "simple example" $ do
|
it "simple example" $ do
|
||||||
runGEvalTest "token-accuracy-simple" `shouldReturnAlmost` 0.5
|
runGEvalTest "token-accuracy-simple" `shouldReturnAlmost` 0.5
|
||||||
|
describe "SegmentAccuracy" $ do
|
||||||
|
it "simple test" $ do
|
||||||
|
runGEvalTest "segment-accuracy-simple" `shouldReturnAlmost` 0.4444444
|
||||||
describe "precision count" $ do
|
describe "precision count" $ do
|
||||||
it "simple test" $ do
|
it "simple test" $ do
|
||||||
precisionCount [["Alice", "has", "a", "cat" ]] ["Ala", "has", "cat"] `shouldBe` 2
|
precisionCount [["Alice", "has", "a", "cat" ]] ["Ala", "has", "cat"] `shouldBe` 2
|
||||||
@ -323,6 +328,9 @@ main = hspec $ do
|
|||||||
runGEvalTest "multilabel-f1-with-probs" `shouldReturnAlmost` 0.615384615384615
|
runGEvalTest "multilabel-f1-with-probs" `shouldReturnAlmost` 0.615384615384615
|
||||||
it "labels given with probs and numbers" $ do
|
it "labels given with probs and numbers" $ do
|
||||||
runGEvalTest "multilabel-f1-with-probs-and-numbers" `shouldReturnAlmost` 0.6666666666666
|
runGEvalTest "multilabel-f1-with-probs-and-numbers" `shouldReturnAlmost` 0.6666666666666
|
||||||
|
describe "Mean/MultiLabel-F" $ do
|
||||||
|
it "simple" $ do
|
||||||
|
runGEvalTest "mean-multilabel-f1-simple" `shouldReturnAlmost` 0.5
|
||||||
describe "MultiLabel-Likelihood" $ do
|
describe "MultiLabel-Likelihood" $ do
|
||||||
it "simple" $ do
|
it "simple" $ do
|
||||||
runGEvalTest "multilabel-likelihood-simple" `shouldReturnAlmost` 0.115829218528827
|
runGEvalTest "multilabel-likelihood-simple" `shouldReturnAlmost` 0.115829218528827
|
||||||
@ -342,6 +350,11 @@ main = hspec $ do
|
|||||||
it "just parse" $ do
|
it "just parse" $ do
|
||||||
parseAnnotations "foo:3,7-10 baz:4-6" `shouldBe` Right [Annotation "foo" (IS.fromList [3,7,8,9,10]),
|
parseAnnotations "foo:3,7-10 baz:4-6" `shouldBe` Right [Annotation "foo" (IS.fromList [3,7,8,9,10]),
|
||||||
Annotation "baz" (IS.fromList [4,5,6])]
|
Annotation "baz" (IS.fromList [4,5,6])]
|
||||||
|
it "just parse wit colons" $ do
|
||||||
|
parseSegmentAnnotations "foo:x:3,7-10 baz:4-6" `shouldBe` Right [Annotation "foo:x" (IS.fromList [3,7,8,9,10]),
|
||||||
|
Annotation "baz" (IS.fromList [4,5,6])]
|
||||||
|
it "just parse wit colons" $ do
|
||||||
|
parseSegmentAnnotations "foo:x:3,7-10 baz:2-6" `shouldBe` Left "Overlapping segments"
|
||||||
it "just parse 2" $ do
|
it "just parse 2" $ do
|
||||||
parseAnnotations "inwords:1-3 indigits:5" `shouldBe` Right [Annotation "inwords" (IS.fromList [1,2,3]),
|
parseAnnotations "inwords:1-3 indigits:5" `shouldBe` Right [Annotation "inwords" (IS.fromList [1,2,3]),
|
||||||
Annotation "indigits" (IS.fromList [5])]
|
Annotation "indigits" (IS.fromList [5])]
|
||||||
(new file)
@@ -0,0 +1,4 @@
+foo baz bar
+
+xyz aaa
+2 a:1 3
test/accuracy-on-sorted/accuracy-on-sorted/config.txt (new file, 1 line)
@@ -0,0 +1 @@
+--metric Accuracy:S
(new file)
@@ -0,0 +1,4 @@
+bar baz foo
+
+xyz
+a:1 2 3
(new file)
@@ -0,0 +1,4 @@
+foo bar baz
+uuu
+foo bar baz
+qqq aaa
(new file)
@@ -0,0 +1 @@
+--metric Mean/MultiLabel-F1
(new file)
@@ -0,0 +1,4 @@
+foo bar baz
+
+foo
+qqq qqq
(new file)
@@ -0,0 +1,3 @@
+foo:0 baq:1-2 baz:3
+aaa:0-1
+xyz:0 bbb:x:1
(new file)
@@ -0,0 +1 @@
+--metric SegmentAccuracy
(new file)
@@ -0,0 +1,3 @@
+foo:0 bar:1-2 baz:3
+aaa:0-2
+xyz:0 bbb:x:1 ccc:x:2
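For reference, the 0.4444444 expected by the new SegmentAccuracy test in Spec.hs follows from averaging the per-item accuracies of the two segment fixtures above (my own arithmetic, not tool output):

```haskell
-- matched expected segments / all expected segments, per item:
--   "foo:0 bar:1-2 baz:3"   vs "foo:0 baq:1-2 baz:3" -> 2/3
--   "aaa:0-2"               vs "aaa:0-1"             -> 0/1
--   "xyz:0 bbb:x:1 ccc:x:2" vs "xyz:0 bbb:x:1"       -> 2/3
segmentAccuracySimpleScore :: Double
segmentAccuracySimpleScore = (2/3 + 0 + 2/3) / 3   -- about 0.4444, matching the Spec.hs expectation
```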