geval/CHANGELOG.md


## 1.40.6.0 (2021-10-28)

Enhancements:

* Add p<P> flag for selecting the P% subset with the highest confidence scores

## 1.40.5.1 (2021-10-09)

Bug fixes:

* Improve line-by-line mode for BIO-F1
* Fix clarifications for submitting solutions

## 1.40.5.0 (2021-08-20)

Enhancements:

* Add PerplexityHashed metric

## 1.40.4.1 (2021-08-03)

Fixes:

* Improve diagnostics for broken metric definitions

## 1.40.4.0 (2021-07-23)

Fixes:

* Fix inconsistencies in handling probabilities in MultiLabel-F-score
* Plot calibration graphs for Probabilities-MultiLabel-F-score
* Improve diagnostics for unknown or broken metric definitions

## 1.40.3.0

Fix:

* Properly validate the train set

## 1.40.2.0

Enhancements:

* Matching specifications (e.g. fuzzy matching) can be used for Accuracy

## 1.40.1.0

Improvements:

* Handle DOS/Windows end-of-lines

## 1.40.0.0

New features:

* Add BIO-Weighted-F1 metric
* Handle filter & match combination of flags properly

## 1.39.0.0

New features:

* Add Haversine metric for distance on a sphere
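
For readers unfamiliar with the Haversine formula, here is a minimal Python sketch of the great-circle distance it computes. This is not GEval's implementation: the function name `haversine_km`, the kilometre unit, and the mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees.

    radius_km defaults to an approximate mean Earth radius (an assumption).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))
```

Two antipodal points on the equator, e.g. `haversine_km(0, 0, 0, 180)`, are half a circumference apart, i.e. `math.pi * 6371.0` km under the radius assumed here.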

## 1.38.0.0

New features:

* Add CER (Character-Error Rate) metric
* Spaces can be escaped with backslashes in configuration files

## 1.37.0.0

New features:

* Add --select-metric to select metric(s) by name (useful when you
  have a complicated configuration with a large number of metrics,
  and you want to see the result only for a specific metric, especially
  in --line-by-line or --worst-features mode)
* Add --show-preprocessed so that in --line-by-line or similar modes you
  will be shown the results after the preprocessing
* --validate checks whether at least one metric is of priority 1

Bug fixes:

* Fix handling MultiLabel-F1 in --line-by-line mode when used with flags

## 1.36.1.0

* Add "c" and "t" flags

## 1.36.0.0

* Add fuzzy matching for MultiLabel-F1

## 1.35.0.0

* Add simple "mix" ensembling

## 1.34.0.0

* Add filtering (f<...> op for metrics)

## 1.33.0.0

* Handle headers in TSV files

## 1.32.2.0

* Fix bug in cross-tabs

## 1.32.0.0

* Add option to mark worst features

## 1.31.0.0

* Fix validation of challenges with Bootstrap resampling

## 1.30.0.0

* Automatically set precision when in Bootstrap mode

## 1.29.0.0

* Bootstrap resampling for most metrics

## 1.28.0.0

* Add `s` flag for substitution

## 1.27.0.0

* Results are formatted in cross-tables (if possible)

## 1.26.0.0

* Change the meaning of WER (WER is now calculated for the whole set,
  similar to the way BLEU is calculated)
* Use `Mean/WER` if you want the old meaning (average of per-item results)
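
To illustrate the difference between the two meanings, here is a minimal Python sketch. It is not GEval's code: the function names, the whitespace tokenization, and the reading of "calculated for the whole set" as total word errors divided by total reference words are assumptions made for illustration. Empty references are not handled.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Set-level WER: all errors divided by all reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


def mean_item_wer(pairs):
    """Old-style meaning: average of the per-item WER values."""
    rates = [edit_distance(ref.split(), hyp.split()) / len(ref.split())
             for ref, hyp in pairs]
    return sum(rates) / len(rates)


# (expected, output) pairs: one long perfect item, one short wrong item.
pairs = [("a b c d", "a b c d"), ("x", "y")]
```

With these two items, `corpus_wer(pairs)` is 1 error over 5 reference words (0.2), while `mean_item_wer(pairs)` averages 0.0 and 1.0 (0.5), so short items weigh much more under the old meaning.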

## 1.25.0.0

* Add --oracle-item-based

## 1.24.0.0

* Introduce metric priorities
* Use "Cartesian" strings in metrics

## 1.23.0.0

* New style of train data is preferred
  - `in.tsv` and `expected.tsv` instead of `train.tsv`
  - though this is not required as sometimes training data look different than test data
  - `--validate` option was changed accordingly

## 1.22.1.0

* Add "Mean/" meta-metric (for the time being working only with MultiLabel-F-measure)
* Add :S flag

## 1.22.0.0

* Add SegmentAccuracy

## 1.21.0.0

* Add Probabilistic-MultiLabel-F-measure

## 1.20.1.0

* Fix Soft2D-F1 metric
* Check for invalid rectangles in Soft2D-F1 metric

## 1.20.0.0

* Add --list-metrics option
* Add Soft2D-F1 metric

## 1.19.0.0

* Fully static build
* Add preprocessing options for metrics

## 1.18.2.0

* During validation, check the number of columns
* During validation, check the number of lines
* Validate train files

## 1.18.1.0

* During validation, check whether the maximum value is obtained with the expected data

## 1.18.0.0

* Add --validate option

## 1.17.0.0

* Add Probabilistic-Soft-F-score

## 1.16.0.0

* Handle JSONL files (only for MultiLabel-F-score)
* Fix SMAPE metric

## 1.0.0.1

* Add handling of the `--version` and `-v` options