## Reproducibility guidelines

GEval is about evaluation; all you actually need to supply are just the
`out.tsv` files. Remember, GEval (and the associated evaluation platform
Gonito) is not going to _run_ your submission, it just evaluates the
_output_ of your solution by comparing it against the gold standard,
i.e. the `expected.tsv` files.

Nevertheless, it would be nice to have some _standards_ for organizing
your code and models so that it is easy for other people (and
you yourself a month later) to reproduce your results. Here I lay out
some guidelines or standards for this. Conformance to the
guidelines is not checked by GEval/Gonito (though it may be at some
point in the future).

### The file structure

Here is the recommended file structure of your submission:

* `dev-?/out.tsv`, `test-?/out.tsv` — files required by GEval/Gonito
  for the actual evaluation;

* `gonito.yaml` — metadata for Gonito;

* `predict.sh` — this script should read items from standard input in the
  same format as in the `in.tsv` files for a given challenge and
  print the results on standard output in the same
  format as in the `out.tsv` files (a minimal sketch is given after this list)
  - actually, `out.tsv` should be generated with `predict.sh`,
  - `predict.sh` must print exactly the same number of lines as it read from the input,
  - `predict.sh` should accept any number of items, including a single item; in other
    words, `echo '...' | ./predict.sh` should work,
  - `predict.sh` should use models stored in `models/` (generated by `train.sh`),
  - `predict.sh` can invoke further scripts in `code/`;

* `train.sh` — this script should train a machine-learning model using
  the data in `train/` (and possibly using the development sets in
  `dev-?/` for fine-tuning, validation, early stopping, etc.); all the models
  should be saved in the `models/` directory
  - just as `predict.sh`, `train.sh` can invoke scripts in `code/` (obviously, some code
    in `code/` could be shared between `predict.sh` and `train.sh`)
  - `train.sh` should generate the `out.tsv` files (preferably by running `predict.sh`);

* `code/` — source code and scripts for training and prediction should be put here;

* `models/` — all the models generated by `train.sh` should be put here;

* `Dockerfile` — recipe for a multi-stage build with `train` and
  `predict` targets for building containers in which, respectively,
  `train.sh` and `predict.sh` are guaranteed to run (more details below);

* `.dockerignore` — put at least `models/*` and `train/*` here to
  speed up building Docker containers;

* `Makefile` (optional) — if you use make, please put your recipe here (not in `code/`).
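
To make this contract concrete, here is a minimal sketch of a
`predict.sh` that satisfies the requirements above (the constant label
is just a placeholder; a real submission would call its model from
`models/`, typically via a script in `code/`):

```
#!/bin/bash
# Read items from standard input (same format as in.tsv) and print
# exactly one prediction per input line on standard output (same
# format as out.tsv).
while IFS= read -r line; do
    # A real solution would run a model stored in models/ here;
    # this placeholder just emits a constant label for every item.
    echo "positive"
done
```
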
#### Environment variables

There are some environment variables that should be handled by
`train.sh` and `predict.sh` (if applicable to them); a short sketch of
how they can be picked up is given after the list:

* `RANDOM_SEED` — the value of the random seed,
* `THREADS` — the number of threads/jobs/cores to be used (usually to be passed
  to options such as `-j N`, `--threads N` or similar),
* `BATCH_SIZE` (only `predict.sh`) — the value of the batch size
  - by default, `BATCH_SIZE=1` should be assumed,
  - if set to 1, `predict.sh` should return each processed value immediately,
  - if set to N > 1, `predict.sh` can read batches of N items, process the whole
    batch and return the results for the whole batch.
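
As an illustration only, `train.sh` could pick these variables up with
sensible defaults, as in the sketch below (`code/train_model.py` and its
options are hypothetical placeholders); `predict.sh` would handle
`BATCH_SIZE` in the same way:

```
#!/bin/bash
set -euo pipefail

# Fall back to defaults when the caller did not set the variables.
RANDOM_SEED="${RANDOM_SEED:-1}"
THREADS="${THREADS:-1}"

# Forward them to the actual training command; the script and its
# flags are placeholders for whatever your solution uses.
python3 code/train_model.py --seed "$RANDOM_SEED" --threads "$THREADS"
```
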
### Example — Classification with fastText

Let's try to reproduce a sample submission conforming to the standards
laid out above. The challenge is to [guess whether a given tweet
expresses a positive or a negative
sentiment](https://gonito.net/challenge/sentiment140). You're given
the tweet text along with datestamps in two formats.

The [sample solution](https://gonito.net/view-variant/7452) to this challenge, based on
[fastText](https://fasttext.cc/), can be cloned as a git repo:

```
git clone --single-branch git://gonito.net/sentiment140 -b submission-07130
```

(The `--single-branch` option is to speed up the download.)

As usual, you could evaluate this solution locally on the dev set:

```
$ cd sentiment140
$ geval -t dev-0
79.88
```

The accuracy is nearly 80%, so it's pretty good. But now we are not
interested in evaluating outputs; we'd like to actually _run_ the
solution, or even reproduce training from scratch.

Let's try to run the fastText classifier on the first 5 items from the
dev-0 set.

```
$ xzcat dev-0/in.tsv.xz | head -n 5
2009.4109589041095 20090531 @smaknews I love Santa Barbara! In fact @BCCF's next Black Tie Charity Event is in Santa Barbara on August 15th!
2009.4054794520548 20090529 @GreenMommaSmith yeah man, I really need an exercise bike. Tris laughs when I mention it
2009.2630136986302 20090407 Anticipating a slow empty boring summer
2009.4164383561645 20090602 just crossed the kankakee river i need to go back soon & see my family. *tori*
2009.4301369863015 20090607 is o tired because of my HillBilly Family and my histerical sister! Stress is not good for me, lol. Stuck at work

$ xzcat dev-0/in.tsv.xz | head -n 5 | ./predict.sh
terminate called after throwing an instance of 'std::invalid_argument'
what(): models/sentiment140.fasttext.bin cannot be opened for loading!
```

What went wrong!? The fastText model is pretty large (420 MB), so it
would not be a good idea to commit it to the git repository directly.
It was stored using git-annex instead.
[Git-annex](https://git-annex.branchable.com/) is a neat git extension
with which you commit only metadata and keep the actual contents
wherever you want (a directory, an rsync host, an S3 bucket, Dropbox, etc.).

I put the model on my server; you can download it using the supplied bash script:

```
./get-annexed-files.sh models/sentiment140.fasttext.bin
```

Now it should be OK:

```
$ xzcat dev-0/in.tsv.xz | head -n 5 | ./predict.sh
positive
positive
negative
positive
negative
```

Well… provided that you have fastText installed. So it's not exactly
perfect reproducibility. Don't worry, we'll solve this issue with
Docker in a moment.

What if you want to retrain the model from scratch? Then you should run
the `train.sh` script; let's set the random seed to some other value:

```
./get-annexed-files.sh train/in.tsv.xz
rm models/*
RANDOM_SEED=42 ./train.sh
```

Note that we need to download the input part of the train set first. As it
is pretty large, I decided to store it in git-annex storage too.

The evaluation results are slightly different:

```
$ geval -t dev-0
79.86
```

It's not surprising, as a different seed was chosen (and fastText might
not be deterministic itself).

#### How did I actually upload this solution?

I ran the `train.sh` script. All files except the model were added
using the regular `git add` command:

```
git add code dev-0/out.tsv .dockerignore Dockerfile gonito.yaml predict.sh test-A/out.tsv train.sh
```

The model was added with `git annex`:

```
git annex add models/sentiment140.fasttext.bin
```

Then I committed the changes and pushed the files to the repo. Still,
the model file had to be uploaded to the git-annex storage.
I was using a directory on a server to which I have access via SSH:

```
git annex initremote gonito type=rsync rsyncurl=gonito.vm.wmi.amu.edu.pl:/srv/http/annex encryption=none
```

I uploaded the file there:

```
git annex copy models/* --to gonito
```

The problem is that only I could download the files from this
git-annex remote. In order to make them available to the whole world, I
set up an HTTP server and served the files from there. The trick is to
add an
[httpalso](https://git-annex.branchable.com/special_remotes/httpalso/)
special remote:

```
git annex initremote --sameas=gonito gonito-https type=httpalso url=https://gonito.vm.wmi.amu.edu.pl/annex
```

Finally, you need to synchronize the information about special remotes:

```
git annex sync --no-content
```

### Docker

Still, the problem with the reproducibility of the sample solution
remains, as you must install the requirements yourself: fastText (plus
some Python modules for training). It's quite a big hassle if you
consider that there might be a lot of different solutions, each with a
different set of requirements.

Docker containers might come in handy here. The idea is that a
submitter should supply a Dockerfile meeting the following conditions
(a sketch of such a Dockerfile is given after the list):

* defined as a multi-stage build;
* there are at least 2 images defined: `train` and `predict`;
* `train` defines an environment required for training
  - but the training scripts and the data set should _not_ be included in the image,
  - the image should be run with the solution directory mounted at `/workspace`,
  - i.e. the following commands should run the training:

  ```
  docker build . --target train -t foo-train

  docker run -v $(pwd):/workspace -it foo-train /workspace/train.sh
  ```

* `predict` defines a self-contained predictor
  - contrary to the `train` image, all the scripts and binaries needed
    for the actual prediction should be there,
  - … except for the models, which need to be supplied in the directory mounted
    at `/workspace/models`, i.e. the following commands should just work:

  ```
  docker build . --target predict -t foo-predict

  docker run -v $(pwd)/models:/workspace/models -i foo-predict
  ```

  - this way you can easily switch to another model without changing the base code.
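
For illustration, a Dockerfile with such `train` and `predict` targets
could be sketched as below. The base image, the installed packages and
the way fastText is provided are assumptions made for this sketch, not
the actual Dockerfile of the sample solution:

```
# Training environment: only the tools needed to run train.sh; the
# scripts and the data set come from the mounted /workspace directory.
FROM python:3.9 AS train
RUN pip install --no-cache-dir fasttext
WORKDIR /workspace

# Self-contained predictor: prediction scripts are baked into the
# image, only the models are expected at /workspace/models.
FROM python:3.9 AS predict
RUN pip install --no-cache-dir fasttext
WORKDIR /workspace
COPY predict.sh ./
COPY code/ ./code/
CMD ["./predict.sh"]
```

The important part is the two named build stages selected with
`--target`; what gets installed in each stage depends entirely on your
solution.
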
#### Back to the example

And it works for the example given above. With one caveat: due to an
unfortunate interaction between git-annex and Docker, you need to
_unlock_ the model files before running the Docker container:

```
$ docker build . --target predict -t sentiment140-predict

$ git annex unlock models/*

$ echo -e '2021.99999\t20211231\tGEval is awesome!' | docker run -v $(pwd)/models:/workspace/models -i sentiment140-predict
positive
```