diff --git a/README.md b/README.md
index f7d03a7..60483a9 100644
--- a/README.md
+++ b/README.md
@@ -39,6 +39,87 @@ After installing Stack:
 The last command will start the Web server with Gonito (go to
 http://127.0.0.1:3000 in your browser).
 
+Gonito & git
+------------
+
+Gonito relies on git at its core:
+
+* challenges (data sets) are provided as git repositories,
+* submissions are uploaded via git repositories and are referred to by
+  git commit hashes.
+
+Advantages:
+
+* great flexibility as to where you keep your challenges
+  and submissions (external, well-known services such as
+  GitHub or GitLab, your local git server, e.g. gitolite or Gogs, or
+  just a disk accessible to a Gonito instance),
+* even if Gonito ceases to exist, the challenges and submissions remain available
+  in a standard manner, provided that the git repositories (external or local) are
+  accessible,
+* data sets can be easily downloaded from the command line
+  (e.g. `git clone git://gonito.net/paranormal-or-skeptic`), without
+  even opening a Web browser,
+* experiments are easier to repeat and reproduce (at worst,
+  the system output is easily available via git),
+* tools that were used to generate the output can be linked as git submodules,
+* some challenge/submission metadata are tracked in a Gonito-independent way
+  (within git commits),
+* copying data can be avoided with git mechanisms (e.g. when the challenge is already
+  cloned, downloading specific submissions should be much quicker),
+* large data sets and models can be stored, if needed, using mechanisms such as git-annex (see below).
+
+### Commit structure
+
+The following flow of git commits is recommended (though not required):
+
+* the challenge without hidden data for main test sets (i.e. files such as `test-A/expected.tsv`)
+  should be pushed to the `master` branch,
+* the hidden files (`test-A/expected.tsv`) should be added in a
+  subsequent commit and pushed either to the `dont-peek` branch or to the
+  `master` branch of a separate repository (if access to the hidden
+  data must be more strictly controlled),
+* the submissions should be committed with the `master` branch as the
+  parent (or at least ancestor) commit and pushed either to the same
+  repository as the challenge data (in some user-specific branch) or to any other
+  repository (e.g. a user-owned one),
+* subsequent submissions can be derived naturally from other git commits
+  (e.g. when a submission is improved, or when two approaches are merged),
+* new versions of the challenge can be committed (a challenge can be updated in Gonito)
+  to the `master` (and `dont-peek`) branches.
+
+See also the following picture:
+
+![Recommended commit structure](misc/commits.png)
+
+### git-annex
+
+In some cases, you may not want to store challenge/submission files simply in git:
+
+* very large data files, including text files (e.g. `train/in.tsv`, even if
+  compressed as `train/in.tsv.xz`),
+* binary training/testing data (PDF files, images, movies, recordings),
+* data that is sensitive due to privacy/security concerns (a scenario where it's OK to store
+  metadata and some files in a widely accessible repository, but some files require
+  limited access),
+* large ML models (note that Gonito does not require models for evaluation, but
+  it might still be good practice to commit them along with output files and scripts).
+
+Such cases can be handled naturally with git-annex, a git
+extension for managing files and their metadata without committing
+their content to the repository. The contents can be stored in a wide
+range of [special
+remotes](https://git-annex.branchable.com/special_remotes/), e.g. S3
+buckets, WebDAV or rsync servers.
+ +It's up to you which files are stored in git in a regular manner and +which are added with `git annex add`, but note that if a +challenge/submission file must be stored via git-annex and are required +for evaluation (e.g. `expected.tsv` files for the challenge or +`out.tsv` files for submissions), the git-annex special remote must be +given when a challenge is created or a submission is done and the +Gonito server must have access to such a special remote. + Authors ------- diff --git a/misc/commits.drawio b/misc/commits.drawio new file mode 100644 index 0000000..73cf3ba --- /dev/null +++ b/misc/commits.drawio @@ -0,0 +1 @@ +7Vpbc5s6EP41zEkfzIC42H5MnOSch3ZOZzKdtk8dGTagBBAj5Ft/fVdGGDCNQ9q6kE6frF2thLT7fasV2HAW6fZfQfP4HQ8hMYgVbg3n2iDEdomFP0qz05q5Mys1kWCh1tWKO/YVtFIPjFYshKJlKDlPJMvbyoBnGQSypaNC8E3b7J4n7afmNIKO4i6gSVf7kYUyLrUzMq31/wGL4urJtj8ve1JaGeudFDEN+aahcm4MZyE4l2Ur3S4gUd6r/FKOu32i97AwAZnsMyD/ED58+T9YvXM/Wvzh6+P2jr2d6FnWNFnpDRvET3C+q5CtsRmpZhDTJIEMHaX7lqLqqjT42MaA78yR0kKC+L75GkTBeIbrsE3LrNwld1UMBF9lIahtWGi9iZmEu5wGqneDsENdLNMEJRubhRT88RArsn/YfosgJGyf9J19iAhiGXgKUuzQRA9wKjhqGHszr5Q3NSbsKtBxAw++1lENw+gwdR0pbOhgvSBwTsdLECJwtciFjHnEM5rc1Nqrth9rm7ec59p7DyDlTrOQriRv+xa9JXaf1Hhz6lXyZz3fXrjetqSdlp4Kilrz6ZDgFvlKBHDCFUSnBSoikM9hvRtiAQmVbN1exy+PF+kQLeSZnOQAj0+z6hRHevLuQkIhJ5doB9scEySEpizWaochQuFNf0aPjpOu3+ak61n9ODk9Fye9gTlJxsNJ9zVw0u1w8jJhAfxTqB2ulikrSsodU2RPHcNBUlnG9CqE9QQRdauJNsV5bvlKljRbqBKGZ0xyc0dTtaKL5swWboSGVNI3pWkRCJbLohSuefAI4p4hTvZyqqor3QcyMA0yxqPS9o5oafekpW2di5fTgXk5Hlr6PWnpDklL/wQtWZoLvsa4nGboGHlB2rwg86GPK7ubPn4DL86K7/lrOHbmHXyfuGA16z57nHcjpw1sx+oJ7LPdjWzyxwHbrt59PIdsMiSy7fkgJ+2WyU/VWYrtz/Whi1J9zCqhOmXHeZPtHeX5oFHuvjT68bus/ZK7bApC5UiL36tUwrAMV5Oo+lktp7wbqx1nqj5Y5dihrBEvVa/9qi+8s6MKwumZaGdnS7T+38q6coXXk7r+oNQd+irkWLNmyCaWafnTZ+K2l96DYOgDEKMK5rB52Ovk4X7vL8rMqHLkPRdlVjJHXGAeXgxU35AGf9F3ANLfvFf5fdxUqVbZoMoVX3aIMjroO57bgv50cOQP8t3pvAgmPRFc5tufgPB+6KUQdNcwyDnL1Bvfw8zvlaKGAHHb2W929JX3yNz2T5ljo3x+jYDDRn4CFO7A6XBEHz56o6nxH4AhEiLpJMRFTEXCutWD+twQ8HTJMpZF2L4AM1LlAmQFpMsElXip
svaVx+Hu1U2uShphgiVHtcVk1jPDzl+eYVGs/+1Rcq/+04xz8w0= \ No newline at end of file diff --git a/misc/commits.png b/misc/commits.png new file mode 100644 index 0000000..4ca4d9f Binary files /dev/null and b/misc/commits.png differ
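
The commit flow recommended in the README text above can be tried out locally with plain git. The script below is a minimal sketch, not something Gonito itself provides: the temporary repository, the submission branch name `submission-alice` and all file contents are hypothetical placeholders; only the `master`/`dont-peek` branch names and the `test-A/` paths come from the README.

```shell
#!/bin/sh
# Sketch of the recommended Gonito commit flow in a throwaway repository.
# Hypothetical: repo location, branch "submission-alice", all file contents.
set -e

repo="$(mktemp -d)/my-challenge"
git init -q "$repo"
cd "$repo"
# Make sure the first branch is called master, whatever init.defaultBranch says.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# 1. The challenge WITHOUT hidden test data goes to master.
mkdir -p train test-A
printf 'training input\n' > train/in.tsv
printf 'test input\n' > test-A/in.tsv
git add train test-A
git commit -q -m "Challenge without hidden test data"

# 2. Hidden expected outputs are added in a subsequent commit on dont-peek.
git checkout -q -b dont-peek
printf 'expected output\n' > test-A/expected.tsv
git add test-A/expected.tsv
git commit -q -m "Add hidden expected outputs"

# 3. A submission is committed with master as its parent,
#    on a user-specific branch (so it never sees expected.tsv).
git checkout -q -b submission-alice master
printf 'system output\n' > test-A/out.tsv
git add test-A/out.tsv
git commit -q -m "First submission"
```

Running `git log --graph --oneline --all` in the resulting repository should show `dont-peek` and `submission-alice` both branching off `master`. Pushing the branches to a remote is left out on purpose, since the README leaves the choice of git server open.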