
Gonito platform

Gonito (pronounced ɡɔ̃ˈɲitɔ) is a Kaggle-like platform for machine learning competitions (disclaimer: Gonito is neither affiliated with nor endorsed by Kaggle).

What's so special about Gonito:

  • free & open-source (GPL), you can use it on your own, in your company, at your university, etc.
  • git-based (challenges and solutions are submitted only with git).

See the home page (and an instance of Gonito) at https://gonito.net .

Installation

For development

Gonito is written in Haskell and uses the Yesod web framework, but all you need is the Stack tool. See https://github.com/commercialhaskell/stack for instructions on how to install Stack on your computer.

By default, Gonito uses PostgreSQL, so it needs to be installed and running on your computer.

After installing Stack:

createdb -E utf8 gonito   # Postgres needs to be configured
git clone --recurse-submodules git://gonito.net/gonito
cd gonito
stack setup
# before starting the build you might need some non-Haskell dependencies, e.g. in Ubuntu:
# sudo apt-get install libbz2-dev liblzma-dev libpcre3-dev libcairo-dev libfcgi-dev
stack build
stack exec yesod devel

The last command will start the Web server with Gonito (go to http://127.0.0.1:3000 in your browser).

With docker-compose

The easiest way to run Gonito is with docker-compose.

git clone --recurse-submodules https://gitlab.com/filipg/gonito
cd gonito
cp sample.env .env
# now you need to edit .env manually;
# in particular, you need to set the administrator's
# password and the paths to the volumes for
# cloned data ("arena"), certificates and SSH data;
# you also need to set up your certificate;
# here is an easy way to do it just for local
# testing
mkdir certs
cd certs
# generating certificates for HTTPS, remember to
# set the `NGINX_CERTIFICATE_DIR` variable in `.env`
# so that it would point to `certs` here
openssl req -x509 -newkey rsa:4096 -keyout privkey.pem -out fullchain.pem -days 365 -nodes
cd ..
docker-compose up

Gonito will be available at https://127.0.0.1/. Of course, your browser will complain about a "Potential Security Risk", as the certificate generated above is self-signed.
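
Before running docker-compose up, it can help to check that .env defines what you meant to set. The following is a minimal Python sketch and not part of Gonito: the function names are hypothetical, the parser handles only simple KEY=VALUE lines, and the file names privkey.pem and fullchain.pem are simply those produced by the openssl command above. NGINX_CERTIFICATE_DIR is the only variable name taken from this README.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_certificate_files(env, names=("privkey.pem", "fullchain.pem")):
    """Return the certificate files not found in NGINX_CERTIFICATE_DIR."""
    cert_dir = env.get("NGINX_CERTIFICATE_DIR")
    if not cert_dir:
        return list(names)
    return [name for name in names if not (Path(cert_dir) / name).is_file()]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        env = parse_env(env_file.read_text())
        print("missing certificate files:", missing_certificate_files(env))
    else:
        print(".env not found; copy sample.env first")
```

An empty list means both files were found in the directory that NGINX_CERTIFICATE_DIR points to.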

Gonito as backend

On the one hand, Gonito is a monolithic Web application with no separation between front-end and back-end. On the other hand, some features are provided as end-points, so that Gonito can be used with any front-end. The documentation in the Swagger format is provided at /static/swagger-ui/index.html (see https://gonito.net/static/swagger-ui/index.html for the main instance).

Keycloak is assumed as the identity provider here for those end-points that require authorization.

Asynchronous jobs

Some tasks (e.g. evaluating a submitted solution or creating a challenge) can take a long time, so they must be run asynchronously. End-points for such actions return a job ID (a number). There are two options for showing the logs:

  1. The front-end can show the logs using web sockets, see https://gitlab.com/filipg/gonito/-/blob/master/static/test-gonito-as-backend.html#L133 for an example.
  2. The front-end can just redirect the user to /api/view-progress-with-web-sockets/jobID, where showing the logs will be handled directly by Gonito (no authorization is needed there).

It's recommended to test showing logs with the test end-point /api/test-progress/N/D, which just counts up to N with a D-second delay (e.g. /api/test-progress/10/2).
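
As a small illustration, a front-end written in Python could build the two URLs mentioned above like this. This is only a sketch: the helper names are hypothetical, and the base URL is the development address used earlier in this README.

```python
def view_progress_url(base_url, job_id):
    """URL of the page where Gonito itself shows the logs of a job."""
    return f"{base_url.rstrip('/')}/api/view-progress-with-web-sockets/{job_id}"


def progress_test_url(base_url, count, delay_seconds):
    """URL of the test end-point that counts up to `count` with a delay."""
    return f"{base_url.rstrip('/')}/api/test-progress/{count}/{delay_seconds}"


if __name__ == "__main__":
    base = "http://127.0.0.1:3000"  # development server address (assumption)
    print(view_progress_url(base, 42))      # 42 is a made-up job ID
    print(progress_test_url(base, 10, 2))
```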

Integration with Keycloak

Gonito can be easily integrated with Keycloak for the back-end end-points (but not yet for signing in to Gonito as the monolithic Web application; this feature is on the way).

  1. Let's assume that you have a Keycloak instance. A simple way to run one for development and testing is: docker run -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=admin -p 8080:8080 jboss/keycloak.

  2. You need to set up the JWK key from your Keycloak instance. Go to https://<KEYCLOAK-HOST>/auth/realms/<KEYCLOAK-REALM>/protocol/openid-connect/certs (e.g. for the Docker run given in (1): http://127.0.0.1:8080/auth/realms/master/protocol/openid-connect/certs) and copy the contents of the key from the JSON (the keys/0 element, not the whole JSON!).

  3. Create a gonito client in Keycloak (Clients / Create).

  4. Set Valid Redirect URIs for the gonito client in Keycloak (e.g. simply add * there).

  5. Set Web Origins for the gonito client in Keycloak (e.g. simply add * there).

  6. Add a test user and set a first and last name for them.

  7. Set the JSON_WEB_KEY variable (or GONITO_JSON_WEB_KEY when using docker-compose) to the content of the JWK key and run Gonito.
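
Copying the key by hand is error-prone, so here is a minimal Python sketch of step 2. It assumes the certs end-point returns a standard JWK Set, i.e. a JSON object with a keys array, and it takes the first element. The function name is hypothetical, the sample values are made up, and whether Gonito needs the key in compact form is not stated here, so treat the output format as an assumption.

```python
import json


def first_jwk(certs_json):
    """Return the first key of a JWK Set as a compact JSON string."""
    keys = json.loads(certs_json)["keys"]
    if not keys:
        raise ValueError("the JWK Set contains no keys")
    return json.dumps(keys[0], separators=(",", ":"))


if __name__ == "__main__":
    # Placeholder document with the shape of a JWK Set; the values are made up.
    sample = (
        '{"keys": [{"kid": "example-id", "kty": "RSA", "alg": "RS256",'
        ' "use": "sig", "n": "...", "e": "AQAB"}]}'
    )
    print(first_jwk(sample))
```

The printed string is what would go into JSON_WEB_KEY (or GONITO_JSON_WEB_KEY). If your Keycloak instance lists several keys, check that the first one is the signing key you want.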

If you create a new user, you need to call the /api/add-info GET end-point. No parameters are needed; it just reads the user's data from the token and adds a record to the Gonito database.
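
A call to this end-point could be sketched in Python as follows. Two things are assumptions here and are not confirmed by this README: that the Keycloak access token is passed in an Authorization: Bearer header, and the base URL of the instance. The function name is hypothetical, and the request is only built, not sent.

```python
import urllib.request


def add_info_request(base_url, access_token):
    """Build (but do not send) a GET request for /api/add-info."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/add-info",
        headers={"Authorization": f"Bearer {access_token}"},  # assumed scheme
        method="GET",
    )


if __name__ == "__main__":
    request = add_info_request("http://127.0.0.1:3000", "<ACCESS-TOKEN>")
    print(request.get_method(), request.full_url)
    # To actually call a running instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode())
```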

You can simulate a front-end by going to /static/test-gonito-as-backend.html.

Menuless mode

If you want to combine an external front-end with some features of the Gonito native front-end, you can run Gonito in menuless mode by setting MENULESS to true. This way, not all the functions of native Gonito will be shown.

Gonito & git

Gonito uses git in an inherent manner:

  • challenges (data sets) are provided as git repositories,
  • submissions are uploaded via git repositories and are referred to by git commit hashes.

Advantages:

  • great flexibility as far as where you want to keep your challenges and submissions (could be external, well-known services such as GitHub or GitLab, your local git server, let's say gitolite or Gogs, or just a disk accessible in a Gonito instance),
  • even if Gonito ceases to exist, the challenges and submissions are still available in a standard manner, provided that git repositories (be it external or local) are accessible,
  • data sets can be easily downloaded using the command line (e.g. git clone git://gonito.net/paranormal-or-skeptic), without even clicking anything in the Web browser,
  • facilitates experiment repeatability and reproducibility (at worst, the system output is easily available via git),
  • tools that were used to generate the output can be linked as git submodules,
  • some challenge/submission metadata are tracked in a Gonito-independent way (within git commits),
  • copying data can be avoided with git mechanisms (e.g. when the challenge is already cloned, downloading specific submissions should be much quicker),
  • large data sets and models could be stored if needed using mechanisms such as git-annex (see below).

Commit structure

The following flow of git commits is recommended (though not required):

  • the challenge without the hidden data for the main test sets (i.e. without files such as test-A/expected.tsv) should be pushed to the master branch,
  • the hidden files (test-A/expected.tsv) should be added in a subsequent commit and pushed either to the dont-peek branch or to the master branch of a separate repository (if access to the hidden data must be stricter),
  • the submissions should be committed with the master branch as the parent (or at least an ancestor) commit and pushed to the same repository as the challenge data (in some user-specific branch) or to any other repository (e.g. a user-owned repository),
  • any subsequent submissions can be derived in a natural way from other git commits (e.g. when a submission is improved, or even when two approaches are merged),
  • new versions of the challenge can be committed to the master (and dont-peek) branches (a challenge can be updated at Gonito).

See also the following picture:

[Figure: Recommended commit structure]
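
The parent/ancestor relation in the recommended flow can be illustrated with a toy model in Python. This is not how Gonito checks anything; the commit ids are made up and the history is written out by hand as a mapping from each commit to its parents.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` (or is the same commit).

    `parents` maps each commit id to the list of its parent commit ids.
    """
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


# A toy history following the recommended flow (commit ids are made up).
history = {
    "challenge-v1": [],                # master: challenge without hidden data
    "hidden-v1": ["challenge-v1"],     # dont-peek: adds test-A/expected.tsv
    "submission-1": ["challenge-v1"],  # a user's first submission
    "submission-2": ["submission-1"],  # an improved submission
    "unrelated": [],                   # not derived from the challenge
}

if __name__ == "__main__":
    for commit in ("submission-1", "submission-2", "unrelated"):
        print(commit, is_ancestor(history, "challenge-v1", commit))
```

In this model, both submissions have the challenge commit as an ancestor, while neither has the dont-peek commit in its history. On a real repository, the same question can be answered with git merge-base --is-ancestor.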

git-annex

In some cases, you don't want to store challenge/submission files simply in git:

  • very large data files, including textual files (e.g. train/in.tsv, even if compressed as train/in.tsv.xz)
  • binary training/testing data (PDF files, images, movies, recordings)
  • data sensitive due to privacy/security concerns (a scenario where it's OK to store metadata and some files in a widely accessible repository, but some files require limited access)
  • large ML models (note that Gonito does not require models for evaluation, but still it might be a good practice to commit them along with output files and scripts)

Such cases can be handled in a natural manner using git-annex, a git extension for handling files and their metadata without committing their content to the repository. The contents can be stored at a wide range of special remotes, e.g. S3 buckets, WebDAV, rsync servers.

It's up to you which files are stored in git in a regular manner and which are added with git annex add. Note, however, that if challenge/submission files are stored via git-annex and are required for evaluation (e.g. expected.tsv files for the challenge or out.tsv files for submissions), the git-annex special remote must be given when the challenge is created or the submission is made, and the Gonito server must have access to that special remote.

Integration with Slack/Discord

Gonito can send announcements to Slack or Discord via webhooks, e.g. when a new best result is achieved. Simply set the ANNOUNCEMENT_HOOK environment variable to a Slack/Discord webhook URL.
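
Before setting ANNOUNCEMENT_HOOK, you may want to check by hand that your webhook accepts messages. The sketch below only builds the JSON body for such a manual test; it does not describe the payload Gonito itself sends. It relies on Slack incoming webhooks reading the message from a text field and Discord webhooks from a content field. The function name and the example URLs are placeholders.

```python
import json
from urllib.parse import urlparse


def announcement_payload(hook_url, message):
    """Return the JSON body for a plain text message to a Slack or Discord webhook."""
    host = urlparse(hook_url).hostname or ""
    # Slack incoming webhooks read the message from "text", Discord from "content".
    is_discord = host.endswith("discord.com") or host.endswith("discordapp.com")
    return json.dumps({"content" if is_discord else "text": message})


if __name__ == "__main__":
    # Placeholder URLs; use the one you intend to put into ANNOUNCEMENT_HOOK.
    print(announcement_payload("https://hooks.slack.com/services/T000/B000/XXXX", "test"))
    print(announcement_payload("https://discord.com/api/webhooks/123/abc", "test"))
```

The resulting body can be sent to the webhook with an HTTP POST and a Content-Type: application/json header.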

Authors

  • Filip Graliński

References

@inproceedings{gralinski:2016:gonito,
  title="{Gonito.net - Open Platform for Research Competition, Cooperation and Reproducibility}",
  author={Grali{\'n}ski, Filip and Jaworski, Rafa{\l} and Borchmann, {\L}ukasz and Wierzcho{\'n}, Piotr},
  booktitle="{Branco, Ant{\'o}nio and Nicoletta Calzolari and Khalid Choukri (eds.), Proceedings of the 4REAL Workshop: Workshop on Research Results Reproducibility and Resources Citation in Science and Technology of Language}",
  pages={13--20},
  year=2016,
  url="http://4real.di.fc.ul.pt/wp-content/uploads/2016/04/4REALWorkshopProceedings.pdf"
}