Go to file
Filip Gralinski 1590b20d2f Bump up version 2022-01-19 12:48:58 +01:00
Data Update geval 2020-10-19 08:14:09 +02:00
Gonito Add dependency tracking 2018-11-16 12:43:44 +01:00
Handler Hook up individual keys 2022-01-19 12:46:23 +01:00
Import switch to Stack LTS 9.5, remove Fay 2017-09-22 14:23:03 +02:00
Settings init 2015-08-20 22:33:38 +02:00
Web Introduce course-specific announcement hooks 2022-01-18 22:54:07 +01:00
app Switch to an incompatible DB scheme 2021-02-27 11:48:30 +01:00
arena show Readme 2015-09-06 14:24:49 +02:00
config Bump up version 2022-01-19 12:48:58 +01:00
fay init 2015-08-20 22:33:38 +02:00
fay-shared init 2015-08-20 22:33:38 +02:00
geval@2dbcedb40e Update GEval 2021-10-29 18:00:11 +02:00
helpers/gitolite Add post update gitolite script 2022-01-15 16:52:06 +01:00
messages Introduce course-specific announcement hooks 2022-01-18 22:54:07 +01:00
misc Add documentation on git 2020-12-31 11:58:54 +01:00
sql-scripts Add helper script for conversion to version 2 2021-02-27 11:27:04 +01:00
static Fix end-point for making a submission public 2021-09-25 15:12:51 +02:00
templates Generate invidual key 2022-01-19 10:48:38 +01:00
test Fix test 2021-05-12 07:03:25 +02:00
.dir-locals.el init 2015-08-20 22:33:38 +02:00
.dockerignore Switch to new Dockerfile 2021-05-12 07:03:17 +02:00
.ghci init 2015-08-20 22:33:38 +02:00
.gitignore Minor improvement in .gitignore 2021-08-17 17:34:31 +02:00
.gitlab-ci.yml Pinpoint docker image 2019-08-24 12:18:06 +02:00
.gitmodules Switch to new Dockerfile 2021-05-12 07:03:17 +02:00
Application.hs Add page for a course 2022-01-17 21:58:09 +01:00
CHANGELOG.md Bump up version 2022-01-19 12:48:58 +01:00
Dockerfile Switch to new Dockerfile 2021-05-12 07:03:17 +02:00
Foundation.hs Introduce course-specific announcement hooks 2022-01-18 22:54:07 +01:00
Import.hs switch to Stack LTS 9.5, remove Fay 2017-09-22 14:23:03 +02:00
Model.hs Introduce basic structures for teams 2021-03-03 09:19:34 +01:00
PersistEvaluationScheme.hs More diagnostic 2021-02-27 15:10:08 +01:00
PersistMetric.hs add leaderboard 2015-12-12 18:53:20 +01:00
PersistSHA1.hs Fix swagger documentation for version-info 2021-09-02 21:13:21 +02:00
PersistTeamActionType.hs A team captain cain invite other members 2021-03-13 11:21:28 +01:00
README.md Introduce the menuless mode 2021-08-21 11:48:05 +02:00
Settings.hs Introduce NoInternalGitServer scheme 2021-08-21 15:02:08 +02:00
add-variants.sql helper script for transition to multiple variants 2018-07-04 16:43:50 +02:00
add-versions.sql Fix helper script 2019-08-27 23:01:12 +02:00
build.sh Fix building script 2018-09-14 15:52:45 +02:00
docker-compose-simple.yml Introduce NoInternalGitServer scheme 2021-08-21 15:02:08 +02:00
docker-compose.yml Introduce NoInternalGitServer scheme 2021-08-21 15:02:08 +02:00
fix-out.sql variants are used within within outs - transition completed 2018-07-06 16:54:17 +02:00
gonito.cabal Bump up version 2022-01-19 12:48:58 +01:00
gpl-3.0.txt Add GPL license file 2021-02-15 21:34:54 +01:00
nginx.conf Whether using web socket for showing progress is configurable 2021-02-27 22:51:40 +01:00
pack.sh improve packing script 2015-12-20 21:54:37 +01:00
sample.env Introduce NoInternalGitServer scheme 2021-08-21 15:02:08 +02:00
stack.yaml Switch to new Dockerfile 2021-05-12 07:03:17 +02:00

README.md

Gonito platform

Gonito (pronounced ɡɔ̃ˈɲitɔ) is a Kaggle-like platform for machine learning competitions (disclaimer: Gonito is neither affiliated with nor endorsed by Kaggle).

What's so special about Gonito:

  • free & open-source (GPL), you can use it your own, in your company, at your university, etc.
  • git-based (challenges and solutions are submitted only with git).

See the home page (and an instance of Gonito) at https://gonito.net .

Installation

For development

Gonito is written in Haskell and uses Yesod Web Framework, but all you need is just the Stack tool. See https://github.com/commercialhaskell/stack for instruction how to install Stack on your computer.

By default, Gonito uses Postgresql, so it needs to be installed and running at your computer.

After installing Stack:

createdb -E utf8 gonito   # Postgres needs to be configured
git clone --recurse-submodules git://gonito.net/gonito
cd gonito
stack setup
# before starting the build you might need some non-Haskell dependencies, e.g. in Ubuntu:
# sudo apt-get install libbz2-dev liblzma-dev libpcre3-dev libcairo-dev libfcgi-dev
stack build
stack exec yesod devel

The last command will start the Web server with Gonito (go to http://127.0.0.1:3000 in your browser).

With docker-compose

The easiest way to run Gonito is with docker-compose.

git clone --recurse-submodules https://gitlab.com/filipg/gonito
cd gonito
cp sample.env .env
# now you need to edit .env manually,
# in particular, you need to set up the administrator's
# password and paths to volumes for the volumes,
# cloned data ("arena"), certificates and SSH data;
# also you need to set up your certificate
# here is an easy way to do it just for local
# testing
mkdir certs
cd certs
# generating certificates for HTTPS, remember to
# set the `NGINX_CERTIFICATE_DIR` variable in `.env`
# so that it would point to `certs` here
openssl req -x509 -newkey rsa:4096 -keyout privkey.pem -out fullchain.pem -days 365 -nodes
cd ..
docker-compose up

Gonito will be available at https://127.0.0.1/. Of course, your browser will complain about "Potential Security Risk" as these are local certificates.

Gonito as backend

On the one hand, Gonito is a monolithic Web application without front- and back-end separated. On the other, some features are provided as end-points, so that Gonito could be used with whatever front-end. The documentation in the Swagger format is provided at /static/swagger-ui/index.html. (see https://gonito.net/static/swagger-ui/index.html for this at the main instance).

Keycloak is assumed as the identity provider here for those end-points that require authorization.

Asynchronous jobs

Some tasks (e.g. evaluating a submitted solution, creating a challenge) can take more time, so they must be run in a asynchronous manner. End-points for such actions return a job ID (a number). There are two options to show the logs:

  1. The front-end can show the logs using web sockets, see https://gitlab.com/filipg/gonito/-/blob/master/static/test-gonito-as-backend.html#L133 for an example.
  2. The front-end can just redirect the user to /api/view-progress-with-web-sockets/jobID, where showing the logs will be handled directly by Gonito (no authorization is needed there).

It's recommended to test showing logs with the test end-point /api/test-progress/N/D, which just counts up to N with a D-second delay (e.g. /api/test-progress/10/2).

Integration with Keycloak

Gonito can be easily integrated with Keycloak for the back-end end-points (but not yet for signing in Gonito as the monolithic Web application, this feature is on the way).

  1. Let's assume that you have a Keycloak instance. A simple way to run for development and testing is: docker run -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=admin -p 8080:8080 jboss/keycloak.

  2. You need to set up the JWK key from your Keycloak instance. Go to https://<KEYCLOAK-HOST>/auth/realms/<KEYCLOAK-REALM>/protocol/openid-connect/certs (e.g. for the Docker run as given in (1): http://127.0.0.1:8080/auth/realms/master/protocol/openid-connect/certs) and copy the contents of the key from the JSON the (key/0 element not the whole JSON!).

  3. Create gonito client in Keycloak (Clients / Create).

  4. Set Valid Redirect URIs for the gonito client in Keycloak (e.g. simply add * there).

  5. Set Web Origin for the gonito client in Keycloak (e.g. simply add * there).

  6. Add some test user, set up some first/last name for them.

  7. Set JSON_WEB_KEY variable to the content of the JWK key (or GONITO_JSON_WEB_KEY when using docker-compose) and run Gonito.

If you create a new user, you need to run /api/add-info GET end-point. No parameters are needed it just read the user's data from the token and adds a record to the Gonito database.

You can simulate a front-end by going to /static/test-gonito-as-backend.html.

Menuless mode

If you want to combine an external front-end with some features of the Gonito native front-end, you can run Gonito in a menuless mode setting MENULESS to true. This way, you will not show all the functions of native Gonito.

Gonito & git

Gonito uses git in an inherent manner:

  • challenges (data sets) are provided as git repositories,
  • submissions are uploaded via git repositories, they are referred to with git commit hashes.

Advantages:

  • great flexibility as far as where you want to keep your challenges and submissions (could be external, well-known services such as GitHub or GitLab, your local git server, let's say gitolite or Gogs, or just a disk accessible in a Gonito instance),
  • even if Gonito ceases to exist, the challenges and submissions are still available in a standard manner, provided that git repositories (be it external or local) are accessible,
  • data sets can be easily downloaded using the command line (e.g. git clone git://gonito.net/paranormal-or-skeptic), without even clicking anything in the Web browser,
  • facilitates experiment repeatability and reproducibility (at worst the system output is easily available via git)
  • tools that were used to generate the output could be linked as git subrepositories
  • some challenge/submission metadata are tracked in a Gonito-independent way (within git commits),
  • copying data can be avoided with git mechanisms (e.g. when the challenge is already cloned, downloading specific submissions should be much quicker),
  • large data sets and models could be stored if needed using mechanisms such as git-annex (see below).

Commit structure

The following flow of git commits is recommended (though not required):

  • the challenge without hidden data for main test sets (i.e. files such as test-A/expected.tsv) should be pushed to the master branch
  • the hidden files (test-A/expected.tsv) should be added in a subsequent commit and pushed either to the dont-peek branch or a master branch of a separate repository (if access to the hidden data must be more strict),
  • the submissions should be committed with the master branch as the parent (or at least ancestor) commit and pushed to the same repository as the challenge data (in some user-specific branch) or any other repository (could be user-owned repositories)
  • any subsequent submissions could be derived in a natural way from other git commits (e.g. when a submission is improved, or even two approaches are merged)
  • new versions of the challenge can be committed (a challenge can be updated at Gonito) to the master (and dont-peek) branches

See also the following picture:

Recommended commit structure

git-annex

In some cases, you don't want to store challenge/submissions files simply in git:

  • very large data files, textual files (e.g. train/in.tsv even if compressed as train/in.tsv.xz)
  • binary training/testing data (PDF files, images, movies, recordings)
  • data sensitive due to privacy/security concerns (a scenario where it's OK to store metadata and some files in a widely accessible repository, but some files require limited access)
  • large ML models (note that Gonito does not require models for evaluation, but still it might be a good practice to commit them along with output files and scripts)

Such cases can be handled in a natural manner using git-annex, a git extension for handling files and their metadata without commiting their content to the repository. The contents can be stored at a wide range of special remotes, e.g. S3 buckets, WebDAV, rsync servers.

It's up to you which files are stored in git in a regular manner and which are added with git annex add, but note that if a challenge/submission file must be stored via git-annex and are required for evaluation (e.g. expected.tsv files for the challenge or out.tsv files for submissions), the git-annex special remote must be given when a challenge is created or a submission is done and the Gonito server must have access to such a special remote.

Integration with Slack/Discord

Gonito can send announcements to Slack or Discord via web hooks, e.g. when new best result is achieved. Simply set the ANNOUNCEMENT_HOOK environment variable to a Slack/Discord webhook.

Authors

  • Filip Graliński

References

@inproceedings{gralinski:2016:gonito,
  title="{Gonito.net - Open Platform for Research Competition, Cooperation and Reproducibility}",
  author={Grali{\'n}ski, Filip and Jaworski, Rafa{\l} and Borchmann, {\L}ukasz and Wierzcho{\'n}, Piotr},
  booktitle="{Branco, Ant{\'o}nio and Nicoletta Calzolari and Khalid Choukri (eds.), Proceedings of the 4REAL Workshop: Workshop on Research Results Reproducibility and Resources Citation in Science and Technology of Language}",
  pages={13--20},
  year=2016,
  url="http://4real.di.fc.ul.pt/wp-content/uploads/2016/04/4REALWorkshopProceedings.pdf"
}