Jakub Pokrywka 2022-02-12 19:14:40 +01:00
commit 146796c710
10 changed files with 935118 additions and 0 deletions

8
.gitignore vendored Normal file

@@ -0,0 +1,8 @@
*~
*.swp
*.bak
*.pyc
*.o
.DS_Store
.token

35
README.md Normal file

@@ -0,0 +1,35 @@
Wikipedia discover date
======================
Guess the masked date in a Wikipedia article.
Only dates from 1835 to 1935 are considered.
Predict exactly one masked date per article: the one marked as [DATEPREDICT].
Other masked dates are marked as [DATEMASK]. The predicted date should be in fractional year format.
Fractional year format:
Year is normalized as follows:
```
days_in_year = 366 if is_leap else 365
normalized = d.year + ((day_of_year-1) / days_in_year)
```
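The normalization above leaves `is_leap`, `d`, and `day_of_year` undefined. A minimal runnable sketch, assuming `d` is a `datetime.date` and `day_of_year` is 1-based (1 January is day 1); the function name `fractional_year` is hypothetical and not part of the challenge:

```python
import calendar
from datetime import date


def fractional_year(d: date) -> float:
    """Normalize a calendar date to a fractional year (sketch of the README formula)."""
    # Leap years have 366 days, all other years 365.
    days_in_year = 366 if calendar.isleap(d.year) else 365
    # tm_yday is 1-based, so 1 January maps to exactly d.year.
    day_of_year = d.timetuple().tm_yday
    return d.year + (day_of_year - 1) / days_in_year
```

Under these assumptions, `fractional_year(date(1904, 7, 2))` gives `1904.5`, because 2 July is day 184 of a 366-day year.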
Directory structure
-------------------
* `README.md` — this file
* `config.txt` — configuration file
* `train/` — directory with training data
* `train/in.tsv.xz` — input data for the train set (xz-compressed)
* `train/expected.tsv` — expected (reference) data for the train set
* `dev-0/` — directory with dev (test) data
* `dev-0/in.tsv.xz` — input data for the dev set (xz-compressed)
* `dev-0/expected.tsv` — expected (reference) data for the dev set
* `test-A/` — directory with test data
* `test-A/in.tsv.xz` — input data for the test set (xz-compressed)
* `test-A/expected.tsv` — expected (reference) data for the test set
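The input files in this commit are stored xz-compressed (`in.tsv.xz`). A minimal sketch for reading them, assuming each line holds two tab-separated columns in the order given by `in-header.tsv` (`title`, then `text`) and contains at least one tab; the helper name `read_inputs` is hypothetical:

```python
import lzma
from typing import Iterator, Tuple


def read_inputs(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) pairs from an xz-compressed TSV file."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            # Split on the first tab only, so tabs inside the text column
            # (if any) stay part of the text.
            title, text = line.rstrip("\n").split("\t", 1)
            yield title, text
```

Splitting by hand rather than with the `csv` module avoids its default field size limit, which long article texts could exceed.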

1
config.txt Normal file

@@ -0,0 +1 @@
--metric RMSE-Against-Interval --metric MAE-Against-Interval --precision 2 --in-header in-header.tsv --out-header out-header.tsv
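
The configuration names two metrics but the commit does not define them. The sketch below is only an illustration under an explicit assumption: that "Against-Interval" means each expected value is an interval `[lo, hi]`, with zero error for predictions inside it and the distance to the nearest endpoint otherwise. That reading is not confirmed by anything in this commit, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def interval_error(pred: float, interval: Interval) -> float:
    """Distance from pred to the interval; 0.0 if pred lies inside it (assumed semantics)."""
    lo, hi = interval
    if pred < lo:
        return lo - pred
    if pred > hi:
        return pred - hi
    return 0.0


def mae_against_interval(preds: Sequence[float], intervals: Sequence[Interval]) -> float:
    """Mean absolute interval error over all items."""
    errors = [interval_error(p, iv) for p, iv in zip(preds, intervals)]
    return sum(errors) / len(errors)


def rmse_against_interval(preds: Sequence[float], intervals: Sequence[Interval]) -> float:
    """Root of the mean squared interval error over all items."""
    errors = [interval_error(p, iv) for p, iv in zip(preds, intervals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

For example, predictions `[1900.0, 1910.0]` against intervals `[(1899.0, 1901.0), (1912.0, 1913.0)]` have errors 0 and 2 under this assumption, so the MAE is 1.0 and the RMSE is the square root of 2.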

19120
dev-0/expected.tsv Normal file

File diff suppressed because it is too large

BIN
dev-0/in.tsv.xz Normal file

Binary file not shown.

1
in-header.tsv Normal file

@@ -0,0 +1 @@
title text

1
out-header.tsv Normal file
View File

@@ -0,0 +1 @@
fractional_year

BIN
test-A/in.tsv.xz Normal file

Binary file not shown.

915952
train/expected.tsv Normal file

File diff suppressed because it is too large

BIN
train/in.tsv.xz Normal file

Binary file not shown.