2022-02-12 19:14:40 +01:00
|
|
|
|
2022-02-12 19:30:51 +01:00
|
|
|
Wiki Historian En
|
2022-02-12 19:14:40 +01:00
|
|
|
======================
|
|
|
|
|
|
|
|
Guess the masked date in an wikipedia article.
|
|
|
|
|
|
|
|
Only dates 1835-1935 are considered.
|
|
|
|
|
|
|
|
You should predict only one masked date in an article described as [DATEPREDICT].
|
|
|
|
[DATEMASK] are other masked dates. The predicted date should be in fractional year format.
|
|
|
|
|
|
|
|
Fractional year format:
|
|
|
|
|
|
|
|
Year is normalized as follows:
|
|
|
|
|
|
|
|
'''
|
|
|
|
days_in_year = 366 if is_leap else 365
|
|
|
|
normalized = d.year + ((day_of_year-1) / days_in_year)
|
|
|
|
'''
|
|
|
|
|
|
|
|
|
|
|
|
Directory structure
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
* `README.md` — this file
|
|
|
|
* `config.txt` — configuration file
|
|
|
|
* `train/` — directory with training data
|
|
|
|
* `train/in.tsv` — input data for the train set
|
|
|
|
* `train/expected.tsv` — expected (reference) data for the train set
|
|
|
|
* `dev-0/` — directory with dev (test) data
|
|
|
|
* `dev-0/in.tsv` — input data for the dev set
|
|
|
|
* `dev-0/expected.tsv` — expected (reference) data for the dev set
|
|
|
|
* `test-A` — directory with test data
|
|
|
|
* `test-A/in.tsv` — input data for the test set
|
|
|
|
* `test-A/expected.tsv` — expected (reference) data for the test set
|