Jakub Pokrywka 2cfd344ac5 most common value in train 2023-11-16 16:54:06 +01:00
dev-0 most common value in train 2023-11-16 16:54:06 +01:00
most_common_from_train most common value in train 2021-07-06 10:39:57 +02:00
roberta_no_year roberta base with no year emb 2021-09-18 09:20:14 +02:00
roberta_no_year_better_finetunning roberta no year better finetunning 2021-09-30 14:01:17 +02:00
roberta_no_year_better_finetunning_1k roberta_no_year_better_finetunning 1k trainset 2022-02-19 10:12:00 +01:00
roberta_no_year_better_finetunning_10k roberta_no_year_better_finetunning 10k trainset 2022-02-18 12:25:44 +01:00
roberta_no_year_better_finetunning_100k roberta_no_year_better_finetunning 100k trainset 2022-02-17 15:15:58 +01:00
roberta_no_year_from_scratch roberta no year from scratch better finetuning 2021-10-14 11:38:26 +02:00
roberta_temp a 2021-09-24 15:29:02 +02:00
roberta_year_as_text_better_finetunning roberta_year_as_text_better_finetunning 2021-10-07 08:55:15 +02:00
roberta_year_as_text_better_finetunning_1k roberta_year_as_text_better_finetunning 1k trainset 2022-02-21 09:06:02 +01:00
roberta_year_as_text_better_finetunning_10k roberta_year_as_text_better_finetunning 10k trainset 2022-02-20 22:03:28 +01:00
roberta_year_as_text_better_finetunning_100k roberta_year_as_text_better_finetunning 100k trainset 2022-02-20 09:49:36 +01:00
roberta_year_as_text_better_finetunning_only_day roberta_year_as_text_better_finetunning_only_day 2021-10-21 11:00:51 +02:00
roberta_year_as_text_better_finetunning_only_month roberta_year_as_text_better_finetunning_only_month 2021-10-21 23:16:39 +02:00
roberta_year_as_text_better_finetunning_only_weekday roberta_year_as_text_better_finetunning_only_weekday 2021-10-22 10:35:16 +02:00
roberta_year_as_text_better_finetunning_only_year roberta_year_as_text_better_finetunning_only_year 2021-10-20 22:36:20 +02:00
roberta_year_as_text_better_finetunning_scientific_representation roberta_year_as_text_better_finetunning_scientific_representation 2022-07-11 17:37:30 +02:00
roberta_year_as_text_from_scratch roberta_year_as_text_from_scratch 2021-10-15 14:35:57 +02:00
roberta_year_as_token_better_finetunning a 2022-02-10 14:37:36 +01:00
roberta_year_as_token_everywhere_better_finetunning roberta_year_as_token_everywhere_better_finetunning 2021-10-19 10:46:19 +02:00
roberta_year_as_token_everywhere_from_scratch_better_finetunning roberta_year_as_token_everywhere_from_scratch_better_finetunning 2021-10-20 11:49:08 +02:00
roberta_year_as_token_finetunned roberta_year_as_token_finetunned 2021-10-04 17:30:08 +02:00
roberta_year_as_token_from_scratch_better_finetunning roberta_year_as_token_from_scratch_better_finetunning 2021-10-18 17:26:55 +02:00
solutions most common value in train 2023-11-16 16:54:06 +01:00
t5_no_year t5 v1.1. 5.93 epochs 2021-10-04 17:28:22 +02:00
t5_year t5 year 2021-10-06 10:45:09 +02:00
test-A most common value in train 2023-11-16 16:54:06 +01:00
test-B most common value in train 2023-11-16 16:54:06 +01:00
train a 2020-10-22 10:15:36 +02:00
.gitignore a 2020-10-22 10:15:36 +02:00
README.md name 2020-10-24 22:18:19 +02:00
config.txt change formatting 2021-11-13 14:29:34 +01:00
names a 2020-10-22 10:15:36 +02:00

README.md

Ireland news headlines

Dataset source and thanks

Predict the headline category given the headline text and year. Start Date: 1996-01-01. End Date: 2019-12-31.

Dataset taken from https://www.kaggle.com/therohk/ireland-historical-news on 19.06.2020. Special thanks to Rohit Kulkarni who created it.

You can find the whole dataset (including the test data) at the link above. The dataset at that link may be updated. Please do not incorporate any data from this Kaggle dataset (or others) into your submission to this Gonito challenge.

Context (from https://www.kaggle.com/therohk/ireland-historical-news )

This news dataset is a composition of 1.48 million headlines posted by the Irish Times operating within Ireland.

Created over 160 years ago, the agency can provide a long-term bird's-eye view of the happenings in Europe.

Challenge creation

Year is normalized as follows:

```
days_in_year = 366 if is_leap else 365
normalized = d.year + ((day_of_year-1) / days_in_year)
```
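The normalization snippet leaves `is_leap`, `d`, and `day_of_year` undefined. A minimal runnable sketch follows, assuming `d` is a `datetime.date` and the leap-year check follows the Gregorian calendar; the function name `normalize_date` is hypothetical and not taken from the challenge code.

```python
import calendar
from datetime import date


def normalize_date(d: date) -> float:
    """Map a date to a fractional year, e.g. 1 January -> exactly d.year."""
    # Assumption: leap years follow the Gregorian rule used by calendar.isleap.
    is_leap = calendar.isleap(d.year)
    days_in_year = 366 if is_leap else 365
    # tm_yday is 1-based, so 1 January has day_of_year == 1.
    day_of_year = d.timetuple().tm_yday
    return d.year + (day_of_year - 1) / days_in_year
```

With this definition the first day of a year maps to the integer year, and 31 December always stays strictly below the next year.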

The data is split randomly into train, dev, and test sets in an 80%/10%/10% ratio.
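The README does not say how the random split was produced. As an illustration only, one way to obtain an 80/10/10 split is sketched below; the function name `split_80_10_10` and the `seed` parameter are hypothetical and do not reproduce the actual split of this challenge.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of items and cut it into train/dev/test parts."""
    shuffled = list(items)
    # A seeded generator keeps the illustrative split reproducible.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # 80% for train (integer arithmetic)
    n_dev = n // 10        # 10% for dev; the remainder goes to test
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test
```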

Note that the data contains very similar headlines.

I made no effort to prevent one such headline from ending up in the train set and a near-duplicate of it in the test set.

I used only the first-level category for the classification task. E.g. the label is "world" rather than "world.us" as in the original dataset.
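The category reduction can be sketched as follows, assuming the original labels are dot-separated as in the "world.us" example; the helper name `top_level_category` is hypothetical.

```python
def top_level_category(category: str) -> str:
    """Keep only the part of the label before the first dot."""
    # Assumption: sub-categories are separated by "." as in "world.us".
    return category.split(".", 1)[0]
```

A label without a dot, such as "world", is returned unchanged.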