Filip Gralinski 2022-02-08 09:22:22 +01:00
commit 06d170be0b
10 changed files with 19045 additions and 0 deletions

8
.gitignore vendored Normal file

@@ -0,0 +1,8 @@
*~
*.swp
*.bak
*.pyc
*.o
.DS_Store
.token

62
README.md Normal file

@@ -0,0 +1,62 @@
Dialogues from school reading books
=====================================
## Data format
The first column of `in.tsv` contains the beginning of a dialogue from a book on the school reading list. A dialogue may involve any number of speakers
and contains no annotations other than the utterances themselves (e.g. no narrator comments). Individual utterances in the dialogue beginning are separated by the `[SEP]` separator.
Each subsequent column is a proposed continuation of the dialogue. A continuation may come from the same book or from a different one.
Exactly one continuation is correct: the one that actually occurs in the book.
The task is to identify the correct continuation.
As output, return all proposed continuations ranked from the most probable to the least probable.
The proposals must be identical to those in `in.tsv` and separated by tabs.
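
The row layout described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: `parse_row`, `rank_candidates`, `format_row`, `read_rows` and especially `toy_score` are hypothetical names that are not part of the challenge, and the word-overlap scorer is a placeholder for a real model, not a recommended baseline.

```python
import gzip

SEP = "[SEP]"  # utterance separator used in the first column


def read_rows(path):
    """Yield raw lines from a gzip-compressed TSV file such as in.tsv.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line


def parse_row(line):
    """Split one in.tsv row into (utterances, candidates)."""
    columns = line.rstrip("\n").split("\t")
    utterances = columns[0].split(SEP)  # the dialogue beginning
    candidates = columns[1:]            # proposed continuations
    return utterances, candidates


def toy_score(utterances, candidate):
    """Hypothetical placeholder scorer: word overlap with the last utterance."""
    last_words = set(utterances[-1].lower().split())
    return len(last_words & set(candidate.lower().split()))


def rank_candidates(utterances, candidates, score):
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(utterances, c), reverse=True)


def format_row(ranked):
    """Join ranked candidates into one tab-separated output row."""
    return "\t".join(ranked)
```

Because the candidates are only reordered and never modified, the output row automatically satisfies the requirement that proposals be identical to those in `in.tsv`.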
## Metric
The evaluation metric is [Mean Reciprocal Rank](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) (MRR) or Mean Average Precision (MAP). In this task, where only one answer is correct,
the MRR and MAP metrics are identical.
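
To illustrate why the two metrics coincide when exactly one answer is correct, here is a minimal Python sketch. It is not the official evaluator; the function names are hypothetical and it assumes the candidates within a row are distinct.

```python
def reciprocal_rank(ranked, correct):
    """Return 1 / (1-based position of the correct answer), or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return 1.0 / position
    return 0.0


def average_precision(ranked, relevant):
    """Average of precision@k over the positions k that hold a relevant item."""
    hits = 0
    total = 0.0
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_reciprocal_rank(rankings, answers):
    """Mean of the reciprocal ranks over all rows."""
    scores = [reciprocal_rank(r, a) for r, a in zip(rankings, answers)]
    return sum(scores) / len(scores)


# With a single relevant item at position k, both metrics reduce to 1 / k.
ranked = ["Co się stało?", "Dobrze. A Ty?", "Milordzie, fortuna nam sprzyja!"]
assert reciprocal_rank(ranked, "Dobrze. A Ty?") == 0.5
assert average_precision(ranked, {"Dobrze. A Ty?"}) == 0.5
```

In `average_precision`, a single relevant item found at position k contributes precision 1/k and is divided by one relevant item, which is exactly the reciprocal rank.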
## Challenge rules
You may use ready-made, publicly available language models (e.g. https://github.com/sdadas/polish-roberta ), but you must not use any data other than the data included in the task.
In particular, you must not use any books, e.g. by further training a language model on books other than those included in the task.
## Example
An example `in.tsv` row with three proposed continuations:
```
Cześć, jestem Adam![SEP]Cześć, mam na imię Ala[SEP]Jak się masz? Milordzie, fortuna nam sprzyja! Dobrze. A Ty? Co się stało?
```
The corresponding row of `expected.tsv`:
```
Dobrze. A Ty?
```
An example `out.tsv` file:
```
Dobrze. A Ty? Co się stało? Milordzie, fortuna nam sprzyja!
```
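
Since an output row must contain exactly the continuations of the matching input row, only reordered, a quick sanity check before submitting can be sketched like this. The helper name `is_valid_output_row` is hypothetical, and the sketch assumes columns are separated by tab characters.

```python
from collections import Counter


def is_valid_output_row(in_line, out_line):
    """Check that out_line is a reordering of the candidates in in_line."""
    # Drop the first input column (the dialogue beginning); keep the candidates.
    candidates = in_line.rstrip("\n").split("\t")[1:]
    ranked = out_line.rstrip("\n").split("\t")
    # Counter comparison ignores order but still catches missing,
    # extra, or altered candidates (including duplicates).
    return Counter(candidates) == Counter(ranked)


in_row = (
    "Cześć, jestem Adam![SEP]Cześć, mam na imię Ala[SEP]Jak się masz?"
    "\tMilordzie, fortuna nam sprzyja!\tDobrze. A Ty?\tCo się stało?"
)
out_row = "Dobrze. A Ty?\tCo się stało?\tMilordzie, fortuna nam sprzyja!"
assert is_valid_output_row(in_row, out_row)
```

A row that drops a candidate or changes its text, even by trimming whitespace, fails this check.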
Directory structure
-------------------
* `README.md` — this file
* `config.txt` — configuration file
* `train/` — directory with training data
* `train/in.tsv.gz` — input data for the train set (gzip-compressed)
* `train/expected.tsv` — expected (reference) data for the train set
* `dev-0/` — directory with dev (test) data
* `dev-0/in.tsv` — input data for the dev set
* `dev-0/expected.tsv` — expected (reference) data for the dev set
* `test-A/` — directory with test data
* `test-A/in.tsv` — input data for the test set
* `test-A/expected.tsv` — expected (reference) data for the test set

1
config.txt Normal file

@@ -0,0 +1 @@
--metric MAP --precision 6 --in-header in-header.tsv --out-header out-header.tsv

4348
dev-0/expected.tsv Normal file

File diff suppressed because one or more lines are too long

BIN
dev-0/in.tsv.gz Normal file

Binary file not shown.

1
in-header.tsv Normal file

@@ -0,0 +1 @@
dialogue_beginning answer_1 answer_2 answer_3 answer_4 answer_5 answer_6 answer_7 answer_8 answer_9 answer_10 answer_11 answer_12 answer_13 answer_14 answer_15 answer_16 answer_17 answer_18 answer_19 answer_20 answer_21 answer_22 answer_23 answer_24 answer_25 answer_26 answer_27 answer_28 answer_29 answer_30 answer_31 answer_32 answer_33 answer_34 answer_35 answer_36 answer_37 answer_38 answer_39 answer_40 answer_41 answer_42 answer_43 answer_44 answer_45 answer_46 answer_47 answer_48 answer_49 answer_50 answer_51 answer_52 answer_53 answer_54 answer_55 answer_56 answer_57 answer_58 answer_59 answer_60 answer_61 answer_62 answer_63 answer_64 answer_65 answer_66 answer_67 answer_68 answer_69 answer_70 answer_71 answer_72 answer_73 answer_74 answer_75 answer_76 answer_77 answer_78 answer_79 answer_80 answer_81 answer_82 answer_83 answer_84 answer_85 answer_86 answer_87 answer_88 answer_89 answer_90 answer_91 answer_92 answer_93 answer_94 answer_95 answer_96 answer_97 answer_98 answer_99 answer_100

1
out-header.tsv Normal file

@@ -0,0 +1 @@
probable_answer_1 probable_answer_2 probable_answer_3 probable_answer_4 probable_answer_5 probable_answer_6 probable_answer_7 probable_answer_8 probable_answer_9 probable_answer_10 probable_answer_11 probable_answer_12 probable_answer_13 probable_answer_14 probable_answer_15 probable_answer_16 probable_answer_17 probable_answer_18 probable_answer_19 probable_answer_20 probable_answer_21 probable_answer_22 probable_answer_23 probable_answer_24 probable_answer_25 probable_answer_26 probable_answer_27 probable_answer_28 probable_answer_29 probable_answer_30 probable_answer_31 probable_answer_32 probable_answer_33 probable_answer_34 probable_answer_35 probable_answer_36 probable_answer_37 probable_answer_38 probable_answer_39 probable_answer_40 probable_answer_41 probable_answer_42 probable_answer_43 probable_answer_44 probable_answer_45 probable_answer_46 probable_answer_47 probable_answer_48 probable_answer_49 probable_answer_50 probable_answer_51 probable_answer_52 probable_answer_53 probable_answer_54 probable_answer_55 probable_answer_56 probable_answer_57 probable_answer_58 probable_answer_59 probable_answer_60 probable_answer_61 probable_answer_62 probable_answer_63 probable_answer_64 probable_answer_65 probable_answer_66 probable_answer_67 probable_answer_68 probable_answer_69 probable_answer_70 probable_answer_71 probable_answer_72 probable_answer_73 probable_answer_74 probable_answer_75 probable_answer_76 probable_answer_77 probable_answer_78 probable_answer_79 probable_answer_80 probable_answer_81 probable_answer_82 probable_answer_83 probable_answer_84 probable_answer_85 probable_answer_86 probable_answer_87 probable_answer_88 probable_answer_89 probable_answer_90 probable_answer_91 probable_answer_92 probable_answer_93 probable_answer_94 probable_answer_95 probable_answer_96 probable_answer_97 probable_answer_98 probable_answer_99 probable_answer_100

BIN
test-A/in.tsv.gz Normal file

Binary file not shown.

14624
train/expected.tsv Normal file

File diff suppressed because one or more lines are too long

BIN
train/in.tsv.gz Normal file

Binary file not shown.