Update README.md

kdudzic 2024-11-19 12:47:34 +01:00
parent 8523077e72
commit 6e0a5f6be6

@@ -1,4 +1,4 @@
-Dariah proper name disambiguation challenge
+Dariah character names disambiguation challenge
 This challenge is based on the contents of the classic Polish novel "Lalka" (English: "The Doll") by Bolesław Prus. Fragments of the novel text included in the dataset had all the different names indicating the same character annotated with a common character label. For example, the main character - Stanisław Wokulski - is always annotated as "WOKULSKI" no matter if he is referred to as "Wokulski", "Stach", "S. Wokulski", etc. in the text. In the same manner, all different mentions of Izabela Łęcka are annotated as "LECKA", all mentions of Ignacy Rzecki as "RZECKI", and so on. The training split of the dataset consists of 3336 annotated sentences from the novel.