Cluster Polish urban legend texts
Cluster Polish urban legend texts the way folklorists do.
The task is to group texts into urban legend types. Note that this is an unsupervised machine learning task. The evaluation metric is Normalized Mutual Information (NMI) between the predicted clusters and the reference (expected) legend types.
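A minimal sketch of how NMI compares a predicted clustering with the reference labels. It assumes the arithmetic-mean normalization NMI(U, V) = I(U; V) / ((H(U) + H(V)) / 2), which is the default in scikit-learn's `normalized_mutual_info_score`. Other normalizations (geometric mean, max) exist, and this README does not say which one the official evaluator uses. The function names below are illustrative, not part of this repository.

```python
import math
from collections import Counter


def entropy(labels):
    # H(U) = -sum_a p(a) log p(a), with p(a) = cluster size / number of texts
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    # I(U; V) = sum_{a,b} p(a,b) log(p(a,b) / (p(a) p(b))),
    # rewritten with raw counts as (n_ab / n) * log(n * n_ab / (n_a * n_b))
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum(
        (n_ab / n) * math.log(n * n_ab / (count_u[a] * count_v[b]))
        for (a, b), n_ab in joint.items()
    )


def nmi(predicted, expected):
    # Arithmetic-mean normalization (an assumption, see above).
    h_pred, h_exp = entropy(predicted), entropy(expected)
    if h_pred == 0 and h_exp == 0:
        return 1.0  # both are a single cluster, so they agree trivially
    return mutual_information(predicted, expected) / ((h_pred + h_exp) / 2)
```

Because NMI only looks at which texts share a cluster, it is invariant to renaming cluster ids: predicting `[1, 1, 0, 0]` against reference labels `["a", "a", "b", "b"]` scores 1.0, so a submission does not need to reproduce the legend-type names.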
Bibliography
Please cite:
Roman Grundkiewicz and Filip Graliński. How to distinguish a kidney theft from a death car? Experiments in clustering urban-legend texts. In Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, pages 29-36, Hissar, Bulgaria, September 2011. Association for Computational Linguistics.
See also:
Roman Grundkiewicz. Automatic identification and clustering of short narrative texts published on the Internet. Master's thesis, supervisor: Filip Graliński, Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, June 2011.
Filip Graliński. Tropiąc czarną wołgę w sieci. O poszukiwaniu legend miejskich w internecie [Tracking the black Volga on the Web: on searching for urban legends on the Internet]. In Anna Gumkowska, editor, Tekst (w) sieci, volume 2, pages 253-261, Warszawa, 2009. Wydawnictwa Akademickie i Profesjonalne.
(See papers.bib for BibTeX entries.)
Directory structure
README.md - this file
config.txt - configuration file
papers.bib - BibTeX entries
dev-0/ - directory with dev (test) data (87 texts)
dev-0/in.tsv - input data for the dev set
dev-0/expected.tsv - expected (reference) data for the dev set
test-A/ - directory with test data (691 texts)
test-A/in.tsv - input data for the test set
test-A/expected.tsv - expected (reference) data for the test set (hidden)
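This README does not prescribe a clustering method. As one possible baseline, and explicitly not the method of the cited papers, here is a minimal stdlib-only sketch of spherical k-means over bag-of-words vectors with deterministic farthest-first seeding. Several assumptions apply: each line of `in.tsv` holds one text; the number of clusters `k` is chosen by the user, since the number of legend types is not given here; and predictions go to a file such as `dev-0/out.tsv` with one cluster id per line. The `out.tsv` name is the usual convention for this challenge layout, not something this README states. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on runs of word characters; \w is Unicode-aware
    # in Python 3, so Polish letters such as "ł" and "ą" are kept.
    return re.findall(r"\w+", text.lower())


def unit_vector(weights):
    # Scale a sparse {term: weight} vector to unit Euclidean length.
    norm = math.sqrt(sum(x * x for x in weights.values())) or 1.0
    return {w: x / norm for w, x in weights.items()}


def cosine(a, b):
    # For unit vectors the dot product equals cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def farthest_first(vectors, k):
    # Deterministic seeding: start from text 0, then repeatedly add the text
    # whose highest similarity to the seeds chosen so far is lowest.
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(vectors)) if i not in chosen]
        chosen.append(
            min(rest, key=lambda i: max(cosine(vectors[i], vectors[c]) for c in chosen))
        )
    return [vectors[i] for i in chosen]


def kmeans_labels(texts, k, max_iter=50):
    # Hypothetical baseline: returns one integer cluster id per text.
    if not texts:
        return []
    vectors = [unit_vector(Counter(tokenize(t))) for t in texts]
    k = min(k, len(vectors))
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each text joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: centroid = normalized sum of its member vectors.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    total.update(v)
            if total:
                centroids[j] = unit_vector(total)
    return labels


def cluster_file(in_path, out_path, k):
    # Assumes one text per line of in.tsv; writes one cluster id per line.
    with open(in_path, encoding="utf-8") as f:
        texts = [line.rstrip("\n") for line in f]
    with open(out_path, "w", encoding="utf-8") as f:
        for label in kmeans_labels(texts, k):
            f.write(f"{label}\n")
```

For example, `cluster_file("dev-0/in.tsv", "dev-0/out.tsv", k)` with `k` set to the number of legend types you want to try. The sketch does no stemming or lemmatization, so inflected Polish forms of the same word count as different terms. Handling inflection is an obvious place to improve on this baseline.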