forked from kubapok/polish-urban-legends-public

Cluster Polish urban legend texts

Cluster Polish urban legend texts the way folklorists do.

The task is to group texts into urban legend types. Note that this is an unsupervised machine learning task. The metric used is Normalized Mutual Information (NMI).
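To make the metric concrete, here is a minimal pure-Python sketch of NMI between a reference labelling and a predicted clustering. The function names are illustrative, and normalizing by the arithmetic mean of the two entropies is an assumption: the evaluation tool used for this challenge may normalize differently, so scores from this sketch may not match the official ones exactly.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[x] * count_b[y]))
        for (x, y), n_ij in joint.items()
    )


def nmi(expected, predicted):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    if len(expected) != len(predicted):
        raise ValueError("both labellings must cover the same items")
    h_expected, h_predicted = entropy(expected), entropy(predicted)
    if h_expected == 0 and h_predicted == 0:
        # Both labellings put everything in one cluster: treat as identical.
        return 1.0
    return mutual_information(expected, predicted) / ((h_expected + h_predicted) / 2)
```

Because NMI compares partitions rather than label names, a clustering that matches the reference up to a renaming of cluster IDs still receives the maximum score.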

Bibliography

Please cite:

Roman Grundkiewicz and Filip Graliński. How to distinguish a kidney theft from a death car? Experiments in clustering urban-legend texts. In Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, pages 29-36, Hissar, Bulgaria, September 2011. Association for Computational Linguistics.

Directory structure

README.md: this file
config.txt: configuration file
papers.bib: BibTeX entries
dev-0/: directory with dev (test) data (87 texts)
dev-0/in.tsv: input data for the dev set
dev-0/expected.tsv: expected (reference) data for the dev set
test-A/: directory with test data (691 texts)
test-A/in.tsv: input data for the test set
test-A/expected.tsv: expected (reference) data for the test set (hidden)
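Given the layout above, a solution typically reads `in.tsv` from a data directory and writes one cluster label per input text. The sketch below rests on several assumptions not stated in this README: that `in.tsv` holds one text per line, that predictions go to a file named `out.tsv` in the same directory, and that `cluster_fn` is a hypothetical callable you supply (for example, a k-means wrapper) that returns one label per text.

```python
from pathlib import Path


def read_texts(path):
    """Read one text per line (assumed format of in.tsv)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_labels(path, labels):
    """Write one cluster label per line, in the same order as the input texts."""
    with open(path, "w", encoding="utf-8") as f:
        for label in labels:
            f.write(f"{label}\n")


def run(data_dir, cluster_fn):
    """Cluster the texts in data_dir/in.tsv and save labels to data_dir/out.tsv.

    cluster_fn is a hypothetical user-supplied function: list of str -> list of labels.
    The out.tsv file name is an assumption, not something this README specifies.
    """
    data_dir = Path(data_dir)
    texts = read_texts(data_dir / "in.tsv")
    labels = cluster_fn(texts)
    if len(labels) != len(texts):
        raise ValueError("cluster_fn must return exactly one label per text")
    write_labels(data_dir / "out.tsv", labels)
    return labels
```

Calling `run("dev-0", my_cluster_fn)` and comparing the result against `dev-0/expected.tsv` gives a local score before anything is run on `test-A`, whose reference labels are hidden.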