Cluster Polish urban legend texts

Cluster Polish urban legend texts the way folklorists do.

The task is to group texts into urban legend types. Note that this is an unsupervised machine learning task. The metric used is Normalized Mutual Information (NMI).
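To make the metric concrete, here is a minimal pure-Python sketch of NMI between a reference labeling and a predicted clustering. The function names (`entropy`, `mutual_information`, `nmi`) are illustrative, and normalizing by the arithmetic mean of the two entropies is an assumption: other normalizations exist, and the challenge's own evaluator may use a different one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(expected, predicted):
    """NMI normalized by the arithmetic mean of the entropies (an assumption)."""
    if len(expected) != len(predicted):
        raise ValueError("labelings must cover the same number of texts")
    h_expected, h_predicted = entropy(expected), entropy(predicted)
    if h_expected == 0 and h_predicted == 0:
        # Both labelings put everything into a single cluster.
        return 1.0
    return 2 * mutual_information(expected, predicted) / (h_expected + h_predicted)
```

Cluster identifiers only need to be consistent within one labeling: NMI is invariant to renaming clusters, so a prediction that recovers the reference grouping under different names still scores 1.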

Bibliography

Please cite:

Roman Grundkiewicz and Filip Graliński. How to distinguish a kidney theft from a death car? Experiments in clustering urban-legend texts. In Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, pages 29-36, Hissar, Bulgaria, September 2011. Association for Computational Linguistics.

Directory structure

README.md — this file
config.txt — configuration file
papers.bib — BibTeX entries
dev-0/ — directory with development data (87 texts)
dev-0/in.tsv — input data for the dev set
dev-0/expected.tsv — expected (reference) data for the dev set
test-A/ — directory with test data (691 texts)
test-A/in.tsv — input data for the test set
test-A/expected.tsv — expected (reference) data for the test set (hidden)
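The layout above can be read with a few lines of Python. This is only a sketch under a stated assumption: that `in.tsv` and `expected.tsv` hold one record per line and are aligned line by line. The file format is not described in this README, so check the data before relying on it. The helper names `read_lines` and `load_split` are hypothetical.

```python
from pathlib import Path


def read_lines(path):
    """Return the lines of a UTF-8 text file without their trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_split(directory):
    """Load (texts, labels) from a split directory such as dev-0/.

    Assumes one record per line and line-by-line alignment between
    in.tsv and expected.tsv (an assumption, not stated in the README).
    """
    directory = Path(directory)
    texts = read_lines(directory / "in.tsv")
    labels = read_lines(directory / "expected.tsv")
    if len(texts) != len(labels):
        raise ValueError(
            f"{directory}: {len(texts)} inputs but {len(labels)} expected labels"
        )
    return texts, labels
```

Calling `load_split("dev-0")` would return the dev texts with their reference labels. For `test-A` only `in.tsv` can be read, since the expected file is hidden.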
