forked from kubapok/polish-urban-legends-public

Jan Przybylski eea9e08c8d update		2021-04-13 11:19:34 +02:00
dev-0	update	2021-04-13 11:19:34 +02:00
test-A	update	2021-04-13 11:19:34 +02:00
.gitignore	init	2021-04-12 14:56:01 +02:00
config.txt	init	2021-04-12 14:56:01 +02:00
geval	update	2021-04-13 11:19:34 +02:00
papers.bib	init	2021-04-12 14:56:01 +02:00
prog.py	update	2021-04-13 11:19:34 +02:00
README.md	init	2021-04-12 14:56:01 +02:00

Cluster Polish urban legend texts

Cluster Polish urban legend texts the way folklorists do.

The task is to group texts into urban legend types. Note that this is an unsupervised machine learning task. The metric used is Normalized Mutual Information (NMI).
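
As an illustration of the metric, here is a minimal pure-Python sketch of NMI between a reference labeling and a predicted clustering. The function name `nmi` is hypothetical, and the normalization used (the arithmetic mean of the two entropies) is an assumption; the evaluation tool used for this challenge may normalize differently, so treat the numbers as indicative only.

```python
import math
from collections import Counter


def nmi(expected, predicted):
    """Normalized Mutual Information between two labelings.

    Assumption: normalizes by the arithmetic mean of the two entropies.
    Other variants (geometric mean, max, min) give different values.
    """
    if len(expected) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(expected)
    if n == 0:
        raise ValueError("labelings must not be empty")

    count_e = Counter(expected)
    count_p = Counter(predicted)
    joint = Counter(zip(expected, predicted))

    def entropy(counts):
        # Shannon entropy (natural log) of a label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information computed from the joint label counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_e[e] * count_p[p]))
        for (e, p), c in joint.items()
    )

    denom = (entropy(count_e) + entropy(count_p)) / 2
    if denom == 0:
        # Both labelings put everything in a single cluster.
        return 1.0
    return mi / denom
```

Cluster identifiers do not need to match the reference labels: `nmi(["a", "a", "b", "b"], [1, 1, 0, 0])` scores a perfect clustering even though the names differ, which is why the metric suits an unsupervised task.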

Bibliography

Please cite:

Roman Grundkiewicz and Filip Graliński. How to distinguish a kidney theft from a death car? Experiments in clustering urban-legend texts. In Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, pages 29-36, Hissar, Bulgaria, September 2011. Association for Computational Linguistics.

Directory structure

README.md — this file
config.txt — configuration file
papers.bib — BibTeX entries
dev-0/ — directory with dev (development test) data (87 texts)
dev-0/in.tsv — input data for the dev set
dev-0/expected.tsv — expected (reference) data for the dev set
test-A/ — directory with test data (691 texts)
test-A/in.tsv — input data for the test set
test-A/expected.tsv — expected (reference) data for the test set (hidden)
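
The layout above could be read with a small helper like the following sketch. The names `read_column` and `load_split` are hypothetical, and the file format is an assumption: one text per line in `in.tsv`, with the matching label on the same line number of `expected.tsv`. Check the actual files before relying on it.

```python
from pathlib import Path


def read_column(path):
    """Return one string per line of a file, without the trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_split(directory):
    """Load texts and, if present, reference labels from a data directory.

    Assumption: in.tsv and expected.tsv are aligned line by line.
    Returns (texts, labels); labels is None when expected.tsv is absent,
    as is the case for a hidden test set.
    """
    directory = Path(directory)
    texts = read_column(directory / "in.tsv")

    expected_path = directory / "expected.tsv"
    if not expected_path.exists():
        return texts, None

    labels = read_column(expected_path)
    if len(labels) != len(texts):
        raise ValueError("in.tsv and expected.tsv have different line counts")
    return texts, labels
```

For example, `texts, labels = load_split("dev-0")` would return the dev texts together with their reference labels, while `load_split("test-A")` returns `None` for the labels because the test references are hidden.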