forked from kubapok/polish-urban-legends-public

Jakub Kolasiński 60b2708888 Use stopwords		2021-04-13 20:28:41 +02:00
dev-0	Use stopwords	2021-04-13 20:28:41 +02:00
test-A	init	2021-04-12 14:56:01 +02:00
.gitignore	Add classifier	2021-04-13 19:34:47 +02:00
classifier.py	Use stopwords	2021-04-13 20:28:41 +02:00
config.txt	init	2021-04-12 14:56:01 +02:00
papers.bib	init	2021-04-12 14:56:01 +02:00
README.md	init	2021-04-12 14:56:01 +02:00
stopwords	Use stopwords	2021-04-13 20:28:41 +02:00

README.md

Cluster Polish urban legend texts

Cluster Polish urban legend texts the way folklorists do.

The task is to group texts into urban legend types. Note that this is an unsupervised machine learning task. The metric used is Normalized Mutual Information (NMI).
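
As a rough illustration of the metric, the sketch below computes NMI between a reference labelling and a predicted clustering using only the standard library. It assumes normalization by the arithmetic mean of the two entropies; the official evaluation tool may normalize differently, so treat the exact numbers as indicative only. The function names (`entropy`, `mutual_information`, `nmi`) are hypothetical and not part of this repository.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(expected, predicted):
    """NMI normalized by the mean of the two entropies (an assumption)."""
    if len(expected) != len(predicted):
        raise ValueError("labellings must cover the same items")
    h_expected, h_predicted = entropy(expected), entropy(predicted)
    if h_expected == 0 and h_predicted == 0:
        # Both labellings put everything in one cluster: identical partitions.
        return 1.0
    return mutual_information(expected, predicted) / ((h_expected + h_predicted) / 2)
```

Because NMI compares partitions rather than label names, a clustering that matches the reference up to a renaming of cluster IDs scores close to 1, while a clustering that is statistically independent of the reference scores 0.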

Bibliography

Please cite:

Roman Grundkiewicz and Filip Graliński. How to distinguish a kidney theft from a death car? Experiments in clustering urban-legend texts. In Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, pages 29-36, Hissar, Bulgaria, September 2011. Association for Computational Linguistics.

Directory structure

README.md — this file
config.txt — configuration file
papers.bib — BibTeX entries
dev-0/ — directory with development (dev) data (87 texts)
dev-0/in.tsv — input data for the dev set
dev-0/expected.tsv — expected (reference) data for the dev set
test-A/ — directory with test data (691 texts)
test-A/in.tsv — input data for the test set
test-A/expected.tsv — expected (reference) data for the test set (hidden)
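
The layout above could be read with a small helper like the one below. This is a minimal sketch that assumes each line of `in.tsv` holds one text and each line of `expected.tsv` holds the reference label for the text on the same line, both UTF-8 encoded; the README does not spell out the file format, so check the data before relying on it. `read_lines` and `load_split` are hypothetical names.

```python
from pathlib import Path


def read_lines(path):
    """Return the lines of a UTF-8 text file without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_split(directory):
    """Load texts and, if present, reference labels from one data directory.

    Assumes in.tsv and expected.tsv are line-aligned (one item per line).
    Returns (texts, labels); labels is None when expected.tsv is missing,
    as is the case for the hidden test-A reference data.
    """
    directory = Path(directory)
    texts = read_lines(directory / "in.tsv")
    expected_path = directory / "expected.tsv"
    labels = read_lines(expected_path) if expected_path.exists() else None
    if labels is not None and len(labels) != len(texts):
        raise ValueError(
            f"{directory}: {len(texts)} texts but {len(labels)} labels"
        )
    return texts, labels
```

Under these assumptions, `load_split("dev-0")` would return 87 texts with their reference labels, and `load_split("test-A")` would return 691 texts with `None` in place of the labels.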