FCE - Grammatical error detection
Detect errors in English text.
This is a Gonito.net challenge based on data from https://ilexir.co.uk/datasets/index.html. The aim of the challenge is to predict which tokens are incorrect.
MultiLabel-F0.5 is used as the evaluation metric.
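As a rough illustration of the metric, the sketch below computes F0.5 from sets of incorrect-token indexes. It is not the official scorer: the function name `f_beta` is hypothetical, and pooling the counts over all lines before computing precision and recall is an assumption about how the score is aggregated. With beta = 0.5, precision is weighted more heavily than recall.

```python
def f_beta(expected, predicted, beta=0.5):
    """F-beta over paired collections of labels, one collection per line.

    Assumption: true/false positives are pooled over all lines before
    precision and recall are computed. The official scorer may differ.
    """
    tp = fp = fn = 0
    for exp, pred in zip(expected, predicted):
        exp, pred = set(exp), set(pred)
        tp += len(exp & pred)   # flagged and really incorrect
        fp += len(pred - exp)   # flagged but actually correct
        fn += len(exp - pred)   # incorrect but not flagged
    if tp == 0:
        # No correct predictions at all; 0.0 is a convention chosen here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    gold = [{2, 5}, set()]   # made-up example: expected indexes per line
    guess = [{2}, {1}]       # made-up example: predicted indexes per line
    print(f_beta(gold, guess))
```

Because beta is below 1, a system that flags few tokens but is usually right scores higher than one that flags many tokens to catch every error.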
Dataset reference:
- Compositional Sequence Labeling Models for Error Detection in Learner Writing. Marek Rei and Helen Yannakoudakis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).
- A New Dataset and Method for Automatically Grading ESOL Texts. Helen Yannakoudakis, Ted Briscoe and Ben Medlock. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011).
Directory structure
- README.md: this file
- config.txt: configuration file
- train/: directory with training data
- train/in.tsv: original input text for the train set
- train/expected.tsv: incorrect token indexes (indexes start from 1)
- dev-0/: directory with dev data
- dev-0/in.tsv: original input text for the dev set
- dev-0/expected.tsv: incorrect token indexes (indexes start from 1)
- test-A/: directory with test data
- test-A/in.tsv: original input text for the test set
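
To make the 1-based indexing concrete, here is a minimal sketch that pairs one line of in.tsv with the matching line of expected.tsv and marks the incorrect tokens. The helper name `flag_tokens` is hypothetical. The sketch assumes that tokens are separated by whitespace and that the indexes are integers separated by whitespace or commas; the actual files may use a different layout, so check a few lines before relying on it.

```python
def flag_tokens(text_line, index_line):
    """Return (token, is_incorrect) pairs for one input/expected line pair.

    Assumptions (not confirmed by the challenge description):
    - tokens in text_line are whitespace-separated
    - index_line holds integers separated by whitespace or commas
    """
    tokens = text_line.split()
    bad = {int(i) for i in index_line.replace(",", " ").split()}
    # Indexes start from 1, so enumerate from 1 rather than 0.
    return [(tok, pos in bad) for pos, tok in enumerate(tokens, start=1)]


if __name__ == "__main__":
    # Made-up sentence, not taken from the dataset.
    for token, is_bad in flag_tokens("I has a dog", "2"):
        print(token, "INCORRECT" if is_bad else "ok")
```

An empty expected line yields no flagged tokens, which corresponds to a sentence without errors under the same assumptions.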