FCE - Grammatical error detection

Detect errors in English text.

This is a Gonito.net challenge based on data from https://ilexir.co.uk/datasets/index.html. The aim of the challenge is to predict which tokens in each input text are incorrect.

MultiLabel-F0.5 is used as the evaluation metric.
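As a rough illustration of the metric, the sketch below computes an F0.5 score over sets of token indexes, which weights precision more heavily than recall. It assumes counts are pooled over all lines (micro-averaging); the official evaluator may aggregate differently, and the function names `f_beta` and `micro_f_beta` are made up for this example.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from raw counts; beta < 1 favours precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def micro_f_beta(expected, predicted, beta=0.5):
    """Pool true/false positives and false negatives over all lines.

    `expected` and `predicted` are parallel sequences of index sets,
    one set per input line (assumption: micro-averaged scoring).
    """
    tp = fp = fn = 0
    for exp, pred in zip(expected, predicted):
        exp, pred = set(exp), set(pred)
        tp += len(exp & pred)   # flagged and really incorrect
        fp += len(pred - exp)   # flagged but correct
        fn += len(exp - pred)   # incorrect but missed
    return f_beta(tp, fp, fn, beta)
```

For example, `micro_f_beta([{2, 5}, set()], [{2}, {1}])` pools one true positive, one false positive and one false negative.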

Dataset reference:

  1. Compositional Sequence Labeling Models for Error Detection in Learner Writing. Marek Rei and Helen Yannakoudakis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).

  2. A New Dataset and Method for Automatically Grading ESOL Texts. Helen Yannakoudakis, Ted Briscoe and Ben Medlock. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011).

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • train/ — directory with training data
  • train/in.tsv — Original input text for the train set
  • train/expected.tsv — Incorrect token indexes. Indexes start from 1.
  • dev-0/ — directory with dev data
  • dev-0/in.tsv — Original input text for the dev set
  • dev-0/expected.tsv — Incorrect token indexes. Indexes start from 1.
  • test-A/ — directory with test data
  • test-A/in.tsv — Original input text for the test set
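
To show how the 1-based indexes in expected.tsv relate to the text in in.tsv, here is a minimal sketch that looks up the flagged tokens for one line pair. It assumes tokens in in.tsv are separated by whitespace and that each line of expected.tsv lists integer indexes; neither detail is stated above, so check the actual files first. The helper name `flagged_tokens` is hypothetical.

```python
import re


def flagged_tokens(text_line, index_line):
    """Return the tokens of `text_line` named by the 1-based indexes in `index_line`.

    Assumptions: tokens are whitespace-separated, and indexes are integers
    separated by any non-digit characters (spaces, commas, ...).
    """
    tokens = text_line.split()
    indexes = [int(i) for i in re.findall(r"\d+", index_line)]
    # Indexes start from 1, so shift by one; out-of-range values are skipped.
    return [tokens[i - 1] for i in indexes if 1 <= i <= len(tokens)]
```

Called on a made-up pair such as `flagged_tokens("She go to school yesterday", "2 5")`, it picks out the second and fifth tokens. An empty expected line yields an empty list.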