Ryszard Staruch ef813cde8f Roberta-large oddballness threshold 0.84 2024-01-24 11:29:21 +00:00
dev-0 Roberta-large oddballness threshold 0.84 2024-01-24 11:29:21 +00:00
test-A Roberta-large oddballness threshold 0.84 2024-01-24 11:29:21 +00:00
train Add files 2023-10-28 21:20:54 +02:00
.gitignore Add files 2023-10-28 21:20:54 +02:00
README.md Add files 2023-10-28 21:20:54 +02:00
config.txt Add files 2023-10-28 21:20:54 +02:00

FCE - Grammatical error detection

Detect errors in English text.

This is a Gonito.net challenge based on data from https://ilexir.co.uk/datasets/index.html. The aim of the challenge is to predict which tokens are incorrect.

MultiLabel-F0.5 is used as the evaluation metric.
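For intuition, F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The sketch below is only an illustration: it assumes the counts are micro-averaged over all lines, and the function names `f_beta` and `micro_f_beta` are hypothetical. The official evaluator may aggregate differently, so treat the challenge's own scoring as authoritative.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from true positive, false positive and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def micro_f_beta(expected, predicted, beta=0.5):
    """Micro-averaged F-beta over lines (an assumption, not the official definition).

    expected, predicted: lists of sets of 1-based token indexes, one set per line.
    """
    tp = fp = fn = 0
    for exp, pred in zip(expected, predicted):
        tp += len(exp & pred)   # tokens correctly flagged as incorrect
        fp += len(pred - exp)   # tokens flagged but actually correct
        fn += len(exp - pred)   # incorrect tokens that were missed
    return f_beta(tp, fp, fn, beta)
```

With beta = 0.5, a system that flags few tokens but is usually right scores higher than one that flags many tokens with the same number of hits.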

Dataset reference:

  1. Compositional Sequence Labeling Models for Error Detection in Learner Writing. Marek Rei and Helen Yannakoudakis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).

  2. A New Dataset and Method for Automatically Grading ESOL Texts. Helen Yannakoudakis, Ted Briscoe and Ben Medlock. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011).

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • train/ — directory with training data
  • train/in.tsv — Original input text for the train set
  • train/expected.tsv — Incorrect token indexes. Indexes start from 1.
  • dev-0/ — directory with dev data
  • dev-0/in.tsv — Original input text for the dev set
  • dev-0/expected.tsv — Incorrect token indexes. Indexes start from 1.
  • test-A/ — directory with test data
  • test-A/in.tsv — Original input text for the test set
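
As a rough illustration of how a line of `in.tsv` pairs with the matching line of `expected.tsv`, the sketch below maps 1-based indexes back to tokens. It rests on assumptions not stated in this README: that tokens are separated by whitespace and that the indexes on a line are separated by whitespace or commas. The helper name `flagged_tokens` is hypothetical.

```python
def flagged_tokens(text_line, index_line):
    """Return (index, token) pairs for the tokens marked as incorrect.

    Assumes whitespace-separated tokens and whitespace- or comma-separated
    1-based indexes; check the actual files before relying on this.
    """
    tokens = text_line.split()
    indexes = [int(i) for i in index_line.replace(",", " ").split()]
    # Indexes start from 1, so subtract 1 to address the Python list.
    return [(i, tokens[i - 1]) for i in indexes if 1 <= i <= len(tokens)]
```

For example, `flagged_tokens("I has a apple .", "2 3")` returns the second and third tokens together with their indexes. To process a whole split, the two files can be read in parallel, line by line, with `zip` over the open file objects.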