Ryszard Staruch 5f2dcc24a3 Check mistral 2024-06-10 11:08:32 +02:00


FCE - Grammatical error detection

Detect errors in English text.

This is a Gonito.net challenge based on data from https://ilexir.co.uk/datasets/index.html. The aim of the challenge is to predict which tokens in each input text are incorrect.

MultiLabel-F0.5 is used as the evaluation metric.
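For orientation, here is a minimal sketch of an F0.5 computation over sets of token indexes. It assumes the counts are micro-averaged over all lines, which may differ from how the official evaluator aggregates them; the function names `f_beta` and `micro_f_beta` are hypothetical and not part of any evaluation tool. With beta = 0.5, precision is weighted more heavily than recall.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from raw counts; returns 0.0 when precision and recall are both 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def micro_f_beta(expected, predicted, beta: float = 0.5) -> float:
    """Sum true/false positives and false negatives over all lines, then score.

    `expected` and `predicted` are parallel sequences; each item is a
    collection of 1-based token indexes marked as incorrect on one line.
    Assumption: micro-averaging over lines (not confirmed by the README).
    """
    tp = fp = fn = 0
    for exp, pred in zip(expected, predicted):
        exp, pred = set(exp), set(pred)
        tp += len(exp & pred)   # indexes flagged correctly
        fp += len(pred - exp)   # indexes flagged but actually fine
        fn += len(exp - pred)   # errors that were missed
    return f_beta(tp, fp, fn, beta)
```

For example, `micro_f_beta([{2, 5}], [{2}])` has precision 1.0 and recall 0.5, so the score lands closer to the precision than a plain F1 would.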

Dataset reference:

  1. Compositional Sequence Labeling Models for Error Detection in Learner Writing. Marek Rei and Helen Yannakoudakis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).

  2. A New Dataset and Method for Automatically Grading ESOL Texts. Helen Yannakoudakis, Ted Briscoe and Ben Medlock. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011).

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • train/ — directory with training data
  • train/in.tsv — Original input text for the train set
  • train/expected.tsv — Incorrect token indexes. Indexes start from 1.
  • dev-0/ — directory with dev data
  • dev-0/in.tsv — Original input text for the dev set
  • dev-0/expected.tsv — Incorrect token indexes. Indexes start from 1.
  • test-A/ — directory with test data
  • test-A/in.tsv — Original input text for the test set
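
To illustrate the 1-based indexing used in the expected.tsv files, the sketch below pairs one line of input text with one line of indexes. It assumes that tokens are separated by whitespace and that indexes are separated by whitespace or commas; the README does not state the exact delimiters, so check the files before relying on this. The helper names `parse_indexes` and `flag_tokens` are hypothetical.

```python
def parse_indexes(line: str) -> list[int]:
    """Parse one expected.tsv line into a list of 1-based token indexes.

    Assumption: indexes are separated by whitespace and/or commas.
    An empty line means no token is marked as incorrect.
    """
    return [int(part) for part in line.replace(",", " ").split()]


def flag_tokens(text_line: str, index_line: str) -> list[tuple[str, bool]]:
    """Return (token, is_incorrect) pairs for one line of in.tsv.

    Assumption: tokens in in.tsv are separated by whitespace.
    """
    incorrect = set(parse_indexes(index_line))
    tokens = text_line.split()
    # enumerate from 1 because the indexes in expected.tsv start from 1
    return [(tok, pos in incorrect) for pos, tok in enumerate(tokens, start=1)]


if __name__ == "__main__":
    for token, is_incorrect in flag_tokens("I has a cat", "2"):
        print(token, "INCORRECT" if is_incorrect else "ok")
```

Reading two files in parallel then amounts to zipping the lines of in.tsv and expected.tsv and passing each pair to `flag_tokens`.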