FCE - Grammatical error detection
===========================

Detect errors in English text.

This is a Gonito.net challenge based on data from <https://ilexir.co.uk/datasets/index.html>.
The aim of the challenge is to predict which tokens are incorrect.

MultiLabel-F0.5 is used as the evaluation metric.
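
As a rough illustration of the metric, the sketch below computes an F0.5 score over per-line sets of token indexes. With beta = 0.5, precision counts more than recall. The helper names `f_beta` and `multilabel_f_beta` are hypothetical, and micro-averaging the counts over all lines is an assumption; the official evaluation tool may aggregate differently.

```python
def f_beta(true_positives, false_positives, false_negatives, beta=0.5):
    """F-beta from raw counts; beta < 1 weights precision more than recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def multilabel_f_beta(expected_lines, predicted_lines, beta=0.5):
    """F-beta over per-line label sets.

    Assumption: counts are summed over all lines (micro-averaging) and
    labels on a line are separated by whitespace.
    """
    tp = fp = fn = 0
    for expected, predicted in zip(expected_lines, predicted_lines):
        gold, guess = set(expected.split()), set(predicted.split())
        tp += len(gold & guess)   # indexes flagged correctly
        fp += len(guess - gold)   # indexes flagged but actually correct
        fn += len(gold - guess)   # incorrect tokens that were missed
    return f_beta(tp, fp, fn, beta)
```

A local score on the dev set can then be estimated by passing the lines of `dev-0/expected.tsv` and of your own predictions to `multilabel_f_beta`.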

Dataset references:

1. Marek Rei and Helen Yannakoudakis. "Compositional Sequence Labeling Models for Error Detection in Learner Writing." In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).
2. Helen Yannakoudakis, Ted Briscoe and Ben Medlock. "A New Dataset and Method for Automatically Grading ESOL Texts." In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011).

## Directory structure

* `README.md` — this file
* `config.txt` — configuration file
* `train/` — directory with training data
* `train/in.tsv` — Original input text for the train set
* `train/expected.tsv` — Incorrect token indexes. Indexes start from 1.
* `dev-0/` — directory with dev data
* `dev-0/in.tsv` — Original input text for the dev set
* `dev-0/expected.tsv` — Incorrect token indexes. Indexes start from 1.
* `test-A/` — directory with test data
* `test-A/in.tsv` — Original input text for the test set
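
To show how the input and expected files relate, here is a minimal sketch that pairs each input line with its list of incorrect token indexes. It assumes one text per line in `in.tsv`, whitespace-separated tokens, and whitespace-separated 1-based indexes on the matching line of `expected.tsv`. The function names `read_split` and `flag_tokens` and the sample sentence are hypothetical.

```python
from pathlib import Path


def read_split(directory):
    """Return (text, expected) line pairs for a split such as train/ or dev-0/."""
    base = Path(directory)
    texts = (base / "in.tsv").read_text(encoding="utf-8").splitlines()
    labels = (base / "expected.tsv").read_text(encoding="utf-8").splitlines()
    return list(zip(texts, labels))


def flag_tokens(text_line, expected_line):
    """Pair every token with True if its 1-based index is listed as incorrect.

    Assumption: tokens and indexes are both separated by whitespace.
    """
    tokens = text_line.split()
    wrong = {int(index) for index in expected_line.split()}
    return [(token, position in wrong)
            for position, token in enumerate(tokens, start=1)]


# Hypothetical example line, not taken from the dataset.
print(flag_tokens("I has a cat", "2"))
```

An empty expected line is treated here as "no incorrect tokens", which is also an assumption about the file format.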