en-ner-conll-2003/Transformer.ipynb

Transformer

Use a Transformers pipeline (https://huggingface.co/docs/transformers/main_classes/pipelines) to implement named entity recognition (NER) on the dataset https://git.wmi.amu.edu.pl/kubapok/en-ner-conll-2003.
Evaluate the results with the GEval tool.

Importing libraries

import pandas as pd
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
from tqdm.notebook import tqdm

Loading the data

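import csv  # QUOTE_NONE keeps literal double quotes in the text, so character offsets stay aligned
test_A_data = pd.read_csv("test-A/in.tsv", sep="\t", header=None, names=["x"], quoting=csv.QUOTE_NONE)
dev_0_data = pd.read_csv("dev-0/in.tsv", sep="\t", header=None, names=["x"], quoting=csv.QUOTE_NONE)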

Setting up the model, tokenizer, and pipeline

model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
recognizer = pipeline("ner", model=model, tokenizer=tokenizer)
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Function that repairs the predicted tags

def correct_labels(data):
    corrected_lines = []

    for line in data:
        corrected_line = []
        previous_token = "O"

        for token in line:
            if (
                token == "I-ORG"
                and previous_token != "B-ORG"
                and previous_token != "I-ORG"
            ):
                corrected_line.append("B-ORG")
            elif (
                token == "I-PER"
                and previous_token != "B-PER"
                and previous_token != "I-PER"
            ):
                corrected_line.append("B-PER")
            elif (
                token == "I-LOC"
                and previous_token != "B-LOC"
                and previous_token != "I-LOC"
            ):
                corrected_line.append("B-LOC")
            elif (
                token == "I-MISC"
                and previous_token != "B-MISC"
                and previous_token != "I-MISC"
            ):
                corrected_line.append("B-MISC")
            else:
                corrected_line.append(token)

            previous_token = token

        corrected_lines.append(corrected_line)

    return corrected_lines
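
To illustrate what the repair step does, here is a minimal sketch on a single hand-written tag sequence. The helper name `repair_iob` and the sample tags are assumptions made for this example; it is a compact restatement of the rule in `correct_labels` (an `I-X` tag that does not continue an `X` entity becomes `B-X`), generalized to any entity type rather than the four listed above.

```python
def repair_iob(tags):
    """Turn an I-X tag into B-X when it does not continue an X entity."""
    repaired = []
    previous = "O"
    for tag in tags:
        # previous[2:] is the entity type of the preceding tag ("" for "O")
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            repaired.append("B-" + tag[2:])
        else:
            repaired.append(tag)
        # As in correct_labels, compare against the original (uncorrected) tag
        previous = tag
    return repaired


# Hypothetical model output for seven consecutive words
sample = ["I-PER", "I-PER", "O", "I-LOC", "I-ORG", "B-MISC", "I-MISC"]
print(repair_iob(sample))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'B-ORG', 'B-MISC', 'I-MISC']
```

The first tag of every entity ends up as `B-`, including the `I-ORG` that directly follows an entity of a different type.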

Function that predicts NER tags

def predict_ner_tags(data):
    predictions = []
    counter = 1
    for line in data:
        print(f'Predicting NER tags for line {counter}/{len(data)}... ', end='')
        word_positions = []
        position = 0
        result = recognizer(line)
        entity_dict = {res['start']: res['entity'] for res in result}

        for word in line.split():
            word_positions.append(position)
            position += len(word) + 1
        classified_words = []

        for checked_position in word_positions:
            entity = entity_dict.get(checked_position, "O")
            classified_words.append(entity)

        predictions.append(classified_words)
        print('Done')
        counter += 1
    return correct_labels(predictions)
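
The alignment between pipeline output and whitespace-separated words can be checked without loading the model. The sketch below isolates that step in a hypothetical helper, `align_tags`, and feeds it a hand-written `mock_result` shaped like the pipeline's list of dicts (only the `entity` and `start` keys are used). Like `predict_ner_tags`, it assumes words are separated by exactly one space, and it takes the tag of the sub-token that starts at each word's first character.

```python
def align_tags(line, result):
    """Assign one tag per whitespace-separated word using character offsets."""
    entity_by_start = {item["start"]: item["entity"] for item in result}
    tags = []
    position = 0
    for word in line.split():
        # A word gets a tag only if some predicted token starts at its first character
        tags.append(entity_by_start.get(position, "O"))
        position += len(word) + 1  # +1 for the single separating space
    return tags


line = "John lives in Washington ."
# Hand-written stand-in for recognizer(line); "Washington" is split into two sub-tokens
mock_result = [
    {"entity": "I-PER", "start": 0, "end": 4},
    {"entity": "I-LOC", "start": 14, "end": 18},
    {"entity": "I-LOC", "start": 18, "end": 24},
]
print(align_tags(line, mock_result))
# ['I-PER', 'O', 'O', 'I-LOC', 'O']
```

The second sub-token of "Washington" starts at offset 18, which is not a word start, so it is ignored and the word receives a single tag.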

Function that saves the results

def save_predictions(predictions, filename):
    with open(filename, "w") as f:
        for line in predictions:
            f.write(" ".join(line) + "\n")
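
Before writing the files, it may be worth checking that every output line has as many tags as the input line has tokens, since the evaluation presumably compares tags token by token. The helper `find_length_mismatches` and the sample data below are hypothetical and serve only as a sketch of such a check.

```python
def find_length_mismatches(input_lines, predictions):
    """Return indices of lines whose tag count differs from their token count."""
    return [
        index
        for index, (text, tags) in enumerate(zip(input_lines, predictions))
        if len(text.split()) != len(tags)
    ]


# Hand-written sample inputs and tag sequences
inputs = ["EU rejects German call", "Peter Blackburn"]
good = [["B-ORG", "O", "B-MISC", "O"], ["B-PER", "I-PER"]]
bad = [["B-ORG", "O", "B-MISC"], ["B-PER", "I-PER"]]

print(find_length_mismatches(inputs, good))  # []
print(find_length_mismatches(inputs, bad))   # [0]
```

In this notebook the check could be run as `find_length_mismatches(dev_0_data["x"], dev_0_labels)`; an empty list means the line lengths agree.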

Computing the NER tags

print("Prediction for dev-0 data")
dev_0_labels = predict_ner_tags(dev_0_data["x"])

print()

print("Prediction for test-A data")
test_A_labels = predict_ner_tags(test_A_data["x"])
Prediction for dev-0 data
Predicting NER tags for line 1/215... Done
Predicting NER tags for line 2/215... Done
Predicting NER tags for line 3/215... Done
Predicting NER tags for line 4/215... Done
Predicting NER tags for line 5/215... Done
Predicting NER tags for line 6/215... Done
Predicting NER tags for line 7/215... Done
Predicting NER tags for line 8/215... Done
Predicting NER tags for line 9/215... Done
Predicting NER tags for line 10/215... Done
Predicting NER tags for line 11/215... Done
Predicting NER tags for line 12/215... Done
Predicting NER tags for line 13/215... Done
Predicting NER tags for line 14/215... Done
Predicting NER tags for line 15/215... Done
Predicting NER tags for line 16/215... Done
Predicting NER tags for line 17/215... Done
Predicting NER tags for line 18/215... Done
Predicting NER tags for line 19/215... Done
Predicting NER tags for line 20/215... Done
Predicting NER tags for line 21/215... Done
Predicting NER tags for line 22/215... Done
Predicting NER tags for line 23/215... Done
Predicting NER tags for line 24/215... Done
Predicting NER tags for line 25/215... Done
Predicting NER tags for line 26/215... Done
Predicting NER tags for line 27/215... Done
Predicting NER tags for line 28/215... Done
Predicting NER tags for line 29/215... Done
Predicting NER tags for line 30/215... Done
Predicting NER tags for line 31/215... Done
Predicting NER tags for line 32/215... Done
Predicting NER tags for line 33/215... Done
Predicting NER tags for line 34/215... Done
Predicting NER tags for line 35/215... Done
Predicting NER tags for line 36/215... Done
Predicting NER tags for line 37/215... Done
Predicting NER tags for line 38/215... Done
Predicting NER tags for line 39/215... Done
Predicting NER tags for line 40/215... Done
Predicting NER tags for line 41/215... Done
Predicting NER tags for line 42/215... Done
Predicting NER tags for line 43/215... Done
Predicting NER tags for line 44/215... Done
Predicting NER tags for line 45/215... Done
Predicting NER tags for line 46/215... Done
Predicting NER tags for line 47/215... Done
Predicting NER tags for line 48/215... Done
Predicting NER tags for line 49/215... Done
Predicting NER tags for line 50/215... Done
Predicting NER tags for line 51/215... Done
Predicting NER tags for line 52/215... Done
Predicting NER tags for line 53/215... Done
Predicting NER tags for line 54/215... Done
Predicting NER tags for line 55/215... Done
Predicting NER tags for line 56/215... Done
Predicting NER tags for line 57/215... Done
Predicting NER tags for line 58/215... Done
Predicting NER tags for line 59/215... Done
Predicting NER tags for line 60/215... Done
Predicting NER tags for line 61/215... Done
Predicting NER tags for line 62/215... Done
Predicting NER tags for line 63/215... Done
Predicting NER tags for line 64/215... Done
Predicting NER tags for line 65/215... Done
Predicting NER tags for line 66/215... Done
Predicting NER tags for line 67/215... Done
Predicting NER tags for line 68/215... Done
Predicting NER tags for line 69/215... Done
Predicting NER tags for line 70/215... Done
Predicting NER tags for line 71/215... Done
Predicting NER tags for line 72/215... Done
Predicting NER tags for line 73/215... Done
Predicting NER tags for line 74/215... Done
Predicting NER tags for line 75/215... Done
Predicting NER tags for line 76/215... Done
Predicting NER tags for line 77/215... Done
Predicting NER tags for line 78/215... Done
Predicting NER tags for line 79/215... Done
Predicting NER tags for line 80/215... Done
Predicting NER tags for line 81/215... Done
Predicting NER tags for line 82/215... Done
Predicting NER tags for line 83/215... Done
Predicting NER tags for line 84/215... Done
Predicting NER tags for line 85/215... Done
Predicting NER tags for line 86/215... Done
Predicting NER tags for line 87/215... Done
Predicting NER tags for line 88/215... Done
Predicting NER tags for line 89/215... Done
Predicting NER tags for line 90/215... Done
Predicting NER tags for line 91/215... Done
Predicting NER tags for line 92/215... Done
Predicting NER tags for line 93/215... Done
Predicting NER tags for line 94/215... Done
Predicting NER tags for line 95/215... Done
Predicting NER tags for line 96/215... Done
Predicting NER tags for line 97/215... Done
Predicting NER tags for line 98/215... Done
Predicting NER tags for line 99/215... Done
Predicting NER tags for line 100/215... Done
Predicting NER tags for line 101/215... Done
Predicting NER tags for line 102/215... Done
Predicting NER tags for line 103/215... Done
Predicting NER tags for line 104/215... Done
Predicting NER tags for line 105/215... Done
Predicting NER tags for line 106/215... Done
Predicting NER tags for line 107/215... Done
Predicting NER tags for line 108/215... Done
Predicting NER tags for line 109/215... Done
Predicting NER tags for line 110/215... Done
Predicting NER tags for line 111/215... Done
Predicting NER tags for line 112/215... Done
Predicting NER tags for line 113/215... Done
Predicting NER tags for line 114/215... Done
Predicting NER tags for line 115/215... Done
Predicting NER tags for line 116/215... Done
Predicting NER tags for line 117/215... Done
Predicting NER tags for line 118/215... Done
Predicting NER tags for line 119/215... Done
Predicting NER tags for line 120/215... Done
Predicting NER tags for line 121/215... Done
Predicting NER tags for line 122/215... Done
Predicting NER tags for line 123/215... Done
Predicting NER tags for line 124/215... Done
Predicting NER tags for line 125/215... Done
Predicting NER tags for line 126/215... Done
Predicting NER tags for line 127/215... Done
Predicting NER tags for line 128/215... Done
Predicting NER tags for line 129/215... Done
Predicting NER tags for line 130/215... Done
Predicting NER tags for line 131/215... Done
Predicting NER tags for line 132/215... Done
Predicting NER tags for line 133/215... Done
Predicting NER tags for line 134/215... Done
Predicting NER tags for line 135/215... Done
Predicting NER tags for line 136/215... Done
Predicting NER tags for line 137/215... Done
Predicting NER tags for line 138/215... Done
Predicting NER tags for line 139/215... Done
Predicting NER tags for line 140/215... Done
Predicting NER tags for line 141/215... Done
Predicting NER tags for line 142/215... Done
Predicting NER tags for line 143/215... Done
Predicting NER tags for line 144/215... Done
Predicting NER tags for line 145/215... Done
Predicting NER tags for line 146/215... Done
Predicting NER tags for line 147/215... Done
Predicting NER tags for line 148/215... Done
Predicting NER tags for line 149/215... Done
Predicting NER tags for line 150/215... Done
Predicting NER tags for line 151/215... Done
Predicting NER tags for line 152/215... Done
Predicting NER tags for line 153/215... Done
Predicting NER tags for line 154/215... Done
Predicting NER tags for line 155/215... Done
Predicting NER tags for line 156/215... Done
Predicting NER tags for line 157/215... Done
Predicting NER tags for line 158/215... Done
Predicting NER tags for line 159/215... Done
Predicting NER tags for line 160/215... Done
Predicting NER tags for line 161/215... Done
Predicting NER tags for line 162/215... Done
Predicting NER tags for line 163/215... Done
Predicting NER tags for line 164/215... Done
Predicting NER tags for line 165/215... Done
Predicting NER tags for line 166/215... Done
Predicting NER tags for line 167/215... Done
Predicting NER tags for line 168/215... Done
Predicting NER tags for line 169/215... Done
Predicting NER tags for line 170/215... Done
Predicting NER tags for line 171/215... Done
Predicting NER tags for line 172/215... Done
Predicting NER tags for line 173/215... Done
Predicting NER tags for line 174/215... Done
Predicting NER tags for line 175/215... Done
Predicting NER tags for line 176/215... Done
Predicting NER tags for line 177/215... Done
Predicting NER tags for line 178/215... Done
Predicting NER tags for line 179/215... Done
Predicting NER tags for line 180/215... Done
Predicting NER tags for line 181/215... Done
Predicting NER tags for line 182/215... Done
Predicting NER tags for line 183/215... Done
Predicting NER tags for line 184/215... Done
Predicting NER tags for line 185/215... Done
Predicting NER tags for line 186/215... Done
Predicting NER tags for line 187/215... Done
Predicting NER tags for line 188/215... Done
Predicting NER tags for line 189/215... Done
Predicting NER tags for line 190/215... Done
Predicting NER tags for line 191/215... Done
Predicting NER tags for line 192/215... Done
Predicting NER tags for line 193/215... Done
Predicting NER tags for line 194/215... Done
Predicting NER tags for line 195/215... Done
Predicting NER tags for line 196/215... Done
Predicting NER tags for line 197/215... Done
Predicting NER tags for line 198/215... Done
Predicting NER tags for line 199/215... Done
Predicting NER tags for line 200/215... Done
Predicting NER tags for line 201/215... Done
Predicting NER tags for line 202/215... Done
Predicting NER tags for line 203/215... Done
Predicting NER tags for line 204/215... Done
Predicting NER tags for line 205/215... Done
Predicting NER tags for line 206/215... Done
Predicting NER tags for line 207/215... Done
Predicting NER tags for line 208/215... Done
Predicting NER tags for line 209/215... Done
Predicting NER tags for line 210/215... Done
Predicting NER tags for line 211/215... Done
Predicting NER tags for line 212/215... Done
Predicting NER tags for line 213/215... Done
Predicting NER tags for line 214/215... Done
Predicting NER tags for line 215/215... Done

Prediction for test-A data
Predicting NER tags for line 1/230... Done
Predicting NER tags for line 2/230... Done
Predicting NER tags for line 3/230... Done
Predicting NER tags for line 4/230... Done
Predicting NER tags for line 5/230... Done
Predicting NER tags for line 6/230... Done
Predicting NER tags for line 7/230... Done
Predicting NER tags for line 8/230... Done
Predicting NER tags for line 9/230... Done
Predicting NER tags for line 10/230... Done
Predicting NER tags for line 11/230... Done
Predicting NER tags for line 12/230... Done
Predicting NER tags for line 13/230... Done
Predicting NER tags for line 14/230... Done
Predicting NER tags for line 15/230... Done
Predicting NER tags for line 16/230... Done
Predicting NER tags for line 17/230... Done
Predicting NER tags for line 18/230... Done
Predicting NER tags for line 19/230... Done
Predicting NER tags for line 20/230... Done
Predicting NER tags for line 21/230... Done
Predicting NER tags for line 22/230... Done
Predicting NER tags for line 23/230... Done
Predicting NER tags for line 24/230... Done
Predicting NER tags for line 25/230... Done
Predicting NER tags for line 26/230... Done
Predicting NER tags for line 27/230... Done
Predicting NER tags for line 28/230... Done
Predicting NER tags for line 29/230... Done
Predicting NER tags for line 30/230... Done
Predicting NER tags for line 31/230... Done
Predicting NER tags for line 32/230... Done
Predicting NER tags for line 33/230... Done
Predicting NER tags for line 34/230... Done
Predicting NER tags for line 35/230... Done
Predicting NER tags for line 36/230... Done
Predicting NER tags for line 37/230... Done
Predicting NER tags for line 38/230... Done
Predicting NER tags for line 39/230... Done
Predicting NER tags for line 40/230... Done
Predicting NER tags for line 41/230... Done
Predicting NER tags for line 42/230... Done
Predicting NER tags for line 43/230... Done
Predicting NER tags for line 44/230... Done
Predicting NER tags for line 45/230... Done
Predicting NER tags for line 46/230... Done
Predicting NER tags for line 47/230... Done
Predicting NER tags for line 48/230... Done
Predicting NER tags for line 49/230... Done
Predicting NER tags for line 50/230... Done
Predicting NER tags for line 51/230... Done
Predicting NER tags for line 52/230... Done
Predicting NER tags for line 53/230... Done
Predicting NER tags for line 54/230... Done
Predicting NER tags for line 55/230... Done
Predicting NER tags for line 56/230... Done
Predicting NER tags for line 57/230... Done
Predicting NER tags for line 58/230... Done
Predicting NER tags for line 59/230... Done
Predicting NER tags for line 60/230... Done
Predicting NER tags for line 61/230... Done
Predicting NER tags for line 62/230... Done
Predicting NER tags for line 63/230... Done
Predicting NER tags for line 64/230... Done
Predicting NER tags for line 65/230... Done
Predicting NER tags for line 66/230... Done
Predicting NER tags for line 67/230... Done
Predicting NER tags for line 68/230... Done
Predicting NER tags for line 69/230... Done
Predicting NER tags for line 70/230... Done
Predicting NER tags for line 71/230... Done
Predicting NER tags for line 72/230... Done
Predicting NER tags for line 73/230... Done
Predicting NER tags for line 74/230... Done
Predicting NER tags for line 75/230... Done
Predicting NER tags for line 76/230... Done
Predicting NER tags for line 77/230... Done
Predicting NER tags for line 78/230... Done
Predicting NER tags for line 79/230... Done
Predicting NER tags for line 80/230... Done
Predicting NER tags for line 81/230... Done
Predicting NER tags for line 82/230... Done
Predicting NER tags for line 83/230... Done
Predicting NER tags for line 84/230... Done
Predicting NER tags for line 85/230... Done
Predicting NER tags for line 86/230... Done
Predicting NER tags for line 87/230... Done
Predicting NER tags for line 88/230... Done
Predicting NER tags for line 89/230... Done
Predicting NER tags for line 90/230... Done
Predicting NER tags for line 91/230... Done
Predicting NER tags for line 92/230... Done
Predicting NER tags for line 93/230... Done
Predicting NER tags for line 94/230... Done
Predicting NER tags for line 95/230... Done
Predicting NER tags for line 96/230... Done
Predicting NER tags for line 97/230... Done
Predicting NER tags for line 98/230... Done
Predicting NER tags for line 99/230... Done
Predicting NER tags for line 100/230... Done
Predicting NER tags for line 101/230... Done
Predicting NER tags for line 102/230... Done
Predicting NER tags for line 103/230... Done
Predicting NER tags for line 104/230... Done
Predicting NER tags for line 105/230... Done
Predicting NER tags for line 106/230... Done
Predicting NER tags for line 107/230... Done
Predicting NER tags for line 108/230... Done
Predicting NER tags for line 109/230... Done
Predicting NER tags for line 110/230... Done
Predicting NER tags for line 111/230... Done
Predicting NER tags for line 112/230... Done
Predicting NER tags for line 113/230... Done
Predicting NER tags for line 114/230... Done
Predicting NER tags for line 115/230... Done
Predicting NER tags for line 116/230... Done
Predicting NER tags for line 117/230... Done
Predicting NER tags for line 118/230... Done
Predicting NER tags for line 119/230... Done
Predicting NER tags for line 120/230... Done
Predicting NER tags for line 121/230... Done
Predicting NER tags for line 122/230... Done
Predicting NER tags for line 123/230... Done
Predicting NER tags for line 124/230... Done
Predicting NER tags for line 125/230... Done
Predicting NER tags for line 126/230... Done
Predicting NER tags for line 127/230... Done
Predicting NER tags for line 128/230... Done
Predicting NER tags for line 129/230... Done
Predicting NER tags for line 130/230... Done
Predicting NER tags for line 131/230... Done
Predicting NER tags for line 132/230... Done
Predicting NER tags for line 133/230... Done
Predicting NER tags for line 134/230... Done
Predicting NER tags for line 135/230... Done
Predicting NER tags for line 136/230... Done
Predicting NER tags for line 137/230... Done
Predicting NER tags for line 138/230... Done
Predicting NER tags for line 139/230... Done
Predicting NER tags for line 140/230... Done
Predicting NER tags for line 141/230... Done
Predicting NER tags for line 142/230... Done
Predicting NER tags for line 143/230... Done
Predicting NER tags for line 144/230... Done
Predicting NER tags for line 145/230... Done
Predicting NER tags for line 146/230... Done
Predicting NER tags for line 147/230... Done
Predicting NER tags for line 148/230... Done
Predicting NER tags for line 149/230... Done
Predicting NER tags for line 150/230... Done
Predicting NER tags for line 151/230... Done
Predicting NER tags for line 152/230... Done
Predicting NER tags for line 153/230... Done
Predicting NER tags for line 154/230... Done
Predicting NER tags for line 155/230... Done
Predicting NER tags for line 156/230... Done
Predicting NER tags for line 157/230... Done
Predicting NER tags for line 158/230... Done
Predicting NER tags for line 159/230... Done
Predicting NER tags for line 160/230... Done
Predicting NER tags for line 161/230... Done
Predicting NER tags for line 162/230... Done
Predicting NER tags for line 163/230... Done
Predicting NER tags for line 164/230... Done
Predicting NER tags for line 165/230... Done
Predicting NER tags for line 166/230... Done
Predicting NER tags for line 167/230... Done
Predicting NER tags for line 168/230... Done
Predicting NER tags for line 169/230... Done
Predicting NER tags for line 170/230... Done
Predicting NER tags for line 171/230... Done
Predicting NER tags for line 172/230... Done
Predicting NER tags for line 173/230... Done
Predicting NER tags for line 174/230... Done
Predicting NER tags for line 175/230... Done
Predicting NER tags for line 176/230... Done
Predicting NER tags for line 177/230... Done
Predicting NER tags for line 178/230... Done
Predicting NER tags for line 179/230... Done
Predicting NER tags for line 180/230... Done
Predicting NER tags for line 181/230... Done
Predicting NER tags for line 182/230... Done
Predicting NER tags for line 183/230... Done
Predicting NER tags for line 184/230... Done
Predicting NER tags for line 185/230... Done
Predicting NER tags for line 186/230... Done
Predicting NER tags for line 187/230... Done
Predicting NER tags for line 188/230... Done
Predicting NER tags for line 189/230... Done
Predicting NER tags for line 190/230... Done
Predicting NER tags for line 191/230... Done
Predicting NER tags for line 192/230... Done
Predicting NER tags for line 193/230... Done
Predicting NER tags for line 194/230... Done
Predicting NER tags for line 195/230... Done
Predicting NER tags for line 196/230... Done
Predicting NER tags for line 197/230... Done
Predicting NER tags for line 198/230... Done
Predicting NER tags for line 199/230... Done
Predicting NER tags for line 200/230... Done
Predicting NER tags for line 201/230... Done
Predicting NER tags for line 202/230... Done
Predicting NER tags for line 203/230... Done
Predicting NER tags for line 204/230... Done
Predicting NER tags for line 205/230... Done
Predicting NER tags for line 206/230... Done
Predicting NER tags for line 207/230... Done
Predicting NER tags for line 208/230... Done
Predicting NER tags for line 209/230... Done
Predicting NER tags for line 210/230... Done
Predicting NER tags for line 211/230... Done
Predicting NER tags for line 212/230... Done
Predicting NER tags for line 213/230... Done
Predicting NER tags for line 214/230... Done
Predicting NER tags for line 215/230... Done
Predicting NER tags for line 216/230... Done
Predicting NER tags for line 217/230... Done
Predicting NER tags for line 218/230... Done
Predicting NER tags for line 219/230... Done
Predicting NER tags for line 220/230... Done
Predicting NER tags for line 221/230... Done
Predicting NER tags for line 222/230... Done
Predicting NER tags for line 223/230... Done
Predicting NER tags for line 224/230... Done
Predicting NER tags for line 225/230... Done
Predicting NER tags for line 226/230... Done
Predicting NER tags for line 227/230... Done
Predicting NER tags for line 228/230... Done
Predicting NER tags for line 229/230... Done
Predicting NER tags for line 230/230... Done

Writing the results to files

save_predictions(dev_0_labels, "dev-0/out.tsv")
save_predictions(test_A_labels, "test-A/out.tsv")