Imports
import numpy as np
import pandas as pd
Loading the data
df = pd.read_csv('data4.csv')
y = pd.DataFrame(df['isGoal'])
X = df.drop(['isGoal'], axis=1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train.head()
| | match_minute | match_second | position_x | position_y | play_type | BodyPart | Number_Intervening_Opponents | Number_Intervening_Teammates | Interference_on_Shooter | outcome | ... | Interference_on_Shooter_Code | distance_to_goalM | distance_to_centerM | angle | isFoot | isHead | header_distance_to_goalM | High | Low | Medium |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8767 | 13 | 28 | 9.23 | -2.24 | Open Play | Head | 3 | 0 | Medium | Goal | ... | 2 | 9.499168 | 2.245283 | 13.672174 | 0 | 1 | 9.499168 | 0 | 0 | 1 |
| 5798 | 78 | 9 | 14.46 | 12.72 | Open Play | Left | 3 | 0 | Low | Saved | ... | 1 | 19.278332 | 12.750000 | 41.404002 | 1 | 0 | 0.000000 | 0 | 1 | 0 |
| 6018 | 78 | 27 | 9.73 | 14.22 | Open Play | Left | 2 | 0 | Low | Missed | ... | 1 | 17.257933 | 14.253538 | 55.681087 | 1 | 0 | 0.000000 | 0 | 1 | 0 |
| 4961 | 34 | 34 | 34.91 | 0.25 | Open Play | Right | 4 | 1 | Low | Saved | ... | 1 | 34.910899 | 0.250590 | 0.411271 | 1 | 0 | 0.000000 | 0 | 1 | 0 |
| 447 | 52 | 57 | 26.93 | 1.00 | Open Play | Left | 2 | 0 | Medium | Saved | ... | 2 | 26.948648 | 1.002358 | 2.131616 | 1 | 0 | 0.000000 | 0 | 0 | 1 |
5 rows × 29 columns
y_train.head()
| | isGoal |
|---|---|
| 8767 | 1 |
| 5798 | 0 |
| 6018 | 0 |
| 4961 | 0 |
| 447 | 0 |
Data preparation
X_train.columns
Index(['match_minute', 'match_second', 'position_x', 'position_y', 'play_type', 'BodyPart', 'Number_Intervening_Opponents', 'Number_Intervening_Teammates', 'Interference_on_Shooter', 'outcome', 'position_xM', 'position_yM', 'position_xM_r', 'position_yM_r', 'position_xM_std', 'position_yM_std', 'position_xM_std_r', 'position_yM_std_r', 'BodyPartCode', 'Interference_on_Shooter_Code', 'distance_to_goalM', 'distance_to_centerM', 'angle', 'isFoot', 'isHead', 'header_distance_to_goalM', 'High', 'Low', 'Medium'], dtype='object')
Selected features:
- Shooter's x coordinate,
- Shooter's y coordinate,
- Distance to goal,
- Angle to goal,
- Match minute,
- Number of opponents in front of the ball,
- Number of teammates in front of the ball,
- Body part (encoded as isFoot and isHead).
# One shared column list keeps the train and test feature sets in sync.
features = ['position_x', 'position_y', 'distance_to_goalM',
            'angle', 'match_minute', 'Number_Intervening_Opponents',
            'Number_Intervening_Teammates', 'isFoot', 'isHead']
X_train_extracted = X_train[features]
X_test_extracted = X_test[features]
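The distance_to_goalM and angle values shown by X_train.head() appear to be consistent with simple geometry on position_x and distance_to_centerM: a Euclidean distance and an arctangent in degrees. The notebook does not say how these columns were built, so the formula below is an assumption inferred from the five displayed rows, and `distance_and_angle` is a hypothetical helper name. A minimal sketch that checks the assumption against those rows:

```python
import numpy as np


def distance_and_angle(position_x, distance_to_center):
    """Hypothetical helper: distance and angle (in degrees) to the goal centre.

    Assumed formula, inferred from the displayed rows, not confirmed by the source.
    """
    distance = np.hypot(position_x, distance_to_center)
    angle = np.degrees(np.arctan2(distance_to_center, position_x))
    return distance, angle


# (position_x, distance_to_centerM, distance_to_goalM, angle)
# copied from the rows shown by X_train.head()
sample_rows = [
    (9.23, 2.245283, 9.499168, 13.672174),
    (14.46, 12.750000, 19.278332, 41.404002),
    (9.73, 14.253538, 17.257933, 55.681087),
    (34.91, 0.250590, 34.910899, 0.411271),
    (26.93, 1.002358, 26.948648, 2.131616),
]

max_distance_error = 0.0
max_angle_error = 0.0
for x, c, distance_ref, angle_ref in sample_rows:
    distance, angle = distance_and_angle(x, c)
    max_distance_error = max(max_distance_error, abs(distance - distance_ref))
    max_angle_error = max(max_angle_error, abs(angle - angle_ref))

print(f'Largest distance deviation: {max_distance_error:.6f}')
print(f'Largest angle deviation: {max_angle_error:.6f}')
```

If the deviations stay near zero, the model is receiving position, distance, and angle features that are deterministic functions of one another, which is worth keeping in mind when reading the fitted coefficients.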
Model training
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=500)
# Pass the target as a 1-D Series; a one-column DataFrame triggers a DataConversionWarning.
model.fit(X_train_extracted, y_train['isGoal'])
LogisticRegression(max_iter=500)
Model evaluation
from sklearn.metrics import roc_auc_score
# Accuracy and ROC-AUC are fractions in [0, 1], so they are printed without a percent sign.
print(f'The test set contains {len(y_test)} shots, of which {y_test.sum()["isGoal"]} are goals.')
print(f'Classification accuracy (goal vs. no goal): {model.score(X_test_extracted, y_test):.2f}.')
print(f'The classifier achieved a ROC-AUC of {roc_auc_score(y_test, model.predict_proba(X_test_extracted)[:, 1]):.2f}.')
The test set contains 2033 shots, of which 236 are goals.
Classification accuracy (goal vs. no goal): 0.89.
The classifier achieved a ROC-AUC of 0.76.
from sklearn.metrics import classification_report
print(classification_report(y_test, model.predict(X_test_extracted)))
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.89 | 0.99 | 0.94 | 1797 |
| 1 | 0.59 | 0.09 | 0.16 | 236 |
| accuracy | | | 0.89 | 2033 |
| macro avg | 0.74 | 0.54 | 0.55 | 2033 |
| weighted avg | 0.86 | 0.89 | 0.85 | 2033 |
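The report shows a recall of 0.09 for class 1, which reflects the default 0.5 decision threshold applied by `predict` on a test set where only 236 of 2033 shots are goals. The sketch below illustrates, on synthetic imbalanced data rather than the shot data, how thresholding `predict_proba` by hand relates to `predict` and how a lower threshold affects recall. The dataset, the names `X_demo`, `y_demo`, and `demo_model`, and the 0.2 threshold are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 10% positives; NOT the shot data.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)
demo_model = LogisticRegression(max_iter=500).fit(X_tr, y_tr)

# Probability of the positive class for every test sample.
proba = demo_model.predict_proba(X_te)[:, 1]

# Thresholding at 0.5 reproduces what predict() does for a binary model.
pred_default = (proba > 0.5).astype(int)
# Hypothetical lower threshold: more samples are flagged as positive.
pred_lower = (proba > 0.2).astype(int)

recall_default = recall_score(y_te, pred_default)
recall_lower = recall_score(y_te, pred_lower)
print(f'Recall at threshold 0.5: {recall_default:.2f}')
print(f'Recall at threshold 0.2: {recall_lower:.2f}')
```

Lowering the threshold can only keep or increase recall, usually at the cost of precision. For an expected-goals style model the predicted probabilities themselves are often the quantity of interest, which is why ROC-AUC is reported alongside accuracy.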
Saving the model
from joblib import dump
dump(model, 'regresja_logistyczna.joblib')
['regresja_logistyczna.joblib']
Loading the model
from joblib import load
model2 = load('regresja_logistyczna.joblib')
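A quick way to confirm that a save and load round trip preserved the model is to compare the predictions of both objects on the same inputs. The sketch below does this on a small synthetic dataset with a temporary file, so it runs without data4.csv; the names `original`, `restored`, and `demo_model.joblib` are assumptions used only for illustration.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset; NOT the shot data.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
original = LogisticRegression(max_iter=500).fit(X_demo, y_demo)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_model.joblib')
    dump(original, path)
    restored = load(path)

# The restored model should give the same probabilities as the original.
same_probabilities = np.allclose(
    original.predict_proba(X_demo), restored.predict_proba(X_demo)
)
print(f'Round trip preserved predictions: {same_probabilities}')
```

In the notebook, the same comparison could be made between `model` and `model2` on `X_test_extracted`. Files written by joblib are generally best loaded with the same scikit-learn version that produced them.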