Notebook for the first subtask of the Machine Learning Engineering (Inżynieria Uczenia Maszynowego) class project.
This notebook downloads and normalizes the dataset I will be working on, splits it into subsets, and prints a short summary of the dataset and each subset.
Link to the dataset on Kaggle.com:
Google Colab setup
from google.colab import drive
drive.mount('drive')
Drive already mounted at drive; to attempt to forcibly remount, call drive.mount("drive", force_remount=True).
- When prompted, click through the Colab dialog to allow Colab to access and modify your Google Drive files.
!mkdir -p ~/.kaggle
!cp drive/MyDrive/kaggle.json ~/.kaggle/.
!chmod 600 ~/.kaggle/kaggle.json
!pip install -q kaggle
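For reference, the same credential setup can be sketched in plain Python. The helper install_kaggle_token is hypothetical; it mirrors the mkdir/cp/chmod lines above and assumes the token sits at drive/MyDrive/kaggle.json. The token should be readable and writable by its owner only (mode 600); as far as I know, the Kaggle CLI prints a warning when the key is readable by other users.

```python
import os
import shutil
import stat


def install_kaggle_token(src_path, kaggle_dir=os.path.expanduser("~/.kaggle")):
    """Hypothetical helper: copy a Kaggle API token into kaggle_dir
    and restrict it to owner read/write (like `chmod 600`)."""
    os.makedirs(kaggle_dir, exist_ok=True)        # like `mkdir -p ~/.kaggle`
    dest = os.path.join(kaggle_dir, "kaggle.json")
    shutil.copy(src_path, dest)                   # like `cp ... ~/.kaggle/.`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)   # like `chmod 600`
    return dest
```

In the notebook this would be called as install_kaggle_token("drive/MyDrive/kaggle.json").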
script for lab IUM-01
download data
!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1
read the data as a pandas DataFrame
import pandas as pd
df = pd.read_csv('smart_grid_stability_augmented.csv')
standardize the values with StandardScaler, so each column has zero mean and unit variance (unlike min-max scaling, this does not bound them to [0, 1])
from sklearn import preprocessing
# fit on every column except the last one (stabf, the text label)
scaler = preprocessing.StandardScaler().fit(df.iloc[:, 0:-1])
df_norm_array = scaler.transform(df.iloc[:, 0:-1])
df_norm = pd.DataFrame(data=df_norm_array,
                       columns=df.columns[:-1])
df_norm['stabf'] = df['stabf']
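Because standardization is easy to confuse with min-max scaling, here is a small NumPy sketch of both transforms on a toy array (the array and the function names are illustrative, not from the dataset). Standardization subtracts each column's mean and divides by its population standard deviation (ddof=0, which I understand to be what StandardScaler uses), so values can be negative or greater than 1; only min-max scaling maps each column onto [0, 1]. Neither sketch handles constant columns.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: (x - mean) / std, with the population std (ddof=0)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def min_max(X):
    """Column-wise rescaling onto [0, 1]: (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
Z = standardize(X)
M = min_max(X)
print(Z.mean(axis=0), Z.std(axis=0))  # approximately [0, 0] and [1, 1]
print(M.min(axis=0), M.max(axis=0))   # [0. 0.] [1. 1.]
print(Z.min(), Z.max())               # standardized values fall outside [0, 1]
```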
drop the continuous stab column, separate the stabf labels, and divide the data into train (80%), test (10%) and validation (10%) subsets
from sklearn.model_selection import train_test_split
df_norm_data = df_norm.copy()
df_norm_data = df_norm_data.drop('stab', axis=1)
df_norm_labels = df_norm_data.pop('stabf')
# 80% train, 20% held out
X_train, X_testAndValid, Y_train, Y_testAndValid = train_test_split(
    df_norm_data,
    df_norm_labels,
    test_size=0.2,
    random_state=42)
# split the held-out 20% in half: 10% test, 10% validation
X_test, X_valid, Y_test, Y_valid = train_test_split(
    X_testAndValid,
    Y_testAndValid,
    test_size=0.5,
    random_state=42)
train = pd.concat([X_train, Y_train], axis=1)
test = pd.concat([X_test, Y_test], axis=1)
valid = pd.concat([X_valid, Y_valid], axis=1)
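The subset sizes produced by the two train_test_split calls can be worked out by hand for the 60,000 rows in the file. This sketch assumes that train_test_split rounds a fractional test_size up (ceil) and gives the remaining rows to the first returned subset; split_sizes is a hypothetical helper. Note that in the second call the first returned subset is X_test and the second is X_valid.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper: (first, second) subset sizes for a float test_size,
    assuming the second subset gets ceil(test_size * n) rows."""
    n_second = math.ceil(test_size * n_samples)
    return n_samples - n_second, n_second


n_train, n_rest = split_sizes(60000, 0.2)   # train vs. test+valid
n_test, n_valid = split_sizes(n_rest, 0.5)  # test vs. valid
print(n_train, n_test, n_valid)             # 48000 6000 6000
```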
print a short summary of the dataset and its subsets
subsets = {'dataset': df_norm, 'train': train, 'test': test, 'valid': valid}
for name, x in subsets.items():
    print(name)
    print("size:", len(x))
    print(x.describe(include='all'))
    print("class distribution", x.value_counts('stabf'))
    print('===============================================================')
script for lab IUM-03
download data
!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip -o smart-grid-stability.zip >>/dev/null 2>&1
check how many lines the dataset file has (60,000 data rows plus the header line)
!wc -l smart_grid_stability_augmented.csv
60001 smart_grid_stability_augmented.csv
take a look at the dataset to choose columns to keep
import pandas as pd
df = pd.read_csv('smart_grid_stability_augmented.csv')
df.head()
|   | tau1 | tau2 | tau3 | tau4 | p1 | p2 | p3 | p4 | g1 | g2 | g3 | g4 | stab | stabf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.959060 | 3.079885 | 8.381025 | 9.780754 | 3.763085 | -0.782604 | -1.257395 | -1.723086 | 0.650456 | 0.859578 | 0.887445 | 0.958034 | 0.055347 | unstable |
| 1 | 9.304097 | 4.902524 | 3.047541 | 1.369357 | 5.067812 | -1.940058 | -1.872742 | -1.255012 | 0.413441 | 0.862414 | 0.562139 | 0.781760 | -0.005957 | stable |
| 2 | 8.971707 | 8.848428 | 3.046479 | 1.214518 | 3.405158 | -1.207456 | -1.277210 | -0.920492 | 0.163041 | 0.766689 | 0.839444 | 0.109853 | 0.003471 | unstable |
| 3 | 0.716415 | 7.669600 | 4.486641 | 2.340563 | 3.963791 | -1.027473 | -1.938944 | -0.997374 | 0.446209 | 0.976744 | 0.929381 | 0.362718 | 0.028871 | unstable |
| 4 | 3.134112 | 7.608772 | 4.943759 | 9.857573 | 3.525811 | -1.125531 | -1.845975 | -0.554305 | 0.797110 | 0.455450 | 0.656947 | 0.820923 | 0.049860 | unstable |
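In the five rows shown, stabf is 'unstable' exactly when stab is positive. That is consistent with stabf being derived from the sign of stab, and if so, keeping stab as a model input would leak the label, which is a plausible reason the IUM-01 script drops it. The check below only covers these five rows; label_from_stab is a hypothetical rule, not something the dataset documentation is quoted on here.

```python
# The five (stab, stabf) pairs shown in df.head() above
rows = [
    (0.055347, "unstable"),
    (-0.005957, "stable"),
    (0.003471, "unstable"),
    (0.028871, "unstable"),
    (0.049860, "unstable"),
]


def label_from_stab(stab):
    """Hypothetical rule: positive stab -> 'unstable', otherwise 'stable'."""
    return "unstable" if stab > 0 else "stable"


print(all(label_from_stab(s) == f for s, f in rows))  # True
```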
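drop the header line; keep only fields 1, 5, 9, 13 and 14 (tau1, p1, g1, stab and stabf); shuffle the rows; divide them into train (48,000 rows), test (6,000 rows) and validation (6,000 rows) subsets; and print the number of rows in each subset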
!sed 1d smart_grid_stability_augmented.csv | cut -f 1,5,9,13,14 -d "," | shuf | split -l 48000
!mv xaa train.csv
!mv xab toDivide
!split -l 6000 toDivide
!mv xaa test.csv
!mv xab valid.csv
!wc -l train.csv
!wc -l test.csv
!wc -l valid.csv
48000 train.csv
6000 test.csv
6000 valid.csv
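The shell pipeline above can also be sketched in pure Python, which may be handy where sed, cut, shuf and split are unavailable. shuffle_and_split is a hypothetical helper: cut's 1-based fields 1, 5, 9, 13, 14 become 0-based indices 0, 4, 8, 12, 13, and the optional seed makes the shuffle reproducible, which plain shuf is not. Unlike cut -d ",", the csv module respects quoted fields.

```python
import csv
import random


def shuffle_and_split(src, columns, sizes, seed=None):
    """Hypothetical analogue of `sed 1d | cut -f ... -d "," | shuf | split -l ...`.

    columns: 0-based indices of the columns to keep
    sizes:   row counts of consecutive chunks, e.g. (48000, 6000, 6000)
    """
    with open(src, newline="") as f:
        reader = csv.reader(f)
        next(reader)                                         # drop header, like `sed 1d`
        rows = [[row[i] for i in columns] for row in reader]  # keep columns, like `cut`
    random.Random(seed).shuffle(rows)                        # like `shuf`
    chunks, start = [], 0
    for size in sizes:                                       # like the two `split -l` calls
        chunks.append(rows[start:start + size])
        start += size
    return chunks
```

For this dataset the call would be shuffle_and_split('smart_grid_stability_augmented.csv', [0, 4, 8, 12, 13], (48000, 6000, 6000), seed=42); each chunk can then be written out with csv.writer.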
script for lab IUM-05 - Model and training
first run the lab IUM-01 script! This section uses X_train, Y_train, X_test and Y_test created there.
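import tensorflow as tf
from tensorflow.keras import layers
# 12 input features (tau1-4, p1-4, g1-4). The two hidden Dense layers have no
# activation, so they are linear and the network can only represent what a
# single linear layer followed by softmax can.
model = tf.keras.Sequential([
    layers.Input(shape=(12,)),
    layers.Dense(32),
    layers.Dense(16),
    layers.Dense(2, activation='softmax')  # index 0: 'unstable', index 1: 'stable'
])
model.compile(
    loss=tf.losses.BinaryCrossentropy(),
    optimizer=tf.optimizers.Adam(),
    metrics=[tf.keras.metrics.BinaryAccuracy()])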
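Pairing BinaryCrossentropy with a 2-unit softmax and one-hot targets works because, for two softmax outputs, p1 = 1 - p0: both units then contribute -log(p_true), and their average equals the categorical cross-entropy. The NumPy sketch below checks this on made-up logits; both loss functions are simplified re-implementations assumed to follow Keras's definitions (binary cross-entropy averaged over the last axis, then over samples), not the Keras code itself.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred):
    """Simplified binary cross-entropy: mean over the last axis, then over samples."""
    p = np.clip(y_pred, 1e-7, 1 - 1e-7)
    per_unit = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return per_unit.mean(axis=-1).mean()


def categorical_crossentropy(y_true, y_pred):
    """Simplified categorical cross-entropy: -log(probability of the true class)."""
    p = np.clip(y_pred, 1e-7, 1.0)
    return -(y_true * np.log(p)).sum(axis=-1).mean()


logits = np.array([[2.0, -1.0], [0.3, 0.8], [-0.5, 1.5]])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
y = np.eye(2)[[0, 1, 1]]                                              # one-hot targets
# the two values agree up to floating-point rounding
print(binary_crossentropy(y, probs), categorical_crossentropy(y, probs))
```

A similar argument applies to BinaryAccuracy, which thresholds each output at 0.5: with a 2-unit softmax both outputs are either right or wrong together, so it matches argmax accuracy except at an exact 0.5 tie.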
import numpy as np
def onezero(label):
    return 0 if label == 'unstable' else 1
Y_train_one_zero = [onezero(x) for x in Y_train]
Y_train_onehot = np.eye(2)[Y_train_one_zero]
Y_test_one_zero = [onezero(x) for x in Y_test]
Y_test_onehot = np.eye(2)[Y_test_one_zero]
history = model.fit(tf.convert_to_tensor(X_train, np.float32),
                    Y_train_onehot, epochs=5)
Epoch 1/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3926 - binary_accuracy: 0.8149
Epoch 2/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3925 - binary_accuracy: 0.8161
Epoch 3/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3922 - binary_accuracy: 0.8150
Epoch 4/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3927 - binary_accuracy: 0.8146
Epoch 5/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3926 - binary_accuracy: 0.8143
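The label encoding used before training (onezero followed by np.eye(2)[...]) simply picks rows of the 2x2 identity matrix: row 0 for 'unstable' and row 1 for 'stable'. A self-contained sketch of the same mapping; one_hot_labels is a hypothetical name that combines the two steps.

```python
import numpy as np


def one_hot_labels(labels):
    """Hypothetical helper: 'unstable' -> [1, 0], anything else ('stable') -> [0, 1]."""
    idx = [0 if label == "unstable" else 1 for label in labels]
    return np.eye(2)[idx]  # row i of the identity matrix is the one-hot vector for class i


print(one_hot_labels(["unstable", "stable", "unstable"]))  # rows [1, 0], [0, 1], [1, 0]
```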
model.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_28 (Dense)             (None, 32)                416
_________________________________________________________________
dense_29 (Dense)             (None, 16)                528
_________________________________________________________________
dense_30 (Dense)             (None, 2)                 34
=================================================================
Total params: 978
Trainable params: 978
Non-trainable params: 0
_________________________________________________________________
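The Param # column can be checked by hand: a Dense layer with a bias has n_in * n_out weights plus n_out biases. The sketch below reproduces the summary for the 12 -> 32 -> 16 -> 2 architecture; dense_params is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


layer_sizes = [12, 32, 16, 2]  # input features, then the three Dense layers
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))     # [416, 528, 34] 978
```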
results = model.evaluate(X_test, Y_test_onehot, batch_size=64)
print('test loss: ',results[0])
print('test acc: ', results[1])
94/94 [==============================] - 0s 1ms/step - loss: 0.3933 - binary_accuracy: 0.8112
test loss: 0.3933383822441101
test acc: 0.8111666440963745
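The step counts in the progress bars follow from the batch sizes: fit() is called without batch_size, so Keras uses its default of 32 over the 48,000 training rows, while evaluate() uses batch_size=64 over the 6,000 test rows, with a partial final batch. A small sketch of that arithmetic; steps is a hypothetical helper.

```python
import math


def steps(n_samples, batch_size):
    """Number of batches needed to cover n_samples (the last one may be partial)."""
    return math.ceil(n_samples / batch_size)


print(steps(48000, 32))  # 1500, as in the training log
print(steps(6000, 64))   # 94, as in the evaluation log
```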