ium_470607/ium01.ipynb
2021-04-24 13:31:02 +02:00


Notebook for the first subtask of the Inżynieria Uczenia Maszynowego (Machine Learning Engineering) class project.

This notebook downloads the dataset I will be working on, normalizes it, splits it into subsets, and prints a short summary of the dataset and of each subset.

Link to the dataset at Kaggle.com:

https://www.kaggle.com/pcbreviglieri/smart-grid-stability

from google.colab import drive
drive.mount('drive')
Drive already mounted at drive; to attempt to forcibly remount, call drive.mount("drive", force_remount=True).
Note: when prompted, confirm in the Colab GUI that Colab may access and modify your Google Drive files.
!mkdir -p ~/.kaggle
!cp drive/MyDrive/kaggle.json ~/.kaggle/
# owner read/write only; the Kaggle CLI warns if the token is readable by other users
!chmod 600 ~/.kaggle/kaggle.json
!pip install -q kaggle
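The same credential setup can be written in plain Python, which avoids shell escapes. A minimal sketch, assuming the token sits at `drive/MyDrive/kaggle.json` as in the cell above; `install_kaggle_token` is a hypothetical helper name, not part of the Kaggle package:

```python
import os
import shutil
import stat


def install_kaggle_token(src, kaggle_dir=os.path.expanduser("~/.kaggle")):
    """Copy a Kaggle API token into kaggle_dir with owner-only permissions.

    Hypothetical helper that mirrors the mkdir / cp / chmod shell cells.
    """
    os.makedirs(kaggle_dir, exist_ok=True)       # like `mkdir -p`
    dest = os.path.join(kaggle_dir, "kaggle.json")
    shutil.copyfile(src, dest)                   # like `cp`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # like `chmod 600`
    return dest
```

Usage would be `install_kaggle_token("drive/MyDrive/kaggle.json")`, followed by the `pip install` cell as before.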

script for lab IUM-01

download data

!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1
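If you prefer to stay in Python for the extraction step, the standard-library `zipfile` module does what `unzip` does here. A minimal sketch; `extract_archive` is a hypothetical helper name, and it returns the member names so you can see which files the archive contained instead of discarding the output:

```python
import zipfile


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)  # creates dest_dir if it does not exist
    return names
```

For example, `extract_archive("smart-grid-stability.zip")` extracts into the current directory, where the next cell expects `smart_grid_stability_augmented.csv`.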

read the data into a pandas DataFrame

import pandas as pd

df = pd.read_csv('smart_grid_stability_augmented.csv')

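standardize the feature values with StandardScaler, so that each column has zero mean and unit variance; note that this does not squeeze them into [0, 1] (that would need MinMaxScaler)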

from sklearn import preprocessing

# fit the scaler on every column except the last one ('stabf', the text label)
scaler = preprocessing.StandardScaler().fit(df.iloc[:, 0:-1])
df_norm_array = scaler.transform(df.iloc[:, 0:-1])
df_norm = pd.DataFrame(data=df_norm_array,
                       columns=df.columns[:-1])
# re-attach the unscaled class labels
df_norm['stabf'] = df['stabf']
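For reference, the two kinds of scaling behave differently: StandardScaler maps each column to z = (x - mean) / std, which is centered on 0 and not bounded, while min-max scaling, x' = (x - min) / (max - min), maps into [0, 1] (sklearn's MinMaxScaler with its default `feature_range`). A pure-Python sketch contrasting the two on the five `tau1` values that `df.head()` shows in the IUM-03 part; StandardScaler uses the population standard deviation, mirrored here with `statistics.pstdev`:

```python
import statistics


def standardize(values):
    """z-score scaling, as StandardScaler does per column (population std, ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Min-max scaling into [0, 1], as MinMaxScaler does with its default range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


tau1 = [2.959060, 9.304097, 8.971707, 0.716415, 3.134112]  # first five tau1 values
print([round(z, 3) for z in standardize(tau1)])  # centered on 0, some values negative
print([round(m, 3) for m in min_max(tau1)])      # every value within [0, 1]
```

This sketch does not guard against a constant column (zero spread); sklearn's scalers handle that case themselves.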

divide the data into train (80%), test (10%) and validation (10%) subsets

from sklearn.model_selection import train_test_split

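df_norm_data = df_norm.copy()
# drop 'stab': 'stabf' is derived from the raw stab value, so keeping it would leak the label
df_norm_data = df_norm_data.drop('stab', axis=1)
# move the class labels into a separate Series
df_norm_labels = df_norm_data.pop('stabf')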

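# first split: 80% train, 20% held out for test + validation
X_train, X_testAndValid, Y_train, Y_testAndValid = train_test_split(
    df_norm_data,
    df_norm_labels,
    test_size=0.2,
    random_state=42)

# second split: halve the held-out 20% into test and validation (10% each)
X_test, X_valid, Y_test, Y_valid = train_test_split(
    X_testAndValid,
    Y_testAndValid,
    test_size=0.5,
    random_state=42)

# re-attach the labels so each subset can be summarized as one table
train = pd.concat([X_train, Y_train], axis=1)
test = pd.concat([X_test, Y_test], axis=1)
valid = pd.concat([X_valid, Y_valid], axis=1)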
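The subset sizes follow from how `train_test_split` rounds: to my understanding it takes ceil(test_size * n) rows for the test part and gives the rest to the train part. A pure-Python sketch of the same two-stage split on 60000 row indices; `three_way_split` is a hypothetical helper, and its shuffle order will not match sklearn's, only the sizes follow the same rounding:

```python
import math
import random


def three_way_split(rows, holdout=0.2, seed=42):
    """Carve off `holdout` of the shuffled rows, then halve that part into test/valid."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_hold = math.ceil(holdout * len(rows))  # first call: test_size=0.2
    train, hold = rows[n_hold:], rows[:n_hold]
    n_valid = math.ceil(0.5 * len(hold))     # second call: test_size=0.5 goes to X_valid
    test, valid = hold[n_valid:], hold[:n_valid]
    return train, test, valid


train_rows, test_rows, valid_rows = three_way_split(range(60000))
print(len(train_rows), len(test_rows), len(valid_rows))  # → 48000 6000 6000
```

Note that the second call assigns its "train" part to `X_test` and its "test" part to `X_valid`; with an even 12000 held-out rows both end up with 6000.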

print short summary of the dataset and its subsets

subsets = {'dataset': df_norm, 'train': train, 'test': test, 'valid': valid}
for name, x in subsets.items():
  print(name)
  print("size:", len(x))
  print(x.describe(include='all'))
  print("class distribution", x.value_counts('stabf'))
  print('===============================================================')
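The splits above are not stratified (no `stratify=` argument), so the class balance of each subset can drift slightly from that of the full dataset. Shares are easier to compare than raw counts; pandas offers `value_counts(normalize=True)` for this, and a pure-Python sketch of the same computation is shown below (`class_proportions` is a hypothetical helper and the labels are made up):

```python
from collections import Counter


def class_proportions(labels):
    """Return each label's share of the subset, like value_counts(normalize=True)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


print(class_proportions(["unstable", "stable", "unstable", "unstable"]))
# → {'unstable': 0.75, 'stable': 0.25}
```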

script for lab IUM-03

download data

!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1

check how many lines the file has (wc -l also counts the header line, so 60001 lines means 60000 data rows)

!wc -l smart_grid_stability_augmented.csv
60001 smart_grid_stability_augmented.csv
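To count data rows directly, without the header line that `wc -l` includes, the standard `csv` module is enough. A minimal sketch; `count_data_rows` is a hypothetical helper name:

```python
import csv


def count_data_rows(path):
    """Count CSV records, excluding the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)
```

Given the `wc -l` output above, `count_data_rows('smart_grid_stability_augmented.csv')` should return 60000.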

take a look at the dataset to choose which columns to keep

import pandas as pd
df = pd.read_csv('smart_grid_stability_augmented.csv')
df.head()
       tau1      tau2      tau3      tau4        p1        p2        p3        p4        g1        g2        g3        g4      stab     stabf
0  2.959060  3.079885  8.381025  9.780754  3.763085 -0.782604 -1.257395 -1.723086  0.650456  0.859578  0.887445  0.958034  0.055347  unstable
1  9.304097  4.902524  3.047541  1.369357  5.067812 -1.940058 -1.872742 -1.255012  0.413441  0.862414  0.562139  0.781760 -0.005957    stable
2  8.971707  8.848428  3.046479  1.214518  3.405158 -1.207456 -1.277210 -0.920492  0.163041  0.766689  0.839444  0.109853  0.003471  unstable
3  0.716415  7.669600  4.486641  2.340563  3.963791 -1.027473 -1.938944 -0.997374  0.446209  0.976744  0.929381  0.362718  0.028871  unstable
4  3.134112  7.608772  4.943759  9.857573  3.525811 -1.125531 -1.845975 -0.554305  0.797110  0.455450  0.656947  0.820923  0.049860  unstable

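drop the header line and keep only columns 1, 5, 9, 13 and 14 (tau1, p1, g1, stab, stabf); shuffle the rows; split them into train (48000 rows), test (6000 rows) and validation (6000 rows) files; print the number of rows in each subset. Note that the resulting CSV files have no header row.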

!sed 1d smart_grid_stability_augmented.csv | cut -f 1,5,9,13,14 -d "," | shuf | split -l 48000
!mv xaa train.csv
!mv xab toDivide
!split -l 6000 toDivide
!mv xaa test.csv
!mv xab valid.csv
!wc -l train.csv
!wc -l test.csv
!wc -l valid.csv
48000 train.csv
6000 test.csv
6000 valid.csv
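The `sed | cut | shuf | split` pipeline can be reproduced in Python, which makes the kept columns explicit and allows a fixed seed. A sketch under these assumptions: `split_columns_file` is a hypothetical helper, column indices are 0-based (so `(0, 4, 8, 12, 13)` matches `cut -f 1,5,9,13,14`), and, like `cut`, it does not expect quoted fields containing commas:

```python
import csv
import random


def split_columns_file(src, keep=(0, 4, 8, 12, 13), sizes=(48000, 6000, 6000),
                       names=("train.csv", "test.csv", "valid.csv"), seed=None):
    """Drop the header, keep selected columns, shuffle, and write consecutive chunks.

    Like the shell version, the output files have no header row.
    Returns the number of rows written to each file.
    """
    with open(src, newline="") as f:
        reader = csv.reader(f)
        next(reader)                                   # sed 1d: drop the header
        rows = [[row[i] for i in keep] for row in reader]
    random.Random(seed).shuffle(rows)                  # shuf
    counts, start = [], 0
    for name, size in zip(names, sizes):               # split -l ...
        chunk = rows[start:start + size]
        with open(name, "w", newline="") as out:
            csv.writer(out, lineterminator="\n").writerows(chunk)
        counts.append(len(chunk))
        start += size
    return counts
```

Calling `split_columns_file('smart_grid_stability_augmented.csv', seed=42)` would write the three files; passing a seed makes the shuffle reproducible, which `shuf` without `--random-source` is not.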

script for lab IUM-05 - Model and training

Run the IUM-01 script first: this part uses X_train, Y_train, X_test and Y_test created there.

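import tensorflow as tf
from tensorflow.keras import layers

# 12 input features: tau1-4, p1-4, g1-4
model = tf.keras.Sequential([
    layers.Input(shape=(12,)),
    # no activation given, so these two hidden layers are linear
    layers.Dense(32),
    layers.Dense(16),
    # one probability per class: [unstable, stable]
    layers.Dense(2, activation='softmax')
])

model.compile(
    # binary cross-entropy applied element-wise to the one-hot targets
    loss=tf.losses.BinaryCrossentropy(),
    optimizer=tf.optimizers.Adam(),
    metrics=[tf.keras.metrics.BinaryAccuracy()])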
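The two hidden `Dense` layers have no `activation` argument, so Keras applies the identity (linear) activation. Two stacked affine maps compose into a single affine map, so as written the network has the same expressive power as one softmax layer (multinomial logistic regression); adding a non-linearity such as `activation='relu'` to the hidden layers is the usual way to change that. A pure-Python check of the composition with toy sizes and arbitrary weights (none of these numbers come from the trained model):

```python
def affine(W, b, x):
    """Compute W @ x + b for a weight matrix W (list of rows), bias b and vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def matmul(A, B):
    """Plain matrix product A @ B for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# two stacked linear layers, toy sizes 3 -> 2 -> 2
W1, b1 = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]], [0.1, -0.2]
W2, b2 = [[2.0, 1.0], [-1.0, 0.5]], [0.3, 0.0]

# collapsed single layer: W = W2 @ W1, b = W2 @ b1 + b2
W = matmul(W2, W1)
b = affine(W2, b2, b1)

x = [0.4, -1.0, 2.0]
print(affine(W2, b2, affine(W1, b1, x)))  # output of the two layers
print(affine(W, b, x))                    # same values (up to float rounding) from one layer
```

Keras stores the kernel transposed (it computes x @ kernel + bias), but the argument is the same in either convention.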
import numpy as np

def onezero(label):
  # class index: 0 for 'unstable', 1 for 'stable'
  return 0 if label == 'unstable' else 1


# one-hot encode the labels by picking rows of the 2x2 identity matrix
Y_train_one_zero = [onezero(x) for x in Y_train]
Y_train_onehot = np.eye(2)[Y_train_one_zero]

Y_test_one_zero = [onezero(x) for x in Y_test]
Y_test_onehot = np.eye(2)[Y_test_one_zero]
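Indexing `np.eye(2)` with a list of class indices returns row k of the identity matrix for each index k, which is exactly a one-hot vector; `tf.keras.utils.to_categorical` is a built-in alternative. A pure-Python sketch of the same mapping (`one_hot` is a hypothetical name) that makes the class order explicit:

```python
def one_hot(labels):
    """Map 'unstable' to [1.0, 0.0] and any other label to [0.0, 1.0], like onezero + np.eye(2)."""
    eye = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity matrix
    return [list(eye[0 if label == "unstable" else 1]) for label in labels]


print(one_hot(["unstable", "stable"]))  # → [[1.0, 0.0], [0.0, 1.0]]
```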
history = model.fit(tf.convert_to_tensor(X_train, np.float32),
                    Y_train_onehot, epochs=5)
Epoch 1/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3926 - binary_accuracy: 0.8149
Epoch 2/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3925 - binary_accuracy: 0.8161
Epoch 3/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3922 - binary_accuracy: 0.8150
Epoch 4/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3927 - binary_accuracy: 0.8146
Epoch 5/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3926 - binary_accuracy: 0.8143
model.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_28 (Dense)             (None, 32)                416       
_________________________________________________________________
dense_29 (Dense)             (None, 16)                528       
_________________________________________________________________
dense_30 (Dense)             (None, 2)                 34        
=================================================================
Total params: 978
Trainable params: 978
Non-trainable params: 0
_________________________________________________________________
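The parameter counts in the summary follow from the layer sizes: a `Dense` layer with n_in inputs and n_units units has n_in * n_units weights plus one bias per unit. A short check of the arithmetic:

```python
def dense_params(n_in, n_units):
    """Weights (n_in * n_units) plus one bias per unit."""
    return n_in * n_units + n_units


layer_shapes = [(12, 32), (32, 16), (16, 2)]  # (inputs, units) for the three Dense layers
counts = [dense_params(n_in, n_units) for n_in, n_units in layer_shapes]
print(counts, sum(counts))  # → [416, 528, 34] 978
```

The 1500 steps per epoch in the training log follow in the same way from 48000 training rows divided by Keras's default `batch_size` of 32, and the 94 evaluation steps below from 6000 test rows in batches of 64, rounded up.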
results = model.evaluate(X_test, Y_test_onehot, batch_size=64)
print('test loss: ',results[0])
print('test acc: ', results[1])
94/94 [==============================] - 0s 1ms/step - loss: 0.3933 - binary_accuracy: 0.8112
test loss:  0.3933383822441101
test acc:  0.8111666440963745
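`BinaryAccuracy` thresholds every output element at 0.5 (its default `threshold`) and compares each element with the target separately. With a 2-unit softmax and one-hot targets, both elements of a row are either right or wrong together (unless a prediction is exactly 0.5), so the reported value matches the usual argmax accuracy. A pure-Python sketch of both computations on made-up predictions (the function names are hypothetical):

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Element-wise accuracy after thresholding, as Keras BinaryAccuracy does."""
    hits = [float(t == (1.0 if p > threshold else 0.0))
            for row_t, row_p in zip(y_true, y_pred)
            for t, p in zip(row_t, row_p)]
    return sum(hits) / len(hits)


def argmax_accuracy(y_true, y_pred):
    """Fraction of rows whose highest-probability class matches the one-hot target."""
    hits = [row_t.index(max(row_t)) == row_p.index(max(row_p))
            for row_t, row_p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


# made-up softmax outputs for four samples (each row sums to 1)
y_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_pred = [[0.9, 0.1], [0.3, 0.7], [0.4, 0.6], [0.2, 0.8]]
print(binary_accuracy(y_true, y_pred), argmax_accuracy(y_true, y_pred))  # → 0.75 0.75
```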