Notebook for the first subtask of the Inżynieria Uczenia Maszynowego (Machine Learning Engineering) class project.

This notebook downloads the dataset I will be working on, normalizes it, and prints a short summary of the dataset and its subsets.

Link to the dataset at Kaggle.com:


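from google.colab import drive
# mount Google Drive so that kaggle.json can be copied from it
drive.mount('/content/drive')
!mkdir -p ~/.kaggle
!cp drive/MyDrive/kaggle.json ~/.kaggle/.
# the API token only needs to be readable by its owner, not executable
!chmod 600 ~/.kaggle/kaggle.json
!pip install -q kaggle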

script for lab IUM-01

download data

!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1

read the data as pandas data frame

import pandas as pd

df = pd.read_csv('smart_grid_stability_augmented.csv')

standardize the feature columns so that each has zero mean and unit variance (StandardScaler does not bound the values to the range 0 to 1; MinMaxScaler would)

from sklearn import preprocessing

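# scale every column except the last one, which holds the text label 'stabf'
scaler = preprocessing.StandardScaler().fit(df.iloc[:, 0:-1])
df_norm_array = scaler.transform(df.iloc[:, 0:-1])
# keep the original column names; later cells refer to 'stab' by name
df_norm = pd.DataFrame(data=df_norm_array,
                       columns=df.columns[:-1])
df_norm['stabf'] = df['stabf']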
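
StandardScaler standardizes each column (zero mean, unit variance), which is not the same as min-max scaling into the range 0 to 1. A toy comparison in plain Python rather than scikit-learn; the helper names `standardize` and `min_max` are illustrative only, and the sketch assumes the population standard deviation, which is what StandardScaler uses.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

column = [2.0, 4.0, 6.0, 8.0]
print(standardize(column))  # centred on 0, so some values are negative
print(min_max(column))      # every value lies between 0 and 1 inclusive
```

Standardized values can fall well outside 0 to 1, so the choice matters if a later step expects bounded inputs.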

divide the data into train, test and validation subsets

from sklearn.model_selection import train_test_split

df_norm_data = df_norm.copy()
df_norm_data = df_norm_data.drop('stab', axis=1)
df_norm_labels = df_norm_data.pop('stabf')

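# NOTE: the split arguments are missing from this notebook. The sizes below are
# an assumption, chosen to match the 48000/6000/6000 split used in lab IUM-03.
X_train, X_testAndValid, Y_train, Y_testAndValid = train_test_split(
    df_norm_data, df_norm_labels, test_size=0.2)

X_test, X_valid, Y_test, Y_valid = train_test_split(
    X_testAndValid, Y_testAndValid, test_size=0.5)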
train = pd.concat([X_train, Y_train], axis=1)
test = pd.concat([X_test, Y_test], axis=1)
valid = pd.concat([X_valid, Y_valid], axis=1)
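
Splitting twice, first cutting off a hold-out part and then halving it, gives three subsets. A minimal sketch of that arithmetic on row indices, using only the standard library; `three_way_split`, the seed, the 0.2 fraction and the 60000-row total are assumptions taken from the 48000/6000/6000 split in lab IUM-03, not from this cell.

```python
import random

def three_way_split(n_rows, holdout_fraction=0.2, seed=0):
    """Return shuffled index lists (train, test, valid).

    The hold-out part is cut off first and then halved,
    mirroring two consecutive train_test_split calls.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_rows * holdout_fraction)
    holdout, train = indices[:n_holdout], indices[n_holdout:]
    half = len(holdout) // 2
    return train, holdout[:half], holdout[half:]

train_idx, test_idx, valid_idx = three_way_split(60000)
print(len(train_idx), len(test_idx), len(valid_idx))  # 48000 6000 6000
```

Fixing the seed makes the split reproducible between runs; train_test_split offers the same through its random_state parameter.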

print a short summary of the dataset and its subsets

# name each DataFrame explicitly instead of looking its name up in globals()
subsets = {'dataset': df_norm, 'train': train, 'test': test, 'valid': valid}
for name, x in subsets.items():
  print(name)
  print("size:", len(x))
  print("class distribution", x.value_counts('stabf'))

script for lab IUM-03

download data

!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1

check how many data entries are in the dataset (wc -l also counts the header row)

!wc -l smart_grid_stability_augmented.csv

take a look at the dataset to choose which columns to keep

import pandas as pd
df = pd.read_csv('smart_grid_stability_augmented.csv')
# show the first five rows
df.head()
       tau1      tau2      tau3      tau4        p1        p2        p3        p4        g1        g2        g3        g4      stab     stabf
0  2.959060  3.079885  8.381025  9.780754  3.763085 -0.782604 -1.257395 -1.723086  0.650456  0.859578  0.887445  0.958034  0.055347  unstable
1  9.304097  4.902524  3.047541  1.369357  5.067812 -1.940058 -1.872742 -1.255012  0.413441  0.862414  0.562139  0.781760 -0.005957    stable
2  8.971707  8.848428  3.046479  1.214518  3.405158 -1.207456 -1.277210 -0.920492  0.163041  0.766689  0.839444  0.109853  0.003471  unstable
3  0.716415  7.669600  4.486641  2.340563  3.963791 -1.027473 -1.938944 -0.997374  0.446209  0.976744  0.929381  0.362718  0.028871  unstable
4  3.134112  7.608772  4.943759  9.857573  3.525811 -1.125531 -1.845975 -0.554305  0.797110  0.455450  0.656947  0.820923  0.049860  unstable

discard some of the columns, shuffle the data, divide it into train, test and validation subsets, and print the number of rows in each subset

!sed 1d smart_grid_stability_augmented.csv | cut -f 1,5,9,13,14 -d "," | shuf | split -l 48000
!mv xaa train.csv
!mv xab toDivide
!split -l 6000 toDivide
!mv xaa test.csv
!mv xab valid.csv
!wc -l train.csv
!wc -l test.csv
!wc -l valid.csv
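
For comparison, the same select, shuffle and split steps can be sketched in Python with only the standard library. The function name `select_shuffle_split`, its parameters and the seed are illustrative assumptions; column positions are 0-based here, whereas cut counts from 1.

```python
import csv
import io
import random

def select_shuffle_split(csv_text, keep, sizes, seed=0):
    """Drop the header, keep the given 0-based columns, shuffle, and cut into parts.

    sizes lists the row count of every part except the last,
    which receives whatever rows remain.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row, like `sed 1d`
    rows = [[row[i] for i in keep] for row in reader]  # like `cut -f`
    random.Random(seed).shuffle(rows)                  # like `shuf`
    parts, start = [], 0
    for size in sizes:                                 # like `split -l`
        parts.append(rows[start:start + size])
        start += size
    parts.append(rows[start:])
    return parts

sample = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n"
train_rows, test_rows, valid_rows = select_shuffle_split(sample, keep=[0, 2], sizes=[2, 1])
print(len(train_rows), len(test_rows), len(valid_rows))  # 2 1 1
```

For the real file the equivalent call would use keep=[0, 4, 8, 12, 13] and sizes=[48000, 6000].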

script for lab IUM-05 - Model and training

run lab IUM-01 first: this script reuses its X_train, Y_train, X_test and Y_test variables

import tensorflow as tf
from tensorflow.keras import layers

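model = tf.keras.Sequential([
    layers.Dense(2, activation='softmax')
])

# The compile call is missing from this notebook, so the optimizer and loss below
# are assumptions; only the binary_accuracy metric is confirmed by the log further down.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])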

import numpy as np

def onezero(label):
  return 0 if label == 'unstable' else 1

Y_train_one_zero = [onezero(x) for x in Y_train]
Y_train_onehot = np.eye(2)[Y_train_one_zero]

Y_test_one_zero = [onezero(x) for x in Y_test]
Y_test_onehot = np.eye(2)[Y_test_one_zero]
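
Indexing the identity matrix np.eye(2) with a list of class indices selects one row per label, which is a one-hot encoding. The plain-Python sketch below spells that out; `one_hot` is a hypothetical helper, not part of the notebook.

```python
def onezero(label):
    # same mapping as in the cell above: 'unstable' -> 0, anything else -> 1
    return 0 if label == 'unstable' else 1

def one_hot(indices, num_classes=2):
    """Build one row per index with a single 1.0 at that index."""
    return [[1.0 if i == idx else 0.0 for i in range(num_classes)]
            for idx in indices]

labels = ['unstable', 'stable', 'unstable']
print(one_hot([onezero(x) for x in labels]))
# [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
```
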
history = model.fit(tf.convert_to_tensor(X_train, np.float32),
          Y_train_onehot, epochs=5)
results = model.evaluate(X_test, Y_test_onehot, batch_size=64)
print('test loss: ',results[0])
print('test acc: ', results[1])
94/94 [==============================] - 0s 1ms/step - loss: 0.3933 - binary_accuracy: 0.8112
test loss:  0.3933383822441101
test acc:  0.8111666440963745
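
As far as I understand it, the binary_accuracy reported above thresholds every output at 0.5 and counts how many individual outputs match their one-hot target. A plain-Python sketch of that calculation; the function below is an approximation written for illustration, not the Keras implementation, and the sample predictions are made up.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs that match their target after thresholding."""
    matches, total = 0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            matches += int((p > threshold) == (t == 1.0))
            total += 1
    return matches / total

y_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_pred = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
print(binary_accuracy(y_true, y_pred))  # 0.75
```

With two softmax outputs that sum to 1, both outputs of a sample are either right or wrong together (ties at exactly 0.5 aside), so the value matches ordinary classification accuracy.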