Notebook for the first subtask of the Inżynieria Uczenia Maszynowego (Machine Learning Engineering) class project.

This notebook downloads the dataset I will be working on, normalizes it, and prints a short summary of the dataset and its subsets.

Link to the dataset at Kaggle.com:

https://www.kaggle.com/pcbreviglieri/smart-grid-stability

from google.colab import drive
drive.mount('drive')

Drive already mounted at drive; to attempt to forcibly remount, call drive.mount("drive", force_remount=True).

In the Colab GUI, click to allow Colab to access and modify Google Drive files.

!mkdir -p ~/.kaggle
!cp drive/MyDrive/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!pip install -q kaggle

script for lab IUM-01

download data

!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1

read the data into a pandas DataFrame

import pandas as pd

df = pd.read_csv('smart_grid_stability_augmented.csv')

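standardize all numeric columns to zero mean and unit variance (StandardScaler does not map values into the 0 to 1 range); the label column stabf is copied over unchanged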

from sklearn import preprocessing

scaler = preprocessing.StandardScaler().fit(df.iloc[:, 0:-1])
df_norm_array = scaler.transform(df.iloc[:, 0:-1])
df_norm = pd.DataFrame(data=df_norm_array,
                       columns=df.columns[:-1])
df_norm['stabf'] = df['stabf']
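
StandardScaler and MinMaxScaler are easy to mix up, so here is a minimal sketch of the difference on made-up numbers (not the grid dataset). If values strictly between 0 and 1 were wanted, MinMaxScaler would be the scaler to use; the array below and its variable names are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix (made-up values, not the grid dataset).
features = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [4.0, 40.0]])

standardized = StandardScaler().fit_transform(features)
min_max_scaled = MinMaxScaler().fit_transform(features)

# StandardScaler: every column ends up with mean 0 and standard deviation 1.
print(standardized.mean(axis=0), standardized.std(axis=0))
# MinMaxScaler: every column ends up with minimum 0 and maximum 1.
print(min_max_scaled.min(axis=0), min_max_scaled.max(axis=0))
```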

divide the data into train, test and validation subsets

from sklearn.model_selection import train_test_split

df_norm_data = df_norm.copy()
df_norm_data = df_norm_data.drop('stab', axis=1)
df_norm_labels = df_norm_data.pop('stabf')

X_train, X_testAndValid, Y_train, Y_testAndValid = train_test_split(
    df_norm_data,
    df_norm_labels,
    test_size=0.2,
    random_state=42)

X_test, X_valid, Y_test, Y_valid = train_test_split(
    X_testAndValid,
    Y_testAndValid,
    test_size=0.5,
    random_state=42)

train = pd.concat([X_train, Y_train], axis=1)
test = pd.concat([X_test, Y_test], axis=1)
valid = pd.concat([X_valid, Y_valid], axis=1)
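
The two calls above give an 80/10/10 split: 20% is held out first, then that part is halved. A small sketch on random stand-in data to check the resulting sizes. The `demo_` names and the data are made up, and `stratify` is an optional extra that the cell above does not use; it keeps the class ratio roughly the same in every subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 1000 rows, 3 features, binary labels (all made up).
rng = np.random.default_rng(42)
demo_X = rng.normal(size=(1000, 3))
demo_y = rng.integers(0, 2, size=1000)

# First split: 80% train, 20% held out.
demo_X_train, demo_X_rest, demo_y_train, demo_y_rest = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42, stratify=demo_y)

# Second split: halve the held-out part into test and validation.
demo_X_test, demo_X_valid, demo_y_test, demo_y_valid = train_test_split(
    demo_X_rest, demo_y_rest, test_size=0.5, random_state=42,
    stratify=demo_y_rest)

print(len(demo_X_train), len(demo_X_test), len(demo_X_valid))
```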

print short summary of the dataset and its subsets

# Map each printed name to its data frame explicitly instead of looking it up in globals().
subsets = {'dataset': df_norm, 'train': train, 'test': test, 'valid': valid}
for name, subset in subsets.items():
  print(name)
  print("size:", len(subset))
  print(subset.describe(include='all'))
  print("class distribution", subset.value_counts('stabf'))
  print('===============================================================')

script for lab IUM-03

download data

!kaggle datasets download -d 'pcbreviglieri/smart-grid-stability' >>/dev/null 2>&1
!unzip smart-grid-stability.zip >>/dev/null 2>&1

check how many lines the dataset file has (the count includes the header line)

!wc -l smart_grid_stability_augmented.csv

60001 smart_grid_stability_augmented.csv

take a look at the dataset to choose which columns to keep

import pandas as pd
df = pd.read_csv('smart_grid_stability_augmented.csv')
df.head()

	tau1	tau2	tau3	tau4	p1	p2	p3	p4	g1	g2	g3	g4	stab	stabf
0	2.959060	3.079885	8.381025	9.780754	3.763085	-0.782604	-1.257395	-1.723086	0.650456	0.859578	0.887445	0.958034	0.055347	unstable
1	9.304097	4.902524	3.047541	1.369357	5.067812	-1.940058	-1.872742	-1.255012	0.413441	0.862414	0.562139	0.781760	-0.005957	stable
2	8.971707	8.848428	3.046479	1.214518	3.405158	-1.207456	-1.277210	-0.920492	0.163041	0.766689	0.839444	0.109853	0.003471	unstable
3	0.716415	7.669600	4.486641	2.340563	3.963791	-1.027473	-1.938944	-0.997374	0.446209	0.976744	0.929381	0.362718	0.028871	unstable
4	3.134112	7.608772	4.943759	9.857573	3.525811	-1.125531	-1.845975	-0.554305	0.797110	0.455450	0.656947	0.820923	0.049860	unstable

discard some of the columns; shuffle the data; divide it into train, test and validation subsets; print the number of rows in each subset

!sed 1d smart_grid_stability_augmented.csv | cut -f 1,5,9,13,14 -d "," | shuf | split -l 48000
!mv xaa train.csv
!mv xab toDivide
!split -l 6000 toDivide
!mv xaa test.csv
!mv xab valid.csv
!wc -l train.csv
!wc -l test.csv
!wc -l valid.csv

48000 train.csv
6000 test.csv
6000 valid.csv
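
For comparison, roughly the same steps can be done in pandas instead of the shell. This is only a sketch on a small made-up frame with the same column names as the CSV; the `demo`, `kept` and `*_part` names are illustrative. Note that `cut -f` counts fields from 1, while `iloc` counts from 0.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same 14 column names as the CSV (values are made up).
columns = ['tau1', 'tau2', 'tau3', 'tau4', 'p1', 'p2', 'p3', 'p4',
           'g1', 'g2', 'g3', 'g4', 'stab', 'stabf']
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 14)), columns=columns)

# cut -f 1,5,9,13,14: keep fields 1, 5, 9, 13, 14 (0-based positions 0, 4, 8, 12, 13).
kept = demo.iloc[:, [0, 4, 8, 12, 13]]

# shuf: shuffle all rows.
shuffled = kept.sample(frac=1, random_state=0).reset_index(drop=True)

# split: first 80% of the rows for training, then 10% each for test and validation.
n_train = int(len(shuffled) * 0.8)
n_test = (len(shuffled) - n_train) // 2
train_part = shuffled.iloc[:n_train]
test_part = shuffled.iloc[n_train:n_train + n_test]
valid_part = shuffled.iloc[n_train + n_test:]

print(list(kept.columns))
print(len(train_part), len(test_part), len(valid_part))
```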

script for lab IUM-05 - Model and training

Run the lab IUM-01 cells first: this section uses X_train, Y_train, X_test and Y_test defined there.

import tensorflow as tf
from tensorflow.keras import layers

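model = tf.keras.Sequential([
    layers.Input(shape=(12,)),  # 12 features: tau1-4, p1-4, g1-4
    layers.Dense(32),  # no activation given, so this layer is linear
    layers.Dense(16),  # linear as well
    layers.Dense(2, activation='softmax')  # one output per class
])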

model.compile(
    loss=tf.losses.BinaryCrossentropy(),
    optimizer=tf.optimizers.Adam(),
    metrics=[tf.keras.metrics.BinaryAccuracy()])
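
The Dense(32) and Dense(16) layers above are created without an `activation` argument, so Keras leaves them linear. Two stacked linear layers are equivalent to a single linear layer, which a quick NumPy check can show. The weights below are random stand-ins, not the trained model's weights, and passing something like `activation='relu'` would be one possible way to make the hidden layers non-linear (not tried here).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 12))  # 5 made-up samples with 12 features

# Random stand-in weights with the same shapes as Dense(32) and Dense(16).
W1, b1 = rng.normal(size=(12, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 16)), rng.normal(size=16)

# Output of the two linear layers applied one after another.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))
```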

import numpy as np

def onezero(label):
  return 0 if label == 'unstable' else 1


Y_train_one_zero = [onezero(x) for x in Y_train]
Y_train_onehot = np.eye(2)[Y_train_one_zero]

Y_test_one_zero = [onezero(x) for x in Y_test]
Y_test_onehot = np.eye(2)[Y_test_one_zero]

history = model.fit(tf.convert_to_tensor(X_train, np.float32),
          Y_train_onehot, epochs=5)

Epoch 1/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3926 - binary_accuracy: 0.8149
Epoch 2/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3925 - binary_accuracy: 0.8161
Epoch 3/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3922 - binary_accuracy: 0.8150
Epoch 4/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3927 - binary_accuracy: 0.8146
Epoch 5/5
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3926 - binary_accuracy: 0.8143
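
The label encoding used before training can be checked on its own. A minimal sketch with four made-up labels: indexing `np.eye(2)` with the 0/1 class ids picks one row of the identity matrix per label, and `argmax` turns a one-hot row back into the class id.

```python
import numpy as np

labels = ['unstable', 'stable', 'stable', 'unstable']  # made-up example labels

# Same mapping as onezero(): 'unstable' -> 0, anything else -> 1.
class_ids = np.array([0 if label == 'unstable' else 1 for label in labels])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(2)[class_ids]
print(one_hot)

# argmax over each row recovers the class id.
recovered = one_hot.argmax(axis=1)
print(recovered)
```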

model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_28 (Dense)             (None, 32)                416       
_________________________________________________________________
dense_29 (Dense)             (None, 16)                528       
_________________________________________________________________
dense_30 (Dense)             (None, 2)                 34        
=================================================================
Total params: 978
Trainable params: 978
Non-trainable params: 0
_________________________________________________________________
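
The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. A short sketch, where `dense_params` is just a helper name made up for this check:

```python
def dense_params(n_inputs, n_units):
    # Weights (n_inputs * n_units) plus one bias per unit.
    return n_inputs * n_units + n_units

# 12 input features, then layers with 32, 16 and 2 units.
layer_sizes = [12, 32, 16, 2]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))
```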

results = model.evaluate(X_test, Y_test_onehot, batch_size=64)
print('test loss: ',results[0])
print('test acc: ', results[1])

94/94 [==============================] - 0s 1ms/step - loss: 0.3933 - binary_accuracy: 0.8112
test loss:  0.3933383822441101
test acc:  0.8111666440963745
