ium_470623/IUM_dane02.ipynb
Cezary Gałązkiewicz a4e1a1f930 Zad 02.Dane
2022-03-21 10:35:52 +01:00


!pip install kaggle
!pip install pandas
!pip install seaborn
!pip install scikit-learn
Requirement already satisfied: kaggle in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (1.5.12)
Requirement already satisfied: six>=1.10 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (1.15.0)
Requirement already satisfied: certifi in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (2021.10.8)
Requirement already satisfied: python-dateutil in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (2.8.1)
Requirement already satisfied: requests in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (2.27.1)
Requirement already satisfied: tqdm in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (4.59.0)
Requirement already satisfied: python-slugify in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (6.1.1)
Requirement already satisfied: urllib3 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (1.26.9)
Requirement already satisfied: text-unidecode>=1.3 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: idna<4,>=2.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from requests->kaggle) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from requests->kaggle) (2.0.12)
Requirement already satisfied: pandas in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (1.4.1)
Requirement already satisfied: pytz>=2020.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas) (2022.1)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.18.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas) (1.20.1)
Requirement already satisfied: six>=1.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.15.0)
Requirement already satisfied: seaborn in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (0.11.2)
Requirement already satisfied: pandas>=0.23 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.4.1)
Requirement already satisfied: numpy>=1.15 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.20.1)
Requirement already satisfied: scipy>=1.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.6.1)
Requirement already satisfied: matplotlib>=2.2 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (3.5.1)
Requirement already satisfied: packaging>=20.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (20.9)
Requirement already satisfied: pyparsing>=2.2.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (2.4.7)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (4.31.1)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (2.8.1)
Requirement already satisfied: pillow>=6.2.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (9.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (1.4.0)
Requirement already satisfied: cycler>=0.10 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (0.11.0)
Requirement already satisfied: pytz>=2020.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas>=0.23->seaborn) (2022.1)
Requirement already satisfied: six>=1.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn) (1.15.0)
!kaggle datasets download -d csafrit2/steel-industry-energy-consumption
Downloading steel-industry-energy-consumption.zip to D:\UAM zajecia\IUM\ium_470623

  0%|          | 0.00/484k [00:00<?, ?B/s]
100%|##########| 484k/484k [00:00<00:00, 3.32MB/s]
100%|##########| 484k/484k [00:00<00:00, 3.29MB/s]
!unzip -o steel-industry-energy-consumption.zip
Archive:  steel-industry-energy-consumption.zip
  inflating: Steel_industry_data.csv  
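The `!unzip` call depends on an `unzip` executable being on the PATH, which is not guaranteed on Windows. As a portable alternative, here is a minimal sketch using the standard-library `zipfile` module. The helper name `extract_archive` and the throwaway demo archive are illustrative assumptions, not part of the original notebook.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir, overwriting existing files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Demonstration on a small stand-in archive (not the real Kaggle download).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    demo_zip = tmp / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr(
            "Steel_industry_data.csv",
            "date,Usage_kWh\n01/01/2018 00:15,3.17\n",
        )
    extracted = extract_archive(demo_zip, tmp / "out")
    extracted_text = (tmp / "out" / "Steel_industry_data.csv").read_text()
```

In the notebook itself the call would presumably be `extract_archive('steel-industry-energy-consumption.zip')`.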
import pandas as pd

# Load the extracted CSV; the 'date' column is read as plain text (day/month/year hour:minute).
energy_data = pd.read_csv('Steel_industry_data.csv')
energy_data
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/01/2018 00:15 | 3.17 | 2.95 | 0.00 | 0.0 | 73.21 | 100.00 | 900 | Weekday | Monday | Light_Load |
| 1 | 01/01/2018 00:30 | 4.00 | 4.46 | 0.00 | 0.0 | 66.77 | 100.00 | 1800 | Weekday | Monday | Light_Load |
| 2 | 01/01/2018 00:45 | 3.24 | 3.28 | 0.00 | 0.0 | 70.28 | 100.00 | 2700 | Weekday | Monday | Light_Load |
| 3 | 01/01/2018 01:00 | 3.31 | 3.56 | 0.00 | 0.0 | 68.09 | 100.00 | 3600 | Weekday | Monday | Light_Load |
| 4 | 01/01/2018 01:15 | 3.82 | 4.50 | 0.00 | 0.0 | 64.72 | 100.00 | 4500 | Weekday | Monday | Light_Load |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35035 | 31/12/2018 23:00 | 3.85 | 4.86 | 0.00 | 0.0 | 62.10 | 100.00 | 82800 | Weekday | Monday | Light_Load |
| 35036 | 31/12/2018 23:15 | 3.74 | 3.74 | 0.00 | 0.0 | 70.71 | 100.00 | 83700 | Weekday | Monday | Light_Load |
| 35037 | 31/12/2018 23:30 | 3.78 | 3.17 | 0.07 | 0.0 | 76.62 | 99.98 | 84600 | Weekday | Monday | Light_Load |
| 35038 | 31/12/2018 23:45 | 3.78 | 3.06 | 0.11 | 0.0 | 77.72 | 99.96 | 85500 | Weekday | Monday | Light_Load |
| 35039 | 31/12/2018 00:00 | 3.67 | 3.02 | 0.07 | 0.0 | 77.22 | 99.98 | 0 | Weekday | Monday | Light_Load |

35040 rows × 11 columns
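`describe` reports 35040 unique values in `date`, which suggests that `read_csv` left the column as plain strings in day/month/year order. A minimal sketch of parsing it explicitly follows, using three rows copied from the preview. The reading of `NSM` as "seconds since midnight" is an inference from those values (900 at 00:15, 1800 at 00:30), not something the notebook states.

```python
import io

import pandas as pd

# Three rows copied from the preview above, reduced to the columns needed here.
sample_csv = io.StringIO(
    "date,Usage_kWh,NSM\n"
    "01/01/2018 00:15,3.17,900\n"
    "01/01/2018 00:30,4.00,1800\n"
    "31/12/2018 23:45,3.78,85500\n"
)
sample = pd.read_csv(sample_csv)

# An explicit format avoids month/day ambiguity for dates such as 01/02/2018.
sample["date"] = pd.to_datetime(sample["date"], format="%d/%m/%Y %H:%M")

# Assumption: NSM is the number of seconds elapsed since midnight.
seconds_from_midnight = sample["date"].dt.hour * 3600 + sample["date"].dt.minute * 60
nsm_matches = bool((seconds_from_midnight == sample["NSM"]).all())
```

Note that the last row of the preview is stamped 31/12/2018 00:00 directly after 23:45, so sorting by the parsed timestamp would move that row away from the end of the frame.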

energy_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 35040 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040 | 35040 | 35040 |
| unique | 35040 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
| top | 01/01/2018 00:15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Monday | Light_Load |
| freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 25056 | 5088 | 18072 |
| mean | NaN | 27.386892 | 13.035384 | 3.870949 | 0.011524 | 80.578056 | 84.367870 | 42750.000000 | NaN | NaN | NaN |
| std | NaN | 33.444380 | 16.306000 | 7.424463 | 0.016151 | 18.921322 | 30.456535 | 24940.534317 | NaN | NaN | NaN |
| min | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN |
| 25% | NaN | 3.200000 | 2.300000 | 0.000000 | 0.000000 | 63.320000 | 99.700000 | 21375.000000 | NaN | NaN | NaN |
| 50% | NaN | 4.570000 | 5.000000 | 0.000000 | 0.000000 | 87.960000 | 100.000000 | 42750.000000 | NaN | NaN | NaN |
| 75% | NaN | 51.237500 | 22.640000 | 2.090000 | 0.020000 | 99.022500 | 100.000000 | 64125.000000 | NaN | NaN | NaN |
| max | NaN | 157.180000 | 96.910000 | 27.760000 | 0.070000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
from sklearn.model_selection import train_test_split

# Hold out 3504 rows (10% of 35040), then split the holdout evenly into test and dev sets.
# A fixed random_state keeps both splits reproducible.
train_data, holdout_data = train_test_split(energy_data, test_size=3504, random_state=1)
test_data, dev_data = train_test_split(holdout_data, test_size=1752, random_state=1)
print('Training set size:')
print(train_data.shape)
print('Testing set size:')
print(test_data.shape)
print('Dev set size:')
print(dev_data.shape)
Training set size:
(31536, 11)
Testing set size:
(1752, 11)
Dev set size:
(1752, 11)
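The split above is purely random, so the class balance of `Load_Type` in each subset is only approximately preserved. If exact proportions matter, `train_test_split` accepts a `stratify` argument. The sketch below applies the same two-step split to a small synthetic label list. The label names `Medium_Load` and `Maximum_Load` and the class sizes are assumptions chosen for illustration (the notebook output only shows `Light_Load` and reports three unique values).

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Load_Type column: 200 rows in three classes.
labels = ["Light_Load"] * 100 + ["Medium_Load"] * 60 + ["Maximum_Load"] * 40
rows = list(range(len(labels)))

# Step 1: hold out 40 rows (20%), keeping class proportions.
train_rows, holdout_rows, train_labels, holdout_labels = train_test_split(
    rows, labels, test_size=40, random_state=1, stratify=labels
)

# Step 2: split the holdout evenly into test and dev, again stratified.
test_rows, dev_rows, test_labels, dev_labels = train_test_split(
    holdout_rows, holdout_labels, test_size=20, random_state=1, stratify=holdout_labels
)

test_counts = Counter(test_labels)
dev_counts = Counter(dev_labels)
```

With a DataFrame, the equivalent call would pass `stratify=energy_data['Load_Type']` in the first step and the holdout's own `Load_Type` column in the second.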
train_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 31536 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536 | 31536 | 31536 |
| unique | 31536 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
| top | 30/01/2018 00:15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Monday | Light_Load |
| freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 22514 | 4560 | 16280 |
| mean | NaN | 27.369449 | 13.037946 | 3.866059 | 0.011513 | 80.525058 | 84.410086 | 42707.363014 | NaN | NaN | NaN |
| std | NaN | 33.473304 | 16.302910 | 7.434250 | 0.016159 | 18.929571 | 30.436675 | 24968.193911 | NaN | NaN | NaN |
| min | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN |
| 25% | NaN | 3.200000 | 2.330000 | 0.000000 | 0.000000 | 63.200000 | 99.720000 | 20700.000000 | NaN | NaN | NaN |
| 50% | NaN | 4.570000 | 5.000000 | 0.000000 | 0.000000 | 87.900000 | 100.000000 | 42300.000000 | NaN | NaN | NaN |
| 75% | NaN | 51.230000 | 22.650000 | 1.980000 | 0.020000 | 98.970000 | 100.000000 | 63900.000000 | NaN | NaN | NaN |
| max | NaN | 157.180000 | 96.910000 | 27.760000 | 0.070000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
test_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1752 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752 | 1752 | 1752 |
| unique | 1752 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
| top | 07/05/2018 06:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Tuesday | Light_Load |
| freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1268 | 291 | 898 |
| mean | NaN | 27.330982 | 12.649024 | 3.949281 | 0.011530 | 81.364526 | 83.630702 | 43080.821918 | NaN | NaN | NaN |
| std | NaN | 33.484216 | 16.185283 | 7.298637 | 0.016224 | 18.758338 | 30.801180 | 24944.325392 | NaN | NaN | NaN |
| min | NaN | 2.480000 | 0.000000 | 0.000000 | 0.000000 | 41.120000 | 12.540000 | 0.000000 | NaN | NaN | NaN |
| 25% | NaN | 3.200000 | 1.392500 | 0.000000 | 0.000000 | 64.630000 | 99.180000 | 21600.000000 | NaN | NaN | NaN |
| 50% | NaN | 4.570000 | 4.930000 | 0.000000 | 0.000000 | 88.955000 | 100.000000 | 43200.000000 | NaN | NaN | NaN |
| 75% | NaN | 49.870000 | 21.240000 | 3.837500 | 0.020000 | 99.852500 | 100.000000 | 64800.000000 | NaN | NaN | NaN |
| max | NaN | 143.930000 | 87.700000 | 27.540000 | 0.070000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
dev_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1752 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752 | 1752 | 1752 |
| unique | 1752 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
| top | 02/06/2018 02:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Monday | Light_Load |
| freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1274 | 275 | 894 |
| mean | NaN | 27.756787 | 13.375628 | 3.880634 | 0.011729 | 80.745548 | 84.345154 | 43186.643836 | NaN | NaN | NaN |
| std | NaN | 32.895802 | 16.482148 | 7.376468 | 0.015943 | 18.927378 | 30.475427 | 24440.888112 | NaN | NaN | NaN |
| min | NaN | 2.520000 | 0.000000 | 0.000000 | 0.000000 | 38.330000 | 14.070000 | 0.000000 | NaN | NaN | NaN |
| 25% | NaN | 3.200000 | 2.270000 | 0.000000 | 0.000000 | 63.942500 | 99.690000 | 22500.000000 | NaN | NaN | NaN |
| 50% | NaN | 4.680000 | 5.110000 | 0.000000 | 0.000000 | 87.940000 | 100.000000 | 43200.000000 | NaN | NaN | NaN |
| 75% | NaN | 52.187500 | 24.050000 | 2.177500 | 0.020000 | 99.030000 | 100.000000 | 63900.000000 | NaN | NaN | NaN |
| max | NaN | 139.030000 | 80.750000 | 27.580000 | 0.060000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
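The three `describe` outputs can be cross-checked with a little arithmetic: the `freq` rows give the number of `Weekday` and `Light_Load` rows in each subset, so their shares can be compared directly. The sketch below uses only the counts printed above. The dictionary layout and variable names are illustrative, and the check is a rough sanity test, not a statistical one.

```python
# Row counts and 'freq' values copied from the describe() outputs above.
counts = {
    "train": {"rows": 31536, "Light_Load": 16280, "Weekday": 22514},
    "test": {"rows": 1752, "Light_Load": 898, "Weekday": 1268},
    "dev": {"rows": 1752, "Light_Load": 894, "Weekday": 1274},
}

# Share of each category within each subset.
shares = {
    name: {key: c[key] / c["rows"] for key in ("Light_Load", "Weekday")}
    for name, c in counts.items()
}

# The subsets should add back up to the full dataset (35040 rows, 18072 Light_Load, 25056 Weekday).
total_rows = sum(c["rows"] for c in counts.values())
total_light = sum(c["Light_Load"] for c in counts.values())
total_weekday = sum(c["Weekday"] for c in counts.values())

# Largest deviation of any subset share from the full-dataset share.
max_deviation = max(
    abs(shares[name]["Light_Load"] - total_light / total_rows) for name in counts
)

for name, share in shares.items():
    print(f"{name}: Light_Load {share['Light_Load']:.4f}, Weekday {share['Weekday']:.4f}")
```

The subset counts sum to the full-dataset totals reported by the first `describe` call, and the shares differ by roughly a percentage point at most, which is what a random split of this size would be expected to produce.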