!pip install kaggle
!pip install pandas
!pip install seaborn
Requirement already satisfied: kaggle in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (1.5.12)
Requirement already satisfied: six>=1.10 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (1.15.0)
Requirement already satisfied: certifi in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (2021.10.8)
Requirement already satisfied: python-dateutil in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (2.8.1)
Requirement already satisfied: requests in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (2.27.1)
Requirement already satisfied: tqdm in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (4.59.0)
Requirement already satisfied: python-slugify in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (6.1.1)
Requirement already satisfied: urllib3 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from kaggle) (1.26.9)
Requirement already satisfied: text-unidecode>=1.3 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: idna<4,>=2.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from requests->kaggle) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from requests->kaggle) (2.0.12)
Requirement already satisfied: pandas in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (1.4.1)
Requirement already satisfied: pytz>=2020.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas) (2022.1)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.18.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas) (1.20.1)
Requirement already satisfied: six>=1.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.15.0)
Requirement already satisfied: seaborn in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (0.11.2)
Requirement already satisfied: pandas>=0.23 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.4.1)
Requirement already satisfied: numpy>=1.15 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.20.1)
Requirement already satisfied: scipy>=1.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (1.6.1)
Requirement already satisfied: matplotlib>=2.2 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from seaborn) (3.5.1)
Requirement already satisfied: packaging>=20.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (20.9)
Requirement already satisfied: pyparsing>=2.2.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (2.4.7)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (4.31.1)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (2.8.1)
Requirement already satisfied: pillow>=6.2.0 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (9.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (1.4.0)
Requirement already satisfied: cycler>=0.10 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from matplotlib>=2.2->seaborn) (0.11.0)
Requirement already satisfied: pytz>=2020.1 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from pandas>=0.23->seaborn) (2022.1)
Requirement already satisfied: six>=1.5 in c:\users\cgala\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn) (1.15.0)
!kaggle datasets download -d csafrit2/steel-industry-energy-consumption
Downloading steel-industry-energy-consumption.zip to D:\UAM zajecia\IUM\ium_470623
0%| | 0.00/484k [00:00<?, ?B/s]
100%|##########| 484k/484k [00:00<00:00, 3.32MB/s]
100%|##########| 484k/484k [00:00<00:00, 3.29MB/s]
!unzip -o steel-industry-energy-consumption.zip
Archive: steel-industry-energy-consumption.zip
inflating: Steel_industry_data.csv
import pandas as pd
# Load the unzipped CSV into a DataFrame and display it.
energy_data = pd.read_csv('Steel_industry_data.csv')
energy_data
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01/01/2018 00:15 | 3.17 | 2.95 | 0.00 | 0.0 | 73.21 | 100.00 | 900 | Weekday | Monday | Light_Load |
1 | 01/01/2018 00:30 | 4.00 | 4.46 | 0.00 | 0.0 | 66.77 | 100.00 | 1800 | Weekday | Monday | Light_Load |
2 | 01/01/2018 00:45 | 3.24 | 3.28 | 0.00 | 0.0 | 70.28 | 100.00 | 2700 | Weekday | Monday | Light_Load |
3 | 01/01/2018 01:00 | 3.31 | 3.56 | 0.00 | 0.0 | 68.09 | 100.00 | 3600 | Weekday | Monday | Light_Load |
4 | 01/01/2018 01:15 | 3.82 | 4.50 | 0.00 | 0.0 | 64.72 | 100.00 | 4500 | Weekday | Monday | Light_Load |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
35035 | 31/12/2018 23:00 | 3.85 | 4.86 | 0.00 | 0.0 | 62.10 | 100.00 | 82800 | Weekday | Monday | Light_Load |
35036 | 31/12/2018 23:15 | 3.74 | 3.74 | 0.00 | 0.0 | 70.71 | 100.00 | 83700 | Weekday | Monday | Light_Load |
35037 | 31/12/2018 23:30 | 3.78 | 3.17 | 0.07 | 0.0 | 76.62 | 99.98 | 84600 | Weekday | Monday | Light_Load |
35038 | 31/12/2018 23:45 | 3.78 | 3.06 | 0.11 | 0.0 | 77.72 | 99.96 | 85500 | Weekday | Monday | Light_Load |
35039 | 31/12/2018 00:00 | 3.67 | 3.02 | 0.07 | 0.0 | 77.22 | 99.98 | 0 | Weekday | Monday | Light_Load |
35040 rows × 11 columns
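The `date` strings in the table above are day-first (for example `31/12/2018 23:45`), and `NSM` appears to count seconds since midnight (900 at 00:15, 1800 at 00:30). A minimal sketch, assuming that reading of `NSM` and using a few hand-copied rows rather than the full CSV, parses the timestamps with an explicit format and checks the relationship. The names `sample`, `parsed`, and `seconds_since_midnight` are illustrative only.

```python
import pandas as pd

# Four rows copied from the table above (illustrative sample, not the full dataset).
sample = pd.DataFrame({
    "date": ["01/01/2018 00:15", "01/01/2018 00:30",
             "31/12/2018 23:00", "31/12/2018 00:00"],
    "NSM": [900, 1800, 82800, 0],
})

# An explicit day-first format avoids month/day ambiguity for strings such as 01/02/2018.
parsed = pd.to_datetime(sample["date"], format="%d/%m/%Y %H:%M")

# Assumption: NSM is the number of seconds elapsed since midnight.
seconds_since_midnight = parsed.dt.hour * 3600 + parsed.dt.minute * 60

print(parsed.dt.month.tolist())
print((seconds_since_midnight == sample["NSM"]).all())
```

If the check holds on the full dataset as well, `NSM` carries no information beyond the time of day already encoded in `date`.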
energy_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 35040 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040.000000 | 35040 | 35040 | 35040 |
unique | 35040 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
top | 01/01/2018 00:15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Monday | Light_Load |
freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 25056 | 5088 | 18072 |
mean | NaN | 27.386892 | 13.035384 | 3.870949 | 0.011524 | 80.578056 | 84.367870 | 42750.000000 | NaN | NaN | NaN |
std | NaN | 33.444380 | 16.306000 | 7.424463 | 0.016151 | 18.921322 | 30.456535 | 24940.534317 | NaN | NaN | NaN |
min | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN |
25% | NaN | 3.200000 | 2.300000 | 0.000000 | 0.000000 | 63.320000 | 99.700000 | 21375.000000 | NaN | NaN | NaN |
50% | NaN | 4.570000 | 5.000000 | 0.000000 | 0.000000 | 87.960000 | 100.000000 | 42750.000000 | NaN | NaN | NaN |
75% | NaN | 51.237500 | 22.640000 | 2.090000 | 0.020000 | 99.022500 | 100.000000 | 64125.000000 | NaN | NaN | NaN |
max | NaN | 157.180000 | 96.910000 | 27.760000 | 0.070000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
from sklearn.model_selection import train_test_split
# Hold out 3504 rows (10% of 35040), then split that hold-out evenly into test and dev sets.
train_data, test_data = train_test_split(energy_data, test_size=3504, random_state=1)
test_data, dev_data = train_test_split(test_data, test_size=1752, random_state=1)
print('Training set size:')
print(train_data.shape)
print('Testing set size:')
print(test_data.shape)
print('Dev set size:')
print(dev_data.shape)
Training set size:
(31536, 11)
Testing set size:
(1752, 11)
Dev set size:
(1752, 11)
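The sizes printed above follow from the two integer `test_size` arguments: 3504 rows are held out first, then 1752 of those go to the dev set. A small sketch on a plain list of row ids (a stand-in for `energy_data`; the names `row_ids`, `holdout_ids`, `train_ids`, `test_ids`, and `dev_ids` are hypothetical) reproduces the arithmetic and checks that the three subsets do not overlap.

```python
from sklearn.model_selection import train_test_split

# Stand-in for the 35040 DataFrame rows: one integer id per row.
row_ids = list(range(35040))

# Same two-step split as in the notebook, applied to the ids.
train_ids, holdout_ids = train_test_split(row_ids, test_size=3504, random_state=1)
test_ids, dev_ids = train_test_split(holdout_ids, test_size=1752, random_state=1)

print(len(train_ids), len(test_ids), len(dev_ids))

# Every row should land in exactly one subset.
all_ids = set(train_ids) | set(test_ids) | set(dev_ids)
print(len(all_ids) == len(row_ids))
```

Because `train_test_split` shuffles by default, neighbouring 15-minute readings can end up in different subsets; whether that matters depends on the modelling task.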
train_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 31536 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536.000000 | 31536 | 31536 | 31536 |
unique | 31536 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
top | 30/01/2018 00:15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Monday | Light_Load |
freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 22514 | 4560 | 16280 |
mean | NaN | 27.369449 | 13.037946 | 3.866059 | 0.011513 | 80.525058 | 84.410086 | 42707.363014 | NaN | NaN | NaN |
std | NaN | 33.473304 | 16.302910 | 7.434250 | 0.016159 | 18.929571 | 30.436675 | 24968.193911 | NaN | NaN | NaN |
min | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN |
25% | NaN | 3.200000 | 2.330000 | 0.000000 | 0.000000 | 63.200000 | 99.720000 | 20700.000000 | NaN | NaN | NaN |
50% | NaN | 4.570000 | 5.000000 | 0.000000 | 0.000000 | 87.900000 | 100.000000 | 42300.000000 | NaN | NaN | NaN |
75% | NaN | 51.230000 | 22.650000 | 1.980000 | 0.020000 | 98.970000 | 100.000000 | 63900.000000 | NaN | NaN | NaN |
max | NaN | 157.180000 | 96.910000 | 27.760000 | 0.070000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
test_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 1752 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752 | 1752 | 1752 |
unique | 1752 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
top | 07/05/2018 06:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Tuesday | Light_Load |
freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1268 | 291 | 898 |
mean | NaN | 27.330982 | 12.649024 | 3.949281 | 0.011530 | 81.364526 | 83.630702 | 43080.821918 | NaN | NaN | NaN |
std | NaN | 33.484216 | 16.185283 | 7.298637 | 0.016224 | 18.758338 | 30.801180 | 24944.325392 | NaN | NaN | NaN |
min | NaN | 2.480000 | 0.000000 | 0.000000 | 0.000000 | 41.120000 | 12.540000 | 0.000000 | NaN | NaN | NaN |
25% | NaN | 3.200000 | 1.392500 | 0.000000 | 0.000000 | 64.630000 | 99.180000 | 21600.000000 | NaN | NaN | NaN |
50% | NaN | 4.570000 | 4.930000 | 0.000000 | 0.000000 | 88.955000 | 100.000000 | 43200.000000 | NaN | NaN | NaN |
75% | NaN | 49.870000 | 21.240000 | 3.837500 | 0.020000 | 99.852500 | 100.000000 | 64800.000000 | NaN | NaN | NaN |
max | NaN | 143.930000 | 87.700000 | 27.540000 | 0.070000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
dev_data.describe(include='all')
| | date | Usage_kWh | Lagging_Current_Reactive.Power_kVarh | Leading_Current_Reactive_Power_kVarh | CO2(tCO2) | Lagging_Current_Power_Factor | Leading_Current_Power_Factor | NSM | WeekStatus | Day_of_week | Load_Type |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 1752 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752.000000 | 1752 | 1752 | 1752 |
unique | 1752 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 7 | 3 |
top | 02/06/2018 02:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Weekday | Monday | Light_Load |
freq | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1274 | 275 | 894 |
mean | NaN | 27.756787 | 13.375628 | 3.880634 | 0.011729 | 80.745548 | 84.345154 | 43186.643836 | NaN | NaN | NaN |
std | NaN | 32.895802 | 16.482148 | 7.376468 | 0.015943 | 18.927378 | 30.475427 | 24440.888112 | NaN | NaN | NaN |
min | NaN | 2.520000 | 0.000000 | 0.000000 | 0.000000 | 38.330000 | 14.070000 | 0.000000 | NaN | NaN | NaN |
25% | NaN | 3.200000 | 2.270000 | 0.000000 | 0.000000 | 63.942500 | 99.690000 | 22500.000000 | NaN | NaN | NaN |
50% | NaN | 4.680000 | 5.110000 | 0.000000 | 0.000000 | 87.940000 | 100.000000 | 43200.000000 | NaN | NaN | NaN |
75% | NaN | 52.187500 | 24.050000 | 2.177500 | 0.020000 | 99.030000 | 100.000000 | 63900.000000 | NaN | NaN | NaN |
max | NaN | 139.030000 | 80.750000 | 27.580000 | 0.060000 | 100.000000 | 100.000000 | 85500.000000 | NaN | NaN | NaN |
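One way to read the four `describe` tables together is to compare how often the most frequent `Load_Type` value (`Light_Load`) occurs in each subset. The sketch below only reuses the `count` and `freq` values reported above; the names `light_load`, `shares`, and `subset_total` are illustrative.

```python
# (freq of Light_Load, row count) copied from the describe() outputs above.
light_load = {
    "full": (18072, 35040),
    "train": (16280, 31536),
    "test": (898, 1752),
    "dev": (894, 1752),
}

# Share of Light_Load rows in each subset.
shares = {name: freq / count for name, (freq, count) in light_load.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2%}")

# The three subsets partition the full dataset, so their frequencies should add up.
subset_total = sum(light_load[key][0] for key in ("train", "test", "dev"))
print(subset_total == light_load["full"][0])
```

All four shares fall between roughly 51% and 52%, which suggests the random split kept the class balance close to that of the full dataset.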