ium_487194/ium_lab2.ipynb
2023-03-21 22:38:55 +01:00


# Install the required libraries
!pip install kaggle
!pip install pandas
!pip install unzip
!pip install scikit-learn
!pip install seaborn
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: kaggle in /home/witek/.local/lib/python3.10/site-packages (1.5.13)
Requirement already satisfied: python-slugify in /home/witek/.local/lib/python3.10/site-packages (from kaggle) (8.0.1)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from kaggle) (2.25.1)
Requirement already satisfied: certifi in /usr/lib/python3/dist-packages (from kaggle) (2020.6.20)
Requirement already satisfied: six>=1.10 in /usr/lib/python3/dist-packages (from kaggle) (1.16.0)
Requirement already satisfied: python-dateutil in /home/witek/.local/lib/python3.10/site-packages (from kaggle) (2.8.2)
Requirement already satisfied: urllib3 in /usr/lib/python3/dist-packages (from kaggle) (1.26.5)
Requirement already satisfied: tqdm in /home/witek/.local/lib/python3.10/site-packages (from kaggle) (4.65.0)
Requirement already satisfied: text-unidecode>=1.3 in /home/witek/.local/lib/python3.10/site-packages (from python-slugify->kaggle) (1.3)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in /home/witek/.local/lib/python3.10/site-packages (1.5.3)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas) (2022.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /home/witek/.local/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.21.0 in /home/witek/.local/lib/python3.10/site-packages (from pandas) (1.24.2)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: unzip in /home/witek/.local/lib/python3.10/site-packages (1.0.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: scikit-learn in /home/witek/.local/lib/python3.10/site-packages (1.2.2)
Requirement already satisfied: joblib>=1.1.1 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (1.2.0)
Requirement already satisfied: scipy>=1.3.2 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (1.10.1)
Requirement already satisfied: numpy>=1.17.3 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (1.24.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (3.1.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: seaborn in /home/witek/.local/lib/python3.10/site-packages (0.12.2)
Requirement already satisfied: pandas>=0.25 in /home/witek/.local/lib/python3.10/site-packages (from seaborn) (1.5.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.1 in /home/witek/.local/lib/python3.10/site-packages (from seaborn) (3.7.1)
Requirement already satisfied: numpy!=1.24.0,>=1.17 in /home/witek/.local/lib/python3.10/site-packages (from seaborn) (1.24.2)
Requirement already satisfied: contourpy>=1.0.1 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.0.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.4.4)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/lib/python3/dist-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (9.0.1)
Requirement already satisfied: packaging>=20.0 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (23.0)
Requirement already satisfied: fonttools>=4.22.0 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (4.39.2)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas>=0.25->seaborn) (2022.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.1->seaborn) (1.16.0)
!kaggle datasets download -d sadiqshah/bike-sales-in-europe
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/witek/.kaggle/kaggle.json'
Downloading bike-sales-in-europe.zip to /home/witek/python-ws
  0%|                                               | 0.00/1.15M [00:00<?, ?B/s]
100%|██████████████████████████████████████| 1.15M/1.15M [00:00<00:00, 18.2MB/s]
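The warning printed by the Kaggle client suggests running `chmod 600` on the API key file. A minimal Python sketch of the same fix is shown here. It assumes the key lives at the default `~/.kaggle/kaggle.json` location; the helper name `restrict_to_owner` is made up for this example.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path):
    """Set permissions to 0o600: read and write for the owner only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Assumed default location of the Kaggle API key.
key_path = Path.home() / ".kaggle" / "kaggle.json"

# Only touch the file if it actually exists on this machine.
if key_path.exists():
    restrict_to_owner(key_path)
```

Running this once before the download should silence the warning on POSIX systems.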
!unzip -o bike-sales-in-europe.zip
Archive:  bike-sales-in-europe.zip
  inflating: Sales.csv               
import pandas as pd
import seaborn as sns
bikes = pd.read_csv('Sales.csv')
bikes
|  | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2013-11-26 | 26 | November | 2013 | 19 | Youth (<25) | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
| 1 | 2015-11-26 | 26 | November | 2015 | 19 | Youth (<25) | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
| 2 | 2014-03-23 | 23 | March | 2014 | 49 | Adults (35-64) | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 23 | 45 | 120 | 1366 | 1035 | 2401 |
| 3 | 2016-03-23 | 23 | March | 2016 | 49 | Adults (35-64) | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 20 | 45 | 120 | 1188 | 900 | 2088 |
| 4 | 2014-05-15 | 15 | May | 2014 | 47 | Adults (35-64) | F | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 4 | 45 | 120 | 238 | 180 | 418 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 113031 | 2016-04-12 | 12 | April | 2016 | 41 | Adults (35-64) | M | United Kingdom | England | Clothing | Vests | Classic Vest, S | 3 | 24 | 64 | 112 | 72 | 184 |
| 113032 | 2014-04-02 | 2 | April | 2014 | 18 | Youth (<25) | M | Australia | Queensland | Clothing | Vests | Classic Vest, M | 22 | 24 | 64 | 655 | 528 | 1183 |
| 113033 | 2016-04-02 | 2 | April | 2016 | 18 | Youth (<25) | M | Australia | Queensland | Clothing | Vests | Classic Vest, M | 22 | 24 | 64 | 655 | 528 | 1183 |
| 113034 | 2014-03-04 | 4 | March | 2014 | 37 | Adults (35-64) | F | France | Seine (Paris) | Clothing | Vests | Classic Vest, L | 24 | 24 | 64 | 684 | 576 | 1260 |
| 113035 | 2016-03-04 | 4 | March | 2016 | 37 | Adults (35-64) | F | France | Seine (Paris) | Clothing | Vests | Classic Vest, L | 23 | 24 | 64 | 655 | 552 | 1207 |

113036 rows × 18 columns

bikes.isnull().sum()
# The dataset is already cleaned of artifacts: no column contains missing values
Date                0
Day                 0
Month               0
Year                0
Customer_Age        0
Age_Group           0
Customer_Gender     0
Country             0
State               0
Product_Category    0
Sub_Category        0
Product             0
Order_Quantity      0
Unit_Cost           0
Unit_Price          0
Profit              0
Cost                0
Revenue             0
dtype: int64
# Normalize the data by converting the text columns to lowercase
bikes['Month'] = bikes['Month'].str.lower()
bikes['Age_Group'] = bikes['Age_Group'].str.lower()
bikes['Country'] = bikes['Country'].str.lower()
bikes['State'] = bikes['State'].str.lower()
bikes['Product_Category'] = bikes['Product_Category'].str.lower()
bikes['Sub_Category'] = bikes['Sub_Category'].str.lower()
bikes['Product'] = bikes['Product'].str.lower()
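The same lowercasing step can be written as a loop over a list of column names, which keeps the set of normalized columns in one place. This is only a sketch on a small stand-in frame (`sample` and `text_columns` are names invented here); in the notebook the loop would run over `bikes` and the seven columns listed above.

```python
import pandas as pd

# Toy frame standing in for `bikes`; the notebook loads Sales.csv instead.
sample = pd.DataFrame({
    "Month": ["November", "March"],
    "Country": ["Canada", "Australia"],
    "Customer_Gender": ["M", "F"],
})

# Columns to lowercase. Customer_Gender is left out, as in the notebook.
text_columns = ["Month", "Country"]

for column in text_columns:
    sample[column] = sample[column].str.lower()

print(sample)
```

Listing the columns explicitly, rather than selecting every string column, leaves `Date` and `Customer_Gender` untouched, which matches what the notebook does.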
# Split into training and test sets
from sklearn.model_selection import train_test_split
bikes_train, bikes_test = train_test_split(bikes, test_size=0.2, random_state=1)
bikes_test
|  | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 31242 | 2013-09-02 | 2 | september | 2013 | 25 | young adults (25-34) | F | australia | queensland | accessories | helmets | sport-100 helmet, red | 11 | 13 | 35 | 180 | 143 | 323 |
| 76421 | 2015-10-06 | 6 | october | 2015 | 29 | young adults (25-34) | M | australia | queensland | accessories | tires and tubes | ll mountain tire | 30 | 9 | 25 | 360 | 270 | 630 |
| 63417 | 2016-05-04 | 4 | may | 2016 | 44 | adults (35-64) | F | united states | oregon | bikes | road bikes | road-750 black, 44 | 1 | 344 | 540 | 120 | 344 | 464 |
| 13214 | 2013-11-23 | 23 | november | 2013 | 42 | adults (35-64) | F | united states | washington | accessories | bottles and cages | mountain bottle cage | 29 | 4 | 10 | 110 | 116 | 226 |
| 17882 | 2013-12-25 | 25 | december | 2013 | 46 | adults (35-64) | F | germany | nordrhein-westfalen | clothing | caps | awc logo cap | 19 | 7 | 9 | 16 | 133 | 149 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36385 | 2016-06-29 | 29 | june | 2016 | 40 | adults (35-64) | F | australia | new south wales | accessories | helmets | sport-100 helmet, red | 1 | 13 | 35 | 17 | 13 | 30 |
| 11506 | 2014-03-04 | 4 | march | 2014 | 44 | adults (35-64) | F | united states | california | accessories | bottles and cages | water bottle - 30 oz. | 20 | 2 | 5 | 58 | 40 | 98 |
| 52187 | 2015-12-18 | 18 | december | 2015 | 23 | youth (<25) | M | united kingdom | england | bikes | mountain bikes | mountain-400-w silver, 46 | 1 | 420 | 769 | 318 | 420 | 738 |
| 83391 | 2015-12-12 | 12 | december | 2015 | 26 | young adults (25-34) | F | australia | victoria | accessories | tires and tubes | ml road tire | 22 | 9 | 25 | 237 | 198 | 435 |
| 112433 | 2015-09-17 | 17 | september | 2015 | 32 | young adults (25-34) | M | germany | hamburg | clothing | vests | classic vest, l | 31 | 24 | 64 | 1101 | 744 | 1845 |

22608 rows × 18 columns

bikes_train
|  | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 47030 | 2014-01-10 | 10 | january | 2014 | 23 | youth (<25) | F | france | loiret | clothing | jerseys | short-sleeve classic jersey, l | 2 | 42 | 54 | 12 | 84 | 96 |
| 36579 | 2016-05-04 | 4 | may | 2016 | 34 | young adults (25-34) | F | united states | california | accessories | helmets | sport-100 helmet, black | 14 | 13 | 35 | 298 | 182 | 480 |
| 88485 | 2016-01-06 | 6 | january | 2016 | 34 | young adults (25-34) | M | france | loiret | accessories | tires and tubes | touring tire tube | 20 | 2 | 5 | 49 | 40 | 89 |
| 12816 | 2014-07-15 | 15 | july | 2014 | 40 | adults (35-64) | M | germany | bayern | accessories | bottles and cages | water bottle - 30 oz. | 6 | 2 | 5 | 18 | 12 | 30 |
| 109397 | 2015-11-29 | 29 | november | 2015 | 22 | youth (<25) | F | australia | queensland | bikes | touring bikes | touring-2000 blue, 46 | 1 | 755 | 1215 | 266 | 755 | 1021 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50057 | 2015-11-16 | 16 | november | 2015 | 24 | youth (<25) | M | united states | washington | bikes | mountain bikes | mountain-200 silver, 38 | 3 | 1266 | 2320 | 1631 | 3798 | 5429 |
| 98047 | 2013-09-10 | 10 | september | 2013 | 28 | young adults (25-34) | M | australia | new south wales | accessories | tires and tubes | ml road tire | 12 | 9 | 25 | 153 | 108 | 261 |
| 5192 | 2016-05-26 | 26 | may | 2016 | 33 | young adults (25-34) | M | australia | new south wales | accessories | bottles and cages | water bottle - 30 oz. | 15 | 2 | 5 | 35 | 30 | 65 |
| 77708 | 2013-11-11 | 11 | november | 2013 | 63 | adults (35-64) | M | united states | california | accessories | tires and tubes | hl mountain tire | 21 | 13 | 35 | 447 | 273 | 720 |
| 98539 | 2016-04-14 | 14 | april | 2016 | 46 | adults (35-64) | M | united states | washington | accessories | tires and tubes | hl road tire | 22 | 12 | 33 | 302 | 264 | 566 |

90428 rows × 18 columns
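The reported sizes of the two subsets (22608 test rows and 90428 training rows) can be checked by hand. The sketch below assumes that a fractional `test_size` is rounded up to a whole number of test rows and that the remaining rows go to the training set, which is consistent with the counts shown in the outputs.

```python
import math

n_rows = 113036   # row count reported for `bikes`
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test share is rounded up, the rest becomes the training set.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

print(n_test, n_train)  # 22608 90428
```

Because 113036 is not divisible by 5, the split is not exactly 80/20: the test set gets the one extra row.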

bikes["Age_Group"].value_counts()
adults (35-64)          55824
young adults (25-34)    38654
youth (<25)             17828
seniors (64+)             730
Name: Age_Group, dtype: int64
bikes["Age_Group"].value_counts().plot(kind="bar")
<Axes: >
bikes[["Year","Profit"]].groupby("Year").mean()
| Year | Profit |
| --- | --- |
| 2011 | 1076.317146 |
| 2012 | 1102.724318 |
| 2013 | 243.800188 |
| 2014 | 199.472311 |
| 2015 | 308.004868 |
| 2016 | 239.334240 |
bikes[["Year","Profit"]].groupby("Year").mean().plot(kind="bar")
<Axes: xlabel='Year'>
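The mean profit per order for 2011 and 2012 is several times higher than for the later years. A mean alone does not show how many orders stand behind each value, so one possible next step is to report the order count next to it. The sketch below uses a small invented frame (`toy`, with made-up numbers); in the notebook the same `agg` call would be applied to `bikes`.

```python
import pandas as pd

# Invented numbers, for illustration only; not taken from Sales.csv.
toy = pd.DataFrame({
    "Year": [2011, 2011, 2013, 2013, 2013],
    "Profit": [1000, 1200, 200, 250, 300],
})

# Mean profit per year together with the number of orders in that year.
summary = toy.groupby("Year")["Profit"].agg(["mean", "count"])

print(summary)
```

Comparing the `count` column across years would indicate whether the early years simply contain fewer, larger orders.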
sns.set_theme()
sns.relplot(data=bikes, x="Unit_Cost", y="Unit_Price", hue="Unit_Cost")
<seaborn.axisgrid.FacetGrid at 0x7f807fceafb0>