# Install the required libraries
!pip install kaggle
!pip install pandas
!pip install unzip
!pip install scikit-learn
!pip install seaborn
Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: kaggle in /home/witek/.local/lib/python3.10/site-packages (1.5.13) Requirement already satisfied: python-slugify in /home/witek/.local/lib/python3.10/site-packages (from kaggle) (8.0.1) Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from kaggle) (2.25.1) Requirement already satisfied: certifi in /usr/lib/python3/dist-packages (from kaggle) (2020.6.20) Requirement already satisfied: six>=1.10 in /usr/lib/python3/dist-packages (from kaggle) (1.16.0) Requirement already satisfied: python-dateutil in /home/witek/.local/lib/python3.10/site-packages (from kaggle) (2.8.2) Requirement already satisfied: urllib3 in /usr/lib/python3/dist-packages (from kaggle) (1.26.5) Requirement already satisfied: tqdm in /home/witek/.local/lib/python3.10/site-packages (from kaggle) (4.65.0) Requirement already satisfied: text-unidecode>=1.3 in /home/witek/.local/lib/python3.10/site-packages (from python-slugify->kaggle) (1.3) Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: pandas in /home/witek/.local/lib/python3.10/site-packages (1.5.3) Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas) (2022.1) Requirement already satisfied: python-dateutil>=2.8.1 in /home/witek/.local/lib/python3.10/site-packages (from pandas) (2.8.2) Requirement already satisfied: numpy>=1.21.0 in /home/witek/.local/lib/python3.10/site-packages (from pandas) (1.24.2) Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0) Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: unzip in /home/witek/.local/lib/python3.10/site-packages (1.0.0) Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: scikit-learn in 
/home/witek/.local/lib/python3.10/site-packages (1.2.2) Requirement already satisfied: joblib>=1.1.1 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (1.2.0) Requirement already satisfied: scipy>=1.3.2 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (1.10.1) Requirement already satisfied: numpy>=1.17.3 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (1.24.2) Requirement already satisfied: threadpoolctl>=2.0.0 in /home/witek/.local/lib/python3.10/site-packages (from scikit-learn) (3.1.0) Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: seaborn in /home/witek/.local/lib/python3.10/site-packages (0.12.2) Requirement already satisfied: pandas>=0.25 in /home/witek/.local/lib/python3.10/site-packages (from seaborn) (1.5.3) Requirement already satisfied: matplotlib!=3.6.1,>=3.1 in /home/witek/.local/lib/python3.10/site-packages (from seaborn) (3.7.1) Requirement already satisfied: numpy!=1.24.0,>=1.17 in /home/witek/.local/lib/python3.10/site-packages (from seaborn) (1.24.2) Requirement already satisfied: contourpy>=1.0.1 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.0.7) Requirement already satisfied: kiwisolver>=1.0.1 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.4.4) Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.4.7) Requirement already satisfied: cycler>=0.10 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (0.11.0) Requirement already satisfied: python-dateutil>=2.7 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.8.2) Requirement already satisfied: pillow>=6.2.0 in /usr/lib/python3/dist-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (9.0.1) Requirement already satisfied: 
packaging>=20.0 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (23.0) Requirement already satisfied: fonttools>=4.22.0 in /home/witek/.local/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (4.39.2) Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas>=0.25->seaborn) (2022.1) Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.1->seaborn) (1.16.0)
!kaggle datasets download -d sadiqshah/bike-sales-in-europe
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/witek/.kaggle/kaggle.json' Downloading bike-sales-in-europe.zip to /home/witek/python-ws 0%| | 0.00/1.15M [00:00<?, ?B/s] 100%|██████████████████████████████████████| 1.15M/1.15M [00:00<00:00, 18.2MB/s]
!unzip -o bike-sales-in-europe.zip
Archive: bike-sales-in-europe.zip inflating: Sales.csv
import pandas as pd
import seaborn as sns
bikes = pd.read_csv('Sales.csv')
bikes
 | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2013-11-26 | 26 | November | 2013 | 19 | Youth (<25) | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
1 | 2015-11-26 | 26 | November | 2015 | 19 | Youth (<25) | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
2 | 2014-03-23 | 23 | March | 2014 | 49 | Adults (35-64) | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 23 | 45 | 120 | 1366 | 1035 | 2401 |
3 | 2016-03-23 | 23 | March | 2016 | 49 | Adults (35-64) | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 20 | 45 | 120 | 1188 | 900 | 2088 |
4 | 2014-05-15 | 15 | May | 2014 | 47 | Adults (35-64) | F | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 4 | 45 | 120 | 238 | 180 | 418 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
113031 | 2016-04-12 | 12 | April | 2016 | 41 | Adults (35-64) | M | United Kingdom | England | Clothing | Vests | Classic Vest, S | 3 | 24 | 64 | 112 | 72 | 184 |
113032 | 2014-04-02 | 2 | April | 2014 | 18 | Youth (<25) | M | Australia | Queensland | Clothing | Vests | Classic Vest, M | 22 | 24 | 64 | 655 | 528 | 1183 |
113033 | 2016-04-02 | 2 | April | 2016 | 18 | Youth (<25) | M | Australia | Queensland | Clothing | Vests | Classic Vest, M | 22 | 24 | 64 | 655 | 528 | 1183 |
113034 | 2014-03-04 | 4 | March | 2014 | 37 | Adults (35-64) | F | France | Seine (Paris) | Clothing | Vests | Classic Vest, L | 24 | 24 | 64 | 684 | 576 | 1260 |
113035 | 2016-03-04 | 4 | March | 2016 | 37 | Adults (35-64) | F | France | Seine (Paris) | Clothing | Vests | Classic Vest, L | 23 | 24 | 64 | 655 | 552 | 1207 |
113036 rows × 18 columns
bikes.isnull().sum()
# The dataset has already been cleaned: no column contains missing values
Date                0
Day                 0
Month               0
Year                0
Customer_Age        0
Age_Group           0
Customer_Gender     0
Country             0
State               0
Product_Category    0
Sub_Category        0
Product             0
Order_Quantity      0
Unit_Cost           0
Unit_Price          0
Profit              0
Cost                0
Revenue             0
dtype: int64
# Normalize the data by lowercasing the text columns
bikes['Month'] = bikes['Month'].str.lower()
bikes['Age_Group'] = bikes['Age_Group'].str.lower()
bikes['Country'] = bikes['Country'].str.lower()
bikes['State'] = bikes['State'].str.lower()
bikes['Product_Category'] = bikes['Product_Category'].str.lower()
bikes['Sub_Category'] = bikes['Sub_Category'].str.lower()
bikes['Product'] = bikes['Product'].str.lower()
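The assignments above repeat one pattern, so they could also be written as a loop over the column names. A minimal sketch, assuming a small stand-in DataFrame (`sample`, made up for illustration) in place of `bikes`:

```python
import pandas as pd

# Hypothetical stand-in for `bikes`; only a few columns are included.
sample = pd.DataFrame({
    "Month": ["November", "March"],
    "Country": ["Canada", "Australia"],
    "Customer_Gender": ["M", "F"],
    "Order_Quantity": [8, 23],
})

# Columns to normalize. Customer_Gender is left untouched, as in the cell above.
text_columns = ["Month", "Country"]

for column in text_columns:
    sample[column] = sample[column].str.lower()

print(sample)
```

For the real data, `text_columns` would list the same seven columns that are lowercased above.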
# Split into training and test sets
from sklearn.model_selection import train_test_split
bikes_train, bikes_test = train_test_split(bikes, test_size=0.2, random_state=1)
bikes_test
 | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
31242 | 2013-09-02 | 2 | september | 2013 | 25 | young adults (25-34) | F | australia | queensland | accessories | helmets | sport-100 helmet, red | 11 | 13 | 35 | 180 | 143 | 323 |
76421 | 2015-10-06 | 6 | october | 2015 | 29 | young adults (25-34) | M | australia | queensland | accessories | tires and tubes | ll mountain tire | 30 | 9 | 25 | 360 | 270 | 630 |
63417 | 2016-05-04 | 4 | may | 2016 | 44 | adults (35-64) | F | united states | oregon | bikes | road bikes | road-750 black, 44 | 1 | 344 | 540 | 120 | 344 | 464 |
13214 | 2013-11-23 | 23 | november | 2013 | 42 | adults (35-64) | F | united states | washington | accessories | bottles and cages | mountain bottle cage | 29 | 4 | 10 | 110 | 116 | 226 |
17882 | 2013-12-25 | 25 | december | 2013 | 46 | adults (35-64) | F | germany | nordrhein-westfalen | clothing | caps | awc logo cap | 19 | 7 | 9 | 16 | 133 | 149 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
36385 | 2016-06-29 | 29 | june | 2016 | 40 | adults (35-64) | F | australia | new south wales | accessories | helmets | sport-100 helmet, red | 1 | 13 | 35 | 17 | 13 | 30 |
11506 | 2014-03-04 | 4 | march | 2014 | 44 | adults (35-64) | F | united states | california | accessories | bottles and cages | water bottle - 30 oz. | 20 | 2 | 5 | 58 | 40 | 98 |
52187 | 2015-12-18 | 18 | december | 2015 | 23 | youth (<25) | M | united kingdom | england | bikes | mountain bikes | mountain-400-w silver, 46 | 1 | 420 | 769 | 318 | 420 | 738 |
83391 | 2015-12-12 | 12 | december | 2015 | 26 | young adults (25-34) | F | australia | victoria | accessories | tires and tubes | ml road tire | 22 | 9 | 25 | 237 | 198 | 435 |
112433 | 2015-09-17 | 17 | september | 2015 | 32 | young adults (25-34) | M | germany | hamburg | clothing | vests | classic vest, l | 31 | 24 | 64 | 1101 | 744 | 1845 |
22608 rows × 18 columns
bikes_train
 | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
47030 | 2014-01-10 | 10 | january | 2014 | 23 | youth (<25) | F | france | loiret | clothing | jerseys | short-sleeve classic jersey, l | 2 | 42 | 54 | 12 | 84 | 96 |
36579 | 2016-05-04 | 4 | may | 2016 | 34 | young adults (25-34) | F | united states | california | accessories | helmets | sport-100 helmet, black | 14 | 13 | 35 | 298 | 182 | 480 |
88485 | 2016-01-06 | 6 | january | 2016 | 34 | young adults (25-34) | M | france | loiret | accessories | tires and tubes | touring tire tube | 20 | 2 | 5 | 49 | 40 | 89 |
12816 | 2014-07-15 | 15 | july | 2014 | 40 | adults (35-64) | M | germany | bayern | accessories | bottles and cages | water bottle - 30 oz. | 6 | 2 | 5 | 18 | 12 | 30 |
109397 | 2015-11-29 | 29 | november | 2015 | 22 | youth (<25) | F | australia | queensland | bikes | touring bikes | touring-2000 blue, 46 | 1 | 755 | 1215 | 266 | 755 | 1021 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
50057 | 2015-11-16 | 16 | november | 2015 | 24 | youth (<25) | M | united states | washington | bikes | mountain bikes | mountain-200 silver, 38 | 3 | 1266 | 2320 | 1631 | 3798 | 5429 |
98047 | 2013-09-10 | 10 | september | 2013 | 28 | young adults (25-34) | M | australia | new south wales | accessories | tires and tubes | ml road tire | 12 | 9 | 25 | 153 | 108 | 261 |
5192 | 2016-05-26 | 26 | may | 2016 | 33 | young adults (25-34) | M | australia | new south wales | accessories | bottles and cages | water bottle - 30 oz. | 15 | 2 | 5 | 35 | 30 | 65 |
77708 | 2013-11-11 | 11 | november | 2013 | 63 | adults (35-64) | M | united states | california | accessories | tires and tubes | hl mountain tire | 21 | 13 | 35 | 447 | 273 | 720 |
98539 | 2016-04-14 | 14 | april | 2016 | 46 | adults (35-64) | M | united states | washington | accessories | tires and tubes | hl road tire | 22 | 12 | 33 | 302 | 264 | 566 |
90428 rows × 18 columns
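The two row counts follow directly from `test_size=0.2`. The sketch below reproduces the arithmetic with the standard library only; the variable names are illustrative:

```python
import math

n_rows = 113036   # rows in Sales.csv, as reported for `bikes`
test_size = 0.2   # fraction passed to train_test_split

# With a float test_size, scikit-learn rounds the test count up
# and assigns the remaining rows to the training set.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

print(n_test, n_train)  # 22608 90428
```

Because `random_state=1` is fixed, rerunning the split should select the same rows each time.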
bikes["Age_Group"].value_counts()
adults (35-64)          55824
young adults (25-34)    38654
youth (<25)             17828
seniors (64+)             730
Name: Age_Group, dtype: int64
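Raw counts are easier to compare as shares of the whole. The sketch below converts the counts printed above into percentages using plain Python; in pandas, `value_counts(normalize=True)` returns the same proportions directly.

```python
# Counts copied from the value_counts() output above.
age_group_counts = {
    "adults (35-64)": 55824,
    "young adults (25-34)": 38654,
    "youth (<25)": 17828,
    "seniors (64+)": 730,
}

total = sum(age_group_counts.values())

# Share of each age group in the full dataset.
shares = {group: count / total for group, count in age_group_counts.items()}

for group, share in shares.items():
    print(f"{group}: {share:.1%}")
```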
bikes["Age_Group"].value_counts().plot(kind="bar")
<Axes: >
bikes[["Year","Profit"]].groupby("Year").mean()
Year | Profit |
---|---|
2011 | 1076.317146 |
2012 | 1102.724318 |
2013 | 243.800188 |
2014 | 199.472311 |
2015 | 308.004868 |
2016 | 239.334240 |
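A mean on its own hides how many rows stand behind it, so it can help to aggregate the count alongside it. A minimal sketch, using only the ten rows visible in the first preview of `bikes` (so the numbers are not representative of the full dataset):

```python
import pandas as pd

# Year and Profit values of the ten rows displayed in the first preview of `bikes`.
preview = pd.DataFrame({
    "Year": [2013, 2015, 2014, 2016, 2014, 2016, 2014, 2016, 2014, 2016],
    "Profit": [590, 590, 1366, 1188, 238, 112, 655, 655, 684, 655],
})

# Mean profit and number of rows per year, side by side.
summary = preview.groupby("Year")["Profit"].agg(["mean", "count"])
print(summary)
```

Applying the same `agg(["mean", "count"])` call to `bikes` would show how many sales each yearly mean is based on.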
bikes[["Year","Profit"]].groupby("Year").mean().plot(kind="bar")
<Axes: xlabel='Year'>
sns.set_theme()
sns.relplot(data=bikes, x="Unit_Cost", y="Unit_Price", hue="Unit_Cost")
<seaborn.axisgrid.FacetGrid at 0x7f807fceafb0>
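The relationship drawn in the scatter plot can also be summarized with a single number. The sketch below computes the Pearson correlation between `Unit_Cost` and `Unit_Price` for a handful of distinct pairs copied from the rows displayed earlier; it is an illustration on a small sample, not a result for the full dataset.

```python
import pandas as pd

# Distinct (Unit_Cost, Unit_Price) pairs taken from the rows shown in the previews.
pairs = pd.DataFrame({
    "Unit_Cost": [45, 24, 13, 9, 344, 4, 7, 2, 420, 42, 755, 1266, 12],
    "Unit_Price": [120, 64, 35, 25, 540, 10, 9, 5, 769, 54, 1215, 2320, 33],
})

# Pearson correlation coefficient between cost and price.
correlation = pairs["Unit_Cost"].corr(pairs["Unit_Price"])
print(round(correlation, 3))
```

On the full data, `bikes["Unit_Cost"].corr(bikes["Unit_Price"])` would give the corresponding value.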