%pip install --user kaggle
%pip install --user pandas
%pip install --user scikit-learn
%pip install --user seaborn
!kaggle datasets download -d bartoszpieniak/poland-cars-for-sale-dataset
!unzip -o poland-cars-for-sale-dataset.zip
!wc -l Car_sale_ads.csv
!head -n 5 Car_sale_ads.csv
Requirement already satisfied: kaggle in \\\\files\students\s487174\.appdata\python\python310\site-packages (1.5.13)
Requirement already satisfied: urllib3 in c:\software\python3\lib\site-packages (from kaggle) (1.26.14)
Requirement already satisfied: requests in c:\software\python3\lib\site-packages (from kaggle) (2.28.2)
Requirement already satisfied: tqdm in c:\software\python3\lib\site-packages (from kaggle) (4.64.1)
Requirement already satisfied: python-dateutil in c:\software\python3\lib\site-packages (from kaggle) (2.8.2)
Requirement already satisfied: six>=1.10 in c:\software\python3\lib\site-packages (from kaggle) (1.16.0)
Requirement already satisfied: python-slugify in \\\\files\students\s487174\.appdata\python\python310\site-packages (from kaggle) (8.0.1)
Requirement already satisfied: certifi in c:\software\python3\lib\site-packages (from kaggle) (2022.12.7)
Requirement already satisfied: text-unidecode>=1.3 in \\\\files\students\s487174\.appdata\python\python310\site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\software\python3\lib\site-packages (from requests->kaggle) (3.0.1)
Requirement already satisfied: idna<4,>=2.5 in c:\software\python3\lib\site-packages (from requests->kaggle) (3.4)
Requirement already satisfied: colorama in c:\software\python3\lib\site-packages (from tqdm->kaggle) (0.4.6)
Requirement already satisfied: pandas in c:\software\python3\lib\site-packages (1.5.3)
Requirement already satisfied: pytz>=2020.1 in c:\software\python3\lib\site-packages (from pandas) (2022.7.1)
Requirement already satisfied: numpy>=1.21.0 in c:\software\python3\lib\site-packages (from pandas) (1.24.2)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\software\python3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\software\python3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Requirement already satisfied: pandas in c:\software\python3\lib\site-packages (1.5.3)
Requirement already satisfied: numpy>=1.21.0 in c:\software\python3\lib\site-packages (from pandas) (1.24.2)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\software\python3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\software\python3\lib\site-packages (from pandas) (2022.7.1)
Requirement already satisfied: six>=1.5 in c:\software\python3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Requirement already satisfied: seaborn in \\\\files\students\s487174\.appdata\python\python310\site-packages (0.12.2)
Requirement already satisfied: numpy!=1.24.0,>=1.17 in c:\software\python3\lib\site-packages (from seaborn) (1.24.2)
Requirement already satisfied: pandas>=0.25 in c:\software\python3\lib\site-packages (from seaborn) (1.5.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.1 in c:\software\python3\lib\site-packages (from seaborn) (3.7.0)
Requirement already satisfied: contourpy>=1.0.1 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.0.7)
Requirement already satisfied: fonttools>=4.22.0 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (4.38.0)
Requirement already satisfied: cycler>=0.10 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.8.2)
Requirement already satisfied: pyparsing>=2.3.1 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (3.0.9)
Requirement already satisfied: pillow>=6.2.0 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (9.4.0)
Requirement already satisfied: packaging>=20.0 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (23.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\software\python3\lib\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.4.4)
Requirement already satisfied: pytz>=2020.1 in c:\software\python3\lib\site-packages (from pandas>=0.25->seaborn) (2022.7.1)
Requirement already satisfied: six>=1.5 in c:\software\python3\lib\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.1->seaborn) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
poland-cars-for-sale-dataset.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive: poland-cars-for-sale-dataset.zip
  inflating: Car_sale_ads.csv
208305 Car_sale_ads.csv
Index,Price,Currency,Condition,Vehicle_brand,Vehicle_model,Vehicle_version,Vehicle_generation,Production_year,Mileage_km,Power_HP,Displacement_cm3,Fuel_type,CO2_emissions,Drive,Transmission,Type,Doors_number,Colour,Origin_country,First_owner,First_registration_date,Offer_publication_date,Offer_location,Features
0,86200,PLN,New,Abarth,595,,,2021,1.0,145.0,1400.0,Gasoline,,Front wheels,Manual,small_cars,3.0,gray,,,,04/05/2021,"ul. Jubilerska 6 - 04-190 Warszawa, Mazowieckie (Polska)",[]
1,43500,PLN,Used,Abarth,Other,,,1974,59000.0,75.0,1100.0,Gasoline,,Front wheels,Manual,coupe,2.0,silver,,,,03/05/2021,"kanonierska12 - 04-425 Warszawa, Rembertów (Polska)",[]
2,44900,PLN,Used,Abarth,500,,,2018,52000.0,180.0,1368.0,Gasoline,,,Automatic,small_cars,3.0,silver,,,,03/05/2021,"Warszawa, Mazowieckie, Białołęka","['ABS', 'Electric front windows', 'Drivers airbag', 'Power steering', 'ASR (traction control)', 'Rear view camera', 'Heated side mirrors', 'CD', 'Electrically adjustable mirrors', 'Passengers airbag', 'Alarm', 'Bluetooth', 'Automatic air conditioning', 'Airbag protecting the knees', 'Central locking', 'Immobilizer', 'Factory radio', 'Alloy wheels', 'Rain sensor', 'On-board computer', 'Multifunction steering wheel']"
3,39900,PLN,Used,Abarth,500,,,2012,29000.0,160.0,1368.0,Gasoline,139.0,Front wheels,Manual,small_cars,3.0,gray,,,,30/04/2021,"Jaworzno, Śląskie","['ABS', 'Electric front windows', 'Drivers airbag', 'Power steering', 'Bluetooth', 'AUX socket', 'On-board computer', 'Xenon lights', 'CD', 'Electrically adjustable mirrors', 'Passengers airbag', 'Alloy wheels', 'Rain sensor', 'USB socket', 'MP3', 'Multifunction steering wheel', 'Central locking', 'Immobilizer', 'Factory radio', 'ASR (traction control)', 'ESP(stabilization of the track)', 'Automatic air conditioning', 'Front side airbags']"
import pandas as pd
cars = pd.read_csv('Car_sale_ads.csv')
cars
cars.describe(include='all')
cars["Drive"].value_counts()
Front wheels                    139944
Rear wheels                      18081
4x4 (permanent)                  16986
4x4 (attached automatically)     15420
4x4 (attached manually)           2797
Name: Drive, dtype: int64
## Split
from sklearn.model_selection import train_test_split
cars_train, cars_test = train_test_split(cars, test_size=1000, random_state=1)
cars_train["Drive"].value_counts()
cars_test["Drive"].value_counts()
Front wheels                    658
4x4 (permanent)                  87
4x4 (attached automatically)     84
Rear wheels                      82
4x4 (attached manually)          13
Name: Drive, dtype: int64
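Raw counts from the full set and from the 1000-row test set are hard to compare directly, so a quick sanity check is to turn both into shares. The sketch below does this in plain Python using only the `Drive` counts printed above; the helper name `shares` is introduced here for illustration. Note that the test counts add up to 924 rather than 1000, because `value_counts()` skips missing values by default. In the notebook itself, `value_counts(normalize=True)` should give the same kind of shares.

```python
# Drive counts copied from the value_counts() outputs above.
full_counts = {
    "Front wheels": 139944,
    "Rear wheels": 18081,
    "4x4 (permanent)": 16986,
    "4x4 (attached automatically)": 15420,
    "4x4 (attached manually)": 2797,
}
test_counts = {
    "Front wheels": 658,
    "4x4 (permanent)": 87,
    "4x4 (attached automatically)": 84,
    "Rear wheels": 82,
    "4x4 (attached manually)": 13,
}


def shares(counts):
    """Convert raw class counts into fractions of the non-missing total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


full_shares = shares(full_counts)
test_shares = shares(test_counts)

# Print the two distributions side by side, one Drive class per line.
for label in full_counts:
    print(f"{label:30s} full={full_shares[label]:.3f}  test={test_shares[label]:.3f}")
```

If the two columns differ noticeably for the smaller classes, that is an argument for a stratified split.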
#cars_train, cars_test = train_test_split(cars, test_size=50, random_state=1, stratify=cars["Drive"])
#cars_train["Drive"].value_counts()
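One thing to watch with the stratified variant: `Drive` has missing values (row 2 in the `head` output above has an empty `Drive` field), and stratifying directly on a column with gaps can fail. A minimal sketch of one possible workaround follows, on a small made-up frame rather than on `cars`. The frame `toy`, the label `"Unknown"`, and the choice to treat missing values as their own class are all assumptions for illustration, not something taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for `cars`: 100 rows, with gaps in the "Drive" column.
toy = pd.DataFrame({
    "Drive": ["Front wheels"] * 70 + ["Rear wheels"] * 20 + [np.nan] * 10,
    "Price": range(100),
})

# Assumption: missing values become their own class, so the stratification
# labels contain no NaN. The "Drive" column in `toy` itself is left unchanged.
strata = toy["Drive"].fillna("Unknown")

toy_train, toy_test = train_test_split(
    toy, test_size=20, random_state=1, stratify=strata
)

# dropna=False keeps the missing-value rows visible in the counts.
print(toy_test["Drive"].value_counts(dropna=False))
```

Dropping the rows with a missing `Drive` before splitting would be another option; which one fits depends on whether those rows are needed later.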