Installing and importing libraries

!pip install kaggle pandas scikit-learn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

Downloading the dataset

!kaggle datasets download -d syedanwarafridi/vehicle-sales-data
#conda install git pip
#!pip install unzip
!unzip -o vehicle-sales-data.zip
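
If the `unzip` command is not available in the environment (the commented-out install lines above suggest this may happen), the archive can also be extracted with Python's standard library. This is a minimal sketch; the helper name `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="."):
    """Extract every member of a ZIP archive and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # Existing files are overwritten, similar to `unzip -o`
        archive.extractall(target)
        return archive.namelist()
```

Calling `extract_archive('vehicle-sales-data.zip')` should leave `car_prices.csv` in the working directory, assuming the archive contains that file, as the later `read_csv` call implies.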

Data description and cleaning

df = pd.read_csv('car_prices.csv')
   year  make   model                trim        body   transmission  vin                state  condition  odometer  color  interior  seller                                  mmr      sellingprice  saledate
0  2015  Kia    Sorento              LX          SUV    automatic     5xyktca69fg566472  ca     5.0        16639.0   white  black     kia motors america inc                  20500.0  21500.0       Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
1  2015  Kia    Sorento              LX          SUV    automatic     5xyktca69fg561319  ca     5.0        9393.0    white  beige     kia motors america inc                  20800.0  21500.0       Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
2  2014  BMW    3 Series             328i SULEV  Sedan  automatic     wba3c1c51ek116351  ca     45.0       1331.0    gray   black     financial services remarketing (lease)  31900.0  30000.0       Thu Jan 15 2015 04:30:00 GMT-0800 (PST)
3  2015  Volvo  S60                  T5          Sedan  automatic     yv1612tb4f1310987  ca     41.0       14282.0   white  black     volvo na rep/world omni                 27500.0  27750.0       Thu Jan 29 2015 04:30:00 GMT-0800 (PST)
4  2014  BMW    6 Series Gran Coupe  650i        Sedan  automatic     wba6b2c57ed129731  ca     43.0       2641.0    gray   black     financial services remarketing (lease)  66000.0  67000.0       Thu Dec 18 2014 12:30:00 GMT-0800 (PST)
(558837, 16)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 558837 entries, 0 to 558836
Data columns (total 16 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   year          558837 non-null  int64  
 1   make          548536 non-null  object 
 2   model         548438 non-null  object 
 3   trim          548186 non-null  object 
 4   body          545642 non-null  object 
 5   transmission  493485 non-null  object 
 6   vin           558833 non-null  object 
 7   state         558837 non-null  object 
 8   condition     547017 non-null  float64
 9   odometer      558743 non-null  float64
 10  color         558088 non-null  object 
 11  interior      558088 non-null  object 
 12  seller        558837 non-null  object 
 13  mmr           558799 non-null  float64
 14  sellingprice  558825 non-null  float64
 15  saledate      558825 non-null  object 
dtypes: float64(4), int64(1), object(11)
memory usage: 68.2+ MB
df = df.dropna()
(472325, 16)
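
Comparing the two shapes, `dropna()` removes 558837 - 472325 = 86512 rows, and the `info()` output shows that `transmission` alone has 558837 - 493485 = 65352 missing values. Before dropping rows it can help to see which columns drive the loss. A minimal sketch on a small made-up frame; the toy values and the helper name `missing_report` are illustrative and not taken from the dataset.

```python
import pandas as pd


def missing_report(frame):
    """Return per-column missing counts and shares, largest first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "share": counts / len(frame),
    })
    return report.sort_values("missing", ascending=False)


# Toy data, not from car_prices.csv
toy = pd.DataFrame({
    "make": ["Kia", None, "BMW", "Volvo"],
    "transmission": ["automatic", None, None, "manual"],
    "sellingprice": [21500.0, 30000.0, 27750.0, 67000.0],
})

report = missing_report(toy)
rows_before = len(toy)
rows_after = len(toy.dropna())  # rows with no missing value in any column
```

If a single column accounts for most of the loss, `dropna(subset=[...])` restricted to the columns actually needed is one way to keep more rows.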
# Unify case variants of the same body type
df['body'] = df['body'].replace({'sedan': 'Sedan', 'Suv': 'SUV', 'suv': 'SUV'})
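
The replacements above fix individual spellings of the same body type. Where many case variants exist, the same idea can be expressed as one lookup keyed by the lowercased label. This is a sketch under assumptions: the mapping `CANONICAL_BODY` and the helper `normalize_body` are hypothetical, the mapping covers only the labels named above, and labels missing from it are left unchanged.

```python
import pandas as pd

# Canonical spellings keyed by lowercased label (illustrative, not exhaustive)
CANONICAL_BODY = {"sedan": "Sedan", "suv": "SUV"}


def normalize_body(series):
    """Map case variants to one canonical spelling; keep unknown labels as-is."""
    lowered = series.str.lower()
    # Labels absent from the mapping become NaN and fall back to the original
    return lowered.map(CANONICAL_BODY).fillna(series)


# Toy data, not from car_prices.csv
toy = pd.Series(["sedan", "Sedan", "Suv", "suv", "SUV", "Crew Cab"])
normalized = normalize_body(toy)
```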
# 'number' selects every numeric dtype regardless of platform integer width
numeric_columns = df.select_dtypes(include='number').columns
scaler = MinMaxScaler(feature_range=(0, 1))

df_scaled = df.copy()
df_scaled[numeric_columns] = scaler.fit_transform(df[numeric_columns])
   year  make   model                trim        body   transmission  vin                state  condition  odometer  color  interior  seller                                  mmr       sellingprice  saledate
0  1.00  Kia    Sorento              LX          SUV    automatic     5xyktca69fg566472  ca     0.083333   0.016638  white  black     kia motors america inc                  0.112515  0.093474      Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
1  1.00  Kia    Sorento              LX          SUV    automatic     5xyktca69fg561319  ca     0.083333   0.009392  white  beige     kia motors america inc                  0.114164  0.093474      Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
2  0.96  BMW    3 Series             328i SULEV  Sedan  automatic     wba3c1c51ek116351  ca     0.916667   0.001330  gray   black     financial services remarketing (lease)  0.175161  0.130431      Thu Jan 15 2015 04:30:00 GMT-0800 (PST)
3  1.00  Volvo  S60                  T5          Sedan  automatic     yv1612tb4f1310987  ca     0.833333   0.014281  white  black     volvo na rep/world omni                 0.150982  0.120648      Thu Jan 29 2015 04:30:00 GMT-0800 (PST)
4  0.96  BMW    6 Series Gran Coupe  650i        Sedan  automatic     wba6b2c57ed129731  ca     0.875000   0.002640  gray   black     financial services remarketing (lease)  0.362550  0.291301      Thu Dec 18 2014 12:30:00 GMT-0800 (PST)
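
`MinMaxScaler` maps each column to `(x - min) / (max - min)`. The scaled `condition` values above are consistent with a minimum of 1 and a maximum of 49, for example (5 - 1) / 48 ≈ 0.083333. Because the scaler is fitted on the whole frame, its minimum and maximum also reflect rows that later land in the dev and test subsets. A common alternative is to fit on the training split only and reuse those statistics for the other subsets. The following is a minimal sketch on made-up numbers and not necessarily what this notebook intends.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Toy data, not from car_prices.csv
toy = pd.DataFrame({
    "odometer": [1000.0, 5000.0, 9000.0, 13000.0, 17000.0,
                 21000.0, 25000.0, 29000.0, 33000.0, 37000.0],
    "make": list("ABCDEFGHIJ"),
})
numeric_columns = toy.select_dtypes(include="number").columns

train, test = train_test_split(toy, random_state=0, train_size=0.8)
train, test = train.copy(), test.copy()

scaler = MinMaxScaler(feature_range=(0, 1))
# Learn min and max from the training rows only
train[numeric_columns] = scaler.fit_transform(train[numeric_columns])
# Reuse the training statistics; test values may fall outside [0, 1]
test[numeric_columns] = scaler.transform(test[numeric_columns])
```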

Splitting the data into subsets

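# Note: the split is applied to the cleaned, unscaled df; df_scaled is not used here
# 80% train, then the remaining 20% is halved into dev and test
car_train, car_dev_test = train_test_split(df, random_state=0, train_size=0.8)
car_dev, car_test = train_test_split(car_dev_test, random_state=0, train_size=0.5)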
(377860, 16)
(47232, 16)
(47233, 16)
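
The two calls above produce an 80/10/10 split: the first keeps 80% of the rows for training, and the second halves the remaining 20%. The three shapes add up to the cleaned row count, 377860 + 47232 + 47233 = 472325. The same pattern can be wrapped in a small helper, sketched here on a toy frame; the function name `split_train_dev_test` and its parameters are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_dev_test(frame, train_size=0.8, random_state=0):
    """Split into train/dev/test, halving the non-training remainder."""
    train, rest = train_test_split(
        frame, random_state=random_state, train_size=train_size
    )
    dev, test = train_test_split(rest, random_state=random_state, train_size=0.5)
    return train, dev, test


# Toy data, not from car_prices.csv
toy = pd.DataFrame({"x": range(1000)})
train, dev, test = split_train_dev_test(toy)
sizes = (len(train), len(dev), len(test))
```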

Dataset statistics

Ford         81013
Chevrolet    54150
Nissan       44043
Toyota       35313
Dodge        27181
Honda        24781
Hyundai      18659
BMW          17509
Kia          15828
Chrysler     15133
Name: make, dtype: int64
Sedan          211298
SUV            120968
Hatchback       19351
Minivan         18305
Coupe           13121
Wagon           12023
Crew Cab        11508
Convertible      7725
SuperCrew        6195
G Sedan          5644
Name: body, dtype: int64
automatic    455963
manual        16362
Name: transmission, dtype: int64
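
The tables above have the shape of `value_counts()` output: the ten most frequent values of `make` and `body`, and every value of `transmission`. The `transmission` counts sum to 455963 + 16362 = 472325, so they describe the full cleaned frame rather than a single subset. A sketch of how such counts can be produced, shown on a toy frame; the helper name `top_values` is hypothetical.

```python
import pandas as pd


def top_values(frame, column, n=10):
    """Return the n most frequent values of a column with their counts."""
    return frame[column].value_counts().head(n)


# Toy data, not from car_prices.csv
toy = pd.DataFrame({
    "make": ["Ford", "Ford", "Kia", "Ford", "BMW", "Kia"],
    "transmission": ["automatic"] * 5 + ["manual"],
})
make_counts = top_values(toy, "make")
transmission_counts = top_values(toy, "transmission")
```

On the notebook's data, the equivalent calls would be along the lines of `top_values(df, 'make')`, `top_values(df, 'body')` and `top_values(df, 'transmission')`.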