DESCRIPTION
The dataset contains prices and sales volumes of Hass avocados in selected regions of the United States.
Column descriptions:
- date - observation date
- average_price - average price of a single avocado
- type - conventional or organic
- year - year of observation
- geography - city/region of observation
- total_volume - number of avocados sold
- 4046 - number of avocados sold with PLU code 4046 (small)
- 4225 - number of avocados sold with PLU code 4225 (large)
- 4770 - number of avocados sold with PLU code 4770 (extra large)
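import sys
# Install into the interpreter that runs this kernel, not whichever python3 is first on PATH.
!{sys.executable} -m pip install kaggle
!echo OOOOOOOOO {sys.executable}
!{sys.executable} -m pip install pandas
# The PyPI package is named scikit-learn; it is imported as sklearn.
!{sys.executable} -m pip install scikit-learn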
Requirement already satisfied: kaggle in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (1.5.12)
Requirement already satisfied: six>=1.10 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (1.15.0)
Requirement already satisfied: requests in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (2.25.1)
Requirement already satisfied: python-dateutil in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (2.8.1)
Requirement already satisfied: python-slugify in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (4.0.1)
Requirement already satisfied: urllib3 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (1.26.2)
Requirement already satisfied: tqdm in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (4.59.0)
Requirement already satisfied: certifi in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from kaggle) (2020.12.5)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from requests->kaggle) (4.0.0)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from requests->kaggle) (2.10)
OOOOOOOOO /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/bin/python
Requirement already satisfied: pandas in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (1.2.3)
Requirement already satisfied: numpy>=1.16.5 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from pandas) (1.20.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from pandas) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from pandas) (2020.4)
Requirement already satisfied: six>=1.5 in /usr/local/Cellar/jupyterlab/3.0.0_1/libexec/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Requirement already satisfied: sklearn in /usr/local/lib/python3.9/site-packages (0.0)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.9/site-packages (from sklearn) (0.24.1)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.9/site-packages (from scikit-learn->sklearn) (1.0.1)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.9/site-packages (from scikit-learn->sklearn) (1.6.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.9/site-packages (from scikit-learn->sklearn) (2.1.0)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.9/site-packages (from scikit-learn->sklearn) (1.20.1)
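The install cells route pip through `sys.executable` so that packages land in the environment the kernel actually runs in. A minimal sketch of how one might confirm this afterwards; the helper name `missing_modules` is hypothetical and not part of the notebook:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Kernel interpreter:", sys.executable)
# These are import names, not pip package names: scikit-learn is imported as `sklearn`.
print("Missing:", missing_modules(["kaggle", "pandas", "sklearn"]))
```

An empty list means all three modules are visible to the kernel; any name that is printed would need to be installed with `{sys.executable} -m pip install ...`.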
!kaggle datasets download -d timmate/avocado-prices-2020
!unzip -o avocado-prices-2020.zip
Archive: avocado-prices-2020.zip
inflating: avocado-updated-2020.csv
!head -n 5 avocado-updated-2020.csv
date,average_price,total_volume,4046,4225,4770,total_bags,small_bags,large_bags,xlarge_bags,type,year,geography
2015-01-04,1.22,40873.28,2819.5,28287.42,49.9,9716.46,9186.93,529.53,0.0,conventional,2015,Albany
2015-01-04,1.79,1373.95,57.42,153.88,0.0,1162.65,1162.65,0.0,0.0,organic,2015,Albany
2015-01-04,1.0,435021.49,364302.39,23821.16,82.15,46815.79,16707.15,30108.64,0.0,conventional,2015,Atlanta
2015-01-04,1.76,3846.69,1500.15,938.35,0.0,1408.19,1071.35,336.84,0.0,organic,2015,Atlanta
import pandas as pd
avocado_with_year = pd.read_csv('avocado-updated-2020.csv')
avocado_with_year
# Keep every column except `year`.
columns_to_keep = ['date', 'average_price', 'total_volume', '4046', '4225', '4770', 'total_bags', 'small_bags', 'large_bags', 'xlarge_bags', 'type', 'geography']
avocado = avocado_with_year[columns_to_keep]
# Save the reduced dataset and load it back from disk.
avocado.to_csv("avocado.csv", index=False)
avocado = pd.read_csv('avocado.csv')
avocado
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 1.22 | 40873.28 | 2819.50 | 28287.42 | 49.90 | 9716.46 | 9186.93 | 529.53 | 0.00 | conventional | Albany |
1 | 2015-01-04 | 1.79 | 1373.95 | 57.42 | 153.88 | 0.00 | 1162.65 | 1162.65 | 0.00 | 0.00 | organic | Albany |
2 | 2015-01-04 | 1.00 | 435021.49 | 364302.39 | 23821.16 | 82.15 | 46815.79 | 16707.15 | 30108.64 | 0.00 | conventional | Atlanta |
3 | 2015-01-04 | 1.76 | 3846.69 | 1500.15 | 938.35 | 0.00 | 1408.19 | 1071.35 | 336.84 | 0.00 | organic | Atlanta |
4 | 2015-01-04 | 1.08 | 788025.06 | 53987.31 | 552906.04 | 39995.03 | 141136.68 | 137146.07 | 3990.61 | 0.00 | conventional | Baltimore/Washington |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
33040 | 2020-11-29 | 1.47 | 1583056.27 | 67544.48 | 97996.46 | 2617.17 | 1414878.10 | 906711.52 | 480191.83 | 27974.75 | organic | Total U.S. |
33041 | 2020-11-29 | 0.91 | 5811114.22 | 1352877.53 | 589061.83 | 19741.90 | 3790665.29 | 2197611.02 | 1531530.14 | 61524.13 | conventional | West |
33042 | 2020-11-29 | 1.48 | 289961.27 | 13273.75 | 19341.09 | 636.51 | 256709.92 | 122606.21 | 134103.71 | 0.00 | organic | West |
33043 | 2020-11-29 | 0.67 | 822818.75 | 234688.01 | 80205.15 | 10543.63 | 497381.96 | 285764.11 | 210808.02 | 809.83 | conventional | West Tex/New Mexico |
33044 | 2020-11-29 | 1.35 | 24106.58 | 1236.96 | 617.80 | 1564.98 | 20686.84 | 17824.52 | 2862.32 | 0.00 | organic | West Tex/New Mexico |
33045 rows × 12 columns
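The cell above removes `year` by listing the twelve columns to keep. A shorter way to get the same result is to name only the column to drop. The sketch below uses a small made-up frame rather than the real CSV:

```python
import pandas as pd

# Toy frame with the same kind of redundant column as the notebook's data.
sample = pd.DataFrame(
    {
        "date": ["2015-01-04", "2015-01-04"],
        "average_price": [1.22, 1.79],
        "year": [2015, 2015],
        "geography": ["Albany", "Albany"],
    }
)

# Drop the redundant column by name instead of listing every column to keep.
without_year = sample.drop(columns=["year"])
print(list(without_year.columns))
```

The remaining columns keep their original order, so the result matches an explicit selection as long as the keep-list follows the file's column order.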
import numpy as np
# Shuffle the rows, then split them 60% / 20% / 20% into train / validate / test.
avocado_train, avocado_validate, avocado_test = np.split(avocado.sample(frac=1), [int(.6*len(avocado)), int(.8*len(avocado))])
# np.size counts cells (rows × columns), so each figure printed below is 12 times the row count.
print("Avocado: ".ljust(20), np.size(avocado))
print("Avocado (train) : ".ljust(20), np.size(avocado_train))
print("Avocado (validate): ".ljust(20), np.size(avocado_validate))
print("Avocado (test) ".ljust(20), np.size(avocado_test))
Avocado:             396540
Avocado (train) :    237924
Avocado (validate):  79308
Avocado (test)       79308
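The split above is unseeded, so every run produces different subsets, and the printed sizes are cell counts rather than row counts. A minimal sketch of a reproducible variant on made-up data; the seed value 42 and the toy columns are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the avocado frame (100 rows, 3 columns).
data = pd.DataFrame(
    {"price": np.linspace(0.5, 3.0, 100), "volume": np.arange(100), "kind": ["a", "b"] * 50}
)

# Shuffle once with a fixed seed, then cut at 60% and 80% of the rows.
shuffled = data.sample(frac=1, random_state=42).reset_index(drop=True)
n = len(shuffled)
train = shuffled.iloc[: int(0.6 * n)]
validate = shuffled.iloc[int(0.6 * n) : int(0.8 * n)]
test = shuffled.iloc[int(0.8 * n) :]

# len() counts rows; np.size() counts cells (rows x columns).
for name, part in [("all", data), ("train", train), ("validate", validate), ("test", test)]:
    print(f"{name:<10} rows={len(part):<5} cells={np.size(part)}")
```

Slicing with `iloc` avoids passing a DataFrame to `np.split`, and a fixed `random_state` makes the three subsets identical across runs.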
avocado.describe(include = 'all')
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 33045 | 33045.000000 | 3.304500e+04 | 3.304500e+04 | 3.304500e+04 | 3.304500e+04 | 3.304500e+04 | 3.304500e+04 | 3.304500e+04 | 3.304500e+04 | 33045 | 33045 |
unique | 306 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 54 |
top | 2017-10-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | conventional | Atlanta |
freq | 108 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 16524 | 612 |
mean | NaN | 1.379941 | 9.683997e+05 | 3.023914e+05 | 2.797693e+05 | 2.148255e+04 | 3.646735e+05 | 2.501980e+05 | 1.067329e+05 | 7.742585e+03 | NaN | NaN |
std | NaN | 0.378972 | 3.934533e+06 | 1.301026e+06 | 1.151052e+06 | 1.001607e+05 | 1.564004e+06 | 1.037734e+06 | 5.167226e+05 | 4.819803e+04 | NaN | NaN |
min | NaN | 0.440000 | 8.456000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NaN | NaN |
25% | NaN | 1.100000 | 1.511895e+04 | 7.673100e+02 | 2.712470e+03 | 0.000000e+00 | 9.121860e+03 | 6.478630e+03 | 4.662900e+02 | 0.000000e+00 | NaN | NaN |
50% | NaN | 1.350000 | 1.291170e+05 | 1.099477e+04 | 2.343600e+04 | 1.780900e+02 | 5.322224e+04 | 3.687699e+04 | 6.375860e+03 | 0.000000e+00 | NaN | NaN |
75% | NaN | 1.620000 | 5.058285e+05 | 1.190219e+05 | 1.352389e+05 | 5.096530e+03 | 1.744314e+05 | 1.206624e+05 | 4.041723e+04 | 8.044400e+02 | NaN | NaN |
max | NaN | 3.250000 | 6.371614e+07 | 2.274362e+07 | 2.047057e+07 | 2.546439e+06 | 3.168919e+07 | 2.055041e+07 | 1.332760e+07 | 1.403184e+06 | NaN | NaN |
avocado_train.describe(include= 'all' )
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 19827 | 19827.000000 | 1.982700e+04 | 1.982700e+04 | 1.982700e+04 | 1.982700e+04 | 1.982700e+04 | 1.982700e+04 | 1.982700e+04 | 1.982700e+04 | 19827 | 19827 |
unique | 306 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 54 |
top | 2018-09-23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | organic | Sacramento |
freq | 77 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9954 | 404 |
mean | NaN | 1.380658 | 9.503549e+05 | 2.955048e+05 | 2.762023e+05 | 2.117442e+04 | 3.573659e+05 | 2.448356e+05 | 1.049736e+05 | 7.556707e+03 | NaN | NaN |
std | NaN | 0.377988 | 3.896388e+06 | 1.285945e+06 | 1.147780e+06 | 1.008332e+05 | 1.548676e+06 | 1.023617e+06 | 5.161354e+05 | 4.776408e+04 | NaN | NaN |
min | NaN | 0.460000 | 2.534500e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NaN | NaN |
25% | NaN | 1.100000 | 1.509891e+04 | 7.560400e+02 | 2.695640e+03 | 0.000000e+00 | 9.095285e+03 | 6.430960e+03 | 4.678750e+02 | 0.000000e+00 | NaN | NaN |
50% | NaN | 1.350000 | 1.275485e+05 | 1.086294e+04 | 2.337789e+04 | 1.714100e+02 | 5.240743e+04 | 3.663295e+04 | 6.148990e+03 | 0.000000e+00 | NaN | NaN |
75% | NaN | 1.610000 | 4.996119e+05 | 1.174216e+05 | 1.337254e+05 | 4.976950e+03 | 1.721448e+05 | 1.193927e+05 | 3.875767e+04 | 7.391950e+02 | NaN | NaN |
max | NaN | 3.170000 | 6.371614e+07 | 2.113740e+07 | 2.047057e+07 | 2.546439e+06 | 3.168919e+07 | 2.055041e+07 | 1.332760e+07 | 1.403184e+06 | NaN | NaN |
avocado_validate.describe(include = 'all')
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 6609 | 6609.000000 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6609 | 6609 |
unique | 306 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 54 |
top | 2020-05-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | organic | Jacksonville |
freq | 35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3365 | 149 |
mean | NaN | 1.382624 | 9.914296e+05 | 3.140144e+05 | 2.827458e+05 | 2.172480e+04 | 3.729031e+05 | 2.567059e+05 | 1.085372e+05 | 7.660065e+03 | NaN | NaN |
std | NaN | 0.380997 | 4.042527e+06 | 1.341419e+06 | 1.181393e+06 | 1.021178e+05 | 1.596924e+06 | 1.065783e+06 | 5.196275e+05 | 4.795256e+04 | NaN | NaN |
min | NaN | 0.440000 | 8.456000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NaN | NaN |
25% | NaN | 1.100000 | 1.486299e+04 | 7.570000e+02 | 2.534810e+03 | 0.000000e+00 | 9.007310e+03 | 6.281480e+03 | 4.562400e+02 | 0.000000e+00 | NaN | NaN |
50% | NaN | 1.350000 | 1.241199e+05 | 1.023778e+04 | 2.204006e+04 | 1.674700e+02 | 5.247009e+04 | 3.492217e+04 | 6.458780e+03 | 0.000000e+00 | NaN | NaN |
75% | NaN | 1.620000 | 5.026773e+05 | 1.207824e+05 | 1.307007e+05 | 5.104000e+03 | 1.706264e+05 | 1.197749e+05 | 4.128634e+04 | 7.951300e+02 | NaN | NaN |
max | NaN | 3.250000 | 6.250565e+07 | 2.274362e+07 | 2.044550e+07 | 1.800066e+06 | 2.666884e+07 | 1.740824e+07 | 1.077854e+07 | 1.123540e+06 | NaN | NaN |
avocado_test.describe(include = 'all')
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 6609 | 6609.000000 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6.609000e+03 | 6609 | 6609 |
unique | 306 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 54 |
top | 2020-06-21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | conventional | California |
freq | 33 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3407 | 143 |
mean | NaN | 1.375107 | 9.995041e+05 | 3.114282e+05 | 2.874940e+05 | 2.216469e+04 | 3.783667e+05 | 2.597775e+05 | 1.102065e+05 | 8.382739e+03 | NaN | NaN |
std | NaN | 0.379902 | 3.939225e+06 | 1.305043e+06 | 1.130053e+06 | 9.608845e+04 | 1.576553e+06 | 1.051335e+06 | 5.156234e+05 | 4.971697e+04 | NaN | NaN |
min | NaN | 0.480000 | 3.855500e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NaN | NaN |
25% | NaN | 1.090000 | 1.544873e+04 | 8.225900e+02 | 2.903380e+03 | 0.000000e+00 | 9.358110e+03 | 6.834760e+03 | 4.706000e+02 | 0.000000e+00 | NaN | NaN |
50% | NaN | 1.330000 | 1.409398e+05 | 1.233835e+04 | 2.530639e+04 | 2.074500e+02 | 5.576654e+04 | 3.897502e+04 | 7.182140e+03 | 0.000000e+00 | NaN | NaN |
75% | NaN | 1.610000 | 5.330085e+05 | 1.221341e+05 | 1.453971e+05 | 5.358790e+03 | 1.833669e+05 | 1.254250e+05 | 4.531138e+04 | 1.012940e+03 | NaN | NaN |
max | NaN | 3.000000 | 5.453235e+07 | 1.707665e+07 | 1.789639e+07 | 1.993645e+06 | 2.735245e+07 | 1.791382e+07 | 1.063102e+07 | 1.181516e+06 | NaN | NaN |
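The four `describe` tables are easier to compare when the same statistic for each subset sits side by side. The following sketch builds such a table for a single column. The three frames are randomly generated stand-ins for `avocado_train`, `avocado_validate` and `avocado_test`, not the real data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the three subsets.
parts = {
    "train": pd.DataFrame({"average_price": rng.normal(1.38, 0.38, 600)}),
    "validate": pd.DataFrame({"average_price": rng.normal(1.38, 0.38, 200)}),
    "test": pd.DataFrame({"average_price": rng.normal(1.38, 0.38, 200)}),
}

# One column per subset, one row per statistic.
summary = pd.DataFrame({name: part["average_price"].describe() for name, part in parts.items()})
print(summary.round(3))
```

With the real subsets, large gaps between columns in `mean` or `std` would suggest that the random split did not preserve the price distribution.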
avocado.geography.value_counts()
Atlanta    612
St. Louis    612
New York    612
Indianapolis    612
Sacramento    612
Spokane    612
Philadelphia    612
South Carolina    612
West    612
San Francisco    612
Orlando    612
Southeast    612
Miami/Ft. Lauderdale    612
Nashville    612
Syracuse    612
Columbus    612
Detroit    612
Northern New England    612
Buffalo/Rochester    612
Raleigh/Greensboro    612
Midsouth    612
Boise    612
San Diego    612
Hartford/Springfield    612
Los Angeles    612
Total U.S.    612
Dallas/Ft. Worth    612
Great Lakes    612
Roanoke    612
Plains    612
California    612
Portland    612
Grand Rapids    612
Harrisburg/Scranton    612
Charlotte    612
Cincinnati/Dayton    612
Richmond/Norfolk    612
Houston    612
South Central    612
Northeast    612
Seattle    612
Jacksonville    612
Baltimore/Washington    612
Pittsburgh    612
Louisville    612
Boston    612
Tampa    612
Phoenix/Tucson    612
Chicago    612
Denver    612
Las Vegas    612
Albany    612
New Orleans/Mobile    612
West Tex/New Mexico    609
Name: geography, dtype: int64
avocado_test.geography.value_counts()
California    143
Grand Rapids    139
Roanoke    139
Las Vegas    139
Spokane    137
Plains    135
Seattle    134
Louisville    132
Atlanta    131
Syracuse    130
New York    130
Nashville    129
Raleigh/Greensboro    129
Miami/Ft. Lauderdale    128
Phoenix/Tucson    128
Orlando    128
Hartford/Springfield    127
San Francisco    127
South Central    127
Charlotte    126
Richmond/Norfolk    126
West    126
Tampa    124
Los Angeles    124
South Carolina    122
Great Lakes    122
Total U.S.    122
Northeast    121
Cincinnati/Dayton    121
Columbus    121
Baltimore/Washington    119
Pittsburgh    119
Jacksonville    119
Portland    119
West Tex/New Mexico    118
Midsouth    118
Houston    117
Chicago    116
Buffalo/Rochester    116
New Orleans/Mobile    116
Philadelphia    115
San Diego    115
Indianapolis    115
Northern New England    114
Boston    114
Boise    114
Southeast    114
Dallas/Ft. Worth    113
Detroit    113
Albany    112
Denver    111
St. Louis    111
Harrisburg/Scranton    104
Sacramento    100
Name: geography, dtype: int64
avocado_train.geography.value_counts()
Sacramento    404
Albany    398
Northern New England    390
Harrisburg/Scranton    388
St. Louis    385
Columbus    384
Boise    382
Indianapolis    381
Detroit    380
South Carolina    378
West Tex/New Mexico    378
Southeast    378
Nashville    377
Denver    377
Los Angeles    377
Great Lakes    376
San Diego    375
Cincinnati/Dayton    374
Boston    374
South Central    373
New Orleans/Mobile    373
Richmond/Norfolk    371
Seattle    371
Total U.S.    371
Buffalo/Rochester    370
Northeast    369
Charlotte    368
Atlanta    368
Chicago    367
San Francisco    366
Midsouth    366
Philadelphia    365
New York    363
Portland    363
Syracuse    362
Grand Rapids    361
Louisville    361
Roanoke    361
Dallas/Ft. Worth    360
Orlando    359
Tampa    359
Houston    359
Hartford/Springfield    358
Pittsburgh    357
West    356
Miami/Ft. Lauderdale    354
Baltimore/Washington    353
Phoenix/Tucson    353
Raleigh/Greensboro    345
Jacksonville    344
Las Vegas    339
California    336
Plains    335
Spokane    335
Name: geography, dtype: int64
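The counts show that a purely random split leaves regions unevenly represented: Sacramento, for example, has 404 rows in the training set but only 100 in the test set. If equal representation matters, one option is to sample within each region. This is a sketch on made-up data, assuming pandas 1.1 or newer (which introduced `groupby(...).sample`); it is not the approach used in the notebook:

```python
import pandas as pd

# Toy data: two regions with 10 rows each (hypothetical values).
data = pd.DataFrame(
    {
        "geography": ["Albany"] * 10 + ["Atlanta"] * 10,
        "average_price": [1.0 + 0.05 * i for i in range(20)],
    }
)

# Sample 60% of each region for training, then halve the remainder per region.
train = data.groupby("geography").sample(frac=0.6, random_state=0)
rest = data.drop(train.index)
validate = rest.groupby("geography").sample(frac=0.5, random_state=0)
test = rest.drop(validate.index)

for name, part in [("train", train), ("validate", validate), ("test", test)]:
    print(name, part["geography"].value_counts().to_dict())
```

Each region then contributes the same 60/20/20 proportions to the three subsets, up to rounding.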
avocado['type'].value_counts().plot.bar()
<AxesSubplot:>
avocado_train['type'].value_counts().plot.bar()
<AxesSubplot:>
avocado_test['type'].value_counts().plot.bar()
<AxesSubplot:>
avocado['average_price'].hist()
avocado_train['average_price'].hist()
avocado_validate['average_price'].hist()
avocado_test['average_price'].hist()
<AxesSubplot:>
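The four `hist()` calls in one cell appear to draw onto the same axes, and each call picks its own bin edges, which makes the subsets hard to compare by eye. One alternative is to compute the histograms over shared bins and compare the shares numerically. The data below is randomly generated for illustration; only the price range 0.44 to 3.25 is taken from the notebook's summary statistics:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical price samples standing in for the full set and one subset.
full = pd.Series(rng.uniform(0.44, 3.25, 1000))
subset = full.sample(frac=0.2, random_state=1)

# Shared bin edges make the two histograms directly comparable.
edges = np.linspace(full.min(), full.max(), 11)
full_counts, _ = np.histogram(full, bins=edges)
subset_counts, _ = np.histogram(subset, bins=edges)

# Convert counts to shares so that different sample sizes line up.
shares = pd.DataFrame(
    {"full": full_counts / full_counts.sum(), "subset": subset_counts / subset_counts.sum()},
    index=pd.IntervalIndex.from_breaks(edges.round(2)),
)
print(shares.round(3))
```

The same `edges` array can also be passed as `bins=` to each `hist()` call if a visual comparison is preferred.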
# based on https://www.journaldev.com/45109/normalize-data-in-python
from sklearn import preprocessing
# Scale every float64 column to the [0, 1] range.
num_values = avocado.select_dtypes(include='float64').values
scaler = preprocessing.MinMaxScaler()
x_scaled = scaler.fit_transform(num_values)
num_columns = avocado.select_dtypes(include='float64').columns
avocado_normalized = pd.DataFrame(x_scaled, columns=num_columns)
# Write the scaled values back into the original frame.
for col in avocado.columns:
    if col in num_columns:
        avocado[col] = avocado_normalized[col]
avocado
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 0.277580 | 0.000640 | 0.000124 | 0.001382 | 0.000020 | 0.000307 | 0.000447 | 0.000040 | 0.000000 | conventional | Albany |
1 | 2015-01-04 | 0.480427 | 0.000020 | 0.000003 | 0.000008 | 0.000000 | 0.000037 | 0.000057 | 0.000000 | 0.000000 | organic | Albany |
2 | 2015-01-04 | 0.199288 | 0.006826 | 0.016018 | 0.001164 | 0.000032 | 0.001477 | 0.000813 | 0.002259 | 0.000000 | conventional | Atlanta |
3 | 2015-01-04 | 0.469751 | 0.000059 | 0.000066 | 0.000046 | 0.000000 | 0.000044 | 0.000052 | 0.000025 | 0.000000 | organic | Atlanta |
4 | 2015-01-04 | 0.227758 | 0.012366 | 0.002374 | 0.027010 | 0.015706 | 0.004454 | 0.006674 | 0.000299 | 0.000000 | conventional | Baltimore/Washington |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
33040 | 2020-11-29 | 0.366548 | 0.024844 | 0.002970 | 0.004787 | 0.001028 | 0.044649 | 0.044121 | 0.036030 | 0.019937 | organic | Total U.S. |
33041 | 2020-11-29 | 0.167260 | 0.091202 | 0.059484 | 0.028776 | 0.007753 | 0.119620 | 0.106938 | 0.114914 | 0.043846 | conventional | West |
33042 | 2020-11-29 | 0.370107 | 0.004550 | 0.000584 | 0.000945 | 0.000250 | 0.008101 | 0.005966 | 0.010062 | 0.000000 | organic | West |
33043 | 2020-11-29 | 0.081851 | 0.012913 | 0.010319 | 0.003918 | 0.004141 | 0.015696 | 0.013906 | 0.015817 | 0.000577 | conventional | West Tex/New Mexico |
33044 | 2020-11-29 | 0.323843 | 0.000377 | 0.000054 | 0.000030 | 0.000615 | 0.000653 | 0.000867 | 0.000215 | 0.000000 | organic | West Tex/New Mexico |
33045 rows × 12 columns
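The normalization above is fitted on the full dataset after the split, so the train, validate and test subsets created earlier stay unscaled and the scaling parameters include test rows. A common alternative is to learn the minimum and maximum from the training rows only and reuse them for the other subsets. The sketch below writes out the min-max formula in plain pandas on made-up frames; with scikit-learn the equivalent would be calling `fit` on the training data and `transform` on each subset:

```python
import pandas as pd

# Hypothetical miniature subsets; column names follow the notebook's data.
train = pd.DataFrame({"average_price": [1.0, 1.5, 2.0], "total_volume": [100.0, 300.0, 500.0]})
test = pd.DataFrame({"average_price": [1.25, 2.5], "total_volume": [200.0, 600.0]})

# Learn min and max from the training rows only.
col_min = train.min()
col_range = train.max() - train.min()

# Apply the same transformation to every subset.
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled)
print(test_scaled)  # values outside [0, 1] are possible for unseen data
```

Values outside the unit interval in the test subset are expected whenever it contains observations beyond the training range.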
avocado.isnull().sum()
date             0
average_price    0
total_volume     0
4046             0
4225             0
4770             0
total_bags       0
small_bags       0
large_bags       0
xlarge_bags      0
type             0
geography        0
dtype: int64
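# No values are missing, so dropna() returns all 33045 rows; the result is displayed, not assigned.
avocado.dropna()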
| | date | average_price | total_volume | 4046 | 4225 | 4770 | total_bags | small_bags | large_bags | xlarge_bags | type | geography |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 0.277580 | 0.000640 | 0.000124 | 0.001382 | 0.000020 | 0.000307 | 0.000447 | 0.000040 | 0.000000 | conventional | Albany |
1 | 2015-01-04 | 0.480427 | 0.000020 | 0.000003 | 0.000008 | 0.000000 | 0.000037 | 0.000057 | 0.000000 | 0.000000 | organic | Albany |
2 | 2015-01-04 | 0.199288 | 0.006826 | 0.016018 | 0.001164 | 0.000032 | 0.001477 | 0.000813 | 0.002259 | 0.000000 | conventional | Atlanta |
3 | 2015-01-04 | 0.469751 | 0.000059 | 0.000066 | 0.000046 | 0.000000 | 0.000044 | 0.000052 | 0.000025 | 0.000000 | organic | Atlanta |
4 | 2015-01-04 | 0.227758 | 0.012366 | 0.002374 | 0.027010 | 0.015706 | 0.004454 | 0.006674 | 0.000299 | 0.000000 | conventional | Baltimore/Washington |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
33040 | 2020-11-29 | 0.366548 | 0.024844 | 0.002970 | 0.004787 | 0.001028 | 0.044649 | 0.044121 | 0.036030 | 0.019937 | organic | Total U.S. |
33041 | 2020-11-29 | 0.167260 | 0.091202 | 0.059484 | 0.028776 | 0.007753 | 0.119620 | 0.106938 | 0.114914 | 0.043846 | conventional | West |
33042 | 2020-11-29 | 0.370107 | 0.004550 | 0.000584 | 0.000945 | 0.000250 | 0.008101 | 0.005966 | 0.010062 | 0.000000 | organic | West |
33043 | 2020-11-29 | 0.081851 | 0.012913 | 0.010319 | 0.003918 | 0.004141 | 0.015696 | 0.013906 | 0.015817 | 0.000577 | conventional | West Tex/New Mexico |
33044 | 2020-11-29 | 0.323843 | 0.000377 | 0.000054 | 0.000030 | 0.000615 | 0.000653 | 0.000867 | 0.000215 | 0.000000 | organic | West Tex/New Mexico |
33045 rows × 12 columns
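Because this dataset has no missing values, the `dropna()` call changes nothing. On data that does contain gaps, the call returns a new frame and leaves the original untouched, so the result has to be kept. A small illustration on made-up rows:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"average_price": [1.22, np.nan, 1.0], "type": ["conventional", "organic", None]}
)

# Count missing values per column.
print(frame.isnull().sum().to_dict())

# dropna() leaves `frame` untouched; keep the result to make the change stick.
cleaned = frame.dropna().reset_index(drop=True)
print(len(frame), len(cleaned))
```

Here two of the three rows contain a missing value, so only one row survives in `cleaned` while `frame` still holds all three.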