# Install packages
!pip install kaggle
!pip install pandas
!pip install unzip
!pip install scikit-learn
!pip install seaborn
Requirement already satisfied: kaggle in ./jupyter_env/lib/python3.10/site-packages (1.5.13)
Requirement already satisfied: requests in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (2.28.2)
Requirement already satisfied: six>=1.10 in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (1.16.0)
Requirement already satisfied: tqdm in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (4.65.0)
Requirement already satisfied: urllib3 in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (1.26.15)
Requirement already satisfied: certifi in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (2022.12.7)
Requirement already satisfied: python-slugify in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (8.0.1)
Requirement already satisfied: python-dateutil in ./jupyter_env/lib/python3.10/site-packages (from kaggle) (2.8.2)
Requirement already satisfied: text-unidecode>=1.3 in ./jupyter_env/lib/python3.10/site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: idna<4,>=2.5 in ./jupyter_env/lib/python3.10/site-packages (from requests->kaggle) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in ./jupyter_env/lib/python3.10/site-packages (from requests->kaggle) (3.1.0)
Requirement already satisfied: pandas in ./jupyter_env/lib/python3.10/site-packages (1.5.3)
Requirement already satisfied: numpy>=1.21.0 in ./jupyter_env/lib/python3.10/site-packages (from pandas) (1.24.2)
Requirement already satisfied: python-dateutil>=2.8.1 in ./jupyter_env/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in ./jupyter_env/lib/python3.10/site-packages (from pandas) (2022.7.1)
Requirement already satisfied: six>=1.5 in ./jupyter_env/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Requirement already satisfied: unzip in ./jupyter_env/lib/python3.10/site-packages (1.0.0)
Requirement already satisfied: scikit-learn in ./jupyter_env/lib/python3.10/site-packages (1.2.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./jupyter_env/lib/python3.10/site-packages (from scikit-learn) (3.1.0)
Requirement already satisfied: numpy>=1.17.3 in ./jupyter_env/lib/python3.10/site-packages (from scikit-learn) (1.24.2)
Requirement already satisfied: joblib>=1.1.1 in ./jupyter_env/lib/python3.10/site-packages (from scikit-learn) (1.2.0)
Requirement already satisfied: scipy>=1.3.2 in ./jupyter_env/lib/python3.10/site-packages (from scikit-learn) (1.10.1)
Requirement already satisfied: seaborn in ./jupyter_env/lib/python3.10/site-packages (0.12.2)
Requirement already satisfied: numpy!=1.24.0,>=1.17 in ./jupyter_env/lib/python3.10/site-packages (from seaborn) (1.24.2)
Requirement already satisfied: pandas>=0.25 in ./jupyter_env/lib/python3.10/site-packages (from seaborn) (1.5.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.1 in ./jupyter_env/lib/python3.10/site-packages (from seaborn) (3.7.1)
Requirement already satisfied: pillow>=6.2.0 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (9.4.0)
Requirement already satisfied: fonttools>=4.22.0 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (4.39.2)
Requirement already satisfied: pyparsing>=2.3.1 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (3.0.9)
Requirement already satisfied: contourpy>=1.0.1 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.0.7)
Requirement already satisfied: cycler>=0.10 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.4.4)
Requirement already satisfied: packaging>=20.0 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (23.0)
Requirement already satisfied: python-dateutil>=2.7 in ./jupyter_env/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in ./jupyter_env/lib/python3.10/site-packages (from pandas>=0.25->seaborn) (2022.7.1)
Requirement already satisfied: six>=1.5 in ./jupyter_env/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.1->seaborn) (1.16.0)
# Download the dataset
!kaggle datasets download -d sohier/crime-in-baltimore
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/user/.kaggle/kaggle.json'
crime-in-baltimore.zip: Skipping, found more recently modified local copy (use --force to force download)
!unzip -o crime-in-baltimore.zip
Archive: crime-in-baltimore.zip
inflating: BPD_Part_1_Victim_Based_Crime_Data.csv
# Check the CSV for empty lines (prints their line numbers; no output means none were found)
!grep -P "^$" -n BPD_Part_1_Victim_Based_Crime_Data.csv
import pandas as pd
baltimore=pd.read_csv('BPD_Part_1_Victim_Based_Crime_Data.csv')
baltimore
index | CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | Total Incidents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 09/02/2017 | 23:30:00 | 3JK | 4200 AUDREY AVE | ROBBERY - RESIDENCE | I | KNIFE | 913.0 | SOUTHERN | Brooklyn | -76.60541 | 39.22951 | (39.2295100000, -76.6054100000) | ROW/TOWNHO | 1 |
1 | 09/02/2017 | 23:00:00 | 7A | 800 NEWINGTON AVE | AUTO THEFT | O | NaN | 133.0 | CENTRAL | Reservoir Hill | -76.63217 | 39.31360 | (39.3136000000, -76.6321700000) | STREET | 1 |
2 | 09/02/2017 | 22:53:00 | 9S | 600 RADNOR AV | SHOOTING | Outside | FIREARM | 524.0 | NORTHERN | Winston-Govans | -76.60697 | 39.34768 | (39.3476800000, -76.6069700000) | Street | 1 |
3 | 09/02/2017 | 22:50:00 | 4C | 1800 RAMSAY ST | AGG. ASSAULT | I | OTHER | 934.0 | SOUTHERN | Carrollton Ridge | -76.64526 | 39.28315 | (39.2831500000, -76.6452600000) | ROW/TOWNHO | 1 |
4 | 09/02/2017 | 22:31:00 | 4E | 100 LIGHT ST | COMMON ASSAULT | O | HANDS | 113.0 | CENTRAL | Downtown West | -76.61365 | 39.28756 | (39.2875600000, -76.6136500000) | STREET | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
276524 | 01/01/2012 | 00:00:00 | 6J | 1400 JOH AVE | LARCENY | I | NaN | 832.0 | SOUTHWESTERN | Violetville | -76.67195 | 39.26132 | (39.2613200000, -76.6719500000) | OTHER - IN | 1 |
276525 | 01/01/2012 | 00:00:00 | 6J | 5500 SINCLAIR LN | LARCENY | O | NaN | 444.0 | NORTHEASTERN | Frankford | -76.53829 | 39.32493 | (39.3249300000, -76.5382900000) | OTHER - OU | 1 |
276526 | 01/01/2012 | 00:00:00 | 6E | 400 N PATTERSON PK AV | LARCENY | O | NaN | 321.0 | EASTERN | CARE | -76.58497 | 39.29573 | (39.2957300000, -76.5849700000) | STREET | 1 |
276527 | 01/01/2012 | 00:00:00 | 5A | 5800 LILLYAN AV | BURGLARY | I | NaN | 425.0 | NORTHEASTERN | Glenham-Belhar | -76.54578 | 39.34701 | (39.3470100000, -76.5457800000) | APT. LOCKE | 1 |
276528 | 01/01/2012 | 00:00:00 | 5A | 1900 GRINNALDS AV | BURGLARY | I | NaN | 831.0 | SOUTHWESTERN | Morrell Park | -76.65094 | 39.26698 | (39.2669800000, -76.6509400000) | ROW/TOWNHO | 1 |
276529 rows × 15 columns
baltimore.isnull().sum()
CrimeDate               0
CrimeTime               0
CrimeCode               0
Location             2207
Description             0
Inside/Outside      10279
Weapon             180952
Post                  224
District               80
Neighborhood         2740
Longitude            2204
Latitude             2204
Location 1           2204
Premise             10757
Total Incidents         0
dtype: int64
# Most crimes are committed without a weapon, so fill the
# missing Weapon values with the string "None"
baltimore["Weapon"] = baltimore["Weapon"].fillna("None")
baltimore.isnull().sum()
CrimeDate               0
CrimeTime               0
CrimeCode               0
Location             2207
Description             0
Inside/Outside      10279
Weapon                  0
Post                  224
District               80
Neighborhood         2740
Longitude            2204
Latitude             2204
Location 1           2204
Premise             10757
Total Incidents         0
dtype: int64
# Clean the dataset: drop every row that still has a missing value
baltimore.dropna(inplace=True)
baltimore.isnull().sum()
CrimeDate          0
CrimeTime          0
CrimeCode          0
Location           0
Description        0
Inside/Outside     0
Weapon             0
Post               0
District           0
Neighborhood       0
Longitude          0
Latitude           0
Location 1         0
Premise            0
Total Incidents    0
dtype: int64
from sklearn.model_selection import train_test_split
# Normalization
# Scale Post by its largest absolute value, so the maximum becomes 1
baltimore['Post'] = baltimore['Post'] / baltimore['Post'].abs().max()
# Lower-case every text column so that category labels are spelled consistently
for column in ['Location', 'Description', 'Weapon', 'Premise', 'District',
               'CrimeCode', 'Neighborhood', 'Inside/Outside']:
    baltimore[column] = baltimore[column].str.lower()
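Lower-casing alone does not make `Inside/Outside` consistent: the raw rows contain both the short codes (`I`, `O`) and the long spellings (`Outside`), and the summary statistics report 4 unique values for this column. A minimal sketch of collapsing the two spellings is shown here. The mapping `INSIDE_OUTSIDE_MAP` and the helper `unify_inside_outside` are assumptions introduced for illustration, not part of the original notebook, and the sample list is hand-written rather than taken from the real column.

```python
# Hypothetical mapping: rewrite the long spellings as the short codes.
INSIDE_OUTSIDE_MAP = {"inside": "i", "outside": "o"}


def unify_inside_outside(values):
    """Return the labels with 'inside'/'outside' rewritten as 'i'/'o'."""
    # Labels that are not in the mapping are passed through unchanged.
    return [INSIDE_OUTSIDE_MAP.get(value, value) for value in values]


# Hand-written sample of already lower-cased labels.
sample = ["i", "o", "outside", "inside", "o"]
unified = unify_inside_outside(sample)
print(unified)
```

On the DataFrame itself, the same dictionary could probably be applied with `baltimore['Inside/Outside'].replace(INSIDE_OUTSIDE_MAP)`, which would leave two categories instead of four.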
baltimore['District'].value_counts().plot(kind="bar")
<Axes: >
import seaborn as sns
sns.set_theme()
sns.relplot(data=baltimore[:20], x='Longitude', y='Latitude', hue='Weapon')
<seaborn.axisgrid.FacetGrid at 0x7f9756fab6a0>
# Split into subsets: hold out 10% of the rows as the test set
baltimore_train, baltimore_test = train_test_split(baltimore, test_size=0.1, random_state=1)
baltimore_test
index | CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | Total Incidents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20700 | 04/10/2017 | 22:26:00 | 4e | 4900 eastern av | common assault | o | hands | 0.256628 | southeastern | greektown | -76.55422 | 39.28706 | (39.2870600000, -76.5542200000) | alley | 1 |
63746 | 06/05/2016 | 20:44:00 | 4e | 3000 s hanover st | common assault | o | hands | 0.977731 | southern | middle branch/reedbird pa | -76.61504 | 39.25134 | (39.2513400000, -76.6150400000) | street | 1 |
169854 | 03/10/2014 | 20:00:00 | 4e | 4100 parkside dr | common assault | o | hands | 0.447508 | northeastern | belair-parkside | -76.56605 | 39.32783 | (39.3278300000, -76.5660500000) | street | 1 |
42473 | 10/31/2016 | 09:30:00 | 4e | 5600 loch raven blvd | common assault | i | hands | 0.440085 | northeastern | loch raven | -76.58856 | 39.35952 | (39.3595200000, -76.5885600000) | hotel/mote | 1 |
86103 | 12/05/2015 | 08:15:00 | 4e | 1100 guilford ave | common assault | i | hands | 0.149523 | central | mid-town belvedere | -76.61194 | 39.30319 | (39.3031900000, -76.6119400000) | apt/condo | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
182763 | 11/20/2013 | 20:00:00 | 6d | 3800 dolfield av | larceny from auto | o | none | 0.681866 | northwestern | dolfield | -76.68090 | 39.33938 | (39.3393800000, -76.6809000000) | street | 1 |
14972 | 05/22/2017 | 03:30:00 | 4c | 3000 w garrison ave | agg. assault | i | other | 0.651113 | northwestern | central park heights | -76.67146 | 39.34863 | (39.3486300000, -76.6714600000) | row/townho | 1 |
44956 | 10/15/2016 | 23:30:00 | 7a | 500 jack st | auto theft | o | none | 0.968187 | southern | brooklyn | -76.60582 | 39.23265 | (39.2326500000, -76.6058200000) | street | 1 |
36873 | 12/08/2016 | 18:30:00 | 4e | 3800 cedarhurst rd | common assault | o | hands | 0.451750 | northeastern | waltherson | -76.56315 | 39.33720 | (39.3372000000, -76.5631500000) | street | 1 |
230084 | 12/06/2012 | 14:00:00 | 4e | 800 s highland av | common assault | i | hands | 0.246023 | southeastern | canton | -76.56878 | 39.28342 | (39.2834200000, -76.5687800000) | school | 1 |
26312 rows × 15 columns
# Split the remaining rows again: 25% for validation, 75% for training
baltimore_train, baltimore_val = train_test_split(baltimore_train, test_size=0.25, random_state=1)
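The two consecutive splits give roughly 67.5% training, 22.5% validation and 10% test rows. The arithmetic can be checked with a small sketch that starts from the 263118 rows left after cleaning. The helper `split_sizes` is hypothetical, and it assumes the held-out part of each split is rounded up; that assumption reproduces the row counts reported for the three subsets in this notebook, but it is not taken from the scikit-learn source.

```python
import math


def split_sizes(n_rows, test_fraction, val_fraction):
    """Return (n_train, n_val, n_test) for two consecutive splits."""
    # Assumption: the held-out part of each split is rounded up.
    n_test = math.ceil(n_rows * test_fraction)
    remaining = n_rows - n_test
    n_val = math.ceil(remaining * val_fraction)
    n_train = remaining - n_val
    return n_train, n_val, n_test


n_total = 263118  # rows left after dropna()
n_train, n_val, n_test = split_sizes(n_total, test_fraction=0.1, val_fraction=0.25)
print(n_train, n_val, n_test)  # 177604 59202 26312

# Share of the full dataset that ends up in each subset.
for name, size in [("train", n_train), ("val", n_val), ("test", n_test)]:
    print(f"{name}: {size / n_total:.3f}")
```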
baltimore.describe(include='all')
statistic | CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | Total Incidents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 263118 | 263118 | 263118 | 263118 | 263118 | 263118 | 263118 | 263118.000000 | 263118 | 263118 | 263118.000000 | 263118.000000 | 263118 | 263118 | 263118.0 |
unique | 2072 | 2935 | 80 | 25276 | 15 | 4 | 5 | NaN | 9 | 278 | NaN | NaN | 93543 | 118 | NaN |
top | 04/27/2015 | 18:00:00 | 4e | 200 e pratt st | larceny | i | none | NaN | northeastern | downtown | NaN | NaN | (39.3180000000, -76.6582100000) | street | NaN |
freq | 407 | 6483 | 43093 | 632 | 58246 | 131015 | 173175 | NaN | 40842 | 8701 | NaN | NaN | 503 | 102544 | NaN |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.536416 | NaN | NaN | -76.617469 | 39.307456 | NaN | NaN | 1.0 |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.276554 | NaN | NaN | 0.042220 | 0.029537 | NaN | NaN | 0.0 |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.117709 | NaN | NaN | -76.711280 | 39.200410 | NaN | NaN | 1.0 |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.256628 | NaN | NaN | -76.648420 | 39.288340 | NaN | NaN | 1.0 |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.541888 | NaN | NaN | -76.614010 | 39.303680 | NaN | NaN | 1.0 |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.775186 | NaN | NaN | -76.587490 | 39.327890 | NaN | NaN | 1.0 |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 | NaN | NaN | -76.529770 | 39.371980 | NaN | NaN | 1.0 |
baltimore_test.describe(include='all')
statistic | CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | Total Incidents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 26312 | 26312 | 26312 | 26312 | 26312 | 26312 | 26312 | 26312.000000 | 26312 | 26312 | 26312.000000 | 26312.000000 | 26312 | 26312 | 26312.0 |
unique | 2071 | 1513 | 71 | 11180 | 15 | 4 | 5 | NaN | 9 | 276 | NaN | NaN | 18843 | 104 | NaN |
top | 04/27/2015 | 18:00:00 | 4e | 1500 russell st | larceny | i | none | NaN | northeastern | downtown | NaN | NaN | (39.3180000000, -76.6582100000) | street | NaN |
freq | 28 | 650 | 4357 | 56 | 5740 | 13248 | 17358 | NaN | 4137 | 853 | NaN | NaN | 49 | 10075 | NaN |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.535663 | NaN | NaN | -76.617518 | 39.307771 | NaN | NaN | 1.0 |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.275572 | NaN | NaN | 0.042479 | 0.029477 | NaN | NaN | 0.0 |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.117709 | NaN | NaN | -76.711220 | 39.200470 | NaN | NaN | 1.0 |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.257688 | NaN | NaN | -76.648905 | 39.288490 | NaN | NaN | 1.0 |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.541888 | NaN | NaN | -76.614170 | 39.303850 | NaN | NaN | 1.0 |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.766702 | NaN | NaN | -76.587170 | 39.328290 | NaN | NaN | 1.0 |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 | NaN | NaN | -76.529770 | 39.371970 | NaN | NaN | 1.0 |
baltimore_train.describe(include='all')
statistic | CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | Total Incidents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 177604 | 177604 | 177604 | 177604 | 177604 | 177604 | 177604 | 177604.000000 | 177604 | 177604 | 177604.000000 | 177604.000000 | 177604 | 177604 | 177604.0 |
unique | 2072 | 2435 | 79 | 22781 | 15 | 4 | 5 | NaN | 9 | 278 | NaN | NaN | 74417 | 116 | NaN |
top | 04/27/2015 | 18:00:00 | 4e | 200 e pratt st | larceny | i | none | NaN | northeastern | downtown | NaN | NaN | (39.3180000000, -76.6582100000) | street | NaN |
freq | 298 | 4340 | 29065 | 440 | 39287 | 88319 | 116884 | NaN | 27451 | 5877 | NaN | NaN | 337 | 69325 | NaN |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.536132 | NaN | NaN | -76.617452 | 39.307395 | NaN | NaN | 1.0 |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.276695 | NaN | NaN | 0.042192 | 0.029526 | NaN | NaN | 0.0 |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.117709 | NaN | NaN | -76.711280 | 39.200410 | NaN | NaN | 1.0 |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.256628 | NaN | NaN | -76.648290 | 39.288330 | NaN | NaN | 1.0 |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.541888 | NaN | NaN | -76.613990 | 39.303580 | NaN | NaN | 1.0 |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.775186 | NaN | NaN | -76.587500 | 39.327742 | NaN | NaN | 1.0 |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 | NaN | NaN | -76.529770 | 39.371970 | NaN | NaN | 1.0 |
baltimore_val.describe(include='all')
statistic | CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | Total Incidents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 59202 | 59202 | 59202 | 59202 | 59202 | 59202 | 59202 | 59202.000000 | 59202 | 59202 | 59202.000000 | 59202.000000 | 59202 | 59202 | 59202.0 |
unique | 2070 | 1804 | 77 | 16050 | 15 | 4 | 5 | NaN | 9 | 276 | NaN | NaN | 35435 | 112 | NaN |
top | 04/27/2015 | 18:00:00 | 4e | 200 e pratt st | larceny | i | none | NaN | northeastern | downtown | NaN | NaN | (39.3180000000, -76.6582100000) | street | NaN |
freq | 81 | 1493 | 9671 | 140 | 13219 | 29448 | 38933 | NaN | 9254 | 1971 | NaN | NaN | 117 | 23144 | NaN |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.537601 | NaN | NaN | -76.617499 | 39.307502 | NaN | NaN | 1.0 |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.276567 | NaN | NaN | 0.042191 | 0.029595 | NaN | NaN | 0.0 |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.117709 | NaN | NaN | -76.711270 | 39.202540 | NaN | NaN | 1.0 |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.257688 | NaN | NaN | -76.648500 | 39.288340 | NaN | NaN | 1.0 |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.541888 | NaN | NaN | -76.614020 | 39.303930 | NaN | NaN | 1.0 |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.775186 | NaN | NaN | -76.587592 | 39.328030 | NaN | NaN | 1.0 |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 | NaN | NaN | -76.529770 | 39.371980 | NaN | NaN | 1.0 |
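Every subset reports the same `Post` maximum of 1.0 because the column was divided by its maximum over the whole dataset before splitting. A common alternative is to compute the scaling constant on the training subset only and reuse it for the other subsets. The sketch below illustrates that idea in plain Python. The helper names `fit_max_abs` and `apply_max_abs` are hypothetical, and the five `Post` values come from the first rows of the raw table, assigned to "train" and "test" arbitrarily for the example.

```python
def fit_max_abs(values):
    """Return the largest absolute value, used as the scaling constant."""
    return max(abs(value) for value in values)


def apply_max_abs(values, scale):
    """Divide every value by a previously fitted scaling constant."""
    return [value / scale for value in values]


# Arbitrary illustrative assignment of raw Post values to two subsets.
post_train = [913.0, 133.0, 524.0]
post_test = [934.0, 113.0]

scale = fit_max_abs(post_train)          # fitted on training values only
train_scaled = apply_max_abs(post_train, scale)
test_scaled = apply_max_abs(post_test, scale)

print(train_scaled)
print(test_scaled)  # values above 1.0 are possible for unseen data
```

With this approach a test value can exceed 1.0, which is expected: the test subset no longer influences the constant used to scale it.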