- Import the pandas library as pd.
import pandas as pd
- Load the 311.csv dataset into the variable df.
df = pd.read_csv('./311.csv', low_memory=False)
- Display the first 5 rows of df.
df.head(5)
| | Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | ... | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26589651 | 10/31/2013 02:08:41 AM | NaN | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 11432 | 90-03 169 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.708275 | -73.791604 | (40.70827532593202, -73.79160395779721) |
1 | 26593698 | 10/31/2013 02:01:04 AM | NaN | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | 11378 | 58 AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.721041 | -73.909453 | (40.721040535628305, -73.90945306791765) |
2 | 26594139 | 10/31/2013 02:00:24 AM | 10/31/2013 02:40:32 AM | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Club/Bar/Restaurant | 10032 | 4060 BROADWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.843330 | -73.939144 | (40.84332975466513, -73.93914371913482) |
3 | 26595721 | 10/31/2013 01:56:23 AM | 10/31/2013 02:21:48 AM | NYPD | New York City Police Department | Noise - Vehicle | Car/Truck Horn | Street/Sidewalk | 10023 | WEST 72 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.778009 | -73.980213 | (40.7780087446372, -73.98021349023975) |
4 | 26590930 | 10/31/2013 01:53:44 AM | NaN | DOHMH | Department of Health and Mental Hygiene | Rodent | Condition Attracting Rodents | Vacant Lot | 10027 | WEST 124 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.807691 | -73.947387 | (40.80769092704951, -73.94738703491433) |
5 rows × 52 columns
- Display the column names.
df.columns
Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name', 'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip', 'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2', 'Intersection Street 1', 'Intersection Street 2', 'Address Type', 'City', 'Landmark', 'Facility Type', 'Status', 'Due Date', 'Resolution Action Updated Date', 'Community Board', 'Borough', 'X Coordinate (State Plane)', 'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough', 'School Name', 'School Number', 'School Region', 'School Code', 'School Phone Number', 'School Address', 'School City', 'School State', 'School Zip', 'School Not Found', 'School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction', 'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location'], dtype='object')
- Display how many rows and columns the dataset has.
df.shape
(111069, 52)
- Display the 'City' column of the dataset above.
df['City']
0          JAMAICA
1          MASPETH
2         NEW YORK
3         NEW YORK
4         NEW YORK
            ...
111064    BROOKLYN
111065     JAMAICA
111066    NEW YORK
111067    BROOKLYN
111068    BROOKLYN
Name: City, Length: 111069, dtype: object
- Display the values that the 'City' column takes.
df['City'].values
array(['JAMAICA', 'MASPETH', 'NEW YORK', ..., 'NEW YORK', 'BROOKLYN', 'BROOKLYN'], dtype=object)
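The array above lists every entry, duplicates included. If the goal is the set of distinct values, `unique()` and `nunique()` are the usual tools. A minimal sketch on a hypothetical six-row stand-in for the 'City' column (the sample values are an assumption, not read from 311.csv):

```python
import pandas as pd

# Hypothetical miniature of the 'City' column; not taken from 311.csv.
city = pd.Series(
    ['JAMAICA', 'MASPETH', 'NEW YORK', 'NEW YORK', None, 'BROOKLYN'],
    name='City',
)

all_values = city.values     # every entry, duplicates and missing values included
distinct = city.unique()     # each distinct entry once, missing value included
n_distinct = city.nunique()  # number of distinct entries, missing values ignored

print(len(all_values))
print(distinct)
print(n_distinct)
```

On the full dataset the equivalent calls would be `df['City'].unique()` and `df['City'].nunique()`.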
- Count the values in the City column.
df['City'].value_counts()
City
BROOKLYN          31662
NEW YORK          22664
BRONX             18438
STATEN ISLAND      4766
Jamaica            1521
                  ...
BELLEVILLE            1
WOODBURY              1
BOHIEMA               1
CENTRAL ISLIP         1
NEWARK AIRPORT        1
Name: count, Length: 142, dtype: int64
- Display only the first 4 rows of the previous result.
df['City'].value_counts().head(4)
City
BROOKLYN         31662
NEW YORK         22664
BRONX            18438
STATEN ISLAND     4766
Name: count, dtype: int64
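The counts include a 'Jamaica' entry, while the column itself also holds 'JAMAICA', which suggests that `value_counts()` is splitting the same city across different spellings because it is case-sensitive. A sketch of normalizing the text before counting, using a hypothetical sample (only the mixed-case pattern is taken from the output above):

```python
import pandas as pd

# Hypothetical sample; only the mixed spelling pattern mirrors the real column.
city = pd.Series(
    ['JAMAICA', 'Jamaica', 'BROOKLYN', 'Brooklyn', 'BROOKLYN', None],
    name='City',
)

# Case-sensitive counting treats 'Jamaica' and 'JAMAICA' as different cities.
raw_counts = city.value_counts()

# Trim whitespace and upper-case first so the spellings collapse into one group.
clean_counts = city.str.strip().str.upper().value_counts()

print(raw_counts)
print(clean_counts)
```

Whether such spellings really denote the same place in 311.csv is something to verify on the data before merging them.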
- Display how many rows have NaN in the City column.
df['City'].isna().sum()
12215
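The same `isna()` pattern extends to a whole frame, and `mean()` turns the boolean mask into a share of missing rows. A minimal sketch on a hypothetical four-row frame (`df_small` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame standing in for the 311 data.
df_small = pd.DataFrame({
    'City': ['JAMAICA', None, 'NEW YORK', None],
    'Latitude': [40.708275, np.nan, 40.843330, 40.778009],
})

# Number of missing entries in every column at once.
missing_per_column = df_small.isna().sum()

# Fraction of rows where City is missing (True counts as 1, False as 0).
missing_share = df_small['City'].isna().mean()

print(missing_per_column)
print(missing_share)
```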
- Display df.info().
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111069 entries, 0 to 111068
Data columns (total 52 columns):
 #   Column                          Non-Null Count   Dtype
---  ------                          --------------   -----
 0   Unique Key                      111069 non-null  int64
 1   Created Date                    111069 non-null  object
 2   Closed Date                     60270 non-null   object
 3   Agency                          111069 non-null  object
 4   Agency Name                     111069 non-null  object
 5   Complaint Type                  111069 non-null  object
 6   Descriptor                      110613 non-null  object
 7   Location Type                   79022 non-null   object
 8   Incident Zip                    98807 non-null   object
 9   Incident Address                84441 non-null   object
 10  Street Name                     84432 non-null   object
 11  Cross Street 1                  84728 non-null   object
 12  Cross Street 2                  84005 non-null   object
 13  Intersection Street 1           19364 non-null   object
 14  Intersection Street 2           19366 non-null   object
 15  Address Type                    102247 non-null  object
 16  City                            98854 non-null   object
 17  Landmark                        95 non-null      object
 18  Facility Type                   19104 non-null   object
 19  Status                          111069 non-null  object
 20  Due Date                        39239 non-null   object
 21  Resolution Action Updated Date  96507 non-null   object
 22  Community Board                 111069 non-null  object
 23  Borough                         111069 non-null  object
 24  X Coordinate (State Plane)      98143 non-null   float64
 25  Y Coordinate (State Plane)      98143 non-null   float64
 26  Park Facility Name              111069 non-null  object
 27  Park Borough                    111069 non-null  object
 28  School Name                     111069 non-null  object
 29  School Number                   111048 non-null  object
 30  School Region                   110524 non-null  object
 31  School Code                     110524 non-null  object
 32  School Phone Number             111069 non-null  object
 33  School Address                  111069 non-null  object
 34  School City                     111069 non-null  object
 35  School State                    111069 non-null  object
 36  School Zip                      111069 non-null  object
 37  School Not Found                38984 non-null   object
 38  School or Citywide Complaint    0 non-null       float64
 39  Vehicle Type                    99 non-null      object
 40  Taxi Company Borough            117 non-null     object
 41  Taxi Pick Up Location           1059 non-null    object
 42  Bridge Highway Name             185 non-null     object
 43  Bridge Highway Direction        185 non-null     object
 44  Road Ramp                       180 non-null     object
 45  Bridge Highway Segment          219 non-null     object
 46  Garage Lot Name                 49 non-null      object
 47  Ferry Direction                 24 non-null      object
 48  Ferry Terminal Name             70 non-null      object
 49  Latitude                        98143 non-null   float64
 50  Longitude                       98143 non-null   float64
 51  Location                        98143 non-null   object
dtypes: float64(5), int64(1), object(46)
memory usage: 44.1+ MB
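The summary shows 'Created Date' and 'Closed Date' stored as `object`, that is, as plain strings. A sketch of parsing them with `pd.to_datetime`, assuming every value follows the `MM/DD/YYYY hh:mm:ss AM/PM` pattern seen in the first rows; the `time_to_close` column is a hypothetical name introduced only for this illustration:

```python
import pandas as pd

# Two timestamp pairs copied from the rows displayed earlier.
sample = pd.DataFrame({
    'Created Date': ['10/31/2013 02:08:41 AM', '10/31/2013 02:00:24 AM'],
    'Closed Date': [None, '10/31/2013 02:40:32 AM'],
})

# Assumed format: month/day/year with a 12-hour clock and AM/PM marker.
fmt = '%m/%d/%Y %I:%M:%S %p'
for col in ['Created Date', 'Closed Date']:
    sample[col] = pd.to_datetime(sample[col], format=fmt)

# With real datetimes, subtraction yields a Timedelta; open requests give NaT.
sample['time_to_close'] = sample['Closed Date'] - sample['Created Date']

print(sample.dtypes)
print(sample)
```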
- Display only the Borough and Agency columns, and only the last 5 rows.
df[['Borough','Agency']].tail(5)
| | Borough | Agency |
---|---|---|
111064 | BROOKLYN | DPR |
111065 | QUEENS | NYPD |
111066 | MANHATTAN | NYPD |
111067 | BROOKLYN | NYPD |
111068 | BROOKLYN | NYPD |
- Display only the rows where the Agency column equals NYPD. Count how many such rows there are.
df['Agency'].eq('NYPD').value_counts()
Agency
False    95774
True     15295
Name: count, dtype: int64
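`value_counts()` on the comparison gives the count (the `True` row), but it does not display the matching rows. One way to do both is to keep the comparison as a boolean mask and index the frame with it. A sketch on a hypothetical five-row frame (`df_small` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical five-row frame; the real one is loaded from 311.csv.
df_small = pd.DataFrame({
    'Agency': ['NYPD', 'NYPD', 'DOHMH', 'DPR', 'NYPD'],
    'Complaint Type': [
        'Illegal Parking', 'Noise - Vehicle', 'Rodent',
        'Maintenance or Facility', 'Blocked Driveway',
    ],
})

# Boolean mask: True where the agency is NYPD.
is_nypd = df_small['Agency'].eq('NYPD')

# Indexing with the mask keeps only the matching rows.
nypd_rows = df_small[is_nypd]

# Summing the mask counts the True values.
nypd_count = int(is_nypd.sum())

print(nypd_rows)
print(nypd_count)
```

On the full dataset this would read `df[df['Agency'].eq('NYPD')]`, and `len()` of that result should agree with the `True` count above.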
- Display the minimum and maximum values of the Longitude column.
print(df['Longitude'].min())
print(df['Longitude'].max())
-74.25443731808713
-73.70127761473603
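Both statistics can also be requested in a single call with `agg`, which returns them as a labelled Series. A sketch on a few longitude values taken from the rows displayed earlier, with one missing entry added as an assumption to show that it is skipped:

```python
import pandas as pd

# Longitudes from the first displayed rows, plus one hypothetical missing value.
lon = pd.Series([-73.791604, -73.909453, None, -73.980213], name='Longitude')

# agg with a list of function names returns one labelled result per name.
extremes = lon.agg(['min', 'max'])

print(extremes)
```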
- Add a column diff, computed as the sum of the Longitude and Latitude columns.
df['diff'] = df['Longitude'] + df['Latitude']
df
| | Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | ... | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location | diff |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26589651 | 10/31/2013 02:08:41 AM | NaN | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 11432 | 90-03 169 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.708275 | -73.791604 | (40.70827532593202, -73.79160395779721) | -33.083329 |
1 | 26593698 | 10/31/2013 02:01:04 AM | NaN | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | 11378 | 58 AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.721041 | -73.909453 | (40.721040535628305, -73.90945306791765) | -33.188413 |
2 | 26594139 | 10/31/2013 02:00:24 AM | 10/31/2013 02:40:32 AM | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Club/Bar/Restaurant | 10032 | 4060 BROADWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.843330 | -73.939144 | (40.84332975466513, -73.93914371913482) | -33.095814 |
3 | 26595721 | 10/31/2013 01:56:23 AM | 10/31/2013 02:21:48 AM | NYPD | New York City Police Department | Noise - Vehicle | Car/Truck Horn | Street/Sidewalk | 10023 | WEST 72 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.778009 | -73.980213 | (40.7780087446372, -73.98021349023975) | -33.202205 |
4 | 26590930 | 10/31/2013 01:53:44 AM | NaN | DOHMH | Department of Health and Mental Hygiene | Rodent | Condition Attracting Rodents | Vacant Lot | 10027 | WEST 124 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.807691 | -73.947387 | (40.80769092704951, -73.94738703491433) | -33.139696 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
111064 | 26426013 | 10/04/2013 12:01:13 AM | 10/07/2013 04:07:16 PM | DPR | Department of Parks and Recreation | Maintenance or Facility | Structure - Outdoors | Park | 11213 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
111065 | 26428083 | 10/04/2013 12:01:05 AM | 10/04/2013 02:13:50 AM | NYPD | New York City Police Department | Illegal Parking | Posted Parking Sign Violation | Street/Sidewalk | 11434 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.656160 | -73.767353 | (40.656160351546845, -73.76735262738222) | -33.111192 |
111066 | 26428987 | 10/04/2013 12:00:45 AM | 10/04/2013 01:25:01 AM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 10016 | 344 EAST 28 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.740295 | -73.976952 | (40.740295354643706, -73.97695165980414) | -33.236656 |
111067 | 26426115 | 10/04/2013 12:00:28 AM | 10/04/2013 04:17:32 AM | NYPD | New York City Police Department | Noise - Commercial | Loud Talking | Club/Bar/Restaurant | 11226 | 1233 FLATBUSH AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.640182 | -73.955306 | (40.64018174662485, -73.95530566958138) | -33.315124 |
111068 | 26428033 | 10/04/2013 12:00:10 AM | 10/04/2013 01:20:52 AM | NYPD | New York City Police Department | Blocked Driveway | Partial Access | Street/Sidewalk | 11236 | 1259 EAST 94 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | 40.640024 | -73.900717 | (40.640024057399216, -73.90071711703163) | -33.260693 |
111069 rows × 53 columns
- Count the values in the 'Descriptor' column for rows where Agency equals NYPD.
df.loc[df['Agency'].eq('NYPD'), 'Descriptor'].value_counts()