2023-programowanie-w-pythonie/zajecia2/zad_01.ipynb
Maksymilian Stachowiak 92dca8796c solutions
2023-11-26 09:12:43 +01:00

  1. Import the pandas library as pd.
import pandas as pd
  1. Load the 311.csv dataset into the variable data.
df = pd.read_csv('./311.csv', low_memory=False)
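With `low_memory=False`, pandas parses the file in one pass instead of in internal chunks, which avoids mixed-type inference within a column. Declaring the type with `dtype` is another way to get the same guarantee. A minimal sketch on a hypothetical three-row sample (not the real 311.csv), assuming a ZIP-like column is the one to pin down:

```python
import io

import pandas as pd

# Hypothetical sample; the real 311.csv is not reproduced here.
sample_csv = io.StringIO(
    "Unique Key,Incident Zip,City\n"
    "1,11432,JAMAICA\n"
    "2,00083,NEW YORK\n"
    "3,,BROOKLYN\n"
)

# Reading the ZIP column as text keeps leading zeros and fixes its type up front.
sample = pd.read_csv(sample_csv, dtype={"Incident Zip": str})
zips = sample["Incident Zip"].tolist()
print(zips)
```

The empty field in the third row is read as a missing value rather than an empty string.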
  1. Display the first 5 rows of data.
df.head(5)
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
0 26589651 10/31/2013 02:08:41 AM NaN NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11432 90-03 169 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.708275 -73.791604 (40.70827532593202, -73.79160395779721)
1 26593698 10/31/2013 02:01:04 AM NaN NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk 11378 58 AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.721041 -73.909453 (40.721040535628305, -73.90945306791765)
2 26594139 10/31/2013 02:00:24 AM 10/31/2013 02:40:32 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 10032 4060 BROADWAY ... NaN NaN NaN NaN NaN NaN NaN 40.843330 -73.939144 (40.84332975466513, -73.93914371913482)
3 26595721 10/31/2013 01:56:23 AM 10/31/2013 02:21:48 AM NYPD New York City Police Department Noise - Vehicle Car/Truck Horn Street/Sidewalk 10023 WEST 72 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.778009 -73.980213 (40.7780087446372, -73.98021349023975)
4 26590930 10/31/2013 01:53:44 AM NaN DOHMH Department of Health and Mental Hygiene Rodent Condition Attracting Rodents Vacant Lot 10027 WEST 124 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.807691 -73.947387 (40.80769092704951, -73.94738703491433)

5 rows × 52 columns

  1. Display the column names.
df.columns
Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
       'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
       'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
       'Intersection Street 1', 'Intersection Street 2', 'Address Type',
       'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
       'Resolution Action Updated Date', 'Community Board', 'Borough',
       'X Coordinate (State Plane)', 'Y Coordinate (State Plane)',
       'Park Facility Name', 'Park Borough', 'School Name', 'School Number',
       'School Region', 'School Code', 'School Phone Number', 'School Address',
       'School City', 'School State', 'School Zip', 'School Not Found',
       'School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough',
       'Taxi Pick Up Location', 'Bridge Highway Name',
       'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment',
       'Garage Lot Name', 'Ferry Direction', 'Ferry Terminal Name', 'Latitude',
       'Longitude', 'Location'],
      dtype='object')
  1. Display how many columns and rows our dataset has.
df.shape
(111069, 52)
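`shape` is a `(rows, columns)` tuple, so the result above means 111069 rows and 52 columns. A small sketch on a hypothetical toy frame showing how the tuple can be unpacked:

```python
import pandas as pd

# Hypothetical toy frame standing in for the 311 data.
toy = pd.DataFrame(
    {
        "City": ["JAMAICA", "MASPETH", "NEW YORK"],
        "Agency": ["NYPD", "NYPD", "DOHMH"],
    }
)

# shape is (number of rows, number of columns), in that order.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")
```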
  1. Display the 'City' column from the dataset above.
df['City']
0          JAMAICA
1          MASPETH
2         NEW YORK
3         NEW YORK
4         NEW YORK
            ...   
111064    BROOKLYN
111065     JAMAICA
111066    NEW YORK
111067    BROOKLYN
111068    BROOKLYN
Name: City, Length: 111069, dtype: object
  1. Display the values that the 'City' column takes.
df['City'].values
array(['JAMAICA', 'MASPETH', 'NEW YORK', ..., 'NEW YORK', 'BROOKLYN',
       'BROOKLYN'], dtype=object)
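`.values` returns every entry of the column, repeats included. If the task is read as asking for the distinct values, `unique()` and `nunique()` may be closer to the intent. A minimal sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical stand-in for df['City'], with a repeat and a missing entry.
city = pd.Series(["JAMAICA", "MASPETH", "NEW YORK", "NEW YORK", None])

# Distinct values in order of first appearance; the missing entry is kept.
distinct = city.unique()

# Number of distinct values, not counting the missing entry.
n_distinct = city.nunique()

print(distinct, n_distinct)
```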
  1. Count the values in the City column.
df['City'].value_counts()
City
BROOKLYN          31662
NEW YORK          22664
BRONX             18438
STATEN ISLAND      4766
Jamaica            1521
                  ...  
BELLEVILLE            1
WOODBURY              1
BOHIEMA               1
CENTRAL ISLIP         1
NEWARK AIRPORT        1
Name: count, Length: 142, dtype: int64
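The output lists `Jamaica` in mixed case, while the first rows of the column show `JAMAICA` in upper case, so the same place can be counted under two spellings. A sketch of normalizing case before counting, on hypothetical values; `dropna=False` also reports the missing entries that `value_counts()` skips by default:

```python
import pandas as pd

# Hypothetical values mixing upper and mixed case, plus one missing entry.
city = pd.Series(["JAMAICA", "Jamaica", "BROOKLYN", "Brooklyn", "BROOKLYN", None])

# Raw counts treat 'JAMAICA' and 'Jamaica' as different values.
raw_counts = city.value_counts()

# Upper-casing first merges the spellings; dropna=False keeps the NaN bucket.
normalized_counts = city.str.upper().value_counts(dropna=False)

print(raw_counts)
print(normalized_counts)
```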
  1. Display only the first 4 rows of the previous command's output.
df['City'].value_counts().head(4)
City
BROOKLYN         31662
NEW YORK         22664
BRONX            18438
STATEN ISLAND     4766
Name: count, dtype: int64
  1. Display how many times the City column contains NaN.
df['City'].isna().sum()
12215
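This agrees with the `info()` output for the same frame: 111069 rows in total minus 98854 non-null City entries leaves 12215 missing. The same relationship between `isna().sum()`, `count()`, and `len()` on a hypothetical Series:

```python
import pandas as pd

# Hypothetical stand-in for df['City'] with two missing entries.
city = pd.Series(["JAMAICA", None, "BROOKLYN", None, "BRONX"])

missing = int(city.isna().sum())   # entries that are NaN/None
non_missing = int(city.count())    # entries that are present

# Every entry is either missing or present.
print(missing, non_missing, len(city))
```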
  1. Display data.info().
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111069 entries, 0 to 111068
Data columns (total 52 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   Unique Key                      111069 non-null  int64  
 1   Created Date                    111069 non-null  object 
 2   Closed Date                     60270 non-null   object 
 3   Agency                          111069 non-null  object 
 4   Agency Name                     111069 non-null  object 
 5   Complaint Type                  111069 non-null  object 
 6   Descriptor                      110613 non-null  object 
 7   Location Type                   79022 non-null   object 
 8   Incident Zip                    98807 non-null   object 
 9   Incident Address                84441 non-null   object 
 10  Street Name                     84432 non-null   object 
 11  Cross Street 1                  84728 non-null   object 
 12  Cross Street 2                  84005 non-null   object 
 13  Intersection Street 1           19364 non-null   object 
 14  Intersection Street 2           19366 non-null   object 
 15  Address Type                    102247 non-null  object 
 16  City                            98854 non-null   object 
 17  Landmark                        95 non-null      object 
 18  Facility Type                   19104 non-null   object 
 19  Status                          111069 non-null  object 
 20  Due Date                        39239 non-null   object 
 21  Resolution Action Updated Date  96507 non-null   object 
 22  Community Board                 111069 non-null  object 
 23  Borough                         111069 non-null  object 
 24  X Coordinate (State Plane)      98143 non-null   float64
 25  Y Coordinate (State Plane)      98143 non-null   float64
 26  Park Facility Name              111069 non-null  object 
 27  Park Borough                    111069 non-null  object 
 28  School Name                     111069 non-null  object 
 29  School Number                   111048 non-null  object 
 30  School Region                   110524 non-null  object 
 31  School Code                     110524 non-null  object 
 32  School Phone Number             111069 non-null  object 
 33  School Address                  111069 non-null  object 
 34  School City                     111069 non-null  object 
 35  School State                    111069 non-null  object 
 36  School Zip                      111069 non-null  object 
 37  School Not Found                38984 non-null   object 
 38  School or Citywide Complaint    0 non-null       float64
 39  Vehicle Type                    99 non-null      object 
 40  Taxi Company Borough            117 non-null     object 
 41  Taxi Pick Up Location           1059 non-null    object 
 42  Bridge Highway Name             185 non-null     object 
 43  Bridge Highway Direction        185 non-null     object 
 44  Road Ramp                       180 non-null     object 
 45  Bridge Highway Segment          219 non-null     object 
 46  Garage Lot Name                 49 non-null      object 
 47  Ferry Direction                 24 non-null      object 
 48  Ferry Terminal Name             70 non-null      object 
 49  Latitude                        98143 non-null   float64
 50  Longitude                       98143 non-null   float64
 51  Location                        98143 non-null   object 
dtypes: float64(5), int64(1), object(46)
memory usage: 44.1+ MB
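`info()` reports Created Date and Closed Date as `object`, that is, plain text. A sketch of converting such strings to datetimes, assuming the month/day/year 12-hour format visible in the first rows of the frame; the two values are copied from row 2 of that output:

```python
import pandas as pd

# One row copied from the head() output (row 2).
dates = pd.DataFrame(
    {
        "Created Date": ["10/31/2013 02:00:24 AM"],
        "Closed Date": ["10/31/2013 02:40:32 AM"],
    }
)

# Assumed format: MM/DD/YYYY hh:mm:ss AM/PM.
fmt = "%m/%d/%Y %I:%M:%S %p"
created = pd.to_datetime(dates["Created Date"], format=fmt)
closed = pd.to_datetime(dates["Closed Date"], format=fmt)

# Subtracting datetimes gives a Timedelta per row.
duration = closed - created
print(duration.iloc[0])
```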
  1. Display only the Borough and Agency columns, and only the last 5 rows.
df[['Borough','Agency']].tail(5)
          Borough  Agency
111064   BROOKLYN     DPR
111065     QUEENS    NYPD
111066  MANHATTAN    NYPD
111067   BROOKLYN    NYPD
111068   BROOKLYN    NYPD
  1. Display only the rows for which the value in the Agency column equals NYPD. Count how many such rows there are.
df['Agency'].eq('NYPD').value_counts()
Agency
False    95774
True     15295
Name: count, dtype: int64
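The `True` row above is the count of NYPD records. To also display those rows, the same boolean mask can be used to index the frame. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy frame standing in for the 311 data.
toy = pd.DataFrame(
    {
        "Agency": ["NYPD", "DOHMH", "NYPD", "DPR"],
        "Complaint Type": [
            "Illegal Parking",
            "Rodent",
            "Blocked Driveway",
            "Maintenance or Facility",
        ],
    }
)

mask = toy["Agency"].eq("NYPD")  # True where Agency is NYPD
nypd_rows = toy[mask]            # only the matching rows
n_nypd = int(mask.sum())         # True counts as 1, so the sum is the row count

print(nypd_rows)
print(n_nypd)
```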
  1. Display the minimum and maximum values of the Longitude column.
print(df['Longitude'].min())
print(df['Longitude'].max())
-74.25443731808713
-73.70127761473603
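Both statistics can also be requested in one call with `agg`, which returns a labelled Series. A sketch on hypothetical longitudes; missing values are skipped, as they are by `min()` and `max()`:

```python
import pandas as pd

# Hypothetical longitudes, including one missing value.
lon = pd.Series([-73.791604, -73.909453, None, -73.939144], name="Longitude")

# One call, two statistics; the result is indexed by 'min' and 'max'.
extremes = lon.agg(["min", "max"])
print(extremes)
```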
  1. Add a column diff, created by summing the Longitude and Latitude columns.
df['diff'] = df['Longitude'] + df['Latitude']

df
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location diff
0 26589651 10/31/2013 02:08:41 AM NaN NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11432 90-03 169 STREET ... NaN NaN NaN NaN NaN NaN 40.708275 -73.791604 (40.70827532593202, -73.79160395779721) -33.083329
1 26593698 10/31/2013 02:01:04 AM NaN NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk 11378 58 AVENUE ... NaN NaN NaN NaN NaN NaN 40.721041 -73.909453 (40.721040535628305, -73.90945306791765) -33.188413
2 26594139 10/31/2013 02:00:24 AM 10/31/2013 02:40:32 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 10032 4060 BROADWAY ... NaN NaN NaN NaN NaN NaN 40.843330 -73.939144 (40.84332975466513, -73.93914371913482) -33.095814
3 26595721 10/31/2013 01:56:23 AM 10/31/2013 02:21:48 AM NYPD New York City Police Department Noise - Vehicle Car/Truck Horn Street/Sidewalk 10023 WEST 72 STREET ... NaN NaN NaN NaN NaN NaN 40.778009 -73.980213 (40.7780087446372, -73.98021349023975) -33.202205
4 26590930 10/31/2013 01:53:44 AM NaN DOHMH Department of Health and Mental Hygiene Rodent Condition Attracting Rodents Vacant Lot 10027 WEST 124 STREET ... NaN NaN NaN NaN NaN NaN 40.807691 -73.947387 (40.80769092704951, -73.94738703491433) -33.139696
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
111064 26426013 10/04/2013 12:01:13 AM 10/07/2013 04:07:16 PM DPR Department of Parks and Recreation Maintenance or Facility Structure - Outdoors Park 11213 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
111065 26428083 10/04/2013 12:01:05 AM 10/04/2013 02:13:50 AM NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 11434 NaN ... NaN NaN NaN NaN NaN NaN 40.656160 -73.767353 (40.656160351546845, -73.76735262738222) -33.111192
111066 26428987 10/04/2013 12:00:45 AM 10/04/2013 01:25:01 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 10016 344 EAST 28 STREET ... NaN NaN NaN NaN NaN NaN 40.740295 -73.976952 (40.740295354643706, -73.97695165980414) -33.236656
111067 26426115 10/04/2013 12:00:28 AM 10/04/2013 04:17:32 AM NYPD New York City Police Department Noise - Commercial Loud Talking Club/Bar/Restaurant 11226 1233 FLATBUSH AVENUE ... NaN NaN NaN NaN NaN NaN 40.640182 -73.955306 (40.64018174662485, -73.95530566958138) -33.315124
111068 26428033 10/04/2013 12:00:10 AM 10/04/2013 01:20:52 AM NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk 11236 1259 EAST 94 STREET ... NaN NaN NaN NaN NaN NaN 40.640024 -73.900717 (40.640024057399216, -73.90071711703163) -33.260693

111069 rows × 53 columns
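Row 111064 shows that the new column is NaN wherever either coordinate is missing, because element-wise addition propagates NaN. A sketch that reproduces row 0 of the output alongside a hypothetical row with missing coordinates:

```python
import pandas as pd

# First row copied from the output above; second row has missing coordinates.
coords = pd.DataFrame(
    {
        "Latitude": [40.708275, None],
        "Longitude": [-73.791604, None],
    }
)

# Element-wise sum; NaN in either operand gives NaN in the result.
coords["diff"] = coords["Longitude"] + coords["Latitude"]
print(coords)
```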

  1. Count the values of the 'Descriptor' column for rows where Agency equals NYPD.
df.loc[df['Agency'].eq('NYPD'), 'Descriptor'].value_counts()
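The task combines two steps: filter rows on one column, then count the values of another. A minimal sketch on hypothetical rows, using `.loc` with a boolean mask and a column label:

```python
import pandas as pd

# Hypothetical toy frame standing in for the 311 data.
toy = pd.DataFrame(
    {
        "Agency": ["NYPD", "NYPD", "DOHMH", "NYPD"],
        "Descriptor": [
            "Loud Talking",
            "Loud Music/Party",
            "Attracting Rodents",
            "Loud Talking",
        ],
    }
)

# Keep Descriptor only for rows where Agency is NYPD, then count each value.
nypd_descriptors = toy.loc[toy["Agency"].eq("NYPD"), "Descriptor"].value_counts()
print(nypd_descriptors)
```

Descriptors that occur only for other agencies do not appear in the result.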