2022-10-18 00:24:12 +02:00


Analysis of factors affecting attendance at MLB baseball games

import pandas as pd

data = pd.read_csv("baseball_reference_2016_clean.csv")

data
Unnamed: 0 attendance away_team away_team_errors away_team_hits away_team_runs date field_type game_type home_team ... temperature wind_speed wind_direction sky total_runs game_hours_dec season home_team_win home_team_loss home_team_outcome
0 0 40030.0 New York Mets 1 7 3 2016-04-03 on grass Night Game Kansas City Royals ... 74.0 14.0 from Right to Left Sunny 7 3.216667 regular season 1 0 Win
1 1 21621.0 Philadelphia Phillies 0 5 2 2016-04-06 on grass Night Game Cincinnati Reds ... 55.0 24.0 from Right to Left Overcast 5 2.383333 regular season 1 0 Win
2 2 12622.0 Minnesota Twins 0 5 2 2016-04-06 on grass Night Game Baltimore Orioles ... 48.0 7.0 out to Leftfield Unknown 6 3.183333 regular season 1 0 Win
3 3 18531.0 Washington Nationals 0 8 3 2016-04-06 on grass Night Game Atlanta Braves ... 65.0 10.0 from Right to Left Cloudy 4 2.883333 regular season 0 1 Loss
4 4 18572.0 Colorado Rockies 1 8 4 2016-04-06 on grass Day Game Arizona Diamondbacks ... 77.0 0.0 in unknown direction In Dome 7 2.650000 regular season 0 1 Loss
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2458 2458 31042.0 Toronto Blue Jays 2 7 5 2016-04-03 on turf Day Game Tampa Bay Rays ... 72.0 0.0 in unknown direction In Dome 8 2.850000 regular season 0 1 Loss
2459 2459 39500.0 St. Louis Cardinals 0 5 1 2016-04-03 on grass Day Game Pittsburgh Pirates ... 39.0 14.0 out to Leftfield Unknown 5 3.033333 regular season 1 0 Win
2460 2460 20098.0 San Francisco Giants 0 6 3 2016-04-06 on grass Day Game Milwaukee Brewers ... 66.0 0.0 in unknown direction In Dome 7 3.316667 regular season 1 0 Win
2461 2461 17883.0 Detroit Tigers 0 13 7 2016-04-06 on grass Day Game Miami Marlins ... 71.0 0.0 in unknown direction In Dome 10 3.366667 regular season 0 1 Loss
2462 2462 10298.0 Boston Red Sox 1 10 6 2016-04-06 on grass Night Game Cleveland Indians ... 60.0 7.0 out to Leftfield Unknown 13 3.483333 regular season 1 0 Win

2463 rows × 26 columns
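The `Unnamed: 0` column in the output above is most likely the row index that was written out when the cleaned CSV was saved. If so, passing `index_col=0` to `pd.read_csv` turns it back into the index instead of keeping it as a data column. A minimal sketch, assuming that origin of the column, using an in-memory CSV built from two rows copied from the output above in place of the real file:

```python
import io

import pandas as pd

# Two rows mimicking a CSV that was saved together with its index
# (the first header cell is empty, as pandas writes it by default)
csv_text = ",attendance,sky\n0,40030.0,Sunny\n1,21621.0,Overcast\n"

with_column = pd.read_csv(io.StringIO(csv_text))
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_column.columns))  # ['Unnamed: 0', 'attendance', 'sky']
print(list(with_index.columns))   # ['attendance', 'sky']
```

On the real file this would be `pd.read_csv("baseball_reference_2016_clean.csv", index_col=0)`, which leaves 25 columns instead of 26.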

Weather


data['sky'].unique()
array(['Sunny', 'Overcast', 'Unknown', 'Cloudy', 'In Dome', 'Drizzle',
       'Rain', 'Night'], dtype=object)
sunny = data[data['sky'] == 'Sunny']
overcast = data[data['sky'] == 'Overcast']
cloudy = data[data['sky'] == 'Cloudy']
in_dome = data[data['sky'] == 'In Dome']
drizzle = data[data['sky'] == 'Drizzle']
rain = data[data['sky'] == 'Rain']
night = data[data['sky'] == 'Night']
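The seven boolean filters above can also be expressed with a single `groupby`, which additionally reports how many games fall into each category. That count matters because some sky conditions may be rare, and a mean over a handful of games is unreliable. A minimal sketch, assuming `data` has the `sky` and `attendance` columns shown above; `attendance_summary` is a hypothetical helper name, and the small demo frame uses made-up numbers for illustration only.

```python
import pandas as pd


def attendance_summary(df: pd.DataFrame, column: str, exclude=()) -> pd.DataFrame:
    """Count, mean and median attendance for each value of `column`.

    Hypothetical helper: rows whose `column` value is in `exclude` are dropped
    (e.g. the 'Unknown' sky label); missing attendance values are not counted
    and are ignored by mean and median. Sorted by mean, highest first.
    """
    subset = df[~df[column].isin(exclude)]
    return (
        subset.groupby(column)["attendance"]
        .agg(["count", "mean", "median"])
        .sort_values("mean", ascending=False)
    )


# Illustrative toy data (made-up values, not from the 2016 dataset)
toy = pd.DataFrame({
    "sky": ["Sunny", "Sunny", "In Dome", "Unknown", "Rain"],
    "attendance": [30000.0, 40000.0, 20000.0, 25000.0, 10000.0],
})
print(attendance_summary(toy, "sky", exclude=["Unknown"]))
```

On the notebook's data the call would be `attendance_summary(data, "sky", exclude=["Unknown"])`.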

Mean attendance by weather

import matplotlib.pyplot as plt

# x positions of the bars, one per sky condition ('Unknown' is left out)
left = [1, 2, 3, 4, 5, 6, 7]

# mean attendance per sky condition, in the same order as tick_label
height = [sunny['attendance'].mean(), overcast['attendance'].mean(), cloudy['attendance'].mean(),
          in_dome['attendance'].mean(), drizzle['attendance'].mean(), rain['attendance'].mean(), night['attendance'].mean()]

tick_label = ['sunny', 'overcast', 'cloudy', 'in dome', 'drizzle', 'rain', 'night']

# the three colors are repeated cyclically across the seven bars
plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['blue', 'green', 'red'])

plt.xlabel('Weather')
plt.ylabel('Mean attendance')
plt.title('Mean attendance - Weather')

plt.show()
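The mean and median are drawn in two separate cells with hand-built lists. As a sketch, both statistics can be placed side by side in one chart built from a `groupby` aggregation, so the category order and labels stay consistent between them. This assumes the same `data` frame; `plot_mean_median` is a hypothetical helper, not part of pandas or matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_mean_median(df: pd.DataFrame, column: str, order=None):
    """Grouped bar chart of mean and median attendance per value of `column`.

    Hypothetical helper. `order` optionally fixes which categories are shown
    and in what order (categories not listed are left out). Returns (fig, ax)
    so the caller decides when to call plt.show().
    """
    stats = df.groupby(column)["attendance"].agg(["mean", "median"])
    if order is not None:
        stats = stats.reindex(order)
    x = np.arange(len(stats))
    width = 0.4  # two bars per category share the 0.8 slot used above
    fig, ax = plt.subplots()
    ax.bar(x - width / 2, stats["mean"], width, label="mean")
    ax.bar(x + width / 2, stats["median"], width, label="median")
    ax.set_xticks(x)
    ax.set_xticklabels(stats.index)
    ax.set_xlabel(column)
    ax.set_ylabel("Attendance")
    ax.legend()
    return fig, ax
```

For example: `fig, ax = plot_mean_median(data, "sky", order=["Sunny", "Overcast", "Cloudy", "In Dome", "Drizzle", "Rain", "Night"])` followed by `plt.show()`. The same call with `"day_of_week"` or `"home_team_outcome"` would cover the other sections.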

Median

import matplotlib.pyplot as plt

# x positions of the bars, one per sky condition ('Unknown' is left out)
left = [1, 2, 3, 4, 5, 6, 7]

# median attendance per sky condition, in the same order as tick_label
height = [sunny['attendance'].median(), overcast['attendance'].median(), cloudy['attendance'].median(),
          in_dome['attendance'].median(), drizzle['attendance'].median(), rain['attendance'].median(), night['attendance'].median()]

tick_label = ['sunny', 'overcast', 'cloudy', 'in dome', 'drizzle', 'rain', 'night']

# the three colors are repeated cyclically across the seven bars
plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['blue', 'green', 'red'])

plt.xlabel('Weather')
plt.ylabel('Median attendance')
plt.title('Median attendance - Weather')

plt.show()

Attendance is probably highest at night, because a larger audience has access to the games online from all over the world.
Games under a dome may draw the fewest spectators, because such stadiums have smaller stands.
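Explanations like these are easier to weigh once it is clear how noisy each bar is: rare categories rest on few games, so their means can shift a lot. A minimal sketch of a percentile bootstrap confidence interval for the mean attendance of one group, assuming NumPy is available; `bootstrap_mean_ci` is a hypothetical helper, and the number of resamples and the seed are arbitrary choices.

```python
import numpy as np


def bootstrap_mean_ci(values, n_resamples=5_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper: NaNs are dropped, the sample is resampled with
    replacement `n_resamples` times, and the (1 - level) / 2 and
    (1 + level) / 2 quantiles of the resampled means are returned.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    rng = np.random.default_rng(seed)
    # each row is one bootstrap resample of the same size as the data
    samples = rng.choice(x, size=(n_resamples, x.size), replace=True)
    means = samples.mean(axis=1)
    alpha = 1.0 - level
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)
```

Comparing, for example, `bootstrap_mean_ci(night['attendance'])` with `bootstrap_mean_ci(in_dome['attendance'])`: if the two intervals overlap heavily, the gap between the bars may be noise rather than an effect of the sky condition.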

Day of the week


data['day_of_week'].unique()
array(['Sunday', 'Wednesday', 'Tuesday', 'Monday', 'Thursday', 'Saturday',
       'Friday'], dtype=object)
monday = data[data['day_of_week'] == 'Monday']
tuesday = data[data['day_of_week'] == 'Tuesday']
wednesday = data[data['day_of_week'] == 'Wednesday']
thursday = data[data['day_of_week'] == 'Thursday']
friday = data[data['day_of_week'] == 'Friday']
saturday = data[data['day_of_week'] == 'Saturday']
sunday = data[data['day_of_week'] == 'Sunday']
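A plain `groupby` on `day_of_week` would sort the day names alphabetically (Friday, Monday, ...). As a sketch, converting the column to an ordered categorical keeps the Monday-to-Sunday order used in the plots without seven separate filters. `WEEKDAYS` and `attendance_by_weekday` are hypothetical names, and the sketch assumes the English day names shown in the output above.

```python
import pandas as pd

# Calendar order used for the x axis (hypothetical constant)
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def attendance_by_weekday(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median attendance per day, ordered Monday..Sunday.

    observed=False keeps every weekday in the result, with NaN for days
    that have no games.
    """
    day = pd.Categorical(df["day_of_week"], categories=WEEKDAYS, ordered=True)
    return (
        df["attendance"]
        .groupby(day, observed=False)
        .agg(["mean", "median"])
    )
```

`attendance_by_weekday(data)["mean"]` then holds the same seven values as the `height` list in the next cell, already in plotting order.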

Mean attendance by day of the week

import matplotlib.pyplot as plt

# x positions of the bars, one per day of the week
left = [1, 2, 3, 4, 5, 6, 7]

# mean attendance per day, in the same order as tick_label
height = [monday['attendance'].mean(), tuesday['attendance'].mean(), wednesday['attendance'].mean(),
          thursday['attendance'].mean(), friday['attendance'].mean(), saturday['attendance'].mean(), sunday['attendance'].mean()]

tick_label = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']

# the three colors are repeated cyclically across the seven bars
plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['blue', 'green', 'red'])

plt.xlabel('Day')
plt.ylabel('Mean attendance')
plt.title('Mean attendance - Day')

plt.show()

Median

import matplotlib.pyplot as plt

# x positions of the bars, one per day of the week
left = [1, 2, 3, 4, 5, 6, 7]

# median attendance per day, in the same order as tick_label
height = [monday['attendance'].median(), tuesday['attendance'].median(), wednesday['attendance'].median(),
          thursday['attendance'].median(), friday['attendance'].median(), saturday['attendance'].median(), sunday['attendance'].median()]

tick_label = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']

# the three colors are repeated cyclically across the seven bars
plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['blue', 'green', 'red'])

plt.xlabel('Day')
plt.ylabel('Median attendance')
plt.title('Median attendance - Day')

plt.show()

Attendance is highest on weekends.
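To put a number on the weekend effect, the seven days can be collapsed into two groups and compared directly. A minimal sketch, assuming `day_of_week` holds English day names as in `data` and treating only Saturday and Sunday as the weekend; `weekend_vs_weekday` is a hypothetical helper name.

```python
import pandas as pd


def weekend_vs_weekday(df: pd.DataFrame) -> pd.DataFrame:
    """Game count, mean and median attendance for weekend vs weekday games."""
    is_weekend = df["day_of_week"].isin(["Saturday", "Sunday"])
    # label each game, then group attendance by that label
    group = is_weekend.map({True: "weekend", False: "weekday"})
    return df.groupby(group)["attendance"].agg(["count", "mean", "median"])
```

`weekend_vs_weekday(data)` returns a two-row table; the difference between its `mean` rows is the size of the weekend effect visible in the charts.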

Home team win / loss


data['home_team_outcome'].unique()
array(['Win', 'Loss'], dtype=object)
win = data[data['home_team_outcome'] == 'Win']
loss = data[data['home_team_outcome'] == 'Loss']

Mean attendance when the home team wins / loses

# x positions of the two bars: home team win, home team loss
left = [1, 2]

height = [win['attendance'].mean(), loss['attendance'].mean()]

tick_label = ['win', 'loss']

plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['blue', 'red'])

plt.xlabel('Home team outcome')
plt.ylabel('Mean attendance')
plt.title('Mean attendance - Home team outcome')

plt.show()

Median

# x positions of the two bars: home team win, home team loss
left = [1, 2]

height = [win['attendance'].median(), loss['attendance'].median()]

tick_label = ['win', 'loss']

plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['blue', 'red'])

plt.xlabel('Home team outcome')
plt.ylabel('Median attendance')
plt.title('Median attendance - Home team outcome')

plt.show()

The outcome has no effect on attendance. It is unlikely that fans can see a loss coming before the end of the game and leave because of it, and even if that does happen, it seems to work roughly equally in both directions.
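Whether the small gap between the win and loss bars could be due to chance can be checked with a permutation test on the difference in mean attendance. A minimal sketch, assuming NumPy; `permutation_pvalue` is a hypothetical helper, and the number of shuffles and the seed are arbitrary. A large p-value only means the data show no detectable difference, not that the outcome certainly has no effect.

```python
import numpy as np


def permutation_pvalue(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b.

    Hypothetical helper: the pooled values are shuffled and split into groups
    of the original sizes; the p-value is the share of shuffles whose absolute
    mean difference is at least the observed one (with +1 smoothing so it is
    never exactly 0).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_permutations + 1)
```

Applied here as `permutation_pvalue(win['attendance'], loss['attendance'])`.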