billionaries-wizualizacja/EDA_Billionaires .ipynb
2024-04-10 11:47:54 +02:00


Billionaire statistics

The dataset contains statistics on the world's billionaires, including their personal details, the industries they operate in, and their companies.

It was downloaded from Kaggle.

Source

Potential charts

  • Wealth distribution analysis: Examine how billionaire wealth is distributed across industries, countries, and regions.

  • Demographic analysis: Present the age, gender, and birthplace of billionaires.

  • Self-made vs. inherited wealth: Analyze the share of self-made billionaires versus those who inherited their fortune.

  • Economic indicators: Examine the correlation between billionaire wealth and economic indicators such as GDP, CPI (consumer price index), and tax rates.

  • Geospatial analysis: Visualize the geographic distribution of billionaires and their wealth on a map.

  • Trends over time: Track changes in billionaire demographics and wealth over the years.
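
As a starting point for the self-made vs. inherited item, the share can be read off the boolean `selfMade` column. The sketch below is only an illustration: the helper name `self_made_share` and the four-row sample frame are made up here, and the real figures would come from `data.csv`.

```python
import pandas as pd

def self_made_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of self-made and inherited fortunes."""
    # Map the boolean flag to readable labels before counting.
    labels = df["selfMade"].map({True: "self-made", False: "inherited"})
    return labels.value_counts(normalize=True).mul(100).round(1)

# Made-up sample for illustration only; not taken from the dataset.
sample = pd.DataFrame({"selfMade": [True, True, True, False]})
shares = self_made_share(sample)
print(shares)
```

On the full dataset the call would be `self_made_share(df)`, and the resulting Series can be passed straight to a bar or pie plot.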

Test charts and an overview of the dataset

import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

plt.style.use('ggplot')
df = pd.read_csv('data.csv')
df.head()
|   | rank | finalWorth | category | personName | age | country | city | source | industries | countryOfCitizenship | ... | cpi_change_country | gdp_country | gross_tertiary_education_enrollment | gross_primary_education_enrollment_country | life_expectancy_country | tax_revenue_country_country | total_tax_rate_country | population_country | latitude_country | longitude_country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 211000 | Fashion & Retail | Bernard Arnault & family | 74.0 | France | Paris | LVMH | Fashion & Retail | France | ... | 1.1 | $2,715,518,274,227 | 65.6 | 102.5 | 82.5 | 24.2 | 60.7 | 67059887.0 | 46.227638 | 2.213749 |
| 1 | 2 | 180000 | Automotive | Elon Musk | 51.0 | United States | Austin | Tesla, SpaceX | Automotive | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
| 2 | 3 | 114000 | Technology | Jeff Bezos | 59.0 | United States | Medina | Amazon | Technology | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
| 3 | 4 | 107000 | Technology | Larry Ellison | 78.0 | United States | Lanai | Oracle | Technology | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
| 4 | 5 | 106000 | Finance & Investments | Warren Buffett | 92.0 | United States | Omaha | Berkshire Hathaway | Finance & Investments | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |

5 rows × 35 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2640 entries, 0 to 2639
Data columns (total 35 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   rank                                        2640 non-null   int64  
 1   finalWorth                                  2640 non-null   int64  
 2   category                                    2640 non-null   object 
 3   personName                                  2640 non-null   object 
 4   age                                         2575 non-null   float64
 5   country                                     2602 non-null   object 
 6   city                                        2568 non-null   object 
 7   source                                      2640 non-null   object 
 8   industries                                  2640 non-null   object 
 9   countryOfCitizenship                        2640 non-null   object 
 10  organization                                325 non-null    object 
 11  selfMade                                    2640 non-null   bool   
 12  status                                      2640 non-null   object 
 13  gender                                      2640 non-null   object 
 14  birthDate                                   2564 non-null   object 
 15  lastName                                    2640 non-null   object 
 16  firstName                                   2637 non-null   object 
 17  title                                       339 non-null    object 
 18  date                                        2640 non-null   object 
 19  state                                       753 non-null    object 
 20  residenceStateRegion                        747 non-null    object 
 21  birthYear                                   2564 non-null   float64
 22  birthMonth                                  2564 non-null   float64
 23  birthDay                                    2564 non-null   float64
 24  cpi_country                                 2456 non-null   float64
 25  cpi_change_country                          2456 non-null   float64
 26  gdp_country                                 2476 non-null   object 
 27  gross_tertiary_education_enrollment         2458 non-null   float64
 28  gross_primary_education_enrollment_country  2459 non-null   float64
 29  life_expectancy_country                     2458 non-null   float64
 30  tax_revenue_country_country                 2457 non-null   float64
 31  total_tax_rate_country                      2458 non-null   float64
 32  population_country                          2476 non-null   float64
 33  latitude_country                            2476 non-null   float64
 34  longitude_country                           2476 non-null   float64
dtypes: bool(1), float64(14), int64(2), object(18)
memory usage: 704.0+ KB
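
The `df.info()` output lists `gdp_country` as `object`, because the values are strings such as `$2,715,518,274,227`. Any correlation with GDP needs a numeric column first. A minimal sketch of one way to convert it, assuming the only non-numeric characters are the dollar sign, commas, and stray whitespace (the helper name `parse_gdp` is hypothetical):

```python
import pandas as pd

def parse_gdp(series: pd.Series) -> pd.Series:
    """Convert strings such as '$2,715,518,274,227' to floats; missing values stay NaN."""
    # Strip the dollar sign, thousands separators, and any whitespace.
    cleaned = series.str.replace(r"[$,\s]", "", regex=True)
    # errors="coerce" turns anything still unparsable into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")

# Two values copied from the df.head() output, plus one missing entry.
sample = pd.Series(["$2,715,518,274,227", "$21,427,700,000,000", None])
gdp_numeric = parse_gdp(sample)
print(gdp_numeric)
```

On the real frame this could be applied as `df['gdp_numeric'] = parse_gdp(df['gdp_country'])`, keeping the original column untouched.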
df.isnull().sum()
rank                                             0
finalWorth                                       0
category                                         0
personName                                       0
age                                             65
country                                         38
city                                            72
source                                           0
industries                                       0
countryOfCitizenship                             0
organization                                  2315
selfMade                                         0
status                                           0
gender                                           0
birthDate                                       76
lastName                                         0
firstName                                        3
title                                         2301
date                                             0
state                                         1887
residenceStateRegion                          1893
birthYear                                       76
birthMonth                                      76
birthDay                                        76
cpi_country                                    184
cpi_change_country                             184
gdp_country                                    164
gross_tertiary_education_enrollment            182
gross_primary_education_enrollment_country     181
life_expectancy_country                        182
tax_revenue_country_country                    183
total_tax_rate_country                         182
population_country                             164
latitude_country                               164
longitude_country                              164
dtype: int64
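
Raw counts are easier to judge as a share of all rows: `organization`, for example, is missing in 2315 of 2640 rows, roughly 87.7%. The sketch below shows one possible way to compute that share per column. The helper name `missing_share` and the small sample frame are made up for illustration.

```python
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, largest first; complete columns are dropped."""
    share = df.isna().mean().mul(100).round(1)
    return share[share > 0].sort_values(ascending=False)

# Made-up sample for illustration only; not taken from the dataset.
sample = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "age": [74.0, None, 59.0, 78.0],
    "organization": [None, "Tesla", None, None],
})
report = missing_share(sample)
print(report)
```

Columns that are mostly empty are candidates for being dropped before plotting, while sparsely missing ones such as `age` may only need row-level filtering.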
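# df.head(10) relies on the file being ordered by rank, as the df.head() output suggests
ax = sns.barplot(df.head(10), x='finalWorth', y='personName', hue='personName', legend=False, orient='h', palette='rainbow')
ax.set_title('Wealthiest people')
# Horizontal bars: net worth on the x-axis, names on the y-axis
ax.set_xlabel('Net worth (finalWorth)')
ax.set_ylabel('Person')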
for container in ax.containers:
    ax.bar_label(container, fontsize=8)
plt.show()
# Count billionaires per country of citizenship and keep the 20 largest groups
ppl_in_countries = df.groupby('countryOfCitizenship')['rank'].count().reset_index().sort_values(by='rank', ascending=False).head(20)
ax = sns.barplot(ppl_in_countries, x='rank', y='countryOfCitizenship', orient='h', hue='countryOfCitizenship', legend=False, palette='rainbow')
ax.set_title('Countries with the most billionaires')
ax.set_ylabel('Country')
ax.set_xlabel('Number of billionaires')
for container in ax.containers:
    ax.bar_label(container, fontsize=8)
plt.show()
age_and_wealthy = df.groupby('age')['finalWorth'].mean()
age_and_wealthy
age
18.0     3500.000000
19.0     1700.000000
20.0     2300.000000
21.0     2600.000000
26.0     1450.000000
            ...     
96.0     4366.666667
97.0     1425.000000
98.0     1750.000000
99.0     4375.000000
101.0    1300.000000
Name: finalWorth, Length: 79, dtype: float64
sns.lineplot(df, x='age', y='finalWorth',errorbar=None)
plt.show()
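
Per-year means like those in `age_and_wealthy` rest on very few people at the extremes, so a single large fortune can dominate a point on the line. One possible alternative is to group ages into ten-year bands and use the median. This is a sketch rather than part of the original analysis; the helper name, the band edges, and the sample rows are assumptions.

```python
import pandas as pd

def median_worth_by_age_band(df: pd.DataFrame) -> pd.Series:
    """Median finalWorth per ten-year age band (10-19, 20-29, ..., 100-109)."""
    edges = list(range(10, 111, 10))                  # 10, 20, ..., 110
    labels = [f"{lo}-{lo + 9}" for lo in edges[:-1]]  # one label per band
    # right=False makes each band include its lower edge: [10, 20), [20, 30), ...
    bands = pd.cut(df["age"], bins=edges, labels=labels, right=False)
    # Rows with a missing age fall into no band and are left out of the result.
    return df.groupby(bands, observed=True)["finalWorth"].median()

# Made-up sample for illustration only; not taken from the dataset.
sample = pd.DataFrame({
    "age": [18, 25, 29, 74, 78, None],
    "finalWorth": [3500, 1000, 3000, 211000, 107000, 5000],
})
by_band = median_worth_by_age_band(sample)
print(by_band)
```

The result can be plotted with `by_band.plot(kind='bar')`, which tends to read more clearly than a line through sparse single-year averages.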
#category = df.explode('category')
categories = df.groupby('category')['rank'].count().sort_values(ascending=False).reset_index().head(20)
ax = sns.barplot(categories,x='rank',y='category',orient='h',hue = 'category', legend = False, palette='rainbow')
ax.set_title('Most common business categories')
ax.set_ylabel('Category')
ax.set_xlabel('Number of people')
for container in ax.containers:
    ax.bar_label(container, fontsize=8)
plt.show()
plt.figure(figsize=(8, 6))
df['category'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Billionaires by Industry')
plt.axis('equal')
plt.show()
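
A pie chart with one slice per category gets crowded once there are many small slices. One common workaround, sketched here as a suggestion, is to keep the largest categories and fold the rest into a single slice. The helper name `top_n_with_other`, the default `n`, and the sample counts are hypothetical.

```python
import pandas as pd

def top_n_with_other(counts: pd.Series, n: int = 8, other_label: str = "Other") -> pd.Series:
    """Keep the n largest entries and sum the remainder into one 'Other' entry."""
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:n].copy()
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        top[other_label] = rest  # append the combined remainder as a new slice
    return top

# Made-up counts for illustration only; not taken from the dataset.
sample_counts = pd.Series({
    "Finance & Investments": 5,
    "Technology": 3,
    "Automotive": 1,
    "Fashion & Retail": 1,
})
collapsed = top_n_with_other(sample_counts, n=2)
print(collapsed)
```

With the real data the input would be `df['category'].value_counts()`, and the returned Series can be plotted with the same `.plot(kind='pie', autopct='%1.1f%%')` call.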
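fig = plt.figure(figsize=(12, 6))
ax = fig.add_subplot(111)
# Sum finalWorth per country of residence and keep the ten largest totals
df.groupby('country')['finalWorth'].sum().sort_values(ascending=False).head(10).plot(kind='bar', ax=ax)
plt.title('Top 10 Countries by Billionaire Wealth')
plt.xlabel('Country')
# Assumption: finalWorth is expressed in millions of USD (e.g. 211000 for the top entry)
plt.ylabel('Total Wealth (millions USD)')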
plt.xticks(rotation=90)
plt.show()

Example charts from the internet

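# Display reference charts stored locally in the images/ directory
for path in ['images/kraje.png', 'images/top.png', 'images/world.jpeg']:
    img = mpimg.imread(path)
    plt.figure(figsize=(15, 8))
    plt.imshow(img)
    plt.axis('off')  # axis ticks carry no meaning for a picture
plt.show()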