Billionaire Statistics
The dataset contains statistics on the world's billionaires, including their personal details, the industries they operate in, and their companies.
It was downloaded from Kaggle.
Potential charts
Wealth distribution analysis: examine how billionaire wealth is distributed across industries, countries, and regions.
Demographic analysis: present the age, gender, and birthplace of billionaires.
Self-made vs. inherited wealth: analyze the share of self-made billionaires versus those who inherited their fortune.
Economic indicators: examine the correlation between billionaire wealth and economic indicators such as GDP, CPI (consumer price index), and tax rates.
Geospatial analysis: visualize the geographic distribution of billionaires and their wealth on a map.
Trends over time: trace demographic and wealth changes among billionaires over the years.
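As one illustration of the ideas listed above, the self-made vs. inherited comparison could be sketched as follows. This is a minimal example on a tiny made-up sample, not on the real data; only the column names `category` and `selfMade` mirror the dataset, while `self_made_share` is a hypothetical helper introduced here.

```python
import pandas as pd

# Tiny made-up sample; only the column names mirror the dataset.
sample = pd.DataFrame({
    "category": ["Technology", "Technology", "Fashion & Retail", "Fashion & Retail"],
    "selfMade": [True, True, False, True],
})


def self_made_share(frame: pd.DataFrame, by: str) -> pd.Series:
    """Return the percentage of self-made people in each group of `by`."""
    # The mean of a boolean column is the fraction of True values.
    return frame.groupby(by)["selfMade"].mean().mul(100).round(1)


share = self_made_share(sample, "category")
print(share)
```

On the full dataset the same helper could be called as `self_made_share(df, "category")` or with `"countryOfCitizenship"` as the grouping column.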
Test charts and dataset overview
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
plt.style.use('ggplot')
df = pd.read_csv('data.csv')
df.head()
 | rank | finalWorth | category | personName | age | country | city | source | industries | countryOfCitizenship | ... | cpi_change_country | gdp_country | gross_tertiary_education_enrollment | gross_primary_education_enrollment_country | life_expectancy_country | tax_revenue_country_country | total_tax_rate_country | population_country | latitude_country | longitude_country |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 211000 | Fashion & Retail | Bernard Arnault & family | 74.0 | France | Paris | LVMH | Fashion & Retail | France | ... | 1.1 | $2,715,518,274,227 | 65.6 | 102.5 | 82.5 | 24.2 | 60.7 | 67059887.0 | 46.227638 | 2.213749 |
1 | 2 | 180000 | Automotive | Elon Musk | 51.0 | United States | Austin | Tesla, SpaceX | Automotive | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
2 | 3 | 114000 | Technology | Jeff Bezos | 59.0 | United States | Medina | Amazon | Technology | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
3 | 4 | 107000 | Technology | Larry Ellison | 78.0 | United States | Lanai | Oracle | Technology | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
4 | 5 | 106000 | Finance & Investments | Warren Buffett | 92.0 | United States | Omaha | Berkshire Hathaway | Finance & Investments | United States | ... | 7.5 | $21,427,700,000,000 | 88.2 | 101.8 | 78.5 | 9.6 | 36.6 | 328239523.0 | 37.090240 | -95.712891 |
5 rows × 35 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2640 entries, 0 to 2639
Data columns (total 35 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   rank  2640 non-null  int64
 1   finalWorth  2640 non-null  int64
 2   category  2640 non-null  object
 3   personName  2640 non-null  object
 4   age  2575 non-null  float64
 5   country  2602 non-null  object
 6   city  2568 non-null  object
 7   source  2640 non-null  object
 8   industries  2640 non-null  object
 9   countryOfCitizenship  2640 non-null  object
 10  organization  325 non-null  object
 11  selfMade  2640 non-null  bool
 12  status  2640 non-null  object
 13  gender  2640 non-null  object
 14  birthDate  2564 non-null  object
 15  lastName  2640 non-null  object
 16  firstName  2637 non-null  object
 17  title  339 non-null  object
 18  date  2640 non-null  object
 19  state  753 non-null  object
 20  residenceStateRegion  747 non-null  object
 21  birthYear  2564 non-null  float64
 22  birthMonth  2564 non-null  float64
 23  birthDay  2564 non-null  float64
 24  cpi_country  2456 non-null  float64
 25  cpi_change_country  2456 non-null  float64
 26  gdp_country  2476 non-null  object
 27  gross_tertiary_education_enrollment  2458 non-null  float64
 28  gross_primary_education_enrollment_country  2459 non-null  float64
 29  life_expectancy_country  2458 non-null  float64
 30  tax_revenue_country_country  2457 non-null  float64
 31  total_tax_rate_country  2458 non-null  float64
 32  population_country  2476 non-null  float64
 33  latitude_country  2476 non-null  float64
 34  longitude_country  2476 non-null  float64
dtypes: bool(1), float64(14), int64(2), object(18)
memory usage: 704.0+ KB
df.isnull().sum()
rank  0
finalWorth  0
category  0
personName  0
age  65
country  38
city  72
source  0
industries  0
countryOfCitizenship  0
organization  2315
selfMade  0
status  0
gender  0
birthDate  76
lastName  0
firstName  3
title  2301
date  0
state  1887
residenceStateRegion  1893
birthYear  76
birthMonth  76
birthDay  76
cpi_country  184
cpi_change_country  184
gdp_country  164
gross_tertiary_education_enrollment  182
gross_primary_education_enrollment_country  181
life_expectancy_country  182
tax_revenue_country_country  183
total_tax_rate_country  182
population_country  164
latitude_country  164
longitude_country  164
dtype: int64
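The `df.info()` output lists `gdp_country` as `object`, and the preview shows values such as `$2,715,518,274,227`, so the column would need to be converted before it can be used in correlations. A minimal sketch of one way to do that is shown here; `parse_gdp` is a hypothetical helper, and the assumption that every value follows the `$1,234` pattern has not been checked against the full file.

```python
import pandas as pd


def parse_gdp(values: pd.Series) -> pd.Series:
    """Strip '$' and thousands separators, then convert to numbers.

    Missing or unparseable entries become NaN (errors="coerce").
    """
    cleaned = values.str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


# Two values copied from the preview above plus one missing entry.
gdp_raw = pd.Series(["$2,715,518,274,227", "$21,427,700,000,000", None])
gdp = parse_gdp(gdp_raw)
print(gdp)
```

On the real data this could be applied as `df["gdp_country"] = parse_gdp(df["gdp_country"])`.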
# The first 10 rows are used as the top 10 (assumes the file is sorted by rank)
ax = sns.barplot(df.head(10), x='finalWorth', y='personName', hue='personName', legend=False, orient='h', palette='rainbow')
ax.set_title('Wealthiest people')
# Net worth is on the x-axis, names are on the y-axis
ax.set_xlabel('Net worth (million USD)')
ax.set_ylabel('Person')
for container in ax.containers:
    ax.bar_label(container, fontsize=8)
plt.show()
# Count billionaires per country of citizenship and keep the 20 largest groups
ppl_in_countries = df.groupby('countryOfCitizenship')['rank'].count().reset_index().sort_values(by='rank', ascending=False).head(20)
ax = sns.barplot(ppl_in_countries, x='rank', y='countryOfCitizenship', orient='h', hue='countryOfCitizenship', legend=False, palette='rainbow')
ax.set_title('Countries with the most billionaires')
ax.set_ylabel('Country')
ax.set_xlabel('Number of billionaires')
for container in ax.containers:
    ax.bar_label(container, fontsize=8)
plt.show()
age_and_wealthy = df.groupby('age')['finalWorth'].mean()
age_and_wealthy
age
18.0     3500.000000
19.0     1700.000000
20.0     2300.000000
21.0     2600.000000
26.0     1450.000000
            ...
96.0     4366.666667
97.0     1425.000000
98.0     1750.000000
99.0     4375.000000
101.0    1300.000000
Name: finalWorth, Length: 79, dtype: float64
sns.lineplot(df, x='age', y='finalWorth',errorbar=None)
plt.show()
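A mean per single year of age can be noisy when some ages have only a few people, so grouping ages into brackets may give a steadier picture. The sketch below shows one way to do this with `pd.cut` on a small made-up sample; the bracket boundaries and labels are arbitrary assumptions, and only the column names `age` and `finalWorth` come from the dataset.

```python
import pandas as pd

# Made-up sample, including one missing age as in the real column.
sample = pd.DataFrame({
    "age": [25.0, 45.0, 52.0, 78.0, 91.0, None],
    "finalWorth": [1500, 3000, 5000, 9000, 2000, 1200],
})

# Left-closed brackets: [0, 40), [40, 60), [60, 80), [80, 120)
bins = [0, 40, 60, 80, 120]
labels = ["<40", "40-59", "60-79", "80+"]
sample["age_group"] = pd.cut(sample["age"], bins=bins, labels=labels, right=False)

# Rows with a missing age fall into no bracket and are left out of the groups.
by_group = sample.groupby("age_group", observed=True)["finalWorth"].agg(["count", "median"])
print(by_group)
```

The median is used here instead of the mean because a handful of very large fortunes can dominate an average.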
#category = df.explode('category')
# Count people per category and keep the 20 largest groups
categories = df.groupby('category')['rank'].count().sort_values(ascending=False).reset_index().head(20)
ax = sns.barplot(categories, x='rank', y='category', orient='h', hue='category', legend=False, palette='rainbow')
ax.set_title('Most popular business categories')
ax.set_ylabel('Category')
ax.set_xlabel('Number of people')
for container in ax.containers:
    ax.bar_label(container, fontsize=8)
plt.show()
plt.figure(figsize=(8, 6))
df['category'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Billionaires by Industry')
plt.axis('equal')
plt.show()
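A pie chart becomes hard to read when a column has many distinct values, because the small slices and their labels overlap. One possible remedy, sketched here on made-up counts, is to keep the largest groups and merge the remainder into a single slice; `top_n_with_other` and the label "Other" are hypothetical names chosen for this example.

```python
import pandas as pd


def top_n_with_other(counts: pd.Series, n: int) -> pd.Series:
    """Keep the n largest entries and sum the remaining ones into 'Other'."""
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:n].copy()
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        top["Other"] = rest
    return top


# Made-up counts standing in for df['category'].value_counts()
counts = pd.Series({"A": 5, "B": 3, "C": 1, "D": 1})
grouped = top_n_with_other(counts, 2)
print(grouped)
```

With the real data, `top_n_with_other(df['category'].value_counts(), 8).plot(kind='pie', autopct='%1.1f%%')` would draw the reduced chart; the cutoff of 8 is an arbitrary choice.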
fig = plt.figure(figsize=(12, 6))
ax = fig.add_subplot(111)
# Sum finalWorth per value of `country` and plot the 10 largest totals
df.groupby('country')['finalWorth'].sum().sort_values(ascending=False).head(10).plot(kind='bar', ax=ax)
plt.title('Top 10 Countries by Billionaire Wealth')
plt.xlabel('Country')
plt.ylabel('Total Wealth (million USD)')
plt.xticks(rotation=90)
plt.show()
Example charts from the internet
for i in ['images/kraje.png', 'images/top.png', 'images/world.jpeg']:
    imga = mpimg.imread(i)
    plt.figure(figsize=(15,8))
    imgplot = plt.imshow(imga)