!wget -c https://git.wmi.amu.edu.pl/s434695/ium_434695/raw/commit/2301fb86e434734376f73503307a8f3255a75cc6/vgsales.csv
/bin/sh: 1: wget: not found
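The `wget` call above fails because the binary is not installed on this machine. A minimal sketch of a pure-Python fallback using only the standard library follows; the helper name `download_if_missing` is hypothetical and not part of the original notebook. Unlike `wget -c`, it does not resume partial downloads; it only skips the download when the target file already exists.

```python
import os
import urllib.request

# URL copied from the wget command above.
VGSALES_URL = (
    "https://git.wmi.amu.edu.pl/s434695/ium_434695/raw/commit/"
    "2301fb86e434734376f73503307a8f3255a75cc6/vgsales.csv"
)


def download_if_missing(url, path):
    """Download `url` to `path` unless `path` already exists; return `path`."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Usage (requires network access, so it is left commented out here):
# download_if_missing(VGSALES_URL, "vgsales.csv")
```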
!pip install --user pandas
!pip install --user scikit-learn
!pip install --user matplotlib
Requirement already satisfied: pandas in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (1.2.3)
Requirement already satisfied: pytz>=2017.3 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from pandas) (2021.1)
Requirement already satisfied: numpy>=1.16.5 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from pandas) (1.20.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /snap/jupyter/6/lib/python3.7/site-packages (from pandas) (2.8.0)
Requirement already satisfied: six>=1.5 in /snap/jupyter/6/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.12.0)
Requirement already satisfied: scikit-learn in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (0.24.1)
Requirement already satisfied: numpy>=1.13.3 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from scikit-learn) (1.20.1)
Requirement already satisfied: joblib>=0.11 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from scikit-learn) (1.0.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from scikit-learn) (2.1.0)
Requirement already satisfied: scipy>=0.19.1 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from scikit-learn) (1.6.1)
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/23/3d/db9a6b3c83c9511301152dbb64a029c3a4313c86eaef12c237b13ecf91d6/matplotlib-3.3.4-cp37-cp37m-manylinux1_x86_64.whl (11.5MB)
Collecting cycler>=0.10 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting kiwisolver>=1.0.1 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/d2/46/231de802ade4225b76b96cffe419cf3ce52bbe92e3b092cf12db7d11c207/kiwisolver-1.3.1-cp37-cp37m-manylinux1_x86_64.whl (1.1MB)
Requirement already satisfied: numpy>=1.15 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from matplotlib) (1.20.1)
Requirement already satisfied: python-dateutil>=2.1 in /snap/jupyter/6/lib/python3.7/site-packages (from matplotlib) (2.8.0)
Requirement already satisfied: pillow>=6.2.0 in /home/tomasz/snap/jupyter/common/lib/python3.7/site-packages (from matplotlib) (8.1.2)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl (67kB)
Requirement already satisfied: six in /snap/jupyter/6/lib/python3.7/site-packages (from cycler>=0.10->matplotlib) (1.12.0)
Installing collected packages: cycler, kiwisolver, pyparsing, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.3.1 matplotlib-3.3.4 pyparsing-2.4.7
import pandas as pd
vgsales = pd.read_csv('vgsales.csv')
vgsales
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
16597 | 16600 | Spirits & Spells | GBA | 2003.0 | Platform | Wanadoo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16598 rows × 11 columns
vgsales.describe(include='all')
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 16598.000000 | 16598 | 16598 | 16327.000000 | 16598 | 16540 | 16598.000000 | 16598.000000 | 16598.000000 | 16598.000000 | 16598.000000 |
unique | NaN | 11493 | 31 | NaN | 12 | 578 | NaN | NaN | NaN | NaN | NaN |
top | NaN | Need for Speed: Most Wanted | DS | NaN | Action | Electronic Arts | NaN | NaN | NaN | NaN | NaN |
freq | NaN | 12 | 2163 | NaN | 3316 | 1351 | NaN | NaN | NaN | NaN | NaN |
mean | 8300.605254 | NaN | NaN | 2006.406443 | NaN | NaN | 0.264667 | 0.146652 | 0.077782 | 0.048063 | 0.537441 |
std | 4791.853933 | NaN | NaN | 5.828981 | NaN | NaN | 0.816683 | 0.505351 | 0.309291 | 0.188588 | 1.555028 |
min | 1.000000 | NaN | NaN | 1980.000000 | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 |
25% | 4151.250000 | NaN | NaN | 2003.000000 | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.060000 |
50% | 8300.500000 | NaN | NaN | 2007.000000 | NaN | NaN | 0.080000 | 0.020000 | 0.000000 | 0.010000 | 0.170000 |
75% | 12449.750000 | NaN | NaN | 2010.000000 | NaN | NaN | 0.240000 | 0.110000 | 0.040000 | 0.040000 | 0.470000 |
max | 16600.000000 | NaN | NaN | 2020.000000 | NaN | NaN | 41.490000 | 29.020000 | 10.220000 | 10.570000 | 82.740000 |
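In the `count` row above, `Year` (16327) and `Publisher` (16540) fall short of the 16598 total rows, which points to 271 and 58 missing values respectively. A minimal sketch of how those gaps could be listed directly is shown below; the helper `missing_counts` and the toy frame are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd


def missing_counts(df):
    """Return the number of missing values per column, omitting complete columns."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Toy frame standing in for vgsales (values are illustrative only).
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Year": [2006.0, None, 2008.0],
    "Publisher": ["Nintendo", None, None],
})

print(missing_counts(toy))
```

On the real data the call would be `missing_counts(vgsales)`.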
vgsales["Publisher"].value_counts()
Electronic Arts                 1351
Activision                       975
Namco Bandai Games               932
Ubisoft                          921
Konami Digital Entertainment     832
                                ...
Phantagram                         1
989 Sports                         1
Illusion Softworks                 1
TYO                                1
General Entertainment              1
Name: Publisher, Length: 578, dtype: int64
vgsales["Platform"].value_counts()
DS      2163
PS2     2161
PS3     1329
Wii     1325
X360    1265
PSP     1213
PS      1196
PC       960
XB       824
GBA      822
GC       556
3DS      509
PSV      413
PS4      336
N64      319
SNES     239
XOne     213
SAT      173
WiiU     143
2600     133
NES       98
GB        98
DC        52
GEN       27
NG        12
SCD        6
WS         6
3DO        3
TG16       2
GG         1
PCFX       1
Name: Platform, dtype: int64
vgsales["Platform"].value_counts().plot(kind="bar")
<AxesSubplot:>
vgsales[["Platform","JP_Sales"]].groupby("Platform").mean().plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x7f668577e690>
import seaborn as sns
sns.set_theme()
sns.relplot(data=vgsales, x="JP_Sales", y="NA_Sales", hue="Genre")
<seaborn.axisgrid.FacetGrid at 0x7f6676f85790>
from sklearn.model_selection import train_test_split
vgsales_train, vgsales_test = train_test_split(vgsales, test_size = 0.6, random_state = 1)
vgsales_train["Platform"].value_counts()
PS2     873
DS      829
Wii     530
X360    507
PSP     503
PS3     488
PS      471
PC      396
XB      339
GBA     337
GC      237
3DS     205
PSV     166
PS4     143
N64     126
XOne     95
SNES     95
SAT      65
WiiU     55
2600     49
NES      43
GB       38
DC       25
GEN      10
NG        8
3DO       2
WS        2
GG        1
SCD       1
Name: Platform, dtype: int64
vgsales_test["Platform"].value_counts()
DS      1334
PS2     1288
PS3      841
Wii      795
X360     758
PS       725
PSP      710
PC       564
GBA      485
XB       485
GC       319
3DS      304
PSV      247
N64      193
PS4      193
SNES     144
XOne     118
SAT      108
WiiU      88
2600      84
GB        60
NES       55
DC        27
GEN       17
SCD        5
WS         4
NG         4
TG16       2
3DO        1
PCFX       1
Name: Platform, dtype: int64
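Raw counts are hard to compare because the two subsets differ in size (`test_size = 0.6` puts 60% of the rows in the test set), and some platforms appear in only one subset (`GG` only in train, `TG16` and `PCFX` only in test). A minimal sketch that compares relative shares side by side follows; the helper `share_table` and the toy frames are hypothetical, not part of the original notebook.

```python
import pandas as pd


def share_table(train, test, column):
    """Side-by-side relative frequencies of `column` in two frames.

    Categories absent from one frame get a share of 0.0 there.
    Rows are sorted by the absolute difference between the two shares.
    """
    shares = pd.concat(
        {
            "train": train[column].value_counts(normalize=True),
            "test": test[column].value_counts(normalize=True),
        },
        axis=1,
    ).fillna(0.0)
    shares["abs_diff"] = (shares["train"] - shares["test"]).abs()
    return shares.sort_values("abs_diff", ascending=False)


# Toy frames standing in for vgsales_train / vgsales_test (illustrative only).
toy_train = pd.DataFrame({"Platform": ["DS", "DS", "PS2", "GG"]})
toy_test = pd.DataFrame({"Platform": ["DS", "PS2", "PS2", "PS2"]})

print(share_table(toy_train, toy_test, "Platform"))
```

On the real data the call would be `share_table(vgsales_train, vgsales_test, "Platform")`.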