2083 lines
110 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Google Play Store data exploration\n",
|
|||
|
"### Kamila Bobkowska s444517\n",
|
|||
|
"\n",
|
|||
|
"Link to the data: https://www.kaggle.com/datasets/lava18/google-play-store-apps"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"To download the dataset from Kaggle, you need to create an account and download an API token, which is required to use the Kaggle API. Once downloaded, the token must be placed in the appropriate location, which differs between Windows and Linux.\n",
|
|||
|
"\n",
|
|||
|
"*For this assignment I downloaded the data using kaggle on Windows, because my only access to Linux is the faculty computer, and the kaggle library commands do not work for me there.*"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 45,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"google-play-store-apps.zip: Skipping, found more recently modified local copy (use --force to force download)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"!kaggle datasets download -d lava18/google-play-store-apps"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Archive: google-play-store-apps.zip\n",
|
|||
|
" inflating: googleplaystore.csv \n",
|
|||
|
" inflating: googleplaystore_user_reviews.csv \n",
|
|||
|
" inflating: license.txt \n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"!unzip -o google-play-store-apps.zip"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 47,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>App</th>\n",
|
|||
|
" <th>Category</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Reviews</th>\n",
|
|||
|
" <th>Size</th>\n",
|
|||
|
" <th>Installs</th>\n",
|
|||
|
" <th>Type</th>\n",
|
|||
|
" <th>Price</th>\n",
|
|||
|
" <th>Content Rating</th>\n",
|
|||
|
" <th>Genres</th>\n",
|
|||
|
" <th>Last Updated</th>\n",
|
|||
|
" <th>Current Ver</th>\n",
|
|||
|
" <th>Android Ver</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>Photo Editor & Candy Camera & Grid & ScrapBook</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.1</td>\n",
|
|||
|
" <td>159</td>\n",
|
|||
|
" <td>19M</td>\n",
|
|||
|
" <td>10,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design</td>\n",
|
|||
|
" <td>January 7, 2018</td>\n",
|
|||
|
" <td>1.0.0</td>\n",
|
|||
|
" <td>4.0.3 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>Coloring book moana</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>3.9</td>\n",
|
|||
|
" <td>967</td>\n",
|
|||
|
" <td>14M</td>\n",
|
|||
|
" <td>500,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design;Pretend Play</td>\n",
|
|||
|
" <td>January 15, 2018</td>\n",
|
|||
|
" <td>2.0.0</td>\n",
|
|||
|
" <td>4.0.3 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>U Launcher Lite – FREE Live Cool Themes, Hide ...</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.7</td>\n",
|
|||
|
" <td>87510</td>\n",
|
|||
|
" <td>8.7M</td>\n",
|
|||
|
" <td>5,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design</td>\n",
|
|||
|
" <td>August 1, 2018</td>\n",
|
|||
|
" <td>1.2.4</td>\n",
|
|||
|
" <td>4.0.3 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>Sketch - Draw & Paint</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>215644</td>\n",
|
|||
|
" <td>25M</td>\n",
|
|||
|
" <td>50,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Teen</td>\n",
|
|||
|
" <td>Art & Design</td>\n",
|
|||
|
" <td>June 8, 2018</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>4.2 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>Pixel Draw - Number Art Coloring Book</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.3</td>\n",
|
|||
|
" <td>967</td>\n",
|
|||
|
" <td>2.8M</td>\n",
|
|||
|
" <td>100,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design;Creativity</td>\n",
|
|||
|
" <td>June 20, 2018</td>\n",
|
|||
|
" <td>1.1</td>\n",
|
|||
|
" <td>4.4 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10836</th>\n",
|
|||
|
" <td>Sya9a Maroc - FR</td>\n",
|
|||
|
" <td>FAMILY</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>38</td>\n",
|
|||
|
" <td>53M</td>\n",
|
|||
|
" <td>5,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Education</td>\n",
|
|||
|
" <td>July 25, 2017</td>\n",
|
|||
|
" <td>1.48</td>\n",
|
|||
|
" <td>4.1 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10837</th>\n",
|
|||
|
" <td>Fr. Mike Schmitz Audio Teachings</td>\n",
|
|||
|
" <td>FAMILY</td>\n",
|
|||
|
" <td>5.0</td>\n",
|
|||
|
" <td>4</td>\n",
|
|||
|
" <td>3.6M</td>\n",
|
|||
|
" <td>100+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Education</td>\n",
|
|||
|
" <td>July 6, 2018</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>4.1 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10838</th>\n",
|
|||
|
" <td>Parkinson Exercices FR</td>\n",
|
|||
|
" <td>MEDICAL</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>3</td>\n",
|
|||
|
" <td>9.5M</td>\n",
|
|||
|
" <td>1,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Medical</td>\n",
|
|||
|
" <td>January 20, 2017</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>2.2 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10839</th>\n",
|
|||
|
" <td>The SCP Foundation DB fr nn5n</td>\n",
|
|||
|
" <td>BOOKS_AND_REFERENCE</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>114</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>1,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Mature 17+</td>\n",
|
|||
|
" <td>Books & Reference</td>\n",
|
|||
|
" <td>January 19, 2015</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10840</th>\n",
|
|||
|
" <td>iHoroscope - 2018 Daily Horoscope & Astrology</td>\n",
|
|||
|
" <td>LIFESTYLE</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>398307</td>\n",
|
|||
|
" <td>19M</td>\n",
|
|||
|
" <td>10,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Lifestyle</td>\n",
|
|||
|
" <td>July 25, 2018</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>10841 rows × 13 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" App Category \\\n",
|
|||
|
"0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN \n",
|
|||
|
"1 Coloring book moana ART_AND_DESIGN \n",
|
|||
|
"2 U Launcher Lite – FREE Live Cool Themes, Hide ... ART_AND_DESIGN \n",
|
|||
|
"3 Sketch - Draw & Paint ART_AND_DESIGN \n",
|
|||
|
"4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN \n",
|
|||
|
"... ... ... \n",
|
|||
|
"10836 Sya9a Maroc - FR FAMILY \n",
|
|||
|
"10837 Fr. Mike Schmitz Audio Teachings FAMILY \n",
|
|||
|
"10838 Parkinson Exercices FR MEDICAL \n",
|
|||
|
"10839 The SCP Foundation DB fr nn5n BOOKS_AND_REFERENCE \n",
|
|||
|
"10840 iHoroscope - 2018 Daily Horoscope & Astrology LIFESTYLE \n",
|
|||
|
"\n",
|
|||
|
" Rating Reviews Size Installs Type Price \\\n",
|
|||
|
"0 4.1 159 19M 10,000+ Free 0 \n",
|
|||
|
"1 3.9 967 14M 500,000+ Free 0 \n",
|
|||
|
"2 4.7 87510 8.7M 5,000,000+ Free 0 \n",
|
|||
|
"3 4.5 215644 25M 50,000,000+ Free 0 \n",
|
|||
|
"4 4.3 967 2.8M 100,000+ Free 0 \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"10836 4.5 38 53M 5,000+ Free 0 \n",
|
|||
|
"10837 5.0 4 3.6M 100+ Free 0 \n",
|
|||
|
"10838 NaN 3 9.5M 1,000+ Free 0 \n",
|
|||
|
"10839 4.5 114 Varies with device 1,000+ Free 0 \n",
|
|||
|
"10840 4.5 398307 19M 10,000,000+ Free 0 \n",
|
|||
|
"\n",
|
|||
|
" Content Rating Genres Last Updated \\\n",
|
|||
|
"0 Everyone Art & Design January 7, 2018 \n",
|
|||
|
"1 Everyone Art & Design;Pretend Play January 15, 2018 \n",
|
|||
|
"2 Everyone Art & Design August 1, 2018 \n",
|
|||
|
"3 Teen Art & Design June 8, 2018 \n",
|
|||
|
"4 Everyone Art & Design;Creativity June 20, 2018 \n",
|
|||
|
"... ... ... ... \n",
|
|||
|
"10836 Everyone Education July 25, 2017 \n",
|
|||
|
"10837 Everyone Education July 6, 2018 \n",
|
|||
|
"10838 Everyone Medical January 20, 2017 \n",
|
|||
|
"10839 Mature 17+ Books & Reference January 19, 2015 \n",
|
|||
|
"10840 Everyone Lifestyle July 25, 2018 \n",
|
|||
|
"\n",
|
|||
|
" Current Ver Android Ver \n",
|
|||
|
"0 1.0.0 4.0.3 and up \n",
|
|||
|
"1 2.0.0 4.0.3 and up \n",
|
|||
|
"2 1.2.4 4.0.3 and up \n",
|
|||
|
"3 Varies with device 4.2 and up \n",
|
|||
|
"4 1.1 4.4 and up \n",
|
|||
|
"... ... ... \n",
|
|||
|
"10836 1.48 4.1 and up \n",
|
|||
|
"10837 1.0 4.1 and up \n",
|
|||
|
"10838 1.0 2.2 and up \n",
|
|||
|
"10839 Varies with device Varies with device \n",
|
|||
|
"10840 Varies with device Varies with device \n",
|
|||
|
"\n",
|
|||
|
"[10841 rows x 13 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 47,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"\n",
|
|||
|
"data = pd.read_csv('googleplaystore.csv')\n",
|
|||
|
"data"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Data exploration"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 48,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',\n",
|
|||
|
" 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',\n",
|
|||
|
" 'Android Ver'],\n",
|
|||
|
" dtype='object')"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 48,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.columns"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 49,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"App object\n",
|
|||
|
"Category object\n",
|
|||
|
"Rating float64\n",
|
|||
|
"Reviews object\n",
|
|||
|
"Size object\n",
|
|||
|
"Installs object\n",
|
|||
|
"Type object\n",
|
|||
|
"Price object\n",
|
|||
|
"Content Rating object\n",
|
|||
|
"Genres object\n",
|
|||
|
"Last Updated object\n",
|
|||
|
"Current Ver object\n",
|
|||
|
"Android Ver object\n",
|
|||
|
"dtype: object"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 49,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.dtypes"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 50,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>App</th>\n",
|
|||
|
" <th>Category</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Reviews</th>\n",
|
|||
|
" <th>Size</th>\n",
|
|||
|
" <th>Installs</th>\n",
|
|||
|
" <th>Type</th>\n",
|
|||
|
" <th>Price</th>\n",
|
|||
|
" <th>Content Rating</th>\n",
|
|||
|
" <th>Genres</th>\n",
|
|||
|
" <th>Last Updated</th>\n",
|
|||
|
" <th>Current Ver</th>\n",
|
|||
|
" <th>Android Ver</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>count</th>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>9367.000000</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10840</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10840</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10841</td>\n",
|
|||
|
" <td>10833</td>\n",
|
|||
|
" <td>10838</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>unique</th>\n",
|
|||
|
" <td>9660</td>\n",
|
|||
|
" <td>34</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>6002</td>\n",
|
|||
|
" <td>462</td>\n",
|
|||
|
" <td>22</td>\n",
|
|||
|
" <td>3</td>\n",
|
|||
|
" <td>93</td>\n",
|
|||
|
" <td>6</td>\n",
|
|||
|
" <td>120</td>\n",
|
|||
|
" <td>1378</td>\n",
|
|||
|
" <td>2832</td>\n",
|
|||
|
" <td>33</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>top</th>\n",
|
|||
|
" <td>ROBLOX</td>\n",
|
|||
|
" <td>FAMILY</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>1,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Tools</td>\n",
|
|||
|
" <td>August 3, 2018</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>4.1 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>freq</th>\n",
|
|||
|
" <td>9</td>\n",
|
|||
|
" <td>1972</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>596</td>\n",
|
|||
|
" <td>1695</td>\n",
|
|||
|
" <td>1579</td>\n",
|
|||
|
" <td>10039</td>\n",
|
|||
|
" <td>10040</td>\n",
|
|||
|
" <td>8714</td>\n",
|
|||
|
" <td>842</td>\n",
|
|||
|
" <td>326</td>\n",
|
|||
|
" <td>1459</td>\n",
|
|||
|
" <td>2451</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>mean</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.193338</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>std</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0.537431</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>min</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>25%</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.000000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>50%</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.300000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>75%</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.500000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>max</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>19.000000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" App Category Rating Reviews Size Installs \\\n",
|
|||
|
"count 10841 10841 9367.000000 10841 10841 10841 \n",
|
|||
|
"unique 9660 34 NaN 6002 462 22 \n",
|
|||
|
"top ROBLOX FAMILY NaN 0 Varies with device 1,000,000+ \n",
|
|||
|
"freq 9 1972 NaN 596 1695 1579 \n",
|
|||
|
"mean NaN NaN 4.193338 NaN NaN NaN \n",
|
|||
|
"std NaN NaN 0.537431 NaN NaN NaN \n",
|
|||
|
"min NaN NaN 1.000000 NaN NaN NaN \n",
|
|||
|
"25% NaN NaN 4.000000 NaN NaN NaN \n",
|
|||
|
"50% NaN NaN 4.300000 NaN NaN NaN \n",
|
|||
|
"75% NaN NaN 4.500000 NaN NaN NaN \n",
|
|||
|
"max NaN NaN 19.000000 NaN NaN NaN \n",
|
|||
|
"\n",
|
|||
|
" Type Price Content Rating Genres Last Updated \\\n",
|
|||
|
"count 10840 10841 10840 10841 10841 \n",
|
|||
|
"unique 3 93 6 120 1378 \n",
|
|||
|
"top Free 0 Everyone Tools August 3, 2018 \n",
|
|||
|
"freq 10039 10040 8714 842 326 \n",
|
|||
|
"mean NaN NaN NaN NaN NaN \n",
|
|||
|
"std NaN NaN NaN NaN NaN \n",
|
|||
|
"min NaN NaN NaN NaN NaN \n",
|
|||
|
"25% NaN NaN NaN NaN NaN \n",
|
|||
|
"50% NaN NaN NaN NaN NaN \n",
|
|||
|
"75% NaN NaN NaN NaN NaN \n",
|
|||
|
"max NaN NaN NaN NaN NaN \n",
|
|||
|
"\n",
|
|||
|
" Current Ver Android Ver \n",
|
|||
|
"count 10833 10838 \n",
|
|||
|
"unique 2832 33 \n",
|
|||
|
"top Varies with device 4.1 and up \n",
|
|||
|
"freq 1459 2451 \n",
|
|||
|
"mean NaN NaN \n",
|
|||
|
"std NaN NaN \n",
|
|||
|
"min NaN NaN \n",
|
|||
|
"25% NaN NaN \n",
|
|||
|
"50% NaN NaN \n",
|
|||
|
"75% NaN NaN \n",
|
|||
|
"max NaN NaN "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 50,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.describe(include='all')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 51,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"FAMILY 1972\n",
|
|||
|
"GAME 1144\n",
|
|||
|
"TOOLS 843\n",
|
|||
|
"MEDICAL 463\n",
|
|||
|
"BUSINESS 460\n",
|
|||
|
"PRODUCTIVITY 424\n",
|
|||
|
"PERSONALIZATION 392\n",
|
|||
|
"COMMUNICATION 387\n",
|
|||
|
"SPORTS 384\n",
|
|||
|
"LIFESTYLE 382\n",
|
|||
|
"FINANCE 366\n",
|
|||
|
"HEALTH_AND_FITNESS 341\n",
|
|||
|
"PHOTOGRAPHY 335\n",
|
|||
|
"SOCIAL 295\n",
|
|||
|
"NEWS_AND_MAGAZINES 283\n",
|
|||
|
"SHOPPING 260\n",
|
|||
|
"TRAVEL_AND_LOCAL 258\n",
|
|||
|
"DATING 234\n",
|
|||
|
"BOOKS_AND_REFERENCE 231\n",
|
|||
|
"VIDEO_PLAYERS 175\n",
|
|||
|
"EDUCATION 156\n",
|
|||
|
"ENTERTAINMENT 149\n",
|
|||
|
"MAPS_AND_NAVIGATION 137\n",
|
|||
|
"FOOD_AND_DRINK 127\n",
|
|||
|
"HOUSE_AND_HOME 88\n",
|
|||
|
"LIBRARIES_AND_DEMO 85\n",
|
|||
|
"AUTO_AND_VEHICLES 85\n",
|
|||
|
"WEATHER 82\n",
|
|||
|
"ART_AND_DESIGN 65\n",
|
|||
|
"EVENTS 64\n",
|
|||
|
"PARENTING 60\n",
|
|||
|
"COMICS 60\n",
|
|||
|
"BEAUTY 53\n",
|
|||
|
"1.9 1\n",
|
|||
|
"Name: Category, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 51,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data['Category'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 52,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Everyone 8714\n",
|
|||
|
"Teen 1208\n",
|
|||
|
"Mature 17+ 499\n",
|
|||
|
"Everyone 10+ 414\n",
|
|||
|
"Adults only 18+ 3\n",
|
|||
|
"Unrated 2\n",
|
|||
|
"Name: Content Rating, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 52,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data[\"Content Rating\"].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 53,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Tools 842\n",
|
|||
|
"Entertainment 623\n",
|
|||
|
"Education 549\n",
|
|||
|
"Medical 463\n",
|
|||
|
"Business 460\n",
|
|||
|
" ... \n",
|
|||
|
"Parenting;Brain Games 1\n",
|
|||
|
"Health & Fitness;Education 1\n",
|
|||
|
"Role Playing;Education 1\n",
|
|||
|
"Puzzle;Education 1\n",
|
|||
|
"Travel & Local;Action & Adventure 1\n",
|
|||
|
"Name: Genres, Length: 120, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 53,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data['Genres'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 54,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"0 10040\n",
|
|||
|
"$0.99 148\n",
|
|||
|
"$2.99 129\n",
|
|||
|
"$1.99 73\n",
|
|||
|
"$4.99 72\n",
|
|||
|
" ... \n",
|
|||
|
"$3.02 1\n",
|
|||
|
"$2.95 1\n",
|
|||
|
"$1.61 1\n",
|
|||
|
"$14.00 1\n",
|
|||
|
"$1.29 1\n",
|
|||
|
"Name: Price, Length: 93, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 54,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data['Price'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 55,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"App 0\n",
|
|||
|
"Category 0\n",
|
|||
|
"Rating 1474\n",
|
|||
|
"Reviews 0\n",
|
|||
|
"Size 0\n",
|
|||
|
"Installs 0\n",
|
|||
|
"Type 1\n",
|
|||
|
"Price 0\n",
|
|||
|
"Content Rating 1\n",
|
|||
|
"Genres 0\n",
|
|||
|
"Last Updated 0\n",
|
|||
|
"Current Ver 8\n",
|
|||
|
"Android Ver 3\n",
|
|||
|
"dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 55,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.isnull().sum()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 56,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>App</th>\n",
|
|||
|
" <th>Category</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Reviews</th>\n",
|
|||
|
" <th>Size</th>\n",
|
|||
|
" <th>Installs</th>\n",
|
|||
|
" <th>Type</th>\n",
|
|||
|
" <th>Price</th>\n",
|
|||
|
" <th>Content Rating</th>\n",
|
|||
|
" <th>Genres</th>\n",
|
|||
|
" <th>Last Updated</th>\n",
|
|||
|
" <th>Current Ver</th>\n",
|
|||
|
" <th>Android Ver</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>Photo Editor & Candy Camera & Grid & ScrapBook</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.1</td>\n",
|
|||
|
" <td>159</td>\n",
|
|||
|
" <td>19M</td>\n",
|
|||
|
" <td>10,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design</td>\n",
|
|||
|
" <td>January 7, 2018</td>\n",
|
|||
|
" <td>1.0.0</td>\n",
|
|||
|
" <td>4.0.3 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>Coloring book moana</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>3.9</td>\n",
|
|||
|
" <td>967</td>\n",
|
|||
|
" <td>14M</td>\n",
|
|||
|
" <td>500,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design;Pretend Play</td>\n",
|
|||
|
" <td>January 15, 2018</td>\n",
|
|||
|
" <td>2.0.0</td>\n",
|
|||
|
" <td>4.0.3 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>U Launcher Lite – FREE Live Cool Themes, Hide ...</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.7</td>\n",
|
|||
|
" <td>87510</td>\n",
|
|||
|
" <td>8.7M</td>\n",
|
|||
|
" <td>5,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design</td>\n",
|
|||
|
" <td>August 1, 2018</td>\n",
|
|||
|
" <td>1.2.4</td>\n",
|
|||
|
" <td>4.0.3 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>Sketch - Draw & Paint</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>215644</td>\n",
|
|||
|
" <td>25M</td>\n",
|
|||
|
" <td>50,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Teen</td>\n",
|
|||
|
" <td>Art & Design</td>\n",
|
|||
|
" <td>June 8, 2018</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>4.2 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>Pixel Draw - Number Art Coloring Book</td>\n",
|
|||
|
" <td>ART_AND_DESIGN</td>\n",
|
|||
|
" <td>4.3</td>\n",
|
|||
|
" <td>967</td>\n",
|
|||
|
" <td>2.8M</td>\n",
|
|||
|
" <td>100,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Art & Design;Creativity</td>\n",
|
|||
|
" <td>June 20, 2018</td>\n",
|
|||
|
" <td>1.1</td>\n",
|
|||
|
" <td>4.4 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9355</th>\n",
|
|||
|
" <td>FR Calculator</td>\n",
|
|||
|
" <td>FAMILY</td>\n",
|
|||
|
" <td>4.0</td>\n",
|
|||
|
" <td>7</td>\n",
|
|||
|
" <td>2.6M</td>\n",
|
|||
|
" <td>500+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Education</td>\n",
|
|||
|
" <td>June 18, 2017</td>\n",
|
|||
|
" <td>1.0.0</td>\n",
|
|||
|
" <td>4.1 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9356</th>\n",
|
|||
|
" <td>Sya9a Maroc - FR</td>\n",
|
|||
|
" <td>FAMILY</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>38</td>\n",
|
|||
|
" <td>53M</td>\n",
|
|||
|
" <td>5,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Education</td>\n",
|
|||
|
" <td>July 25, 2017</td>\n",
|
|||
|
" <td>1.48</td>\n",
|
|||
|
" <td>4.1 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9357</th>\n",
|
|||
|
" <td>Fr. Mike Schmitz Audio Teachings</td>\n",
|
|||
|
" <td>FAMILY</td>\n",
|
|||
|
" <td>5.0</td>\n",
|
|||
|
" <td>4</td>\n",
|
|||
|
" <td>3.6M</td>\n",
|
|||
|
" <td>100+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Education</td>\n",
|
|||
|
" <td>July 6, 2018</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>4.1 and up</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9358</th>\n",
|
|||
|
" <td>The SCP Foundation DB fr nn5n</td>\n",
|
|||
|
" <td>BOOKS_AND_REFERENCE</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>114</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>1,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Mature 17+</td>\n",
|
|||
|
" <td>Books & Reference</td>\n",
|
|||
|
" <td>January 19, 2015</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9359</th>\n",
|
|||
|
" <td>iHoroscope - 2018 Daily Horoscope & Astrology</td>\n",
|
|||
|
" <td>LIFESTYLE</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>398307</td>\n",
|
|||
|
" <td>19M</td>\n",
|
|||
|
" <td>10,000,000+</td>\n",
|
|||
|
" <td>Free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>Everyone</td>\n",
|
|||
|
" <td>Lifestyle</td>\n",
|
|||
|
" <td>July 25, 2018</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" <td>Varies with device</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>9360 rows × 13 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" App Category \\\n",
|
|||
|
"0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN \n",
|
|||
|
"1 Coloring book moana ART_AND_DESIGN \n",
|
|||
|
"2 U Launcher Lite – FREE Live Cool Themes, Hide ... ART_AND_DESIGN \n",
|
|||
|
"3 Sketch - Draw & Paint ART_AND_DESIGN \n",
|
|||
|
"4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN \n",
|
|||
|
"... ... ... \n",
|
|||
|
"9355 FR Calculator FAMILY \n",
|
|||
|
"9356 Sya9a Maroc - FR FAMILY \n",
|
|||
|
"9357 Fr. Mike Schmitz Audio Teachings FAMILY \n",
|
|||
|
"9358 The SCP Foundation DB fr nn5n BOOKS_AND_REFERENCE \n",
|
|||
|
"9359 iHoroscope - 2018 Daily Horoscope & Astrology LIFESTYLE \n",
|
|||
|
"\n",
|
|||
|
" Rating Reviews Size Installs Type Price \\\n",
|
|||
|
"0 4.1 159 19M 10,000+ Free 0 \n",
|
|||
|
"1 3.9 967 14M 500,000+ Free 0 \n",
|
|||
|
"2 4.7 87510 8.7M 5,000,000+ Free 0 \n",
|
|||
|
"3 4.5 215644 25M 50,000,000+ Free 0 \n",
|
|||
|
"4 4.3 967 2.8M 100,000+ Free 0 \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"9355 4.0 7 2.6M 500+ Free 0 \n",
|
|||
|
"9356 4.5 38 53M 5,000+ Free 0 \n",
|
|||
|
"9357 5.0 4 3.6M 100+ Free 0 \n",
|
|||
|
"9358 4.5 114 Varies with device 1,000+ Free 0 \n",
|
|||
|
"9359 4.5 398307 19M 10,000,000+ Free 0 \n",
|
|||
|
"\n",
|
|||
|
" Content Rating Genres Last Updated \\\n",
|
|||
|
"0 Everyone Art & Design January 7, 2018 \n",
|
|||
|
"1 Everyone Art & Design;Pretend Play January 15, 2018 \n",
|
|||
|
"2 Everyone Art & Design August 1, 2018 \n",
|
|||
|
"3 Teen Art & Design June 8, 2018 \n",
|
|||
|
"4 Everyone Art & Design;Creativity June 20, 2018 \n",
|
|||
|
"... ... ... ... \n",
|
|||
|
"9355 Everyone Education June 18, 2017 \n",
|
|||
|
"9356 Everyone Education July 25, 2017 \n",
|
|||
|
"9357 Everyone Education July 6, 2018 \n",
|
|||
|
"9358 Mature 17+ Books & Reference January 19, 2015 \n",
|
|||
|
"9359 Everyone Lifestyle July 25, 2018 \n",
|
|||
|
"\n",
|
|||
|
" Current Ver Android Ver \n",
|
|||
|
"0 1.0.0 4.0.3 and up \n",
|
|||
|
"1 2.0.0 4.0.3 and up \n",
|
|||
|
"2 1.2.4 4.0.3 and up \n",
|
|||
|
"3 Varies with device 4.2 and up \n",
|
|||
|
"4 1.1 4.4 and up \n",
|
|||
|
"... ... ... \n",
|
|||
|
"9355 1.0.0 4.1 and up \n",
|
|||
|
"9356 1.48 4.1 and up \n",
|
|||
|
"9357 1.0 4.1 and up \n",
|
|||
|
"9358 Varies with device Varies with device \n",
|
|||
|
"9359 Varies with device Varies with device \n",
|
|||
|
"\n",
|
|||
|
"[9360 rows x 13 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 56,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.dropna(subset=['Rating', 'Type','Content Rating','Current Ver','Android Ver'], inplace=True)\n",
|
|||
|
"data.reset_index(drop=True, inplace=True)\n",
|
|||
|
"data"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 57,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"App 0\n",
|
|||
|
"Category 0\n",
|
|||
|
"Rating 0\n",
|
|||
|
"Reviews 0\n",
|
|||
|
"Size 0\n",
|
|||
|
"Installs 0\n",
|
|||
|
"Type 0\n",
|
|||
|
"Price 0\n",
|
|||
|
"Content Rating 0\n",
|
|||
|
"Genres 0\n",
|
|||
|
"Last Updated 0\n",
|
|||
|
"Current Ver 0\n",
|
|||
|
"Android Ver 0\n",
|
|||
|
"dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 57,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.isnull().sum()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Simple visualizations"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 58,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1440x360 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"\n",
|
|||
|
"plt.figure(figsize=(20,5))\n",
|
|||
|
"sns.distplot(data['Rating']).set(title='Ratings')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 74,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1440x360 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data[\"Price\"] = data[\"Price\"].replace({r'\\$': ''}, regex=True)\n",
|
|||
|
"plt.figure(figsize=(20,5))\n",
|
|||
|
"sns.distplot(data['Price']).set(title='Price')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# The \"Size\" column\n",
|
|||
|
"Although this column could be relevant when processing the data, it will be dropped because it contains the value \"Varies with device\", which would be hard to handle. Simply removing every row with that value is not an option either, since it occurs in more than 1500 rows."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 62,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"['19M' '14M' '8.7M' '25M' '2.8M' '5.6M' '29M' '33M' '3.1M' '28M' '12M'\n",
|
|||
|
" '20M' '21M' '37M' '5.5M' '17M' '39M' '31M' '4.2M' '23M' '6.0M' '6.1M'\n",
|
|||
|
" '4.6M' '9.2M' '5.2M' '11M' '24M' 'Varies with device' '9.4M' '15M' '10M'\n",
|
|||
|
" '1.2M' '26M' '8.0M' '7.9M' '56M' '57M' '35M' '54M' '201k' '3.6M' '5.7M'\n",
|
|||
|
" '8.6M' '2.4M' '27M' '2.7M' '2.5M' '7.0M' '16M' '3.4M' '8.9M' '3.9M'\n",
|
|||
|
" '2.9M' '38M' '32M' '5.4M' '18M' '1.1M' '2.2M' '4.5M' '9.8M' '52M' '9.0M'\n",
|
|||
|
" '6.7M' '30M' '2.6M' '7.1M' '22M' '6.4M' '3.2M' '8.2M' '4.9M' '9.5M'\n",
|
|||
|
" '5.0M' '5.9M' '13M' '73M' '6.8M' '3.5M' '4.0M' '2.3M' '2.1M' '42M' '9.1M'\n",
|
|||
|
" '55M' '23k' '7.3M' '6.5M' '1.5M' '7.5M' '51M' '41M' '48M' '8.5M' '46M'\n",
|
|||
|
" '8.3M' '4.3M' '4.7M' '3.3M' '40M' '7.8M' '8.8M' '6.6M' '5.1M' '61M' '66M'\n",
|
|||
|
" '79k' '8.4M' '3.7M' '118k' '44M' '695k' '1.6M' '6.2M' '53M' '1.4M' '3.0M'\n",
|
|||
|
" '7.2M' '5.8M' '3.8M' '9.6M' '45M' '63M' '49M' '77M' '4.4M' '70M' '9.3M'\n",
|
|||
|
" '8.1M' '36M' '6.9M' '7.4M' '84M' '97M' '2.0M' '1.9M' '1.8M' '5.3M' '47M'\n",
|
|||
|
" '556k' '526k' '76M' '7.6M' '59M' '9.7M' '78M' '72M' '43M' '7.7M' '6.3M'\n",
|
|||
|
" '334k' '93M' '65M' '79M' '100M' '58M' '50M' '68M' '64M' '34M' '67M' '60M'\n",
|
|||
|
" '94M' '9.9M' '232k' '99M' '624k' '95M' '8.5k' '41k' '292k' '80M' '1.7M'\n",
|
|||
|
" '10.0M' '74M' '62M' '69M' '75M' '98M' '85M' '82M' '96M' '87M' '71M' '86M'\n",
|
|||
|
" '91M' '81M' '92M' '83M' '88M' '704k' '862k' '899k' '378k' '4.8M' '266k'\n",
|
|||
|
" '375k' '1.3M' '975k' '980k' '4.1M' '89M' '696k' '544k' '525k' '920k'\n",
|
|||
|
" '779k' '853k' '720k' '713k' '772k' '318k' '58k' '241k' '196k' '857k'\n",
|
|||
|
" '51k' '953k' '865k' '251k' '930k' '540k' '313k' '746k' '203k' '26k'\n",
|
|||
|
" '314k' '239k' '371k' '220k' '730k' '756k' '91k' '293k' '17k' '74k' '14k'\n",
|
|||
|
" '317k' '78k' '924k' '818k' '81k' '939k' '169k' '45k' '965k' '90M' '545k'\n",
|
|||
|
" '61k' '283k' '655k' '714k' '93k' '872k' '121k' '322k' '976k' '206k'\n",
|
|||
|
" '954k' '444k' '717k' '210k' '609k' '308k' '306k' '175k' '350k' '383k'\n",
|
|||
|
" '454k' '1.0M' '70k' '812k' '442k' '842k' '417k' '412k' '459k' '478k'\n",
|
|||
|
" '335k' '782k' '721k' '430k' '429k' '192k' '460k' '728k' '496k' '816k'\n",
|
|||
|
" '414k' '506k' '887k' '613k' '778k' '683k' '592k' '186k' '840k' '647k'\n",
|
|||
|
" '373k' '437k' '598k' '716k' '585k' '982k' '219k' '55k' '323k' '691k'\n",
|
|||
|
" '511k' '951k' '963k' '25k' '554k' '351k' '27k' '82k' '208k' '551k' '29k'\n",
|
|||
|
" '103k' '116k' '153k' '209k' '499k' '173k' '597k' '809k' '122k' '411k'\n",
|
|||
|
" '400k' '801k' '787k' '50k' '643k' '986k' '516k' '837k' '780k' '20k'\n",
|
|||
|
" '498k' '600k' '656k' '221k' '228k' '176k' '34k' '259k' '164k' '458k'\n",
|
|||
|
" '629k' '28k' '288k' '775k' '785k' '636k' '916k' '994k' '309k' '485k'\n",
|
|||
|
" '914k' '903k' '608k' '500k' '54k' '562k' '847k' '948k' '811k' '270k'\n",
|
|||
|
" '48k' '523k' '784k' '280k' '24k' '892k' '154k' '18k' '33k' '860k' '364k'\n",
|
|||
|
" '387k' '626k' '161k' '879k' '39k' '170k' '141k' '160k' '144k' '143k'\n",
|
|||
|
" '190k' '376k' '193k' '473k' '246k' '73k' '253k' '957k' '420k' '72k'\n",
|
|||
|
" '404k' '470k' '226k' '240k' '89k' '234k' '257k' '861k' '467k' '676k'\n",
|
|||
|
" '552k' '582k' '619k']\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"print(data[\"Size\"].unique())"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 63,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"1637"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 63,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data[data.Size == 'Varies with device'].shape[0]"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 64,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"data = data.drop(columns=[\"Size\", \"Android Ver\", \"Current Ver\", \"Last Updated\"])"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 65,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"to_lowercase = ['App', 'Category', 'Type', 'Content Rating', 'Genres']\n",
|
|||
|
"for column in to_lowercase:\n",
|
|||
|
"    data[column] = data[column].str.lower()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 71,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>App</th>\n",
|
|||
|
" <th>Category</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Reviews</th>\n",
|
|||
|
" <th>Installs</th>\n",
|
|||
|
" <th>Type</th>\n",
|
|||
|
" <th>Price</th>\n",
|
|||
|
" <th>Content Rating</th>\n",
|
|||
|
" <th>Genres</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>photo editor & candy camera & grid & scrapbook</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.1</td>\n",
|
|||
|
" <td>2.021538e-06</td>\n",
|
|||
|
" <td>10000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>coloring book moana</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>3.9</td>\n",
|
|||
|
" <td>1.235953e-05</td>\n",
|
|||
|
" <td>500000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design;pretend play</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>u launcher lite – free live cool themes, hide ...</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.7</td>\n",
|
|||
|
" <td>1.119638e-03</td>\n",
|
|||
|
" <td>5000000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>sketch - draw & paint</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>2.759054e-03</td>\n",
|
|||
|
" <td>50000000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>teen</td>\n",
|
|||
|
" <td>art & design</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>pixel draw - number art coloring book</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.3</td>\n",
|
|||
|
" <td>1.235953e-05</td>\n",
|
|||
|
" <td>100000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design;creativity</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9355</th>\n",
|
|||
|
" <td>fr calculator</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>4.0</td>\n",
|
|||
|
" <td>7.676727e-08</td>\n",
|
|||
|
" <td>500</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>education</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9356</th>\n",
|
|||
|
" <td>sya9a maroc - fr</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>4.733982e-07</td>\n",
|
|||
|
" <td>5000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>education</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9357</th>\n",
|
|||
|
" <td>fr. mike schmitz audio teachings</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>5.0</td>\n",
|
|||
|
" <td>3.838364e-08</td>\n",
|
|||
|
" <td>100</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>education</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9358</th>\n",
|
|||
|
" <td>the scp foundation db fr nn5n</td>\n",
|
|||
|
" <td>books_and_reference</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>1.445784e-06</td>\n",
|
|||
|
" <td>1000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>mature 17+</td>\n",
|
|||
|
" <td>books & reference</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9359</th>\n",
|
|||
|
" <td>ihoroscope - 2018 daily horoscope & astrology</td>\n",
|
|||
|
" <td>lifestyle</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>5.096144e-03</td>\n",
|
|||
|
" <td>10000000</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>lifestyle</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>9360 rows × 9 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" App Category \\\n",
|
|||
|
"0 photo editor & candy camera & grid & scrapbook art_and_design \n",
|
|||
|
"1 coloring book moana art_and_design \n",
|
|||
|
"2 u launcher lite – free live cool themes, hide ... art_and_design \n",
|
|||
|
"3 sketch - draw & paint art_and_design \n",
|
|||
|
"4 pixel draw - number art coloring book art_and_design \n",
|
|||
|
"... ... ... \n",
|
|||
|
"9355 fr calculator family \n",
|
|||
|
"9356 sya9a maroc - fr family \n",
|
|||
|
"9357 fr. mike schmitz audio teachings family \n",
|
|||
|
"9358 the scp foundation db fr nn5n books_and_reference \n",
|
|||
|
"9359 ihoroscope - 2018 daily horoscope & astrology lifestyle \n",
|
|||
|
"\n",
|
|||
|
" Rating Reviews Installs Type Price Content Rating \\\n",
|
|||
|
"0 4.1 2.021538e-06 10000 free 0 everyone \n",
|
|||
|
"1 3.9 1.235953e-05 500000 free 0 everyone \n",
|
|||
|
"2 4.7 1.119638e-03 5000000 free 0 everyone \n",
|
|||
|
"3 4.5 2.759054e-03 50000000 free 0 teen \n",
|
|||
|
"4 4.3 1.235953e-05 100000 free 0 everyone \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"9355 4.0 7.676727e-08 500 free 0 everyone \n",
|
|||
|
"9356 4.5 4.733982e-07 5000 free 0 everyone \n",
|
|||
|
"9357 5.0 3.838364e-08 100 free 0 everyone \n",
|
|||
|
"9358 4.5 1.445784e-06 1000 free 0 mature 17+ \n",
|
|||
|
"9359 4.5 5.096144e-03 10000000 free 0 everyone \n",
|
|||
|
"\n",
|
|||
|
" Genres \n",
|
|||
|
"0 art & design \n",
|
|||
|
"1 art & design;pretend play \n",
|
|||
|
"2 art & design \n",
|
|||
|
"3 art & design \n",
|
|||
|
"4 art & design;creativity \n",
|
|||
|
"... ... \n",
|
|||
|
"9355 education \n",
|
|||
|
"9356 education \n",
|
|||
|
"9357 education \n",
|
|||
|
"9358 books & reference \n",
|
|||
|
"9359 lifestyle \n",
|
|||
|
"\n",
|
|||
|
"[9360 rows x 9 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 71,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data[\"Installs\"] = data[\"Installs\"].replace({r'\\+': ''}, regex=True)\n",
|
|||
|
"data[\"Installs\"] = data[\"Installs\"].replace({',': ''}, regex=True)\n",
|
|||
|
"data"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 72,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>App</th>\n",
|
|||
|
" <th>Category</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Reviews</th>\n",
|
|||
|
" <th>Installs</th>\n",
|
|||
|
" <th>Type</th>\n",
|
|||
|
" <th>Price</th>\n",
|
|||
|
" <th>Content Rating</th>\n",
|
|||
|
" <th>Genres</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>photo editor & candy camera & grid & scrapbook</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.1</td>\n",
|
|||
|
" <td>2.021538e-06</td>\n",
|
|||
|
" <td>9.999000e-06</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>coloring book moana</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>3.9</td>\n",
|
|||
|
" <td>1.235953e-05</td>\n",
|
|||
|
" <td>4.999990e-04</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design;pretend play</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>u launcher lite – free live cool themes, hide ...</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.7</td>\n",
|
|||
|
" <td>1.119638e-03</td>\n",
|
|||
|
" <td>4.999999e-03</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>sketch - draw & paint</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>2.759054e-03</td>\n",
|
|||
|
" <td>5.000000e-02</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>teen</td>\n",
|
|||
|
" <td>art & design</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>pixel draw - number art coloring book</td>\n",
|
|||
|
" <td>art_and_design</td>\n",
|
|||
|
" <td>4.3</td>\n",
|
|||
|
" <td>1.235953e-05</td>\n",
|
|||
|
" <td>9.999900e-05</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>art & design;creativity</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9355</th>\n",
|
|||
|
" <td>fr calculator</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>4.0</td>\n",
|
|||
|
" <td>7.676727e-08</td>\n",
|
|||
|
" <td>4.990000e-07</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>education</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9356</th>\n",
|
|||
|
" <td>sya9a maroc - fr</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>4.733982e-07</td>\n",
|
|||
|
" <td>4.999000e-06</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>education</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9357</th>\n",
|
|||
|
" <td>fr. mike schmitz audio teachings</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>5.0</td>\n",
|
|||
|
" <td>3.838364e-08</td>\n",
|
|||
|
" <td>9.900000e-08</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>education</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9358</th>\n",
|
|||
|
" <td>the scp foundation db fr nn5n</td>\n",
|
|||
|
" <td>books_and_reference</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>1.445784e-06</td>\n",
|
|||
|
" <td>9.990000e-07</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>mature 17+</td>\n",
|
|||
|
" <td>books & reference</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9359</th>\n",
|
|||
|
" <td>ihoroscope - 2018 daily horoscope & astrology</td>\n",
|
|||
|
" <td>lifestyle</td>\n",
|
|||
|
" <td>4.5</td>\n",
|
|||
|
" <td>5.096144e-03</td>\n",
|
|||
|
" <td>9.999999e-03</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>lifestyle</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>9360 rows × 9 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" App Category \\\n",
|
|||
|
"0 photo editor & candy camera & grid & scrapbook art_and_design \n",
|
|||
|
"1 coloring book moana art_and_design \n",
|
|||
|
"2 u launcher lite – free live cool themes, hide ... art_and_design \n",
|
|||
|
"3 sketch - draw & paint art_and_design \n",
|
|||
|
"4 pixel draw - number art coloring book art_and_design \n",
|
|||
|
"... ... ... \n",
|
|||
|
"9355 fr calculator family \n",
|
|||
|
"9356 sya9a maroc - fr family \n",
|
|||
|
"9357 fr. mike schmitz audio teachings family \n",
|
|||
|
"9358 the scp foundation db fr nn5n books_and_reference \n",
|
|||
|
"9359 ihoroscope - 2018 daily horoscope & astrology lifestyle \n",
|
|||
|
"\n",
|
|||
|
" Rating Reviews Installs Type Price Content Rating \\\n",
|
|||
|
"0 4.1 2.021538e-06 9.999000e-06 free 0 everyone \n",
|
|||
|
"1 3.9 1.235953e-05 4.999990e-04 free 0 everyone \n",
|
|||
|
"2 4.7 1.119638e-03 4.999999e-03 free 0 everyone \n",
|
|||
|
"3 4.5 2.759054e-03 5.000000e-02 free 0 teen \n",
|
|||
|
"4 4.3 1.235953e-05 9.999900e-05 free 0 everyone \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"9355 4.0 7.676727e-08 4.990000e-07 free 0 everyone \n",
|
|||
|
"9356 4.5 4.733982e-07 4.999000e-06 free 0 everyone \n",
|
|||
|
"9357 5.0 3.838364e-08 9.900000e-08 free 0 everyone \n",
|
|||
|
"9358 4.5 1.445784e-06 9.990000e-07 free 0 mature 17+ \n",
|
|||
|
"9359 4.5 5.096144e-03 9.999999e-03 free 0 everyone \n",
|
|||
|
"\n",
|
|||
|
" Genres \n",
|
|||
|
"0 art & design \n",
|
|||
|
"1 art & design;pretend play \n",
|
|||
|
"2 art & design \n",
|
|||
|
"3 art & design \n",
|
|||
|
"4 art & design;creativity \n",
|
|||
|
"... ... \n",
|
|||
|
"9355 education \n",
|
|||
|
"9356 education \n",
|
|||
|
"9357 education \n",
|
|||
|
"9358 books & reference \n",
|
|||
|
"9359 lifestyle \n",
|
|||
|
"\n",
|
|||
|
"[9360 rows x 9 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 72,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Min-max scale the count columns to the [0, 1] range.\n",
"for column in [\"Reviews\", \"Installs\"]:\n",
"    data[column] = pd.to_numeric(data[column], errors='coerce')\n",
"    min_value, max_value = data[column].min(), data[column].max()\n",
"    data[column] = (data[column] - min_value) / (max_value - min_value)\n",
"\n",
|
|||
|
"data"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 75,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>App</th>\n",
|
|||
|
" <th>Category</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Reviews</th>\n",
|
|||
|
" <th>Installs</th>\n",
|
|||
|
" <th>Type</th>\n",
|
|||
|
" <th>Price</th>\n",
|
|||
|
" <th>Content Rating</th>\n",
|
|||
|
" <th>Genres</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>count</th>\n",
|
|||
|
" <td>9360</td>\n",
|
|||
|
" <td>9360</td>\n",
|
|||
|
" <td>9360.000000</td>\n",
|
|||
|
" <td>9360.000000</td>\n",
|
|||
|
" <td>9360.000000</td>\n",
|
|||
|
" <td>9360</td>\n",
|
|||
|
" <td>9360</td>\n",
|
|||
|
" <td>9360</td>\n",
|
|||
|
" <td>9360</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>unique</th>\n",
|
|||
|
" <td>8174</td>\n",
|
|||
|
" <td>33</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>73</td>\n",
|
|||
|
" <td>6</td>\n",
|
|||
|
" <td>115</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>top</th>\n",
|
|||
|
" <td>roblox</td>\n",
|
|||
|
" <td>family</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>free</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>everyone</td>\n",
|
|||
|
" <td>tools</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>freq</th>\n",
|
|||
|
" <td>9</td>\n",
|
|||
|
" <td>1746</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>8715</td>\n",
|
|||
|
" <td>8715</td>\n",
|
|||
|
" <td>7414</td>\n",
|
|||
|
" <td>732</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>mean</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.191838</td>\n",
|
|||
|
" <td>0.006581</td>\n",
|
|||
|
" <td>0.017909</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>std</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0.515263</td>\n",
|
|||
|
" <td>0.040239</td>\n",
|
|||
|
" <td>0.091266</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>min</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>25%</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.000000</td>\n",
|
|||
|
" <td>0.000002</td>\n",
|
|||
|
" <td>0.000010</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>50%</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.300000</td>\n",
|
|||
|
" <td>0.000076</td>\n",
|
|||
|
" <td>0.000500</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>75%</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>4.500000</td>\n",
|
|||
|
" <td>0.001044</td>\n",
|
|||
|
" <td>0.005000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>max</th>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>5.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" App Category Rating Reviews Installs Type Price \\\n",
|
|||
|
"count 9360 9360 9360.000000 9360.000000 9360.000000 9360 9360 \n",
|
|||
|
"unique 8174 33 NaN NaN NaN 2 73 \n",
|
|||
|
"top roblox family NaN NaN NaN free 0 \n",
|
|||
|
"freq 9 1746 NaN NaN NaN 8715 8715 \n",
|
|||
|
"mean NaN NaN 4.191838 0.006581 0.017909 NaN NaN \n",
|
|||
|
"std NaN NaN 0.515263 0.040239 0.091266 NaN NaN \n",
|
|||
|
"min NaN NaN 1.000000 0.000000 0.000000 NaN NaN \n",
|
|||
|
"25% NaN NaN 4.000000 0.000002 0.000010 NaN NaN \n",
|
|||
|
"50% NaN NaN 4.300000 0.000076 0.000500 NaN NaN \n",
|
|||
|
"75% NaN NaN 4.500000 0.001044 0.005000 NaN NaN \n",
|
|||
|
"max NaN NaN 5.000000 1.000000 1.000000 NaN NaN \n",
|
|||
|
"\n",
|
|||
|
" Content Rating Genres \n",
|
|||
|
"count 9360 9360 \n",
|
|||
|
"unique 6 115 \n",
|
|||
|
"top everyone tools \n",
|
|||
|
"freq 7414 732 \n",
|
|||
|
"mean NaN NaN \n",
|
|||
|
"std NaN NaN \n",
|
|||
|
"min NaN NaN \n",
|
|||
|
"25% NaN NaN \n",
|
|||
|
"50% NaN NaN \n",
|
|||
|
"75% NaN NaN \n",
|
|||
|
"max NaN NaN "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 75,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data.describe(include='all')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Splitting into train, validation, and test sets"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 68,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Data shape: (9360, 9)\n",
|
|||
|
"Train shape: (5616, 9)\n",
|
|||
|
"Test shape: (1872, 9)\n",
|
|||
|
"Validation shape:(1872, 9)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import numpy as np\n",
|
|||
|
"\n",
|
|||
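|
"# Shuffle once with a fixed seed, then cut at 60% and 80% for a 60/20/20 split.\n",
"# random_state makes the shuffle reproducible, so no global NumPy seed is needed.\n",
"shuffled = data.sample(frac=1, random_state=42)\n",
"train_end, validate_end = int(.6 * len(data)), int(.8 * len(data))\n",
"train = shuffled.iloc[:train_end]\n",
"validate = shuffled.iloc[train_end:validate_end]\n",
"test = shuffled.iloc[validate_end:]\n",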
|
|||
|
"print(f\"Data shape: {data.shape}\\nTrain shape: {train.shape}\\nTest shape: {test.shape}\\nValidation shape:{validate.shape}\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": []
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.8.1"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 4
|
|||
|
}
|