{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "spread-happiness", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from IPython.display import Markdown, display, HTML\n", "\n", "# Fix the dying kernel problem (only a problem in some installations - you can remove it, if it works without it)\n", "import os\n", "os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'" ] }, { "cell_type": "markdown", "id": "approximate-classic", "metadata": {}, "source": [ "# Numpy\n", "\n", "For a detailed reference check out: https://numpy.org/doc/stable/reference/arrays.indexing.html." ] }, { "cell_type": "markdown", "id": "renewable-chase", "metadata": {}, "source": [ "## Creating numpy arrays" ] }, { "cell_type": "markdown", "id": "afraid-consortium", "metadata": {}, "source": [ "### Directly" ] }, { "cell_type": "code", "execution_count": 4, "id": "textile-mainland", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2. 3.]\n", " [4. 5. 6.]\n", " [7. 8. 9.]]\n" ] } ], "source": [ "a = np.array(\n", " [[1.0, 2.0, 3.0], \n", " [4.0, 5.0, 6.0], \n", " [7.0, 8.0, 9.0]]\n", ")\n", "\n", "print(a)" ] }, { "cell_type": "markdown", "id": "floating-junior", "metadata": {}, "source": [ "### From a list" ] }, { "cell_type": "code", "execution_count": 5, "id": "reasonable-mistress", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]\n", "\n", "[[1. 2. 3.]\n", " [4. 5. 6.]\n", " [7. 8. 
9.]]\n" ] } ], "source": [ "a = [[1.0, 2.0, 3.0], \n", " [4.0, 5.0, 6.0], \n", " [7.0, 8.0, 9.0]]\n", "\n", "print(a)\n", "print()\n", "\n", "a = np.array(a)\n", "\n", "print(a)" ] }, { "cell_type": "markdown", "id": "incorrect-practitioner", "metadata": {}, "source": [ "### From a list comprehension" ] }, { "cell_type": "code", "execution_count": 6, "id": "straight-cooling", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\n", "\n", "[ 0 1 4 9 16 25 36 49 64 81]\n" ] } ], "source": [ "a = [i**2 for i in range(10)]\n", "\n", "print(a)\n", "print()\n", "print(np.array(a))" ] }, { "cell_type": "markdown", "id": "suffering-myanmar", "metadata": {}, "source": [ "### Ready-made functions in numpy" ] }, { "cell_type": "code", "execution_count": 7, "id": "expensive-bowling", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All zeros\n", "[[0. 0. 0. 0.]\n", " [0. 0. 0. 0.]\n", " [0. 0. 0. 0.]]\n", "\n", "All chosen value (variant 1)\n", "[[7. 7. 7. 7.]\n", " [7. 7. 7. 7.]\n", " [7. 7. 7. 7.]]\n", "\n", "All chosen value (variant 2)\n", "[[7. 7. 7. 7.]\n", " [7. 7. 7. 7.]\n", " [7. 7. 7. 
7.]]\n", "\n", "Random integers\n", "[[7 5]\n", " [9 8]\n", " [6 3]]\n", "\n", "Random values from the normal distribution\n", "[[ 3.88109518 -15.30896612]\n", " [ 7.88779281 7.67458172]\n", " [ -9.81026963 -6.02098263]]\n" ] } ], "source": [ "# All zeros\n", "a = np.zeros((3, 4))\n", "print(\"All zeros\")\n", "print(a)\n", "print()\n", "\n", "# All entries set to a chosen value\n", "a = np.full((3, 4), 7.0)\n", "print(\"All chosen value (variant 1)\")\n", "print(a)\n", "print()\n", "\n", "# or\n", "\n", "a = np.zeros((3, 4))\n", "a[:] = 7.0\n", "print(\"All chosen value (variant 2)\")\n", "print(a)\n", "print()\n", "\n", "# Random integers\n", "\n", "a = np.random.randint(low=0, high=10, size=(3, 2))\n", "print(\"Random integers\")\n", "print(a)\n", "print()\n", "\n", "# Random values from the normal distribution (Gaussian)\n", "\n", "print(\"Random values from the normal distribution\")\n", "a = np.random.normal(loc=0, scale=10, size=(3, 2))\n", "print(a)" ] }, { "cell_type": "markdown", "id": "aggressive-titanium", "metadata": {}, "source": [ "## Slicing numpy arrays" ] }, { "cell_type": "markdown", "id": "former-richardson", "metadata": {}, "source": [ "### Slicing in 1D\n", "\n", "To select specific values from a numpy array, one can use so-called slicing. It has the form\n", "\n", "**arr[low:high:step]**\n", "\n", "where low is the first index to be retrieved (inclusive), high is the first index not to be retrieved (exclusive), and step means that every step-th element is taken. Each of the three parts is optional, and a negative low or high counts from the end, while a negative step reverses the direction." 
] }, { "cell_type": "code", "execution_count": 10, "id": "desirable-documentary", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\n", "First 5 elements: [0, 1, 4, 9, 16]\n", "Elements from index 3 to index 5: [9, 16, 25]\n", "Last 3 elements (negative indexing): [49, 64, 81]\n", "Printed in reverse order: [81, 64, 49, 36, 25, 16, 9, 4, 1, 0]\n", "Every second element: [0, 4, 16, 36, 64]\n" ] } ], "source": [ "a = [i**2 for i in range(10)]\n", "\n", "print(\"Original: \", a)\n", "print(\"First 5 elements:\", a[:5])\n", "print(\"Elements from index 3 to index 5:\", a[3:6])\n", "print(\"Last 3 elements (negative indexing):\", a[-3:])\n", "print(\"Printed in reverse order:\", a[::-1])\n", "print(\"Every second element:\", a[::2])" ] }, { "cell_type": "markdown", "id": "micro-explosion", "metadata": {}, "source": [ "### Slicing in 2D\n", "\n", "In two dimensions it works similarly, just the slicing is separate for every dimension." 
] }, { "cell_type": "code", "execution_count": 11, "id": "disciplinary-think", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original: \n", "[[ 0 1 2 3 4]\n", " [ 5 6 7 8 9]\n", " [10 11 12 13 14]\n", " [15 16 17 18 19]\n", " [20 21 22 23 24]]\n", "\n", "First 2 elements of the first 3 rows:\n", "[[ 0 1]\n", " [ 5 6]\n", " [10 11]]\n", "\n", "Middle 3 elements from the middle 3 rows:\n", "[[ 6 7 8]\n", " [11 12 13]\n", " [16 17 18]]\n", "\n", "Bottom-right 3 by 3 submatrix (negative indexing):\n", "[[12 13 14]\n", " [17 18 19]\n", " [22 23 24]]\n", "\n", "Reversed columns:\n", "[[ 4 3 2 1 0]\n", " [ 9 8 7 6 5]\n", " [14 13 12 11 10]\n", " [19 18 17 16 15]\n", " [24 23 22 21 20]]\n", "\n" ] } ], "source": [ "a = np.array([i for i in range(25)]).reshape(5, 5)\n", "\n", "print(\"Original: \")\n", "print(a)\n", "print()\n", "print(\"First 2 elements of the first 3 rows:\")\n", "print(a[:3, :2])\n", "print()\n", "print(\"Middle 3 elements from the middle 3 rows:\")\n", "print(a[1:4, 1:4])\n", "print()\n", "print(\"Bottom-right 3 by 3 submatrix (negative indexing):\")\n", "print(a[-3:, -3:])\n", "print()\n", "print(\"Reversed columns:\")\n", "print(a[:, ::-1])\n", "print()" ] }, { "cell_type": "markdown", "id": "removable-canyon", "metadata": {}, "source": [ "### Setting numpy array field values" ] }, { "cell_type": "code", "execution_count": 12, "id": "senior-serbia", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original: \n", "[[ 0 1 2 3 4]\n", " [ 5 6 7 8 9]\n", " [10 11 12 13 14]\n", " [15 16 17 18 19]\n", " [20 21 22 23 24]]\n", "\n", "Middle values changed to 5\n", "[[ 0 1 2 3 4]\n", " [ 5 5 5 5 9]\n", " [10 5 5 5 14]\n", " [15 5 5 5 19]\n", " [20 21 22 23 24]]\n", "\n", "Second matrix\n", "[[ 0 0 2]\n", " [ 6 12 20]\n", " [30 42 56]]\n", "\n", "Second matrix substituted into the middle of the first matrix\n", "[[ 0 1 2 3 4]\n", " [ 5 0 0 2 9]\n", " [10 6 12 20 14]\n", " [15 30 42 56 
19]\n", " [20 21 22 23 24]]\n" ] } ], "source": [ "a = np.array([i for i in range(25)]).reshape(5, 5)\n", "\n", "print(\"Original: \")\n", "print(a)\n", "print()\n", "\n", "a[1:4, 1:4] = 5.0\n", "\n", "print(\"Middle values changed to 5\")\n", "print(a)\n", "print()\n", "\n", "b = np.array([i**2 - i for i in range(9)]).reshape(3, 3)\n", "\n", "print(\"Second matrix\")\n", "print(b)\n", "print()\n", "\n", "a[1:4, 1:4] = b\n", "\n", "print(\"Second matrix substituted into the middle of the first matrix\")\n", "print(a)" ] }, { "cell_type": "markdown", "id": "federal-wayne", "metadata": {}, "source": [ "## Operations on numpy arrays\n", "\n", "It is important to remember that arithmetic operations on numpy arrays are always element-wise." ] }, { "cell_type": "code", "execution_count": 13, "id": "southwest-biotechnology", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 4]\n", " [ 9 16 25]\n", " [36 49 64]]\n", "\n", "[[0. 1. 1.41421356]\n", " [1.73205081 2. 2.23606798]\n", " [2.44948974 2.64575131 2.82842712]]\n", "\n" ] } ], "source": [ "a = np.array([i**2 for i in range(9)]).reshape((3, 3))\n", "print(a)\n", "print()\n", "\n", "b = np.array([i**0.5 for i in range(9)]).reshape((3, 3))\n", "print(b)\n", "print()" ] }, { "cell_type": "markdown", "id": "intensive-gates", "metadata": {}, "source": [ "### Element-wise sum" ] }, { "cell_type": "code", "execution_count": 14, "id": "behavioral-safety", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 2. 5.41421356]\n", " [10.73205081 18. 27.23606798]\n", " [38.44948974 51.64575131 66.82842712]]\n" ] } ], "source": [ "print(a + b)" ] }, { "cell_type": "markdown", "id": "occupied-trial", "metadata": {}, "source": [ "### Element-wise multiplication" ] }, { "cell_type": "code", "execution_count": 15, "id": "charming-pleasure", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1. 
5.65685425]\n", " [ 15.58845727 32. 55.90169944]\n", " [ 88.18163074 129.64181424 181.01933598]]\n" ] } ], "source": [ "print(a * b)" ] }, { "cell_type": "markdown", "id": "efficient-league", "metadata": {}, "source": [ "### Matrix multiplication" ] }, { "cell_type": "code", "execution_count": 17, "id": "changing-community", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 11.53000978 12.58300524 13.54977648]\n", " [ 88.95005649 107.14378278 119.21568782]\n", " [241.63783311 303.32808391 341.49835513]]\n", "\n", "[[ 0. 1. 4.]\n", " [ 9. 16. 25.]\n", " [36. 49. 64.]]\n" ] } ], "source": [ "print(np.matmul(a, b))\n", "print()\n", "\n", "# Multiplication by the identity matrix (to check it works as expected)\n", "id_matrix = np.array([[1.0, 0.0, 0.0], \n", " [0.0, 1.0, 0.0], \n", " [0.0, 0.0, 1.0]])\n", "\n", "print(np.matmul(id_matrix, a))" ] }, { "cell_type": "markdown", "id": "assisted-communications", "metadata": {}, "source": [ "### Calculating the mean" ] }, { "cell_type": "code", "execution_count": 22, "id": "defensive-wrong", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 4 0 6 4]\n", "\n", "Mean (by sum): 3.0\n", "Mean (by mean): 3.0\n" ] } ], "source": [ "a = np.random.randint(low=0, high=10, size=(5))\n", "\n", "print(a)\n", "print()\n", "\n", "print(\"Mean (by sum): \", np.sum(a) / len(a))\n", "print(\"Mean (by mean):\", np.mean(a))" ] }, { "cell_type": "markdown", "id": "complex-karma", "metadata": {}, "source": [ "### Calculating the mean of every row" ] }, { "cell_type": "code", "execution_count": 30, "id": "correct-dietary", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[4 9 5]\n", " [8 9 1]\n", " [5 6 4]\n", " [3 7 8]\n", " [2 1 5]]\n", "\n", "(5, 3)\n", "\n", "Mean: [6. 6. 5. 6. 2.66666667]\n", "Mean in the original matrix form:\n", "[[6. ]\n", " [6. ]\n", " [5. ]\n", " [6. 
]\n", " [2.66666667]]\n" ] } ], "source": [ "a = np.random.randint(low=0, high=10, size=(5, 3))\n", "\n", "print(a)\n", "print()\n", "print(a.shape)\n", "print()\n", "\n", "print(\"Mean:\", np.sum(a, axis=1) / a.shape[1])\n", "\n", "print(\"Mean in the original matrix form:\")\n", "print((np.sum(a, axis=1) / a.shape[1]).reshape(-1, 1)) # -1 lets numpy infer this dimension from the number of elements" ] }, { "cell_type": "markdown", "id": "indian-orlando", "metadata": {}, "source": [ "### More complex operations" ] }, { "cell_type": "code", "execution_count": 31, "id": "presidential-cologne", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vector to the power of 2 (element-wise)\n", "[1. 4. 9.]\n", "\n", "Euler's number to the power of a (element-wise)\n", "[ 2.71828183 7.3890561 20.08553692]\n", "\n", "An even more complex expression\n", "[0.61971364 1.89817602 4.84758949]\n" ] } ], "source": [ "a = [1.0, 2.0, 3.0]\n", "\n", "print(\"Vector to the power of 2 (element-wise)\")\n", "print(np.power(a, 2))\n", "print()\n", "print(\"Euler's number to the power of a (element-wise)\")\n", "print(np.exp(a))\n", "print()\n", "print(\"An even more complex expression\")\n", "print((np.power(a, 2) + np.exp(a)) / np.sum(a))" ] }, { "cell_type": "markdown", "id": "hearing-street", "metadata": {}, "source": [ "## Numpy tasks" ] }, { "cell_type": "markdown", "id": "regular-niger", "metadata": {}, "source": [ "**Task 1.** Calculate the sigmoid (logistic) function on every element of the following numpy array [0.3, 1.2, -1.4, 0.2, -0.1, 0.1, 0.8, -0.25] and print the last 5 elements. Use only vector operations." ] }, { "cell_type": "code", "execution_count": null, "id": "agreed-single", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "another-catch", "metadata": {}, "source": [ "**Task 2.** Calculate the dot product of the following two vectors:
\n", "$x = [3, 1, 4, 2, 6, 1, 4, 8]$
\n", "$y = [5, 2, 3, 12, 2, 4, 17, 11]$
\n", "a) by using element-wise multiplication and np.sum,
\n", "b) by using np.dot,
\n", "c) by using np.matmul and transposition (x.T)." ] }, { "cell_type": "code", "execution_count": null, "id": "forbidden-journalism", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "acute-amber", "metadata": {}, "source": [ "**Task 3.** Calculate the following expression
\n", "$$\\frac{1}{1 + e^{-x_0 \\theta_0 - \\ldots - x_9 \\theta_9 - \\theta_{10}}}$$\n", "for
\n", "$x = [1.2, 2.3, 3.4, -0.7, 4.2, 2.7, -0.5, -2.1, -3.3, 0.2]$
\n", "$\\theta = [7.7, 0.33, -2.12, -1.73, 2.9, -5.8, -0.9, 12.11, 3.43, -0.5, 1.65]$
\n", "and print the result. Use only vector operations." ] }, { "cell_type": "code", "execution_count": null, "id": "falling-holder", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "latter-vector", "metadata": {}, "source": [ "# Pandas" ] }, { "cell_type": "markdown", "id": "contrary-vacuum", "metadata": {}, "source": [ "## Load datasets\n", "\n", "- Steam (https://www.kaggle.com/tamber/steam-video-games)\n", "\n", "- MovieLens (https://grouplens.org/datasets/movielens/)" ] }, { "cell_type": "code", "execution_count": 32, "id": "alert-friday", "metadata": {}, "outputs": [], "source": [ "steam_df = pd.read_csv(os.path.join(\"data\", \"steam\", \"steam-200k.csv\"), \n", " names=['user-id', 'game-title', 'behavior-name', 'value', 'zero'])\n", "\n", "ml_ratings_df = pd.read_csv(os.path.join(\"data\", \"movielens_small\", \"ratings.csv\"))\n", "ml_movies_df = pd.read_csv(os.path.join(\"data\", \"movielens_small\", \"movies.csv\"))" ] }, { "cell_type": "markdown", "id": "personal-productivity", "metadata": {}, "source": [ "## Peek into the datasets" ] }, { "cell_type": "code", "execution_count": 33, "id": "musical-trust", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user-id game-title behavior-name value zero\n", "0 151603712 The Elder Scrolls V Skyrim purchase 1.0 0\n", "1 151603712 The Elder Scrolls V Skyrim play 273.0 0\n", "2 151603712 Fallout 4 purchase 1.0 0\n", "3 151603712 Fallout 4 play 87.0 0\n", "4 151603712 Spore purchase 1.0 0\n", "5 151603712 Spore play 14.9 0\n", "6 151603712 Fallout New Vegas purchase 1.0 0\n", "7 151603712 Fallout New Vegas play 12.1 0\n", "8 151603712 Left 4 Dead 2 purchase 1.0 0\n", "9 151603712 Left 4 Dead 2 play 8.9 0" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "steam_df.head(10)" ] }, { "cell_type": "code", "execution_count": 34, "id": "electrical-floor", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " userId movieId rating timestamp\n", "0 1 1 4.0 964982703\n", "1 1 3 4.0 964981247\n", "2 1 6 4.0 964982224\n", "3 1 47 5.0 964983815\n", "4 1 50 5.0 964982931\n", "5 1 70 3.0 964982400\n", "6 1 101 5.0 964980868\n", "7 1 110 4.0 964982176\n", "8 1 151 5.0 964984041\n", "9 1 157 5.0 964984100" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ml_ratings_df.head(10)" ] }, { "cell_type": "code", "execution_count": 36, "id": "cordless-daniel", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " movieId title \\\n", "0 1 Toy Story (1995) \n", "1 2 Jumanji (1995) \n", "2 3 Grumpier Old Men (1995) \n", "3 4 Waiting to Exhale (1995) \n", "4 5 Father of the Bride Part II (1995) \n", "5 6 Heat (1995) \n", "6 7 Sabrina (1995) \n", "7 8 Tom and Huck (1995) \n", "8 9 Sudden Death (1995) \n", "9 10 GoldenEye (1995) \n", "\n", " genres \n", "0 Adventure|Animation|Children|Comedy|Fantasy \n", "1 Adventure|Children|Fantasy \n", "2 Comedy|Romance \n", "3 Comedy|Drama|Romance \n", "4 Comedy \n", "5 Action|Crime|Thriller \n", "6 Comedy|Romance \n", "7 Adventure|Children \n", "8 Action \n", "9 Action|Adventure|Thriller " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ml_movies_df.head(10)" ] }, { "cell_type": "markdown", "id": "alpha-portal", "metadata": {}, "source": [ "## Merge both MovieLens DataFrames into one" ] }, { "cell_type": "code", "execution_count": 39, "id": "affecting-disclosure", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " userId movieId rating timestamp title \\\n", "0 1 1 4.0 964982703 Toy Story (1995) \n", "1 5 1 4.0 847434962 Toy Story (1995) \n", "2 7 1 4.5 1106635946 Toy Story (1995) \n", "3 15 1 2.5 1510577970 Toy Story (1995) \n", "4 17 1 4.5 1305696483 Toy Story (1995) \n", "5 18 1 3.5 1455209816 Toy Story (1995) \n", "6 19 1 4.0 965705637 Toy Story (1995) \n", "7 21 1 3.5 1407618878 Toy Story (1995) \n", "8 27 1 3.0 962685262 Toy Story (1995) \n", "9 31 1 5.0 850466616 Toy Story (1995) \n", "\n", " genres \n", "0 Adventure|Animation|Children|Comedy|Fantasy \n", "1 Adventure|Animation|Children|Comedy|Fantasy \n", "2 Adventure|Animation|Children|Comedy|Fantasy \n", "3 Adventure|Animation|Children|Comedy|Fantasy \n", "4 Adventure|Animation|Children|Comedy|Fantasy \n", "5 Adventure|Animation|Children|Comedy|Fantasy \n", "6 Adventure|Animation|Children|Comedy|Fantasy \n", "7 Adventure|Animation|Children|Comedy|Fantasy \n", "8 Adventure|Animation|Children|Comedy|Fantasy \n", "9 Adventure|Animation|Children|Comedy|Fantasy " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ml_df = pd.merge(ml_ratings_df, ml_movies_df, on='movieId')\n", "ml_df.head(10)" ] }, { "cell_type": "markdown", "id": "lightweight-constitution", "metadata": {}, "source": [ "## Choosing a row, a column or several columns" ] }, { "cell_type": "code", "execution_count": 40, "id": "excited-interface", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
0151603712The Elder Scrolls V Skyrimpurchase1.00
1151603712The Elder Scrolls V Skyrimplay273.00
2151603712Fallout 4purchase1.00
3151603712Fallout 4play87.00
4151603712Sporepurchase1.00
5151603712Sporeplay14.90
6151603712Fallout New Vegaspurchase1.00
7151603712Fallout New Vegasplay12.10
8151603712Left 4 Dead 2purchase1.00
9151603712Left 4 Dead 2play8.90
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Choosing rows by index\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
3151603712Fallout 4play87.00
4151603712Sporepurchase1.00
5151603712Sporeplay14.90
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Choosing rows by position\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
3151603712Fallout 4play87.00
4151603712Sporepurchase1.00
5151603712Sporeplay14.90
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML(steam_df.head(10).to_html()))\n", "\n", "# Choosing rows by index\n", "chosen_df = steam_df[3:6]\n", "\n", "print(\"Choosing rows by index\")\n", "display(HTML(chosen_df.head(10).to_html()))\n", "\n", "# Choosing rows by position\n", "chosen_df = steam_df.iloc[3:6]\n", "\n", "print(\"Choosing rows by position\")\n", "display(HTML(chosen_df.head(10).to_html()))" ] }, { "cell_type": "code", "execution_count": 42, "id": "reflected-banner", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 The Elder Scrolls V Skyrim\n", "1 The Elder Scrolls V Skyrim\n", "2 Fallout 4\n", "3 Fallout 4\n", "4 Spore\n", "5 Spore\n", "6 Fallout New Vegas\n", "7 Fallout New Vegas\n", "8 Left 4 Dead 2\n", "9 Left 4 Dead 2\n", "Name: game-title, dtype: object\n" ] } ], "source": [ "# Choosing a column\n", "chosen_df = steam_df['game-title']\n", "\n", "print(chosen_df.head(10))" ] }, { "cell_type": "code", "execution_count": 43, "id": "efficient-humidity", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-title
0151603712The Elder Scrolls V Skyrim
1151603712The Elder Scrolls V Skyrim
2151603712Fallout 4
3151603712Fallout 4
4151603712Spore
5151603712Spore
6151603712Fallout New Vegas
7151603712Fallout New Vegas
8151603712Left 4 Dead 2
9151603712Left 4 Dead 2
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Choosing several columns\n", "chosen_df = steam_df[['user-id', 'game-title']]\n", "\n", "display(HTML(chosen_df.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "popular-cause", "metadata": {}, "source": [ "### Splitting the dataset into training and test set" ] }, { "cell_type": "code", "execution_count": 45, "id": "continuous-cheat", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shuffled range of indices\n", "[ 88886 27084 35588 56116 183664 34019 190384 138109 48325 94171\n", " 163304 35071 45875 187591 107927 62332 97588 3784 669 75931]\n", "\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
88886173434036Mortal Kombat Xpurchase1.00
2708480779496Sins of a Solar Empire Trinityplay0.60
35588109669093Killing Floorplay225.00
5611694269421Fallout 4play10.10
183664279406744BLOCKADE 3Dpurchase1.00
34019126269125Grand Theft Auto San Andreaspurchase1.00
190384713354027 Days to Dieplay8.20
138109156818121Half-Life 2play22.00
48325114617787Garry's Modplay1.20
94171156615447LEGO MARVEL Super Heroesplay1.70
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
17008081591317Warframepurchase1.00
8527944472980Serious Sam Double D XXLpurchase1.00
13291645592640Penumbra Black Plaguepurchase1.00
1219364787956Always Sometimes Monsterspurchase1.00
46374192538478Heroes & Generalsplay0.40
898231936551Castle Crasherspurchase1.00
179113132196353Knights and Merchantspurchase1.00
14400213190476Blood Bowl 2play6.30
3541660296891Mirror's Edgepurchase1.00
12078662990992Rome Total Warpurchase1.00
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "160000\n", "40000\n" ] } ], "source": [ "shuffle = np.array(list(range(len(steam_df))))\n", "\n", "# alternatively\n", "\n", "shuffle = np.arange(len(steam_df))\n", "\n", "np.random.shuffle(shuffle)\n", "# shuffle = list(shuffle)\n", "print(\"Shuffled range of indices\")\n", "print(shuffle[:20])\n", "print()\n", "\n", "train_test_split = 0.8\n", "split_index = int(len(steam_df) * train_test_split)\n", "\n", "training_set = steam_df.iloc[shuffle[:split_index]]\n", "test_set = steam_df.iloc[shuffle[split_index:]]\n", "\n", "display(HTML(training_set.head(10).to_html()))\n", "\n", "display(HTML(test_set.head(10).to_html()))\n", "\n", "print(len(training_set))\n", "print(len(test_set))" ] }, { "cell_type": "markdown", "id": "outside-twist", "metadata": {}, "source": [ "## Filtering" ] }, { "cell_type": "markdown", "id": "otherwise-rachel", "metadata": {}, "source": [ "### Filtering columns" ] }, { "cell_type": "code", "execution_count": 46, "id": "numerical-pride", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-title
0151603712The Elder Scrolls V Skyrim
1151603712The Elder Scrolls V Skyrim
2151603712Fallout 4
3151603712Fallout 4
4151603712Spore
5151603712Spore
6151603712Fallout New Vegas
7151603712Fallout New Vegas
8151603712Left 4 Dead 2
9151603712Left 4 Dead 2
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "chosen_df = steam_df.loc[:, ['user-id', 'game-title']]\n", "\n", "display(HTML(chosen_df.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "interior-cleaner", "metadata": {}, "source": [ "### Filtering rows" ] }, { "cell_type": "code", "execution_count": 47, "id": "marine-growth", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 False\n", "1 False\n", "2 True\n", "3 True\n", "4 False\n", "5 False\n", "6 False\n", "7 False\n", "8 False\n", "9 False\n", "Name: game-title, dtype: bool\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
2151603712Fallout 4purchase1.00
3151603712Fallout 4play87.00
318787445402Fallout 4purchase1.00
318887445402Fallout 4play83.00
568325096601Fallout 4purchase1.00
568425096601Fallout 4play1.60
6219211925330Fallout 4purchase1.00
6220211925330Fallout 4play133.00
7300115396529Fallout 4purchase1.00
7301115396529Fallout 4play17.90
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "condition = steam_df['game-title'] == 'Fallout 4'\n", "\n", "print(condition.head(10))\n", "\n", "chosen_df = steam_df.loc[condition]\n", "\n", "display(HTML(chosen_df.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "functioning-condition", "metadata": {}, "source": [ "### Filtering rows and columns at once" ] }, { "cell_type": "code", "execution_count": 48, "id": "advanced-religion", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlevalue
3151603712Fallout 487.0
318887445402Fallout 483.0
568425096601Fallout 41.6
6220211925330Fallout 4133.0
7301115396529Fallout 417.9
75274834220Fallout 419.8
761765229865Fallout 40.5
771265958466Fallout 4123.0
996391800733Fallout 463.0
1070043913966Fallout 465.0
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "condition = (steam_df['game-title'] == 'Fallout 4') & (steam_df['behavior-name'] == 'play')\n", "\n", "chosen_df = steam_df.loc[condition, ['user-id', 'game-title', 'value']]\n", "\n", "display(HTML(chosen_df.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "frequent-match", "metadata": {}, "source": [ "## Simple operations on columns" ] }, { "cell_type": "markdown", "id": "described-sister", "metadata": {}, "source": [ "### Multiply a column by 2" ] }, { "cell_type": "code", "execution_count": 51, "id": "injured-sweet", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
0151603712The Elder Scrolls V Skyrimpurchase1.00
1151603712The Elder Scrolls V Skyrimplay273.00
2151603712Fallout 4purchase1.00
3151603712Fallout 4play87.00
4151603712Sporepurchase1.00
5151603712Sporeplay14.90
6151603712Fallout New Vegaspurchase1.00
7151603712Fallout New Vegasplay12.10
8151603712Left 4 Dead 2purchase1.00
9151603712Left 4 Dead 2play8.90
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
0151603712The Elder Scrolls V Skyrimpurchase2.00
1151603712The Elder Scrolls V Skyrimplay546.00
2151603712Fallout 4purchase2.00
3151603712Fallout 4play174.00
4151603712Sporepurchase2.00
5151603712Sporeplay29.80
6151603712Fallout New Vegaspurchase2.00
7151603712Fallout New Vegasplay24.20
8151603712Left 4 Dead 2purchase2.00
9151603712Left 4 Dead 2play17.80
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "steam_df_copy = steam_df.copy()\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))\n", "\n", "steam_df_copy.loc[:, 'value'] = steam_df_copy['value'] * 2\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "executed-processor", "metadata": {}, "source": [ "### Choose the first n letters of a string" ] }, { "cell_type": "code", "execution_count": 52, "id": "forbidden-mining", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movieIdtitlegenres
01Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
12Jumanji (1995)Adventure|Children|Fantasy
23Grumpier Old Men (1995)Comedy|Romance
34Waiting to Exhale (1995)Comedy|Drama|Romance
45Father of the Bride Part II (1995)Comedy
56Heat (1995)Action|Crime|Thriller
67Sabrina (1995)Comedy|Romance
78Tom and Huck (1995)Adventure|Children
89Sudden Death (1995)Action
910GoldenEye (1995)Action|Adventure|Thriller
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movieIdtitlegenres
01Toy StAdventure|Animation|Children|Comedy|Fantasy
12JumanjAdventure|Children|Fantasy
23GrumpiComedy|Romance
34WaitinComedy|Drama|Romance
45FatherComedy
56Heat (Action|Crime|Thriller
67SabrinComedy|Romance
78Tom anAdventure|Children
89SuddenAction
910GoldenAction|Adventure|Thriller
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ml_movies_df_copy = ml_movies_df.copy()\n", "\n", "display(HTML(ml_movies_df_copy.head(10).to_html()))\n", "\n", "ml_movies_df_copy.loc[:, 'title'] = ml_movies_df_copy['title'].str[:6]\n", "\n", "display(HTML(ml_movies_df_copy.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "incorporated-entrance", "metadata": {}, "source": [ "### Take the mean of a column" ] }, { "cell_type": "code", "execution_count": 53, "id": "selected-trial", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "17.874384000000475\n", "17.874384000000475\n" ] } ], "source": [ "# Option 1\n", "print(steam_df['value'].mean())\n", "\n", "# Option 2\n", "print(np.mean(steam_df['value']))" ] }, { "cell_type": "markdown", "id": "discrete-cheese", "metadata": {}, "source": [ "### Simple operation on filtered data" ] }, { "cell_type": "code", "execution_count": 54, "id": "bridal-greenhouse", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay273.00
3151603712Fallout 4play87.00
7359945701The Elder Scrolls V Skyrimplay58.00
106692107940The Elder Scrolls V Skyrimplay110.00
1168250006052The Elder Scrolls V Skyrimplay465.00
138811373749The Elder Scrolls V Skyrimplay220.00
206554103616The Elder Scrolls V Skyrimplay35.00
256956038151The Elder Scrolls V Skyrimplay14.60
318887445402Fallout 4play83.00
323394088853The Elder Scrolls V Skyrimplay320.00
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay273.00
3151603712Fallout 4play174.00
7359945701The Elder Scrolls V Skyrimplay58.00
106692107940The Elder Scrolls V Skyrimplay110.00
1168250006052The Elder Scrolls V Skyrimplay465.00
138811373749The Elder Scrolls V Skyrimplay220.00
206554103616The Elder Scrolls V Skyrimplay35.00
256956038151The Elder Scrolls V Skyrimplay14.60
318887445402Fallout 4play166.00
323394088853The Elder Scrolls V Skyrimplay320.00
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "steam_df_copy = steam_df.loc[((steam_df['game-title'] == 'Fallout 4') | (steam_df['game-title'] == 'The Elder Scrolls V Skyrim')) \n", " & (steam_df['behavior-name'] == 'play')].copy()\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))\n", "\n", "condition = (steam_df_copy['game-title'] == 'Fallout 4') & (steam_df_copy['behavior-name'] == 'play')\n", "\n", "steam_df_copy.loc[condition, 'value'] = steam_df_copy.loc[condition, 'value'] * 2\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "relevant-strap", "metadata": {}, "source": [ "## Advanced operations on columns" ] }, { "cell_type": "code", "execution_count": 55, "id": "female-french", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay273.00
3151603712Fallout 4play87.00
5151603712Sporeplay14.90
7151603712Fallout New Vegasplay12.10
9151603712Left 4 Dead 2play8.90
11151603712HuniePopplay8.50
13151603712Path of Exileplay8.10
15151603712Poly Bridgeplay7.50
17151603712Left 4 Deadplay3.30
19151603712Team Fortress 2play2.80
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay4.0000000
3151603712Fallout 4play4.0000000
5151603712Sporeplay2.7663190
7151603712Fallout New Vegasplay2.5726120
9151603712Left 4 Dead 2play2.2925350
11151603712HuniePopplay2.2512920
13151603712Path of Exileplay2.2082740
15151603712Poly Bridgeplay2.1400660
17151603712Left 4 Deadplay1.4586150
19151603712Team Fortress 2play1.3350010
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def reduce_outliers(x):\n", " return min(np.log(1 + x), 4)\n", "\n", "steam_df_copy = steam_df.loc[steam_df['behavior-name'] == 'play'].copy()\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))\n", "\n", "steam_df_copy.loc[:, 'value'] = steam_df_copy['value'].apply(reduce_outliers)\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "supported-graphic", "metadata": {}, "source": [ "### The same apply operation can be achieved with the use of a lambda function" ] }, { "cell_type": "code", "execution_count": 56, "id": "objective-survey", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay273.00
3151603712Fallout 4play87.00
5151603712Sporeplay14.90
7151603712Fallout New Vegasplay12.10
9151603712Left 4 Dead 2play8.90
11151603712HuniePopplay8.50
13151603712Path of Exileplay8.10
15151603712Poly Bridgeplay7.50
17151603712Left 4 Deadplay3.30
19151603712Team Fortress 2play2.80
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay4.0000000
3151603712Fallout 4play4.0000000
5151603712Sporeplay2.7663190
7151603712Fallout New Vegasplay2.5726120
9151603712Left 4 Dead 2play2.2925350
11151603712HuniePopplay2.2512920
13151603712Path of Exileplay2.2082740
15151603712Poly Bridgeplay2.1400660
17151603712Left 4 Deadplay1.4586150
19151603712Team Fortress 2play1.3350010
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "steam_df_copy = steam_df.loc[steam_df['behavior-name'] == 'play'].copy()\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))\n", "\n", "steam_df_copy.loc[:, 'value'] = steam_df_copy['value'].apply(lambda x: min(np.log(1 + x), 4))\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "guilty-single", "metadata": {}, "source": [ "### Apply on two columns at once" ] }, { "cell_type": "code", "execution_count": 58, "id": "thrown-geneva", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezero
1151603712The Elder Scrolls V Skyrimplay273.00
3151603712Fallout 4play87.00
5151603712Sporeplay14.90
7151603712Fallout New Vegasplay12.10
9151603712Left 4 Dead 2play8.90
11151603712HuniePopplay8.50
13151603712Path of Exileplay8.10
15151603712Poly Bridgeplay7.50
17151603712Left 4 Deadplay3.30
19151603712Team Fortress 2play2.80
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezerovalue_2
1151603712The Elder Scrolls V Skyrimplay273.004.000000
3151603712Fallout 4play87.004.000000
5151603712Sporeplay14.902.766319
7151603712Fallout New Vegasplay12.102.572612
9151603712Left 4 Dead 2play8.902.292535
11151603712HuniePopplay8.502.251292
13151603712Path of Exileplay8.102.208274
15151603712Poly Bridgeplay7.502.140066
17151603712Left 4 Deadplay3.301.458615
19151603712Team Fortress 2play2.801.335001
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user-idgame-titlebehavior-namevaluezerovalue_2
1151603712The Elder Scrolls V Skyrimplay1092.00000004.000000
3151603712Fallout 4play348.00000004.000000
5151603712Sporeplay41.21815502.766319
7151603712Fallout New Vegasplay31.12860802.572612
9151603712Left 4 Dead 2play20.40355902.292535
11151603712HuniePopplay19.13598002.251292
13151603712Path of Exileplay17.88702302.208274
15151603712Poly Bridgeplay16.05049602.140066
17151603712Left 4 Deadplay4.81343001.458615
19151603712Team Fortress 2play3.73800301.335001
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "steam_df_copy = steam_df.loc[steam_df['behavior-name'] == 'play'].copy()\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))\n", "\n", "steam_df_copy.loc[:, 'value_2'] = steam_df_copy['value'].apply(lambda x: min(np.log(1 + x), 4))\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))\n", "\n", "steam_df_copy.loc[:, 'value'] = steam_df_copy[['value', 'value_2']].apply(lambda x: x[0] * x[1], axis=1)\n", "\n", "display(HTML(steam_df_copy.head(10).to_html()))" ] }, { "cell_type": "code", "execution_count": 59, "id": "governing-alexandria", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movieIdtitlegenres
01Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
12Jumanji (1995)Adventure|Children|Fantasy
23Grumpier Old Men (1995)Comedy|Romance
34Waiting to Exhale (1995)Comedy|Drama|Romance
45Father of the Bride Part II (1995)Comedy
56Heat (1995)Action|Crime|Thriller
67Sabrina (1995)Comedy|Romance
78Tom and Huck (1995)Adventure|Children
89Sudden Death (1995)Action
910GoldenEye (1995)Action|Adventure|Thriller
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movieIdtitlegenrestitle|genres
01Toy Story (1995)Adventure|Animation|Children|Comedy|FantasyToy Story (1995)|Adventure|Animation|Children|Comedy|Fantasy
12Jumanji (1995)Adventure|Children|FantasyJumanji (1995)|Adventure|Children|Fantasy
23Grumpier Old Men (1995)Comedy|RomanceGrumpier Old Men (1995)|Comedy|Romance
34Waiting to Exhale (1995)Comedy|Drama|RomanceWaiting to Exhale (1995)|Comedy|Drama|Romance
45Father of the Bride Part II (1995)ComedyFather of the Bride Part II (1995)|Comedy
56Heat (1995)Action|Crime|ThrillerHeat (1995)|Action|Crime|Thriller
67Sabrina (1995)Comedy|RomanceSabrina (1995)|Comedy|Romance
78Tom and Huck (1995)Adventure|ChildrenTom and Huck (1995)|Adventure|Children
89Sudden Death (1995)ActionSudden Death (1995)|Action
910GoldenEye (1995)Action|Adventure|ThrillerGoldenEye (1995)|Action|Adventure|Thriller
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ml_movies_df_copy = ml_movies_df.copy()\n", "\n", "display(HTML(ml_movies_df_copy.head(10).to_html()))\n", "\n", "ml_movies_df_copy.loc[:, 'title|genres'] = ml_movies_df_copy[['title', 'genres']].apply(lambda x: x[0] + \"|\" + x[1], axis=1)\n", "\n", "display(HTML(ml_movies_df_copy.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "critical-fields", "metadata": {}, "source": [ "## Grouping and aggregating" ] }, { "cell_type": "markdown", "id": "biological-light", "metadata": {}, "source": [ "### Find the most popular games (in terms of purchases)" ] }, { "cell_type": "code", "execution_count": 61, "id": "greenhouse-scout", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
game-title
007 Legends1.0
0RBITALIS3.0
1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby)7.0
10 Second Ninja6.0
10,000,0001.0
100% Orange Juice10.0
1000 Amps2.0
12 Labours of Hercules10.0
12 Labours of Hercules II The Cretan Bull12.0
12 Labours of Hercules III Girl Power6.0
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
game-titlevalue
0Dota 24841.0
1Team Fortress 22323.0
2Unturned1563.0
3Counter-Strike Global Offensive1412.0
4Half-Life 2 Lost Coast981.0
5Counter-Strike Source978.0
6Left 4 Dead 2951.0
7Counter-Strike856.0
8Warframe847.0
9Half-Life 2 Deathmatch823.0
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "steam_grouped = steam_df.loc[steam_df['behavior-name'] == 'purchase', ['game-title', 'value']]\n", "steam_grouped = steam_grouped.groupby('game-title').sum()\n", "display(HTML(steam_grouped.head(10).to_html()))\n", "\n", "steam_grouped = steam_grouped.sort_values(by='value', ascending=False).reset_index()\n", "\n", "display(HTML(steam_grouped.head(10).to_html()))" ] }, { "cell_type": "markdown", "id": "indie-calcium", "metadata": {}, "source": [ "## Iterating over a DataFrame (if possible, use column operations instead)" ] }, { "cell_type": "code", "execution_count": 63, "id": "laden-intersection", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 151603712, The Elder Scrolls V Skyrim, purchase]\n", "[1, 151603712, The Elder Scrolls V Skyrim, play]\n", "[2, 151603712, Fallout 4, purchase]\n", "[3, 151603712, Fallout 4, play]\n", "[4, 151603712, Spore, purchase]\n", "[5, 151603712, Spore, play]\n", "[6, 151603712, Fallout New Vegas, purchase]\n", "[7, 151603712, Fallout New Vegas, play]\n", "[8, 151603712, Left 4 Dead 2, purchase]\n", "[9, 151603712, Left 4 Dead 2, play]\n" ] } ], "source": [ "# iterrows() yields (index, row) pairs; stop after the first 10 rows\n", "i = 0\n", "for idx, row in steam_df.iterrows():\n", "    print(\"[{}, {}, {}, {}]\".format(idx, row['user-id'], row['game-title'], row['behavior-name']))\n", "    i += 1\n", "    if i == 10:\n", "        break" ] }, { "cell_type": "markdown", "id": "objective-associate", "metadata": {}, "source": [ "## Pandas tasks - Steam dataset" ] }, { "cell_type": "markdown", "id": "floppy-american", "metadata": {}, "source": [ "**Task 4.** How many people made a purchase in the Steam dataset? Remember that a person can buy many games, but you need to count every person once."
] }, { "cell_type": "code", "execution_count": null, "id": "decimal-grass", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "protected-glossary", "metadata": {}, "source": [ "**Task 5.** How many people made a purchase of \"The Elder Scrolls V Skyrim\"?" ] }, { "cell_type": "code", "execution_count": null, "id": "distant-overview", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "vocational-weekly", "metadata": {}, "source": [ "**Task 6.** How many purchases did people make on average?" ] }, { "cell_type": "code", "execution_count": null, "id": "reflected-cathedral", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "signed-transaction", "metadata": {}, "source": [ "**Task 7.** Who bought the most games?" ] }, { "cell_type": "code", "execution_count": null, "id": "handmade-revolution", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "piano-bobby", "metadata": {}, "source": [ "**Task 8.** How many hours on average did people play \"The Elder Scrolls V Skyrim\"?" ] }, { "cell_type": "code", "execution_count": null, "id": "hydraulic-observation", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "stuffed-creativity", "metadata": {}, "source": [ "**Task 9.** Which games were played the most (in terms of the total number of hours played)? Print the first 10 titles and the respective numbers of hours." ] }, { "cell_type": "code", "execution_count": null, "id": "challenging-truck", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "crude-petroleum", "metadata": {}, "source": [ "**Task 10.** Which games are the most consistently played (in terms of the average number of hours played)? Print the first 10 titles and the respective numbers of hours."
] }, { "cell_type": "code", "execution_count": null, "id": "surgical-lawsuit", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "monetary-toyota", "metadata": {}, "source": [ "**Task 11\\*\\*.** Correct the above to account for the fact that games with 0 hours played have no play record; only a purchase is recorded in such a case." ] }, { "cell_type": "code", "execution_count": null, "id": "protective-report", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "ceramic-awareness", "metadata": {}, "source": [ "**Task 12.** Apply the sigmoid function\n", "$$f(x) = \\frac{1}{1 + e^{-\\frac{1}{100}x}}$$\n", "to hours played and print the first 10 rows from the entire Steam dataset after this change." ] }, { "cell_type": "code", "execution_count": null, "id": "optical-announcement", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "actual-spotlight", "metadata": {}, "source": [ "## Pandas tasks - MovieLens dataset" ] }, { "cell_type": "markdown", "id": "inclusive-crash", "metadata": {}, "source": [ "**Task 13\\*.** Calculate the popularity of all genres (by the number of users who watched a movie)." ] }, { "cell_type": "code", "execution_count": null, "id": "developmental-seven", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "personalized-finland", "metadata": {}, "source": [ "**Task 14\\*.** Calculate the average rating for all genres." ] }, { "cell_type": "code", "execution_count": null, "id": "inside-personal", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "vertical-stick", "metadata": {}, "source": [ "**Task 15.** Calculate each movie's rating bias (deviation from the mean of all movies' average ratings). Print the first 10 in the form: title, average rating, bias."
] }, { "cell_type": "code", "execution_count": null, "id": "greatest-screen", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "hawaiian-haiti", "metadata": {}, "source": [ "**Task 16.** Calculate each user's rating bias (deviation from the mean of all users' average ratings). Print the first 10 in the form: user_id, average rating, bias." ] }, { "cell_type": "code", "execution_count": null, "id": "charitable-guyana", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "intimate-porcelain", "metadata": {}, "source": [ "**Task 17.** Randomly choose 10 movies and 10 users and print their interaction matrix in the form of a DataFrame with user_id as index and movie titles as columns (use HTML Display for that). You can iterate over the DataFrame in this task." ] }, { "cell_type": "code", "execution_count": null, "id": "brazilian-frost", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "boolean-modem", "metadata": {}, "source": [ "## Pandas + numpy tasks" ] }, { "cell_type": "markdown", "id": "worldwide-disclaimer", "metadata": {}, "source": [ "**Task 18.** Create the entire interaction matrix for the MovieLens dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "marine-initial", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "micro-vision", "metadata": {}, "source": [ "**Task 19.** Calculate the matrix of size (n_users, n_users) where the entry at position (i, j) is the number of movies watched by both user i and user j. Print the submatrix of the first 10 rows and 10 columns." 
] }, { "cell_type": "code", "execution_count": null, "id": "swedish-lambda", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "shaped-advance", "metadata": {}, "source": [ "**Task 20.** Calculate the matrix of size (n_items, n_items) where the entry at position (i, j) is the number of users who watched both movie i and movie j. Print the submatrix of the first 10 rows and 10 columns." ] }, { "cell_type": "code", "execution_count": null, "id": "quality-bubble", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 5 }