forked from tdwojak/Python2019
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unsupervised learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far we have dealt with supervised learning, i.e. cases where the training set consisted of two variables, `X` and `y`, and our task was to predict `y` from the data in `X`. We also met the metrics that let us measure how well (or how badly) the trained models perform.\n",
"\n",
"Recall that machine learning comprises three paradigms:\n",
" * supervised learning\n",
" * unsupervised learning\n",
" * reinforcement learning\n",
" \n",
"Today's class is devoted to the second paradigm, unsupervised learning, and more specifically to automatic clustering. Clustering algorithms include, among others:\n",
" * k-means\n",
" * [DBSCAN](https://en.wikipedia.org/wiki/DBSCAN)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 0**: load the dataset from the file `points.csv` into the variable `points`. Note: the columns are separated by a space, and the file has no header row."
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"points = pd.read_csv('points.csv', sep=' ', header=None)"
]
},
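The loading step can be tried without the real file. The sketch below is only an illustration: it assumes a tiny in-memory sample standing in for `points.csv`, and the values and the name `demo_points` are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for points.csv: two space-separated columns, no header row.
sample = io.StringIO("1.5 -2.0\n3.25 4.0\n-7.0 0.5\n")

# sep=' ' splits each row on single spaces; header=None stops pandas from
# treating the first data row as column names.
demo_points = pd.read_csv(sample, sep=' ', header=None)

print(demo_points.shape)          # number of rows and columns read
print(list(demo_points.columns))  # default integer column labels
```

Because there is no header, the columns get integer labels, which is why the notebook later refers to them as `points[0]` and `points[1]`.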
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the points loaded above."
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f02a0cecba8>"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "(base64-encoded PNG omitted: scatter plot of the loaded points)",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f02a0cf7940>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"xs = points[0]\n",
"ys = points[1]\n",
"\n",
"points.plot(kind='scatter', x=0, y=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 1** How many separate groups of points can you see in the plot above?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic algorithm for clustering data is $k$-means, which was covered in the lecture. The `sklearn` library naturally includes an implementation of this algorithm."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 2** Import the `KMeans` class from the `sklearn.cluster` module."
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The k-means algorithm requires the expected number of clusters up front, so when creating a `KMeans` object we must pass the `n_clusters` parameter. In the example below we set this parameter to 3."
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [],
"source": [
"kmeans = KMeans(n_clusters=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 3** Call the `fit` method on the `kmeans` object, passing the variable `points` as the argument. This trains the model."
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,\n",
" n_clusters=3, n_init=10, n_jobs=1, precompute_distances='auto',\n",
" random_state=None, tol=0.0001, verbose=0)"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 4** With a trained k-means model we can determine the cluster each point was assigned to. The `predict` method does this. Call it on the variable `points` and save the result in the variable `clusters`."
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [],
"source": []
},
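The create, `fit`, `predict` sequence described in the tasks above can be sketched on synthetic data. This is a minimal illustration and not the solution for `points.csv`: the three blobs, the `demo_*` names, and the explicit `n_init` and `random_state` values are assumptions made for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for `points`: three well-separated 2-D blobs (made-up data).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# n_init and random_state are set explicitly so the run is reproducible.
demo_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
demo_kmeans.fit(demo)

# predict returns, for every point, the index of its nearest centroid.
demo_clusters = demo_kmeans.predict(demo)

print(demo_clusters.shape)         # one cluster index per point
print(np.bincount(demo_clusters))  # how many points landed in each cluster
```

The cluster indices themselves are arbitrary: a different seed may number the same three groups in a different order.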
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's display how the model split the points:"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG omitted: scatter plot of the points colored by k-means cluster)",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f02a3ee7c88>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(x=points[0], y=points[1], c=clusters)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Information about the centroids is stored in the `cluster_centers_` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster ID: 0\tX: 1158.9296227871434\tY:-212.28055211754568\n",
"Cluster ID: 1\tX: -844.3076877296985\tY:-450.0715318089522\n",
"Cluster ID: 2\tX: 60.61234354820601\tY:444.84943020237415\n"
]
}
],
"source": [
"for idx, centroid in enumerate(kmeans.cluster_centers_):\n",
"    print(\"Cluster ID: {}\\tX: {}\\tY:{}\".format(idx, centroid[0], centroid[1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 5** Check how the k-means model splits the set of points when the number of clusters is set to 2 and to 4."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The k-means algorithm minimizes the sum of squared distances from each point to its nearest centroid. We can treat this sum as a cost function and use it to compare models with different numbers of clusters."
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG omitted: line plot of inertia against the number of clusters)",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f02a4eb62b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"n_clusters = [1, 2, 3, 4, 5]\n",
"inertias = []\n",
"\n",
"for n_cluster in n_clusters:\n",
"    model = KMeans(n_clusters=n_cluster)\n",
"    model.fit(points)\n",
"    inertias.append(model.inertia_)\n",
"\n",
"plt.plot(n_clusters, inertias)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot above shows the relationship between the number of clusters and the cost function. It is easy to see that beyond 3 clusters the curve flattens out, so 3 appears to be the best choice."
]
},
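To make the cost function concrete, the sketch below recomputes it by hand and compares it with the `inertia_` attribute used in the loop above. It is an illustration on made-up data: the two blobs and the names `demo`, `assigned` and `manual_inertia` are assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: two blobs far enough apart for k-means to separate them.
rng = np.random.RandomState(1)
demo = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=1.0, size=(40, 2)),
    rng.normal(loc=(8.0, 8.0), scale=1.0, size=(40, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(demo)

# Look up, for every point, the centroid of the cluster it was assigned to.
assigned = model.cluster_centers_[model.labels_]

# Sum of squared Euclidean distances between points and their centroids.
manual_inertia = ((demo - assigned) ** 2).sum()

print(manual_inertia, model.inertia_)
```

The two printed numbers should agree up to floating-point error, which is why `inertia_` can be read as the value of the cost that k-means minimizes.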
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second popular algorithm is DBSCAN, which does not require the number of clusters to be given a priori; it determines that number itself. In addition, this model can leave out points that lie far from any dense group."
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import DBSCAN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The DBSCAN model takes two parameters: `eps`, the maximum distance between two points for them to count as neighbors, and `min_samples`, the minimum number of points in a neighborhood needed to form a cluster."
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 2, 2, 2])"
]
},
"execution_count": 168,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db = DBSCAN(eps=130, min_samples=10)\n",
"labels = db.fit_predict(points)\n",
"labels"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG omitted: scatter plot of the points colored by DBSCAN label)",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f02adea9550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(x=points[0], y=points[1], c=labels)\n",
"plt.show()"
]
},
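The way DBSCAN leaves out distant points can be seen on a toy example. In scikit-learn such points receive the label `-1`. The sketch below assumes made-up data (two tight blobs and one isolated point), and its `eps` and `min_samples` values were picked for this toy data only, not for `points.csv`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
# Two dense made-up blobs plus one isolated point far away from both.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(30, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.2, size=(30, 2))
outlier = np.array([[20.0, -20.0]])
demo = np.vstack([blob_a, blob_b, outlier])

# Illustrative parameter values chosen for this toy data.
demo_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(demo)

n_found = len(set(demo_labels) - {-1})    # clusters, not counting noise
n_noise = int((demo_labels == -1).sum())  # points labelled as noise
print(n_found, n_noise)
```

Note that the number of clusters was never passed in: it follows from the density of the data and the two parameters.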
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 6** Rescale the data so that each column is standardized (mean = 0, std = 1). Then run the DBSCAN and k-means models again. Did the normalization change anything?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
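One possible way to standardize columns is scikit-learn's `StandardScaler`. The task does not prescribe a tool, so treat this as a sketch under that assumption. The data and the `demo_*` names are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Made-up data whose two columns live on very different scales.
demo = np.column_stack([
    rng.normal(loc=1000.0, scale=300.0, size=200),
    rng.normal(loc=-5.0, scale=0.1, size=200),
])

# fit_transform subtracts each column's mean and divides by its standard deviation.
demo_scaled = StandardScaler().fit_transform(demo)

print(demo_scaled.mean(axis=0))  # per-column means after scaling
print(demo_scaled.std(axis=0))   # per-column standard deviations after scaling
```

After scaling, distance-based parameters such as DBSCAN's `eps` live on a different scale, so a value tuned on the raw data will generally need to be chosen again.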
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dimensionality reduction\n",
"\n",
"One drawback of the k-means algorithm is its training time, which grows with both the dimensionality of the data and the number of training examples. The basic remedy in such a case is to reduce the dimensionality of the data. The simplest technique is [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis)."
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's download the MNIST dataset, which has already appeared in our classes."
]
},
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"# fetch_mldata was removed in scikit-learn 0.22; fetch_openml replaces it here.\n",
"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
"X = mnist.data.astype('float64')\n",
"y = mnist.target.astype('int64')  # OpenML stores the labels as strings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When creating a PCA object, we can specify the output number of dimensions (the `n_components` argument)."
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n",
" svd_solver='auto', tol=0.0, whiten=False)"
]
},
"execution_count": 190,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pca = PCA(n_components=10)\n",
"pca.fit(X)"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [],
"source": [
"mnist_pca = pca.transform(X)"
]
},
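What `n_components` does is easier to see on data small enough to inspect. The sketch below assumes made-up 20-dimensional data that really varies along only 3 hidden directions. The names `latent`, `mixing` and `demo_*` are hypothetical and have nothing to do with MNIST.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Made-up data: 3 hidden factors mixed into 20 observed columns, plus a little noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
demo = latent @ mixing + 0.01 * rng.normal(size=(500, 20))

demo_pca = PCA(n_components=3)
# fit_transform is equivalent to calling fit(demo) and then transform(demo).
demo_reduced = demo_pca.fit_transform(demo)

print(demo_reduced.shape)                         # 20 columns reduced to 3
print(demo_pca.explained_variance_ratio_.sum())   # share of variance kept
```

The `explained_variance_ratio_` attribute gives a quick check of how much information a given `n_components` keeps. On real data such as MNIST that share is lower than on this artificial example.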
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 7** Train k-means on the PCA output, setting the number of clusters to 10. Then save to `mnist_clasters` the number of the cluster each sample was assigned to."
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 8** The variable `y` holds the correct labels, i.e. the digits from 0 to 9 (inclusive). For each digit *i*, find the cluster *j* that contains the most samples of digit *i*."
]
},
{
"cell_type": "code",
"execution_count": 198,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 7, 8, 3, 0, 2, 4, 1, 6, 0])"
]
},
"execution_count": 198,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 9** With the clusters from the previous task, sum the number of elements that fall in the most popular cluster of each digit."
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5762857142857143"
]
},
"execution_count": 200,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 10** Compute the accuracy using the result of the previous task."
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": []
},
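The idea behind the last three tasks (pick the dominant cluster for each label, count the samples that fall in it, divide by the total) can be sketched on a handful of values. This is one possible reading of the tasks, shown on made-up arrays. The names `true_labels`, `cluster_ids` and `counts` are hypothetical, and the numbers are not MNIST results.

```python
import numpy as np

# Made-up true labels and cluster ids for ten samples.
true_labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
cluster_ids = np.array([2, 2, 1, 0, 0, 0, 2, 1, 1, 1])

n_classes = true_labels.max() + 1
n_groups = cluster_ids.max() + 1

# counts[i, j] = number of samples with true label i that landed in cluster j.
counts = np.zeros((n_classes, n_groups), dtype=int)
np.add.at(counts, (true_labels, cluster_ids), 1)

# For every label, the cluster that holds most of its samples.
best_cluster = counts.argmax(axis=1)

# Samples sitting in the dominant cluster of their own label, then as a fraction.
matched = counts.max(axis=1).sum()
accuracy = matched / len(true_labels)

print(best_cluster, matched, accuracy)
```

Two labels may share the same dominant cluster under this scheme, so the resulting score is an optimistic estimate rather than a strict one-to-one matching of clusters to digits.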
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 11** Try to improve the result, e.g. by applying normalization or changing the parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Congratulations!**"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}