DSIC-Bayes-continuous/Bayes.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Naive Bayes classification (continuous distributions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Team members:\n",
"- Nowak Ania,\n",
"- Łaźna Patrycja,\n",
"- Bregier Damian"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install pandas==1.2.4\n",
"#!pip install numpy==1.20.3\n",
"#!pip install scikit-learn\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"import pandas as pd\n",
"import numpy as np\n",
"import typing\n",
"import os, pickle\n",
"from sklearn.metrics import confusion_matrix, accuracy_score\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns; sns.set()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 0. Basic information about the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The project uses the GTZAN dataset, which frames multi-class classification through the example of music genres. The dataset covers 10 genres: blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae and rock. Each genre is represented by 100 audio files, each 30 seconds long. The recordings were collected in 2000-2001 from diverse sources, including radio stations, private CDs and the authors' own recordings.\n",
"\n",
"The dataset is rich: each track is described by 60 columns, including the file name, the track length, the genre label, tempo, harmony, and the means and variances of spectral features such as the mel-frequency cepstral coefficients (MFCC).\n",
"\n",
"Full details of the dataset are available at: https://www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Loading and normalizing the data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Dictionary mapping each of the 10 music genres\n",
"# to its numeric code\n",
"genre_dict = {\n",
" \"blues\" : 1,\n",
" \"classical\" : 2,\n",
" \"country\" : 3,\n",
" \"disco\" : 4,\n",
" \"hiphop\" : 5,\n",
" \"jazz\" : 6,\n",
" \"metal\" : 7,\n",
" \"pop\" : 8,\n",
" \"reggae\" : 9,\n",
" \"rock\" : 10\n",
"}\n",
"# name of the file that stores the features after preprocessing\n",
"filename = 'music_genre.csv'\n",
"model_path = 'model.model'"
]
},
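The preprocessing cell below converts genre names to codes with `apply(lambda x: genre_dict[x])`. An idiomatic vectorized alternative is `pandas.Series.map` with a dict, which returns NaN for names missing from the dict. The following is only a sketch: the helper name `encode_genres` is hypothetical, and it turns unmapped names into an explicit error instead of a bare `KeyError`.

```python
import pandas as pd

# Same mapping as the notebook's genre_dict
genre_dict = {
    "blues": 1, "classical": 2, "country": 3, "disco": 4, "hiphop": 5,
    "jazz": 6, "metal": 7, "pop": 8, "reggae": 9, "rock": 10,
}


def encode_genres(labels: pd.Series, mapping: dict) -> pd.Series:
    """Map genre names to integer codes; fail loudly on unknown names."""
    codes = labels.map(mapping)  # unmapped names become NaN
    unknown = labels[codes.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unknown genre labels: {list(unknown)}")
    return codes.astype(int)  # safe: no NaN left at this point


labels = pd.Series(["blues", "rock", "jazz"])
print(encode_genres(labels, genre_dict).tolist())  # [1, 10, 6]
```

The up-front check keeps a misspelled genre in the raw CSV from silently producing a float column full of NaN.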
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Preparing data...\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>genre</th>\n",
" <th>chroma_stft_mean</th>\n",
" <th>chroma_stft_var</th>\n",
" <th>rms_mean</th>\n",
" <th>rms_var</th>\n",
" <th>spectral_centroid_mean</th>\n",
" <th>spectral_centroid_var</th>\n",
" <th>spectral_bandwidth_mean</th>\n",
" <th>spectral_bandwidth_var</th>\n",
" <th>rolloff_mean</th>\n",
" <th>...</th>\n",
" <th>mfcc16_var</th>\n",
" <th>mfcc17_mean</th>\n",
" <th>mfcc17_var</th>\n",
" <th>mfcc18_mean</th>\n",
" <th>mfcc18_var</th>\n",
" <th>mfcc19_mean</th>\n",
" <th>mfcc19_var</th>\n",
" <th>mfcc20_mean</th>\n",
" <th>mfcc20_var</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0.350088</td>\n",
" <td>0.088757</td>\n",
" <td>0.130228</td>\n",
" <td>0.002827</td>\n",
" <td>1784.165850</td>\n",
" <td>1.297741e+05</td>\n",
" <td>2002.449060</td>\n",
" <td>85882.761315</td>\n",
" <td>3805.839606</td>\n",
" <td>...</td>\n",
" <td>52.420910</td>\n",
" <td>-1.690215</td>\n",
" <td>36.524071</td>\n",
" <td>-0.408979</td>\n",
" <td>41.597103</td>\n",
" <td>-2.303523</td>\n",
" <td>55.062923</td>\n",
" <td>1.221291</td>\n",
" <td>46.936035</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0.340914</td>\n",
" <td>0.094980</td>\n",
" <td>0.095948</td>\n",
" <td>0.002373</td>\n",
" <td>1530.176679</td>\n",
" <td>3.758501e+05</td>\n",
" <td>2039.036516</td>\n",
" <td>213843.755497</td>\n",
" <td>3550.522098</td>\n",
" <td>...</td>\n",
" <td>55.356403</td>\n",
" <td>-0.731125</td>\n",
" <td>60.314529</td>\n",
" <td>0.295073</td>\n",
" <td>48.120598</td>\n",
" <td>-0.283518</td>\n",
" <td>51.106190</td>\n",
" <td>0.531217</td>\n",
" <td>45.786282</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>0.363637</td>\n",
" <td>0.085275</td>\n",
" <td>0.175570</td>\n",
" <td>0.002746</td>\n",
" <td>1552.811865</td>\n",
" <td>1.564676e+05</td>\n",
" <td>1747.702312</td>\n",
" <td>76254.192257</td>\n",
" <td>3042.260232</td>\n",
" <td>...</td>\n",
" <td>40.598766</td>\n",
" <td>-7.729093</td>\n",
" <td>47.639427</td>\n",
" <td>-1.816407</td>\n",
" <td>52.382141</td>\n",
" <td>-3.439720</td>\n",
" <td>46.639660</td>\n",
" <td>-2.231258</td>\n",
" <td>30.573025</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0.404785</td>\n",
" <td>0.093999</td>\n",
" <td>0.141093</td>\n",
" <td>0.006346</td>\n",
" <td>1070.106615</td>\n",
" <td>1.843559e+05</td>\n",
" <td>1596.412872</td>\n",
" <td>166441.494769</td>\n",
" <td>2184.745799</td>\n",
" <td>...</td>\n",
" <td>44.427753</td>\n",
" <td>-3.319597</td>\n",
" <td>50.206673</td>\n",
" <td>0.636965</td>\n",
" <td>37.319130</td>\n",
" <td>-0.619121</td>\n",
" <td>37.259739</td>\n",
" <td>-3.407448</td>\n",
" <td>31.949339</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>0.308526</td>\n",
" <td>0.087841</td>\n",
" <td>0.091529</td>\n",
" <td>0.002303</td>\n",
" <td>1835.004266</td>\n",
" <td>3.433999e+05</td>\n",
" <td>1748.172116</td>\n",
" <td>88445.209036</td>\n",
" <td>3579.757627</td>\n",
" <td>...</td>\n",
" <td>86.099236</td>\n",
" <td>-5.454034</td>\n",
" <td>75.269707</td>\n",
" <td>-0.916874</td>\n",
" <td>53.613918</td>\n",
" <td>-4.404827</td>\n",
" <td>62.910812</td>\n",
" <td>-11.703234</td>\n",
" <td>55.195160</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1</td>\n",
" <td>0.302456</td>\n",
" <td>0.087532</td>\n",
" <td>0.103494</td>\n",
" <td>0.003981</td>\n",
" <td>1831.993940</td>\n",
" <td>1.030482e+06</td>\n",
" <td>1729.653287</td>\n",
" <td>201910.508633</td>\n",
" <td>3481.517592</td>\n",
" <td>...</td>\n",
" <td>72.549225</td>\n",
" <td>-1.838263</td>\n",
" <td>68.702026</td>\n",
" <td>-2.783800</td>\n",
" <td>42.447453</td>\n",
" <td>-3.047909</td>\n",
" <td>39.808784</td>\n",
" <td>-8.109991</td>\n",
" <td>46.311005</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1</td>\n",
" <td>0.291328</td>\n",
" <td>0.093981</td>\n",
" <td>0.141874</td>\n",
" <td>0.008803</td>\n",
" <td>1459.366472</td>\n",
" <td>4.378594e+05</td>\n",
" <td>1389.009131</td>\n",
" <td>185023.239545</td>\n",
" <td>2795.610963</td>\n",
" <td>...</td>\n",
" <td>83.248245</td>\n",
" <td>-10.913176</td>\n",
" <td>56.902153</td>\n",
" <td>-6.971336</td>\n",
" <td>38.231800</td>\n",
" <td>-3.436505</td>\n",
" <td>48.235741</td>\n",
" <td>-6.483466</td>\n",
" <td>70.170364</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1</td>\n",
" <td>0.307955</td>\n",
" <td>0.092903</td>\n",
" <td>0.131822</td>\n",
" <td>0.005531</td>\n",
" <td>1451.667066</td>\n",
" <td>4.495682e+05</td>\n",
" <td>1577.270941</td>\n",
" <td>168211.938804</td>\n",
" <td>2954.836760</td>\n",
" <td>...</td>\n",
" <td>70.438438</td>\n",
" <td>-10.568935</td>\n",
" <td>52.090893</td>\n",
" <td>-10.784515</td>\n",
" <td>60.461330</td>\n",
" <td>-4.690678</td>\n",
" <td>65.547516</td>\n",
" <td>-8.630722</td>\n",
" <td>56.401436</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>0.408879</td>\n",
" <td>0.086512</td>\n",
" <td>0.142416</td>\n",
" <td>0.001507</td>\n",
" <td>1719.368948</td>\n",
" <td>1.632828e+05</td>\n",
" <td>2031.740381</td>\n",
" <td>105542.718193</td>\n",
" <td>3782.316288</td>\n",
" <td>...</td>\n",
" <td>50.563751</td>\n",
" <td>-7.041824</td>\n",
" <td>28.894934</td>\n",
" <td>2.695248</td>\n",
" <td>36.889568</td>\n",
" <td>3.412305</td>\n",
" <td>33.698597</td>\n",
" <td>-2.715692</td>\n",
" <td>36.418430</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1</td>\n",
" <td>0.273950</td>\n",
" <td>0.092316</td>\n",
" <td>0.081314</td>\n",
" <td>0.004347</td>\n",
" <td>1817.150863</td>\n",
" <td>2.982361e+05</td>\n",
" <td>1973.773306</td>\n",
" <td>114070.112591</td>\n",
" <td>3943.490565</td>\n",
" <td>...</td>\n",
" <td>59.314602</td>\n",
" <td>-1.916804</td>\n",
" <td>58.418438</td>\n",
" <td>-2.292661</td>\n",
" <td>83.205231</td>\n",
" <td>2.881967</td>\n",
" <td>77.082222</td>\n",
" <td>-4.235203</td>\n",
" <td>91.468811</td>\n",
" <td>blues</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 59 columns</p>\n",
"</div>"
],
"text/plain": [
" genre chroma_stft_mean chroma_stft_var rms_mean rms_var \\\n",
"0 1 0.350088 0.088757 0.130228 0.002827 \n",
"1 1 0.340914 0.094980 0.095948 0.002373 \n",
"2 1 0.363637 0.085275 0.175570 0.002746 \n",
"3 1 0.404785 0.093999 0.141093 0.006346 \n",
"4 1 0.308526 0.087841 0.091529 0.002303 \n",
"5 1 0.302456 0.087532 0.103494 0.003981 \n",
"6 1 0.291328 0.093981 0.141874 0.008803 \n",
"7 1 0.307955 0.092903 0.131822 0.005531 \n",
"8 1 0.408879 0.086512 0.142416 0.001507 \n",
"9 1 0.273950 0.092316 0.081314 0.004347 \n",
"\n",
" spectral_centroid_mean spectral_centroid_var spectral_bandwidth_mean \\\n",
"0 1784.165850 1.297741e+05 2002.449060 \n",
"1 1530.176679 3.758501e+05 2039.036516 \n",
"2 1552.811865 1.564676e+05 1747.702312 \n",
"3 1070.106615 1.843559e+05 1596.412872 \n",
"4 1835.004266 3.433999e+05 1748.172116 \n",
"5 1831.993940 1.030482e+06 1729.653287 \n",
"6 1459.366472 4.378594e+05 1389.009131 \n",
"7 1451.667066 4.495682e+05 1577.270941 \n",
"8 1719.368948 1.632828e+05 2031.740381 \n",
"9 1817.150863 2.982361e+05 1973.773306 \n",
"\n",
" spectral_bandwidth_var rolloff_mean ... mfcc16_var mfcc17_mean \\\n",
"0 85882.761315 3805.839606 ... 52.420910 -1.690215 \n",
"1 213843.755497 3550.522098 ... 55.356403 -0.731125 \n",
"2 76254.192257 3042.260232 ... 40.598766 -7.729093 \n",
"3 166441.494769 2184.745799 ... 44.427753 -3.319597 \n",
"4 88445.209036 3579.757627 ... 86.099236 -5.454034 \n",
"5 201910.508633 3481.517592 ... 72.549225 -1.838263 \n",
"6 185023.239545 2795.610963 ... 83.248245 -10.913176 \n",
"7 168211.938804 2954.836760 ... 70.438438 -10.568935 \n",
"8 105542.718193 3782.316288 ... 50.563751 -7.041824 \n",
"9 114070.112591 3943.490565 ... 59.314602 -1.916804 \n",
"\n",
" mfcc17_var mfcc18_mean mfcc18_var mfcc19_mean mfcc19_var mfcc20_mean \\\n",
"0 36.524071 -0.408979 41.597103 -2.303523 55.062923 1.221291 \n",
"1 60.314529 0.295073 48.120598 -0.283518 51.106190 0.531217 \n",
"2 47.639427 -1.816407 52.382141 -3.439720 46.639660 -2.231258 \n",
"3 50.206673 0.636965 37.319130 -0.619121 37.259739 -3.407448 \n",
"4 75.269707 -0.916874 53.613918 -4.404827 62.910812 -11.703234 \n",
"5 68.702026 -2.783800 42.447453 -3.047909 39.808784 -8.109991 \n",
"6 56.902153 -6.971336 38.231800 -3.436505 48.235741 -6.483466 \n",
"7 52.090893 -10.784515 60.461330 -4.690678 65.547516 -8.630722 \n",
"8 28.894934 2.695248 36.889568 3.412305 33.698597 -2.715692 \n",
"9 58.418438 -2.292661 83.205231 2.881967 77.082222 -4.235203 \n",
"\n",
" mfcc20_var label \n",
"0 46.936035 blues \n",
"1 45.786282 blues \n",
"2 30.573025 blues \n",
"3 31.949339 blues \n",
"4 55.195160 blues \n",
"5 46.311005 blues \n",
"6 70.170364 blues \n",
"7 56.401436 blues \n",
"8 36.418430 blues \n",
"9 91.468811 blues \n",
"\n",
"[10 rows x 59 columns]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"Index(['genre', 'chroma_stft_mean', 'chroma_stft_var', 'rms_mean', 'rms_var',\n",
" 'spectral_centroid_mean', 'spectral_centroid_var',\n",
" 'spectral_bandwidth_mean', 'spectral_bandwidth_var', 'rolloff_mean',\n",
" 'rolloff_var', 'zero_crossing_rate_mean', 'zero_crossing_rate_var',\n",
" 'harmony_mean', 'harmony_var', 'perceptr_mean', 'perceptr_var', 'tempo',\n",
" 'mfcc1_mean', 'mfcc1_var', 'mfcc2_mean', 'mfcc2_var', 'mfcc3_mean',\n",
" 'mfcc3_var', 'mfcc4_mean', 'mfcc4_var', 'mfcc5_mean', 'mfcc5_var',\n",
" 'mfcc6_mean', 'mfcc6_var', 'mfcc7_mean', 'mfcc7_var', 'mfcc8_mean',\n",
" 'mfcc8_var', 'mfcc9_mean', 'mfcc9_var', 'mfcc10_mean', 'mfcc10_var',\n",
" 'mfcc11_mean', 'mfcc11_var', 'mfcc12_mean', 'mfcc12_var', 'mfcc13_mean',\n",
" 'mfcc13_var', 'mfcc14_mean', 'mfcc14_var', 'mfcc15_mean', 'mfcc15_var',\n",
" 'mfcc16_mean', 'mfcc16_var', 'mfcc17_mean', 'mfcc17_var', 'mfcc18_mean',\n",
" 'mfcc18_var', 'mfcc19_mean', 'mfcc19_var', 'mfcc20_mean', 'mfcc20_var',\n",
" 'label'],\n",
" dtype='object')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This cell does one of two things:\n",
"# 1) if music_genre.csv already exists, it is loaded;\n",
"# 2) otherwise the raw data are preprocessed: genre names are mapped to\n",
"#    numeric codes (new `genre` column), the `filename` and `length`\n",
"#    columns are dropped, and the result is cached in music_genre.csv\n",
"\n",
"if os.path.isfile(filename):\n",
" print(\"Loading prepared data...\")\n",
" data = pd.read_csv(filename)\n",
"else:\n",
" print(\"Preparing data...\")\n",
" data = pd.read_csv('music_genre_raw.csv')\n",
" column = data[\"label\"].apply(lambda x: genre_dict[x])\n",
" data.insert(0, 'genre', column.astype(int))\n",
" data = data.drop(columns=['filename', 'length'])\n",
" data.to_csv(filename, index=False)\n",
"display(data.head(10))\n",
"\n",
"data.columns"
]
},
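The load-or-prepare logic in the cell above can also be packaged as a reusable function. This is an assumption-based sketch, not the authors' code. The helper name `load_or_prepare` is hypothetical, the genre mapping is shortened, and two toy rows in a temporary directory stand in for `music_genre_raw.csv`.

```python
import os
import tempfile

import pandas as pd

genre_dict = {"blues": 1, "rock": 10}  # shortened mapping, for illustration only


def load_or_prepare(raw_path: str, cache_path: str) -> pd.DataFrame:
    """Load the cached CSV if it exists, otherwise build it and cache it."""
    if os.path.isfile(cache_path):
        return pd.read_csv(cache_path)
    data = pd.read_csv(raw_path)
    # numeric target as the first column, as in the notebook
    data.insert(0, "genre", data["label"].map(genre_dict).astype(int))
    data = data.drop(columns=["filename", "length"])
    data.to_csv(cache_path, index=False)
    return data


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "music_genre_raw.csv")
    cache = os.path.join(tmp, "music_genre.csv")
    # toy stand-in for the raw Kaggle CSV
    pd.DataFrame({
        "filename": ["a.wav", "b.wav"],
        "length": [661794, 661794],
        "tempo": [120.0, 95.7],
        "label": ["blues", "rock"],
    }).to_csv(raw, index=False)
    first = load_or_prepare(raw, cache)   # builds and writes the cache
    second = load_or_prepare(raw, cache)  # reads the cache back
    print(list(second.columns))  # ['genre', 'tempo', 'label']
```

The cache only skips work when the raw file has not changed. Deleting `music_genre.csv` forces the preprocessing to run again.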
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Splitting the data into training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>chroma_stft_mean</th>\n",
" <th>chroma_stft_var</th>\n",
" <th>rms_mean</th>\n",
" <th>rms_var</th>\n",
" <th>spectral_centroid_mean</th>\n",
" <th>spectral_centroid_var</th>\n",
" <th>spectral_bandwidth_mean</th>\n",
" <th>spectral_bandwidth_var</th>\n",
" <th>rolloff_mean</th>\n",
" <th>rolloff_var</th>\n",
" <th>...</th>\n",
" <th>mfcc16_var</th>\n",
" <th>mfcc17_mean</th>\n",
" <th>mfcc17_var</th>\n",
" <th>mfcc18_mean</th>\n",
" <th>mfcc18_var</th>\n",
" <th>mfcc19_mean</th>\n",
" <th>mfcc19_var</th>\n",
" <th>mfcc20_mean</th>\n",
" <th>mfcc20_var</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>687</th>\n",
" <td>0.516547</td>\n",
" <td>0.072241</td>\n",
" <td>0.267380</td>\n",
" <td>0.001175</td>\n",
" <td>3338.581900</td>\n",
" <td>172002.893292</td>\n",
" <td>2697.128636</td>\n",
" <td>45771.294278</td>\n",
" <td>6670.863091</td>\n",
" <td>3.556853e+05</td>\n",
" <td>...</td>\n",
" <td>37.339474</td>\n",
" <td>-8.121326</td>\n",
" <td>33.968277</td>\n",
" <td>4.910113</td>\n",
" <td>42.063385</td>\n",
" <td>-2.474697</td>\n",
" <td>35.162354</td>\n",
" <td>3.192656</td>\n",
" <td>36.478157</td>\n",
" <td>metal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>500</th>\n",
" <td>0.344511</td>\n",
" <td>0.085002</td>\n",
" <td>0.046747</td>\n",
" <td>0.001542</td>\n",
" <td>1503.869486</td>\n",
" <td>554576.511533</td>\n",
" <td>1754.216082</td>\n",
" <td>283554.933422</td>\n",
" <td>2799.283099</td>\n",
" <td>2.685679e+06</td>\n",
" <td>...</td>\n",
" <td>50.311016</td>\n",
" <td>-1.503434</td>\n",
" <td>41.141155</td>\n",
" <td>0.221949</td>\n",
" <td>55.707256</td>\n",
" <td>-1.991485</td>\n",
" <td>50.006485</td>\n",
" <td>-3.353825</td>\n",
" <td>49.906403</td>\n",
" <td>jazz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>332</th>\n",
" <td>0.368345</td>\n",
" <td>0.090390</td>\n",
" <td>0.111073</td>\n",
" <td>0.004402</td>\n",
" <td>2446.919077</td>\n",
" <td>490397.099115</td>\n",
" <td>2449.159840</td>\n",
" <td>215375.540632</td>\n",
" <td>4958.057490</td>\n",
" <td>2.650020e+06</td>\n",
" <td>...</td>\n",
" <td>78.892769</td>\n",
" <td>-1.054999</td>\n",
" <td>79.877068</td>\n",
" <td>4.496278</td>\n",
" <td>112.834435</td>\n",
" <td>-0.978958</td>\n",
" <td>75.059898</td>\n",
" <td>-5.256925</td>\n",
" <td>120.275269</td>\n",
" <td>disco</td>\n",
" </tr>\n",
" <tr>\n",
" <th>979</th>\n",
" <td>0.360042</td>\n",
" <td>0.083953</td>\n",
" <td>0.116724</td>\n",
" <td>0.000789</td>\n",
" <td>2148.410463</td>\n",
" <td>253618.158995</td>\n",
" <td>2107.165355</td>\n",
" <td>72155.551685</td>\n",
" <td>4479.264304</td>\n",
" <td>9.787046e+05</td>\n",
" <td>...</td>\n",
" <td>37.060532</td>\n",
" <td>-13.479134</td>\n",
" <td>50.848667</td>\n",
" <td>3.308529</td>\n",
" <td>47.726006</td>\n",
" <td>-3.704957</td>\n",
" <td>56.781952</td>\n",
" <td>1.085497</td>\n",
" <td>54.243389</td>\n",
" <td>rock</td>\n",
" </tr>\n",
" <tr>\n",
" <th>817</th>\n",
" <td>0.425788</td>\n",
" <td>0.091852</td>\n",
" <td>0.139799</td>\n",
" <td>0.003601</td>\n",
" <td>1803.774378</td>\n",
" <td>659241.158049</td>\n",
" <td>1973.418903</td>\n",
" <td>201432.199120</td>\n",
" <td>3777.969679</td>\n",
" <td>2.632339e+06</td>\n",
" <td>...</td>\n",
" <td>64.068756</td>\n",
" <td>-2.219202</td>\n",
" <td>99.249870</td>\n",
" <td>5.304260</td>\n",
" <td>64.088127</td>\n",
" <td>-6.597187</td>\n",
" <td>62.661850</td>\n",
" <td>-2.923168</td>\n",
" <td>67.490440</td>\n",
" <td>reggae</td>\n",
" </tr>\n",
" <tr>\n",
" <th>620</th>\n",
" <td>0.495959</td>\n",
" <td>0.072854</td>\n",
" <td>0.117362</td>\n",
" <td>0.000867</td>\n",
" <td>2657.912854</td>\n",
" <td>189139.438926</td>\n",
" <td>2345.662472</td>\n",
" <td>32730.579626</td>\n",
" <td>5358.261979</td>\n",
" <td>5.918222e+05</td>\n",
" <td>...</td>\n",
" <td>27.937113</td>\n",
" <td>-10.676390</td>\n",
" <td>26.519361</td>\n",
" <td>3.875155</td>\n",
" <td>25.613684</td>\n",
" <td>-4.943561</td>\n",
" <td>24.334734</td>\n",
" <td>3.255899</td>\n",
" <td>25.199259</td>\n",
" <td>metal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>814</th>\n",
" <td>0.395137</td>\n",
" <td>0.093939</td>\n",
" <td>0.114246</td>\n",
" <td>0.004025</td>\n",
" <td>1716.249594</td>\n",
" <td>920189.339374</td>\n",
" <td>2062.885827</td>\n",
" <td>358557.016423</td>\n",
" <td>3790.901258</td>\n",
" <td>4.734865e+06</td>\n",
" <td>...</td>\n",
" <td>66.090370</td>\n",
" <td>-4.590122</td>\n",
" <td>72.595345</td>\n",
" <td>4.261040</td>\n",
" <td>63.185764</td>\n",
" <td>-2.127876</td>\n",
" <td>50.693245</td>\n",
" <td>-3.665569</td>\n",
" <td>89.750290</td>\n",
" <td>reggae</td>\n",
" </tr>\n",
" <tr>\n",
" <th>516</th>\n",
" <td>0.249535</td>\n",
" <td>0.087563</td>\n",
" <td>0.060560</td>\n",
" <td>0.001276</td>\n",
" <td>1465.857446</td>\n",
" <td>143302.098295</td>\n",
" <td>1738.858902</td>\n",
" <td>58868.399307</td>\n",
" <td>2822.406728</td>\n",
" <td>7.392007e+05</td>\n",
" <td>...</td>\n",
" <td>109.811813</td>\n",
" <td>-0.027696</td>\n",
" <td>113.660950</td>\n",
" <td>2.098475</td>\n",
" <td>160.025497</td>\n",
" <td>1.109709</td>\n",
" <td>136.810165</td>\n",
" <td>2.935807</td>\n",
" <td>95.914490</td>\n",
" <td>jazz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>518</th>\n",
" <td>0.353474</td>\n",
" <td>0.087755</td>\n",
" <td>0.052264</td>\n",
" <td>0.000316</td>\n",
" <td>1993.352766</td>\n",
" <td>64753.479332</td>\n",
" <td>2127.165109</td>\n",
" <td>36027.039069</td>\n",
" <td>4248.194549</td>\n",
" <td>3.987029e+05</td>\n",
" <td>...</td>\n",
" <td>57.230133</td>\n",
" <td>-1.110214</td>\n",
" <td>48.080849</td>\n",
" <td>-0.784249</td>\n",
" <td>57.033504</td>\n",
" <td>-2.984207</td>\n",
" <td>55.737625</td>\n",
" <td>0.350456</td>\n",
" <td>64.126846</td>\n",
" <td>jazz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>940</th>\n",
" <td>0.416089</td>\n",
" <td>0.087772</td>\n",
" <td>0.142935</td>\n",
" <td>0.003150</td>\n",
" <td>3009.958707</td>\n",
" <td>435134.775688</td>\n",
" <td>2778.049758</td>\n",
" <td>135548.871316</td>\n",
" <td>6131.200719</td>\n",
" <td>1.788624e+06</td>\n",
" <td>...</td>\n",
" <td>42.315434</td>\n",
" <td>-3.953057</td>\n",
" <td>48.761936</td>\n",
" <td>-3.092345</td>\n",
" <td>49.514446</td>\n",
" <td>-2.731183</td>\n",
" <td>58.219994</td>\n",
" <td>-0.909785</td>\n",
" <td>63.111858</td>\n",
" <td>rock</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 58 columns</p>\n",
"</div>"
],
"text/plain": [
" chroma_stft_mean chroma_stft_var rms_mean rms_var \\\n",
"687 0.516547 0.072241 0.267380 0.001175 \n",
"500 0.344511 0.085002 0.046747 0.001542 \n",
"332 0.368345 0.090390 0.111073 0.004402 \n",
"979 0.360042 0.083953 0.116724 0.000789 \n",
"817 0.425788 0.091852 0.139799 0.003601 \n",
"620 0.495959 0.072854 0.117362 0.000867 \n",
"814 0.395137 0.093939 0.114246 0.004025 \n",
"516 0.249535 0.087563 0.060560 0.001276 \n",
"518 0.353474 0.087755 0.052264 0.000316 \n",
"940 0.416089 0.087772 0.142935 0.003150 \n",
"\n",
" spectral_centroid_mean spectral_centroid_var spectral_bandwidth_mean \\\n",
"687 3338.581900 172002.893292 2697.128636 \n",
"500 1503.869486 554576.511533 1754.216082 \n",
"332 2446.919077 490397.099115 2449.159840 \n",
"979 2148.410463 253618.158995 2107.165355 \n",
"817 1803.774378 659241.158049 1973.418903 \n",
"620 2657.912854 189139.438926 2345.662472 \n",
"814 1716.249594 920189.339374 2062.885827 \n",
"516 1465.857446 143302.098295 1738.858902 \n",
"518 1993.352766 64753.479332 2127.165109 \n",
"940 3009.958707 435134.775688 2778.049758 \n",
"\n",
" spectral_bandwidth_var rolloff_mean rolloff_var ... mfcc16_var \\\n",
"687 45771.294278 6670.863091 3.556853e+05 ... 37.339474 \n",
"500 283554.933422 2799.283099 2.685679e+06 ... 50.311016 \n",
"332 215375.540632 4958.057490 2.650020e+06 ... 78.892769 \n",
"979 72155.551685 4479.264304 9.787046e+05 ... 37.060532 \n",
"817 201432.199120 3777.969679 2.632339e+06 ... 64.068756 \n",
"620 32730.579626 5358.261979 5.918222e+05 ... 27.937113 \n",
"814 358557.016423 3790.901258 4.734865e+06 ... 66.090370 \n",
"516 58868.399307 2822.406728 7.392007e+05 ... 109.811813 \n",
"518 36027.039069 4248.194549 3.987029e+05 ... 57.230133 \n",
"940 135548.871316 6131.200719 1.788624e+06 ... 42.315434 \n",
"\n",
" mfcc17_mean mfcc17_var mfcc18_mean mfcc18_var mfcc19_mean \\\n",
"687 -8.121326 33.968277 4.910113 42.063385 -2.474697 \n",
"500 -1.503434 41.141155 0.221949 55.707256 -1.991485 \n",
"332 -1.054999 79.877068 4.496278 112.834435 -0.978958 \n",
"979 -13.479134 50.848667 3.308529 47.726006 -3.704957 \n",
"817 -2.219202 99.249870 5.304260 64.088127 -6.597187 \n",
"620 -10.676390 26.519361 3.875155 25.613684 -4.943561 \n",
"814 -4.590122 72.595345 4.261040 63.185764 -2.127876 \n",
"516 -0.027696 113.660950 2.098475 160.025497 1.109709 \n",
"518 -1.110214 48.080849 -0.784249 57.033504 -2.984207 \n",
"940 -3.953057 48.761936 -3.092345 49.514446 -2.731183 \n",
"\n",
" mfcc19_var mfcc20_mean mfcc20_var label \n",
"687 35.162354 3.192656 36.478157 metal \n",
"500 50.006485 -3.353825 49.906403 jazz \n",
"332 75.059898 -5.256925 120.275269 disco \n",
"979 56.781952 1.085497 54.243389 rock \n",
"817 62.661850 -2.923168 67.490440 reggae \n",
"620 24.334734 3.255899 25.199259 metal \n",
"814 50.693245 -3.665569 89.750290 reggae \n",
"516 136.810165 2.935807 95.914490 jazz \n",
"518 55.737625 0.350456 64.126846 jazz \n",
"940 58.219994 -0.909785 63.111858 rock \n",
"\n",
"[10 rows x 58 columns]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# The data are split 80:20: 80% go to the training set and 20% to the\n",
"# test set, a common convention in machine learning\n",
"\n",
"# X holds the 57 numeric features describing each track; note that it still\n",
"# contains the text `label` column, which duplicates the target and should\n",
"# be removed before fitting a model\n",
"X = data.drop([\"genre\"], axis=1)\n",
"# Y holds the genre column, encoded as integers from 1 to 10\n",
"Y = data[\"genre\"]\n",
"X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20, random_state = 0)\n",
"display(X_train.head(10))"
]
},
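The table above shows that `X_train` still carries the `label` column, so the genre leaks into the features as text. The following is a hedged sketch of a helper (the name `split_features_target` and its defaults are hypothetical) that drops both target columns and checks that only numeric features remain. It runs on a tiny toy frame rather than the real data.

```python
import pandas as pd


def split_features_target(data: pd.DataFrame, target: str = "genre",
                          drop: tuple = ("label",)):
    """Return (X, y) with the target and any target-derived columns removed."""
    y = data[target]
    X = data.drop(columns=[target, *drop])
    # sanity check: every remaining feature column should be numeric
    non_numeric = X.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        raise ValueError(f"Non-numeric feature columns: {non_numeric}")
    return X, y


# toy frame with the same layout as `data` (target first, text label last)
toy = pd.DataFrame({
    "genre": [1, 10],
    "tempo": [120.0, 95.7],
    "rms_mean": [0.13, 0.09],
    "label": ["blues", "rock"],
})
X, y = split_features_target(toy)
print(X.columns.tolist())  # ['tempo', 'rms_mean']
```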
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Number of records per genre in the training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"blues\ttest: 15\ttrain: 85\tall: 100\n",
"classical\ttest: 11\ttrain: 89\tall: 100\n",
"country\ttest: 27\ttrain: 73\tall: 100\n",
"disco\ttest: 22\ttrain: 78\tall: 100\n",
"hiphop\ttest: 23\ttrain: 77\tall: 100\n",
"jazz\ttest: 18\ttrain: 82\tall: 100\n",
"metal\ttest: 20\ttrain: 80\tall: 100\n",
"pop\ttest: 24\ttrain: 76\tall: 100\n",
"reggae\ttest: 15\ttrain: 85\tall: 100\n",
"rock\ttest: 25\ttrain: 75\tall: 100\n"
]
}
],
"source": [
"# iterate over the genre dictionary and count how many tracks of each genre\n",
"# ended up in the training and test sets\n",
"\n",
"for key in genre_dict.keys():\n",
" count = len(data[data[\"genre\"]==genre_dict[key]])\n",
" count_train = len(X_train[Y_train==genre_dict[key]])\n",
" count_test = len(X_test[Y_test==genre_dict[key]])\n",
" print(f\"{key}\\ttest: {count_test}\\ttrain: {count_train}\\tall: {count}\")"
]
},
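The counts above show that a plain random split represents the genres unevenly in the test set, with 11 to 27 tracks per genre instead of 20. Passing `stratify=` to scikit-learn's `train_test_split` preserves the class proportions in both parts. In this sketch, random features in a synthetic table of 10 genres × 100 tracks stand in for the real data, so only the per-genre counts carry meaning.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the GTZAN table: 10 genres x 100 tracks, 3 random features
rng = np.random.default_rng(0)
y = np.repeat(np.arange(1, 11), 100)
X = rng.normal(size=(y.size, 3))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)

# With stratify=y every genre keeps the 80:20 ratio exactly
test_counts = np.bincount(y_test, minlength=11)[1:]
print(test_counts)  # [20 20 20 20 20 20 20 20 20 20]
```

On the real data the equivalent call would pass `stratify=Y`. Stratification matters here because test sets of only 100 to 200 tracks make per-genre accuracy very sensitive to how many tracks of each genre land in the test set.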
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Data visualization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boxplot of tempo by genre"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tempo is one of the most interesting and intuitive measurable properties of a track. The boxplot below shows its distribution for each of the 10 music genres listed earlier.\n",
"\n",
"The plot shows that reggae has the highest median tempo, followed by classical music and blues, while hip-hop and pop have the lowest medians.\n",
"\n",
"The widest spread of values occurs for classical, country and metal, although hip-hop and pop have the most outliers."
]
},
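The median ranking described above is read by eye from the boxplot. A minimal sketch of a numeric check, assuming a frame with `label` and `tempo` columns as in `data`, groups by genre and sorts the medians. The helper name `rank_by_median` is hypothetical, and the toy rows are not taken from the dataset.

```python
import pandas as pd


def rank_by_median(data: pd.DataFrame, value: str = "tempo",
                   group: str = "label") -> pd.Series:
    """Median of `value` per group, sorted from highest to lowest."""
    return data.groupby(group)[value].median().sort_values(ascending=False)


# toy rows for illustration only
toy = pd.DataFrame({
    "label": ["pop", "pop", "reggae", "reggae", "blues", "blues"],
    "tempo": [100.0, 110.0, 130.0, 140.0, 115.0, 125.0],
})
print(rank_by_median(toy).index.tolist())  # ['reggae', 'blues', 'pop']
```

The resulting index could also be passed as the `order=` argument of `seaborn.boxplot`, so the boxes appear sorted by median tempo.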
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA7cAAAI5CAYAAAB6qc0fAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAB3i0lEQVR4nO3dd3QUZd/G8WsTQigJ0juhuiAlIagUpUtRFGmKtIAUBVRUbBQRUUAeBekgCggaUHq1PSoCCmIQQSIdQkswhA4pkIRk3j/y7j4sJCEhZTLh+zmHc9id2Xt/MzuZnWvve2ZshmEYAgAAAADAwtzMLgAAAAAAgIwi3AIAAAAALI9wCwAAAACwPMItAAAAAMDyCLcAAAAAAMsj3AIAAAAALI9wCyBVM2bMUPXq1dP1b9WqVRl6z6CgIGdb169fz6QlsY6oqCg1bNhQ1atX16BBg5SQkHDb10RERMjPz0/Vq1fX8OHDs6HKnKVly5aqXr26li9fnuXvFRYW5tw+T5w4keXvl15Xr15VWFiY2WXA4i5duqSzZ8+aWkNAQICqV6+uKVOmpGn+u/27AwDhFsBtlClTRvXq1bvtvxIlSjhf4+HhYWLFqbsxrAcFBZldTrLmzp2rixcvqm7dupoyZYrc3d1v+5rp06fr2rVratKkicaNG5cNVSInWr9+vdq2batt27aZXQosbOHChWrTpo0OHz5sdikAkC55zC4AQM721FNP6amnnkp1nsOHD6t79+6SJF9fX7Vt2zY7SsuVzpw5oy+++EKVKlXSJ598ovz589/2NSEhIVq9erXq1Kmj6dOnK0+eu2/XvnDhQsXHx6tkyZJml2KqKVOmKCIiwuwyYHETJkwwu4Q74uvrq++++06S7sr9IADCLYAMunz5sl544QVFRkaqcOHCmjZtmvLmzWt2WSkqUqSIKleuLElpCo7ZrWTJkvr777/T9ZqqVatq3759WVOQRfj4+JhdAgCT5c+fX1WrVjW7DAAmItwCuGOGYWjYsGE6efKk3NzcNHHiRJUtW9bsslLVq1cv9erVy+wyAAAAkMkItwDu2Lx587Rx40ZJ0uDBg9W0adNk5ztx4oQWLVqkoKAg/fvvv7p69aq8vLxUvXp1PfHEE+rSpUuazit1OHDggBYsWKCgoCCdO3dOBQsWVO3atdW1a9dbhkSvWrVKI0aMSFO7X375pRo0aKCgoCD17t1bfn5+Wrx4sQIDA7VmzRqdOHFCHh4eqlWrlgICAtSqVatk24mIiNDChQu1efNmnTp1Sm5ubvLx8VHr1q3Vu3dvFSpU6JbXhISEaN68eQoODtapU6fk7u6uChUqqFmzZurdu7eKFSt2y2sMw9C6deu0Zs0aHTp0SJcvX1aJEiXUsGFDDRw4UJUqVUrTcs+YMUMzZ85U7969NWDAAE2ePFm//faboqOjVa5cObVr1059+vSRt7d3sq//559/9OWXX+rPP//UuXPnVKBAAVWvXl0dOnRQp06dbvlsAwICtH37di1YsEBeXl765JNPtHPnTsXGxqpy5crq06ePOnbsKElavny5vv76ax09elTu7u7y8/PTyy+/rLp167q02bJlS506dUrjxo3T008/7TItKipKX3zxhX766SedOHFChmGoQoUKat26tZ599tlkPw9J2rdvn+bPn68dO3bo4sWLqlixorp165bidl69evU0rG2pU6dOGj9+vFq0aKGIiAgNHz5cffv2TXbet99+WytWrFD37t01ZsyYFNt0fIYOo0aN0qhRo/TSSy9pyJAhzufPnTunzz//XJs2bXJum1WqVNHjjz+unj17ytPTM9l2Bw0apF69emnmzJnauHGjLl68qFKlSunxxx/Xiy++qLx58yooKEifffaZgoODFRsbq6pVq7p8lg7Dhw/X6tWrNWLECDVp0kRTpkzRn3/+qbi4OFWsWFGdOnVSt27dbqnF4c8//1RgYKB27typS5cuqVChQqpbt64CAg
LUqFGjW+Z3bBs//vijQkNDNW/ePO3Zs0eJiYmy2+0aNGiQmjdvruvXr2vhwoVas2aNTp48qfz58+vBBx/U0KFD090beObMGS1evFhbt27VyZMnFR0drYIFC6pKlSpq06aNevTooXz58qWrzYiICM2fP1+bN29WeHi47rnnHrVo0UIvvfSSJk+erNWrV2vChAnq3Lmzy+u2b9+u5cuXa9euXTp37pyuX7+uIkWKqG7duurRo4fLOnN8Ng6O7dLRrmN7qFevnr7++utbanTsOyXp4MGDt7Q7ZswYNWnSRLNmzdLWrVt14cIFFS1aVE2aNNHgwYNVvnz5NK2Lq1ev6vnnn9f27dtVunRp56kcN77/3r17bxmanJ7vDilr9lUAshbhFsAd2bFjh6ZOnSpJevjhh/XSSy8lO9/PP/+soUOHKi4uTgUKFFCFChVkGIbCwsIUFBTk/Pfxxx+n6X0XL16s8ePHKyEhQQUKFNC9996rS5cuacuWLdqyZYueeOIJffTRR85AVaxYMdWrVy/F9o4ePapLly7Jw8PD5aJYkhQfH6/nnntO27ZtU5EiRVS1alUdO3ZMf/zxh/744w+NGTPGea6xw7Zt2zRkyBBFRkbKw8ND1apV0/Xr13Xo0CEdOHBAK1as0KeffuoShHbt2qV+/fopJiZGhQoVUuXKlRUbG6tDhw5p//79Wr16tZYuXaoyZco4XxMdHa2XX35ZW7ZskSSVLVtWdrtdx44d06pVq/TDDz9o0aJFqlWrVprWq5R08PzUU0/pzJkzqlSpkooXL67Dhw9rxowZ+u677/T555+rdOnSLq+ZO3euJk+erMTEROcPFhcvXtT27du1fft2rV27VrNnz042GP/www9auXKl8ubNq0qVKunff//Vvn37NGzYMMXExGjnzp1av369ihUrpsqVK+vw4cPaunWr/vzzTy1fvlw1atS47TKFhIToueeec/nBIF++fDpy5IhmzZqlNWvWaO7cubeEl3Xr1mnkyJGKj4/XPffco3vvvVenTp3S+++/r/r16yf7XqltZ5cuXdLRo0clJX1W7u7u6tSpk+bMmaO1a9cmG26vXbumH374QZJuCSw3c1z4bc+ePc6QWKxYMZdt5q+//tILL7zg3N4rVaokwzC0d+9e7dmzR2vXrtW8efNu+TuQkn6g6tChgy5evKhq1arJ3d1doaGhmjNnjkJDQ/Xggw/qvffeU/78+VWpUiWFhYU5P8tr166pW7dut7R58OBBTZ8+XTExMbr33nt1/fp17d+/X/v379d///tfffrpp7dsN5MmTdLcuXMlSffcc4/sdrvOnDmjDRs2aMOGDRowYIDefPPNZNfRF198ocWLF6tQoUKqUKGCTpw4oV27dmnQoEGaMWOGAgMDFRQUpFKlSqly5co6dOiQfvrpJ/35559at26dSpUqlepn4PD333/rueee05UrV+Tp6SkfHx/lyZNHYWFh2rVrl3bt2qUNGzboyy+/TPOPenv37tWAAQN04cIFeXh4yG6369KlS1q6dKk2bNigChUqJPu6jz/+WJ999pkkqWjRoqpSpYqioqKcYf/HH3/U+++/r2eeeUaSVKlSJdWrV087d+6UJNntdnl5eSX749qd2LdvnyZNmqSYmBj5+PioYsWKOnLkiFasWKFffvlFq1atctlmkxMbG6vBgwdr+/btKleunL744osUl/9G6f3uuFF27KsAZBIDANLp/PnzRpMmTQy73W40bdrUOH/+fLLzXbp0yXjwwQcNu91uvPvuu0ZMTIxzWnR0tDF27FjDbrcbdrvdOHTokHPaH3/84Xw+Pj7e+fzmzZuN6tWrG7Vq1TK++OIL4/r1685pv//+u9GoUSPDbrcbU6ZMSdNy/PHHH0atWrUMu91urFixItn3r1u3rrFu3TrntCtXrhh9+vQx7Ha7Ub9+fZf6wsLCjLp16xp2u90YNGiQcfbsWee0kydPGs8884xht9uN5s2bG1euXHFOe/rppw
273W6MHTvWiI2NdXlNmzZtDLvdbrzzzjsutb/zzjuG3W43GjRoYGzdutWlvhdeeMGw2+1Gs2bNXNZRSqZPn+5c3gc
"text/plain": [
"<Figure size 1152x648 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"f, ax = plt.subplots(figsize=(16, 9))\n",
"\n",
"# one box per genre: median, interquartile range and outliers of the tempo feature\n",
"sns.boxplot(x = \"label\", y = \"tempo\", data = data[[\"label\", \"tempo\"]], palette = 'pastel', ax = ax)\n",
"\n",
"plt.title('Relationship between tempo and genre', fontsize = 25)\n",
"plt.xticks(fontsize = 14)\n",
"plt.yticks(fontsize = 10)\n",
"plt.xlabel(\"Genre\", fontsize = 15)\n",
"plt.ylabel(\"Tempo\", fontsize = 15);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boxplot of the mean mel-frequency cepstral coefficients (MFCC) per genre"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot of the MFCC means against music genre also produced interesting results. The MFCC means are the average values of the signal's mel-frequency cepstral coefficients, and the plot below shows the fourth coefficient (mfcc4_mean). \n",
"\n",
"Metal and blues have the highest mfcc4_mean values, and pop and classical music have the lowest. Reggae has the most outliers."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA7cAAAI5CAYAAAB6qc0fAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAABzHklEQVR4nO3dd3QUZd/G8WsTQgkJRaR3kKCUhCCCYuhNQJCiUgMoSLEgiAo2QGkqSEcBKWoApQuoKIIUA4IimEiH0EILHZJQEjbz/pE3+xBJWUiZnfD9nOOR7MzO/mZ2M5lr7zI2wzAMAQAAAABgYW5mFwAAAAAAQFoRbgEAAAAAlke4BQAAAABYHuEWAAAAAGB5hFsAAAAAgOURbgEAAAAAlpfN7AIA3L+mTJmiqVOn3tVzxowZo3bt2t3za27btk3dunWTJO3evVvZst1fp8GoqCg1btxYly5dUoMGDTRt2jS5u7un+JyIiAg1bdpUN27cUNu2bfXxxx9nUrVJGzJkiJYvX65WrVpp3LhxGf56DRs21MmTJzVy5Eg999xzGf56zkqoS5J8fHy0atWqVJ/z77//6tlnn3X8vGbNGpUuXdrxc8WKFe+qhr/++kt58uS543G73a61a9dq9erVCg0N1blz52Sz2VSoUCH5+/vr2WefVa1atZx6jeDgYP3www/auXOnzpw5I7vdrgcffFC+vr5q3bq1GjdufFc1ZyXXr1/XhQsXVKJECdNquNvfxxMnTqhRo0aS7vz8AUBa3V9XdQBcStGiRVW9evVU1wsPD9e5c+ckSR4eHhld1j27Pax/8803Tl+8Z6Yvv/xSly5dUrVq1TRhwoRUg60kTZ48WTdu3FCdOnU0cuTITKgSd+vAgQM6fPiwypUrl+J6P/30k1PbK1OmjB544IFU10vq83P48GENHDhQ+/btkyR5enqqTJkyunXrlk6cOKGVK1dq5cqVeuqppzRmzBh5enomue2zZ8/qrbfe0tatWyVJOXLkULFixeTh4aETJ07ol19+0S+//KKaNWtq4sSJKlCggFP7llWsWrVKY8eO1WuvveZSX7oAgJkItwBM8+yzzyZqRUrKwYMH1alTJ0mSr6+vmjVrlhmlZUlnz57V119/rTJlyuiLL75Qrly5Un1OWFiYli9frqpVq2ry5Mku0dL9xhtv6KWXXpK3t7fZpbiEbNmy6datW/r555/18ssvJ7ueYRj6+eefndpmnz597qmHxF9//aXevXvr2rVrqlKlivr376969eo5lt+4cUPfffedJk2apJ9//lmXLl3SnDlz7vhchYWFqWvXrrp48aLKlCmj/v3766mnnnKE6Vu3bmnFihX67LPP9Oeff6p79+767rvv5OXlddc1W9WECRMUERFhdhl3rXDhwo4vWYoVK2ZyNQCyGsbcAnBZV65c0csvv6zIyEjly5dPkyZNUvbs2c0uK1n58+dX2bJlVbZsWaeCY2YrVKiQ/vnnH/3yyy9OtcpJUvny5bVnzx4tWbIk2Ra2zFaoUCGVL19ehQoVMrsUl/D4449LUqrB9Z9//tGpU6dUqVKlDKnj0qVLGjRokK5du6aAgAAtWLAgUbCVpJw5c6pHjx6aNm2abDabtm3bpnnz5iVaJyYmRm+88YYuXryoSpUqaeHChWrZsmWiVuJs2bKpffv2+uqrr5QzZ04dPHhQEydOzJD9Qvry8PBQ+fLlVb58eZfuiQPAmgi3AFySYRgaPHiwjh8/Ljc3N40dO9blv+Xv2rWrfv75Z/3888/y9fU1uxzcJ+rVqydPT0/t379fR44cSXa9hNayFi1aZEgdEydOVEREhDw9PTV27FjlyJEj2XVr166t5s2bS5Lmzp2ruLg4x7KvvvpK+/btc/ze58uXL9nt+Pj4KDAwUJK0ePFiRUVFpc/OAAAsiXALwCXNmjVL69evlyT169dPdevWTXK9Y8eOadSoUWrdurVq1KihypUrq1atWurWrZsWLV
oku91+V6+7b98+DR48WPXr11eVKlVUq1Yt9ezZU7/88ssd6y5btkwVK1Z06r9t27ZJip/QqmLFinr++ecVGxurOXPmqHXr1vLz81ONGjXUvXt3rV27Ntn6IiIi9Mknn6hFixby8/OTv7+/nnnmGU2dOlVXr15N8jlhYWF655131LJlS1WrVk2PPvqo2rRpowkTJujChQtJPscwDK1YsUIvvPCCnnzySVWpUkUNGjTQO++8o6NHjzp9PKdMmaKKFStqwoQJOnfunIYNG6a6deuqatWqaty4sSZMmKCYmBjHsenZs6cee+wx+fr6qm3btvr+++/v2OaQIUNUsWJFvfnmm3css9vtWr58ubp166aaNWuqSpUqatiwoT744IMU6z579qw++eQTNWvWTL6+vqpfv74++eSTZMNSYGCgU+97w4YNJcV3pa5YsaL69u2bbA0rVqxQxYoV7zp85sqVy9FCmlzrbVxcnH7++Wd5enqqQYMGd7V9Z9y8eVMrV66UFD/cwJmeAa+88oomT56spUuXys3tf5cjixcvliQ1aNBADz30UKrbCQwM1NixY/XTTz/dVbfkq1evaubMmerSpYtq1aqlypUrq0aNGmrXrp2mTJmiK1euOL2tBJcvX9aUKVMcv2tPPPGEBg0apCNHjjh+F6ZMmXLH8/bu3asPPvhAzZs3V/Xq1VWlShXVrl1bL7300h3vacJ2EiYTe//99xNtN+G8lNw588SJE47P54kTJ+7Y7rhx43Tx4kWNHDlSDRs2dNQycOBA7d+/3+ljYbfbNWjQIFWsWFE1atTQP//8c8frHzt27I7nhYeHa/jw4WrSpImqVq2qGjVqqHPnzlq8eHGS5/OE88HixYt19OhRDRo0SE8++aR8fX3VokULzZ07V4ZhSIqfwKpLly6qXr26qlWrpo4dO2rjxo1O7xMA12f+4CkA+I/t27c7uhg++eSTevXVV5Ncb+3atRo4cKBiYmLk6empkiVLyjAMnThxQtu2bXP899lnnzn1uvPnz9eoUaNkt9vl6empChUq6PLlywoODlZwcLCefvppffrpp47ukQUKFEhxQqzDhw/r8uXL8vDwUMGCBRMti42N1UsvvaQ//vhD+fPnV/ny5XXkyBFt3bpVW7du1fDhwx1jjRP88ccfeu211xQZGSkPDw899NBDunXrlg4cOKB9+/ZpyZIlmjFjRqIZb3fu3KkXX3xR165dU548eVS2bFndvHlTBw4c0N69e7V8+XItXLhQRYsWdTwnOjpa/fv3V3BwsKT4cXE+Pj46cuSIli1bpp9//lnz5s1T5cqVnTquUvyXEM8884wuXbqkhx56SO7u7goPD9f06dMVHh6uxx57TB9++KFy5cqlMmXK6MSJE9qzZ48GDx6sGzduqGPHjqm+RnR0tF599VVt2bJFUvzYvhIlSujo0aNatGiRVq5cqbFjx6pp06aJnrdv3z716tVL586dk4eHh3x8fHTlyhXNmTNHv//+u65fv37Ha/n4+OjWrVtJ1mG32xUSEuI4dpLUvn17/fjjjwoODtbFixeTDH8rVqyQJLVt2zbVff2v5s2ba/Xq1fr555/Vr1+/O5Zv375dZ8+e1dNPP62cOXPe9fZTs3PnTl27dk1S/O+sMx566KE7wmt4eLiOHz9+V9spXLiwWrdufRfVSkePHlWPHj10+vRpZcuWTaVKlVLx4sV18uRJ7d69W7t379aPP/6opUuXKnfu3E5tMzw8XD179tSxY8fk7u6uChUq6ObNm/rhhx/022+/JXuuWLBggUaMGKG4uDjlzZtXpUuX1o0bN3TixAlt2rRJmzZtUt++fTVw4EBJ/5uIb9euXYqJiVHp0qVVoECBRL/DaXHq1Cm1adNGZ8+eVbFixVS+fHkdOHBAP/30k9avX6/58+en+rsfFxend955Rz/88IPy5s2r2bNnq2rVqqm+9q+//qo333xTN27cUM6cOVWuXDldv35df//9t/7++2/9+OOPmj
ZtWpLvyY4dOzRq1CjdunVL5cuXl81mU1hYmD7++GNdvHhRNptNM2bMUJ48eVSmTBkdOXJEO3fuVJ8+fTRz5sxkvww
"text/plain": [
"<Figure size 1152x648 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"f, ax = plt.subplots(figsize=(16, 9))\n",
"# one box per genre for the mean of the 4th MFCC\n",
"sns.boxplot(x = \"label\", y = \"mfcc4_mean\", data = data[[\"label\", \"mfcc4_mean\"]], palette = 'pastel', ax = ax)\n",
"\n",
"plt.title('Relationship between mfcc4_mean and genre', fontsize = 25)\n",
"plt.xticks(fontsize = 14)\n",
"plt.yticks(fontsize = 10)\n",
"plt.xlabel(\"Genre\", fontsize = 15)\n",
"plt.ylabel(\"mfcc4_mean\", fontsize = 15);"
]
},
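{
"cell_type": "markdown",
"metadata": {},
"source": [
"The claim that reggae has the most outliers can also be checked numerically. The cell below is a minimal sketch that assumes the rule seaborn's `boxplot` applies by default (`whis=1.5`). Under that rule, a point counts as an outlier when it lies more than 1.5 * IQR below the first quartile or above the third quartile. The helper `count_boxplot_outliers` and the toy DataFrame are hypothetical. On the notebook's data the call would be `count_boxplot_outliers(data, 'mfcc4_mean')`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def count_boxplot_outliers(df, value_col, group_col='label'):\n",
"    \"\"\"Count the points outside the 1.5*IQR whiskers of each group's boxplot.\"\"\"\n",
"    def _count(values):\n",
"        q1, q3 = values.quantile(0.25), values.quantile(0.75)\n",
"        iqr = q3 - q1\n",
"        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr\n",
"        return int(((values < lower) | (values > upper)).sum())\n",
"    # one outlier count per group, largest first\n",
"    return df.groupby(group_col)[value_col].apply(_count).sort_values(ascending=False)\n",
"\n",
"\n",
"# hypothetical toy data standing in for data[['label', 'mfcc4_mean']]\n",
"toy = pd.DataFrame({\n",
"    'label': ['pop'] * 5 + ['reggae'] * 5,\n",
"    'mfcc4_mean': [1.0, 1.1, 1.0, 1.05, 1.05, 2.0, 2.1, 1.9, 2.0, 9.0],\n",
"})\n",
"print(count_boxplot_outliers(toy, 'mfcc4_mean'))"
]
},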
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Correlation between the mean features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We used a heatmap to examine the relationships between the available features. It shows that many features are not correlated with each other. This is especially visible for the mean of the second MFCC (mfcc2_mean). Where correlations do occur, they are relatively weak. The correlations that exist appear mainly in the upper and middle parts of the map."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA00AAAL4CAYAAACncxxeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAADocklEQVR4nOzdeVxV1f7/8RcoEOjBKVQcUsBA1HAsLZMG05tXyaI0wTStSCustMEMLNK0q+ZQGOQ8IGpws7pOpahXzdKiuqGZIc6mAgrGoIzn/P7wx/l6BI5oh5B4Px8PHo/NOvt81mctDsXHtffadiaTyYSIiIiIiIiUyb6qExAREREREbmRqWgSERERERGxQkWTiIiIiIiIFSqaRERERERErFDRJCIiIiIiYoWKJhEREREREStUNIlItfbLL7/w1ltv8eCDD9KxY0e6dOnCkCFDiI2NpaioqKrTs2rYsGH4+PiQlZV1Xe8vLi5mxYoVXLhwwdwWGRmJj48PCQkJtkrzmvn4+HD//fdbPeeNN97Ax8eHPXv2/CU5HTlyhI0bN/4lff0VoqOjGThwIEajsapTqRKzZ8/m8ccfr7HjF5G/noomEamWjEYjH3zwAY8++iifffYZXl5eBAcH889//pMzZ84wadIkRo4cSV5eXlWnWmleeeUVJk+ebFEc3nHHHYSGhuLh4VGFmd1YDhw4QEBAAD/++GNVp2IThw4dIioqivHjx2NvXzP/Nx4SEsKJEydYsWJFVaciIjVE7apOQETkenz88cdERUXRqVMnPvzwQ5o0aWJ+raCggDfffJO1a9fyxhtvMGfOnKpLtBKdO3euVFv37t3p3r17FWRz4/rjjz8oLCys6jRsJiIigo4dO3LXXXdVdSpVpm7dujz77LPMnj2bfv364ebmVtUpicjfXM38JyoRqdaOHDlCVFQUDRs2ZMGCBRYFE4CjoyPvvfcezZs358svv+TQoUNVlKmIbf3888989913DBs2rKpTqXKPPfYYJpOJmJiYqk5FRGoAFU0iUu18/vnnFBYWMnToUFxdXcs8x8HBgYkTJzJ16lQaNGhg8dqGDRsYMmQInTp1onPnzgwZMoT169dbnHPy5El8fHz44IMPePfdd+nUqRPdu3dn48aN5vuGvv32WwYNGkSHDh34xz/+QW5uLgDp6elERETg7+9Phw4duP/++5kxYwY5OTlXHVthYSHLli1j8ODBdO3alQ4dOnDffffx1ltvkZGRYT7Px8eH7777DoDbb7/d/Ed0efc07dq1i5EjR9KlSxf8/Px45JFHiI2NLXVPyP3338+wYcM4dOgQo0ePpmvXrnTu3JmQkBAOHDhw1fz/rGuZu+TkZF577TXuueceOnToYL6f7auvvjKfExkZyfDhwwFYvny5xX1UPj4+hIWF8d133xEcHEzHjh25++67mTVrFsXFxaSkpPD000/TuXNnevXqxeTJk7l48aJFDrm5uXz00UcMHDiQzp07c9ttt9G3b1+mT59uca9Zyedp9uzZbNy4kf79++Pn58c//vEPFi9eXOF7cxYvXkzdunVL3TM2bNgw+vTpw++//85LL71Et27d6NatGy+++CIZGRlkZWUxceJEunfvzh133MHo0aM5efJkqfi//PILzz//PN27d8fPz4+BAweyatUqTCZTqXO3bdvGM888Q48ePWjfvj09evTg+eef59dffy2V2/3338+ZM2d45ZVX6N69Ox07dmTo0KGl7mkrKipi7ty5BAQE0LFjR+644w6efvppvv3221L9161bl3vvvZfVq1dbzLWISGXQ5XkiUu3s3LkTgF69elk977777ivVNm3aNBYvXoybmxsDBgwA4L///S/jxo1j//79vPbaaxbnx8XFARAUFMThw4fp1KkTKSkpALz66qt4enoybNgwcnNzqVOnDqdOnSIoKIjU1FTuu+8+vLy8+PXXX1m4cCHffPMNsbGxuLi4lJvzK6+8wldffU
XXrl0ZPHgwBQUFfP3113zyySf88ssvfPrppwCEhoby2Wef8fvvvxMSEoKnp2e5MWNiYnj33XcxGAz06dMHFxcXdu7cyaRJk0hMTGTWrFnY2dmZzz99+jRBQUG0atWKwYMHc+TIEbZt28bPP//M1q1bqVu3rtV5v17XMndJSUkMGzYMR0dH+vbtS8OGDTl27BhbtmzhxRdf5OOPP+a+++7jjjvu4JFHHuGzzz6jY8eO9OrVi+bNm5v7/Pnnn/niiy+49957CQoKYtOmTcybN49z586xadMmOnToQFBQEDt27GDFihXUqlWLN998E7j0B/7IkSNJSkri7rvv5u677yY3N5etW7eyaNEiTp48yYcffmgxxp07dzJv3jzuvfdeevbsyfbt25k2bRrJycn861//sjo/eXl5bNmyBX9/fxwcHEq9npOTQ1BQEE2bNmXw4MH88MMPfPXVV2RmZnLhwgXy8/N55JFHOHjwINu2bSMtLY1PP/3U/LPfvn07oaGhODg4mOd0586dREREsH//fiZPnmzua8WKFUyePJlbbrmFAQMG4ODgwN69e9myZQu7d+/myy+/pHHjxubzc3NzCQ4OxtnZmYcffpizZ8+yYcMGnn76aTZs2MAtt9wCwOTJk1m9ejV33HEH/v7+ZGdnm89bsmRJqUtP7777bjZu3MjXX39N3759r/oZExG5biYRkWrmzjvvNHl7e5vOnz9/Te/7/vvvTd7e3qaHH37YdO7cOXP7uXPnTAMGDDB5e3ubvvvuO5PJZDKdOHHC5O3tbfLx8TH9+uuvFnE+/PBDk7e3t+nRRx81FRcXW7wWEhJi8vHxMW3dutWifdmyZSZvb2/TtGnTzG1PPPGEydvb2/THH3+YTCaT6aeffjJ5e3ubXnnlFYv3FhYWmvM7fPhwue+/PLfNmzebTCaT6fjx46Z27dqZ7r33XtPx48fN5+Xm5pqGDx9u8vb2Nn322Wfm9vvuu8/k7e1teuedd0xGo9HcHh4ebvL29jbFx8dbmeFLvL29TV27djV9+OGH5X499NBDJm9vb9Pu3buva+6eeuopU7t27UwpKSkW565fv97k7e1tGjdunLlt9+7dJm9vb9O7775bKk9vb2/TkiVLzG2HDh0yt//rX/8yt2dnZ5u6dOliuvPOO81t69atM3l7e5tmzZplETc7O9t01113mXx9fU0XLlwwmUz/93ny9vY2LVy40Hxubm6uafDgwaXmoizffPONydvb2/TRRx+Veq3ks/DCCy+Yf26FhYWme++91+Tt7W16/PHHTfn5+aXOL5m/CxcumHr06GHq0aOH6cSJE+bziouLTWPGjDF5e3ub/vvf/5pMJpMpPz/f1KVLF1Pfvn1Nubm5Fnm8/fbbJm9vb9Pq1atL9fXcc8+ZCgoKzO3R0dEmb29v05w5c8zz1rZtW9PQoUMtYiYlJZm8vb1NY8aMKTXuX3/91eTt7W2aPHmy1bkTEfmzdHmeiFQ7JVt016lT55ret2bNGgBef/11GjZsaG5v2LAhr7zyCoB5JadEq1ataNu2bZnx+vTpY7F7WVpaGjt27OCee+4ptcr1xBNP4O7ubs6hLE2bNuVf//oXL730kkV77dq16dq1K1D25g/W/Oc//6GoqIgXXniBli1bmttdXFwIDw8HSo8ZLu1Odvnq0z333APA0aNHK9RvdnY2c+fOLffrykv9rnXuRowYwYwZM/Dy8rI4t2QloqLz5OjoSHBwsPl7T09P8+WcTz31lLm9bt26eHl5ce7cOfOOjO3atePdd99lxIgRFjHr1q1Lu3btKC4u5o8//rB4rXnz5jz55JPm711cXHj55ZcBWLt2rdVcf/nlFwDatGlT7jnDhw83/9xq167NbbfdBmBelSvRsWNHAPMlelu3biUjI4NnnnmGFi1amM+zt7cv9btRXFzM5MmTmTJlSqlV0zvuuAMoe/6feuopixWyKz9TRqMRk8nEqVOnOH36tPm82267jYSEBGbOnF
kqpqenJ/b29uzbt6/cORERsQVdnici1U79+vVJT08nKyvLovi5mgMHDmBvb28uQC5X0nblH/OX/wF5pcsv8wLYv38
"text/plain": [
"<Figure size 1152x792 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mean_cols = [col for col in data.columns if 'mean' in col]\n",
"mean_correlation = data[mean_cols].corr()\n",
"\n",
"\n",
"mask = np.triu(np.ones_like(mean_correlation, dtype=bool))\n",
"f, ax = plt.subplots(figsize=(16, 11))\n",
"cmap = sns.diverging_palette(150, 275, as_cmap=True, s = 90, l = 45, n = 5)\n",
"\n",
"sns.heatmap(mean_correlation, mask=mask, cmap=cmap, vmax=.3, center=0,\n",
" square=True, linewidths=.5)\n",
"\n",
"plt.title('Correlation Heatmap (means)', fontsize = 20)\n",
"plt.xticks(fontsize = 10)\n",
"plt.yticks(fontsize = 10);"
]
},
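{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading the strongest correlations off the heatmap by eye is imprecise, so they can also be listed directly. The cell below is a minimal sketch that assumes a purely numeric DataFrame such as `data[mean_cols]`. It keeps the strict upper triangle of the correlation matrix, which is the mirror image of the `mask` used above. Each pair therefore appears once, and the diagonal of 1.0 values is dropped. The helper `top_correlated_pairs` and the toy data are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def top_correlated_pairs(df, n=5):\n",
"    \"\"\"Return the n feature pairs with the largest absolute Pearson correlation.\"\"\"\n",
"    corr = df.corr()\n",
"    # True only above the diagonal: every pair once, self-correlations excluded\n",
"    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)\n",
"    pairs = corr.where(upper).stack().dropna()  # (feature_a, feature_b) -> r\n",
"    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)\n",
"\n",
"\n",
"# hypothetical toy data: b_mean is a noisy copy of a_mean, c_mean is independent\n",
"rng = np.random.default_rng(0)\n",
"a = rng.normal(size=200)\n",
"toy = pd.DataFrame({\n",
"    'a_mean': a,\n",
"    'b_mean': a + 0.1 * rng.normal(size=200),\n",
"    'c_mean': rng.normal(size=200),\n",
"})\n",
"print(top_correlated_pairs(toy, n=2))"
]
},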
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Correlation between the variance features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The heatmap of the variance features shows the opposite picture. Only two features, harmony and perceptr in the middle of the plot, show no correlation. Relatively high correlations, on the other hand, appear for the \"outermost\" features, i.e. the first and last ones in the feature list.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAALtCAYAAAASQUQyAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAADaBUlEQVR4nOzdeViV1drH8e8GwRidDpKJJVIMZqBoYZo0WJqmqagkmFNlNDiX2VFTnMuJHNKTsyhZkmgRaippZINlg2hk4ZRpBqgYODDv9w9hv+EGRduIyO9zXVwXe+313OteCzyHu/U8axuMRqMRERERERERwaqiExAREREREblRqEASEREREREppAJJRERERESkkAokERERERGRQiqQRERERERECqlAEhERERERKaQCSUQqtZ9//plx48bx+OOP4+fnh7+/P7169SIqKoq8vLyKTu+y+vTpg5eXFxkZGdd0fX5+PqtXr+b8+fOmtnnz5uHl5cW2bdssleZV8/Ly4pFHHrlsn9dffx0vLy927dp1XXI6fPgwmzZtui5jXQ8LFy6kS5cuFBQUVHQqpcrJyeHRRx/lvffeq+hURESuigokEamUCgoKmDNnDt27d2f9+vV4eHgQGhpKx44d+euvv5g4cSIDBgwgKyurolMtN6+88gqTJk0qVgjed999DBo0CHd39wrM7Mayf/9+OnfuzA8//FDRqVjEwYMHWbBgAaNGjcLK6sb9v3FbW1tGjBjBzJkz+euvvyo6HRGRMqtW0QmIiFyL//3vfyxYsICmTZsyd+5cXF1dTe/l5OQwevRoYmNjef3113n77bcrLtFydOrUKbO2gIAAAgICKiCbG9fff/9Nbm5uRadhMeHh4fj5+dGqVauKTuWKOnTowKJFi5g6dSpz586t6HRERMrkxv1PTyIipTh8+DALFiygdu3aLF68uFhxBBf/y/W0adOoX78+mzdv5uDBgxWUqYhl7dmzh2+//ZY+ffpUdCplYjAY6N27N1u2bOHQoUMVnY6ISJmoQBKRSmfDhg3k5ubSu3dvnJ2dS+xjY2PDG2+8wdSpU6lVq1ax9zZu3EivXr1o2rQpzZo1o1evXsTFxRXrc+zYMby8vJgzZw6TJ0+madOmBAQEsGnTJtNzPl9//TU9e/akSZMmtG/fnnPnzgGQlpZGeHg4gYGBNGnShEceeYQZM2Zw9uzZK84tNzeXlStXEhwcTPPmzWnSpAkPP/ww48aN4/Tp06Z+Xl5efPvttwDce++9pj+YS3sG6csvv2TAgAH4+/vj6+tLt27diIqKMnuG5ZFHHqFPnz4cPHiQF154gebNm9OsWTMGDhzI/v37r5j/v3U1a/fbb78xcuRIHnzwQZo0aWJ6/uzTTz819Zk3bx59+/YFIDIysthzT15eXowZM4Zvv/2W0NBQ/Pz8eOCBB5g9ezb5+fkcOHCAZ599lmbNmtGmTRsmTZrEhQsXiuVw7tw53nnnHbp06UKzZs245557aNeuHdOnTy/2bFjR71NERASbNm3iiSeewNfXl/bt27Ns2bIyP0u0bNkyHB0diz3j9cILL+Dl5VViARIXF4eXlxdLliwxtX3//fcMGjSIBx54gCZNmnDvvfcyYMAAvvnmm2LX9unTh0ceeYTPP/+cRx55BD8/P4YOHQrA77//ztChQ3n44YdNP6fw8HDS0tLMcujQoQPVqlVjxYoVZZqjiEhFU4EkIpXOF198AUCbNm0u2+/hhx8mKCiI2rVrm9reeusthg8fzrFjx+jUqRNPPPEEx44dY8SIEcyYMcMsxtq1a9m0aRMhISE0bdqUpk2bmt579dVXueWWW+jTpw8BAQE4ODjw559/0qNHD95//33uvvtu+vfvj7u7O0uWLKFPnz7F/mguySuvvMLUqVOpVq0awcHBPPXUU9ja2vLBBx8wcOBAU79BgwZRv359AAYOHEi3bt1Kjblq1SqeeeYZ9u7dy2OPPUb37t3JzMxk4s
SJvPLKKxiNxmL9T5w4QUhICKdOnSI4OJiAgAASEhLo27dvmYq8a3U1a5eYmEjPnj3ZsWMHDzzwAAMGDOCBBx5g7969DBkyhO3btwMXn8kqWhs/P79i6wYXd2SeeeYZateuTUhICLa2trz77ruMGzeOkJAQCgoKCAkJoUaNGqxevZqIiAjTtXl5eQwYMIB58+bh4uJCaGgo3bt3Jysri6VLl/L666+bzfGLL75g+PDhNGjQgF69egEXfydHjx59xfXJysoiPj6egIAAbGxsTO1PPvkkQImHUMTFxWEwGOjUqRMA27Zto0+fPvz00088+uij9OvXj2bNmvH111/z7LPP8ssvvxS7Pj09nWHDhuHv70+3bt1o0aIFp0+fpn///nz++efcd999DBgwgDvvvJM1a9bQt29fs9sZHR0d8fPzY9OmTTf0oRIiIiZGEZFK5v777zd6enoaz5w5c1XXfffdd0ZPT09j165djadOnTK1nzp1ytipUyejp6en8dtvvzUajUbjH3/8YfT09DR6eXkZf/nll2Jx5s6da/T09DR2797dmJ+fX+y9gQMHGr28vIyfffZZsfaVK1caPT09jW+99Zap7emnnzZ6enoa//77b6PRaDT++OOPRk9PT+Mrr7xS7Nrc3FxTfocOHSr1+n/mtnXrVqPRaDQePXrU2LhxY+NDDz1kPHr0qKnfuXPnjH379jV6enoa169fb2p/+OGHjZ6ensYJEyYYCwoKTO1jx441enp6GqOjoy+zwhd5enoamzdvbpw7d26pX08++aTR09PT+M0331zT2j3zzDPGxo0bGw8cOFCsb1xcnNHT09M4YsQIU9s333xj9PT0NE6ePNksT09PT+Py5ctNbQcPHjS1v/nmm6b2zMxMo7+/v/H+++83tX3yySdGT09P4+zZs4vFzczMNLZq1cro4+NjPH/+vNFo/P/fJ09PT+OSJUtMfc+dO2cMDg42W4uSfPXVV0ZPT0/jO++8U6w9KyvL6O/vb3ziiSeKtWdkZBibNGlifPrpp01t7du3N953333GtLS0Yn0XLVpk9PT0NM6aNcvUVvT7NW3atGJ9V61aZfT09DR++OGHxdonTJhg9PT0NG7fvt0s96lTpxo9PT2N+/btu+wcRURuBNpBEpFKp+hYbAcHh6u6LiYmBoDXXnut2K5S7dq1eeWVVwBYt25dsWvuuOMOvL29S4z32GOPFTtFLDU1lYSEBB588EEefvjhYn2ffvpp6tWrZ8qhJLfeeitvvvmm6TamItWqVaN58+ZAyQczXM7HH39MXl4eL7/8Mg0aNDC129vbM3bsWMB8znBxV8pgMJheP/jggwAcOXKkTONmZmYyf/78Ur8uvV3vateuf//+zJgxAw8Pj2J9iw6oKOs62draEhoaanrdqFEj0y2ZzzzzjKnd0dERDw8PTp06ZToZsXHjxkyePJn+/fsXi+no6Ejjxo3Jz8/n77//LvZe/fr16devn+m1vb09w4YNAyA2Nvayuf78888A3HnnncXaq1evTrt27UhOTiY5OdnUvm3bNnJyckw7TAUFBbzyyitMnz6d//znP8ViXG7d2rdvX+x10S7Qnj17yM/PN7UPHz6cnTt38tBDD5nFKMq5aA4iIjcynWInIpVOzZo1SUtLIyMjo1ihcyX79+/HysrKVGz8U1HbpX+4u7m5lRrvn7dqASQlJWE0Gjlz5gzz5s0z629jY8OJEydISUkxO1gCLhZI3bp1Iy8vj59//pnDhw9z9OhRfvnlF7766iuAq75FqWg+9957r9l7d911F87OzmZzrl69OvXq1SvW5ujoCFw8IbAs6tevz2effVbq+6+//jrr1683vb7atSu6vTItLY39+/dz9OhRDh8+zPfffw9Q7A/3y6lXrx62trbF2uzt7Tl//jwuLi7F2qtXrw5cXINbbrkFd3d33N3dyc7OZs+ePaaf188//2x6PuzSPJo1a0a1asX/r9fX1xcw/927VFHxcukzdX
DxNruYmBg2btxoKrDj4uKwtbU1FThWVlY89thjABw/fpzk5GSOHj3KgQMHTM9llfT7denvefv27XnnnXf44IMP2LJ
"text/plain": [
"<Figure size 1152x792 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"var_cols = [col for col in data.columns if 'var' in col]\n",
"var_correlation = data[var_cols].corr()\n",
"\n",
"\n",
"mask = np.triu(np.ones_like(var_correlation, dtype=bool))\n",
"f, ax = plt.subplots(figsize=(16, 11))\n",
"cmap = sns.diverging_palette(240, 10, as_cmap=True, s = 90, l = 45, n = 5)\n",
"\n",
"sns.heatmap(var_correlation, mask=mask, cmap=cmap, vmax=.3, center=0,\n",
" square=True, linewidths=.5)\n",
"\n",
"plt.title('Correlation Heatmap (vars)', fontsize = 20)\n",
"plt.xticks(fontsize = 10)\n",
"plt.yticks(fontsize = 10);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mean features give better results than variance features because the individual features are less correlated with each other (https://datascience.stackexchange.com/questions/9087/correlation-and-naive-bayes). \n",
"\n",
"Our tests showed that the accuracy of the trained model depends on the type of features used. Variance features give lower accuracy than mean features. The highest accuracy came from a combination of 8 different columns.\n",
"\n",
"- var_cols: accuracy = 0.3875,\n",
"- mean_cols: accuracy = 0.4375,\n",
"- ['mfcc4_mean', 'mfcc12_mean', 'mfcc9_var', 'mfcc1_mean', 'rms_mean', 'chroma_stft_mean', 'mfcc6_var', 'mfcc9_mean']: accuracy = 0.56125 \n",
"\n",
"The results could have been considerably more accurate. The limiting factor was the nature of the dataset itself: its feature values vary little, and the individual features are only weakly correlated with one another."
]
},
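{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naive Bayes assumes that features are conditionally independent given the class. It therefore adds up the evidence of every feature as if each one carried new information, which is why correlated features can hurt its accuracy. The cell below is a minimal sketch of the extreme case: a feature that is simply duplicated (correlation 1). With equal priors, the log-odds between the two classes double, so the model becomes more confident without having learned anything new. The functions `gaussian_log_pdf` and `naive_log_odds` and the class parameters are hypothetical. They only illustrate the mechanism and do not reproduce the accuracies listed above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def gaussian_log_pdf(x, mean, std):\n",
"    \"\"\"log N(x; mean, std^2)\"\"\"\n",
"    return -0.5 * np.log(2 * np.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)\n",
"\n",
"\n",
"def naive_log_odds(x_features, params_a, params_b):\n",
"    \"\"\"log P(x|A) - log P(x|B) under the naive independence assumption: a sum over features.\"\"\"\n",
"    return sum(gaussian_log_pdf(x, *pa) - gaussian_log_pdf(x, *pb)\n",
"               for x, pa, pb in zip(x_features, params_a, params_b))\n",
"\n",
"\n",
"# hypothetical (mean, std) of one feature in class A and in class B\n",
"class_a, class_b = (0.0, 1.0), (1.0, 1.0)\n",
"x = 0.2\n",
"single = naive_log_odds([x], [class_a], [class_b])\n",
"duplicated = naive_log_odds([x, x], [class_a, class_a], [class_b, class_b])  # same feature counted twice\n",
"print(single, duplicated)"
]
},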
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scatter plot of chroma_stft_mean (pitch) versus mfcc12_mean (mel-frequency cepstral coefficient)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot shows that, apart from a group of outliers, chroma_stft_mean tends to increase as mfcc12_mean increases. The relationship between the two features is therefore broadly linear. This agrees with the heatmap, where their correlation was about 0.2, although a coefficient of that size indicates only a weak linear trend."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAJBCAYAAACav8uPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAD2sklEQVR4nOyde3xcdZn/32dmkslkMp00TZOGJG2RGugNMD8vEcvNEkCxyBoNl2p30RawWsGuIja6oe4vBZUtsFXWtbho3QrkZxQprrC1glvEitqFXiUWe0lKmjYNTZNJMs3MnN8fkzOZyzkzZ27JtHnevHil+c6Zc5mTZJ55ns/zeRRVVVUEQRAEQRCEjGGZ6BMQBEEQBEE415AASxAEQRAEIcNIgCUIgiAIgpBhJMASBEEQBEHIMBJgCYIgCIIgZBgJsARBEARBEDKMBFiCIAiCIAgZxjbRJxDN2297CATOPWuuadOKOHlyYKJPQ9BB7k3uIvcmd5F7k7vIvRkfLBaFqVOdho/nXIAVCKjnZIAFnLPXdS4g9yZ3kXuTu8i9yV3k3kw8UiIUBEEQBEHIMBJgCYIgCIIgZBgJsARBEARBEDKMBFiCIAiCIAgZRgIsQRAEQRCEDCMBliAIgiAIQoaRAEsQBEEQBCHDSIAlCIIgCIKQYSTAEgRBEARByDASYAmCIAiCIGQYCbAEQRAEQRAyjARYgiAIgiAIGUYCLEEQBEEQhAwjAZYgCIIgCEKGkQBLEARBEAQhw0iAJQiCIAiCkGEkwBIEQRAEQcgwEmAJgiAIgiBkGNtEn4AgCMJEYz/Vh/N4Dxafj4DNhqesFG+xe6JPSxCEsxgJsARBmNTYT/Xh6upGUVUArD4frq5uAAmyBEFIGSkRCoIwqXEe7wkFVxqKquI83jNBZyQIwrmABFiCIExqLD5fUuuCIAhmkABLEIRJTcCmr5QwWhcEQTCDBFiCIExqPGWlqIoSsaYqCp6y0gk6I0EQzgXkI5ogCJMaTcguXYSCIGQSCbAEQZj0eIvdElAJgpBRpEQoCIIgCIKQYSTAEgRBEARByDASYAmCIAiCIGQYCbAEQRAEQRAyjARYgiAIgiAIGUYCLEEQBEEQhAwjAZYgCIIgCEKGkQBLEARBEAQhw0iAJQiCIAiCkGEkwBIEQRAEQcgwEmAJgiAIgiBkGAmwBEEQBEEQMowEWIIgCIIgCBlGAixBEARBEIQMIwGWIAiCIAhChpEASxAEQRAEIcNIgCUIgiAIgpBhJMASBEEQBEHIMBJgCYIgCIIgZBgJsARBEARBEDKMBFiCIAiCIAgZRgIsQRAEQRCEDCMBliAIgpAx7PZWSkrmU1rqpqRkPnZ760SfkiBMCLaJPgFBEATh3MBub8XlWoWiDAFgtXbgcq0CwOttnMhTE4RxRzJYgiAIQkZwOteGgisNRRnC6Vw7QWckCBOHBFiCIAhCRrBYOpNaF4RzGQmwBEEQhIwQCFQltS4I5zISYAmCIAgZweNpRlUdEWuq6sDjaZ6gMxKEiUMCLEEQBCEjeL2N9PdvwO+vRlUV/P5q+vs3iMBdmJRIF6EgCIKQMbzeRgmoBAHJYAmCIAiCIGSctAKs73znO9xwww3ccMMNfOtb3wLglVdeYcmSJVx77bU8/PDDGTlJQRAEQRCEs4mUS4SvvPIKL7/8Mj//+c9RFIXly5fz3HPP8dBDD/HjH/+YiooK7rzzTn77299y5ZVXZvKcBUEQhLMA+6k+nMd7sPh8BGw2PGWleIvdE31agjAupJzBmj59Ovfddx/5+fnk5eVxwQUXcOjQIWbNmkV1dTU2m40lS5bw/PPPZ/J8BUEQhLMA+6k+XF3dWH0+FMDq8+Hq6sZ+qm+iT00QxoWUM1jvfOc7Q/8+dOgQv/rVr/jkJz/J9OnTQ+tlZWV0d3cntd9p04pSPaWcZ/p010SfgmCA3JvcRe5N7hL33rx5EFQ1YklRVaac7IUpDjh4FLxnwJ
4P51dC+bQsn+3kQn5vJp60uwj/+te/cuedd3LvvfditVo5dOhQ6DFVVVEUJan9nTw5QCCgJt7wLGP6dBcnTvRP9GkIOsi9yV3k3uQuie5NqfcMen/9Ve8ZeOMQihZ8ec+gvnGI/tNDUj7MEPJ7Mz5YLErcpFBaIvc///nP/MM//AP/+I//yN/93d8xY8YMTpw4EXr8xIkTlJWVpXMIQRAE4SwkYDP+/K7oZLacx3uyfUqCMK6kHGB1dXXxuc99joceeogbbrgBgEsuuYSDBw9y+PBh/H4/zz33HFdccUXGTlYQBEE4O/CUlaJGVTCivw/H4vNl+5QEYVxJuUT4gx/8AK/Xy4MPPhhau+WWW3jwwQdZtWoVXq+XK6+8kuuvvz4jJyoIgiCcPWjlvuguQufxHqw6wVS8jJcgnI0oqqrmlOBJNFjCeCP3JneRe5O7pHpvtO7C8DKhqij0V5SLBitDyO/N+JBIgyUfGQRBECYBueJJZZTZkuBKONeQAEsQBOEcJzprpHlSARMWZElAJZzrSIAlCIJwjuM83mPYuZfrgU6uZN6E1Jms91ACLEEQhCzQ3rafHS0vM3C0n6JKF3VNi6hpmDsh52LUoZfrnXu5lnkTkmcy30MJsARBEDJMe9t+Xlq9Fd9QMIAZ6OznpdVbASYkyArYbEl17uVKxuFszrwJQSbzPUzLaFQQBEGIZUfLy6HgSsM35GNHy8sTcj5GnlSestKYbU3PEOw+SUn7m5Tue4OS9jezMmPwbM28CWNM5nsoGSxBEIQMM3BUv0XeaD3T6JUnFy4+z1RWykzGwX6qD44dxxoIAJkv+2gZNCPEM+vsIdns6bnEuX+FgiCkRK6UiSaCdK+9qNLFQGdsMFVUmf0BvIblyfX1psqTZjIOzuM9MBpcaWSq7KPnkxWOUeZNyE08ZaW6vmeT4R5KiVAQhBhMl4nOQTJx7XVNi7A5Ij+/2hw26poWZfhsY0m3PGmUWQhfz2bZRy+DBqACfptNDEnPMrzFbvoryvHbbJPuHkoGSxCEGCazMDUT165liiaiizDd8qSZjEM2yz7xgrTemgvS3r8w/kxW3zMJsARBiGEyC1Mzde01DXMnpGMw3fKkGad1T1kpU44djygTZqrsM5k1O8K5hZQIBUGIwUyZ6FzlbL/2TJQnvcVuemsuoGfehfTWXBCTffAWu6FmVlbKPsl0PApCLnN2/MUQBGFcmczC1LP92setPFk+jV5Lfmb3icwqFM4dJMA6C2lv38+OHS8zMNBPUZGLurpF1NRMjEO0cG4ymd/kzoVrn6jyZKaYrJod4dxCAqyzjPb2/bz00lZ8oxqFgYF+Xnpp1CFagqyzlrb2Vlp2rOXoQCeVRVU01TXTUNM4oec0md/kJuLac2m0jiAI6SMarLOMHTteDgVXGj6fjx07JsYhWkiftvZWVr+0is6BDlRUOgc6WP3SKtraWyf61IRxQvOuGujsB3XMu6q9bf9En5ogCCkiAdZZxsCAQQu2wbqQ+7TsWMuQbyhibcg3RMuOtRN0RsJ4Y9a7qmvr82y/+Sa2Xv1+tt98E11bnx/P08R+qi80HocduyaFL5ogpIqUCM8yiopcusFUUVH2HaKF7HB0oDOpdeHcw4x3VdfW59n30IMEvMMADHcfY99DDwJQUX99zHOj3egPHD7Dc19JvQQZ47DuPZPR8TiCcK4hGayzjLq6Rdii2sVtNht1ddl3iBayQ2VRVVLrwrmHkUdV+PqBx78XCq40At5hDjz+vZjn6bnRnz9NZfaFrpRLkPEMWLNBeLYsW8OkJwPyOk4cEmCdZdTUzOWqq+pDGauiIhdXXVUvAvezmKa6Zhw2R8Saw+agqa55gs5IGG/MeFcNH+/Wfa7eul4wlF9gZfGKC0PfJzM+B8bXfHYyj2rKJPI6TixSIjwLqamZKwHVOYTWLZhrXYTC+GHGu6qgrJzh7mMxzy0oK49ZMwp63GWRgbzZ8Tkwvg7rk3lUUyaR13FikQ
BLEHKAhppGCagmOYm8q+YsvytCgwVgsRcwZ/ldMdsaBUN9xyObKcyOzwE4cPgM7yhVybNbQ2vZMmCdzKOaMom8jhO
"text/plain": [
"<Figure size 720x720 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure(figsize=(10,10))\n",
"ax = fig.add_subplot()  # a single axes shared by all genres\n",
"colors = ['red', 'green', 'blue', 'brown','purple', 'gray', 'pink', 'black', 'yellow', 'orange']\n",
"for genre in genre_dict:\n",
"    genre_data = data[data[\"genre\"]==genre_dict[genre]]  # rows of the current genre\n",
"    ax.scatter(genre_data['chroma_stft_mean'], genre_data['mfcc12_mean'], color=colors[genre_dict[genre]-1], label=genre)\n",
"ax.set_xlabel('chroma_stft_mean')\n",
"ax.set_ylabel('mfcc12_mean')\n",
"ax.legend()\n",
"plt.show()"
]
},
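{
"cell_type": "markdown",
"metadata": {},
"source": [
"The claim of a roughly linear relationship can be quantified with the Pearson coefficient and a least-squares line. The cell below is a minimal sketch. `linear_trend` is a hypothetical helper; on the notebook's data it would be called as `linear_trend(data['chroma_stft_mean'], data['mfcc12_mean'])`, matching the axes of the scatter plot. An r of about 0.2 means the fitted line explains only about 4% of the variance (r^2 = 0.04)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def linear_trend(x, y):\n",
"    \"\"\"Return (Pearson r, slope, intercept) of the least-squares line y = slope * x + intercept.\"\"\"\n",
"    x = np.asarray(x, dtype=float)\n",
"    y = np.asarray(y, dtype=float)\n",
"    r = np.corrcoef(x, y)[0, 1]\n",
"    slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest degree first\n",
"    return r, slope, intercept\n",
"\n",
"\n",
"# toy check on perfectly linear points y = 2x + 1\n",
"r, slope, intercept = linear_trend([0, 1, 2, 3], [1, 3, 5, 7])\n",
"print(r, slope, intercept)"
]
},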
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Applying the naive Bayes algorithm"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"class NaiveBayesContinues:\n",
"    \"\"\"Gaussian naive Bayes classifier for continuous features.\"\"\"\n",
"    def __init__(self, X, Y):\n",
"        self.classes = Y.unique()\n",
"        self.priors = []  # prior probability P(Yi) of each class\n",
"        self.stds = []    # per-class standard deviation of every feature\n",
"        self.means = []   # per-class mean of every feature\n",
"        for c in self.classes:\n",
"            x_with_c_class = X[c == Y]  # training samples that belong to class c\n",
"            self.priors.append(len(x_with_c_class) / len(X))\n",
"            # np.asarray makes mean[j] / std[j] positional even when X is a DataFrame\n",
"            self.means.append(np.asarray(x_with_c_class.mean(axis=0)))\n",
"            self.stds.append(np.asarray(x_with_c_class.std(axis=0)))\n",
"\n",
"    def predict(self, X, display_results=False):\n",
"        # X is iterated row by row, so pass a NumPy array (iterating a DataFrame yields column names)\n",
"        y_preds = []\n",
"        for x in X:\n",
"            posteriors = []\n",
"            for i, c in enumerate(self.classes):\n",
"                prior = self.priors[i]  # P(Yi) of the current class\n",
"                mean = self.means[i]    # feature means of the current class\n",
"                std = self.stds[i]      # feature standard deviations of the current class\n",
"\n",
"                posterior = 1  # P(X1|Yi)*P(X2|Yi)*P(X3|Yi)...\n",
"                for j, feature in enumerate(x):\n",
"                    # Gaussian density N(feature; mean[j], std[j]^2) = P(Xj|Yi)\n",
"                    P_X_yi = np.exp(-(feature - mean[j]) ** 2 / (2 * std[j] ** 2)) / np.sqrt(2 * np.pi * std[j] ** 2)\n",
"                    posterior *= P_X_yi\n",
"\n",
"                posterior = posterior * prior  # P(Yi)*P(X1|Yi)*P(X2|Yi)*P(X3|Yi)... (unnormalised)\n",
"                posteriors.append(posterior)\n",
"\n",
"            if display_results:\n",
"                print(\"posteriors\")\n",
"                print(posteriors)\n",
"                print(np.argmax(posteriors))\n",
"\n",
"            y_pred = self.classes[np.argmax(posteriors)]  # class with the highest posterior probability\n",
"            y_preds.append(y_pred)\n",
"        return y_preds"
]
},
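{
"cell_type": "markdown",
"metadata": {},
"source": [
"`NaiveBayesContinues.predict` multiplies one density per feature. With many features this product can underflow to 0.0 for every class, and `np.argmax` can then no longer tell the classes apart. A common remedy is to add log-densities instead. The cell below is a sketch of that variant, not a replacement for the class above, and `LogGaussianNB` is a hypothetical name. It is vectorised with NumPy broadcasting and uses the population variance (`ddof=0`, whereas `DataFrame.std` in the class above uses `ddof=1`). It also adds a small `eps` to every variance. Because the logarithm is monotonic, the class ranking matches the product form apart from these variance details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"class LogGaussianNB:\n",
"    \"\"\"Gaussian naive Bayes that sums log-densities instead of multiplying densities.\"\"\"\n",
"\n",
"    def fit(self, X, y, eps=1e-9):\n",
"        X = np.asarray(X, dtype=float)\n",
"        y = np.asarray(y)\n",
"        self.classes_ = np.unique(y)\n",
"        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])  # log P(Yi)\n",
"        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])\n",
"        # population variance (ddof=0) plus eps, so a constant feature cannot cause division by zero\n",
"        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps\n",
"        return self\n",
"\n",
"    def predict(self, X):\n",
"        X = np.asarray(X, dtype=float)\n",
"        # broadcast to shape (n_samples, n_classes, n_features)\n",
"        diff = X[:, None, :] - self.means_[None, :, :]\n",
"        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_) + diff ** 2 / self.vars_)  # log P(Xj|Yi)\n",
"        log_post = self.log_priors_ + log_lik.sum(axis=2)  # log P(Yi) + sum_j log P(Xj|Yi)\n",
"        return self.classes_[np.argmax(log_post, axis=1)]\n",
"\n",
"\n",
"# hypothetical toy data with two well-separated genres\n",
"X_train = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [5.2, 4.9]])\n",
"y_train = np.array(['blues', 'blues', 'metal', 'metal'])\n",
"model = LogGaussianNB().fit(X_train, y_train)\n",
"print(model.predict([[0.1, 0.0], [5.1, 5.0]]))"
]
},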
{
"attachments": {
"image-2.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAwAAAAHaCAYAAAC3uy7uAAAgAElEQVR4Ae29B7QsRbm/fe9fDCQFQUAQkCRIzjkIApIzEhUkpwMHOGQQRHIUBMk55yAi+SggQZKSgyCgIElFgtcr99rfevp+te3pmb1Pd0/39Mzup9aa1XP2manwVE33+6t6663/iEwSkIAEJCABCUhAAhKQQGMI/EdjWmpDJSABCUhAAhKQgAQkIIFIAeAgkIAEJCABCUhAAhKQQIMIKAAa1Nk2VQISkIAEJCABCUhAAgoAx4AEJCABCUhAAhKQgAQaREAB0KDOtqkSkIAEJCABCUhAAhJQADgGJCABCUhAAhKQgAQk0CACCoAGdbZNlYAEJCABCUhAAhKQgALAMSABCUhAAhKQgAQkIIEGEVAANKizbaoEJCABCUhAAhKQgAQUAI4BCUhAAhKQgAQkIAEJNIiAAqBBnW1TJSABCUhAAhKQgAQkoABwDEhAAhKQgAQkIAEJSKBBBBQADepsmyoBCUhAAhKQgAQkIAEFgGNAAhKQgAQkIAEJSEACDSKgAGhQZ9tUCUhAAhKQgAQkIAEJKAAcAxKQgAQkIAEJSEACEmgQAQVAgzrbpkpAAhKQgAQkIAEJSEAB4BiQgAQkIAEJSEACEpBAgwgoABrU2TZVAhKQgAQkIAEJSEACCgDHgAQkIAEJSEACEpCABBpEQAHQoM62qRKQgAQkIAEJSEACElAAOAYkIAEJSEACEpCABCTQIAIKgAZ1tk2VgAQkIAEJSEACEpCAAsAxIAEJSEACEpCABCQggQYRUAA0qLNtqgQkIAEJSEACEpCABBQAjgEJSEACEpCABCQgAQk0iIACoEGdbVMlIAEJSEACEpCABCSgAHAMSEACEpCABCQgAQlIoEEEFAAN6mybKgEJSEACEpCABCQgAQWAY0ACEpCABCQgAQlIQAINIqAAaFBn21QJSEACEpCABCQgAQkoABwDEpCABCQgAQlIQAISaBABBUCDOtumSkACEpCABCQgAQlIQAHgGJCABCQgAQlIQAISkECDCCgAGtTZNlUCEpCABCQgAQlIQAIKAMeABCQgAQlIQAISkIAEGkRAAdCgzrapEpCABCQgAQlIQAISUAA4BiQgAQlIQAISkIAEJNAgAgqABnW2TZWABCQgAQlIQAISkIACwDEgAQlIQAISkIAEJCCBBhFQADSos22qBCQgAQlIQAISkIAEFACOAQlIQAISkIAEJCABCTSIgAKgQZ1tUyUgAQlIQAISkIAEJKAAcAxIQAISkIAEJCABCUigQQQUAA3qbJsqAQlIQAISkIAEJCABBYBjQAISkIAEJCABCUhAAg0ioABoUGfbVAlIQAISkIAEJCABCSgAHAMSkIAEJCABCUhAAhJoEAEFQIM626ZKQAISkIAEJCABCUhAAeAYkIAEJCABCUhAAhKQQIMIKAAa1Nk2VQISkIAEJCABCUhAAgoAx4AEJCABCUhAAhKQgAQaREAB0KDOtqkSkIAEJCABCUhAAhJQADgGJCABCUhAAhKQgAQk0CACCoAGdbZNlYAEJCABCUhAAhKQgALAMSABCUhAAhKQgAQkIIEGEVAANKizbaoEJCABCUhAAhKQgAQUAI4BCUhAAhKQgAQkIAEJNIiAAqBBnW1TJSABCUhAAhKQgAQkoABwDEhAAhKQgAQkIAEJSKBBBBQADepsmyoBCUhAAhKQgAQkIAEFgGNAAhKQgAQkIAEJSEACDSKgAGhQZ9tUCUhAAhKQgAQkIAEJKAAcAxKQgAQkIAEJSEACEmgQAQVAgzrbpkpAAhKQgAQkIAEJSEAB4BiQgAQkIAEJSEACEpBAgwgoABrU2TZVAhKQgAQkIAEJSEACCgDHgAQkIAEJSEACEpCABBpEQAHQoM62qRKQgAQkIAEJSEACElAAOAYkIAEJSEACEpCABCTQIAIKgAZ1tk2VgAQkIAEJSEACEpCAAsAxIA
EJSEACEpCABCQggQYRUAA0qLNtqgQkIAEJSEACEpCABBQAjgEJSEACEpCABCQgAQk0iIACoEGdbVMlIAEJSEACEpCABCSgAHAMSEACEpCABCQgAQlIoEEEFAAN6mybKgEJSEACEpCABCQgAQWAY0ACEpCABCQgAQlIQAINIqAAaFBn21QJSEACEpCABCQgAQkoABwDEpCABCQgAQlIQAISaBABBUCDOtumSkACEpCABCQgAQlIQAHgGJCABCQgAQlIQAISkECDCCgAGtTZNlUCEpCABCQgAQlIQAIKAMeABCQgAQlIQAISkIAEGkRAAdCgzrapEpCABCQgAQlIQAISUAA4BiQgAQlIQAISkIAEJNAgAgqABnW2TZWABCQgAQlIQAISkIACwDEgAQlIQAISkIAEJCCBBhFQADSos22qBCQgAQlIQAISkIAEFACOAQlIQAISkIAEJCABCTSIgAKgQZ1tUyUgAQlIQAISkIAEJKAAcAxIQAISkIAEJCABCUigQQQUAA3qbJsqAQlIQAISkIAEJCABBYBjQAISkIAEJCABCUhAAg0ioABoUGfbVAlIQAISkIAEJCABCSgAHAMSkIAEJCABCUhAAhJoEAEFQIM626ZKQAISkIAEJCABCUhAAeAYkIAEJCABCUhAAhKQQIMIKAAa1Nk2VQISkIAEJCABCUhAAgoAx4AEJCABCUhAAhKQgAQaREAB0KDOtqkSkIAEJCABCUhAAhJQADgGJCABCUhAAhKQgAQk0CACCoAGdbZNlYAEJCABCUhAAhKQgALAMSABCUhAAhKQgAQkIIEGEVAANKizbaoEJCABCUhAAhKQgAQUAI4BCUhAAhKQgAQkIAEJNIiAAqBBnW1TJSABCUhAAhKQgAQkoABwDEhAAhKQgAQkIAEJSKBBBBQADepsmyoBCUhAAhKQgAQkIAEFgGNAAhKQgAQkIAEJSEACDSKgAGhQZ9tUCUhAAhKQgAQkIAEJKAAcAxKQgAQkIAEJSEACEmgQAQVAgzrbpkpAAhKQgAQkIAEJSEAB4BiQgAQkIAEJSEACEpBAgwgoABrU2TZVAhKQgAQkIAEJSEACCgDHgAQkIAEJSEACEpCABBpEQAHQoM62qRKQgAQkIAEJSEACElAAOAYkIAEJSEACEpCABCTQIAIKgAZ1tk2VgAQkIAEJSEACEpCAAsAxIAEJSEACEpCABCQggQYRUAA0qLNtqgQkIAEJSEACEpCABBQAjgEJSEACEpCABCQgAQk0iIACoEGdbVMlIAEJSEACEpCABCSgAHAMSEACEpCABCQgAQlIoEEEFAAN6mybKgEJSEACEpCABCQgAQWAY0ACEpCABCQgAQlIQAINIqAAaFBn21QJSEACEpCABCQgAQkoABwDEpCABCQgAQlIQAISaBABBUCDOtumSkACEpCABCQgAQlIQAHgGJCABCQgAQlIQAISkECDCCgAGtTZNlUCEpCABCQgAQlIQAIKAMeABCQgAQlIQAISkIAEGkRAAdCgzrapEpCABCQgAQlIQAISUAA4BiQgAQlIQAISkIAEJNAgAgqABnW2TZWABCQgAQlIQAISkIACwDEgAQlIQAISkIAEJCCBBhFQADSos22qBCQgAQlIQAISkIAEFACOAQlIQAISkIAEJCABCTSIgAKgQZ1tUyUgAQlIQAISkIAEJKAAcAxIQAISkIAEJCABCUigQQQUAA3qbJsqAQlIQAISkIAEJCABBYBjQAISkIAEJCABCUhAAg0ioABoUGfbVAlIQAISkIAEJCABCSgAHAMSkIAEJCABCUhAAhJoEAEFQIM626ZKQAISkIAEJCABCUhAAeAYkIAEJCABCUhAAhKQQIMIKAAa1Nk2VQISkIAEJCABCUhAAgoAx4AEJCABCUhAAhKQgAQaREAB0KDOtqkSkIAEJCABCUhAAhJQADgGJCABCUhAAhKQgAQk0CACCoAGdbZNlYAEJCABCUhAAhKQgALAMSABCUhAAh
KQgAQkIIEGEVAANKizbaoEJCABCUhAAhKQgAQUAI4BCUhAAhKQgAQkIAEJNIiAAqBBnW1TJSABCUhAAhKQgAQkoABw
},
"image.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAvsAAAClCAYAAADCmjLgAAAgAElEQVR4Ae2di3sU9b3/z//RnkMSyJJFIRutSM8p0CoEhCiXWFFoi3DEgmLRSvFXlJtcKxFofMDHlGJUbCwICIKHX0UjJEYEEwS56RMF4egPsBwwUpJTcN+/5zu3/c7s7Ozs7uxlZt88D0925/K9vL6zM+/5fj+XfwH/kQAJkAAJkAAJkAAJkAAJBJLAvwSyV+wUCZAACZAACZAACZAACZAAKPZ5EZAACZAACZAACZAACZBAQAlQ7Ad0YNktEiABEiABEiABEiABEqDY5zVAAiRAAiRAAiRAAiRAAgElQLEf0IFlt0iABEiABEiABEiABEiAYp/XAAmQAAmQAAmQAAmQAAkElADFfkAHlt0iARIgARIgARIgARIgAYp9XgMkQAIkQAIkQAIkQAIkEFACFPsBHVh2iwRIgARIgARIgARIgAQo9nkNkAAJkAAJkAAJkAAJkEBACVDsB3Rg2S0SIAESIAESIAESIAESoNjnNUACJEACJEACEoHo2b9i8oAKhCc14ay0nR9JgAScCXz6wjCUlkfw291XnA/k3pwSoNjPKW5WRgIkQAIkUMgEomd343fVIZRWTcTzR/9ZyE1l20ig8AhcO4b64eUoDdVgRQd/P4UyQBT7hTISbAcJkAAJkEB+CVzpwOJqVajM3f1VftvC2knApwT0lTEh+J8/6dNOBKzZFPsBG1B2hwRIgARIIA0C186gaVo/lJaHMeLJPfgmjSJ4CgmQgErg0s7HUFEeRslNc/D2JVLJNwGK/XyPAOsnARIgARLIO4HTTVMVoV92x1LspTjJ+3iwAX4n0IN35vVVflOhx3ahy+/d8Xn7KfZ9PoBsPgmQAAmQQGYEutufxbDysGJnvKiFjoWZ0eTZviFw/QI+2rwCv66+RRHlpaGBqJ4yB+vf/QLfXvegF1ea8VR/VfCP2nDKgwJZRLoEKPbTJcfzSIAESIAE/E9At9On+Y7/x5I9cE/gSgeeubM/qqY24L0v/o5v/9/neH/jXIyurFCEf3hcPQ5ddV9coiN1cx7a7ycilJvtFPu54cxaSIAESIAECo5ADw4ur1bNd26bgzfpk1twI8QGZYOAuO6HovekRnxumcG/+v4yDBarXOVh3Dh7jwfmNxfQNKVMXTkY3YDjlvqy0TuWGU+AYj+eCbeQAAmQAAkUAQHDfKc8gglNwTMzON+8AMNCNZjbfLEIRrP4unhu+wz0TSfEZXcznuqtmtdUjGs0C/BoJxrH9FbEeUnfWrz6ReZco0fXqmZy5WHQnCdznumUQLGfDjWeQwIkQAIk4G8Cejzw8jDKRtV5YrJQSEBOb3kIleUR3NNwEj2F1DC2xUMC2spUaCDm7nHvaxI9vx2T+6jmOiUV0/H6OXOTDi7/gToTXx7GtO1evCjGnHVL+k7Eq1+a6+O37BOg2M8+Y9ZAAiRAAiRQYATObVKj74hsn9O2Bst+p/vAMmUm9aezd+EszSYK7MrzuDkiZKwwk0kppn0X3llyFyrKI7hrXnyYWe/FPhA99TLGay8YIU/MgzzmGPDiKPYDPsDsHgmQAAmQgIWAFCUkaLP6ekKjMvogWAY9uF+VMe9T4VFMe8nGPlTtYVKsHrQuDGkrBhEsamF23VxekRT7uaTNukiABEiABPJO4Mi64YboCNas/gVsUxKDBW+1Iu8XTYE34NPGu5VrOtNZ82hnbAa+z4NbPU0uJ5dddm8TzhY40yA1j2I/SKPJvpAACZAACTgTuLgbjwxQnRNLb5sXqARaV5sXKFlLg7Za4Tyg3KsQMFarMpk178Jbs1Rb/tKq6Xjdc+u2LmybWWK8aHN2P3fXLsV+7lizJhIgAQ8JRM83Y8nkcXgggFFUPMTEoiwEYrP6YdxRfzg4zquSw/GExuBFFrIMI7/aEDhSP0QR0mXjGvG5zf5km05vmqq8LJYOmpW1MLRX96kvpC
K0J2f3k42Id/sp9r1jyZJIgARyQCB6sRPvPvcABlaqsaBvbziZg1pZRSAIdLcZGT1FVJANHoQVLBQuhoiqmo6dlugqhdJGtiO7BGJOsBGsPpxaXbpTd3hsHfZ+k9q5KR0tXkqHlBuz+6m2M6W6eLBBgGLfQMEPJEAChUjg8sFXULdmDZbMnozRgyu1h4Qq9MXsEMV+IY5aYbbp0vYZxvXjtT1yfnscM48IVr/yS9V3tUsx8vvOb3G9aqU7dd8wowmd/8h+rz9trDF+h6m0M/stC24NFPvBHVv2jAQCQUAPkRgeUos7H16CZ199FYvH6jNDFPuBGORcdCLaiReG9zFmFL2JH56Lhruo4/JuPNJH9UOY0OS5obWLBvCQQiFwZJ1qytOr31Ls+9/krYqe3Y4Hb+mLEU/usYRpvYBts8dgRYv7+P3Ja1OPiK1AhMG4+26pZXYcxX5m/Hg2CZBAzglIoeE4s59z+n6tMHp4LQaXqytCJTfNwduX/NqT+HYLx1yxylUqQiUei9/PLcVD4Pv3lxmz5is6kvT7SgcWV5dj5JIWfGPNx6Bk2R2QsjlQkhrV3dEzaKxVs/SK63YsfUxcYcvkIIr9TOjxXBIggTwQoNjPA3SfVynH+A7jxoAl9Tm4XF3pKgnPws5UX2IuH8MbK2YaJnJiBe2hdQdi4u/yMbw2fwIqlRelCAbN+CP2Z9Om2ydXWvfBtRh/SwVKfjzf+cVR8P3Tajy75YRrs5pMEAiTnHu15FWjNjg4aotkXNP6oc/Eevzf1v1os/x/9/kH0Ncmu24mbZPPPf1KrfFS0utH9fhY3snPnhOg2PccKQskARLILgGK/ezyDWDpkmOukjF3+8XgdDJ6Bk2TyhThVDKsAcetM7QOPT3fvByjKyvQb8wcPLtlH9pad+CZCTeqImx0A46d2qGYeNxw3zy8svNDvLt5kZIFteTmpdh31aHgwO+S7kFOqynSDHZJ39rcOIRfa8PTvdXkVQlj7l87g7fmjDDEtrIqpK16yZ973VCXNREuYu6PLtfCfJan7lAc+EvM4w5S7HsMlMWRAAlkm4D0oKUZT7ZhB6J8w8ylPIySLM5W5gWWJO5Scc5Vo69E8PMVLSZb7ejXf8VkbWZYCD8Rs39/F4Arx1CvvwiUhzF3j/e23Hnhl06lko9Eyc3z8bbgY/PPxPL2pdib4DibU9PfJDnpKi9/NiWZ2mUj8nXB3+ehXZ4m1TI1RWqnqC9cl2L4IFNh/JKMAMV+MkLcTwIkUGAEKPYLbEAKvDlmE56gxfaWhVvCmVzrCHW3Y/HQEG54aKtJ6CuHKbbaWtKxUDXmNquiXo6gInwDFmXBcdPazEL9Lr88Or1gXdr5mDF7HnpsF3Kh9YHY/bFX+ImELyKFwFZ3JhZin6Y82R0Riv3s8mXpJEACnhOIPczEQ4KhNz0HHKwCTSY8YYwKWF4G2RzC7W9BhCAVUVCePxEfrkUur1SajZZtrMOTGnG8J1iXSSq90X0kxP3HyS6+dckPDbGfyyhJB5f/QKm30MX+9weWqUm8NH8QxtxP5SpM7ViK/dR48WgSIIG8E6DYz/sQ+KgBJkERqsaKjn/6qPXJmyr6J0Sn+xffLryz5C7cObMJn9vY9+uhbkV5ptno61347CPhyHkInS6mqL/ctVBxYI3c25iSH0HyHuf5CNn8xOl6ikrJo5zs+rPQHV3sK9GZCjnnoGQO5f76zQKwIiiSYr8IBpldJIFgEaDYD9Z4Zrc3sqlA0EJuCnKpi30n3j14Z96/Gi8Pac9GR9vxh/6qk6hw5v0wfgHBqRHavh58qbxcxEeKsUaOSeV7xxffuag78SGy2ZS88mE9I9EKifW4bHz3jdiXTI6E2A+aiV02xjbdMin20yXH80iABPJEgGI/T+D9V608CxtQMeGp2JdEemmoBs+nPSvcgyPrH8adU+Zg49G0lD5weTd+N0DzHXBwIhUiMZX/4bGaw3GaV7Nsh1949vpqp/wj9gHTy3
jQnOfTvMaycRrFfjaoskwSIIEsEqDYzyLcQBUdPb/dFFnmtrrDOYl1nkuIXor9fM5G55JZJnXJ9vqJVz48WiFJs6F+
}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"Bayes' theorem:\n",
"![image-2.png](attachment:image-2.png)\n",
"\n",
"Gaussian naive Bayes, used when the features are continuous:\n",
"![image.png](attachment:image.png)"
]
},
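{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here are the standard Gaussian naive Bayes formulas written out in LaTeX. This is the textbook form; we assume, without having verified it, that the attached images above show the same relationships. Here $c$ is a genre and $x_1, \\dots, x_n$ are the selected features. $\\mu_{c,i}$ and $\\sigma_{c,i}^2$ are the mean and variance of feature $i$ within class $c$, estimated on the training set:\n",
"\n",
"$$P(c \\mid x_1, \\dots, x_n) \\propto P(c) \\prod_{i=1}^{n} p(x_i \\mid c), \\qquad p(x_i \\mid c) = \\frac{1}{\\sqrt{2 \\pi \\sigma_{c,i}^2}} \\exp\\left(-\\frac{(x_i - \\mu_{c,i})^2}{2 \\sigma_{c,i}^2}\\right)$$\n",
"\n",
"A product of many small densities can underflow, so the decision rule is usually evaluated in log space:\n",
"\n",
"$$\\hat{y} = \\arg\\max_{c} \\left[ \\log P(c) + \\sum_{i=1}^{n} \\log p(x_i \\mid c) \\right]$$\n",
"\n",
"scikit-learn's `GaussianNB`, used further down for comparison, applies the same rule. It also adds `var_smoothing` (by default `1e-9` times the largest per-feature variance of the training data) to every $\\sigma_{c,i}^2$ for numerical stability."
]
},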
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To train and test the model, we used a script that samples random column subsets and records the resulting training-set accuracy, in order to find the most effective combination of features. This search selected the 8 features used below: six means (`*_mean`) and two variances (`*_var`)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# The 8 features selected with the random column search\n",
"features = ['mfcc4_mean', 'mfcc12_mean', 'mfcc9_var', 'mfcc1_mean', 'rms_mean', 'chroma_stft_mean', 'mfcc6_var', 'mfcc9_mean']\n",
"X_train_np = X_train[features].to_numpy()\n",
"X_test_np = X_test[features].to_numpy()\n",
"\n",
"model = NaiveBayesContinues(X_train_np, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Quick sanity check: predict a single training sample\n",
"Y_train_predicted = model.predict(X_train_np[:1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training set"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(Train data) Confusion matrix:\n"
]
},
{
"data": {
"text/plain": [
"array([[27, 3, 11, 3, 1, 8, 18, 0, 10, 4],\n",
" [ 1, 75, 2, 0, 0, 10, 0, 0, 1, 0],\n",
" [10, 1, 29, 9, 1, 4, 2, 8, 6, 3],\n",
" [ 2, 0, 2, 39, 1, 1, 12, 12, 5, 4],\n",
" [ 0, 0, 0, 8, 36, 0, 11, 13, 8, 1],\n",
" [ 8, 20, 2, 0, 0, 45, 1, 0, 0, 6],\n",
" [ 1, 0, 0, 3, 2, 0, 71, 0, 0, 3],\n",
" [ 1, 1, 2, 2, 5, 0, 0, 63, 2, 0],\n",
" [ 1, 0, 8, 8, 6, 3, 3, 5, 48, 3],\n",
" [ 4, 0, 10, 15, 0, 6, 14, 5, 5, 16]], dtype=int64)"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(Train data) Accuracy:\n",
"0.56125\n"
]
}
],
"source": [
"Y_train_predicted = model.predict(X_train_np)\n",
"cm = confusion_matrix(Y_train, Y_train_predicted)\n",
"ac = accuracy_score(Y_train, Y_train_predicted)\n",
"print(\"(Train data) Confusion matrix:\")\n",
"display(cm)\n",
"print(\"(Train data) Accuracy:\")\n",
"print(ac)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test set"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Confusion matrix:\n"
]
},
{
"data": {
"text/plain": [
"array([[ 5, 0, 2, 0, 0, 2, 4, 0, 1, 1],\n",
" [ 0, 8, 0, 0, 0, 3, 0, 0, 0, 0],\n",
" [ 7, 0, 6, 6, 0, 4, 1, 0, 1, 2],\n",
" [ 0, 0, 2, 7, 1, 0, 2, 4, 5, 1],\n",
" [ 0, 0, 0, 2, 10, 0, 6, 1, 4, 0],\n",
" [ 0, 3, 0, 1, 0, 14, 0, 0, 0, 0],\n",
" [ 0, 0, 0, 0, 1, 0, 17, 0, 2, 0],\n",
" [ 1, 0, 0, 2, 1, 0, 0, 18, 1, 1],\n",
" [ 0, 1, 0, 0, 3, 1, 0, 4, 6, 0],\n",
" [ 5, 0, 3, 5, 1, 1, 5, 1, 0, 4]], dtype=int64)"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy:\n",
"0.475\n"
]
}
],
"source": [
"Y_test_predicted = model.predict(X_test_np)\n",
"cm = confusion_matrix(Y_test, Y_test_predicted)\n",
"ac = accuracy_score(Y_test, Y_test_predicted)\n",
"print(\"Confusion matrix:\")\n",
"display(cm)\n",
"print(\"Accuracy:\")\n",
"print(ac)"
]
},
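{
"cell_type": "markdown",
"metadata": {},
"source": [
"Accuracy equals the trace of the confusion matrix (correct predictions lie on the diagonal) divided by the total number of samples. Checking this by hand against the two matrices printed above, whose entries sum to 800 (training) and 200 (test):\n",
"\n",
"$$\\text{accuracy}_{\\text{train}} = \\frac{27+75+29+39+36+45+71+63+48+16}{800} = \\frac{449}{800} = 0.56125$$\n",
"\n",
"$$\\text{accuracy}_{\\text{test}} = \\frac{5+8+6+7+10+14+17+18+6+4}{200} = \\frac{95}{200} = 0.475$$"
]
},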
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example comparisons (true vs. predicted labels)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y: 10\tPredicted: 10\n",
"Y: 9\tPredicted: 8\n",
"Y: 3\tPredicted: 1\n",
"Y: 6\tPredicted: 6\n",
"Y: 7\tPredicted: 7\n",
"Y: 10\tPredicted: 7\n",
"Y: 1\tPredicted: 1\n",
"Y: 3\tPredicted: 6\n",
"Y: 4\tPredicted: 4\n",
"Y: 8\tPredicted: 10\n"
]
}
],
"source": [
"for i in range(10):\n",
" print(f\"Y: {Y_test.to_numpy()[i]}\\tPredicted: {Y_test_predicted[i]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Naive Bayes with a ready-made library (scikit-learn)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.metrics import confusion_matrix, accuracy_score\n",
"import pandas as pd\n",
"import numpy as np\n",
"import typing\n",
"\n",
"# Features may be passed as a DataFrame or, as in the cells below, as a NumPy array\n",
"ArrayLike = typing.Union[pd.DataFrame, np.ndarray]\n",
"\n",
"\n",
"class Bayes:\n",
"    \"\"\"Thin wrapper around scikit-learn's GaussianNB with a train/predict/eval interface.\"\"\"\n",
"\n",
"    def __init__(self):\n",
"        self.classifier = GaussianNB()\n",
"\n",
"    def train(self, X: ArrayLike, Y: pd.Series) -> None:\n",
"        self.classifier.fit(X, Y)\n",
"\n",
"    def predict(self, X: ArrayLike) -> np.ndarray:\n",
"        return self.classifier.predict(X)\n",
"\n",
"    def eval(self, Y: pd.Series, Y_pred: np.ndarray) -> typing.Tuple[np.ndarray, float]:\n",
"        \"\"\"Return (confusion matrix, accuracy) for true labels Y and predictions Y_pred.\"\"\"\n",
"        cm = confusion_matrix(Y, Y_pred)\n",
"        ac = accuracy_score(Y, Y_pred)\n",
"        return cm, ac"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train:\n",
"0.56125\n",
"Test:\n",
"0.475\n"
]
}
],
"source": [
"bayes = Bayes()\n",
"bayes.train(X_train_np, Y_train)\n",
"\n",
"Y_predicted = bayes.predict(X_train_np)\n",
"eval_result = bayes.eval(Y_train, Y_predicted)\n",
"print(\"Train:\")\n",
"print(eval_result[1])\n",
"\n",
"Y_predicted = bayes.predict(X_test_np)\n",
"eval_result = bayes.eval(Y_test, Y_predicted)\n",
"print(\"Test:\")\n",
"print(eval_result[1])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Random feature search: sample 10 random columns, fit GaussianNB and append\n",
"# the training-set accuracy together with the column names to accuracy.txt\n",
"\n",
"X_all = data.drop([\"genre\", \"label\", \"tempo\"], axis=1)\n",
"Y_all = data[\"genre\"]\n",
"\n",
"# Append mode creates the file if it does not exist yet\n",
"with open('accuracy.txt', 'a') as acc_file:\n",
"    for i in range(100):\n",
"        X_rand = X_all.sample(n=10, axis='columns')\n",
"        # Local names, so the X_train/Y_train/model used above are not overwritten\n",
"        Xr_train, _, Yr_train, _ = train_test_split(X_rand, Y_all, test_size=0.20, random_state=0)\n",
"\n",
"        clf = GaussianNB()\n",
"        clf.fit(Xr_train, Yr_train)\n",
"        ac = accuracy_score(Yr_train, clf.predict(Xr_train))\n",
"        acc_file.write(f\"{ac} {list(X_rand.columns)}\\n\")\n",
"\n",
"# List the results sorted by accuracy, best first:\n",
"#!sort -k1,1nr -k2,2 accuracy.txt"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}