2022-05-17 17:30:50 +02:00
|
|
|
|
{
|
|
|
|
|
"cells": [
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "TQqrOdkY6nsy"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# **Classification using the naive Bayes method with a normal distribution**"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "AlcfRFCPSXIj"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# **Bayes' theorem**\n",
|
|
|
|
|
"![bayes.svg](data:image/svg+xml;base64,PHN2ZyB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgd2lkdGg9IjI3LjQ4NGV4IiBoZWlnaHQ9IjYuNTA5ZXgiIHN0eWxlPSJ2ZXJ0aWNhbC1hbGlnbjogLTIuNjcxZXg7IiB2aWV3Qm94PSIwIC0xNjUyLjUgMTE4MzMuMyAyODAyLjYiIHJvbGU9ImltZyIgZm9jdXNhYmxlPSJmYWxzZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiBhcmlhLWxhYmVsbGVkYnk9Ik1hdGhKYXgtU1ZHLTEtVGl0bGUiPgo8dGl0bGUgaWQ9Ik1hdGhKYXgtU1ZHLTEtVGl0bGUiPntcZGlzcGxheXN0eWxlIHtcbWF0aHNmIHtQfX0oQVxtaWQgQik9e1xmcmFjIHt7XG1hdGhzZiB7UH19KEJcbWlkIEEpXCx7XG1hdGhzZiB7UH19KEEpfXt7XG1hdGhzZiB7UH19KEIpfX0sfTwvdGl0bGU+CjxkZWZzIGFyaWEtaGlkZGVuPSJ0cnVlIj4KPHBhdGggc3Ryb2tlLXdpZHRoPSIxIiBpZD0iRTEtTUpTUy01MCIgZD0iTTg4IDBWNjk0SDIzMFEzNDcgNjkzIDM3MCA2OTJUNDEwIDY4NlE0ODcgNjY3IDUzNSA2MTFUNTgzIDQ4NVE1ODMgNDA5IDUyNyAzNDhUMzc5IDI3NlEzNjkgMjc0IDI3OSAyNzRIMTkyVjBIODhaTTQ4NiA0ODVRNDg2IDUyMyA0NzEgNTUxVDQzMiA1OTNUMzkxIDYxMlQzNTcgNjIxUTM1MCA2MjIgMjY4IDYyM0gxODlWMzQ3SDI2OFEzNTAgMzQ4IDM1NyAzNDlRMzcwIDM1MSAzODMgMzU0VDQxNiAzNjhUNDUwIDM5MVQ0NzUgNDI5VDQ4NiA0ODVaIj48L3BhdGg+CjxwYXRoIHN0cm9rZS13aWR0aD0iMSIgaWQ9IkUxLU1KTUFJTi0yOCIgZD0iTTk0IDI1MFE5NCAzMTkgMTA0IDM4MVQxMjcgNDg4VDE2NCA1NzZUMjAyIDY0M1QyNDQgNjk1VDI3NyA3MjlUMzAyIDc1MEgzMTVIMzE5UTMzMyA3NTAgMzMzIDc0MVEzMzMgNzM4IDMxNiA3MjBUMjc1IDY2N1QyMjYgNTgxVDE4NCA0NDNUMTY3IDI1MFQxODQgNThUMjI1IC04MVQyNzQgLTE2N1QzMTYgLTIyMFQzMzMgLTI0MVEzMzMgLTI1MCAzMTggLTI1MEgzMTVIMzAyTDI3NCAtMjI2UTE4MCAtMTQxIDEzNyAtMTRUOTQgMjUwWiI+PC9wYXRoPgo8cGF0aCBzdHJva2Utd2lkdGg9IjEiIGlkPSJFMS1NSk1BVEhJLTQxIiBkPSJNMjA4IDc0UTIwOCA1MCAyNTQgNDZRMjcyIDQ2IDI3MiAzNVEyNzIgMzQgMjcwIDIyUTI2NyA4IDI2NCA0VDI1MSAwUTI0OSAwIDIzOSAwVDIwNSAxVDE0MSAyUTcwIDIgNTAgMEg0MlEzNSA3IDM1IDExUTM3IDM4IDQ4IDQ2SDYyUTEzMiA0OSAxNjQgOTZRMTcwIDEwMiAzNDUgNDAxVDUyMyA3MDRRNTMwIDcxNiA1NDcgNzE2SDU1NUg1NzJRNTc4IDcwNyA1NzggNzA2TDYwNiAzODNRNjM0IDYwIDYzNiA1N1E2NDEgNDYgNzAxIDQ2UTcyNiA0NiA3MjYgMzZRNzI2IDM0IDcyMyAyMlE3MjAgNyA3MTggNFQ3MDQgMFE3MDEgMCA2OTAgMFQ2NTEgMVQ1NzggMlE0ODQgMiA0NTUgMEg0NDNRNDM3IDYgNDM3IDlUNDM5IDI3UTQ0MyA0MCA0NDUgNDNMNDQ5IDQ2SDQ2OVE1MjMgNDkgNTMzIDYzTDUyMSAy
MTNIMjgzTDI0OSAxNTVRMjA4IDg2IDIwOCA3NFpNNTE2IDI2MFE1MTYgMjcxIDUwNCA0MTZUNDkwIDU2Mkw0NjMgNTE5UTQ0NyA0OTIgNDAwIDQxMkwzMTAgMjYwTDQxMyAyNTlRNTE2IDI1OSA1MTYgMjYwWiI+PC9wYXRoPgo8cGF0aCBzdHJva2Utd2lkdGg9IjEiIGlkPSJFMS1NSk1BSU4tMjIyMyIgZD0iTTEzOSAtMjQ5SDEzN1ExMjUgLTI0OSAxMTkgLTIzNVYyNTFMMTIwIDczN1ExMzAgNzUwIDEzOSA3NTBRMTUyIDc1MCAxNTkgNzM1Vi0yMzVRMTUxIC0yNDkgMTQxIC0yNDlIMTM5WiI+PC9wYXRoPgo8cGF0aCBzdHJva2Utd2lkdGg9IjEiIGlkPSJFMS1NSk1BVEhJLTQyIiBkPSJNMjMxIDYzN1EyMDQgNjM3IDE5OSA2MzhUMTk0IDY0OVExOTQgNjc2IDIwNSA2ODJRMjA2IDY4MyAzMzUgNjgzUTU5NCA2ODMgNjA4IDY4MVE2NzEgNjcxIDcxMyA2MzZUNzU2IDU0NFE3NTYgNDgwIDY5OCA0MjlUNTY1IDM2MEw1NTUgMzU3UTYxOSAzNDggNjYwIDMxMVQ3MDIgMjE5UTcwMiAxNDYgNjMwIDc4VDQ1MyAxUTQ0NiAwIDI0MiAwUTQyIDAgMzkgMlEzNSA1IDM1IDEwUTM1IDE3IDM3IDI0UTQyIDQzIDQ3IDQ1UTUxIDQ2IDYyIDQ2SDY4UTk1IDQ2IDEyOCA0OVExNDIgNTIgMTQ3IDYxUTE1MCA2NSAyMTkgMzM5VDI4OCA2MjhRMjg4IDYzNSAyMzEgNjM3Wk02NDkgNTQ0UTY0OSA1NzQgNjM0IDYwMFQ1ODUgNjM0UTU3OCA2MzYgNDkzIDYzN1E0NzMgNjM3IDQ1MSA2MzdUNDE2IDYzNkg0MDNRMzg4IDYzNSAzODQgNjI2UTM4MiA2MjIgMzUyIDUwNlEzNTIgNTAzIDM1MSA1MDBMMzIwIDM3NEg0MDFRNDgyIDM3NCA0OTQgMzc2UTU1NCAzODYgNjAxIDQzNFQ2NDkgNTQ0Wk01OTUgMjI5UTU5NSAyNzMgNTcyIDMwMlQ1MTIgMzM2UTUwNiAzMzcgNDI5IDMzN1EzMTEgMzM3IDMxMCAzMzZRMzEwIDMzNCAyOTMgMjYzVDI1OCAxMjJMMjQwIDUyUTI0MCA0OCAyNTIgNDhUMzMzIDQ2UTQyMiA0NiA0MjkgNDdRNDkxIDU0IDU0MyAxMDVUNTk1IDIyOVoiPjwvcGF0aD4KPHBhdGggc3Ryb2tlLXdpZHRoPSIxIiBpZD0iRTEtTUpNQUlOLTI5IiBkPSJNNjAgNzQ5TDY0IDc1MFE2OSA3NTAgNzQgNzUwSDg2TDExNCA3MjZRMjA4IDY0MSAyNTEgNTE0VDI5NCAyNTBRMjk0IDE4MiAyODQgMTE5VDI2MSAxMlQyMjQgLTc2VDE4NiAtMTQzVDE0NSAtMTk0VDExMyAtMjI3VDkwIC0yNDZRODcgLTI0OSA4NiAtMjUwSDc0UTY2IC0yNTAgNjMgLTI1MFQ1OCAtMjQ3VDU1IC0yMzhRNTYgLTIzNyA2NiAtMjI1UTIyMSAtNjQgMjIxIDI1MFQ2NiA3MjVRNTYgNzM3IDU1IDczOFE1NSA3NDYgNjAgNzQ5WiI+PC9wYXRoPgo8cGF0aCBzdHJva2Utd2lkdGg9IjEiIGlkPSJFMS1NSk1BSU4tM0QiIGQ9Ik01NiAzNDdRNTYgMzYwIDcwIDM2N0g3MDdRNzIyIDM1OSA3MjIgMzQ3UTcyMiAzMzYgNzA4IDMyOEwzOTAgMzI3SDcyUTU2IDMzMiA1NiAzNDdaTTU2IDE1M1E1NiAxNjggNzIgMTczSDcwOFE3MjIgMTYzIDcyMiAxNTNRNzIyIDE0MCA3MDcgMTMzSDcwUTU2IDE0MCA1NiAxNTNaIj48
L3BhdGg+CjxwYXRoIHN0cm9rZS13aWR0aD0iMSIgaWQ9IkUxLU1KTUFJTi0yQyIgZD0iTTc4IDM1VDc4IDYwVDk0IDE
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "rcpTnWjOh5dq"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"P(A) -- the a priori (prior) probability of class A, i.e. the probability that an arbitrary example belongs to class A\n",
|
|
|
|
|
"\n",
|
|
|
|
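|
"P(B|A) -- the likelihood, i.e. the probability of observing example B given that it belongs to \n",
|
|
|
|
|
"class A (the a posteriori probability that B belongs to class A is P(A|B), the left-hand side of the formula)\n",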
|
|
|
|
|
"\n",
|
|
|
|
|
"P(B) -- the a priori probability of observing example B (the evidence)"
|
|
|
|
|
]
|
|
|
|
|
},
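The three quantities above combine into the posterior P(A|B). A minimal sketch with made-up numbers follows: the priors and likelihoods are purely illustrative and do not come from the iris data, and the function name `posterior` is a hypothetical helper, not part of the notebook.

```python
def posterior(priors, likelihoods):
    """Return P(A|B) for every class A via Bayes' theorem.

    priors      -- dict mapping class name to P(A)
    likelihoods -- dict mapping class name to P(B|A)
    """
    # P(B) follows from the law of total probability: sum of P(B|A) * P(A).
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}


# Hypothetical values, chosen only for illustration.
example_priors = {"A1": 0.5, "A2": 0.5}
example_likelihoods = {"A1": 0.8, "A2": 0.2}
result = posterior(example_priors, example_likelihoods)
print(result)
```

Because P(B) is the same for every class, a classifier that only needs the most probable class can skip the division and compare P(B|A) * P(A) directly.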
|
2022-05-17 23:08:12 +02:00
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"**The naive Bayes classifier is called naive because it assumes that the individual features are conditionally independent of each other given the class**"
|
2022-05-17 23:08:12 +02:00
|
|
|
|
]
|
|
|
|
|
},
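Written out, the independence assumption lets the posterior factorize over features. The sketch below assumes the notation x_1, ..., x_n for the feature values of example B, which the notebook itself does not introduce:

```latex
P(A \mid x_1, \dots, x_n) \;\propto\; P(A) \prod_{i=1}^{n} P(x_i \mid A)
\qquad\Longleftrightarrow\qquad
\log P(A \mid x_1, \dots, x_n) = \log P(A) + \sum_{i=1}^{n} \log P(x_i \mid A) + \text{const}
```

The constant is -log P(B), which is identical for every class and therefore does not affect which class has the highest score.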
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "SSaJsYOhz8h8"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"![rozklady.jpg](data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD//gA8Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gMTAwCv/bAEMAAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAf/bAEMBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAf/AABEIAucC6QMBEQACEQEDEQH/xAAfAAABBQEBAQEBAQAAAAAAAAAAAQIDBAUGBwgJCgv/xAC1EAACAQMDAgQDBQUEBAAAAX0BAgMABBEFEiExQQYTUWEHInEUMoGRoQgjQrHBFVLR8CQzYnKCCQoWFxgZGiUmJygpKjQ1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4eLj5OXm5+jp6vHy8/T19vf4+fr/xAAfAQADAQEBAQEBAQEBAAAAAAAAAQIDBAUGBwgJCgv/xAC1EQACAQIEBAMEBwUEBAABAncAAQIDEQQFITEGEkFRB2FxEyIygQgUQpGhscEJIzNS8BVictEKFiQ04SXxFxgZGiYnKCkqNTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqCg4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2dri4+Tl5ufo6ery8/T19vf4+fr/2gAMAwEAAhEDEQA/AP73P7Pi9dS/8HOvf/F0AH9nxeupf+DnXv8A4ugA/s+L11L/AMHOvf8AxdAB/Z8XrqX/AIOde/8Ai6AD+z4vXUv/AAc69/8AF0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQAf2fF66l/4Ode/wDi6AD+z4vXUv8Awc69/wDF0AH9nxeupf8Ag517/wCLoAP7Pi9dS/8ABzr3/wAXQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/AOLoAP7Pi9dS/wDBzr3/AMXQAf2fF66l/wCDnXv/AIugA/s+L11L/wAHOvf/ABdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv8A4ugA/s+L11L/AMHOvf8AxdAB/Z8XrqX/AIOde/8Ai6AD+z4vXUv/AAc69/8AF0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQAf2fF66l/4Ode/wDi6AD+z4vXUv8Awc69/wDF0AH9nxeupf8Ag517/wCLoAP7Pi9dS/8ABzr3/wAXQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/AOLoAP7Pi9dS/wDBzr3/AMXQAf2fF66l/wCDnXv/AIugA/s+L11L/wAHOvf/ABdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv8A4ugA/s+L11L/AMHOvf8AxdAB/Z8XrqX/AIOde/8Ai6AD+z4vXUv/AAc69/8AF0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQA
f2fF66l/4Ode/wDi6AD+z4vXUv8Awc69/wDF0AH9nxeupf8Ag517/wCLoAP7Pi9dS/8ABzr3/wAXQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/AOLoAP7Pi9dS/wDBzr3/AMXQAf2fF66l/wCDnXv/AIugA/s+L11L/wAHOvf/ABdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv8A4ugA/s+L11L/AMHOvf8AxdAB/Z8XrqX/AIOde/8Ai6AD+z4vXUv/AAc69/8AF0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/4ugA/s+L11L/wc69/8XQAf2fF66l/4Ode/wDi6AD+z4vXUv8Awc69/wDF0AH9nxeupf8Ag517/wCLoAP7Pi9dS/8ABzr3/wAXQAf2fF66l/4Ode/+LoAP7Pi9dS/8HOvf/F0AH9nxeupf+DnXv/i6AD+z4vXUv/Bzr3/xdAB/Z8XrqX/g517/AOLoAP7Pi9dS/wDBzr3/AMXQAf2fF66l/wCDnXv/AIugA/s+L11L/wAHOvf/ABdAHz1/buv/APQwar/4HD/5q6APqGgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKAPkX5/9n9aAPrqgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKAPkX5/9n9aAPrqgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKAPkX5/wDZ/WgD66oAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgD5F+f/AGf1oA+uqACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoA+Rfn/2f1oA+uqACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKA
CgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoAKACgAoA+Rfn/2f1oA+uqACgAoAKACgAoAKACgAoAKACg
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 46,
|
2022-05-17 23:08:12 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/html": [
|
|
|
|
|
"<div>\n",
|
|
|
|
|
"<style scoped>\n",
|
|
|
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
|
|
|
" vertical-align: middle;\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" .dataframe tbody tr th {\n",
|
|
|
|
|
" vertical-align: top;\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" .dataframe thead th {\n",
|
|
|
|
|
" text-align: right;\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
"</style>\n",
|
|
|
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
|
|
|
" <thead>\n",
|
|
|
|
|
" <tr style=\"text-align: right;\">\n",
|
|
|
|
|
" <th></th>\n",
|
|
|
|
|
" <th>sepal.length</th>\n",
|
|
|
|
|
" <th>sepal.width</th>\n",
|
|
|
|
|
" <th>petal.length</th>\n",
|
|
|
|
|
" <th>petal.width</th>\n",
|
|
|
|
|
" <th>variety</th>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" </thead>\n",
|
|
|
|
|
" <tbody>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>0</th>\n",
|
|
|
|
|
" <td>5.1</td>\n",
|
|
|
|
|
" <td>3.5</td>\n",
|
|
|
|
|
" <td>1.4</td>\n",
|
|
|
|
|
" <td>0.2</td>\n",
|
|
|
|
|
" <td>Setosa</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>1</th>\n",
|
|
|
|
|
" <td>4.9</td>\n",
|
|
|
|
|
" <td>3.0</td>\n",
|
|
|
|
|
" <td>1.4</td>\n",
|
|
|
|
|
" <td>0.2</td>\n",
|
|
|
|
|
" <td>Setosa</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>2</th>\n",
|
|
|
|
|
" <td>4.7</td>\n",
|
|
|
|
|
" <td>3.2</td>\n",
|
|
|
|
|
" <td>1.3</td>\n",
|
|
|
|
|
" <td>0.2</td>\n",
|
|
|
|
|
" <td>Setosa</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>3</th>\n",
|
|
|
|
|
" <td>4.6</td>\n",
|
|
|
|
|
" <td>3.1</td>\n",
|
|
|
|
|
" <td>1.5</td>\n",
|
|
|
|
|
" <td>0.2</td>\n",
|
|
|
|
|
" <td>Setosa</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>4</th>\n",
|
|
|
|
|
" <td>5.0</td>\n",
|
|
|
|
|
" <td>3.6</td>\n",
|
|
|
|
|
" <td>1.4</td>\n",
|
|
|
|
|
" <td>0.2</td>\n",
|
|
|
|
|
" <td>Setosa</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>...</th>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>145</th>\n",
|
|
|
|
|
" <td>6.7</td>\n",
|
|
|
|
|
" <td>3.0</td>\n",
|
|
|
|
|
" <td>5.2</td>\n",
|
|
|
|
|
" <td>2.3</td>\n",
|
|
|
|
|
" <td>Virginica</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>146</th>\n",
|
|
|
|
|
" <td>6.3</td>\n",
|
|
|
|
|
" <td>2.5</td>\n",
|
|
|
|
|
" <td>5.0</td>\n",
|
|
|
|
|
" <td>1.9</td>\n",
|
|
|
|
|
" <td>Virginica</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>147</th>\n",
|
|
|
|
|
" <td>6.5</td>\n",
|
|
|
|
|
" <td>3.0</td>\n",
|
|
|
|
|
" <td>5.2</td>\n",
|
|
|
|
|
" <td>2.0</td>\n",
|
|
|
|
|
" <td>Virginica</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>148</th>\n",
|
|
|
|
|
" <td>6.2</td>\n",
|
|
|
|
|
" <td>3.4</td>\n",
|
|
|
|
|
" <td>5.4</td>\n",
|
|
|
|
|
" <td>2.3</td>\n",
|
|
|
|
|
" <td>Virginica</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>149</th>\n",
|
|
|
|
|
" <td>5.9</td>\n",
|
|
|
|
|
" <td>3.0</td>\n",
|
|
|
|
|
" <td>5.1</td>\n",
|
|
|
|
|
" <td>1.8</td>\n",
|
|
|
|
|
" <td>Virginica</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" </tbody>\n",
|
|
|
|
|
"</table>\n",
|
|
|
|
|
"<p>150 rows × 5 columns</p>\n",
|
|
|
|
|
"</div>"
|
|
|
|
|
],
|
|
|
|
|
"text/plain": [
|
|
|
|
|
" sepal.length sepal.width petal.length petal.width variety\n",
|
|
|
|
|
"0 5.1 3.5 1.4 0.2 Setosa\n",
|
|
|
|
|
"1 4.9 3.0 1.4 0.2 Setosa\n",
|
|
|
|
|
"2 4.7 3.2 1.3 0.2 Setosa\n",
|
|
|
|
|
"3 4.6 3.1 1.5 0.2 Setosa\n",
|
|
|
|
|
"4 5.0 3.6 1.4 0.2 Setosa\n",
|
|
|
|
|
".. ... ... ... ... ...\n",
|
|
|
|
|
"145 6.7 3.0 5.2 2.3 Virginica\n",
|
|
|
|
|
"146 6.3 2.5 5.0 1.9 Virginica\n",
|
|
|
|
|
"147 6.5 3.0 5.2 2.0 Virginica\n",
|
|
|
|
|
"148 6.2 3.4 5.4 2.3 Virginica\n",
|
|
|
|
|
"149 5.9 3.0 5.1 1.8 Virginica\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"[150 rows x 5 columns]"
|
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 46,
|
2022-05-17 23:08:12 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"import pandas as pd\n",
|
|
|
|
|
"iris_data = pd.read_csv(\"iris.csv\")\n",
|
|
|
|
|
"iris_data"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 47,
|
2022-05-17 23:08:12 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"<matplotlib.collections.PathCollection at 0x7fd5194d9f40>"
|
2022-05-17 23:08:12 +02:00
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 47,
|
2022-05-17 23:08:12 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD7CAYAAAB68m/qAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAqEUlEQVR4nO2df0zd1f3/n7wRSmtFCrNAL2as/Yw7Is2MQMzW3I7VHyUbrDZksR+mZnO6bN2m27TKpqOt1VRqYzadHTNblmzmW7eG0A10uNlSd2uy5Upstgvulk8LTi4XGi8l2pYflXu/f1xAClx4n3Pv+9zX+7xfj2SJ3Pft5Zy+L6/dPl4vnictGo1GwTAMw9geI9ULYBiGYZIDF3SGYRhN4ILOMAyjCVzQGYZhNIELOsMwjCZclapvHIlEcPHiRWRkZCAtLS1Vy2AYhrEV0WgUly9fxtVXXw3DuPIzecoK+sWLF3H69OlUfXuGYRhbU1JSgmuuueaKx1JW0DMyMgDEFnX69GmUlZWlaikpx+/3O3b/vHdn7h1w9v4T2fvk5CROnz49W0PnYqqgb9myBZmZmVixYgUA4OGHH4bH48GpU6fQ2NiIiYkJuFwuPPPMM8jLyzO1qBnNkpmZCQCzr+1UnLx/3rtzcfL+E937Yqra9Cf05557DiUlJbNfRyIR7Nq1C/v370dFRQUOHTqEgwcPYv/+/QktkmEYhpFDesrF7/djxYoVqKioAADs2LEDHR0dSVsYwzAMI4bpT+gPP/wwotEoysvL8aMf/QihUAjr1q2bvZ6bm4tIJILR0VHk5ORYsVaGYRhmCdLMhHOFQiEUFhZicnISTz31FC5evIjbbrsNLS0tePHFF2ef99nPfhZvvPGGqYI+MTEBv9+f0OLNMj6ehs7ONQgGM1FUNIGqqlFkZXEmGbOQtMg41lzoROblICYyijC6ugpRIyvVy1LG+NQ4OkOdCI4FUbSqCFUFVchKd87+7URZWdkCD2/qE3phYSGAWAOzvr4e3/nOd3DPPfdgcHBw9jkjIyMwDEP403lZWRn8fj/Ky8uF/pxZfD6grg4YHv74sfx8oK0NqKy05FsK09XVZdn+qUNq72Ef8EYdMD7nzZKVD3yhDchL/puF1N4B+II+1B2uw/DFj/eff3U+2v63DZUu/fevkkT2vtSH4WUd+qVLl/Dhhx8CiA20v/rqqygtLUVZWRnGx8fx1ltvAQBefvllVFdXSy3QKsbGgNraK4s5EPu6tjZ2nWEAAB+NAW/UXlnMgdjXb9TGrmvM2OUx1B6uvaKYA8DwxWHUHq7F2GW9968Ly35CD4fD+P73v4+pqSlEIhFs2LABu3fvhmEYOHDgAHbv3n3F2CIlWlsXFvMZhodj1+vr1a6JIcpA68JiPsP4cOx6sb5vltb/tC4o5jMMXxxG639aUb9R3/3rwrIF/frrr8fRo0cXvXbTTTehra0t2WtKGmfPJnadIcZHl4D3WoGLfcDq9UDRduCqlcl57QvLvBmWu25zzp5fen/LXdeBS5cvofWdVvSN9mH9mvXY/pntWJmRpPeXIlL2m6IqWL8+sesMIcK+hUokmX579TJvhuWu25z1a5be33LX7Y4v6FugnKzsH1iF1mmL1dWAEWeHhhG7ztgAFX67aHvs/yAWIys/dl1jtn9mO/KvXnz/+VfnY/tn9N2/Tv0DrQt6RwcQiSx+LRKJXWdsgBm/nShXrYx92p9f1Gf+FZAstUOUlRkr0fa/bQuK+synVLupBxHM9A/sgtbKhR26Jqjy23mVwJe6gX81Ah8GgGw3sHEfkJWbnNcnTqWrEn0P9qH1P604e/6sZR55xlWf7D2JQGYg5a5ap/6B1gWdHbomqPLb8z398DHgvRbL5tApsjJjpaXTLPNddXOgOeWuWqf+gdbKhR26Jqjw2w6fQ1cBVVetU/9A64LODl0TVPhtFZ7e4VB11Tr1D7RWLuzQNSKvEvhKX6ywXjjLc+
hEEJndpuyqVfUPrEbrgs4OXTOuWmndb2s6fA5dBtHZbequ2ur+gQq0Vi7s0BnTFFQj/o+DMX2dmUHGh+vkqqmidUFnh86YZqgDQJw3CyLT15kZZHy4Tq6aKlorF1mHfulSLLirry+mZbZvB1bye81+iGS/sEMXQtaHz3XV3m4vPDd4yLhqznIhjoxD9/kWRu5Sy09nTCCa/cIOXYhEfPiMq3ZPulG+kUYeOme52IDt22PFeDHy82PX58L56ZogM1Pu8CwXUXTy4VTn42XQuqCvXBn7ZD2/qM984p6vUczkpzM2QGam3OFZLqLo5MOpzsfLoLVyAWKapK8vVozPnl3aiScyt87enRCyPtzhWS6i6DK7rXI+3uocG+0LOhArrGZOJpKdW2fvTgxZH85ZLsLoMLutaj5eRY6N1spFFFHnDrB3J4mMD+csF8eioh+gytNzQZ+DqHMH2LuTRMaHc5aLY1HRD1Dl6R2hXEQQce6A2rwYrTz99Ix4Qfgk0B9Ibi4LIJ79wnPojqbSVYnund1o7GxEIByA+xNu7Kvah9xVyemfqPL0XNAXwaxzB9TlxWjl6ee4ahcAvN+c3PNBZxDJfuE5dEcz328f6zuGlp6WpPltVZ6elUuCyHh3UbTy9FRdNc+hOxYVflvV3D4X9ASR8e6iaOXpqbpqnkN3LCr8tqq5fUcoF6vds6h3F0WrXHeVrlokywWwPnNdQ3TIP1Hlt1Xk2Ghf0FW5ZxHvLopWue6pOh8UMOfprcxc1wxd8k9U5rRbnWOjtXLRxT2r8PTK4PNBtUCn/BOdcmm0Lui6uGcVnl4ZfD6oFuiUf6JTLo3WykWnPHSrPb1S5rjq4GkvXCUee58PKurqNYDy+aAyWD2HrgqtC7pueehWenrlTLvqobAbruIku0SVM+Wyrt7mUD8fVBSr59BVobVy4Tx0h6JqptzBrl4n76xTP0Drgs556A5F1Uy5g129Tt5Zp36AkHL5xS9+geeffx5tbW0oKSnBqVOn0NjYiImJCbhcLjzzzDPIy8uzaq1SqMpDZ4ihItvc4fkvnIdObw7fdEHv7u7GqVOn4HK5AACRSAS7du3C/v37UVFRgUOHDuHgwYPYv3+/ZYuVxeo8dIYgKrLNOf/F0XnoFOfwTSmXyclJPPHEE9izZ8/sY36/HytWrEBFRQUAYMeOHejo6LBkkaqorgaMOH8jhhG7ztgAVW6b81+0QKYfQNW7myroP//5z/GVr3wFRUVFs4+FQiGsW7du9uvc3FxEIhGMjo4mfZGq6OgAIpHFr0UiseuMDVDltjn/RQtk+gFUvfuyyuXtt9+G3+/Hww8/bMkC/H4/AKCrq8uS1weA8fE0dHauQTCYiaKiCVRVjSIrK7rgeV5vARALdF0UrzcIt3vIkjVauX/qJHvvBWHvEncRCJ72YijsXvRaWmQcay50IvNyEBMZRRhdXYWokbXEqxlIu74FORdOYMXlgY//TL8B9C+/Lyffd8D8/senxtEZ6kRwLIiiVUWoKqhCVvpS90UMAwZaNrfgxNAJDFwamP0expCBrqGFa/T2epd8PW+3F+7Jxd9jM1hx75ct6D6fD2fOnMEtt9wCABgaGsI3v/lN3H333RgcHJx93sjICAzDQE5OjtACysrK4Pf7UV6e/FwDIDZXXldnbq48EACam+O/lsfjQnn5UqVCjq6uLsv2Tx1L9t4fiGWsx8FV4ll89j3sA96ok5wp3yS8TCffd8D8/n1BH+oO1ylx1ZtM3sdAZgDNgfjvMc8NniWzWhK59xMTE7MfhOezrHL51re+hZMnT+L48eM4fvw4CgoK8Jvf/Ab33XcfxsfH8dZbbwEAXn75ZVQTk8yic+VaZaY4GT5TVBuoumqqc/jSc+iGYeDAgQPYu3cvbr/9dvh8Pjz00EPJXFvCiM6Va5WZ4mT4TFFtoOqqqc7hC//q//Hjx2f/+6abbkJbW1tSF5RMZObKtcpMcTJ8pqgU1OaqKWfGUJzD5yyXRd
AqM8XJ8JmiQlCcq6aeGUNtDl/rX/3nuXLGNAXViP/jYExf1xd21XqgdUHnuXLGNEMdAOK8WRCZvq4v7Kr1wHbKRSS
|
2022-05-17 23:08:12 +02:00
|
|
|
|
"text/plain": [
|
|
|
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"metadata": {},
|
2022-05-17 23:08:12 +02:00
|
|
|
|
"output_type": "display_data"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"import matplotlib.pyplot as plt\n",
|
|
|
|
|
"import numpy as np\n",
|
|
|
|
|
"fig = plt.figure()\n",
|
|
|
|
|
"ax = fig.add_subplot(111)\n",
|
|
|
|
|
"setosa = iris_data[:50]\n",
|
|
|
|
|
"versicolor = iris_data[50:100]\n",
|
|
|
|
|
"virginica = iris_data[100:150]\n",
|
|
|
|
|
"# ax.scatter(setosa['sepal.length'],np.arange(50), color='blue', lw=2)\n",
|
|
|
|
|
"# ax.scatter(versicolor['sepal.length'],np.arange(50), color='orange', lw=2)\n",
|
|
|
|
|
"# ax.scatter(virginica['sepal.length'],np.arange(50), color='green', lw=2) \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"ax.scatter(setosa['petal.width'],np.arange(50), color='blue', lw=2)\n",
|
|
|
|
|
"ax.scatter(versicolor['petal.width'],np.arange(50), color='orange', lw=2)\n",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"ax.scatter(virginica['petal.width'],np.arange(50), color='green', lw=2)"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 42,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"name": "stdout",
|
|
|
|
|
"output_type": "stream",
|
|
|
|
|
"text": [
|
|
|
|
|
"ShapiroResult(statistic=0.9776989221572876, pvalue=0.4595281183719635)\n",
|
|
|
|
|
"ShapiroResult(statistic=0.9778355956077576, pvalue=0.46473264694213867)\n",
|
|
|
|
|
"ShapiroResult(statistic=0.9711798429489136, pvalue=0.25832483172416687)\n"
|
|
|
|
|
]
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"from scipy import stats\n",
|
|
|
|
|
"print(stats.shapiro(setosa['sepal.length']))\n",
|
|
|
|
|
"print(stats.shapiro(versicolor['sepal.length']))\n",
|
|
|
|
|
"print(stats.shapiro(virginica['sepal.length']))"
|
2022-05-17 23:08:12 +02:00
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-17 17:30:50 +02:00
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "Yabcm4Rei2ue"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"![GaussianNB.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA8AAAAKACAYAAABT+RFDAAABhWlDQ1BJQ0MgcHJvZmlsZQAAKJF9kT1Iw0AcxV9TpVUqDnYQdchQHcSCqIijVqEIFUKt0KqDyaVf0KQhSXFxFFwLDn4sVh1cnHV1cBUEwQ8QJ0cnRRcp8X9JoUWsB8f9eHfvcfcOEGolplkd44Cm22YyHhPTmVUx8IogBHRhEKMys4w5SUqg7fi6h4+vd1Ge1f7cn6NHzVoM8InEs8wwbeIN4ulN2+C8TxxmBVklPiceM+mCxI9cVzx+45x3WeCZYTOVnCcOE4v5FlZamBVMjXiKOKJqOuULaY9VzluctVKFNe7JXxjK6ivLXKc5hDgWsQQJIhRUUEQJNqK06qRYSNJ+rI1/wPVL5FLIVQQjxwLK0CC7fvA/+N2tlZuc8JJCMaDzxXE+hoHALlCvOs73sePUTwD/M3ClN/3lGjDzSXq1qUWOgN5t4OK6qSl7wOUO0P9kyKbsSn6aQi4HvJ/RN2WAvluge83rrbGP0wcgRV0lboCDQ2AkT9nrbd4dbO3t3zON/n4ARb1ylXHM+TcAAAAGYktHRAD0APwA/9UrKNsAAAAJcEhZcwAACxMAAAsTAQCanBgAAAAHdElNRQfkCAIJHTANHBdBAAAAGXRFWHRDb21tZW50AENyZWF0ZWQgd2l0aCBHSU1QV4EOFwAAIABJREFUeNrs3Xd4lfX9//HXnZzskDCSkEASVgxLZIugAkqrIEgrw4KihCVVsa2jitZRrP1+7VeLP5xQxdAUBC1EGnEUWTKkypAhIAEkJoEMdgbZ5/z+SLlznwxIQgJJzvNxXbmu3Ofc5z73fX+Okdd5f4bhcDgcAgAAAACgiXPjFgAAAAAACMAAAAAAABCAAQAAAAAgAAMAAAAAQAAGAAAAAIAADAAAAAAAARgAAAAAAAIwAAAAAAAEYAAAAAAAARgAAAAAAAIwAAAAAABNga0xnrSXl5eCg4NpPQAAAABwYSdOnFBBQUHTDsDBwcFKTU2ltQEAAADAhYWHh9dof7pAAwAAAABcAgEYAAAAAEAABgAAAACAAAwAAAAAAAEYAAAAAAACMAAAAAAABGAAAAAAAAjAAAAAAADUko1bAAAAADQODofD/AGaAsMwzB8CMAAAAAAVFRXp1KlTOnfunOx2OzcETY6fn58CAgIUGBhYr2GYAAwAAAA0YCUlJUpOTpaXl5ciIiLk5eXFTUGTUlxcrNzcXJ08eVLnz59XWFhYvYVgAjAAAADQgJ06dUo2m01t27a9Yt1EgSvJ3d1dXl5eCggIUFJSks6dO6fmzZvXy3sxCRYAAADQgOXm5qp58+aEXzR5NptNLVq0UFZWVr29BwEYAAAAaKAcDofy8/Pl4+PDzYBL8Pf3V25ubr1N9EYABgAAABpwAJZKu4gCrsBmszl99gnAAAAAgIsFYIDPPgEYAAAAAAACMAAAAAAABGAAAAAATcqhQ4c0a9YsdevWTX5+fvL29lZ4eLj69++vWbNmacWKFdwk1D4AHzp0SIMGDVJ0dLT69++vffv2Vbmvw+HQrbfeWmEtp1WrVqlLly665pprNGbMmHqd7hoAAABA0xQfH68ePXrorbfeUmZmpm688UaNHTtW1113nY4dO6a33npLM2fOvOz3SUpKkmEYat++PTfd1QLwzJkz9cADDygxMVFPPfWUYmJiqtz3tddeU6dOnZwey8nJ0bRp07Ry5UodOnRIbdq00Z/+9CdaBAAAAEC1ZWRkaPLkySooKNDjjz+u1NRUrV69WkuWLNFnn32m48ePa/v27ZoxYwY3C7ULwJmZmdq+fbsmTZokSRo7dqxSUlJ0+PDhCvvu27dPK1eu1OzZs50e//zzz9W7d2916dJFkvTQQw9p6dKltAgAAACAalu1apVycnLUpk0bvfrqq/L29q6wT9++ffW///u/3CzULgC
npKQoLCzMXKPJMAxFRkYqOTnZab+ioiLNmDFDCxYsqLB2WXJystq1a2dut2/fXmlpaSouLqZVAAAAAFRLRkaGJCk4OLjGry0uLtZ7772noUOHqmXLlvLy8lKHDh304IMPKiUlxWnfmJgYdejQQZL0008/yTAMp5/yli1bpmHDhpnHbdeunaZOnarExMRKzyUtLU2//e1vFR0dLW9vb/n6+ioiIkLDhg3Tq6++WmH/+Ph4TZ8+Xddee61atGghb29vdejQQVOnTtXBgwf5YFTBVp8HnzNnjsaMGaOuXbsqKSmp1seZO3eu5s6da27n5OTQcgAanJM5Bfp0T5pyCsq+yOsWFqBbuoRwcwAAqCeRkZGSpO+//15r167VsGHDqvW67OxsjR49Whs2bJC/v7/69u2r4OBg7d27V/Pnz9c///lPffnll+rdu7ck6aabblJOTo5WrFghPz8/jRs3rtLjOhwOxcTEKC4uTjabTYMHD1ZISIh27typ2NhYffjhh1qxYoWGDx9uviY9PV39+vXT8ePHFRkZqeHDh8vb21vHjx/Xrl27tGPHDj3xxBNO73P33XfLy8tL3bp106233qri4mJ9//33io2N1UcffaTVq1dr0KBBfEDKMRy1WGE4MzNTUVFROn36tGw2mxwOh8LCwrR582ZFRUWZ+918881KTk6WYRgqLi42G3Tbtm3asGGDFi5cqC+++EKStH//ft12221KTU295PuHh4dXaz8AuFLyi0p05xubdSiz4hd0/3NXD90zIJKbBACosZKSEiUmJio6OrpCj8oL//9JPn2+8YbXlr7y9nC/rGPk5OSoS5cuOnbsmAzD0JAhQzRs2DD16dNH/fv3r7IyfO+99+qDDz7QqFGjtHDhQoWElH1h/f/+3//To48+qmuuuUYHDhww731SUpI6dOigdu3aVVngmz9/vh588EEFBQXpyy+/VK9evcxgPGfOHM2ZM0fNmzdXYmKieW4vvviiXnjhBT3wwAOaP3++U0W5qKhIGzdurBDsP/zwQ40aNUp+fn5O4fudd97Rww8/rO7du2vv3r2VVqcb82f+crNhrQKwJA0dOlQxMTGKiYnR8uXL9fLLL2v79u1V7p+UlKRevXrp7Nmz5jcunTp10saNG9WlSxfNmjVL3t7elZb3CcAAGrq31h/WK/+uvLtRgLdN658Yqlb+XtwoAECdhoHEjGzd9trGRnt9qx8drOjWzS77OAcPHtTkyZP1zTffVHiuV69emjlzpmbMmGHewwMHDqh79+4KCwvTDz/8oGbNKp7DyJEj9dlnn+mTTz7RqFGjqh2Ao6KidOTIEb3++ut65JFHnJ5zOBzq1auX9uzZoz//+c965plnJEkPP/yw3n77bcXHx+uuu+667PsxaNAgbd26Vfv27VO3bt0IwBa1ngV6wYIFWrBggaKjo/Xyyy8rNjZWkjR9+nQlJCRc8vXNmjXTe++9p1/+8peKiopSamqqnnvuOf7KAWh0jp/N05vryiYBbOZtU2RLX3M7K79Yr65mLA4AAPWlc+fO+s9//qNvvvlGzz//vG6//Xazurpr1y49+OCDGj58uAoLCyVJn332mRwOh0aMGFFp+JVKC36S9PXXX1f7PFJTU3XkyBFJ0uTJkys8bxiGpkyZIklav369+fj1118vSZo9e7bi4+OrPeTz8OHDevPNN/W73/1O06ZNMwuUF8ZFMxa4ItvlfMi2bt1a4fH33nuv0v3bt29vVn8vGD16tEaPHk0rAGjU/uezA8orKjG3F025Xn3btdDMf2zXv/eV/g9o2bYUTbw+UteFN+eGAQBQT66//nozTDocDn333Xd65ZVXtGzZMq1Zs0bz5s3T73//e/3444+SpIULF2rhwoUXPeaJEyeq/f7Hjh2TJLVq1UoBAQGV7nNhedgL+0rSfffdpy+//FJLlizR2LFj5e7urm7duummm27SuHHjdOuttzodo6SkRLNmzdKCBQt0sQ69WVlZfCjqKgA
DAKStR05p1Z40c3tMn7bq266FJOnZkd204eAJFRTb5XBILyTs04pfD5Kbm8GNAwDUiciWvlr96OBGff71xTAM9enTR0
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "dsf6FnlgjiOL"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Probability density function of the normal distribution\n",
|
|
|
|
|
"![gestosc.svg](data:image/svg+xml;base64,PHN2ZyB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgd2lkdGg9IjM2LjMyOGV4IiBoZWlnaHQ9IjcuNTA5ZXgiIHN0eWxlPSJ2ZXJ0aWNhbC1hbGlnbjogLTMuMTcxZXg7IiB2aWV3Qm94PSIwIC0xODY3LjcgMTU2NDEuMiAzMjMzLjIiIHJvbGU9ImltZyIgZm9jdXNhYmxlPSJmYWxzZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiBhcmlhLWxhYmVsbGVkYnk9Ik1hdGhKYXgtU1ZHLTEtVGl0bGUiPgo8dGl0bGUgaWQ9Ik1hdGhKYXgtU1ZHLTEtVGl0bGUiPntcZGlzcGxheXN0eWxlIGZfe1xtdSAsXHNpZ21hIH0oeCk9e1xmcmFjIHsxfXtcc2lnbWEge1xzcXJ0IHsyXHBpIH19fX1cLFxleHAgXGxlZnQoe1xmcmFjIHstKHgtXG11ICleezJ9fXsyXHNpZ21hIF57Mn19fVxyaWdodCkufTwvdGl0bGU+CjxkZWZzIGFyaWEtaGlkZGVuPSJ0cnVlIj4KPHBhdGggc3Ryb2tlLXdpZHRoPSIxIiBpZD0iRTEtTUpNQVRISS02NiIgZD0iTTExOCAtMTYyUTEyMCAtMTYyIDEyNCAtMTY0VDEzNSAtMTY3VDE0NyAtMTY4UTE2MCAtMTY4IDE3MSAtMTU1VDE4NyAtMTI2UTE5NyAtOTkgMjIxIDI3VDI2NyAyNjdUMjg5IDM4MlYzODVIMjQyUTE5NSAzODUgMTkyIDM4N1ExODggMzkwIDE4OCAzOTdMMTk1IDQyNVExOTcgNDMwIDIwMyA0MzBUMjUwIDQzMVEyOTggNDMxIDI5OCA0MzJRMjk4IDQzNCAzMDcgNDgyVDMxOSA1NDBRMzU2IDcwNSA0NjUgNzA1UTUwMiA3MDMgNTI2IDY4M1Q1NTAgNjMwUTU1MCA1OTQgNTI5IDU3OFQ0ODcgNTYxUTQ0MyA1NjEgNDQzIDYwM1E0NDMgNjIyIDQ1NCA2MzZUNDc4IDY1N0w0ODcgNjYyUTQ3MSA2NjggNDU3IDY2OFE0NDUgNjY4IDQzNCA2NThUNDE5IDYzMFE0MTIgNjAxIDQwMyA1NTJUMzg3IDQ2OVQzODAgNDMzUTM4MCA0MzEgNDM1IDQzMVE0ODAgNDMxIDQ4NyA0MzBUNDk4IDQyNFE0OTkgNDIwIDQ5NiA0MDdUNDkxIDM5MVE0ODkgMzg2IDQ4MiAzODZUNDI4IDM4NUgzNzJMMzQ5IDI2M1EzMDEgMTUgMjgyIC00N1EyNTUgLTEzMiAyMTIgLTE3M1ExNzUgLTIwNSAxMzkgLTIwNVExMDcgLTIwNSA4MSAtMTg2VDU1IC0xMzJRNTUgLTk1IDc2IC03OFQxMTggLTYxUTE2MiAtNjEgMTYyIC0xMDNRMTYyIC0xMjIgMTUxIC0xMzZUMTI3IC0xNTdMMTE4IC0xNjJaIj48L3BhdGg+CjxwYXRoIHN0cm9rZS13aWR0aD0iMSIgaWQ9IkUxLU1KTUFUSEktM0JDIiBkPSJNNTggLTIxNlE0NCAtMjE2IDM0IC0yMDhUMjMgLTE4NlEyMyAtMTc2IDk2IDExNlQxNzMgNDE0UTE4NiA0NDIgMjE5IDQ0MlEyMzEgNDQxIDIzOSA0MzVUMjQ5IDQyM1QyNTEgNDEzUTI1MSA0MDEgMjIwIDI3OVQxODcgMTQyUTE4NSAxMzEgMTg1IDEwN1Y5OVExODUgMjYgMjUyIDI2UTI2MSAyNiAyNzAgMjdUMjg3IDMxVDMwMiAzOFQzMTUgNDVUMzI3IDU1VDMzOCA2NVQzNDggNzdUMzU2IDg4VDM2NSAxMDBMMzcyIDExMEw0MDggMjUzUTQ0NCAzOTUgNDQ4IDQwNFE0Nj
EgNDMxIDQ5MSA0MzFRNTA0IDQzMSA1MTIgNDI0VDUyMyA0MTJUNTI1IDQwMkw0NDkgODRRNDQ4IDc5IDQ0OCA2OFE0NDggNDMgNDU1IDM1VDQ3NiAyNlE0ODUgMjcgNDk2IDM1UTUxNyA1NSA1MzcgMTMxUTU0MyAxNTEgNTQ3IDE1MlE1NDkgMTUzIDU1NyAxNTNINTYxUTU4MCAxNTMgNTgwIDE0NFE1ODAgMTM4IDU3NSAxMTdUNTU1IDYzVDUyMyAxM1E1MTAgMCA0OTEgLThRNDgzIC0xMCA0NjcgLTEwUTQ0NiAtMTAgNDI5IC00VDQwMiAxMVQzODUgMjlUMzc2IDQ0VDM3NCA1MUwzNjggNDVRMzYyIDM5IDM1MCAzMFQzMjQgMTJUMjg4IC00VDI0NiAtMTFRMTk5IC0xMSAxNTMgMTJMMTI5IC04NVExMDggLTE2NyAxMDQgLTE4MFQ5MiAtMjAyUTc2IC0yMTYgNTggLTIxNloiPjwvcGF0aD4KPHBhdGggc3Ryb2tlLXdpZHRoPSIxIiBpZD0iRTEtTUpNQUlOLTJDIiBkPSJNNzggMzVUNzggNjBUOTQgMTAzVDEzNyAxMjFRMTY1IDEyMSAxODcgOTZUMjEwIDhRMjEwIC0yNyAyMDEgLTYwVDE4MCAtMTE3VDE1NCAtMTU4VDEzMCAtMTg1VDExNyAtMTk0UTExMyAtMTk0IDEwNCAtMTg1VDk1IC0xNzJROTUgLTE2OCAxMDYgLTE1NlQxMzEgLTEyNlQxNTcgLTc2VDE3MyAtM1Y5TDE3MiA4UTE3MCA3IDE2NyA2VDE2MSAzVDE1MiAxVDE0MCAwUTExMyAwIDk2IDE3WiI+PC9wYXRoPgo8cGF0aCBzdHJva2Utd2lkdGg9IjEiIGlkPSJFMS1NSk1BVEhJLTNDMyIgZD0iTTE4NCAtMTFRMTE2IC0xMSA3NCAzNFQzMSAxNDdRMzEgMjQ3IDEwNCAzMzNUMjc0IDQzMFEyNzUgNDMxIDQxNCA0MzFINTUyUTU1MyA0MzAgNTU1IDQyOVQ1NTkgNDI3VDU2MiA0MjVUNTY1IDQyMlQ1NjcgNDIwVDU2OSA0MTZUNTcwIDQxMlQ1NzEgNDA3VDU3MiA0MDFRNTcyIDM1NyA1MDcgMzU3UTUwMCAzNTcgNDkwIDM1N1Q0NzYgMzU4SDQxNkw0MjEgMzQ4UTQzOSAzMTAgNDM5IDI2M1E0MzkgMTUzIDM1OSA3MVQxODQgLTExWk0zNjEgMjc4UTM2MSAzNTggMjc2IDM1OFExNTIgMzU4IDExNSAxODRRMTE0IDE4MCAxMTQgMTc4UTEwNiAxNDEgMTA2IDExN1ExMDYgNjcgMTMxIDQ3VDE4OCAyNlEyNDIgMjYgMjg3IDczUTMxNiAxMDMgMzM0IDE1M1QzNTYgMjMzVDM2MSAyNzhaIj48L3BhdGg+CjxwYXRoIHN0cm9rZS13aWR0aD0iMSIgaWQ9IkUxLU1KTUFJTi0yOCIgZD0iTTk0IDI1MFE5NCAzMTkgMTA0IDM4MVQxMjcgNDg4VDE2NCA1NzZUMjAyIDY0M1QyNDQgNjk1VDI3NyA3MjlUMzAyIDc1MEgzMTVIMzE5UTMzMyA3NTAgMzMzIDc0MVEzMzMgNzM4IDMxNiA3MjBUMjc1IDY2N1QyMjYgNTgxVDE4NCA0NDNUMTY3IDI1MFQxODQgNThUMjI1IC04MVQyNzQgLTE2N1QzMTYgLTIyMFQzMzMgLTI0MVEzMzMgLTI1MCAzMTggLTI1MEgzMTVIMzAyTDI3NCAtMjI2UTE4MCAtMTQxIDEzNyAtMTRUOTQgMjUwWiI+PC9wYXRoPgo8cGF0aCBzdHJva2Utd2lkdGg9IjEiIGlkPSJFMS1NSk1BVEhJLTc4IiBkPSJNNTIgMjg5UTU5IDMzMSAxMDYgMzg2VDIyMiA0NDJRMjU3IDQ0MiAyODYgNDI0VDMyOSAzNzlRMz
cxIDQ0MiA0MzAgNDQyUTQ2NyA0NDIgNDk0IDQyMFQ1MjIgMzYxUTUyMiAzMzIgNTA4IDMxNFQ0ODEgMjkyVDQ1OCAyO
|
|
|
|
|
]
|
|
|
|
|
},
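The density formula can be checked by hand with the standard library alone. This is a minimal sketch, assuming a scalar input; `gaussian_pdf` is a hypothetical name used only for this illustration and takes the standard deviation sigma rather than the variance.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at point x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# At x == mu with sigma == 1 the density equals 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0, 0.0, 1.0)
print(peak)
```

Note that the factor 1/2 appears exactly once in the exponent, as the 2 in the denominator 2σ².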
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 3,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "v0oeHebytjNp"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"import numpy as np\n",
|
|
|
|
|
"import pandas as pd\n",
|
|
|
|
|
"import scipy.stats as stats\n",
|
|
|
|
|
"import matplotlib.pyplot as plt\n",
|
|
|
|
|
"import seaborn as sns\n",
|
|
|
|
|
"from sklearn.model_selection import train_test_split\n",
|
|
|
|
|
"sns.set(style=\"whitegrid\")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 4,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "fOYTA3VVtjNw"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"class NaiveBayesClassifier():\n",
|
|
|
|
|
" def calc_prior(self, features, target):\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Compute the a priori (prior) probability of each class\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" self.prior = (features.groupby(target).apply(lambda x: len(x)) / self.rows).to_numpy()\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" return self.prior\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" def calc_statistics(self, features, target):\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Compute the per-class mean and variance of every feature\n",
|
|
|
|
|
" ''' \n",
|
|
|
|
|
" self.mean = features.groupby(target).apply(np.mean).to_numpy()\n",
|
|
|
|
|
" self.var = features.groupby(target).apply(np.var).to_numpy()\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" return self.mean, self.var\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" def gaussian_density(self, class_idx, x): \n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Evaluate the normal (Gaussian) probability density at x \n",
|
|
|
|
|
" (1 / (σ * √(2π))) * exp(-(x - μ)² / (2 * σ²))\n",
|
|
|
|
|
" μ - mean\n",
|
|
|
|
|
" σ² - variance\n",
|
|
|
|
|
" σ - standard deviation\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" mean = self.mean[class_idx]\n",
|
|
|
|
|
" var = self.var[class_idx]\n",
|
2022-05-17 18:19:47 +02:00
|
|
|
|
" \n",
|
|
|
|
|
" numerator = np.exp(-((x - mean) ** 2) / (2 * var)) # Numerator of the normal density: exp(-(x-μ)²/(2σ²)); the factor 1/2 appears only once\n",
|
|
|
|
|
" denominator = np.sqrt(2 * np.pi * var) # Denominator of the normal density: σ√(2π)\n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
" prob = numerator / denominator\n",
|
2022-05-17 18:19:47 +02:00
|
|
|
|
" \n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
" return prob\n",
|
|
|
|
|
" \n",
|
2022-05-17 18:19:47 +02:00
|
|
|
|
" def classify(self, x):\n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
" '''\n",
|
|
|
|
|
" Wyliczenie prawdopodobieństwa a posteriori i zwrócenie klasy, dla której prawdopodobieństwo jest najwyższe\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" posteriors = []\n",
|
|
|
|
|
" posteriors_no_log = []\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" # calculate posterior probability for each class\n",
|
|
|
|
|
" for i in range(self.count):\n",
|
|
|
|
|
" prior = np.log(self.prior[i]) # Do predykcji używane jest prawodopodobieństwo logarytmiczne\n",
|
|
|
|
|
" prior_no_log = self.prior[i] # Zwykłe prawdopodobieństwo liczymy, żeby zwrócić je z predykcjami\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" conditional = np.sum(np.log(self.gaussian_density(i, x))) \n",
|
|
|
|
|
" conditional_no_log = np.prod(self.gaussian_density(i, x))\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" posterior = prior + conditional\n",
|
|
|
|
|
" posterior_no_log = prior_no_log * conditional_no_log\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" posteriors.append(posterior)\n",
|
|
|
|
|
" posteriors_no_log.append(posterior_no_log)\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" # Zwracamy klasę o największym prawdopodobieństwie\n",
|
|
|
|
|
" return self.classes[np.argmax(posteriors)], np.max(posteriors_no_log)\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" def fit(self, features, target):\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Główna metoda trenująca model\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" self.classes = np.unique(target)\n",
|
|
|
|
|
" self.count = len(self.classes)\n",
|
|
|
|
|
" self.feature_nums = features.shape[1]\n",
|
|
|
|
|
" self.rows = features.shape[0]\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" self.calc_statistics(features, target)\n",
|
|
|
|
|
" self.calc_prior(features, target)\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" def predict(self, features):\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Predykcja wartości dla każdego wiersza\n",
|
|
|
|
|
" '''\n",
|
2022-05-17 18:19:47 +02:00
|
|
|
|
" preds = [self.classify(f) for f in features.to_numpy()]\n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
" return preds\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" def accuracy(self, y_test, y_pred):\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Wyliczenie accuracy modelu\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" accuracy = np.sum(y_test == y_pred) / len(y_test)\n",
|
|
|
|
|
" return accuracy\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" def visualize(self, y_true, y_pred, target):\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" Narysowanie wykresu porównującego rozkład klas prawdziwych i przewidzianych\n",
|
|
|
|
|
" '''\n",
|
|
|
|
|
" tr = pd.DataFrame(data=y_true, columns=[target])\n",
|
|
|
|
|
" pr = pd.DataFrame(data=y_pred, columns=[target])\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" \n",
|
|
|
|
|
" fig, ax = plt.subplots(1, 2, sharex='col', sharey='row', figsize=(15,6))\n",
|
|
|
|
|
" \n",
|
2022-05-17 18:19:47 +02:00
|
|
|
|
" sns.countplot(x=target, data=tr, ax=ax[0], alpha=0.7, hue=target, dodge=False)\n",
|
|
|
|
|
" sns.countplot(x=target, data=pr, ax=ax[1], alpha=0.7, hue=target, dodge=False)\n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
" \n",
|
|
|
|
|
" ax[0].tick_params(labelsize=12)\n",
|
|
|
|
|
" ax[1].tick_params(labelsize=12)\n",
|
|
|
|
|
" ax[0].set_title(\"Prawdziwe wartości\", fontsize=18)\n",
|
2022-05-17 18:19:47 +02:00
|
|
|
|
" ax[1].set_title(\"Predykcje\", fontsize=18)\n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
" plt.show()\n"
|
|
|
|
|
]
|
|
|
|
|
},
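The `classify` method multiplies per-feature densities to obtain the score it returns, and that product can become very small when there are many features. As a hedged illustration only, here is a minimal standard-library sketch of the Gaussian log-density and of a log-sum-exp normalization that turns the log scores into posterior probabilities summing to 1. The names `gaussian_log_density` and `normalized_posteriors` and the toy numbers are assumptions made for this example, not part of the notebook.

```python
import math

def gaussian_log_density(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def normalized_posteriors(x, priors, means, variances):
    """Posterior P(class | x) for every class under the naive Bayes assumption."""
    log_scores = []
    for prior, class_means, class_vars in zip(priors, means, variances):
        # Features are treated as independent, so their log-densities add up
        log_likelihood = sum(
            gaussian_log_density(xi, m, v)
            for xi, m, v in zip(x, class_means, class_vars)
        )
        log_scores.append(math.log(prior) + log_likelihood)
    # log-sum-exp: subtract the maximum before exponentiating to avoid underflow
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: two classes, two features (made-up numbers)
priors = [0.6, 0.4]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
posteriors = normalized_posteriors([1.0, 1.0], priors, means, variances)
print(posteriors)
```

The argmax of the normalized posteriors is the same as the argmax of the log scores, so the predicted class does not change. Only the reported value becomes a probability.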
|
2022-05-17 18:54:16 +02:00
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
"### Water potability"
]
|
|
|
|
|
},
|
2022-05-17 17:30:50 +02:00
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 5,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"colab": {
|
|
|
|
|
"base_uri": "https://localhost:8080/",
|
|
|
|
|
"height": 382
|
|
|
|
|
},
|
|
|
|
|
"id": "5-riUAGntjN2",
|
|
|
|
|
"outputId": "f87f047d-bc71-41ef-a43a-17b6f7cf84c3"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"name": "stdout",
|
|
|
|
|
"output_type": "stream",
|
|
|
|
|
"text": [
|
|
|
|
|
"(2948, 9) (2948,)\n",
|
|
|
|
|
"(328, 9) (328,)\n"
|
|
|
|
|
]
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
"# Data preprocessing\n",
"\n",
"# Fill missing values in the feature columns with the mean of the sample's class\n",
"def fill_nan(df):\n",
"    for column in df.columns[:9]:\n",
"        df[column] = df[column].fillna(df.groupby('Potability')[column].transform('mean'))\n",
"    return df\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"water_potability.csv\")\n",
"\n",
"df = fill_nan(df)\n",
"\n",
"# Shuffle the rows of the dataset\n",
"df = df.sample(frac=1, random_state=10).reset_index(drop=True)\n",
"\n",
"# Split into features and the target column\n",
"X, y = df.iloc[:, :-1], df.iloc[:, -1]\n",
"\n",
"# Standardize the features (zero mean, unit variance); the scaler is fitted on the full dataset, before the split\n",
"from sklearn.preprocessing import StandardScaler\n",
"sc = StandardScaler()\n",
"X = sc.fit_transform(X.to_numpy())\n",
"X = pd.DataFrame(X, columns=df.columns.values.tolist()[:-1])\n",
"\n",
"# Split into training and test sets, stratified by class\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, stratify=y, random_state=1)\n",
"\n",
"print(X_train.shape, y_train.shape)\n",
"print(X_test.shape, y_test.shape)"
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 6,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "O82SGzK6tjN5"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/html": [
|
|
|
|
|
"<div>\n",
|
|
|
|
|
"<style scoped>\n",
|
|
|
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
|
|
|
" vertical-align: middle;\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" .dataframe tbody tr th {\n",
|
|
|
|
|
" vertical-align: top;\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" .dataframe thead th {\n",
|
|
|
|
|
" text-align: right;\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
"</style>\n",
|
|
|
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
|
|
|
" <thead>\n",
|
|
|
|
|
" <tr style=\"text-align: right;\">\n",
|
|
|
|
|
" <th></th>\n",
|
|
|
|
|
" <th>ph</th>\n",
|
|
|
|
|
" <th>Hardness</th>\n",
|
|
|
|
|
" <th>Solids</th>\n",
|
|
|
|
|
" <th>Chloramines</th>\n",
|
|
|
|
|
" <th>Sulfate</th>\n",
|
|
|
|
|
" <th>Conductivity</th>\n",
|
|
|
|
|
" <th>Organic_carbon</th>\n",
|
|
|
|
|
" <th>Trihalomethanes</th>\n",
|
|
|
|
|
" <th>Turbidity</th>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" </thead>\n",
|
|
|
|
|
" <tbody>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>1022</th>\n",
|
|
|
|
|
" <td>0.003078</td>\n",
|
|
|
|
|
" <td>0.688791</td>\n",
|
|
|
|
|
" <td>0.846257</td>\n",
|
|
|
|
|
" <td>1.428934</td>\n",
|
|
|
|
|
" <td>-0.858263</td>\n",
|
|
|
|
|
" <td>0.002792</td>\n",
|
|
|
|
|
" <td>0.913790</td>\n",
|
|
|
|
|
" <td>0.232417</td>\n",
|
|
|
|
|
" <td>2.319505</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>3191</th>\n",
|
|
|
|
|
" <td>-0.587365</td>\n",
|
|
|
|
|
" <td>0.223203</td>\n",
|
|
|
|
|
" <td>-0.731867</td>\n",
|
|
|
|
|
" <td>0.397503</td>\n",
|
|
|
|
|
" <td>0.759893</td>\n",
|
|
|
|
|
" <td>0.330607</td>\n",
|
|
|
|
|
" <td>0.094379</td>\n",
|
|
|
|
|
" <td>0.282563</td>\n",
|
|
|
|
|
" <td>0.235024</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>13</th>\n",
|
|
|
|
|
" <td>0.003078</td>\n",
|
|
|
|
|
" <td>-0.241037</td>\n",
|
|
|
|
|
" <td>0.773051</td>\n",
|
|
|
|
|
" <td>0.580019</td>\n",
|
|
|
|
|
" <td>1.334369</td>\n",
|
|
|
|
|
" <td>-0.049130</td>\n",
|
|
|
|
|
" <td>-1.121422</td>\n",
|
|
|
|
|
" <td>-0.200432</td>\n",
|
|
|
|
|
" <td>-0.946356</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>2068</th>\n",
|
|
|
|
|
" <td>-2.176058</td>\n",
|
|
|
|
|
" <td>1.443006</td>\n",
|
|
|
|
|
" <td>-1.626771</td>\n",
|
|
|
|
|
" <td>-4.164610</td>\n",
|
|
|
|
|
" <td>-0.033706</td>\n",
|
|
|
|
|
" <td>-1.050763</td>\n",
|
|
|
|
|
" <td>-0.391328</td>\n",
|
|
|
|
|
" <td>-0.398649</td>\n",
|
|
|
|
|
" <td>-0.298341</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>1484</th>\n",
|
|
|
|
|
" <td>0.213047</td>\n",
|
|
|
|
|
" <td>0.403036</td>\n",
|
|
|
|
|
" <td>-0.464729</td>\n",
|
|
|
|
|
" <td>0.070417</td>\n",
|
|
|
|
|
" <td>0.021560</td>\n",
|
|
|
|
|
" <td>-0.952776</td>\n",
|
|
|
|
|
" <td>-0.213330</td>\n",
|
|
|
|
|
" <td>0.111419</td>\n",
|
|
|
|
|
" <td>-0.235893</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>...</th>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" <td>...</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>691</th>\n",
|
|
|
|
|
" <td>0.003078</td>\n",
|
|
|
|
|
" <td>1.199106</td>\n",
|
|
|
|
|
" <td>-0.003483</td>\n",
|
|
|
|
|
" <td>-0.670308</td>\n",
|
|
|
|
|
" <td>-0.069513</td>\n",
|
|
|
|
|
" <td>0.185754</td>\n",
|
|
|
|
|
" <td>-0.466010</td>\n",
|
|
|
|
|
" <td>0.031975</td>\n",
|
|
|
|
|
" <td>0.676276</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>1283</th>\n",
|
|
|
|
|
" <td>-2.034004</td>\n",
|
|
|
|
|
" <td>-1.508135</td>\n",
|
|
|
|
|
" <td>0.255310</td>\n",
|
|
|
|
|
" <td>0.083839</td>\n",
|
|
|
|
|
" <td>-1.413707</td>\n",
|
|
|
|
|
" <td>0.694074</td>\n",
|
|
|
|
|
" <td>-1.110579</td>\n",
|
|
|
|
|
" <td>0.232996</td>\n",
|
|
|
|
|
" <td>2.544703</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>2818</th>\n",
|
|
|
|
|
" <td>-0.702987</td>\n",
|
|
|
|
|
" <td>-0.575677</td>\n",
|
|
|
|
|
" <td>0.755056</td>\n",
|
|
|
|
|
" <td>0.664695</td>\n",
|
|
|
|
|
" <td>0.021560</td>\n",
|
|
|
|
|
" <td>-0.489334</td>\n",
|
|
|
|
|
" <td>0.371852</td>\n",
|
|
|
|
|
" <td>-2.272990</td>\n",
|
|
|
|
|
" <td>-1.764684</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>1330</th>\n",
|
|
|
|
|
" <td>1.525943</td>\n",
|
|
|
|
|
" <td>0.497074</td>\n",
|
|
|
|
|
" <td>-0.714355</td>\n",
|
|
|
|
|
" <td>-1.024237</td>\n",
|
|
|
|
|
" <td>-1.022037</td>\n",
|
|
|
|
|
" <td>-0.327074</td>\n",
|
|
|
|
|
" <td>-1.107341</td>\n",
|
|
|
|
|
" <td>0.517432</td>\n",
|
|
|
|
|
" <td>-1.230528</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" <tr>\n",
|
|
|
|
|
" <th>1926</th>\n",
|
|
|
|
|
" <td>-0.043558</td>\n",
|
|
|
|
|
" <td>-0.882359</td>\n",
|
|
|
|
|
" <td>-0.456141</td>\n",
|
|
|
|
|
" <td>-0.770271</td>\n",
|
|
|
|
|
" <td>0.795189</td>\n",
|
|
|
|
|
" <td>0.560306</td>\n",
|
|
|
|
|
" <td>-1.086081</td>\n",
|
|
|
|
|
" <td>-1.356820</td>\n",
|
|
|
|
|
" <td>0.172521</td>\n",
|
|
|
|
|
" </tr>\n",
|
|
|
|
|
" </tbody>\n",
|
|
|
|
|
"</table>\n",
|
|
|
|
|
"<p>2948 rows × 9 columns</p>\n",
|
|
|
|
|
"</div>"
|
|
|
|
|
],
|
|
|
|
|
"text/plain": [
|
|
|
|
|
" ph Hardness Solids Chloramines Sulfate Conductivity \\\n",
|
|
|
|
|
"1022 0.003078 0.688791 0.846257 1.428934 -0.858263 0.002792 \n",
|
|
|
|
|
"3191 -0.587365 0.223203 -0.731867 0.397503 0.759893 0.330607 \n",
|
|
|
|
|
"13 0.003078 -0.241037 0.773051 0.580019 1.334369 -0.049130 \n",
|
|
|
|
|
"2068 -2.176058 1.443006 -1.626771 -4.164610 -0.033706 -1.050763 \n",
|
|
|
|
|
"1484 0.213047 0.403036 -0.464729 0.070417 0.021560 -0.952776 \n",
|
|
|
|
|
"... ... ... ... ... ... ... \n",
|
|
|
|
|
"691 0.003078 1.199106 -0.003483 -0.670308 -0.069513 0.185754 \n",
|
|
|
|
|
"1283 -2.034004 -1.508135 0.255310 0.083839 -1.413707 0.694074 \n",
|
|
|
|
|
"2818 -0.702987 -0.575677 0.755056 0.664695 0.021560 -0.489334 \n",
|
|
|
|
|
"1330 1.525943 0.497074 -0.714355 -1.024237 -1.022037 -0.327074 \n",
|
|
|
|
|
"1926 -0.043558 -0.882359 -0.456141 -0.770271 0.795189 0.560306 \n",
|
|
|
|
|
"\n",
|
|
|
|
|
" Organic_carbon Trihalomethanes Turbidity \n",
|
|
|
|
|
"1022 0.913790 0.232417 2.319505 \n",
|
|
|
|
|
"3191 0.094379 0.282563 0.235024 \n",
|
|
|
|
|
"13 -1.121422 -0.200432 -0.946356 \n",
|
|
|
|
|
"2068 -0.391328 -0.398649 -0.298341 \n",
|
|
|
|
|
"1484 -0.213330 0.111419 -0.235893 \n",
|
|
|
|
|
"... ... ... ... \n",
|
|
|
|
|
"691 -0.466010 0.031975 0.676276 \n",
|
|
|
|
|
"1283 -1.110579 0.232996 2.544703 \n",
|
|
|
|
|
"2818 0.371852 -2.272990 -1.764684 \n",
|
|
|
|
|
"1330 -1.107341 0.517432 -1.230528 \n",
|
|
|
|
|
"1926 -1.086081 -1.356820 0.172521 \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"[2948 rows x 9 columns]"
|
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 6,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"X_train"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 7,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "a3jkTMFLtjN6"
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"outputs": [],
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"source": [
"# Train the classifier\n",
"x = NaiveBayesClassifier()\n",
"x.fit(X_train, y_train)"
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 8,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "CoC22aNgtjN9"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"0"
|
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 8,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
"# Predict the class of every test row\n",
"predictions = x.predict(X_test)\n",
"\n",
"# Score returned with each prediction (prior times likelihood, not normalized)\n",
"probabilities = [p[1] for p in predictions]\n",
"\n",
"# Predicted class\n",
"predictions = [p[0] for p in predictions]\n",
"predictions[0]"
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 9,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "JR06zodmtjN9"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"0.6280487804878049"
|
2022-05-17 17:30:50 +02:00
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 9,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
"# Compute the accuracy of the model\n",
"x.accuracy(y_test, predictions)"
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 10,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "1jW0QPootjN_"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"0.14084507042253522"
|
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 10,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"from sklearn.metrics import f1_score\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"f1_score(y_test, predictions)"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 11,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "vEVogTmAtjOA"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"0 0.609756\n",
|
|
|
|
|
"1 0.390244\n",
|
|
|
|
|
"Name: Potability, dtype: float64"
|
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 11,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"y_test.value_counts(normalize=True)"
|
|
|
|
|
]
|
|
|
|
|
},
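The outputs above report an accuracy of about 0.63 and an F1 score of about 0.14 on a test set in which roughly 61% of the rows belong to class 0, so the accuracy is close to what always predicting class 0 would give. The sketch below, written with the standard library only, shows how accuracy and the positive-class F1 score can diverge in this way. The helper names and the label lists are made up for illustration and are not the notebook's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_positive(y_true, y_pred, positive=1):
    """F1 score of the positive class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 60% of the samples belong to class 0
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
majority = [0] * len(y_true)  # baseline that always predicts the majority class

print("model accuracy:   ", accuracy(y_true, y_pred))
print("baseline accuracy:", accuracy(y_true, majority))
print("model F1:         ", f1_positive(y_true, y_pred))
print("baseline F1:      ", f1_positive(y_true, majority))
```

In this toy case the model and the majority baseline reach the same accuracy, while the F1 score shows that the model finds only one of the four positive samples.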
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 12,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "jCVOdBZytjOB"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA4QAAAGQCAYAAAD2lq6fAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAABASUlEQVR4nO3deViVdf7/8RcgBz0Hwtw1NQ0FNBJRFFccLLfC3HJqMtpsMRPNSculsm2G1EknGcPRHJtsstKGEu1ySR21As3GcjRwwVwzN5SEo6z37w+/nJ9HUECWA9zPx3V5DedePvf7nBzevs79ue/bzTAMQwAAAAAA03F3dQEAAAAAANcgEAIAAACASREIAQAAAMCkCIQAAAAAYFIEQgAAAAAwKQIhANM4efKkYmNjlZSU5OpSAAAAqgQCIVAOAgICNGXKlFLvN2XKFAUEBFRARbhadna2xo0bp6SkJAUHB5d6/23btikgIED//ve/K6A6AEB5OXbsmAICAhQbG3tD+5d3b6Z/oKqr5eoCYF7btm3Tww8/7LTMarWqdevWGjJkiB566CF5eHi4qDq40ldffaXk5GRFR0eX25gzZsyQYRj6+9//rjp16pTbuACAy+jrQPVEIITLRUZGKjw8XIZh6NSpU4qPj9ef//xnHThwQG+88Yary6tQb7zxhl577TVXl1HlfPXVV4qPjy+3QPjrr7/qlltu0Ysvvihvb+8bGqNLly7atWuXatXi1yYAXI+Z+3pR6B+o6vibCZdr3769hgwZ4nj94IMPatCgQVq+fLkmTJigBg0aFLlfRkbGDf/jvqrw9PR0dQlVSkX9N23SpInGjRtXpjHc3d3l5eVVThUBQM11I329JvT0a6F/oKrjGkJUOd7e3goJCZFhGDp69KgkqW/fvoqKitJPP/2k0aNHq3Pnzrr33nslXW4ic+fO1ciRIxUWFqagoCD169dPf/nLX3Tx4kXHuNnZ2erQoYNefPFFp+O98sorCggI0Jtvvum0/LnnnlOnTp2Um5vrWLZ//36NHj1aHTt2VNeuXfX888/r7Nmzhd5DVFSUAgICivzTt29fx3ZXX6cQHx+vgIAAp5ue5OTkKCQkRAEBAfrpp58cyzMyMnT77bdrxowZTsf+3//+p2effdbxWQwYMEBxcXFO7+NaoqKinOqTpFWrVikgIMDxeRf46KOPFBAQoB9//FGSlJ+fr7i4OI0aNUo9e/ZUUFCQfve732nGjBk6d+6c075XXt/x5Zdfavjw4erQoYPefPNNRUVFKT4+XpKcPrcrr71ISUlxvMc77rhDd999txYtWqS8vDyn45w4cUJTp05VRESEgoKC1L17dz3wwAOO8QsYhqFPP/1UI0eOVEhIiEJCQjR48GC98847jm24BgQAbszVff16PV2SDh06pMmTJ6tXr14KCgpS3759NXPmTNnt9kJj79ixQw888IA6dOigHj166PXXXy+03U8//aSAgADNnTu3yPqeeuopderUqcjxC1y6dEnPPPOMgoKCtHLlSsfysvQPwzD00Ucfafjw4QoODlZISIiioqK48RkqHWcIUeUYhqHDhw9Lkm6++WbH8l9++UWPPPKIBg4cqP79+zt+cZ88eVIrVqxQ//79FRkZqVq1amn79u167733lJycrMWLF0uSLBaLQkJCtG3bNqfjJSYmyt3d3Wm5YRjavn27QkNDHVM8jh49qlGjRik7O1ujRo1S06ZNtWnTJj3xxBOF3sOYMWN03333OS07evSoYmNjVb9+/Wu+927dukmSkpKSHD//+OOPstvtcnd3V1JSktq3by/pchPMzc11bCdJ//nPfzRu3Djdeuutevzxx+Xr66sffvhB8+bNU3JysubNm3e9j17dunXTvHnzdOTIEbVs2dLp89m3b5/S0tJUr149R43e3t4KCgqSdDm4Ll68WP3799edd96pOnXq6H//+58+++wz/fe//9Vnn30mi8XidLyvvvpKS5
cu1R/+8Ac98MAD8vb2lq+vr/Lz87Vjxw7NmjXLsW2nTp0kXQ68UVFRqlWrlkaNGqUGDRpo06ZN+stf/qKUlBS9/fbbkqTc3Fw99thjOnnypB588EG1atVKGRkZ2rt3r3bs2KFhw4Y5xp48ebISEhIUHBysMWPGyMfHRwcPHtTatWs1YcKE635mAIDrK6qvX6un7969W4888ohuuukm3X///WrcuLFSUlK0dOlS7dy5U0uXLnXMrvnxxx/12GOPyWaz6cknn5SPj4++/PLLQl/8tm/fXrfffrvi4+M1fvx4p+sYT548qa+//lojRoyQ1Wotsv5z587pmWee0b59+7Rw4UL16NHDsa4s/WPy5MlavXq1BgwYoOHDhys7O1sJCQl6/PHHFRsbqzvvvLOUnzRwgwzARZKSkgx/f38jNjbWOHv2rHH27FkjOTnZmD59uuHv72/8/ve/d2wbERFh+Pv7G59++mmhcbKysozs7OxCy+fOnWv4+/sbP/74o2PZu+++a/j7+xs///yzYRiGcfz4ccPf39+YNGmS4e/vb5w+fdowDMNISUkx/P39jcWLFzv2/eMf/2j4+/sbiYmJjmX5+fnG2LFjDX9/f+PFF1+85ns9f/68MWDAAKNr167G4cOHHctffPFFw9/f32nbfv36Gffff7/jdWxsrBEWFmaMHj3aeOKJJxzLY2JijICAAOPs2bOGYRjGpUuXjB49ehgPPvigkZOT4zTmkiVLDH9/fyMpKemaNRqGYezYscPw9/c3PvnkE8eyvn37Oj6f1atXO953WFiY8fTTTzt9FhcvXiw05qeffuq0r2EYxtGjRw1/f3+jffv2xoEDBwrtU9TnUuD+++832rVrZyQnJzsde/z48Ya/v7/x7bffGoZhGMnJyYa/v7+xcOHC677n1atXO/4O5OXlOa278nXB39fPPvvsuuMBgFmVtK9fr6cPHjzYGDBggHHhwgWn5evWrSv0O/j+++83br/9duPgwYOOZVlZWcaIESMMf39/Y968eY7lH3/8seHv72/85z//cRq34N8FV/5b4coedPToUWPAgAFGz549jT179jjtW5b+UfB+Pv74Y6f9cnJyjGHDhhkRERFGfn5+oc8HqAhMGYXLxcbGqnv37urevbuGDBmizz77TH379tX8+fOdtqtbt66GDx9eaH+LxeL4tjA3N1fp6elKS0tzfINXMKVRcj4DV/C/Hh4eio6Olpubm2N5wdnCgu3z8/O1ceNGBQUFOZ2Rc3NzK/IM4ZVycnIUHR2tY8eOaf78+Y4zb9fSrVs37d69W5mZmY5awsLC1KNHD+3YsUM5OTmO5f7+/o4zdt98843OnDmj4cOH67ffflNaWprjT3h4uGOb6+nQoYOsVqvjczh+/LiOHTumyMhI+fv7O5bv3btX586dK/RZ1K5dW5KUl5fnqKFgm127dhU6Xp8+feTn53fdmq509uxZ7dy5U3379lVgYKDTsZ955hlJ0vr16yVJPj4+js+pqGm9BRISEiRJL774otzdnX8lXv0aAFC8kvT1onr63r17tXfvXkVGRio7O9upj3Xu3FlWq9XRx67sB61bt3aMYbFY9OijjxaqKTIyUlarVStWrHAsMwxDn332mfz9/dWhQ4dC+yQnJ+uBBx6QYRhatmyZY4ZOgbL0j5UrV8pms+muu+5yep+//fab+vbtq+PHj+vQoUPXHQMoL0wZhcvdf//9GjhwoNzc3FSnTh21atVKdevWLbRdixYtrnm76n/961/6+OOPdeDAAeXn5zutS09Pd/x8xx13yGazKSkpSQ888ICSkpIUFBSkli1bOgJPZGSkkpKSVLduXbVr107S5cZjt9t12223FTp2mzZtrvv+XnnlFW3btk0zZ85UaGhocR+HunXrpk8++UQ7duxQWFiYfvjhB02bNk3BwcGaOXOmdu3aJT8/P6WkpCgqKsqxX2pqqiRp2rRp1xz7zJkz1z22p6enOnfu7AjEiYmJqlWrlk
JDQxUWFqYtW7ZI+v+B+spAKElffvmllixZouTkZEdwLXDlf4cCrVq1um49Vzt27Jikoj/z2267Te7u7o7rTm+55Ra
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"text/plain": [
|
|
|
|
|
"<Figure size 1080x432 with 2 Axes>"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "display_data"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"x.visualize(y_test, predictions, 'Potability')"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 13,
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"metadata": {
|
|
|
|
|
"id": "aw8Tefprhjnn"
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAaQAAAEUCAYAAABkhkJAAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAA1L0lEQVR4nO3de1hUdf4H8PfMMIAoCl4wSN3SDcNIRVRU1AQp0BBvmO4K/jZ9crXM3M0Sn8xEXZM2ly01zd3StbXLaoqal7Q0rCx+wo9VedRI1isXURl1BGTGme/vD5cJZC5nhhnmDLxfz+MjM+d7vudzzvl+5nPOmctRCCEEiIiI3Ezp7gCIiIgAFiQiIpIJFiQiIpIFFiQiIpIFFiQiIpIFFiQiIpIFFiQiajFSU1OxdetW0+PMzExERUUhOjrajVFRLRYkD5eTk4Phw4c3eP7+xCNqTnJzczFlyhRERkZi4MCBmDJlCk6cOGFXHyUlJdi4cSP27t2L77//3mb7tLQ0ZGZmOhoySeDl7gCIiOxx+/ZtzJo1C0uWLMGoUaOg1+uRm5sLb29vu/opKSlBQEAAOnTo4KJIyV48Q/IQsbGxeP/99zF69GgMGDAACxcuRE1NjbvDImpy586dAwAkJiZCpVLB19cXQ4cOxaOPPorVq1dj/vz5praXL19Gz549cffu3Xp9HD16FNOnT0d5eTkiIiKQlpYGAJg7dy6io6MRGRmJqVOn4ueffwYAfPbZZ9i9ezc++OADREREYNasWQCAK1eu4MUXX8SgQYMQGxuLzZs3N8UmaLZYkDxIbUIcPHgQ586dw3vvvefukIia3MMPPwyVSoUFCxYgOzsbN2/etLuPIUOG4G9/+xuCgoKQn5+PlStXAgCGDx+OL7/8Ej/88AN69eplKm6TJ0/GmDFjMGPGDOTn52P9+vUwGo2YPXs2evbsiSNHjuAf//gH/vGPf+Dbb7916vq2JCxIHmTq1KkIDg5GQEAAZs+ejT179gAAysvL0b9//3r/8vLy3BwtkWu0adMGH3/8MRQKBV5//XUMHjwYs2bNwrVr1xrdd3JyMtq0aQNvb2+8+OKLOHPmDLRardm2J0+eREVFBebMmQNvb2907doVzzzzDPbu3dvoOFoqvofkQYKDg01/h4SEoLy8HAAQFBSEI0eO1GubmprapLERNaUePXqYzmqKiorwyiuvYMWKFXj44Ycd7tNgMCAzMxP79+9HRUUFlMp7x+sajQb+/v4N2hcXF5sOBuv2Ufcx2YcFyYOUlpaa/i4pKUFQUJAboyGShx49emDChAn47LPP0KtXL9y5c8c0zZ6zpt27d+Prr7/Gxo0b0aVLF2i1WgwYMAC1N0RQKBT12gcHB6NLly44cOCAc1aEeMnOk3z88ccoKyvDjRs3sH79eowePdrdIRE1uaKiInz44YcoKysDcO9A7YsvvkCfPn0QFhaGY8eOoaSkBFqtFu+//77kfisrK+Ht7Y3AwEBUV1fjL3/5S73pHTp0wOXLl02Pe/fujdatW2PDhg24c+cODAYDCgsL7f74Of2CBcmDJCYmYvr06YiLi0O3bt0we/Zsd4dE1OTatGmD48ePY9KkSejbty+eeeYZhIaGIi0tDdHR0Rg9ejSSkpIwYcIExMTESO533LhxCAkJwbBhw/D000+jb9++9aYnJyfj7Nmz6N+/P55//nmoVCqsX78eZ86cwciRIzFo0CAsWrQIt2/fdvIatxwK3qDPM8TGxmL58uUYMmSIu0MhInIJniEREZEssCAREZEs8JIdERHJgs0zpIyMDMTGxqJnz54oLCxsipiImhXmEJE0Nr+HNHLkSEybNg1Tp061u3Oj0YjKykqo1eoGn+En8iRCCOj1erRu3dr0hUmpHM0h5g81F1Lzx2ZBasy3jisrK3lESM1KaGio2W/tW+NoDjF/qLmxlT8u/aUGtVptCsLen4Z3hoKCAoSHhzs076trvkXFzTtmp7Vv5w
sAZqe3b+eLt+YMszq/QqmAMNp+6+7vrz1pR8T3WFvu/THa20fd+RqzbR1ZXmM1Nl6dTofCwkLTmG4KlvLH2vYCrI9LZ5AyxuzhzNhcxZnj3R725obUOB0ZQ+ZYei27Pz6p+ePSglR7mcGdR3kFBQUOzfefkkqL025UWp9WUFBgdX6pHIldynJrY7S3j/vnc3TbOrq8xnJGX0156cxS/ljbXpY4c1s6Y2zX5ez97CruiNGR3JASpyNjyB6W4rOVP03yW3bh4eHw8fFpikXVk5eXh8jISIfm7bTvOq5qqs1PC2wFAGandwpshcjISKvzK5UKGCWcITkSu7Xl3h+jvX3Una8x29aR5TVWY+Otqalx24vm/fljbXsB1selM0gZY3b158TYXMWZ490e9uaG1DgdGUPmWHotuz8+qfnD7yFZMG1UGFTKhtXcS6XAtFFhmDYqDD5qVb1pPmoVpo0KM83vpWo4v0qpQEJUtwbz3q/Prx27i6WluM3FaK0Pa+vmbE29PE9nbXs1xbY0twxHqf+bT2Seq/anrTFk7rXrfj5qldnXssbEZ/MMafny5Thw4ACuXbuGZ599FgEBAab78DRnIyK7AgA2ZJ2EtkoPAPD3U2PmuMdN0wBg877TuKapRsfAVpg2Ksw0zdb8YQ93MM2rVAIG4y/L7vPrDlg+e6jT4lYoACHuHbXUjdFWH5bWzdmaenlNzdk5JGV7/T3rOG5VGVyyLc0tf8CjQTh2phzXNNVo43fvfYLbVXqb04Y96tts9rMruCo3pPRb9zWkVu0ZUd3XkrqvZY2Nz6VfjK09TfPES3bu4EnxelKsgPMu2TXlWG7MMj1l/zBO55JrnFLHMi/ZERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLLAgERGRLEgqSOfOncPkyZMRHx+PyZMn4/z58y4Oi6j5YP4QSeMlpdEbb7yB3/72txg7dix27tyJxYsXY/PmzY1a8Dd5l7B532lc01SjjZ8aAHC7So+Oga0wbVQYRkR2tTn/hqyT0FbpTc8pFIAQgL+fGvq7RtzRGYCPLzcqzibnAfH6+6nxZJ82+OCrr3CpvNL0vALAqMG/wrEz5bimqZa8L2vVHRPW5pXaTipn93c/V+SPpzKXt/j4MtReSujvGk05LEsyzs32/t6o0OruPfhvnP5+aswc9zhGRHY1jfGrmmoolQoYjQKdXDDWG8vmGdL169dx6tQpJCYmAgASExNx6tQpVFRUOLzQb/IuYc3W47iqqYYAoK3SQ1ulhwBwVVONNVuP45u8S1bnf+ez/PqDGr8MZG2V/l4xIpfQVumx/QdNvWIEAALA3h8umParlH1Z6/4xYWleqe2kcnZ/93NF/niqb/Iu4a+fNsxbANDfNQKQcTGSOVMxqkNbpcdfP83Hum3/No1xADAa721kZ491Z7BZkEpLS9G5c2eoVCoAgEqlQlBQEEpLSx1e6OZ9p1Gjt1wwavQGbN532ur8dw0cuZ7A1r6sZW5MmJtXajupnN3f/VyRP55q877TMBiZt03JYBTYn3PR4uutM8e6M0i6ZNdYBQUF9R7XVmprrmqqkZeXZ3EaeQ5r+7JuGynzSm1njrnpjemvqdyfP1LJJf5azFv3MNo4CJDTWLdZkIKDg3HlyhUYDAaoVCoYDAaUl5cjODjYZufiv+ffoaGh8Pb2Nj3fPUSDipt3rM7bvp0vwsPDzU6TMj/Jh7V9WcvSPr1/Xqnt7ldQUGB2utT+dDodCgsLTWNaKlfkjxSW1tedmLfuoVAqIKwUJSn52VhS80chJGRYamoqkpOTTW
/Kbtu2DR999JHNILRaLQoLC6VHTSRzoaGh8Pf3t2se5g/RPbbyR1JBKioqQlpaGm7duoW2bdsiIyMD3bt3t7lwo9G
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"text/plain": [
|
|
|
|
|
"<Figure size 432x288 with 6 Axes>"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "display_data"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"ph_val = X_test[\"ph\"]\n",
|
|
|
|
|
"sulfate_val = X_test[\"Sulfate\"]\n",
|
|
|
|
|
"hard_val = X_test[\"Hardness\"]\n",
|
|
|
|
|
"carb_val = X_test[\"Organic_carbon\"]\n",
|
|
|
|
|
"turb_val = X_test[\"Turbidity\"]\n",
|
|
|
|
|
"ch_val = X_test[\"Chloramines\"]\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"figure, axes = plt.subplots(nrows=3, ncols=2)\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"axes[0, 0].plot(ph_val, predictions, 'bo')\n",
|
|
|
|
|
"axes[0, 0].set_title(\"pH\")\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"axes[0, 1].plot(sulfate_val, predictions, 'bo')\n",
|
|
|
|
|
"axes[0, 1].set_title(\"Sulfate\")\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"axes[1, 0].plot(hard_val, predictions, 'bo')\n",
|
|
|
|
|
"axes[1, 0].set_title(\"Hardness\")\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"axes[1, 1].plot(carb_val, predictions, 'bo')\n",
|
|
|
|
|
"axes[1, 1].set_title(\"Organic carbon\")\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"axes[2, 0].plot(turb_val, predictions, 'bo')\n",
|
|
|
|
|
"axes[2, 0].set_title(\"Turbidity\")\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"axes[2, 1].plot(ch_val, predictions, 'bo')\n",
|
|
|
|
|
"axes[2, 1].set_title(\"Chloramines\")\n",
|
|
|
|
|
"\n",
|
2022-05-17 20:01:51 +02:00
|
|
|
|
"figure.tight_layout()\n",
|
2022-05-17 17:30:50 +02:00
|
|
|
|
"plt.show()"
|
|
|
|
|
]
|
2022-05-17 18:37:29 +02:00
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 14,
|
2022-05-17 18:37:29 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"<matplotlib.collections.PathCollection at 0x7fd5199264c0>"
|
2022-05-17 18:37:29 +02:00
|
|
|
|
]
|
|
|
|
|
},
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"execution_count": 14,
|
2022-05-17 18:37:29 +02:00
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
2022-05-18 18:09:00 +02:00
|
|
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD7CAYAAAB+B7/XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAACA5UlEQVR4nO2dd3gUVReH35mt6Qm999577yBNqoIUAaUoAiIfiCCI0kFEadJBmhQFkQ7SeyfU0HtNgIRA2vad74+FhZDdzaZAQpj3eXgesrNz58wme+bec8/5HUGSJAkZGRkZmTSDmNIGyMjIyMgkL7Jjl5GRkUljyI5dRkZGJo0hO3YZGRmZNIbs2GVkZGTSGMqUNsBqtRIdHY1KpUIQhJQ2R0ZGRuadQJIkTCYTXl5eiGLsOXqKO/bo6GiuXLmS0mbIyMjIvJMUKlQIHx+fWK+luGNXqVSAzTi1Wp2itgQFBVGiRIkUtcEVsn1JJ7XbmNrtg9RvY2q3D5LHRqPRyJUrV+w+9FVS3LG/CL+o1Wo0Gk0KW0OqsMEVsn1JJ7XbmNrtg9RvY2q3D5LPRkchbHnzVEZGRiaNITt2GRkZmTRGiodiZGRkZFIjkuUxmC+B6A/KEu9U1p7s2GVkZGReQZIMSM++B/12ENSAFQR/CJiGoCqV0ua5hRyKkZGRkXkF6el3oN8BGEGKAikGrA+QnnyGZAlJafPcQnbsMjIyMs+RLCFg2AUYHBw0IsX8+dZtSgyyY5eRkZF5gSnoefjF4UEwHH2r5iQW2bHLyMjIvED0BVz0HhL935YlSUJ27DIyMjIvUJUHnBQOCZ4Inh3fqjmJRXbsMjIyMs8RBAWC/xTAg9hJgx6grg2aOiliV0KR0x1lZGRkXkHQVIEM65GiF4ApEMR0CJ6fguYDBOHdmAsnu5XTp0+ncOHCsmKjjIzMO4ugzI3oNxIxw0bEdEsQtI3eGacOyezYz58/z+nTp8mePXtyDisjIyMjkwCSzbEbjUZGjRrFiBEjkmtIGRkZGZlEkGyOferUqbRo0YIcOXIk15AyMjIyMolAkCTJRdKme5w6dYopU6awaNEiBEGgXr16zJ49m0KFCsV7rsFgICgoKKkmyMjIyLyXlChRIo62e7JkxRw/fpzr169Tv359AEJCQujevTvjx4+nRo0aiTbubRMYGEj58uVT1AZXyPYlndRuY2q3D1K/jandPkgeG11NipPFsX/55Zd8+eWX9p8TMmOXkZGRkUle3p38HRkZGRkZt3gjBUq7du16E8PKyMjIyLiBPGOXkZGRSWPIjl1GRkYmjSE7dhkZGZk0huzYZWRkZNIYsmOXkZGRSWPIjl1GRkYmjSE7dhkZGZk0huzYZWRkZNIYsmOXkZGRSWPIjl1GRkYmjSE7dhkZGZk0huzYZWRkZNIYsmOXkZFJ9UiSNaVNeKeQHbuMjEyqRJKMWCMnYX1YHulhEayPqmONXig7eTd4I7K9MjIyMklBkiSkJz3AdAow2F60PobIKUjm6wh+Y1LUvtSOPGOXkZFJfRgPg/ksdqduRwe6dUjmOylh1TuD7NhlZGRSHZJ+K0gxzt9g2PPWbHkXkR27jIxMKkSK55ir4zKyY5eRkUl1CNpGIHg6Owqa2m/VnneNZNs87d27N/fu3UMURTw9Pfnxxx8pWrRocg0vIyPzPqGuCsqiYAoidpzdA7SNEZR5Usiwd4Nkc+wTJkzAx8cHgB07djB06FDWrFmTXMPLyMi8RwiCCOkWIkVOAd1fIOlB8AOvbghePVLavFRPsjn2F04dICoqCkEQkmvoVIvFbOHsvgtEhUdTqEJ+MufOmNImycikGQRBi+D7PZLPYMAIqN8Lv5IcCJIkJdsuxA8//MDBgweRJIn58+dTsGDBeM8xGAwEBQUllwlvjWvHb7F86DosJgsAFpOFwtXy0W50C9RaVQpbJyMj875QokQJNBpNrNeS1bG/YO3atWzatIl58+bF+94Xjt
2RcW+bwMBAypcvH+/77l8LpmeZ7zDExM6xVWtVVGlWnh9Xfpui9qUUqd0+SP02pnb7IPXbmNrtg+Sx0ZXvfCNZMa1ateLo0aOEh4e/ieFTnNWTN2I2muO8btSbOLwhkND7YSlglYyMjIyNZImxR0dHExERQdasWQHYtWsXfn5++Pv7J8fwqY7zBy9jMVscHlNrVdw4e4cM2dO/ZatkUiNPQsL5Z9JGDq07jlKl4IMutWn2VUO8fJ2l8snIJJ1kcew6nY5+/fqh0+kQRRE/Pz9mz56dZjc6/DP5OT1mtVjxy+Dj9LjM+8P9a8H0rTIUfZQe0/MV3pKRq9g0dwczjv+MT4B3Clsok1ZJFseeIUMGVq5cmRxDvRO07NOYC4cvo49+XccCfNJ5U6hC/hSwSia1MbXXPKKeRiNZX25jGXVGQu+FsXT0P/Sa9HnKGSeTppErTxNB1RYVqNaqElqvlxsWKo0KDx8PfvpnYJpdqci4T3REDOf2X4zl1F9gMprZvmRvClgl874gy/YmAkEQ+H5JX05sPc2G2dt49jiCMnVL0Lx3IzJkS5fS5smkAgwxBkTR+QPeEGN8i9bIvG/Ijj2RCIJAxcZlqdi4bEqbIpMK8c/kh5efF0b9U4fHC1XM93YNknmvkEMxMjJvAFEU+Xx0OzSecWszNJ5quo7ukAJWybwvyDN2GZk3RNMeDdBH61n000oEwdYVSK1V87/ZX1KqVrE47w/cfoZZg5by+OY0fNN707JPY1p+3RiVWq5klkkYsmOXkXmDfNSvGc2+asS1kzdQqJQUKJsHhUIR532b5m1nVv/F9mrmmIgYFv30F0c2BTJh248Oz5GRcYYcinmHiAyN4vev5/Nxxm60CviM0e0mcfvivZQ2SyYe1BoVxaoWpnCF/A4dtC5Kx6z/LYojUWGIMXL5+HWObAh8W6bGwmq1EvU02mkxnkzqRXbs7whhweFM7biATfN2EBEWSfSzGPavPsLXlYdw9eSNlDZPJgkEbj+LQuV4Rq6P0rN10e63ao/FbGHJyJV8lKErbbP0oKV/F37vOx99TNy6DZnUiezY3xGWjFhJTITeriYJIFkl9FF6pvaOX2xNJvViMphxpcX3+kz+TTOhy++snLiO6KcxmI1mDDFGtszfxXf1R2K1Wt+qLTKJQ3bs7wh7Vx7CanH8pbp++haR4VFv2SKZ5KJkzSKYTY7DHRpPDdVaVnxrtty5dJ+D647HybM3GUzcPn+XwO1n35otMolHduzvCM6++GDLqTcZTEkc30zQgYuc2nUOXZQuSWPJJIwM2dPT4NOaaDzVsV4XFSLe/p580KXOW7PlxH+nkZzMynVReg6uOfrWbJFJPHJWzDtC6TrFObblpMPm7AFZ/AjI7J/osfeuPMSUr+ZitVoRBAGzyUy7wa3oNKyNLI/wlug3+0v8Mvry79RNiAoFFpOZ0rWL8+0fvfD08XhrdogK0envXBBAVMrZOe8CsmN/R+g6pj2ndp3FpI+tA6/xVPPVr58l2gGf2XOeid1mxFl6r5ywDm8/L1p/0zTRNsu4j0KhoPu4TynerAA5M+XCJ503vunevkpo5WblmDf4T4fHNJ4aaretihn9W7ZKJqHIoZh3hAJl8tJjenvylsyFSqNC46EmQ470fLfwa2p+XCXR4y766S+HuiX6GANLR6/CYpFT3d4mKo2S7AWypohTB8iaNzONu9VD+1rFrMZDTYnqRRwWVsmkPuQZ+ztE7tI5mHvmN56EhGM2msmYM0OSQyWuUiUNMUaeBD8lY4600zTEqDdx7fRNvP29yJInU0qbkyr5+vfu5C2ZixXj1/D4Xhh+GXxo1bcp7Qa1lENz7wiyY38HSZclINnG8vD2cKo0aLFY8fTRJtu1UhKr1crin/5m1aQNqNRKzEYz2QtmZfCSvuQvnSelzUtVCIJAs54NadazYUqbIpNI5FDMe06T7vVQaeJqkYiiQMmaRfHy80oBq5Kf+d8vZfWUTZj0JmIidB
j1Jm6eu8OA2j/x6G5orPca9UYCt5/h6KZAOY1U5p1EnrG/h1jMFm6euwMCtP2uJYfWnyDk5kP7zF2tVaH10tJ/bs8
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"# Project the test features onto two principal components for plotting\n",
"pca = PCA(n_components=2)\n",
"pca.fit(X_test)\n",
"X_pca = pca.transform(X_test)\n",
"\n",
"# Color each point by its predicted class\n",
"plt.scatter(X_pca[:, 0], X_pca[:, 1], c=predictions, s=50, cmap='viridis')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7fd51990eaf0>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD7CAYAAAB+B7/XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAACOAElEQVR4nOydd3gUVReH3zvbNz0h9N57R6QLKKACgopg+VQERVFUFBtWxIYoAjYEG4oNRERABBGUJtJL6L1DSE+27879/tgQWHY3CRAgwLzP4yPZmblzZsuZO+ee8ztCSinR0NDQ0LhiUC61ARoaGhoaRYvm2DU0NDSuMDTHrqGhoXGFoTl2DQ0NjSsMzbFraGhoXGHoL7UBqqpis9kwGAwIIS61ORoaGhqXBVJKPB4PERERKErgHP2SO3abzcaOHTsutRkaGhoalyU1a9YkKioq4LVL7tgNBgPgN85oNF5SW5KSkqhfv/4ltSE/NPvOn+JuY3G3D4q/jcXdPigaG91uNzt27MjzoadzyR37yfCL0WjEZDJdYmsoFjbkh2bf+VPcbSzu9kHxt7G42wdFZ2OoELa2eKqhoaFxhaE5dg0NDY0rjEseitHQ0NAojkjfCfBuAyUW9PUvq6w9zbFraGhonIaULmTm8+D8E4QRUEHEQtx4hKHhpTavUGihGA0NDY3TkBnPgHMB4AaZA9IO6hFk2n1I37FLbV6h0By7hoaGRi7SdwxcCwFXiI1upP3bi27TuaA5dg0NDY2TeJJywy8hN4Lrv4tqzrmiOXYNDQ2NkyjRQD69h5TYi2XJeaE5dg0NDY2TGJoBYQqHhBVhveuimnOuaI5dQ0NDIxchdIjYsYCFwKRBCxg7gOm6S2LX2aKlO2poaGichjBdCyV+Q9q+BM8aUOIR1rvBdANCXB5z4SK38qOPPqJWrVqaYqOGhsZli9BXQokZgVJiNkr8Nwhz18vGqUMRO/bNmzezfv16ypUrV5TDamhoaGicBUXm2N1uN6+//jqvvfZaUQ2poaGhoXEOFJljHzduHD179qR8+fJFNaSGhoaGxjkgpJT5JG0WjnXr1jF27Fi+/vprhBB06tSJCRMmULNmzQKPdblcJCUlna8JGhoaGlcl9evXD9J2L5KsmFWrVrF79246d+4MwLFjxxgwYABvv/02bdu2PWfjLjZr1qyhWbNml9SG/NDsO3+Ku43F3T4o/jYWd/ugaGzMb1JcJI79oYce4qGHHsr7+2xm7BoaGhoaRcvlk7+joaGhoVEoLkiB0sKFCy/EsBoaGhoahUCbsWtoaGhcYWiOXUNDQ+MKQ3PsGhoaGlcYmmPX0NDQuMLQHLuGhobGFYbm2DU0NDSuMDTHrqGhoXGFoTl2DQ0NjSsMzbFraGhoXGFojl1DQ0PjCkNz7BoaGhpXGJpj19DQ0LjC0By7hoZGsUdK9VKbcFmhOXYNDY1iiZRu1OwxqMebIY/XRk1ug2r7SnPyheCCyPZqaGhonA9SSmTaQPCsA1z+F9UTkD0W6d2NiHnjktpX3NFm7BoaGsUP97/g3UieU8/DAY6ZSO+BS2HVZYPm2DU0NIod0jkPpD38Dq6/L5otlyOaY9fQ0CiGyAK25bddQ3PsGhoaxQ5h7grCGm4rmDpcVHsuN4ps8XTw4MEcOnQIRVGwWq28/PLL1KlTp6iG19DQuJowtgJ9HfAkERhnt4C5G0Jf+RIZdnlQZI591KhRREVFAbBgwQKGDx/OjBkzimp4DQ2NqwghFIj/Cpk9Fhw/gnSCiIGIBxARAy+1ecWeInPsJ506QE5ODkKIohq62OLz+ti4eAs56TZqNq9GqUqJl9okDY0rBiHMiOjnkVHPAW7AeFX4laJASCmLbBXixRdfZNmyZUgp+fzzz6lRo0aBx7hcLpKSkorKhIvGrlX7+H74THweHwA+j49aravSd2RPjGbDJbZOQ0PjaqF+/fqYTKaA14rUsZ/k119/Zc6cOUyaNKnAfU
869lDGXWzWrFlDs2bNCtzv8K6jDGr8DC57YI6t0Wzg2u7NeHnq05fUvktFcbcPir+Nxd0+KP42Fnf7oGhszM93XpCsmF69evHff/+Rnp5+IYa/5Ez/YDZetzfodbfTw7+z1pByOPUSWKWhoaHhp0hi7DabjaysLMqUKQPAwoULiYmJITY2tiiGL3ZsXrYdn9cXcpvRbGDPxgOUKJdwka3SKI6kHUvn5zGzWT5zFXqDjhvu7UD3h7sQER0ulU9D4/wpEsfucDh44okncDgcKIpCTEwMEyZMuGIXOmJLxoTdpvpUYkpEhd2ucfVweNdRhlw7HGeOE0/uE943I6YxZ+ICPl71DlFxkZfYQo0rlSJx7CVKlGDq1KlFMdRlwS2PdmPLv9tx2s7UsYCo+EhqNq92CazSKG6Me2QSORk2pHpqGcvtcJNyKJUpI3/mkTH3XzrjNK5otMrTc6BVz+a07nUN5ohTCxYGkwFLlIVXfh52xT6paBQeW5adTUu2Bjj1k3jcXv785p9LYJXG1YIm23sOCCF4/pshrJ63nlkT5pN5IovGHevTY3BXSpSNv9TmaRQDXHYXihL+Bu+yuy+iNRpXG5pjP0eEELTo1oQW3ZpcalM0iiGxJWOIiInA7cwIub1mi6oX1yCNqwotFKOhcQFQFIX7R/bFZA2uzTBZjfQfeeclsErjakGbsWtoXCBuGng9TpuTr1+ZihD+rkBGs5EnJzxEw/Z1g/Zf8+cGPn12Cif2jic6IZJbHu3GLY91w2DUKpk1zg7NsWtoXEBufaI73R/uyq61e9AZ9FRvUhmdThe035xJf/Lp0Ml51cz2LDtfv/IjK+asYdT8l0Meo6ERDi0UcxmRnZLDh499zm2JD9Ar7j5G9h3D/q2HLrVZGgVgNBmo26oWtZpXC+mgHTkOPn3y6yCJCpfdzfZVu1kxa83FMjUAVVXJybCFLcbTKL5ojv0yIfVoOuPu+pI5kxaQlZqNLdPOkukreKzlC+xcu+dSm6dxHqz5cyM6Q+gZuTPHybyvF11Ue3xeH9+MmMqtJfrTp/RAbom9lw+HfI7THly3oVE80Rz7ZcI3r03FnuXMU5MEkKrEmeNk3OCCxdY0ii8el5f8tPjOnMlfaEbd+yFTR8/ElmHH6/bisruZ+/lCnuk8AlVVL6otGueG5tgvE/6ZuhzVF/pHtXv9PrLTcy6yRRpFRYN2tfF6Qoc7TFYTrW9pcdFsObDtMMtmrgrKs/e4POzffJA1f268aLZonDuaY79MCPfDB39OvcflOa/xpfQg3auRrn+Rqu28xtI4O0qUS+D6u9thshoDXld0CpGxVm6497qLZsvqP9Yjw8zKHTlOls3476LZonHuaI79MqHRdfUgTCFjXOkY4krFnvPYquN3ZHIrZPpDyIzHkMmtUHM+yjc8oFG0PDHhIXo/fhNGiwFzpBmDSU/Tzg346L+3sUZZLpodik4JK4khBCh6LTvnckBLd7xM6P9GP9Yt3IjHGagDb7Iaefi9+85Zn0a6/oPM5wFn4IacSUgRhYi47xwt1jgbdDodA966m3rdq1OhZEWi4iOJjr/4KqEtuzdl0nPfhtxmspro0KcV3jO/KxrFDm3GfplQvXEVBn7UjyoNKmIwGTBZjJQon8AzXz1Gu9uuPedxZc5Ygpw6AA7I+RgptVS3i4nBpKdc9TKXxKkDlKlSim4PdMJ8RsWsyWKkfpvaIQurNIof2oz9MqJSo/JM3PA+acfS8bq9JFYocf5Kkp7N4bdJJ6gnQFf6/M5RjBDChfRsARGN0Je/1OYUSx77cABVGlTkh7dncOJQKjEloug15Cb6PnuLplx6maA59suQ+NJxRTeYEgFquEdrH4iIojvXJURKFZkzlgalv0Km6UF6kfrKiJh3EYY6l9q8YoUQgu6DutB9UJdLbYrGOaKFYq52LLcDxhAbFDA2RyhXRjcomT0abJPRKS6QNsAF3u3ItLuRvqOB+0oX0rUM6VyEVDMvjcEaGu
eBNmO/CvF5fezddAAEVK73IIprIXgPcirWbgJhRUS/cSnNLDKkmg32KUCIQh/pQtq+RkS/AIBq/wWyR5KXgiTdSOu
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Same projection, colored by the true labels for comparison\n",
"plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_test, s=50, cmap='viridis')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iris dataset"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(105, 4) (105,)\n",
"(45, 4) (45,)\n",
"0.9333333333333333\n"
]
}
],
"source": [
"# Data preprocessing\n",
"\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"iris.csv\")\n",
"\n",
"# Shuffle the rows of the dataset\n",
"df = df.sample(frac=1, random_state=10).reset_index(drop=True)\n",
"\n",
"# Split into features (X) and target labels (y)\n",
"X, y = df.iloc[:, :-1], df.iloc[:, -1]\n",
"\n",
"# Standardize the features (zero mean, unit variance)\n",
"from sklearn.preprocessing import StandardScaler\n",
"sc = StandardScaler()\n",
"X = sc.fit_transform(X.to_numpy())\n",
"X = pd.DataFrame(X, columns=df.columns.values.tolist()[:-1])\n",
"\n",
"# Split into training and test sets, stratified so class proportions are preserved\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)\n",
"\n",
"print(X_train.shape, y_train.shape)\n",
"print(X_test.shape, y_test.shape)\n",
"\n",
"\n",
"# Train the classifier\n",
"x = NaiveBayesClassifier()\n",
"x.fit(X_train, y_train)\n",
"\n",
"\n",
"# Predict labels for the test data\n",
"predictions = x.predict(X_test)\n",
"\n",
"# Probability of each prediction\n",
"probabilities = [p[1] for p in predictions]\n",
"\n",
"# Predicted label\n",
"predictions = [p[0] for p in predictions]\n",
"\n",
"\n",
"# Compute the model's accuracy\n",
"print(x.accuracy(y_test, predictions))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9326599326599326"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f1_score(y_test, predictions, average=\"macro\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7fd519886fd0>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD7CAYAAAB+B7/XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAs20lEQVR4nO3deXhU1f3H8feZO2sWyEYgyr4HQ0VQUbEooGIFFLSK8hO3at3XVuuKRXHBWqytKCqKraC17gIKorixqBgVjSCLsqiEPZBlJrPd8/sjgCyTEMjM3MzN9/U8PsLcybmfG5JvTs499xyltdYIIYSwDYfVAYQQQsSXFHYhhLAZKexCCGEzUtiFEMJmpLALIYTNOK0OYJomVVVVuFwulFJWxxFCiJSgtSYcDpOeno7DsWcf3fLCXlVVxfLly62OIYQQKalr165kZmbu8Zrlhd3lcgE14dxut8VpDkxJSQlFRUVWx4gbu10PyDWlArtdDyTnmkKhEMuXL99VQ3dneWHfOfzidrvxeDwWpzlwqZi5Lna7HpBrSgV2ux5I3jXFGsKWm6dCCGEzUtiFEMJmLB+KEUKIg1G6agM/Ly+lRetc2h/Wxuo4jYoUdiFESinfUsG950xgycJluDwuIuEIh3Yu4O5X/8whnVpZHa9RkKEYIUTK0Frzl1PupWTeUkLVYaq2+wn6Q6wqWcsNx99JMBC0OmKjIIVdCJEyvpv/PT8vX0ckHN3jdW1qAlVBPnxpgUXJGhcp7EKIlPH95yuJhCMxj1VXVrP4w++SnKhxksIuhEgZGVnpOF2xbw0aTgdZ+c2SnKhxksIuhEgZ/UYcjWnG3vTNcDk55cIBSU7UOElhF0KkjMzsDK5/4jI8PjcOx69PXHrTPJx14xCZ9riDTHcUQqSUUy44kQ5FbXn579NZ9e0aWrXP58wbhnDEwJ5WR2s0pLALIVJOl94duX3a9VbHaLRkKEYIIWxGCrsQQtiMFHYhhLAZKexCCGEzUtiFEMJmpLALIYTNSGEXQgibkcIuhBA2I4VdCCFsRgq7EELYjBR2IYSwGSnsQghhM3FZBKysrIxbbrmFtWvX4na7adeuHffccw85OTnxaD4mbVZB6BPQfnAdiXK2Tdi5hBAilcSlx66U4tJLL2X27NlMnz6dNm3a8PDDD8ej6ZhM/6vojceit9+GLh+L3jwEs+xatA4l7JxCCJEq4lLYs7Ky6Nu3766/9+rVi3Xr1sWj6X3o0CIoHwtUg64CHQCCEPwIXX5vQs4phBCpJO5j7KZp8uKLLzJw4MB4Nw2ArpwEVMc4Ug2BN9BmZULOK4QQqUJprWNvIHiQxo4dy4YNG3jsscdwOPb/cyMYDFJSUlLv9otaXYHL2B7zWNT0sWLzGALh9vVuTwghUllRUREej2eP1+K6g9L48eNZs2YNkyZNqldR31+4WMzNrSASu7AbjiiFhx2PMloe0LkPVnFxMX369EnKuZLBbtcDck2pwG7XA8m5pro6xXEbipkwYQIlJSVMnDgRt9sdr2b3odIuAnwxjjjA1TNpRV0IIRqruPTYV6xYwZNPPkn79u0599xzAWjdujUTJ06MR/N78g2H4FwIza+Z6ljzIjjSUVmJm4kjhBCpIi6FvUuXLixbtiweTe2XUgZkPQah+ejAq2BWgucElG84ypGRlAxCCNGYxXWMPVmUUuA5HuU53uooQgjR6MiSAkIIYTNS2IUQwmaksAshhM2k5Bj7wdLhb9DVc0BHUd4BNYuHKWV1LCGEiKsmUdi1jqC3XQ/BeUAQMNGBF8DZE3Imo9T+H4wSQohU0SSGYnTVFAh+AgQAc8eLfgh/ja6YYGU0IYSIuyZR2PE/R+yFw4IQeAmto0kOJIQQidM0Cru5pfZjOlyz/K8QQthE0yjsjha1H1NuUOnJyyKEEAnWNAp7+qXEXjjMC77/q1mmQAghbKJJFHaVNhq8pwJewKDmsn3gORaVeZ214YQQIs6axH
RHpRyorPHoyGVQ/T5ggqc/ynWY1dGEECLumkRh30k5O0NGZ6tjCCFEQjWJoRghhGhKpLALIYTNSGEXQgibkcIuhBA2I4VdCCFsRgq7EELYjBR2IYSwGSnsQghhM1LYhRDCZqSwCyGEzUhhF0IIm5HCLoQQNiOFXQghbEYKuxBC2IwUdiGEsBkp7EIIYTNS2IUQwmaksAshhM1IYRdCCJuRwi6EEDYjhV0IIWxGCrsQQtiMFHYhhLCZuBX28ePHM3DgQLp168by5cvj1awQQogDFLfCPmjQIKZNm8ahhx4aryaFEEIcBGe8GjryyCPj1ZQQwgI/ryjlhfteZdGsr3G5nZw0uj+//9MwmuVkWh1NHKC4FXYhROpa+dUqbjphDMFACDNqAvDKhOm8P+0Tnih+iGa5B1bcV369ipcfns7Kr1eR3zaPM68fwlGDeyUguYhFaa11PBscOHAgkyZNomvXrvV6fzAYpKSkJJ4R6kGT5voRj3Md4WgOlaFC5D6yaMr+dcFz/LJ0/T6vGy6D487pzZAbBtW7ra9mfcdr971DJBRFmzXlxe1zcfSIXgy9sf7tiPopKirC4/Hs8Vqj6bHHCpcIOroeXXYpRH4CpQAFKgOV/RTKVXhAbRUXF9OnT5/EBLWA3a4H5Jrqo2zjdjb+uDnmsWg4yrdzljPm+Vvq1Za/IsDd/ScQro7s8XooEGbRG4sZddPv6dK74x7H5N/o4NTVKW5S3VQzsgG95VyI/AAEQPtBV4G5Ab11NNqstDqiEEkXCoRwGLWXglB1uN5tLXzri1rbCgcjzH7ugwPOJw5c3Ar7uHHj6N+/P+vXr+fiiy9myJAh8Wq6wXS0FHPL+bB5AJjrgGiMN4XRgTeTnk0Iq7Vok0tapi/mMaXgN/3r/5ts5bYqopEY31+AGTXZtrH8oDKKAxO3wn7nnXfy8ccfs2TJEubPn8/MmTPj1XSDaNOP3nI2hIuBSB3vDEB4cbJiCdFoOBwOLrl/FJ60fYdC3T43F44dWe+2uvftgnLELivedA9Ot8E/rniSaeNeYePaTQedWdSt0YyxJ4oOvAVmBTF76XtwgtEqGZGEaHROvXggZtTkmdumEaqOoE2TvENzuPHpK+h8RId6t9PtyE50/E07Vnz5I5HQrx0p5VBU+4PMe/1zglVBXB4nL9z/Glc9ejGteuck4pKaNNsXdkIfAoF6vNFA+X6f4DBCNF6nXXoSgy8awM8rSnF7XbRqn49Sqt4fH41GiYQi3P/27dx/3j9Y/NF3uDwuIuGa16ORKMGqIFAz3g4w8fopXPPvC8Fe904tZ//CrtLr8SYvZP4J5Wyb8DhCNGaG06BdYesD+pit68uYdNO/+eS1zzCjJvlt87jkvlHc8OQf+WnZOjb9vIWJ1z9LtHLf35rD1WEW/K+Y084eHK9LEDSBWTHKdyYQ+8ZQDQMyLseRfmGyIglhGxVllVx11K18/MpCIqEIZtRk/aqN/P3SJ/h0RjF9Tj4ch8NR58yalZ+vSmLipsH2hR33ceDpV8cbohB4K2lxhLCT6ZPepWJrJdGIucfrQX+QybdOI1QdomW7Fph7Hd9dxeaqRMdscmxf2JVSkHkndY46RdclLY8QdvLhS/MJBUIxjymH4vvPV9Km8FCoY6g+Et7fxAZxoOw/xg4oIxeNQa3THY28pOYRTcOGykqmffs1X64vpVV6Buf/phe9WhVYHSuu9ndzVSlF89xMvOleqiurY74nu1XzRERr0mzfYwdQygO+YYA7xlEfpF2S7EjC5hat+5lBzz/L019+wYKf1vLGsqWMeu1/PLxgntXR4mrQqN/i9sX6vqrR7ejOGE6D4df+Dk/avu/zpnsYcMmxiYzYJDWJwg6gMu8AZzdQaTtecYDygedEVNooS7MJe4mYJlfMeAt/OEwwWjPMYGpNdSTClK+LWby+1OKE8TPkjyeR1aIZhsvY43VPmpsr/n4hbo8LgIvGjuSYoUfi9rlxe127/j/08lPoM7
SnFdFtrUkMxQAoRzrkvgyhBejqD0C5Ub7foVy/sTqasJmFP60lHI09bhyMRpn6zdccbpMhmfTm6Uxc9CDP3DaN91+
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Project the test features onto two principal components for plotting\n",
"pca = PCA(n_components=2)\n",
"pca.fit(X_test)\n",
"X_pca = pca.transform(X_test)\n",
"\n",
"# Encode the predicted class names as integers so they can serve as scatter colors\n",
"df_pred = pd.DataFrame(predictions).replace({'Virginica': 0, 'Versicolor': 1, \"Setosa\": 2}, regex=True)\n",
"df_pred = np.array(df_pred).reshape(1, -1)\n",
"plt.scatter(X_pca[:, 0], X_pca[:, 1], c=df_pred[0], s=50, cmap='viridis')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAaQAAAEUCAYAAABkhkJAAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAA8ZElEQVR4nO3de1xU9b74/9fMcBHEBBTICs0tKZgX7qSZN6zQEFS8VtI5ba082226NXWfytLsgmn+ilK7bdvkNssLYW6xtqVSR0Vhg8YRMzmGNwQVQbnIZWZ+f/Bl4jIDzDAyC3w/Hw8fD2etz+ez3mvxec971prFQqXX6/UIIYQQNqa2dQBCCCEESEESQgihEFKQhBBCKIIUJCGEEIogBUkIIYQiSEESQgihCFKQhMHo0aM5ePCg0XVLly5l7dq1tzyGttqOELdaU/nUUrNmzSIxMdHouvPnz9OvXz+qq6tN9u/Xrx+5ubmtiqEtSUFSoLS0NKZPn05QUBChoaFMnz6d48eP2zosRWhvCSZsrz3n0yeffMLEiRNb1HbmzJls3br1Fkd0a9nZOgBRX0lJCc899xyvvvoqY8eOpaqqirS0NBwcHGwdmhDtjuRT+yJnSApz5swZACIjI9FoNHTq1Ilhw4bh6+traLNt2zbGjh1LSEgIf/zjH7lw4YJhXb9+/UhISCA8PJywsDDi4uLQ6XQAnD17ltjYWMLCwggLC2PhwoVcv37dojj37dtHdHQ0wcHBTJ8+nZMnTxrWjR49mk8//ZTx48cTFBTE/PnzqaioMKz/+OOPGTZsGMOGDWPr1q2NznquX7/OM888Q0BAAFOmTOHs2bMAPPHEEwBER0cTEBDA7t27LYpd3D6UmE/nzp0jODjYMM5LL73EkCFDDOtfeOEFPvvsM6D+WY9WqyUuLo6wsDDCw8M5cOCAoc/atWtJS0tjxYoVBAQEsGLFCsO6gwcP8sgjjxAcHMzy5ctR8sN5pCApTO/evdFoNCxZsoQDBw5QXFxcb/3evXv58MMPef/99zl06BBBQUEsXLiwXpt//etfbN++ncTERH744Qe2b98OgF6v59lnn+XHH38kOTmZS5cuER8fb3aMJ06c4L//+79ZsWIFqampTJs2jf/6r/+isrLS0CY5OZlPPvmE77//nl9++YUdO3YAkJKSwmeffcbGjRv517/+RWpqaqPxd+/ezdy5czl69Cg9e/Y0fKf0j3/8A4CkpCQyMjIYN26c2bGL24sS88nb2xsXFxdOnDgBwNGjR3F2diYnJ8fwOjQ0tFG/r776in379vH111+zfft29uzZY1i3YMECgoODWbZsGRkZGSxbtsywbv/+/Wzbto2dO3eSnJzMjz/+2MKj1/akICmMi4sLmzdvRqVS8fLLLzNkyBCee+45rly5AsCWLVt45pln6NOnD3Z2djz33HNkZ2fX+1Q3e/ZsXF1dueuuu4iNjWXXrl0A9OrViwcffBAHBwfc3d35z//8T44ePWp2jF9++SXTpk1j8ODBaDQaJk6ciL29PZmZmYY2M2fOxMvLC1dXV0aNGkV2djZQU6gmTZrEfffdh5OTE3/+858bjT9mzBgGDRqEnZ0dUVFRhr5CmEup+RQSEsLRo0e5fPkyAI8++ihHjhzh3LlzlJSU1DuDq5WcnMxTTz1Fjx49cHV15dlnn23RtmbPns0dd9zBXXfdRVhYWL2rGUoj3yEpUJ8+fXjrrbcAyMnJ4YUXXuCNN97gnXfe4eLFi7zxxhvExcUZ2uv1evLz87n77rsB6NGjh2Hd3XffTUFBAQBXrlzh9ddfJy0tjdLSUvR6PXfccYfZ8V28eJGvv/6aTZs2GZZVVVUZtgPg4eFh+L+Tk5NhXUFBAQMGDDCsqxtrre7duxv+36lTJ8rKysyOUYhaSsyn0NBQvv/+e7y8vAgJCSEsLIykpCQcHR0JDg5GrW58rlBQUFAvlrvuuqtF22qYi6WlpS3qZwtSkBSuT5
8+TJo0iS+//BKoSY7nnnuOqKgok33y8vK47777gJri4enpCcA777yDSqXim2++wdXVlb1799a71txStTHMmTPH7L6enp7k5+fXi1WItqKUfAoJCWHVqlXceeedhISEEBQUxCuvvIKjoyMhISFG+3h4eNTLl46YO3LJTmFycnL429/+xqVLl4CaSbdr1y4GDx4MwPTp0/noo4/49ddfAbhx4wbJycn1xvj0008pLi4mLy+PhIQEw3ctpaWlODs706VLF/Lz8/nkk08sinHKlCls2bKFY8eOodfrKSsrY//+/ZSUlDTbNyIigh07dpCTk0N5eTnr1q0za9vdu3fn3LlzFsUtbj9Kzad7770XR0dHdu7cSWhoKC4uLnTr1o1vv/3WZEEaO3Ysn3/+OZcuXaK4uJiPPvqo3vqOkBtSkBTGxcWFY8eOMWXKFPz9/Zk6dSp9+/Zl6dKlADz88MPMmjWLv/zlLwQGBhIZGUlKSkq9McLDw5k0aRITJkxg5MiRTJ48GYC5c+dy4sQJgoODeeaZZ3jkkUcsinHgwIG89tprrFixgpCQEB555BHDTQvNGTFiBDNnziQ2NpaHH37Y8MbQ0ttw586dy9KlSwkODpa77ESzlJxPoaGhuLq6Gi7DhYaGotfruf/++422nzp1KsOGDSM6OpqJEyc22l5sbKyhoK1cudKsWJRCJX+gr2Pp168f3333Hb169bJ1KC2Sk5NDZGQkP//8M3Z2cgVZKEt7y6f2Ts6QRJv717/+RWVlJcXFxbz99tuMGjVKipEQQgqSaHtbtmxhyJAhPPzww2g0Gl599VVbhySEUAC5ZCeEEEIR5AxJCCGEIsiFexN0Oh2lpaXY29ujUqlsHY5QAL1eT1VVFZ07dzb6i4uiMckjUVdzOdRkQZo1axbh4eHMmDGj3oBjxozhzTffNPq8pZbKz89n0aJFfP755xb1P3/+PDExMUafhWYNpaWlnDp16paMLdq3vn370qVLlxa3lzySPBL1mcqhJgtSTEwMGzdurJdIqampqNVqk7+8VZdOp0OlUhn9ZOTl5WVxErWGVqtFo9E0287e3h6oOXBt/aj6w1l57Nh/msLim3R2qvkRlZZX4961E5NG+vDAgMaP22nYz71rJwb16cbxnKuG13X7bko+wYHMi+h1elRqFQO8O/H8k8MsimHNpjSyc68ZXvv1cmPhk8Gt2u/m9rW1srKy6j3CqCUqKys5deqUYW60lORR/TxqOPdG+N/Fk2P7mxzD3PZgek42NZap+WdquSVzqC3neGtYsm8t0VwONVmQwsPDefXVV8nJyaFPnz4A7Nixg0mTJvHxxx/z3XffodVq8fLy4rXXXsPDw4P4+Hh+/fVXSkpKuHjxIl988QXvvvsuhw8fxsHBAWdnZ7Zs2dLok1lGRgarVq0yPGdp8eLFDBs2jOPHj/P6669TVlaGs7MzL774IoMGDWoUa0pKCu+88w5arRZ3d3dWrFhBr169SE1NZeXKlQwYMIATJ04wf/58Ro0a1eyBq01+BwcHHB0dm21vLfvTz/H+1iwqqrQAFJVqDeuKSkuJ35qFHg0jg7yb6VfK/10sNdo3+8xVdh86W6//jydK6PJNNnMm+5sVw0vrf+LY6av1xjp04gqv/e0oK+cMo6WMxW9qX63F0p+ruZeeJI9+z6P12zIbzb2vfzxLZbWKOZP9G/U3tz1gck4+8+YPFN6orLe8diy/3t2Mzr8TZ4r4Pu280XnZBfPmkC3meGvcyvc9UznU5IVwBwcHxo8fb3jceklJCXv37uWuu+7i3LlzfPXVVyQmJjJ8+HDDwwsBjh8/zurVq9mzZw95eXmkpqaye/dudu7cyYcffthoO0VFRcydO5cXXniBnTt3kpiYyMCBA6msrGTevHnMnz+fb775hueff5558+bV+zMHAFevXmXx4sWsXr2ab775hsjISBYtWmRYf/r0aaZOnUpSUlKLksiWEpKzDRPWmIoqLQnJjZ9+3Vy/un
33pJ41ur52uTkxNEz8WqaWm2Jsm6b2tb2RPPpdc3OvtcvB9NxrWIzqjmVq/u1JPWu1edmR57i1NHtTw+TJk5k1axY
"text/plain": [
"<Figure size 432x288 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sep_len = X_test[\"sepal.length\"]\n",
"sep_width = X_test[\"sepal.width\"]\n",
"pet_len = X_test[\"petal.length\"]\n",
"pet_width = X_test[\"petal.width\"]\n",
"\n",
"# One panel per feature: standardized feature value vs. predicted class\n",
"figure, axes = plt.subplots(nrows=2, ncols=2)\n",
"\n",
"axes[0, 0].plot(sep_len, predictions, 'bo')\n",
"axes[0, 0].set_title(\"Sepal length\")\n",
"\n",
"axes[0, 1].plot(sep_width, predictions, 'bo')\n",
"axes[0, 1].set_title(\"Sepal width\")\n",
"\n",
"axes[1, 0].plot(pet_len, predictions, 'bo')\n",
"axes[1, 0].set_title(\"Petal length\")\n",
"\n",
"axes[1, 1].plot(pet_width, predictions, 'bo')\n",
"axes[1, 1].set_title(\"Petal width\")\n",
"\n",
"figure.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "naive_bayes.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}