|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "statewide-crown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"**12. Projekt**\n",
|
|||
|
"================="
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "explicit-bunch",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **1. Cel projektu**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "encouraging-officer",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Celem projektu jest przewidzenie ze zbioru danych jakie widomości są Fake Newsami, użyte algorytmy:\n",
|
|||
|
"* TfidfVectorizer\n",
|
|||
|
"* PassiveAggressiveClassifier\n",
|
|||
|
"\n",
|
|||
|
"Opis algorytmów.\n",
|
|||
|
"\n",
|
|||
|
"**TF (Term Frequency):** Liczba wystąpień danego słowa w dokumencie to jego częstotliwość występowania. Wyższa wartość oznacza, że dany termin pojawia się częściej niż inne, a zatem dokument jest dobrze dopasowany, jeśli termin ten jest częścią wyszukiwanych słów.\n",
|
|||
|
"\n",
|
|||
|
"Wektorator TfidfVectorizer przekształca zbiór dokumentów w macierz cech TF-IDF.\n",
|
|||
|
"\n",
|
|||
|
"**Algorytmy pasywno-agresywne** to algorytmy uczące się online. Taki algorytm pozostaje pasywny w przypadku poprawnego wyniku klasyfikacji, a staje się agresywny w przypadku błędnego obliczenia, aktualizując i dostosowując się. W przeciwieństwie do większości innych algorytmów nie jest on zbieżny. Jego zadaniem jest dokonywanie aktualizacji korygujących stratę, powodujących bardzo niewielkie zmiany w normie wektora wag.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
},
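{
"cell_type": "markdown",
"id": "sketch-tfidf-weights",
"metadata": {},
"source": [
"The TF-IDF weighting described above can be sketched on a tiny example. This is a minimal illustration, not the project's pipeline: the three-sentence corpus and the names `docs`, `counts`, and `manual_idf` are hypothetical, and the formula assumes scikit-learn's default settings (smoothed IDF, L2-normalized rows).\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"# Hypothetical toy corpus, used only for illustration.\n",
"docs = ['fake news spreads fast', 'real news is checked', 'news travels']\n",
"\n",
"vectorizer = TfidfVectorizer()\n",
"tfidf = vectorizer.fit_transform(docs)\n",
"\n",
"# Recompute the smoothed IDF by hand: ln((1 + n) / (1 + df)) + 1.\n",
"counts = (tfidf.toarray() > 0).sum(axis=0)  # document frequency of each term\n",
"manual_idf = np.log((1 + len(docs)) / (1 + counts)) + 1\n",
"\n",
"# Map each term to its column index so individual weights can be inspected.\n",
"vocab = vectorizer.vocabulary_\n",
"print(vectorizer.idf_[vocab['news']], vectorizer.idf_[vocab['fake']])\n",
"```\n",
"\n",
"Because `news` occurs in every toy document, it receives the smallest IDF weight, while a term such as `fake` that occurs in only one document is weighted more heavily."
]
},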
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "informal-filename",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Dane news.csv wykorzstane do uczenia pochodzą ze strony https://paperswithcode.com/datasets?task=fake-news-detection"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "beginning-minute",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **2. Importowanie potrzebnych bibliotek**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"id": "effective-democracy",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas as pd\n",
|
|||
|
"import itertools\n",
|
|||
|
"from sklearn.model_selection import train_test_split\n",
|
|||
|
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
|
|||
|
"from sklearn.linear_model import PassiveAggressiveClassifier\n",
|
|||
|
"from sklearn.metrics import accuracy_score, confusion_matrix"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "alternative-knock",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **3. Wczytanie danych**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 56,
|
|||
|
"id": "worldwide-blake",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>Unnamed: 0</th>\n",
|
|||
|
" <th>title</th>\n",
|
|||
|
" <th>text</th>\n",
|
|||
|
" <th>label</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>8476</td>\n",
|
|||
|
" <td>You Can Smell Hillary’s Fear</td>\n",
|
|||
|
" <td>Daniel Greenfield, a Shillman Journalism Fello...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>10294</td>\n",
|
|||
|
" <td>Watch The Exact Moment Paul Ryan Committed Pol...</td>\n",
|
|||
|
" <td>Google Pinterest Digg Linkedin Reddit Stumbleu...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>3608</td>\n",
|
|||
|
" <td>Kerry to go to Paris in gesture of sympathy</td>\n",
|
|||
|
" <td>U.S. Secretary of State John F. Kerry said Mon...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>10142</td>\n",
|
|||
|
" <td>Bernie supporters on Twitter erupt in anger ag...</td>\n",
|
|||
|
" <td>— Kaydee King (@KaydeeKing) November 9, 2016 T...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>875</td>\n",
|
|||
|
" <td>The Battle of New York: Why This Primary Matters</td>\n",
|
|||
|
" <td>It's primary day in New York and front-runners...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>5</th>\n",
|
|||
|
" <td>6903</td>\n",
|
|||
|
" <td>Tehran, USA</td>\n",
|
|||
|
" <td>\\nI’m not an immigrant, but my grandparents ...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>6</th>\n",
|
|||
|
" <td>7341</td>\n",
|
|||
|
" <td>Girl Horrified At What She Watches Boyfriend D...</td>\n",
|
|||
|
" <td>Share This Baylee Luciani (left), Screenshot o...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>7</th>\n",
|
|||
|
" <td>95</td>\n",
|
|||
|
" <td>‘Britain’s Schindler’ Dies at 106</td>\n",
|
|||
|
" <td>A Czech stockbroker who saved more than 650 Je...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>8</th>\n",
|
|||
|
" <td>4869</td>\n",
|
|||
|
" <td>Fact check: Trump and Clinton at the 'commande...</td>\n",
|
|||
|
" <td>Hillary Clinton and Donald Trump made some ina...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9</th>\n",
|
|||
|
" <td>2909</td>\n",
|
|||
|
" <td>Iran reportedly makes new push for uranium con...</td>\n",
|
|||
|
" <td>Iranian negotiators reportedly have made a las...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10</th>\n",
|
|||
|
" <td>1357</td>\n",
|
|||
|
" <td>With all three Clintons in Iowa, a glimpse at ...</td>\n",
|
|||
|
" <td>CEDAR RAPIDS, Iowa — “I had one of the most wo...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>11</th>\n",
|
|||
|
" <td>988</td>\n",
|
|||
|
" <td>Donald Trump’s Shockingly Weak Delegate Game S...</td>\n",
|
|||
|
" <td>Donald Trump’s organizational problems have go...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>12</th>\n",
|
|||
|
" <td>7041</td>\n",
|
|||
|
" <td>Strong Solar Storm, Tech Risks Today | S0 News...</td>\n",
|
|||
|
" <td>Click Here To Learn More About Alexandra's Per...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>13</th>\n",
|
|||
|
" <td>7623</td>\n",
|
|||
|
" <td>10 Ways America Is Preparing for World War 3</td>\n",
|
|||
|
" <td>October 31, 2016 at 4:52 am \\nPretty factual e...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>14</th>\n",
|
|||
|
" <td>1571</td>\n",
|
|||
|
" <td>Trump takes on Cruz, but lightly</td>\n",
|
|||
|
" <td>Killing Obama administration rules, dismantlin...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>15</th>\n",
|
|||
|
" <td>4739</td>\n",
|
|||
|
" <td>How women lead differently</td>\n",
|
|||
|
" <td>As more women move into high offices, they oft...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>16</th>\n",
|
|||
|
" <td>7737</td>\n",
|
|||
|
" <td>Shocking! Michele Obama & Hillary Caught Glamo...</td>\n",
|
|||
|
" <td>Shocking! Michele Obama & Hillary Caught Glamo...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>17</th>\n",
|
|||
|
" <td>8716</td>\n",
|
|||
|
" <td>Hillary Clinton in HUGE Trouble After America ...</td>\n",
|
|||
|
" <td>0 \\nHillary Clinton has barely just lost the p...</td>\n",
|
|||
|
" <td>FAKE</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>18</th>\n",
|
|||
|
" <td>3304</td>\n",
|
|||
|
" <td>What's in that Iran bill that Obama doesn't like?</td>\n",
|
|||
|
" <td>Washington (CNN) For months, the White House a...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>19</th>\n",
|
|||
|
" <td>3078</td>\n",
|
|||
|
" <td>The 1 chart that explains everything you need ...</td>\n",
|
|||
|
" <td>While paging through Pew's best data visualiza...</td>\n",
|
|||
|
" <td>REAL</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" Unnamed: 0 title \\\n",
|
|||
|
"0 8476 You Can Smell Hillary’s Fear \n",
|
|||
|
"1 10294 Watch The Exact Moment Paul Ryan Committed Pol... \n",
|
|||
|
"2 3608 Kerry to go to Paris in gesture of sympathy \n",
|
|||
|
"3 10142 Bernie supporters on Twitter erupt in anger ag... \n",
|
|||
|
"4 875 The Battle of New York: Why This Primary Matters \n",
|
|||
|
"5 6903 Tehran, USA \n",
|
|||
|
"6 7341 Girl Horrified At What She Watches Boyfriend D... \n",
|
|||
|
"7 95 ‘Britain’s Schindler’ Dies at 106 \n",
|
|||
|
"8 4869 Fact check: Trump and Clinton at the 'commande... \n",
|
|||
|
"9 2909 Iran reportedly makes new push for uranium con... \n",
|
|||
|
"10 1357 With all three Clintons in Iowa, a glimpse at ... \n",
|
|||
|
"11 988 Donald Trump’s Shockingly Weak Delegate Game S... \n",
|
|||
|
"12 7041 Strong Solar Storm, Tech Risks Today | S0 News... \n",
|
|||
|
"13 7623 10 Ways America Is Preparing for World War 3 \n",
|
|||
|
"14 1571 Trump takes on Cruz, but lightly \n",
|
|||
|
"15 4739 How women lead differently \n",
|
|||
|
"16 7737 Shocking! Michele Obama & Hillary Caught Glamo... \n",
|
|||
|
"17 8716 Hillary Clinton in HUGE Trouble After America ... \n",
|
|||
|
"18 3304 What's in that Iran bill that Obama doesn't like? \n",
|
|||
|
"19 3078 The 1 chart that explains everything you need ... \n",
|
|||
|
"\n",
|
|||
|
" text label \n",
|
|||
|
"0 Daniel Greenfield, a Shillman Journalism Fello... FAKE \n",
|
|||
|
"1 Google Pinterest Digg Linkedin Reddit Stumbleu... FAKE \n",
|
|||
|
"2 U.S. Secretary of State John F. Kerry said Mon... REAL \n",
|
|||
|
"3 — Kaydee King (@KaydeeKing) November 9, 2016 T... FAKE \n",
|
|||
|
"4 It's primary day in New York and front-runners... REAL \n",
|
|||
|
"5 \\nI’m not an immigrant, but my grandparents ... FAKE \n",
|
|||
|
"6 Share This Baylee Luciani (left), Screenshot o... FAKE \n",
|
|||
|
"7 A Czech stockbroker who saved more than 650 Je... REAL \n",
|
|||
|
"8 Hillary Clinton and Donald Trump made some ina... REAL \n",
|
|||
|
"9 Iranian negotiators reportedly have made a las... REAL \n",
|
|||
|
"10 CEDAR RAPIDS, Iowa — “I had one of the most wo... REAL \n",
|
|||
|
"11 Donald Trump’s organizational problems have go... REAL \n",
|
|||
|
"12 Click Here To Learn More About Alexandra's Per... FAKE \n",
|
|||
|
"13 October 31, 2016 at 4:52 am \\nPretty factual e... FAKE \n",
|
|||
|
"14 Killing Obama administration rules, dismantlin... REAL \n",
|
|||
|
"15 As more women move into high offices, they oft... REAL \n",
|
|||
|
"16 Shocking! Michele Obama & Hillary Caught Glamo... FAKE \n",
|
|||
|
"17 0 \\nHillary Clinton has barely just lost the p... FAKE \n",
|
|||
|
"18 Washington (CNN) For months, the White House a... REAL \n",
|
|||
|
"19 While paging through Pew's best data visualiza... REAL "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 56,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df=pd.read_csv('news.csv')\n",
|
|||
|
"df.shape\n",
|
|||
|
"df.head(20)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 57,
|
|||
|
"id": "major-section",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"0 FAKE\n",
|
|||
|
"1 FAKE\n",
|
|||
|
"2 REAL\n",
|
|||
|
"3 FAKE\n",
|
|||
|
"4 REAL\n",
|
|||
|
"Name: label, dtype: object"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 57,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"labels=df.label\n",
|
|||
|
"labels.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "surprised-desperate",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **4. Wizualizacja cech na histogramach**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 52,
|
|||
|
"id": "literary-correlation",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f51d81ab470>]],\n",
|
|||
|
" dtype=object)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 52,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAABIcAAASBCAYAAACuKP6GAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzs3XuspVd93vHnB8NNFGwT6BRsqkHBSavWgsCIUqG2Y4iicGlBKlS0qFzk1lIbpaliqThVWyUoVZ22iBCEori1VCeQmouCbHEJRRBLRRUkuATMJQKXGmwguIBxwy1h6Oof53VzasbMGebYZ2aez0canf2+7zp71t6sLaPvrL33rLUCAAAAQKcHHPQEAAAAADg44hAAAABAMXEIAAAAoJg4BAAAAFBMHAIAAAAoJg4BAAAAFBOHAAAOwMwcmZk1M4cOei4AQDdxCAA4o20B5Yn3OPfzM/OGg5rTQZiZR83M22bmGzPz2Zn5ewc9JwDg3OBfqgAAzg6vT/InSQ4neXKSd8zMR9ZaHz/YaQEAZzs7hwCAs9rMHJuZ22fmipm5Y2a+ODOv2HX9P83M62fmHTPzRzPzwZn54V3XXzszt83M/56Zm2bmr+269vMz85aZecP2uzfPzI/MzM9tf9dtM/MTu8afNzPXbHP4/Mz84sw8cLv2wJn59zPz5Zn5TJLnnsJjfHiSv53kX661vr7Wen+SG5L8/dN68gAAIg4BAOeGP5fkvCQXJrksyetn5oJd1/9ukl9IckGSW5L8613Xfi87O3EeleQ3k7xlZh666/rfTPIb2+9+OMm7s/P/oS5M8qokv7Zr7LVJjid5YpIfS/ITSf7Bdu0fJnnedv5okhfufgAzc+XMvP1eHt+PJPnuWutTu859JMlfupfxAAB7Jg4BAOeC7yR51VrrO2utdyb5epIf3XX9t9Zav7vWOp7kjdmJQUmStdYb1lpfWWsdX2u9OslD7vG7/3Wt9e7td9+S5DFJrlprfSfJdUmOzMz5M3M4ybOT/NO11jfWWnckeU2SF2/383eS/PJa67a11leT/JvdD2CtddVa63n38vj+TJK77nHuriSP2NOzAwDwffjMIQDgTPfdJA+6x7kHZScI3e0rW7y52zezE1Tu9of3dm1mrsjO7p7HJVlJHpnk0bvGf2nX7W8l+fJa67u7jrPd3+O2eX1xZu4e/4Akt223H7frdpJ8Nnv39W1euz0yyR+dwn0AAJyQOAQAnOk+l+RIkk/uOveEJJ864ehTsH2+0CuTPCvJx9da/2dm7kwy3/83T+i2JH+c5NH3CFV3+2KSx+86/vOncN+fSnJoZi5ea316O/ekJD6MGgA4bd5WBgCc6d6U5F/MzEUz84CZ+fHsfA7QW/fhvh+Rnc8I+l/ZiS//Kt+7Q2dP1lpfTPJfkrx6Zh65zfWHZ+ZvbEPenOSfbI/jgiRXnsJ9fyPJbyV51cw8fGaekeT52fksJACA0yIOAQBnulcl+W9J3p/kziT/NslL1lof24f7fneSd2VnZ85nk3w7//9bv07VS5M8OMknsjPXtyZ57HbtP2x/30eS/PfsxJ7/Z2b++cy86/vc9z9O8rAkdyT5z0n+ka+xBwD2w6y1DnoOAAAAABwQO4cAAAAAiolDAAAAAMXEIQAAAIBi4hAAAABAsUMHPYEkefSjH72OHDly0NPYF9/4xjfy8Ic//KCnAQfC+qeZ9U8z6592XgM0s/7PbDfddNOX11qPOdm4MyIOHTlyJB/60IcOehr74sYbb8yxY8cOehpwIKx/mln/NLP+aec1QDPr/8w2M5/dyzhvKwMAAAAoJg4BAAAAFBOHAAAAAIqJQwAAAADFxCEAAACAYuIQAAAAQDFxCAAAAKCYOAQAAABQTBwCAAAAKCYOAQAAABQThwAAAACKiUMAAAAAxcQhAAAAgGLiEAAAAEAxcQgAAACgmDgEAAAAUEwcAgAAACgmDg
EAAAAUE4cAAAAAiolDAAAAAMXEIQAAAIBi4hAAAABAMXEIAAAAoJg4BAAAAFBMHAIAAAAoJg4BAAAAFBOHAAAAAIqJQwAAAADFxCEAAACAYuIQAAAAQDFxCAAAAKCYOAQAAABQTBwCAAAAKCYOAQAAABQThwAAAACKiUMAAAAAxcQhAAAAgGLiEAAAAEAxcQgAAACgmDgEAAAAUEwcAgAAACgmDgEAAAAUE4cAAAAAiolDAAAAAMXEIQAAAIBi4hAAAABAMXEIAAAAoJg4BAAAAFBMHAIAAAAoJg4BAAAAFDt00BMAAO7dkSvfcdBT4B5uveq5Bz0FAIB9ZecQAAAAQDFxCAAAAKCYOAQAAABQTBwCAAAAKCYOAQAAABQThwAAAACKiUMAAAAAxcQhAAAAgGLiEAAAAEAxcQgAAACg2KGDngAAAHDuOXLlOw56CvebKy45npefBY/31quee9BTAM5Qdg4BAAAAFBOHAAAAAIp5W9k+u/nzd50VW0qb2D4LAAAA987OIQAAAIBi4hAAAABAMXEIAAAAoJjPHAIA4KzX9LXpALDf7BwCAAAAKCYOAQAAABTztjIAknhLxum64pLjebnnEIAzmP/Wn5luveq5Bz0FsHMIAAAAoJk4BAAAAFBMHAIAAAAoJg4BAAAAFBOHAAAAAIqJQwAAAADFxCEAAACAYuIQAAAAQDFxCAAAAKCYOAQAAABQTBwCAAAAKCYOAQAAABQ7dNATAAA4mxy58h0HPYUzzhWXHM/LPS8AcNaycwgAAACgmDgEAAAAUEwcAgAAACjmM4eAA+EzOwAAAM4Mdg4BAAAAFBOHAAAAAIqJQwAAAADFxCEAAACAYuIQAAAAQDFxCAAAAKCYOAQAAABQTBwCAAAAKCYOAQAAABQThwAAAACKiUMAAAAAxcQhAAAAgGLiEAAAAEAxcQgAAACg2KGDngDc145c+Y6DnkKNKy45npd7vgEAAM4q4hAAAAAckLP9H7PPxX8gvvWq5x70FO533lYGAAAAUEwcAgAAACgmDgEAAAAUE4cAAAAAiolDAAAAAMXEIQAAAIBi4hAAAABAMXEIAAAAoJg4BAAAAFBMHAIAAAAoJg4BAAAAFBOHAAAAAIqJQwAAAADFxCEAAACAYuIQAAAAQDFxCAAAAKCYOAQAAABQTBwCAAAAKCYOAQAAABQThwAAAACKiUMAAAAAxcQhAAAAgGJ7ikMzc/7MvHVm/mBmPjkzf3VmHjUz75mZT28/L9jGzsz8yszcMjMfnZmn3LcPAQAAAIAf1F53Dr02yW+vtf5Ckicl+WSSK5O8d611cZL3bsdJ8uwkF29/Lk/yq/s6YwAAAAD2zUnj0Mw8MslfT3JNkqy1/mSt9bUkz09y7Tbs2iQv2G4/P8mvrx0fSHL+zDx232cOAAAAwGmbtdb3HzDz5CRXJ/lEdnYN3ZTkZ5J8fq11/q5xd661LpiZtye5aq31/u38e5O8cq31oXvc7+XZ2VmUw4cPP/W6667bv0d1gO746l350rcOehZwMA4/LNY/tax/mln/tPMaoNm5uP4vufC8g57Cvrn00ktvWmsdPdm4Q3u4r0NJnpLkp9daH5yZ1+ZP30J2InOCc99ToNZaV2cnOuXo0aPr2LFje5jKme91b7w+r755L08rnHuuuOS49U8t659m1j/tvAZodi6u/1tfcuygp3C/28tnDt2e5Pa11ge347dmJxZ96e63i20/79g1/vG7fv+iJF/Yn+kCAAAAsJ9OGofWWn+Y5LaZ+dHt1LOy8xazG5K8bDv3siTXb7dvSPLS7VvLnp7krrXWF/d32gAAAADsh73u/frpJG+cmQcn+UySV2QnLL15Zi5L8rkkL9rGvjPJc5LckuSb21gAAAAAzkB7ikNrrd9PcqIPMHrWCcauJD91mvMCAAAA4H6wl88cAgAAAOAcJQ4BAAAAFBOHAAAAAIqJQwAAAADFxCEAAACAYuIQAAAAQDFxCAAAAKCYOAQAAA
BQTBwCAAAAKCYOAQAAABQThwAAAACKiUMAAAAAxcQhAAAAgGLiEAAAAEAxcQgAAACgmDgEAAAAUEwcAgAAACgmDgE
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1440x1440 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df.hist(figsize=(20,20), xrot=-45)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "hungry-costa",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **5. Podział na zbiór testowy i treningowy**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 53,
|
|||
|
"id": "optical-wales",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=7)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "continued-system",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **6. Trenowanie**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "oriental-trinity",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 6.1. Użycie TFIDF\n",
|
|||
|
"\n",
|
|||
|
"Inicjalizuje wektor TfidfVectorizer ze słowami stop z języka angielskiego i maksymalną częstotliwością występowania w dokumentach wynoszącą 0,7 (terminy o wyższej częstotliwości występowania w dokumentach zostaną odrzucone). Stop words to najczęściej występujące słowa w danym języku, które należy odfiltrować przed przetworzeniem danych języka naturalnego. Wektoryzator TfidfVectorizer przekształca zbiór nieprzetworzonych dokumentów w macierz cech TF-IDF.\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 58,
|
|||
|
"id": "south-liability",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)\n",
|
|||
|
"\n",
|
|||
|
"tfidf_train=tfidf_vectorizer.fit_transform(x_train) \n",
|
|||
|
"tfidf_test=tfidf_vectorizer.transform(x_test)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "linear-chest",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 6.2. PassiveAggressiveClassifier"
|
|||
|
]
|
|||
|
},
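{
"cell_type": "markdown",
"id": "sketch-pa-update",
"metadata": {},
"source": [
"The passive-aggressive idea can be sketched as a single update step. This is a simplified illustration of the PA-I rule from the passive-aggressive literature, not scikit-learn's implementation: it assumes labels encoded as -1/+1, and the function name `pa_update` and the toy vectors are hypothetical. `C` plays the role of the aggressiveness parameter.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def pa_update(w, x, y, C=1.0):\n",
"    # One PA-I step for a label y in {-1, +1}; returns the updated weights.\n",
"    loss = max(0.0, 1.0 - y * np.dot(w, x))  # hinge loss on this example\n",
"    if loss == 0.0:\n",
"        return w                             # passive: margin already satisfied\n",
"    tau = min(C, loss / np.dot(x, x))        # aggressive: step size capped by C\n",
"    return w + tau * y * x\n",
"\n",
"w = np.zeros(2)\n",
"x = np.array([1.0, 2.0])\n",
"w1 = pa_update(w, x, +1)   # first pass: the margin is violated, so the weights move\n",
"w2 = pa_update(w1, x, +1)  # same example again: the margin now holds, so nothing changes\n",
"w3 = pa_update(w1, x, -1)  # opposite label: a larger corrective step\n",
"```\n",
"\n",
"The step size is the smallest one that fixes the loss on the current example, which is why each update changes the weight vector as little as possible."
]
},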
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 55,
|
|||
|
"id": "flying-gabriel",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Dokładność: 92.82%\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/usr/lib/python3/dist-packages/sklearn/linear_model/base.py:283: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.\n",
|
|||
|
"Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n",
|
|||
|
" indices = (scores > 0).astype(np.int)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"pac=PassiveAggressiveClassifier(max_iter=50)\n",
|
|||
|
"pac.fit(tfidf_train,y_train)\n",
|
|||
|
"\n",
|
|||
|
"#Przewidzenie na zbiorze testowym i kalkulacja dokładnośći\n",
|
|||
|
"y_pred=pac.predict(tfidf_test)\n",
|
|||
|
"score=accuracy_score(y_test,y_pred)\n",
|
|||
|
"print(f'Dokładność: {round(score*100,2)}%')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "balanced-security",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## **7. Podsumowanie wyników**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "military-radar",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"W tym modelu uzyskaliśmy dokładność 92,82%. Na koniec wydrukujmy macierz konfuzji, aby uzyskać wgląd w liczbę fałszywych i prawdziwych wyników negatywnych i pozytywnych."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 60,
|
|||
|
"id": "fifty-melbourne",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[590, 48],\n",
|
|||
|
" [ 43, 586]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 60,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "collectible-ireland",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"W przypadku tego modelu mamy 590 prawdziwych wyników dodatnich, 586 prawdziwych wyników ujemnych, 43 fałszywe wyniki dodatnie i 48 fałszywych wyników ujemnych."
|
|||
|
]
|
|||
|
},
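{
"cell_type": "markdown",
"id": "sketch-confusion-metrics",
"metadata": {},
"source": [
"The counts above can be turned into the usual summary metrics. A minimal sketch, assuming FAKE is treated as the positive class and that rows of the matrix are true labels and columns are predicted labels (scikit-learn's convention); the variable names are illustrative.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Confusion matrix reported above, with rows = true class and\n",
"# columns = predicted class, in the order ['FAKE', 'REAL'].\n",
"cm = np.array([[590, 48],\n",
"               [43, 586]])\n",
"\n",
"tp, fn = cm[0]  # true FAKE predicted as FAKE / as REAL\n",
"fp, tn = cm[1]  # true REAL predicted as FAKE / as REAL\n",
"\n",
"accuracy = (tp + tn) / cm.sum()\n",
"precision = tp / (tp + fp)  # share of predicted FAKE that is truly FAKE\n",
"recall = tp / (tp + fn)     # share of true FAKE that was detected\n",
"f1 = 2 * precision * recall / (precision + recall)\n",
"\n",
"print(f'accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}')\n",
"```\n",
"\n",
"The accuracy computed this way matches the 92.82% reported by `accuracy_score`, since both count the diagonal of the matrix against all 1267 test articles."
]
},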
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "natural-premiere",
|
|||
|
"metadata": {},
|
|||
|
"source": []
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"id": "crucial-geneva",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": []
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.7.3"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 5
|
|||
|
}
|