diff --git a/UMA12_ANTONIO_ProjektFakeNews.ipynb b/UMA12_ANTONIO_ProjektFakeNews.ipynb
deleted file mode 100644
index 3a7b648..0000000
--- a/UMA12_ANTONIO_ProjektFakeNews.ipynb
+++ /dev/null
@@ -1,550 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "statewide-crown",
- "metadata": {},
- "source": [
- "**12. Project**\n",
- "================="
- ]
- },
- {
- "cell_type": "markdown",
- "id": "explicit-bunch",
- "metadata": {},
- "source": [
- "## **1. Project goal**"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "encouraging-officer",
- "metadata": {},
- "source": [
- "#### The goal of this project is to predict which news items in the dataset are fake news. The tools used are:\n",
- "* TfidfVectorizer\n",
- "* PassiveAggressiveClassifier\n",
- "\n",
- "Description of the algorithms.\n",
- "\n",
- "**TF (Term Frequency):** The number of times a word appears in a document is its term frequency. A higher value means the term occurs more often than others, so the document is a good match when that term is part of the search query.\n",
- "\n",
- "**IDF (Inverse Document Frequency):** Words that occur in many documents of the corpus carry little information, so IDF down-weights them; a term's TF-IDF score is the product of the two.\n",
- "\n",
- "TfidfVectorizer converts a collection of documents into a matrix of TF-IDF features.\n",
- "\n",
- "**Passive-aggressive algorithms** are online learning algorithms. Such an algorithm stays passive when an example is classified correctly and turns aggressive when it is misclassified, updating its weights to correct the mistake. Unlike most other algorithms, it does not converge. Its purpose is to make updates that correct the loss while changing the norm of the weight vector as little as possible.\n",
- "\n",
- "\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "informal-filename",
- "metadata": {},
- "source": [
- "#### The news.csv data used for training comes from https://paperswithcode.com/datasets?task=fake-news-detection"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "beginning-minute",
- "metadata": {},
- "source": [
- "## **2. Importing the required libraries**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "effective-democracy",
- "metadata": {},
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "import pandas as pd\n",
- "import itertools\n",
- "from sklearn.model_selection import train_test_split\n",
- "from sklearn.feature_extraction.text import TfidfVectorizer\n",
- "from sklearn.linear_model import PassiveAggressiveClassifier\n",
- "from sklearn.metrics import accuracy_score, confusion_matrix"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "alternative-knock",
- "metadata": {},
- "source": [
- "## **3. Loading the data**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 56,
- "id": "worldwide-blake",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- " Unnamed: 0 title \\\n",
- "0 8476 You Can Smell Hillary’s Fear \n",
- "1 10294 Watch The Exact Moment Paul Ryan Committed Pol... \n",
- "2 3608 Kerry to go to Paris in gesture of sympathy \n",
- "3 10142 Bernie supporters on Twitter erupt in anger ag... \n",
- "4 875 The Battle of New York: Why This Primary Matters \n",
- "5 6903 Tehran, USA \n",
- "6 7341 Girl Horrified At What She Watches Boyfriend D... \n",
- "7 95 ‘Britain’s Schindler’ Dies at 106 \n",
- "8 4869 Fact check: Trump and Clinton at the 'commande... \n",
- "9 2909 Iran reportedly makes new push for uranium con... \n",
- "10 1357 With all three Clintons in Iowa, a glimpse at ... \n",
- "11 988 Donald Trump’s Shockingly Weak Delegate Game S... \n",
- "12 7041 Strong Solar Storm, Tech Risks Today | S0 News... \n",
- "13 7623 10 Ways America Is Preparing for World War 3 \n",
- "14 1571 Trump takes on Cruz, but lightly \n",
- "15 4739 How women lead differently \n",
- "16 7737 Shocking! Michele Obama & Hillary Caught Glamo... \n",
- "17 8716 Hillary Clinton in HUGE Trouble After America ... \n",
- "18 3304 What's in that Iran bill that Obama doesn't like? \n",
- "19 3078 The 1 chart that explains everything you need ... \n",
- "\n",
- " text label \n",
- "0 Daniel Greenfield, a Shillman Journalism Fello... FAKE \n",
- "1 Google Pinterest Digg Linkedin Reddit Stumbleu... FAKE \n",
- "2 U.S. Secretary of State John F. Kerry said Mon... REAL \n",
- "3 — Kaydee King (@KaydeeKing) November 9, 2016 T... FAKE \n",
- "4 It's primary day in New York and front-runners... REAL \n",
- "5 \\nI’m not an immigrant, but my grandparents ... FAKE \n",
- "6 Share This Baylee Luciani (left), Screenshot o... FAKE \n",
- "7 A Czech stockbroker who saved more than 650 Je... REAL \n",
- "8 Hillary Clinton and Donald Trump made some ina... REAL \n",
- "9 Iranian negotiators reportedly have made a las... REAL \n",
- "10 CEDAR RAPIDS, Iowa — “I had one of the most wo... REAL \n",
- "11 Donald Trump’s organizational problems have go... REAL \n",
- "12 Click Here To Learn More About Alexandra's Per... FAKE \n",
- "13 October 31, 2016 at 4:52 am \\nPretty factual e... FAKE \n",
- "14 Killing Obama administration rules, dismantlin... REAL \n",
- "15 As more women move into high offices, they oft... REAL \n",
- "16 Shocking! Michele Obama & Hillary Caught Glamo... FAKE \n",
- "17 0 \\nHillary Clinton has barely just lost the p... FAKE \n",
- "18 Washington (CNN) For months, the White House a... REAL \n",
- "19 While paging through Pew's best data visualiza... REAL "
- ]
- },
- "execution_count": 56,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "df=pd.read_csv('news.csv')\n",
- "df.shape\n",
- "df.head(20)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 57,
- "id": "major-section",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "0 FAKE\n",
- "1 FAKE\n",
- "2 REAL\n",
- "3 FAKE\n",
- "4 REAL\n",
- "Name: label, dtype: object"
- ]
- },
- "execution_count": 57,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "labels=df.label\n",
- "labels.head()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "surprised-desperate",
- "metadata": {},
- "source": [
- "## **4. Visualizing features with histograms**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 52,
- "id": "literary-correlation",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([[]],\n",
- " dtype=object)"
- ]
- },
- "execution_count": 52,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "df.hist(figsize=(20,20), xrot=-45)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "hungry-costa",
- "metadata": {},
- "source": [
- "## **5. Splitting into training and test sets**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 53,
- "id": "optical-wales",
- "metadata": {},
- "outputs": [],
- "source": [
- "x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=7)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "continued-system",
- "metadata": {},
- "source": [
- "## **6. Training**"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "oriental-trinity",
- "metadata": {},
- "source": [
- "### 6.1. Using TF-IDF\n",
- "\n",
- "We initialize a TfidfVectorizer with English stop words and a maximum document frequency of 0.7 (terms that appear in more than 70% of the documents are discarded). Stop words are the most common words in a language and should be filtered out before processing natural-language data. TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 58,
- "id": "south-liability",
- "metadata": {},
- "outputs": [],
- "source": [
- "tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)\n",
- "\n",
- "tfidf_train=tfidf_vectorizer.fit_transform(x_train) \n",
- "tfidf_test=tfidf_vectorizer.transform(x_test)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "linear-chest",
- "metadata": {},
- "source": [
- "### 6.2. PassiveAggressiveClassifier"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 55,
- "id": "flying-gabriel",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Accuracy: 92.82%\n"
- ]
- }
- ],
- "source": [
- "pac=PassiveAggressiveClassifier(max_iter=50)\n",
- "pac.fit(tfidf_train,y_train)\n",
- "\n",
- "# Predict on the test set and compute the accuracy\n",
- "y_pred=pac.predict(tfidf_test)\n",
- "score=accuracy_score(y_test,y_pred)\n",
- "print(f'Accuracy: {round(score*100,2)}%')"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "balanced-security",
- "metadata": {},
- "source": [
- "## **7. Summary of results**"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "military-radar",
- "metadata": {},
- "source": [
- "This model reached an accuracy of 92.82%. Finally, let's print the confusion matrix to see the number of true and false positives and negatives."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 60,
- "id": "fifty-melbourne",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([[590, 48],\n",
- " [ 43, 586]])"
- ]
- },
- "execution_count": 60,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "collectible-ireland",
- "metadata": {},
- "source": [
- "Treating FAKE as the positive class (rows are true labels, columns are predicted labels), the model produced 590 true positives, 586 true negatives, 43 false positives, and 48 false negatives."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "natural-premiere",
- "metadata": {},
- "source": []
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "crucial-geneva",
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
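
The passive-aggressive behaviour described in section 1 of the notebook can be sketched in a few lines of plain Python. This is only an illustration, not the code scikit-learn runs: it assumes binary labels encoded as -1/+1, uses the hinge loss, and applies the basic uncapped step size, whereas `PassiveAggressiveClassifier` additionally limits the step through its `C` parameter. The function name `pa_update` is hypothetical.

```python
def pa_update(weights, features, label):
    """One passive-aggressive step for a label in {-1, +1} (illustrative sketch)."""
    # Signed margin: positive when the current weights classify the example correctly.
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        # Passive: the example is classified with enough margin, keep the weights.
        return list(weights)
    squared_norm = sum(x * x for x in features)
    if squared_norm == 0.0:
        # An all-zero feature vector gives no direction to move in.
        return list(weights)
    # Aggressive: take the smallest step that brings the hinge loss to zero.
    tau = loss / squared_norm
    return [w + tau * label * x for w, x in zip(weights, features)]


# Aggressive case: zero weights give margin 0, so the weights are corrected.
print(pa_update([0.0, 0.0], [1.0, 2.0], 1))
# Passive case: margin is 3, the loss is 0, so the weights are returned unchanged.
print(pa_update([1.0, 1.0], [1.0, 2.0], 1))
```

The step size `tau` is the smallest one that makes the current example satisfy the margin, which is why the text speaks of correcting the loss with only a small change to the weight vector.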