{ "cells": [ { "cell_type": "markdown", "id": "statewide-crown", "metadata": {}, "source": [ "**12. Project**\n", "=================" ] }, { "cell_type": "markdown", "id": "explicit-bunch", "metadata": {}, "source": [ "## **1. Project goal**" ] }, { "cell_type": "markdown", "id": "encouraging-officer", "metadata": {}, "source": [ "#### The goal of the project is to predict which news items in the dataset are fake news. Algorithms used:\n", "* TfidfVectorizer\n", "* PassiveAggressiveClassifier\n", "\n", "Description of the algorithms.\n", "\n", "**TF (Term Frequency):** The number of times a word occurs in a document is its term frequency. A higher value means the term appears more often than others, so the document is a good match when the term is among the search words.\n", "\n", "**IDF (Inverse Document Frequency):** Words that occur in many documents of the collection carry little information, so IDF down-weights them. The TF-IDF score is therefore high for terms that are frequent in one document but rare across the collection.\n", "\n", "TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.\n", "\n", "**Passive-aggressive algorithms** are online learning algorithms. Such an algorithm stays passive when a classification is correct and turns aggressive when it is wrong, updating and adjusting its weights. Unlike most other algorithms, it does not converge. Its purpose is to make updates that correct the loss while changing the norm of the weight vector as little as possible.\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "informal-filename", "metadata": {}, "source": [ "#### The news.csv data used for training comes from https://paperswithcode.com/datasets?task=fake-news-detection" ] }, { "cell_type": "markdown", "id": "beginning-minute", "metadata": {}, "source": [ "## **2. 
Importing the required libraries**" ] }, { "cell_type": "code", "execution_count": null, "id": "effective-democracy", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import itertools\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.linear_model import PassiveAggressiveClassifier\n", "from sklearn.metrics import accuracy_score, confusion_matrix" ] }, { "cell_type": "markdown", "id": "alternative-knock", "metadata": {}, "source": [ "## **3. Loading the data**" ] }, { "cell_type": "code", "execution_count": 56, "id": "worldwide-blake", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Unnamed: 0 | \n", "title | \n", "text | \n", "label | \n", "
---|---|---|---|---|
0 | \n", "8476 | \n", "You Can Smell Hillary’s Fear | \n", "Daniel Greenfield, a Shillman Journalism Fello... | \n", "FAKE | \n", "
1 | \n", "10294 | \n", "Watch The Exact Moment Paul Ryan Committed Pol... | \n", "Google Pinterest Digg Linkedin Reddit Stumbleu... | \n", "FAKE | \n", "
2 | \n", "3608 | \n", "Kerry to go to Paris in gesture of sympathy | \n", "U.S. Secretary of State John F. Kerry said Mon... | \n", "REAL | \n", "
3 | \n", "10142 | \n", "Bernie supporters on Twitter erupt in anger ag... | \n", "— Kaydee King (@KaydeeKing) November 9, 2016 T... | \n", "FAKE | \n", "
4 | \n", "875 | \n", "The Battle of New York: Why This Primary Matters | \n", "It's primary day in New York and front-runners... | \n", "REAL | \n", "
5 | \n", "6903 | \n", "Tehran, USA | \n", "\\nI’m not an immigrant, but my grandparents ... | \n", "FAKE | \n", "
6 | \n", "7341 | \n", "Girl Horrified At What She Watches Boyfriend D... | \n", "Share This Baylee Luciani (left), Screenshot o... | \n", "FAKE | \n", "
7 | \n", "95 | \n", "‘Britain’s Schindler’ Dies at 106 | \n", "A Czech stockbroker who saved more than 650 Je... | \n", "REAL | \n", "
8 | \n", "4869 | \n", "Fact check: Trump and Clinton at the 'commande... | \n", "Hillary Clinton and Donald Trump made some ina... | \n", "REAL | \n", "
9 | \n", "2909 | \n", "Iran reportedly makes new push for uranium con... | \n", "Iranian negotiators reportedly have made a las... | \n", "REAL | \n", "
10 | \n", "1357 | \n", "With all three Clintons in Iowa, a glimpse at ... | \n", "CEDAR RAPIDS, Iowa — “I had one of the most wo... | \n", "REAL | \n", "
11 | \n", "988 | \n", "Donald Trump’s Shockingly Weak Delegate Game S... | \n", "Donald Trump’s organizational problems have go... | \n", "REAL | \n", "
12 | \n", "7041 | \n", "Strong Solar Storm, Tech Risks Today | S0 News... | \n", "Click Here To Learn More About Alexandra's Per... | \n", "FAKE | \n", "
13 | \n", "7623 | \n", "10 Ways America Is Preparing for World War 3 | \n", "October 31, 2016 at 4:52 am \\nPretty factual e... | \n", "FAKE | \n", "
14 | \n", "1571 | \n", "Trump takes on Cruz, but lightly | \n", "Killing Obama administration rules, dismantlin... | \n", "REAL | \n", "
15 | \n", "4739 | \n", "How women lead differently | \n", "As more women move into high offices, they oft... | \n", "REAL | \n", "
16 | \n", "7737 | \n", "Shocking! Michele Obama & Hillary Caught Glamo... | \n", "Shocking! Michele Obama & Hillary Caught Glamo... | \n", "FAKE | \n", "
17 | \n", "8716 | \n", "Hillary Clinton in HUGE Trouble After America ... | \n", "0 \\nHillary Clinton has barely just lost the p... | \n", "FAKE | \n", "
18 | \n", "3304 | \n", "What's in that Iran bill that Obama doesn't like? | \n", "Washington (CNN) For months, the White House a... | \n", "REAL | \n", "
19 | \n", "3078 | \n", "The 1 chart that explains everything you need ... | \n", "While paging through Pew's best data visualiza... | \n", "REAL | \n", "