diff --git a/cw/03a_tfidf.ipynb b/cw/03a_tfidf.ipynb
new file mode 100644
index 0000000..86cf237
--- /dev/null
+++ b/cw/03a_tfidf.ipynb
@@ -0,0 +1,1120 @@
+{
+    "cells": [
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "collapsed": false
+            },
+            "source": [
+                "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
+                "<div class=\"alert alert-block alert-info\">\n",
+                "<h1> Information Extraction </h1>\n",
+                "<h2> 3. <i>TF-IDF (1)</i> [exercises]</h2>\n",
+                "<h3> Jakub Pokrywka (2021)</h3>\n",
+                "</div>\n",
+                "\n",
+                "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "# Class 2\n",
+                "\n",
+                "In this class you can earn 5 points for each valuable contribution to the discussion. A single person can earn at most 15 points in these exercises."
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 1,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "import numpy as np\n",
+                "import re"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## document collection"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 2,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "documents = ['Ala lubi zwierz\u0119ta i ma kota oraz psa!',\n",
+                "             'Ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika!',\n",
+                "             'I Jan je\u017adzi na rowerze.',\n",
+                "             '2 wojna \u015bwiatowa by\u0142a wielkim konfliktem zbrojnym',\n",
+                "             'Tomek lubi psy, ma psa  i je\u017adzi na motorze i rowerze.',\n",
+                "            ]"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### WHAT DO WE WANT?\n",
+                "- we want to turn the texts into sets of words\n",
+                "\n",
+                "\n",
+                "### QUESTION\n",
+                "- can we tokenize a text with, e.g., documents[0].split(' ')? What problems would that cause?"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## preprocessing"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 3,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def get_str_cleaned(str_dirty):\n",
+                "    punctuation = '!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'\n",
+                "    new_str = str_dirty.lower()\n",
+                "    for char in punctuation:\n",
+                "        new_str = new_str.replace(char, '')\n",
+                "    new_str = re.sub(' +', ' ', new_str)\n",
+                "    return new_str\n"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 4,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "sample_document = get_str_cleaned(documents[0])"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 5,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'ala lubi zwierz\u0119ta i ma kota oraz psa'"
+                        ]
+                    },
+                    "execution_count": 5,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "sample_document"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## tokenization"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 6,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def tokenize_str(document):\n",
+                "    return document.split(' ')"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 7,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "['ala', 'lubi', 'zwierz\u0119ta', 'i', 'ma', 'kota', 'oraz', 'psa']"
+                        ]
+                    },
+                    "execution_count": 7,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "tokenize_str(sample_document)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 8,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "documents_cleaned = [get_str_cleaned(d) for d in documents]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 9,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "['ala lubi zwierz\u0119ta i ma kota oraz psa',\n",
+                            " 'ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika',\n",
+                            " 'i jan je\u017adzi na rowerze',\n",
+                            " '2 wojna \u015bwiatowa by\u0142a wielkim konfliktem zbrojnym',\n",
+                            " 'tomek lubi psy ma psa i je\u017adzi na motorze i rowerze']"
+                        ]
+                    },
+                    "execution_count": 9,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents_cleaned"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 10,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "documents_tokenized = [tokenize_str(d) for d in documents_cleaned]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 11,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "[['ala', 'lubi', 'zwierz\u0119ta', 'i', 'ma', 'kota', 'oraz', 'psa'],\n",
+                            " ['ola', 'lubi', 'zwierz\u0119ta', 'oraz', 'ma', 'kota', 'a', 'tak\u017ce', 'chomika'],\n",
+                            " ['i', 'jan', 'je\u017adzi', 'na', 'rowerze'],\n",
+                            " ['2', 'wojna', '\u015bwiatowa', 'by\u0142a', 'wielkim', 'konfliktem', 'zbrojnym'],\n",
+                            " ['tomek',\n",
+                            "  'lubi',\n",
+                            "  'psy',\n",
+                            "  'ma',\n",
+                            "  'psa',\n",
+                            "  'i',\n",
+                            "  'je\u017adzi',\n",
+                            "  'na',\n",
+                            "  'motorze',\n",
+                            "  'i',\n",
+                            "  'rowerze']]"
+                        ]
+                    },
+                    "execution_count": 11,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents_tokenized"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## QUESTIONS\n",
+                "- what is the next step toward building TF or TF-IDF vectors?\n",
+                "- what size will a TF or TF-IDF vector be?\n"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 12,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "vocabulary = []\n",
+                "for document in documents_tokenized:\n",
+                "    for word in document:\n",
+                "        vocabulary.append(word)\n",
+                "vocabulary = sorted(set(vocabulary))"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 13,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "['2',\n",
+                            " 'a',\n",
+                            " 'ala',\n",
+                            " 'by\u0142a',\n",
+                            " 'chomika',\n",
+                            " 'i',\n",
+                            " 'jan',\n",
+                            " 'je\u017adzi',\n",
+                            " 'konfliktem',\n",
+                            " 'kota',\n",
+                            " 'lubi',\n",
+                            " 'ma',\n",
+                            " 'motorze',\n",
+                            " 'na',\n",
+                            " 'ola',\n",
+                            " 'oraz',\n",
+                            " 'psa',\n",
+                            " 'psy',\n",
+                            " 'rowerze',\n",
+                            " 'tak\u017ce',\n",
+                            " 'tomek',\n",
+                            " 'wielkim',\n",
+                            " 'wojna',\n",
+                            " 'zbrojnym',\n",
+                            " 'zwierz\u0119ta',\n",
+                            " '\u015bwiatowa']"
+                        ]
+                    },
+                    "execution_count": 13,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "vocabulary"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## QUESTION\n",
+                "\n",
+                "What will the word \"jak\" look like in the TF vector representation?"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### TASK 1: write a function word_to_index(word: str) that returns a one-hot vector as a numpy array"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 14,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def word_to_index(word):\n",
+                "    pass"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 16,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,\n",
+                            "       0., 0., 0., 0., 0., 0., 0., 0., 0.])"
+                        ]
+                    },
+                    "execution_count": 16,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "word_to_index('psa')"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### TASK 2: write a function that takes a list of words and converts it into a TF vector\n"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 17,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def tf(document):\n",
+                "    pass"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 18,
+            "metadata": {},
+            "outputs": [],
+            "source": []
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 19,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1.,\n",
+                            "       0., 0., 0., 0., 0., 0., 0., 1., 0.])"
+                        ]
+                    },
+                    "execution_count": 19,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "tf(documents_tokenized[0])"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 20,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "documents_vectorized = list()\n",
+                "for document in documents_tokenized:\n",
+                "    document_vector = tf(document)\n",
+                "    documents_vectorized.append(document_vector)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 21,
+            "metadata": {
+                "scrolled": true
+            },
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "[array([0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1.,\n",
+                            "        0., 0., 0., 0., 0., 0., 0., 1., 0.]),\n",
+                            " array([0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0.,\n",
+                            "        0., 0., 1., 0., 0., 0., 0., 1., 0.]),\n",
+                            " array([0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0.,\n",
+                            "        0., 1., 0., 0., 0., 0., 0., 0., 0.]),\n",
+                            " array([1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
+                            "        0., 0., 0., 0., 1., 1., 1., 0., 1.]),\n",
+                            " array([0., 0., 0., 0., 0., 2., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1.,\n",
+                            "        1., 1., 0., 1., 0., 0., 0., 0., 0.])]"
+                        ]
+                    },
+                    "execution_count": 21,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents_vectorized"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### IDF"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 22,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([5.        , 5.        , 5.        , 5.        , 5.        ,\n",
+                            "       1.66666667, 5.        , 2.5       , 5.        , 2.5       ,\n",
+                            "       1.66666667, 1.66666667, 5.        , 2.5       , 5.        ,\n",
+                            "       2.5       , 2.5       , 5.        , 2.5       , 5.        ,\n",
+                            "       5.        , 5.        , 5.        , 5.        , 2.5       ,\n",
+                            "       5.        ])"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                }
+            ],
+            "source": [
+                "# IDF here is N / df (no logarithm): number of documents divided by the number of documents containing each term\n",
+                "idf = len(documents_vectorized) / np.sum(np.array(documents_vectorized) != 0,axis=0)\n",
+                "display(idf)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 23,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "for i in range(len(documents_vectorized)):\n",
+                "    documents_vectorized[i] = documents_vectorized[i]# * idf"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### TASK 3: write a function similarity that returns the cosine similarity between two documents in vectorized form"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def similarity(query, document):\n",
+                "    pass"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 25,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ala lubi zwierz\u0119ta i ma kota oraz psa!'"
+                        ]
+                    },
+                    "execution_count": 25,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents[0]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 26,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1.,\n",
+                            "       0., 0., 0., 0., 0., 0., 0., 1., 0.])"
+                        ]
+                    },
+                    "execution_count": 26,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents_vectorized[0]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 27,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika!'"
+                        ]
+                    },
+                    "execution_count": 27,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents[1]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 28,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0.,\n",
+                            "       0., 0., 1., 0., 0., 0., 0., 1., 0.])"
+                        ]
+                    },
+                    "execution_count": 28,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "documents_vectorized[1]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 29,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.5892556509887895"
+                        ]
+                    },
+                    "execution_count": 29,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "similarity(documents_vectorized[0],documents_vectorized[1])"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 30,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def transform_query(query):\n",
+                "    query_vector = tf(tokenize_str(get_str_cleaned(query)))\n",
+                "    return query_vector"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 31,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,\n",
+                            "       0., 0., 0., 0., 0., 0., 0., 0., 0.])"
+                        ]
+                    },
+                    "execution_count": 31,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "transform_query('psa')"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 32,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.4999999999999999"
+                        ]
+                    },
+                    "execution_count": 32,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "similarity(transform_query('psa kota'), documents_vectorized[0])"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 33,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ala lubi zwierz\u0119ta i ma kota oraz psa!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.4999999999999999"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.2357022603955158"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'I Jan je\u017adzi na rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'2 wojna \u015bwiatowa by\u0142a wielkim konfliktem zbrojnym'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Tomek lubi psy, ma psa  i je\u017adzi na motorze i rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.19611613513818402"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                }
+            ],
+            "source": [
+                "# this is how a two-word query is handled\n",
+                "query = 'psa kota'\n",
+                "for i in range(len(documents)):\n",
+                "    display(documents[i])\n",
+                "    display(similarity(transform_query(query), documents_vectorized[i]))"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 34,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ala lubi zwierz\u0119ta i ma kota oraz psa!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'I Jan je\u017adzi na rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.4472135954999579"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'2 wojna \u015bwiatowa by\u0142a wielkim konfliktem zbrojnym'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Tomek lubi psy, ma psa  i je\u017adzi na motorze i rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.2773500981126146"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                }
+            ],
+            "source": [
+                "# this is why we need the denominator in cosine similarity\n",
+                "query = 'rowerze'\n",
+                "for i in range(len(documents)):\n",
+                "    display(documents[i])\n",
+                "    display(similarity(transform_query(query), documents_vectorized[i]))"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 35,
+            "metadata": {
+                "scrolled": true
+            },
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ala lubi zwierz\u0119ta i ma kota oraz psa!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.35355339059327373"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'I Jan je\u017adzi na rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.4472135954999579"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'2 wojna \u015bwiatowa by\u0142a wielkim konfliktem zbrojnym'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Tomek lubi psy, ma psa  i je\u017adzi na motorze i rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.5547001962252291"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                }
+            ],
+            "source": [
+                "# this is why we need term frequency \u2192 more occurrences mean a better-matching document\n",
+                "query = 'i'\n",
+                "for i in range(len(documents)):\n",
+                "    display(documents[i])\n",
+                "    display(similarity(transform_query(query), documents_vectorized[i]))"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 36,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ala lubi zwierz\u0119ta i ma kota oraz psa!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.24999999999999994"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Ola lubi zwierz\u0119ta oraz ma kota a tak\u017ce chomika!'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.2357022603955158"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'I Jan je\u017adzi na rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.31622776601683794"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'2 wojna \u015bwiatowa by\u0142a wielkim konfliktem zbrojnym'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.0"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "'Tomek lubi psy, ma psa  i je\u017adzi na motorze i rowerze.'"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                },
+                {
+                    "data": {
+                        "text/plain": [
+                            "0.39223227027636803"
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                }
+            ],
+            "source": [
+                "# this is why we need IDF: so that more important words get a larger weight\n",
+                "query = 'i chomika'\n",
+                "for i in range(len(documents)):\n",
+                "    display(documents[i])\n",
+                "    display(similarity(transform_query(query), documents_vectorized[i]))"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### TASK 4: WRITE IDF to change the weights from TF to TF-IDF\n",
+                "\n",
+                "Please use the version without any normalization.\n",
+                "\n",
+                "$idf_i = \\Large\\frac{|D|}{|\\{d : t_i \\in d \\}|}$\n",
+                "\n",
+                "\n",
+                "$|D|$ - number of documents in the corpus\n",
+                "\n",
+                "$|\\{d : t_i \\in d \\}|$ - number of documents in the corpus in which the given term occurs at least once"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "metadata": {},
+            "outputs": [],
+            "source": []
+        }
+    ],
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3",
+            "language": "python",
+            "name": "python3"
+        },
+        "language_info": {
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "file_extension": ".py",
+            "mimetype": "text/x-python",
+            "name": "python",
+            "nbconvert_exporter": "python",
+            "pygments_lexer": "ipython3",
+            "version": "3.8.3"
+        },
+        "author": "Jakub Pokrywka",
+        "email": "kubapok@wmi.amu.edu.pl",
+        "lang": "pl",
+        "subtitle": "3.tfidf (1)[\u0107wiczenia]",
+        "title": "Ekstrakcja informacji",
+        "year": "2021"
+    },
+    "nbformat": 4,
+    "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/cw/03a_tfidf_ODPOWIEDZI.ipynb b/cw/03a_tfidf_ODPOWIEDZI.ipynb
new file mode 100644
index 0000000..30d45ca
--- /dev/null
+++ b/cw/03a_tfidf_ODPOWIEDZI.ipynb
@@ -0,0 +1,91 @@
+{
+    "cells": [
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "collapsed": false
+            },
+            "source": [
+                "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
+                "<div class=\"alert alert-block alert-info\">\n",
+                "<h1> Information Extraction </h1>\n",
+                "<h2> 3. <i>tfidf (1)</i>  [exercises]</h2> \n",
+                "<h3> Jakub Pokrywka (2021)</h3>\n",
+                "</div>\n",
+                "\n",
+                "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 1,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def word_to_index(word):\n",
+                "    vec = np.zeros(len(vocabulary))  # one-hot vector over the vocabulary\n",
+                "    if word in vocabulary:\n",
+                "        idx = vocabulary.index(word)\n",
+                "        vec[idx] = 1\n",
+                "    else:\n",
+                "        vec[-1] = 1  # unknown word: mark the last position\n",
+                "    return vec"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def tf(document):\n",
+                "    # document is expected to be a sequence of words\n",
+                "    # start from zeros so that an empty document yields a zero vector, not None\n",
+                "    document_vector = np.zeros(len(vocabulary))\n",
+                "    for word in document:\n",
+                "        # add the one-hot vector of each word\n",
+                "        document_vector += word_to_index(word)\n",
+                "    return document_vector"
+            ]
+        },
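+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "The inverse document frequency from Task 4 of the exercise notebook, $idf_i = \\Large\\frac{|D|}{|\\{d : t_i \\in d \\}|}$ without normalization, could be sketched as below. This is only one possible solution, not necessarily the intended one: the name `idf`, its argument, the toy vectors, and the choice to return 0 for a term that occurs in no document (to avoid division by zero) are assumptions made for illustration."
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "import numpy as np\n",
+                "\n",
+                "def idf(documents_vectorized):\n",
+                "    # rows are documents, columns are vocabulary terms (TF vectors as returned by tf)\n",
+                "    tf_matrix = np.array(documents_vectorized)\n",
+                "    n_docs = tf_matrix.shape[0]  # |D|\n",
+                "    # number of documents in which each term occurs at least once\n",
+                "    doc_freq = np.count_nonzero(tf_matrix, axis=0)\n",
+                "    # assumption: a term that occurs in no document gets idf 0\n",
+                "    result = np.zeros(len(doc_freq))\n",
+                "    seen = doc_freq > 0\n",
+                "    result[seen] = n_docs / doc_freq[seen]\n",
+                "    return result\n",
+                "\n",
+                "# toy example: 2 documents, 3 terms; the middle term never occurs\n",
+                "toy_tf = [np.array([1., 0., 2.]), np.array([0., 0., 1.])]\n",
+                "toy_idf = idf(toy_tf)  # array([2., 0., 1.])\n",
+                "toy_tfidf = [vec * toy_idf for vec in toy_tf]  # TF-IDF weights"
+            ]
+        },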
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def similarity(query, document):\n",
+                "    numerator = np.sum(query * document)  # dot product\n",
+                "    denominator = np.sqrt(np.sum(query*query)) * np.sqrt(np.sum(document*document))  # product of the L2 norms\n",
+                "    return numerator / denominator"
+            ]
+        }
+    ],
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3",
+            "language": "python",
+            "name": "python3"
+        },
+        "language_info": {
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "file_extension": ".py",
+            "mimetype": "text/x-python",
+            "name": "python",
+            "nbconvert_exporter": "python",
+            "pygments_lexer": "ipython3",
+            "version": "3.8.3"
+        },
+        "author": "Jakub Pokrywka",
+        "email": "kubapok@wmi.amu.edu.pl",
+        "lang": "pl",
+        "subtitle": "3.tfidf (1)[\u0107wiczenia]",
+        "title": "Ekstrakcja informacji",
+        "year": "2021"
+    },
+    "nbformat": 4,
+    "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/cw/03b_tfidf_newsgroup.ipynb b/cw/03b_tfidf_newsgroup.ipynb
new file mode 100644
index 0000000..66db375
--- /dev/null
+++ b/cw/03b_tfidf_newsgroup.ipynb
@@ -0,0 +1,730 @@
+{
+    "cells": [
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "collapsed": false
+            },
+            "source": [
+                "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
+                "<div class=\"alert alert-block alert-info\">\n",
+                "<h1> Information Extraction </h1>\n",
+                "<h2> 3. <i>tfidf (2)</i>  [exercises]</h2> \n",
+                "<h3> Jakub Pokrywka (2021)</h3>\n",
+                "</div>\n",
+                "\n",
+                "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "# Class 2\n",
+                "\n",
+                "Useful materials:\n",
+                "\n",
+                "https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html\n",
+                "\n",
+                "https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html\n",
+                "\n"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## Imports"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 1,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "import numpy as np\n",
+                "import sklearn.metrics\n",
+                "\n",
+                "from sklearn.datasets import fetch_20newsgroups\n",
+                "\n",
+                "from sklearn.feature_extraction.text import TfidfVectorizer"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## Dataset"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 2,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "newsgroups = fetch_20newsgroups()['data']"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 3,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "11314"
+                        ]
+                    },
+                    "execution_count": 3,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "len(newsgroups)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 4,
+            "metadata": {},
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "From: lerxst@wam.umd.edu (where's my thing)\n",
+                        "Subject: WHAT car is this!?\n",
+                        "Nntp-Posting-Host: rac3.wam.umd.edu\n",
+                        "Organization: University of Maryland, College Park\n",
+                        "Lines: 15\n",
+                        "\n",
+                        " I was wondering if anyone out there could enlighten me on this car I saw\n",
+                        "the other day. It was a 2-door sports car, looked to be from the late 60s/\n",
+                        "early 70s. It was called a Bricklin. The doors were really small. In addition,\n",
+                        "the front bumper was separate from the rest of the body. This is \n",
+                        "all I know. If anyone can tellme a model name, engine specs, years\n",
+                        "of production, where this car is made, history, or whatever info you\n",
+                        "have on this funky looking car, please e-mail.\n",
+                        "\n",
+                        "Thanks,\n",
+                        "- IL\n",
+                        "   ---- brought to you by your neighborhood Lerxst ----\n",
+                        "\n",
+                        "\n",
+                        "\n",
+                        "\n",
+                        "\n"
+                    ]
+                }
+            ],
+            "source": [
+                "print(newsgroups[0])"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## Naive search"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 5,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "all_documents = []  # documents that contain the substring 'car'\n",
+                "for document in newsgroups:\n",
+                "    if 'car' in document:\n",
+                "        all_documents.append(document)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 6,
+            "metadata": {},
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "From: lerxst@wam.umd.edu (where's my thing)\n",
+                        "Subject: WHAT car is this!?\n",
+                        "Nntp-Posting-Host: rac3.wam.umd.edu\n",
+                        "Organization: University of Maryland, College Park\n",
+                        "Lines: 15\n",
+                        "\n",
+                        " I was wondering if anyone out there could enlighten me on this car I saw\n",
+                        "the other day. It was a 2-door sports car, looked to be from the late 60s/\n",
+                        "early 70s. It was called a Bricklin. The doors were really small. In addition,\n",
+                        "the front bumper was separate from the rest of the body. This is \n",
+                        "all I know. If anyone can tellme a model name, engine specs, years\n",
+                        "of production, where this car is made, history, or whatever info you\n",
+                        "have on this funky looking car, please e-mail.\n",
+                        "\n",
+                        "Thanks,\n",
+                        "- IL\n",
+                        "   ---- brought to you by your neighborhood Lerxst ----\n",
+                        "\n",
+                        "\n",
+                        "\n",
+                        "\n",
+                        "\n"
+                    ]
+                }
+            ],
+            "source": [
+                "print(all_documents[0])"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 7,
+            "metadata": {},
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "From: guykuo@carson.u.washington.edu (Guy Kuo)\n",
+                        "Subject: SI Clock Poll - Final Call\n",
+                        "Summary: Final call for SI clock reports\n",
+                        "Keywords: SI,acceleration,clock,upgrade\n",
+                        "Article-I.D.: shelley.1qvfo9INNc3s\n",
+                        "Organization: University of Washington\n",
+                        "Lines: 11\n",
+                        "NNTP-Posting-Host: carson.u.washington.edu\n",
+                        "\n",
+                        "A fair number of brave souls who upgraded their SI clock oscillator have\n",
+                        "shared their experiences for this poll. Please send a brief message detailing\n",
+                        "your experiences with the procedure. Top speed attained, CPU rated speed,\n",
+                        "add on cards and adapters, heat sinks, hour of usage per day, floppy disk\n",
+                        "functionality with 800 and 1.4 m floppies are especially requested.\n",
+                        "\n",
+                        "I will be summarizing in the next two days, so please add to the network\n",
+                        "knowledge base if you have done the clock upgrade and haven't answered this\n",
+                        "poll. Thanks.\n",
+                        "\n",
+                        "Guy Kuo <guykuo@u.washington.edu>\n",
+                        "\n"
+                    ]
+                }
+            ],
+            "source": [
+                "print(all_documents[1])"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "### What are the problems with this approach?\n"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## TF-IDF and cosine similarity: ready-made libraries"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 8,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "vectorizer = TfidfVectorizer()\n",
+                "#vectorizer = TfidfVectorizer(use_idf = False, ngram_range=(1,2))"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 9,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "document_vectors = vectorizer.fit_transform(newsgroups)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 10,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "<11314x130107 sparse matrix of type '<class 'numpy.float64'>'\n",
+                            "\twith 1787565 stored elements in Compressed Sparse Row format>"
+                        ]
+                    },
+                    "execution_count": 10,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "document_vectors"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 11,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "<1x130107 sparse matrix of type '<class 'numpy.float64'>'\n",
+                            "\twith 89 stored elements in Compressed Sparse Row format>"
+                        ]
+                    },
+                    "execution_count": 11,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "document_vectors[0]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 12,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "matrix([[0., 0., 0., ..., 0., 0., 0.]])"
+                        ]
+                    },
+                    "execution_count": 12,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "document_vectors[0].todense()"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 13,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "matrix([[0., 0., 0., ..., 0., 0., 0.],\n",
+                            "        [0., 0., 0., ..., 0., 0., 0.],\n",
+                            "        [0., 0., 0., ..., 0., 0., 0.],\n",
+                            "        [0., 0., 0., ..., 0., 0., 0.]])"
+                        ]
+                    },
+                    "execution_count": 13,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "document_vectors[0:4].todense()"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 14,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "query_str = 'speed'\n",
+                "#query_str = 'speed car'\n",
+                "#query_str = 'spider man'"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 15,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "query_vector = vectorizer.transform([query_str])"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 16,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "<11314x130107 sparse matrix of type '<class 'numpy.float64'>'\n",
+                            "\twith 1787565 stored elements in Compressed Sparse Row format>"
+                        ]
+                    },
+                    "execution_count": 16,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "document_vectors"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 17,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "<1x130107 sparse matrix of type '<class 'numpy.float64'>'\n",
+                            "\twith 1 stored elements in Compressed Sparse Row format>"
+                        ]
+                    },
+                    "execution_count": 17,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "query_vector"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 18,
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "similarities = sklearn.metrics.pairwise.cosine_similarity(query_vector,document_vectors)"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 19,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([0.26949927, 0.3491801 , 0.44292083, 0.47784165])"
+                        ]
+                    },
+                    "execution_count": 19,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "np.sort(similarities)[0][-4:]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 20,
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/plain": [
+                            "array([4517, 5509, 2116, 9921])"
+                        ]
+                    },
+                    "execution_count": 20,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "similarities.argsort()[0][-4:]"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 21,
+            "metadata": {
+                "scrolled": false
+            },
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "From: ray@netcom.com (Ray Fischer)\n",
+                        "Subject: Re: x86 ~= 680x0 ??  (How do they compare?)\n",
+                        "Organization: Netcom. San Jose, California\n",
+                        "Distribution: usa\n",
+                        "Lines: 36\n",
+                        "\n",
+                        "dhk@ubbpc.uucp (Dave Kitabjian) writes ...\n",
+                        ">I'm sure Intel and Motorola are competing neck-and-neck for \n",
+                        ">crunch-power, but for a given clock speed, how do we rank the\n",
+                        ">following (from 1st to 6th):\n",
+                        ">  486\t\t68040\n",
+                        ">  386\t\t68030\n",
+                        ">  286\t\t68020\n",
+                        "\n",
+                        "040 486 030 386 020 286\n",
+                        "\n",
+                        ">While you're at it, where will the following fit into the list:\n",
+                        ">  68060\n",
+                        ">  Pentium\n",
+                        ">  PowerPC\n",
+                        "\n",
+                        "060 fastest, then Pentium, with the first versions of the PowerPC\n",
+                        "somewhere in the vicinity.\n",
+                        "\n",
+                        ">And about clock speed:  Does doubling the clock speed double the\n",
+                        ">overall processor speed?  And fill in the __'s below:\n",
+                        ">  68030 @ __ MHz = 68040 @ __ MHz\n",
+                        "\n",
+                        "No.  Computer speed is only partly dependent of processor/clock speed.\n",
+                        "Memory system speed play a large role as does video system speed and\n",
+                        "I/O speed.  As processor clock rates go up, the speed of the memory\n",
+                        "system becomes the greatest factor in the overall system speed.  If\n",
+                        "you have a 50MHz processor, it can be reading another word from memory\n",
+                        "every 20ns.  Sure, you can put all 20ns memory in your computer, but\n",
+                        "it will cost 10 times as much as the slower 80ns SIMMs.\n",
+                        "\n",
+                        "And roughly, the 68040 is twice as fast at a given clock\n",
+                        "speed as is the 68030.\n",
+                        "\n",
+                        "-- \n",
+                        "Ray Fischer                   \"Convictions are more dangerous enemies of truth\n",
+                        "ray@netcom.com                 than lies.\"  -- Friedrich Nietzsche\n",
+                        "\n",
+                        "0.4778416465020907\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "From: rvenkate@ux4.cso.uiuc.edu (Ravikuma Venkateswar)\n",
+                        "Subject: Re: x86 ~= 680x0 ?? (How do they compare?)\n",
+                        "Distribution: usa\n",
+                        "Organization: University of Illinois at Urbana\n",
+                        "Lines: 59\n",
+                        "\n",
+                        "ray@netcom.com (Ray Fischer) writes:\n",
+                        "\n",
+                        ">dhk@ubbpc.uucp (Dave Kitabjian) writes ...\n",
+                        ">>I'm sure Intel and Motorola are competing neck-and-neck for \n",
+                        ">>crunch-power, but for a given clock speed, how do we rank the\n",
+                        ">>following (from 1st to 6th):\n",
+                        ">>  486\t\t68040\n",
+                        ">>  386\t\t68030\n",
+                        ">>  286\t\t68020\n",
+                        "\n",
+                        ">040 486 030 386 020 286\n",
+                        "\n",
+                        "How about some numbers here? Some kind of benchmark?\n",
+                        "If you want, let me start it - 486DX2-66 - 32 SPECint92, 16 SPECfp92 .\n",
+                        "\n",
+                        ">>While you're at it, where will the following fit into the list:\n",
+                        ">>  68060\n",
+                        ">>  Pentium\n",
+                        ">>  PowerPC\n",
+                        "\n",
+                        ">060 fastest, then Pentium, with the first versions of the PowerPC\n",
+                        ">somewhere in the vicinity.\n",
+                        "\n",
+                        "Numbers? Pentium @66MHz - 65 SPECint92, 57 SPECfp92 .\n",
+                        "\t PowerPC @66MHz - 50 SPECint92, 80 SPECfp92 . (Note this is the 601)\n",
+                        "        (Alpha @150MHz  - 74 SPECint92,126 SPECfp92 - just for comparison)\n",
+                        "\n",
+                        ">>And about clock speed:  Does doubling the clock speed double the\n",
+                        ">>overall processor speed?  And fill in the __'s below:\n",
+                        ">>  68030 @ __ MHz = 68040 @ __ MHz\n",
+                        "\n",
+                        ">No.  Computer speed is only partly dependent of processor/clock speed.\n",
+                        ">Memory system speed play a large role as does video system speed and\n",
+                        ">I/O speed.  As processor clock rates go up, the speed of the memory\n",
+                        ">system becomes the greatest factor in the overall system speed.  If\n",
+                        ">you have a 50MHz processor, it can be reading another word from memory\n",
+                        ">every 20ns.  Sure, you can put all 20ns memory in your computer, but\n",
+                        ">it will cost 10 times as much as the slower 80ns SIMMs.\n",
+                        "\n",
+                        "Not in a clock-doubled system. There isn't a doubling in performance, but\n",
+                        "it _is_ quite significant. Maybe about a 70% increase in performance.\n",
+                        "\n",
+                        "Besides, for 0 wait state performance, you'd need a cache anyway. I mean,\n",
+                        "who uses a processor that runs at the speed of 80ns SIMMs? Note that this\n",
+                        "memory speed corresponds to a clock speed of 12.5 MHz.\n",
+                        "\n",
+                        ">And roughly, the 68040 is twice as fast at a given clock\n",
+                        ">speed as is the 68030.\n",
+                        "\n",
+                        "Numbers?\n",
+                        "\n",
+                        ">-- \n",
+                        ">Ray Fischer                   \"Convictions are more dangerous enemies of truth\n",
+                        ">ray@netcom.com                 than lies.\"  -- Friedrich Nietzsche\n",
+                        "-- \n",
+                        "Ravikumar Venkateswar\n",
+                        "rvenkate@uiuc.edu\n",
+                        "\n",
+                        "A pun is a no' blessed form of whit.\n",
+                        "\n",
+                        "0.44292082969477664\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "From: ray@netcom.com (Ray Fischer)\n",
+                        "Subject: Re: x86 ~= 680x0 ?? (How do they compare?)\n",
+                        "Organization: Netcom. San Jose, California\n",
+                        "Distribution: usa\n",
+                        "Lines: 30\n",
+                        "\n",
+                        "rvenkate@ux4.cso.uiuc.edu (Ravikuma Venkateswar) writes ...\n",
+                        ">ray@netcom.com (Ray Fischer) writes:\n",
+                        ">>040 486 030 386 020 286\n",
+                        ">\n",
+                        ">How about some numbers here? Some kind of benchmark?\n",
+                        "\n",
+                        "Benchmarks are for marketing dweebs and CPU envy.  OK, if it will make\n",
+                        "you happy, the 486 is faster than the 040.  BFD.  Both architectures\n",
+                        "are nearing then end of their lifetimes.  And especially with the x86\n",
+                        "architecture: good riddance.\n",
+                        "\n",
+                        ">Besides, for 0 wait state performance, you'd need a cache anyway. I mean,\n",
+                        ">who uses a processor that runs at the speed of 80ns SIMMs? Note that this\n",
+                        ">memory speed corresponds to a clock speed of 12.5 MHz.\n",
+                        "\n",
+                        "The point being the processor speed is only one of many aspects of a\n",
+                        "computers performance.  Clock speed, processor, memory speed, CPU\n",
+                        "architecture, I/O systems, even the application program all contribute \n",
+                        "to the overall system performance.\n",
+                        "\n",
+                        ">>And roughly, the 68040 is twice as fast at a given clock\n",
+                        ">>speed as is the 68030.\n",
+                        ">\n",
+                        ">Numbers?\n",
+                        "\n",
+                        "Look them up yourself.\n",
+                        "\n",
+                        "-- \n",
+                        "Ray Fischer                   \"Convictions are more dangerous enemies of truth\n",
+                        "ray@netcom.com                 than lies.\"  -- Friedrich Nietzsche\n",
+                        "\n",
+                        "0.3491800997095306\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "From: mb4008@cehp11 (Morgan J Bullard)\n",
+                        "Subject: Re: speeding up windows\n",
+                        "Keywords: speed\n",
+                        "Organization: University of Illinois at Urbana\n",
+                        "Lines: 30\n",
+                        "\n",
+                        "djserian@flash.LakeheadU.Ca (Reincarnation of Elvis) writes:\n",
+                        "\n",
+                        ">I have a 386/33 with 8 megs of memory\n",
+                        "\n",
+                        ">I have noticed that lately when I use programs like WpfW or Corel Draw\n",
+                        ">my computer \"boggs\" down and becomes really sluggish!\n",
+                        "\n",
+                        ">What can I do to increase performance?  What should I turn on or off\n",
+                        "\n",
+                        ">Will not loading wallpapers or stuff like that help when it comes to\n",
+                        ">the running speed of windows and the programs that run under it?\n",
+                        "\n",
+                        ">Thanx in advance\n",
+                        "\n",
+                        ">Derek\n",
+                        "\n",
+                        "1) make sure your hard drive is defragmented. This will speed up more than \n",
+                        "   just windows BTW.  Use something like Norton's or PC Tools.\n",
+                        "2) I _think_ that leaving the wall paper out will use less RAM and therefore\n",
+                        "   will speed up your machine but I could very will be wrong on this.\n",
+                        "There's a good chance you've already done this but if not it may speed things\n",
+                        "up.  good luck\n",
+                        "\t\t\t\tMorgan Bullard mb4008@coewl.cen.uiuc.edu\n",
+                        "\t\t\t\t\t  or   mjbb@uxa.cso.uiuc.edu\n",
+                        "\n",
+                        ">--\n",
+                        ">$_    /|$Derek J.P. Serianni $ E-Mail : djserian@flash.lakeheadu.ca           $ \n",
+                        ">$\\'o.O' $Sociologist         $ It's 106 miles to Chicago,we've got a full tank$\n",
+                        ">$=(___)=$Lakehead University $ of gas, half a pack of cigarettes,it's dark,and$\n",
+                        ">$   U   $Thunder Bay, Ontario$ we're wearing sunglasses. -Elwood Blues        $  \n",
+                        "\n",
+                        "0.26949927393886913\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n",
+                        "----------------------------------------------------------------------------------------------------\n"
+                    ]
+                }
+            ],
+            "source": [
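+                "# Print the 4 documents most similar to the query, best match first, each followed by its score.\n",
+                "ranking = similarities.argsort()[0][::-1]  # document indices, highest similarity first\n",
+                "for idx in ranking[:4]:\n",
+                "    print(newsgroups[idx])\n",
+                "    print(similarities[0, idx])\n",
+                "    print('\\n'.join(['-' * 100] * 3))  # three separator lines"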
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "## Homework\n",
+                "\n",
+                "- Choose a text collection with at least 10,000 documents (different from the one used in this example).\n",
+                "- Build a search engine on it based on Okapi BM25, i.e. a system that, for a given query, returns several (5-10) best-matching documents, sorted by score and shown together with their scores. Also print the number of documents returned, i.e. those with a non-zero score. You may use existing libraries to vectorize the documents, but you must implement Okapi BM25 yourself.\n",
+                "- Find a query for which the results are unsatisfactory.\n",
+                "- Improve the search engine (e.g. by changing the text preprocessing, the vectorizer, the parameters of the ranking algorithm, or the algorithm itself) so that it returns satisfactory results for that query. The change must be different from the one made in this example; come up with something of your own.\n",
+                "- Present your work at the next class (14 April), answering the following questions:\n",
+                "  - what the collection and the search system look like before the changes\n",
+                "  - for which query the results are unsatisfactory (show the results)\n",
+                "  - what changes were made\n",
+                "  - what the search results look like after the changes\n",
+                "  - how the changes affected the results (1-2 sentences)\n",
+                "\n",
+                "The presentation should be as simple as possible and take at most 2-3 minutes.\n",
+                "\n",
+                "Points available: 60\n"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "metadata": {},
+            "outputs": [],
+            "source": []
+        }
+    ],
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3",
+            "language": "python",
+            "name": "python3"
+        },
+        "language_info": {
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "file_extension": ".py",
+            "mimetype": "text/x-python",
+            "name": "python",
+            "nbconvert_exporter": "python",
+            "pygments_lexer": "ipython3",
+            "version": "3.8.3"
+        },
+        "author": "Jakub Pokrywka",
+        "email": "kubapok@wmi.amu.edu.pl",
+        "lang": "pl",
+        "subtitle": "3.tfidf (2)[\u0107wiczenia]",
+        "title": "Ekstrakcja informacji",
+        "year": "2021"
+    },
+    "nbformat": 4,
+    "nbformat_minor": 4
+}
\ No newline at end of file