{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "74100403-147c-42cd-8285-e30778c0fb66", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import torch\n", "import csv\n", "import lzma\n", "import gensim.downloader\n", "from nltk import word_tokenize" ] }, { "cell_type": "code", "execution_count": null, "id": "cbe60d7b-850e-4838-b4ce-672f13bf2bb2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 2, "id": "bf211ece-e27a-4119-a1b9-9a9a610cfb46", "metadata": {}, "outputs": [], "source": [ "def predict_year(x, path_out, model):\n", "    # Run the model on x and write one prediction per line to path_out.\n", "    results = model.predict(x)\n", "    with open(path_out, 'wt') as file:\n", "        for r in results:\n", "            file.write(str(r) + '\\n')" ] }, { "cell_type": "code", "execution_count": 3, "id": "1ec57d97-a852-490e-8da4-d1e4c9676cd6", "metadata": {}, "outputs": [], "source": [ "def read_file(filename):\n", "    # Keep only the first tab-separated field of each line.\n", "    result = []\n", "    with open(filename, 'r', encoding=\"utf-8\") as file:\n", "        for line in file:\n", "            text = line.split(\"\\t\")[0].strip()\n", "            result.append(text)\n", "    return result" ] }, { "cell_type": "code", "execution_count": 4, "id": "86fbfb79-76e7-49f5-b722-2827f93cb03f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | 0 | 1 |
|---|---|---|
| 0 | have you had an medical issues recently? | 1335187994 |
| 1 | It's supposedly aluminum, barium, and strontiu... | 1346187161 |
| 2 | Nobel prizes don't make you rich. | 1337160218 |
| 3 | I came for the article, I stayed for the doctor. | 1277674344 |
| 4 | you resorted to insults AND got owned directly... | 1348538535 |
| ... | ... | ... |
| 199995 | It's really sad. My sister used to believe tha... | 1334111989 |
| 199996 | I don't mean it in a dickish way, I'm being se... | 1322700456 |
| 199997 | Fair enough, I stand corrected. | 1354646212 |
| 199998 | Right. Scientists tend to think and conclude l... | 1348777201 |
| 199999 | Because they are illiterate | 1249579722 |

200000 rows × 2 columns

| | 0 |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| ... | ... |
| 199995 | 0 |
| 199996 | 0 |
| 199997 | 1 |
| 199998 | 1 |
| 199999 | 0 |

200000 rows × 1 columns

| | 0 | 1 |
|---|---|---|
| 0 | In which case, tell them I'm in work, or dead,... | 1328302967 |
| 1 | Put me down as another for Mysterious Universe... | 1347836881 |
| 2 | The military of any country would never admit ... | 1331905826 |
| 3 | An example would have been more productive tha... | 1315584834 |
| 4 | sorry, but the authors of this article admit t... | 1347389166 |
| ... | ... | ... |
| 5267 | Your fault for going at all. That's how we get... | 1308176634 |
| 5268 | EVP....that's a shot in the GH drinking game. | 1354408646 |
| 5269 | i think a good hard massage is good for you. t... | 1305726318 |
| 5270 | Interesting theory. Makes my imagination run w... | 1339839088 |
| 5271 | Tampering of candy? More like cooking somethin... | 1320262659 |

5272 rows × 2 columns

| | 0 | 1 |
|---|---|---|
| 0 | Gentleman, I believe we can agree that this is... | 1304170330 |
| 1 | The problem is that it will just turn it r/nos... | 1353763204 |
| 2 | Well, according to some Christian apologists, ... | 1336314173 |
| 3 | Don't know if this is what you are looking for... | 1348860314 |
| 4 | I respect what you're saying completely. I jus... | 1341285952 |
| ... | ... | ... |
| 5147 | GAMBIT | 1326441107 |
| 5148 | >Joe Rogan is no snake oil salesman.\\n\\nHe ... | 1319464245 |
| 5149 | Reading further, Sagan does seem to agree with... | 1322126150 |
| 5150 | Notice that they never invoke god, or any othe... | 1307679295 |
| 5151 | They might co-ordinate an anniversary attack o... | 1342409261 |

5152 rows × 2 columns
\n", "