First test

Th3NiKo 2020-02-28 15:24:47 +01:00
parent c0a565f4d0
commit 8522487dc6
14 changed files with 594295 additions and 294874 deletions

16
.gitignore vendored
@@ -1,8 +1,8 @@
*~
*.swp
*.bak
*.pyc
*.o
.DS_Store
.token

@@ -1,13 +1,13 @@
Skeptic vs paranormal subreddits
================================
Classify a Reddit post as coming either from the Skeptic subreddit or from one of the
"paranormal" subreddits (Paranormal, UFOs, TheTruthIsHere, Ghosts,
Glitch-in-the-Matrix, conspiracytheories).
The output label is either `S` or `P`.
Sources
-------
Data taken from <https://archive.org/details/2015_reddit_comments_corpus>.

File diff suppressed because it is too large

0
dev-0/mostUsedP.txt Normal file

5272
dev-0/out.tsv Normal file

File diff suppressed because it is too large

@@ -1 +1 @@
PostText Timestamp

1500
mostUsed.txt Normal file

File diff suppressed because it is too large

1525
mostUsedP.txt Normal file

File diff suppressed because it is too large

1531
mostUsedS.txt Normal file

File diff suppressed because it is too large

104076
onlyP.txt Normal file

File diff suppressed because one or more lines are too long

185503
onlyS.txt Normal file

File diff suppressed because one or more lines are too long

@@ -1 +1 @@
Label

14
solve.py Normal file
@@ -0,0 +1,14 @@
#!/usr/bin/env python3
import pandas as pd
import re
import sys
# sort | uniq -c
#train = pd.read_csv("./train/in.tsv.xz", delimiter='\t')
#import sys
#for line in sys.stdin
#if re.search(r'UFO', line) print("P")
for line in sys.stdin:
    if re.search(r'(ufo|lol|camera|picture|contact|phenomen|photo|paralysis|haunted|alien|demon|ghost|levitation|paranormal|spirit|telekinesis|flying|fake|sky|dream)', line.lower()):
        print("P")
    else:
        print("S")
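
The `# sort | uniq -c` comment in solve.py, together with the mostUsed*.txt files in this commit, suggests that the keyword list was chosen by looking at word frequencies. The commit does not show how those files were produced, so the following is only a minimal sketch of one way to do such a count in Python. The function name `most_used`, the word pattern, and the `top_n` default are all hypothetical and not taken from the repository.

```python
#!/usr/bin/env python3
"""Hypothetical helper: count the most frequent words in a set of posts."""
import re
from collections import Counter

# Assumption: after lowercasing, a "word" is a run of letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def most_used(lines, top_n=20):
    """Return the top_n (word, count) pairs over all lines, most frequent first.

    top_n=20 is an arbitrary illustrative default.
    """
    counts = Counter()
    for line in lines:
        # Lowercase first, as solve.py does before matching its keywords.
        counts.update(WORD_RE.findall(line.lower()))
    return counts.most_common(top_n)


# Example use, reading posts from standard input:
#   import sys
#   for word, count in most_used(sys.stdin):
#       print(f"{count}\t{word}")
```

Running such a count separately on the posts of each class and comparing the two lists is one plausible way to pick candidate keywords for the regular expression in solve.py.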

File diff suppressed because it is too large