Skeptic vs paranormal subreddits

Classify a Reddit text as coming either from the Skeptic subreddit or from one of the "paranormal" subreddits (Paranormal, UFOs, TheTruthIsHere, Ghosts, Glitch-in-the-Matrix, conspiracytheories).

The output label is the probability that the text comes from a paranormal subreddit.

PyTorch logistic regression

The code is in Logistic.py.

Trained models are saved with the .pth extension.
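
For orientation, here is a minimal sketch of what a single-layer logistic regression in PyTorch can look like, including a round trip through a .pth file. It is not the contents of Logistic.py: the class name, the feature size, the toy data, the optimizer settings and the file name model_example.pth are all assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


class LogisticRegression(nn.Module):
    """One linear layer followed by a sigmoid (hypothetical stand-in model)."""

    def __init__(self, n_features: int):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns one probability per input row.
        return torch.sigmoid(self.linear(x)).squeeze(1)


torch.manual_seed(0)

# Toy data standing in for vectorized texts (assumption: 5 features per text).
X = torch.rand(8, 5)
y = (X[:, 0] > 0.5).float()  # 1.0 = "paranormal", 0.0 = "skeptic" in this toy setup

model = LogisticRegression(n_features=5)
criterion = nn.BCELoss()  # binary cross-entropy on probabilities
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

# Save only the learned parameters, then load them into a fresh model.
path = os.path.join(tempfile.mkdtemp(), "model_example.pth")
torch.save(model.state_dict(), path)

restored = LogisticRegression(n_features=5)
restored.load_state_dict(torch.load(path))
restored.eval()

with torch.no_grad():
    probs = restored(X)  # probabilities of the positive ("paranormal") class
```

Saving the state_dict rather than the whole model object keeps the .pth file independent of how the class is pickled, at the cost of having to rebuild the model with the same shape before loading.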

GEval results:

$ ./geval -t dev-0
Likelihood	0.0000
Accuracy	0.7043
F1.0	0.4950
Precision	0.6257
Recall	0.4094
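
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the helper name f1_score is just an illustrative choice and is not part of GEval.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the dev-0 results reported above.
precision = 0.6257
recall = 0.4094

f1 = f1_score(precision, recall)
print(f"{f1:.4f}")  # prints 0.4950
```

The recomputed value agrees with the reported F1.0 to four decimal places, so the low F1 is driven by recall: the model misses more than half of the paranormal texts.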

Training logs have been copied into l1_epochs.txt (for the single-layer model) and l2_epochs.txt (for the two-layer model).

Sources

Data taken from https://archive.org/details/2015_reddit_comments_corpus.