Commit Graph

  • c99c218436 Fix mcc score. master siulkilulki 2018-06-18 14:56:41 +0200
  • 7dd903b3b5 First version of ml hour classificator. siulkilulki 2018-05-28 15:10:31 +0200
  • 6ff4f230db Add redis database to tsv method. siulkilulki 2018-05-27 14:44:55 +0200
  • 606ebb5260 Add redis data structures description. Handle banned users. siulkilulki 2018-05-26 19:07:08 +0200
  • 626307f135 Fix get next utterance (by score). siulkilulki 2018-05-25 15:09:23 +0200
  • 6a3819eb0a Add passing users not annotated utterances (by them) siulkilulki 2018-05-25 12:39:06 +0200
  • fbcf3bad4e Add redis stats, helper script. siulkilulki 2018-05-24 12:56:02 +0200
  • 916703ed5e Refactor app.py and add robust undo functionality. siulkilulki 2018-05-21 01:09:05 +0200
  • 0e5dc170f6 Configure loggers. Add redis stats script. Add logging by ip in db. siulkilulki 2018-05-19 01:05:30 +0200
  • 7da40e76ac Fix find_hours regex. Fix app.py (adapt to addtion of utterances) siulkilulki 2018-05-16 20:33:32 +0200
  • 95491b20a7 Working annotator. Without abuse handling, but logging actions. siulkilulki 2018-05-15 07:13:09 +0200
  • 1f6b1e6ffe Working utterances getting/pickling siulkilulki 2018-05-14 01:51:40 +0200
  • 382666c563 Add test.py for data gathering (data for annotation) siulkilulki 2018-05-11 23:12:21 +0200
  • c617018611 Restructure code. Add frontend template. (logic to be done) siulkilulki 2018-05-04 23:25:07 +0200
  • 6982ac2e59 Add basic wsgi app. Rename extractors, change directories. siulkilulki 2018-04-27 22:44:15 +0200
  • 9b76f4e8aa Add robust recrawling of not completed data. siulkilulki 2018-04-16 23:54:03 +0200
  • e9c4dcd743 Tune download settings. Enable dummy cache with 7 days of expiration. siulkilulki 2018-04-15 12:17:35 +0200
  • a5cb3a090f remove plan.org siulkilulki 2018-04-13 22:44:40 +0200
  • ee636a65f1 Update README.md siulkilulki 2018-04-13 22:36:04 +0200
  • c83c29e58e Delete old files Dawid Jurkiewicz 2018-04-13 22:33:11 +0200
  • 0bba61bbcd Fix checking if response is a binary string. Dawid Jurkiewicz 2018-04-13 21:45:20 +0200
  • 21ba56a8fa Add domain-blacklist.txt, domain filter, modify crawler. Dawid Jurkiewicz 2018-04-09 23:52:11 +0200
  • 6107a89c78 Update README.md siulkilulki 2018-04-06 23:43:14 +0200
  • f9c5690657 Modifiy error logging in get_parishes_url. Enhance crawl_deon.py Dawid Jurkiewicz 2018-04-06 23:33:18 +0200
  • ccc4af3d51 Fix get parishes urls script. Dawid Jurkiewicz 2018-04-04 20:29:48 +0200
  • 56f704630e Add raw data viewer. Dawid Jurkiewicz 2018-03-30 22:04:14 +0200
  • 88c55891f4 Update README.md siulkilulki 2018-03-15 18:46:18 +0100
  • 63c4a71812 Add converter of content field in jsonline from html to text. Dawid Jurkiewicz 2018-03-15 16:09:59 +0100
  • 3027e1e7cc Switch to pure html download. Enhanced urls filtering. Dawid Jurkiewicz 2018-03-11 18:02:31 +0100
  • b433a5e297 Code refactorings. Dawid Jurkiewicz 2018-03-01 18:16:11 +0100
  • 0070ffe07d Merge branch 'master' of github.com:siulkilulki/mass-scraper Dawid Jurkiewicz 2018-03-01 14:50:49 +0100
  • 8b72d0b351 Prototype rule based masses extractor. Dawid Jurkiewicz 2018-01-20 21:55:26 +0100
  • c3b86fe5a9 Prototype rule based masses extractor. Dawid Jurkiewicz 2018-01-20 21:55:26 +0100
  • 7161193169 Add prototype basic crawl siulkilulki 2017-11-21 22:51:09 +0100
  • 9f1423b362 fixed url checking siulkilulki 2017-06-21 22:51:53 +0200
  • 5ad2a36499 urlschecker alpha & sync Dawid Jurkiewicz 2017-06-21 21:52:20 +0200
  • b17fe9b5c2 fix varaible name siulkilulki 2017-06-19 08:13:08 +0200
  • 4ae6cd24c0 fix proxy conditional statement siulkilulki 2017-06-18 21:44:12 +0200
  • f54e01581c code refactorings and improvements siulkilulki 2017-06-18 21:33:44 +0200
  • b16f29ef6d changed prdriver location siulkilulki 2017-06-12 22:17:23 +0200
  • 57315f9b31 proof of concept alpha siulkilulki 2017-06-12 22:08:29 +0200
  • de56ecb253 done proxy.py siulkilulki 2017-06-11 00:00:22 +0200
  • c205e1b627 added proxy downloader siulkilulki 2017-06-10 02:09:22 +0200
  • 35d3b11ec6 add downloaded parishes siulkilulki 2017-04-21 00:29:17 +0200
  • 35db6760f7 Merge branch 'master' of https://github.com/siulkilulki/mass-scraper siulkilulki 2017-04-20 10:56:23 +0200
  • 7aed0dda4f add parish scrapping script siulkilulki 2017-04-20 10:51:02 +0200
  • d25f3f2757 Update temat.md siulkilulki 2017-03-14 17:11:33 +0100
  • b463dee0d2 Update temat.md siulkilulki 2017-03-14 17:10:24 +0100
  • 5dc436781b add description of thesis siulkilulki 2017-03-14 17:08:44 +0100
  • af01adb7ab Initial commit siulkilulki 2017-03-10 16:05:59 +0100