Commit Graph

5 Commits

Author SHA1 Message Date
Dawid Jurkiewicz
21ba56a8fa Add domain-blacklist.txt, domain filter, modify crawler.
Add binary or not checker.
2018-04-09 23:53:36 +02:00
Dawid Jurkiewicz
f9c5690657 Modifiy error logging in get_parishes_url. Enhance crawl_deon.py
Fix Makefile - append instead of rewrite.
2018-04-06 23:33:18 +02:00
Dawid Jurkiewicz
3027e1e7cc Switch to pure html download. Enhanced urls filtering.
Update Makefile.
2018-03-11 18:02:31 +01:00
Dawid Jurkiewicz
b433a5e297 Code refactorings. 2018-03-01 18:16:11 +01:00
Dawid Jurkiewicz
8b72d0b351 Prototype rule based masses extractor.
Added spider.
Started working on testsets.
2018-03-01 14:40:13 +01:00