Commit Graph

3 Commits

Author            SHA1        Message                                                                              Date
Dawid Jurkiewicz  21ba56a8fa  Add domain-blacklist.txt, domain filter, modify crawler. Add binary or not checker.  2018-04-09 23:53:36 +02:00
Dawid Jurkiewicz  3027e1e7cc  Switch to pure html download. Enhanced urls filtering. Update Makefile.              2018-03-11 18:02:31 +01:00
Dawid Jurkiewicz  8b72d0b351  Prototype rule based masses extractor. Added spider. Started working on testsets.    2018-03-01 14:40:13 +01:00