Dawid Jurkiewicz
|
21ba56a8fa
|
Add domain-blacklist.txt, domain filter, modify crawler.
Add binary or not checker.
|
2018-04-09 23:53:36 +02:00 |
|
Dawid Jurkiewicz
|
56f704630e
|
Add raw data viewer.
|
2018-03-30 22:10:41 +02:00 |
|
Dawid Jurkiewicz
|
63c4a71812
|
Add converter of content field in jsonline from html to text.
|
2018-03-15 16:09:59 +01:00 |
|
Dawid Jurkiewicz
|
3027e1e7cc
|
Switch to pure HTML download. Enhanced URL filtering.
Update Makefile.
|
2018-03-11 18:02:31 +01:00 |
|
Dawid Jurkiewicz
|
b433a5e297
|
Code refactorings.
|
2018-03-01 18:16:11 +01:00 |
|
Dawid Jurkiewicz
|
8b72d0b351
|
Prototype rule-based masses extractor.
Added spider.
Started working on testsets.
|
2018-03-01 14:40:13 +01:00 |
|