mass-scraper/parishwebsites
Dawid Jurkiewicz 21ba56a8fa Add domain-blacklist.txt, domain filter, modify crawler.
Add binary or not checker.
2018-04-09 23:53:36 +02:00
Name                         Last commit                                                    Date
parishwebsites               Add domain-blacklist.txt, domain filter, modify crawler.       2018-04-09 23:53:36 +02:00
commands-wrapper.sh          Prototype rule based masses extractor.                         2018-03-01 14:40:13 +01:00
convert_content2text.py      Add converter of content field in jsonline from html to text.  2018-03-15 16:09:59 +01:00
domain-blacklist.txt         Add domain-blacklist.txt, domain filter, modify crawler.       2018-04-09 23:53:36 +02:00
empty_files                  Prototype rule based masses extractor.                         2018-03-01 14:40:13 +01:00
generate_spider_commands.sh  Prototype rule based masses extractor.                         2018-03-01 14:40:13 +01:00
missing_data.txt             Prototype rule based masses extractor.                         2018-03-01 14:40:13 +01:00
processed.final.txt          Prototype rule based masses extractor.                         2018-03-01 14:40:13 +01:00
remove_blacklisted.py        Add domain-blacklist.txt, domain filter, modify crawler.       2018-04-09 23:53:36 +02:00
scrapy.cfg                   Prototype rule based masses extractor.                         2018-03-01 14:40:13 +01:00
view_raw_data.py             Add raw data viewer.                                           2018-03-30 22:10:41 +02:00