mass-scraper/parishwebsites
siulkilulki e9c4dcd743 Tune download settings. Enable dummy cache with 7 days of expiration.
Fix generating spider commands.
Add redirected domain appending to allowed domains.
Configure loggers.
Add more meta info to *processed.txt
Enhance view raw data python jsonline viewer
2018-04-15 12:17:35 +02:00
parishwebsites Tune download settings. Enable dummy cache with 7 days of expiration. 2018-04-15 12:17:35 +02:00
commands-wrapper.sh Prototype rule based masses extractor. 2018-03-01 14:40:13 +01:00
convert_content2text.py Add converter of content field in jsonline from html to text. 2018-03-15 16:09:59 +01:00
domain-blacklist.txt Add domain-blacklist.txt, domain filter, modify crawler. 2018-04-09 23:53:36 +02:00
generate_spider_commands.sh Tune download settings. Enable dummy cache with 7 days of expiration. 2018-04-15 12:17:35 +02:00
remove_blacklisted.py Fix checking if response is a binary string. 2018-04-13 21:45:20 +02:00
scrapy.cfg Prototype rule based masses extractor. 2018-03-01 14:40:13 +01:00
view_raw_data.py Tune download settings. Enable dummy cache with 7 days of expiration. 2018-04-15 12:17:35 +02:00
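The latest commit mentions a "dummy cache with 7 days of expiration". The project's actual settings file is not shown in this listing, so the following is only a minimal sketch of how such a cache might be configured, assuming it refers to Scrapy's built-in HTTP cache with its dummy policy. The setting names are standard Scrapy settings; the values and the helper constant `SECONDS_PER_DAY` are assumptions made for illustration.

```python
# Hypothetical excerpt of a Scrapy settings.py (not taken from this repository).

# Helper constant introduced for readability; an assumption of this sketch.
SECONDS_PER_DAY = 24 * 60 * 60

# Turn on Scrapy's HTTP cache so repeated crawls can reuse stored responses.
HTTPCACHE_ENABLED = True

# The dummy policy caches every response without honoring HTTP cache headers.
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.DummyPolicy"

# Cached responses older than 7 days are treated as expired and re-downloaded.
HTTPCACHE_EXPIRATION_SECS = 7 * SECONDS_PER_DAY
```

With settings like these, a spider re-run within a week would typically be served from the local cache rather than hitting the parish websites again.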