
# Wikisource crawler and image downloader

## Requirements:
Python 3.8 or later
 
## Install/setup:
`pip install -r requirements.txt`

## Usage crawler
`python crawler.py --type {green or yellow or red} --output_file_name {output tsv file name} --start_file_name {name of file to start crawling from} --start_page_number {page of file to start crawling}`
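The flags above could be parsed along these lines. This is a minimal sketch using the standard library's `argparse`, not the actual contents of `crawler.py`: the function name `build_crawler_parser`, which flags are required, and the default values are all assumptions made for illustration.

```python
import argparse


def build_crawler_parser() -> argparse.ArgumentParser:
    """Build a parser for the crawler flags listed in this README (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Wikisource crawler")
    # Only the three values named in the README are accepted.
    parser.add_argument("--type", choices=["green", "yellow", "red"], required=True)
    # Assumption: the output TSV name is mandatory.
    parser.add_argument("--output_file_name", required=True)
    # Assumption: both resume flags are optional and default to "start from the beginning".
    parser.add_argument("--start_file_name", default=None)
    parser.add_argument("--start_page_number", type=int, default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
example_args = build_crawler_parser().parse_args(
    ["--type", "green", "--output_file_name", "green.tsv"]
)
print(example_args.type, example_args.output_file_name)
```

Passing a value other than `green`, `yellow` or `red` to `--type` makes `argparse` exit with a usage error, which is one reason to declare the allowed values with `choices`.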

## Usage image downloader
`python image_download.py --file_path {tsv file with data to download} --output_folder {folder to write images to, default: images} --max_folder_size_mb {folder size in MB at which to stop; if not given, everything is downloaded} --from_checkpoint {True to resume from a checkpoint if a pickle file is available}`
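To illustrate how `--max_folder_size_mb` and `--from_checkpoint` can interact, here is a minimal sketch of a size-limited, resumable download loop. It is not the code in `image_download.py`: the function names, the `(file_name, url)` row layout, the checkpoint format (a pickled row index) and the injected `fetch` callable are all assumptions chosen to keep the example self-contained.

```python
import pickle
from pathlib import Path


def folder_size_mb(folder: Path) -> float:
    """Total size of all regular files under `folder`, in megabytes."""
    total_bytes = sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


def load_checkpoint(checkpoint_path: Path) -> int:
    """Return the index of the next row to download, or 0 if no pickle exists."""
    if checkpoint_path.exists():
        with checkpoint_path.open("rb") as fh:
            return pickle.load(fh)
    return 0


def save_checkpoint(checkpoint_path: Path, next_row: int) -> None:
    """Persist the index of the next row to download."""
    with checkpoint_path.open("wb") as fh:
        pickle.dump(next_row, fh)


def download_rows(rows, output_folder, checkpoint_path, fetch,
                  max_folder_size_mb=None, from_checkpoint=False) -> int:
    """Download rows in order; stop early once the folder reaches the size limit.

    `rows` is a list of (file_name, url) pairs and `fetch(url)` returns bytes.
    Both are hypothetical stand-ins for the TSV rows and the HTTP request.
    Returns the index of the next row that still needs downloading.
    """
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)
    next_row = load_checkpoint(Path(checkpoint_path)) if from_checkpoint else 0
    for index in range(next_row, len(rows)):
        # Check the limit before each download, so the folder can overshoot
        # the limit by at most one file.
        if max_folder_size_mb is not None and folder_size_mb(output_folder) >= max_folder_size_mb:
            break
        file_name, url = rows[index]
        (output_folder / file_name).write_bytes(fetch(url))
        next_row = index + 1
        save_checkpoint(Path(checkpoint_path), next_row)
    return next_row
```

In this sketch the checkpoint is written after every file, so an interrupted run loses at most the download in progress. The checkpoint is kept outside the output folder so that it does not count towards the size limit.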
