wko-on-cloud-n/data/data_download_instructions.txt
2022-02-16 16:48:37 +01:00

64 lines
3.6 KiB
Plaintext

# HOW TO DOWNLOAD THE CLOUD COVER COMPETITION IMAGES
======================================
Welcome to the On Cloud N: Cloud Cover Detection Challenge! These instructions will help you access the satellite image files for this competition. The imagery and labels are hosted in a set of Azure Blob Storage containers in three regions.
Here are the steps to download the entire training set features and labels to your local machine.
1. Save the `download_data.py` script from the competition data download page:
https://www.drivendata.org/competitions/83/cloud-cover/data/
2. Install the requirements with the command below. You may want to create and activate a Python (>3.6) virtual environment first, for example with conda (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
pip install "cloudpathlib[azure]" loguru tqdm typer
3. To download from the competition container, you'll need a Shared Access Signature (SAS). The competition data is stored on three Azure Blob Containers in different regions―Central US, East Asia, and West Europe. Each region includes identical data, so choose the region closest to the machine you are downloading the data to. If your code is running on the Planetary Computer Hub, use West Europe, which is where the cluster is located. Download the token for your chosen region from the data download page. You should now have a plain-text file named `sas_centralus.txt`, `sas_eastasia.txt`, or `sas_westeurope.txt` containing a token that starts with `https://cloudcoverdata`.
4. Run the download script, passing the path to your SAS token with the `--sas-url` argument. By default, this will download the entire competition dataset to a directory named `data` in your current working directory. You can change the destination with the `--local-directory` argument.
`python download_data.py --sas-url <path to sas file>`
For example, to download from the West Europe region with an SAS token saved as `sas_westeurope.txt`:
`python download_data.py --sas-url sas_westeurope.txt`
You can also download a single directory containing a subset of the data with the `--cloud-directory` flag. For example:
# train features
`python download_data.py --cloud-directory "az://./train_features" --sas-url sas_westeurope.txt`
# train labels
`python download_data.py --cloud-directory "az://./train_labels" --sas-url sas_westeurope.txt`
# single chip
`python download_data.py --cloud-directory "az://./train_features/akny" --sas-url sas_westeurope.txt`
## Download script documentation
```
$ python download_data.py --help
Usage: download_data.py [OPTIONS]
Downloads the challenge dataset to your local machine.
Options:
--sas-url TEXT Shared Access Signature URL that allows you
to access the files (starting with
https://...). This can be either the SAS URL
itself or a path to a file containing the
SAS URL, available from the competition
datasets page. [required]
--cloud-directory TEXT Cloudpathlib URI (`az://./<directory>`) for
cloud directory to be downloaded. [default:
az://.]
--local-directory PATH Directory on your local machine to which the
files are downloaded. [default: data]
```
Good luck! If you have any questions you can always visit the user forum:
https://community.drivendata.org/c/cloud-cover