# HOW TO DOWNLOAD THE CLOUD COVER COMPETITION IMAGES

======================================

Welcome to the On Cloud N: Cloud Cover Detection Challenge! These instructions will help you access the satellite image files for this competition. The imagery and labels are hosted in a set of Azure Blob Storage containers in three regions.

Here are the steps to download the entire training set features and labels to your local machine.

1. Save the `download_data.py` script from the competition data download page:

https://www.drivendata.org/competitions/83/cloud-cover/data/

2. Install the requirements with the command below. You may want to create and activate a Python (>3.6) virtual environment first, for example with conda (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

`pip install "cloudpathlib[azure]" loguru tqdm typer`

3. To download from the competition container, you'll need a Shared Access Signature (SAS) token. The competition data is stored in three Azure Blob Storage containers in different regions: Central US, East Asia, and West Europe. Each container holds identical data, so choose the region closest to the machine you are downloading to. If your code is running on the Planetary Computer Hub, use West Europe, which is where the cluster is located. Download the token for your chosen region from the data download page. You should now have a plain-text file named `sas_centralus.txt`, `sas_eastasia.txt`, or `sas_westeurope.txt` containing a token that starts with `https://cloudcoverdata`.

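As a quick sanity check before running the download, the token file can be inspected with a few lines of Python. This is a minimal sketch and not part of `download_data.py`: the helper name `check_sas_file` is hypothetical, and it assumes the file holds a single URL whose query string carries the standard Azure SAS expiry field `se`.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Prefix that the token should start with, as stated in step 3.
EXPECTED_PREFIX = "https://cloudcoverdata"


def check_sas_file(path):
    """Return (host, expiry) for the SAS URL stored in `path`.

    Hypothetical helper. `expiry` is the raw value of the `se`
    (signed expiry) query parameter, or None if the URL has none.
    """
    sas_url = Path(path).read_text().strip()
    if not sas_url.startswith(EXPECTED_PREFIX):
        raise ValueError(f"{path} does not start with {EXPECTED_PREFIX}")
    parsed = urlparse(sas_url)
    expiry = parse_qs(parsed.query).get("se", [None])[0]
    return parsed.netloc, expiry
```

For example, `host, expiry = check_sas_file("sas_westeurope.txt")` returns the storage host name and the expiry string, which can help rule out a stale or truncated token when a download fails.
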
4. Run the download script, passing the path to your SAS token with the `--sas-url` argument. By default, this will download the entire competition dataset to a directory named `data` in your current working directory. You can change the destination with the `--local-directory` argument.

`python download_data.py --sas-url <path to sas file>`

For example, to download from the West Europe region with an SAS token saved as `sas_westeurope.txt`:

`python download_data.py --sas-url sas_westeurope.txt`

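Once the script has finished, it can be useful to confirm that something actually landed in the local directory. The sketch below counts files and bytes under each top-level entry of the destination. The helper name `summarize_download` is hypothetical, and the code makes no assumption about how the dataset is laid out beyond it being a directory tree.

```python
from pathlib import Path


def summarize_download(local_directory="data"):
    """Map each top-level entry of `local_directory` to (file_count, total_bytes)."""
    summary = {}
    for entry in sorted(Path(local_directory).iterdir()):
        if entry.is_dir():
            # Walk the whole subtree and keep regular files only.
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        summary[entry.name] = (len(files), sum(p.stat().st_size for p in files))
    return summary
```

Calling `summarize_download()` with no arguments inspects the default `data` directory. Pass the same path you gave to `--local-directory` if you changed the destination.
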
You can also download a single directory containing a subset of the data with the `--cloud-directory` flag. For example:

# train features
`python download_data.py --cloud-directory "az://./train_features" --sas-url sas_westeurope.txt`

# train labels
`python download_data.py --cloud-directory "az://./train_labels" --sas-url sas_westeurope.txt`

# single chip
`python download_data.py --cloud-directory "az://./train_features/akny" --sas-url sas_westeurope.txt`

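To fetch several subsets in one go, the commands above can be scripted. The sketch below only builds and runs the same command line shown in these instructions. `build_command` and `download_subsets` are hypothetical helper names, and the code assumes `download_data.py` sits in the current working directory.

```python
import subprocess
import sys


def build_command(cloud_directory, sas_file, local_directory=None):
    """Build the argument list for one download_data.py invocation."""
    cmd = [
        sys.executable, "download_data.py",
        "--cloud-directory", cloud_directory,
        "--sas-url", sas_file,
    ]
    if local_directory is not None:
        cmd += ["--local-directory", local_directory]
    return cmd


def download_subsets(cloud_directories, sas_file):
    """Run the download script once per directory, stopping at the first failure."""
    for cloud_directory in cloud_directories:
        subprocess.run(build_command(cloud_directory, sas_file), check=True)
```

For example, `download_subsets(["az://./train_features", "az://./train_labels"], "sas_westeurope.txt")` runs the first two commands above back to back.
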
## Download script documentation

```
$ python download_data.py --help

Usage: download_data.py [OPTIONS]

  Downloads the challenge dataset to your local machine.

Options:
  --sas-url TEXT          Shared Access Signature URL that allows you
                          to access the files (starting with
                          https://...). This can be either the SAS URL
                          itself or a path to a file containing the
                          SAS URL, available from the competition
                          datasets page.  [required]
  --cloud-directory TEXT  Cloudpathlib URI (`az://./<directory>`) for
                          cloud directory to be downloaded.  [default:
                          az://.]
  --local-directory PATH  Directory on your local machine to which the
                          files are downloaded.  [default: data]
```

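The help text says `--sas-url` accepts either the URL itself or a path to a file containing it. One way such an argument could be resolved is sketched below. This illustrates that behaviour under stated assumptions and is not the script's actual code; `resolve_sas_url` is a hypothetical name.

```python
from pathlib import Path


def resolve_sas_url(value):
    """Return a SAS URL given either the URL itself or a file that contains it."""
    value = value.strip()
    if value.startswith("https://"):
        return value  # the argument is already a URL
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(
            f"{value} is neither an https:// URL nor an existing file"
        )
    sas_url = path.read_text().strip()
    if not sas_url.startswith("https://"):
        raise ValueError(f"{value} does not contain an https:// URL")
    return sas_url
```

Checking for the `https://` prefix first means a URL is never mistaken for a file path, and trimming whitespace guards against a trailing newline in the token file.
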
Good luck! If you have any questions, you can always visit the user forum:

https://community.drivendata.org/c/cloud-cover