Machine Learning Engineering
10. DVC [labs]
Tomasz Ziętkiewicz (2023)
DVC - Data Version Control
- dvc.org
- "Version Control System for Machine Learning Projects"
- Open Source
- It enables:
  - versioning data and models: "Git for data and models"
  - building pipelines that define how to build/train/evaluate models: "Makefile for machine learning"
  - tracking and comparing metrics and parameters
- tightly integrated with git
- works independently of the programming language, libraries and operating system in use
- 5-minute introduction: https://www.youtube.com/watch?v=UbL7VUpv1Bs&t=197s
Tracking files with DVC
- large files, such as input data files or model files, are hard to manage with git because of problems with:
  - performance
  - repository space
  - limits imposed by the hosting service (e.g. the 100 MB per-file limit on GitHub)
- Git has the LFS (Large File Storage) extension, which offers a partial solution to this problem.
  - The files themselves are stored on a dedicated remote server; the repository holds only references to these files and some metadata
  - GitHub has integrated LFS with a 1 GB limit for free accounts
- DVC takes an approach similar to LFS, but:
Installation and initialization
- https://dvc.org/doc/install
pip(x) install dvc
or: conda install dvc
!pip3 install dvc
Collecting dvc
  Downloading dvc-2.55.0-py3-none-any.whl (419 kB)
[...]
Successfully installed aiohttp-retry-2.8.3 amqp-5.1.1 antlr4-python3-runtime-4.9.3 appdirs-1.4.4 asyncssh-2.13.1 atpublic-3.1.1 billiard-3.6.4.0 celery-5.2.7 click-didyoumean-0.3.0 click-plugins-1.1.1 click-repl-0.2.0 configobj-5.0.8 dictdiffer-0.9.0 diskcache-5.6.1 distro-1.8.0 dpath-2.1.5 dulwich-0.21.3 dvc-2.55.0 dvc-data-0.47.2 dvc-http-2.30.2 dvc-objects-0.21.2 dvc-render-0.3.1 dvc-studio-client-0.8.0 dvc-task-0.2.0 flatten-dict-0.4.2 flufl.lock-7.1.1 funcy-2.0 grandalf-0.8 hydra-core-1.3.2 iterative-telemetry-0.0.8 kombu-5.2.4 markdown-it-py-2.2.0 mdurl-0.1.2 nanotime-0.5.2 networkx-3.1 omegaconf-2.3.0 orjson-3.8.10 pathspec-0.11.1 psutil-5.9.5 pydot-1.4.2 pygit2-1.12.0 pygtrie-2.5.0 rich-13.3.4 scmrepo-1.0.2 shortuuid-1.0.11 shtab-1.6.1 sqltrie-0.3.1 tomlkit-0.11.7 vine-5.0.0 voluptuous-0.13.1 zc.lockfile-3.0.post1
Let's create a directory in which we will keep our project:
!rm -r -f IUM_10/sample-ml-project-2023
!mkdir -p IUM_10/sample-ml-project-2023
#Jupyter notebook magic https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd
%cd "IUM_10/sample-ml-project-2023"
/home/tomek/repos/aitech-ium/IUM_10/sample-ml-project-2023/IUM_10/sample-ml-project-2023
We initialize an empty Git repository (this step can be skipped if we work in an already existing repository)
!git init
Initialized empty Git repository in /home/tomek/repos/aitech-ium/IUM_10/sample-ml-project-2023/IUM_10/sample-ml-project-2023/.git/
Now we initialize the DVC repository:
!dvc init
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
Let's see which files DVC added (also to the git repository). They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files
!git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   .dvc/.gitignore
        new file:   .dvc/config
        new file:   .dvcignore
- .dvc/config: the main DVC configuration file
- .dvc/config.local: overrides values from config; meant for local changes that are not committed to the repo
- .dvc/.gitignore: DVC files that should not end up in the repo
- .dvcignore: DVC skips the files listed in this file (e.g. to improve performance)
We can now commit the changes in git:
!git commit -m "Initial commit"
[main (root-commit) 6b03a40] Initial commit
 3 files changed, 6 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore
Let's prepare some sample data by downloading it from Kaggle:
!kaggle datasets download -d uciml/iris
!unzip -o iris.zip
!rm database.sqlite iris.zip
!mkdir -p data
!mv Iris.csv data/
Downloading iris.zip to /home/tomek/repos/aitech-ium/IUM_10/sample-ml-project-2023/IUM_10/sample-ml-project-2023
100%|██████████████████████████████████████| 3.60k/3.60k [00:00<00:00, 3.38MB/s]
Archive:  iris.zip
  inflating: Iris.csv
  inflating: database.sqlite
Now we add the data file(s) to DVC:
!dvc add data/Iris.csv
100% Adding...|████████████████████████████████████████|1/1 [00:00, 13.53file/s]

To track the changes with git, run:

        git add data/Iris.csv.dvc data/.gitignore

To enable auto staging, run:

        dvc config core.autostage true
- DVC created the file data/Iris.csv.dvc and added the original file to .gitignore
- Only the *.dvc file, which contains a reference to the actual file, will be present in the repository
!git status -u
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        data/.gitignore
        data/Iris.csv.dvc

nothing added to commit but untracked files present (use "git add" to track)
Let's add the files data/Iris.csv.dvc and data/.gitignore to the git repository, as DVC suggests:
!git add data/Iris.csv.dvc data/.gitignore
!git commit -m "Dodano dane IRIS (DVC)"
[main 812cb53] Dodano dane IRIS (DVC)
 2 files changed, 5 insertions(+)
 create mode 100644 data/.gitignore
 create mode 100644 data/Iris.csv.dvc
A *.dvc file contains, among other things, the hash of the tracked file. More about *.dvc files: link
# %load data/Iris.csv.dvc
outs:
- md5: 717820ef0af287ff346c5cabfb4c612c
  size: 5107
  path: Iris.csv
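The md5 and size fields of such a file can be reproduced by hand. The following is a minimal sketch, not DVC's actual implementation: it computes a plain MD5 digest over the raw bytes and reads the file size. The helper name describe_output and the sample file are hypothetical, and DVC's own hashing may differ in details (for example in how some versions treat line endings).

```python
import hashlib
import tempfile
from pathlib import Path


def describe_output(path):
    """Return md5/size/path metadata for a file, shaped like an `outs` entry.

    Hypothetical helper for illustration: plain MD5 over the raw bytes.
    """
    path = Path(path)
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return {
        "md5": digest.hexdigest(),
        "size": path.stat().st_size,
        "path": path.name,
    }


# Small stand-in for a data file, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(b"Id,Species\n1,Iris-setosa\n")
    entry = describe_output(sample)

print(entry)
```

Because the hash depends only on the content, two files with identical bytes get the same md5 value regardless of their names.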
The original Iris.csv file was moved to the .dvc/cache directory, under a path derived from the file's hash (the first two characters of the hash name a subdirectory, the remaining characters name the file), and linked back to its original location. How the links are created may differ depending on the file system.
!ls -l .dvc/cache/71
total 8
-r--r--r-- 1 tomek tomek 5107 Sep 19  2019 7820ef0af287ff346c5cabfb4c612c
!head -n 3 .dvc/cache/71/7820ef0af287ff346c5cabfb4c612c
Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
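The relation between the hash stored in the *.dvc file and the path inside the cache can be written down as a one-line mapping. This is a sketch of the layout visible in the listings here (DVC 2.55); the function name cache_path is hypothetical, and other DVC versions may organize the cache directory differently.

```python
from pathlib import PurePosixPath


def cache_path(md5, cache_dir=".dvc/cache"):
    """Map a hash to its location in a content-addressed cache directory."""
    # The first two hex characters name a subdirectory, the rest the file.
    return PurePosixPath(cache_dir) / md5[:2] / md5[2:]


print(cache_path("717820ef0af287ff346c5cabfb4c612c"))
# prints .dvc/cache/71/7820ef0af287ff346c5cabfb4c612c
```

Splitting on the first two characters keeps any single directory from accumulating too many entries when many files are tracked.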
!git remote add origin git@git.wmi.amu.edu.pl:tzietkiewicz/sample-ml-project.git
!git push --set-upstream origin main
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 4 threads
Compressing objects: 100% (8/8), done.
Writing objects: 100% (11/11), 889 bytes | 889.00 KiB/s, done.
Total 11 (delta 1), reused 0 (delta 0), pack-reused 0
remote:
remote: Create a new pull request for 'main':
remote:   https://git.wmi.amu.edu.pl/tzietkiewicz/sample-ml-project/compare/master...main
remote:
remote: . Processing 1 references
remote: Processed 1 references in total
To git.wmi.amu.edu.pl:tzietkiewicz/sample-ml-project.git
 * [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
dvc remote
- to send the actual files tracked by DVC to a remote location (from which they can then be downloaded, e.g. by a CI system or by other users), we first need to configure such a location
- the dvc remote add command is used for this
- we will use a local "remote". Here it is simply a previously created directory, /dvcstore. Such a directory also exists on our Jenkins; of course, it has to be mounted in Docker
- in real-world use we would give here a path to some resource available over the internet, such as an SFTP server, an AWS S3 path, etc.
Supported remote storage types (remotes): https://dvc.org/doc/command-reference/remote/add#supported-storage-types
- Amazon S3
- S3-compatible storage
- Microsoft Azure Blob Storage
- Google Drive
- Google Cloud Storage
- Aliyun OSS
- SSH
- HDFS
- WebHDFS
- HTTP
- WebDAV
- local remote
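A remote's storage type is usually recognizable from the form of its URL. The sketch below is only an illustration: the scheme-to-type table is an assumption written for this example (consult the page linked above for the authoritative list), the function name storage_type is hypothetical, and the bucket and host names are made up.

```python
from urllib.parse import urlparse

# Assumed mapping from URL scheme to storage type, for illustration only.
SCHEME_TO_STORAGE = {
    "s3": "Amazon S3 or S3-compatible storage",
    "azure": "Microsoft Azure Blob Storage",
    "gdrive": "Google Drive",
    "gs": "Google Cloud Storage",
    "oss": "Aliyun OSS",
    "ssh": "SSH",
    "hdfs": "HDFS",
    "webhdfs": "WebHDFS",
    "http": "HTTP",
    "https": "HTTP",
    "webdav": "WebDAV",
    "webdavs": "WebDAV",
}


def storage_type(url):
    """Guess the storage type of a remote from its URL (hypothetical helper)."""
    scheme = urlparse(url).scheme
    if not scheme:
        # A plain file system path such as /dvcstore has no scheme.
        return "local remote"
    return SCHEME_TO_STORAGE.get(scheme, "unknown")


for url in ("/dvcstore", "s3://example-bucket/dvcstore", "ssh://user@example.com/data"):
    print(url, "->", storage_type(url))
```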
Adding a local remote
!dvc remote add -d my_local_remote /dvcstore
Setting 'my_local_remote' as a default remote.
!git status
On branch master
nothing to commit, working tree clean
!git add .dvc/config
!git commit -m "Added DVC remote"
On branch main
nothing to commit, working tree clean
dvc push
Once a "remote" is configured, we can push files to it with the dvc push command:
!dvc push
1 file pushed
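For a local remote, a push can be pictured as copying objects from the cache into the remote directory under the same hash-based names. The following is a rough sketch of that idea and not DVC's implementation; push_objects is a hypothetical helper, and temporary directories stand in for .dvc/cache and /dvcstore.

```python
import shutil
import tempfile
from pathlib import Path


def push_objects(cache_dir, remote_dir):
    """Copy every cached object that the remote lacks; return the number copied."""
    cache_dir, remote_dir = Path(cache_dir), Path(remote_dir)
    pushed = 0
    for obj in sorted(cache_dir.rglob("*")):
        if not obj.is_file():
            continue
        target = remote_dir / obj.relative_to(cache_dir)
        if target.exists():
            continue  # content-addressed: same name means same content
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(obj, target)
        pushed += 1
    return pushed


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    remote = Path(tmp) / "dvcstore"
    (cache / "71").mkdir(parents=True)
    (cache / "71" / "7820ef0af287ff346c5cabfb4c612c").write_text("Id,Species\n")
    first_push = push_objects(cache, remote)
    second_push = push_objects(cache, remote)  # nothing new to send
    remote_files = [
        p.relative_to(remote).as_posix() for p in remote.rglob("*") if p.is_file()
    ]

print(first_push, second_push, remote_files)
```

The second call copies nothing, which mirrors why repeated pushes are cheap: only objects missing from the remote need to be transferred.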
!tree /dvcstore
/dvcstore
└── 71
    └── 7820ef0af287ff346c5cabfb4c612c

1 directory, 1 file
dvc pull
To fetch the data from DVC (e.g. in a different location or as a different user), we need to:
- clone the git repository (to get, among other things, the *.dvc files)
- run dvc pull
Adding new files and modifying existing ones works much like with ordinary files tracked by git, except that we use the dvc command instead of git and additionally remember to manage the *.dvc files with git:
!head -n -1 data/Iris.csv | sponge data/Iris.csv
!git status
On branch master
nothing to commit, working tree clean
!dvc status
data/Iris.csv.dvc:
        changed outs:
                modified:           data/Iris.csv
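Git sees no change here because it tracks only the *.dvc file, while DVC notices that the data file no longer matches the recorded hash. A minimal sketch of such a comparison follows, assuming a plain MD5 over the file's bytes; the helpers file_md5 and is_modified are hypothetical and the sample file is a small stand-in for the dataset.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Plain MD5 of a file's raw bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def is_modified(path, recorded_md5):
    """True when the file's current hash differs from the recorded one."""
    return file_md5(path) != recorded_md5


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "Iris.csv"
    data.write_text("Id,Species\n1,Iris-setosa\n2,Iris-setosa\n")
    recorded = file_md5(data)  # what a pointer file would store
    before = is_modified(data, recorded)
    # Drop the last line, like the `head -n -1` command used earlier.
    lines = data.read_text().splitlines(keepends=True)
    data.write_text("".join(lines[:-1]))
    after = is_modified(data, recorded)

print(before, after)  # prints: False True
```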
!dvc add data/Iris.csv
100% Adding...|████████████████████████████████████████|1/1 [00:00, 11.09file/s]

To track the changes with git, run:

        git add data/Iris.csv.dvc

To enable auto staging, run:

        dvc config core.autostage true
!git add data/Iris.csv.dvc
!git commit -m "Removed last line from Iris dataset"
[master 5379e3b] Removed last line from Iris dataset
 1 file changed, 2 insertions(+), 2 deletions(-)
dvc checkout
- The dvc checkout command is used together with git checkout to change the branch we are working on.
- DVC replaces the versions of the files it tracks with the ones from the other branch (provided that these files differ and the *.dvc files differ between the respective branches)
- switching the branch with git (possibly) changes the *.dvc files, and dvc checkout then copies/links the files from the .dvc/cache directory whose hash values match those in the *.dvc files
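The checkout step can be sketched as: read the hash recorded for a file, find the matching object in the cache, and place it in the workspace. The example below is an illustration under simplifying assumptions, not DVC's code: checkout_file is a hypothetical helper, it always copies (DVC may link instead), and the two "versions" are tiny stand-ins for the dataset on two branches.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checkout_file(md5, workspace_path, cache_dir):
    """Restore a workspace file from the cache object named by `md5`."""
    cached = Path(cache_dir) / md5[:2] / md5[2:]
    if not cached.is_file():
        raise FileNotFoundError(f"object {md5} is not in the cache")
    workspace_path = Path(workspace_path)
    workspace_path.parent.mkdir(parents=True, exist_ok=True)
    # Copying is the simplest strategy; links avoid duplicating the data.
    shutil.copyfile(cached, workspace_path)
    return workspace_path


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    versions = {}
    # Store two versions of the data in the cache under their hashes.
    for content in (b"full dataset\n", b"trimmed dataset\n"):
        md5 = hashlib.md5(content).hexdigest()
        obj = cache / md5[:2] / md5[2:]
        obj.parent.mkdir(parents=True, exist_ok=True)
        obj.write_bytes(content)
        versions[content] = md5

    target = Path(tmp) / "data" / "Iris.csv"
    # Each branch's pointer file would record a different hash.
    checkout_file(versions[b"full dataset\n"], target, cache)
    on_first_branch = target.read_bytes()
    checkout_file(versions[b"trimmed dataset\n"], target, cache)
    on_second_branch = target.read_bytes()

print(on_first_branch, on_second_branch)
```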
Sharing data between projects
- with the dvc import and dvc update commands we can add, and later update, files that are tracked by DVC in another repository
!dvc import https://github.com/iterative/dataset-registry \
get-started/data.xml -o data/data.xml
Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'

To track the changes with git, run:

        git add data/.gitignore data/data.xml.dvc
!dvc status
Data and pipelines are up to date.
ls -l data
total 37020
-rw-rw-r-- 1 tomek tomek 37891850 maj 31 11:10 data.xml
-rw-rw-r-- 1 tomek tomek      284 maj 31 11:10 data.xml.dvc
-rw-rw-r-- 1 tomek tomek     5072 maj 31 11:01 Iris.csv
-rw-rw-r-- 1 tomek tomek       76 maj 31 11:01 Iris.csv.dvc
# %load data/data.xml.dvc
md5: a7cd139231cc35ed63541ce3829b96db
frozen: true
deps:
- path: get-started/data.xml
  repo:
    url: https://github.com/iterative/dataset-registry
    rev_lock: ba014f40e29670421a67cb1c47543f402348aa13
outs:
- md5: a304afb96060aad90176268345e10355
  size: 37891850
  path: data.xml
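The outs entry above records the MD5 hash and the size of data.xml. As a minimal sketch (not DVC's own implementation), the Python snippet below recomputes both values for a local file so they can be compared by hand with the .dvc file. The function name file_md5_and_size and the path data/data.xml are assumptions made for illustration. Older DVC releases may normalize line endings of text files before hashing, so a mismatch on such files does not necessarily mean the file is corrupted.

```python
import hashlib
import os


def file_md5_and_size(path, chunk_size=1024 * 1024):
    """Return (md5 hex digest, size in bytes) of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


if __name__ == "__main__":
    # Assumed location of the imported file; adjust to your project layout.
    target = "data/data.xml"
    if os.path.exists(target):
        md5, size = file_md5_and_size(target)
        print(f"md5: {md5}")
        print(f"size: {size}")
```

The printed values can then be compared with the md5 and size fields under outs in data/data.xml.dvc.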
DVC pipelines
- introduction: https://youtu.be/71IGzyH95UY
- Getting started: https://dvc.org/doc/start/data-pipelines
- DVC pipelines let us build (with the dvc run command; newer DVC versions use dvc stage add instead) or define (by editing the dvc.yaml file) a dependency graph between the steps carried out in our project (such as "data preparation", "training", "evaluation")
- a pipeline defined this way can then be run with the dvc repro command
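To make the idea of a dependency graph between steps concrete, here is a small Python sketch. It is a conceptual illustration only, not how DVC is implemented. The stage names, file names and the execution_order function are all hypothetical. Each stage lists the files it depends on and the files it produces, and the order of execution follows from matching dependencies to outputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: each one lists the files it depends on and produces.
STAGES = {
    "prepare": {"deps": ["data/data.xml"], "outs": ["data/prepared"]},
    "train": {"deps": ["data/prepared"], "outs": ["model.pkl"]},
    "evaluate": {"deps": ["model.pkl", "data/prepared"], "outs": ["metrics.json"]},
}


def execution_order(stages):
    """Order stages so that each runs after the stages producing its deps."""
    # Map every output file to the stage that produces it.
    producer = {out: name for name, stage in stages.items() for out in stage["outs"]}
    # A stage depends on another stage if it consumes one of that stage's outputs.
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(STAGES))  # ['prepare', 'train', 'evaluate']
```

In a real project the same information (command, dependencies, outputs per stage) lives in dvc.yaml, and DVC works out which stages need to be rerun.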
Tasks [10+5 pts]
- Initialize a DVC repository inside your project repository [1 pt]
- Add the data file(s) in your project to DVC [1 pt]
- Configure a remote (the configuration details are given below) [3 pts]
- Create/define a dvc.yaml file describing the steps carried out in your project and add it to the repository. Split out at least 2 steps (e.g. data preparation/training) linked to each other by dependencies (use the "Getting started" materials, link above) [5 pts (optional)]
- Create a Jenkins project (s1233456-dvc) in which you clone the repository, download the DVC-tracked files (with dvc pull) and run the pipeline (with dvc repro) [5 pts]
SSH remote
One of the remote types supported by DVC is SFTP/SSH.
To make use of it, the user ium-sftp has been created on the server tzietkiewicz.vm.wmi.amu.edu.pl and an SFTP server has been configured there.
An SSH key has also been generated for this user and added as a "Jenkins credential" (see the description of the Jenkins configuration below).
Locally
We will need an additional dependency, the SSH support for DVC:
conda install dvc-ssh
or
pip install dvc[ssh] paramiko
conda install -c conda-forge dvc-ssh
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/tomek/miniconda3

  added / updated specs:
    - dvc-ssh

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bcrypt-3.2.0               |   py39h3811e60_1          44 KB  conda-forge
    ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge
    certifi-2021.5.30          |   py39hf3d152e_0         141 KB  conda-forge
    dvc-2.3.0                  |   py39hf3d152e_0         542 KB  conda-forge
    dvc-ssh-2.3.0              |   py39hf3d152e_0           9 KB  conda-forge
    fsspec-2021.5.0            |     pyhd8ed1ab_0          77 KB  conda-forge
    invoke-1.5.0               |     pyhd3deb0d_0         137 KB  conda-forge
    paramiko-2.7.2             |     pyh9f0ad1d_0         135 KB  conda-forge
    pynacl-1.4.0               |   py39h3811e60_2         1.3 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

  bcrypt     conda-forge/linux-64::bcrypt-3.2.0-py39h3811e60_1
  dvc-ssh    conda-forge/linux-64::dvc-ssh-2.3.0-py39hf3d152e_0
  invoke     conda-forge/noarch::invoke-1.5.0-pyhd3deb0d_0
  paramiko   conda-forge/noarch::paramiko-2.7.2-pyh9f0ad1d_0
  pynacl     conda-forge/linux-64::pynacl-1.4.0-py39h3811e60_2

The following packages will be UPDATED:

  ca-certificates    2020.12.5-ha878542_0 --> 2021.5.30-ha878542_0
  certifi            2020.12.5-py39hf3d152e_1 --> 2021.5.30-py39hf3d152e_0
  dvc                2.1.0-py39hf3d152e_0 --> 2.3.0-py39hf3d152e_0
  fsspec             0.9.0-pyhd8ed1ab_2 --> 2021.5.0-pyhd8ed1ab_0

Downloading and Extracting Packages
certifi-2021.5.30    | 141 KB | ##################################### | 100%
fsspec-2021.5.0      | 77 KB  | ##################################### | 100%
dvc-2.3.0            | 542 KB | ##################################### | 100%
invoke-1.5.0         | 137 KB | ##################################### | 100%
paramiko-2.7.2       | 135 KB | ##################################### | 100%
bcrypt-3.2.0         | 44 KB  | ##################################### | 100%
pynacl-1.4.0         | 1.3 MB | ##################################### | 100%
dvc-ssh-2.3.0        | 9 KB   | ##################################### | 100%
ca-certificates-2021 | 136 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Note: you may need to restart the kernel to use updated packages.
Add the remote:
!dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl
Setting 'ium_ssh_remote' as a default remote.
!dvc remote list
ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl
Save the password:
!dvc remote modify --local ium_ssh_remote password IUM@2021
Push to the configured remote:
!dvc push
1 file pushed
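The three steps above (add the remote, store the password, push) can also be scripted. The following Python sketch is one possible way to do it, under the assumption that the dvc executable is on PATH. The helper names remote_setup_commands and run_all are hypothetical. Passing each command as an argument list avoids shell quoting problems with special characters in the password. The --local flag stores the setting in .dvc/config.local, which is not committed to Git.

```python
import subprocess


def remote_setup_commands(name, url, password):
    """Build argument lists for: add a default remote, store its password locally, push."""
    return [
        ["dvc", "remote", "add", "-f", "-d", name, url],
        # --local keeps the password out of the committed .dvc/config file.
        ["dvc", "remote", "modify", "--local", name, "password", password],
        ["dvc", "push"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

For example, run_all(remote_setup_commands("ium_ssh_remote", "ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl", password)) would repeat the cells above, with password read from a safe place rather than typed into the notebook.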
Jenkins
In Jenkins you can use the "Credentials" mechanism to pass a password or a private key securely.
The following credentials have been created on Jenkins for the user ium-sftp:
- of type SSH key: https://tzietkiewicz.vm.wmi.amu.edu.pl:8081/credentials/store/system/domain/_/credential/48ac7004-216e-4260-abba-1fe5db753e18/
- of type "secret text", containing the password of the user ium-sftp: https://tzietkiewicz.vm.wmi.amu.edu.pl:8081/credentials/store/system/domain/_/credential/ium-sftp-password/
A description of using "Credentials" in a Jenkinsfile: https://www.jenkins.io/doc/book/pipeline/jenkinsfile/#for-other-credential-types
The SSH key can be used like this:
withCredentials(
    [sshUserPrivateKey(credentialsId: '48ac7004-216e-4260-abba-1fe5db753e18', keyFileVariable: 'IUM_SFTP_KEY', passphraseVariable: '', usernameVariable: '')]) {
    sh 'dvc remote add -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'
    sh 'dvc remote modify --local ium_ssh_remote keyfile $IUM_SFTP_KEY'
    sh 'dvc pull'
}
The secret text can be used like this:
withCredentials([string(credentialsId: 'ium-sftp-password', variable: 'IUM_SFTP_PASS')]) {
    sh 'dvc remote add -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'
    sh 'dvc remote modify --local ium_ssh_remote password $IUM_SFTP_PASS'
    sh 'dvc pull'
}
Example configuration: