Paweł Skórzewski 8ac25d0bd7 Lab. 10
2024-05-22 08:38:20 +02:00


Machine Learning Engineering

May 22, 2024

10. DVC

DVC - Data Version Control

  • dvc.org
  • "Version Control System for Machine Learning Projects"
  • Open Source
  • It enables:
    • versioning of data and models: "Git for data and models"
    • building pipelines that define how models are built/trained/evaluated: "Makefile for machine learning"
    • tracking and comparing metrics and parameters
  • tightly integrated with git
  • works regardless of the programming language/libraries and operating system used
  • 5-minute introduction: https://www.youtube.com/watch?v=UbL7VUpv1Bs

Tracking files with DVC

  • large files, such as input data files or model files, are hard to manage with git because of problems with:
  • Git has the LFS (Large File Storage) extension, which offers a partial solution to this problem.
    • The files themselves are stored on a dedicated remote server; the repository only holds pointers to these files plus some metadata
    • GitHub has built-in LFS support, with 1 GB of storage for free accounts
  • DVC takes a similar approach to LFS, but:
    • files can be stored on almost any server, including locally
    • there is no file size limit (Git LFS on GitHub caps individual files at 2 GB on free accounts)
    • DVC provides additional tools for tracking files and how they relate to experiment results
    • for more details, see here

Installation and initialization

!pip3 install dvc
Requirement already satisfied: dvc in ./venv/lib/python3.10/site-packages (3.50.2)

Let's create a directory to hold our project:

!rm -r -f IUM_10/sample-ml-project-2024
!mkdir -p IUM_10/sample-ml-project-2024
#Jupyter notebook magic https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd
%cd "IUM_10/sample-ml-project-2024"
/home/pawel/ium/IUM_10/sample-ml-project-2024

We initialize an empty Git repository (we could also skip this step and work in an already existing repository):

!git init
Reinitialized existing Git repository in /home/pawel/ium/IUM_10/sample-ml-project-2024/.git/

Now we initialize the DVC repository:

!dvc init
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>


Let's see which files DVC has added (also to the git repository). They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files

!git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   .dvc/.gitignore
	new file:   .dvc/config
	new file:   .dvcignore

  • .dvc/config - the main DVC configuration file
  • .dvc/config.local - overrides values from config; meant for local changes that are not committed to the repository
  • .dvc/.gitignore - DVC files that should not end up in the repository
  • .dvcignore - DVC skips the files listed here (e.g. to improve performance)

Now we can commit the changes to git:

!git commit -m "Initial commit"
[master (root-commit) a9746ad] Initial commit
 3 files changed, 6 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore

Let's prepare some sample data by downloading it from Kaggle (this requires the kaggle CLI with configured API credentials):

!kaggle datasets download -d uciml/iris
!unzip -o iris.zip
!rm database.sqlite iris.zip
!mkdir -p data
!mv Iris.csv data/
Downloading iris.zip to /home/pawel/ium/IUM_10/sample-ml-project-2024
100%|██████████████████████████████████████| 3.60k/3.60k [00:00<00:00, 8.23MB/s]
Archive:  iris.zip
  inflating: Iris.csv                
  inflating: database.sqlite         

Now we add the data file(s) to DVC:

!dvc add data/Iris.csv

To track the changes with git, run:

	git add data/.gitignore data/Iris.csv.dvc

To enable auto staging, run:

	dvc config core.autostage true

  • DVC created the file data/Iris.csv.dvc and added the original file to .gitignore (data/.gitignore)
  • Only the *.dvc file, which contains a reference to the actual file, will be present in the repository
!git status -u
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	data/.gitignore
	data/Iris.csv.dvc

nothing added to commit but untracked files present (use "git add" to track)

Let's add data/Iris.csv.dvc and data/.gitignore to the git repository, as DVC suggests:

!git add data/Iris.csv.dvc data/.gitignore
!git commit -m "Dodano dane IRIS (DVC)"
[master 92b2c9d] Dodano dane IRIS (DVC)
 2 files changed, 6 insertions(+)
 create mode 100644 data/.gitignore
 create mode 100644 data/Iris.csv.dvc

A *.dvc file contains, among other things, the file's hash. More about *.dvc files: link

!cat data/Iris.csv.dvc
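To make the structure of such a pointer concrete, here is a minimal sketch that builds text resembling a single-file `.dvc` file. It assumes the DVC 3.x layout (an `outs` list with `md5`, `size`, `hash: md5` and a `path` relative to the `.dvc` file) and that DVC 3.x hashes the raw file bytes with MD5. The function names are hypothetical; in practice the file is always written by `dvc add` itself.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file's raw bytes, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dvc_pointer_text(path: str) -> str:
    """Build text resembling a DVC 3.x .dvc file for a single file (assumed layout)."""
    return (
        "outs:\n"
        f"- md5: {md5_of_file(path)}\n"
        f"  size: {os.path.getsize(path)}\n"
        "  hash: md5\n"
        # the path is stored relative to the directory of the .dvc file
        f"  path: {os.path.basename(path)}\n"
    )
```

Run in the project directory, `print(dvc_pointer_text("data/Iris.csv"))` should print something close to what `dvc add` wrote to `data/Iris.csv.dvc`.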

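The original Iris.csv file was moved into the cache directory .dvc/cache/files/md5/, under a path derived from its hash: a subdirectory named after the first two characters of the hash, containing a file named after the remaining characters. It was then linked or copied back to its original location. Which of these happens depends on the file system and on DVC's cache type configuration.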

!ls -l .dvc/cache/files/md5/71
total 8
-r--r--r-- 1 pawel pawel 5107 Sep 19  2019 7820ef0af287ff346c5cabfb4c612c
!head -n 3 .dvc/cache/files/md5/71/7820ef0af287ff346c5cabfb4c612c
Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
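The cache location can be computed from the hash alone. Below is a minimal sketch. It assumes the DVC 3.x cache layout visible in the listing above (`files/md5/`, then a two-character subdirectory, then the remaining 30 hex characters) and a plain MD5 of the file contents. `cache_path` is a hypothetical helper, not part of DVC's API.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of the file's raw bytes."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Map a hash to its cache location: files/md5/<first 2 chars>/<remaining chars>."""
    return cache_dir / "files" / "md5" / md5[:2] / md5[2:]
```

For the original Iris.csv, `cache_path(Path(".dvc/cache"), md5_of_file(Path("data/Iris.csv")))` should point at `.dvc/cache/files/md5/71/7820ef0af287ff346c5cabfb4c612c`. The two-character split is similar to how git stores its loose objects.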

dvc remote

  • To send the actual files tracked by DVC to a remote location (from which they can be fetched, e.g. by a CI system or by other users), we need to configure such a location.
  • This is what the dvc remote add command is for.
  • We'll use a local "remote". Here it is simply the previously created directory ~/dvcstore. Such a directory also exists on our Jenkins server; it must, of course, be mounted into the Docker container.
  • In real-world use we would specify a path to some resource reachable over the internet, e.g. an SFTP server, an AWS S3 path, etc.

Supported remote storage types: https://dvc.org/doc/command-reference/remote/add#supported-storage-types

  • Amazon S3
  • S3-compatible storage
  • Microsoft Azure Blob Storage
  • Google Drive
  • Google Cloud Storage
  • Aliyun OSS
  • SSH
  • HDFS
  • WebHDFS
  • HTTP
  • WebDAV
  • local remote

Adding a local remote

!dvc remote add -d my_local_remote ~/dvcstore
Setting 'my_local_remote' as a default remote.

!git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   .dvc/config

no changes added to commit (use "git add" and/or "git commit -a")
!git add .dvc/config
!git commit -m "Added DVC remote"
[master 7123494] Added DVC remote
 1 file changed, 1 insertion(+), 1 deletion(-)

dvc push

Once a "remote" is configured, we can push files to it with the dvc push command:

!dvc push
1 file pushed

!tree ~/dvcstore
/home/pawel/dvcstore
└── files
    └── md5
        └── 71
            └── 7820ef0af287ff346c5cabfb4c612c

3 directories, 1 file
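The local remote uses the same `files/md5/<2 chars>/<rest>` layout as the cache. Under that assumption, pushing to it can be modelled as copying the content-addressed objects that the remote does not have yet. This is a simplified model, not DVC's implementation. By default, real `dvc push` only uploads objects referenced by the current `.dvc`/`dvc.lock` files, and it supports many storage types. `push_to_local_remote` is a hypothetical name.

```python
import shutil
from pathlib import Path


def push_to_local_remote(cache_dir: Path, remote_dir: Path) -> int:
    """Copy every cache object missing from the remote; return how many were copied."""
    pushed = 0
    for obj in (cache_dir / "files" / "md5").glob("*/*"):
        if not obj.is_file():
            continue
        # the remote mirrors the cache layout, e.g. files/md5/71/7820ef...
        target = remote_dir / obj.relative_to(cache_dir)
        if target.exists():
            continue  # content-addressed: an existing path implies identical content
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(obj, target)
        pushed += 1
    return pushed
```

A call such as `push_to_local_remote(Path(".dvc/cache"), Path.home() / "dvcstore")` would produce the same tree as shown above. Running it a second time copies nothing.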

dvc pull

To fetch the data from DVC (e.g. in another location or as another user), we need to:

  • clone the git repository (to get, among other things, the *.dvc files)
  • run dvc pull
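`dvc pull` combines two steps. First, `dvc fetch` downloads the objects listed in the `.dvc` files from the remote into the local cache. Then, `dvc checkout` places them in the workspace. Below is a minimal sketch of the fetch half for a local remote. It assumes a simple `.dvc` file with `md5:` lines and the `files/md5/<2 chars>/<rest>` object layout. It parses the pointer naively instead of with a YAML library, and all function names are hypothetical.

```python
import shutil
from pathlib import Path


def object_path(root: Path, md5: str) -> Path:
    """Location of an object in a cache or local remote: files/md5/<2 chars>/<rest>."""
    return root / "files" / "md5" / md5[:2] / md5[2:]


def hashes_from_pointer(dvc_file: Path) -> list:
    """Naively collect the md5 values listed in a .dvc file (no YAML parser)."""
    hashes = []
    for line in dvc_file.read_text().splitlines():
        key, _, value = line.strip().lstrip("- ").partition(":")
        if key == "md5":
            hashes.append(value.strip())
    return hashes


def fetch(dvc_file: Path, remote_dir: Path, cache_dir: Path) -> list:
    """Copy objects referenced by dvc_file from the remote into the cache.

    Returns the hashes that could not be found on the remote.
    """
    missing = []
    for md5 in hashes_from_pointer(dvc_file):
        src, dst = object_path(remote_dir, md5), object_path(cache_dir, md5)
        if dst.exists():
            continue  # already cached locally
        if not src.exists():
            missing.append(md5)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return missing
```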

Adding new files and modifying existing ones works much as with ordinary git-tracked files, except that we use the dvc command instead of git, and we also need to remember to manage the *.dvc files with git:

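# write via a temporary file: `head ... | tee` on the same file may truncate it before it is read
!head -n -1 data/Iris.csv > data/Iris.tmp && mv data/Iris.tmp data/Iris.csv && cat data/Iris.csv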
Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
6,5.4,3.9,1.7,0.4,Iris-setosa
7,4.6,3.4,1.4,0.3,Iris-setosa
8,5.0,3.4,1.5,0.2,Iris-setosa
9,4.4,2.9,1.4,0.2,Iris-setosa
10,4.9,3.1,1.5,0.1,Iris-setosa
11,5.4,3.7,1.5,0.2,Iris-setosa
12,4.8,3.4,1.6,0.2,Iris-setosa
13,4.8,3.0,1.4,0.1,Iris-setosa
14,4.3,3.0,1.1,0.1,Iris-setosa
15,5.8,4.0,1.2,0.2,Iris-setosa
16,5.7,4.4,1.5,0.4,Iris-setosa
17,5.4,3.9,1.3,0.4,Iris-setosa
18,5.1,3.5,1.4,0.3,Iris-setosa
19,5.7,3.8,1.7,0.3,Iris-setosa
20,5.1,3.8,1.5,0.3,Iris-setosa
21,5.4,3.4,1.7,0.2,Iris-setosa
22,5.1,3.7,1.5,0.4,Iris-setosa
23,4.6,3.6,1.0,0.2,Iris-setosa
24,5.1,3.3,1.7,0.5,Iris-setosa
25,4.8,3.4,1.9,0.2,Iris-setosa
26,5.0,3.0,1.6,0.2,Iris-setosa
27,5.0,3.4,1.6,0.4,Iris-setosa
28,5.2,3.5,1.5,0.2,Iris-setosa
29,5.2,3.4,1.4,0.2,Iris-setosa
30,4.7,3.2,1.6,0.2,Iris-setosa
31,4.8,3.1,1.6,0.2,Iris-setosa
32,5.4,3.4,1.5,0.4,Iris-setosa
33,5.2,4.1,1.5,0.1,Iris-setosa
34,5.5,4.2,1.4,0.2,Iris-setosa
35,4.9,3.1,1.5,0.1,Iris-setosa
36,5.0,3.2,1.2,0.2,Iris-setosa
37,5.5,3.5,1.3,0.2,Iris-setosa
38,4.9,3.1,1.5,0.1,Iris-setosa
39,4.4,3.0,1.3,0.2,Iris-setosa
40,5.1,3.4,1.5,0.2,Iris-setosa
41,5.0,3.5,1.3,0.3,Iris-setosa
42,4.5,2.3,1.3,0.3,Iris-setosa
43,4.4,3.2,1.3,0.2,Iris-setosa
44,5.0,3.5,1.6,0.6,Iris-setosa
45,5.1,3.8,1.9,0.4,Iris-setosa
46,4.8,3.0,1.4,0.3,Iris-setosa
47,5.1,3.8,1.6,0.2,Iris-setosa
48,4.6,3.2,1.4,0.2,Iris-setosa
49,5.3,3.7,1.5,0.2,Iris-setosa
50,5.0,3.3,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
53,6.9,3.1,4.9,1.5,Iris-versicolor
54,5.5,2.3,4.0,1.3,Iris-versicolor
55,6.5,2.8,4.6,1.5,Iris-versicolor
56,5.7,2.8,4.5,1.3,Iris-versicolor
57,6.3,3.3,4.7,1.6,Iris-versicolor
58,4.9,2.4,3.3,1.0,Iris-versicolor
59,6.6,2.9,4.6,1.3,Iris-versicolor
60,5.2,2.7,3.9,1.4,Iris-versicolor
61,5.0,2.0,3.5,1.0,Iris-versicolor
62,5.9,3.0,4.2,1.5,Iris-versicolor
63,6.0,2.2,4.0,1.0,Iris-versicolor
64,6.1,2.9,4.7,1.4,Iris-versicolor
65,5.6,2.9,3.6,1.3,Iris-versicolor
66,6.7,3.1,4.4,1.4,Iris-versicolor
67,5.6,3.0,4.5,1.5,Iris-versicolor
68,5.8,2.7,4.1,1.0,Iris-versicolor
69,6.2,2.2,4.5,1.5,Iris-versicolor
70,5.6,2.5,3.9,1.1,Iris-versicolor
71,5.9,3.2,4.8,1.8,Iris-versicolor
72,6.1,2.8,4.0,1.3,Iris-versicolor
73,6.3,2.5,4.9,1.5,Iris-versicolor
74,6.1,2.8,4.7,1.2,Iris-versicolor
75,6.4,2.9,4.3,1.3,Iris-versicolor
76,6.6,3.0,4.4,1.4,Iris-versicolor
77,6.8,2.8,4.8,1.4,Iris-versicolor
78,6.7,3.0,5.0,1.7,Iris-versicolor
79,6.0,2.9,4.5,1.5,Iris-versicolor
80,5.7,2.6,3.5,1.0,Iris-versicolor
81,5.5,2.4,3.8,1.1,Iris-versicolor
82,5.5,2.4,3.7,1.0,Iris-versicolor
83,5.8,2.7,3.9,1.2,Iris-versicolor
84,6.0,2.7,5.1,1.6,Iris-versicolor
85,5.4,3.0,4.5,1.5,Iris-versicolor
86,6.0,3.4,4.5,1.6,Iris-versicolor
87,6.7,3.1,4.7,1.5,Iris-versicolor
88,6.3,2.3,4.4,1.3,Iris-versicolor
89,5.6,3.0,4.1,1.3,Iris-versicolor
90,5.5,2.5,4.0,1.3,Iris-versicolor
91,5.5,2.6,4.4,1.2,Iris-versicolor
92,6.1,3.0,4.6,1.4,Iris-versicolor
93,5.8,2.6,4.0,1.2,Iris-versicolor
94,5.0,2.3,3.3,1.0,Iris-versicolor
95,5.6,2.7,4.2,1.3,Iris-versicolor
96,5.7,3.0,4.2,1.2,Iris-versicolor
97,5.7,2.9,4.2,1.3,Iris-versicolor
98,6.2,2.9,4.3,1.3,Iris-versicolor
99,5.1,2.5,3.0,1.1,Iris-versicolor
100,5.7,2.8,4.1,1.3,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
103,7.1,3.0,5.9,2.1,Iris-virginica
104,6.3,2.9,5.6,1.8,Iris-virginica
105,6.5,3.0,5.8,2.2,Iris-virginica
106,7.6,3.0,6.6,2.1,Iris-virginica
107,4.9,2.5,4.5,1.7,Iris-virginica
108,7.3,2.9,6.3,1.8,Iris-virginica
109,6.7,2.5,5.8,1.8,Iris-virginica
110,7.2,3.6,6.1,2.5,Iris-virginica
111,6.5,3.2,5.1,2.0,Iris-virginica
112,6.4,2.7,5.3,1.9,Iris-virginica
113,6.8,3.0,5.5,2.1,Iris-virginica
114,5.7,2.5,5.0,2.0,Iris-virginica
115,5.8,2.8,5.1,2.4,Iris-virginica
116,6.4,3.2,5.3,2.3,Iris-virginica
117,6.5,3.0,5.5,1.8,Iris-virginica
118,7.7,3.8,6.7,2.2,Iris-virginica
119,7.7,2.6,6.9,2.3,Iris-virginica
120,6.0,2.2,5.0,1.5,Iris-virginica
121,6.9,3.2,5.7,2.3,Iris-virginica
122,5.6,2.8,4.9,2.0,Iris-virginica
123,7.7,2.8,6.7,2.0,Iris-virginica
124,6.3,2.7,4.9,1.8,Iris-virginica
125,6.7,3.3,5.7,2.1,Iris-virginica
126,7.2,3.2,6.0,1.8,Iris-virginica
127,6.2,2.8,4.8,1.8,Iris-virginica
128,6.1,3.0,4.9,1.8,Iris-virginica
129,6.4,2.8,5.6,2.1,Iris-virginica
130,7.2,3.0,5.8,1.6,Iris-virginica
131,7.4,2.8,6.1,1.9,Iris-virginica
132,7.9,3.8,6.4,2.0,Iris-virginica
133,6.4,2.8,5.6,2.2,Iris-virginica
134,6.3,2.8,5.1,1.5,Iris-virginica
135,6.1,2.6,5.6,1.4,Iris-virginica
136,7.7,3.0,6.1,2.3,Iris-virginica
137,6.3,3.4,5.6,2.4,Iris-virginica
138,6.4,3.1,5.5,1.8,Iris-virginica
139,6.0,3.0,4.8,1.8,Iris-virginica
140,6.9,3.1,5.4,2.1,Iris-virginica
141,6.7,3.1,5.6,2.4,Iris-virginica
142,6.9,3.1,5.1,2.3,Iris-virginica
143,5.8,2.7,5.1,1.9,Iris-virginica
144,6.8,3.2,5.9,2.3,Iris-virginica
145,6.7,3.3,5.7,2.5,Iris-virginica
146,6.7,3.0,5.2,2.3,Iris-virginica
147,6.3,2.5,5.0,1.9,Iris-virginica
148,6.5,3.0,5.2,2.0,Iris-virginica
149,6.2,3.4,5.4,2.3,Iris-virginica
!git status
On branch master
nothing to commit, working tree clean
!dvc status
data/Iris.csv.dvc:                                                              
	changed outs:
		modified:           data/Iris.csv
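`dvc status` reports `data/Iris.csv` as modified because the hash of the workspace file no longer matches the `md5` recorded in `data/Iris.csv.dvc`. Below is a simplified sketch of that comparison for a single-output `.dvc` file. It assumes raw-byte MD5 hashing as in DVC 3.x and uses a naive line-based parse of the pointer. The function names and the returned status strings are illustrative, not DVC's own.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of the file's raw bytes."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_pointer(dvc_file: Path) -> dict:
    """Naively read the non-empty key: value pairs of a single-output .dvc file."""
    fields = {}
    for line in dvc_file.read_text().splitlines():
        key, sep, value = line.strip().lstrip("- ").partition(":")
        if sep and value.strip():
            fields[key] = value.strip()
    return fields


def status(dvc_file: Path) -> str:
    """Return 'deleted', 'modified' or 'up to date' for the tracked file."""
    fields = read_pointer(dvc_file)
    target = dvc_file.parent / fields["path"]  # path is relative to the .dvc file
    if not target.exists():
        return "deleted"
    return "up to date" if md5_of_file(target) == fields["md5"] else "modified"
```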

!dvc add data/Iris.csv

To track the changes with git, run:

	git add data/Iris.csv.dvc

To enable auto staging, run:

	dvc config core.autostage true

!git add data/Iris.csv.dvc
!git commit -m "Removed last line from Iris dataset"
[master 9de24e1] Removed last line from Iris dataset
 1 file changed, 2 insertions(+), 2 deletions(-)
!wc -l .dvc/cache/files/md5/*/*
  151 .dvc/cache/files/md5/71/7820ef0af287ff346c5cabfb4c612c
  150 .dvc/cache/files/md5/bc/cff2e578d76852294184c1dce9fdbf
  301 total

dvc checkout

  • We use dvc checkout together with git checkout, e.g. when switching to another branch.
  • DVC then replaces the versions of the files it tracks with those from the other branch (provided the files differ, and therefore the corresponding *.dvc files on the two branches differ).
  • Switching branches with git (possibly) changes the *.dvc files, and dvc checkout then copies/links the files from .dvc/cache whose hashes match those in the *.dvc files.
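The mechanism described above can be sketched for a single-output `.dvc` file as follows. This is a simplified model. It assumes the DVC 3.x cache layout and a copy-based cache (DVC can also use reflinks, hardlinks or symlinks). The names are hypothetical, and the pointer is parsed naively.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of the file's raw bytes."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_pointer(dvc_file: Path) -> tuple:
    """Return (md5, relative path) from a single-output .dvc file (naive parse)."""
    md5 = rel_path = None
    for line in dvc_file.read_text().splitlines():
        key, _, value = line.strip().lstrip("- ").partition(":")
        if key == "md5":
            md5 = value.strip()
        elif key == "path":
            rel_path = value.strip()
    return md5, rel_path


def checkout(dvc_file: Path, cache_dir: Path) -> bool:
    """Restore the workspace file recorded in dvc_file from the cache.

    Returns True if the file was (re)written, False if it already matched.
    """
    md5, rel_path = read_pointer(dvc_file)
    target = dvc_file.parent / rel_path
    if target.exists() and md5_of_file(target) == md5:
        return False  # the workspace already holds this version
    source = cache_dir / "files" / "md5" / md5[:2] / md5[2:]
    tmp = target.with_name(target.name + ".tmp")
    shutil.copyfile(source, tmp)  # copies content only, so the workspace file stays writable
    tmp.replace(target)  # swap the new version in with a single rename
    return True
```

In the notebook's terms, suppose `git checkout` brings back an older `data/Iris.csv.dvc`. A call like `checkout(Path("data/Iris.csv.dvc"), Path(".dvc/cache"))` would then restore the 151-line version, because both objects are still in the cache (see the `wc -l` output above).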

Sharing data between projects

  • with the dvc import and dvc update commands we can bring files tracked by DVC in another repository into our project, and later update them
!dvc import https://github.com/iterative/dataset-registry \
             get-started/data.xml -o data/data.xml
Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'
 41%|████      |get-started/data.xml      5.59M/13.8M [00:08<00:18,     455kB/s]
 41%|████      |get-started/data.xml      5.64M/13.8M [00:08<00:19,     442kB/s]
 41%|████▏     |get-started/data.xml      5.69M/13.8M [00:08<00:19,     434kB/s]
 42%|████▏     |get-started/data.xml      5.74M/13.8M [00:08<00:19,     426kB/s]
 42%|████▏     |get-started/data.xml      5.79M/13.8M [00:08<00:19,     422kB/s]
 43%|████▎     |get-started/data.xml      5.86M/13.8M [00:08<00:18,     458kB/s]
 43%|████▎     |get-started/data.xml      5.91M/13.8M [00:08<00:18,     447kB/s]
 43%|████▎     |get-started/data.xml      5.96M/13.8M [00:09<00:19,     431kB/s]
 44%|████▎     |get-started/data.xml      6.02M/13.8M [00:09<00:17,     464kB/s]
 44%|████▍     |get-started/data.xml      6.06M/13.8M [00:09<00:19,     412kB/s]
 44%|████▍     |get-started/data.xml      6.13M/13.8M [00:09<00:17,     455kB/s]
 45%|████▍     |get-started/data.xml      6.16M/13.8M [00:09<00:19,     404kB/s]
 45%|████▌     |get-started/data.xml      6.24M/13.8M [00:09<00:16,     492kB/s]
 46%|████▌     |get-started/data.xml      6.29M/13.8M [00:09<00:16,     470kB/s]
 46%|████▌     |get-started/data.xml      6.34M/13.8M [00:09<00:17,     455kB/s]
 47%|████▋     |get-started/data.xml      6.41M/13.8M [00:10<00:15,     486kB/s]
 47%|████▋     |get-started/data.xml      6.47M/13.8M [00:10<00:15,     507kB/s]
 47%|████▋     |get-started/data.xml      6.52M/13.8M [00:10<00:15,     483kB/s]
 48%|████▊     |get-started/data.xml      6.57M/13.8M [00:10<00:15,     486kB/s]
 48%|████▊     |get-started/data.xml      6.64M/13.8M [00:10<00:15,     488kB/s]
 49%|████▊     |get-started/data.xml      6.71M/13.8M [00:10<00:14,     509kB/s]
 49%|████▉     |get-started/data.xml      6.77M/13.8M [00:10<00:14,     523kB/s]
 50%|████▉     |get-started/data.xml      6.86M/13.8M [00:10<00:12,     576kB/s]
 50%|█████     |get-started/data.xml      6.92M/13.8M [00:11<00:12,     569kB/s]
 51%|█████     |get-started/data.xml      7.01M/13.8M [00:11<00:11,     607kB/s]
 51%|█████▏    |get-started/data.xml      7.07M/13.8M [00:11<00:11,     592kB/s]
 52%|█████▏    |get-started/data.xml      7.14M/13.8M [00:11<00:11,     582kB/s]
 52%|█████▏    |get-started/data.xml      7.20M/13.8M [00:11<00:12,     574kB/s]
 53%|█████▎    |get-started/data.xml      7.25M/13.8M [00:11<00:12,     528kB/s]
 53%|█████▎    |get-started/data.xml      7.32M/13.8M [00:11<00:12,     537kB/s]
 54%|█████▎    |get-started/data.xml      7.40M/13.8M [00:11<00:11,     585kB/s]
 54%|█████▍    |get-started/data.xml      7.50M/13.8M [00:12<00:09,     658kB/s]
 55%|█████▍    |get-started/data.xml      7.57M/13.8M [00:12<00:10,     629kB/s]
 56%|█████▌    |get-started/data.xml      7.65M/13.8M [00:12<00:09,     651kB/s]
 56%|█████▌    |get-started/data.xml      7.74M/13.8M [00:12<00:09,     667kB/s]
 57%|█████▋    |get-started/data.xml      7.80M/13.8M [00:12<00:09,     637kB/s]
 57%|█████▋    |get-started/data.xml      7.90M/13.8M [00:12<00:08,     698kB/s]
 58%|█████▊    |get-started/data.xml      8.00M/13.8M [00:12<00:08,     739kB/s]
 59%|█████▉    |get-started/data.xml      8.10M/13.8M [00:12<00:07,     765kB/s]
 60%|█████▉    |get-started/data.xml      8.20M/13.8M [00:13<00:07,     791kB/s]
 60%|██████    |get-started/data.xml      8.33M/13.8M [00:13<00:06,     889kB/s]
 61%|██████▏   |get-started/data.xml      8.45M/13.8M [00:13<00:06,     901kB/s]
 62%|██████▏   |get-started/data.xml      8.55M/13.8M [00:13<00:06,     893kB/s]
 63%|██████▎   |get-started/data.xml      8.70M/13.8M [00:13<00:05,     987kB/s]
 64%|██████▍   |get-started/data.xml      8.81M/13.8M [00:13<00:05,    1.00MB/s]
 65%|██████▌   |get-started/data.xml      8.96M/13.8M [00:13<00:04,    1.05MB/s]
 66%|██████▌   |get-started/data.xml      9.11M/13.8M [00:13<00:04,    1.12MB/s]
 67%|██████▋   |get-started/data.xml      9.26M/13.8M [00:14<00:04,    1.16MB/s]
 68%|██████▊   |get-started/data.xml      9.43M/13.8M [00:14<00:03,    1.24MB/s]
 70%|██████▉   |get-started/data.xml      9.60M/13.8M [00:14<00:03,    1.29MB/s]
 71%|███████   |get-started/data.xml      9.76M/13.8M [00:14<00:03,    1.36MB/s]
 72%|███████▏  |get-started/data.xml      9.94M/13.8M [00:14<00:02,    1.42MB/s]
 74%|███████▎  |get-started/data.xml      10.1M/13.8M [00:14<00:02,    1.45MB/s]
 75%|███████▌  |get-started/data.xml      10.3M/13.8M [00:14<00:02,    1.53MB/s]
 77%|███████▋  |get-started/data.xml      10.6M/13.8M [00:14<00:02,    1.61MB/s]
 78%|███████▊  |get-started/data.xml      10.8M/13.8M [00:15<00:01,    1.68MB/s]
 80%|███████▉  |get-started/data.xml      11.0M/13.8M [00:15<00:01,    1.77MB/s]
 82%|████████▏ |get-started/data.xml      11.2M/13.8M [00:15<00:01,    1.89MB/s]
 83%|████████▎ |get-started/data.xml      11.5M/13.8M [00:15<00:01,    1.96MB/s]
 85%|████████▌ |get-started/data.xml      11.8M/13.8M [00:15<00:01,    1.98MB/s]
 87%|████████▋ |get-started/data.xml      12.0M/13.8M [00:15<00:00,    2.13MB/s]
 89%|████████▉ |get-started/data.xml      12.3M/13.8M [00:15<00:00,    2.20MB/s]
 91%|█████████▏|get-started/data.xml      12.6M/13.8M [00:15<00:00,    2.30MB/s]
 94%|█████████▎|get-started/data.xml      12.9M/13.8M [00:15<00:00,    2.37MB/s]
 96%|█████████▌|get-started/data.xml      13.2M/13.8M [00:16<00:00,    2.50MB/s]
 97%|█████████▋|get-started/data.xml      13.4M/13.8M [00:16<00:00,    2.01MB/s]
 98%|█████████▊|get-started/data.xml      13.5M/13.8M [00:16<00:00,    1.73MB/s]
100%|██████████|get-started/data.xml      13.8M/13.8M [00:16<00:00,    1.81MB/s]
                                                                                
To track the changes with git, run:

	git add data/data.xml.dvc data/.gitignore

To enable auto staging, run:

	dvc config core.autostage true

!dvc status
Data and pipelines are up to date.                                              

ls -l data
total 14124
-rw-r--r-- 1 pawel pawel     5072 May 22 07:57 Iris.csv
-rw-r--r-- 1 pawel pawel       88 May 22 07:57 Iris.csv.dvc
-rw-r--r-- 1 pawel pawel 14445097 May 22 07:59 data.xml
-rw-r--r-- 1 pawel pawel      296 May 22 07:59 data.xml.dvc
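The .dvc files listed above are small text pointers (88 and 296 bytes) that git tracks in place of the data. In DVC 3.x, each one records the MD5 hash and the size of the tracked output, among other fields. Conceptually, dvc status reports a change when the workspace file no longer matches that record. The sketch below illustrates the comparison in plain Python. It is not DVC's implementation. It assumes a .dvc file with a single entry under outs:, and it uses a naive line parser instead of a YAML library.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file's contents, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_dvc_out(dvc_file):
    """Naively read the key/value pairs of the single entry under `outs:`.

    Assumed layout (simplified DVC 3.x .dvc file):
        outs:
        - md5: <32 hex chars>
          size: <bytes>
          hash: md5
          path: Iris.csv
    A robust version should use a real YAML parser.
    """
    fields = {}
    in_outs = False
    for line in Path(dvc_file).read_text().splitlines():
        if line.startswith("outs:"):
            in_outs = True
            continue
        if in_outs:
            if line and not line.startswith((" ", "-")):
                break  # the next top-level key ends the outs section
            key, sep, value = line.lstrip("- ").partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


def is_modified(dvc_file):
    """True if the workspace file differs from what the .dvc file records."""
    out = read_dvc_out(dvc_file)
    # Paths in a .dvc file are relative to the directory containing it.
    data_path = Path(dvc_file).parent / out["path"]
    if not data_path.exists():
        return True
    if data_path.stat().st_size != int(out["size"]):
        return True  # cheap size check before hashing
    return file_md5(data_path) != out["md5"]
```

For example, run from the project root, is_modified("data/Iris.csv.dvc") should return False right after dvc add and True after the CSV file is edited.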

DVC pipelines

  • Introduction: https://youtu.be/71IGzyH95UY
  • Getting started: https://dvc.org/doc/start/data-pipelines
  • DVC pipelines let you build (with the dvc stage add command; older DVC versions used dvc run, which is no longer available in DVC 3) or define (by editing the dvc.yaml file) a dependency graph between the steps performed in a project (such as "data preparation", "training", "evaluation").
  • A pipeline defined this way can then be run with the dvc repro command.
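To illustrate what "dependency graph" means here: in dvc.yaml, each stage lists a cmd, its deps and its outs. A stage depends on another stage when one of its deps is one of that stage's outs. The sketch below models a hypothetical three-stage pipeline as a Python dict with that shape; the stage names, scripts and file paths are made up for illustration. It derives an execution order with graphlib.TopologicalSorter (Python 3.9+). This is a simplified model of the ordering dvc repro needs, not DVC's own code. In particular, it ignores caching, so it does not show how DVC skips stages whose dependencies are unchanged.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline mirroring the shape of the `stages:` section of dvc.yaml.
STAGES = {
    "prepare": {
        "cmd": "python prepare.py",
        "deps": ["prepare.py", "data/Iris.csv"],
        "outs": ["data/train.csv", "data/test.csv"],
    },
    "train": {
        "cmd": "python train.py",
        "deps": ["train.py", "data/train.csv"],
        "outs": ["model.pkl"],
    },
    "evaluate": {
        "cmd": "python evaluate.py",
        "deps": ["evaluate.py", "model.pkl", "data/test.csv"],
        "outs": ["metrics.json"],
    },
}


def stage_graph(stages):
    """Map each stage to the set of stages whose outputs it depends on."""
    producer = {out: name for name, st in stages.items() for out in st["outs"]}
    return {
        name: {producer[dep] for dep in st["deps"] if dep in producer}
        for name, st in stages.items()
    }


def run_order(stages):
    """An execution order in which every stage comes after its upstream stages.

    static_order() raises graphlib.CycleError if the stages form a cycle.
    """
    return list(TopologicalSorter(stage_graph(stages)).static_order())


print(run_order(STAGES))  # ['prepare', 'train', 'evaluate']
```

In dvc.yaml, the same pipeline would use the keys stages, cmd, deps and outs. The command dvc stage add -n <name> -d <dep> -o <out> <command> writes such an entry for you.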

Tasks [5 pts + 10 bonus pts]

Deadline: 29 May 2024

  1. Initialize a DVC repository inside your project repository [1 pt]
  2. Add the data file(s) of your project to DVC [1 pt]
  3. Configure a remote (the configuration details are given below) [3 pts]
  4. [Bonus] Create/define a dvc.yaml file describing the steps performed in your project and add it to the repository. Separate at least 2 steps (e.g. data preparation/training) linked by a dependency (use the "Getting started" materials linked above) [10 pts (optional)]
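Each stage in dvc.yaml runs an ordinary script that reads its deps and writes its outs. Below is a minimal sketch of a hypothetical data-preparation stage for task 4. The function name, file names and the 80/20 split ratio are assumptions, not part of the assignment. The script splits a CSV file into a training set and a test set. A fixed random seed makes the outputs identical on every run. Without it, a re-run would produce different output hashes and make the downstream stages look out of date.

```python
import csv
import random
from pathlib import Path


def split_csv(src, train_path, test_path, test_ratio=0.2, seed=42):
    """Split the rows of `src` into train/test CSV files (header kept in both).

    Returns (number of training rows, number of test rows).
    """
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    rng = random.Random(seed)  # fixed seed -> the same split on every run
    rng.shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    splits = {test_path: rows[:n_test], train_path: rows[n_test:]}

    for path, part in splits.items():
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(splits[train_path]), len(splits[test_path])
```

Called as split_csv("data/Iris.csv", "data/train.csv", "data/test.csv"), it would correspond to a prepare stage with data/Iris.csv as a dependency and the two split files as outputs.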

SSH remote

One of the remote types supported by DVC is SFTP/SSH. To make it available, an ium-sftp user was created on the tzietkiewicz.vm.wmi.amu.edu.pl server and an SFTP server was configured there. An SSH key was also generated for this user and added as a "Jenkins credential" (see the description of the Jenkins configuration below).

Locally

We will need some additional dependencies (details):

conda install -c conda-forge dvc-ssh

or

pip install dvc[ssh] paramiko

# conda install -c conda-forge dvc-ssh

!pip install dvc[ssh] paramiko
Requirement already satisfied: dvc[ssh] in /home/pawel/ium/venv/lib/python3.10/site-packages (3.50.2)
Collecting paramiko
  Downloading paramiko-3.4.0-py3-none-any.whl (225 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 225.9/225.9 KB 4.1 MB/s eta 0:00:00
Requirement already satisfied: ruamel.yaml>=0.17.11 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.18.6)
Requirement already satisfied: fsspec in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2024.5.0)
Requirement already satisfied: dvc-studio-client<1,>=0.20 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.20.0)
Requirement already satisfied: tomlkit>=0.11.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.12.5)
Requirement already satisfied: dvc-objects in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.1.0)
Requirement already satisfied: distro>=1.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.9.0)
Requirement already satisfied: pygtrie>=2.3.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.5.0)
Requirement already satisfied: voluptuous>=0.11.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.14.2)
Requirement already satisfied: attrs>=22.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (23.2.0)
Requirement already satisfied: dvc-http>=2.29.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.32.0)
Requirement already satisfied: rich>=12 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (13.7.1)
Requirement already satisfied: dulwich in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.22.1)
Requirement already satisfied: pyparsing>=2.4.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.1.2)
Requirement already satisfied: shortuuid>=0.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.0.13)
Requirement already satisfied: flufl.lock<8,>=5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (7.1.1)
Requirement already satisfied: kombu in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.3.7)
Requirement already satisfied: iterative-telemetry>=0.0.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.0.8)
Requirement already satisfied: dpath<3,>=2.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.1.6)
Requirement already satisfied: colorama>=0.3.9 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.6)
Requirement already satisfied: celery in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.4.0)
Requirement already satisfied: packaging>=19 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (24.0)
Requirement already satisfied: tabulate>=0.8.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.9.0)
Requirement already satisfied: shtab<2,>=1.3.4 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.7.1)
Requirement already satisfied: scmrepo<4,>=3.3.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.3.5)
Requirement already satisfied: dvc-render<2,>=1.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.0.2)
Requirement already satisfied: gto<2,>=1.6.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.7.1)
Requirement already satisfied: pydot>=1.2.4 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.0.0)
Requirement already satisfied: psutil>=5.8 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.9.8)
Requirement already satisfied: configobj>=5.0.6 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.0.8)
Requirement already satisfied: funcy>=1.14 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.0)
Requirement already satisfied: grandalf<1,>=0.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.8)
Requirement already satisfied: dvc-task<1,>=0.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.0)
Requirement already satisfied: requests>=2.22 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.31.0)
Requirement already satisfied: zc.lockfile>=1.2.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.0.post1)
Requirement already satisfied: flatten-dict<1,>=0.4.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.2)
Requirement already satisfied: networkx>=2.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.3)
Requirement already satisfied: pathspec>=0.10.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.12.1)
Requirement already satisfied: hydra-core>=1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.3.2)
Requirement already satisfied: omegaconf in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.3.0)
Requirement already satisfied: tqdm<5,>=4.63.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (4.66.2)
Requirement already satisfied: dvc-data<3.16,>=3.15 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.15.1)
Requirement already satisfied: platformdirs<4,>=3.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.11.0)
Collecting dvc-ssh<5,>=4
  Downloading dvc_ssh-4.1.1-py3-none-any.whl (15 kB)
Collecting bcrypt>=3.2
  Downloading bcrypt-4.1.3-cp39-abi3-manylinux_2_28_x86_64.whl (283 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 283.7/283.7 KB 12.6 MB/s eta 0:00:00
Collecting pynacl>=1.5
  Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 856.7/856.7 KB 14.2 MB/s eta 0:00:00
Requirement already satisfied: cryptography>=3.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from paramiko) (42.0.7)
Requirement already satisfied: six in /home/pawel/ium/venv/lib/python3.10/site-packages (from configobj>=5.0.6->dvc[ssh]) (1.16.0)
Requirement already satisfied: cffi>=1.12 in /home/pawel/ium/venv/lib/python3.10/site-packages (from cryptography>=3.3->paramiko) (1.16.0)
Requirement already satisfied: diskcache>=5.2.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (5.6.3)
Requirement already satisfied: dictdiffer>=0.8.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (0.9.0)
Requirement already satisfied: sqltrie<1,>=0.11.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (0.11.0)
Requirement already satisfied: aiohttp-retry>=2.5.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-http>=2.29.0->dvc[ssh]) (2.8.3)
Collecting sshfs[bcrypt]>=2023.4.1
  Downloading sshfs-2024.4.1-py3-none-any.whl (15 kB)
Requirement already satisfied: billiard<5.0,>=4.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (4.2.0)
Requirement already satisfied: tzdata>=2022.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (2024.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (2.9.0.post0)
Requirement already satisfied: vine<6.0,>=5.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (5.1.0)
Requirement already satisfied: click-plugins>=1.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (1.1.1)
Requirement already satisfied: click-repl>=0.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (0.3.0)
Requirement already satisfied: click<9.0,>=8.1.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (8.1.7)
Requirement already satisfied: click-didyoumean>=0.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (0.3.1)
Requirement already satisfied: atpublic>=2.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from flufl.lock<8,>=5->dvc[ssh]) (4.1.0)
Requirement already satisfied: semver>=2.13.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (3.0.2)
Requirement already satisfied: pydantic!=2.0.0,<3,>=1.9.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (2.7.1)
Requirement already satisfied: entrypoints in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (0.4)
Requirement already satisfied: typer>=0.4.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (0.12.3)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /home/pawel/ium/venv/lib/python3.10/site-packages (from hydra-core>=1.1->dvc[ssh]) (4.9.3)
Requirement already satisfied: filelock in /home/pawel/ium/venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc[ssh]) (3.14.0)
Requirement already satisfied: appdirs in /home/pawel/ium/venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc[ssh]) (1.4.4)
Requirement already satisfied: amqp<6.0.0,>=5.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from kombu->dvc[ssh]) (5.2.0)
Requirement already satisfied: PyYAML>=5.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from omegaconf->dvc[ssh]) (6.0.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (2024.2.2)
Requirement already satisfied: idna<4,>=2.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (3.6)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (3.3.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (2.2.1)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from rich>=12->dvc[ssh]) (2.17.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from rich>=12->dvc[ssh]) (3.0.0)
Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from ruamel.yaml>=0.17.11->dvc[ssh]) (0.2.8)
Requirement already satisfied: pygit2>=1.14.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (1.15.0)
Requirement already satisfied: gitpython>3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (3.1.43)
Requirement already satisfied: asyncssh<3,>=2.13.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (2.14.2)
Requirement already satisfied: setuptools in /home/pawel/ium/venv/lib/python3.10/site-packages (from zc.lockfile>=1.2.1->dvc[ssh]) (59.6.0)
Requirement already satisfied: aiohttp in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (3.9.5)
Requirement already satisfied: typing-extensions>=3.6 in /home/pawel/ium/venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc[ssh]) (4.11.0)
Requirement already satisfied: pycparser in /home/pawel/ium/venv/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=3.3->paramiko) (2.22)
Requirement already satisfied: prompt-toolkit>=3.0.36 in /home/pawel/ium/venv/lib/python3.10/site-packages (from click-repl>=0.2.0->celery->dvc[ssh]) (3.0.43)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gitpython>3->scmrepo<4,>=3.3.2->dvc[ssh]) (4.0.11)
Requirement already satisfied: mdurl~=0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12->dvc[ssh]) (0.1.2)
Requirement already satisfied: pydantic-core==2.18.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc[ssh]) (2.18.2)
Requirement already satisfied: annotated-types>=0.4.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc[ssh]) (0.7.0)
Requirement already satisfied: orjson in /home/pawel/ium/venv/lib/python3.10/site-packages (from sqltrie<1,>=0.11.0->dvc-data<3.16,>=3.15->dvc[ssh]) (3.10.3)
Requirement already satisfied: shellingham>=1.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from typer>=0.4.1->gto<2,>=1.6.0->dvc[ssh]) (1.5.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (4.0.3)
Requirement already satisfied: aiosignal>=1.1.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.3.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.9.4)
Requirement already satisfied: frozenlist>=1.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.4.1)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython>3->scmrepo<4,>=3.3.2->dvc[ssh]) (5.0.1)
Requirement already satisfied: wcwidth in /home/pawel/ium/venv/lib/python3.10/site-packages (from prompt-toolkit>=3.0.36->click-repl>=0.2.0->celery->dvc[ssh]) (0.2.13)
Installing collected packages: bcrypt, pynacl, paramiko, sshfs, dvc-ssh
Successfully installed bcrypt-4.1.3 dvc-ssh-4.1.1 paramiko-3.4.0 pynacl-1.5.0 sshfs-2024.4.1
## The packages below are needed for the dvc remote commands to work:
!sudo apt install libssl3 libffi7
[sudo] password for pawel: 

Add the remote (-f overwrites an existing remote with the same name, -d makes it the default remote):

!dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl
Setting 'ium_ssh_remote' as a default remote.
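The remote URL has the usual ssh://user@host[:port]/path form. The sketch below uses the standard library's urllib.parse.urlsplit to split it into its components, which is a quick way to check what DVC will receive. It is a plain-Python illustration, not a DVC API. Note that the URL used here has an empty path, while the Jenkins example below uses /ium-sftp.

```python
from urllib.parse import urlsplit


def describe_remote(url):
    """Return the components of an ssh:// remote URL (port is None if not given)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


for url in (
    "ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl",
    "ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp",
):
    print(describe_remote(url))
```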

!dvc remote list
my_local_remote	/home/pawel/dvcstore
ium_ssh_remote	ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl


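Save the password. With --local, it is written to .dvc/config.local, which DVC keeps out of git, so the password is not committed to the repository: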

!dvc remote modify --local ium_ssh_remote password IUM@2021
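Typing the password literally in a notebook cell leaves it in the cell source. A minimal alternative is to read the password from an environment variable and pass it to dvc remote modify --local as a separate argument. This mirrors what the Jenkins example below does with $IUM_SFTP_PASS. The sketch assumes the variable is named IUM_SFTP_PASS and that dvc is on PATH. The helper names password_command and set_remote_password are hypothetical.

```python
import os
import subprocess


def password_command(remote, env_var="IUM_SFTP_PASS"):
    """Build the `dvc remote modify --local` argv, taking the password from the environment."""
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return ["dvc", "remote", "modify", "--local", remote, "password", password]


def set_remote_password(remote, env_var="IUM_SFTP_PASS"):
    # Passing a list (no shell=True) avoids shell quoting problems with special characters.
    subprocess.run(password_command(remote, env_var), check=True)
```

In a notebook, set_remote_password("ium_ssh_remote") would then replace the cell above, provided the variable was exported before Jupyter was started.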


Push the data to the configured remote:

!dvc push
Collecting                                            |1.00 [00:00,  252entry/s]
Pushing
1 file pushed
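The push output above mentions files/md5. In DVC 3.x, the local cache (and, in the same way, a remote) stores each file under a path derived from its MD5 hash. The first two hex characters form a directory name and the remaining 30 form the file name. The sketch below reproduces that mapping for illustration only. It describes the current layout rather than a stable API; DVC 2 used a different layout without the files/md5 prefix.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(md5_hex, cache_dir=".dvc/cache"):
    """Location of an object in a DVC 3.x-style cache: files/md5/<2 chars>/<30 chars>."""
    if len(md5_hex) != 32:
        raise ValueError("expected a 32-character MD5 hex digest")
    return PurePosixPath(cache_dir, "files", "md5", md5_hex[:2], md5_hex[2:])


def cache_path_for_bytes(data, cache_dir=".dvc/cache"):
    """Cache location for a file whose contents are `data`."""
    return cache_path(hashlib.md5(data).hexdigest(), cache_dir)


print(cache_path_for_bytes(b"hello"))
# .dvc/cache/files/md5/5d/41402abc4b2a76b9719d911017c592
```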


Jenkins

In Jenkins, the "Credentials" mechanism can be used to pass a password or a private key securely.

Such credentials for the ium-sftp user have been created in Jenkins:

Using "Credentials" in a Jenkinsfile is described here: https://www.jenkins.io/doc/book/pipeline/jenkinsfile/#for-other-credential-types

The SSH key can be used like this:

withCredentials([sshUserPrivateKey(credentialsId: '48ac7004-216e-4260-abba-1fe5db753e18',
                                   keyFileVariable: 'IUM_SFTP_KEY',
                                   passphraseVariable: '', usernameVariable: '')]) {
    // -f overwrites the remote if it is already defined in the committed .dvc/config
    sh 'dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'
    // single quotes: the shell, not Groovy, expands $IUM_SFTP_KEY
    sh 'dvc remote modify --local ium_ssh_remote keyfile $IUM_SFTP_KEY'
    sh 'dvc pull'
}

A secret text (password) can be used like this:

withCredentials([string(credentialsId: 'ium-sftp-password', variable: 'IUM_SFTP_PASS')]) {
    sh 'dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'
    sh 'dvc remote modify --local ium_ssh_remote password $IUM_SFTP_PASS'
    sh 'dvc pull'
}

Example configuration: