{
"cells": [
{
"cell_type": "markdown",
"id": "7fe475ae",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Machine Learning Engineering\n",
"### 22 May 2024\n",
"# 10. DVC"
]
},
{
"cell_type": "markdown",
"id": "560eec71",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## DVC - Data Version Control\n",
"- [dvc.org](https://dvc.org/)\n",
"- \"Version Control System for Machine Learning Projects\"\n",
"- Open source\n",
"- It lets you:\n",
"    - version data and models (\"Git for data and models\")\n",
"    - build pipelines that define how models are built, trained, and evaluated (\"Makefile for machine learning\")\n",
"    - track and compare metrics and parameters\n",
"- Tightly integrated with Git\n",
"- Works independently of the programming language, libraries, and operating system in use\n",
"- 5-minute introduction: https://www.youtube.com/watch?v=UbL7VUpv1Bs"
]
},
{
"cell_type": "markdown",
"id": "3d4ce1cb",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Tracking files with DVC\n",
" - Large files, such as input datasets or model files, are hard to manage with Git because of problems with:\n",
"     - performance\n",
"     - repository size\n",
"     - limits imposed by the hosting service (e.g. the [100 MB per-file limit on GitHub](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github))\n",
" - Git has an extension, [LFS (Large File Storage)](https://git-lfs.github.com/), which partly solves this problem.\n",
"     - The files themselves are stored on a dedicated remote server; the repository holds only pointers to them and some metadata\n",
"     - GitHub has built-in LFS support with a [1 GB limit for free accounts](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage)"
]
},
{
"cell_type": "markdown",
"id": "dd8e529b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" - DVC takes an approach similar to LFS, but:\n",
"     - files can be stored on almost any server, or locally\n",
"     - there is no file size limit (Git LFS on GitHub has a [2 GB limit](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage))\n",
"     - DVC provides additional tooling for tracking files and their links to experiment results\n",
"     - for more, see [the DVC documentation](https://dvc.org/doc/user-guide/related-technologies)"
]
},
{
"cell_type": "markdown",
"id": "9bfb356e",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Installation and initialization\n",
" - https://dvc.org/doc/install\n",
" - ```pip install dvc```\n",
" - ```pipx install dvc```\n",
" - ```conda install -c conda-forge dvc```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "054c7a11",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: dvc in ./venv/lib/python3.10/site-packages (3.50.2)\n",
"Requirement already satisfied: attrs>=22.2.0 in ./venv/lib/python3.10/site-packages (from dvc) (23.2.0)\n",
"Requirement already satisfied: psutil>=5.8 in ./venv/lib/python3.10/site-packages (from dvc) (5.9.8)\n",
"Requirement already satisfied: zc.lockfile>=1.2.1 in ./venv/lib/python3.10/site-packages (from dvc) (3.0.post1)\n",
"Requirement already satisfied: ruamel.yaml>=0.17.11 in ./venv/lib/python3.10/site-packages (from dvc) (0.18.6)\n",
"Requirement already satisfied: dvc-http>=2.29.0 in ./venv/lib/python3.10/site-packages (from dvc) (2.32.0)\n",
"Requirement already satisfied: shortuuid>=0.5 in ./venv/lib/python3.10/site-packages (from dvc) (1.0.13)\n",
"Requirement already satisfied: platformdirs<4,>=3.1.1 in ./venv/lib/python3.10/site-packages (from dvc) (3.11.0)\n",
"Requirement already satisfied: scmrepo<4,>=3.3.2 in ./venv/lib/python3.10/site-packages (from dvc) (3.3.5)\n",
"Requirement already satisfied: pygtrie>=2.3.2 in ./venv/lib/python3.10/site-packages (from dvc) (2.5.0)\n",
"Requirement already satisfied: dvc-data<3.16,>=3.15 in ./venv/lib/python3.10/site-packages (from dvc) (3.15.1)\n",
"Requirement already satisfied: fsspec in ./venv/lib/python3.10/site-packages (from dvc) (2024.5.0)\n",
"Requirement already satisfied: dvc-objects in ./venv/lib/python3.10/site-packages (from dvc) (5.1.0)\n",
"Requirement already satisfied: grandalf<1,>=0.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.8)\n",
"Requirement already satisfied: hydra-core>=1.1 in ./venv/lib/python3.10/site-packages (from dvc) (1.3.2)\n",
"Requirement already satisfied: tabulate>=0.8.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.9.0)\n",
"Requirement already satisfied: colorama>=0.3.9 in ./venv/lib/python3.10/site-packages (from dvc) (0.4.6)\n",
"Requirement already satisfied: tomlkit>=0.11.1 in ./venv/lib/python3.10/site-packages (from dvc) (0.12.5)\n",
"Requirement already satisfied: rich>=12 in ./venv/lib/python3.10/site-packages (from dvc) (13.7.1)\n",
"Requirement already satisfied: dvc-render<2,>=1.0.1 in ./venv/lib/python3.10/site-packages (from dvc) (1.0.2)\n",
"Requirement already satisfied: flufl.lock<8,>=5 in ./venv/lib/python3.10/site-packages (from dvc) (7.1.1)\n",
"Requirement already satisfied: packaging>=19 in ./venv/lib/python3.10/site-packages (from dvc) (24.0)\n",
"Requirement already satisfied: pyparsing>=2.4.7 in ./venv/lib/python3.10/site-packages (from dvc) (3.1.2)\n",
"Requirement already satisfied: flatten-dict<1,>=0.4.1 in ./venv/lib/python3.10/site-packages (from dvc) (0.4.2)\n",
"Requirement already satisfied: dulwich in ./venv/lib/python3.10/site-packages (from dvc) (0.22.1)\n",
"Requirement already satisfied: configobj>=5.0.6 in ./venv/lib/python3.10/site-packages (from dvc) (5.0.8)\n",
"Requirement already satisfied: dvc-studio-client<1,>=0.20 in ./venv/lib/python3.10/site-packages (from dvc) (0.20.0)\n",
"Requirement already satisfied: omegaconf in ./venv/lib/python3.10/site-packages (from dvc) (2.3.0)\n",
"Requirement already satisfied: distro>=1.3 in ./venv/lib/python3.10/site-packages (from dvc) (1.9.0)\n",
"Requirement already satisfied: dpath<3,>=2.1.0 in ./venv/lib/python3.10/site-packages (from dvc) (2.1.6)\n",
"Requirement already satisfied: funcy>=1.14 in ./venv/lib/python3.10/site-packages (from dvc) (2.0)\n",
"Requirement already satisfied: dvc-task<1,>=0.3.0 in ./venv/lib/python3.10/site-packages (from dvc) (0.4.0)\n",
"Requirement already satisfied: voluptuous>=0.11.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.14.2)\n",
"Requirement already satisfied: kombu in ./venv/lib/python3.10/site-packages (from dvc) (5.3.7)\n",
"Requirement already satisfied: shtab<2,>=1.3.4 in ./venv/lib/python3.10/site-packages (from dvc) (1.7.1)\n",
"Requirement already satisfied: iterative-telemetry>=0.0.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.0.8)\n",
"Requirement already satisfied: celery in ./venv/lib/python3.10/site-packages (from dvc) (5.4.0)\n",
"Requirement already satisfied: pathspec>=0.10.3 in ./venv/lib/python3.10/site-packages (from dvc) (0.12.1)\n",
"Requirement already satisfied: requests>=2.22 in ./venv/lib/python3.10/site-packages (from dvc) (2.31.0)\n",
"Requirement already satisfied: tqdm<5,>=4.63.1 in ./venv/lib/python3.10/site-packages (from dvc) (4.66.2)\n",
"Requirement already satisfied: gto<2,>=1.6.0 in ./venv/lib/python3.10/site-packages (from dvc) (1.7.1)\n",
"Requirement already satisfied: networkx>=2.5 in ./venv/lib/python3.10/site-packages (from dvc) (3.3)\n",
"Requirement already satisfied: pydot>=1.2.4 in ./venv/lib/python3.10/site-packages (from dvc) (2.0.0)\n",
"Requirement already satisfied: six in ./venv/lib/python3.10/site-packages (from configobj>=5.0.6->dvc) (1.16.0)\n",
"Requirement already satisfied: sqltrie<1,>=0.11.0 in ./venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc) (0.11.0)\n",
"Requirement already satisfied: dictdiffer>=0.8.1 in ./venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc) (0.9.0)\n",
"Requirement already satisfied: diskcache>=5.2.1 in ./venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc) (5.6.3)\n",
"Requirement already satisfied: aiohttp-retry>=2.5.0 in ./venv/lib/python3.10/site-packages (from dvc-http>=2.29.0->dvc) (2.8.3)\n",
"Requirement already satisfied: billiard<5.0,>=4.2.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (4.2.0)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in ./venv/lib/python3.10/site-packages (from celery->dvc) (2.9.0.post0)\n",
"Requirement already satisfied: click-plugins>=1.1.1 in ./venv/lib/python3.10/site-packages (from celery->dvc) (1.1.1)\n",
"Requirement already satisfied: click-repl>=0.2.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (0.3.0)\n",
"Requirement already satisfied: vine<6.0,>=5.1.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (5.1.0)\n",
"Requirement already satisfied: tzdata>=2022.7 in ./venv/lib/python3.10/site-packages (from celery->dvc) (2024.1)\n",
"Requirement already satisfied: click<9.0,>=8.1.2 in ./venv/lib/python3.10/site-packages (from celery->dvc) (8.1.7)\n",
"Requirement already satisfied: click-didyoumean>=0.3.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (0.3.1)\n",
"Requirement already satisfied: atpublic>=2.3 in ./venv/lib/python3.10/site-packages (from flufl.lock<8,>=5->dvc) (4.1.0)\n",
"Requirement already satisfied: pydantic!=2.0.0,<3,>=1.9.0 in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (2.7.1)\n",
"Requirement already satisfied: semver>=2.13.0 in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (3.0.2)\n",
"Requirement already satisfied: typer>=0.4.1 in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (0.12.3)\n",
"Requirement already satisfied: entrypoints in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (0.4)\n",
"Requirement already satisfied: antlr4-python3-runtime==4.9.* in ./venv/lib/python3.10/site-packages (from hydra-core>=1.1->dvc) (4.9.3)\n",
"Requirement already satisfied: appdirs in ./venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc) (1.4.4)\n",
"Requirement already satisfied: filelock in ./venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc) (3.14.0)\n",
"Requirement already satisfied: amqp<6.0.0,>=5.1.1 in ./venv/lib/python3.10/site-packages (from kombu->dvc) (5.2.0)\n",
"Requirement already satisfied: PyYAML>=5.1.0 in ./venv/lib/python3.10/site-packages (from omegaconf->dvc) (6.0.1)\n",
"Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (2024.2.2)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (2.2.1)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (3.3.2)\n",
"Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (3.6)\n",
"Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./venv/lib/python3.10/site-packages (from rich>=12->dvc) (2.17.2)\n",
"Requirement already satisfied: markdown-it-py>=2.2.0 in ./venv/lib/python3.10/site-packages (from rich>=12->dvc) (3.0.0)\n",
"Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in ./venv/lib/python3.10/site-packages (from ruamel.yaml>=0.17.11->dvc) (0.2.8)\n",
"Requirement already satisfied: pygit2>=1.14.0 in ./venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc) (1.15.0)\n",
"Requirement already satisfied: asyncssh<3,>=2.13.1 in ./venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc) (2.14.2)\n",
"Requirement already satisfied: gitpython>3 in ./venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc) (3.1.43)\n",
"Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (from zc.lockfile>=1.2.1->dvc) (59.6.0)\n",
"Requirement already satisfied: aiohttp in ./venv/lib/python3.10/site-packages (from aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (3.9.5)\n",
"Requirement already satisfied: typing-extensions>=3.6 in ./venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc) (4.11.0)\n",
"Requirement already satisfied: cryptography>=39.0 in ./venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc) (42.0.7)\n",
"Requirement already satisfied: prompt-toolkit>=3.0.36 in ./venv/lib/python3.10/site-packages (from click-repl>=0.2.0->celery->dvc) (3.0.43)\n",
"Requirement already satisfied: gitdb<5,>=4.0.1 in ./venv/lib/python3.10/site-packages (from gitpython>3->scmrepo<4,>=3.3.2->dvc) (4.0.11)\n",
"Requirement already satisfied: mdurl~=0.1 in ./venv/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12->dvc) (0.1.2)\n",
"Requirement already satisfied: pydantic-core==2.18.2 in ./venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc) (2.18.2)\n",
"Requirement already satisfied: annotated-types>=0.4.0 in ./venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc) (0.7.0)\n",
"Requirement already satisfied: cffi>=1.16.0 in ./venv/lib/python3.10/site-packages (from pygit2>=1.14.0->scmrepo<4,>=3.3.2->dvc) (1.16.0)\n",
"Requirement already satisfied: orjson in ./venv/lib/python3.10/site-packages (from sqltrie<1,>=0.11.0->dvc-data<3.16,>=3.15->dvc) (3.10.3)\n",
"Requirement already satisfied: shellingham>=1.3.0 in ./venv/lib/python3.10/site-packages (from typer>=0.4.1->gto<2,>=1.6.0->dvc) (1.5.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (4.0.3)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.3.1)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.9.4)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (6.0.5)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.4.1)\n",
"Requirement already satisfied: pycparser in ./venv/lib/python3.10/site-packages (from cffi>=1.16.0->pygit2>=1.14.0->scmrepo<4,>=3.3.2->dvc) (2.22)\n",
"Requirement already satisfied: smmap<6,>=3.0.1 in ./venv/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython>3->scmrepo<4,>=3.3.2->dvc) (5.0.1)\n",
"Requirement already satisfied: wcwidth in ./venv/lib/python3.10/site-packages (from prompt-toolkit>=3.0.36->click-repl>=0.2.0->celery->dvc) (0.2.13)\n"
]
}
],
"source": [
"!pip3 install dvc"
]
},
{
"cell_type": "markdown",
"id": "20975d62",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's create a directory to hold our project:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4d94e912",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"!rm -r -f IUM_10/sample-ml-project-2024\n",
"!mkdir -p IUM_10/sample-ml-project-2024"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "aae59ec2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/pawel/ium/IUM_10/sample-ml-project-2024\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/pawel/ium/venv/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.\n",
" self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n"
]
}
],
"source": [
"#Jupyter notebook magic https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd\n",
"%cd \"IUM_10/sample-ml-project-2024\""
]
},
{
"cell_type": "markdown",
"id": "199c0d92",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We initialize an empty Git repository (this step can be skipped when working in an existing repository):"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c13c525b",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reinitialized existing Git repository in /home/pawel/ium/IUM_10/sample-ml-project-2024/.git/\n"
]
}
],
"source": [
"!git init"
]
},
{
"cell_type": "markdown",
"id": "c7155369",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Now we initialize the DVC repository:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "44f28226",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initialized DVC repository.\n",
"\n",
"You can now commit the changes to git.\n",
"\n",
"\u001b[31m+---------------------------------------------------------------------+\n",
"\u001b[0m\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n",
"\u001b[31m|\u001b[0m DVC has enabled anonymous aggregate usage analytics. \u001b[31m|\u001b[0m\n",
"\u001b[31m|\u001b[0m Read the analytics documentation (and how to opt-out) here: \u001b[31m|\u001b[0m\n",
"\u001b[31m|\u001b[0m <\u001b[36mhttps://dvc.org/doc/user-guide/analytics\u001b[39m> \u001b[31m|\u001b[0m\n",
"\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n",
"\u001b[31m+---------------------------------------------------------------------+\n",
"\u001b[0m\n",
"\u001b[33mWhat's next?\u001b[39m\n",
"\u001b[33m------------\u001b[39m\n",
"- Check out the documentation: <\u001b[36mhttps://dvc.org/doc\u001b[39m>\n",
"- Get help and share ideas: <\u001b[36mhttps://dvc.org/chat\u001b[39m>\n",
"- Star us on GitHub: <\u001b[36mhttps://github.com/iterative/dvc\u001b[39m>\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc init"
]
},
{
"cell_type": "markdown",
"id": "00bc72ed",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's see which files DVC created (and staged in the Git repository).\n",
"They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d1aefe16",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"On branch master\n",
"\n",
"No commits yet\n",
"\n",
"Changes to be committed:\n",
" (use \"git rm --cached ...\" to unstage)\n",
"\t\u001b[32mnew file: .dvc/.gitignore\u001b[m\n",
"\t\u001b[32mnew file: .dvc/config\u001b[m\n",
"\t\u001b[32mnew file: .dvcignore\u001b[m\n",
"\n"
]
}
],
"source": [
"!git status"
]
},
{
"cell_type": "markdown",
"id": "b16a62e6",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- `.dvc/config` - the main DVC configuration file\n",
"- `.dvc/config.local` - overrides values from `config`; meant for local changes that are not committed to the repository\n",
"- `.dvc/.gitignore` - lists DVC files that should not end up in the Git repository\n",
"- `.dvcignore` - DVC skips the files listed here (e.g. to improve performance)"
]
},
{
"cell_type": "markdown",
"id": "72e0a272",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We can now commit the changes to Git:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "59780e99",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[master (root-commit) a9746ad] Initial commit\n",
" 3 files changed, 6 insertions(+)\n",
" create mode 100644 .dvc/.gitignore\n",
" create mode 100644 .dvc/config\n",
" create mode 100644 .dvcignore\n"
]
}
],
"source": [
"!git commit -m \"Initial commit\""
]
},
{
"cell_type": "markdown",
"id": "a8861abe",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's prepare some sample data by downloading it from Kaggle:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f05ece1b",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading iris.zip to /home/pawel/ium/IUM_10/sample-ml-project-2024\n",
" 0%| | 0.00/3.60k [00:00, ?B/s]\n",
"100%|██████████████████████████████████████| 3.60k/3.60k [00:00<00:00, 8.23MB/s]\n",
"Archive: iris.zip\n",
" inflating: Iris.csv \n",
" inflating: database.sqlite \n"
]
}
],
"source": [
"!kaggle datasets download -d uciml/iris\n",
"!unzip -o iris.zip\n",
"!rm database.sqlite iris.zip\n",
"!mkdir -p data\n",
"!mv Iris.csv data/"
]
},
{
"cell_type": "markdown",
"id": "adb9a522",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Now we add the data file(s) to DVC:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "74d182c7",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[?25l\u001b[32m⠋\u001b[0m Checking graph core\u001b[39m>\n",
"Adding... \n",
"!\u001b[A\n",
"Collecting files and computing hashes in data/Iris.csv |0.00 [00:00, ?file/s\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0% Checking cache in '/home/pawel/ium/IUM_10/sample-ml-project-2024/.dvc/cache\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0%| |Adding data/Iris.csv to cache 0/1 [00:00, ?file/s]\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0%| |Checking out /home/pawel/ium/IUM_10/sa0/1 [00:00, ?files/s]\u001b[A\n",
"100% Adding...|████████████████████████████████████████|1/1 [00:00, 31.90file/s]\u001b[A\n",
"\n",
"To track the changes with git, run:\n",
"\n",
"\tgit add data/.gitignore data/Iris.csv.dvc\n",
"\n",
"To enable auto staging, run:\n",
"\n",
"\tdvc config core.autostage true\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc add data/Iris.csv"
]
},
{
"cell_type": "markdown",
"id": "72c6b5d0",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" - DVC created the file `data/Iris.csv.dvc` and added the original file to `data/.gitignore`\n",
" - Only the `*.dvc` file, which holds a pointer to the actual file, will be present in the repository"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "74d54652",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"On branch master\n",
"Untracked files:\n",
" (use \"git add ...\" to include in what will be committed)\n",
"\t\u001b[31mdata/.gitignore\u001b[m\n",
"\t\u001b[31mdata/Iris.csv.dvc\u001b[m\n",
"\n",
"nothing added to commit but untracked files present (use \"git add\" to track)\n"
]
}
],
"source": [
"!git status -u"
]
},
{
"cell_type": "markdown",
"id": "8589fecf",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's add the files `data/Iris.csv.dvc` and `data/.gitignore` to the Git repository, as DVC suggests:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "460c4a17",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"!git add data/Iris.csv.dvc data/.gitignore"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "80644077",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[master 92b2c9d] Dodano dane IRIS (DVC)\n",
" 2 files changed, 6 insertions(+)\n",
" create mode 100644 data/.gitignore\n",
" create mode 100644 data/Iris.csv.dvc\n"
]
}
],
"source": [
"!git commit -m \"Dodano dane IRIS (DVC)\""
]
},
{
"cell_type": "markdown",
"id": "03899863",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A `*.dvc` file contains, among other things, the hash of the tracked file. More about `*.dvc` files: [link](https://dvc.org/doc/user-guide/project-structure/dvc-files)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "8cb2ba7c",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# %load data/Iris.csv.dvc\n"
]
},
{
"cell_type": "markdown",
"id": "0b421d45",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The original `Iris.csv` file was moved to the DVC cache, under `.dvc/cache/files/md5/{first two characters of the file hash}/{remaining characters}`, and linked back to its original location. How the links are created [depends on the file system](https://dvc.org/doc/user-guide/large-dataset-optimization)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "1d471f3a",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 8\n",
"-r--r--r-- 1 pawel pawel 5107 Sep 19 2019 7820ef0af287ff346c5cabfb4c612c\n"
]
}
],
"source": [
"!ls -l .dvc/cache/files/md5/71"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "32531aa8",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n",
"1,5.1,3.5,1.4,0.2,Iris-setosa\n",
"2,4.9,3.0,1.4,0.2,Iris-setosa\n"
]
}
],
"source": [
"!head -n 3 .dvc/cache/files/md5/71/7820ef0af287ff346c5cabfb4c612c"
]
},
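{
"cell_type": "markdown",
"id": "a1c3e5f7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The cache path listed above can be derived by hand from a file's MD5 digest. The sketch below is an illustration only, not DVC's own code: it assumes the layout visible in the listing (`files/md5/`, then the first two hex characters of the digest as a directory and the remaining 30 as the file name). The helper name `cache_relpath` is hypothetical.\n",
"\n",
"```python\n",
"import hashlib\n",
"from pathlib import PurePosixPath\n",
"\n",
"\n",
"def cache_relpath(content: bytes) -> PurePosixPath:\n",
"    # Assumed layout: files/md5/<first 2 hex chars>/<remaining 30 hex chars>\n",
"    digest = hashlib.md5(content).hexdigest()  # 32 hex characters\n",
"    return PurePosixPath('files', 'md5', digest[:2], digest[2:])\n",
"\n",
"\n",
"# MD5 of empty input is the well-known d41d8cd98f00b204e9800998ecf8427e\n",
"print(cache_relpath(b''))  # files/md5/d4/1d8cd98f00b204e9800998ecf8427e\n",
"```\n",
"\n",
"Under these assumptions, calling the function on the bytes of `data/Iris.csv` should give the path that appears in `.dvc/cache`."
]
},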
{
"cell_type": "markdown",
"id": "901e8e90",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## dvc remote\n",
" - To send the actual files tracked by DVC to a remote location (from which they can be downloaded, e.g. by a CI system or by other users), such a location has to be configured first.\n",
" - This is done with the [`dvc remote add`](https://dvc.org/doc/command-reference/remote/add) command.\n",
" - We will use a local \"remote\". Here it is simply a previously created directory, `~/dvcstore`. The same directory also exists on our Jenkins server; it has to be mounted into the Docker container, of course.\n",
" - In real-world use we would instead point to a resource accessible over the internet, such as an SFTP server or an AWS S3 path."
]
},
{
"cell_type": "markdown",
"id": "53429521",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Supported remote storage types: https://dvc.org/doc/command-reference/remote/add#supported-storage-types\n",
" - Amazon S3\n",
" - S3-compatible storage\n",
" - Microsoft Azure Blob Storage\n",
" - Google Drive\n",
" - Google Cloud Storage\n",
" - Aliyun OSS\n",
" - SSH\n",
" - HDFS\n",
" - WebHDFS\n",
" - HTTP\n",
" - WebDAV\n",
" - local remote"
]
},
{
"cell_type": "markdown",
"id": "507e3a09",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Adding a local remote"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "a16f2bfa",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setting 'my_local_remote' as a default remote.\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc remote add -d my_local_remote ~/dvcstore"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "9c3deeaf",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"On branch master\n",
"Changes not staged for commit:\n",
" (use \"git add ...\" to update what will be committed)\n",
" (use \"git restore ...\" to discard changes in working directory)\n",
"\t\u001b[31mmodified: .dvc/config\u001b[m\n",
"\n",
"no changes added to commit (use \"git add\" and/or \"git commit -a\")\n"
]
}
],
"source": [
"!git status"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "899eac7d",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[master 7123494] Added DVC remote\n",
" 1 file changed, 1 insertion(+), 1 deletion(-)\n"
]
}
],
"source": [
"!git add .dvc/config\n",
"!git commit -m \"Added DVC remote\""
]
},
{
"cell_type": "markdown",
"id": "8c556c96",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## dvc push\n",
"Once a \"remote\" is configured, we can push files to it with `dvc push`:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "c7f24f75",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting |1.00 [00:00, 137entry/s]\n",
"Pushing\n",
"!\u001b[A\n",
" 0% Checking cache in '/home/pawel/dvcstore/files/md5'| |0/? [00:00, ?file\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0% Checking cache in '/home/pawel/ium/IUM_10/sample-ml-project-2024/.dvc/cache\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0%| |Pushing to local 0/1 [00:00, ?file/s]\u001b[A\n",
"Pushing \u001b[A\n",
"1 file pushed\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc push"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "8a355575",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[01;34m/home/pawel/dvcstore\u001b[0m\n",
"└── \u001b[01;34mfiles\u001b[0m\n",
" └── \u001b[01;34mmd5\u001b[0m\n",
" └── \u001b[01;34m71\u001b[0m\n",
" └── 7820ef0af287ff346c5cabfb4c612c\n",
"\n",
"3 directories, 1 file\n"
]
}
],
"source": [
"!tree ~/dvcstore"
]
},
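{
"cell_type": "markdown",
"id": "b2d4f6a8",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Conceptually, pushing to a local remote amounts to copying content-addressed files from the project cache into the remote directory under the same relative path, which is why the tree above has the same `files/md5/...` layout as `.dvc/cache`. The sketch below illustrates that idea only; it is not how DVC is implemented, it copies the whole cache for simplicity, and the function name `push_to_local_remote` is hypothetical.\n",
"\n",
"```python\n",
"import shutil\n",
"from pathlib import Path\n",
"\n",
"\n",
"def push_to_local_remote(cache_dir: Path, remote_dir: Path) -> int:\n",
"    # Copy every cached file the remote does not have yet; return how many were copied.\n",
"    pushed = 0\n",
"    for src in sorted(cache_dir.rglob('*')):\n",
"        if not src.is_file():\n",
"            continue\n",
"        dst = remote_dir / src.relative_to(cache_dir)\n",
"        if dst.exists():\n",
"            continue  # the path is derived from the hash, so the content is already there\n",
"        dst.parent.mkdir(parents=True, exist_ok=True)\n",
"        shutil.copy2(src, dst)\n",
"        pushed += 1\n",
"    return pushed\n",
"```\n",
"\n",
"Because file names are derived from content hashes, a second call with an unchanged cache copies nothing, which mirrors how a repeated push has no new files to send."
]
},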
{
"cell_type": "markdown",
"id": "af59ecb3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## dvc pull\n",
"To fetch data tracked by DVC (e.g. in another location or as another user), we need to:\n",
" - clone the Git repository (to get, among other things, the `*.dvc` files)\n",
" - run `dvc pull`"
]
},
{
"cell_type": "markdown",
"id": "9fa914a7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Adding new files and modifying existing ones works much like it does for ordinary files tracked by Git, except that we use the `dvc` command instead of `git` and also remember to manage the `*.dvc` files with Git:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "dde39796",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n",
"1,5.1,3.5,1.4,0.2,Iris-setosa\n",
"2,4.9,3.0,1.4,0.2,Iris-setosa\n",
"3,4.7,3.2,1.3,0.2,Iris-setosa\n",
"4,4.6,3.1,1.5,0.2,Iris-setosa\n",
"5,5.0,3.6,1.4,0.2,Iris-setosa\n",
"6,5.4,3.9,1.7,0.4,Iris-setosa\n",
"7,4.6,3.4,1.4,0.3,Iris-setosa\n",
"8,5.0,3.4,1.5,0.2,Iris-setosa\n",
"9,4.4,2.9,1.4,0.2,Iris-setosa\n",
"10,4.9,3.1,1.5,0.1,Iris-setosa\n",
"11,5.4,3.7,1.5,0.2,Iris-setosa\n",
"12,4.8,3.4,1.6,0.2,Iris-setosa\n",
"13,4.8,3.0,1.4,0.1,Iris-setosa\n",
"14,4.3,3.0,1.1,0.1,Iris-setosa\n",
"15,5.8,4.0,1.2,0.2,Iris-setosa\n",
"16,5.7,4.4,1.5,0.4,Iris-setosa\n",
"17,5.4,3.9,1.3,0.4,Iris-setosa\n",
"18,5.1,3.5,1.4,0.3,Iris-setosa\n",
"19,5.7,3.8,1.7,0.3,Iris-setosa\n",
"20,5.1,3.8,1.5,0.3,Iris-setosa\n",
"21,5.4,3.4,1.7,0.2,Iris-setosa\n",
"22,5.1,3.7,1.5,0.4,Iris-setosa\n",
"23,4.6,3.6,1.0,0.2,Iris-setosa\n",
"24,5.1,3.3,1.7,0.5,Iris-setosa\n",
"25,4.8,3.4,1.9,0.2,Iris-setosa\n",
"26,5.0,3.0,1.6,0.2,Iris-setosa\n",
"27,5.0,3.4,1.6,0.4,Iris-setosa\n",
"28,5.2,3.5,1.5,0.2,Iris-setosa\n",
"29,5.2,3.4,1.4,0.2,Iris-setosa\n",
"30,4.7,3.2,1.6,0.2,Iris-setosa\n",
"31,4.8,3.1,1.6,0.2,Iris-setosa\n",
"32,5.4,3.4,1.5,0.4,Iris-setosa\n",
"33,5.2,4.1,1.5,0.1,Iris-setosa\n",
"34,5.5,4.2,1.4,0.2,Iris-setosa\n",
"35,4.9,3.1,1.5,0.1,Iris-setosa\n",
"36,5.0,3.2,1.2,0.2,Iris-setosa\n",
"37,5.5,3.5,1.3,0.2,Iris-setosa\n",
"38,4.9,3.1,1.5,0.1,Iris-setosa\n",
"39,4.4,3.0,1.3,0.2,Iris-setosa\n",
"40,5.1,3.4,1.5,0.2,Iris-setosa\n",
"41,5.0,3.5,1.3,0.3,Iris-setosa\n",
"42,4.5,2.3,1.3,0.3,Iris-setosa\n",
"43,4.4,3.2,1.3,0.2,Iris-setosa\n",
"44,5.0,3.5,1.6,0.6,Iris-setosa\n",
"45,5.1,3.8,1.9,0.4,Iris-setosa\n",
"46,4.8,3.0,1.4,0.3,Iris-setosa\n",
"47,5.1,3.8,1.6,0.2,Iris-setosa\n",
"48,4.6,3.2,1.4,0.2,Iris-setosa\n",
"49,5.3,3.7,1.5,0.2,Iris-setosa\n",
"50,5.0,3.3,1.4,0.2,Iris-setosa\n",
"51,7.0,3.2,4.7,1.4,Iris-versicolor\n",
"52,6.4,3.2,4.5,1.5,Iris-versicolor\n",
"53,6.9,3.1,4.9,1.5,Iris-versicolor\n",
"54,5.5,2.3,4.0,1.3,Iris-versicolor\n",
"55,6.5,2.8,4.6,1.5,Iris-versicolor\n",
"56,5.7,2.8,4.5,1.3,Iris-versicolor\n",
"57,6.3,3.3,4.7,1.6,Iris-versicolor\n",
"58,4.9,2.4,3.3,1.0,Iris-versicolor\n",
"59,6.6,2.9,4.6,1.3,Iris-versicolor\n",
"60,5.2,2.7,3.9,1.4,Iris-versicolor\n",
"61,5.0,2.0,3.5,1.0,Iris-versicolor\n",
"62,5.9,3.0,4.2,1.5,Iris-versicolor\n",
"63,6.0,2.2,4.0,1.0,Iris-versicolor\n",
"64,6.1,2.9,4.7,1.4,Iris-versicolor\n",
"65,5.6,2.9,3.6,1.3,Iris-versicolor\n",
"66,6.7,3.1,4.4,1.4,Iris-versicolor\n",
"67,5.6,3.0,4.5,1.5,Iris-versicolor\n",
"68,5.8,2.7,4.1,1.0,Iris-versicolor\n",
"69,6.2,2.2,4.5,1.5,Iris-versicolor\n",
"70,5.6,2.5,3.9,1.1,Iris-versicolor\n",
"71,5.9,3.2,4.8,1.8,Iris-versicolor\n",
"72,6.1,2.8,4.0,1.3,Iris-versicolor\n",
"73,6.3,2.5,4.9,1.5,Iris-versicolor\n",
"74,6.1,2.8,4.7,1.2,Iris-versicolor\n",
"75,6.4,2.9,4.3,1.3,Iris-versicolor\n",
"76,6.6,3.0,4.4,1.4,Iris-versicolor\n",
"77,6.8,2.8,4.8,1.4,Iris-versicolor\n",
"78,6.7,3.0,5.0,1.7,Iris-versicolor\n",
"79,6.0,2.9,4.5,1.5,Iris-versicolor\n",
"80,5.7,2.6,3.5,1.0,Iris-versicolor\n",
"81,5.5,2.4,3.8,1.1,Iris-versicolor\n",
"82,5.5,2.4,3.7,1.0,Iris-versicolor\n",
"83,5.8,2.7,3.9,1.2,Iris-versicolor\n",
"84,6.0,2.7,5.1,1.6,Iris-versicolor\n",
"85,5.4,3.0,4.5,1.5,Iris-versicolor\n",
"86,6.0,3.4,4.5,1.6,Iris-versicolor\n",
"87,6.7,3.1,4.7,1.5,Iris-versicolor\n",
"88,6.3,2.3,4.4,1.3,Iris-versicolor\n",
"89,5.6,3.0,4.1,1.3,Iris-versicolor\n",
"90,5.5,2.5,4.0,1.3,Iris-versicolor\n",
"91,5.5,2.6,4.4,1.2,Iris-versicolor\n",
"92,6.1,3.0,4.6,1.4,Iris-versicolor\n",
"93,5.8,2.6,4.0,1.2,Iris-versicolor\n",
"94,5.0,2.3,3.3,1.0,Iris-versicolor\n",
"95,5.6,2.7,4.2,1.3,Iris-versicolor\n",
"96,5.7,3.0,4.2,1.2,Iris-versicolor\n",
"97,5.7,2.9,4.2,1.3,Iris-versicolor\n",
"98,6.2,2.9,4.3,1.3,Iris-versicolor\n",
"99,5.1,2.5,3.0,1.1,Iris-versicolor\n",
"100,5.7,2.8,4.1,1.3,Iris-versicolor\n",
"101,6.3,3.3,6.0,2.5,Iris-virginica\n",
"102,5.8,2.7,5.1,1.9,Iris-virginica\n",
"103,7.1,3.0,5.9,2.1,Iris-virginica\n",
"104,6.3,2.9,5.6,1.8,Iris-virginica\n",
"105,6.5,3.0,5.8,2.2,Iris-virginica\n",
"106,7.6,3.0,6.6,2.1,Iris-virginica\n",
"107,4.9,2.5,4.5,1.7,Iris-virginica\n",
"108,7.3,2.9,6.3,1.8,Iris-virginica\n",
"109,6.7,2.5,5.8,1.8,Iris-virginica\n",
"110,7.2,3.6,6.1,2.5,Iris-virginica\n",
"111,6.5,3.2,5.1,2.0,Iris-virginica\n",
"112,6.4,2.7,5.3,1.9,Iris-virginica\n",
"113,6.8,3.0,5.5,2.1,Iris-virginica\n",
"114,5.7,2.5,5.0,2.0,Iris-virginica\n",
"115,5.8,2.8,5.1,2.4,Iris-virginica\n",
"116,6.4,3.2,5.3,2.3,Iris-virginica\n",
"117,6.5,3.0,5.5,1.8,Iris-virginica\n",
"118,7.7,3.8,6.7,2.2,Iris-virginica\n",
"119,7.7,2.6,6.9,2.3,Iris-virginica\n",
"120,6.0,2.2,5.0,1.5,Iris-virginica\n",
"121,6.9,3.2,5.7,2.3,Iris-virginica\n",
"122,5.6,2.8,4.9,2.0,Iris-virginica\n",
"123,7.7,2.8,6.7,2.0,Iris-virginica\n",
"124,6.3,2.7,4.9,1.8,Iris-virginica\n",
"125,6.7,3.3,5.7,2.1,Iris-virginica\n",
"126,7.2,3.2,6.0,1.8,Iris-virginica\n",
"127,6.2,2.8,4.8,1.8,Iris-virginica\n",
"128,6.1,3.0,4.9,1.8,Iris-virginica\n",
"129,6.4,2.8,5.6,2.1,Iris-virginica\n",
"130,7.2,3.0,5.8,1.6,Iris-virginica\n",
"131,7.4,2.8,6.1,1.9,Iris-virginica\n",
"132,7.9,3.8,6.4,2.0,Iris-virginica\n",
"133,6.4,2.8,5.6,2.2,Iris-virginica\n",
"134,6.3,2.8,5.1,1.5,Iris-virginica\n",
"135,6.1,2.6,5.6,1.4,Iris-virginica\n",
"136,7.7,3.0,6.1,2.3,Iris-virginica\n",
"137,6.3,3.4,5.6,2.4,Iris-virginica\n",
"138,6.4,3.1,5.5,1.8,Iris-virginica\n",
"139,6.0,3.0,4.8,1.8,Iris-virginica\n",
"140,6.9,3.1,5.4,2.1,Iris-virginica\n",
"141,6.7,3.1,5.6,2.4,Iris-virginica\n",
"142,6.9,3.1,5.1,2.3,Iris-virginica\n",
"143,5.8,2.7,5.1,1.9,Iris-virginica\n",
"144,6.8,3.2,5.9,2.3,Iris-virginica\n",
"145,6.7,3.3,5.7,2.5,Iris-virginica\n",
"146,6.7,3.0,5.2,2.3,Iris-virginica\n",
"147,6.3,2.5,5.0,1.9,Iris-virginica\n",
"148,6.5,3.0,5.2,2.0,Iris-virginica\n",
"149,6.2,3.4,5.4,2.3,Iris-virginica\n"
]
}
],
"source": [
"!head -n -1 data/Iris.csv > data/Iris.csv.tmp && mv data/Iris.csv.tmp data/Iris.csv && cat data/Iris.csv"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "7f14ec60",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"On branch master\n",
"nothing to commit, working tree clean\n"
]
}
],
"source": [
"!git status"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "8a841039",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data/Iris.csv.dvc: \n",
"\tchanged outs:\n",
"\t\tmodified:           data/Iris.csv\n"
]
}
],
"source": [
"!dvc status"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "bf6c1067",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100% Adding...|████████████████████████████████████████|1/1 [00:00, 50.81file/s]\n",
"\n",
"To track the changes with git, run:\n",
"\n",
"\tgit add data/Iris.csv.dvc\n",
"\n",
"To enable auto staging, run:\n",
"\n",
"\tdvc config core.autostage true\n"
]
}
],
"source": [
"!dvc add data/Iris.csv"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "4a4865c9",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[master 9de24e1] Removed last line from Iris dataset\n",
" 1 file changed, 2 insertions(+), 2 deletions(-)\n"
]
}
],
"source": [
"!git add data/Iris.csv.dvc\n",
"!git commit -m \"Removed last line from Iris dataset\"\n"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "05e2d320",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 151 .dvc/cache/files/md5/71/7820ef0af287ff346c5cabfb4c612c\n",
" 150 .dvc/cache/files/md5/bc/cff2e578d76852294184c1dce9fdbf\n",
" 301 total\n"
]
}
],
"source": [
"!wc -l .dvc/cache/files/md5/*/*"
]
},
{
"cell_type": "markdown",
"id": "d710977c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### dvc checkout\n",
" - We use `dvc checkout` together with `git checkout` when switching the branch we work on.\n",
" - DVC replaces the files it tracks with the versions from the other branch (provided the files, and therefore the `*.dvc` files on the two branches, differ).\n",
" - Switching branches with git (possibly) changes the `*.dvc` files, and `dvc checkout` then copies or links from `.dvc/cache` the files whose hashes match those recorded in the `*.dvc` files."
]
},
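{
"cell_type": "markdown",
"id": "c4d2e9a1",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"To make the link between `*.dvc` files and the cache concrete, here is a sketch of what `data/Iris.csv.dvc` plausibly looks like after the commit above. It assumes the DVC 3.x file format. The hash is the 150-line entry from the `wc -l` listing of `.dvc/cache` and the size is the one `ls -l data` reports for `Iris.csv` in this notebook; the exact field layout is an assumption, not a copy of the real file.\n",
"\n",
"```yaml\n",
"# data/Iris.csv.dvc (reconstructed sketch, not copied from the repository)\n",
"outs:\n",
"- md5: bccff2e578d76852294184c1dce9fdbf  # content hash of the tracked file\n",
"  size: 5072                             # file size in bytes\n",
"  hash: md5                              # hashing scheme used for the cache\n",
"  path: Iris.csv                         # path relative to the .dvc file\n",
"```\n",
"\n",
"Given such a file, `dvc checkout` would look up `.dvc/cache/files/md5/bc/cff2e578d76852294184c1dce9fdbf` (the first two hex digits of the hash name the subdirectory) and copy or link it to `data/Iris.csv`. Checking out a commit whose `.dvc` file records the other hash would restore the 151-line version in the same way."
]
},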
{
"cell_type": "markdown",
"id": "5897e8eb",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Sharing data between projects\n",
" - with the `dvc import` and `dvc update` commands we can add, and later update, files that are tracked by DVC in another repository"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "9b018146",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'\n",
"100%|██████████|get-started/data.xml 13.8M/13.8M [00:16<00:00, 1.81MB/s]\n",
"To track the changes with git, run:\n",
"\n",
"\tgit add data/data.xml.dvc data/.gitignore\n",
"\n",
"To enable auto staging, run:\n",
"\n",
"\tdvc config core.autostage true\n"
]
}
],
"source": [
"!dvc import https://github.com/iterative/dataset-registry \\\n",
" get-started/data.xml -o data/data.xml"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "be2c1a37",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Data and pipelines are up to date.\n"
]
}
],
"source": [
"!dvc status"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "3306c5b7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 14124\n",
"-rw-r--r-- 1 pawel pawel 5072 May 22 07:57 Iris.csv\n",
"-rw-r--r-- 1 pawel pawel 88 May 22 07:57 Iris.csv.dvc\n",
"-rw-r--r-- 1 pawel pawel 14445097 May 22 07:59 data.xml\n",
"-rw-r--r-- 1 pawel pawel 296 May 22 07:59 data.xml.dvc\n"
]
}
],
"source": [
"ls -l data"
]
},
{
"cell_type": "markdown",
"id": "db1063ac",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## DVC pipelines\n",
" - Introduction: https://youtu.be/71IGzyH95UY\n",
" - Getting started: https://dvc.org/doc/start/data-pipelines\n",
" - DVC pipelines let us build (with the `dvc stage add` command, which replaced `dvc run` in DVC 3.x) or define (by editing the `dvc.yaml` file) a dependency graph between the steps performed in our project (such as \"data preparation\", \"training\", \"evaluation\").\n",
" - A pipeline defined this way can then be run with the `dvc repro` command."
]
},
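{
"cell_type": "markdown",
"id": "e8b37f50",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"As an illustration of such a dependency graph, below is a minimal sketch of a two-stage `dvc.yaml`. The script names (`prepare.py`, `train.py`), the intermediate directory `data/prepared` and the model file `model.pkl` are hypothetical placeholders and do not come from this project; only `data/Iris.csv` is taken from the examples above.\n",
"\n",
"```yaml\n",
"# dvc.yaml (illustrative sketch; scripts and output paths are hypothetical)\n",
"stages:\n",
"  prepare:\n",
"    cmd: python prepare.py data/Iris.csv data/prepared\n",
"    deps:\n",
"      - prepare.py\n",
"      - data/Iris.csv\n",
"    outs:\n",
"      - data/prepared\n",
"  train:\n",
"    cmd: python train.py data/prepared model.pkl\n",
"    deps:\n",
"      - train.py\n",
"      - data/prepared   # output of the prepare stage, which links the two stages\n",
"    outs:\n",
"      - model.pkl\n",
"```\n",
"\n",
"Because `data/prepared` is an output of `prepare` and a dependency of `train`, DVC can infer that `prepare` must run first. With a file like this in place, `dvc repro` re-runs only the stages whose dependencies changed, and `dvc dag` prints the resulting graph."
]
},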
{
"cell_type": "markdown",
"id": "e2939867",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Assignments [5 pts + 10 bonus pts]\n",
"### Deadline: May 29, 2024\n",
"1. Initialize a DVC repository inside your project repository [1 pt]\n",
"2. Add the data file(s) of your project to DVC [1 pt]\n",
"3. Configure a remote (configuration details are given below) [3 pts]\n",
"4. [Bonus] Create/define a `dvc.yaml` file describing the steps performed in your project and add it to the repository. Separate out at least 2 steps (e.g. data preparation/training) linked to each other by dependencies (use the \"Getting started\" materials linked above) [10 pts (optional)]"
]
},
{
"cell_type": "markdown",
"id": "2f5a8590",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## SSH remote\n",
"One of the remote types supported by DVC is SFTP/SSH.\n",
"To make use of it, a user `ium-sftp` was created and an SFTP server was configured on the host tzietkiewicz.vm.wmi.amu.edu.pl.\n",
"An SSH key was also generated for this user and added as a \"Jenkins credential\" (see the description of the Jenkins configuration below)."
]
},
{
"cell_type": "markdown",
"id": "82a61107",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Locally\n",
"We will need extra dependencies ([details](https://dvc.org/doc/command-reference/remote/add))\n",
" \n",
" `conda install dvc-ssh` \n",
"\n",
"or\n",
"\n",
"`pip install dvc[ssh] paramiko`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "c48c5b8e",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: dvc[ssh] in /home/pawel/ium/venv/lib/python3.10/site-packages (3.50.2)\n",
"Collecting paramiko\n",
" Downloading paramiko-3.4.0-py3-none-any.whl (225 kB)\n",
"Requirement already satisfied: ruamel.yaml>=0.17.11 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.18.6)\n",
"Requirement already satisfied: fsspec in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2024.5.0)\n",
"Requirement already satisfied: dvc-studio-client<1,>=0.20 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.20.0)\n",
"Requirement already satisfied: tomlkit>=0.11.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.12.5)\n",
"Requirement already satisfied: dvc-objects in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.1.0)\n",
"Requirement already satisfied: distro>=1.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.9.0)\n",
"Requirement already satisfied: pygtrie>=2.3.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.5.0)\n",
"Requirement already satisfied: voluptuous>=0.11.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.14.2)\n",
"Requirement already satisfied: attrs>=22.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (23.2.0)\n",
"Requirement already satisfied: dvc-http>=2.29.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.32.0)\n",
"Requirement already satisfied: rich>=12 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (13.7.1)\n",
"Requirement already satisfied: dulwich in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.22.1)\n",
"Requirement already satisfied: pyparsing>=2.4.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.1.2)\n",
"Requirement already satisfied: shortuuid>=0.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.0.13)\n",
"Requirement already satisfied: flufl.lock<8,>=5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (7.1.1)\n",
"Requirement already satisfied: kombu in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.3.7)\n",
"Requirement already satisfied: iterative-telemetry>=0.0.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.0.8)\n",
"Requirement already satisfied: dpath<3,>=2.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.1.6)\n",
"Requirement already satisfied: colorama>=0.3.9 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.6)\n",
"Requirement already satisfied: celery in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.4.0)\n",
"Requirement already satisfied: packaging>=19 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (24.0)\n",
"Requirement already satisfied: tabulate>=0.8.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.9.0)\n",
"Requirement already satisfied: shtab<2,>=1.3.4 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.7.1)\n",
"Requirement already satisfied: scmrepo<4,>=3.3.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.3.5)\n",
"Requirement already satisfied: dvc-render<2,>=1.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.0.2)\n",
"Requirement already satisfied: gto<2,>=1.6.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.7.1)\n",
"Requirement already satisfied: pydot>=1.2.4 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.0.0)\n",
"Requirement already satisfied: psutil>=5.8 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.9.8)\n",
"Requirement already satisfied: configobj>=5.0.6 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.0.8)\n",
"Requirement already satisfied: funcy>=1.14 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.0)\n",
"Requirement already satisfied: grandalf<1,>=0.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.8)\n",
"Requirement already satisfied: dvc-task<1,>=0.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.0)\n",
"Requirement already satisfied: requests>=2.22 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.31.0)\n",
"Requirement already satisfied: zc.lockfile>=1.2.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.0.post1)\n",
"Requirement already satisfied: flatten-dict<1,>=0.4.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.2)\n",
"Requirement already satisfied: networkx>=2.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.3)\n",
"Requirement already satisfied: pathspec>=0.10.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.12.1)\n",
"Requirement already satisfied: hydra-core>=1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.3.2)\n",
"Requirement already satisfied: omegaconf in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.3.0)\n",
"Requirement already satisfied: tqdm<5,>=4.63.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (4.66.2)\n",
"Requirement already satisfied: dvc-data<3.16,>=3.15 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.15.1)\n",
"Requirement already satisfied: platformdirs<4,>=3.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.11.0)\n",
"Collecting dvc-ssh<5,>=4\n",
" Downloading dvc_ssh-4.1.1-py3-none-any.whl (15 kB)\n",
"Collecting bcrypt>=3.2\n",
" Downloading bcrypt-4.1.3-cp39-abi3-manylinux_2_28_x86_64.whl (283 kB)\n",
"Collecting pynacl>=1.5\n",
" Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)\n",
"Requirement already satisfied: cryptography>=3.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from paramiko) (42.0.7)\n",
"Requirement already satisfied: six in /home/pawel/ium/venv/lib/python3.10/site-packages (from configobj>=5.0.6->dvc[ssh]) (1.16.0)\n",
"Requirement already satisfied: cffi>=1.12 in /home/pawel/ium/venv/lib/python3.10/site-packages (from cryptography>=3.3->paramiko) (1.16.0)\n",
"Requirement already satisfied: diskcache>=5.2.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (5.6.3)\n",
"Requirement already satisfied: dictdiffer>=0.8.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (0.9.0)\n",
"Requirement already satisfied: sqltrie<1,>=0.11.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (0.11.0)\n",
"Requirement already satisfied: aiohttp-retry>=2.5.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-http>=2.29.0->dvc[ssh]) (2.8.3)\n",
"Collecting sshfs[bcrypt]>=2023.4.1\n",
" Downloading sshfs-2024.4.1-py3-none-any.whl (15 kB)\n",
"Requirement already satisfied: billiard<5.0,>=4.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (4.2.0)\n",
"Requirement already satisfied: tzdata>=2022.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (2024.1)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (2.9.0.post0)\n",
"Requirement already satisfied: vine<6.0,>=5.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (5.1.0)\n",
"Requirement already satisfied: click-plugins>=1.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (1.1.1)\n",
"Requirement already satisfied: click-repl>=0.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (0.3.0)\n",
"Requirement already satisfied: click<9.0,>=8.1.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (8.1.7)\n",
"Requirement already satisfied: click-didyoumean>=0.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (0.3.1)\n",
"Requirement already satisfied: atpublic>=2.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from flufl.lock<8,>=5->dvc[ssh]) (4.1.0)\n",
"Requirement already satisfied: semver>=2.13.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (3.0.2)\n",
"Requirement already satisfied: pydantic!=2.0.0,<3,>=1.9.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (2.7.1)\n",
"Requirement already satisfied: entrypoints in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (0.4)\n",
"Requirement already satisfied: typer>=0.4.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (0.12.3)\n",
"Requirement already satisfied: antlr4-python3-runtime==4.9.* in /home/pawel/ium/venv/lib/python3.10/site-packages (from hydra-core>=1.1->dvc[ssh]) (4.9.3)\n",
"Requirement already satisfied: filelock in /home/pawel/ium/venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc[ssh]) (3.14.0)\n",
"Requirement already satisfied: appdirs in /home/pawel/ium/venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc[ssh]) (1.4.4)\n",
"Requirement already satisfied: amqp<6.0.0,>=5.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from kombu->dvc[ssh]) (5.2.0)\n",
"Requirement already satisfied: PyYAML>=5.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from omegaconf->dvc[ssh]) (6.0.1)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (2024.2.2)\n",
"Requirement already satisfied: idna<4,>=2.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (3.6)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (3.3.2)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (2.2.1)\n",
"Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from rich>=12->dvc[ssh]) (2.17.2)\n",
"Requirement already satisfied: markdown-it-py>=2.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from rich>=12->dvc[ssh]) (3.0.0)\n",
"Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from ruamel.yaml>=0.17.11->dvc[ssh]) (0.2.8)\n",
"Requirement already satisfied: pygit2>=1.14.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (1.15.0)\n",
"Requirement already satisfied: gitpython>3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (3.1.43)\n",
"Requirement already satisfied: asyncssh<3,>=2.13.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (2.14.2)\n",
"Requirement already satisfied: setuptools in /home/pawel/ium/venv/lib/python3.10/site-packages (from zc.lockfile>=1.2.1->dvc[ssh]) (59.6.0)\n",
"Requirement already satisfied: aiohttp in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (3.9.5)\n",
"Requirement already satisfied: typing-extensions>=3.6 in /home/pawel/ium/venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc[ssh]) (4.11.0)\n",
"Requirement already satisfied: pycparser in /home/pawel/ium/venv/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=3.3->paramiko) (2.22)\n",
"Requirement already satisfied: prompt-toolkit>=3.0.36 in /home/pawel/ium/venv/lib/python3.10/site-packages (from click-repl>=0.2.0->celery->dvc[ssh]) (3.0.43)\n",
"Requirement already satisfied: gitdb<5,>=4.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gitpython>3->scmrepo<4,>=3.3.2->dvc[ssh]) (4.0.11)\n",
"Requirement already satisfied: mdurl~=0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12->dvc[ssh]) (0.1.2)\n",
"Requirement already satisfied: pydantic-core==2.18.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc[ssh]) (2.18.2)\n",
"Requirement already satisfied: annotated-types>=0.4.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc[ssh]) (0.7.0)\n",
"Requirement already satisfied: orjson in /home/pawel/ium/venv/lib/python3.10/site-packages (from sqltrie<1,>=0.11.0->dvc-data<3.16,>=3.15->dvc[ssh]) (3.10.3)\n",
"Requirement already satisfied: shellingham>=1.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from typer>=0.4.1->gto<2,>=1.6.0->dvc[ssh]) (1.5.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (4.0.3)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.3.1)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (6.0.5)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.9.4)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.4.1)\n",
"Requirement already satisfied: smmap<6,>=3.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython>3->scmrepo<4,>=3.3.2->dvc[ssh]) (5.0.1)\n",
"Requirement already satisfied: wcwidth in /home/pawel/ium/venv/lib/python3.10/site-packages (from prompt-toolkit>=3.0.36->click-repl>=0.2.0->celery->dvc[ssh]) (0.2.13)\n",
"Installing collected packages: bcrypt, pynacl, paramiko, sshfs, dvc-ssh\n",
"Successfully installed bcrypt-4.1.3 dvc-ssh-4.1.1 paramiko-3.4.0 pynacl-1.5.0 sshfs-2024.4.1\n"
]
}
],
"source": [
"# conda install -c conda-forge dvc-ssh\n",
"\n",
"!pip install dvc[ssh] paramiko"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "9662b7aa",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[sudo] password for pawel: \n"
]
}
],
"source": [
"# The packages below are needed for the dvc remote commands to work:\n",
"!sudo apt install libssl3 libffi7"
]
},
{
"cell_type": "markdown",
"id": "04c41da0",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Add the remote:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "e9a04876",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setting 'ium_ssh_remote' as a default remote.\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "e3f27bbb",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_local_remote\t/home/pawel/dvcstore\n",
"ium_ssh_remote\tssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc remote list"
]
},
{
"cell_type": "markdown",
"id": "c92edd7b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Store the password in the local configuration:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "5b2fa175",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[0m"
]
}
],
"source": [
"!dvc remote modify --local ium_ssh_remote password IUM@2021"
]
},
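{
"cell_type": "markdown",
"id": "b7e2d9a4",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For orientation, after `dvc remote add` and `dvc remote modify --local` the DVC configuration should look roughly like the sketch below. This is an illustration, not captured output: the exact layout may differ between DVC versions, and other remotes (such as `my_local_remote`) are omitted.\n",
"\n",
"```ini\n",
"# .dvc/config (tracked by Git)\n",
"[core]\n",
"    remote = ium_ssh_remote\n",
"['remote \"ium_ssh_remote\"']\n",
"    url = ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl\n",
"\n",
"# .dvc/config.local (ignored by Git)\n",
"['remote \"ium_ssh_remote\"']\n",
"    password = IUM@2021\n",
"```\n",
"\n",
"Because `.dvc/config.local` is listed in `.dvc/.gitignore`, the password stays on the local machine and is not committed together with the rest of the configuration."
]
},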
{
"cell_type": "markdown",
"id": "8b83049b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Push to the configured remote:"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "ea6e16fa",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting |1.00 [00:00, 252entry/s]\n",
"Pushing\n",
"!\u001b[A\n",
" 0% Checking cache in 'files/md5'| |0/? [00:00, ?files/s]\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0% Checking cache in '/home/pawel/ium/IUM_10/sample-ml-project-2024/.dvc/cache\u001b[A\n",
" \u001b[A\n",
"!\u001b[A\n",
" 0%| |Pushing to ssh 0/1 [00:00, ?file/s]\u001b[A\n",
"\n",
"!\u001b[A\u001b[A\n",
"\n",
" 0%| |/home/pawel/ium/IUM_10/sample-m0.00/4.95k [00:00, ?B/s]\u001b[A\u001b[A\n",
"\n",
" \u001b[A\u001b[A\n",
"100%|██████████|Pushing to ssh 1/1 [00:00<00:00, 8.63file/s]\u001b[A\n",
"Pushing \u001b[A\n",
"1 file pushed\n",
"\u001b[0m"
]
}
],
"source": [
"!dvc push"
]
},
{
"cell_type": "markdown",
"id": "1468c44c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Jenkins\n",
"\n",
"In Jenkins, you can use the \"Credentials\" mechanism to pass a password or a private key securely.\n",
"\n",
"The following credentials for the ium-sftp user have been created on Jenkins:\n",
"\n",
" - an SSH key: https://tzietkiewicz.vm.wmi.amu.edu.pl:8081/credentials/store/system/domain/_/credential/48ac7004-216e-4260-abba-1fe5db753e18/\n",
" - a \"secret text\" entry containing the password of the ium-sftp user: https://tzietkiewicz.vm.wmi.amu.edu.pl:8081/credentials/store/system/domain/_/credential/ium-sftp-password/\n",
"\n",
"How to use \"Credentials\" in a Jenkinsfile: https://www.jenkins.io/doc/book/pipeline/jenkinsfile/#for-other-credential-types\n",
"\n",
"The SSH key can be used like this:\n",
"\n",
"```Jenkinsfile\n",
"withCredentials([sshUserPrivateKey(credentialsId: '48ac7004-216e-4260-abba-1fe5db753e18', keyFileVariable: 'IUM_SFTP_KEY', passphraseVariable: '', usernameVariable: '')]) {\n",
"    sh 'dvc remote add -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'\n",
"    sh 'dvc remote modify --local ium_ssh_remote keyfile $IUM_SFTP_KEY'\n",
"    sh 'dvc pull'\n",
"}\n",
"```\n",
"\n",
"The secret text can be used like this:\n",
"\n",
"```Jenkinsfile\n",
"withCredentials([string(credentialsId: 'ium-sftp-password', variable: 'IUM_SFTP_PASS')]) {\n",
"    sh 'dvc remote add -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'\n",
"    sh 'dvc remote modify --local ium_ssh_remote password $IUM_SFTP_PASS'\n",
"    sh 'dvc pull'\n",
"}\n",
"```\n",
"\n",
"Example configuration:\n",
" - https://tzietkiewicz.vm.wmi.amu.edu.pl:8081/job/docker-test-mount/ \n",
" - https://git.wmi.amu.edu.pl/tzietkiewicz/ium-helloworld"
]
}
],
"metadata": {
"author": "Tomasz Ziętkiewicz",
"celltoolbar": "Slideshow",
"email": "tomasz.zietkiewicz@amu.edu.pl",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"lang": "pl",
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"slideshow": {
"slide_type": "slide"
},
"subtitle": "10.DVC[laboratoria]",
"title": "Inżynieria uczenia maszynowego",
"year": "2021"
},
"nbformat": 4,
"nbformat_minor": 5
}