{ "cells": [ { "cell_type": "markdown", "id": "7fe475ae", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Machine Learning Engineering\n", "### 22 May 2024\n", "# 10. DVC" ] }, { "cell_type": "markdown", "id": "0c6f27a5", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "" ] }, { "cell_type": "markdown", "id": "560eec71", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## DVC - Data Version Control\n", "- [dvc.org](https://dvc.org/)\n", "- \"Version Control System for Machine Learning Projects\"\n", "- Open source\n", "- It enables:\n", " - versioning data and models: \"Git for data and models\"\n", " - building pipelines that define how models are built, trained, and evaluated: \"Makefile for machine learning\"\n", " - tracking and comparing metrics and parameters\n", "- Tightly integrated with Git\n", "- Works independently of the programming language, libraries, and operating system in use\n", "- 5-minute introduction: https://www.youtube.com/watch?v=UbL7VUpv1Bs" ] }, { "cell_type": "markdown", "id": "3d4ce1cb", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracking files with DVC\n", " - Large files, such as input data files or model files, are hard to manage with Git because of:\n", " - performance problems\n", " - the space they take up in the repository\n", " - limits imposed by the hosting service (e.g. the [100 MB per-file limit on GitHub](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github))\n", " - Git has the [LFS (Large File Storage)](https://git-lfs.github.com/) extension, which partly solves this problem. 
\n", " - The files themselves are stored on a dedicated remote server; the repository holds only pointers to these files and some metadata\n", " - GitHub has integrated LFS with a [1 GB limit for free accounts](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage)" ] }, { "cell_type": "markdown", "id": "dd8e529b", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ " - DVC takes an approach similar to LFS, but:\n", " - files can be stored on almost any server, or locally\n", " - there is no file size limit (Git LFS on GitHub has a [2 GB limit](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage))\n", " - DVC provides additional tooling for tracking files and their links to experiment results\n", " - for more, see [here](https://dvc.org/doc/user-guide/related-technologies)" ] }, { "cell_type": "markdown", "id": "9bfb356e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Installation and initialization\n", " - https://dvc.org/doc/install\n", " - ```pip install dvc```\n", " - ```pipx install dvc```\n", " - ```conda install dvc```" ] }, { "cell_type": "code", "execution_count": 1, "id": "054c7a11", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: dvc in ./venv/lib/python3.10/site-packages (3.50.2)\n", "Requirement already satisfied: attrs>=22.2.0 in ./venv/lib/python3.10/site-packages (from dvc) (23.2.0)\n", "Requirement already satisfied: psutil>=5.8 in ./venv/lib/python3.10/site-packages (from dvc) (5.9.8)\n", "Requirement already satisfied: zc.lockfile>=1.2.1 in ./venv/lib/python3.10/site-packages (from dvc) (3.0.post1)\n", "Requirement already satisfied: ruamel.yaml>=0.17.11 in ./venv/lib/python3.10/site-packages (from dvc) (0.18.6)\n", "Requirement already 
satisfied: dvc-http>=2.29.0 in ./venv/lib/python3.10/site-packages (from dvc) (2.32.0)\n", "Requirement already satisfied: shortuuid>=0.5 in ./venv/lib/python3.10/site-packages (from dvc) (1.0.13)\n", "Requirement already satisfied: platformdirs<4,>=3.1.1 in ./venv/lib/python3.10/site-packages (from dvc) (3.11.0)\n", "Requirement already satisfied: scmrepo<4,>=3.3.2 in ./venv/lib/python3.10/site-packages (from dvc) (3.3.5)\n", "Requirement already satisfied: pygtrie>=2.3.2 in ./venv/lib/python3.10/site-packages (from dvc) (2.5.0)\n", "Requirement already satisfied: dvc-data<3.16,>=3.15 in ./venv/lib/python3.10/site-packages (from dvc) (3.15.1)\n", "Requirement already satisfied: fsspec in ./venv/lib/python3.10/site-packages (from dvc) (2024.5.0)\n", "Requirement already satisfied: dvc-objects in ./venv/lib/python3.10/site-packages (from dvc) (5.1.0)\n", "Requirement already satisfied: grandalf<1,>=0.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.8)\n", "Requirement already satisfied: hydra-core>=1.1 in ./venv/lib/python3.10/site-packages (from dvc) (1.3.2)\n", "Requirement already satisfied: tabulate>=0.8.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.9.0)\n", "Requirement already satisfied: colorama>=0.3.9 in ./venv/lib/python3.10/site-packages (from dvc) (0.4.6)\n", "Requirement already satisfied: tomlkit>=0.11.1 in ./venv/lib/python3.10/site-packages (from dvc) (0.12.5)\n", "Requirement already satisfied: rich>=12 in ./venv/lib/python3.10/site-packages (from dvc) (13.7.1)\n", "Requirement already satisfied: dvc-render<2,>=1.0.1 in ./venv/lib/python3.10/site-packages (from dvc) (1.0.2)\n", "Requirement already satisfied: flufl.lock<8,>=5 in ./venv/lib/python3.10/site-packages (from dvc) (7.1.1)\n", "Requirement already satisfied: packaging>=19 in ./venv/lib/python3.10/site-packages (from dvc) (24.0)\n", "Requirement already satisfied: pyparsing>=2.4.7 in ./venv/lib/python3.10/site-packages (from dvc) (3.1.2)\n", "Requirement already satisfied: 
flatten-dict<1,>=0.4.1 in ./venv/lib/python3.10/site-packages (from dvc) (0.4.2)\n", "Requirement already satisfied: dulwich in ./venv/lib/python3.10/site-packages (from dvc) (0.22.1)\n", "Requirement already satisfied: configobj>=5.0.6 in ./venv/lib/python3.10/site-packages (from dvc) (5.0.8)\n", "Requirement already satisfied: dvc-studio-client<1,>=0.20 in ./venv/lib/python3.10/site-packages (from dvc) (0.20.0)\n", "Requirement already satisfied: omegaconf in ./venv/lib/python3.10/site-packages (from dvc) (2.3.0)\n", "Requirement already satisfied: distro>=1.3 in ./venv/lib/python3.10/site-packages (from dvc) (1.9.0)\n", "Requirement already satisfied: dpath<3,>=2.1.0 in ./venv/lib/python3.10/site-packages (from dvc) (2.1.6)\n", "Requirement already satisfied: funcy>=1.14 in ./venv/lib/python3.10/site-packages (from dvc) (2.0)\n", "Requirement already satisfied: dvc-task<1,>=0.3.0 in ./venv/lib/python3.10/site-packages (from dvc) (0.4.0)\n", "Requirement already satisfied: voluptuous>=0.11.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.14.2)\n", "Requirement already satisfied: kombu in ./venv/lib/python3.10/site-packages (from dvc) (5.3.7)\n", "Requirement already satisfied: shtab<2,>=1.3.4 in ./venv/lib/python3.10/site-packages (from dvc) (1.7.1)\n", "Requirement already satisfied: iterative-telemetry>=0.0.7 in ./venv/lib/python3.10/site-packages (from dvc) (0.0.8)\n", "Requirement already satisfied: celery in ./venv/lib/python3.10/site-packages (from dvc) (5.4.0)\n", "Requirement already satisfied: pathspec>=0.10.3 in ./venv/lib/python3.10/site-packages (from dvc) (0.12.1)\n", "Requirement already satisfied: requests>=2.22 in ./venv/lib/python3.10/site-packages (from dvc) (2.31.0)\n", "Requirement already satisfied: tqdm<5,>=4.63.1 in ./venv/lib/python3.10/site-packages (from dvc) (4.66.2)\n", "Requirement already satisfied: gto<2,>=1.6.0 in ./venv/lib/python3.10/site-packages (from dvc) (1.7.1)\n", "Requirement already satisfied: networkx>=2.5 in 
./venv/lib/python3.10/site-packages (from dvc) (3.3)\n", "Requirement already satisfied: pydot>=1.2.4 in ./venv/lib/python3.10/site-packages (from dvc) (2.0.0)\n", "Requirement already satisfied: six in ./venv/lib/python3.10/site-packages (from configobj>=5.0.6->dvc) (1.16.0)\n", "Requirement already satisfied: sqltrie<1,>=0.11.0 in ./venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc) (0.11.0)\n", "Requirement already satisfied: dictdiffer>=0.8.1 in ./venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc) (0.9.0)\n", "Requirement already satisfied: diskcache>=5.2.1 in ./venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc) (5.6.3)\n", "Requirement already satisfied: aiohttp-retry>=2.5.0 in ./venv/lib/python3.10/site-packages (from dvc-http>=2.29.0->dvc) (2.8.3)\n", "Requirement already satisfied: billiard<5.0,>=4.2.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (4.2.0)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in ./venv/lib/python3.10/site-packages (from celery->dvc) (2.9.0.post0)\n", "Requirement already satisfied: click-plugins>=1.1.1 in ./venv/lib/python3.10/site-packages (from celery->dvc) (1.1.1)\n", "Requirement already satisfied: click-repl>=0.2.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (0.3.0)\n", "Requirement already satisfied: vine<6.0,>=5.1.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (5.1.0)\n", "Requirement already satisfied: tzdata>=2022.7 in ./venv/lib/python3.10/site-packages (from celery->dvc) (2024.1)\n", "Requirement already satisfied: click<9.0,>=8.1.2 in ./venv/lib/python3.10/site-packages (from celery->dvc) (8.1.7)\n", "Requirement already satisfied: click-didyoumean>=0.3.0 in ./venv/lib/python3.10/site-packages (from celery->dvc) (0.3.1)\n", "Requirement already satisfied: atpublic>=2.3 in ./venv/lib/python3.10/site-packages (from flufl.lock<8,>=5->dvc) (4.1.0)\n", "Requirement already satisfied: pydantic!=2.0.0,<3,>=1.9.0 in 
./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (2.7.1)\n", "Requirement already satisfied: semver>=2.13.0 in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (3.0.2)\n", "Requirement already satisfied: typer>=0.4.1 in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (0.12.3)\n", "Requirement already satisfied: entrypoints in ./venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc) (0.4)\n", "Requirement already satisfied: antlr4-python3-runtime==4.9.* in ./venv/lib/python3.10/site-packages (from hydra-core>=1.1->dvc) (4.9.3)\n", "Requirement already satisfied: appdirs in ./venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc) (1.4.4)\n", "Requirement already satisfied: filelock in ./venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc) (3.14.0)\n", "Requirement already satisfied: amqp<6.0.0,>=5.1.1 in ./venv/lib/python3.10/site-packages (from kombu->dvc) (5.2.0)\n", "Requirement already satisfied: PyYAML>=5.1.0 in ./venv/lib/python3.10/site-packages (from omegaconf->dvc) (6.0.1)\n", "Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (2024.2.2)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (2.2.1)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (3.3.2)\n", "Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.10/site-packages (from requests>=2.22->dvc) (3.6)\n", "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./venv/lib/python3.10/site-packages (from rich>=12->dvc) (2.17.2)\n", "Requirement already satisfied: markdown-it-py>=2.2.0 in ./venv/lib/python3.10/site-packages (from rich>=12->dvc) (3.0.0)\n", "Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in ./venv/lib/python3.10/site-packages (from ruamel.yaml>=0.17.11->dvc) (0.2.8)\n", 
"Requirement already satisfied: pygit2>=1.14.0 in ./venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc) (1.15.0)\n", "Requirement already satisfied: asyncssh<3,>=2.13.1 in ./venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc) (2.14.2)\n", "Requirement already satisfied: gitpython>3 in ./venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc) (3.1.43)\n", "Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (from zc.lockfile>=1.2.1->dvc) (59.6.0)\n", "Requirement already satisfied: aiohttp in ./venv/lib/python3.10/site-packages (from aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (3.9.5)\n", "Requirement already satisfied: typing-extensions>=3.6 in ./venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc) (4.11.0)\n", "Requirement already satisfied: cryptography>=39.0 in ./venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc) (42.0.7)\n", "Requirement already satisfied: prompt-toolkit>=3.0.36 in ./venv/lib/python3.10/site-packages (from click-repl>=0.2.0->celery->dvc) (3.0.43)\n", "Requirement already satisfied: gitdb<5,>=4.0.1 in ./venv/lib/python3.10/site-packages (from gitpython>3->scmrepo<4,>=3.3.2->dvc) (4.0.11)\n", "Requirement already satisfied: mdurl~=0.1 in ./venv/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12->dvc) (0.1.2)\n", "Requirement already satisfied: pydantic-core==2.18.2 in ./venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc) (2.18.2)\n", "Requirement already satisfied: annotated-types>=0.4.0 in ./venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc) (0.7.0)\n", "Requirement already satisfied: cffi>=1.16.0 in ./venv/lib/python3.10/site-packages (from pygit2>=1.14.0->scmrepo<4,>=3.3.2->dvc) (1.16.0)\n", "Requirement already satisfied: orjson in ./venv/lib/python3.10/site-packages (from sqltrie<1,>=0.11.0->dvc-data<3.16,>=3.15->dvc) 
(3.10.3)\n", "Requirement already satisfied: shellingham>=1.3.0 in ./venv/lib/python3.10/site-packages (from typer>=0.4.1->gto<2,>=1.6.0->dvc) (1.5.4)\n", "Requirement already satisfied: async-timeout<5.0,>=4.0 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (4.0.3)\n", "Requirement already satisfied: aiosignal>=1.1.2 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.3.1)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.9.4)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (6.0.5)\n", "Requirement already satisfied: frozenlist>=1.1.1 in ./venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.4.1)\n", "Requirement already satisfied: pycparser in ./venv/lib/python3.10/site-packages (from cffi>=1.16.0->pygit2>=1.14.0->scmrepo<4,>=3.3.2->dvc) (2.22)\n", "Requirement already satisfied: smmap<6,>=3.0.1 in ./venv/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython>3->scmrepo<4,>=3.3.2->dvc) (5.0.1)\n", "Requirement already satisfied: wcwidth in ./venv/lib/python3.10/site-packages (from prompt-toolkit>=3.0.36->click-repl>=0.2.0->celery->dvc) (0.2.13)\n" ] } ], "source": [ "!pip3 install dvc" ] }, { "cell_type": "markdown", "id": "20975d62", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's create a directory in which we will keep our project:" ] }, { "cell_type": "code", "execution_count": 4, "id": "4d94e912", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!rm -r -f IUM_10/sample-ml-project-2024\n", "!mkdir -p IUM_10/sample-ml-project-2024" ] }, { "cell_type": "code", "execution_count": 5, "id": "aae59ec2", "metadata": { "slideshow": { 
"slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/pawel/ium/IUM_10/sample-ml-project-2024\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/pawel/ium/venv/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.\n", " self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n" ] } ], "source": [ "# Jupyter notebook magic: https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd\n", "%cd \"IUM_10/sample-ml-project-2024\"" ] }, { "cell_type": "markdown", "id": "199c0d92", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We initialize an empty Git repository (this step can be skipped when working in an existing repository):" ] }, { "cell_type": "code", "execution_count": 7, "id": "c13c525b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reinitialized existing Git repository in /home/pawel/ium/IUM_10/sample-ml-project-2024/.git/\n" ] } ], "source": [ "!git init" ] }, { "cell_type": "markdown", "id": "c7155369", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now we initialize the DVC repository:" ] }, { "cell_type": "code", "execution_count": 8, "id": "44f28226", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Initialized DVC repository.\n", "\n", "You can now commit the changes to git.\n", "\n", "\u001b[31m+---------------------------------------------------------------------+\n", "\u001b[0m\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n", "\u001b[31m|\u001b[0m DVC has enabled anonymous aggregate usage analytics. 
\u001b[31m|\u001b[0m\n", "\u001b[31m|\u001b[0m Read the analytics documentation (and how to opt-out) here: \u001b[31m|\u001b[0m\n", "\u001b[31m|\u001b[0m <\u001b[36mhttps://dvc.org/doc/user-guide/analytics\u001b[39m> \u001b[31m|\u001b[0m\n", "\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n", "\u001b[31m+---------------------------------------------------------------------+\n", "\u001b[0m\n", "\u001b[33mWhat's next?\u001b[39m\n", "\u001b[33m------------\u001b[39m\n", "- Check out the documentation: <\u001b[36mhttps://dvc.org/doc\u001b[39m>\n", "- Get help and share ideas: <\u001b[36mhttps://dvc.org/chat\u001b[39m>\n", "- Star us on GitHub: <\u001b[36mhttps://github.com/iterative/dvc\u001b[39m>\n", "\u001b[0m" ] } ], "source": [ "!dvc init" ] }, { "cell_type": "markdown", "id": "00bc72ed", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's see which files DVC added (they are also staged in the Git repository).\n", "They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files" ] }, { "cell_type": "code", "execution_count": 9, "id": "d1aefe16", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On branch master\n", "\n", "No commits yet\n", "\n", "Changes to be committed:\n", " (use \"git rm --cached ...\" to unstage)\n", "\t\u001b[32mnew file: .dvc/.gitignore\u001b[m\n", "\t\u001b[32mnew file: .dvc/config\u001b[m\n", "\t\u001b[32mnew file: .dvcignore\u001b[m\n", "\n" ] } ], "source": [ "!git status" ] }, { "cell_type": "markdown", "id": "b16a62e6", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- `.dvc/config` - the main DVC configuration file\n", "- `.dvc/config.local` - overrides values from `config`; meant for local changes that are not committed to the repository\n", "- `.dvc/.gitignore` - lists DVC files that should not end up in the repository\n", "- `.dvcignore` - DVC skips the files listed in this file (e.g. 
to improve performance)" ] }, { "cell_type": "markdown", "id": "72e0a272", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can now commit the changes to Git:" ] }, { "cell_type": "code", "execution_count": 10, "id": "59780e99", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[master (root-commit) a9746ad] Initial commit\n", " 3 files changed, 6 insertions(+)\n", " create mode 100644 .dvc/.gitignore\n", " create mode 100644 .dvc/config\n", " create mode 100644 .dvcignore\n" ] } ], "source": [ "!git commit -m \"Initial commit\"" ] }, { "cell_type": "markdown", "id": "a8861abe", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's prepare sample data by downloading it from Kaggle:" ] }, { "cell_type": "code", "execution_count": 11, "id": "f05ece1b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading iris.zip to /home/pawel/ium/IUM_10/sample-ml-project-2024\n", " 0%| | 0.00/3.60k [00:00\n", "Adding... 
\n", "!\u001b[A\n", "Collecting files and computing hashes in data/Iris.csv |0.00 [00:00, ?file/s\u001b[A\n", " \u001b[A\n", "!\u001b[A\n", " 0% Checking cache in '/home/pawel/ium/IUM_10/sample-ml-project-2024/.dvc/cache\u001b[A\n", " \u001b[A\n", "!\u001b[A\n", " 0%| |Adding data/Iris.csv to cache 0/1 [00:00...\" to include in what will be committed)\n", "\t\u001b[31mdata/.gitignore\u001b[m\n", "\t\u001b[31mdata/Iris.csv.dvc\u001b[m\n", "\n", "nothing added to commit but untracked files present (use \"git add\" to track)\n" ] } ], "source": [ "!git status -u" ] }, { "cell_type": "markdown", "id": "8589fecf", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's add the files `data/Iris.csv.dvc` and `data/.gitignore` to the Git repository, as DVC suggests:" ] }, { "cell_type": "code", "execution_count": 14, "id": "460c4a17", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!git add data/Iris.csv.dvc data/.gitignore" ] }, { "cell_type": "code", "execution_count": 15, "id": "80644077", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[master 92b2c9d] Dodano dane IRIS (DVC)\n", " 2 files changed, 6 insertions(+)\n", " create mode 100644 data/.gitignore\n", " create mode 100644 data/Iris.csv.dvc\n" ] } ], "source": [ "!git commit -m \"Dodano dane IRIS (DVC)\"" ] }, { "cell_type": "markdown", "id": "03899863", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A `*.dvc` file contains, among other things, the hash of the tracked file. 
More about `*.dvc` files: [link](https://dvc.org/doc/user-guide/project-structure/dvc-files)" ] }, { "cell_type": "code", "execution_count": 16, "id": "8cb2ba7c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# %load data/Iris.csv.dvc\n" ] }, { "cell_type": "markdown", "id": "0b421d45", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The original `Iris.csv` file has been moved to the `.dvc/cache` directory, under a path derived from the file's hash, and linked back to its original location. How the link is created [depends on the file system](https://dvc.org/doc/user-guide/large-dataset-optimization)." ] }, { "cell_type": "code", "execution_count": 18, "id": "1d471f3a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 8\n", "-r--r--r-- 1 pawel pawel 5107 Sep 19 2019 7820ef0af287ff346c5cabfb4c612c\n" ] } ], "source": [ "!ls -l .dvc/cache/files/md5/71" ] }, { "cell_type": "code", "execution_count": 19, "id": "32531aa8", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n", "1,5.1,3.5,1.4,0.2,Iris-setosa\n", "2,4.9,3.0,1.4,0.2,Iris-setosa\n" ] } ], "source": [ "!head -n 3 .dvc/cache/files/md5/71/7820ef0af287ff346c5cabfb4c612c" ] }, { "cell_type": "markdown", "id": "901e8e90", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## dvc remote\n", " - To send the actual files tracked by DVC to a remote location (from which they can be fetched, e.g. by a CI system or by other users), we first need to configure such a location.\n", " - The [`dvc remote add`](https://dvc.org/doc/command-reference/remote/add) command does this.\n", " - We will use a local \"remote\". Here it is simply the previously created directory `~/dvcstore`. 
The same directory also exists on our Jenkins server; it has to be mounted in the Docker container, of course.\n", " - In real-world use we would give the path to a resource reachable over the network, e.g. an SFTP server or an AWS S3 path." ] }, { "cell_type": "markdown", "id": "53429521", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Supported remote storage types: https://dvc.org/doc/command-reference/remote/add#supported-storage-types\n", " - Amazon S3\n", " - S3-compatible storage\n", " - Microsoft Azure Blob Storage\n", " - Google Drive\n", " - Google Cloud Storage\n", " - Aliyun OSS\n", " - SSH\n", " - HDFS\n", " - WebHDFS\n", " - HTTP\n", " - WebDAV\n", " - local remote" ] }, { "cell_type": "markdown", "id": "507e3a09", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Adding a local remote" ] }, { "cell_type": "code", "execution_count": 30, "id": "a16f2bfa", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting 'my_local_remote' as a default remote.\n", "\u001b[0m" ] } ], "source": [ "!dvc remote add -d my_local_remote ~/dvcstore" ] }, { "cell_type": "code", "execution_count": 31, "id": "9c3deeaf", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On branch master\n", "Changes not staged for commit:\n", " (use \"git add ...\" to update what will be committed)\n", " (use \"git restore ...\" to discard changes in working directory)\n", "\t\u001b[31mmodified: .dvc/config\u001b[m\n", "\n", "no changes added to commit (use \"git add\" and/or \"git commit -a\")\n" ] } ], "source": [ "!git status" ] }, { "cell_type": "code", "execution_count": 32, "id": "899eac7d", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[master 7123494] Added DVC remote\n", " 1 
file changed, 1 insertion(+), 1 deletion(-)\n" ] } ], "source": [ "!git add .dvc/config\n", "!git commit -m \"Added DVC remote\"" ] }, { "cell_type": "markdown", "id": "8c556c96", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## dvc push\n", "Once a \"remote\" is configured, we can push files to it with the `dvc push` command:" ] }, { "cell_type": "code", "execution_count": 33, "id": "c7f24f75", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting |1.00 [00:00, 137entry/s]\n", "Pushing\n", "!\u001b[A\n", " 0% Checking cache in '/home/pawel/dvcstore/files/md5'| |0/? [00:00\n", "Adding... \n", "!\u001b[A\n", "Collecting files and computing hashes in data/Iris.csv |0.00 [00:00, ?file/s\u001b[A\n", " \u001b[A\n", "!\u001b[A\n", " 0% Checking cache in '/home/pawel/ium/IUM_10/sample-ml-project-2024/.dvc/cache\u001b[A\n", " \u001b[A\n", "!\u001b[A\n", " 0%| |Adding data/Iris.csv to cache 0/1 [00:00 'data/data.xml'\n", " 0% Downloading data.xml| |0/1 [00:00=0.17.11 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.18.6)\n", "Requirement already satisfied: fsspec in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2024.5.0)\n", "Requirement already satisfied: dvc-studio-client<1,>=0.20 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.20.0)\n", "Requirement already satisfied: tomlkit>=0.11.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.12.5)\n", "Requirement already satisfied: dvc-objects in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.1.0)\n", "Requirement already satisfied: distro>=1.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.9.0)\n", "Requirement already satisfied: pygtrie>=2.3.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.5.0)\n", "Requirement already satisfied: 
voluptuous>=0.11.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.14.2)\n", "Requirement already satisfied: attrs>=22.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (23.2.0)\n", "Requirement already satisfied: dvc-http>=2.29.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.32.0)\n", "Requirement already satisfied: rich>=12 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (13.7.1)\n", "Requirement already satisfied: dulwich in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.22.1)\n", "Requirement already satisfied: pyparsing>=2.4.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.1.2)\n", "Requirement already satisfied: shortuuid>=0.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.0.13)\n", "Requirement already satisfied: flufl.lock<8,>=5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (7.1.1)\n", "Requirement already satisfied: kombu in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.3.7)\n", "Requirement already satisfied: iterative-telemetry>=0.0.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.0.8)\n", "Requirement already satisfied: dpath<3,>=2.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.1.6)\n", "Requirement already satisfied: colorama>=0.3.9 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.6)\n", "Requirement already satisfied: celery in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.4.0)\n", "Requirement already satisfied: packaging>=19 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (24.0)\n", "Requirement already satisfied: tabulate>=0.8.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.9.0)\n", "Requirement already satisfied: shtab<2,>=1.3.4 in /home/pawel/ium/venv/lib/python3.10/site-packages (from 
dvc[ssh]) (1.7.1)\n", "Requirement already satisfied: scmrepo<4,>=3.3.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.3.5)\n", "Requirement already satisfied: dvc-render<2,>=1.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.0.2)\n", "Requirement already satisfied: gto<2,>=1.6.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.7.1)\n", "Requirement already satisfied: pydot>=1.2.4 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.0.0)\n", "Requirement already satisfied: psutil>=5.8 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.9.8)\n", "Requirement already satisfied: configobj>=5.0.6 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (5.0.8)\n", "Requirement already satisfied: funcy>=1.14 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.0)\n", "Requirement already satisfied: grandalf<1,>=0.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.8)\n", "Requirement already satisfied: dvc-task<1,>=0.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.0)\n", "Requirement already satisfied: requests>=2.22 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.31.0)\n", "Requirement already satisfied: zc.lockfile>=1.2.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.0.post1)\n", "Requirement already satisfied: flatten-dict<1,>=0.4.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.4.2)\n", "Requirement already satisfied: networkx>=2.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.3)\n", "Requirement already satisfied: pathspec>=0.10.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (0.12.1)\n", "Requirement already satisfied: hydra-core>=1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (1.3.2)\n", "Requirement already satisfied: 
omegaconf in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (2.3.0)\n", "Requirement already satisfied: tqdm<5,>=4.63.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (4.66.2)\n", "Requirement already satisfied: dvc-data<3.16,>=3.15 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.15.1)\n", "Requirement already satisfied: platformdirs<4,>=3.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc[ssh]) (3.11.0)\n", "Collecting dvc-ssh<5,>=4\n", " Downloading dvc_ssh-4.1.1-py3-none-any.whl (15 kB)\n", "Collecting bcrypt>=3.2\n", " Downloading bcrypt-4.1.3-cp39-abi3-manylinux_2_28_x86_64.whl (283 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m283.7/283.7 KB\u001b[0m \u001b[31m12.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting pynacl>=1.5\n", " Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m856.7/856.7 KB\u001b[0m \u001b[31m14.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: cryptography>=3.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from paramiko) (42.0.7)\n", "Requirement already satisfied: six in /home/pawel/ium/venv/lib/python3.10/site-packages (from configobj>=5.0.6->dvc[ssh]) (1.16.0)\n", "Requirement already satisfied: cffi>=1.12 in /home/pawel/ium/venv/lib/python3.10/site-packages (from cryptography>=3.3->paramiko) (1.16.0)\n", "Requirement already satisfied: diskcache>=5.2.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (5.6.3)\n", "Requirement already satisfied: dictdiffer>=0.8.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (0.9.0)\n", "Requirement already satisfied: sqltrie<1,>=0.11.0 in 
/home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-data<3.16,>=3.15->dvc[ssh]) (0.11.0)\n", "Requirement already satisfied: aiohttp-retry>=2.5.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from dvc-http>=2.29.0->dvc[ssh]) (2.8.3)\n", "Collecting sshfs[bcrypt]>=2023.4.1\n", " Downloading sshfs-2024.4.1-py3-none-any.whl (15 kB)\n", "Requirement already satisfied: billiard<5.0,>=4.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (4.2.0)\n", "Requirement already satisfied: tzdata>=2022.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (2024.1)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (2.9.0.post0)\n", "Requirement already satisfied: vine<6.0,>=5.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (5.1.0)\n", "Requirement already satisfied: click-plugins>=1.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (1.1.1)\n", "Requirement already satisfied: click-repl>=0.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (0.3.0)\n", "Requirement already satisfied: click<9.0,>=8.1.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (8.1.7)\n", "Requirement already satisfied: click-didyoumean>=0.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from celery->dvc[ssh]) (0.3.1)\n", "Requirement already satisfied: atpublic>=2.3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from flufl.lock<8,>=5->dvc[ssh]) (4.1.0)\n", "Requirement already satisfied: semver>=2.13.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (3.0.2)\n", "Requirement already satisfied: pydantic!=2.0.0,<3,>=1.9.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (2.7.1)\n", "Requirement already satisfied: entrypoints in 
/home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (0.4)\n", "Requirement already satisfied: typer>=0.4.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gto<2,>=1.6.0->dvc[ssh]) (0.12.3)\n", "Requirement already satisfied: antlr4-python3-runtime==4.9.* in /home/pawel/ium/venv/lib/python3.10/site-packages (from hydra-core>=1.1->dvc[ssh]) (4.9.3)\n", "Requirement already satisfied: filelock in /home/pawel/ium/venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc[ssh]) (3.14.0)\n", "Requirement already satisfied: appdirs in /home/pawel/ium/venv/lib/python3.10/site-packages (from iterative-telemetry>=0.0.7->dvc[ssh]) (1.4.4)\n", "Requirement already satisfied: amqp<6.0.0,>=5.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from kombu->dvc[ssh]) (5.2.0)\n", "Requirement already satisfied: PyYAML>=5.1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from omegaconf->dvc[ssh]) (6.0.1)\n", "Requirement already satisfied: certifi>=2017.4.17 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (2024.2.2)\n", "Requirement already satisfied: idna<4,>=2.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (3.6)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (3.3.2)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from requests>=2.22->dvc[ssh]) (2.2.1)\n", "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from rich>=12->dvc[ssh]) (2.17.2)\n", "Requirement already satisfied: markdown-it-py>=2.2.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from rich>=12->dvc[ssh]) (3.0.0)\n", "Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in /home/pawel/ium/venv/lib/python3.10/site-packages (from 
ruamel.yaml>=0.17.11->dvc[ssh]) (0.2.8)\n", "Requirement already satisfied: pygit2>=1.14.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (1.15.0)\n", "Requirement already satisfied: gitpython>3 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (3.1.43)\n", "Requirement already satisfied: asyncssh<3,>=2.13.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from scmrepo<4,>=3.3.2->dvc[ssh]) (2.14.2)\n", "Requirement already satisfied: setuptools in /home/pawel/ium/venv/lib/python3.10/site-packages (from zc.lockfile>=1.2.1->dvc[ssh]) (59.6.0)\n", "Requirement already satisfied: aiohttp in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (3.9.5)\n", "Requirement already satisfied: typing-extensions>=3.6 in /home/pawel/ium/venv/lib/python3.10/site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.2->dvc[ssh]) (4.11.0)\n", "Requirement already satisfied: pycparser in /home/pawel/ium/venv/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=3.3->paramiko) (2.22)\n", "Requirement already satisfied: prompt-toolkit>=3.0.36 in /home/pawel/ium/venv/lib/python3.10/site-packages (from click-repl>=0.2.0->celery->dvc[ssh]) (3.0.43)\n", "Requirement already satisfied: gitdb<5,>=4.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gitpython>3->scmrepo<4,>=3.3.2->dvc[ssh]) (4.0.11)\n", "Requirement already satisfied: mdurl~=0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12->dvc[ssh]) (0.1.2)\n", "Requirement already satisfied: pydantic-core==2.18.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc[ssh]) (2.18.2)\n", "Requirement already satisfied: annotated-types>=0.4.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc[ssh]) (0.7.0)\n", "Requirement already satisfied: 
orjson in /home/pawel/ium/venv/lib/python3.10/site-packages (from sqltrie<1,>=0.11.0->dvc-data<3.16,>=3.15->dvc[ssh]) (3.10.3)\n", "Requirement already satisfied: shellingham>=1.3.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from typer>=0.4.1->gto<2,>=1.6.0->dvc[ssh]) (1.5.4)\n", "Requirement already satisfied: async-timeout<5.0,>=4.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (4.0.3)\n", "Requirement already satisfied: aiosignal>=1.1.2 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.3.1)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (6.0.5)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.9.4)\n", "Requirement already satisfied: frozenlist>=1.1.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc[ssh]) (1.4.1)\n", "Requirement already satisfied: smmap<6,>=3.0.1 in /home/pawel/ium/venv/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython>3->scmrepo<4,>=3.3.2->dvc[ssh]) (5.0.1)\n", "Requirement already satisfied: wcwidth in /home/pawel/ium/venv/lib/python3.10/site-packages (from prompt-toolkit>=3.0.36->click-repl>=0.2.0->celery->dvc[ssh]) (0.2.13)\n", "Installing collected packages: bcrypt, pynacl, paramiko, sshfs, dvc-ssh\n", "Successfully installed bcrypt-4.1.3 dvc-ssh-4.1.1 paramiko-3.4.0 pynacl-1.5.0 sshfs-2024.4.1\n" ] } ], "source": [ "# conda install -c conda-forge dvc-ssh\n", "\n", "!pip install dvc[ssh] paramiko" ] }, { "cell_type": "code", "execution_count": 51, "id": "9662b7aa", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "[sudo] password for pawel: \n" ] } ], "source": [ "## The packages below are needed for the dvc remote commands to work:\n", "!sudo apt install libssl3 libffi7" ] }, { "cell_type": "markdown", "id": "04c41da0", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Add a remote:" ] }, { "cell_type": "code", "execution_count": 52, "id": "e9a04876", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting 'ium_ssh_remote' as a default remote.\n", "\u001b[0m" ] } ], "source": [ "!dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl" ] }, { "cell_type": "code", "execution_count": 53, "id": "e3f27bbb", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_local_remote\t/home/pawel/dvcstore\n", "ium_ssh_remote\tssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl\n", "\u001b[0m" ] } ], "source": [ "!dvc remote list" ] }, { "cell_type": "markdown", "id": "c92edd7b", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Save the password (`--local` writes it to `.dvc/config.local`, which Git does not track):" ] }, { "cell_type": "code", "execution_count": 54, "id": "5b2fa175", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0m" ] } ], "source": [ "!dvc remote modify --local ium_ssh_remote password IUM@2021" ] }, { "cell_type": "markdown", "id": "8b83049b", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Push to the configured remote:" ] }, { "cell_type": "code", "execution_count": 55, "id": "ea6e16fa", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting |1.00 [00:00, 252entry/s]\n", "Pushing\n", "!\u001b[A\n", " 0% Checking cache in 'files/md5'| |0/? [00:00