{ "cells": [ { "cell_type": "markdown", "id": "9d06fc91", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Machine Learning Engineering\n", "### May 29, 2024\n", "# 11. GitHub Actions" ] }, { "cell_type": "markdown", "id": "beeb17b2", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "" ] }, { "cell_type": "markdown", "id": "752995e1", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ " - https://docs.github.com/en/actions\n", " - A continuous integration system “built into” GitHub\n", " - Free for public repositories (with stricter [resource limits](https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits) than in the paid plans)\n", " - https://youtu.be/cP0I9w2coGU" ] }, { "cell_type": "markdown", "id": "b66dd41f", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### GitHub Actions terminology\n", " - A ***workflow*** corresponds to a Jenkins *pipeline*.\n", " - An ***event*** is what triggers a *workflow*, e.g. pushing a change to the repository (*push*) or opening a pull request ([full list here](https://docs.github.com/en/actions/reference/events-that-trigger-workflows)).\n", " - ***Job*** - a workflow consists of one or more *jobs*. Jobs can run in parallel, each on a different machine (see *runner*).\n", " - ***Step*** - the counterpart of a Jenkins *stage*; it groups *actions*.\n", " - ***Action/command*** - the counterpart of a Jenkins *step*: a single command to execute, e.g. adding a comment to a pull request or running a shell command.\n", " - ***Runner*** - the counterpart of a Jenkins *agent*: a server on which *jobs* can be executed:\n", "     - *GitHub-hosted runner* - a server maintained by GitHub (2-core CPU, 7 GB RAM, 14 GB SSD). 
Windows, Linux, or macOS.\n", "     - *Self-hosted runner* - your own server with the [GitHub Actions Runner](https://github.com/actions/runner) application installed." ] }, { "cell_type": "markdown", "id": "9f1f6d0a", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Defining a *workflow*\n", " - A *workflow* is defined in YAML files (with the `*.yml` or `*.yaml` extension) placed in the special `.github/workflows/` directory inside the repository.\n", " - The full syntax is described [here](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions).\n", " - Basic fields:\n", "     - `name` (optional) - the name under which the *workflow*/*step* is shown in the UI. Default: the path to the YAML file.\n", "     - `on` - defines when the workflow should run.\n", "     - `jobs` - groups the *jobs* to execute. Each can run on a different *runner*. By default they run in parallel (but we can define [dependencies between jobs](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idneeds), which makes them run sequentially).\n", "     - `runs-on` - a *job* parameter that defines which virtual machine the job runs on (e.g. `ubuntu-latest`).\n", "     - `uses` - lets us use ready-made actions defined by us or by other users, e.g. `- uses: actions/checkout@v2` checks out the files from the repository.\n", "     - `run` - runs any ([available/installed](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#preinstalled-software)) command, e.g. `python3 train.py`\n", "     - `env` - defines environment variables available to actions; we can also use the [variables set by GitHub](https://docs.github.com/en/actions/reference/environment-variables#default-environment-variables)." 
] }, { "cell_type": "code", "execution_count": 9, "id": "f4916c1f", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/pawel/ium/IUM_11/github-actions-hello\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/pawel/ium/venv/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.\n", " self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n" ] } ], "source": [ "!mkdir -p IUM_11/github-actions-hello\n", "%cd IUM_11/github-actions-hello\n", "!mkdir -p .github/workflows" ] }, { "cell_type": "code", "execution_count": 10, "id": "88ce689f", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Initialized empty Git repository in /home/pawel/ium/IUM_11/github-actions-hello/.git/\n", "Switched to a new branch 'main'\n" ] } ], "source": [ "!git init\n", "!git checkout -b main\n", "!git remote add origin git@github.com:USERNAME/ium-ga-hello.git\n", "!git push -u origin main" ] }, { "cell_type": "code", "execution_count": 11, "id": "dde8d432", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing .github/workflows/workflow.yml\n" ] } ], "source": [ "%%writefile .github/workflows/workflow.yml\n", "name: github-actions-hello\n", "on: [push]\n", "jobs:\n", "  hello-job:\n", "    runs-on: ubuntu-latest\n", "    steps:\n", "      - name: Checkout repo\n", "        uses: actions/checkout@v2\n", "      - name: Setup Python\n", "        uses: actions/setup-python@v2.2.2\n", "        with:\n", "          python-version: '3.10'\n", "      - run: python3 --version" ] }, { "cell_type": "code", "execution_count": 22, "id": "ff1e011e", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On branch 
main\n", "Your branch is up to date with 'origin/main'.\n", "\n", "nothing to commit, working tree clean\n", "Everything up-to-date\n" ] } ], "source": [ "!git add .github/workflows/workflow.yml\n", "!git commit -m \"Github Actions Workflow\"\n", "!git push" ] }, { "cell_type": "markdown", "id": "3e237076", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The *Actions* tab on the repository page:\n", "https://github.com/skorzewski/ium-ga-hello/actions" ] }, { "cell_type": "code", "execution_count": 12, "id": "32701383", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 24\n", "drwxr-xr-x 2 pawel pawel 4096 May 28 10:10 .\n", "drwxr-xr-x 3 pawel pawel 4096 May 28 10:10 ..\n", "-rw-r--r-- 1 pawel pawel 1451 May 28 10:10 docker-artifact.yml\n", "-rw-r--r-- 1 pawel pawel 882 May 28 10:10 docker.yml\n", "-rw-r--r-- 1 pawel pawel 603 May 28 10:10 parametrized.yml\n", "-rw-r--r-- 1 pawel pawel 306 May 28 10:10 workflow.yml\n" ] } ], "source": [ "!ls -al .github/workflows" ] }, { "cell_type": "markdown", "id": "1c01acb5", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Manual triggering\n", "A workflow can also be triggered manually, with parameters.\n", "More information, e.g. here: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/" ] }, { "cell_type": "code", "execution_count": 42, "id": "a7250bf7", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting .github/workflows/parametrized.yml\n" ] } ], "source": [ "%%writefile .github/workflows/parametrized.yml\n", "name: github-actions-hello-parametrized\n", "on:\n", "  workflow_dispatch:\n", "    inputs:\n", "      input_text:\n", "        description: 'Text to display'\n", "        required: true\n", "        default: 'Hello World'\n", "jobs:\n", "  hello-job:\n", "    runs-on: ubuntu-latest\n", "    steps:\n", "      - name: Checkout repo\n", "        uses: actions/checkout@v2\n", "      - name: Install dependencies\n", "        run:\n", "          sudo apt update;\n", "          sudo apt install -y figlet\n", "      - name: Write\n", "        run:\n", "          figlet \"${{ github.event.inputs.input_text }}\"" ] }, { "cell_type": "code", "execution_count": 43, "id": "36ddaac0", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[main a98938d] just dispatch\n", " 1 file changed, 6 deletions(-)\n", "Enumerating objects: 9, done.\n", "Counting objects: 100% (9/9), done.\n", "Delta compression using up to 4 threads\n", "Compressing objects: 100% (3/3), done.\n", "Writing objects: 100% (5/5), 411 bytes | 411.00 KiB/s, done.\n", "Total 5 (delta 1), reused 0 (delta 0), pack-reused 0\n", "remote: Resolving deltas: 100% (1/1), completed with 1 local object.\u001b[K\n", "To github.com:TomekZet/ium-ga-hello.git\n", " 6c4a361..a98938d main -> main\n" ] } ], "source": [ "!git add -u .github/workflows\n", "!git commit -m \"just dispatch\"\n", "!git push" ] }, { "cell_type": "markdown", "id": "ed780dea", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Dependencies\n", "\n", "The virtual machines (*runners*) on which jobs run come with a set of tools preinstalled. A sample list for Ubuntu 24.04: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md\n", "\n", "Missing dependencies can be installed using:\n", " - actions\n", " - system commands such as `apt install` or `pip install`, executed via `run`. See this [example](https://docs.github.com/en/actions/using-github-hosted-runners/customizing-github-hosted-runners#installing-software-on-ubuntu-runners)" ] }, { "cell_type": "markdown", "id": "28b582c4", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Actions\n", "With the `uses` keyword we can use previously prepared actions. They can come from:\n", " - the same repository as the workflow ([more](https://docs.github.com/en/actions/learn-github-actions/finding-and-customizing-actions#referencing-an-action-in-the-same-repository-where-a-workflow-file-uses-the-action))\n", " - any public GitHub repository (e.g. the [iterative/setup-cml repository](https://github.com/iterative/setup-cml), see the example below)\n", " - the [GitHub Marketplace](https://github.com/marketplace?type=actions)" ] }, { "cell_type": "markdown", "id": "a764cc0d", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Actions run in a Docker container\n", "An action can run in a Docker container (pulled from Docker Hub or built from a `Dockerfile`).\n", "To do this, create your own action in an `action.yml` file and then use it in a *workflow*." 
] }, { "cell_type": "markdown", "id": "6c9eea3e", "metadata": {}, "source": [ "The official GitHub documentation covers using Docker in GitHub Actions:\n", "- [Creating a Docker container action](https://docs.github.com/en/actions/creating-actions/creating-a-docker-container-action)\n", "- [Dockerfile support for GitHub Actions](https://docs.github.com/en/actions/creating-actions/dockerfile-support-for-github-actions)" ] }, { "cell_type": "code", "execution_count": 59, "id": "ff4dab8c", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting action.yml\n" ] } ], "source": [ "%%writefile action.yml\n", "name: 'Hello World'\n", "description: 'Greet someone and record the time'\n", "inputs:\n", "  who-to-greet:  # id of input\n", "    description: 'Who to greet'\n", "    required: true\n", "    default: 'World'\n", "outputs:\n", "  time:  # id of output\n", "    description: 'The time we greeted you'\n", "runs:\n", "  using: 'docker'\n", "  image: 'Dockerfile'\n", "  args:\n", "    - ${{ inputs.who-to-greet }}" ] }, { "cell_type": "code", "execution_count": 80, "id": "f1aaff7c", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting Dockerfile\n" ] } ], "source": [ "%%writefile Dockerfile\n", "# Container image that runs your code\n", "FROM ubuntu:latest\n", "\n", "RUN apt update && apt install -y figlet\n", "\n", "# Copies your code file from your action repository to the filesystem path `/` of the container\n", "COPY entrypoint.sh /entrypoint.sh\n", "\n", "VOLUME /github/workspace/\n", "\n", "WORKDIR /github/workspace/\n", "\n", "# Code file to execute when the docker container starts up (`entrypoint.sh`)\n", "ENTRYPOINT [\"/entrypoint.sh\"]" ] }, { "cell_type": "code", "execution_count": 84, "id": "7f778025", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting entrypoint.sh\n" ] } ], "source": [ "%%writefile entrypoint.sh\n", "#!/bin/sh -l\n", "\n", "figlet \"Hello $1\" | tee figlet.txt\n", "echo \"Entrypoint invoked in: $PWD\"\n", "readlink -f figlet.txt\n", "time=$(date)\n", "echo \"time=$time\" >> \"$GITHUB_OUTPUT\"" ] }, { "cell_type": "code", "execution_count": 60, "id": "911975de", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!chmod +x entrypoint.sh" ] }, { "cell_type": "code", "execution_count": 62, "id": "483e0498", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting .github/workflows/docker.yml\n" ] } ], "source": [ "%%writefile .github/workflows/docker.yml\n", "name: github-actions-hello-docker\n", "on:\n", "  workflow_dispatch:\n", "    inputs:\n", "      input_text:\n", "        description: 'Who to greet'\n", "        required: true\n", "        default: 'World'\n", "jobs:\n", "  hello-job:\n", "    runs-on: ubuntu-latest\n", "    steps:\n", "      - name: Checkout repo\n", "        uses: actions/checkout@v2\n", "      - name: Use docker action\n", "        id: hello\n", "        uses: ./\n", "        with:\n", "          who-to-greet: \"${{ github.event.inputs.input_text }}\"\n", "      # Use the output from the `hello` step\n", "      - name: Get the output time\n", "        run: echo \"The time was ${{ steps.hello.outputs.time }}\"\n" ] }, { "cell_type": "code", "execution_count": 63, "id": "bc24dff3", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[main 22a5094] Fix path\n", " 1 file changed, 1 insertion(+)\n", "Enumerating objects: 9, done.\n", "Counting objects: 100% (9/9), done.\n", "Delta compression using up to 4 threads\n", "Compressing objects: 100% (5/5), done.\n", "Writing objects: 100% (5/5), 570 bytes | 570.00 KiB/s, done.\n", "Total 5 (delta 1), reused 0 (delta 0), pack-reused 0\n", "remote: Resolving deltas: 100% (1/1), 
completed with 1 local object.\u001b[K\n", "To github.com:TomekZet/ium-ga-hello.git\n", " 97c7272..22a5094 main -> main\n" ] } ], "source": [ "!git add .github entrypoint.sh Dockerfile\n", "!git commit -m \"Fix path\"\n", "!git push" ] }, { "cell_type": "markdown", "id": "12af9d1b", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Archiving artifacts\n", "https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts\n", "\n", "Artifacts are archived with the `upload-artifact` action:\n", "\n", "```yaml\n", "      - name: Archive artifacts\n", "        uses: actions/upload-artifact@v3\n", "        with:\n", "          name: figlet-output\n", "          path: figlet.txt\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "id": "245f7c8a", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting .github/workflows/docker-artifact.yml\n" ] } ], "source": [ "%%writefile .github/workflows/docker-artifact.yml\n", "name: github-actions-hello-docker-artifact\n", "on:\n", "  workflow_dispatch:\n", "    inputs:\n", "      input_text:\n", "        description: 'Who to greet'\n", "        required: true\n", "        default: 'World'\n", "jobs:\n", "  hello-job:\n", "    name: \"Do all the hard stuff\"\n", "    runs-on: ubuntu-latest\n", "    steps:\n", "      - name: Checkout repo\n", "        uses: actions/checkout@v2\n", "      - name: Use docker action\n", "        id: hello\n", "        uses: ./\n", "        with:\n", "          who-to-greet: \"${{ github.event.inputs.input_text }}\"\n", "      # Use the output from the `hello` step\n", "      - name: Get the output time\n", "        run: echo \"The time was ${{ steps.hello.outputs.time }}\" > time.txt\n", "      - name: Archive artifacts\n", "        uses: actions/upload-artifact@v3\n", "        with:\n", "          name: figlet-output\n", "          path: |\n", "            figlet.txt\n", "            time.txt\n", "  publish:\n", "    name: \"Publish as github comment\"\n", "    runs-on: ubuntu-latest\n", "    needs: hello-job\n", "    steps:\n", "      - uses: actions/checkout@v3\n", "      # We 
need to download the artifact first, jobs do not share workflow files\n", "      - name: get-artifact\n", "        uses: actions/download-artifact@v3\n", "        with:\n", "          name: figlet-output\n", "      - name: display_artifact_contents\n", "        run:\n", "          cat time.txt ; tr ' ' '#' < figlet.txt\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "47e301f9", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[main 5a40228] Archive in one job, use in other\n", " 1 file changed, 1 insertion(+)\n", "Enumerating objects: 9, done.\n", "Counting objects: 100% (9/9), done.\n", "Delta compression using up to 4 threads\n", "Compressing objects: 100% (5/5), done.\n", "Writing objects: 100% (5/5), 622 bytes | 622.00 KiB/s, done.\n", "Total 5 (delta 2), reused 0 (delta 0), pack-reused 0\n", "remote: Resolving deltas: 100% (2/2), completed with 2 local objects.\u001b[K\n", "To github.com:TomekZet/ium-ga-hello.git\n", " 4df6dc0..5a40228 main -> main\n" ] } ], "source": [ "!git add -u\n", "!git commit -m \"Archive in one job, use in other\"\n", "!git push" ] }, { "cell_type": "markdown", "id": "805622e8", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## CML - Continuous Machine Learning" ] }, { "cell_type": "markdown", "id": "e0b3acbf", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ " - Developed by [iterative.ai](https://iterative.ai) (like DVC)\n", " - https://cml.dev/\n", " - Documentation: https://dvc.org/doc/cml\n", " - Works with GitHub Actions or GitLab CI (and also [Bitbucket Pipelines](https://github.com/iterative/cml/wiki/CML-with-Bitbucket-Cloud))\n", " - CML adds several “actions” to GitHub Actions:\n", "     - `iterative/setup-cml` - adds the actions below\n", "     - `cml-send-comment` - adds the CML report as a comment on the pull request on GitHub\n", "     - `cml-send-github-check` - adds the CML report to the pull request's “Checks” tab on GitHub\n", "     - `cml-publish` - lets you add an image to the report" ] }, { "cell_type": "markdown", "id": "cdb54b38", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example CML workflow:" ] }, { "cell_type": "code", "execution_count": 1, "id": "07b1035a", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/tomek/AITech/repo/aitech-ium-private/IUM_11\n", "Cloning into 'example_cml'...\n", "remote: Enumerating objects: 25, done.\u001b[K\n", "remote: Total 25 (delta 0), reused 0 (delta 0), pack-reused 25\u001b[K\n", "Receiving objects: 100% (25/25), 222.95 KiB | 920.00 KiB/s, done.\n", "Resolving deltas: 100% (6/6), done.\n" ] } ], "source": [ "!git clone git@github.com:TomekZet/example_cml.git" ] }, { "cell_type": "code", "execution_count": 5, "id": "bf27a2b3", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/tomek/AITech/repo/aitech-ium-private/IUM_11/example_cml\n" ] } ], "source": [ "%cd example_cml\n", "!mkdir -p .github/workflows/" ] }, { "cell_type": "code", "execution_count": 14, "id": "64f6e21d", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting .github/workflows/cml.yaml\n" ] } ], "source": [ "%%writefile .github/workflows/cml.yaml\n", "name: model-training\n", "on: [push]\n", "jobs:\n", "  run:\n", "    runs-on: [ubuntu-latest]\n", "    steps:\n", "      - uses: actions/checkout@v2\n", "      - uses: actions/setup-python@v2\n", "      - uses: iterative/setup-cml@v1\n", "      - name: Train model\n", "        env:\n", "          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n", "        run: |\n", "          pip install -r requirements.txt\n", "          python train.py\n", "\n", "          cat metrics.txt >> report.md\n", "          cml-publish confusion_matrix.png --md >> report.md\n", "          cml-send-comment report.md" ] }, { "cell_type": "code", "execution_count": null, "id": "83e49d3b", "metadata": 
{ "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# %load train.py\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.metrics import plot_confusion_matrix\n", "import matplotlib.pyplot as plt\n", "import json\n", "import os\n", "import numpy as np\n", "\n", "# Read in data\n", "X_train = np.genfromtxt(\"data/train_features.csv\")\n", "y_train = np.genfromtxt(\"data/train_labels.csv\")\n", "X_test = np.genfromtxt(\"data/test_features.csv\")\n", "y_test = np.genfromtxt(\"data/test_labels.csv\")\n", "\n", "\n", "# Fit a model\n", "depth = 2\n", "clf = RandomForestClassifier(max_depth=depth)\n", "clf.fit(X_train,y_train)\n", "\n", "acc = clf.score(X_test, y_test)\n", "print(acc)\n", "with open(\"metrics.txt\", 'w') as outfile:\n", "    outfile.write(\"Accuracy: \" + str(acc) + \"\\n\")\n", "\n", "\n", "# Plot it\n", "disp = plot_confusion_matrix(clf, X_test, y_test, normalize='true',cmap=plt.cm.Blues)\n", "plt.savefig('confusion_matrix.png')\n", "\n" ] }, { "cell_type": "markdown", "id": "8dc5748f", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's change the file (line 17: `depth = 6`)" ] }, { "cell_type": "code", "execution_count": 11, "id": "afeaf939", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting train.py\n" ] } ], "source": [ "%%writefile train.py\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.metrics import plot_confusion_matrix\n", "import matplotlib.pyplot as plt\n", "import json\n", "import os\n", "import numpy as np\n", "\n", "# Read in data\n", "X_train = np.genfromtxt(\"data/train_features.csv\")\n", "y_train = np.genfromtxt(\"data/train_labels.csv\")\n", "X_test = np.genfromtxt(\"data/test_features.csv\")\n", "y_test = np.genfromtxt(\"data/test_labels.csv\")\n", "\n", "\n", "# Fit a model\n", "depth = 6\n", "clf = RandomForestClassifier(max_depth=depth)\n", "clf.fit(X_train,y_train)\n", "\n", "acc = clf.score(X_test, y_test)\n", "print(acc)\n", "with open(\"metrics.txt\", 'w') as outfile:\n", "    outfile.write(\"Accuracy: \" + str(acc) + \"\\n\")\n", "\n", "\n", "# Plot it\n", "disp = plot_confusion_matrix(clf, X_test, y_test, normalize='true',cmap=plt.cm.Blues)\n", "plt.savefig('confusion_matrix.png')" ] }, { "cell_type": "markdown", "id": "3e4a711a", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's create a new branch, `deep_depth`:" ] }, { "cell_type": "code", "execution_count": 13, "id": "ab019b0b", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Switched to a new branch 'deep_depth'\n", "[deep_depth 0df0f2c] Changed depth and added cml workflow\n", " 2 files changed, 19 insertions(+), 2 deletions(-)\n", " create mode 100644 .github/workflows/cml.yaml\n", "Enumerating objects: 8, done.\n", "Counting objects: 100% (8/8), done.\n", "Delta compression using up to 4 threads\n", "Compressing objects: 100% (4/4), done.\n", "Writing objects: 100% (6/6), 738 bytes | 738.00 KiB/s, done.\n", "Total 6 (delta 2), reused 0 (delta 0)\n", "remote: Resolving deltas: 100% (2/2), completed with 2 local objects.\u001b[K\n", "remote: \n", "remote: Create a pull request for 'deep_depth' on GitHub by visiting:\u001b[K\n", "remote: https://github.com/TomekZet/example_cml/pull/new/deep_depth\u001b[K\n", "remote: \n", "To github.com:TomekZet/example_cml.git\n", " * [new branch] deep_depth -> deep_depth\n" ] } ], "source": [ "!git checkout -b deep_depth\n", "!git add train.py .github/workflows/cml.yaml\n", "!git commit -m \"Changed depth and added cml workflow\"\n", "!git push origin deep_depth" ] }, { "cell_type": "markdown", "id": "b50f46a8", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "" ] }, { "cell_type": "markdown", "id": "c56c8785", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "" ] }, { "cell_type": "markdown", "id": "fb25c587", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tasks [20 pts] (deadline: June 5, 2024)\n", "1. Create a GitHub account (if you don't have one yet)\n", "2. Create a public repository. Paste the link to it into the *Link GitHub (Actions)* column of the `IUM-2024.xlsx` spreadsheet [1 pt]\n", "3. Create a simple *GitHub workflow* that:\n", "    - checks out your repository [1 pt]\n", "    - fetches the training data files (you can simply add the files to the repository, or download them with `wget` if they are publicly available) [2 pts]\n", "    - can be triggered via “Workflow dispatch” with training parameters [2 pts]\n", "    - consists of at least 2 *jobs*:\n", "        - model training as a separate action run in Docker [8 pts]\n", "        - model evaluation [6 pts]" ] } ], "metadata": { "author": "Tomasz Ziętkiewicz", "celltoolbar": "Slideshow", "email": "tomasz.zietkiewicz@amu.edu.pl", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "lang": "pl", "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "slideshow": { "slide_type": "slide" }, "subtitle": "11.CML[laboratoria]", "title": "Inżynieria uczenia maszynowego", "year": "2021" }, "nbformat": 4, "nbformat_minor": 5 }