{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Machine Learning Engineering\n", "### 24 April 2024\n", "# 8. MLflow" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ " ## MLflow\n", "
\n", "\n", " - https://mlflow.org/\n", " - A tool similar to Sacred, which was covered in the previous class\n", " - A slightly different approach: less intrusion into existing code\n", " - A more comprehensive solution: 4 components, the first of which offers functionality similar to Sacred\n", " - Works with \"any\" language. In practice: Python, R, Java + CLI API + REST API\n", " - Popular with employers - job-offer search results: 20 offers (https://pl.indeed.com/), 36 offers (LinkedIn). Sacred: 0\n", " - Integrates with numerous libraries / clouds\n", " - An open-source solution, created by Databricks\n", " - Available in several editions / installation options:\n", " - paid:\n", " - Databricks Customers\n", " - free:\n", " - Databricks Community Edition\n", " - Self-managed MLflow\n", " - Local Tracking Server\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Components\n", "\n", "MLflow consists of four independent components:\n", " - **MLflow Tracking** - tracks changes to parameters, code and environment, and their effect on metrics. This functionality is very close to what Sacred provides\n", " - **MLflow Projects** - \"packages\" experiment code so that others can easily reproduce it\n", " - **MLflow Models** - makes it easier to \"package\" machine learning models\n", " - **MLflow Registry** - provides a central place for storing and sharing models. %%writefile IUM_08/examples/sklearn_elasticnet_wine/train.py
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn

import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("s123456")
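
# Illustrative sketch (an assumption, not part of the original example):
# instead of hard-coding the server address above, the URI could be read from
# the MLFLOW_TRACKING_URI environment variable, falling back to the local
# server when the variable is unset. `resolve_tracking_uri` is a hypothetical
# helper name; the script itself does not call it.
import os


def resolve_tracking_uri(default="http://localhost:5000"):
    # Return the value of MLFLOW_TRACKING_URI if it is set, else the default.
    return os.environ.get("MLFLOW_TRACKING_URI", default)


# Possible usage: mlflow.set_tracking_uri(resolve_tracking_uri())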

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
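

# Illustrative sketch (not part of the original example): the three metrics
# returned by eval_metrics, written out in plain Python to make the formulas
# explicit. `eval_metrics_reference` is a hypothetical name; the script keeps
# using the scikit-learn implementations above. It assumes non-empty,
# equal-length sequences and a non-constant `actual`.
import math


def eval_metrics_reference(actual, pred):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    # Root mean squared error: square root of the mean squared residual.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Mean absolute error: mean of the absolute residuals.
    mae = sum(abs(e) for e in errors) / n
    # R^2: 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2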


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url = (
        "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    )
    try:
        data = pd.read_csv(csv_url, sep=";")
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e
        )
        # Without the data the rest of the script cannot run, so stop here.
        sys.exit(1)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    
    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    #alpha = 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5
    #l1_ratio = 0.5

    with mlflow.start_run() as run:
        print("MLflow run experiment_id: {0}".format(run.info.experiment_id))
        print("MLflow run artifact_uri: {0}".format(run.info.artifact_uri))

        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        
        # Infer model signature to log it
        # More about signatures: https://mlflow.org/docs/latest/models.html?highlight=signature#model-signature
        signature = mlflow.models.signature.infer_signature(train_x, lr.predict(train_x))

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        # Model registry does not work with file store
        if tracking_url_type_store != "file":

            # Register the model
            # There are other ways to use the Model Registry, which depends on the use case,
            # please refer to the doc for more information:
            # https://mlflow.org/docs/latest/model-registry.html#api-workflow
            mlflow.sklearn.log_model(lr, "wines-model", registered_model_name="ElasticnetWineModel", signature=signature)
        else:
            mlflow.sklearn.log_model(lr, "model", signature=signature) do python sklearn_elasticnet_wine/train.py 0.$a 0.$l; done; done" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 16\r\n", "drwxrwxr-x 6 tomek tomek 4096 maj 17 08:43 375cde31bdd44a45a91fd7cee92ebcda\r\n", "drwxrwxr-x 6 tomek tomek 4096 maj 17 10:38 b395b55b47fc43de876b67f5a4a5dae9\r\n", "drwxrwxr-x 6 tomek tomek 4096 maj 17 09:15 b3ead42eca964113b29e7e5f8bcb7bb7\r\n", "-rw-rw-r-- 1 tomek tomek 151 maj 17 08:43 meta.yaml\r\n" ] } ], "source": [ "### Information about the experiment runs has been saved in the mlruns directory\n", "! ls -l IUM_08/examples/mlruns/0 | head" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 20\r\n", "drwxrwxr-x 3 tomek tomek 4096 maj 17 08:43 artifacts\r\n", "-rw-rw-r-- 1 tomek tomek 423 maj 17 08:43 meta.yaml\r\n", "drwxrwxr-x 2 tomek tomek 4096 maj 17 08:43 metrics\r\n", "drwxrwxr-x 2 tomek tomek 4096 maj 17 08:43 params\r\n", "drwxrwxr-x 2 tomek tomek 4096 maj 17 08:43 tags\r\n" ] } ], "source": [ "! 
ls -l IUM_08/examples/mlruns/0/375cde31bdd44a45a91fd7cee92ebcda" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2021-05-16 17:58:43 +0200] [118029] [INFO] Starting gunicorn 20.1.0\n", "[2021-05-16 17:58:43 +0200] [118029] [ERROR] Connection in use: ('', 5000)\n", "[2021-05-16 17:58:43 +0200] [118029] [ERROR] Retrying in 1 second.\n", "[2021-05-16 17:58:44 +0200] [118029] [ERROR] Connection in use: ('', 5000)\n", "[2021-05-16 17:58:44 +0200] [118029] [ERROR] Retrying in 1 second.\n", "[2021-05-16 17:58:45 +0200] [118029] [ERROR] Connection in use: ('', 5000)\n", "[2021-05-16 17:58:45 +0200] [118029] [ERROR] Retrying in 1 second.\n", "[2021-05-16 17:58:46 +0200] [118029] [ERROR] Connection in use: ('', 5000)\n", "[2021-05-16 17:58:46 +0200] [118029] [ERROR] Retrying in 1 second.\n", "[2021-05-16 17:58:47 +0200] [118029] [ERROR] Connection in use: ('', 5000)\n", "[2021-05-16 17:58:47 +0200] [118029] [ERROR] Retrying in 1 second.\n", "[2021-05-16 17:58:48 +0200] [118029] [ERROR] Can't connect to ('', 5000)\n", "Running the mlflow server failed. Please see the logs above for details.\n" ] } ], "source": [ "### We can view them in a browser by starting the web interface:\n", "### (this should be run in a regular console; in Jupyter it would block the kernel)\n", "! 
cd IUM_08/examples/; mlflow ui" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instance on our server: http://tzietkiewicz.vm.wmi.amu.edu.pl:5000/#/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The web interface\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Comparing results\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Logging\n", " - metrics and parameters can be logged, among other ways, through calls to the Python API: `mlflow.log_param()` and `mlflow.log_metric()`. More available functions: [link](https://mlflow.org/docs/latest/tracking.html#logging-functions)\n", " - these calls must come after [`mlflow.start_run()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.start_run) has been called, preferably inside a block:\n", "```python\n", "with mlflow.start_run():\n", "\n", "    #[...]\n", "\n", "    mlflow.log_param(\"alpha\", alpha)\n", "    mlflow.log_param(\"l1_ratio\", l1_ratio)\n", "```\n", " - automatic logging is also available for selected libraries: https://mlflow.org/docs/latest/tracking.html#automatic-logging" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# MLflow Projects\n", " - MLflow Projects is a set of conventions and a few tools\n", " - they make it easier to run experiments" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Project configuration\n", " - The project configuration is stored in the `MLproject` file ([specification](https://mlflow.org/docs/latest/projects.html))\n", " - It contains:\n", " - a reference to the environment in which the experiment should be run ([details](https://mlflow.org/docs/latest/projects.html#specifying-an-environment)):\n", " - the name of a Docker image\n", " - or the path to a 
conda.yaml file that defines the Conda execution environment\n", " - the parameters with which the experiment can be invoked\n", " - the commands used to invoke the experiment" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting IUM_08/examples/sklearn_elasticnet_wine/MLproject\n" ] } ], "source": [ "%%writefile IUM_08/examples/sklearn_elasticnet_wine/MLproject\n", "name: tutorial\n", "\n", "conda_env: conda.yaml # path to the conda.yaml file with the environment definition\n", " \n", "#docker_env:\n", "#  image: mlflow-docker-example-environment\n", "\n", "entry_points:\n", "  main:\n", "    parameters:\n", "      alpha: {type: float, default: 0.5}\n", "      l1_ratio: {type: float, default: 0.1}\n", "    command: \"python train.py {alpha} {l1_ratio}\"\n", "  test:\n", "    parameters:\n", "      cutoff: {type: float, default: 0}\n", "    command: \"python test.py {cutoff}\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Conda environment\n", "
\n", " - https://docs.conda.io\n", " - Syntax of the conda.yaml files that define an environment: https://docs.conda.io/projects/conda/en/4.6.1/user-guide/tasks/manage-environments.html#create-env-file-manually\n", " - YAML syntax: [accessible introduction](https://learnxinyminutes.com/docs/yaml/), [official specification](https://yaml.org/spec/1.2/spec.html)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting IUM_08/examples/sklearn_elasticnet_wine/conda.yaml\n" ] } ], "source": [ "%%writefile IUM_08/examples/sklearn_elasticnet_wine/conda.yaml\n", "name: tutorial\n", "channels:\n", "  - defaults\n", "dependencies:\n", "  - python=3.6 # These dependencies will be installed with conda install\n", "  - pip\n", "  - pip: # And these with pip install\n", "    - scikit-learn==0.23.2\n", "    - mlflow>=1.0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Docker environment\n", "- instead of a Conda environment we can also give the name of the Docker image in which the experiment should be run.\n", "- the image is looked up locally first and then on Docker Hub or in another Docker registry\n", "- the image path uses the same syntax as Docker commands, e.g. 
docker pull [link](https://docs.docker.com/engine/reference/commandline/pull/#pull-from-a-different-registry)\n", "- It is also possible to list directories to mount inside the container and environment variables to set in the container:\n", "```yaml\n", "docker_env:\n", "  image: mlflow-docker-example-environment\n", "  volumes: [\"/local/path:/container/mount/path\"]\n", "  environment: [[\"NEW_ENV_VAR\", \"new_var_value\"], \"VAR_TO_COPY_FROM_HOST_ENVIRONMENT\"]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Parameters\n", " - Declaring parameters in the MLproject file allows them to be validated and given default values\n", " - Available types:\n", " - String\n", " - Float - any number (MLflow validates that the supplied value is a number)\n", " - Path - accepts relative paths (converted to absolute ones) to local files, or paths to remote files (e.g. s3://), which are then downloaded locally\n", " - URI - similar to path, but for distributed file systems\n", "\n", "- [Syntax](https://mlflow.org/docs/latest/projects.html#specifying-parameters)\n", " \n", "```yaml\n", "parameter_name: {type: data_type, default: value} # Short syntax\n", "\n", "parameter_name: # Long syntax\n", "  type: data_type\n", "  default: value\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Running a project\n", " - A project can be run with the `mlflow run` command ([documentation](https://mlflow.org/docs/latest/cli.html#mlflow-run))\n", " - This prepares the environment and runs the experiment inside it\n", " - by default, the command defined in the `main` \"entry point\" is run. 
To run a different \"entry point\", we can use the `-e` option, e.g.:\n", " ```bash\n", " mlflow run sklearn_elasticnet_wine -e test\n", " ```\n", " - Parameters can be passed to the command with the `-P` flag" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2021/05/16 17:59:10 INFO mlflow.projects.utils: === Created directory /tmp/tmprq4mdosv for downloading remote URIs passed to arguments of type 'path' ===\n", "2021/05/16 17:59:10 INFO mlflow.projects.backend.local: === Running command 'source /home/tomek/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-5987e03d4dbaa5faa1a697bb113be9b9bdc39b29 1>&2 && python train.py 0.42 0.1' in run with ID '1860d321ea1545ff8866e4ba199d1712' === \n", "Elasticnet model (alpha=0.420000, l1_ratio=0.100000):\n", " RMSE: 0.7420620899060748\n", " MAE: 0.5722846717246247\n", " R2: 0.21978513651550236\n", "2021/05/16 17:59:19 INFO mlflow.projects: === Run (ID '1860d321ea1545ff8866e4ba199d1712') succeeded ===\n" ] } ], "source": [ "!cd IUM_08/examples/; mlflow run sklearn_elasticnet_wine -P alpha=0.42" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Tasks [10 pts]\n", "1. Add parameter and metric logging with MLflow to your project (the `mlflow.log_param` and `mlflow.log_metric` calls)\n", "2. 
Add an MLproject file that defines the training and testing commands, their invocation parameters and the environment (Conda or Docker)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## MLflow Models\n", "\n", "MLflow Models is a convention for saving models that makes them easier to load and use later" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Model types (\"flavors\") supported by MLflow:\n", "\n", " - Python Function (python_function)\n", " - PyTorch (pytorch)\n", " - TensorFlow (tensorflow)\n", " - Keras (keras)\n", " - Scikit-learn (sklearn)\n", " - spaCy (spacy)\n", " - ONNX (onnx)\n", " - R Function (crate)\n", " - H2O (h2o)\n", " - MLeap (mleap)\n", " - Spark MLlib (spark)\n", " - MXNet Gluon (gluon)\n", " - XGBoost (xgboost)\n", " - LightGBM (lightgbm)\n", " - CatBoost (catboost)\n", " - Fastai (fastai)\n", " - Statsmodels (statsmodels)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Saving a model\n", "An ML model can be saved in MLflow with one of two functions from the package that corresponds to the library in use:\n", " - `save_model()` - saves the model to disk\n", " - `log_model()` - saves the model together with other information (metrics, parameters). 
Depending on the [\"tracking_uri\"](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri) setting, this can be a local folder under `mlruns/` or a path on a remote MLflow server\n", "\n", "```python\n", " mlflow.sklearn.save_model(lr, \"my_model\")\n", "```\n", "\n", "```python\n", " mlflow.keras.save_model(lr, \"my_model\")\n", "```\n", "\n", "Calling this function creates a \"my_model\" directory containing:\n", " - an *MLmodel* file describing the ways in which the model can be loaded (\"flavors\") and the paths to the files associated with the model, such as:\n", " - *conda.yaml* - a description of the environment needed to load the model\n", " - *model.pkl* - a file with the serialized model\n", "\n", "Only the *MLmodel* file is specific to MLflow - the rest depends on the particular \"flavor\"\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "conda.yaml MLmodel model.pkl\r\n" ] } ], "source": [ "ls IUM_08/examples/my_model" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 12\r\n", "-rw-rw-r-- 1 tomek tomek 153 maj 17 10:38 conda.yaml\r\n", "-rw-rw-r-- 1 tomek tomek 958 maj 17 10:38 MLmodel\r\n", "-rw-rw-r-- 1 tomek tomek 641 maj 17 10:38 model.pkl\r\n" ] } ], "source": [ "! 
ls -l IUM_08/examples/mlruns/0/b395b55b47fc43de876b67f5a4a5dae9/artifacts/model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# %load IUM_08/examples/mlruns/0/b395b55b47fc43de876b67f5a4a5dae9/artifacts/model/MLmodel\n", "artifact_path: model\n", "flavors:\n", "  python_function:\n", "    env: conda.yaml\n", "    loader_module: mlflow.sklearn\n", "    model_path: model.pkl\n", "    python_version: 3.9.1\n", "  sklearn:\n", "    pickled_model: model.pkl\n", "    serialization_format: cloudpickle\n", "    sklearn_version: 0.24.2\n", "run_id: b395b55b47fc43de876b67f5a4a5dae9\n", "signature:\n", "  inputs: '[{\"name\": \"fixed acidity\", \"type\": \"double\"}, {\"name\": \"volatile acidity\",\n", "    \"type\": \"double\"}, {\"name\": \"citric acid\", \"type\": \"double\"}, {\"name\": \"residual\n", "    sugar\", \"type\": \"double\"}, {\"name\": \"chlorides\", \"type\": \"double\"}, {\"name\": \"free\n", "    sulfur dioxide\", \"type\": \"double\"}, {\"name\": \"total sulfur dioxide\", \"type\": \"double\"},\n", "    {\"name\": \"density\", \"type\": \"double\"}, {\"name\": \"pH\", \"type\": \"double\"}, {\"name\":\n", "    \"sulphates\", \"type\": \"double\"}, {\"name\": \"alcohol\", \"type\": \"double\"}]'\n", "  outputs: '[{\"type\": \"tensor\", \"tensor-spec\": {\"dtype\": \"float64\", \"shape\": [-1]}}]'\n", "utc_time_created: '2021-05-17 08:38:41.749670'\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# %load IUM_08/examples/my_model/conda.yaml\n", "channels:\n", "- defaults\n", "- conda-forge\n", "dependencies:\n", "- python=3.9.1\n", "- pip\n", "- pip:\n", "  - mlflow\n", "  - scikit-learn==0.24.2\n", "  - cloudpickle==1.6.0\n", "name: mlflow-env" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Additional fields in MLmodel\n", "\n", "\n", "- *utc_time_created* - 
timestamp of when the model was created\n", "- *run_id* - ID of the run that created the model, if the model was saved with MLflow Tracking.\n", "- *signature* - description of the input and output data in JSON format\n", "- *input_example* - an example input accepted by the model. It can be supplied through the `input_example` parameter of the [log_model](https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.log_model) function\n", "\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([5.57688397, 5.50664777, 5.52550482, 5.50431125, 5.57688397])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import mlflow\n", "import pandas as pd\n", "model = mlflow.sklearn.load_model(\"IUM_08/examples/mlruns/0/b395b55b47fc43de876b67f5a4a5dae9/artifacts/model\")\n", "csv_url = \"http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv\"\n", "data = pd.read_csv(csv_url, sep=\";\")\n", "model.predict(data.drop([\"quality\"], axis=1).head())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Serving models" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: mlflow models [OPTIONS] COMMAND [ARGS]...\r\n", "\r\n", " Deploy MLflow models locally.\r\n", "\r\n", " To deploy a model associated with a run on a tracking server, set the\r\n", " MLFLOW_TRACKING_URI environment variable to the URL of the desired server.\r\n", "\r\n", "Options:\r\n", " --help Show this message and exit.\r\n", "\r\n", "Commands:\r\n", " build-docker **EXPERIMENTAL**: Builds a Docker image whose default...\r\n", " predict Generate predictions in json format using a saved MLflow...\r\n", " prepare-env **EXPERIMENTAL**: 
Performs any preparation necessary to...\r\n", " serve Serve a model saved with MLflow by launching a webserver on...\r\n" ] } ], "source": [ "!cd IUM_08/examples/; mlflow models --help" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: mlflow models serve [OPTIONS]\r\n", "\r\n", " Serve a model saved with MLflow by launching a webserver on the specified\r\n", " host and port. The command supports models with the ``python_function`` or\r\n", " ``crate`` (R Function) flavor. For information about the input data\r\n", " formats accepted by the webserver, see the following documentation:\r\n", " https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools.\r\n", "\r\n", " You can make requests to ``POST /invocations`` in pandas split- or record-\r\n", " oriented formats.\r\n", "\r\n", " Example:\r\n", "\r\n", " .. code-block:: bash\r\n", "\r\n", " $ mlflow models serve -m runs:/my-run-id/model-path &\r\n", "\r\n", " $ curl -H 'Content-Type:\r\n", " application/json' -d '{ \"columns\": [\"a\", \"b\", \"c\"],\r\n", " \"data\": [[1, 2, 3], [4, 5, 6]] }'\r\n", "\r\n", "Options:\r\n", " -m, --model-uri URI URI to the model. A local path, a 'runs:/' URI, or a\r\n", " remote storage URI (e.g., an 's3://' URI). 
For more\r\n", " information about supported remote URIs for model\r\n", " artifacts, see\r\n", " https://mlflow.org/docs/latest/tracking.html#artifact-\r\n", " stores [required]\r\n", "\r\n", " -p, --port INTEGER The port to listen on (default: 5000).\r\n", " -h, --host HOST The network address to listen on (default:\r\n", " Use to bind to all addresses if you want to\r\n", " access the tracking server from other machines.\r\n", "\r\n", " -w, --workers TEXT Number of gunicorn worker processes to handle requests\r\n", " (default: 4).\r\n", "\r\n", " --no-conda If specified, will assume that MLmodel/MLproject is\r\n", " running within a Conda environment with the necessary\r\n", " dependencies for the current project instead of\r\n", " attempting to create a new conda environment.\r\n", "\r\n", " --install-mlflow If specified and there is a conda environment to be\r\n", " activated mlflow will be installed into the environment\r\n", " after it has been activated. The version of installed\r\n", " mlflow will be the same asthe one used to invoke this\r\n", " command.\r\n", "\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!cd IUM_08/examples/; mlflow models serve --help" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"columns\":[\"fixed acidity\",\"volatile acidity\",\"citric acid\",\"residual sugar\",\"chlorides\",\"free sulfur dioxide\",\"total sulfur dioxide\",\"density\",\"pH\",\"sulphates\",\"alcohol\"],\"index\":[0],\"data\":[[7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4]]}\n" ] } ], "source": [ "import pandas as pd\n", "csv_url = \"http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv\"\n", "data = pd.read_csv(csv_url, sep=\";\").drop([\"quality\"], axis=1).head(1).to_json(orient='split')\n", "print(data)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { 
"slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5.576883967129615]" ] } ], "source": [ "!curl -H 'Content-Type: application/json' -d '{\\\n", " \"columns\":[\\\n", " \"fixed acidity\",\"volatile acidity\",\"citric acid\",\"residual sugar\",\"chlorides\",\"free sulfur dioxide\",\"total sulfur dioxide\",\"density\",\"pH\",\"sulphates\",\"alcohol\"],\\\n", " \"index\":[0],\\\n", " \"data\":[[7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4]]}'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "```\n", "$ cd IUM_08/examples/\n", "$ mlflow models serve -m my_model\n", "2021/05/17 08:52:07 INFO mlflow.models.cli: Selected backend for flavor 'python_function'\n", "2021/05/17 08:52:07 INFO mlflow.pyfunc.backend: === Running command 'source /home/tomek/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-503f0c7520a32f054a9d168bd099584a9439de9d 1>&2 && gunicorn --timeout=60 -b -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'\n", "[2021-05-17 08:52:07 +0200] [291217] [INFO] Starting gunicorn 20.1.0\n", "[2021-05-17 08:52:07 +0200] [291217] [INFO] Listening at: (291217)\n", "[2021-05-17 08:52:07 +0200] [291217] [INFO] Using worker: sync\n", "[2021-05-17 08:52:07 +0200] [291221] [INFO] Booting worker with pid: 291221\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## MLflow Registry\n", " - makes it possible to [save](https://mlflow.org/docs/latest/model-registry.html#adding-an-mlflow-model-to-the-model-registry) models to and [load](https://mlflow.org/docs/latest/model-registry.html#fetching-an-mlflow-model-from-the-model-registry) them from a central registry\n", " - Models can also be served directly from the registry:\n", "\n", "```bash\n", "#!/usr/bin/env sh\n", "\n", "# Set environment variable for the tracking URL where the Model Registry resides\n", "export 
MLFLOW_TRACKING_URI=http://localhost:5000\n", "\n", "# Serve the production model from the model registry\n", "mlflow models serve -m \"models:/sk-learn-random-forest-reg-model/Production\"\n", "```\n", "\n", "- For this to work, an [MLflow server](https://mlflow.org/docs/latest/tracking.html#tracking-server) must be running\n", "- The registry supports managing model versions and labelling them with stages, e.g. \"Staging\", \"Production\"" ] } ], "metadata": { "author": "Tomasz Ziętkiewicz", "celltoolbar": "Slideshow", "email": "tomasz.zietkiewicz@amu.edu.pl", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "lang": "pl", "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "slideshow": { "slide_type": "slide" }, "subtitle": "8.MLFlow[laboratoria]", "title": "Inżynieria uczenia maszynowego", "year": "2021" }, "nbformat": 4, "nbformat_minor": 4 }