{ "cells": [ { "cell_type": "markdown", "id": "80377b3b", "metadata": {}, "source": [ "![Logo 1](img/aitech-logotyp-1.jpg)\n", "
\n", "

Widzenie komputerowe

\n", "

09. Metody głębokiego uczenia (1) [laboratoria]

\n", "

Andrzej Wójtowicz (2021)

\n", "
\n", "\n", "![Logo 2](img/aitech-logotyp-2.jpg)" ] }, { "cell_type": "markdown", "id": "07159136", "metadata": {}, "source": [ "W poniższym materiale zobaczymy w jaki sposób korzystać z metod głębokiego uczenia sieci neuronowych w pakiecie OpenCV.\n", "\n", "Na początku załadujmy niezbędne biblioteki:" ] }, { "cell_type": "code", "execution_count": null, "id": "b2e906f0", "metadata": {}, "outputs": [], "source": [ "import cv2 as cv\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "f4348bc5", "metadata": {}, "source": [ "OpenCV wspiera [wiele](https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV) bibliotek i modeli sieci neuronowych. Modele trenuje się poza OpenCV - bibliotekę wykorzystuje się tylko do predykcji, aczkolwiek sama w sobie ma całkiem sporo możliwych optymalizacji w porównaniu do źródłowych bibliotek neuronowych, więc predykcja może być tutaj faktycznie szybsza.\n", "\n", "Pliki z modelami i danymi pomocniczymi będziemy pobierali z sieci i będziemy je zapisywali w katalogu `dnn`:" ] }, { "cell_type": "code", "execution_count": null, "id": "42b85f55", "metadata": {}, "outputs": [], "source": [ "!mkdir -p dnn" ] }, { "cell_type": "markdown", "id": "ac09b098", "metadata": {}, "source": [ "# Klasyfikacja obrazów\n", "\n", "Spróbujemy wykorzystać sieć do klasyfikacji obrazów wyuczonej na zbiorze [ImageNet](https://www.image-net.org/). Pobierzmy plik zawierający opis 1000 możliwych klas:" ] }, { "cell_type": "code", "execution_count": null, "id": "85b1b68c", "metadata": {}, "outputs": [], "source": [ "!wget -q --show-progress -O dnn/classification_classes_ILSVRC2012.txt https://raw.githubusercontent.com/opencv/opencv/master/samples/data/dnn/classification_classes_ILSVRC2012.txt " ] }, { "cell_type": "markdown", "id": "fd0c577b", "metadata": {}, "source": [ "Spójrzmy na pierwsze pięć klas w pliku:" ] }, { "cell_type": "code", "execution_count": null, "id": "fb0d0546", "metadata": {}, "outputs": [], "source": [ "with open('dnn/classification_classes_ILSVRC2012.txt', 'r') as f_fd:\n", " classes = f_fd.read().splitlines()\n", " \n", "print(len(classes), classes[:5])" ] }, { "cell_type": "markdown", "id": "5b0ee6ff", "metadata": {}, "source": [ "Do klasyfikacji użyjemy sieci [DenseNet](https://arxiv.org/abs/1608.06993). Pobierzemy jedną z mniejszych [reimplementacji](https://github.com/shicai/DenseNet-Caffe), która jest hostowana m.in. na Google Drive (musimy doinstalować jeden pakiet):" ] }, { "cell_type": "code", "execution_count": null, "id": "fb2bf2a1", "metadata": {}, "outputs": [], "source": [ "!pip3 install --user --disable-pip-version-check gdown" ] }, { "cell_type": "code", "execution_count": null, "id": "27996509", "metadata": {}, "outputs": [], "source": [ "import gdown\n", "\n", "url = 'https://drive.google.com/uc?id=0B7ubpZO7HnlCcHlfNmJkU2VPelE'\n", "output = 'dnn/DenseNet_121.caffemodel'\n", "gdown.download(url, output, quiet=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "648ec9c9", "metadata": {}, "outputs": [], "source": [ "!wget -q --show-progress -O dnn/DenseNet_121.prototxt https://raw.githubusercontent.com/shicai/DenseNet-Caffe/master/DenseNet_121.prototxt" ] }, { "cell_type": "markdown", "id": "f7294c54", "metadata": {}, "source": [ "Konkretne biblioteki neuronowe posiadają dedykowane funkcje do ładowania modeli, np. [`readNetFromCaffe()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga29d0ea5e52b1d1a6c2681e3f7d68473a) lub [`readNetFromTorch()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga65a1da76cb7d6852bdf7abbd96f19084), jednak można też użyć ogólnej [`readNet()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga3b34fe7a29494a6a4295c169a7d32422):" ] }, { "cell_type": "code", "execution_count": null, "id": "6fd2d6b3", "metadata": {}, "outputs": [], "source": [ "model = cv.dnn.readNet(model='dnn/DenseNet_121.prototxt', config='dnn/DenseNet_121.caffemodel', framework='Caffe')" ] }, { "cell_type": "markdown", "id": "fe22fd6f", "metadata": {}, "source": [ "Spróbujemy sklasyfikować poniższy obraz:" ] }, { "cell_type": "code", "execution_count": null, "id": "6ace4606", "metadata": {}, "outputs": [], "source": [ "image = cv.imread('img/flamingo.jpg')\n", "plt.figure(figsize=[5,5])\n", "plt.imshow(image[:,:,::-1]);" ] }, { "cell_type": "markdown", "id": "e51db3ac", "metadata": {}, "source": [ "Aby móc przepuścić obraz przez sieć musimy zmienić jego formę reprezentacji poprzez funkcję [`blobFromImage()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga29f34df9376379a603acd8df581ac8d7). Aby uzyskać sensowne dane musimy ustawić parametry dotyczące preprocessingu (informacje o tym są zawarte na [stronie modelu](https://github.com/shicai/DenseNet-Caffe)):" ] }, { "cell_type": "code", "execution_count": null, "id": "d4e945ae", "metadata": {}, "outputs": [], "source": [ "image_blob = cv.dnn.blobFromImage(image=image, scalefactor=0.017, size=(224, 224), mean=(104, 117, 123), \n", " swapRB=False, crop=False)" ] }, { "cell_type": "markdown", "id": "625aebdd", "metadata": {}, "source": [ "Ustawiamy dane wejściowe w naszej sieci i pobieramy obliczone wartości:" ] }, { "cell_type": "code", "execution_count": null, "id": "753333a1", "metadata": {}, "outputs": [], "source": [ "model.setInput(image_blob)\n", "outputs = model.forward()[0]" ] }, { "cell_type": "markdown", "id": "34316ddb", "metadata": {}, "source": [ "Wyliczamy która klasa jest najbardziej prawdopodobna:" ] }, { "cell_type": "code", "execution_count": null, "id": "13423a6d", "metadata": {}, "outputs": [], "source": [ "outputs = outputs.reshape(1000, 1)\n", "\n", "label_id = np.argmax(outputs)\n", "\n", "probs = np.exp(outputs) / np.sum(np.exp(outputs))" ] }, { "cell_type": "markdown", "id": "874c1b1d", "metadata": {}, "source": [ "Wynik:" ] }, { "cell_type": "code", "execution_count": null, "id": "ec75a3c5", "metadata": {}, "outputs": [], "source": [ "plt.imshow(image[:,:,::-1])\n", "plt.title(classes[label_id])\n", "print(\"{:.2f} %\".format(np.max(probs) * 100.0))" ] }, { "cell_type": "markdown", "id": "3808c42c", "metadata": {}, "source": [ "# Wykrywanie twarzy\n", "\n", "Do wykrywania twarzy użyjemy sieci bazującej na [SSD](https://github.com/weiliu89/caffe/tree/ssd):" ] }, { "cell_type": "code", "execution_count": null, "id": "3c0df387", "metadata": {}, "outputs": [], "source": [ "!wget -q --show-progress -O dnn/res10_300x300_ssd_iter_140000_fp16.caffemodel https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x300_ssd_iter_140000_fp16.caffemodel\n", "!wget -q --show-progress -O dnn/res10_300x300_ssd_iter_140000_fp16.prototxt https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt" ] }, { "cell_type": "markdown", "id": "c6142f6e", "metadata": {}, "source": [ "Ładujemy model:" ] }, { "cell_type": "code", "execution_count": null, "id": "60d41efb", "metadata": {}, "outputs": [], "source": [ "model = cv.dnn.readNet(model='dnn/res10_300x300_ssd_iter_140000_fp16.prototxt', config='dnn/res10_300x300_ssd_iter_140000_fp16.caffemodel', framework='Caffe')" ] }, { "cell_type": "markdown", "id": "ad612cc6", "metadata": {}, "source": [ "Będziemy chcieli wykryć twarze na poniższym obrazie:" ] }, { "cell_type": "code", "execution_count": null, "id": "b404d8c4", "metadata": {}, "outputs": [], "source": [ "image = cv.imread('img/people.jpg')\n", "plt.figure(figsize=[7,7])\n", "plt.imshow(image[:,:,::-1]);" ] }, { "cell_type": "markdown", "id": "a77f8e64", "metadata": {}, "source": [ "Znajdujemy twarze i oznaczamy je na zdjęciu (za próg przyjęliśmy 0.5; zob. informacje o [preprocessingu](https://github.com/opencv/opencv/tree/master/samples/dnn#face-detection)):" ] }, { "cell_type": "code", "execution_count": null, "id": "1d16f230", "metadata": {}, "outputs": [], "source": [ "height, width, _ = image.shape\n", "\n", "image_blob = cv.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300), mean=[104, 177, 123], \n", " swapRB=False, crop=False)\n", "\n", "model.setInput(image_blob)\n", "\n", "detections = model.forward()\n", "\n", "image_out = image.copy()\n", "\n", "for i in range(detections.shape[2]):\n", " confidence = detections[0, 0, i, 2]\n", " if confidence > 0.5:\n", "\n", " box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])\n", " (x1, y1, x2, y2) = box.astype('int')\n", "\n", " cv.rectangle(image_out, (x1, y1), (x2, y2), (0, 255, 0), 6)\n", " label = '{:.3f}'.format(confidence)\n", " label_size, base_line = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 3.0, 1)\n", " cv.rectangle(image_out, (x1, y1 - label_size[1]), (x1 + label_size[0], y1 + base_line), \n", " (255, 255, 255), cv.FILLED)\n", " cv.putText(image_out, label, (x1, y1), cv.FONT_HERSHEY_SIMPLEX, 3.0, (0, 0, 0))\n", " \n", "plt.figure(figsize=[12,12])\n", "plt.imshow(image_out[:,:,::-1]);" ] }, { "cell_type": "markdown", "id": "590841cd", "metadata": {}, "source": [ "## Punkty charakterystyczne twarzy\n", "\n", "W OpenCV jest możliwość wykrywania punktów charakterystycznych twarzy (ang. *facial landmarks*). Użyjemy zaimplementowanego [modelu](http://www.jiansun.org/papers/CVPR14_FaceAlignment.pdf) podczas Google Summer of Code przy użyciu [`createFacemarkLBF()`](https://docs.opencv.org/4.5.3/d4/d48/namespacecv_1_1face.html#a0bec73a729ed878430c2feb9ce65bc2a):" ] }, { "cell_type": "code", "execution_count": null, "id": "8534a399", "metadata": {}, "outputs": [], "source": [ "!wget -q --show-progress -O dnn/lbfmodel.yaml https://raw.githubusercontent.com/kurnianggoro/GSOC2017/master/data/lbfmodel.yaml" ] }, { "cell_type": "code", "execution_count": null, "id": "c2971f10", "metadata": {}, "outputs": [], "source": [ "landmark_detector = cv.face.createFacemarkLBF()\n", "landmark_detector.loadModel('dnn/lbfmodel.yaml')" ] }, { "cell_type": "markdown", "id": "761dbc15", "metadata": {}, "source": [ "Ograniczamy nasze poszukiwania do twarzy:" ] }, { "cell_type": "code", "execution_count": null, "id": "39215601", "metadata": {}, "outputs": [], "source": [ "faces = []\n", "\n", "for detection in detections[0][0]:\n", " if detection[2] >= 0.5:\n", " left = detection[3] * width\n", " top = detection[4] * height\n", " right = detection[5] * width\n", " bottom = detection[6] * height\n", "\n", " face_w = right - left\n", " face_h = bottom - top\n", "\n", " face_roi = (left, top, face_w, face_h)\n", " faces.append(face_roi)\n", "\n", "faces = np.array(faces).astype(int)\n", "\n", "_, landmarks_list = landmark_detector.fit(image, faces)" ] }, { "cell_type": "markdown", "id": "56aa90c9", "metadata": {}, "source": [ "Model generuje 68 punktów charakterycznych, które możemy zwizualizować:" ] }, { "cell_type": "code", "execution_count": null, "id": "6d3ab726", "metadata": {}, "outputs": [], "source": [ "image_display = image.copy()\n", "landmarks = landmarks_list[0][0].astype(int)\n", "\n", "for idx, landmark in enumerate(landmarks):\n", " cv.circle(image_display, landmark, 2, (0,255,255), -1)\n", " cv.putText(image_display, str(idx), landmark, cv.FONT_HERSHEY_SIMPLEX, 0.35, (0, 255, 0), 1, \n", " cv.LINE_AA)\n", "\n", "plt.figure(figsize=(10,10))\n", "plt.imshow(image_display[700:1050,500:910,::-1]);" ] }, { "cell_type": "markdown", "id": "7cee8969", "metadata": {}, "source": [ "Jeśli nie potrzebujemy numeracji, to możemy użyć prostszego podejścia, tj. funkcji [`drawFacemarks()`](https://docs.opencv.org/4.5.3/db/d7c/group__face.html#ga318d9669d5ed4dfc6ab9fae2715310f5):" ] }, { "cell_type": "code", "execution_count": null, "id": "1039e253", "metadata": {}, "outputs": [], "source": [ "image_display = image.copy()\n", "for landmarks_set in landmarks_list:\n", " cv.face.drawFacemarks(image_display, landmarks_set, (0, 255, 0))\n", "\n", "plt.figure(figsize=(10,10))\n", "plt.imshow(image_display[500:1050,500:1610,::-1]);" ] }, { "cell_type": "markdown", "id": "db16a1bf", "metadata": {}, "source": [ "# Zadanie 1\n", "\n", "W katalogu `vid` znajdują się filmy `blinking-*.mp4`. Napisz program do wykrywania mrugnięć. Opcjonalnie możesz użyć *eye aspect ratio* z [tego artykułu](http://vision.fe.uni-lj.si/cvww2016/proceedings/papers/05.pdf) lub zaproponować własne rozwiązanie." ] } ], "metadata": { "author": "Andrzej Wójtowicz", "email": "andre@amu.edu.pl", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "lang": "pl", "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "subtitle": "09. Wykrywanie i rozpoznawanie tekstu [laboratoria]", "title": "Widzenie komputerowe", "year": "2021" }, "nbformat": 4, "nbformat_minor": 5 }