widzenie-komputerowe-MP/wko-09.ipynb

27 KiB
Raw Permalink Blame History

Logo 1

Widzenie komputerowe

09. Metody głębokiego uczenia (1) [laboratoria]

Andrzej Wójtowicz (2021)

Logo 2

W poniższym materiale zobaczymy w jaki sposób korzystać z metod głębokiego uczenia sieci neuronowych w pakiecie OpenCV.

Na początku załadujmy niezbędne biblioteki:

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

OpenCV wspiera wiele bibliotek i modeli sieci neuronowych. Modele trenuje się poza OpenCV - bibliotekę wykorzystuje się tylko do predykcji, aczkolwiek sama w sobie ma całkiem sporo możliwych optymalizacji w porównaniu do źródłowych bibliotek neuronowych, więc predykcja może być tutaj faktycznie szybsza.

Pliki z modelami i danymi pomocniczymi będziemy pobierali z sieci i będziemy je zapisywali w katalogu dnn:

!mkdir -p dnn

Klasyfikacja obrazów

Spróbujemy wykorzystać sieć do klasyfikacji obrazów wyuczonej na zbiorze ImageNet. Pobierzmy plik zawierający opis 1000 możliwych klas:

!wget -q --show-progress -O dnn/classification_classes_ILSVRC2012.txt https://raw.githubusercontent.com/opencv/opencv/master/samples/data/dnn/classification_classes_ILSVRC2012.txt 

Spójrzmy na pierwsze pięć klas w pliku:

with open('dnn/classification_classes_ILSVRC2012.txt', 'r') as f_fd:
    classes = f_fd.read().splitlines()
    
print(len(classes), classes[:5])

Do klasyfikacji użyjemy sieci DenseNet. Pobierzemy jedną z mniejszych reimplementacji, która jest hostowana m.in. na Google Drive (musimy doinstalować jeden pakiet):

!pip3 install --user --disable-pip-version-check gdown
import gdown

url = 'https://drive.google.com/uc?id=0B7ubpZO7HnlCcHlfNmJkU2VPelE'
output = 'dnn/DenseNet_121.caffemodel'
gdown.download(url, output, quiet=False)
!wget -q --show-progress -O dnn/DenseNet_121.prototxt https://raw.githubusercontent.com/shicai/DenseNet-Caffe/master/DenseNet_121.prototxt

Konkretne biblioteki neuronowe posiadają dedykowane funkcje do ładowania modeli, np. readNetFromCaffe() lub readNetFromTorch(), jednak można też użyć ogólnej readNet():

model = cv.dnn.readNet(model='dnn/DenseNet_121.prototxt', config='dnn/DenseNet_121.caffemodel', framework='Caffe')

Spróbujemy sklasyfikować poniższy obraz:

image = cv.imread('img/flamingo.jpg')
plt.figure(figsize=[5,5])
plt.imshow(image[:,:,::-1]);

Aby móc przepuścić obraz przez sieć musimy zmienić jego formę reprezentacji poprzez funkcję blobFromImage(). Aby uzyskać sensowne dane musimy ustawić parametry dotyczące preprocessingu (informacje o tym są zawarte na stronie modelu):

image_blob = cv.dnn.blobFromImage(image=image, scalefactor=0.017, size=(224, 224), mean=(104, 117, 123), 
                                  swapRB=False, crop=False)

Ustawiamy dane wejściowe w naszej sieci i pobieramy obliczone wartości:

model.setInput(image_blob)
outputs = model.forward()[0]

Wyliczamy która klasa jest najbardziej prawdopodobna:

outputs = outputs.reshape(1000, 1)

label_id = np.argmax(outputs)

probs = np.exp(outputs) / np.sum(np.exp(outputs))

Wynik:

plt.imshow(image[:,:,::-1])
plt.title(classes[label_id])
print("{:.2f} %".format(np.max(probs) * 100.0))

Wykrywanie twarzy

Do wykrywania twarzy użyjemy sieci bazującej na SSD:

!wget -q --show-progress -O dnn/res10_300x300_ssd_iter_140000_fp16.caffemodel https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x300_ssd_iter_140000_fp16.caffemodel
!wget -q --show-progress -O dnn/res10_300x300_ssd_iter_140000_fp16.prototxt https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt

Ładujemy model:

model = cv.dnn.readNet(model='dnn/res10_300x300_ssd_iter_140000_fp16.prototxt', config='dnn/res10_300x300_ssd_iter_140000_fp16.caffemodel', framework='Caffe')

Będziemy chcieli wykryć twarze na poniższym obrazie:

image = cv.imread('img/people.jpg')
plt.figure(figsize=[7,7])
plt.imshow(image[:,:,::-1]);

Znajdujemy twarze i oznaczamy je na zdjęciu (za próg przyjęliśmy 0.5; zob. informacje o preprocessingu):

height, width, _ = image.shape

image_blob = cv.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300), mean=[104, 177, 123], 
                                  swapRB=False, crop=False)

model.setInput(image_blob)

detections = model.forward()

image_out = image.copy()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:

        box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
        (x1, y1, x2, y2) = box.astype('int')

        cv.rectangle(image_out, (x1, y1), (x2, y2), (0, 255, 0), 6)
        label = '{:.3f}'.format(confidence)
        label_size, base_line = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 3.0, 1)
        cv.rectangle(image_out, (x1, y1 - label_size[1]), (x1 + label_size[0], y1 + base_line), 
                      (255, 255, 255), cv.FILLED)
        cv.putText(image_out, label, (x1, y1), cv.FONT_HERSHEY_SIMPLEX, 3.0, (0, 0, 0))
        
plt.figure(figsize=[12,12])
plt.imshow(image_out[:,:,::-1]);

Punkty charakterystyczne twarzy

W OpenCV jest możliwość wykrywania punktów charakterystycznych twarzy (ang. _facial landmarks). Użyjemy zaimplementowanego modelu podczas Google Summer of Code przy użyciu createFacemarkLBF():

!wget -q --show-progress -O dnn/lbfmodel.yaml https://raw.githubusercontent.com/kurnianggoro/GSOC2017/master/data/lbfmodel.yaml
landmark_detector = cv.face.createFacemarkLBF()
landmark_detector.loadModel('dnn/lbfmodel.yaml')

Ograniczamy nasze poszukiwania do twarzy:

faces = []

for detection in detections[0][0]:
    if detection[2] >= 0.5:
        left   = detection[3] * width
        top    = detection[4] * height
        right  = detection[5] * width
        bottom = detection[6] * height

        face_w = right - left
        face_h = bottom - top

        face_roi = (left, top, face_w, face_h)
        faces.append(face_roi)

faces = np.array(faces).astype(int)

_, landmarks_list = landmark_detector.fit(image, faces)

Model generuje 68 punktów charakterycznych, które możemy zwizualizować:

image_display = image.copy()
landmarks = landmarks_list[0][0].astype(int)

for idx, landmark in enumerate(landmarks):
    cv.circle(image_display, landmark, 2, (0,255,255), -1)
    cv.putText(image_display, str(idx), landmark, cv.FONT_HERSHEY_SIMPLEX, 0.35, (0, 255, 0), 1, 
                cv.LINE_AA)

plt.figure(figsize=(10,10))
plt.imshow(image_display[700:1050,500:910,::-1]);

Jeśli nie potrzebujemy numeracji, to możemy użyć prostszego podejścia, tj. funkcji drawFacemarks():

image_display = image.copy()
for landmarks_set in landmarks_list:
    cv.face.drawFacemarks(image_display, landmarks_set, (0, 255, 0))

plt.figure(figsize=(10,10))
plt.imshow(image_display[500:1050,500:1610,::-1]);

Zadanie 1

W katalogu vid znajdują się filmy blinking-*.mp4. Napisz program do wykrywania mrugnięć. Opcjonalnie możesz użyć _eye aspect ratio z tego artykułu lub zaproponować własne rozwiązanie.

!pip3 install dlib
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dlib in /home/mikolaj/.local/lib/python3.8/site-packages (19.24.0)

[notice] A new release of pip available: 22.2.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
!pip3 install imutils
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: imutils in /home/mikolaj/.local/lib/python3.8/site-packages (0.5.4)

[notice] A new release of pip available: 22.2.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. View Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details.
import argparse
import time

import cv2
import dlib
import imutils
import numpy as np
from imutils import face_utils
from imutils.video import FileVideoStream, VideoStream
from scipy.spatial import distance as dist

def eye_aspect_ratio(eye):
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    C = dist.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear




EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 3


COUNTER = 0
TOTAL = 0


print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')


(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]


print("[INFO] starting video stream thread...")
vs = FileVideoStream('').start()
fileStream = True
vs = VideoStream(src=0).start()
fileStream = False
time.sleep(1.0)

# uncomment to our video
# video = cv2.VideoCapture("vid/blinking-woman1.mp4")


while True:

    if fileStream and not vs.more():
        break

    frame = vs.read()

    # uncomment to our video
    # _, frame = video.read()

    # comment to our video
    frame = imutils.resize(frame, width=800)

    # comment to our video
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    rects = detector(gray, 0)

    # uncomment to our video
    # rects = detector(frame, 0)

    for rect in rects:

        # shape = predictor(gray, rect)
        shape = predictor(frame, rect)
        shape = face_utils.shape_to_np(shape)

        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)

        ear = (leftEAR + rightEAR) / 2.0

        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)

        if ear < EYE_AR_THRESH:
            COUNTER += 1

        else:

            if COUNTER >= EYE_AR_CONSEC_FRAMES:
                TOTAL += 1

            COUNTER = 0

        cv2.putText(
            frame,
            "Blinks: {}".format(TOTAL),
            (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.7,
            (0, 0, 255),
            2,
        )
        cv2.putText(
            frame,
            "EAR: {:.2f}".format(ear),
            (300, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.7,
            (0, 0, 255),
            2,
        )


    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    if key == ord("q"):
        break


cv2.destroyAllWindows()
# vs.stop()
[INFO] loading facial landmark predictor...
[INFO] starting video stream thread...
[ERROR:0@986.863] global /io/opencv/modules/videoio/src/cap.cpp (164) open VIDEOIO(CV_IMAGES): raised OpenCV exception:

OpenCV(4.5.5) /io/opencv/modules/videoio/src/cap_images.cpp:293: error: (-215:Assertion failed) !_filename.empty() in function 'open'


---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/tmp/ipykernel_1138/2981424523.py in <module>
     64     gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
     65 
---> 66     rects = detector(gray, 0)
     67 
     68     # uncomment to our video

KeyboardInterrupt: 
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Defaulting to user installation because normal site-packages is not writeable
Collecting dlib
  Using cached dlib-19.24.0.tar.gz (3.2 MB)
  Preparing metadata (setup.py) ... [?25ldone
[?25hBuilding wheels for collected packages: dlib
  Building wheel for dlib (setup.py) ... [?25ldone
[?25h  Created wheel for dlib: filename=dlib-19.24.0-cp38-cp38-linux_x86_64.whl size=4100165 sha256=76f1ab4b327e49bd2a857100e2ac9a94dba4113bd2673dafab4a2356ef010a92
  Stored in directory: /home/mikolaj/.cache/pip/wheels/4c/d8/2d/a83b10e7bf10cd7d8bef36bf4dcd15b0c9ebf98f990bc984dd
Successfully built dlib
Installing collected packages: dlib
Successfully installed dlib-19.24.0

[notice] A new release of pip available: 22.2.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Defaulting to user installation because normal site-packages is not writeable
Collecting imutils
  Downloading imutils-0.5.4.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25ldone
[?25hBuilding wheels for collected packages: imutils
  Building wheel for imutils (setup.py) ... [?25ldone
[?25h  Created wheel for imutils: filename=imutils-0.5.4-py3-none-any.whl size=25836 sha256=fbd551cf6e0c14ad0239a80ba759a98832e345856a631e7d8ed76f2b21ea4279
  Stored in directory: /home/mikolaj/.cache/pip/wheels/59/1b/52/0dea905f8278d5514dc4d0be5e251967f8681670cadd3dca89
Successfully built imutils
Installing collected packages: imutils
Successfully installed imutils-0.5.4

[notice] A new release of pip available: 22.2.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip