Report - Individual Project Klaudia Przybylska

General information

In our project, the agent, a garbage truck, collects trash from dumpsters on the grid and then brings it to the garbage dump. However, to make sure that the trash was not sorted incorrectly or mixed on the way because the road was bumpy, the waste is checked again before the truck is emptied and is sorted accordingly. The program uses a Random Forest Classifier to recognize five types of rubbish:

  • cardboard
  • glass
  • metal
  • paper
  • plastic

Before running the program, it is obligatory to unpack "Garbage classifier.rar" and "ClassificationGarbage.rar".

Extracting information from images

In order to use Random Forest Classifier to classify pictures, I used three global feature descriptors:

  • Hu Moments - capture information about shape, since they are computed from the intensity and position of pixels. They are invariant to translation, scale, and rotation of the image (unlike raw moments, and unlike central moments, which are invariant only to translation).
def hu_moments(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    moments = cv2.moments(gray)
    huMoments = cv2.HuMoments(moments).flatten()
    return huMoments
  • Color histogram - representation of the distribution of colors in an image.
def histogram(image, mask=None):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist  = cv2.calcHist([image], [0, 1, 2], mask, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    histogram = hist.flatten()
    return histogram
  • Haralick Texture - quantifies an image based on texture (the consistency of patterns and colors in an image).
def haralick(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    haralick = mahotas.features.haralick(gray).mean(axis=0)
    return haralick
All three descriptors are then concatenated into a single feature vector, which is used for training the classifier and, in the same way, for testing it.
allFeatures = np.hstack([histo, hara, huMoments])
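
As a rough check of what the stacked vector looks like, the sketch below concatenates placeholder arrays with the sizes these three descriptors produce: 8 × 8 × 8 = 512 histogram bins, 13 Haralick features (the default mahotas output averaged over its four directions), and 7 Hu moments. The zero-filled arrays are stand-ins for the real descriptor outputs, not project data.

```python
import numpy as np

# Placeholder descriptor outputs (zeros stand in for real values).
histo = np.zeros(8 * 8 * 8)   # flattened 3D HSV histogram, 512 bins
hara = np.zeros(13)           # Haralick features averaged over 4 directions
huMoments = np.zeros(7)       # seven Hu moments

# Concatenate the 1D descriptors into one feature vector per image.
allFeatures = np.hstack([histo, hara, huMoments])
print(allFeatures.shape)
```

Every image therefore contributes one row of the same length to the feature matrix, regardless of its original resolution.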

Creating test and training sets

Data is divided between two sets: the training set contains 80% of all data and the test set the remaining 20%. Images are randomly shuffled before the split.

allFileNames = os.listdir(sourceDir)
np.random.shuffle(allFileNames)
trainingFileNames, testFileNames = np.split(np.array(allFileNames), [int(len(allFileNames) * (1 - testRatio))])
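
The split can be tried in isolation on a list of made-up file names. This is a minimal sketch: the file names, the seed, and the helper name `splitFileNames` are illustrative assumptions, and only `testRatio` and the `np.split` call mirror the snippet above.

```python
import numpy as np

def splitFileNames(allFileNames, testRatio=0.2, seed=0):
    """Shuffle file names and split them into training and test parts."""
    names = np.array(allFileNames)
    rng = np.random.default_rng(seed)  # seeded only to make the sketch repeatable
    rng.shuffle(names)                 # in-place shuffle
    splitIndex = int(len(names) * (1 - testRatio))
    trainingFileNames, testFileNames = np.split(names, [splitIndex])
    return trainingFileNames, testFileNames

# Hypothetical file names, not taken from the dataset.
fileNames = [f"glass{i}.jpg" for i in range(10)]
train, test = splitFileNames(fileNames)
print(len(train), len(test))
```

Because `np.split` cuts at a single index, every file ends up in exactly one of the two sets.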

Implementation

Functions in garbageDumpSorting.py:

  • createSets - divides images between the test and training sets. This function should be run only once, unless the folders with the training and test sets are removed,
trainingFileNames, testFileNames = np.split(np.array(allFileNames), [int(len(allFileNames) * (1 - testRatio))])
  • huMoments, haralick, histogram - calculate global feature descriptors,
  • processTrainData, processTestData - both work in the same way: they iterate over the files in the train or test directory, collect the features into a matrix, and save the result to an h5 file. It is recommended to run them only once, as they take some time to finish,
allFeatures = np.hstack([histo, hara, huMoments])
  • trainAndTest - creates classifier, trains it and scores it,
clf  = RandomForestClassifier(n_estimators=100, max_depth=15, random_state=9)
  • classifyImage - predicts what kind of garbage is visible on a single image,
prediction = clf.predict(features)[0]
  • sortDump - checks what kinds of trash are inside the garbage truck and their quantity, empties the garbage truck and sorts its contents on the garbage dump.
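
To show how the pieces in trainAndTest and classifyImage might fit together, here is a minimal sketch on synthetic feature vectors. The random data, the way `LabelEncoder` and `MinMaxScaler` are applied, and all variable names other than `clf`, `features`, and `prediction` are assumptions; only the classifier parameters and the `predict` call are taken from the lines above. The real functions work on features extracted from the images rather than on generated numbers.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
classes = ["cardboard", "glass", "metal", "paper", "plastic"]

# Synthetic stand-in for extracted features: 40 vectors of length 532 per class,
# shifted per class so that the classes are easy to separate.
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(40, 532)) for i in range(5)])
labels = np.repeat(classes, 40)

encoder = LabelEncoder()
y = encoder.fit_transform(labels)            # string labels -> integers 0..4

# Shuffle, then split 80/20.
order = rng.permutation(len(X))
cut = int(len(X) * 0.8)
trainIdx, testIdx = order[:cut], order[cut:]

scaler = MinMaxScaler()
XTrain = scaler.fit_transform(X[trainIdx])   # fit the scaling on training data only
XTest = scaler.transform(X[testIdx])

clf = RandomForestClassifier(n_estimators=100, max_depth=15, random_state=9)
clf.fit(XTrain, y[trainIdx])
accuracy = clf.score(XTest, y[testIdx])      # mean accuracy on the test set

# Classify a single feature vector; predict expects a 2D array.
features = XTest[:1]
prediction = clf.predict(features)[0]
predictedName = encoder.inverse_transform([prediction])[0]
print(accuracy, predictedName)
```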

Changes in common part

I created the class garbageDump, in which I store information about the quantity of trash present on the garbage dump. I had to add a small function to the Garbagetruck class in order to remove waste from the garbage truck. In main, I initialize the garbage dump and display its contents at the end.
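
A bookkeeping class of this kind could look roughly like the sketch below. The class name `GarbageDumpSketch`, its methods, the attribute `contents`, and the example truck load are hypothetical and do not reproduce the project's garbageDump or Garbagetruck classes.

```python
from collections import Counter

class GarbageDumpSketch:
    """Keeps a count of each kind of trash stored on the dump."""

    KINDS = ("cardboard", "glass", "metal", "paper", "plastic")

    def __init__(self):
        # Start with zero items of every kind so the report lists all five.
        self.contents = Counter({kind: 0 for kind in self.KINDS})

    def addTrash(self, predictedLabels):
        """Add items, given the label predicted for each one."""
        self.contents.update(predictedLabels)

    def report(self):
        return dict(self.contents)

# Hypothetical truck load: the label the classifier assigned to each item.
truckLoad = ["glass", "paper", "glass", "plastic"]

dump = GarbageDumpSketch()
dump.addTrash(truckLoad)
truckLoad.clear()   # the truck is emptied after unloading
print(dump.report())
```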

Libraries

The following libraries are required to run the program:

import os
import numpy as np
import shutil
import cv2
import mahotas
import h5py
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
import random