csv s444529

2020-05-18 11:58:08 +02:00 · 2020-05-18 11:58:08 +02:00 · adc584e542
commit adc584e542
parent 32bb6289d1
9 changed files with 1404 additions and 0 deletions
--- a/Classification.md
+++ b/Classification.md
@ -0,0 +1,91 @@
+# CNN Plates Classification
+Author: Weronika Skowrońska, s444523
+
+As my individual project, I decided to classify images of plates using a Convolutional Neural Network. The goal of the project is to classify a photo of the client's plate as empty (0), dirty (1) or full (2), and to assign the resulting value to the given instance of the "Table" class.
+
+# Architecture
+
+The architecture of my CNN is very simple. I decided to use two convolutional layers, each with 32 feature detectors of size 3 by 3, followed by the ReLU activation function and max pooling of size 2 by 2.
+```python
+classifier = Sequential()
+
+classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
+classifier.add(MaxPooling2D(pool_size = (2,2)))
+
+classifier.add(Convolution2D(32, (3, 3), activation = 'relu'))
+classifier.add(MaxPooling2D(pool_size = (2, 2)))
+
+classifier.add(Flatten())
+```
+After flattening, I added a fully connected layer of size 128 (again with the ReLU activation function). The output layer consists of 3 neurons with the softmax activation function, as I am using the network for multiclass classification (3 possible outcomes).
+```python
+classifier.add(Dense(units = 128, activation = "relu"))
+classifier.add(Dense(units = 3, activation = "softmax"))
+```
+The optimizer of my network is Adam, and I chose categorical cross-entropy as the loss function.
+```python
+classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
+```
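To make the roles of softmax and categorical cross-entropy concrete, here is a minimal sketch in plain Python. It is not the Keras implementation. The raw scores are hypothetical values chosen only for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Loss for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

# Hypothetical raw scores for the classes empty (0), dirty (1), full (2)
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)  # true class: empty
predicted_class = probs.index(max(probs))
```

The loss shrinks towards 0 as the probability assigned to the true class approaches 1, which is what the optimizer pushes for during training.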
+# Library
+
+I used Keras to implement the network. It let me add some specific features to my network, such as early stopping and a few methods of data augmentation.
+```python
+train_datagen = ImageDataGenerator(
+        rescale=1./255,
+        shear_range=0.2,
+        zoom_range=0.2,
+        horizontal_flip=True,
+        width_shift_range=0.2,
+        height_shift_range=0.1,
+        fill_mode='nearest')
+```
+Data augmentation was very important to me, as I did not have many photos to train the network with (approximately 1200 altogether).
+
+# Project implementation
+
+After training the network, I first saved the model that gave the best results to a file named "best_model.h5" (two Keras callbacks, EarlyStopping and ModelCheckpoint, were very useful here).
+```python
+# callbacks:
+es = EarlyStopping(monitor='val_loss', mode='min', baseline=1, patience = 10)
+mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose = 1, period = 10)
+```
+It turned out, though, that the file was too big to upload to git, so I modified the code a little and saved only the weights instead of the whole model:
+```python
+mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose = 1, period = 10, save_weights_only = True)
+```
+To be honest, this was not a very good idea either, as the new file was also too big to upload. I solved the problem in another way: I put the h5 file on my Google Drive and added a download link to the project files.
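A rough parameter count suggests why the file stays large even when only the weights are saved. The sketch below is a back-of-the-envelope estimate, not output from Keras. It assumes that both convolutions use 3 by 3 kernels with stride 1 and no padding, and that each weight is stored as a 4-byte float.

```python
def conv_out(size, kernel=3, pool=2):
    """Side length after an unpadded convolution followed by max pooling."""
    return (size - kernel + 1) // pool

def count_parameters(input_size=256, channels=3, filters=32,
                     kernel=3, dense_units=128, classes=3):
    side = conv_out(conv_out(input_size, kernel), kernel)   # 256 -> 127 -> 62
    conv1 = (kernel * kernel * channels + 1) * filters      # weights + biases
    conv2 = (kernel * kernel * filters + 1) * filters
    flat = side * side * filters                            # size after Flatten
    dense1 = (flat + 1) * dense_units
    dense2 = (dense_units + 1) * classes
    return conv1 + conv2 + dense1 + dense2

total = count_parameters()
size_mib = total * 4 / 2**20   # assumption: 4 bytes (float32) per weight
print(f"{total:,} parameters, about {size_mib:.0f} MiB")
```

Under these assumptions almost all parameters sit in the first fully connected layer, so a smaller input image or an extra pooling stage would shrink the file far more than dropping anything else from it.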
+
+To use the saved weights, I created the CNN model inside our project:
+```python
+#initializing:
+classifier = Sequential()
+#Convolution:
+classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
+#Pooling:
+classifier.add(MaxPooling2D(pool_size = (2,2)))
+
+# Adding a second convolutional layer
+classifier.add(Convolution2D(32, (3, 3), activation = 'relu'))
+classifier.add(MaxPooling2D(pool_size = (2, 2)))
+
+#Flattening:
+classifier.add(Flatten())
+
+#Fully connected layers:
+classifier.add(Dense(units = 128, activation = "relu"))
+classifier.add(Dense(units = 3, activation = "softmax"))
+
+# loading weights:
+classifier.load_weights('s444523/best_model_weights2.h5')
+#Making CNN:
+classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
+```
+After coming to each table, the Agent (the waiter) evaluates a randomly selected photo of a plate using the provided model, and assigns the number of the predicted class to the "state" attribute of the given table. This information lets the Agent perform further actions based on the predicted outcome.
+
+I noticed that my program has difficulty distinguishing a full plate from a dirty one. Interestingly, this was also a problem for me and my friends when we worked as real waiters in a restaurant. Therefore, if the waiter classifies the plate as dirty, he politely asks whether the client has finished eating, and acts according to the answer:
+```python
+if result[1] == 1:
+    result[1] = 0
+    x = int(input("Excuse me, have you finished eating? 1 = Yes, 2 = No \n"))
+    result[x] = 1
+```
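The confirmation step can also be sketched as a small function that returns the class index directly. This is an illustrative variant, not the project's code. The name `resolve_plate_state` and the `ask_client` callback are hypothetical, and `result` is assumed to be a list of three class scores.

```python
def resolve_plate_state(result, ask_client):
    """Return 0 (empty), 1 (dirty) or 2 (full), asking the client when in doubt."""
    state = max(range(len(result)), key=lambda i: result[i])  # index of the best score
    if state == 1:
        # Predicted "dirty": confirm, because dirty and full plates look alike
        state = 1 if ask_client() else 2
    return state

# Hypothetical scores; the lambdas stand in for the client's answer
still_eating = resolve_plate_state([0.1, 0.7, 0.2], lambda: False)
finished = resolve_plate_state([0.1, 0.7, 0.2], lambda: True)
empty = resolve_plate_state([0.9, 0.05, 0.05], lambda: True)
```

Passing the question as a callback keeps the function testable without keyboard input.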
--- a/Reinforcement_learning.md
+++ b/Reinforcement_learning.md
@ -0,0 +1,235 @@
+# Reinforcement learning for route planning in restaurant
+##### Tao-Sen Chang    s442720
+
+###### In the previous task we did the route planning with a dedicated algorithm. In this machine learning sub-project I show a different approach for an agent that has to traverse multiple destinations on the grid and, of course, find the shortest route. The approach I want to use is called reinforcement learning.
+
+## What is reinforcement learning?
+###### Reinforcement learning is concerned with how software agents ought to take actions in an environment in order to maximize a notion of cumulative reward. The agent makes a sequence of decisions and learns to perform the best action at every step. For example, in my project there is a waiter on the grid who has to reach many tables to serve meals, so he must learn the shortest path to each table.
+
+## How to do that?
+###### Implementing reinforcement learning is not easy. However, there is an existing example: a rat in a maze. Instead of writing the whole algorithm from scratch, I use the existing code and adjust some parameters to train the agent on our 16x16 grid.
+https://www.samyzaf.com/ML/rl/qmaze.html
+###### I train the agent (the waiter) with rewards and penalties. The waiter on the grid gets a small penalty for every legal move, because we want him to reach the target table by the shortest possible path. However, the shortest path to the target table is sometimes long and winding, and the agent may have to endure many errors before he gets to the table.
+###### For example, the rewards used in training are:
+```
+if rat_row == win_target_x and rat_col == win_target_y: # if reach the final target
+    return 1.0
+if mode == 'blocked':   # move to the block in the grid (blocks are tables or kitchen in our grid)
+    return -1.0
+if (rat_row, rat_col) in self.visited: # when get to the visited grid point
+    return -0.5    
+if mode == 'invalid': # when move to the boundary
+    return -0.75    
+if mode == 'valid': # to make the route shorter, we give a penalty by moving to valid grid point
+    return -0.04
+if (rat_row, rat_col) in self.curr_win_targets: # if reach any table
+    return 1.0
+```
+```
+self.min_reward = -0.5 * self.maze.size
+```
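The losing threshold `min_reward` bounds how long one game can last. The sketch below is an illustration under simplifying assumptions, not part of the training code. It counts how many moves with one constant penalty fit before the total reward drops below `-0.5 * maze.size` on the 16x16 grid. The helper name `moves_until_loss` is hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def moves_until_loss(penalty, grid_cells=16 * 16):
    """Count moves with a constant penalty until total_reward < min_reward."""
    min_reward = Fraction(-1, 2) * grid_cells   # -0.5 * maze.size
    total, moves = Fraction(0), 0
    while total >= min_reward:
        total += Fraction(penalty)
        moves += 1
    return moves

fresh_cells = moves_until_loss("-0.04")   # only legal moves onto unvisited cells
revisits = moves_until_loss("-0.5")       # only moves onto already visited cells
```

The gap between the two counts shows how strongly the revisit penalty shortens a game in which the agent wanders in circles.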
+## Q-learning
+###### We want to get the maximum reward from each action in a state. We define a policy π that gives the action for a state: action = π(s).
+###### Q(s,a) = the maximum total reward we can get by choosing action a in state s. Hence the policy is π(s) = argmax_ai Q(s,ai). Now the question is how to get Q(s,a).
+###### There is a solution called Bellman's Equation: Q(s,a) = R(s,a) + max_ai Q(s′,ai)
+###### R(s,a) is the reward for taking action a in the current state s, and s′ is the next state, so max_ai Q(s′,ai) is the maximum Q-value over the 4 actions available from the next state. In the code, the Experience class memorizes each "episode" (a single move), but the memory is limited: once it holds more than max_memory episodes, the oldest one, which has the least influence on the current episode, is deleted.
+###### There is a coefficient called the discount factor, usually denoted by γ, which the Bellman equation requires for stochastic environments. The new Bellman's Equation can be written as Q(s,a) = R(s,a) + γ * max_ai Q(s′,ai). The discount factor diminishes the effect of rewards that are far from the current state.
+```
+class Experience(object):
+    def __init__(self, model, max_memory=100, discount=0.95):
+        self.model = model
+        self.max_memory = max_memory
+        self.discount = discount
+        self.memory = list()
+        self.num_actions = model.output_shape[-1]
+
+    def remember(self, episode):
+        # episode = [envstate, action, reward, envstate_next, game_over]
+        # memory[i] = episode
+        # envstate == flattened 1d maze cells info, including rat cell (see method: observe)
+        self.memory.append(episode)
+        if len(self.memory) > self.max_memory:
+            del self.memory[0]
+
+    def predict(self, envstate):
+        return self.model.predict(envstate)[0]
+
+    def get_data(self, data_size=10):
+        env_size = self.memory[0][0].shape[1]   # envstate 1d size (1st element of episode)
+        mem_size = len(self.memory)
+        data_size = min(mem_size, data_size)
+        inputs = np.zeros((data_size, env_size))
+        targets = np.zeros((data_size, self.num_actions))
+        for i, j in enumerate(np.random.choice(range(mem_size), data_size, replace=False)):
+            envstate, action, reward, envstate_next, game_over = self.memory[j]
+            inputs[i] = envstate
+            # There should be no target values for actions not taken.
+            targets[i] = self.predict(envstate)
+            # Q_sa = derived policy = max quality env/action = max_a' Q(s', a')
+            Q_sa = np.max(self.predict(envstate_next))
+            if game_over:
+                targets[i, action] = reward
+            else:
+                # reward + gamma * max_a' Q(s', a')
+                targets[i, action] = reward + self.discount * Q_sa
+        return inputs, targets
+```
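The target value that `get_data` writes for the action taken can be checked by hand on a single transition. The sketch below is a standalone illustration with hypothetical Q-values. It uses the discount of 0.95 from the snippet above and does not call the neural network.

```python
def q_target(reward, next_q_values, discount=0.95, game_over=False):
    """Target for the action taken: R(s,a) + gamma * max_a' Q(s', a')."""
    if game_over:
        return reward   # a terminal state has no future reward
    return reward + discount * max(next_q_values)

# Hypothetical Q-value predictions for the four actions in the next state
next_q = [0.2, 0.5, -0.1, 0.0]
step_target = q_target(-0.04, next_q)                  # ordinary legal move
final_target = q_target(1.0, next_q, game_over=True)   # reached the last table
```

For the ordinary move the target is -0.04 + 0.95 * 0.5, so a good next state can outweigh the small step penalty.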
+
+## Training
+###### The following is the algorithm for training the neural network model to solve the problem. One epoch means one loop of the training, and each epoch ends with the agent either winning or losing.
+###### Another coefficient, "epsilon", is the exploration factor: the probability that the agent tries a random action instead of following its previous experience (which is called exploitation). This way the agent can not only collect rewards based on previous experience, but also has a chance to explore unknown areas where it might get more rewards. Once the action is chosen and applied, the neural network is trained on samples from the memory (inputs: size equal to the maze size; targets: size equal to the number of actions, 4 in our case).
+```
+# Exploration factor
+epsilon = 0.1
+def qtrain(model, maze, **opt):
+    global epsilon
+    n_epoch = opt.get('n_epoch', 15000)
+    max_memory = opt.get('max_memory', 1000)
+    data_size = opt.get('data_size', 50)
+    weights_file = opt.get('weights_file', "")
+    name = opt.get('name', 'model')
+    start_time = datetime.datetime.now()
+
+    # If you want to continue training from a previous model,
+    # just supply the h5 file name to weights_file option
+    if weights_file:
+        print("loading weights from file: %s" % (weights_file,))
+        model.load_weights(weights_file)
+
+    # Construct environment/game from numpy array: maze (see above)
+    qmaze = Qmaze(maze)
+
+    # Initialize experience replay object
+    experience = Experience(model, max_memory=max_memory)
+
+    win_history = []   # history of win/lose game
+    n_free_cells = len(qmaze.free_cells)
+    hsize = qmaze.maze.size//2   # history window size
+    win_rate = 0.0
+    imctr = 1
+    pre_episodes = 2**31 - 1
+
+    for epoch in range(n_epoch):
+        loss = 0.0
+        #rat_cell = random.choice(qmaze.free_cells)
+        #rat_cell = (0, 0)
+        rat_cell = (12, 12)
+
+        qmaze.reset(rat_cell)
+        game_over = False
+
+        # get initial envstate (1d flattened canvas)
+        envstate = qmaze.observe()
+
+        n_episodes = 0
+        while not game_over:
+            valid_actions = qmaze.valid_actions()
+            if not valid_actions: break
+            prev_envstate = envstate
+            # Get next action
+            if np.random.rand() < epsilon:
+                action = random.choice(valid_actions)
+            else:
+                action = np.argmax(experience.predict(prev_envstate))
+
+            # Apply action, get reward and new envstate
+            envstate, reward, game_status = qmaze.act(action)
+            if game_status == 'win':
+                print("win")
+                win_history.append(1)
+                game_over = True
+                # save_pic(qmaze)
+                if n_episodes <= pre_episodes:
+                    # output_route(qmaze)
+                    print(qmaze.visited)
+                    with open('res.data', 'wb') as filehandle:
+                        pickle.dump(qmaze.visited, filehandle)
+                    pre_episodes = n_episodes
+                    
+            elif game_status == 'lose':
+                print("lose")
+                win_history.append(0)
+                game_over = True
+                # save_pic(qmaze)
+            else:
+                game_over = False
+
+            # Store episode (experience)
+            episode = [prev_envstate, action, reward, envstate, game_over]
+            experience.remember(episode)
+            n_episodes += 1
+
+            # Train neural network model
+            inputs, targets = experience.get_data(data_size=data_size)
+            h = model.fit(
+                inputs,
+                targets,
+                epochs=8,
+                batch_size=16,
+                verbose=0,
+            )
+            loss = model.evaluate(inputs, targets, verbose=0)
+            
+        
+        if len(win_history) > hsize:
+            win_rate = sum(win_history[-hsize:]) / hsize
+    
+        dt = datetime.datetime.now() - start_time
+        t = format_time(dt.total_seconds())
+        
+        template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
+        print(template.format(epoch, n_epoch-1, loss, n_episodes, sum(win_history), win_rate, t))
+```
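The action selection inside the training loop follows the epsilon-greedy rule. The sketch below isolates that rule in a hypothetical helper `choose_action`. It uses plain lists instead of model predictions, so it is an illustration rather than the training code itself.

```python
import random

def choose_action(q_values, valid_actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit the best Q-value."""
    if rng.random() < epsilon:
        return rng.choice(valid_actions)   # exploration: random valid action
    # Exploitation: like the loop above, the argmax runs over all actions
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_values = [0.1, 0.7, 0.3, -0.2]   # hypothetical values for left, up, right, down
greedy = choose_action(q_values, [0, 1, 2, 3], epsilon=0.0)
explored = choose_action(q_values, [0, 2], epsilon=1.0)
```

With epsilon set to 0 the agent always exploits, and with epsilon set to 1 it always explores. The training code lowers epsilon from 0.1 to 0.05 once the win rate exceeds 0.9.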
+
+## Testing
+###### We apply this algorithm to our 16x16 grid and train.
+```
+grid = [[1 for x in range(16)] for y in range(16)]
+table1 = Table(2, 2)
+table2 = Table (2,7)
+table3 = Table(2, 12)
+table4 = Table(7, 2)
+table5 = Table(7, 7)
+table6 = Table(7, 12)
+table7 = Table(12, 2)
+table8 = Table(12, 7)
+
+kitchen = Kitchen(13, 13)
+maze = np.array(grid)
+model = build_model(maze)
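+# Note: qtrain() reads the option 'n_epoch', so 'epochs' is ignored and the default of 15000 applies
+qtrain(model, maze, epochs=1000, max_memory=8*maze.size, data_size=32)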
+```
+###### I also create a list called win_targets that holds the target positions for the tables in the grid.
+```
+win_targets = [(4, 4),(4, 9),(4, 14),(9, 4),(9, 9),(9, 14),(14, 4),(14, 9)]
+```
+###### After a lot of training, I realized that it is not easy to obtain the shortest route in every run. Most of the training runs failed, especially when win_targets contains more targets. For example, part of the output when training with 8 targets looks like this:
+```
+...
+Epoch: 167/14999 | Loss: 0.0299 | Episodes: 407 | Win count: 63 | Win rate: 0.422 | time: 2.44 hours
+Epoch: 168/14999 | Loss: 0.0112 | Episodes: 650 | Win count: 63 | Win rate: 0.414 | time: 2.46 hours
+Epoch: 169/14999 | Loss: 0.0147 | Episodes: 392 | Win count: 64 | Win rate: 0.422 | time: 2.47 hours
+Epoch: 170/14999 | Loss: 0.0112 | Episodes: 668 | Win count: 65 | Win rate: 0.422 | time: 2.48 hours
+Epoch: 171/14999 | Loss: 0.0101 | Episodes: 487 | Win count: 66 | Win rate: 0.430 | time: 2.50 hours
+Epoch: 172/14999 | Loss: 0.0121 | Episodes: 362 | Win count: 67 | Win rate: 0.438 | time: 2.51 hours
+Epoch: 173/14999 | Loss: 0.0101 | Episodes: 484 | Win count: 68 | Win rate: 0.445 | time: 2.52 hours
+...
+```
+###### The only successful run contains 4 targets (win_targets = [(4, 4),(4, 9),(4, 14),(9, 4)]):
+```
+...
+Epoch: 223/14999 | Loss: 0.0228 | Episodes: 30 | Win count: 165 | Win rate: 0.906 | time: 64.02 minutes
+Epoch: 224/14999 | Loss: 0.0160 | Episodes: 52 | Win count: 166 | Win rate: 0.906 | time: 64.09 minutes
+Epoch: 225/14999 | Loss: 0.0702 | Episodes: 34 | Win count: 167 | Win rate: 0.914 | time: 64.14 minutes
+Epoch: 226/14999 | Loss: 0.0175 | Episodes: 40 | Win count: 168 | Win rate: 0.922 | time: 64.19 minutes
+Epoch: 227/14999 | Loss: 0.0271 | Episodes: 46 | Win count: 169 | Win rate: 0.930 | time: 64.25 minutes
+Epoch: 228/14999 | Loss: 0.0194 | Episodes: 40 | Win count: 170 | Win rate: 0.938 | time: 64.30 minutes
+...
+Epoch: 460/14999 | Loss: 0.0236 | Episodes: 60 | Win count: 401 | Win rate: 1.000 | time: 1.48 hours
+Reached 100% win rate at epoch: 460
+n_epoch: 460, max_mem: 2048, data: 32, time: 1.48 hours
+```
+###### In my opinion, there are 3 reasons for such bad results.
+###### 1. The parameters of the algorithm, including the rewards, exploration rate, and discount factor, are not optimal. Adjusting and validating them takes a lot of time, and the most intuitive choice is not always the best one. For example, the parameters work fine for 4 targets, but when the number of targets grows to 8, it is not enough to simply halve the original values.
+###### 2. Because of the exploration rate, the same training and testing data may give a different result every time, which makes our results harder to verify. The only way to check whether the parameters produce good results is to keep training until we have collected sufficient data.
+###### 3. The algorithm was originally designed for a rat in a maze with a single target. Applying it to multiple targets may be inadequate for some reason. Moreover, the default maze size is 7x7, so the 16x16 grid may be too large for this algorithm.
--- a/main_training.py
+++ b/main_training.py
@ -0,0 +1,489 @@
+from __future__ import print_function
+import os, sys, time, datetime, json, random
+import numpy as np
+from keras.models import Sequential
+from keras.layers.core import Dense, Activation
+from keras.optimizers import SGD , Adam, RMSprop
+from keras.layers.advanced_activations import PReLU
+import matplotlib.pyplot as plt
+import pickle
+
+visited_mark = 0.8  # Cells visited by the rat will be painted in gray 0.8
+rat_mark = 0.5      # The current rat cell will be painted in gray 0.5
+LEFT = 0
+UP = 1
+RIGHT = 2
+DOWN = 3
+
+# Actions dictionary
+actions_dict = {
+    LEFT: 'left',
+    UP: 'up',
+    RIGHT: 'right',
+    DOWN: 'down',
+}
+
+num_actions = len(actions_dict)
+
+# Exploration factor
+epsilon = 0.1
+file_name_num = 1
+win_targets = [(4, 4),(4, 9),(4, 14),(9, 4)]
+
+class Qmaze(object):
+    def __init__(self, maze, rat=(12,12)):
+        global win_targets
+        self._maze = np.array(maze)
+        nrows, ncols = self._maze.shape
+        #self.target = (nrows-1, ncols-1)   # target cell where the "cheese" is
+        self.target = win_targets[0]
+        self.free_cells = [(r,c) for r in range(nrows) for c in range(ncols) if self._maze[r,c] == 1.0]
+        self.free_cells.remove(win_targets[-1])
+        if self._maze[self.target] == 0.0:
+            raise Exception("Invalid maze: target cell cannot be blocked!")
+        if not rat in self.free_cells:
+            raise Exception("Invalid Rat Location: must sit on a free cell")
+        self.reset(rat)
+
+    def reset(self, rat):
+        global win_targets
+        self.rat = rat
+        self.maze = np.copy(self._maze)
+        nrows, ncols = self.maze.shape
+        row, col = rat
+        self.maze[row, col] = rat_mark
+        self.state = (row, col, 'start')
+        self.min_reward = -0.5 * self.maze.size
+        self.total_reward = 0
+        self.visited = list()
+        self.curr_win_targets = win_targets[:]
+
+    def update_state(self, action):
+        nrows, ncols = self.maze.shape
+        nrow, ncol, nmode = rat_row, rat_col, mode = self.state
+
+        if self.maze[rat_row, rat_col] > 0.0:
+            self.visited.append((rat_row, rat_col))  # mark visited cell
+
+        valid_actions = self.valid_actions()
+                
+        if not valid_actions:
+            nmode = 'blocked'
+        elif action in valid_actions:
+            nmode = 'valid'
+            if action == LEFT:
+                ncol -= 1
+            elif action == UP:
+                nrow -= 1
+            elif action == RIGHT:
+                ncol += 1
+            elif action == DOWN:
+                nrow += 1
+        else:                  # invalid action, no change in rat position
+            nmode = 'invalid'
+
+        # new state
+        self.state = (nrow, ncol, nmode)
+
+    def get_reward(self):
+        win_target_x, win_target_y = self.target
+        rat_row, rat_col, mode = self.state
+        nrows, ncols = self.maze.shape
+        if rat_row == win_target_x and rat_col == win_target_y:
+            return 1.0
+        if mode == 'blocked':  # move to the block in the grid
+            return -1.0
+        if (rat_row, rat_col) in self.visited:
+            return -0.5    # default -0.25 -> -0.5
+        if mode == 'invalid':
+            return -0.75    # default -0.75 move to the boundary
+        if mode == 'valid': # default -0.04 -> -0.1 
+            return -0.04
+        if (rat_row, rat_col) in self.curr_win_targets:
+            return 1.0
+
+    def act(self, action):
+        self.update_state(action)
+        reward = self.get_reward()
+        self.total_reward += reward
+        status = self.game_status()
+        envstate = self.observe()
+        return envstate, reward, status
+
+    def observe(self):
+        canvas = self.draw_env()
+        envstate = canvas.reshape((1, -1))
+        return envstate
+
+    def draw_env(self):
+        canvas = np.copy(self.maze)
+        nrows, ncols = self.maze.shape
+        # clear all visual marks
+        for r in range(nrows):
+            for c in range(ncols):
+                if canvas[r,c] > 0.0:
+                    canvas[r,c] = 1.0
+        # draw the rat
+        row, col, valid = self.state
+        canvas[row, col] = rat_mark
+        return canvas
+
+    def game_status(self):
+        if self.total_reward < self.min_reward:
+            return 'lose'
+        rat_row, rat_col, mode = self.state
+        nrows, ncols = self.maze.shape
+        
+        curPos = (rat_row, rat_col)
+                
+        if curPos in self.curr_win_targets:
+            self.curr_win_targets.remove(curPos)
+            if len(self.curr_win_targets) == 0:
+                return 'win'
+            else:
+                self.target = self.curr_win_targets[0]
+
+        return 'not_over'
+
+    def valid_actions(self, cell=None):
+        if cell is None:
+            row, col, mode = self.state
+        else:
+            row, col = cell
+        actions = [0, 1, 2, 3]
+        nrows, ncols = self.maze.shape
+        if row == 0:
+            actions.remove(1)
+        elif row == nrows-1:
+            actions.remove(3)
+
+        if col == 0:
+            actions.remove(0)
+        elif col == ncols-1:
+            actions.remove(2)
+
+        if row>0 and self.maze[row-1,col] == 0.0:
+            actions.remove(1)
+        if row<nrows-1 and self.maze[row+1,col] == 0.0:
+            actions.remove(3)
+
+        if col>0 and self.maze[row,col-1] == 0.0:
+            actions.remove(0)
+        if col<ncols-1 and self.maze[row,col+1] == 0.0:
+            actions.remove(2)
+
+        return actions
+    
+def show(qmaze):
+    win_target_row, win_target_col = qmaze.target  # current target cell
+    plt.grid('on')
+    nrows, ncols = qmaze.maze.shape
+    ax = plt.gca()
+    ax.set_xticks(np.arange(0.5, nrows, 1))
+    ax.set_yticks(np.arange(0.5, ncols, 1))
+    ax.set_xticklabels([])
+    ax.set_yticklabels([])
+    canvas = np.copy(qmaze.maze)
+    for row,col in qmaze.visited:
+        canvas[row,col] = 0.6
+    rat_row, rat_col, _ = qmaze.state
+    canvas[rat_row, rat_col] = 0.3   # rat cell
+    canvas[win_target_row, win_target_col] = 0.9 # cheese cell
+    img = plt.imshow(canvas, interpolation='none', cmap='gray')
+    return img
+
+
+def save_pic(qmaze):
+    global file_name_num
+    win_target_row, win_target_col = qmaze.target  # current target cell
+    plt.grid('on')
+    nrows, ncols = qmaze.maze.shape
+    ax = plt.gca()
+    ax.set_xticks(np.arange(0.5, nrows, 1))
+    ax.set_yticks(np.arange(0.5, ncols, 1))
+    ax.set_xticklabels([])
+    ax.set_yticklabels([])
+    canvas = np.copy(qmaze.maze)
+    for row,col in qmaze.visited:
+        canvas[row,col] = 0.6
+    rat_row, rat_col, _ = qmaze.state
+    canvas[rat_row, rat_col] = 0.3   # rat cell
+    canvas[win_target_row, win_target_col] = 0.9 # cheese cell
+    plt.imshow(canvas, interpolation='none', cmap='gray')
+    plt.savefig(str(file_name_num) + ".png")
+    file_name_num += 1
+
+def output_route(qmaze):
+    win_target_row, win_target_col = qmaze.target  # current target cell
+    print(qmaze._maze)
+
+def play_game(model, qmaze, rat_cell):
+    qmaze.reset(rat_cell)
+    envstate = qmaze.observe()
+    while True:
+        prev_envstate = envstate
+        # get next action
+        q = model.predict(prev_envstate)
+        action = np.argmax(q[0])
+
+        # apply action, get rewards and new state
+        envstate, reward, game_status = qmaze.act(action)
+        if game_status == 'win':
+            return True
+        elif game_status == 'lose':
+            return False
+
+
+def completion_check(model, qmaze):
+    for cell in qmaze.free_cells:
+        if not qmaze.valid_actions(cell):
+            return False
+        if not play_game(model, qmaze, cell):
+            return False
+    return True
+
+
+class Experience(object):
+    def __init__(self, model, max_memory=100, discount=0.9):
+        self.model = model
+        self.max_memory = max_memory
+        self.discount = discount
+        self.memory = list()
+        self.num_actions = model.output_shape[-1]
+
+    def remember(self, episode):
+        # episode = [envstate, action, reward, envstate_next, game_over]
+        # memory[i] = episode
+        # envstate == flattened 1d maze cells info, including rat cell (see method: observe)
+        self.memory.append(episode)
+        if len(self.memory) > self.max_memory:
+            del self.memory[0]
+
+    def predict(self, envstate):
+        return self.model.predict(envstate)[0]
+
+    def get_data(self, data_size=10):
+        env_size = self.memory[0][0].shape[1]   # envstate 1d size (1st element of episode)
+        mem_size = len(self.memory)
+        data_size = min(mem_size, data_size)
+        inputs = np.zeros((data_size, env_size))
+        targets = np.zeros((data_size, self.num_actions))
+        for i, j in enumerate(np.random.choice(range(mem_size), data_size, replace=False)):
+            envstate, action, reward, envstate_next, game_over = self.memory[j]
+            inputs[i] = envstate
+            # There should be no target values for actions not taken.
+            targets[i] = self.predict(envstate)
+            # Q_sa = derived policy = max quality env/action = max_a' Q(s', a')
+            Q_sa = np.max(self.predict(envstate_next))
+            if game_over:
+                targets[i, action] = reward
+            else:
+                # reward + gamma * max_a' Q(s', a')
+                targets[i, action] = reward + self.discount * Q_sa
+        return inputs, targets
+
+def qtrain(model, maze, **opt):
+    global epsilon
+    n_epoch = opt.get('n_epoch', 15000)
+    max_memory = opt.get('max_memory', 1000)
+    data_size = opt.get('data_size', 50)
+    weights_file = opt.get('weights_file', "")
+    name = opt.get('name', 'model')
+    start_time = datetime.datetime.now()
+
+    # If you want to continue training from a previous model,
+    # just supply the h5 file name to weights_file option
+    if weights_file:
+        print("loading weights from file: %s" % (weights_file,))
+        model.load_weights(weights_file)
+
+    # Construct environment/game from numpy array: maze (see above)
+    qmaze = Qmaze(maze)
+
+    # Initialize experience replay object
+    experience = Experience(model, max_memory=max_memory)
+
+    win_history = []   # history of win/lose game
+    n_free_cells = len(qmaze.free_cells)
+    hsize = qmaze.maze.size//2   # history window size
+    win_rate = 0.0
+    imctr = 1
+    pre_episodes = 2**31 - 1
+
+    for epoch in range(n_epoch):
+        loss = 0.0
+        #rat_cell = random.choice(qmaze.free_cells)
+        #rat_cell = (0, 0)
+        rat_cell = (12, 12)
+
+        qmaze.reset(rat_cell)
+        game_over = False
+
+        # get initial envstate (1d flattened canvas)
+        envstate = qmaze.observe()
+
+        n_episodes = 0
+        while not game_over:
+            valid_actions = qmaze.valid_actions()
+            if not valid_actions: break
+            prev_envstate = envstate
+            # Get next action
+            if np.random.rand() < epsilon:
+                action = random.choice(valid_actions)
+            else:
+                action = np.argmax(experience.predict(prev_envstate))
+
+            # Apply action, get reward and new envstate
+            envstate, reward, game_status = qmaze.act(action)
+            if game_status == 'win':
+                print("win")
+                win_history.append(1)
+                game_over = True
+                # save_pic(qmaze)
+                if n_episodes <= pre_episodes:
+                    # output_route(qmaze)
+                    print(qmaze.visited)
+                    with open('res.data', 'wb') as filehandle:
+                        pickle.dump(qmaze.visited, filehandle)
+                    pre_episodes = n_episodes
+                    
+            elif game_status == 'lose':
+                print("lose")
+                win_history.append(0)
+                game_over = True
+                # save_pic(qmaze)
+            else:
+                game_over = False
+
+            # Store episode (experience)
+            episode = [prev_envstate, action, reward, envstate, game_over]
+            experience.remember(episode)
+            n_episodes += 1
+
+            # Train neural network model
+            inputs, targets = experience.get_data(data_size=data_size)
+            h = model.fit(
+                inputs,
+                targets,
+                epochs=8,
+                batch_size=16,
+                verbose=0,
+            )
+            loss = model.evaluate(inputs, targets, verbose=0)
+            
+        
+        if len(win_history) > hsize:
+            win_rate = sum(win_history[-hsize:]) / hsize
+    
+        dt = datetime.datetime.now() - start_time
+        t = format_time(dt.total_seconds())
+        
+        template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
+        print(template.format(epoch, n_epoch-1, loss, n_episodes, sum(win_history), win_rate, t))
+        # we simply check if training has exhausted all free cells and if in all
+        # cases the agent won
+        if win_rate > 0.9 : epsilon = 0.05
+        train_max = 192
+        # print(sum(win_history[-192*1.5:]))
+        # print(192)
+        if sum(win_history[-192:]) >= 192:
+            print("Reached 100%% win rate at epoch: %d" % (epoch,))
+            break
+
+    # Save trained model weights and architecture, this will be used by the visualization code
+    h5file = name + ".h5"
+    json_file = name + ".json"
+    model.save_weights(h5file, overwrite=True)
+    with open(json_file, "w") as outfile:
+        json.dump(model.to_json(), outfile)
+    end_time = datetime.datetime.now()
+    dt = end_time - start_time
+    seconds = dt.total_seconds()
+    t = format_time(seconds)
+    print('files: %s, %s' % (h5file, json_file))
+    print("n_epoch: %d, max_mem: %d, data: %d, time: %s" % (epoch, max_memory, data_size, t))
+    return seconds
+
+# This is a small utility for printing readable time strings:
+def format_time(seconds):
+    if seconds < 400:
+        s = float(seconds)
+        return "%.1f seconds" % (s,)
+    elif seconds < 4000:
+        m = seconds / 60.0
+        return "%.2f minutes" % (m,)
+    else:
+        h = seconds / 3600.0
+        return "%.2f hours" % (h,)
+
+def build_model(maze, lr=0.001):
+    model = Sequential()
+    model.add(Dense(maze.size, input_shape=(maze.size,)))
+    model.add(PReLU())
+    model.add(Dense(maze.size))
+    model.add(PReLU())
+    model.add(Dense(num_actions))
+    model.compile(optimizer='adam', loss='mse')
+    return model
+
+
+            
+class Table:
+    def __init__(self, coordinate_i, coordinate_j):
+        self.coordinate_i = coordinate_i
+        self.coordinate_j = coordinate_j
+        change_value(coordinate_i, coordinate_j, 2, 0.)
+    def get_destination_coor(self):
+        return [self.coordinate_i, self.coordinate_j-1]
+        
+class Kitchen:
+    def __init__(self, coordinate_i, coordinate_j):
+        self.coordinate_i = coordinate_i
+        self.coordinate_j = coordinate_j
+        change_value(coordinate_i, coordinate_j, 3, 0.)
+
+if __name__ == "__main__":
+    
+    def change_value(i, j, width, n):
+        for r in range (i, i+width):
+            for c in range (j, j+width):
+                grid[r][c] = n
+
+    grid = [[1 for x in range(16)] for y in range(16)]
+    table1 = Table(2, 2)
+    table2 = Table(2, 7)
+    table3 = Table(2, 12)
+    table4 = Table(7, 2)
+    table5 = Table(7, 7)
+    table6 = Table(7, 12)
+    table7 = Table(12, 2)
+    table8 = Table(12, 7)
+
+
+    kitchen = Kitchen(13, 13)
+    maze = np.array(grid)
+
+    # print(maze)
+    # maze =  np.array([
+    #     [ 1.,  0.,  1.,  1.,  1.,  1.,  1., 1.],
+    #     [ 1.,  1.,  1.,  0.,  0.,  1.,  0., 1.],
+    #     [ 1.,  1.,  1.,  1.,  1.,  1.,  0., 1.],
+    #     [ 1.,  1.,  1.,  1.,  0.,  0.,  1., 1.],
+    #     [ 1.,  0.,  0.,  0.,  1.,  1.,  1., 1.],
+    #     [ 1.,  0.,  1.,  1.,  1.,  1.,  1., 1.],
+    #     [ 1.,  1.,  1.,  0.,  1.,  1.,  1., 1.]
+    # ])
+    # print(maze)
+    
+    
+    # qmaze = Qmaze(maze)
+    # show(qmaze)
+
+    model = build_model(maze)
+    qtrain(model, maze, epochs=1000, max_memory=8*maze.size, data_size=32)
+
+
+
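The `__main__` block above builds the 16x16 maze by filling a list of lists through `change_value`. A minimal sketch of the same layout using NumPy slicing follows. `build_maze` and its parameters are hypothetical names introduced only for illustration; the cell encoding (1.0 for free cells, 0.0 for tables and the kitchen) and the coordinates are taken from the `Table` and `Kitchen` calls in the script.

```python
import numpy as np

# Top-left corners of the eight 2x2 tables, as instantiated in the script.
TABLES = [(2, 2), (2, 7), (2, 12), (7, 2), (7, 7), (7, 12), (12, 2), (12, 7)]


def build_maze(size=16, tables=TABLES, kitchen=(13, 13)):
    """Return a size x size array: 1.0 = free cell, 0.0 = occupied cell.

    Hypothetical helper; it mirrors what change_value() does in the script.
    """
    maze = np.ones((size, size))
    for i, j in tables:
        maze[i:i + 2, j:j + 2] = 0.0      # each table occupies a 2x2 block
    ki, kj = kitchen
    maze[ki:ki + 3, kj:kj + 3] = 0.0      # the kitchen occupies a 3x3 block
    return maze


maze = build_maze()
free_cells = int(maze.sum())              # number of cells the agent may enter
```

Building the array directly avoids relying on a module-level `grid` that must exist before `Table` and `Kitchen` are instantiated.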
--- a/plates.rar
+++ b/plates.rar
--- a/res_targets_4-1.data
+++ b/res_targets_4-1.data
--- a/route_for_project.py
+++ b/route_for_project.py
@ -0,0 +1,151 @@
+import pygame
+import numpy as np
+import math
+import pickle
+
+# Colors:
+# Define some colors
+BLACK = (0, 0, 0)
+WHITE = (255, 255, 255)
+GREEN = (0, 255, 0)
+RED = (255, 0, 0)
+BLUE = (0, 0, 240)
+YELLOW = (255, 255, 0)
+#Width and Height of each square:
+WIDTH = 20
+HEIGHT = 20
+
+#Margin:
+MARGIN = 5
+grid = [[0 for x in range(16)] for y in range(16)]
+
+def change_value(i, j, width, n):
+    for r in range (i, i+width):
+        for c in range (j, j+width):
+            grid[r][c] = n
+            
+class Table:
+    def __init__(self, coordinate_i, coordinate_j):
+        self.coordinate_i = coordinate_i
+        self.coordinate_j = coordinate_j
+        change_value(coordinate_i, coordinate_j, 2, 1)
+    def get_destination_coor(self):
+        return [self.coordinate_i, self.coordinate_j-1]
+        
+class Kitchen:
+    def __init__(self, coordinate_i, coordinate_j):
+        self.coordinate_i = coordinate_i
+        self.coordinate_j = coordinate_j
+        change_value(coordinate_i, coordinate_j, 3, 2)
+        
+class Agent:
+    def __init__(self,orig_coordinate_i, orig_coordinate_j):
+        self.orig_coordinate_i = orig_coordinate_i
+        self.orig_coordinate_j = orig_coordinate_j
+        self.state = np.array([orig_coordinate_i,orig_coordinate_j])
+        change_value(orig_coordinate_i, orig_coordinate_j, 1, 3)
+        self.state_update(orig_coordinate_i, orig_coordinate_j)
+        
+    def state_update(self, c1, c2):
+        self.state[0] = c1
+        self.state[1] = c2
+        
+    def leave(self):
+        change_value(self.state[0], self.state[1], 1, 0)
+    
+    
+    def move_to(self, nextPos):
+        self.leave()
+        nextPos_x, nextPos_y = nextPos
+        self.state_update(nextPos_x, nextPos_y)
+        change_value(self.state[0], self.state[1], 1, 3)
+    
+
+def check_done():
+    for event in pygame.event.get():        # Checking for the event
+        if event.type == pygame.QUIT:       # If the program is closed:
+            return True                     # To exit the loop
+    return False                            # No quit event was received
+
+def draw_grid(visited):
+    for row in range(16):                   # Drawing the grid
+        for column in range(16):
+            color = WHITE
+            if grid[row][column] == 1:
+                color = GREEN
+            if grid[row][column] == 2:
+                color = RED
+            if grid[row][column] == 3:
+                color = BLUE
+            if (row, column) in visited or (row, column) in table_targets:
+                color = YELLOW
+            pygame.draw.rect(screen,
+                             color,
+                             [(MARGIN + WIDTH) * column + MARGIN,
+                              (MARGIN + HEIGHT) * row + MARGIN,
+                              WIDTH,
+                              HEIGHT]) 
+
+
+## default positions of the agent:
+x = 12
+y = 12
+agent = Agent(x, y)
+
+table1 = Table(2, 2)
+table2 = Table(2, 7)
+table3 = Table(2, 12)
+table4 = Table(7, 2)
+table5 = Table(7, 7)
+table6 = Table(7, 12)
+table7 = Table(12, 2)
+table8 = Table(12, 7)
+
+#class Kitchen:
+kitchen = Kitchen(13, 13)
+
+pygame.init()
+WINDOW_SIZE = [405, 405]
+screen = pygame.display.set_mode(WINDOW_SIZE)
+
+pygame.display.set_caption("Waiter_Grid3")
+
+done = False
+
+clock = pygame.time.Clock()
+ 
+with open('res_targets_4-1.data', 'rb') as filehandle:
+    # read the data as binary data stream
+    trained_route = pickle.load(filehandle)
+
+print(trained_route)
+destination = (9, 4)
+trained_route.append(destination)
+
+table_targets = [(4, 4),(4, 9),(4, 14),(9, 4)]
+
+# -------- Main Program Loop -----------
+while not done:
+    visited = set()
+    screen.fill(BLACK)                      # Background color
+    draw_grid(visited)
+    done = check_done()
+    new_route = trained_route[:]
+    
+    while len(new_route) != 0:
+        x = agent.state[0]
+        y = agent.state[1]
+        
+        agent.move_to(new_route[0])
+        new_route = new_route[1:]
+        
+        
+        pygame.time.delay(150)
+        screen.fill(BLACK)  
+        visited.add((x,y))
+        draw_grid(visited)                       # Drawing the grid
+        clock.tick(100)                          # Limit to 100 frames per second
+        pygame.display.flip()                    # Updating the screen
+    
+    
+pygame.quit()
--- a/s444523.rar
+++ b/s444523.rar
--- a/waiter_v3.py
+++ b/waiter_v3.py
@ -0,0 +1,369 @@
+import pygame
+import numpy as np
+import math
+
+     ########################
+     ###        WS        ###
+     ########################
+# For CNN:
+
+import keras
+from keras.preprocessing import image
+from keras.models import Sequential
+from keras.layers import Convolution2D
+from keras.layers import MaxPooling2D
+from keras.layers import Flatten
+from keras.layers import Dense
+ 
+
+#initializing:
+classifier = Sequential()
+#Convolution:
+classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
+#Pooling:
+classifier.add(MaxPooling2D(pool_size = (2,2)))
+
+# Adding a second convolutional layer
+classifier.add(Convolution2D(32, (3, 3), activation = 'relu'))
+classifier.add(MaxPooling2D(pool_size = (2, 2)))
+
+#Flattening:
+classifier.add(Flatten())
+
+#Fully connected layers:
+classifier.add(Dense(units = 128, activation = "relu"))
+classifier.add(Dense(units = 3, activation = "softmax"))
+
+# Loading weights:
+classifier.load_weights('s444523/best_model_weights2.h5')
+# Compiling the CNN:
+classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
+
+
+     ########################
+     ###        WS        ###
+     ########################
+# Colors:
+# Define some colors
+BLACK = (0, 0, 0)
+WHITE = (255, 255, 255)
+GREEN = (0, 255, 0)
+RED = (255, 0, 0)
+BLUE = (0, 0, 240)
+
+#Width and Height of each square:
+WIDTH = 20
+HEIGHT = 20
+
+#Margin:
+MARGIN = 5
+grid = [[0 for x in range(16)] for y in range(16)]
+
+def change_value(i, j, width, n):
+    for r in range (i, i+width):
+        for c in range (j, j+width):
+            grid[r][c] = n
+    
+# the class "Table"        
+class Table:
+    def __init__(self, coordinate_i, coordinate_j, state = 0):
+        self.coordinate_i = coordinate_i
+        self.coordinate_j = coordinate_j
+        self.state = state
+        change_value(coordinate_i, coordinate_j, 2, 1)
+    def get_destination_coor(self):
+        return [self.coordinate_i, self.coordinate_j-1]
+    
+     ########################
+     ###        WS        ###
+     ########################
+     
+    # The function "state_of_meal" chooses a photo of a plate at the given table.
+    def state_of_meal(self):
+        ## !!!!!!###
+        num = np.random.randint(67, 100)
+                ## !!!!!!###
+
+        if num<=67:
+            img_name = 'plates/{}.png'.format(num)
+        else:
+            img_name = 'plates/{}.jpg'.format(num)
+        return img_name
+    
+    # The function "change_state" changes the value of the state variable.
+    # It indicates whether the clients are waiting for the waiter to take an order
+    # (0 - empty plates), are eating (2 - full plates), or are waiting for the
+    # waiter to bring the receipt (1 - dirty plates).
+    
+    def change_state(self, st):
+        self.state = st
+        
+    ########################
+    ###       /WS        ###
+    ########################
+    
+
+class Kitchen:
+    def __init__(self, coordinate_i, coordinate_j):
+        self.coordinate_i = coordinate_i
+        self.coordinate_j = coordinate_j
+        change_value(coordinate_i, coordinate_j, 3, 2)
+        
+class Agent:
+    def __init__(self,orig_coordinate_i, orig_coordinate_j):
+        self.orig_coordinate_i = orig_coordinate_i
+        self.orig_coordinate_j = orig_coordinate_j
+        self.state = np.array([1,2])
+        change_value(orig_coordinate_i, orig_coordinate_j, 1, 3)
+        self.state_update(orig_coordinate_i, orig_coordinate_j)
+        self.previous_grid = np.array([-1, -1])
+        
+    def state_update(self, c1, c2):
+        self.state[0] = c1
+        self.state[1] = c2
+        
+    def leave(self):
+        change_value(self.state[0], self.state[1], 1, 0)
+    
+    def previous_grid_update(self):
+        self.previous_grid[0] = self.state[0]
+        self.previous_grid[1] = self.state[1]
+    
+    def move_to(self, nextPos):
+        self.previous_grid_update()
+        self.leave()
+        self.state_update(self.state[0] + nextPos[0], self.state[1] + nextPos[1])
+        change_value(self.state[0], self.state[1], 1, 3)
+   
+       ########################
+       ###        WS        ###
+       ########################
+       
+       # The function "define_table" searches for the appropriate table in the
+       # table_dict (to enable the usage of class attributes and methods)
+    def define_table(self, t_num):
+        t_num = 'table{}'.format(t_num)
+        t_num = table_dict[t_num]
+        return t_num
+
+       # The function "check_plates" uses the pretrained CNN model to classify 
+       # the plate on the table as empty(0), full(2) or dirty(1)
+    def check_plates(self, table_number):
+        table = self.define_table(table_number)
+        plate = table.state_of_meal()        
+        plate = image.load_img(plate, target_size = (256, 256))
+        plate = image.img_to_array(plate)
+        plate = np.expand_dims(plate, axis = 0)
+        result = classifier.predict(plate)[0]
+        print(result)
+        # Pick the most probable class; a softmax output is not guaranteed to be exactly 1.
+        state = int(np.argmax(result))
+        if state == 1:
+            # A plate classified as dirty is ambiguous, so ask the client:
+            # 1 keeps the state "dirty", 2 changes it to "full".
+            state = int(input("Excuse me, have you finished eating? 1 = Yes, 2 = No \n"))
+        table.change_state(state)
+
+       ########################
+       ###       /WS        ###
+       ########################
+    # check the next grid is not the previous grid to prevent the loop
+    def next_is_previous(self, x, y):
+        return np.array_equal(self.previous_grid, np.array([x, y]))
+
+def check_done():
+    for event in pygame.event.get():        # Checking for the event
+        if event.type == pygame.QUIT:       # If the program is closed:
+            return True                     # To exit the loop
+    return False                            # No quit event was received
+
+def draw_grid():
+    for row in range(16):                   # Drawing the grid
+        for column in range(16):
+            color = WHITE
+            if grid[row][column] == 1:
+                color = GREEN
+            if grid[row][column] == 2:
+                color = RED
+            if grid[row][column] == 3:
+                color = BLUE
+            pygame.draw.rect(screen,
+                             color,
+                             [(MARGIN + WIDTH) * column + MARGIN,
+                              (MARGIN + HEIGHT) * row + MARGIN,
+                              WIDTH,
+                              HEIGHT]) 
+
+# calculate the distance between two points 
+def distance(point1, point2):
+    return math.sqrt((point2[0] - point1[0])**2 + (point2[1] - point1[1])**2)
+
+## default positions of the agent:
+x = 12
+y = 12
+agent = Agent(x, y)
+
+table1 = Table(2, 2)
+table2 = Table(2, 7)
+table3 = Table(2, 12)
+table4 = Table(7, 2)
+table5 = Table(7, 7)
+table6 = Table(7, 12)
+table7 = Table(12, 2)
+table8 = Table(12, 7)
+
+
+################### WS #####################
+# I added the dict to loop through tables.
+table_dict = {"table1":table1, "table2":table2, "table3":table3,"table4":table4,
+              "table5":table5,"table6":table6,"table7":table7,"table8":table8
+              }
+################### WS #####################
+
+#class Kitchen:
+kitchen = Kitchen(13, 13)
+
+# destination array
+destination_tables = []
+
+
+pygame.init()
+WINDOW_SIZE = [405, 405]
+screen = pygame.display.set_mode(WINDOW_SIZE)
+
+pygame.display.set_caption("Waiter_Grid3")
+
+done = False
+
+clock = pygame.time.Clock()
+ 
+# -------- Main Program Loop -----------
+while not done:
+    screen.fill(BLACK)                      # Background color
+    
+    draw_grid()
+    done = check_done()
+    for value in table_dict.values():
+        destination_tables.append(value.get_destination_coor())
+    # Track the number of the table the agent is currently visiting:
+    
+    num_of_table = 1
+    while len(destination_tables) != 0:
+        
+        # set the first element(table) in array as currDestination
+        currDestination = destination_tables[0]
+        # from kitchen to table
+        while agent.state[0] != currDestination[0] or agent.state[1] != currDestination[1]:
+
+            #///////////////////////////////////////
+            x = agent.state[0]
+            y = agent.state[1]
+            
+            # set a huge default number
+            minDis = 9999
+            nextPos = []
+            # check whether the agent goes left
+            if y-1 >= 0 and grid[x][y-1] != 1 and not agent.next_is_previous(x, y-1):
+                minDis = distance([x, y-1], currDestination)
+                nextPos = [0, -1]  # means go left
+            
+            # check whether the agent goes right
+            if y+1 <= 15 and grid[x][y+1] != 1 and grid[x][y+1] != 2 and not agent.next_is_previous(x, y+1):  
+                d = distance([x, y+1], currDestination)
+                if d < minDis:
+                    minDis = d
+                    nextPos = [0, 1]  # means go right
+            
+            # check whether the agent goes up
+            if x-1 >= 0 and grid[x-1][y] != 1 and not agent.next_is_previous(x-1, y):
+                d = distance([x-1, y], currDestination)
+                if d < minDis:
+                    minDis = d
+                    nextPos = [-1, 0]  # means go up
+                    
+            # check whether the agent goes down
+            if x+1 <= 15 and grid[x+1][y] != 1 and grid[x+1][y] != 2 and not agent.next_is_previous(x+1, y):
+                d = distance([x+1, y], currDestination)
+                if d < minDis:
+                    minDis = d
+                    nextPos = [1, 0]  # means go down
+            
+#            print(agent.previous_grid)
+            agent.move_to(nextPos)
+            #////////////////////////////////////////////////
+            
+            pygame.time.delay(100)
+            screen.fill(BLACK)                      # Background color
+            draw_grid()                             # Drawing the grid
+            clock.tick(60)                           # Limit to 60 frames per second
+            pygame.display.flip()                    # Updating the screen
+        
+        
+        ########################
+        ###        WS        ###
+        ########################
+        #pygame.time.delay(100)
+        print("I'm at a table no. {}".format(num_of_table))
+        ## Checking at what state are the plates:
+        agent.check_plates(num_of_table)
+        num_of_table +=1
+        
+        ########################
+        ###       /WS        ###
+        ########################
+        # set the kitchen as currDestination 
+        currDestination = [13, 12]
+        # from table to kitchen
+        while agent.state[0] != currDestination[0] or agent.state[1] != currDestination[1]:
+
+            #///////////////////////////////////////
+            x = agent.state[0]
+            y = agent.state[1]
+            
+            # set a huge default number
+            minDis = 9999
+            nextPos = []
+            # check whether the agent goes left
+            if y-1 >= 0 and grid[x][y-1] != 1 and not agent.next_is_previous(x, y-1):
+                minDis = distance([x, y-1], currDestination)
+                nextPos = [0, -1]  # means go left
+            
+            # check whether the agent goes right
+            if y+1 <= 15 and grid[x][y+1] != 1 and grid[x][y+1] != 2 and not agent.next_is_previous(x, y+1):  
+                d = distance([x, y+1], currDestination)
+                if d < minDis:
+                    minDis = d
+                    nextPos = [0, 1]  # means go right
+            
+            # check whether the agent goes up
+            if x-1 >= 0 and grid[x-1][y] != 1 and grid[x-1][y] != 2 and not agent.next_is_previous(x-1, y):
+                d = distance([x-1, y], currDestination)
+                if d < minDis:
+                    minDis = d
+                    nextPos = [-1, 0]  # means go up
+                    
+            # check whether the agent goes down
+            if x+1 <= 15 and grid[x+1][y] != 1 and grid[x+1][y] != 2 and not agent.next_is_previous(x+1, y):
+                d = distance([x+1, y], currDestination)
+                if d < minDis:
+                    minDis = d
+                    nextPos = [1, 0]  # means go down
+            
+            agent.move_to(nextPos)
+            #////////////////////////////////////////////////
+            
+            pygame.time.delay(100)
+            screen.fill(BLACK)                      # Background color
+            draw_grid()                             # Drawing the grid
+            clock.tick(60)                           # Limit to 60 frames per second
+            pygame.display.flip()                    # Updating the screen
+            
+        
+        destination_tables = destination_tables[1:]    # remove the first element in the list
+        # After each full loop, we can quit the program.
+        if len(destination_tables) == 0:
+            exit_requested = int(input("Exit? 0=No, 1=Yes \n"))
+            if exit_requested:
+                # Leave the main loop; pygame.quit() is called once after it.
+                done = True
+   
+    
+pygame.quit()
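Both movement loops in `waiter_v3.py` repeat the same greedy rule: among the four neighbouring cells, take the one closest (in Euclidean distance) to the current destination, skipping blocked cells and the cell the agent just left. A minimal sketch of that rule as a single function follows. `next_step` and its parameters are hypothetical names, not part of the commit, and the sketch blocks both tables (1) and the kitchen (2) in every direction, which is a simplification of the per-direction checks in the original loops.

```python
import math


def next_step(grid, pos, dest, previous=None, blocked=(1, 2)):
    """Return the (dx, dy) move that brings pos closest to dest, or None.

    Hypothetical helper illustrating the greedy rule used in the main loop.
    Moves are tried in the same order as the script: left, right, up, down.
    """
    x, y = pos
    best_move, best_dist = None, math.inf
    for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nx, ny = x + dx, y + dy
        # Stay inside the grid.
        if not (0 <= nx < len(grid) and 0 <= ny < len(grid[0])):
            continue
        # Skip obstacles and the previously visited cell (prevents oscillation).
        if grid[nx][ny] in blocked or (nx, ny) == previous:
            continue
        d = math.hypot(dest[0] - nx, dest[1] - ny)
        if d < best_dist:                  # strict comparison keeps the first tie
            best_move, best_dist = (dx, dy), d
    return best_move


empty_grid = [[0 for _ in range(16)] for _ in range(16)]
move = next_step(empty_grid, (12, 12), (2, 1))
```

Returning `None` when every neighbour is blocked makes the dead-end case explicit; the loops in the script would instead pass an empty list to `move_to`. A greedy rule like this can still get stuck behind obstacles, so it is a sketch of the existing behaviour rather than a path-finding algorithm.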
--- a/which_plate_CNN.py
+++ b/which_plate_CNN.py
@ -0,0 +1,69 @@
+## My CNN, classifying the plates as empty, dirty or full.
+#imports
+from keras.models import Sequential
+from keras.layers import Convolution2D
+from keras.layers import MaxPooling2D
+from keras.layers import Flatten
+from keras.layers import Dense
+from keras.callbacks import EarlyStopping
+from keras.callbacks import ModelCheckpoint
+
+#initializing:
+classifier = Sequential()
+
+#Convolution:
+classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
+
+#Pooling:
+classifier.add(MaxPooling2D(pool_size = (2,2)))
+
+# Adding a second convolutional layer
+classifier.add(Convolution2D(32, (3, 3), activation = 'relu'))
+classifier.add(MaxPooling2D(pool_size = (2, 2)))
+
+
+#Flattening:
+classifier.add(Flatten())
+
+#Fully connected layers:
+classifier.add(Dense(units = 128, activation = "relu"))
+classifier.add(Dense(units = 3, activation = "softmax"))
+
+#Making CNN:
+classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
+
+#From KERAS:
+from keras.preprocessing.image import ImageDataGenerator
+
+#Data augmentation:
+train_datagen = ImageDataGenerator(
+        rescale=1./255,
+        shear_range=0.2,
+        zoom_range=0.2,
+        horizontal_flip=True,
+        width_shift_range=0.2,
+        height_shift_range=0.1,
+        fill_mode='nearest')
+
+test_datagen = ImageDataGenerator(rescale=1./255)
+
+training_set = train_datagen.flow_from_directory('plates/training_set',
+                                                 target_size=(256, 256),
+                                                 batch_size=16,
+                                                 class_mode='categorical')
+
+test_set = test_datagen.flow_from_directory('plates/test_set',
+                                            target_size=(256, 256),
+                                            batch_size=16,
+                                            class_mode='categorical')
+
+# callbacks:
+es = EarlyStopping(monitor='val_loss', mode='min', baseline=1, patience = 10)
+mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose = 1, period = 10)
+classifier.fit_generator(
+        training_set,
+        steps_per_epoch=88,
+        epochs=200,
+        callbacks=[es, mc],
+        validation_data=test_set,
+        validation_steps=10)
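The training script rescales pixel values with `rescale=1./255`, so images passed to the trained network at prediction time should probably be scaled the same way before the class is read off the softmax output. A minimal sketch under that assumption follows. `classify_plate`, `CLASS_NAMES` and the `predict` argument are hypothetical names; `predict` stands in for `classifier.predict`. The index-to-label mapping follows the project description (0 empty, 1 dirty, 2 full) and is an assumption here, because `flow_from_directory` derives class indices from the folder names, which are not shown in this commit.

```python
import numpy as np

# Assumed mapping from the project description; the real mapping depends on
# the sub-folder names in plates/training_set.
CLASS_NAMES = ("empty", "dirty", "full")


def classify_plate(pixels, predict):
    """Scale an image like the training generator does and return its class.

    pixels:  H x W x 3 array with values in 0..255.
    predict: callable taking a batch and returning class probabilities,
             e.g. classifier.predict (hypothetical stand-in).
    """
    batch = np.asarray(pixels, dtype="float32") / 255.0   # same as rescale=1./255
    batch = np.expand_dims(batch, axis=0)                 # add the batch dimension
    probs = np.asarray(predict(batch))[0]
    idx = int(np.argmax(probs))                           # most probable class
    return idx, CLASS_NAMES[idx]
```

Using `argmax` instead of comparing a probability with exactly 1 keeps the result well defined when the network is not fully confident.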