Compare commits

...

11 Commits

Author SHA1 Message Date
a7152dff3c Individual Project #2; s442720 2020-05-10 12:44:35 +00:00
d9a5f26c02 Individual Project #2; s442720 2020-05-10 12:42:43 +00:00
f3ddde84e3 Individual Project #2; s442720 2020-05-10 12:27:58 +00:00
2a1724a4b7 Individual Project #2; s442720 2020-05-10 12:27:43 +00:00
71dc3e81a2 Individual Project #2; s442720 2020-05-10 12:27:15 +00:00
a73862b48b Individual Project #2; s442720 2020-05-10 12:17:01 +00:00
7958ec4a7e Merge branch 'master' of s444523/Waiter_group into master 2020-05-04 09:46:45 +00:00
cf203978c6 Waiter group: Report from my individual part 2020-05-04 06:34:26 +00:00
21b2f6328b Merge branch 'master' of s444523/Waiter_group into master 2020-04-30 07:52:50 +00:00
74e6baecfa Individual Project #1 implementation; s444523 (a Convolutional Neural Network classifying customers' plates into three categories; documentation inside s444523.rar) 2020-04-30 07:42:18 +00:00
08def183f7 Individual Project #1 implementation (CNN classifying images of plates) 2020-04-29 19:47:30 +00:00
9 changed files with 1404 additions and 0 deletions

View File

@ -0,0 +1,91 @@
# CNN Plates Classification
Author: Weronika Skowrońska, s444523
As my individual project, I decided to perform classification of plate images using a Convolutional Neural Network. The goal of the project is to classify a photo of the client's plate as empty (0), dirty (1), or full (2), and assign the appropriate value to the given instance of the "Table" class.
# Architecture
The architecture of my CNN is very simple. I decided to use two convolutional layers, each with 32 feature detectors of size 3 by 3, followed by the ReLU activation function and MaxPooling of size 2 by 2.
```python
classifier = Sequential()
classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
classifier.add(MaxPooling2D(pool_size = (2,2)))
classifier.add(Convolution2D(32, 3, 3, activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
```
After flattening, I added a fully connected layer of size 128 (again with the ReLU activation function). The output layer consists of 3 neurons with the softmax activation function, as I am using the network for multiclass classification (3 possible outcomes).
```python
classifier.add(Dense(units = 128, activation = "relu"))
classifier.add(Dense(units = 3, activation = "softmax"))
```
The optimizer of my network is Adam, and categorical cross-entropy is my loss function.
```python
classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
```
# Library
I used Keras to implement the network. It let me add specific features such as early stopping and several data augmentation methods.
```python
train_datagen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
width_shift_range=0.2,
height_shift_range=0.1,
fill_mode='nearest')
```
Data augmentation was especially important to me, as I did not have many photos to train the network with (approximately 1200 altogether).
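For completeness, here is a condensed sketch of how the augmented generator feeds the classifier during training; the directory layout (`plates/training_set`, `plates/test_set`) and the batch settings are taken from which_plate_CNN.py further down in this diff, and the callbacks discussed later are omitted:
```python
# Sketch: stream augmented batches from disk into the classifier
# (condensed from which_plate_CNN.py; assumes train_datagen and classifier
# from the snippets above are already defined).
from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1./255)  # validation images are only rescaled

training_set = train_datagen.flow_from_directory('plates/training_set',
                                                 target_size=(256, 256),
                                                 batch_size=16,
                                                 class_mode='categorical')
test_set = test_datagen.flow_from_directory('plates/test_set',
                                            target_size=(256, 256),
                                            batch_size=16,
                                            class_mode='categorical')

# fit_generator trains from the generators instead of in-memory arrays
classifier.fit_generator(training_set,
                         steps_per_epoch=88,
                         epochs=200,
                         validation_data=test_set,
                         validation_steps=10)
```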
# Project implementation
After training the network, I first saved the model that gave the best results to a file named "best_model.h5" (two Keras callbacks, EarlyStopping and ModelCheckpoint, were very useful here).
```python
# callbacks:
es = EarlyStopping(monitor='val_loss', mode='min', baseline=1, patience = 10)
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose = 1, period = 10)
```
It turned out, though, that the file was too big to upload to Git, so I modified the code slightly and, instead of saving the whole model, saved only the weights:
```python
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose = 1, period = 10, save_weights_only = True)
```
To be honest, that was not a very good idea either, as the new file was also too big to upload. I solved the problem in another way: I uploaded the h5 file to my Google Drive and added a download link to the project files.
To use the saved weights, I created the CNN model inside our project:
```python
#initializing:
classifier = Sequential()
#Convolution:
classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
#Pooling:
classifier.add(MaxPooling2D(pool_size = (2,2)))
# Adding a second convolutional layer
classifier.add(Convolution2D(32, 3, 3, activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
#Flattening:
classifier.add(Flatten())
#Fully connected layers::
classifier.add(Dense(units = 128, activation = "relu"))
classifier.add(Dense(units = 3, activation = "softmax"))
# loading weights:
classifier.load_weights('s444523/best_model_weights2.h5')
#Making CNN:
classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
```
After coming to each table, the Agent (the waiter) evaluates a randomly selected photo of a plate using the provided model and assigns the predicted class number to the "state" attribute of the given table. This information allows further actions to be taken based on the predicted outcome.
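Condensed, that evaluation step looks roughly like the sketch below; it is based on the check_plates method of the Agent class in waiter_v3.py further down in this diff (`table` stands for the Table instance the waiter is currently at, and the class index is taken with argmax here for brevity):
```python
# Sketch of the per-table evaluation (based on Agent.check_plates in waiter_v3.py).
from keras.preprocessing import image
import numpy as np

plate_path = table.state_of_meal()                  # pick a random plate photo for this table
plate = image.load_img(plate_path, target_size=(256, 256))
plate = np.expand_dims(image.img_to_array(plate), axis=0)

result = classifier.predict(plate)[0]               # class scores: [empty, dirty, full]
table.change_state(int(np.argmax(result)))          # store the predicted class in table.state
```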
I noticed that my program has difficulties distinguishing a full plate from a dirty one - interestingly, this was also a problem for me and my friends when we worked as real waiters in a restaurant. Therefore, if the plate is classified as dirty, the waiter politely asks whether the client has finished eating and acts according to the answer:
```python
if result[1] == 1:
result[1] = 0
x = int(input("Excuse me, have You done eating? 1=Yes, 2 = No \n"))
result[x] = 1
```

235
Reinforcement_learning.md Normal file
View File

@ -0,0 +1,235 @@
# Reinforcement learning for route planning in a restaurant
##### Tao-Sen Chang s442720
###### We did the route planning with a special algorithm in the previous task. In this machine learning sub-project I show a different approach, in which the agent can traverse multiple destinations on the grid system and, of course, find the shortest route. The approach I use is called reinforcement learning.
## What is reinforcement learning?
###### Reinforcement learning is concerned with how software agents ought to take actions in an environment in order to maximize cumulative reward. The agent makes a sequence of decisions and learns to perform the best action at every step. For example, in my project there is a waiter on the grid who has to reach many tables to serve meals, so he must learn the shortest path to each table.
## How to do that?
###### Implementing reinforcement learning from scratch is not easy. However, there is a well-known example - a rat in a maze. Instead of writing the whole algorithm from the beginning, I use the existing code (tools) below and adjust some parameters to train the agent on our 16x16 grid.
https://www.samyzaf.com/ML/rl/qmaze.html
###### I train the agent (the waiter) with rewards and penalties: the waiter gets a small penalty for every legal move, because we want it to reach the target table along the shortest possible path. However, the shortest path to the target table is sometimes long and winding, and our agent may have to endure many errors until he gets to the table.
###### For example, the training rewards are set as follows:
```
if rat_row == win_target_x and rat_col == win_target_y: # if reach the final target
return 1.0
if mode == 'blocked': # move to the block in the grid (blocks are tables or kitchen in our grid)
return -1.0
if (rat_row, rat_col) in self.visited: # when get to the visited grid point
return -0.5
if mode == 'invalid': # when move to the boundary
return -0.75
if mode == 'valid': # to make the route shorter, we give a penalty by moving to valid grid point
return -0.04
if (rat_row, rat_col) in self.curr_win_targets: # if reach any table
return 1.0
```
```
self.min_reward = -0.5 * self.maze.size
```
## Q-learning
###### We want to get the maximum reward from each action in a state. The policy is defined as a = π(s).
###### Q(s,a) is the maximum total reward we can get by choosing action a in state s. It follows that the policy is π(s) = argmax_a Q(s,a). The question now is how to compute Q(s,a).
###### A solution is Bellman's equation: Q(s,a) = R(s,a) + max_a' Q(s',a')
###### R(s,a) is the reward for taking action a in the current state s, s' is the next state, and max_a' Q(s',a') is the maximum reward over the 4 possible actions from that next state. In the code, the Experience class memorizes each "episode"; since memory is limited, once max_memory is reached the oldest episode, which has the least effect on the current one, is deleted.
###### There is a coefficient called the discount factor, usually denoted by γ, which is required for the Bellman equation in stochastic environments. The equation then becomes Q(s,a) = R(s,a) + γ * max_a' Q(s',a'). The discount factor diminishes the effect of rewards that are far from the current state.
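###### Written out in LaTeX notation for a single stored transition (s, a, r, s'), the training target used in the replay code below is:
```
% Per-sample training target used in Experience.get_data below
Q_{\mathrm{target}}(s,a) =
\begin{cases}
  r & \text{if the episode ends at } s' \\
  r + \gamma \, \max_{a'} Q(s',a') & \text{otherwise}
\end{cases}
```
###### This is exactly what the `targets[i, action] = ...` assignment in the Experience.get_data method computes.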
```
class Experience(object):
def __init__(self, model, max_memory=100, discount=0.95):
self.model = model
self.max_memory = max_memory
self.discount = discount
self.memory = list()
self.num_actions = model.output_shape[-1]
def remember(self, episode):
# episode = [envstate, action, reward, envstate_next, game_over]
# memory[i] = episode
# envstate == flattened 1d maze cells info, including rat cell (see method: observe)
self.memory.append(episode)
if len(self.memory) > self.max_memory:
del self.memory[0]
def predict(self, envstate):
return self.model.predict(envstate)[0]
def get_data(self, data_size=10):
env_size = self.memory[0][0].shape[1] # envstate 1d size (1st element of episode)
mem_size = len(self.memory)
data_size = min(mem_size, data_size)
inputs = np.zeros((data_size, env_size))
targets = np.zeros((data_size, self.num_actions))
for i, j in enumerate(np.random.choice(range(mem_size), data_size, replace=False)):
envstate, action, reward, envstate_next, game_over = self.memory[j]
inputs[i] = envstate
# There should be no target values for actions not taken.
targets[i] = self.predict(envstate)
# Q_sa = derived policy = max quality env/action = max_a' Q(s', a')
Q_sa = np.max(self.predict(envstate_next))
if game_over:
targets[i, action] = reward
else:
# reward + gamma * max_a' Q(s', a')
targets[i, action] = reward + self.discount * Q_sa
return inputs, targets
```
## Training
###### The following is the algorithm for training the neural network model to solve the problem. One epoch means one loop of training, and in each epoch the agent eventually ends up with either a "win" or a "lose".
###### Another coefficient, "epsilon", is the exploration factor, which sets the probability that the agent performs a new action instead of following its previous experience (which is called exploitation). In this way the agent not only collects better rewards from previous experience but also has a chance to explore unknown areas where it might get more reward. Once an action is chosen, the neural network is trained on replayed experience (inputs: size equal to the maze size; targets: size equal to the number of actions, 4 in our case).
```
# Exploration factor
epsilon = 0.1
def qtrain(model, maze, **opt):
global epsilon
n_epoch = opt.get('n_epoch', 15000)
max_memory = opt.get('max_memory', 1000)
data_size = opt.get('data_size', 50)
weights_file = opt.get('weights_file', "")
name = opt.get('name', 'model')
start_time = datetime.datetime.now()
# If you want to continue training from a previous model,
# just supply the h5 file name to weights_file option
if weights_file:
print("loading weights from file: %s" % (weights_file,))
model.load_weights(weights_file)
# Construct environment/game from numpy array: maze (see above)
qmaze = Qmaze(maze)
# Initialize experience replay object
experience = Experience(model, max_memory=max_memory)
win_history = [] # history of win/lose game
n_free_cells = len(qmaze.free_cells)
hsize = qmaze.maze.size//2 # history window size
win_rate = 0.0
imctr = 1
pre_episodes = 2**31 - 1
for epoch in range(n_epoch):
loss = 0.0
#rat_cell = random.choice(qmaze.free_cells)
#rat_cell = (0, 0)
rat_cell = (12, 12)
qmaze.reset(rat_cell)
game_over = False
# get initial envstate (1d flattened canvas)
envstate = qmaze.observe()
n_episodes = 0
while not game_over:
valid_actions = qmaze.valid_actions()
if not valid_actions: break
prev_envstate = envstate
# Get next action
if np.random.rand() < epsilon:
action = random.choice(valid_actions)
else:
action = np.argmax(experience.predict(prev_envstate))
# Apply action, get reward and new envstate
envstate, reward, game_status = qmaze.act(action)
if game_status == 'win':
print("win")
win_history.append(1)
game_over = True
# save_pic(qmaze)
if n_episodes <= pre_episodes:
# output_route(qmaze)
print(qmaze.visited)
with open('res.data', 'wb') as filehandle:
pickle.dump(qmaze.visited, filehandle)
pre_episodes = n_episodes
elif game_status == 'lose':
print("lose")
win_history.append(0)
game_over = True
# save_pic(qmaze)
else:
game_over = False
# Store episode (experience)
episode = [prev_envstate, action, reward, envstate, game_over]
experience.remember(episode)
n_episodes += 1
# Train neural network model
inputs, targets = experience.get_data(data_size=data_size)
h = model.fit(
inputs,
targets,
epochs=8,
batch_size=16,
verbose=0,
)
loss = model.evaluate(inputs, targets, verbose=0)
if len(win_history) > hsize:
win_rate = sum(win_history[-hsize:]) / hsize
dt = datetime.datetime.now() - start_time
t = format_time(dt.total_seconds())
template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
print(template.format(epoch, n_epoch-1, loss, n_episodes, sum(win_history), win_rate, t))
```
## Testing
###### We apply this algorithm to our 16x16 grid and train.
```
grid = [[1 for x in range(16)] for y in range(16)]
table1 = Table(2, 2)
table2 = Table (2,7)
table3 = Table(2, 12)
table4 = Table(7, 2)
table5 = Table(7, 7)
table6 = Table(7, 12)
table7 = Table(12, 2)
table8 = Table(12, 7)
kitchen = Kitchen(13, 13)
maze = np.array(grid)
model = build_model(maze)
qtrain(model, maze, epochs=1000, max_memory=8*maze.size, data_size=32)
```
###### I also create a list called win_targets that holds the positions of the tables in the grid.
```
win_targets = [(4, 4),(4, 9),(4, 14),(9, 4),(9, 9),(9, 14),(14, 4),(14, 9)]
```
###### After a lot of training, I realized it is not easy to obtain the shortest route in every run - most of the training runs failed - especially when win_targets contains more targets. For example, the result of training with 8 targets looks like this (part of the output):
```
...
Epoch: 167/14999 | Loss: 0.0299 | Episodes: 407 | Win count: 63 | Win rate: 0.422 | time: 2.44 hours
Epoch: 168/14999 | Loss: 0.0112 | Episodes: 650 | Win count: 63 | Win rate: 0.414 | time: 2.46 hours
Epoch: 169/14999 | Loss: 0.0147 | Episodes: 392 | Win count: 64 | Win rate: 0.422 | time: 2.47 hours
Epoch: 170/14999 | Loss: 0.0112 | Episodes: 668 | Win count: 65 | Win rate: 0.422 | time: 2.48 hours
Epoch: 171/14999 | Loss: 0.0101 | Episodes: 487 | Win count: 66 | Win rate: 0.430 | time: 2.50 hours
Epoch: 172/14999 | Loss: 0.0121 | Episodes: 362 | Win count: 67 | Win rate: 0.438 | time: 2.51 hours
Epoch: 173/14999 | Loss: 0.0101 | Episodes: 484 | Win count: 68 | Win rate: 0.445 | time: 2.52 hours
...
```
###### The only successful run used 4 targets (win_targets = [(4, 4),(4, 9),(4, 14),(9, 4)]):
```
...
Epoch: 223/14999 | Loss: 0.0228 | Episodes: 30 | Win count: 165 | Win rate: 0.906 | time: 64.02 minutes
Epoch: 224/14999 | Loss: 0.0160 | Episodes: 52 | Win count: 166 | Win rate: 0.906 | time: 64.09 minutes
Epoch: 225/14999 | Loss: 0.0702 | Episodes: 34 | Win count: 167 | Win rate: 0.914 | time: 64.14 minutes
Epoch: 226/14999 | Loss: 0.0175 | Episodes: 40 | Win count: 168 | Win rate: 0.922 | time: 64.19 minutes
Epoch: 227/14999 | Loss: 0.0271 | Episodes: 46 | Win count: 169 | Win rate: 0.930 | time: 64.25 minutes
Epoch: 228/14999 | Loss: 0.0194 | Episodes: 40 | Win count: 170 | Win rate: 0.938 | time: 64.30 minutes
...
Epoch: 460/14999 | Loss: 0.0236 | Episodes: 60 | Win count: 401 | Win rate: 1.000 | time: 1.48 hours
Reached 100% win rate at epoch: 460
n_epoch: 460, max_mem: 2048, data: 32, time: 1.48 hours
```
###### In my opinion, there are three reasons for such poor results.
###### 1. The parameters of the algorithm are not optimal, including the rewards, exploration rate, and discount factor. Adjusting and validating them costs a lot of time, and the most intuitive choice is not always the best one. For example, the parameters that work for 4 targets are fine, but when the number of targets grows to 8, the right parameters are not simply scaled versions of the original ones (see the sketch after this list).
###### 2. Because of the exploration rate, the same training and testing setup may give a different result every time, which makes it harder to verify the outcome. The only way to check whether the parameters produce good results is to keep training until we collect sufficient data.
###### 3. The algorithm was originally written for a rat in a maze with a single target. Applying it to multiple targets may be inadequate for some reasons. Moreover, the original maze size is 7x7; it is possible that the 16x16 grid is too large for this algorithm.
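###### To illustrate point 1, here is a minimal, hypothetical sketch of sweeping one of these parameters - the exploration factor - assuming build_model, qtrain and the 16x16 maze from the Testing section are in scope. The rewards and the discount factor are hard-coded in get_reward and in the Experience class, so sweeping them would require editing those directly.
```
# Hypothetical sweep over the exploration factor (not part of the original project).
# qtrain reads the module-level global `epsilon`, so rebinding it here changes
# the exploration behaviour of the next run; each setting gets a fresh model.
results = {}
for eps in (0.05, 0.1, 0.2):
    epsilon = eps
    model = build_model(maze)
    seconds = qtrain(model, maze, n_epoch=300,
                     max_memory=8 * maze.size, data_size=32,
                     name='model_eps_%g' % eps)
    results[eps] = seconds          # qtrain returns the elapsed training time in seconds
print(results)
```
###### Even such a small sweep would take hours on the 16x16 grid, which is why tuning all the parameters at once was not feasible.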

489
main_training.py Normal file
View File

@ -0,0 +1,489 @@
from __future__ import print_function
import os, sys, time, datetime, json, random
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD , Adam, RMSprop
from keras.layers.advanced_activations import PReLU
import matplotlib.pyplot as plt
import pickle
visited_mark = 0.8 # Cells visited by the rat will be painted by gray 0.8
rat_mark = 0.5 # The current rat cell will be painted by gray 0.5
LEFT = 0
UP = 1
RIGHT = 2
DOWN = 3
# Actions dictionary
actions_dict = {
LEFT: 'left',
UP: 'up',
RIGHT: 'right',
DOWN: 'down',
}
num_actions = len(actions_dict)
# Exploration factor
epsilon = 0.1
file_name_num = 1
win_targets = [(4, 4),(4, 9),(4, 14),(9, 4)]
class Qmaze(object):
def __init__(self, maze, rat=(12,12)):
global win_targets
self._maze = np.array(maze)
nrows, ncols = self._maze.shape
#self.target = (nrows-1, ncols-1) # target cell where the "cheese" is
self.target = win_targets[0]
self.free_cells = [(r,c) for r in range(nrows) for c in range(ncols) if self._maze[r,c] == 1.0]
self.free_cells.remove(win_targets[-1])
if self._maze[self.target] == 0.0:
raise Exception("Invalid maze: target cell cannot be blocked!")
if not rat in self.free_cells:
raise Exception("Invalid Rat Location: must sit on a free cell")
self.reset(rat)
def reset(self, rat):
global win_targets
self.rat = rat
self.maze = np.copy(self._maze)
nrows, ncols = self.maze.shape
row, col = rat
self.maze[row, col] = rat_mark
self.state = (row, col, 'start')
self.min_reward = -0.5 * self.maze.size
self.total_reward = 0
self.visited = list()
self.curr_win_targets = win_targets[:]
def update_state(self, action):
nrows, ncols = self.maze.shape
nrow, ncol, nmode = rat_row, rat_col, mode = self.state
if self.maze[rat_row, rat_col] > 0.0:
self.visited.append((rat_row, rat_col)) # mark visited cell
valid_actions = self.valid_actions()
if not valid_actions:
nmode = 'blocked'
elif action in valid_actions:
nmode = 'valid'
if action == LEFT:
ncol -= 1
elif action == UP:
nrow -= 1
if action == RIGHT:
ncol += 1
elif action == DOWN:
nrow += 1
else: # invalid action, no change in rat position
mode = 'invalid'
# new state
self.state = (nrow, ncol, nmode)
def get_reward(self):
win_target_x, win_target_y = self.target
rat_row, rat_col, mode = self.state
nrows, ncols = self.maze.shape
if rat_row == win_target_x and rat_col == win_target_y:
return 1.0
if mode == 'blocked': # move to the block in the grid
return -1.0
if (rat_row, rat_col) in self.visited:
return -0.5 # default -0.25 -> -0.5
if mode == 'invalid':
return -0.75 # default -0.75 move to the boundary
if mode == 'valid': # default -0.04 -> -0.1
return -0.04
if (rat_row, rat_col) in self.curr_win_targets:
return 1.0
def act(self, action):
self.update_state(action)
reward = self.get_reward()
self.total_reward += reward
status = self.game_status()
envstate = self.observe()
return envstate, reward, status
def observe(self):
canvas = self.draw_env()
envstate = canvas.reshape((1, -1))
return envstate
def draw_env(self):
canvas = np.copy(self.maze)
nrows, ncols = self.maze.shape
# clear all visual marks
for r in range(nrows):
for c in range(ncols):
if canvas[r,c] > 0.0:
canvas[r,c] = 1.0
# draw the rat
row, col, valid = self.state
canvas[row, col] = rat_mark
return canvas
def game_status(self):
if self.total_reward < self.min_reward:
return 'lose'
rat_row, rat_col, mode = self.state
nrows, ncols = self.maze.shape
curPos = (rat_row, rat_col)
if curPos in self.curr_win_targets:
self.curr_win_targets.remove(curPos)
if len(self.curr_win_targets) == 0:
return 'win'
else:
self.target = self.curr_win_targets[0]
return 'not_over'
def valid_actions(self, cell=None):
if cell is None:
row, col, mode = self.state
else:
row, col = cell
actions = [0, 1, 2, 3]
nrows, ncols = self.maze.shape
if row == 0:
actions.remove(1)
elif row == nrows-1:
actions.remove(3)
if col == 0:
actions.remove(0)
elif col == ncols-1:
actions.remove(2)
if row>0 and self.maze[row-1,col] == 0.0:
actions.remove(1)
if row<nrows-1 and self.maze[row+1,col] == 0.0:
actions.remove(3)
if col>0 and self.maze[row,col-1] == 0.0:
actions.remove(0)
if col<ncols-1 and self.maze[row,col+1] == 0.0:
actions.remove(2)
return actions
def show(qmaze):
global win_target
win_target_row, win_target_col = win_target
plt.grid('on')
nrows, ncols = qmaze.maze.shape
ax = plt.gca()
ax.set_xticks(np.arange(0.5, nrows, 1))
ax.set_yticks(np.arange(0.5, ncols, 1))
ax.set_xticklabels([])
ax.set_yticklabels([])
canvas = np.copy(qmaze.maze)
for row,col in qmaze.visited:
canvas[row,col] = 0.6
rat_row, rat_col, _ = qmaze.state
canvas[rat_row, rat_col] = 0.3 # rat cell
canvas[win_target_row, win_target_col] = 0.9 # cheese cell
img = plt.imshow(canvas, interpolation='none', cmap='gray')
return img
def save_pic(qmaze):
global file_name_num
global win_target
win_target_row, win_target_col = win_target
plt.grid('on')
nrows, ncols = qmaze.maze.shape
ax = plt.gca()
ax.set_xticks(np.arange(0.5, nrows, 1))
ax.set_yticks(np.arange(0.5, ncols, 1))
ax.set_xticklabels([])
ax.set_yticklabels([])
canvas = np.copy(qmaze.maze)
for row,col in qmaze.visited:
canvas[row,col] = 0.6
rat_row, rat_col, _ = qmaze.state
canvas[rat_row, rat_col] = 0.3 # rat cell
canvas[win_target_row, win_target_col] = 0.9 # cheese cell
plt.imshow(canvas, interpolation='none', cmap='gray')
plt.savefig(str(file_name_num) + ".png")
file_name_num += 1
def output_route(qmaze):
global win_target
win_target_row, win_target_col = win_target
print(qmaze._maze)
def play_game(model, qmaze, rat_cell):
qmaze.reset(rat_cell)
envstate = qmaze.observe()
while True:
prev_envstate = envstate
# get next action
q = model.predict(prev_envstate)
action = np.argmax(q[0])
# apply action, get rewards and new state
envstate, reward, game_status = qmaze.act(action)
if game_status == 'win':
return True
elif game_status == 'lose':
return False
def completion_check(model, qmaze):
for cell in qmaze.free_cells:
if not qmaze.valid_actions(cell):
return False
if not play_game(model, qmaze, cell):
return False
return True
class Experience(object):
def __init__(self, model, max_memory=100, discount=0.9):
self.model = model
self.max_memory = max_memory
self.discount = discount
self.memory = list()
self.num_actions = model.output_shape[-1]
def remember(self, episode):
# episode = [envstate, action, reward, envstate_next, game_over]
# memory[i] = episode
# envstate == flattened 1d maze cells info, including rat cell (see method: observe)
self.memory.append(episode)
if len(self.memory) > self.max_memory:
del self.memory[0]
def predict(self, envstate):
return self.model.predict(envstate)[0]
def get_data(self, data_size=10):
env_size = self.memory[0][0].shape[1] # envstate 1d size (1st element of episode)
mem_size = len(self.memory)
data_size = min(mem_size, data_size)
inputs = np.zeros((data_size, env_size))
targets = np.zeros((data_size, self.num_actions))
for i, j in enumerate(np.random.choice(range(mem_size), data_size, replace=False)):
envstate, action, reward, envstate_next, game_over = self.memory[j]
inputs[i] = envstate
# There should be no target values for actions not taken.
targets[i] = self.predict(envstate)
# Q_sa = derived policy = max quality env/action = max_a' Q(s', a')
Q_sa = np.max(self.predict(envstate_next))
if game_over:
targets[i, action] = reward
else:
# reward + gamma * max_a' Q(s', a')
targets[i, action] = reward + self.discount * Q_sa
return inputs, targets
def qtrain(model, maze, **opt):
global epsilon
n_epoch = opt.get('n_epoch', 15000)
max_memory = opt.get('max_memory', 1000)
data_size = opt.get('data_size', 50)
weights_file = opt.get('weights_file', "")
name = opt.get('name', 'model')
start_time = datetime.datetime.now()
# If you want to continue training from a previous model,
# just supply the h5 file name to weights_file option
if weights_file:
print("loading weights from file: %s" % (weights_file,))
model.load_weights(weights_file)
# Construct environment/game from numpy array: maze (see above)
qmaze = Qmaze(maze)
# Initialize experience replay object
experience = Experience(model, max_memory=max_memory)
win_history = [] # history of win/lose game
n_free_cells = len(qmaze.free_cells)
hsize = qmaze.maze.size//2 # history window size
win_rate = 0.0
imctr = 1
pre_episodes = 2**31 - 1
for epoch in range(n_epoch):
loss = 0.0
#rat_cell = random.choice(qmaze.free_cells)
#rat_cell = (0, 0)
rat_cell = (12, 12)
qmaze.reset(rat_cell)
game_over = False
# get initial envstate (1d flattened canvas)
envstate = qmaze.observe()
n_episodes = 0
while not game_over:
valid_actions = qmaze.valid_actions()
if not valid_actions: break
prev_envstate = envstate
# Get next action
if np.random.rand() < epsilon:
action = random.choice(valid_actions)
else:
action = np.argmax(experience.predict(prev_envstate))
# Apply action, get reward and new envstate
envstate, reward, game_status = qmaze.act(action)
if game_status == 'win':
print("win")
win_history.append(1)
game_over = True
# save_pic(qmaze)
if n_episodes <= pre_episodes:
# output_route(qmaze)
print(qmaze.visited)
with open('res.data', 'wb') as filehandle:
pickle.dump(qmaze.visited, filehandle)
pre_episodes = n_episodes
elif game_status == 'lose':
print("lose")
win_history.append(0)
game_over = True
# save_pic(qmaze)
else:
game_over = False
# Store episode (experience)
episode = [prev_envstate, action, reward, envstate, game_over]
experience.remember(episode)
n_episodes += 1
# Train neural network model
inputs, targets = experience.get_data(data_size=data_size)
h = model.fit(
inputs,
targets,
epochs=8,
batch_size=16,
verbose=0,
)
loss = model.evaluate(inputs, targets, verbose=0)
if len(win_history) > hsize:
win_rate = sum(win_history[-hsize:]) / hsize
dt = datetime.datetime.now() - start_time
t = format_time(dt.total_seconds())
template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
print(template.format(epoch, n_epoch-1, loss, n_episodes, sum(win_history), win_rate, t))
# we simply check if training has exhausted all free cells and if in all
# cases the agent won
if win_rate > 0.9 : epsilon = 0.05
train_max = 192
# print(sum(win_history[-192*1.5:]))
# print(192)
if sum(win_history[-192:]) >= 192:
print("Reached 100%% win rate at epoch: %d" % (epoch,))
break
# Save trained model weights and architecture, this will be used by the visualization code
h5file = name + ".h5"
json_file = name + ".json"
model.save_weights(h5file, overwrite=True)
with open(json_file, "w") as outfile:
json.dump(model.to_json(), outfile)
end_time = datetime.datetime.now()
dt = datetime.datetime.now() - start_time
seconds = dt.total_seconds()
t = format_time(seconds)
print('files: %s, %s' % (h5file, json_file))
print("n_epoch: %d, max_mem: %d, data: %d, time: %s" % (epoch, max_memory, data_size, t))
return seconds
# This is a small utility for printing readable time strings:
def format_time(seconds):
if seconds < 400:
s = float(seconds)
return "%.1f seconds" % (s,)
elif seconds < 4000:
m = seconds / 60.0
return "%.2f minutes" % (m,)
else:
h = seconds / 3600.0
return "%.2f hours" % (h,)
def build_model(maze, lr=0.001):
model = Sequential()
model.add(Dense(maze.size, input_shape=(maze.size,)))
model.add(PReLU())
model.add(Dense(maze.size))
model.add(PReLU())
model.add(Dense(num_actions))
model.compile(optimizer='adam', loss='mse')
return model
class Table:
def __init__(self, coordinate_i, coordinate_j):
self.coordinate_i = coordinate_i
self.coordinate_j = coordinate_j
change_value(coordinate_i, coordinate_j, 2, 0.)
def get_destination_coor(self):
return [self.coordinate_i, self.coordinate_j-1]
class Kitchen:
def __init__(self, coordinate_i, coordinate_j):
self.coordinate_i = coordinate_i
self.coordinate_j = coordinate_j
change_value(coordinate_i, coordinate_j, 3, 0.)
if __name__== "__main__":
def change_value(i, j, width, n):
for r in range (i, i+width):
for c in range (j, j+width):
grid[r][c] = n
grid = [[1 for x in range(16)] for y in range(16)]
table1 = Table(2, 2)
table2 = Table (2,7)
table3 = Table(2, 12)
table4 = Table(7, 2)
table5 = Table(7, 7)
table6 = Table(7, 12)
table7 = Table(12, 2)
table8 = Table(12, 7)
kitchen = Kitchen(13, 13)
maze = np.array(grid)
# print(maze)
# maze = np.array([
# [ 1., 0., 1., 1., 1., 1., 1., 1.],
# [ 1., 1., 1., 0., 0., 1., 0., 1.],
# [ 1., 1., 1., 1., 1., 1., 0., 1.],
# [ 1., 1., 1., 1., 0., 0., 1., 1.],
# [ 1., 0., 0., 0., 1., 1., 1., 1.],
# [ 1., 0., 1., 1., 1., 1., 1., 1.],
# [ 1., 1., 1., 0., 1., 1., 1., 1.]
# ])
# print(maze)
# qmaze = Qmaze(maze)
# show(qmaze)
model = build_model(maze)
qtrain(model, maze, epochs=1000, max_memory=8*maze.size, data_size=32)

BIN
plates.rar Normal file

Binary file not shown.

BIN
res_targets_4-1.data Normal file

Binary file not shown.

151
route_for_project.py Normal file
View File

@ -0,0 +1,151 @@
import pygame
import numpy as np
import math
import pickle
# Colors:
# Define some colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
GREEN = (0, 255, 0)
RED = (255, 0, 0)
BLUE = (0, 0, 240)
YELLOW = (255, 255, 0)
#Width and Height of each square:
WIDTH = 20
HEIGHT = 20
#Margin:
MARGIN = 5
grid = [[0 for x in range(16)] for y in range(16)]
def change_value(i, j, width, n):
for r in range (i, i+width):
for c in range (j, j+width):
grid[r][c] = n
class Table:
def __init__(self, coordinate_i, coordinate_j):
self.coordinate_i = coordinate_i
self.coordinate_j = coordinate_j
change_value(coordinate_i, coordinate_j, 2, 1)
def get_destination_coor(self):
return [self.coordinate_i, self.coordinate_j-1]
class Kitchen:
def __init__(self, coordinate_i, coordinate_j):
self.coordinate_i = coordinate_i
self.coordinate_j = coordinate_j
change_value(coordinate_i, coordinate_j, 3, 2)
class Agent:
def __init__(self,orig_coordinate_i, orig_coordinate_j):
self.orig_coordinate_i = orig_coordinate_i
self.orig_coordinate_j = orig_coordinate_j
self.state = np.array([orig_coordinate_i,orig_coordinate_j])
        change_value(orig_coordinate_i, orig_coordinate_j, 1, 3)  # mark the agent's starting cell
self.state_update(orig_coordinate_i, orig_coordinate_j)
def state_update(self, c1, c2):
self.state[0] = c1
self.state[1] = c2
def leave(self):
change_value(self.state[0], self.state[1], 1, 0)
def move_to(self, nextPos):
self.leave()
nextPos_x, nextPos_y = nextPos
self.state_update(nextPos_x, nextPos_y)
change_value(self.state[0], self.state[1], 1, 3)
def check_done():
for event in pygame.event.get(): # Checking for the event
if event.type == pygame.QUIT: # If the program is closed:
return True # To exit the loop
def draw_grid(visited):
for row in range(16): # Drawing the grid
for column in range(16):
color = WHITE
if grid[row][column] == 1:
color = GREEN
if grid[row][column] == 2:
color = RED
if grid[row][column] == 3:
color = BLUE
if (row, column) in visited or (row, column) in table_targets:
color = YELLOW
pygame.draw.rect(screen,
color,
[(MARGIN + WIDTH) * column + MARGIN,
(MARGIN + HEIGHT) * row + MARGIN,
WIDTH,
HEIGHT])
## default positions of the agent:
x = 12
y = 12
agent = Agent(x, y)
table1 = Table(2, 2)
table2 = Table (2,7)
table3 = Table(2, 12)
table4 = Table(7, 2)
table5 = Table(7, 7)
table6 = Table(7, 12)
table7 = Table(12, 2)
table8 = Table(12, 7)
#class Kitchen:
kitchen = Kitchen(13, 13)
pygame.init()
WINDOW_SIZE = [405, 405]
screen = pygame.display.set_mode(WINDOW_SIZE)
pygame.display.set_caption("Waiter_Grid3")
done = False
clock = pygame.time.Clock()
with open('res_targets_4-1.data', 'rb') as filehandle:
# read the data as binary data stream
trained_route = pickle.load(filehandle)
print(trained_route)
destination = (9, 4)
trained_route.append(destination)
table_targets = [(4, 4),(4, 9),(4, 14),(9, 4)]
# -------- Main Program Loop -----------
while not done:
visited = set()
screen.fill(BLACK) # Background color
draw_grid(visited)
done = check_done()
new_route = trained_route[:]
while len(new_route) != 0:
x = agent.state[0]
y = agent.state[1]
agent.move_to(new_route[0])
new_route = new_route[1:]
pygame.time.delay(150)
screen.fill(BLACK)
visited.add((x,y))
draw_grid(visited)
# Drawing the grid
clock.tick(100) # Limit to 60 frames per second
pygame.display.flip() # Updating the screen
pygame.quit()

BIN
s444523.rar Normal file

Binary file not shown.

369
waiter_v3.py Normal file
View File

@ -0,0 +1,369 @@
import pygame
import numpy as np
import math
########################
### WS ###
########################
# For CNN:
import keras
from keras.preprocessing import image
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
#initializing:
classifier = Sequential()
#Convolution:
classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
#Pooling:
classifier.add(MaxPooling2D(pool_size = (2,2)))
# Adding a second convolutional layer
classifier.add(Convolution2D(32, 3, 3, activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
#Flattening:
classifier.add(Flatten())
#Fully connected layers::
classifier.add(Dense(units = 128, activation = "relu"))
classifier.add(Dense(units = 3, activation = "softmax"))
# loading weights:
classifier.load_weights('s444523/best_model_weights2.h5')
#Making CNN:
classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
########################
### WS ###
########################
# Colors:
# Define some colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
GREEN = (0, 255, 0)
RED = (255, 0, 0)
BLUE = (0, 0, 240)
#Width and Height of each square:
WIDTH = 20
HEIGHT = 20
#Margin:
MARGIN = 5
grid = [[0 for x in range(16)] for y in range(16)]
def change_value(i, j, width, n):
for r in range (i, i+width):
for c in range (j, j+width):
grid[r][c] = n
# the class "Table"
class Table:
def __init__(self, coordinate_i, coordinate_j, state = 0):
self.coordinate_i = coordinate_i
self.coordinate_j = coordinate_j
self.state = state
change_value(coordinate_i, coordinate_j, 2, 1)
def get_destination_coor(self):
return [self.coordinate_i, self.coordinate_j-1]
########################
### WS ###
########################
    # The function "state_of_meal" chooses a photo of a plate at the given table.
def state_of_meal(self):
## !!!!!!###
num = np.random.randint(67, 100)
## !!!!!!###
if num<=67:
img_name = 'plates/{}.png'.format(num)
else:
img_name = 'plates/{}.jpg'.format(num)
return img_name
    # The function "change_state" changes the value of the state variable.
    # It indicates whether the clients are waiting for the waiter to take an order
    # (0 - empty plates), are eating (2 - full plates), or are waiting for the
    # waiter to bring the receipt (1 - dirty plates).
def change_state(self, st):
self.state = st
########################
### /WS ###
########################
class Kitchen:
def __init__(self, coordinate_i, coordinate_j):
self.coordinate_i = coordinate_i
self.coordinate_j = coordinate_j
change_value(coordinate_i, coordinate_j, 3, 2)
class Agent:
def __init__(self,orig_coordinate_i, orig_coordinate_j):
self.orig_coordinate_i = orig_coordinate_i
self.orig_coordinate_j = orig_coordinate_j
self.state = np.array([1,2])
        change_value(orig_coordinate_i, orig_coordinate_j, 1, 3)  # mark the agent's starting cell
self.state_update(orig_coordinate_i, orig_coordinate_j)
self.previous_grid = np.array([-1, -1])
def state_update(self, c1, c2):
self.state[0] = c1
self.state[1] = c2
def leave(self):
change_value(self.state[0], self.state[1], 1, 0)
def previous_grid_update(self):
self.previous_grid[0] = self.state[0]
self.previous_grid[1] = self.state[1]
def move_to(self, nextPos):
self.previous_grid_update()
self.leave()
self.state_update(x + nextPos[0], y + nextPos[1])
change_value(self.state[0], self.state[1], 1, 3)
########################
### WS ###
########################
    # The function "define_table" searches for the appropriate table in the
    # table_dict (to enable the usage of class attributes and methods).
def define_table(self, t_num):
t_num = 'table{}'.format(t_num)
t_num = table_dict[t_num]
return t_num
# The function "check_plates" uses the pretrained CNN model to classify
# the plate on the table as empty(0), full(2) or dirty(1)
def check_plates(self, table_number):
table = self.define_table(table_number)
plate = table.state_of_meal()
plate= image.load_img(plate, target_size = (256, 256))
plate = image.img_to_array(plate)
plate = np.expand_dims(plate, axis = 0)
result = classifier.predict(plate)[0]
print (result)
if result[1] == 1:
result[1] = 0
x = int(input("Excuse me, have You done eating? 1=Yes, 2 = No \n"))
result[x] = 1
for i, x in enumerate(result):
if result[i] == 1:
table.change_state(i)
########################
### /WS ###
########################
    # check that the next grid cell is not the previous one, to prevent looping back and forth
def next_is_previous(self, x, y):
return np.array_equal(self.previous_grid, np.array([x, y]))
def check_done():
for event in pygame.event.get(): # Checking for the event
if event.type == pygame.QUIT: # If the program is closed:
return True # To exit the loop
def draw_grid():
for row in range(16): # Drawing the grid
for column in range(16):
color = WHITE
if grid[row][column] == 1:
color = GREEN
if grid[row][column] == 2:
color = RED
if grid[row][column] == 3:
color = BLUE
pygame.draw.rect(screen,
color,
[(MARGIN + WIDTH) * column + MARGIN,
(MARGIN + HEIGHT) * row + MARGIN,
WIDTH,
HEIGHT])
# calculate the distance between two points
def distance(point1, point2):
return math.sqrt((point2[0] - point1[0])**2 + (point2[1] - point1[1])**2)
## default positions of the agent:
x = 12
y = 12
agent = Agent(x, y)
table1 = Table(2, 2)
table2 = Table (2,7)
table3 = Table(2, 12)
table4 = Table(7, 2)
table5 = Table(7, 7)
table6 = Table(7, 12)
table7 = Table(12, 2)
table8 = Table(12, 7)
################### WS #####################
# I added the dict to loop through tables.
table_dict = {"table1":table1, "table2":table2, "table3":table3,"table4":table4,
"table5":table5,"table6":table6,"table7":table7,"table8":table8
}
################### WS #####################
#class Kitchen:
kitchen = Kitchen(13, 13)
# destination array
destination_tables = []
pygame.init()
WINDOW_SIZE = [405, 405]
screen = pygame.display.set_mode(WINDOW_SIZE)
pygame.display.set_caption("Waiter_Grid3")
done = False
clock = pygame.time.Clock()
# -------- Main Program Loop -----------
while not done:
screen.fill(BLACK) # Background color
draw_grid()
done = check_done()
for value in table_dict.values(): destination_tables.append(value.get_destination_coor())
    # We need to keep track of the number of the table we are currently heading to:
num_of_table = 1
while len(destination_tables) != 0:
# set the first element(table) in array as currDestination
currDestination = destination_tables[0]
# from kitchen to table
while agent.state[0] != currDestination[0] or agent.state[1] != currDestination[1]:
#///////////////////////////////////////
x = agent.state[0]
y = agent.state[1]
# set a huge default number
minDis = 9999
nextPos = []
# check whether the agent goes left
if y-1 >= 0 and grid[x][y-1] != 1 and not agent.next_is_previous(x, y-1):
minDis = distance([x, y-1], currDestination)
nextPos = [0, -1] # means go left
# check whether the agent goes right
if y+1 <= 15 and grid[x][y+1] != 1 and grid[x][y+1] != 2 and not agent.next_is_previous(x, y+1):
d = distance([x, y+1], currDestination)
if d < minDis:
minDis = d
nextPos = [0, 1] # means go right
# check whether the agent goes up
if x-1 >= 0 and grid[x-1][y] != 1 and not agent.next_is_previous(x-1, y):
d = distance([x-1, y], currDestination)
if d < minDis:
minDis = d
nextPos = [-1, 0] # means go up
# check whether the agent goes down
if x+1 <= 15 and grid[x+1][y] != 1 and grid[x+1][y] != 2 and not agent.next_is_previous(x+1, y):
d = distance([x+1, y], currDestination)
if d < minDis:
minDis = d
nextPos = [1, 0] # means go down
# print(agent.previous_grid)
agent.move_to(nextPos)
#////////////////////////////////////////////////
pygame.time.delay(100)
screen.fill(BLACK) # Background color
draw_grid() # Drawing the grid
clock.tick(60) # Limit to 60 frames per second
pygame.display.flip() # Updating the screen
########################
### WS ###
########################
#pygame.time.delay(100)
print("I'm at a table no. {}".format(num_of_table))
## Checking at what state are the plates:
agent.check_plates(num_of_table)
num_of_table +=1
########################
### /WS ###
########################
# set the kitchen as currDestination
currDestination = [13, 12]
# from table to kitchen
while agent.state[0] != currDestination[0] or agent.state[1] != currDestination[1]:
#///////////////////////////////////////
x = agent.state[0]
y = agent.state[1]
# set a huge default number
minDis = 9999
nextPos = []
# check whether the agent goes left
if y-1 >= 0 and grid[x][y-1] != 1 and not agent.next_is_previous(x, y-1):
minDis = distance([x, y-1], currDestination)
nextPos = [0, -1] # means go left
# check whether the agent goes right
if y+1 <= 15 and grid[x][y+1] != 1 and grid[x][y+1] != 2 and not agent.next_is_previous(x, y+1):
d = distance([x, y+1], currDestination)
if d < minDis:
minDis = d
nextPos = [0, 1] # means go right
# check whether the agent goes up
if x-1 >= 0 and grid[x-1][y] != 1 and grid[x-1][y] != 2 and not agent.next_is_previous(x-1, y):
d = distance([x-1, y], currDestination)
if d < minDis:
minDis = d
nextPos = [-1, 0] # means go up
# check whether the agent goes down
if x+1 <= 15 and grid[x+1][y] != 1 and grid[x+1][y] != 2 and not agent.next_is_previous(x+1, y):
d = distance([x+1, y], currDestination)
if d < minDis:
minDis = d
nextPos = [1, 0] # means go down
agent.move_to(nextPos)
#////////////////////////////////////////////////
pygame.time.delay(100)
screen.fill(BLACK) # Background color
draw_grid() # Drawing the grid
clock.tick(60) # Limit to 60 frames per second
pygame.display.flip() # Updating the screen
destination_tables = destination_tables[1:] # remove the first element in the list
    # After each full loop over the tables, we can quit the program:
if len(destination_tables) == 0:
play_again = 1
play_again = int(input("Exit? 0=No, 1=Yes \n"))
if play_again:
pygame.quit()
pygame.quit()

69
which_plate_CNN.py Normal file
View File

@ -0,0 +1,69 @@
## My CNN, classifying the plates as dirty, clean or full.
#imports
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
#initializing:
classifier = Sequential()
#Convolution:
classifier.add(Convolution2D(32, (3, 3), input_shape =(256, 256, 3), activation = "relu"))
#Pooling:
classifier.add(MaxPooling2D(pool_size = (2,2)))
# Adding a second convolutional layer
classifier.add(Convolution2D(32, 3, 3, activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
#Flattening:
classifier.add(Flatten())
#Fully connected layers::
classifier.add(Dense(units = 128, activation = "relu"))
classifier.add(Dense(units = 3, activation = "softmax"))
#Making CNN:
classifier.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])
#From KERAS:
from keras.preprocessing.image import ImageDataGenerator
#Data augmentation:
train_datagen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
width_shift_range=0.2,
height_shift_range=0.1,
fill_mode='nearest')
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory('plates/training_set',
target_size=(256, 256),
batch_size=16,
class_mode='categorical')
test_set = test_datagen.flow_from_directory('plates/test_set',
target_size=(256, 256),
batch_size=16,
class_mode='categorical')
# callbacks:
es = EarlyStopping(monitor='val_loss', mode='min', baseline=1, patience = 10)
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose = 1, period = 10)
classifier.fit_generator(
training_set,
steps_per_epoch = 88,
epochs=200,
callbacks=[es, mc],
validation_data=test_set,
validation_steps=10)