# Machine Learning Method implementation report
Oskar Nastały
## Introduction
The purpose of my ML implementation is to let the agent (a tractor) decide which fertilizer to use.
Its decision is based mostly on the nutrients in the soil, but also on a few other properties.
The dataset is very small: it contains only 100 entries.
There are 7 types of fertilizer, each adding a specific amount of nutrients to the soil.
Example:
```csharp
FertilizerType[6] = new Fertilizer
{
    ID = 5,
    Name = "DAP",
    Nitrogen = 14.52f / 5,
    Phosphorus = 1.77f / 5,
    Potassium = 9.5f / 5
};
```
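How a fertilizer "adds a specific amount of nutrients to the soil" could be sketched roughly as below. Only the `Fertilizer` fields come from the example above; the `Soil` class, the `Apply` method, the `Demo` helper and the starting soil values are hypothetical names and numbers introduced for illustration, and the project's actual update rule may differ.

```csharp
using System;

public class Fertilizer
{
    public int ID;
    public string Name = "";
    public float Nitrogen;
    public float Phosphorus;
    public float Potassium;
}

// Hypothetical soil model: nutrient levels of a single tile.
public class Soil
{
    public float Nitrogen;
    public float Phosphorus;
    public float Potassium;

    // Assumption: applying a fertilizer simply adds its nutrient amounts.
    public void Apply(Fertilizer f)
    {
        Nitrogen += f.Nitrogen;
        Phosphorus += f.Phosphorus;
        Potassium += f.Potassium;
    }
}

public static class Demo
{
    // Applies the DAP entry from the report to a made-up soil tile.
    public static Soil Run()
    {
        var dap = new Fertilizer
        {
            ID = 5,
            Name = "DAP",
            Nitrogen = 14.52f / 5,
            Phosphorus = 1.77f / 5,
            Potassium = 9.5f / 5
        };
        var soil = new Soil { Nitrogen = 10f, Phosphorus = 10f, Potassium = 10f };
        soil.Apply(dap);
        return soil;
    }
}
```

Under these assumptions, each nutrient level on the tile grows by the corresponding field of the fertilizer that was applied.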
Unfortunately, the nutrient values are not based on real-world values.
Although the dataset's creator intended it for classifying fertilizers, it appears instead to record which fertilizer WAS used and what the result of using that fertilizer on a given field will be.
E.g. Urea contains 46% nitrogen and nothing else. In the dataset it was classified as the best fertilizer for fields with already very high nitrogen levels. That would lead to oversaturation with nitrogen and a lack of other nutrients.
So I did some calculations, and Urea now looks like this:
```csharp
FertilizerType[7] = new Fertilizer
{
    ID = 6,
    Name = "Urea",
    Nitrogen = 1.81f / 5,
    Phosphorus = 21.0f / 5,
    Potassium = 9.5f / 5
};
// An "inverted" and slightly modified counterpart of the real-world version of this fertilizer.
```
## Implementation
I used the gradient boosting decision tree algorithm for this task because of the many features it offers.
First, a CSV file is loaded:
```csharp
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<ModelInput>(
    path: path,
    hasHeader: true,
    separatorChar: ',',
    allowQuoting: true,
    allowSparse: false);
```
It is then passed to the next function, which trains, evaluates and builds a model.
The trainer parameters are also tuned here to limit overfitting as much as possible by:
- limiting the number of leaves,
- limiting the maximum tree depth,
- limiting the number of bins per feature,

while maintaining high accuracy through:
- a low learning rate, combined with
- a high number of iterations.
```csharp
var options = new LightGbmMulticlassTrainer.Options
{
    MaximumBinCountPerFeature = 8,
    LearningRate = 0.00025,
    NumberOfIterations = 40000,
    NumberOfLeaves = 10,
    LabelColumnName = "Fertilizer_NameF",
    FeatureColumnName = "Features",
    Booster = new DartBooster.Options()
    {
        MaximumTreeDepth = 10
    }
};
```
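Why such a low learning rate has to be paired with so many iterations can be illustrated with a toy calculation. The sketch below assumes a strongly simplified setting (squared loss and a base learner that fits the current residual exactly), so that each round removes a `learningRate` share of the remaining error. It is not how the DART booster configured above behaves in detail, and the `Shrinkage` class is a hypothetical helper.

```csharp
using System;

public static class Shrinkage
{
    // Fraction of the initial residual left after `iterations` boosting rounds,
    // assuming squared loss and a base learner that fits the residual exactly,
    // so every round shrinks the remaining residual by a factor (1 - learningRate).
    public static double RemainingResidual(double learningRate, int iterations)
    {
        return Math.Pow(1.0 - learningRate, iterations);
    }
}
```

In this simplified model, `Shrinkage.RemainingResidual(0.00025, 40000)` leaves roughly 0.005% of the initial residual, while stopping after only 400 iterations at the same learning rate would still leave about 90% of it.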
Creating pipeline for the model:
```csharp
var pipeline = mlContext.Transforms
    .Text.FeaturizeText("Soil_TypeF", "Soil_Type")
    .Append(mlContext.Transforms.Text.FeaturizeText("Crop_TypeF", "Crop_Type"))
    .Append(mlContext.Transforms.Concatenate("Features", "Temperature", "Humidity", "Moisture", "Soil_TypeF", "Crop_TypeF", "Nitrogen", "Potassium", "Phosphorous"))
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Fertilizer_NameF", "Fertilizer_Name"), TransformerScope.TrainTest)
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.MulticlassClassification.Trainers.LightGbm(options))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
```
The pipeline is evaluated using 10-fold cross-validation.
The results are as follows:
```
Micro Accuracy: 0.95829
LogLoss Average: 0.100171
LogLoss Reduction: 0.933795
```
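For readers unfamiliar with these metrics, the sketch below shows how micro accuracy, log-loss and log-loss reduction are commonly defined for multiclass classification. It is a plain C# illustration with a hypothetical `Metrics` class, not the ML.NET evaluator that produced the numbers above, and it assumes natural logarithms and a caller-supplied baseline ("prior") log-loss.

```csharp
using System;
using System.Linq;

public static class Metrics
{
    // Micro accuracy: share of all rows whose predicted class equals the true class.
    public static double MicroAccuracy(int[] actual, int[] predicted)
    {
        int correct = actual.Zip(predicted, (a, p) => a == p ? 1 : 0).Sum();
        return (double)correct / actual.Length;
    }

    // Log-loss: mean negative natural log of the probability given to the true class.
    public static double LogLoss(int[] actual, double[][] probabilities)
    {
        double sum = 0.0;
        for (int i = 0; i < actual.Length; i++)
        {
            // Clamp to avoid log(0) for over-confident wrong predictions.
            double p = Math.Max(probabilities[i][actual[i]], 1e-15);
            sum -= Math.Log(p);
        }
        return sum / actual.Length;
    }

    // Log-loss reduction: relative improvement over a baseline (prior) log-loss.
    public static double LogLossReduction(double modelLogLoss, double priorLogLoss)
    {
        return (priorLogLoss - modelLogLoss) / priorLogLoss;
    }
}
```

A log-loss reduction close to 1 means the model's predicted probabilities are far more informative than the baseline, while a value near 0 means they are hardly better.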
The model is created and saved for later use, to skip the long training and evaluation times.
Later, when the program starts, that model is loaded and a prediction engine is created.
## Integration Details
The agent (tractor) navigates through the grid looking for tiles where it can plant.
When planting and when visiting already growing plants, the agent decides whether any fertilizer is needed (a rule-based decision) and which fertilizer to use (using the ML prediction engine).
If a field is properly fertilized, it will have a higher production rate, resulting in faster plant growth.
The production rate is shown in the UI and is also represented by the colour of the progression bar (right side of every tile).
At 100% the bar will be pure **Green**. Any value below makes the bar more **Red**, while any value above adds **Blue**, eventually turning the bar colour cyan.
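The colour rule described above could be implemented along these lines. This is only a sketch: the `ProgressBarColour` class is hypothetical, and the linear blending, pure red at 0% and full cyan at 200% are assumptions, since the report does not state the exact formula used in the project.

```csharp
using System;

public static class ProgressBarColour
{
    // Maps a production rate (1.0 == 100%) to an RGB colour.
    // Assumptions: linear blending, pure red at 0%, full cyan at 200%.
    public static (byte R, byte G, byte B) FromRate(double rate)
    {
        // Keep the rate inside the assumed 0%..200% range.
        double r = Math.Max(0.0, Math.Min(2.0, rate));
        if (r <= 1.0)
        {
            // Below 100%: blend from red (0%) towards green (100%).
            return (ToByte(1.0 - r), ToByte(r), (byte)0);
        }
        // Above 100%: keep green at full and add blue until cyan (200%).
        return ((byte)0, (byte)255, ToByte(r - 1.0));
    }

    // Converts a 0..1 intensity to a 0..255 channel value.
    private static byte ToByte(double x) => (byte)Math.Round(x * 255.0);
}
```

With these assumptions, `FromRate(1.0)` gives pure green `(0, 255, 0)`, `FromRate(0.0)` gives pure red `(255, 0, 0)`, and any rate of 2.0 or higher gives cyan `(0, 255, 255)`.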
Example:
![Progression Bar](https://git.wmi.amu.edu.pl/s425077/PotatoPlan/raw/Oskar-ML/example_img.jpg)