1
0
forked from s425077/PotatoPlan
JoelForkTest/Joels Individual Report.md

102 lines
4.4 KiB
Markdown
Raw Normal View History

2020-05-25 11:48:22 +02:00
Machine Learning Implementation
Introduction
I have implemented a ML to add in season, daily rainfall and area which can be cultivated into the Productivity rate of each crop. This dataset contains over 65000 entries of which we only use around 20000 as those are the only crops that are plantable for our agen and the lowest amount of entries for a single crop is 600. The daily rainfall has been added to SoilProperties in our tileset. Implementation of the needed properties into SoilProperties:
public float Rainfall;
public float prevRainfall;
public int Area;
.
.
.
//Randomize the first days rainfall and the area,
prevRainfall = r.Next(247, 3617);
Area = r.Next(1, 270000);
The dataset is said to be based on real life data. But for our our program is not able to fit the data 1 to 1 because some Crops only had an daily rainfall from 247 to 2000 but the Rainfall of that tile could still be 3617. I havent done anything to mitigate this but it doesnt result in any strange results when passing through our data into the ML Model. It would have been better for the dataset contained rainfall data from 247 to 3617. Then the yield could possibly be lower or higher as rainfall increases or decreases. We added dynamic rainfall in is therefore constricting the rain from going outside of the datasets parameters:
//Sets previous days rainfall to current days rainfall and constrict the total rainfall from going outside of the parameters
soilProperties.prevRainfall = soilProperties.Rainfall;
if (soilProperties.prevRainfall > 3616)
soilProperties.prevRainfall = 3616;
else if (soilProperties.prevRainfall < 236)
soilProperties.prevRainfall = 236;
soilProperties.Rainfall = 0;
Implementation
I have used gradient boosting decision tree algorithm due to the features.
I used Gradient Boosting Decision Tree Algorithm for this task due to many features it has.
First a csv file is loaded:
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<ModelInput>(
path: path,
hasHeader: true,
separatorChar: ',',
allowQuoting: true,
allowSparse: false);
Then it is passed to next function which will train, evaluate and build a model. Also trainer parameters will be fine-tuned here to prevent overfitting as much as possible by:
- limiting number of leaves,
- limiting maximum tree depth,
- limiting the amount of bins per feature,
maintaining high accuracy by:
- low learning rate combine with:
- high number of iterations.
Creating pipeline for the model:
var pipeline = mlContext.Transforms
.Text.FeaturizeText("Soil_TypeF", "Soil_Type")
.Append(mlContext.Transforms.Text.FeaturizeText("Crop_TypeF", "Crop_Type"))
.Append(mlContext.Transforms.Concatenate("Features", "Temperature", "Humidity", "Moisture", "Soil_TypeF", "Crop_TypeF", "Nitrogen", "Potassium", "Phosphorous"))
.Append(mlContext.Transforms.Conversion.MapValueToKey("Fertilizer_NameF", "Fertilizer_Name"), TransformerScope.TrainTest)
.AppendCacheCheckpoint(mLContext)
.Append(mLContext.MulticlassClassification.Trainers.LightGbm(options))
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
Integration
Our tileset now has a dynamically moving overlay of clouds moving at the speed of WindSpeed and if the clouds are above at a tile we add to the total rainfall of the day. The rainfall we pass into the ML Model is the previous days rainfall and the first days rainfall is randomized.
We achieved this rainsystem by running a bitmap over our tileset, the bitmap is continues with itself so it runs seamlessly. Depending on the ransparency of each pixel on this bitmap we determine how much it is currently raining on set tile:
if (Rain >= 0.45f)
{
soilProperties.Rainfall = soilProperties.Rainfall + Rain * 4;
}
if the transparency is < 0.45 then there will be no cloud drawn and it will not be raining at all. And this is the result:
![Progression Bar]https://git.wmi.amu.edu.pl/s425077/PotatoPlan/src/Joel-ML/Example.png
To optimize the program we now update 10% of the columns every second or else it would freeze for seconds at a time and the ML Model is only being fed data once a day as the total rainfall is only updated once a day anyways.