diff --git a/Jenkinsfile b/Jenkinsfile
index 1eb0901..7525c07 100644
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -15,9 +15,9 @@ pipeline {
             steps {
                 withEnv(["KAGGLE_USERNAME=${params.KAGGLE_USERNAME}", "KAGGLE_KEY=${params.KAGGLE_KEY}"]) {
                     sh 'pip install kaggle'
-                    sh 'kaggle datasets download -d nasa/meteorite-landings'
-                    sh 'unzip -o meteorite-landings.zip'
-                    sh 'rm meteorite-landings.zip'
+                    sh 'kaggle datasets download -d uciml/forest-cover-type-dataset'
+                    sh 'unzip -o forest-cover-type-dataset.zip'
+                    sh 'rm forest-cover-type-dataset.zip'
                 }
             }
         }
@@ -29,7 +29,7 @@ pipeline {
                     def customImage = docker.build("custom-image")
                     customImage.inside {
                         sh 'python3 ./IUM_2.py'
-                        archiveArtifacts artifacts: 'meteorite-landings.csv, meteorite_train.csv, meteorite_test.csv, meteorite_val.csv', onlyIfSuccessful: true
+                        archiveArtifacts artifacts: 'covtype.csv, forest_train.csv, forest_test.csv, forest_val.csv', onlyIfSuccessful: true
                     }
                 }
             }
diff --git a/get_dataset.sh b/get_dataset.sh
index c250169..d22b6e3 100644
--- a/get_dataset.sh
+++ b/get_dataset.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
-kaggle datasets download -d nasa/meteorite-landings
+kaggle datasets download -d uciml/forest-cover-type-dataset
 
-unzip -o meteorite-landings.zip
+unzip -o forest-cover-type-dataset.zip
 
 ###Zmienne###
 
@@ -10,12 +10,12 @@ test_val_ratio=0.5
 
 ##Przetwrazanie pliku##
 
-shuf meteorite-landings.csv -o shuffled-meteorite-landings.csv
+shuf covtype.csv -o forest.csv
 
 ##Cut off $1 rows##
-head -n $1 shuffled-meteorite-landings.csv > shuffled-meteorite-landings.csv
+head -n $1 forest.csv > forest.csv
 
-total_lines=$(wc -l < shuffled-meteorite-landings.csv)
+total_lines=$(wc -l < forest.csv)
 
 train_lines=$(echo $total_lines*$train_ratio| bc)
 train_lines=$(echo "($train_lines+0.5)/1" | bc )
@@ -24,9 +24,9 @@ test_lines=$(echo "($test_lines+0.5)/1" | bc )
 
 validation_lines=$(echo $total_lines-$train_lines-$test_lines | bc)
 
-head -n "$train_lines" shuffled-meteorite-landings.csv > "meteorite_train.csv"
-tail -n $((test_lines+validation_lines)) shuffled-meteorite-landings.csv | head -n "$test_lines" > "meteorite_test.csv"
-tail -n "$validation_lines" shuffled-meteorite-landings.csv > "meteorite_validation.csv"
+head -n "$train_lines" forest.csv > "forest_train.csv"
+tail -n $((test_lines+validation_lines)) forest.csv | head -n "$test_lines" > "forest_test.csv"
+tail -n "$validation_lines" forest.csv > "forest_validation.csv"
 
 mkdir -p artifacts
-mv meteorite-landings.csv shuffled-meteorite-landings.csv meteorite_test.csv meteorite_train.csv meteorite_validation.csv artifacts/
\ No newline at end of file
+mv covtype.csv forest.csv forest_test.csv forest_train.csv forest_validation.csv artifacts/
\ No newline at end of file
diff --git a/stats/stats_dataset.sh b/stats/stats_dataset.sh
index c17eba7..cb13af5 100644
--- a/stats/stats_dataset.sh
+++ b/stats/stats_dataset.sh
@@ -1,9 +1,9 @@
 #!/bin/bash
 
-wc -l artifacts/meteorite_train.csv > stats_train.txt
+wc -l artifacts/forest_train.csv > stats_train.txt
 
-wc -l artifacts/meteorite_test.csv > stats_test.txt
+wc -l artifacts/forest_test.csv > stats_test.txt
 
-wc -l artifacts/meteorite_validation.csv > stats_validation.txt
+wc -l artifacts/forest_validation.csv > stats_validation.txt
 
 mv stats_train.txt stats_test.txt stats_validation.txt artifacts/
\ No newline at end of file