ium_464913/IUM_2.ipynb
2024-04-01 19:14:34 +02:00

IUM 2

Installation of packages

%pip install kaggle
%pip install pandas
%pip install numpy
%pip install scikit-learn
Requirement already satisfied: kaggle in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (1.6.6)
Requirement already satisfied: six>=1.10 in c:\users\skype\appdata\roaming\python\python312\site-packages (from kaggle) (1.16.0)
Requirement already satisfied: certifi in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (2024.2.2)
Requirement already satisfied: python-dateutil in c:\users\skype\appdata\roaming\python\python312\site-packages (from kaggle) (2.9.0.post0)
Requirement already satisfied: requests in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (2.31.0)
Requirement already satisfied: tqdm in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (4.66.2)
Requirement already satisfied: python-slugify in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (8.0.4)
Requirement already satisfied: urllib3 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (2.2.1)
Requirement already satisfied: bleach in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (6.1.0)
Requirement already satisfied: webencodings in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from bleach->kaggle) (0.5.1)
Requirement already satisfied: text-unidecode>=1.3 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from requests->kaggle) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from requests->kaggle) (3.6)
Requirement already satisfied: colorama in c:\users\skype\appdata\roaming\python\python312\site-packages (from tqdm->kaggle) (0.4.6)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: pandas in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (2.2.1)
Requirement already satisfied: numpy<2,>=1.26.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from pandas) (1.26.3)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\skype\appdata\roaming\python\python312\site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: six>=1.5 in c:\users\skype\appdata\roaming\python\python312\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: numpy in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (1.26.3)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: scikit-learn in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (1.4.1.post1)
Requirement already satisfied: numpy<2.0,>=1.19.5 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (1.26.3)
Requirement already satisfied: scipy>=1.6.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (1.12.0)
Requirement already satisfied: joblib>=1.2.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (3.3.0)
Note: you may need to restart the kernel to use updated packages.

Importing libraries

import pandas as pd
import numpy as np

# To preprocess the data
from sklearn.preprocessing import StandardScaler

# To split the data
from sklearn.model_selection import train_test_split

Downloading a dataset

!kaggle datasets download -d mlg-ulb/creditcardfraud
creditcardfraud.zip: Skipping, found more recently modified local copy (use --force to force download)

Uncompressing the archive

!unzip -o creditcardfraud.zip
Archive:  creditcardfraud.zip
  inflating: creditcard.csv          
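
The `!unzip` shell command may not be available on every system. As a portable alternative, the archive can be extracted with Python's standard `zipfile` module. The sketch below is only an illustration: `extract_archive` is a hypothetical helper name, and the demonstration runs on a throwaway archive rather than on `creditcardfraud.zip`.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="."):
    """Extract every member of a zip archive and return the member names."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Demonstration on a throwaway archive (a stand-in for creditcardfraud.zip)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo.csv", "a,b\n1,2\n")

    names = extract_archive(demo_zip, tmp)
    extracted_ok = (Path(tmp) / "demo.csv").exists()
    print(names)
```

In the notebook itself the call would presumably be `extract_archive("creditcardfraud.zip")`, assuming the archive sits in the working directory.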

Load the data

df = pd.read_csv("creditcard.csv")
pd.set_option("display.max_columns", None)

Check missing values

df.isnull().sum()
Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

Size of the dataset

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     284807 non-null  float64
 22  V22     284807 non-null  float64
 23  V23     284807 non-null  float64
 24  V24     284807 non-null  float64
 25  V25     284807 non-null  float64
 26  V26     284807 non-null  float64
 27  V27     284807 non-null  float64
 28  V28     284807 non-null  float64
 29  Amount  284807 non-null  float64
 30  Class   284807 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 67.4 MB

Standardising the Amount column

scaler = StandardScaler()

df["Amount"] = scaler.fit_transform(df["Amount"].values.reshape(-1, 1))
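
For reference, `StandardScaler` rescales a column to zero mean and unit variance, z = (x - mean) / std, where std is the population standard deviation (`ddof=0`). The minimal sketch below checks this on a small array. The values are made up for illustration and do not come from `creditcard.csv`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative amounts; these values are made up, not taken from creditcard.csv
amounts = np.array([1.0, 5.0, 20.0, 100.0, 250.0]).reshape(-1, 1)

scaler = StandardScaler()
scaled = scaler.fit_transform(amounts)

# StandardScaler computes z = (x - mean) / std with the population std (ddof=0)
manual = (amounts - amounts.mean()) / amounts.std()

print(np.allclose(scaled, manual))
print(scaled.mean(), scaled.std())
```

Because pandas' `describe()` uses the sample standard deviation (`ddof=1`), the summary statistics report a standard deviation of 1.000002 for `Amount` rather than exactly 1.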

Summary statistics

df.describe()
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 284807.000000 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 284807.000000
mean 94813.859575 1.168375e-15 3.416908e-16 -1.379537e-15 2.074095e-15 9.604066e-16 1.487313e-15 -5.556467e-16 1.213481e-16 -2.406331e-15 2.239053e-15 1.673327e-15 -1.247012e-15 8.190001e-16 1.207294e-15 4.887456e-15 1.437716e-15 -3.772171e-16 9.564149e-16 1.039917e-15 6.406204e-16 1.654067e-16 -3.568593e-16 2.578648e-16 4.473266e-15 5.340915e-16 1.683437e-15 -3.660091e-16 -1.227390e-16 2.913952e-17 0.001727
std 47488.145955 1.958696e+00 1.651309e+00 1.516255e+00 1.415869e+00 1.380247e+00 1.332271e+00 1.237094e+00 1.194353e+00 1.098632e+00 1.088850e+00 1.020713e+00 9.992014e-01 9.952742e-01 9.585956e-01 9.153160e-01 8.762529e-01 8.493371e-01 8.381762e-01 8.140405e-01 7.709250e-01 7.345240e-01 7.257016e-01 6.244603e-01 6.056471e-01 5.212781e-01 4.822270e-01 4.036325e-01 3.300833e-01 1.000002e+00 0.041527
min 0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00 -1.137433e+02 -2.616051e+01 -4.355724e+01 -7.321672e+01 -1.343407e+01 -2.458826e+01 -4.797473e+00 -1.868371e+01 -5.791881e+00 -1.921433e+01 -4.498945e+00 -1.412985e+01 -2.516280e+01 -9.498746e+00 -7.213527e+00 -5.449772e+01 -3.483038e+01 -1.093314e+01 -4.480774e+01 -2.836627e+00 -1.029540e+01 -2.604551e+00 -2.256568e+01 -1.543008e+01 -3.532294e-01 0.000000
25% 54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01 -6.915971e-01 -7.682956e-01 -5.540759e-01 -2.086297e-01 -6.430976e-01 -5.354257e-01 -7.624942e-01 -4.055715e-01 -6.485393e-01 -4.255740e-01 -5.828843e-01 -4.680368e-01 -4.837483e-01 -4.988498e-01 -4.562989e-01 -2.117214e-01 -2.283949e-01 -5.423504e-01 -1.618463e-01 -3.545861e-01 -3.171451e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02 -3.308401e-01 0.000000
50% 84692.000000 1.810880e-02 6.548556e-02 1.798463e-01 -1.984653e-02 -5.433583e-02 -2.741871e-01 4.010308e-02 2.235804e-02 -5.142873e-02 -9.291738e-02 -3.275735e-02 1.400326e-01 -1.356806e-02 5.060132e-02 4.807155e-02 6.641332e-02 -6.567575e-02 -3.636312e-03 3.734823e-03 -6.248109e-02 -2.945017e-02 6.781943e-03 -1.119293e-02 4.097606e-02 1.659350e-02 -5.213911e-02 1.342146e-03 1.124383e-02 -2.652715e-01 0.000000
75% 139320.500000 1.315642e+00 8.037239e-01 1.027196e+00 7.433413e-01 6.119264e-01 3.985649e-01 5.704361e-01 3.273459e-01 5.971390e-01 4.539234e-01 7.395934e-01 6.182380e-01 6.625050e-01 4.931498e-01 6.488208e-01 5.232963e-01 3.996750e-01 5.008067e-01 4.589494e-01 1.330408e-01 1.863772e-01 5.285536e-01 1.476421e-01 4.395266e-01 3.507156e-01 2.409522e-01 9.104512e-02 7.827995e-02 -4.471707e-02 0.000000
max 172792.000000 2.454930e+00 2.205773e+01 9.382558e+00 1.687534e+01 3.480167e+01 7.330163e+01 1.205895e+02 2.000721e+01 1.559499e+01 2.374514e+01 1.201891e+01 7.848392e+00 7.126883e+00 1.052677e+01 8.877742e+00 1.731511e+01 9.253526e+00 5.041069e+00 5.591971e+00 3.942090e+01 2.720284e+01 1.050309e+01 2.252841e+01 4.584549e+00 7.519589e+00 3.517346e+00 3.161220e+01 3.384781e+01 1.023622e+02 1.000000

Distribution of legitimate and fraudulent transactions

df["Class"].value_counts()
Class
0    284315
1       492
Name: count, dtype: int64
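
To make the degree of imbalance explicit, the counts above can be turned into a ratio. This is a small arithmetic sketch; the two counts are copied from the `value_counts()` output, and the variable names are chosen here for illustration.

```python
# Class counts copied from the value_counts() output above
legit_count = 284315
fraud_count = 492

total = legit_count + fraud_count
fraud_share = fraud_count / total

# Share of fraudulent transactions, as a percentage
print(f"{fraud_share:.4%}")
# Approximate number of legitimate transactions per fraudulent one
print(round(legit_count / fraud_count))
```

The fraud share agrees with the mean of the `Class` column (0.001727) reported by `df.describe()`.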

Undersampling the data

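Only 492 of the 284,807 transactions (about 0.17%) are fraudulent, so we undersample the majority class to obtain a dataset with equal numbers of legitimate and fraudulent transactions.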

# Indices of the minority (fraud) class and their count
fraud_indices = df[df.Class == 1].index.to_numpy()
fraud_count = len(fraud_indices)

# Indices of the majority (legitimate) class
normal_indices = df[df.Class == 0].index

# Randomly sample as many majority-class indices as there are frauds.
# Note: the sample is unseeded, so the selected rows change between runs.
random_normal_indices = np.random.choice(normal_indices, fraud_count, replace=False)

# Combine the indices of both classes
undersample_indices = np.concatenate([fraud_indices, random_normal_indices])

# Build the undersampled dataset; .loc is used because these are index labels
undersample_data = df.loc[undersample_indices, :]

X_undersample = undersample_data.loc[:, undersample_data.columns != "Class"]
y_undersample = undersample_data.loc[:, undersample_data.columns == "Class"]
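
Since the sampling above is not seeded, each run selects different legitimate transactions. One possible way to make the undersampling reproducible is `DataFrame.sample` with a fixed `random_state`. The sketch below is an alternative, not the notebook's method; it uses a toy frame (`toy`) as a stand-in for `df`, and all names in it are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df: 1000 rows, 20 of them labelled as fraud
rng = np.random.default_rng(0)
toy = pd.DataFrame({"Amount": rng.normal(size=1000), "Class": 0})
toy.loc[rng.choice(toy.index, size=20, replace=False), "Class"] = 1

fraud = toy[toy["Class"] == 1]
# random_state fixes the sample so that reruns select the same rows
normal = toy[toy["Class"] == 0].sample(n=len(fraud), random_state=0)

# Balanced dataset: all frauds plus an equally sized sample of the rest
balanced = pd.concat([fraud, normal])
X_bal = balanced.drop(columns="Class")
y_bal = balanced["Class"]

print(balanced["Class"].value_counts())
```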

Size of the undersampled dataset

undersample_data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 984 entries, 541 to 141412
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Time    984 non-null    float64
 1   V1      984 non-null    float64
 2   V2      984 non-null    float64
 3   V3      984 non-null    float64
 4   V4      984 non-null    float64
 5   V5      984 non-null    float64
 6   V6      984 non-null    float64
 7   V7      984 non-null    float64
 8   V8      984 non-null    float64
 9   V9      984 non-null    float64
 10  V10     984 non-null    float64
 11  V11     984 non-null    float64
 12  V12     984 non-null    float64
 13  V13     984 non-null    float64
 14  V14     984 non-null    float64
 15  V15     984 non-null    float64
 16  V16     984 non-null    float64
 17  V17     984 non-null    float64
 18  V18     984 non-null    float64
 19  V19     984 non-null    float64
 20  V20     984 non-null    float64
 21  V21     984 non-null    float64
 22  V22     984 non-null    float64
 23  V23     984 non-null    float64
 24  V24     984 non-null    float64
 25  V25     984 non-null    float64
 26  V26     984 non-null    float64
 27  V27     984 non-null    float64
 28  V28     984 non-null    float64
 29  Amount  984 non-null    float64
 30  Class   984 non-null    int64  
dtypes: float64(30), int64(1)
memory usage: 246.0 KB

Summary statistics of the undersampled dataset

undersample_data.describe()
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000 984.000000
mean 88501.498984 -2.445079 1.781022 -3.509406 2.214004 -1.477993 -0.713150 -2.787427 0.279073 -1.253108 -2.841500 1.930697 -3.124120 -0.026229 -3.502384 -0.039494 -2.097294 -3.304208 -1.128950 0.343668 0.175905 0.331911 0.049631 -0.031264 -0.037389 0.022812 0.027632 0.086286 0.046738 0.039676 0.500000
std 48996.269445 5.512352 3.713232 6.223001 3.231076 4.274632 1.789350 5.856197 4.857643 2.371055 4.563067 2.764745 4.595103 1.054377 4.653202 1.002911 3.465619 5.990033 2.412032 1.290973 1.126258 2.787884 1.167097 1.177562 0.551518 0.677541 0.476480 1.023332 0.479168 0.851800 0.500254
min 60.000000 -30.552380 -15.799625 -31.103685 -3.863126 -22.105532 -10.261990 -43.557242 -41.044261 -13.434066 -24.588262 -2.613374 -18.683715 -3.223045 -19.214325 -4.498945 -14.129855 -25.162799 -9.498746 -3.681904 -7.242879 -22.797604 -8.887017 -19.254328 -2.028024 -4.781606 -1.214960 -7.263482 -2.735623 -0.353229 0.000000
25% 45531.000000 -2.867222 -0.155438 -5.084967 -0.172018 -1.700260 -1.619179 -3.066415 -0.204192 -2.279453 -4.572043 -0.187147 -5.495221 -0.784589 -6.721799 -0.627097 -3.543426 -5.302111 -1.809496 -0.412430 -0.187708 -0.157259 -0.509376 -0.240064 -0.379825 -0.321251 -0.281187 -0.061809 -0.050194 -0.347302 0.000000
50% 83076.500000 -0.823244 0.957399 -1.381998 1.287041 -0.394605 -0.689473 -0.668321 0.147397 -0.694910 -0.948441 1.170286 -0.858094 -0.000686 -1.110717 -0.006070 -0.677801 -0.513640 -0.383038 0.221049 0.040630 0.155404 0.080270 -0.030318 0.009379 0.049923 -0.007475 0.063100 0.039464 -0.280984 0.500000
75% 135051.500000 0.919444 2.791569 0.356911 4.175332 0.616305 0.069620 0.265089 0.877002 0.134399 -0.016047 3.586502 0.190356 0.683977 0.110541 0.672903 0.250353 0.313841 0.334927 0.978754 0.445616 0.642724 0.624948 0.180735 0.365624 0.395001 0.324059 0.457194 0.226492 0.046539 1.000000
max 172733.000000 2.335833 22.057729 3.476268 12.114672 14.103918 6.474115 5.802537 20.007208 6.816732 11.732926 12.018913 2.534876 3.091328 3.442422 2.471358 3.139656 6.739384 3.790316 5.228342 11.059004 27.202839 8.361985 5.466230 1.208141 2.208209 2.745261 3.052358 4.975792 8.146182 1.000000

Distribution of legitimate and fraudulent transactions in the undersampled dataset

undersample_data["Class"].value_counts()
Class
1    492
0    492
Name: count, dtype: int64

Splitting the whole dataset into training and test sets

X = df.iloc[:, df.columns != "Class"]
y = df.iloc[:, df.columns == "Class"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
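
With a class as rare as fraud, a purely random split does not guarantee that both parts contain the same share of positive examples. `train_test_split` accepts a `stratify` argument that preserves the class proportions. The sketch below illustrates this on toy data; it is an optional variation, not what the notebook does, and `X_toy` and `y_toy` are made-up stand-ins for `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 1000 samples, 10 positives (1%), mimicking a rare class
X_toy = np.arange(1000).reshape(-1, 1)
y_toy = np.zeros(1000, dtype=int)
y_toy[:10] = 1

# stratify=y_toy keeps the 1% positive share in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0, stratify=y_toy
)

# Number of positives that ended up in each split
print(y_tr.sum(), y_te.sum())
```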

Statistical measures of the training set (whole dataset)

pd.concat([X_train, y_train], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 199364 entries, 161145 to 117952
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    199364 non-null  float64
 1   V1      199364 non-null  float64
 2   V2      199364 non-null  float64
 3   V3      199364 non-null  float64
 4   V4      199364 non-null  float64
 5   V5      199364 non-null  float64
 6   V6      199364 non-null  float64
 7   V7      199364 non-null  float64
 8   V8      199364 non-null  float64
 9   V9      199364 non-null  float64
 10  V10     199364 non-null  float64
 11  V11     199364 non-null  float64
 12  V12     199364 non-null  float64
 13  V13     199364 non-null  float64
 14  V14     199364 non-null  float64
 15  V15     199364 non-null  float64
 16  V16     199364 non-null  float64
 17  V17     199364 non-null  float64
 18  V18     199364 non-null  float64
 19  V19     199364 non-null  float64
 20  V20     199364 non-null  float64
 21  V21     199364 non-null  float64
 22  V22     199364 non-null  float64
 23  V23     199364 non-null  float64
 24  V24     199364 non-null  float64
 25  V25     199364 non-null  float64
 26  V26     199364 non-null  float64
 27  V27     199364 non-null  float64
 28  V28     199364 non-null  float64
 29  Amount  199364 non-null  float64
 30  Class   199364 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 48.7 MB
pd.concat([X_train, y_train], axis=1).describe()
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000 199364.000000
mean 94799.493936 0.000315 -0.002690 -0.001532 0.000721 -0.001494 -0.000210 -0.000870 -0.001980 0.000212 0.001357 -0.001039 -0.001565 0.000693 0.000137 0.000322 0.000084 0.000292 -0.000134 0.000490 0.000430 -0.000014 -0.000022 -0.000258 0.000362 0.000395 -0.000094 -0.000027 0.000015 0.001271 0.001731
std 47499.835491 1.963554 1.657379 1.516716 1.417138 1.368744 1.328673 1.226018 1.212338 1.102021 1.092801 1.020027 0.996526 0.997718 0.956938 0.916143 0.876131 0.852181 0.837556 0.814506 0.770257 0.743450 0.727625 0.629145 0.605298 0.521175 0.481842 0.401042 0.324849 0.983948 0.041563
min 0.000000 -46.855047 -63.344698 -33.680984 -5.560118 -42.147898 -23.496714 -43.557242 -73.216718 -13.434066 -24.588262 -4.797473 -17.769143 -5.791881 -19.214325 -4.498945 -14.129855 -25.162799 -9.498746 -7.213527 -23.646890 -34.830382 -10.933144 -44.807735 -2.822684 -10.295397 -2.534330 -22.565679 -11.710896 -0.353229 0.000000
25% 54126.000000 -0.921539 -0.601213 -0.892838 -0.848835 -0.692874 -0.769177 -0.554220 -0.209086 -0.644753 -0.535493 -0.762852 -0.407660 -0.648456 -0.425122 -0.583616 -0.467945 -0.484055 -0.498850 -0.456800 -0.211662 -0.229272 -0.544345 -0.162021 -0.354179 -0.316088 -0.327327 -0.070864 -0.052907 -0.330640 0.000000
50% 84633.500000 0.019705 0.063784 0.177888 -0.017852 -0.055832 -0.274397 0.039228 0.021803 -0.049633 -0.092069 -0.034135 0.137912 -0.013416 0.051179 0.049289 0.067772 -0.065113 -0.003217 0.004422 -0.062889 -0.029045 0.006744 -0.010915 0.040974 0.018014 -0.052287 0.001064 0.011119 -0.265271 0.000000
75% 139334.250000 1.316707 0.802437 1.025529 0.745566 0.609349 0.397928 0.569638 0.327023 0.597096 0.458129 0.738143 0.617393 0.664148 0.493925 0.649589 0.523095 0.401034 0.500436 0.460367 0.132834 0.187095 0.531017 0.147503 0.438953 0.350802 0.241082 0.090491 0.077989 -0.043058 0.000000
max 172792.000000 2.451888 22.057729 9.382558 16.715537 34.099309 23.917837 44.054461 20.007208 15.594995 23.745136 12.018913 7.848392 4.569009 10.526766 5.825654 7.059132 9.207059 5.041069 5.572113 39.420904 27.202839 10.503090 22.528412 4.022866 7.519589 3.463246 12.152401 22.620072 78.235272 1.000000
pd.concat([X_train, y_train], axis=1)["Class"].value_counts()
Class
0    199019
1       345
Name: count, dtype: int64

Statistical measures of the test set (whole dataset)

pd.concat([X_test, y_test], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 85443 entries, 183484 to 240913
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Time    85443 non-null  float64
 1   V1      85443 non-null  float64
 2   V2      85443 non-null  float64
 3   V3      85443 non-null  float64
 4   V4      85443 non-null  float64
 5   V5      85443 non-null  float64
 6   V6      85443 non-null  float64
 7   V7      85443 non-null  float64
 8   V8      85443 non-null  float64
 9   V9      85443 non-null  float64
 10  V10     85443 non-null  float64
 11  V11     85443 non-null  float64
 12  V12     85443 non-null  float64
 13  V13     85443 non-null  float64
 14  V14     85443 non-null  float64
 15  V15     85443 non-null  float64
 16  V16     85443 non-null  float64
 17  V17     85443 non-null  float64
 18  V18     85443 non-null  float64
 19  V19     85443 non-null  float64
 20  V20     85443 non-null  float64
 21  V21     85443 non-null  float64
 22  V22     85443 non-null  float64
 23  V23     85443 non-null  float64
 24  V24     85443 non-null  float64
 25  V25     85443 non-null  float64
 26  V26     85443 non-null  float64
 27  V27     85443 non-null  float64
 28  V28     85443 non-null  float64
 29  Amount  85443 non-null  float64
 30  Class   85443 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 20.9 MB
pd.concat([X_test, y_test], axis=1).describe()
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000 85443.000000
mean 94847.378896 -0.000734 0.006277 0.003574 -0.001682 0.003486 0.000489 0.002030 0.004620 -0.000495 -0.003167 0.002424 0.003652 -0.001616 -0.000319 -0.000751 -0.000195 -0.000682 0.000312 -0.001144 -0.001004 0.000033 0.000052 0.000602 -0.000845 -0.000922 0.000220 0.000062 -0.000036 -0.002966 0.001720
std 47461.120548 1.947325 1.637050 1.515182 1.412908 1.406722 1.340636 1.262562 1.151291 1.090691 1.079574 1.022315 1.005413 0.989553 0.962457 0.913388 0.876542 0.842669 0.839626 0.812957 0.772484 0.713266 0.721198 0.613394 0.606464 0.521520 0.483126 0.409616 0.341987 1.036492 0.041443
min 0.000000 -56.407510 -72.715728 -48.325589 -5.683171 -113.743307 -26.160506 -28.215112 -50.943369 -9.481456 -20.949192 -4.568390 -18.683715 -3.888606 -18.493773 -4.391307 -13.303888 -22.883999 -9.287832 -6.938297 -54.497720 -22.665685 -9.499423 -32.828995 -2.836627 -8.696627 -2.604551 -9.793568 -15.430084 -0.353229 0.000000
25% 54354.000000 -0.916858 -0.591858 -0.883828 -0.848202 -0.688280 -0.766664 -0.553479 -0.207216 -0.638926 -0.535400 -0.761716 -0.400087 -0.648761 -0.426516 -0.581015 -0.468312 -0.483139 -0.498660 -0.455027 -0.211881 -0.226184 -0.537704 -0.161490 -0.355671 -0.319736 -0.326068 -0.070797 -0.053129 -0.331280 0.000000
50% 84850.000000 0.013238 0.070185 0.185047 -0.024109 -0.051627 -0.273686 0.042343 0.023782 -0.053821 -0.094949 -0.029129 0.144948 -0.013803 0.049248 0.045291 0.062957 -0.066955 -0.004245 0.002229 -0.061529 -0.030687 0.006971 -0.011789 0.040976 0.013508 -0.051695 0.001984 0.011561 -0.265271 0.000000
75% 139277.500000 1.313257 0.806615 1.031155 0.737784 0.618067 0.399864 0.572423 0.328337 0.597388 0.443126 0.743511 0.620694 0.657826 0.491916 0.647117 0.523608 0.396799 0.501455 0.455249 0.133608 0.184846 0.523689 0.147923 0.441093 0.350617 0.240657 0.092224 0.078900 -0.047356 0.000000
max 172788.000000 2.454930 15.876923 4.079168 16.875344 34.801666 73.301626 120.589494 18.748872 9.272376 15.331742 11.669205 4.406338 7.126883 7.439566 8.877742 17.315112 9.253526 4.712398 5.591971 38.117209 22.579714 7.220158 20.803344 4.584549 5.826159 3.517346 31.612198 33.847808 102.362243 1.000000
pd.concat([X_test, y_test], axis=1)["Class"].value_counts()
Class
0    85296
1      147
Name: count, dtype: int64
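
As a quick sanity check, the class counts of the two splits can be compared to confirm that the split sizes and fraud rates are consistent with the full dataset. The counts below are copied from the `value_counts()` outputs above; the variable names are illustrative.

```python
# Counts copied from the value_counts() outputs of the two splits above
train_counts = {0: 199019, 1: 345}
test_counts = {0: 85296, 1: 147}

train_total = sum(train_counts.values())
test_total = sum(test_counts.values())

# Fraction of all rows that went to the test set (test_size=0.3 was requested)
test_share = test_total / (train_total + test_total)

# Fraud rate within each split
train_fraud_rate = train_counts[1] / train_total
test_fraud_rate = test_counts[1] / test_total

print(f"{test_share:.4f}")
print(f"{train_fraud_rate:.6f} {test_fraud_rate:.6f}")
```

In this run the two fraud rates differ only slightly even without stratification, but that outcome depends on the random split.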

Splitting the undersampled data into training and test sets

X_train_undersample, X_test_undersample, y_train_undersample, y_test_undersample = (
    train_test_split(X_undersample, y_undersample, test_size=0.3, random_state=0)
)

Statistical measures of the training set (undersampled data)

pd.concat([X_train_undersample, y_train_undersample], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 688 entries, 6870 to 208266
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Time    688 non-null    float64
 1   V1      688 non-null    float64
 2   V2      688 non-null    float64
 3   V3      688 non-null    float64
 4   V4      688 non-null    float64
 5   V5      688 non-null    float64
 6   V6      688 non-null    float64
 7   V7      688 non-null    float64
 8   V8      688 non-null    float64
 9   V9      688 non-null    float64
 10  V10     688 non-null    float64
 11  V11     688 non-null    float64
 12  V12     688 non-null    float64
 13  V13     688 non-null    float64
 14  V14     688 non-null    float64
 15  V15     688 non-null    float64
 16  V16     688 non-null    float64
 17  V17     688 non-null    float64
 18  V18     688 non-null    float64
 19  V19     688 non-null    float64
 20  V20     688 non-null    float64
 21  V21     688 non-null    float64
 22  V22     688 non-null    float64
 23  V23     688 non-null    float64
 24  V24     688 non-null    float64
 25  V25     688 non-null    float64
 26  V26     688 non-null    float64
 27  V27     688 non-null    float64
 28  V28     688 non-null    float64
 29  Amount  688 non-null    float64
 30  Class   688 non-null    int64  
dtypes: float64(30), int64(1)
memory usage: 172.0 KB
pd.concat([X_train_undersample, y_train_undersample], axis=1).describe()
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000 688.000000
mean 88546.635174 -2.443642 1.748210 -3.490693 2.161294 -1.466909 -0.737723 -2.759190 0.361773 -1.222417 -2.808144 1.937783 -3.131850 -0.001132 -3.568854 -0.022936 -2.145811 -3.365430 -1.137238 0.377690 0.127157 0.446495 0.012945 -0.069031 -0.020203 0.031782 0.022154 0.114684 0.041557 0.036592 0.501453
std 48529.661753 5.382638 3.616426 6.020391 3.198221 4.227553 1.829535 5.498995 4.741154 2.336555 4.417548 2.771137 4.560753 1.081826 4.641960 0.981683 3.458663 6.062216 2.462689 1.287256 1.072960 2.749354 1.143940 1.283882 0.549485 0.689015 0.474411 0.923161 0.487077 0.834360 0.500362
min 117.000000 -30.552380 -15.799625 -31.103685 -3.863126 -22.105532 -10.261990 -37.060311 -37.353443 -11.126624 -23.228255 -2.613374 -18.431131 -3.223045 -19.214325 -4.498945 -13.563273 -25.162799 -9.498746 -3.602657 -7.242879 -16.922016 -8.887017 -19.254328 -2.028024 -4.781606 -1.214960 -7.263482 -2.735623 -0.353229 0.000000
25% 45531.000000 -2.867222 -0.164478 -5.049001 -0.212543 -1.703845 -1.691031 -3.105154 -0.220868 -2.205996 -4.731895 -0.194163 -5.643631 -0.767631 -6.767749 -0.562582 -3.612856 -5.277726 -1.816368 -0.373523 -0.197730 -0.142520 -0.510247 -0.246005 -0.373302 -0.320463 -0.281449 -0.061809 -0.050983 -0.346113 0.000000
50% 82526.500000 -0.874057 0.984845 -1.482880 1.285768 -0.400360 -0.741307 -0.740952 0.141389 -0.694910 -0.981569 1.154879 -0.845463 0.008049 -1.132761 0.001558 -0.750918 -0.495063 -0.392743 0.246478 0.030556 0.163323 0.076684 -0.027143 0.014360 0.046511 -0.026232 0.059798 0.036635 -0.273188 1.000000
75% 135096.750000 0.945582 2.850947 0.348579 4.166857 0.599892 0.033569 0.240843 0.919999 0.196633 -0.001047 3.625262 0.163104 0.744021 0.086669 0.665736 0.219809 0.314206 0.371481 0.978754 0.443495 0.680597 0.629109 0.174862 0.382076 0.406056 0.306403 0.482488 0.235549 0.046539 1.000000
max 172573.000000 2.335833 19.167239 3.228978 11.927512 14.103918 6.355986 5.802537 20.007208 6.816732 11.732926 12.018913 2.534876 3.091328 3.442422 2.364199 3.139656 6.739384 3.790316 5.228342 7.907378 27.202839 5.774087 5.303607 1.208141 2.208209 2.745261 3.052358 4.975792 8.146182 1.000000
pd.concat([X_train_undersample, y_train_undersample], axis=1)["Class"].value_counts()
Class
1    345
0    343
Name: count, dtype: int64

Statistical measures of the test set (undersampled data)

pd.concat([X_test_undersample, y_test_undersample], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 296 entries, 102782 to 57921
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Time    296 non-null    float64
 1   V1      296 non-null    float64
 2   V2      296 non-null    float64
 3   V3      296 non-null    float64
 4   V4      296 non-null    float64
 5   V5      296 non-null    float64
 6   V6      296 non-null    float64
 7   V7      296 non-null    float64
 8   V8      296 non-null    float64
 9   V9      296 non-null    float64
 10  V10     296 non-null    float64
 11  V11     296 non-null    float64
 12  V12     296 non-null    float64
 13  V13     296 non-null    float64
 14  V14     296 non-null    float64
 15  V15     296 non-null    float64
 16  V16     296 non-null    float64
 17  V17     296 non-null    float64
 18  V18     296 non-null    float64
 19  V19     296 non-null    float64
 20  V20     296 non-null    float64
 21  V21     296 non-null    float64
 22  V22     296 non-null    float64
 23  V23     296 non-null    float64
 24  V24     296 non-null    float64
 25  V25     296 non-null    float64
 26  V26     296 non-null    float64
 27  V27     296 non-null    float64
 28  V28     296 non-null    float64
 29  Amount  296 non-null    float64
 30  Class   296 non-null    int64  
dtypes: float64(30), int64(1)
memory usage: 74.0 KB
pd.concat([X_test_undersample, y_test_undersample], axis=1).describe()
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000 296.000000
mean 88396.587838 -2.448419 1.857288 -3.552900 2.336519 -1.503755 -0.656035 -2.853058 0.086851 -1.324446 -2.919028 1.914227 -3.106154 -0.084562 -3.347887 -0.077981 -1.984526 -3.161909 -1.109686 0.264590 0.289212 0.065582 0.134902 0.056521 -0.077336 0.001963 0.040364 0.020281 0.058781 0.046845 0.496622
std 50147.105326 5.812072 3.934323 6.680660 3.308417 4.389263 1.693893 6.622008 5.121293 2.451914 4.891517 2.754439 4.681722 0.986937 4.683458 1.051296 3.484989 5.826410 2.293910 1.298310 1.235841 2.862463 1.216935 0.877975 0.555090 0.650752 0.481822 1.224166 0.460841 0.892432 0.500835
min 60.000000 -29.876366 -8.402154 -30.558697 -2.956827 -21.665654 -5.773192 -43.557242 -41.044261 -13.434066 -24.588262 -2.383066 -18.683715 -3.076318 -17.620634 -3.092108 -14.129855 -22.541652 -9.090892 -3.681904 -5.225849 -22.797604 -8.887017 -5.988806 -1.742803 -2.079928 -1.170476 -7.263482 -1.931920 -0.353229 0.000000
25% 45977.500000 -2.867766 -0.130600 -5.417818 -0.118496 -1.667035 -1.477544 -2.835885 -0.168935 -2.345829 -4.445615 -0.144802 -5.340188 -0.815218 -6.363108 -0.729637 -3.303237 -5.358990 -1.747789 -0.563676 -0.165023 -0.178103 -0.483530 -0.212828 -0.405811 -0.324214 -0.270853 -0.056831 -0.042639 -0.349231 0.000000
50% 84069.000000 -0.740915 0.941852 -1.139964 1.340723 -0.369227 -0.596589 -0.501864 0.169642 -0.696902 -0.875521 1.267304 -0.938658 -0.060414 -1.059352 -0.012904 -0.547678 -0.527389 -0.318904 0.169827 0.056998 0.130060 0.081904 -0.035614 -0.010232 0.068890 0.031911 0.073702 0.046030 -0.300834 0.000000
75% 135023.500000 0.879511 2.700371 0.394765 4.305361 0.624459 0.139244 0.306788 0.833392 0.011527 -0.051012 3.542336 0.234752 0.609629 0.173916 0.685300 0.351119 0.309636 0.237358 0.948371 0.461180 0.568611 0.617588 0.200328 0.317653 0.386804 0.355382 0.395412 0.192766 0.028048 1.000000
max 172733.000000 2.306769 22.057729 3.476268 12.114672 9.880564 6.474115 3.791907 19.587773 4.866316 6.367661 11.152491 1.725185 2.897044 2.654275 2.471358 2.696475 6.443649 2.591846 4.851255 11.059004 27.202839 8.361985 5.466230 1.077407 2.156042 1.458828 2.706566 3.042406 5.663610 1.000000
pd.concat([X_test_undersample, y_test_undersample], axis=1)["Class"].value_counts()
Class
0    149
1    147
Name: count, dtype: int64