%pip install kaggle
%pip install pandas
%pip install numpy
%pip install scikit-learn
Requirement already satisfied: kaggle in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (1.6.6)
Requirement already satisfied: six>=1.10 in c:\users\skype\appdata\roaming\python\python312\site-packages (from kaggle) (1.16.0)
Requirement already satisfied: certifi in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (2024.2.2)
Requirement already satisfied: python-dateutil in c:\users\skype\appdata\roaming\python\python312\site-packages (from kaggle) (2.9.0.post0)
Requirement already satisfied: requests in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (2.31.0)
Requirement already satisfied: tqdm in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (4.66.2)
Requirement already satisfied: python-slugify in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (8.0.4)
Requirement already satisfied: urllib3 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (2.2.1)
Requirement already satisfied: bleach in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from kaggle) (6.1.0)
Requirement already satisfied: webencodings in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from bleach->kaggle) (0.5.1)
Requirement already satisfied: text-unidecode>=1.3 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from requests->kaggle) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from requests->kaggle) (3.6)
Requirement already satisfied: colorama in c:\users\skype\appdata\roaming\python\python312\site-packages (from tqdm->kaggle) (0.4.6)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: pandas in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (2.2.1)
Requirement already satisfied: numpy<2,>=1.26.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from pandas) (1.26.3)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\skype\appdata\roaming\python\python312\site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: six>=1.5 in c:\users\skype\appdata\roaming\python\python312\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: numpy in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (1.26.3)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: scikit-learn in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (1.4.1.post1)
Requirement already satisfied: numpy<2.0,>=1.19.5 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (1.26.3)
Requirement already satisfied: scipy>=1.6.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (1.12.0)
Requirement already satisfied: joblib>=1.2.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\skype\appdata\local\programs\python\python312\lib\site-packages (from scikit-learn) (3.3.0)
Note: you may need to restart the kernel to use updated packages.
import pandas as pd
import numpy as np
# To preprocess the data
from sklearn.preprocessing import StandardScaler
# To split the data
from sklearn.model_selection import train_test_split
!kaggle datasets download -d mlg-ulb/creditcardfraud
creditcardfraud.zip: Skipping, found more recently modified local copy (use --force to force download)
!unzip -o creditcardfraud.zip
Archive:  creditcardfraud.zip
  inflating: creditcard.csv
df = pd.read_csv('creditcard.csv')
pd.set_option('display.max_columns', None)
df.isnull().sum()
Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     284807 non-null  float64
 22  V22     284807 non-null  float64
 23  V23     284807 non-null  float64
 24  V24     284807 non-null  float64
 25  V25     284807 non-null  float64
 26  V26     284807 non-null  float64
 27  V27     284807 non-null  float64
 28  V28     284807 non-null  float64
 29  Amount  284807 non-null  float64
 30  Class   284807 non-null  int64
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
scaler = StandardScaler()
df['Amount'] = scaler.fit_transform(df['Amount'].values.reshape(-1, 1))
df.describe()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 284807.000000 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 2.848070e+05 | 284807.000000 |
mean | 94813.859575 | 1.168375e-15 | 3.416908e-16 | -1.379537e-15 | 2.074095e-15 | 9.604066e-16 | 1.487313e-15 | -5.556467e-16 | 1.213481e-16 | -2.406331e-15 | 2.239053e-15 | 1.673327e-15 | -1.247012e-15 | 8.190001e-16 | 1.207294e-15 | 4.887456e-15 | 1.437716e-15 | -3.772171e-16 | 9.564149e-16 | 1.039917e-15 | 6.406204e-16 | 1.654067e-16 | -3.568593e-16 | 2.578648e-16 | 4.473266e-15 | 5.340915e-16 | 1.683437e-15 | -3.660091e-16 | -1.227390e-16 | 2.913952e-17 | 0.001727 |
std | 47488.145955 | 1.958696e+00 | 1.651309e+00 | 1.516255e+00 | 1.415869e+00 | 1.380247e+00 | 1.332271e+00 | 1.237094e+00 | 1.194353e+00 | 1.098632e+00 | 1.088850e+00 | 1.020713e+00 | 9.992014e-01 | 9.952742e-01 | 9.585956e-01 | 9.153160e-01 | 8.762529e-01 | 8.493371e-01 | 8.381762e-01 | 8.140405e-01 | 7.709250e-01 | 7.345240e-01 | 7.257016e-01 | 6.244603e-01 | 6.056471e-01 | 5.212781e-01 | 4.822270e-01 | 4.036325e-01 | 3.300833e-01 | 1.000002e+00 | 0.041527 |
min | 0.000000 | -5.640751e+01 | -7.271573e+01 | -4.832559e+01 | -5.683171e+00 | -1.137433e+02 | -2.616051e+01 | -4.355724e+01 | -7.321672e+01 | -1.343407e+01 | -2.458826e+01 | -4.797473e+00 | -1.868371e+01 | -5.791881e+00 | -1.921433e+01 | -4.498945e+00 | -1.412985e+01 | -2.516280e+01 | -9.498746e+00 | -7.213527e+00 | -5.449772e+01 | -3.483038e+01 | -1.093314e+01 | -4.480774e+01 | -2.836627e+00 | -1.029540e+01 | -2.604551e+00 | -2.256568e+01 | -1.543008e+01 | -3.532294e-01 | 0.000000 |
25% | 54201.500000 | -9.203734e-01 | -5.985499e-01 | -8.903648e-01 | -8.486401e-01 | -6.915971e-01 | -7.682956e-01 | -5.540759e-01 | -2.086297e-01 | -6.430976e-01 | -5.354257e-01 | -7.624942e-01 | -4.055715e-01 | -6.485393e-01 | -4.255740e-01 | -5.828843e-01 | -4.680368e-01 | -4.837483e-01 | -4.988498e-01 | -4.562989e-01 | -2.117214e-01 | -2.283949e-01 | -5.423504e-01 | -1.618463e-01 | -3.545861e-01 | -3.171451e-01 | -3.269839e-01 | -7.083953e-02 | -5.295979e-02 | -3.308401e-01 | 0.000000 |
50% | 84692.000000 | 1.810880e-02 | 6.548556e-02 | 1.798463e-01 | -1.984653e-02 | -5.433583e-02 | -2.741871e-01 | 4.010308e-02 | 2.235804e-02 | -5.142873e-02 | -9.291738e-02 | -3.275735e-02 | 1.400326e-01 | -1.356806e-02 | 5.060132e-02 | 4.807155e-02 | 6.641332e-02 | -6.567575e-02 | -3.636312e-03 | 3.734823e-03 | -6.248109e-02 | -2.945017e-02 | 6.781943e-03 | -1.119293e-02 | 4.097606e-02 | 1.659350e-02 | -5.213911e-02 | 1.342146e-03 | 1.124383e-02 | -2.652715e-01 | 0.000000 |
75% | 139320.500000 | 1.315642e+00 | 8.037239e-01 | 1.027196e+00 | 7.433413e-01 | 6.119264e-01 | 3.985649e-01 | 5.704361e-01 | 3.273459e-01 | 5.971390e-01 | 4.539234e-01 | 7.395934e-01 | 6.182380e-01 | 6.625050e-01 | 4.931498e-01 | 6.488208e-01 | 5.232963e-01 | 3.996750e-01 | 5.008067e-01 | 4.589494e-01 | 1.330408e-01 | 1.863772e-01 | 5.285536e-01 | 1.476421e-01 | 4.395266e-01 | 3.507156e-01 | 2.409522e-01 | 9.104512e-02 | 7.827995e-02 | -4.471707e-02 | 0.000000 |
max | 172792.000000 | 2.454930e+00 | 2.205773e+01 | 9.382558e+00 | 1.687534e+01 | 3.480167e+01 | 7.330163e+01 | 1.205895e+02 | 2.000721e+01 | 1.559499e+01 | 2.374514e+01 | 1.201891e+01 | 7.848392e+00 | 7.126883e+00 | 1.052677e+01 | 8.877742e+00 | 1.731511e+01 | 9.253526e+00 | 5.041069e+00 | 5.591971e+00 | 3.942090e+01 | 2.720284e+01 | 1.050309e+01 | 2.252841e+01 | 4.584549e+00 | 7.519589e+00 | 3.517346e+00 | 3.161220e+01 | 3.384781e+01 | 1.023622e+02 | 1.000000 |
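In the scaling step above, `fit_transform` sees the whole `Amount` column before any train/test split, so the rows that later land in the test set contribute to the mean and standard deviation. A common alternative is to fit the scaler on the training rows only and reuse it on the test rows. The block below is a minimal sketch of that idea, not the notebook's method: the synthetic `demo` frame and the names `amount_train`, `amount_test`, and `amount_scaler` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 'Amount' column; the real values live in creditcard.csv.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Amount": rng.exponential(scale=88.0, size=1000)})

# Split first, so the test rows stay unseen while the scaler is fitted.
amount_train, amount_test = train_test_split(demo, test_size=0.3, random_state=0)

amount_scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only.
train_scaled = amount_scaler.fit_transform(amount_train[["Amount"]])
# Reuse the training statistics on the test rows; do not refit here.
test_scaled = amount_scaler.transform(amount_test[["Amount"]])

print(train_scaled.mean(), train_scaled.std())
```

With this ordering the training column has mean 0 and unit variance by construction, while the test column is only approximately standardized, which is the expected behaviour.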
df['Class'].value_counts()
Class
0    284315
1       492
Name: count, dtype: int64
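As a quick worked check of how imbalanced these counts are, the fraud share can be computed directly from the two numbers printed above. The variable names `n_normal` and `n_fraud` are illustrative only.

```python
# Class counts copied from the value_counts() output above.
n_normal = 284315
n_fraud = 492

total = n_normal + n_fraud
fraud_fraction = n_fraud / total

print(f"total rows: {total}")
print(f"fraud fraction: {fraud_fraction:.6f}")
print(f"normal transactions per fraud: {n_normal / n_fraud:.1f}")
```

The fraction agrees with the `mean` of the `Class` column reported by `df.describe()` (0.001727), since the mean of a 0/1 column is the share of ones.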
# Determine the number of instances in the minority class
fraud_count = len(df[df.Class == 1])
fraud_indices = np.array(df[df.Class == 1].index)
# Select indices corresponding to majority class instances
normal_indices = df[df.Class == 0].index
# Randomly sample the same number of instances from the majority class
# np.random.choice already returns a NumPy array, so no extra conversion is needed
random_normal_indices = np.random.choice(normal_indices, fraud_count, replace=False)
# Combine indices of both classes
undersample_indices = np.concatenate([fraud_indices, random_normal_indices])
# Undersample dataset; the collected values are index labels, so select with .loc
undersample_data = df.loc[undersample_indices, :]
# Features are every column except the label; keep the label as a one-column DataFrame
X_undersample = undersample_data.drop(columns='Class')
y_undersample = undersample_data[['Class']]
undersample_data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 984 entries, 541 to 141412
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Time    984 non-null    float64
 1   V1      984 non-null    float64
 2   V2      984 non-null    float64
 3   V3      984 non-null    float64
 4   V4      984 non-null    float64
 5   V5      984 non-null    float64
 6   V6      984 non-null    float64
 7   V7      984 non-null    float64
 8   V8      984 non-null    float64
 9   V9      984 non-null    float64
 10  V10     984 non-null    float64
 11  V11     984 non-null    float64
 12  V12     984 non-null    float64
 13  V13     984 non-null    float64
 14  V14     984 non-null    float64
 15  V15     984 non-null    float64
 16  V16     984 non-null    float64
 17  V17     984 non-null    float64
 18  V18     984 non-null    float64
 19  V19     984 non-null    float64
 20  V20     984 non-null    float64
 21  V21     984 non-null    float64
 22  V22     984 non-null    float64
 23  V23     984 non-null    float64
 24  V24     984 non-null    float64
 25  V25     984 non-null    float64
 26  V26     984 non-null    float64
 27  V27     984 non-null    float64
 28  V28     984 non-null    float64
 29  Amount  984 non-null    float64
 30  Class   984 non-null    int64
dtypes: float64(30), int64(1)
memory usage: 246.0 KB
undersample_data.describe()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 | 984.000000 |
mean | 88501.498984 | -2.445079 | 1.781022 | -3.509406 | 2.214004 | -1.477993 | -0.713150 | -2.787427 | 0.279073 | -1.253108 | -2.841500 | 1.930697 | -3.124120 | -0.026229 | -3.502384 | -0.039494 | -2.097294 | -3.304208 | -1.128950 | 0.343668 | 0.175905 | 0.331911 | 0.049631 | -0.031264 | -0.037389 | 0.022812 | 0.027632 | 0.086286 | 0.046738 | 0.039676 | 0.500000 |
std | 48996.269445 | 5.512352 | 3.713232 | 6.223001 | 3.231076 | 4.274632 | 1.789350 | 5.856197 | 4.857643 | 2.371055 | 4.563067 | 2.764745 | 4.595103 | 1.054377 | 4.653202 | 1.002911 | 3.465619 | 5.990033 | 2.412032 | 1.290973 | 1.126258 | 2.787884 | 1.167097 | 1.177562 | 0.551518 | 0.677541 | 0.476480 | 1.023332 | 0.479168 | 0.851800 | 0.500254 |
min | 60.000000 | -30.552380 | -15.799625 | -31.103685 | -3.863126 | -22.105532 | -10.261990 | -43.557242 | -41.044261 | -13.434066 | -24.588262 | -2.613374 | -18.683715 | -3.223045 | -19.214325 | -4.498945 | -14.129855 | -25.162799 | -9.498746 | -3.681904 | -7.242879 | -22.797604 | -8.887017 | -19.254328 | -2.028024 | -4.781606 | -1.214960 | -7.263482 | -2.735623 | -0.353229 | 0.000000 |
25% | 45531.000000 | -2.867222 | -0.155438 | -5.084967 | -0.172018 | -1.700260 | -1.619179 | -3.066415 | -0.204192 | -2.279453 | -4.572043 | -0.187147 | -5.495221 | -0.784589 | -6.721799 | -0.627097 | -3.543426 | -5.302111 | -1.809496 | -0.412430 | -0.187708 | -0.157259 | -0.509376 | -0.240064 | -0.379825 | -0.321251 | -0.281187 | -0.061809 | -0.050194 | -0.347302 | 0.000000 |
50% | 83076.500000 | -0.823244 | 0.957399 | -1.381998 | 1.287041 | -0.394605 | -0.689473 | -0.668321 | 0.147397 | -0.694910 | -0.948441 | 1.170286 | -0.858094 | -0.000686 | -1.110717 | -0.006070 | -0.677801 | -0.513640 | -0.383038 | 0.221049 | 0.040630 | 0.155404 | 0.080270 | -0.030318 | 0.009379 | 0.049923 | -0.007475 | 0.063100 | 0.039464 | -0.280984 | 0.500000 |
75% | 135051.500000 | 0.919444 | 2.791569 | 0.356911 | 4.175332 | 0.616305 | 0.069620 | 0.265089 | 0.877002 | 0.134399 | -0.016047 | 3.586502 | 0.190356 | 0.683977 | 0.110541 | 0.672903 | 0.250353 | 0.313841 | 0.334927 | 0.978754 | 0.445616 | 0.642724 | 0.624948 | 0.180735 | 0.365624 | 0.395001 | 0.324059 | 0.457194 | 0.226492 | 0.046539 | 1.000000 |
max | 172733.000000 | 2.335833 | 22.057729 | 3.476268 | 12.114672 | 14.103918 | 6.474115 | 5.802537 | 20.007208 | 6.816732 | 11.732926 | 12.018913 | 2.534876 | 3.091328 | 3.442422 | 2.471358 | 3.139656 | 6.739384 | 3.790316 | 5.228342 | 11.059004 | 27.202839 | 8.361985 | 5.466230 | 1.208141 | 2.208209 | 2.745261 | 3.052358 | 4.975792 | 8.146182 | 1.000000 |
undersample_data['Class'].value_counts()
Class
1    492
0    492
Name: count, dtype: int64
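The undersampling above draws the normal rows with an unseeded random choice, so each run selects a different subset and the statistics shown here will not reproduce exactly. One possible way to pin the draw is `DataFrame.sample` with `random_state`. The sketch below runs on a small hypothetical frame (`toy`, 1,000 rows with 20 marked as fraud) rather than on `creditcard.csv`; the names `fraud_rows`, `normal_rows`, and `balanced` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df: 1,000 rows, the first 20 labelled as fraud.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"V1": rng.normal(size=1000), "Class": 0})
toy.loc[toy.index[:20], "Class"] = 1

fraud_rows = toy[toy["Class"] == 1]
# random_state fixes the draw, so the same normal rows are picked on every run.
normal_rows = toy[toy["Class"] == 0].sample(n=len(fraud_rows), random_state=0)

# Stack both classes into one balanced frame; original index labels are preserved.
balanced = pd.concat([fraud_rows, normal_rows])

print(balanced["Class"].value_counts())
```

Sampling without replacement is the default for `sample`, which matches the `replace=False` used above.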
# Features are every column except the label; keep the label as a one-column DataFrame
X = df.drop(columns='Class')
y = df[['Class']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
pd.concat([X_train, y_train], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 199364 entries, 161145 to 117952
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   Time    199364 non-null  float64
 1   V1      199364 non-null  float64
 2   V2      199364 non-null  float64
 3   V3      199364 non-null  float64
 4   V4      199364 non-null  float64
 5   V5      199364 non-null  float64
 6   V6      199364 non-null  float64
 7   V7      199364 non-null  float64
 8   V8      199364 non-null  float64
 9   V9      199364 non-null  float64
 10  V10     199364 non-null  float64
 11  V11     199364 non-null  float64
 12  V12     199364 non-null  float64
 13  V13     199364 non-null  float64
 14  V14     199364 non-null  float64
 15  V15     199364 non-null  float64
 16  V16     199364 non-null  float64
 17  V17     199364 non-null  float64
 18  V18     199364 non-null  float64
 19  V19     199364 non-null  float64
 20  V20     199364 non-null  float64
 21  V21     199364 non-null  float64
 22  V22     199364 non-null  float64
 23  V23     199364 non-null  float64
 24  V24     199364 non-null  float64
 25  V25     199364 non-null  float64
 26  V26     199364 non-null  float64
 27  V27     199364 non-null  float64
 28  V28     199364 non-null  float64
 29  Amount  199364 non-null  float64
 30  Class   199364 non-null  int64
dtypes: float64(30), int64(1)
memory usage: 48.7 MB
pd.concat([X_train, y_train], axis=1).describe()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 | 199364.000000 |
mean | 94799.493936 | 0.000315 | -0.002690 | -0.001532 | 0.000721 | -0.001494 | -0.000210 | -0.000870 | -0.001980 | 0.000212 | 0.001357 | -0.001039 | -0.001565 | 0.000693 | 0.000137 | 0.000322 | 0.000084 | 0.000292 | -0.000134 | 0.000490 | 0.000430 | -0.000014 | -0.000022 | -0.000258 | 0.000362 | 0.000395 | -0.000094 | -0.000027 | 0.000015 | 0.001271 | 0.001731 |
std | 47499.835491 | 1.963554 | 1.657379 | 1.516716 | 1.417138 | 1.368744 | 1.328673 | 1.226018 | 1.212338 | 1.102021 | 1.092801 | 1.020027 | 0.996526 | 0.997718 | 0.956938 | 0.916143 | 0.876131 | 0.852181 | 0.837556 | 0.814506 | 0.770257 | 0.743450 | 0.727625 | 0.629145 | 0.605298 | 0.521175 | 0.481842 | 0.401042 | 0.324849 | 0.983948 | 0.041563 |
min | 0.000000 | -46.855047 | -63.344698 | -33.680984 | -5.560118 | -42.147898 | -23.496714 | -43.557242 | -73.216718 | -13.434066 | -24.588262 | -4.797473 | -17.769143 | -5.791881 | -19.214325 | -4.498945 | -14.129855 | -25.162799 | -9.498746 | -7.213527 | -23.646890 | -34.830382 | -10.933144 | -44.807735 | -2.822684 | -10.295397 | -2.534330 | -22.565679 | -11.710896 | -0.353229 | 0.000000 |
25% | 54126.000000 | -0.921539 | -0.601213 | -0.892838 | -0.848835 | -0.692874 | -0.769177 | -0.554220 | -0.209086 | -0.644753 | -0.535493 | -0.762852 | -0.407660 | -0.648456 | -0.425122 | -0.583616 | -0.467945 | -0.484055 | -0.498850 | -0.456800 | -0.211662 | -0.229272 | -0.544345 | -0.162021 | -0.354179 | -0.316088 | -0.327327 | -0.070864 | -0.052907 | -0.330640 | 0.000000 |
50% | 84633.500000 | 0.019705 | 0.063784 | 0.177888 | -0.017852 | -0.055832 | -0.274397 | 0.039228 | 0.021803 | -0.049633 | -0.092069 | -0.034135 | 0.137912 | -0.013416 | 0.051179 | 0.049289 | 0.067772 | -0.065113 | -0.003217 | 0.004422 | -0.062889 | -0.029045 | 0.006744 | -0.010915 | 0.040974 | 0.018014 | -0.052287 | 0.001064 | 0.011119 | -0.265271 | 0.000000 |
75% | 139334.250000 | 1.316707 | 0.802437 | 1.025529 | 0.745566 | 0.609349 | 0.397928 | 0.569638 | 0.327023 | 0.597096 | 0.458129 | 0.738143 | 0.617393 | 0.664148 | 0.493925 | 0.649589 | 0.523095 | 0.401034 | 0.500436 | 0.460367 | 0.132834 | 0.187095 | 0.531017 | 0.147503 | 0.438953 | 0.350802 | 0.241082 | 0.090491 | 0.077989 | -0.043058 | 0.000000 |
max | 172792.000000 | 2.451888 | 22.057729 | 9.382558 | 16.715537 | 34.099309 | 23.917837 | 44.054461 | 20.007208 | 15.594995 | 23.745136 | 12.018913 | 7.848392 | 4.569009 | 10.526766 | 5.825654 | 7.059132 | 9.207059 | 5.041069 | 5.572113 | 39.420904 | 27.202839 | 10.503090 | 22.528412 | 4.022866 | 7.519589 | 3.463246 | 12.152401 | 22.620072 | 78.235272 | 1.000000 |
pd.concat([X_train, y_train], axis=1)['Class'].value_counts()
Class
0    199019
1       345
Name: count, dtype: int64
pd.concat([X_test, y_test], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 85443 entries, 183484 to 240913
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Time    85443 non-null  float64
 1   V1      85443 non-null  float64
 2   V2      85443 non-null  float64
 3   V3      85443 non-null  float64
 4   V4      85443 non-null  float64
 5   V5      85443 non-null  float64
 6   V6      85443 non-null  float64
 7   V7      85443 non-null  float64
 8   V8      85443 non-null  float64
 9   V9      85443 non-null  float64
 10  V10     85443 non-null  float64
 11  V11     85443 non-null  float64
 12  V12     85443 non-null  float64
 13  V13     85443 non-null  float64
 14  V14     85443 non-null  float64
 15  V15     85443 non-null  float64
 16  V16     85443 non-null  float64
 17  V17     85443 non-null  float64
 18  V18     85443 non-null  float64
 19  V19     85443 non-null  float64
 20  V20     85443 non-null  float64
 21  V21     85443 non-null  float64
 22  V22     85443 non-null  float64
 23  V23     85443 non-null  float64
 24  V24     85443 non-null  float64
 25  V25     85443 non-null  float64
 26  V26     85443 non-null  float64
 27  V27     85443 non-null  float64
 28  V28     85443 non-null  float64
 29  Amount  85443 non-null  float64
 30  Class   85443 non-null  int64
dtypes: float64(30), int64(1)
memory usage: 20.9 MB
pd.concat([X_test, y_test], axis=1).describe()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 | 85443.000000 |
mean | 94847.378896 | -0.000734 | 0.006277 | 0.003574 | -0.001682 | 0.003486 | 0.000489 | 0.002030 | 0.004620 | -0.000495 | -0.003167 | 0.002424 | 0.003652 | -0.001616 | -0.000319 | -0.000751 | -0.000195 | -0.000682 | 0.000312 | -0.001144 | -0.001004 | 0.000033 | 0.000052 | 0.000602 | -0.000845 | -0.000922 | 0.000220 | 0.000062 | -0.000036 | -0.002966 | 0.001720 |
std | 47461.120548 | 1.947325 | 1.637050 | 1.515182 | 1.412908 | 1.406722 | 1.340636 | 1.262562 | 1.151291 | 1.090691 | 1.079574 | 1.022315 | 1.005413 | 0.989553 | 0.962457 | 0.913388 | 0.876542 | 0.842669 | 0.839626 | 0.812957 | 0.772484 | 0.713266 | 0.721198 | 0.613394 | 0.606464 | 0.521520 | 0.483126 | 0.409616 | 0.341987 | 1.036492 | 0.041443 |
min | 0.000000 | -56.407510 | -72.715728 | -48.325589 | -5.683171 | -113.743307 | -26.160506 | -28.215112 | -50.943369 | -9.481456 | -20.949192 | -4.568390 | -18.683715 | -3.888606 | -18.493773 | -4.391307 | -13.303888 | -22.883999 | -9.287832 | -6.938297 | -54.497720 | -22.665685 | -9.499423 | -32.828995 | -2.836627 | -8.696627 | -2.604551 | -9.793568 | -15.430084 | -0.353229 | 0.000000 |
25% | 54354.000000 | -0.916858 | -0.591858 | -0.883828 | -0.848202 | -0.688280 | -0.766664 | -0.553479 | -0.207216 | -0.638926 | -0.535400 | -0.761716 | -0.400087 | -0.648761 | -0.426516 | -0.581015 | -0.468312 | -0.483139 | -0.498660 | -0.455027 | -0.211881 | -0.226184 | -0.537704 | -0.161490 | -0.355671 | -0.319736 | -0.326068 | -0.070797 | -0.053129 | -0.331280 | 0.000000 |
50% | 84850.000000 | 0.013238 | 0.070185 | 0.185047 | -0.024109 | -0.051627 | -0.273686 | 0.042343 | 0.023782 | -0.053821 | -0.094949 | -0.029129 | 0.144948 | -0.013803 | 0.049248 | 0.045291 | 0.062957 | -0.066955 | -0.004245 | 0.002229 | -0.061529 | -0.030687 | 0.006971 | -0.011789 | 0.040976 | 0.013508 | -0.051695 | 0.001984 | 0.011561 | -0.265271 | 0.000000 |
75% | 139277.500000 | 1.313257 | 0.806615 | 1.031155 | 0.737784 | 0.618067 | 0.399864 | 0.572423 | 0.328337 | 0.597388 | 0.443126 | 0.743511 | 0.620694 | 0.657826 | 0.491916 | 0.647117 | 0.523608 | 0.396799 | 0.501455 | 0.455249 | 0.133608 | 0.184846 | 0.523689 | 0.147923 | 0.441093 | 0.350617 | 0.240657 | 0.092224 | 0.078900 | -0.047356 | 0.000000 |
max | 172788.000000 | 2.454930 | 15.876923 | 4.079168 | 16.875344 | 34.801666 | 73.301626 | 120.589494 | 18.748872 | 9.272376 | 15.331742 | 11.669205 | 4.406338 | 7.126883 | 7.439566 | 8.877742 | 17.315112 | 9.253526 | 4.712398 | 5.591971 | 38.117209 | 22.579714 | 7.220158 | 20.803344 | 4.584549 | 5.826159 | 3.517346 | 31.612198 | 33.847808 | 102.362243 | 1.000000 |
pd.concat([X_test, y_test], axis=1)['Class'].value_counts()
Class
0    85296
1      147
Name: count, dtype: int64
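The split above is purely random, so the number of fraud rows in each partition depends on `random_state`; with so few positives, another seed could shift the balance noticeably. `train_test_split` accepts a `stratify` argument that keeps the class proportions equal across both partitions. The block below is a hedged sketch on hypothetical data (`features`, `labels`, 10,000 rows with 20 positives), not a change to the notebook's own split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 10,000 rows, of which the first 20 are labelled as fraud.
rng = np.random.default_rng(0)
features = pd.DataFrame({"V1": rng.normal(size=10_000)})
labels = pd.Series(np.zeros(10_000, dtype=int), name="Class")
labels.iloc[:20] = 1

# stratify=labels asks for the same class proportions in train and test.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    features, labels, test_size=0.3, random_state=0, stratify=labels
)

print("fraud in train:", int(ys_train.sum()))
print("fraud in test:", int(ys_test.sum()))
```

With stratification, the 20 positives are divided 14 to 6, mirroring the 70/30 split of the rows.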
X_train_undersample, X_test_undersample, y_train_undersample, y_test_undersample = train_test_split(X_undersample, y_undersample, test_size = 0.3, random_state = 0)
pd.concat([X_train_undersample, y_train_undersample], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 688 entries, 6870 to 208266
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Time    688 non-null    float64
 1   V1      688 non-null    float64
 2   V2      688 non-null    float64
 3   V3      688 non-null    float64
 4   V4      688 non-null    float64
 5   V5      688 non-null    float64
 6   V6      688 non-null    float64
 7   V7      688 non-null    float64
 8   V8      688 non-null    float64
 9   V9      688 non-null    float64
 10  V10     688 non-null    float64
 11  V11     688 non-null    float64
 12  V12     688 non-null    float64
 13  V13     688 non-null    float64
 14  V14     688 non-null    float64
 15  V15     688 non-null    float64
 16  V16     688 non-null    float64
 17  V17     688 non-null    float64
 18  V18     688 non-null    float64
 19  V19     688 non-null    float64
 20  V20     688 non-null    float64
 21  V21     688 non-null    float64
 22  V22     688 non-null    float64
 23  V23     688 non-null    float64
 24  V24     688 non-null    float64
 25  V25     688 non-null    float64
 26  V26     688 non-null    float64
 27  V27     688 non-null    float64
 28  V28     688 non-null    float64
 29  Amount  688 non-null    float64
 30  Class   688 non-null    int64
dtypes: float64(30), int64(1)
memory usage: 172.0 KB
pd.concat([X_train_undersample, y_train_undersample], axis=1).describe()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 | 688.000000 |
mean | 88546.635174 | -2.443642 | 1.748210 | -3.490693 | 2.161294 | -1.466909 | -0.737723 | -2.759190 | 0.361773 | -1.222417 | -2.808144 | 1.937783 | -3.131850 | -0.001132 | -3.568854 | -0.022936 | -2.145811 | -3.365430 | -1.137238 | 0.377690 | 0.127157 | 0.446495 | 0.012945 | -0.069031 | -0.020203 | 0.031782 | 0.022154 | 0.114684 | 0.041557 | 0.036592 | 0.501453 |
std | 48529.661753 | 5.382638 | 3.616426 | 6.020391 | 3.198221 | 4.227553 | 1.829535 | 5.498995 | 4.741154 | 2.336555 | 4.417548 | 2.771137 | 4.560753 | 1.081826 | 4.641960 | 0.981683 | 3.458663 | 6.062216 | 2.462689 | 1.287256 | 1.072960 | 2.749354 | 1.143940 | 1.283882 | 0.549485 | 0.689015 | 0.474411 | 0.923161 | 0.487077 | 0.834360 | 0.500362 |
min | 117.000000 | -30.552380 | -15.799625 | -31.103685 | -3.863126 | -22.105532 | -10.261990 | -37.060311 | -37.353443 | -11.126624 | -23.228255 | -2.613374 | -18.431131 | -3.223045 | -19.214325 | -4.498945 | -13.563273 | -25.162799 | -9.498746 | -3.602657 | -7.242879 | -16.922016 | -8.887017 | -19.254328 | -2.028024 | -4.781606 | -1.214960 | -7.263482 | -2.735623 | -0.353229 | 0.000000 |
25% | 45531.000000 | -2.867222 | -0.164478 | -5.049001 | -0.212543 | -1.703845 | -1.691031 | -3.105154 | -0.220868 | -2.205996 | -4.731895 | -0.194163 | -5.643631 | -0.767631 | -6.767749 | -0.562582 | -3.612856 | -5.277726 | -1.816368 | -0.373523 | -0.197730 | -0.142520 | -0.510247 | -0.246005 | -0.373302 | -0.320463 | -0.281449 | -0.061809 | -0.050983 | -0.346113 | 0.000000 |
50% | 82526.500000 | -0.874057 | 0.984845 | -1.482880 | 1.285768 | -0.400360 | -0.741307 | -0.740952 | 0.141389 | -0.694910 | -0.981569 | 1.154879 | -0.845463 | 0.008049 | -1.132761 | 0.001558 | -0.750918 | -0.495063 | -0.392743 | 0.246478 | 0.030556 | 0.163323 | 0.076684 | -0.027143 | 0.014360 | 0.046511 | -0.026232 | 0.059798 | 0.036635 | -0.273188 | 1.000000 |
75% | 135096.750000 | 0.945582 | 2.850947 | 0.348579 | 4.166857 | 0.599892 | 0.033569 | 0.240843 | 0.919999 | 0.196633 | -0.001047 | 3.625262 | 0.163104 | 0.744021 | 0.086669 | 0.665736 | 0.219809 | 0.314206 | 0.371481 | 0.978754 | 0.443495 | 0.680597 | 0.629109 | 0.174862 | 0.382076 | 0.406056 | 0.306403 | 0.482488 | 0.235549 | 0.046539 | 1.000000 |
max | 172573.000000 | 2.335833 | 19.167239 | 3.228978 | 11.927512 | 14.103918 | 6.355986 | 5.802537 | 20.007208 | 6.816732 | 11.732926 | 12.018913 | 2.534876 | 3.091328 | 3.442422 | 2.364199 | 3.139656 | 6.739384 | 3.790316 | 5.228342 | 7.907378 | 27.202839 | 5.774087 | 5.303607 | 1.208141 | 2.208209 | 2.745261 | 3.052358 | 4.975792 | 8.146182 | 1.000000 |
pd.concat([X_train_undersample, y_train_undersample], axis=1)['Class'].value_counts()
Class
1    345
0    343
Name: count, dtype: int64
pd.concat([X_test_undersample, y_test_undersample], axis=1).info()
<class 'pandas.core.frame.DataFrame'>
Index: 296 entries, 102782 to 57921
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Time    296 non-null    float64
 1   V1      296 non-null    float64
 2   V2      296 non-null    float64
 3   V3      296 non-null    float64
 4   V4      296 non-null    float64
 5   V5      296 non-null    float64
 6   V6      296 non-null    float64
 7   V7      296 non-null    float64
 8   V8      296 non-null    float64
 9   V9      296 non-null    float64
 10  V10     296 non-null    float64
 11  V11     296 non-null    float64
 12  V12     296 non-null    float64
 13  V13     296 non-null    float64
 14  V14     296 non-null    float64
 15  V15     296 non-null    float64
 16  V16     296 non-null    float64
 17  V17     296 non-null    float64
 18  V18     296 non-null    float64
 19  V19     296 non-null    float64
 20  V20     296 non-null    float64
 21  V21     296 non-null    float64
 22  V22     296 non-null    float64
 23  V23     296 non-null    float64
 24  V24     296 non-null    float64
 25  V25     296 non-null    float64
 26  V26     296 non-null    float64
 27  V27     296 non-null    float64
 28  V28     296 non-null    float64
 29  Amount  296 non-null    float64
 30  Class   296 non-null    int64
dtypes: float64(30), int64(1)
memory usage: 74.0 KB
pd.concat([X_test_undersample, y_test_undersample], axis=1).describe()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 | 296.000000 |
mean | 88396.587838 | -2.448419 | 1.857288 | -3.552900 | 2.336519 | -1.503755 | -0.656035 | -2.853058 | 0.086851 | -1.324446 | -2.919028 | 1.914227 | -3.106154 | -0.084562 | -3.347887 | -0.077981 | -1.984526 | -3.161909 | -1.109686 | 0.264590 | 0.289212 | 0.065582 | 0.134902 | 0.056521 | -0.077336 | 0.001963 | 0.040364 | 0.020281 | 0.058781 | 0.046845 | 0.496622 |
std | 50147.105326 | 5.812072 | 3.934323 | 6.680660 | 3.308417 | 4.389263 | 1.693893 | 6.622008 | 5.121293 | 2.451914 | 4.891517 | 2.754439 | 4.681722 | 0.986937 | 4.683458 | 1.051296 | 3.484989 | 5.826410 | 2.293910 | 1.298310 | 1.235841 | 2.862463 | 1.216935 | 0.877975 | 0.555090 | 0.650752 | 0.481822 | 1.224166 | 0.460841 | 0.892432 | 0.500835 |
min | 60.000000 | -29.876366 | -8.402154 | -30.558697 | -2.956827 | -21.665654 | -5.773192 | -43.557242 | -41.044261 | -13.434066 | -24.588262 | -2.383066 | -18.683715 | -3.076318 | -17.620634 | -3.092108 | -14.129855 | -22.541652 | -9.090892 | -3.681904 | -5.225849 | -22.797604 | -8.887017 | -5.988806 | -1.742803 | -2.079928 | -1.170476 | -7.263482 | -1.931920 | -0.353229 | 0.000000 |
25% | 45977.500000 | -2.867766 | -0.130600 | -5.417818 | -0.118496 | -1.667035 | -1.477544 | -2.835885 | -0.168935 | -2.345829 | -4.445615 | -0.144802 | -5.340188 | -0.815218 | -6.363108 | -0.729637 | -3.303237 | -5.358990 | -1.747789 | -0.563676 | -0.165023 | -0.178103 | -0.483530 | -0.212828 | -0.405811 | -0.324214 | -0.270853 | -0.056831 | -0.042639 | -0.349231 | 0.000000 |
50% | 84069.000000 | -0.740915 | 0.941852 | -1.139964 | 1.340723 | -0.369227 | -0.596589 | -0.501864 | 0.169642 | -0.696902 | -0.875521 | 1.267304 | -0.938658 | -0.060414 | -1.059352 | -0.012904 | -0.547678 | -0.527389 | -0.318904 | 0.169827 | 0.056998 | 0.130060 | 0.081904 | -0.035614 | -0.010232 | 0.068890 | 0.031911 | 0.073702 | 0.046030 | -0.300834 | 0.000000 |
75% | 135023.500000 | 0.879511 | 2.700371 | 0.394765 | 4.305361 | 0.624459 | 0.139244 | 0.306788 | 0.833392 | 0.011527 | -0.051012 | 3.542336 | 0.234752 | 0.609629 | 0.173916 | 0.685300 | 0.351119 | 0.309636 | 0.237358 | 0.948371 | 0.461180 | 0.568611 | 0.617588 | 0.200328 | 0.317653 | 0.386804 | 0.355382 | 0.395412 | 0.192766 | 0.028048 | 1.000000 |
max | 172733.000000 | 2.306769 | 22.057729 | 3.476268 | 12.114672 | 9.880564 | 6.474115 | 3.791907 | 19.587773 | 4.866316 | 6.367661 | 11.152491 | 1.725185 | 2.897044 | 2.654275 | 2.471358 | 2.696475 | 6.443649 | 2.591846 | 4.851255 | 11.059004 | 27.202839 | 8.361985 | 5.466230 | 1.077407 | 2.156042 | 1.458828 | 2.706566 | 3.042406 | 5.663610 | 1.000000 |
pd.concat([X_test_undersample, y_test_undersample], axis=1)['Class'].value_counts()
Class
0    149
1    147
Name: count, dtype: int64