Credit Cards

Different factors that affect credit card applications

Author

Skillful Togepi by Gabriel Godoy, Aidan O’Connor, Maddie Cho, William Xing, Melika Khoshneviszadeh

Published

May 5, 2023

Introduce the topic and motivation

When choosing a dataset, we looked for one relevant to our daily lives. As college students about to enter the workforce, we will soon face financial decisions we have not yet had to consider: applying for a credit card, investing money, buying equities, and so on. After reviewing several datasets, we chose the Kaggle dataset “Credit Card Approval Prediction,” which was both interesting and relevant to our goals. We set out to answer the following questions:

  • How do factors such as gender, age, degree status, and job affect whether an individual pays their credit card bill on time?

  • Is there a correlation between paying off credit cards and age?

Introduce the data

Who created the data set?

  • The dataset was created by Seanny Song, a data engineer based in the Washington, DC-Baltimore area. He collected the data from a confidential bank to build a machine learning model that predicts whether credit card applications will be approved based on different factors.

What does it contain?

  • It contains two datasets: one with general information about each applicant, and one with the status of their credit card payments. We merged them and cleaned the result to remove duplicates and variables we did not need.

  • The first: Each row is a credit card applicant, identified by a unique ID to keep private information confidential. The columns are categorical and numerical variables describing the applicant, for example their income, gender, and marital status.

  • The second: Contains information on whether an applicant was on time with their credit card payments or overdue.
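The merge-and-clean step described above can be sketched roughly as follows. This is a minimal illustration on made-up toy rows, not our actual cleaning code. The column names `ID`, `CODE_GENDER`, `DAYS_BIRTH`, `MONTHS_BALANCE`, and `STATUS` come from the two files, and `new_age_years` matches the name used in the model below. Two things are assumptions made only for this sketch: the rule that flags an applicant as overdue (any monthly `STATUS` of "1" through "5"), and the idea that `DAYS_BIRTH` counts days backwards from the record date, so it is negative.

```r
library(dplyr)

# Toy stand-ins for the two files (values are invented for illustration)
applications <- data.frame(
  ID = c(1, 2, 2, 3),                       # ID 2 is duplicated on purpose
  CODE_GENDER = c("F", "M", "M", "F"),
  DAYS_BIRTH = c(-12000, -15000, -15000, -9000)
)
records <- data.frame(
  ID = c(1, 1, 2, 3, 3),
  MONTHS_BALANCE = c(0, -1, 0, 0, -1),
  STATUS = c("C", "0", "2", "X", "C")
)

# Collapse the monthly records to one row per applicant.
# Assumed rule: overdue if any month has STATUS "1" through "5".
payment_status <- records %>%
  group_by(ID) %>%
  summarise(overdue = any(STATUS %in% as.character(1:5)), .groups = "drop")

# Drop duplicate applicants, attach payment status, and derive age in years
# (assumes DAYS_BIRTH is a negative day count)
credit_app <- applications %>%
  distinct() %>%
  inner_join(payment_status, by = "ID") %>%
  mutate(new_age_years = -DAYS_BIRTH / 365.25)

credit_app
```

An inner join keeps only applicants that appear in both files, which is one reasonable choice when the payment status is the outcome of interest.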

Highlights from EDA

Research question 1: How do factors such as gender, age, degree status, and job affect whether an individual pays their credit card bill on time?


Rows: 438557 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (8): CODE_GENDER, FLAG_OWN_CAR, FLAG_OWN_REALTY, NAME_INCOME_TYPE, NAME...
dbl (10): ID, CNT_CHILDREN, AMT_INCOME_TOTAL, DAYS_BIRTH, DAYS_EMPLOYED, FLA...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1048575 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): STATUS
dbl (2): ID, MONTHS_BALANCE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Hypothesis Test

Is there a correlation between paying off credit cards and age?

Our group was interested in whether there is a correlation between paying off a credit card and a person's age.

Null Hypothesis: There is no relationship between age and paying credit card bills on time.

\[ H_0 : \mu_{OnTime} = \mu_{OverDue} \]

Alternative Hypothesis: There is a relationship between age and paying credit card bills on time.

\[ H_A : \mu_{OnTime} \neq \mu_{OverDue} \]

Logistic Regression Equation:

\[ \log\Big(\frac{p}{1-p}\Big) = 0.665 - 0.002 \times new\_age\_years \]
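To make the equation easier to read, the log-odds can be converted back into a probability. The sketch below plugs a few example ages into the fitted equation. The function name `predict_prob` and the example ages are hypothetical choices for illustration, and `p` is taken to be the probability of whichever `status` level the model codes as 1.

```r
# Rounded coefficients copied from the fitted equation above
intercept <- 0.665
slope <- -0.002

# Hypothetical helper: predicted probability for a given age in years
predict_prob <- function(age_years) {
  log_odds <- intercept + slope * age_years
  plogis(log_odds)  # inverse logit: 1 / (1 + exp(-log_odds))
}

ages <- c(25, 45, 65)
round(predict_prob(ages), 3)

# Multiplicative change in the odds for each additional year of age
exp(slope)
```

Across this 40-year age range the predicted probability moves only from about 0.65 to about 0.63, which shows how small the estimated age effect is.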

Model


Call:
glm(formula = status ~ new_age_years, family = binomial, data = credit_app)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4539  -1.4314   0.9317   0.9406   0.9546  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    0.6649905  0.0450988  14.745   <2e-16 ***
new_age_years -0.0016718  0.0009928  -1.684   0.0922 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43125  on 33109  degrees of freedom
Residual deviance: 43122  on 33108  degrees of freedom
AIC: 43126

Number of Fisher Scoring iterations: 4
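
For readers who want to reproduce this kind of output, here is a minimal sketch of the same `glm()` call on simulated data. The data frame `toy`, its sample size, and the age range are invented for illustration. The outcome is simulated from the rounded equation above, so the estimates will not match our real results.

```r
set.seed(1)

# Simulated stand-in for credit_app (not the real data)
n <- 500
toy <- data.frame(new_age_years = runif(n, min = 21, max = 68))

# Draw a 0/1 outcome using the rounded coefficients reported above
p <- plogis(0.665 - 0.002 * toy$new_age_years)
toy$status <- rbinom(n, size = 1, prob = p)

# Same model form as in the Call line above
fit <- glm(status ~ new_age_years, family = binomial, data = toy)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- summary(fit)$coefficients
coef_table["new_age_years", c("Estimate", "Pr(>|z|)")]
```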

Conclusions + future work

Our conclusion

There is not enough evidence to conclude that an applicant's age is related to whether they pay their credit card bills on time. The p-value for age (0.0922) is higher than 0.05, so the coefficient is not statistically significant at the 5% level and we fail to reject the null hypothesis. In particular, we cannot conclude that older applicants are more likely to pay on time.
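
The decision above can be checked by hand from the coefficient table. The sketch below recomputes the z value and two-sided p-value for `new_age_years` from its estimate and standard error. The 95% confidence interval at the end uses a normal approximation, which is an addition for illustration and not part of our original analysis.

```r
# Values copied from the model output for new_age_years
estimate <- -0.0016718
std_error <- 0.0009928

# Wald z statistic and two-sided p-value
z <- estimate / std_error
p_value <- 2 * pnorm(-abs(z))

# Approximate 95% confidence interval for the slope (normal approximation)
ci <- estimate + c(-1, 1) * qnorm(0.975) * std_error

round(c(z = z, p_value = p_value), 4)
ci
```

The interval contains 0, which agrees with failing to reject the null hypothesis at the 5% level.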

Future Work

Future work could examine data from banks in other parts of the United States to see whether these factors relate to applicants' credit card payment status in the same way.