
Analyze and Prevent Retail Customer Churn by Creating a Predictive Model for a Large Retail Bank in SEA | Realcode4you


Business Case 1 - Supervised Learning

Context

You are a data analyst working in one of the largest retail banks by assets in SEA. It is also the largest payment bank in terms of transaction value.


Challenges faced

Losing existing market share to competitors

Declining year-on-year portfolio balance resulting in low profits across certain segments of retail customers


Objective

You are tasked with analyzing and preventing retail customer churn by creating a predictive model that identifies customers with a higher propensity to churn.


Assessment Objectives

  • Perform data preprocessing on the dataset provided

  • Perform Exploratory Data Analysis (EDA) on the preprocessed dataset

  • Implement feature selection using suitable statistical techniques

  • Train, validate and evaluate Supervised Learning models

  • Implement an optimal Supervised Learning model to address specific business needs


Dataset and Data Dictionary

File Name | Description and Comments

bank_churn.csv | Customers' personal and bank products information


The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Churn.xlsx' file in the Data folder


Recommended Steps for Model Development

Data Preprocessing 

  • Combine different datasets

  • Missing value treatment

  • Outlier treatment

  • Encoding categorical variables

  • Balancing data based on target variable (optional)
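The missing-value, outlier and encoding steps above can be sketched roughly as follows. This is a minimal illustration, not the expected solution: the toy DataFrame, its column names (age, balance, segment, churn) and the helper preprocess are hypothetical stand-ins, since the real columns are defined in the data dictionary. Median/mode imputation and 1.5 × IQR clipping are only one of several reasonable choices; combining datasets and balancing are left out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real columns come from bank_churn.csv.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 58, 33, 47],
    "balance": [1200.0, 560.0, 980.0, 250000.0, np.nan, 730.0],
    "segment": ["mass", "affluent", "mass", None, "mass", "affluent"],
    "churn": [0, 1, 0, 0, 1, 0],
})


def preprocess(frame, numeric_cols, categorical_cols):
    """Impute missing values, clip outliers and one-hot encode categoricals."""
    out = frame.copy()

    # Missing value treatment: median for numeric, most frequent level for categorical.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Outlier treatment: clip each numeric column to the 1.5 * IQR fences.
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # Encoding: one-hot encode, dropping the first level to avoid redundancy.
    return pd.get_dummies(out, columns=categorical_cols, drop_first=True, dtype=int)


clean = preprocess(df, numeric_cols=["age", "balance"], categorical_cols=["segment"])
print(clean)
```

In a real notebook the imputation values and clipping fences would normally be learned on the training split only and then applied to the test split, to avoid leaking information.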


EDA & Feature Engineering

  • Exploratory Analysis

  • Bi-variates

  • Weight of evidence

  • Feature Engineering and Selection

  • Correlation Matrix

  • VIF

  • p-values


Model Creation & Validation

  • Train-test split

  • Logistic Regression modeling using sklearn and statsmodels

  • Cross validation folds
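As a rough illustration of the steps above, the sketch below performs a stratified train-test split, fits a scikit-learn logistic regression, and scores it with 5-fold cross-validation. The data is synthetic (make_classification with an assumed 80/20 class split), and the split ratio, fold count and ROC AUC scoring are illustrative choices rather than requirements of the assignment. The statsmodels variant is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the preprocessed churn data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)

# Train-test split, stratified so both parts keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Scaling inside the pipeline keeps each CV fold free of leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation folds on the training part only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print("ROC AUC per fold:", scores.round(3))
print("Mean ROC AUC:", scores.mean().round(3))

# Final fit on the full training set; the test set stays untouched until evaluation.
model.fit(X_train, y_train)
```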


Model Testing & Evaluation

  • Model testing

  • Evaluate model

  • Balance data (optional)
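The testing and evaluation steps above might look like the following sketch on held-out data. It again uses synthetic data, and it treats class_weight="balanced" as one possible way to handle the optional balancing step; resampling the training data would be an alternative. Which metrics matter most depends on the business cost of missing a churner versus contacting a loyal customer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report, confusion_matrix, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Fit one unweighted and one class-weighted model for comparison.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)

# Model testing on the held-out set.
pred = balanced.predict(X_test)
proba = balanced.predict_proba(X_test)[:, 1]

# Evaluation: confusion matrix, per-class metrics and ROC AUC.
cm = confusion_matrix(y_test, pred)
auc = roc_auc_score(y_test, proba)
print(cm)
print(classification_report(y_test, pred))
print("ROC AUC:", round(auc, 3))

# Recall on the churn class with and without class weighting.
print("Recall (unweighted):", recall_score(y_test, plain.predict(X_test)))
print("Recall (balanced):  ", recall_score(y_test, pred))
```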


Expected output

All your work for Business Case 1 should be done in the ML_Proj_BC1.ipynb file

  • You should insert additional comments where necessary to explain the purposes of your code

  • Feel free to insert new blocks of code to achieve the objectives where necessary

  • Ensure that the entire Jupyter Notebook can be executed without any error

  • Rename the ML_Proj_BC1.ipynb file to a filename that includes your full name, e.g., ML_Proj_BC1_jack_tan.ipynb



Business Case 2 – Unsupervised Learning

Context

You are a data scientist working in a retail bank based in the Middle East that has been running traditional mass marketing campaigns for years. The bank is now keen to explore the benefits of running tailored marketing campaigns for its customer base.


Challenges faced

Increasingly competitive landscape where other banks are running personalized ad campaigns using differentiated products and services


Profitability pressure from reduced utilization by existing customers.


Objective

In this discovery phase, the objective is to understand the various segments that exist in the bank's customer base, based on the customers' demographics and utilization patterns.


Datasets Available

File Name | Description and Comments

Bank_customers.csv | Sample data for account status of 1000 customers at a bank

 

The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Customers.xlsx' file in the Data folder.


Assessment Objectives

  • Perform the standard EDA process used in machine learning

  • Perform customer segmentation using a suitable clustering technique

  • Use appropriate metrics to measure the performance of the clustering model

  • Evaluate the clustering model to determine model performance based on context and dataset
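One possible way to approach the segmentation and its measurement is K-Means with the silhouette score, sketched below. K-Means is only one candidate for "a suitable clustering technique", and the data here is synthetic (make_blobs) rather than Bank_customers.csv. The range of k values tried is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 1000 customers with 5 numeric features (assumption).
X, _ = make_blobs(n_samples=1000, centers=4, n_features=5, random_state=0)

# K-Means is distance based, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# Try several cluster counts and record the silhouette score of each.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k by silhouette:", best_k)
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. A statistically "best" k still needs to be checked against the business context, for example whether each segment is large and distinct enough to justify its own campaign.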


Expected Output

  • All your work for Business Case 2 should be done in the ML_Proj_BC2.ipynb file

  • You should insert additional comments where necessary to explain the purposes of your code

  • Feel free to insert new blocks of code to achieve the objectives where necessary

  • Ensure that the entire Jupyter Notebook can be executed without any error

  • Rename the ML_Proj_BC2.ipynb file to a filename that includes your full name, e.g., ML_Proj_BC2_jack_tan.ipynb


Business Case 3 – LCNC Machine Learning

Context

Referring back to Business Case 1, the Chief Data Officer (CDO) of the retail bank is dissatisfied with the predictive performance of the classification model in identifying customer churn.


Knowing that the Data Analytics team has recently adopted the Orange Data Mining platform for LCNC Machine Learning, the CDO challenged the Data Analytics team to build better machine learning models using the platform.


Objective

You are tasked with creating better predictive models to identify customers with a higher propensity to churn using the Orange Data Mining platform.


Dataset and Data Dictionary

File Name | Description and Comments

bank_churn_preprocessed.csv | Customers' personal and bank products information (cleaned and preprocessed)


The data dictionary for the dataset can be found in the 'Data Dictionary – Bank ChurnPreprocessed.xlsx' file in the Data folder.


Recommended Steps for LCNC ML Model Development

Data Preparation

  • Load data

  • Update target column

  • Train-test split

 

Model Creation & Finetuning

Train and finetune:

  1. Logistic Regression model (balanced class distribution)

  2. Random Forest model (balanced class distribution)

  3. Gradient Boosting model


Model Testing & Scoring

  • Cross validation folds (stratified)

  • Test on training data

  • Test on testing data


Model Comparison & Evaluation

  • Confusion Matrix

  • ROC Analysis
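Business Case 3 itself is meant to be built with Orange widgets rather than code. Purely as a cross-check outside Orange, the same three-model comparison could be approximated in scikit-learn as in the sketch below. The data is synthetic, the hyperparameters are untuned defaults, and class_weight="balanced" is assumed to be a reasonable counterpart to the "balanced class distribution" setting, which may not match Orange's behavior exactly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in for bank_churn_preprocessed.csv (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=1,
)

models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=1,
    ),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

# Stratified folds keep the churn rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

results = {}
for name, model in models.items():
    # Out-of-fold churn probabilities for every row.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)
    results[name] = {
        "auc": roc_auc_score(y, proba),
        "confusion_matrix": confusion_matrix(y, pred),
    }

for name, res in results.items():
    print(f"{name}: ROC AUC = {res['auc']:.3f}")
    print(res["confusion_matrix"])
```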
