Business Case 1 – Supervised Learning
Context
You are a data analyst working at one of the largest retail banks by assets in Southeast Asia (SEA). It is also the largest payment bank in terms of transaction value.
Challenges faced
Losing existing market share to competitors
Declining year-on-year portfolio balance resulting in low profits across certain segments of retail customers
Objective
You are tasked with analyzing and preventing retail customer churn by building a predictive model that identifies customers with a higher propensity to churn.
Assessment Objectives
Perform data preprocessing on the dataset provided
Perform Exploratory Data Analysis (EDA) on the preprocessed dataset
Implement feature selection using suitable statistical techniques
Train, validate and evaluate Supervised Learning models
Implement an optimal Supervised Learning model to address specific business needs
Dataset and Data Dictionary
File Name | Description and Comments
bank_churn.csv | Customers' personal and bank products information
The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Churn.xlsx' file in the Data folder
Recommended Steps for Model Development
Data Preprocessing
Combine different datasets
Missing value treatment
Outlier treatment
Encoding categorical variables
Balancing data based on target variable (optional)
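The preprocessing steps above can be sketched as follows. This is a minimal illustration on a toy DataFrame, not the expected solution: the column names (age, balance, segment, churn) are hypothetical stand-ins for whatever bank_churn.csv actually contains, and median/mode imputation with IQR capping is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for bank_churn.csv; every column name is hypothetical.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 58, 33, 47],
    "balance": [1200.0, 560.0, 98000.0, 730.0, np.nan, 410.0],
    "segment": ["mass", "affluent", "mass", None, "mass", "affluent"],
    "churn": [0, 1, 0, 0, 1, 0],
})

# Missing value treatment: median for numeric columns, mode for categorical ones.
num_cols = ["age", "balance"]
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df["segment"] = df["segment"].fillna(df["segment"].mode()[0])

# Outlier treatment: cap values lying more than 1.5 * IQR beyond the quartiles.
for col in num_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Encoding: one-hot encode the categorical column, dropping the first level
# so the dummy columns are not perfectly collinear.
encoded = pd.get_dummies(df, columns=["segment"], drop_first=True)
print(encoded)
```

Capping keeps every row while limiting the influence of extreme balances; dropping outlier rows instead is also defensible if you can justify it from the data dictionary.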
EDA & Feature Engineering
Exploratory Analysis
Bi-variates
Weight of evidence
Feature Engineering and Selection
Correlation Matrix
VIF
p-values
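For the weight of evidence and VIF items above, a small sketch may help. It assumes a binary target coded 0 (stayed) and 1 (churned), and uses the convention WoE = ln(share of non-churners / share of churners); some references use the inverse ratio, so state your convention. The VIF helper relies on the identity that each VIF equals the corresponding diagonal entry of the inverse correlation matrix of the predictors. The function names and toy data are illustrative only; statsmodels also provides `variance_inflation_factor` if you prefer a library routine.

```python
import numpy as np
import pandas as pd


def weight_of_evidence(feature: pd.Series, target: pd.Series) -> pd.Series:
    """WoE per category: ln(share of non-churners / share of churners)."""
    # Adding 0.5 to every cell avoids log(0) when a category has no churners.
    counts = pd.crosstab(feature, target) + 0.5
    shares = counts / counts.sum()
    return np.log(shares[0] / shares[1])


def vif(features: pd.DataFrame) -> pd.Series:
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = features.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=features.columns)


# Hypothetical categorical feature and churn flag.
segment = pd.Series(["a", "a", "a", "a", "b", "b", "b", "b"], name="segment")
churn = pd.Series([0, 0, 0, 1, 1, 1, 1, 0], name="churn")
woe = weight_of_evidence(segment, churn)
print(woe)

# Hypothetical numeric features: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
numeric = pd.DataFrame({
    "x1": x1,
    "x2": x1 + 0.1 * rng.normal(size=200),
    "x3": rng.normal(size=200),
})
vifs = vif(numeric)
print(vifs)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity; here x1 and x2 would be flagged while x3 would not.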
Model Creation & Validation
Train-test split
Logistic Regression modeling using sklearn and statsmodels
Cross validation folds
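The model creation and validation steps above might look like the sketch below. It uses synthetic data from `make_classification` in place of the preprocessed churn dataset, and the split ratio, fold count and scoring metric are assumptions rather than requirements of the brief. Only the sklearn side is shown; fitting the same model with statsmodels' `Logit` would additionally give coefficient p-values for the feature selection step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (about 20% positives).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)

# Train-test split, stratified so both sets keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own
# training portion only, which avoids leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified 5-fold cross validation on the training set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"Mean CV ROC AUC: {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")

# Final fit on the full training set; the test set stays untouched until evaluation.
model.fit(X_train, y_train)
```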
Model Testing & Evaluation
Model testing
Evaluate model
Balance data (optional)
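A sketch of the testing and evaluation steps above, again on synthetic data. Note the substitution: instead of resampling the data to balance it, this example uses `class_weight="balanced"`, which reweights the classes inside the loss function. Resampling approaches (for example over- or undersampling the training set) are equally valid for the optional balancing step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data (about 15% positives).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for label, weight in [("unweighted", None), ("balanced", "balanced")]:
    clf = LogisticRegression(max_iter=1000, class_weight=weight)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)               # hard labels at the 0.5 threshold
    proba = clf.predict_proba(X_test)[:, 1]  # churn probabilities for ROC AUC
    results[label] = {
        "confusion_matrix": confusion_matrix(y_test, pred),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for label, metrics in results.items():
    print(label)
    print(metrics["confusion_matrix"])
    print(f"recall={metrics['recall']:.3f}  roc_auc={metrics['roc_auc']:.3f}")
```

For churn, recall on the churn class usually matters more than raw accuracy, since a missed churner is typically costlier than an unnecessary retention offer; choose and justify the metric against the business objective.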
Expected Output
All your work for Business Case 1 should be done in the ML_Proj_BC1.ipynb file
You should insert additional comments where necessary to explain the purposes of your code
Feel free to insert new blocks of code to achieve the objectives where necessary
Ensure that the entire Jupyter Notebook can be executed without any error
Rename the ML_Proj_BC1.ipynb file to a filename that includes your full name, e.g., ML_Proj_BC1_jack_tan.ipynb
Business Case 2 – Unsupervised Learning
Context
You are a data scientist working in a retail bank based in the Middle East that has run traditional mass marketing campaigns for years. The bank is now keen to explore the benefits of running tailored marketing campaigns for its customer base.
Challenges faced
Increasingly competitive landscape where other banks are running personalized ad campaigns using differentiated products and services
Profitability pressure from reduced utilization by existing customers.
Objective
In this discovery phase, the objective is to understand the various segments that exist in the bank's customer base, based on the customers' demographics and utilization patterns.
Datasets Available
File Name | Description and Comments
Bank_customers.csv | Sample data for account status of 1000 customers at a bank
The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Customers.xlsx' file in the Data folder
Assessment Objectives
Perform a standard EDA process for machine learning
Perform customer segmentation using a suitable clustering technique
Use appropriate metrics to measure the performance of the clustering model
Evaluate the clustering model to determine model performance based on context and dataset
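As one possible reading of the segmentation and metric objectives above, the sketch below fits K-Means for several cluster counts and compares them using the silhouette score. K-Means and the silhouette score are assumptions here, not requirements of the brief; the data is synthetic (three well-separated blobs) rather than Bank_customers.csv, and the range of k is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features: three well-separated groups.
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0, random_state=0,
)

# K-Means is distance based, so features should share a common scale.
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per candidate k and record its silhouette score
# (ranges from -1 to 1; higher means tighter, better-separated clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Best k by silhouette: {best_k}")
```

On real customer data the silhouette curve is rarely this clear-cut, so it is worth pairing the metric with a profile of each segment (for example mean demographics and utilization per cluster) to check that the segments are interpretable for marketing.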
Expected Output
All your work for Business Case 2 should be done in the ML_Proj_BC2.ipynb file
You should insert additional comments where necessary to explain the purposes of your code
Feel free to insert new blocks of code to achieve the objectives where necessary
Ensure that the entire Jupyter Notebook can be executed without any error
Rename the ML_Proj_BC2.ipynb file to a filename that includes your full name, e.g., ML_Proj_BC2_jack_tan.ipynb
Business Case 3 – LCNC Machine Learning
Context
Referring back to Business Case 1, the Chief Data Officer (CDO) of the retail bank is dissatisfied with the predictive performance of the classification model in identifying customer churn.
Knowing that the Data Analytics team has recently adopted the Orange Data Mining platform for LCNC Machine Learning, the CDO challenged the Data Analytics team to build better machine learning models using the platform.
Objective
You are tasked to create better predictive models to identify customers with a higher propensity to churn using the Orange Data Mining platform.
Dataset and Data Dictionary
File Name | Description and Comments
bank_churn_preprocessed.csv | Customers' personal and bank products information (cleaned and preprocessed)
The data dictionary for the dataset can be found in the 'Data Dictionary – Bank ChurnPreprocessed.xlsx' file in the Data folder.
Recommended Steps for LCNC ML Model Development
Data Preparation
Load data
Update target column
Train-test split
Model Creation & Finetuning
Train and finetune:
Logistic Regression model (balanced class distribution)
Random Forest model (balanced class distribution)
Gradient Boosting model
Model Testing & Scoring
Cross validation folds (stratified)
Test on training data
Test on testing data
Model Comparison & Evaluation
Confusion Matrix
ROC Analysis
Comments