Predict your churn with Shapelets

Adrián Carrio

08 April 2022 | 5 minutes

The Case: Predict your churn

Customer churn is one of the most relevant metrics for a business, as it reflects how good the company is at retaining its customers. In this case study, we aim to obtain insights into the factors involved in a bank customer's decision to churn, and to build accurate, explainable models that predict churn before it actually happens.

With this use case, we present a toy example of building a simple data analysis solution for behavior analysis with Shapelets. In particular, we are interested in the factors that influence a bank customer's decision to churn.

While this case study focuses on bank customers, the approach is applicable to any other sector involving customer retention, as long as large customer databases are available. Furthermore, while churn reduction is the objective of this use case, other objectives could be pursued, such as informing live marketing strategies, understanding customer habits to improve service quality based on demand, or reducing product failures based on user profiles.

Intro

The use case is based on a dataset of 10k anonymized bank customers, described by 20 commonly available features covering demographic, customer relationship and transactional information. Some of the customers in this dataset have already churned, and this information is used as ground truth to figure out what differentiates churning from non-churning customers and to build a model that performs this classification task while minimizing classification errors.

Furthermore, the models obtained in this study can provide the churn probability for a given customer. This makes it possible to prioritize actions on the customers most likely to churn, for example by offering them special discounts or promotions.

The use case is organized as follows. First, a high-level dataset review is performed to understand the available data and its quality. Then, an exploratory data analysis (EDA) is performed in order to quickly discover relevant features or engineer new ones. Next comes the data modelling stage, in which predictive models are built and their performance on new, unseen data is estimated. Finally, we draw the most relevant conclusions from the analysis.

 

The Challenge

Several challenges arise in this case study, some of which are quite common to many data science studies:

Working with datasets when limited background information is available. However, this should not be the case in real applications.

Dealing with missing data or, more generally, with datasets that have been produced without consideration for the data analysis processes that come afterwards.

Identifying biases in the data that help distinguish churning from non-churning customers, which can be used to filter the relevant features for the predictive models.

Understanding and choosing the right metrics for the specific problem being solved.

Understanding, selecting, training and using predictive models efficiently.

Coming up with useful insights for the business and helping prioritize customer-related activities and their targets.

Methodology

The methodology to predict churn is based on three main steps commonly followed in Data Science studies:

A high-level dataset review to understand the available data and its quality. Here, three tasks are performed: learning which features and labels are available, discovering missing data, and learning about the characteristics of the features, e.g. whether they are binary or categorical.
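As a minimal sketch of this step with pandas, assuming the data is available as a CSV file (the file name below is illustrative, not the actual one used in the study):

import pandas as pd

# Load the customer dataset (illustrative file name).
df = pd.read_csv("bank_churn.csv")

# Which features and labels are available, and with which data types?
df.info()

# How much data is missing in each feature?
print(df.isna().sum())

# How many distinct values does each feature take?
# Very low cardinality hints at binary or categorical features.
print(df.nunique().sort_values())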

An exploratory data analysis (EDA) in order to quickly discover relevant features or engineer new ones. In this case, we simply visualize each relevant feature with the type of plot appropriate to its nature, in order to learn whether the feature shows a bias when the customer churns.
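A minimal sketch of this kind of visualization with seaborn, reusing the DataFrame from the previous sketch; the column names "HasCrCard", "EstimatedSalary" and the churn label "Exited" are assumptions about the dataset, not its documented schema:

import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Binary/categorical feature: counts per value, split by the churn label.
sns.countplot(data=df, x="HasCrCard", hue="Exited", ax=axes[0])

# Numerical feature: per-class distributions, normalized separately so that
# a shift between churning and non-churning customers is easy to spot.
sns.histplot(data=df, x="EstimatedSalary", hue="Exited",
             stat="density", common_norm=False, ax=axes[1])

plt.tight_layout()
plt.show()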

A data modelling stage, in which predictive models are built and their performance on new, unseen data is estimated. In this example, we simply split the dataset into train/test sets to train and evaluate three state-of-the-art models with an arbitrary choice of hyperparameters. A more elaborate approach, guaranteeing correct model generalization and reliable classification metrics, would involve a validation dataset or a cross-validation procedure to select the best model and its hyperparameters.
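As a hedged sketch of this stage with scikit-learn, showing only one of the models (a random forest) and again assuming illustrative column names:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Separate the label from the features and one-hot encode categorical columns.
# "Exited" is an assumed name for the churn label; identifier columns
# (customer IDs, names, ...) should be dropped beforehand.
X = pd.get_dummies(df.drop(columns=["Exited"]), drop_first=True)
y = df["Exited"]

# Hold out a test set to estimate performance on new, unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Random forest with an arbitrary choice of hyperparameters.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Estimated churn probability for each customer in the test set.
churn_proba = model.predict_proba(X_test)[:, 1]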

 

Metrics

Since the problem is posed in the form of a classification problem, the chosen metrics are precision and recall, which are defined next:

Recall or True Positive Rate (TPR) – The number of predicted positives that are actual positives, divided by the number of actual positives.

Precision or Positive Predictive Value (PPV) – The number of predicted positives that are actual positives, divided by the number of predicted positives.

Another relevant metric is the probability of false alarm or False Positive Rate (FPR) – The number of false positives divided by the number of negatives.

Recall is more relevant in this case, as it penalizes missing actual positives. A model could flag only a handful of its most obvious churners and achieve a precision as high as desired while still missing most of the customers who churn; to make sure the right customers are addressed, the number of correct guesses must be compared against the total number of actual positives, which is exactly what recall does.
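To make these definitions concrete, here is a minimal sketch computing precision, recall and the false positive rate with scikit-learn, reusing the y_test labels and churn_proba probabilities from the modelling sketch above:

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Turn churn probabilities into class predictions with a 0.5 threshold.
y_pred = (churn_proba >= 0.5).astype(int)

precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN), a.k.a. TPR

# scikit-learn has no direct FPR scorer; derive it from the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
fpr = fp / (fp + tn)                         # FP / actual negatives

print(f"precision={precision:.2f}, recall={recall:.2f}, fpr={fpr:.2f}")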

The receiver operating characteristic (ROC) curve is another common way of visualizing the performance of classification models. It shows how a model can be made more or less conservative in its predictions, helping to define the right trade-off between the true positive rate and the false positive rate. This allows for choosing the right model threshold.

Finally, the confusion matrix is a very straightforward way of visualizing classification performance once the model threshold has been chosen. It summarizes the model's predictions against the ground-truth labels.
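Both visualizations can be produced directly from the predictions with scikit-learn's display helpers, continuing the sketch above:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

# Confusion matrix at the chosen 0.5 threshold, normalized per true class.
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, normalize="true")

# ROC curve: true positive rate vs. false positive rate over all thresholds.
RocCurveDisplay.from_predictions(y_test, churn_proba)

plt.show()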

Synthesized Resolution

An immediate indicator to obtain from the dataset is that about 24% of the customers have churned. This is a static snapshot that could, and should, be recomputed periodically to monitor how it is affected by the actions of the company:

[Figure: proportion of churned vs. retained customers]
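As a trivial sketch of how this KPI could be refreshed, reusing the DataFrame from the earlier sketches and the assumed churn label "Exited":

# Overall churn rate; "Exited" is an assumed column encoding churn as 1.
churn_rate = df["Exited"].mean()
print(f"Churn rate: {churn_rate:.1%}")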

Relevant information can be easily obtained just by drawing adequate plots of the available data. From the next figure, for example, it can be quickly deduced that most churning customers have credit cards and that the bank has a very large number of inactive customers. Again, these metrics could be monitored frequently to learn more about their drivers.

[Figure: churn split by credit card ownership and by customer activity]

As a conclusion of the use case, we obtain a model capable of classifying any customer, old or new, and of providing a probability of churn. With this probability, the customers most likely to churn can be addressed immediately with the right retention strategy in order to prevent the churn. Of course, the model makes mistakes in its predictions, but overall it performs well, as can be observed in the following confusion matrix: only about 7% of the customers who do not churn are incorrectly flagged as likely churners, and the model is able to remove more than 70% of the customers from the analysis, letting the company focus on those more likely to churn.

[Figure: confusion matrix of the selected model]

With Shapelets, relevant metrics and KPIs can be monitored frequently, and insights like the ones above can be shared instantly and seamlessly from the data scientist to all relevant departments in the organization.

Results

Several interesting results arise from this study:

The first result, the churn rate, is probably already known to the business, since it is quite straightforward to obtain: in this example, about 24% of the customers have churned.

Issues in international business branches can be discovered by comparing the churn ratios across countries. In this case, the churn ratio remains constant across countries.

The overall proportion of inactive members is quite high, suggesting that the bank may need a program to turn this group into active customers.

Customers with extreme salaries, either very low or very high, churn more.

With regard to tenure, churning is less common among customers who have been with the bank for several years. A retention effort during the first 2-3 years could therefore reduce churning.

Random forest appears to be a good model for this classification problem. However, the use of validation techniques is recommended in order to select the best type of model and its hyperparameters.

The best model obtained the following metrics: a precision of 50% (half of the customers predicted to churn actually churn), a recall of around 21% (this fraction of the actual churning customers is correctly identified) and a false positive rate of 7% (7% of the customers who do not churn are incorrectly predicted as churners).

How does Shapelets help solve this challenge?

Data App development in less than 30 min.

Shapelets is great for solving data science and data analysis problems and for easily sharing the resulting solutions across the organization. Access to databases and distributed processing is immediate, seamless and fully scalable. No skills in web development or development operations are needed to come up with fully-featured data apps and to share them effortlessly across the organization. For building use cases, the user does not need to learn new ways to solve data science problems, since Shapelets relies on several native tools commonly used by data scientists. In this particular use case, we rely on matplotlib and seaborn for visualization and scikit-learn for machine learning.

 

Click here to download our free eBook and give us some feedback!

Adrián Carrio

Lead Data Scientist

Adrián Carrio received his degree in Industrial Engineering from the University of Oviedo and his PhD in Automation and Robotics (Cum Laude) from the Technical University of Madrid. He has also worked as a researcher at Arizona State University and the Massachusetts Institute of Technology.