This use case presents a toy example of how to build a simple behaviour analysis solution with Shapelets. In particular, we are interested in the factors that influence a bank customer's decision to churn.
See the full dataset
Customer churn is one of the most relevant business metrics, as it reflects how well a company retains its customers. In this case study we aim to understand the factors behind a bank customer's decision to churn, and to build accurate, explainable models that predict churn before it actually happens.
While this case study focuses on bank customers, the approach applies to any sector concerned with customer retention, as long as large customer databases are available. Likewise, although churn reduction is the objective here, the same techniques can serve other goals, such as live marketing strategies, understanding customer habits to match service quality to demand, or reducing product failures based on usage profiles.
The use case is based on a dataset of 10k anonymized bank customers with 20 commonly available features covering demographic, customer-relationship and transactional information. Some customers in this dataset have already churned, and this information is used as ground truth to figure out what differentiates churning from non-churning customers and to build a model that performs this classification task while minimizing classification errors.
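A first look at such a dataset typically checks its size, types, missing values and class balance. The sketch below uses pandas on a small synthetic stand-in, since the real column names are not shown in this excerpt; the columns here (`age`, `balance`, `num_products`, `churned`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 10k-customer dataset;
# real column names and values will differ.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(50_000, 20_000, n).round(2),
    "num_products": rng.integers(1, 5, n),
    "churned": rng.integers(0, 2, n),  # ground-truth label
})

# High-level review: size, types, missing values, class balance
print(df.shape)
print(df.dtypes)
print(df.isna().sum().sum())   # total missing cells
print(df["churned"].mean())    # observed churn rate
```

In the real study, the churn rate tells us how imbalanced the classification problem is, which matters when choosing evaluation metrics later on.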
Furthermore, the models obtained in this study provide a churn probability for each customer. This makes it possible to prioritize retention actions on the customers most likely to churn, for example by offering them special discounts or promotions.
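Probability-based prioritization can be sketched as follows. This is a minimal illustration on synthetic data, not the model built later in the study; the classifier choice is an assumption, and any scikit-learn classifier exposing `predict_proba` would work the same way.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic customers: label loosely tied to the first feature
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Churn probability per customer: column 1 of predict_proba
proba = model.predict_proba(X_test)[:, 1]

# Rank customers so retention actions target the most likely churners first
priority = np.argsort(proba)[::-1]
print(proba[priority[:5]])  # five highest churn probabilities
```

Ranking by probability rather than by the hard 0/1 prediction lets the business choose how many customers to contact based on the budget available for retention campaigns.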
The use case is organized as follows. First, a high-level dataset review is performed to understand the data available and its quality. Then, an exploratory data analysis (EDA) is performed to quickly discover relevant features or engineer new ones. Next comes the data modelling stage, in which predictive models are built and their performance on new, unseen data is estimated. Finally, the most relevant conclusions from the analysis are summarized.
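The performance-estimation step mentioned above is usually done by evaluating the model only on data it was not trained on. A minimal sketch, again on synthetic data and with an arbitrarily chosen classifier, using 5-fold cross-validation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, roughly separable data as a stand-in for the bank dataset
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Each fold is scored on samples the model never saw during fitting,
# giving an estimate of performance on new, unseen customers.
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean())
```

For an imbalanced churn dataset, accuracy alone can be misleading, so metrics such as ROC AUC (via the `scoring` parameter) are often preferred in practice.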