Last week, we published a post giving an overview of the challenges faced in the Insurance Sector. In this post, we will dive deeper into the model we have developed for Insurance providers to address the pressing problem of Customer Churn.
Background
Customers are the lifeline of a business. Customer Acquisition and Customer Retention are the two metrics mainly used to determine the return on investment (ROI) on a company's marketing and sales efforts. Customer acquisition is a costly process, but if done right, it can lead to great rewards!
Most sources say that acquiring a new customer costs anywhere between 5 and 25 times more than keeping an existing one.
After a customer is acquired, retaining that customer is a big challenge that insurance providers are struggling with today.
What is Customer Churn?
Churn prediction is a well-known problem in the Customer Relationship Management (CRM) (click to know more) and Marketing fields. A churner is a user or customer who stops using a company’s products or services. In this case, a churner is a policyholder who terminates an insurance policy prematurely.
What is the solution?
The good news is that we live in the data age and have great tools at our disposal to help answer these questions. In this blog, we will discuss a sample case study to show how one can harness the power of Machine Learning to predict churn and take proactive measures to prevent it systematically, by creating strategic touch-points for engaging with churn-prone customers.
If you think about this problem, you will find it to be a binary classification (click to know more) problem, wherein you have to predict whether a customer will churn (Y) or not based on a set of features (X). We will try to solve this problem using Decision Trees and Random Forests (click to know more).
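To make that framing concrete, here is a minimal sketch in Python using scikit-learn, with made-up toy features and labels purely for illustration (the real features for this use case are described later in the post):

```python
# Churn prediction framed as binary classification: features X, labels y.
from sklearn.tree import DecisionTreeClassifier

# Toy feature rows (age, annual income, is_married) and toy churn labels (1 = churn, 0 = stay)
X = [[34, 52000, 1], [61, 81000, 0], [45, 60000, 1], [29, 47000, 0]]
y = [1, 0, 0, 1]

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X, y)

# Predicted churn label for a new, unseen policyholder
print(clf.predict([[38, 55000, 1]]))
```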
Churn Dataset
Let’s say that the customer churn data for a hypothetical insurance company has the following attributes of a policyholder:
- Policy Identification number - Unique identification number given to the policyholder for a policy. (e.g. 12221212)
- Date of Birth - Date of birth of the policyholder. (e.g. 03/01/1994)
- Marital Status - Marital status of the policyholder. (e.g. Married, Unmarried, etc.)
- Education - Highest degree that the policyholder holds. (e.g. Masters, Bachelors, etc.)
- Annual Income - Annual income of the policyholder. (e.g. 120000)
- Annual Premium - Annual premium paid by the policyholder. (e.g. 5500)
- Gender - Gender of the policyholder. (e.g. Male or Female)
- Height - Height of the policyholder. (e.g. 170 cm)
- Weight - Weight of the policyholder. (e.g. 75 kg)
- Transactions - Dates on which transactions were made by the policyholder.
- Churn - Current churn status of the policyholder. (e.g. True or False)
Our Approach
The flowchart shown below depicts a typical data analysis workflow adopted for this study. We ingested the data from raw CSV files present in a distributed file system. In the preparation step, we did basic data type conversions and identified the target fields for this use case.
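As an illustration, the ingestion and preparation step could look like the sketch below, assuming the raw extract is available as a CSV with the columns listed above; the file path and the day-first date format are assumptions, not the exact setup we used:

```python
# Ingest the raw CSV and do basic data type conversions.
import pandas as pd

df = pd.read_csv("exports/policyholders.csv")  # hypothetical path to the raw extract

# Basic type conversions; assuming day-first dates like 03/01/1994
df["Date of Birth"] = pd.to_datetime(df["Date of Birth"], format="%d/%m/%Y")
df["Annual Income"] = df["Annual Income"].astype(float)
df["Annual Premium"] = df["Annual Premium"].astype(float)

# Target field for this use case
df["Churn"] = df["Churn"].astype(bool)
```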
Once the ‘attributes’ or ‘fields’ were identified, the next step was Feature Selection (click to know more), where we combined two or more attributes to represent a ‘feature’ or ‘characteristic’ of the policyholder. For this use case, we identified these features (a sketch follows the list):
- BMI (Body Mass Index) from Height and Weight.
- Age from Date of Birth.
- Loyalty from Transactions. (Loyalty is calculated from RFMC Analysis (click to know more) which is beyond the scope of this blog)
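A minimal sketch of how BMI and Age can be derived, assuming the prepared DataFrame `df` from the earlier step; the reference date is an assumption, and Loyalty is omitted since the RFMC analysis is out of scope here:

```python
import pandas as pd

snapshot = pd.Timestamp("2017-01-01")  # hypothetical reference date for computing age

# BMI from Height (cm) and Weight (kg)
df["BMI"] = df["Weight"] / (df["Height"] / 100.0) ** 2

# Age in whole years from Date of Birth
df["Age"] = ((snapshot - df["Date of Birth"]).dt.days // 365).astype(int)

# Loyalty would be derived from the Transactions history via RFMC analysis (not shown)
```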
With the above derived fields, we reduced our features from 9 to 8. The data in these features was of mixed types, like strings, ordinals, and cardinals (click to know more). We resolved this by encoding them into integers such that order is preserved. Luckily, we did not have any missing values in the dataset. Had there been any, we would have generated co-occurrence relations (click to know more) and replaced each missing value with the most similar value from another policyholder.
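Here is a sketch of such an order-preserving encoding; the category orderings shown are illustrative, not the exact mappings we used:

```python
# Ordinal field: encode so that the natural order of categories is preserved
education_order = {"High School": 0, "Bachelors": 1, "Masters": 2, "Doctorate": 3}
df["Education"] = df["Education"].map(education_order)

# Nominal fields with no inherent order get simple integer codes
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
df["Marital Status"] = df["Marital Status"].astype("category").cat.codes
```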
Once the features were identified and the missing values accounted for, we were ready to train our model. In the splitting phase, we divided the data into two sets: training and testing. The training data was used to train the model, whereas the testing data was used to predict and score it. We experimented with two algorithms, Decision Trees and Random Forests (click to know more). Hyperparameter optimization (click to know more) for the model was done using the Grid Search (click to know more) algorithm. We ran an exhaustive search over a few important parameters for each algorithm, such as the number of trees and the maximum depth. The model was saved for the pair of parameters that gave the best result. When real-life data was fed into the model for predictions, it was about 75% accurate.
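Below is a sketch of the splitting, grid search, and model-selection steps using scikit-learn. The parameter grid, split ratio, and column names are illustrative, and the Decision Tree search would be set up analogously:

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import joblib

# The 8 engineered features and the churn target (Loyalty assumed computed via RFMC)
feature_cols = ["Age", "BMI", "Marital Status", "Education", "Annual Income",
                "Annual Premium", "Gender", "Loyalty"]
X = df[feature_cols]
y = df["Churn"].astype(int)

# Splitting phase: training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Exhaustive grid search over number of trees and maximum depth
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))

# Save the model for the parameters that gave the best result
joblib.dump(search.best_estimator_, "churn_model.joblib")
```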
In this use case, the model that outperformed the rest was a Random Forest classifier with 100 trees, with nodes expanded until all leaves were pure or contained fewer than 2 samples. We are now working to improve it further by testing new algorithms and by deriving a few other important features from the dataset.
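For reference, that winning configuration corresponds to the following scikit-learn setup, assuming that library is in use (max_depth=None and min_samples_split=2 are the defaults that expand nodes until leaves are pure or contain fewer than 2 samples):

```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees; nodes expanded until all leaves are pure or hold fewer than 2 samples
best_model = RandomForestClassifier(n_estimators=100, max_depth=None,
                                    min_samples_split=2, random_state=42)
```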