Jovian
⭐️
Sign In

Project : Customer Churn Prediction in Telecom Industry

Paluvadi Surya Vamsi | paluvadisurya@gmail.com | 8110020188

Description

  • Telecom Churn (also called as Attrition) is one of the new sensations that happened to Indian Market. Most of the telecom customers are shifting their base network to another network provider. Now, The big companies are having a hard hit to defend this situation since a significant customer base already opted out of their network. Now, the management has to find the reason, and their further steps because getting back their customers are the most challenging task, and also companies may lose a higher amount of revenue.

01 - Problem Statement

  • To provide an insight about telecom churn to telecom industries and also to prepare a predictive model to calculate customer churn.

02 - Business Understanding & Business Solutions

Latest News Articles:

  • Vodafone: Vodafone Idea posted consolidated loss of Rs 4,881.9 crore for the fourth quarter ended March 31, 2019 amid a brutal tariff war in India's mobile market. The loss narrowed from Rs 5,004.6 crore during the third quarter of 2018-19.
  • Airtel : The consolidated net profit was down 73.5 per cent for the reported period to touch Rs 970 million against Rs 3.67 billion last year. Total revenues were also down by 8.6 per cent to Rs 200 billion for the June 2018 quarter, compared to Rs 219 billion last year.

Business Understanding

  • Telecom Companies such as Airtel, Vodafone, Idea, Reliance etc., has ruled the indian market till 2016. In 2016 Reliance Owner Mukesh Ambani introduced Jio into the indian world. That made huge impact in customers churn.
  • Vodafone, Airtel and Idea combinedly lost almost 7 Billion Indisn Rupees of market.
  • Customers are very important for product based companies because their revenue relies on the price paid from customers.

Procedure to find the solution

  • Explore the data and analyze carefully.
  • Replace the missing values and obtain a structured dataset.
  • Grab insights from the data, such as relations between different services, gender churn rate.
  • Prepare a normalized/scaled data.
  • Train a model with various scikit learn machine learning algorithms.
  • Evaluate the performance of model based on important parameters such as f1_Score, confusion matrices.

03 - Data Overview

In [1]:
# Import libraries for Data Analysis
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
In [2]:
print(os.listdir())
['.ipynb_checkpoints', 'ChurnAnalysis.ipynb', 'Data', 'Kaggle.ipynb', 'Pictures']
In [3]:
# Load the saved CSV file and save it as an instance
df = pd.read_csv('Data/Churn.csv')
In [4]:
# Look at the samples of data
df.head()
Out[4]:
In [5]:
# Drop Duplicate data
df.drop_duplicates(inplace=True)
In [6]:
df.columns
Out[6]:
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')
In [7]:
df.tail()
Out[7]:
In [8]:
df.sample(3)
Out[8]:
In [9]:
# Tenure, SeniorCitizen, Monthly Charges, Total Charges are numerical values from our dataset.
In [10]:
# Information of our dataframe to identify DataType, Missing Values Status
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 7043 entries, 0 to 7042 Data columns (total 21 columns): customerID 7043 non-null object gender 7043 non-null object SeniorCitizen 7043 non-null int64 Partner 7043 non-null object Dependents 7043 non-null object tenure 7043 non-null int64 PhoneService 7043 non-null object MultipleLines 7043 non-null object InternetService 7043 non-null object OnlineSecurity 7043 non-null object OnlineBackup 7043 non-null object DeviceProtection 7043 non-null object TechSupport 7043 non-null object StreamingTV 7043 non-null object StreamingMovies 7043 non-null object Contract 7043 non-null object PaperlessBilling 7043 non-null object PaymentMethod 7043 non-null object MonthlyCharges 7043 non-null float64 TotalCharges 7043 non-null object Churn 7043 non-null object dtypes: float64(1), int64(2), object(18) memory usage: 1.2+ MB
  • TotalCharges column data is in Object Datatype instead of numerical
In [11]:
# Statistics of numerial data of our data frame
df.describe()
Out[11]:
In [12]:
# Insight of our data inclusing object data types
df.describe(include=['O'])
Out[12]:
In [13]:
# Null Values
plt.figure(figsize=(16,4))
plt.xticks(rotation = 90)
sns.barplot(df.isnull().sum().index, df.isnull().sum().values)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1901510a438>
Notebook Image
In [14]:
# Look at the null values using Seaborn plot
plt.figure(figsize=(16,6))
sns.heatmap(df.isnull(), cmap='Blues', yticklabels=False)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x190156166a0>
Notebook Image
  • We can see that no null values are available in our data set

04 - Data Cleaning

In [15]:
# Rename the columns for convinience
df.columns = ['CustomerID', 'Gender', 'SeniorCitizen', 'Partner', 'Dependents',
              'Tenure', 'PhoneService', 'MultipleLines', 'InternetService',
              'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
              'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
              'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']
In [16]:
def object2Numerical(x):
    if x == " ":
        return np.nan
    else:
        return float(x)

df['TotalCharges'] = df['TotalCharges'].apply(lambda x: object2Numerical(x))
df.dropna(axis=0, inplace=True)
In [17]:
# Drop Customer ID column
df.drop(labels=['CustomerID'], axis=1, inplace=True)
In [18]:
for i in df.columns:
    print('#'*50)
    print(df[i].value_counts())
    print("#"*50)
################################################## Male 3549 Female 3483 Name: Gender, dtype: int64 ################################################## ################################################## 0 5890 1 1142 Name: SeniorCitizen, dtype: int64 ################################################## ################################################## No 3639 Yes 3393 Name: Partner, dtype: int64 ################################################## ################################################## No 4933 Yes 2099 Name: Dependents, dtype: int64 ################################################## ################################################## 1 613 72 362 2 238 3 200 4 176 71 170 5 133 7 131 8 123 70 119 9 119 12 117 10 116 6 110 13 109 68 100 15 99 11 99 67 98 18 97 69 95 24 94 22 90 66 89 35 88 17 87 23 85 16 80 64 80 52 80 ... 29 72 20 71 53 70 62 70 41 70 32 69 47 68 51 68 54 68 50 68 58 67 49 66 57 65 37 65 31 65 43 65 34 65 42 65 40 64 48 64 33 64 55 64 21 63 45 61 59 60 38 59 28 57 39 56 44 51 36 50 Name: Tenure, Length: 72, dtype: int64 ################################################## ################################################## Yes 6352 No 680 Name: PhoneService, dtype: int64 ################################################## ################################################## No 3385 Yes 2967 No phone service 680 Name: MultipleLines, dtype: int64 ################################################## ################################################## Fiber optic 3096 DSL 2416 No 1520 Name: InternetService, dtype: int64 ################################################## ################################################## No 3497 Yes 2015 No internet service 1520 Name: OnlineSecurity, dtype: int64 ################################################## ################################################## No 3087 Yes 2425 No internet service 1520 Name: OnlineBackup, dtype: int64 ################################################## ################################################## No 3094 Yes 2418 No internet service 1520 Name: DeviceProtection, dtype: int64 ################################################## ################################################## No 3472 Yes 2040 No internet service 1520 Name: TechSupport, dtype: int64 ################################################## ################################################## No 2809 Yes 2703 No internet service 1520 Name: StreamingTV, dtype: int64 ################################################## ################################################## No 2781 Yes 2731 No internet service 1520 Name: StreamingMovies, dtype: int64 ################################################## ################################################## Month-to-month 3875 Two year 1685 One year 1472 Name: Contract, dtype: int64 ################################################## ################################################## Yes 4168 No 2864 Name: PaperlessBilling, dtype: int64 ################################################## ################################################## Electronic check 2365 Mailed check 1604 Bank transfer (automatic) 1542 Credit card (automatic) 1521 Name: PaymentMethod, dtype: int64 ################################################## ################################################## 20.05 61 19.85 44 19.90 44 19.95 44 19.65 43 19.70 42 20.00 42 19.55 40 20.15 40 19.75 39 20.35 38 20.25 38 19.80 38 20.10 37 19.60 37 20.20 35 19.50 32 19.40 31 20.45 31 20.40 30 19.45 28 20.50 28 20.30 28 20.55 27 19.35 25 19.30 24 20.65 23 25.25 20 25.10 19 19.25 18 .. 34.85 1 66.00 1 30.85 1 45.50 1 23.50 1 56.50 1 111.35 1 92.65 1 48.45 1 113.45 1 21.45 1 103.35 1 40.45 1 34.60 1 78.25 1 48.20 1 86.95 1 112.90 1 23.05 1 67.05 1 76.30 1 93.05 1 101.70 1 77.30 1 64.25 1 92.35 1 35.60 1 72.85 1 67.70 1 113.30 1 Name: MonthlyCharges, Length: 1584, dtype: int64 ################################################## ################################################## 20.20 11 19.75 9 19.65 8 20.05 8 19.90 8 45.30 7 19.55 7 19.45 6 20.15 6 20.25 6 20.30 5 20.45 5 69.95 4 69.90 4 19.85 4 69.60 4 44.40 4 69.65 4 19.50 4 19.20 4 19.95 4 49.90 4 20.50 4 20.35 4 70.60 4 44.00 4 50.15 4 19.40 4 75.30 4 74.70 4 .. 1971.15 1 1793.25 1 1212.85 1 1504.05 1 1319.95 1 572.20 1 5958.85 1 1673.80 1 958.15 1 7610.10 1 92.05 1 2666.75 1 1001.50 1 7262.00 1 4740.00 1 96.45 1 4483.95 1 5574.75 1 44.65 1 2688.45 1 1374.35 1 3107.30 1 248.95 1 607.30 1 1534.05 1 1066.15 1 249.95 1 8333.95 1 7171.70 1 1024.00 1 Name: TotalCharges, Length: 6530, dtype: int64 ################################################## ################################################## No 5163 Yes 1869 Name: Churn, dtype: int64 ##################################################
Observations
  • Multiple columns have a column consists of Yes, No, No phone Service, No Internet Service in thier values.
  • Both No and No phone service stands for the same, let's convert those column data to new values
  • Convert TotalCharges column data into Numerical Data
In [19]:
#1 Coluns with different no values
replace_no = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
              'TechSupport', 'StreamingTV', 'StreamingMovies']
def replaceNo(x):
    if x == 'No internet service':
        return 'No'
    else:
        return x
for i in replace_no:
    df[i] = df[i].apply(lambda x: replaceNo(x))

def replaceNoLines(x):
    if x == 'No phone service':
        return 'No'
    else:
        return x
df['MultipleLines'] = df['MultipleLines'].apply(lambda x: replaceNoLines(x))
In [20]:
# Create two seperate variables with Churn & Non Churn customers for clear analysis
churnedCustomers = df[df['Churn'] == 'Yes']
existingCustomers = df[df['Churn'] == 'No']
  • Now we have very clean data available, Now, dig in to Data Visualization

05 - Data Visualization

In [21]:
flatui = [ "#3498db", "#f76262" ,"#95a5a6", "#293462", "#ea7dc7", "#2ecc71"]
sns.palplot(sns.color_palette(flatui))
sns.set_palette(flatui)
Notebook Image
Churned Customers
In [22]:
# Churned Customer Analysis
plt.figure(figsize=(16,6))
colors = ['#de356a', '#614ad3']
plt.subplot(121)
plt.title('Customers Attrition')
plt.pie(df['Churn'].value_counts().values, labels=df['Churn'].value_counts().index,
        colors=colors, autopct='%.1f%%', startangle=90)
plt.subplot(122)
plt.title('Customers Attrition')
sns.countplot(df['Churn'])
plt.show()
Notebook Image
Customers Attrition status among various atributes
In [23]:
df.columns
Out[23]:
Index(['Gender', 'SeniorCitizen', 'Partner', 'Dependents', 'Tenure',
       'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
       'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
       'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod',
       'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')
In [24]:
atts = ['Gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService',
        'MultipleLines', 'InternetService', 'OnlineSecurity','OnlineBackup',
        'DeviceProtection', 'TechSupport', 'StreamingTV','StreamingMovies',
        'Contract', 'PaperlessBilling', 'PaymentMethod']
plt.figure(figsize=(16,60))
for i in range(len(atts)):
    plt.subplot(8,2,i+1)
    plt.title('Customer Attrition with respect to '+atts[i])
    sns.countplot(df['Churn'], hue=df[atts[i]])
plt.show()
Notebook Image
Customer Attrition in Tenure
In [25]:
plt.figure(figsize=(12,4))
sns.distplot(df['Tenure'])
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x19017b26b38>
Notebook Image
In [26]:
# Let's seperate tenure of our customers into 4 Categories
def TenureGroups(x):
    if 0<x<20:
        return 'Silver'
    elif 20<= x < 40:
        return 'Gold'
    elif 40<= x < 60:
        return 'Platinum'
    elif x >= 60:
        return 'Diamond'
df['TenureGroup'] = df['Tenure'].apply(lambda x: TenureGroups(x))
In [27]:
plt.figure(figsize=(10,6))
sns.countplot(df['TenureGroup'], order=df['TenureGroup'].value_counts().index)
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x19015ca4240>
Notebook Image
Observation
  • We can see that Major customers are Silver customers as per our seggregation.
  • We have very good amount of Diamond Customers available with usas it tops in second place.
  • Higher chances of attrition is possible because of huge number of people are from Silver Category
Let's check the Charges with our customer categories
In [28]:
plt.figure(figsize=(16,5))
plt.subplot(121)
sns.distplot(df['TotalCharges'], color='r', hist=False, kde_kws={"shade": True})
plt.subplot(122)
sns.distplot(df['MonthlyCharges'], color='k', hist=False, kde_kws={"shade": True})
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x19015582e10>
Notebook Image
In [29]:
plt.figure(figsize=(16,18))
plt.subplot(311)
sns.scatterplot(df['MonthlyCharges'], df['TotalCharges'], hue=df['TenureGroup'])
plt.subplot(312)
sns.scatterplot(df['MonthlyCharges'], df['TotalCharges'], hue=df['Churn'])
plt.subplot(313)
sns.scatterplot(df['MonthlyCharges'], df['TotalCharges'], hue=df['Gender'])
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x19016415d68>