Telco Customer Churn

dataset source

eda-repository

my github

1) About:

Churn rate measures the number of individuals or items leaving a group or organisation over a specific period. It is therefore an important metric for companies whose customers pay on a recurring basis. It gives subscription-based companies a ballpark estimate of how many customers will stay with them over a given period; plotted over time, this shows up as the level at which the customer count saturates. This equilibrium can shift from year to year as the company changes its strategies for attracting and retaining customers, with the aim of bringing the churn rate down to an ideal low.
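
To make the definition concrete, here is a minimal sketch of the calculation. It assumes the simplest convention (customers lost during the period divided by customers at the start of the period); some companies divide by the average customer count instead. The function name and the figures are hypothetical and are not taken from the dataset.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the fraction of customers lost over a period.

    Assumes the 'lost / customers at start' convention.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical figures: 1,000 subscribers at the start of the month, 50 cancel.
monthly_rate = churn_rate(1000, 50)
print(f"Monthly churn rate: {monthly_rate:.1%}")  # Monthly churn rate: 5.0%
```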

2) Dataset Overview

Each row represents a customer; each column contains one of the customer’s attributes, as described in the column metadata.

The data set includes information about:

Customers who left within the last month – the column is called Churn
Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
Demographic info about customers – gender, age range, and if they have partners and dependents
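
The four groups above can be written down as a small lookup, which is handy for checking that a loaded file has the expected columns. This is only a sketch: the column names are assumed to match the public Telco churn CSV (with `SeniorCitizen` standing in for the age range), and `COLUMN_GROUPS` and `missing_columns` are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

# Assumed column names from the public Telco churn CSV; adjust if your copy differs.
COLUMN_GROUPS = {
    "target": ["Churn"],
    "services": [
        "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
        "OnlineBackup", "DeviceProtection", "TechSupport",
        "StreamingTV", "StreamingMovies",
    ],
    "account": [
        "tenure", "Contract", "PaymentMethod", "PaperlessBilling",
        "MonthlyCharges", "TotalCharges",
    ],
    "demographics": ["gender", "SeniorCitizen", "Partner", "Dependents"],
}


def missing_columns(frame: pd.DataFrame) -> list:
    """Return the expected columns that are absent from `frame`."""
    expected = [col for cols in COLUMN_GROUPS.values() for col in cols]
    return [col for col in expected if col not in frame.columns]


# Toy frame holding only three of the expected columns.
toy = pd.DataFrame({"gender": ["Female"], "tenure": [1], "Churn": ["No"]})
print(missing_columns(toy))
```

An empty list means every expected column is present.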

3) Objective:

To draw more meaning out of the given data through observation and visualisation-aided comparison.

#importing lib
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import jovian

#importing dataset
dataset = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv", index_col='customerID')
# shorten the payment method labels
dataset['PaymentMethod'] = dataset['PaymentMethod'].replace('Bank transfer (automatic)', 'Bank transfer')
dataset['PaymentMethod'] = dataset['PaymentMethod'].replace('Credit card (automatic)', 'Credit card')

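#fixing missing values
# TotalCharges is read as text because a few rows hold a blank string;
# errors='coerce' turns every non-numeric entry (here the blanks) into NaN
dataset['TotalCharges'] = pd.to_numeric(dataset['TotalCharges'], errors='coerce')
# Imputer was deprecated in scikit-learn 0.20 and removed in 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# fit_transform returns a new array, so the result has to be assigned back
dataset['TotalCharges'] = imputer.fit_transform(dataset[['TotalCharges']]).ravel()

#splitting dataset / churn (after imputation, so both subsets carry the cleaned column)
bye = dataset[dataset['Churn'] == 'Yes']
nobye = dataset[dataset['Churn'] == 'No']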

#splitting in homogeneous categories
cat_feat = list(dataset.columns)
cat_feat.remove('tenure')
cat_feat.remove('MonthlyCharges')
cat_feat.remove('TotalCharges')
cat_feat.remove('Churn')

num_feat = ['tenure', 'MonthlyCharges', 'TotalCharges']
pred = 'Churn'
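
With the categorical features separated from the numeric ones, a natural first comparison is the share of churned customers within each category. The sketch below shows one way to compute it with pandas; `churn_rate_by` is a hypothetical helper and the rows are toy data, not values from the dataset.

```python
import pandas as pd


def churn_rate_by(frame: pd.DataFrame, feature: str, target: str = "Churn") -> pd.Series:
    """Share of rows with target == 'Yes' for each value of `feature`."""
    churned = frame[target].eq("Yes")
    return churned.groupby(frame[feature]).mean().sort_values(ascending=False)


# Toy rows for illustration only.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "Churn": ["Yes", "No", "No", "No"],
})
rates = churn_rate_by(toy, "Contract")
print(rates)
```

In the notebook the same helper could be looped over `cat_feat`, for example `churn_rate_by(dataset, 'Contract')`.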

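
The missing-value step in the cell above relies on mean imputation of the blank `TotalCharges` entries. The sketch below reproduces the idea on a toy column using pandas only (a swapped-in alternative to the scikit-learn imputer, chosen to keep the example dependency-light). A point worth checking in any version of this step is that the transformation returns a new object, so its result has to be assigned back.

```python
import pandas as pd

# Toy column mimicking TotalCharges: numbers stored as text, one blank entry.
raw = pd.Series(["10.0", " ", "30.0"], name="TotalCharges")

# errors="coerce" turns anything that is not a number (here the blank) into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# fillna returns a new Series; without the assignment the gap would remain.
filled = numeric.fillna(numeric.mean())

print(filled.tolist())  # [10.0, 20.0, 30.0]
```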