
Project Title - Customer Personality Analysis

Customer personality analysis is a detailed study of a company's customers that helps to understand which products each type of customer spends money on. In this project we analyze the Customer Personality Analysis dataset from kaggle.com to extract this kind of information. The analysis is done in Python in a Jupyter notebook, using numpy, pandas and matplotlib for numeric computation, data handling and visualization, including pie charts. The techniques used here come from the course Data Analysis with Python: Zero to Pandas, which teaches data analysis with pandas, numeric computation with numpy, and visualization.

Description of the data:

This dataset contains 29 variables and 2240 observations about different customers.

Here's a brief version of the data description file.


ID: Customer's unique identifier

Year_Birth: Customer's birth year

Education: Customer's education level

Marital_Status: Customer's marital status

Income: Customer's yearly household income

Kidhome: Number of children in customer's household

Teenhome: Number of teenagers in customer's household

Dt_Customer: Date of customer's enrollment with the company

Recency: Number of days since customer's last purchase

Complain: 1 if customer complained in the last 2 years, 0 otherwise


MntWines: Amount spent on wine in last 2 years

MntFruits: Amount spent on fruits in last 2 years

MntMeatProducts: Amount spent on meat in last 2 years

MntFishProducts: Amount spent on fish in last 2 years

MntSweetProducts: Amount spent on sweets in last 2 years

MntGoldProds: Amount spent on gold in last 2 years


NumDealsPurchases: Number of purchases made with a discount

AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise

AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise

AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise

AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise

AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise

Response: 1 if customer accepted the offer in the last campaign, 0 otherwise


NumWebPurchases: Number of purchases made through the company’s web site

NumCatalogPurchases: Number of purchases made using a catalogue

NumStorePurchases: Number of purchases made directly in stores

NumWebVisitsMonth: Number of visits to company’s web site in the last month

!pip install jovian opendatasets --upgrade --quiet

Let's begin by downloading the data, and listing the files within the dataset.

import opendatasets as od
dataset_url = 'https://www.kaggle.com/imakash3011/customer-personality-analysis'
# Download and extract the dataset (prompts for Kaggle credentials)
od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: mdzee888
Your Kaggle Key: ········
Downloading customer-personality-analysis.zip to ./customer-personality-analysis
100%|██████████| 62.0k/62.0k [00:00<00:00, 26.5MB/s]

The dataset has been downloaded and extracted.

data_dir = './customer-personality-analysis'
import os
project_name = "customer-personality-analysis" 
!pip install jovian --upgrade -q
import jovian
[jovian] Updating notebook "mdzee888/customer-personality-analysis" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/mdzee888/customer-personality-analysis

Data Preparation and Cleaning

We first look at the data, replace erroneous values with suitable ones where possible, and delete rows that cannot be repaired. We also combine some columns into new ones where required, as shown in the following steps.

!pip install pandas --upgrade --quiet
import pandas as pd
custumer_personality_df = pd.read_csv('customer-personality-analysis/marketing_campaign.csv', sep='\t')

First we check the data type of each column. One column, Dt_Customer, holds dates but is stored with the object data type instead of a datetime type.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   ID                   2240 non-null   int64
 1   Year_Birth           2240 non-null   int64
 2   Education            2240 non-null   object
 3   Marital_Status       2240 non-null   object
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64
 6   Teenhome             2240 non-null   int64
 7   Dt_Customer          2240 non-null   object
 8   Recency              2240 non-null   int64
 9   MntWines             2240 non-null   int64
 10  MntFruits            2240 non-null   int64
 11  MntMeatProducts      2240 non-null   int64
 12  MntFishProducts      2240 non-null   int64
 13  MntSweetProducts     2240 non-null   int64
 14  MntGoldProds         2240 non-null   int64
 15  NumDealsPurchases    2240 non-null   int64
 16  NumWebPurchases      2240 non-null   int64
 17  NumCatalogPurchases  2240 non-null   int64
 18  NumStorePurchases    2240 non-null   int64
 19  NumWebVisitsMonth    2240 non-null   int64
 20  AcceptedCmp3         2240 non-null   int64
 21  AcceptedCmp4         2240 non-null   int64
 22  AcceptedCmp5         2240 non-null   int64
 23  AcceptedCmp1         2240 non-null   int64
 24  AcceptedCmp2         2240 non-null   int64
 25  Complain             2240 non-null   int64
 26  Z_CostContact        2240 non-null   int64
 27  Z_Revenue            2240 non-null   int64
 28  Response             2240 non-null   int64
dtypes: float64(1), int64(25), object(3)
memory usage: 507.6+ KB
custumer_personality_df1 = custumer_personality_df.copy()
child = custumer_personality_df.Teenhome + custumer_personality_df.Kidhome
0       0
1       2
2       0
3       1
4       1
2235    1
2236    3
2237    0
2238    1
2239    2
Length: 2240, dtype: int64

We add a new column, child, which is the total number of kids and teenagers in the customer's household.

custumer_personality_df1['child'] = custumer_personality_df.Teenhome + custumer_personality_df.Kidhome

We also add a new column, total_amount_spent, which holds the total amount each customer spent across all product categories.

custumer_personality_df1['total_amount_spent']= custumer_personality_df.MntWines + custumer_personality_df.MntFruits + custumer_personality_df.MntMeatProducts + custumer_personality_df.MntFishProducts + custumer_personality_df.MntSweetProducts + custumer_personality_df.MntGoldProds

(2240, 31)
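
The two derived columns can also be built without typing every column name. The following is a minimal sketch on a small made-up table (the values and the name toy_df are assumptions for illustration, not the real dataset); it selects every column whose name starts with "Mnt" and sums them row by row.

```python
import pandas as pd

# Hypothetical toy data with a few of the dataset's column names.
toy_df = pd.DataFrame({
    "MntWines": [635, 11, 426],
    "MntFruits": [88, 1, 49],
    "MntMeatProducts": [546, 6, 127],
    "Kidhome": [0, 1, 0],
    "Teenhome": [0, 1, 0],
})

# Pick the spending columns by their common "Mnt" prefix.
spend_columns = [col for col in toy_df.columns if col.startswith("Mnt")]

# axis=1 sums across columns, giving one total per customer (row).
toy_df["total_amount_spent"] = toy_df[spend_columns].sum(axis=1)

# Children at home = kids + teenagers.
toy_df["child"] = toy_df["Kidhome"] + toy_df["Teenhome"]

print(toy_df[["child", "total_amount_spent"]])
```

Selecting by prefix keeps the expression short, but it assumes that every column starting with "Mnt" is a spending amount.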

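We check each value in the data for missing (None/NaN) entries: the check gives True where a value is missing and False otherwise.

We then count the missing values in each column. According to the data type summary above, Income has 2216 non-null values out of 2240, so 24 Income values are missing; every other column is complete.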
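
A minimal sketch of how missing values could be counted per column, using a small made-up table rather than the real dataset (toy_df and its values are assumptions for illustration). Dropping the affected rows is only one possible cleaning choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real customer table.
toy_df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, np.nan, 71613.0, np.nan],
    "Kidhome": [0, 1, 0, 1],
})

# isna() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# One possible cleaning choice: drop the rows where Income is missing.
cleaned_df = toy_df.dropna(subset=["Income"])
print(cleaned_df.shape)
```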


Next we compute summary statistics such as the mean, minimum and maximum of the numeric columns.

0       04-09-2012
1       08-03-2014
2       21-08-2013
3       10-02-2014
4       19-01-2014
2235    13-06-2013
2236    10-06-2014
2237    25-01-2014
2238    24-01-2014
2239    15-10-2012
Name: Dt_Customer, Length: 2240, dtype: object

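Now we convert the 'Dt_Customer' column from the object data type to the datetime64 data type. The raw values are written day first (for example 21-08-2013), so we pass the format explicitly; without it, ambiguous values such as 04-09-2012 are read month first.

custumer_personality_df1['Dt_Customer'] = pd.to_datetime(custumer_personality_df.Dt_Customer, format='%d-%m-%Y')
0      2012-09-04
1      2014-03-08
2      2013-08-21
3      2014-02-10
4      2014-01-19
2235   2013-06-13
2236   2014-06-10
2237   2014-01-25
2238   2014-01-24
2239   2012-10-15
Name: Dt_Customer, Length: 2240, dtype: datetime64[ns]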
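
The raw Dt_Customer values printed earlier are written day first (21-08-2013 can only be 21 August). A minimal sketch, using three of those raw values, of parsing such strings with an explicit format and checking that day and month end up in the right place:

```python
import pandas as pd

# Three raw values copied from the Dt_Customer output shown earlier.
raw_dates = pd.Series(["04-09-2012", "08-03-2014", "21-08-2013"])

# An explicit format removes any ambiguity between day and month.
parsed = pd.to_datetime(raw_dates, format="%d-%m-%Y")

# Quick sanity check: the first value should be 4 September, not 9 April.
print(parsed.dt.day.tolist())
print(parsed.dt.month.tolist())
```

Because the month, day and weekday columns derived later all depend on this conversion, a check like this is worth running once after parsing.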
Index(['ID', 'Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome',
       'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
       'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth',
       'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
       'AcceptedCmp2', 'Complain', 'Z_CostContact', 'Z_Revenue', 'Response',
       'child', 'total_amount_spent'],

Exploratory Analysis and Visualization

Customer Personality Analysis - In this section we visualize some of the columns, such as 'Year_Birth', 'Income', 'Kidhome' and 'Teenhome'.

Let's begin by importing matplotlib.pyplot and seaborn.

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

YearBirth - The following graph shows the number of customers by year of birth. Most customers were born between 1960 and 1980.

plt.title('Number of Year of Birth');
Notebook Image
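
A reading taken from a histogram can be backed by a direct count. The following is a minimal sketch on made-up birth years (the values are assumptions for illustration, not the real column) that measures the share of customers born between 1960 and 1980.

```python
import pandas as pd

# Hypothetical birth years standing in for the Year_Birth column.
birth_years = pd.Series([1957, 1954, 1965, 1984, 1981, 1967, 1971, 1985])

# between() is inclusive on both ends by default.
in_range = birth_years.between(1960, 1980)

# The mean of a boolean Series is the fraction of True values.
share_in_range = in_range.mean() * 100
print(f"{share_in_range:.1f}% of customers were born between 1960 and 1980")
```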
plt.title('Number of Marital Status')
Notebook Image
plt.title('Number of Marital Status in percentage')
plt.pie(custumer_personality_df.Marital_Status.value_counts(), labels=custumer_personality_df.Marital_Status.value_counts().index, autopct='%1.2f%%',shadow=True);

Notebook Image

Scatterplot - The graph below shows income by marital status. The maximum income, 666666, belongs to a customer in the Together group, while incomes in the Married, Divorced and Single groups look average.

sns.scatterplot(x=custumer_personality_df1.Marital_Status, y=custumer_personality_df1.Income)
plt.title('Marital Status in Income');
Notebook Image
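
The maximum and average income per marital status can also be read off a grouped summary instead of the plot. This is a minimal sketch on made-up rows (toy_df and its values are assumptions for illustration; only the 666666 figure is taken from the text above).

```python
import pandas as pd

# Hypothetical toy data; the real table has 2240 rows.
toy_df = pd.DataFrame({
    "Marital_Status": ["Single", "Together", "Married", "Together", "Single"],
    "Income": [58138.0, 666666.0, 71613.0, 26646.0, 46344.0],
})

# One row per marital status, with the maximum and mean income.
income_summary = toy_df.groupby("Marital_Status")["Income"].agg(["max", "mean"])
print(income_summary)

# idxmax() returns the group label that holds the largest value.
top_group = income_summary["max"].idxmax()
print(top_group)
```

A single very large income pulls the mean of its group upward, so comparing the max and mean columns side by side helps to spot outliers.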

The scatter plot below adds Teenhome as the hue, that is, the number of teenagers in the household (0, 1 or 2).

sns.scatterplot(x=custumer_personality_df1.Marital_Status, y=custumer_personality_df1.Income,hue=custumer_personality_df1.Teenhome)
plt.title('Marital Status in Income');
Notebook Image

Histplot - This graph counts customers by the number of days since their last purchase. The largest group, about 135 to 138 customers, purchased within the last 5 days, followed by about 133 to 135 customers whose last purchase was 91 to 95 days ago. The ranges in between hold an average number of customers.

import numpy as np
sns.histplot(custumer_personality_df1.Recency, bins=np.arange(0,100,5))
plt.title("Number of days since customer's last purchase");
Notebook Image
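
The bar heights of a histogram can be computed directly, which makes it easier to quote exact counts. The following is a minimal sketch on made-up Recency values (an assumption for illustration). Note that np.arange(0, 100, 5) ends at 95, so values greater than 95 would fall outside the last bin; the sketch extends the edges to 100 to include them.

```python
import numpy as np
import pandas as pd

# Hypothetical Recency values (days since last purchase).
recency = pd.Series([0, 3, 4, 12, 38, 58, 91, 94, 99])

# Bin edges 0, 5, 10, ..., 100, giving 20 bins of width 5.
bin_edges = np.arange(0, 101, 5)

# counts[i] is the number of values in [bin_edges[i], bin_edges[i + 1]);
# the last bin also includes its right edge.
counts, edges = np.histogram(recency, bins=bin_edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    if count > 0:
        print(f"{left}-{right} days: {count} customers")
```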

barplot - The graph below shows how much money customers of each marital status spent on wine in the last 2 years. The Alone status shows the largest variation in the amount spent, and the highest average wine spending is in the Widow and Absurd statuses. The other marital statuses are shown in the graph as well.

sns.barplot(x='MntWines', y='Marital_Status', data=custumer_personality_df1)
plt.title('Marital Status Amount spent on wine in last 2 years');
Notebook Image

The graph below is the same as the one above, but with Teenhome added as the hue, which changes the picture of wine spending in the last 2 years across marital statuses. The largest variation in wine spending is now among Divorced customers with 2 teenagers at home, and the highest average wine spending is among Widow customers with 2 teenagers at home. This is a different result from the graph above.

sns.barplot(x='MntWines', y='Marital_Status', hue= 'Teenhome', data=custumer_personality_df1)
plt.title('Marital Status and Teenhome Amount spent on wine in last 2 years');
Notebook Image
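
By default sns.barplot draws the mean of each group, with an error bar for the uncertainty around that mean. A minimal sketch of computing comparable numbers as a table, on made-up rows (toy_df and its values are assumptions for illustration), using the standard deviation as a simple measure of variation:

```python
import pandas as pd

# Hypothetical toy data with two customers per marital status.
toy_df = pd.DataFrame({
    "Marital_Status": ["Widow", "Widow", "Married", "Married", "Single", "Single"],
    "MntWines": [500, 300, 200, 400, 100, 120],
})

# Mean and standard deviation of wine spending for each marital status.
wine_stats = toy_df.groupby("Marital_Status")["MntWines"].agg(["mean", "std"])
print(wine_stats)

# Group with the highest average wine spending.
highest_mean_group = wine_stats["mean"].idxmax()
print(highest_mean_group)
```

Groups with very few customers give unstable means and wide error bars, so the group size is worth checking before drawing conclusions from such a chart.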

piechart - This pie chart shows the percentage of customers by number of children at home across the whole dataset.

  • 0 means no children at home
  • 1 means one child at home
  • 2 means two children at home
  • 3 means three children at home

1    1128
0     638
2     421
3      53
Name: child, dtype: int64
plt.pie(custumer_personality_df1.child.value_counts(), labels=custumer_personality_df1.child.value_counts().index ,autopct="%1.2f%%", explode=[0.01, 0.03, 0.05, 0.06], shadow=True, startangle=180)
plt.title('Number of child home in percentage')
Text(0.5, 1.0, 'Number of child home in percentage')
Notebook Image
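
The percentages drawn in the pie chart can be computed directly from the counts. A minimal sketch, using the four counts copied from the value_counts output above:

```python
import pandas as pd

# Counts copied from the output above: number of children -> number of customers.
child_counts = pd.Series({1: 1128, 0: 638, 2: 421, 3: 53}, name="child")

# Share of each group as a percentage of all customers.
child_pct = (child_counts / child_counts.sum() * 100).round(2)
print(child_pct)
```

On the full data frame, value_counts(normalize=True) gives the same shares as fractions without the intermediate counts.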

Asking and Answering Questions

The custumer_personality_df1 data frame holds the amount each customer spent on several products in the last 2 years, the number of days since the last purchase, and other data. We now ask some questions and answer them from this data.

Q1: How many customers have an income above 50000?
high_income = custumer_personality_df1.Income > 50000
custumer_high_income_df = custumer_personality_df1[high_income]
high_income = custumer_high_income_df.Income.count()
print(f"The total number of customers with income above 50000 is {high_income}.")
The total number of customers with income above 50000 is 1156.
Q2: How much money was spent on wines overall, and what percentage of the overall spending is it?
overall_mntwines = custumer_personality_df1.MntWines.sum()
overall_money_spent = custumer_personality_df1.total_amount_spent.sum()
pct_of_mntwines = (overall_mntwines * 100 / overall_money_spent)
print(f"The money spent overall on wines is {overall_mntwines} and its percentage of the overall money spent is {pct_of_mntwines:.2f}%.")
The money spent overall on wines is 680816 and its percentage of the overall money spent is 50.17%.
Q3: In which year did customers spend the maximum amount?
custumer_personality_df1['year'] = pd.DatetimeIndex(custumer_personality_df1.Dt_Customer).year
custumer_personality_df1['month'] = pd.DatetimeIndex(custumer_personality_df1.Dt_Customer).month
custumer_personality_df1['day'] = pd.DatetimeIndex(custumer_personality_df1.Dt_Customer).day
custumer_personality_df1['weekday'] = pd.DatetimeIndex(custumer_personality_df1.Dt_Customer).weekday

As per the data, the maximum amount of money was spent in 2013, where the year is taken from the Dt_Customer enrollment date.
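
A minimal sketch of how the total amount per year could be computed and the top year picked, on made-up rows (toy_df and its amounts are assumptions for illustration, not the real data):

```python
import pandas as pd

# Hypothetical toy data: enrollment date and total amount spent per customer.
toy_df = pd.DataFrame({
    "Dt_Customer": pd.to_datetime(
        ["04-09-2012", "21-08-2013", "10-02-2014", "13-06-2013"],
        format="%d-%m-%Y",
    ),
    "total_amount_spent": [617, 776, 53, 422],
})

# The .dt accessor extracts date parts from a datetime column.
toy_df["year"] = toy_df["Dt_Customer"].dt.year

# Total spent per year, then the year with the largest total.
spent_per_year = toy_df.groupby("year")["total_amount_spent"].sum()
top_year = spent_per_year.idxmax()
print(spent_per_year)
print(top_year)
```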

Q4: Which month has the maximum income? List 10 customers.
custumer_personality_df1.sort_values('month', ascending=False).head(10)

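The 10 rows above are the first 10 customers after sorting by month in descending order, so they come from the latest month in the data. Note that this sort orders the rows by month, not by income.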
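
One possible reading of this question is: find the month with the largest total income, then list up to 10 customers from that month, highest income first. The following is a minimal sketch of that reading on made-up rows (toy_df and its values are assumptions for illustration); it is not necessarily what was originally intended.

```python
import pandas as pd

# Hypothetical toy data: enrollment month and income per customer.
toy_df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "month": [9, 3, 8, 2, 8, 3],
    "Income": [58138.0, 46344.0, 71613.0, 26646.0, 58293.0, 62513.0],
})

# Total income per month, then the month with the largest total.
income_per_month = toy_df.groupby("month")["Income"].sum()
top_month = income_per_month.idxmax()

# Up to 10 customers from that month, sorted by income in descending order.
top_customers = toy_df[toy_df["month"] == top_month].nlargest(10, "Income")
print(top_month)
print(top_customers)
```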

Q5: What is the total amount spent overall on wine, fruits, meat, fish, sweets and gold, and what percentage of the total is each?
tas_wine = custumer_personality_df1.MntWines.sum()
tas_fruits = custumer_personality_df1.MntFruits.sum()
tas_meat = custumer_personality_df1.MntMeatProducts.sum()
tas_fish = custumer_personality_df1.MntFishProducts.sum()
tas_sweet = custumer_personality_df1.MntSweetProducts.sum()
tas_gold = custumer_personality_df1.MntGoldProds.sum()
tas_wine_pct = tas_wine * 100 / custumer_personality_df1.total_amount_spent.sum()
tas_fruits_pct = tas_fruits * 100 / custumer_personality_df1.total_amount_spent.sum()
tas_meat_pct = tas_meat * 100 / custumer_personality_df1.total_amount_spent.sum()
tas_fish_pct = tas_fish * 100 / custumer_personality_df1.total_amount_spent.sum()
tas_sweet_pct = tas_sweet * 100 / custumer_personality_df1.total_amount_spent.sum()
tas_gold_pct = tas_gold * 100 / custumer_personality_df1.total_amount_spent.sum()
print(f"The total amount spent overall on wine is {tas_wine} also in percentage {tas_wine_pct:.2f}% , fruits is {tas_fruits} also in percentage {tas_fruits_pct:.2f}% , meat is {tas_meat} also in percentage {tas_meat_pct:.2f}% , fish is {tas_fish} also in percentage {tas_fish_pct:.2f}% , sweet is {tas_sweet} also in percentage {tas_sweet_pct:.2f}% and gold is {tas_gold} also in percentage {tas_gold_pct:.2f}%.  ")
The total amount spent overall on wine is 680816 also in percentage 50.17% , fruits is 58917 also in percentage 4.34% , meat is 373968 also in percentage 27.56% , fish is 84057 also in percentage 6.19% , sweet is 60621 also in percentage 4.47% and gold is 98609 also in percentage 7.27%.
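
The same percentages can be computed in one step by holding the category totals in a single Series. A minimal sketch, using the six totals copied from the output above:

```python
import pandas as pd

# Category totals copied from the printed output above.
totals = pd.Series({
    "wine": 680816,
    "fruits": 58917,
    "meat": 373968,
    "fish": 84057,
    "sweet": 60621,
    "gold": 98609,
})

# Each category as a percentage of the overall amount spent.
shares = (totals / totals.sum() * 100).round(2)
print(shares.sort_values(ascending=False))
```

Working on one Series avoids repeating the total_amount_spent sum for every category and keeps the shares easy to sort or plot.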
Q6: How much money was spent on each day of the week?
twd_money_spent = custumer_personality_df1.groupby('weekday')[['total_amount_spent', 'MntWines' ]].sum()
  • 0 means Monday
  • 1 means Tuesday
  • 2 means Wednesday
  • 3 means Thursday
  • 4 means Friday
  • 5 means Saturday
  • 6 means Sunday
plt.pie(twd_money_spent.total_amount_spent, labels= twd_money_spent.index,autopct="%1.2f%%", explode=[0.01,0.03, 0.06, 0.09, 0.12, 0.15, 0.18], shadow=True, startangle=180)
plt.title('Total amount spent on every weekday in 2 years, in percentage');
Notebook Image
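
The weekday numbers listed above can be turned into names so that chart labels are readable without a legend. A minimal sketch, using three of the raw Dt_Customer values shown earlier and parsing them day first:

```python
import pandas as pd

# Three raw dates from the Dt_Customer column, written day-month-year.
dates = pd.to_datetime(
    pd.Series(["04-09-2012", "21-08-2013", "19-01-2014"]),
    format="%d-%m-%Y",
)

# weekday gives 0 for Monday through 6 for Sunday.
weekday_numbers = dates.dt.weekday

# day_name() gives the matching English weekday names.
weekday_names = dates.dt.day_name()
print(weekday_numbers.tolist())
print(weekday_names.tolist())
```

Grouping by the weekday name instead of the number would label the pie slices directly, at the cost of losing the Monday to Sunday ordering unless the result is reindexed.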

Inferences and Conclusion

Customer Personality Analysis

In the analysis above, we collected some interesting information from the data.

  • We first checked the data for null values.
  • We prepared the data by handling rows with missing values, either cleaning them or leaving them as NaN.
  • We added a new child column.
  • We also added a new total_amount_spent column.
  • We visualized several graphs, shown above, starting with customers' year of birth.
  • Most customers were born between 1960 and 1980, as shown above.
  • The other graphs above also revealed interesting information.
  • We also asked and answered questions about the customer data. For example, the total number of customers with income above 50000 is 1156.

We obtained some important information about the customers. Over the last 2 years, customers spent the most money on wines. Married is the most common marital status. The maximum amount was spent in 2013. About 50% of customers have one child. The highest average wine spending is among the Widow and Absurd marital statuses. The histogram of days since the last purchase shows that the largest group, about 135 to 138 customers, purchased within the last 5 days, followed by about 133 to 135 customers at 91 to 95 days, with an average number of customers in between.

We analyzed the data and collected some interesting results.

Thank you for taking the time to look at our analysis.


References and Future Work
