Lok Sabha 2019 Candidate Analysis

In this notebook, I analyse the candidates who stood in the 'Lok Sabha 2019 Elections', using their personal information such as
  • Age
  • Gender
  • Education
  • Category
  • Criminal Cases
  • Party they belong to
  • Votes they received in their constituency.

I got this dataset from Kaggle, where it was uploaded by Prakrut Chauhan. I analyse it using Pandas & NumPy and plot graphs using Matplotlib & Seaborn.

I would also like to mention the wonderful course taught by Aakash N S, the co-founder and CEO of Jovian.ml, a project management and collaboration platform for machine learning. Along with a great instructor, Jovian.ml provides a great platform and resources throughout the course. I had learnt Python and used it for a while, but had never tried data analysis with it. From this course I learnt Pandas, Matplotlib & Seaborn, and beyond that, the overall way of thinking needed for data analysis.

Also you can read my article


Let's start with the analysis.

In [1]:
project_name = "lok_sabha_2019_candidates_analysis"
In [2]:
!pip install jovian --upgrade -q
In [3]:
import jovian
In [4]:
jovian.commit(project=project_name , outputs=['lok_sabha_2019.csv'])
[jovian] Attempting to save notebook..
[jovian] Please enter your API key ( from https://jovian.ml/ ):
API KEY: ········
[jovian] Updating notebook "nihir10dec/lok-sabha-2019-candidates-analysis" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Uploading additional outputs...
[jovian] Committed successfully! https://jovian.ml/nihir10dec/lok-sabha-2019-candidates-analysis

Data Preparation and Cleaning

In [5]:
#importing all the needed libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

I have downloaded the dataset and uploaded it to my notebook, and here I read the uploaded file into a DataFrame.

In [6]:
data = pd.read_csv('lok_sabha_2019.csv')
data
Out[6]:

Let's check out the basic details of this dataset. It is fairly large, with 2263 rows and 19 columns.

In [7]:
# Shows basic details about numeric columns of our dataset.
data.describe()
Out[7]:

We also have columns such as ASSETS and LIABILITIES whose values should be numeric (in ₹), but they do not appear in the numeric summary above, so we'll check some random values from them.

In [8]:
data[['ASSETS' , 'LIABILITIES']].sample(20)
Out[8]:
In [9]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2263 entries, 0 to 2262
Data columns (total 19 columns):
 #   Column                                   Non-Null Count  Dtype
---  ------                                   --------------  -----
 0   STATE                                    2263 non-null   object
 1   CONSTITUENCY                             2263 non-null   object
 2   NAME                                     2263 non-null   object
 3   WINNER                                   2263 non-null   int64
 4   PARTY                                    2263 non-null   object
 5   SYMBOL                                   2018 non-null   object
 6   GENDER                                   2018 non-null   object
 7   CRIMINAL CASES                           2018 non-null   object
 8   AGE                                      2018 non-null   float64
 9   CATEGORY                                 2018 non-null   object
 10  EDUCATION                                2018 non-null   object
 11  ASSETS                                   2018 non-null   object
 12  LIABILITIES                              2018 non-null   object
 13  GENERAL VOTES                            2263 non-null   int64
 14  POSTAL VOTES                             2263 non-null   int64
 15  TOTAL VOTES                              2263 non-null   int64
 16  OVER TOTAL ELECTORS IN CONSTITUENCY      2263 non-null   float64
 17  OVER TOTAL VOTES POLLED IN CONSTITUENCY  2263 non-null   float64
 18  TOTAL ELECTORS                           2263 non-null   int64
dtypes: float64(3), int64(5), object(11)
memory usage: 336.0+ KB

As we can see, the ASSETS and LIABILITIES columns hold amounts in Rs., but they are stored as strings, so we can't perform mathematical calculations on them. We will therefore convert them into integer or float values.

The values look like this: "Rs 30,99,414\n ~ 30 Lacs+". We are only interested in the part before "\n", so we'll split both columns on "\n" and keep only the first part.

In [10]:
data["ASSETS"]= data["ASSETS"].str.split('\n').str[0]
data["LIABILITIES"]= data["LIABILITIES"].str.split('\n').str[0]

After that we will remove the leading "Rs " and the "," separators from the string.

In [11]:
data["ASSETS"]= data["ASSETS"].str.replace("Rs " , "")
data["ASSETS"]= data["ASSETS"].str.replace("," , "")
In [12]:
data["LIABILITIES"]= data["LIABILITIES"].str.replace("Rs " , "")
data["LIABILITIES"]= data["LIABILITIES"].str.replace("," , "")

Finally we convert the strings to numeric values. Passing errors="coerce" turns anything that cannot be parsed into NaN.

In [13]:
data["ASSETS"]= pd.to_numeric(data["ASSETS"] , errors="coerce")
In [14]:
data["LIABILITIES"]= pd.to_numeric(data["LIABILITIES"] , errors="coerce")
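The cleaning steps above can also be bundled into one helper. This is only a sketch: `parse_rupees` is a hypothetical name, and the sample strings are made up to resemble the dataset's format rather than taken from it.

```python
import pandas as pd

def parse_rupees(series: pd.Series) -> pd.Series:
    """Turn strings like 'Rs 30,99,414\n ~ 30 Lacs+' into floats (NaN if unparsable)."""
    cleaned = (
        series.str.split("\n").str[0]               # keep the part before the newline
              .str.replace("Rs ", "", regex=False)  # drop the currency prefix
              .str.replace(",", "", regex=False)    # drop the digit separators
              .str.strip()
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Made-up sample values shaped like the ASSETS column
sample = pd.Series(["Rs 30,99,414\n ~ 30 Lacs+", "Nil", None, "Rs 1,84,77,888\n ~ 1 Crore+"])
parsed = parse_rupees(sample)
print(parsed)
```

Passing `regex=False` makes the replacements plain string matches, so the behaviour does not depend on the default of the installed pandas version.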

Now that the ASSETS and LIABILITIES columns are numeric, we will create one more column named TOTAL-WITHHOLDINGS, whose value is ASSETS minus LIABILITIES.

In [15]:
data["TOTAL-WITHHOLDINGS"] = data["ASSETS"] - data["LIABILITIES"]

Let's check if we have successfully transformed them into numeric values or not.

In [16]:
data[["ASSETS" , "LIABILITIES" , "TOTAL-WITHHOLDINGS"]].sample(20)
Out[16]:

Now let's check what type of values we have in the column CRIMINAL CASES

In [17]:
data['CRIMINAL CASES'].value_counts()
Out[17]:
0                1242
1                 313
2                 119
3                 104
4                  64
5                  42
6                  26
Not Available      22
7                  18
8                  16
9                  11
10                 11
11                  5
14                  4
12                  4
13                  3
15                  2
16                  1
41                  1
18                  1
42                  1
31                  1
24                  1
204                 1
40                  1
28                  1
22                  1
240                 1
52                  1
Name: CRIMINAL CASES, dtype: int64

As 22 records have the criminal cases value "Not Available", we will replace that with 0 and then convert the column to numeric.

In [20]:
data['CRIMINAL CASES'] = data['CRIMINAL CASES'].str.replace('Not Available' , "0")
data['CRIMINAL CASES'] = pd.to_numeric(data['CRIMINAL CASES'] , errors='coerce')
data['CRIMINAL CASES'].value_counts()
Out[20]:
0.0      1264
1.0       313
2.0       119
3.0       104
4.0        64
5.0        42
6.0        26
7.0        18
8.0        16
9.0        11
10.0       11
11.0        5
12.0        4
14.0        4
13.0        3
15.0        2
40.0        1
41.0        1
24.0        1
204.0       1
28.0        1
22.0        1
31.0        1
42.0        1
16.0        1
240.0       1
18.0        1
52.0        1
Name: CRIMINAL CASES, dtype: int64
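Replacing "Not Available" with 0 is a simplification. As a hedged comparison on toy values (not real rows from the dataset), the sketch below shows how the choice changes an average: treating the value as 0 keeps the row in the calculation, while leaving it missing excludes it.

```python
import pandas as pd

# Toy values standing in for the CRIMINAL CASES column
raw = pd.Series(["0", "2", "Not Available", None, "5"])

# Approach used in this notebook: treat "Not Available" as zero cases
as_zero = pd.to_numeric(raw.str.replace("Not Available", "0", regex=False), errors="coerce")

# Alternative: let errors="coerce" turn it into NaN, so it is skipped by mean()
as_missing = pd.to_numeric(raw, errors="coerce")

print(as_zero.mean())     # average over 4 values
print(as_missing.mean())  # average over 3 values
```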

Now let's take a look at the different values we have in the column EDUCATION.

In [21]:
data.EDUCATION.value_counts()
Out[21]:
Post Graduate            502
Graduate                 441
Graduate Professional    336
12th Pass                256
10th Pass                196
8th Pass                  78
Doctorate                 73
Others                    50
Literate                  30
5th Pass                  28
Not Available             22
Illiterate                 5
Post Graduate\n            1
Name: EDUCATION, dtype: int64

For this analysis we'll treat Graduate and Graduate Professional as the same. For simplicity, we'll also change the Not Available and Others values to Illiterate. So now we only have fields that clearly describe the level of education.

In [22]:
data.EDUCATION = data.EDUCATION.str.replace("Professional", "")
data.EDUCATION = data.EDUCATION.str.replace("Not Available", "Illiterate")
data.EDUCATION = data.EDUCATION.str.replace("Others", "Illiterate")
data.EDUCATION = data.EDUCATION.str.strip()
In [23]:
data.EDUCATION.value_counts()
Out[23]:
Graduate         777
Post Graduate    503
12th Pass        256
10th Pass        196
8th Pass          78
Illiterate        77
Doctorate         73
Literate          30
5th Pass          28
Name: EDUCATION, dtype: int64
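The same relabelling can be written with an explicit mapping, which makes every merge visible in one place. This is a sketch on toy values; the "Unknown" label is my own assumption for illustration and is not what the notebook uses (it maps these values to Illiterate).

```python
import pandas as pd

# Toy values mirroring labels seen in the EDUCATION column
education = pd.Series(
    ["Graduate Professional", "Post Graduate\n", "Others", "Not Available", "12th Pass"]
)

# Hypothetical mapping; labels not listed here are left unchanged
mapping = {
    "Graduate Professional": "Graduate",
    "Not Available": "Unknown",
    "Others": "Unknown",
}

# strip() removes the stray newline before the exact-match replacement
cleaned = education.str.strip().replace(mapping)
print(cleaned.value_counts())
```

Unlike a substring `str.replace`, a dictionary passed to `Series.replace` only changes values that match a key exactly.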
In [24]:
import jovian
In [25]:
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "nihir10dec/lok-sabha-2019-candidates-analysis" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/nihir10dec/lok-sabha-2019-candidates-analysis

Exploratory Analysis and Visualization

Let's visualise the ages of the people contesting the elections and of those who won in their constituency.

In [26]:
# Plotting 2 graphs side by side
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# histplot directly gives us the count of the column given to it
sns.histplot(data.AGE , binwidth=5 , ax=axes[0], kde=True);
axes[0].set_title("Age wise Distribution of people contesting") # setting title for the plot

# We can change the no of bins and also the width of the bins
sns.histplot(data[data["WINNER"]>0]["AGE"] , binwidth=5 , ax=axes[1], fill=False);
axes[1].set_title("Age wise Distribution of winning contestants") # setting title for the plot

plt.show();
Notebook Image

From this we can infer that the majority of people contesting elections are between 45 and 60 years of age, and there are very few candidates below the age of 30, even though India has a huge population of young people.

Also, the highest number of winners are aged 50-65 years. This trend suggests that India tends to vote for experienced contestants. It could also start giving a chance to younger candidates, who can bring a different mindset and way of thinking.

Since childhood, I have heard and held the notion that politicians are not well educated, or that educated people don't join politics. To test that, let's visualise the education of all the people contesting elections.

In [53]:
#Using Seaborn's CountPlot with figure size 10 * 6
plt.figure(figsize=(10,6))
sns.countplot(x="EDUCATION" , data=data);
Notebook Image

This graph dispels the stereotype I had in mind that politicians aren't well educated: roughly 63% of the candidates with education data (1280 of 2018) are Graduates or Post Graduates, which is an excellent number.

Some of them have education below 12th Pass, which is also okay, as formal education is not the only thing needed to run a country, as we can see from the example of India's PM Mr. Narendra Modi.

Now, to get a clearer picture, we will visualise the education of the people who won v/s those who didn't, to check whether people keep the education of a contestant in mind before voting for them.
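To put a number on the share of Graduates and Post Graduates, here is a small sketch that reuses the counts printed by `value_counts()` above. The counts are copied by hand from that output rather than recomputed from the CSV.

```python
import pandas as pd

# Counts copied from the cleaned EDUCATION value_counts() output
counts = pd.Series({
    "Graduate": 777, "Post Graduate": 503, "12th Pass": 256, "10th Pass": 196,
    "8th Pass": 78, "Illiterate": 77, "Doctorate": 73, "Literate": 30, "5th Pass": 28,
})

share = counts / counts.sum() * 100
graduate_share = share[["Graduate", "Post Graduate"]].sum()
print(round(graduate_share, 1))
```

On the full DataFrame, `data.EDUCATION.value_counts(normalize=True)` should give the same proportions directly.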

In [28]:
g = sns.catplot(x="EDUCATION", col="WINNER", data=data, kind="count", height=8);
g.set_titles("NOT WINNERS V/S {col_var}S");
Notebook Image

Here also we can see that the highest and second highest numbers of winners are Graduates and Post Graduates, and very few winners are Illiterate or have education below 12th Pass.

It's clearly depicted here that over 50-60% of the people who won are either Graduates or Post Graduates, so it is safe to say that India is largely run by well-educated people.

Now let's have a look at the State Wise Criminal records of the contestants.

For that we will take the sum of the criminal cases per state and display only the Top 15 States with the highest totals, for better readability. We'll also do the same for winners only.
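As a minimal sketch of this "sum per group, keep the top N" step, the toy example below uses `nlargest` in place of sorting and taking the tail. The rows are made up and `top_states` is a hypothetical helper; only the column names match the dataset.

```python
import pandas as pd

# Toy data; the rows are invented for illustration
df = pd.DataFrame({
    "STATE": ["A", "A", "B", "B", "C", "C", "D"],
    "WINNER": [1, 0, 1, 0, 1, 0, 1],
    "CRIMINAL CASES": [3, 1, 0, 7, 2, 2, None],
})

def top_states(frame: pd.DataFrame, n: int) -> pd.Series:
    """Sum criminal cases per state and keep the n largest totals."""
    return frame.groupby("STATE")["CRIMINAL CASES"].sum().nlargest(n)

print(top_states(df, 2))                    # all contestants
print(top_states(df[df["WINNER"] > 0], 2))  # winners only
```

Note that a sum favours states with many candidates; a per-candidate mean would be a different, equally reasonable view.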

In [29]:
state_criminal = data.groupby('STATE')[['CRIMINAL CASES']].sum().sort_values(by=
                        ['CRIMINAL CASES']).tail(15).sort_values(by=['STATE'])

state_criminal_winner = data[data['WINNER']>0].groupby('STATE')[['CRIMINAL CASES']].sum().sort_values(by=
                        ['CRIMINAL CASES']).tail(15).sort_values(by=['STATE'])
state_criminal
Out[29]:

With the data ready, we'll now plot it as a graph.

In [54]:
# 2 Barplot Side by Side
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Passing X axis and Y axis along with subplot position
sns.barplot(x = state_criminal.index , y = state_criminal['CRIMINAL CASES'] , ax=axes[0] , palette='YlOrBr');
axes[0].tick_params(axis='x' , rotation=45); # rotating the X axis labels so they are easier to read
axes[0].set_title('STATE WISE CRIMINAL CASE OF CONTESTANTS');

# We can also change the color of the barplots by giving different palettes
sns.barplot(x = state_criminal_winner.index , y = state_criminal_winner['CRIMINAL CASES'] , ax=axes[1] , palette='viridis');
axes[1].set_title('STATE WISE CRIMINAL CASE OF WINNERS');
plt.xticks(rotation=45);
Notebook Image

Here we can clearly notice that Kerala has the highest number of criminal cases compared to any other state. Bihar, Maharashtra, Uttar Pradesh and West Bengal come next, with similar totals. Among the winners, Kerala again leads in criminal cases, followed by states like Uttar Pradesh and West Bengal.

One more thing I notice is that Punjab and Rajasthan have approximately 50 cases each, the lowest among the Top 15 States, which means the remaining states have fewer than 50 cases. That is an optimistic way of looking at it.

Another interesting inference is that Telangana had a total of about 200 cases across all contestants, and the winners from Telangana alone account for almost 150 of them, which is a really high ratio.

Also, Rajasthan appears in the contestants graph but not in the winners graph, which suggests that voters there mostly did not elect candidates with criminal cases. Assam, on the other hand, is absent from the contestants graph but present in the winners graph, which suggests that some of its winners have criminal cases.

Now let's take a look at the financial aspect of the contestants.

We'll first check the Assets and Liabilities of the contestants state wise.

In [31]:
# We group by state, select only the required column and apply the aggregate function we want
state_assets = data.groupby('STATE')[['ASSETS']].sum().sort_values(by=['ASSETS']).tail(15).sort_values(by=['STATE'])

state_liablity = data.groupby('STATE')[['LIABILITIES']].sum().sort_values(by=['LIABILITIES']).tail(15).sort_values(by=['STATE'])

We'll also compute the Total Withholdings and plot them both state wise and party wise.

In [32]:
# To keep only the Top 15 states we sort in ascending order and take the last 15 records
state_withholding = data.groupby('STATE')[['TOTAL-WITHHOLDINGS']].sum().sort_values(
                                by=['TOTAL-WITHHOLDINGS']).tail(15).sort_values(by=['STATE'])

party_withholding = data.groupby('PARTY')[["TOTAL-WITHHOLDINGS"]].sum().sort_values(
                                by=['TOTAL-WITHHOLDINGS']).tail(15).sort_values(by=['PARTY'])
In [33]:
#4 subplots with total figure size of 16 * 14
fig, axes = plt.subplots(2, 2, figsize=(16, 14))

sns.barplot(x = state_assets.index , y = state_assets['ASSETS'] , ax=axes[0,0] , palette='Spectral');
axes[0,0].set_title('Sum of Assets of Contestants State Wise')
axes[0,0].tick_params(axis='x' , rotation=45);

sns.barplot(x = state_liablity.index , y = state_liablity['LIABILITIES'] , ax=axes[0,1] , palette='coolwarm');
axes[0,1].set_title('Sum of Liabilities of Contestants State Wise');
axes[0,1].tick_params(axis='x' , rotation=45);

sns.barplot(x = state_withholding.index , y = state_withholding["TOTAL-WITHHOLDINGS"] , ax=axes[1,0] , palette='Set2');
axes[1,0].set_title('State wise Total Withholdings')
axes[1,0].tick_params(axis='x' , rotation=45);

sns.barplot(x = party_withholding.index , y = party_withholding["TOTAL-WITHHOLDINGS"] , ax=axes[1,1] , palette='bright');
axes[1,1].set_title('Party wise Total Withholdings')

plt.xticks(rotation=45);
plt.tight_layout(pad=2);
Notebook Image

Assets and Liabilities State Wise

From the 1st graph we can infer that Andhra Pradesh and Uttar Pradesh are the top states by total contestant assets, followed by Maharashtra, Tamil Nadu and Telangana, which have approximately the same assets.

From the 2nd graph we can see that Andhra Pradesh also has the highest total liabilities, whereas the other states have much lower liabilities in comparison.

Total Withholdings State & Party Wise

From the 3rd graph we conclude that Uttar Pradesh is the top state by Total Withholdings, followed by Andhra Pradesh, which topped both the Assets and Liabilities graphs. After that we have Maharashtra, Tamil Nadu and Telangana, which have approximately the same Total Withholdings.

The last graph shows that INC (Congress) has the highest Total Withholdings, almost double the total for Uttar Pradesh. INC is followed by BJP in 2nd position, with a difference of more than ₹2 × 10$^{8}$. The rest of the parties have very low Total Withholdings compared to these 2 national-level parties.

Now we'll take a look at the Category and Gender wise distribution of candidates and winners.

Firstly, we'll group them by Category and by Gender and take the count in each group.

In [34]:
# Group by Category and count the number of occurrences in each group
category = data.groupby('CATEGORY')[['NAME']].count()

# Group by Gender and count the number of occurrences in each group
gender = data.groupby('GENDER')[['NAME']].count()

We'll also consider only the winners and group them by Category and by Gender to see if there is any particular trend.

In [35]:
#Only choose records of the winner
category_winner = data[data['WINNER']>0].groupby('CATEGORY')[['NAME']].count()

gender_winner = data[data['WINNER']>0].groupby('GENDER')[['NAME']].count()

Now, let's plot them with the help of pie charts.

In [36]:
# Plotting 4(2*2) Pie Charts 
fig1, axes = plt.subplots(2, 2, figsize=(14, 10))

# This function writes the total number of people inside the pie chart
def func(pct, allvals):
    # round instead of truncating, so floating-point error cannot drop a count by one
    absolute = int(round(pct/100.*np.sum(allvals)))
    return "{:.2f}%\n({:d} People)".format(pct, absolute)

# Explode gives a certain gap from the center of the circle of pie chart

wedges, texts, autotexts = axes[0,0].pie(category['NAME'], explode=(0.1, 0, 0), autopct=lambda pct: func(pct, category['NAME']),
                          shadow=True, textprops=dict(color="w") , startangle=90)
# We can also change the legend names, title, location.
axes[0,0].legend(wedges, category.index, title="Category", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
axes[0,0].set_title("Category Wise Contestants")

# Autopct is given the function we created at top to also write no of people along with the percentage

wedges, texts, autotexts = axes[0,1].pie(category_winner['NAME'], explode=(0.1, 0, 0), autopct=lambda pct: func(pct, category_winner['NAME']),
                                shadow=True,  textprops=dict(color="w") , startangle=90)
axes[0,1].legend(wedges, category_winner.index, title="Category", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
axes[0,1].set_title("Category Wise Winners")

# Shadow True gives a minute 3D effect to the pie chart

wedges, texts, autotexts = axes[1,0].pie(gender['NAME'], explode = (0.2,0) , autopct=lambda pct: func(pct, gender['NAME']),
                                shadow=True,  textprops=dict(color="w") , colors=['teal' , 'tomato'])
axes[1,0].legend(wedges, gender.index, title="Gender", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
axes[1,0].set_title("Gender Wise Contestants")

# We can change the color of the font inside the pie chart with TEXTPROPS as well as the divisions of pie chart with providing list of colors

wedges, texts, autotexts = axes[1,1].pie(gender_winner['NAME'], explode = (0.2,0) , autopct=lambda pct: func(pct, gender_winner['NAME']),
                                shadow=True,  textprops=dict(color="w") , colors=['teal' , 'tomato'])
axes[1,1].legend(wedges, gender_winner.index, title="Gender", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
axes[1,1].set_title("Gender Wise Winners")

# It gives a padding between all the figures.
plt.tight_layout(pad=2);
plt.show()
Notebook Image
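The pie labels are rebuilt from a percentage, so the head count has to be recovered by multiplying back by the total. The sketch below isolates that helper, using rounding for the conversion. `make_label` is a hypothetical name and the counts are toy values.

```python
import numpy as np

def make_label(pct: float, allvals) -> str:
    """Format a pie-wedge label with the percentage and the rounded head count."""
    absolute = int(round(pct / 100.0 * np.sum(allvals)))
    return "{:.2f}%\n({:d} People)".format(pct, absolute)

# Toy counts: recompute each wedge's percentage the way a pie chart would
counts = [1, 2, 4]
total = sum(counts)
labels = [make_label(100.0 * c / total, counts) for c in counts]
print(labels)
```

Rounding matters because the product can land a hair below the true count in floating point, in which case plain `int()` truncation would report one person fewer.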

Category Wise Distribution

The chart shows that a large majority of people contesting elections are from the General category (nearly 70%), with about 30% from the SC and ST categories. To me this is understandable, as there are also fewer people in those categories than in the General category.

Here we can see that the share of winners from the General category is about 6 percentage points higher than its share of contestants, which suggests to me that the people of India don't let a person's category matter much when choosing a candidate. SC and ST candidates should be encouraged and given equal opportunity, but category should not be the only reason for electing them.

Gender Wise Distribution

There is clearly a huge gap between the number of male and female candidates, as very few female candidates contested the Lok Sabha 2019 elections. In my opinion, many more female candidates should contest elections.

Of all the winners, nearly 86% are male and just 14% are female, which follows from the low participation of female candidates in the first place. Instead of just talking about feminism, people should try to make changes at the ground level, and men should motivate and encourage women to contest.

In [37]:
import jovian
In [38]:
jovian.commit(outputs=['lok_sabha_2019.csv'])
[jovian] Attempting to save notebook..
[jovian] Updating notebook "nihir10dec/lok-sabha-2019-candidates-analysis" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Uploading additional outputs...
[jovian] Committed successfully! https://jovian.ml/nihir10dec/lok-sabha-2019-candidates-analysis

Asking and Answering Questions

I was also curious to check whether people vote for NOTA when they feel that no candidate is the right choice for them.

In [39]:
data[data['NAME']=="NOTA"]['OVER TOTAL VOTES POLLED IN CONSTITUENCY'].describe()
Out[39]:
count    245.000000
mean       1.627031
std        0.690592
min        1.000689
25%        1.189109
50%        1.391887
75%        1.762612
max        5.034813
Name: OVER TOTAL VOTES POLLED IN CONSTITUENCY, dtype: float64

On average only about 1.6% of the votes polled in a constituency went to NOTA. That is a small number, perhaps because people are not yet very aware of the option or think that a NOTA vote is wasted. Note that in Lok Sabha elections NOTA does not currently affect the result: the candidate with the most votes is still declared the winner, even if NOTA polls more.

Finally, let's also take a look at how each party performed in different states. As there are 133 unique parties contesting the elections, we'll only consider the Top 15 parties v/s the Top 15 states.

In [40]:
# extracting top 15 parties by number of seats they are contesting elections at.
data.PARTY.value_counts().head(15)
Out[40]:
BJP       420
INC       413
NOTA      245
IND       201
BSP       163
CPI(M)    100
VBA        47
AITC       47
SP         39
NTK        38
MNM        36
SHS        26
AAP        25
TDP        25
YSRCP      25
Name: PARTY, dtype: int64
In [41]:
# keeping the states with more than 50 rows (candidates) in the dataset, i.e. roughly the top 15 states
high_states = data[data['STATE'].map(data['STATE'].value_counts()) > 50]
In [42]:
high_states.STATE.value_counts()
Out[42]:
Uttar Pradesh     274
Bihar             244
Tamil Nadu        217
West Bengal       193
Maharashtra       192
Andhra Pradesh    121
Madhya Pradesh    103
Gujarat            87
Rajasthan          86
Odisha             85
Telangana          83
Karnataka          77
Jharkhand          73
Kerala             63
Punjab             62
Assam              52
Name: STATE, dtype: int64
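The threshold of 50 rows keeps 16 states in the output above, because Assam (52) also passes it. If exactly N groups are wanted, one option is to pick them by rank instead of by threshold. This is a sketch on toy data, with `n = 2` standing in for 15.

```python
import pandas as pd

# Toy STATE column, one row per candidate; the counts are made up
df = pd.DataFrame({"STATE": ["UP"] * 5 + ["Bihar"] * 4 + ["Kerala"] * 3 + ["Goa"] * 1})

n = 2  # would be 15 for the analysis in this notebook
top_n = df["STATE"].value_counts().nlargest(n).index

# Keep only the rows whose state is among the n most frequent
subset = df[df["STATE"].isin(top_n)]
print(subset["STATE"].value_counts())
```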

Now we'll keep the rows for these top states and, within them, only the parties with more than 25 rows (roughly the Top 15 parties).

In [43]:
high_states_party = high_states[high_states['PARTY'].map(high_states['PARTY'].value_counts()) > 25]
high_states_party
Out[43]:

With all 29 states + 7 Union Territories and 133 parties we had 2263 rows. By considering only the Top 15 States and Top 15 Parties we still get 1561 rows, which clearly indicates how few rows the remaining states and parties contribute. Hence, to make the visualisation easier to read, we can skip them and look at the bigger picture.

In [44]:
# aggfunc='mean' is the string form of np.mean, which newer pandas versions prefer
state_votes_1 = pd.pivot_table(high_states_party , values='OVER TOTAL VOTES POLLED IN CONSTITUENCY' , index='STATE' , columns='PARTY' , aggfunc='mean')
state_votes_2 = pd.pivot_table(high_states_party , values='OVER TOTAL ELECTORS IN CONSTITUENCY' , index='STATE' , columns='PARTY' , aggfunc='mean')
state_votes_1
Out[44]:
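To make the pivot step concrete, here is a sketch on a few invented rows. It averages the vote share for each (state, party) pair and lays the result out with states as rows and parties as columns, the shape a heatmap expects.

```python
import pandas as pd

# Toy rows; column names follow the dataset, values are invented
df = pd.DataFrame({
    "STATE": ["X", "X", "X", "Y", "Y"],
    "PARTY": ["P1", "P1", "P2", "P1", "P2"],
    "OVER TOTAL VOTES POLLED IN CONSTITUENCY": [40.0, 60.0, 30.0, 20.0, 70.0],
})

table = pd.pivot_table(
    df,
    values="OVER TOTAL VOTES POLLED IN CONSTITUENCY",
    index="STATE",
    columns="PARTY",
    aggfunc="mean",  # average over all candidates of a party in a state
)
print(table)
```

A (state, party) pair with no candidates would show up as NaN in the table and as an empty cell in the heatmap.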

Here I have pivoted the data from 2 columns into a matrix-like format of roughly 15$*$15 (states as rows, parties as columns), which is easy to plot as a heatmap. The 2 columns considered are:

  • OVER TOTAL VOTES POLLED IN CONSTITUENCY :- The percentage of the total votes polled in the constituency that the candidate secured.
  • OVER TOTAL ELECTORS IN CONSTITUENCY :- The percentage of the total electors (eligible voters) in the constituency whose votes the candidate secured.

E.g.: Constituency A has a total population of 10000 people, of which 7000 are eligible to vote. Out of those 7000, only 5000 people voted on election day.

  • Candidate X secured 3000 votes, which means his OVER TOTAL VOTES POLLED IN CONSTITUENCY value will be 3000/5000 = 60% and his OVER TOTAL ELECTORS IN CONSTITUENCY value will be 3000/7000 = 42.9%
  • Candidate Y secured 2000 votes, which means his OVER TOTAL VOTES POLLED IN CONSTITUENCY value will be 2000/5000 = 40% and his OVER TOTAL ELECTORS IN CONSTITUENCY value will be 2000/7000 = 28.6%
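The arithmetic in this example can be checked with a few lines of Python. `vote_shares` is a hypothetical helper written for illustration; the numbers are the ones from constituency A.

```python
def vote_shares(votes: int, votes_polled: int, electors: int) -> tuple[float, float]:
    """Return (share of votes polled, share of all electors), both in percent."""
    return 100.0 * votes / votes_polled, 100.0 * votes / electors

for name, votes in [("X", 3000), ("Y", 2000)]:
    polled_pct, elector_pct = vote_shares(votes, votes_polled=5000, electors=7000)
    print(f"Candidate {name}: {polled_pct:.1f}% of votes polled, {elector_pct:.1f}% of electors")
```

The second percentage is always the smaller of the two unless every elector turns out to vote.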
In [45]:
plt.figure(figsize=(20,12));

# orientation: 'horizontal' or 'vertical' orientation of the color bar
# shrink: scales the size of the color bar
# extend: which end of the color bar is pointed: 'both', 'min', 'max', or 'neither' for no pointed end
# extendfrac: length of the pointed end; 'auto' sizes it automatically, a float sets it as a fraction of the bar length
# ticks: positions of the color bar ticks, given as a list or numpy array
cbar_kws = {"orientation":"horizontal",  "shrink":1, 'extend':'max', 'extendfrac':0.1, "ticks":np.arange(0,100 , 5) }

plt.subplot(1,2,1)
# cmap: color palette
# linewidths: width of the lines separating cells & linecolor: color of those lines
p1 = sns.heatmap(state_votes_1, cmap="YlGnBu", linewidths= 0.01 , linecolor="lightblue" , annot=True, fmt="0.2f" , cbar_kws=cbar_kws);
p1.set_title("OVER TOTAL VOTES POLLED IN CONSTITUENCY");

plt.subplot(1,2,2)
# annot: True and fmt:"0.2f" parameter let us show values inside the heatmap
p2 = sns.heatmap(state_votes_2, cmap="BuPu", linewidths= 0.01 , linecolor="lightblue" , annot=True, fmt="0.2f" , cbar_kws=cbar_kws);
p2.set_title("OVER TOTAL ELECTORS IN CONSTITUENCY");

# Note: passing string values to annot without setting fmt will raise an error.

plt.tight_layout(pad=3);
Notebook Image

The 1st graph shows that BJP has outperformed in the majority of states except for Andhra Pradesh, Kerala and Telangana. After that, INC has the second highest vote share. The rest of the parties have scored comparatively fewer votes, except for some parties in particular states, like BSP & SP in Uttar Pradesh, CPI(M) in Tamil Nadu and Kerala, and AITC in West Bengal.

In the 2nd graph we can infer that BJP secured a high share of votes in many states, as we know that they won the elections. An interesting insight, though, is that they performed poorly in southern states like Andhra Pradesh, Kerala, Telangana and Tamil Nadu.

In [55]:
import jovian
In [56]:
jovian.commit(outputs=['lok_sabha_2019.csv'])
[jovian] Attempting to save notebook..
[jovian] Updating notebook "nihir10dec/lok-sabha-2019-candidates-analysis" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Uploading additional outputs...
[jovian] Committed successfully! https://jovian.ml/nihir10dec/lok-sabha-2019-candidates-analysis

Inferences and Conclusion

We've drawn some interesting inferences from this analysis. Here's a summary of them:
  • Most of the candidates contesting elections are between 45 and 60 years of age, and the winners are in a similar age range. Very few people outside that range contest or win elections. As it is said that India became the youngest country in 2020 because of its huge population of young people, we can hope to see more of them understand politics and join it.

  • The stereotype that people who are not educated and can't do anything else in life usually join politics in India was largely dispelled by this visualisation. Almost 60% of contestants as well as winners are Graduates or Post Graduates, which is a good level of education.

  • There is also another generalization that if you are in politics then by default you have some criminal contacts or cases, and this visualisation shows that isn't true. Of the 2018 contestants with data (the remaining 245 rows match the number of NOTA entries), 1264 have no recorded criminal cases, including the 22 whose value was "Not Available". That is more than 60%. Also, Kerala has the highest number of criminal cases among contestants as well as winners.

  • By checking the financial aspects of the contestants we concluded that Uttar Pradesh and Andhra Pradesh have contestants with very high Total Withholdings. UP has the highest number of Lok Sabha seats (80), whereas AP has only 25, which suggests that AP's contestants are comparatively rich.

  • We have always heard the term vote bank, along with theories that many parties target particular religions and categories for their vote bank. But we saw that among all contestants almost 75% are from the General category and the remaining 25% from SC & ST, and the winners have nearly the same ratio, which suggests that no particular category is given higher priority or less opportunity.

  • Another clear takeaway is that the great majority of people contesting the Lok Sabha 2019 elections are male and only 14-15% are female, which is a very small share. Women should try to participate more in politics, and men should encourage and motivate them to do so.

  • Finally, we have seen that BJP outperformed in almost all big states on OVER TOTAL VOTES POLLED IN CONSTITUENCY. They were followed by INC with the second highest vote share. But BJP wasn't able to perform well in southern states like Andhra Pradesh, Kerala, Telangana and Tamil Nadu, which they can target next time.

References and Future Work

References:-
Future Works:-
  • From this dataset we can infer many other things about different parties and how they performed in each state.
  • We can also check the education levels of contestants and the criminal cases of different parties.
  • There is also scope for combining data from other years and comparing election data across years to find trends.
In [61]:
import jovian
In [62]:
jovian.commit(outputs=['lok_sabha_2019.csv'])
[jovian] Attempting to save notebook..
[jovian] Updating notebook "nihir10dec/lok-sabha-2019-candidates-analysis" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Uploading additional outputs...
[jovian] Committed successfully! https://jovian.ml/nihir10dec/lok-sabha-2019-candidates-analysis