
PS - You need to have the CSV file uploaded to the Jupyter notebook. Here is the link to the dataset: https://www.kaggle.com/greeshmagirish/crime-against-women-20012014-india

Crimes against women in India

This project analyses records of all types of crime against women in India between 2001 and 2014. The main aim of this project is to reflect the situation of women in our society and to raise concern about this issue.

This project is part of my Data Analysis with Python: Zero to Pandas course.
In [1]:
import jovian
In [ ]:
jovian.commit(project='crime-against-women', environment=None)
[jovian] Attempting to save notebook..
In [ ]:
!pip install pandas 
!pip install matplotlib 
!pip install seaborn
!pip install plotly
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

Importing Data

In [ ]:
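# Upload the dataset together with the notebook (the first positional argument of commit is the commit message)
jovian.commit(files=['crimes_against_women.csv'])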
In [ ]:
crimes_df = pd.read_csv('crimes_against_women.csv')
Let us find the number of rows and columns in the dataset.
In [ ]:
crimes_df.shape
In [ ]:
import jovian
In [ ]:
jovian.commit()

Data Preparation and Cleaning

To start with the basics of data cleaning, let's check whether any of the columns contain null (missing) values.
In [72]:
overall_crime = crimes_df.isna().sum()
In [73]:
overall_crime
Out[73]:
Unnamed: 0                    0
STATE/UT                      0
DISTRICT                      0
Year                          0
Rape                          0
Kidnapping_Abduction          0
Dowry_Deaths                  0
Assault_for_her_modesty       0
Insult_to_modesty_of_Women    0
Domestic_violence             0
Importation_of_Girls          0
dtype: int64
None of the columns contains null values.
Now let's count the unique districts in which crimes were reported.
In [74]:
districts = len(crimes_df.DISTRICT.unique())
In [75]:
districts
Out[75]:
1605
However, India has only around 718 districts (the exact number has changed over the years), so 1,605 unique names means this column contains a large amount of messy or incorrect data. We therefore drop the "DISTRICT" column, along with "Unnamed: 0", since neither is useful for our analysis.
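One way to test this reading before discarding the column is to count the district names again after normalising case and whitespace, and to look for aggregate rows. The sketch below runs on a small made-up DataFrame; the idea that aggregate rows are labelled with the word `TOTAL` is an assumption to check against the real `DISTRICT` values, not something established above. If such rows do exist, every later sum over all rows would count those cases twice.

```python
import pandas as pd

# Made-up sample: one district spelled in two ways, plus a state-level aggregate row
sample = pd.DataFrame({
    'STATE/UT': ['ASSAM', 'Assam', 'ASSAM'],
    'DISTRICT': ['KAMRUP', 'Kamrup ', 'TOTAL'],
    'Year': [2012, 2013, 2012],
    'Rape': [100, 120, 100],
})

# Normalise case and surrounding whitespace before counting unique names
normalised = sample['DISTRICT'].str.strip().str.upper()
print('raw unique names:', sample['DISTRICT'].nunique())
print('normalised unique names:', normalised.nunique())

# Assumption: aggregate rows carry 'TOTAL' somewhere in their district name
is_aggregate = normalised.str.contains('TOTAL', regex=False)
print('aggregate rows:', int(is_aggregate.sum()))

# Keep only district-level rows so that later sums do not double-count
district_rows = sample[~is_aggregate]
print('rape cases without aggregate rows:', district_rows['Rape'].sum())
```

On the real data, the same two checks (`crimes_df.DISTRICT.str.strip().str.upper().nunique()` and the `TOTAL` filter) would show how much of the 1,605 comes from spelling variants and whether aggregate rows are present.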
In [76]:
crimes_df.drop(['DISTRICT', 'Unnamed: 0'], axis = 1, inplace=True)

In [77]:
crimes_df
Out[77]:
Next, let's analyse the "STATE/UT" column, starting by listing the names of all the states/UTs with .unique().
In [78]:
print(crimes_df['STATE/UT'].unique())
['ANDHRA PRADESH' 'ARUNACHAL PRADESH' 'ASSAM' 'BIHAR' 'CHHATTISGARH' 'GOA' 'GUJARAT' 'HARYANA' 'HIMACHAL PRADESH' 'JAMMU & KASHMIR' 'JHARKHAND' 'KARNATAKA' 'KERALA' 'MADHYA PRADESH' 'MAHARASHTRA' 'MANIPUR' 'MEGHALAYA' 'MIZORAM' 'NAGALAND' 'ODISHA' 'PUNJAB' 'RAJASTHAN' 'SIKKIM' 'TAMIL NADU' 'TRIPURA' 'UTTAR PRADESH' 'UTTARAKHAND' 'WEST BENGAL' 'A & N ISLANDS' 'CHANDIGARH' 'D & N HAVELI' 'DAMAN & DIU' 'DELHI' 'LAKSHADWEEP' 'PUDUCHERRY' 'Andhra Pradesh' 'Arunachal Pradesh' 'Assam' 'Bihar' 'Chhattisgarh' 'Goa' 'Gujarat' 'Haryana' 'Himachal Pradesh' 'Jammu & Kashmir' 'Jharkhand' 'Karnataka' 'Kerala' 'Madhya Pradesh' 'Maharashtra' 'Manipur' 'Meghalaya' 'Mizoram' 'Nagaland' 'Odisha' 'Punjab' 'Rajasthan' 'Sikkim' 'Tamil Nadu' 'Tripura' 'Uttar Pradesh' 'Uttarakhand' 'West Bengal' 'A&N Islands' 'Chandigarh' 'D&N Haveli' 'Daman & Diu' 'Delhi UT' 'Lakshadweep' 'Puducherry' 'Telangana' 'A & N Islands']
Many of the values above are repeated: some appear once in uppercase and again in title case, some differ only in spacing (for example "A&N Islands" vs "A & N Islands"), and Delhi appears a second time as "Delhi UT".
In [79]:
# First, strip stray whitespace and convert every name to uppercase,
# so that 'Assam' and 'ASSAM' become the same value
crimes_df['STATE/UT'] = crimes_df['STATE/UT'].str.strip().str.upper()

# Then map the remaining spelling variants discussed above onto one canonical name
# (assigning the result back avoids relying on inplace=True on a single column)
crimes_df['STATE/UT'] = crimes_df['STATE/UT'].replace({
    'A&N ISLANDS': 'A & N ISLANDS',
    'D&N HAVELI': 'D & N HAVELI',
    'DELHI UT': 'DELHI',
})
Let's look at the values again:
In [80]:
crimes_df['STATE/UT'].unique()
Out[80]:
array(['ANDHRA PRADESH', 'ARUNACHAL PRADESH', 'ASSAM', 'BIHAR',
       'CHHATTISGARH', 'GOA', 'GUJARAT', 'HARYANA', 'HIMACHAL PRADESH',
       'JAMMU & KASHMIR', 'JHARKHAND', 'KARNATAKA', 'KERALA',
       'MADHYA PRADESH', 'MAHARASHTRA', 'MANIPUR', 'MEGHALAYA', 'MIZORAM',
       'NAGALAND', 'ODISHA', 'PUNJAB', 'RAJASTHAN', 'SIKKIM',
       'TAMIL NADU', 'TRIPURA', 'UTTAR PRADESH', 'UTTARAKHAND',
       'WEST BENGAL', 'A & N ISLANDS', 'CHANDIGARH', 'D & N HAVELI',
       'DAMAN & DIU', 'DELHI', 'LAKSHADWEEP', 'PUDUCHERRY', 'TELANGANA'],
      dtype=object)
Let's count the states and union territories:
In [81]:
len(crimes_df['STATE/UT'].unique())
Out[81]:
36
This matches the 29 states and 7 union territories that India had in 2014, so the cleaning of this column is complete.
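A count of 36 is a quick check, but it could still hide a leftover spelling variant offset by a missing name. A minimal sketch, assuming the 36 cleaned names printed above are the canonical ones, compares the column against that set directly; `EXPECTED_STATES_UTS` and `unexpected_names` are hypothetical names introduced here.

```python
import pandas as pd

# Canonical names, taken from the cleaned .unique() output above (29 states, 7 UTs)
EXPECTED_STATES_UTS = {
    'ANDHRA PRADESH', 'ARUNACHAL PRADESH', 'ASSAM', 'BIHAR', 'CHHATTISGARH',
    'GOA', 'GUJARAT', 'HARYANA', 'HIMACHAL PRADESH', 'JAMMU & KASHMIR',
    'JHARKHAND', 'KARNATAKA', 'KERALA', 'MADHYA PRADESH', 'MAHARASHTRA',
    'MANIPUR', 'MEGHALAYA', 'MIZORAM', 'NAGALAND', 'ODISHA', 'PUNJAB',
    'RAJASTHAN', 'SIKKIM', 'TAMIL NADU', 'TRIPURA', 'UTTAR PRADESH',
    'UTTARAKHAND', 'WEST BENGAL', 'TELANGANA',
    'A & N ISLANDS', 'CHANDIGARH', 'D & N HAVELI', 'DAMAN & DIU',
    'DELHI', 'LAKSHADWEEP', 'PUDUCHERRY',
}

def unexpected_names(names: pd.Series) -> set:
    """Return the names in the column that are not in the expected set."""
    return set(names.unique()) - EXPECTED_STATES_UTS

# Made-up example: one spelling variant that slipped through cleaning
sample = pd.Series(['ASSAM', 'DELHI', 'DELHI UT'])
print(unexpected_names(sample))
print(len(EXPECTED_STATES_UTS))
```

On the cleaned data, `unexpected_names(crimes_df['STATE/UT'])` should return an empty set; any name it does return still needs an entry in the `replace` mapping.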
In [82]:
import jovian
In [83]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Exploratory Analysis and Visualization

Let us find the total number of crimes against women reported between 2001 and 2014, across all categories. Note that these are counts of reported cases, not of individual victims.
In [84]:
victims_raped = crimes_df.Rape.sum()
victims_kidnapped_abducted = crimes_df.Kidnapping_Abduction.sum()
dowry_deaths = crimes_df.Dowry_Deaths.sum()
modesty_assault = crimes_df.Assault_for_her_modesty.sum()
insult_to_modesty = crimes_df.Insult_to_modesty_of_Women.sum()
domestic_violence = crimes_df.Domestic_violence.sum()
girls_imported = crimes_df.Importation_of_Girls.sum()
In [85]:
# Add each of the seven categories exactly once
total_cases_overall = (victims_raped + victims_kidnapped_abducted + dowry_deaths
                       + modesty_assault + insult_to_modesty
                       + domestic_violence + girls_imported)
total_cases_overall
This paints a heartbreaking picture of the situation of women in our society: more than 5 million crimes against women, ranging from assault and violence to rape and even death, were reported in India alone between 2001 and 2014.
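Adding seven category totals by hand makes it easy to repeat one term or leave another out. A minimal sketch, assuming the column names shown in the `isna()` output above, sums all crime columns in one step; `CRIME_COLUMNS` and `total_reported_cases` are hypothetical names, and the two-row DataFrame is made up purely for illustration.

```python
import pandas as pd

# The seven crime columns listed in the isna() output
CRIME_COLUMNS = ['Rape', 'Kidnapping_Abduction', 'Dowry_Deaths',
                 'Assault_for_her_modesty', 'Insult_to_modesty_of_Women',
                 'Domestic_violence', 'Importation_of_Girls']

def total_reported_cases(df: pd.DataFrame) -> int:
    """Sum every crime column over every row, counting each column exactly once."""
    return int(df[CRIME_COLUMNS].sum().sum())

# Made-up two-row example (not taken from the dataset)
sample = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7],
                       [10, 20, 30, 40, 50, 60, 70]],
                      columns=CRIME_COLUMNS)
print(sample[CRIME_COLUMNS].sum())   # per-category totals
print(total_reported_cases(sample))  # 308
```

Calling `total_reported_cases(crimes_df)` would give the overall figure, and `crimes_df[CRIME_COLUMNS].sum()` the per-category breakdown, without naming each column twice.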
Now let us analyse each category separately using bar charts.
In [86]:
# (column, chart label, bar colour) for each category plotted
charts = [
    ('Rape', 'Rape cases', 'black'),
    ('Kidnapping_Abduction', 'Kidnapping and abduction cases', 'violet'),
    ('Dowry_Deaths', 'Dowry deaths', 'navy'),
    ('Assault_for_her_modesty', 'Assault on women to outrage her modesty', 'cyan'),
    ('Domestic_violence', 'Domestic violence cases', 'orange'),
    ('Importation_of_Girls', 'Importation of girls', 'red'),
]

# Total reported cases per year for each category, so each bar is one national yearly total
# (plotting the raw rows would draw many overlapping bars at the same year)
yearly_totals = crimes_df.groupby('Year')[[column for column, _, _ in charts]].sum()

fig, axes = plt.subplots(2, 3, figsize=(25, 12))
# Label each subplot through its own Axes; plt.xlabel/plt.ylabel only affect the last one
for ax, (column, label, color) in zip(axes.flat, charts):
    ax.bar(yearly_totals.index, yearly_totals[column], color=color)
    ax.set_title(f'{label} in India, 2001-2014')
    ax.set_xlabel('Year')
    ax.set_ylabel(label)
plt.show()

Two things can be concluded from the bar charts above:

1) The number of reported cases increased over the years.
2) 2014 appears to be the year with the most reported cases in most categories, such as rape and domestic violence. Importation of girls should be checked separately, since its single highest figure in the dataset (see below) dates from 2011.
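Reading the peak year off a bar chart is approximate, and the second conclusion concerns every category. A minimal sketch, assuming the `Year` column and the crime column names used in this notebook, computes the peak year of each category from the yearly totals; `peak_year_per_category` is a hypothetical helper and the sample numbers are made up.

```python
import pandas as pd

def peak_year_per_category(df: pd.DataFrame, columns: list) -> pd.Series:
    """For each column, return the year whose summed total is the largest."""
    yearly_totals = df.groupby('Year')[columns].sum()
    # idxmax returns the index label (here, the year) of each column's maximum
    return yearly_totals.idxmax()

# Made-up sample with two rows per year
sample = pd.DataFrame({
    'Year': [2011, 2011, 2014, 2014],
    'Rape': [5, 5, 8, 9],
    'Importation_of_Girls': [6, 4, 1, 0],
})
peaks = peak_year_per_category(sample, ['Rape', 'Importation_of_Girls'])
print(peaks)
```

Running `peak_year_per_category(crimes_df, [...])` with the six plotted columns would confirm or qualify the second point above with exact numbers.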
In [87]:
count_df = crimes_df.groupby('Year')[['STATE/UT']].count()
count_df
Out[87]:
In [88]:
import jovian
In [89]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Asking and Answering Questions

As part of this data analysis, it is crucial to ask questions and find answers to them. Here we try to answer some of the most important questions, which will help us draw conclusions from our dataset.

Q. Create a DataFrame containing the 10 highest counts of rape cases reported in India between 2001 and 2014.
In [90]:
max_rape_cases = crimes_df.sort_values('Rape', ascending = False).head(10)
max_rape_cases
Out[90]:
The table lists the records with the most reported rape cases, along with the state and the year in which they were reported.

Madhya Pradesh has the single highest figure, reported in 2014.

Q. Create a DataFrame containing the 10 highest counts of dowry deaths reported in India between 2001 and 2014.
In [91]:
max_dowry_death_cases = crimes_df.sort_values('Dowry_Deaths', ascending = False).head(10)
max_dowry_death_cases
Out[91]:

The highest number of dowry deaths was reported in Uttar Pradesh in 2014, with 2,469 cases.

It is also worth noting that Uttar Pradesh is the ONLY state that appears in this list.
Q. Create a DataFrame containing the 10 highest counts of domestic violence cases reported in India between 2001 and 2014.
In [92]:
max_domestic_violence_cases = crimes_df.sort_values('Domestic_violence', ascending = False).head(10)
max_domestic_violence_cases
Out[92]:

The highest number of domestic violence cases was reported in West Bengal in 2014, with 23,278 cases.

Q. Create a DataFrame containing the 10 highest counts of importation of girls reported in India between 2001 and 2014.
In [93]:
max_importation_case = crimes_df.sort_values('Importation_of_Girls', ascending = False).head(10)
max_importation_case
Out[93]:

The highest number of cases of importation of girls was reported in Bihar, in 2011.
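The four questions above repeat the same `sort_values(...).head(10)` pattern. When only the top record matters, a loop with `Series.idxmax` gives a one-row-per-category summary. A minimal sketch on made-up data; `largest_record_per_category` is a hypothetical helper, and it assumes the row index is unique, as it is for a freshly read CSV.

```python
import pandas as pd

def largest_record_per_category(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """For each column, return the state, year and count of its single largest row."""
    rows = []
    for column in columns:
        idx = df[column].idxmax()  # label of the first row holding the maximum
        rows.append({'category': column,
                     'state': df.at[idx, 'STATE/UT'],
                     'year': df.at[idx, 'Year'],
                     'cases': df.at[idx, column]})
    return pd.DataFrame(rows)

# Made-up sample (not taken from the dataset)
sample = pd.DataFrame({
    'STATE/UT': ['BIHAR', 'UTTAR PRADESH', 'WEST BENGAL'],
    'Year': [2011, 2014, 2014],
    'Dowry_Deaths': [30, 40, 10],
    'Importation_of_Girls': [5, 0, 2],
})
summary = largest_record_per_category(sample, ['Dowry_Deaths', 'Importation_of_Girls'])
print(summary)
```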

Q. Find the total number of cases in each category, state-wise, between 2001 and 2014.

In [110]:
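# Total cases per state for all seven categories
counts_df = crimes_df.groupby('STATE/UT')[['Rape', 'Kidnapping_Abduction', 'Dowry_Deaths',
                                           'Assault_for_her_modesty', 'Insult_to_modesty_of_Women',
                                           'Domestic_violence', 'Importation_of_Girls']].sum()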
counts_df
Out[110]:

Q. Find the top 5 states with the highest TOTAL number of reported cases between 2001 and 2014, for each category.

For rape cases:
In [111]:
counts_df.sort_values(by = 'Rape', ascending = False).head(5)
Out[111]:
Madhya Pradesh reported the highest TOTAL number of rape cases between 2001 and 2014, followed by Uttar Pradesh, Maharashtra, West Bengal and Rajasthan.
For kidnapping and abduction cases:
In [112]:
counts_df.sort_values(by = 'Kidnapping_Abduction', ascending = False).head(5)
Out[112]:
Uttar Pradesh reported the highest TOTAL number of kidnapping and abduction cases between 2001 and 2014, followed by Rajasthan, Assam, West Bengal and Bihar.
For dowry deaths:
In [113]:
counts_df.sort_values(by = 'Dowry_Deaths', ascending = False).head(5)
Out[113]:
Uttar Pradesh reported the highest TOTAL number of dowry deaths between 2001 and 2014, followed by Bihar, Madhya Pradesh, Andhra Pradesh and West Bengal.
For domestic violence cases:
In [114]:
counts_df.sort_values(by = 'Domestic_violence', ascending = False).head(5)
Out[114]:
West Bengal reported the highest TOTAL number of domestic violence cases between 2001 and 2014, followed by Andhra Pradesh, Rajasthan, Uttar Pradesh and Maharashtra.
For importation of girls:
In [115]:
counts_df.sort_values(by = 'Importation_of_Girls', ascending = False).head(5)
Out[115]:
Bihar reported the highest TOTAL number of cases of importation of girls between 2001 and 2014, followed by Jharkhand, West Bengal, Madhya Pradesh and Karnataka.
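The five cells above repeat one pattern per column. A minimal sketch that builds every top-5 list in one pass with `Series.nlargest`, assuming a state-indexed table of totals shaped like `counts_df`; `top_states` is a hypothetical helper and the numbers are made up.

```python
import pandas as pd

def top_states(counts: pd.DataFrame, n: int = 5) -> dict:
    """Map each category column to the n states with the largest totals."""
    return {column: counts[column].nlargest(n).index.tolist()
            for column in counts.columns}

# Made-up state-level totals, indexed by state like counts_df
counts = pd.DataFrame(
    {'Rape': [30, 10, 20], 'Dowry_Deaths': [5, 25, 15]},
    index=pd.Index(['MADHYA PRADESH', 'BIHAR', 'UTTAR PRADESH'], name='STATE/UT'),
)
print(top_states(counts, n=2))
```

`top_states(counts_df)` would return the top 5 states for every category column at once; ties are resolved by keeping the first occurrence, which is the `nlargest` default.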
Q: Which state appears in both the top-10 list of rape cases and the top-10 list of importation of girls?
In [116]:
# An inner merge on all shared columns keeps only the records present in both top-10 tables
rape_and_importation = max_importation_case.merge(max_rape_cases)
rape_and_importation
Out[116]:
The records common to both lists belong to Madhya Pradesh, so it is the state that appears in both categories.
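The merge above joins on every shared column, so it answers a narrower question than the one asked: it returns records with the same state, year and counts in both top-10 tables, and a state that appears in both lists in different years would be missed. A minimal sketch of both checks on made-up data; `states_in_both` and `records_in_both` are hypothetical helper names.

```python
import pandas as pd

def states_in_both(top_a: pd.DataFrame, top_b: pd.DataFrame, column: str = 'STATE/UT') -> list:
    """States that appear anywhere in both top-N tables, in any year."""
    return sorted(set(top_a[column]) & set(top_b[column]))

def records_in_both(top_a: pd.DataFrame, top_b: pd.DataFrame) -> pd.DataFrame:
    """Rows identical in both tables: an inner merge on all shared columns."""
    return top_a.merge(top_b, how='inner')

# Made-up top-N tables with the same columns, as in the notebook
top_rape = pd.DataFrame({
    'STATE/UT': ['MADHYA PRADESH', 'RAJASTHAN'],
    'Year': [2014, 2013],
    'Rape': [90, 80],
    'Importation_of_Girls': [7, 0],
})
top_importation = pd.DataFrame({
    'STATE/UT': ['MADHYA PRADESH', 'RAJASTHAN'],
    'Year': [2014, 2010],
    'Rape': [90, 20],
    'Importation_of_Girls': [7, 5],
})
print(states_in_both(top_rape, top_importation))   # both states, any year
print(records_in_both(top_rape, top_importation))  # only the shared 2014 record
```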
Q: Which state appears in both the top-10 list of rape cases and the top-10 list of dowry deaths?
In [117]:
# An inner merge on all shared columns keeps only the records present in both top-10 tables
rape_and_dowry = max_dowry_death_cases.merge(max_rape_cases)
rape_and_dowry
Out[117]:
Uttar Pradesh is the state whose records appear in both of these lists.
In [118]:
import jovian
In [119]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Inferences and Conclusion

The main aim of this project was to analyse the situation of women in India between 2001 and 2014, based on reported crimes.
We analysed the data through charts and by asking a series of important questions. Here are the main findings of this project:
      1) More than 5 million crimes against women were reported over this period, ranging from rape to the importation of girls.
      2) The series of bar charts suggests that 2014 was the year with the most reported cases in most categories; the single largest figure for importation of girls, however, dates from 2011.
      3) We found the 10 highest figures reported in several categories, along with the state and year of each: Madhya Pradesh had the highest number of rape cases (2014), Uttar Pradesh the most dowry deaths (2014), West Bengal the most domestic violence cases (2014) and Bihar the most cases of importation of girls (2011).
      4) We summarised the TOTAL number of cases in each category, state-wise, for 2001-2014.
      5) We identified the top 5 states with the most TOTAL cases reported between 2001 and 2014, for each category.
      6) We merged pairs of top-10 lists to find the records they share: Madhya Pradesh for "rape cases" and "importation of girls", and Uttar Pradesh for "rape cases" and "dowry deaths".
In [122]:
import jovian
In [123]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

References and Future Work

References:

1) Most of my doubts were cleared at https://stackoverflow.com
2) For more about pandas and its functions in detail: https://pandas.pydata.org
3) For more about Matplotlib and its library: https://matplotlib.org
4) For many other coding questions, I referred to: https://www.w3schools.com

Future Work:

I want to continue working on the topic of women's safety in our society, and to analyse a similar dataset that is not limited to one country but covers the whole world.

In [ ]:
import jovian
In [ ]:
jovian.commit()