Jovian
⭐️
Sign In

PS - You need to have the CSV file uploaded in the jupyter notebook. Here is the link to the dataset - https://www.kaggle.com/greeshmagirish/crime-against-women-20012014-india

Crimes against women in India

This project contains the data record of all types of crime against women in india in the span of year 2001-2014. Main aim of working on this project is to reflect the situation of women in our society and raise concern about this matter.

This project is part of my Data Analysis with Python: Zero to Pandas course.
In [1]:
import jovian
In [2]:
jovian.commit(project='crime-against-women', environment=None)
[jovian] Attempting to save notebook.. [jovian] Please enter your API key ( from https://jovian.ml/ ): API KEY: ········ [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women
In [3]:
!pip install pandas 
!pip install matplotlib 
!pip install seaborn
!pip install plotly
Requirement already satisfied: pandas in /srv/conda/envs/notebook/lib/python3.7/site-packages (1.1.2) Requirement already satisfied: numpy>=1.15.4 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (1.19.2) Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (2.8.1) Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (2020.1) Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0) Requirement already satisfied: matplotlib in /srv/conda/envs/notebook/lib/python3.7/site-packages (3.3.2) Requirement already satisfied: kiwisolver>=1.0.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (1.2.0) Requirement already satisfied: python-dateutil>=2.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (2.8.1) Requirement already satisfied: numpy>=1.15 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (1.19.2) Requirement already satisfied: pillow>=6.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (7.2.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (2.4.7) Requirement already satisfied: cycler>=0.10 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (0.10.0) Requirement already satisfied: certifi>=2020.06.20 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (2020.6.20) Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.1->matplotlib) (1.15.0) Requirement already satisfied: seaborn in /srv/conda/envs/notebook/lib/python3.7/site-packages (0.11.0) Requirement already satisfied: numpy>=1.15 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (1.19.2) Requirement already satisfied: matplotlib>=2.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (3.3.2) Requirement already satisfied: pandas>=0.23 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (1.1.2) Requirement already satisfied: scipy>=1.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (1.5.2) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2.4.7) Requirement already satisfied: cycler>=0.10 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (0.10.0) Requirement already satisfied: kiwisolver>=1.0.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (1.2.0) Requirement already satisfied: python-dateutil>=2.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2.8.1) Requirement already satisfied: pillow>=6.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (7.2.0) Requirement already satisfied: certifi>=2020.06.20 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2020.6.20) Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2020.1) Requirement already satisfied: six in /srv/conda/envs/notebook/lib/python3.7/site-packages (from cycler>=0.10->matplotlib>=2.2->seaborn) (1.15.0) Requirement already satisfied: plotly in /srv/conda/envs/notebook/lib/python3.7/site-packages (4.10.0) Requirement already satisfied: six in /srv/conda/envs/notebook/lib/python3.7/site-packages (from plotly) (1.15.0) Requirement already satisfied: retrying>=1.3.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from plotly) (1.3.3)
In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

Importing Data

In [5]:
jovian.commit('crimes_against_women.csv')
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women
In [6]:
crimes_df = pd.read_csv('crimes_against_women.csv')
Let us find out the number of rows and column of the particular dataset
In [8]:
crimes_df.shape
Out[8]:
(10677, 11)
In [9]:
import jovian
In [10]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Data Preparation and Cleaning

To start with the very basic of data cleaning, let's find out if any of the columns have any Null or missing values
In [11]:
overall_crime = crimes_df.isna().sum()
In [12]:
overall_crime
Out[12]:
Unnamed: 0                    0
STATE/UT                      0
DISTRICT                      0
Year                          0
Rape                          0
Kidnapping_Abduction          0
Dowry_Deaths                  0
Assault_for_her_modesty       0
Insult_to_modesty_of_Women    0
Domestic_violence             0
Importation_of_Girls          0
dtype: int64
None of the column has any Null values.
Now let's find the total number of 'Unique' districts, where the crimes have been committed
In [13]:
districts = len(crimes_df.DISTRICT.unique())
In [14]:
districts
Out[14]:
1605
But there are 718 districts in India, in total, which means there is messy or false datas in a huge amount, in this case, we better drop the column "District" and also "Unnamed: 0", as it is of no use, in our data analysis process.
In [15]:
crimes_df.drop(['DISTRICT', 'Unnamed: 0'], axis = 1, inplace=True)

In [16]:
crimes_df
Out[16]:
Now, let's start with analysing the datas of the column "STATE/UT", for that let's find out the names of all the states/UT through .unique()
In [17]:
print(crimes_df['STATE/UT'].unique())
['ANDHRA PRADESH' 'ARUNACHAL PRADESH' 'ASSAM' 'BIHAR' 'CHHATTISGARH' 'GOA' 'GUJARAT' 'HARYANA' 'HIMACHAL PRADESH' 'JAMMU & KASHMIR' 'JHARKHAND' 'KARNATAKA' 'KERALA' 'MADHYA PRADESH' 'MAHARASHTRA' 'MANIPUR' 'MEGHALAYA' 'MIZORAM' 'NAGALAND' 'ODISHA' 'PUNJAB' 'RAJASTHAN' 'SIKKIM' 'TAMIL NADU' 'TRIPURA' 'UTTAR PRADESH' 'UTTARAKHAND' 'WEST BENGAL' 'A & N ISLANDS' 'CHANDIGARH' 'D & N HAVELI' 'DAMAN & DIU' 'DELHI' 'LAKSHADWEEP' 'PUDUCHERRY' 'Andhra Pradesh' 'Arunachal Pradesh' 'Assam' 'Bihar' 'Chhattisgarh' 'Goa' 'Gujarat' 'Haryana' 'Himachal Pradesh' 'Jammu & Kashmir' 'Jharkhand' 'Karnataka' 'Kerala' 'Madhya Pradesh' 'Maharashtra' 'Manipur' 'Meghalaya' 'Mizoram' 'Nagaland' 'Odisha' 'Punjab' 'Rajasthan' 'Sikkim' 'Tamil Nadu' 'Tripura' 'Uttar Pradesh' 'Uttarakhand' 'West Bengal' 'A&N Islands' 'Chandigarh' 'D&N Haveli' 'Daman & Diu' 'Delhi UT' 'Lakshadweep' 'Puducherry' 'Telangana' 'A & N Islands']
We can see from above that there are lot many repeated datas, like some of them are repeated again by using capital letters and some of them have issues with space too, like A&N Islands and also Delhi has been repeated again by mentioning it as Delhi UT
In [18]:
# Fist we will remove all the repeated uppercase values
def remove_uppercase(r):
    r = r['STATE/UT'].strip()
    r = r.upper()
    return r
crimes_df['STATE/UT'] = crimes_df.apply(remove_uppercase, axis=1)

#Now use replace function to replace the other type of repeated datas as dicussed above
crimes_df['STATE/UT'].replace("A&N ISLANDS", "A & N ISLANDS", inplace = True)
crimes_df['STATE/UT'].replace("D&N HAVELI", "D & N HAVELI", inplace = True)
crimes_df['STATE/UT'].replace("DELHI UT", "DELHI", inplace = True)
Let's go through the datas now!
In [19]:
crimes_df['STATE/UT'].unique()
Out[19]:
array(['ANDHRA PRADESH', 'ARUNACHAL PRADESH', 'ASSAM', 'BIHAR',
       'CHHATTISGARH', 'GOA', 'GUJARAT', 'HARYANA', 'HIMACHAL PRADESH',
       'JAMMU & KASHMIR', 'JHARKHAND', 'KARNATAKA', 'KERALA',
       'MADHYA PRADESH', 'MAHARASHTRA', 'MANIPUR', 'MEGHALAYA', 'MIZORAM',
       'NAGALAND', 'ODISHA', 'PUNJAB', 'RAJASTHAN', 'SIKKIM',
       'TAMIL NADU', 'TRIPURA', 'UTTAR PRADESH', 'UTTARAKHAND',
       'WEST BENGAL', 'A & N ISLANDS', 'CHANDIGARH', 'D & N HAVELI',
       'DAMAN & DIU', 'DELHI', 'LAKSHADWEEP', 'PUDUCHERRY', 'TELANGANA'],
      dtype=object)
Let's check the total number of States+UT
In [20]:
len(crimes_df['STATE/UT'].unique())
Out[20]:
36
Which is coming out perfect, hence we are done with our data cleaning process of our dataset
In [21]:
import jovian
In [22]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Exploratory Analysis and Visualization

Let us find out the total population of women over the years, 2001-2014, who has been a victim of the crime based on their gender.
In [23]:
victims_raped = crimes_df.Rape.sum()
victims_kidnapped_abducted = crimes_df.Kidnapping_Abduction.sum()
dowery_death = crimes_df.Dowry_Deaths.sum()
modesty_assault = crimes_df.Assault_for_her_modesty.sum()
insult_to_modesty = crimes_df.Insult_to_modesty_of_Women.sum()
domestic_violence = crimes_df.Domestic_violence.sum()
girls_imported = crimes_df.Importation_of_Girls.sum()
In [24]:
total_population_of_victim_overall = victims_raped + victims_raped + dowery_death +modesty_assault+ insult_to_modesty + domestic_violence+ girls_imported
total_population_of_victim_overall
Out[24]:
5194570
This analysis potrays a heartbreaking situation of women in our society, as more than 5 million number of females, over the years 2001-2014, have been a victim of assault, violance, rape or even death, in India alone.
In [25]:
cases_2001_df = crimes_df[crimes_df.Year == 2001]
In [26]:
plt.figure(figsize=(14,6))
plt.hist(cases_2001_df,bins=np.arange(6, 7, 0.25));
In [ ]:
 
Now let us analyse the all the cases sepately by using bar graph!
In [27]:
fig, axes = plt.subplots(2, 3, figsize=(25, 12))

axes[0,0].set_title("Chart of rape cases in India in 2001-2014")
axes[0,0].bar(crimes_df.Year, crimes_df.Rape, color = 'black');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Rape in India') #Y-axis

axes[0,1].set_title("Chart of Kidnapping and Abduction cases in India in 2001-2014")
axes[0,1].bar(crimes_df.Year, crimes_df.Kidnapping_Abduction, color = 'violet');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Kidnappinga and Abduction in India') #Y-axis

axes[0,2].set_title("Chart of Dowry death cases in India in 2001-2014")
axes[0,2].bar(crimes_df.Year, crimes_df.Dowry_Deaths, color = 'navy');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Dowry deaths in India') #Y-axis

axes[1,0].set_title("Chart of Assault to her modesty in 2001-2014")
axes[1,0].bar(crimes_df.Year, crimes_df.Assault_for_her_modesty, color = 'cyan');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Assaulting a women for her modesty in India') #Y-axis

axes[1,1].set_title("Chart of Domestic Violence cases in India in 2001-2014")
axes[1,1].bar(crimes_df.Year, crimes_df.Domestic_violence, color = 'orange');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Domestic Violance in India') #Y-axis

axes[1,2].set_title("Chart of Importation of girls in India in 2001-2014")
axes[1,2].bar(crimes_df.Year, crimes_df.Domestic_violence, color = 'red');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases ofImportation of girls in India') #Y-axis
Out[27]:
Text(0, 0.5, 'Cases ofImportation of girls in India')

There are two things to be concluded from the above bar chart -

1) The cases have incraesed over the years.
2) 2014 has been the year, where violance against women was reported the maximum, under each of the cases like, rape, domestic violance etc.
In [87]:
count_df = crimes_df.groupby('Year')[['STATE/UT']].count()
count_df
Out[87]:
In [98]:
plt.pie(count_df)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-98-34462486c319> in <module> ----> 1 plt.pie(count_df) /srv/conda/envs/notebook/lib/python3.7/site-packages/matplotlib/pyplot.py in pie(x, explode, labels, colors, autopct, pctdistance, shadow, labeldistance, startangle, radius, counterclock, wedgeprops, textprops, center, frame, rotatelabels, normalize, data) 2832 wedgeprops=wedgeprops, textprops=textprops, center=center, 2833 frame=frame, rotatelabels=rotatelabels, normalize=normalize, -> 2834 **({"data": data} if data is not None else {})) 2835 2836 /srv/conda/envs/notebook/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs) 1436 def inner(ax, *args, data=None, **kwargs): 1437 if data is None: -> 1438 return func(ax, *map(sanitize_sequence, args), **kwargs) 1439 1440 bound = new_sig.bind(ax, *args, **kwargs) /srv/conda/envs/notebook/lib/python3.7/site-packages/matplotlib/axes/_axes.py in pie(self, x, explode, labels, colors, autopct, pctdistance, shadow, labeldistance, startangle, radius, counterclock, wedgeprops, textprops, center, frame, rotatelabels, normalize) 2995 x = np.asarray(x, np.float32) 2996 if x.ndim > 1: -> 2997 raise ValueError("x must be 1D") 2998 2999 if np.any(x < 0): ValueError: x must be 1D
In [ ]:
 
In [99]:
import jovian
In [100]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Asking and Answering Questions

As part of this data analysis, it is very crucial to find out, where the maximum number of cases happened along with which year it has been reported into.

Q . Create a dataframe containing 10 highest reported rape cases in India, in the span of year 2001-2014.
In [113]:
crimes_df.drop(['Assault_for_her_modesty', 'Insult_to_modesty_of_Women'], axis = 1, inplace=True)
max_rape_cases = crimes_df.sort_values('Rape', ascending = False).head(10)
max_rape_cases
Out[113]:
We see from the analysis, the top states that reported the maximum number of rape cases, along with the year, in which they occured.

Where, Madhya Pradesh reported the maximum rape cases in the year 2014.

Q. Create a dataframe containing 10 highest reported deaths caused by Dowry cases in India, in the span of year 2001-2014.
In [114]:
max_dowry_death_cases = crimes_df.sort_values('Dowry_Deaths', ascending = False).head(10)
max_dowry_death_cases
Out[114]:

From our analysis, we observe that the highest reported dowry death was in Uttar Pradesh in 2014, with number of reports being 2469.

One more here to be observed as well is that, Uttar Pradesh is the ONLY state that appears in this list.
Q. Create a dataframe containing 10 highest reported Domestic Violance cases in India, in the span of year 2001-2014.
In [133]:
max_domestic_violance_cases = crimes_df.sort_values('Domestic_violence', ascending = False).head(10)
max_domestic_violance_cases
Out[133]:

According to our analysis, we see the maximum cases of Domestic Violance cases came from West Bengal in the year 2014, with number of cases reported being 23278.

Q. Create a dataframe containing 10 highest reported Importation cases in India, in the span of year 2001-2014.
In [134]:
max_importation_case = crimes_df.sort_values('Importation_of_Girls', ascending = False).head(10)
max_importation_case
Out[134]:
According to our analysis, maximum of Importation of girls has been reported in Bihar in the year 2011

Q. Find out the total number of cases, in span of 2001-2014 under each category, state wise.

In [138]:
counts_df = crimes_df.groupby('STATE/UT')[['Rape', 'Kidnapping_Abduction', 'Dowry_Deaths','Domestic_violence', 'Importation_of_Girls']].sum()
counts_df
Out[138]:

Q. Find out the top 5 states, where maximum numbers of cases has been reported in TOTAL in span of 2001-2014, each category wise.

For "Rape" case -
In [153]:
counts_df.sort_values(by = 'Rape', ascending = False).head(5)
Out[153]:
Madhya Pradesh has reported the highest number of rape cases in TOTAL in span of 2001-2014, where UP, Maharastra, West Bengal and Rajasthan follows the list.
For Kidnapping and abduction case -
In [157]:
counts_df.sort_values(by = 'Kidnapping_Abduction', ascending = False).head(5)
Out[157]:
Uttar Pradesh has reported the highest number of cses under "Kidnapping and Abduction" in TOTAL in span of 2001-2014, where Rajasthan, Assam, West Bengal and Bihar follows the list.
For cases of deaths due to dowry -
In [161]:
counts_df.sort_values(by = 'Dowry_Deaths', ascending = False).head(5)
Out[161]:
Uttar Pradesh has reported the highest number of Deaths caused by Dowry cases in TOTAL in span of 2001-2014, where Bihar, Madhya Pradesh, Andhra Pradesh and West Bengal follows the list.
For Domestic Violance case -
In [159]:
counts_df.sort_values(by = 'Domestic_violence', ascending = False).head(5)
Out[159]:
West Bengal has reported the highest number of cases of Domestic Violance in TOTAL in span of 2001-2014, where Andhra Pradesh, Rajsthan, Uttar Pradesh and Maharashtra follows the list.
For Importation of Girls case -
In [160]:
counts_df.sort_values(by = 'Importation_of_Girls', ascending = False).head(5)
Out[160]:
Bihar has reported the highest number of Importation of girl's cases in TOTAL in span of 2001-2014, where Jharkhand, West Bengal, Madhya Pradesh and Karnataka follows the list.
In [162]:
import jovian
In [ ]:
jovian.commit()
[jovian] Attempting to save notebook..

Inferences and Conclusion

In [45]:
import jovian
In [ ]:
jovian.commit()
[jovian] Attempting to save notebook..

References and Future Work

In [ ]:
import jovian
In [ ]:
jovian.commit()
In [ ]: