Jovian
⭐️
Sign In

PS - You need to have the CSV file uploaded in the jupyter notebook. Here is the link to the dataset - https://www.kaggle.com/greeshmagirish/crime-against-women-20012014-india

Crimes against women in India

This project contains the data record of all types of crime against women in india in the span of year 2001-2014. Main aim of working on this project is to reflect the situation of women in our society and raise concern about this matter.

This project is part of my Data Analysis with Python: Zero to Pandas course.
In [1]:
import jovian
In [2]:
jovian.commit(project='crime-against-women', environment=None)
[jovian] Attempting to save notebook.. [jovian] Please enter your API key ( from https://jovian.ml/ ): API KEY: ········ [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women
In [3]:
!pip install pandas 
!pip install matplotlib 
!pip install seaborn
!pip install plotly
Requirement already satisfied: pandas in /srv/conda/envs/notebook/lib/python3.7/site-packages (1.1.2) Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (2020.1) Requirement already satisfied: numpy>=1.15.4 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (1.19.2) Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (2.8.1) Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0) Requirement already satisfied: matplotlib in /srv/conda/envs/notebook/lib/python3.7/site-packages (3.3.2) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (2.4.7) Requirement already satisfied: pillow>=6.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (7.2.0) Requirement already satisfied: numpy>=1.15 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (1.19.2) Requirement already satisfied: certifi>=2020.06.20 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (2020.6.20) Requirement already satisfied: kiwisolver>=1.0.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (1.2.0) Requirement already satisfied: python-dateutil>=2.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (2.8.1) Requirement already satisfied: cycler>=0.10 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib) (0.10.0) Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.1->matplotlib) (1.15.0) Requirement already satisfied: seaborn in /srv/conda/envs/notebook/lib/python3.7/site-packages (0.11.0) Requirement already satisfied: numpy>=1.15 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (1.19.2) Requirement already satisfied: scipy>=1.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (1.5.2) Requirement already satisfied: pandas>=0.23 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (1.1.2) Requirement already satisfied: matplotlib>=2.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from seaborn) (3.3.2) Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2020.1) Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2.8.1) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2.4.7) Requirement already satisfied: cycler>=0.10 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (0.10.0) Requirement already satisfied: certifi>=2020.06.20 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2020.6.20) Requirement already satisfied: pillow>=6.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (7.2.0) Requirement already satisfied: kiwisolver>=1.0.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (1.2.0) Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas>=0.23->seaborn) (1.15.0) Requirement already satisfied: plotly in /srv/conda/envs/notebook/lib/python3.7/site-packages (4.10.0) Requirement already satisfied: six in /srv/conda/envs/notebook/lib/python3.7/site-packages (from plotly) (1.15.0) Requirement already satisfied: retrying>=1.3.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from plotly) (1.3.3)
In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

Importing Data

In [ ]:
jovian.commit('crimes_against_women.csv')
[jovian] Attempting to save notebook..
In [ ]:
crimes_df = pd.read_csv('crimes_against_women.csv')
In [ ]:
crimes_df
Let us find out the number of rows and column of the particular dataset
In [ ]:
crimes_df.shape
In [ ]:
import jovian
In [11]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Data Preparation and Cleaning

To start with the very basic of data cleaning, let's find out if any of the columns have any Null or missing values
In [12]:
overall_crime = crimes_df.isna().sum()
In [13]:
overall_crime
Out[13]:
Unnamed: 0                    0
STATE/UT                      0
DISTRICT                      0
Year                          0
Rape                          0
Kidnapping_Abduction          0
Dowry_Deaths                  0
Assault_for_her_modesty       0
Insult_to_modesty_of_Women    0
Domestic_violence             0
Importation_of_Girls          0
dtype: int64
None of the column has any Null values.
Now let's find the total number of 'Unique' districts, where the crimes have been committed
In [14]:
districts = len(crimes_df.DISTRICT.unique())
In [15]:
districts
Out[15]:
1605
But there are 718 districts in India, in total, which means there is messy or false datas in a huge amount, in this case, we better drop the column "District" and also "Unnamed: 0", as it is of no use, in our data analysis process.
In [16]:
crimes_df.drop(['DISTRICT', 'Unnamed: 0'], axis = 1, inplace=True)

In [17]:
crimes_df
Out[17]:
Now, let's start with analysing the datas of the column "STATE/UT", for that let's find out the names of all the states/UT through .unique()
In [18]:
print(crimes_df['STATE/UT'].unique())
['ANDHRA PRADESH' 'ARUNACHAL PRADESH' 'ASSAM' 'BIHAR' 'CHHATTISGARH' 'GOA' 'GUJARAT' 'HARYANA' 'HIMACHAL PRADESH' 'JAMMU & KASHMIR' 'JHARKHAND' 'KARNATAKA' 'KERALA' 'MADHYA PRADESH' 'MAHARASHTRA' 'MANIPUR' 'MEGHALAYA' 'MIZORAM' 'NAGALAND' 'ODISHA' 'PUNJAB' 'RAJASTHAN' 'SIKKIM' 'TAMIL NADU' 'TRIPURA' 'UTTAR PRADESH' 'UTTARAKHAND' 'WEST BENGAL' 'A & N ISLANDS' 'CHANDIGARH' 'D & N HAVELI' 'DAMAN & DIU' 'DELHI' 'LAKSHADWEEP' 'PUDUCHERRY' 'Andhra Pradesh' 'Arunachal Pradesh' 'Assam' 'Bihar' 'Chhattisgarh' 'Goa' 'Gujarat' 'Haryana' 'Himachal Pradesh' 'Jammu & Kashmir' 'Jharkhand' 'Karnataka' 'Kerala' 'Madhya Pradesh' 'Maharashtra' 'Manipur' 'Meghalaya' 'Mizoram' 'Nagaland' 'Odisha' 'Punjab' 'Rajasthan' 'Sikkim' 'Tamil Nadu' 'Tripura' 'Uttar Pradesh' 'Uttarakhand' 'West Bengal' 'A&N Islands' 'Chandigarh' 'D&N Haveli' 'Daman & Diu' 'Delhi UT' 'Lakshadweep' 'Puducherry' 'Telangana' 'A & N Islands']
We can see from above that there are lot many repeated datas, like some of them are repeated again by using capital letters and some of them have issues with space too, like A&N Islands and also Delhi has been repeated again by mentioning it as Delhi UT
In [19]:
# Fist we will remove all the repeated uppercase values
def remove_uppercase(r):
    r = r['STATE/UT'].strip()
    r = r.upper()
    return r
crimes_df['STATE/UT'] = crimes_df.apply(remove_uppercase, axis=1)

#Now use replace function to replace the other type of repeated datas as dicussed above
crimes_df['STATE/UT'].replace("A&N ISLANDS", "A & N ISLANDS", inplace = True)
crimes_df['STATE/UT'].replace("D&N HAVELI", "D & N HAVELI", inplace = True)
crimes_df['STATE/UT'].replace("DELHI UT", "DELHI", inplace = True)
Let's go through the datas now!
In [20]:
crimes_df['STATE/UT'].unique()
Out[20]:
array(['ANDHRA PRADESH', 'ARUNACHAL PRADESH', 'ASSAM', 'BIHAR',
       'CHHATTISGARH', 'GOA', 'GUJARAT', 'HARYANA', 'HIMACHAL PRADESH',
       'JAMMU & KASHMIR', 'JHARKHAND', 'KARNATAKA', 'KERALA',
       'MADHYA PRADESH', 'MAHARASHTRA', 'MANIPUR', 'MEGHALAYA', 'MIZORAM',
       'NAGALAND', 'ODISHA', 'PUNJAB', 'RAJASTHAN', 'SIKKIM',
       'TAMIL NADU', 'TRIPURA', 'UTTAR PRADESH', 'UTTARAKHAND',
       'WEST BENGAL', 'A & N ISLANDS', 'CHANDIGARH', 'D & N HAVELI',
       'DAMAN & DIU', 'DELHI', 'LAKSHADWEEP', 'PUDUCHERRY', 'TELANGANA'],
      dtype=object)
Let's check the total number of States+UT
In [21]:
len(crimes_df['STATE/UT'].unique())
Out[21]:
36
Which is coming out perfect, hence we are done with our data cleaning process of our dataset
In [22]:
import jovian
In [23]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Exploratory Analysis and Visualization

Let us find out the total population of women over the years, 2001-2014, who has been a victim of the crime based on their gender.
In [24]:
victims_raped = crimes_df.Rape.sum()
victims_kidnapped_abducted = crimes_df.Kidnapping_Abduction.sum()
dowery_death = crimes_df.Dowry_Deaths.sum()
modesty_assault = crimes_df.Assault_for_her_modesty.sum()
insult_to_modesty = crimes_df.Insult_to_modesty_of_Women.sum()
domestic_violence = crimes_df.Domestic_violence.sum()
girls_imported = crimes_df.Importation_of_Girls.sum()
In [25]:
total_population_of_victim_overall = victims_raped + victims_raped + dowery_death +modesty_assault+ insult_to_modesty + domestic_violence+ girls_imported
total_population_of_victim_overall
Out[25]:
5194570
This analysis potrays a heartbreaking situation of women in our society, as more than 5 million number of females, over the years 2001-2014, have been a victim of assault, violance, rape or even death, in India alone.
Now let us analyse the all the cases sepately by using bar graph!
In [26]:
fig, axes = plt.subplots(2, 3, figsize=(25, 12))

axes[0,0].set_title("Chart of rape cases in India in 2001-2014")
axes[0,0].bar(crimes_df.Year, crimes_df.Rape, color = 'black');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Rape in India') #Y-axis

axes[0,1].set_title("Chart of Kidnapping and Abduction cases in India in 2001-2014")
axes[0,1].bar(crimes_df.Year, crimes_df.Kidnapping_Abduction, color = 'violet');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Kidnappinga and Abduction in India') #Y-axis

axes[0,2].set_title("Chart of Dowry death cases in India in 2001-2014")
axes[0,2].bar(crimes_df.Year, crimes_df.Dowry_Deaths, color = 'navy');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Dowry deaths in India') #Y-axis

axes[1,0].set_title("Chart of Assault to her modesty in 2001-2014")
axes[1,0].bar(crimes_df.Year, crimes_df.Assault_for_her_modesty, color = 'cyan');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Assaulting a women for her modesty in India') #Y-axis

axes[1,1].set_title("Chart of Domestic Violence cases in India in 2001-2014")
axes[1,1].bar(crimes_df.Year, crimes_df.Domestic_violence, color = 'orange');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases of Domestic Violance in India') #Y-axis

axes[1,2].set_title("Chart of Importation of girls in India in 2001-2014")
axes[1,2].bar(crimes_df.Year, crimes_df.Domestic_violence, color = 'red');
plt.xlabel('Year') #X-axis
plt.ylabel('Cases ofImportation of girls in India') #Y-axis
Out[26]:
Text(0, 0.5, 'Cases ofImportation of girls in India')
Notebook Image

There are two things to be concluded from the above bar chart -

1) The cases have incraesed over the years.
2) 2014 has been the year, where violance against women was reported the maximum, under each of the cases like, rape, domestic violance etc.
In [53]:
plt.figure(figsize=(14,6))
cases_2001_df = crimes.df[crimes_df.Year == '2014']
plt.hist(crimes_df.Year, alpha = 0.4);

Notebook Image
In [ ]:
plt.hist(cases_2001_df.sepal_width, alpha=0.4, bins=np.arange(2, 5, 0.25));
plt.hist(versicolor_df.sepal_width, alpha=0.4, bins=np.arange(2, 5, 0.25));
In [ ]:
 
In [31]:
import jovian
In [32]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Asking and Answering Questions

In [29]:
import jovian
In [30]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "sathi-satb/crime-against-women" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/sathi-satb/crime-against-women

Inferences and Conclusion

In [45]:
import jovian
In [ ]:
jovian.commit()
[jovian] Attempting to save notebook..

References and Future Work

In [ ]:
import jovian
In [ ]:
jovian.commit()
In [ ]: