PS: You need to have the CSV file uploaded to the Jupyter notebook. Here is the link to the dataset:

Crimes against women in India

This project contains records of all types of crime against women in India over the years 2001-2014. The main aim of this project is to reflect the situation of women in our society and to raise concern about this matter.

This project is part of my Data Analysis with Python: Zero to Pandas course.
In [31]:
import jovian
In [ ]:
jovian.commit(project='crime-against-women', environment=None)
[jovian] Attempting to save notebook..
In [ ]:
!pip install pandas 
!pip install matplotlib 
!pip install seaborn
!pip install plotly
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

Importing Data

In [ ]:
crimes_df = pd.read_csv('crimes_against_women.csv')
In [ ]:
Let us find out the number of rows and columns in the dataset.
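A minimal sketch of this check, using a small hypothetical DataFrame named sample_df (placeholder values, not records from the dataset) in place of crimes_df. DataFrame.shape is a (rows, columns) tuple, so on the real data the same information comes from crimes_df.shape.

```python
import pandas as pd

# Hypothetical stand-in for crimes_df: placeholder values, not real records
sample_df = pd.DataFrame({
    'STATE/UT': ['STATE A', 'STATE A', 'STATE B'],
    'DISTRICT': ['D1', 'D2', 'D3'],
    'Year': [2001, 2002, 2001],
    'Rape': [10, 12, 3],
})

# .shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = sample_df.shape
print(f"{n_rows} rows, {n_cols} columns")  # prints: 3 rows, 4 columns
```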
In [ ]:
import jovian
In [ ]:

Data Preparation and Cleaning

To start with the very basics of data cleaning, let's find out whether any of the columns have null or missing values.
In [ ]:
overall_crime = crimes_df.isna().sum()
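A sketch of two ways to read the result of a null check, assuming a hypothetical sample_df with one missing value planted on purpose: isna().sum() gives a count per column, while isna().any().any() reduces the whole frame to a single True/False answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one deliberately missing value
sample_df = pd.DataFrame({
    'Year': [2001, 2002, 2003],
    'Rape': [10, np.nan, 7],
})

# Number of missing (NaN/None) cells in each column
missing_per_column = sample_df.isna().sum()
# A single True/False answer: is any cell in the whole frame missing?
has_missing = bool(sample_df.isna().any().any())

print(missing_per_column)
print(has_missing)  # True
```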
In [ ]:
None of the columns has any null values.
Now let's find the total number of unique districts where the crimes have been committed.
In [ ]:
districts = len(crimes_df.DISTRICT.unique())
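A small illustration, on a hypothetical district_col Series, of how two common ways of counting distinct values treat missing entries: len(series.unique()) counts NaN as a value, whereas nunique() drops it unless dropna=False is passed. Since the null check above found no missing values, both give the same district count on this dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical DISTRICT column with one duplicate and one missing entry
district_col = pd.Series(['D1', 'D2', 'D1', np.nan])

print(len(district_col.unique()))          # 3: unique() keeps NaN as a value
print(district_col.nunique())              # 2: NaN is dropped by default
print(district_col.nunique(dropna=False))  # 3: NaN counted again
```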
In [ ]:
But India has 718 districts in total, so a unique count that does not match this figure means the column contains a large amount of messy or incorrect data. In this case, it is better to drop the "DISTRICT" column, and also "Unnamed: 0", as neither is of any use in our data analysis.
In [ ]:
crimes_df.drop(['DISTRICT', 'Unnamed: 0'], axis = 1, inplace=True)

In [ ]:
Now, let's analyse the "STATE/UT" column. For that, let's first list the names of all the states/UTs using .unique().
In [ ]:
From the output above we can see that there are many repeated entries: some are repeated in capital letters, some differ only in spacing (like A&N Islands), and Delhi is repeated as Delhi UT.
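To see how often each spelling occurs, value_counts() is a useful companion to unique(). A minimal sketch on a hypothetical states Series whose entries imitate the kinds of variants described above (placeholder values, not the dataset's actual list); on the real data the equivalent is crimes_df['STATE/UT'].value_counts().

```python
import pandas as pd

# Hypothetical STATE/UT values imitating the case, spacing and naming variants
states = pd.Series(['Assam', 'ASSAM', 'A&N Islands', 'A & N Islands',
                    'Delhi UT', 'DELHI', 'Goa '])

# Every distinct raw spelling, in order of first appearance
print(states.unique())
# How many rows use each spelling
print(states.value_counts())
print(states.nunique())  # 7 raw spellings for only 4 real places
```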
In [19]:
# First, strip stray spaces and convert every name to uppercase, so that
# names differing only in letter case or surrounding spaces become identical
crimes_df['STATE/UT'] = crimes_df['STATE/UT'].str.strip().str.upper()

# Now use replace() to merge the other kinds of repeated names discussed above.
# Assigning the result back (rather than calling replace(..., inplace=True) on the
# selected column) reliably updates crimes_df, including under pandas copy-on-write
crimes_df['STATE/UT'] = crimes_df['STATE/UT'].replace({
    "A&N ISLANDS": "A & N ISLANDS",
    "D&N HAVELI": "D & N HAVELI",
    "DELHI UT": "DELHI",
})
Let's look at the cleaned values now!
In [20]:
Let's check the total number of states and UTs.
In [21]:
The count comes out as expected, so we are done with the data cleaning process for our dataset.
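A minimal sketch of this check on hypothetical values (not the dataset): after trimming, uppercasing and merging the known variants, nunique() counts the distinct names that remain. On the real data, crimes_df['STATE/UT'].nunique() can be compared in the same way against the number of states and UTs you expect.

```python
import pandas as pd

# Hypothetical raw STATE/UT values with case, spacing and naming variants
states = pd.Series(['Assam', 'ASSAM', 'A&N Islands', 'A & N Islands',
                    'Delhi UT', 'DELHI', 'Goa '])

# Same steps as the cleaning cell: trim, uppercase, then merge known variants
cleaned = (states.str.strip()
                 .str.upper()
                 .replace({'A&N ISLANDS': 'A & N ISLANDS',
                           'D&N HAVELI': 'D & N HAVELI',
                           'DELHI UT': 'DELHI'}))

print(sorted(cleaned.unique()))  # ['A & N ISLANDS', 'ASSAM', 'DELHI', 'GOA']
print(cleaned.nunique())         # 4
```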
In [22]:
import jovian
In [23]:
[jovian] Attempting to save notebook..
[jovian] Updating notebook "sathi-satb/crime-against-women" on
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully!

Exploratory Analysis and Visualization

Let us find out the total number of women who, over the years 2001-2014, have been victims of crimes based on their gender.
In [24]:
victims_raped = crimes_df.Rape.sum()
victims_kidnapped_abducted = crimes_df.Kidnapping_Abduction.sum()
dowry_deaths = crimes_df.Dowry_Deaths.sum()
modesty_assault = crimes_df.Assault_for_her_modesty.sum()
insult_to_modesty = crimes_df.Insult_to_modesty_of_Women.sum()
domestic_violence = crimes_df.Domestic_violence.sum()
girls_imported = crimes_df.Importation_of_Girls.sum()
In [25]:
# Add up all seven crime categories, each exactly once
total_population_of_victim_overall = (victims_raped + victims_kidnapped_abducted + dowry_deaths
                                      + modesty_assault + insult_to_modesty
                                      + domestic_violence + girls_imported)
total_population_of_victim_overall
This analysis portrays a heartbreaking picture of the situation of women in our society: over the years 2001-2014, more than 5 million women in India alone have been victims of assault, violence, rape or even death.
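The same total can also be written by summing over an explicit list of the seven category columns, which keeps every category counted exactly once and gives a per-category breakdown along the way. A sketch assuming a hypothetical two-row sample_df with placeholder counts; on the real data, use crimes_df in place of sample_df.

```python
import pandas as pd

# The seven crime-category columns used in the total above
crime_columns = ['Rape', 'Kidnapping_Abduction', 'Dowry_Deaths',
                 'Assault_for_her_modesty', 'Insult_to_modesty_of_Women',
                 'Domestic_violence', 'Importation_of_Girls']

# Hypothetical two-row frame with placeholder counts
sample_df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7],
                          [10, 20, 30, 40, 50, 60, 70]], columns=crime_columns)

per_category = sample_df[crime_columns].sum()  # one total per column
overall_total = per_category.sum()             # grand total across all categories
print(per_category)
print(overall_total)  # 308
```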
In [26]:
cases_2001_df = crimes_df[crimes_df.Year == 2001]
In [27]:
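# Distribution of recorded rape cases across the 2001 records
plt.hist(cases_2001_df.Rape, bins=20)
plt.xlabel('Rape cases per record (2001)')
plt.ylabel('Number of records');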
In [ ]:
Now let us analyse each type of case separately using bar charts!
In [28]:
fig, axes = plt.subplots(2, 3, figsize=(25, 12))

# Total cases per year for each category: summing first avoids drawing
# one overlapping bar per row at the same x position
yearly_totals = crimes_df.groupby('Year').sum(numeric_only=True)

# (column, chart title, y-axis label, bar colour) for each of the six panels
charts = [
    ('Rape', "Chart of rape cases in India in 2001-2014", 'Cases of rape in India', 'black'),
    ('Kidnapping_Abduction', "Chart of kidnapping and abduction cases in India in 2001-2014", 'Cases of kidnapping and abduction in India', 'violet'),
    ('Dowry_Deaths', "Chart of dowry death cases in India in 2001-2014", 'Cases of dowry deaths in India', 'navy'),
    ('Assault_for_her_modesty', "Chart of assault on women's modesty in India in 2001-2014", "Cases of assault on a woman's modesty in India", 'cyan'),
    ('Domestic_violence', "Chart of domestic violence cases in India in 2001-2014", 'Cases of domestic violence in India', 'orange'),
    ('Importation_of_Girls', "Chart of importation of girls in India in 2001-2014", 'Cases of importation of girls in India', 'red'),
]

# Label each Axes with its own setters: plt.xlabel()/plt.ylabel() only
# affect the current (last-created) Axes
for ax, (column, title, ylabel, color) in zip(axes.flat, charts):
    ax.set_title(title)
    ax.bar(yearly_totals.index, yearly_totals[column], color=color)
    ax.set_xlabel('Year')
    ax.set_ylabel(ylabel)

There are two conclusions to be drawn from the bar charts above:

1) The number of reported cases has increased over the years.
2) 2014 was the year in which violence against women was reported the most, in every category, such as rape and domestic violence.
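When bars are drawn with one bar per row, all rows from the same year overlap at the same x position, so the visible height is that of the single largest record rather than the yearly total. A minimal sketch, assuming a hypothetical sample_df with several records per year (placeholder values), of aggregating by year first and then using idxmax() to find the peak year of each category. On the real data, crimes_df.groupby('Year').sum(numeric_only=True) skips the text column STATE/UT.

```python
import pandas as pd

# Hypothetical rows: several records per year, as in crimes_df
sample_df = pd.DataFrame({
    'Year': [2012, 2012, 2013, 2013, 2014, 2014],
    'Rape': [5, 7, 6, 8, 9, 9],
    'Dowry_Deaths': [3, 1, 2, 2, 1, 1],
})

# One row per year: the sum of all records for that year
yearly_totals = sample_df.groupby('Year').sum()

# Year with the highest total in each category (first year wins on ties)
peak_year = yearly_totals.idxmax()
print(yearly_totals)
print(peak_year)  # Rape -> 2014, Dowry_Deaths -> 2012
```

With these placeholder numbers the two categories peak in different years, which is exactly the kind of difference this per-category check is meant to surface.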
In [29]:
rapecase_data = crimes_df["Rape"]
kidnapping_abd_case = crimes_df["Kidnapping_Abduction"]
In [30]:
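# Pass the two totals together as the wedge sizes; labels go by keyword
plt.pie([rapecase_data.sum(), kidnapping_abd_case.sum()],
        labels=['Rape', 'Kidnapping & Abduction'],
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.title("Share of rape and kidnapping/abduction cases in India, 2001-2014");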
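In matplotlib.pyplot.pie(x, explode=None, labels=None, ...) the first argument holds the wedge sizes and the second positional parameter is explode, the offset of each wedge as a fraction of the radius. Passing a second data column positionally therefore offsets the wedges by thousands of radii, and the tight bounding box used by the notebook's inline backend then tries to render an enormous image. A sketch with placeholder totals (not computed from the dataset); the Agg backend line is only there so it runs as a plain script.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; not needed in Jupyter
import matplotlib.pyplot as plt

# Placeholder totals standing in for rapecase_data.sum() and kidnapping_abd_case.sum()
totals = [300, 450]
labels = ['Rape', 'Kidnapping & Abduction']

fig, ax = plt.subplots()
# First argument: wedge sizes. Everything else is passed by keyword, so nothing
# lands in the second positional slot (explode) by accident.
wedges, texts, autotexts = ax.pie(totals, labels=labels, autopct='%1.1f%%',
                                  shadow=True, startangle=140)
ax.set_title('Share of cases (placeholder values)')
fig.canvas.draw()  # render the figure once, off-screen

print([t.get_text() for t in autotexts])  # ['40.0%', '60.0%']
```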
In [63]:
import jovian
In [64]:
[jovian] Attempting to save notebook..
[jovian] Updating notebook "sathi-satb/crime-against-women" on
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully!

Asking and Answering Questions

In [29]:
import jovian
In [30]:
[jovian] Attempting to save notebook..
[jovian] Updating notebook "sathi-satb/crime-against-women" on
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully!

Inferences and Conclusion

In [45]:
import jovian
In [ ]:
[jovian] Attempting to save notebook..

References and Future Work

In [ ]:
import jovian
In [ ]: