Important links:
This is the starter notebook for the course project for Data Analysis with Python: Zero to Pandas. You will pick a real-world dataset of your choice and apply the concepts learned in this course to perform exploratory data analysis. Use this starter notebook as an outline for your project. Focus on documentation and presentation - this Jupyter notebook will also serve as a project report, so make sure to include detailed explanations wherever possible using Markdown cells.
Your submission will be evaluated using the following criteria:
Follow this step-by-step guide to work on your project.
opendatasets Python library
Here's some sample code for downloading the US Elections Dataset:
import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)
You can find a list of recommended datasets here: https://jovian.ml/forum/t/recommended-datasets-for-course-project/11711
jovian.commit.
Refer to these projects for inspiration:
Analyzing your browser history using Pandas & Seaborn by Kartik Godawat
WhatsApp Chat Data Analysis by Prajwal Prashanth
Understanding the Gender Divide in Data Science Roles by Aakanksha N S
NOTE: Remove this cell containing the instructions before making your submission. You can do this using the "Edit > Delete Cells" menu option.
TODO - Write some introduction about your project here: describe the dataset, where you got it from, what you're trying to do with it, and which tools & techniques you're using. You can also mention the course Data Analysis with Python: Zero to Pandas, and what you've learned from it.
This is an executable Jupyter notebook hosted on Jovian.ml, a platform for sharing data science projects. You can run and experiment with the code in a couple of ways: using free online resources (recommended) or on your own computer.
The easiest way to start executing this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks. You can also select "Run on Colab" or "Run on Kaggle".
Install Conda by following these instructions. Add Conda binaries to your system PATH, so you can use the conda command on your terminal.
Create a Conda environment and install the required libraries by running these commands on the terminal:
conda create -n zerotopandas -y python=3.8
conda activate zerotopandas
pip install jovian jupyter numpy pandas matplotlib seaborn opendatasets --upgrade
jovian clone notebook-owner/notebook-id
cd directory-name
and start the Jupyter notebook:
jupyter notebook
You can now access Jupyter's web interface by clicking the link that shows up on the terminal or by visiting http://localhost:8888 on your browser. Click on the notebook file (it has a .ipynb extension) to open it.
TODO - add some explanation here
!pip install jovian opendatasets --upgrade --quiet
Let's begin by downloading the data, and listing the files within the dataset.
The dataset has been downloaded and extracted.
import pandas as pd
import numpy as np
from urllib.request import urlretrieve
urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/countries.csv',
'countries.csv')
('countries.csv', <http.client.HTTPMessage at 0x7f36fd1b52b0>)
Let us save and upload our work to Jovian before continuing.
project_name = "course-project-starter-pro" # change this (use lowercase letters and hyphens only)
!pip install jovian --upgrade -q
import jovian
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..
[jovian] Updating notebook "prapathade1111/course-project-starter-pro" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/prapathade1111/course-project-starter-pro
We have downloaded the dataset from the website. The next step is to load the CSV file into a pandas data frame.
countries_df = pd.read_csv('countries.csv')
countries_df
countries_df.shape
(210, 6)
countries_df.count()
location 210
continent 210
population 210
life_expectancy 207
hospital_beds_per_thousand 164
gdp_per_capita 183
dtype: int64
countries_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 210 entries, 0 to 209
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 location 210 non-null object
1 continent 210 non-null object
2 population 210 non-null float64
3 life_expectancy 207 non-null float64
4 hospital_beds_per_thousand 164 non-null float64
5 gdp_per_capita 183 non-null float64
dtypes: float64(4), object(2)
memory usage: 10.0+ KB
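The non-null counts above show that some columns have missing values. As a minimal sketch, the number and share of missing values per column can be computed with isna(). The data frame below is a small made-up sample that only borrows the column names of countries.csv; sample_df, missing_counts and missing_pct are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same column names as countries.csv (values are made up)
sample_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'continent': ['Asia', 'Asia', 'Europe', 'Africa'],
    'population': [1000.0, 2000.0, 3000.0, 4000.0],
    'life_expectancy': [70.0, np.nan, 80.0, 60.0],
    'hospital_beds_per_thousand': [np.nan, np.nan, 5.0, 1.0],
    'gdp_per_capita': [1500.0, 2500.0, np.nan, 500.0],
})

# Number of missing values in each column
missing_counts = sample_df.isna().sum()

# Share of rows that are missing in each column, as a percentage
missing_pct = (sample_df.isna().mean() * 100).round(1)

print(missing_counts)
print(missing_pct)
```

Running the same two expressions on countries_df should give the missing counts for the real dataset.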
countries_df.describe()
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "prapathade1111/course-project-starter-pro" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/prapathade1111/course-project-starter-pro
- We compute the mean, sum & other interesting statistics for the numeric columns
- We explore the distributions of numeric columns using histograms
- We also explore relationships between columns using scatter plots & bar charts
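As a rough sketch of the first step listed above, several statistics can be computed per continent in one call using named aggregation. The sample data and the output column names (countries, total_population, mean_life_expectancy) are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical sample (made-up values) using column names from countries.csv
sample_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe'],
    'population': [100.0, 300.0, 50.0, 150.0],
    'life_expectancy': [70.0, 74.0, 80.0, 82.0],
})

# One row per continent, one named column per statistic
summary_df = sample_df.groupby('continent').agg(
    countries=('location', 'count'),
    total_population=('population', 'sum'),
    mean_life_expectancy=('life_expectancy', 'mean'),
)

print(summary_df)
```

Choosing the aggregation per column avoids statistics that are hard to interpret, such as a sum of life expectancies.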
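# numeric_only=True skips the text column 'location', which recent pandas versions cannot average
countries_df.groupby('continent').mean(numeric_only=True)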
countries_df['population'].sum()
7757980095.0
countries_df['life_expectancy'].sum()
15220.68
countries_df['hospital_beds_per_thousand'].sum()
494.078
countries_df['gdp_per_capita'].sum()
3565921.9689999996
countries_df['gdp_per_capita'].mean()
19485.912398907098
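The mean of gdp_per_capita computed above gives every country the same weight, regardless of its size. As a hedged sketch of one alternative, a population-weighted average can be computed with np.average. The numbers below are made up, and valid_df, simple_mean and weighted_mean are names introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (made-up values) using column names from countries.csv
sample_df = pd.DataFrame({
    'location': ['A', 'B', 'C'],
    'population': [1_000_000.0, 9_000_000.0, 5_000_000.0],
    'gdp_per_capita': [50_000.0, 10_000.0, np.nan],
})

# Keep only rows where both the value and its weight are present
valid_df = sample_df.dropna(subset=['population', 'gdp_per_capita'])

# Every country counts equally
simple_mean = valid_df['gdp_per_capita'].mean()

# Every person counts equally
weighted_mean = np.average(valid_df['gdp_per_capita'], weights=valid_df['population'])

print(simple_mean, weighted_mean)
```

The two figures answer different questions: the first describes a typical country, the second a typical person.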
Let's begin by importing matplotlib.pyplot and seaborn.
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
We plot a bar graph of population by continent (by default, sns.barplot shows the mean population of the countries in each continent), then look at the distribution of country populations with a histogram.
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Mean country population by continent')
sns.barplot(x=countries_df['continent'], y=countries_df['population']);
plt.figure(figsize=(12, 6))
plt.title('Distribution of country populations')
plt.xlabel('population')
plt.ylabel('counts')
plt.hist(countries_df['population'], bins=10, color='purple');
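Country populations span several orders of magnitude, so a histogram with equal-width bins tends to put most countries in the first bin. As a minimal sketch with made-up values, binning log10(population) spreads the countries more evenly. Here population, raw_counts and log_counts are names introduced only for this example, and np.histogram stands in for the counting that plt.hist does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical populations spanning several orders of magnitude (made-up values)
population = pd.Series([1e4, 5e4, 2e5, 1e6, 3e6, 8e6, 2e7, 5e7, 3e8, 1.4e9])

# Equal-width bins on the raw scale: the largest value stretches the range
raw_counts, raw_edges = np.histogram(population, bins=5)

# Equal-width bins on the log10 scale
log_counts, log_edges = np.histogram(np.log10(population), bins=5)

print(raw_counts)
print(log_counts)
```

To draw the second version, np.log10(countries_df['population']) could be passed to plt.hist in place of the raw column, with the x-axis label changed to say that the scale is log10.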
We plot a bar graph of life_expectancy by continent (the mean across the countries in each continent), then look at the distribution of life_expectancy across countries with a histogram.
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Mean life_expectancy by continent')
sns.barplot(x=countries_df['continent'], y=countries_df['life_expectancy']);
plt.figure(figsize=(12, 6))
plt.title('Distribution of life_expectancy across countries')
plt.xlabel('life_expectancy')
plt.ylabel('counts')
# Drop missing values before binning
plt.hist(countries_df['life_expectancy'].dropna(), bins=10, color='red');
We plot a bar graph of hospital_beds_per_thousand by continent (the mean across the countries in each continent), then look at the distribution of hospital_beds_per_thousand across countries with a histogram.
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Mean hospital_beds_per_thousand by continent')
sns.barplot(x=countries_df['continent'], y=countries_df['hospital_beds_per_thousand']);
plt.figure(figsize=(12, 6))
plt.title('Distribution of hospital_beds_per_thousand across countries')
plt.xlabel('hospital_beds_per_thousand')
plt.ylabel('counts')
# Drop missing values before binning
plt.hist(countries_df['hospital_beds_per_thousand'].dropna(), bins=10, color='red');
We plot a bar graph of gdp_per_capita by continent (the mean across the countries in each continent), then look at the distribution of gdp_per_capita across countries with a histogram.
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Mean gdp_per_capita by continent')
sns.barplot(x=countries_df['continent'], y=countries_df['gdp_per_capita']);
plt.figure(figsize=(12, 6))
plt.title('Distribution of gdp_per_capita across countries')
plt.xlabel('gdp_per_capita')
plt.ylabel('counts')
# Drop missing values before binning
plt.hist(countries_df['gdp_per_capita'].dropna(), bins=10, color='blue');
We plot a scatter plot of population against life_expectancy, with points colored by continent.
sns.scatterplot( x=countries_df['population'], y=countries_df['life_expectancy'], hue=countries_df['continent'])
<AxesSubplot:xlabel='population', ylabel='life_expectancy'>
We plot a scatter plot of population against gdp_per_capita, with points colored by continent.
sns.scatterplot( x=countries_df['population'], y=countries_df['gdp_per_capita'], hue=countries_df['continent'])
<AxesSubplot:xlabel='population', ylabel='gdp_per_capita'>
We plot a scatter plot of life_expectancy against gdp_per_capita, with points colored by continent.
sns.scatterplot( x=countries_df['life_expectancy'], y=countries_df['gdp_per_capita'], hue=countries_df['continent'])
<AxesSubplot:xlabel='life_expectancy', ylabel='gdp_per_capita'>
Let us save and upload our work to Jovian before continuing.
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "prapathade1111/course-project-starter-pro" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/prapathade1111/course-project-starter-pro
TODO - write some explanation here.
Q: Add a new column in countries_df to record the overall GDP per country (product of population & per capita GDP).
countries_df['gdp'] = countries_df['population']*countries_df['gdp_per_capita']
countries_df
Q: Create a data frame showing the total population of each continent.
# Select the population column before summing so that only it is aggregated
continent_populations_df = countries_df.groupby('continent')['population'].sum().reset_index()
continent_populations_df
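Q: Create a data frame containing the 10 countries with the lowest GDP per capita, among the countries with a population greater than 100 million.
# Keep only the countries with more than 100 million people
populous_df = countries_df[countries_df['population'] > 100_000_000]
# Sort by GDP per capita (lowest first) and keep the first 10 rows
low_gdp_df = populous_df.sort_values(by='gdp_per_capita', ascending=True).head(10)
low_gdp_df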
Q: Create a graph showing the mean life_expectancy of each continent.
# Use the mean rather than the sum: adding up life expectancies across countries has no meaningful interpretation
continent_life_expectancy_df = countries_df.groupby('continent')['life_expectancy'].mean().reset_index()
continent_life_expectancy_df
sns.barplot(x=continent_life_expectancy_df['continent'], y=continent_life_expectancy_df['life_expectancy'])
<AxesSubplot:xlabel='continent', ylabel='life_expectancy'>
Q: Find the correlation between population & gdp.
sns.scatterplot( x=countries_df['population'], y=countries_df['gdp'], hue=countries_df['continent'])
<AxesSubplot:xlabel='population', ylabel='gdp'>
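The scatter plot shows the relationship visually. To put a number on it, one option is the Pearson correlation coefficient from Series.corr. This is a minimal sketch on made-up values: sample_df and pearson_r are names chosen for illustration, and the data is deliberately perfectly linear so that the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (made-up values); gdp is exactly 2 * population where present
sample_df = pd.DataFrame({
    'population': [1.0, 2.0, 3.0, 4.0, 5.0],
    'gdp': [2.0, 4.0, 6.0, 8.0, np.nan],
})

# Pearson correlation between two columns; rows with a missing value are ignored
pearson_r = sample_df['population'].corr(sample_df['gdp'])

# Pairwise correlation matrix for the selected numeric columns
corr_matrix = sample_df[['population', 'gdp']].corr()

print(pearson_r)
print(corr_matrix)
```

On the real data, countries_df['population'].corr(countries_df['gdp']) should give the corresponding coefficient. A value close to 1 would indicate a strong positive linear relationship.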
Let us save and upload our work to Jovian before continuing.
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "prapathade1111/course-project-starter-pro" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/prapathade1111/course-project-starter-pro