Assignment 3 - Pandas Data Analysis Practice

This assignment is a part of the course "Data Analysis with Python: Zero to Pandas"

In this assignment, you'll get to practice some of the concepts and skills covered in this tutorial: https://jovian.ml/aakashns/python-pandas-data-analysis

As you go through this notebook, you will find a ??? in certain places. To complete this assignment, you must replace all the ??? with appropriate values, expressions or statements to ensure that the notebook runs properly end-to-end.

Some things to keep in mind:

  • Make sure to run all the code cells, otherwise you may get errors like NameError for undefined variables.
  • Do not change variable names, delete cells or disturb other existing code. It may cause problems during evaluation.
  • In some cases, you may need to add some code cells or new statements before or after the line of code containing the ???.
  • Since you'll be using a temporary online service for code execution, save your work by running jovian.commit at regular intervals.
  • Questions marked (Optional) will not be considered for evaluation, and can be skipped. They are for your learning.

You can make submissions on this page: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/assignment-3-pandas-practice

If you are stuck, you can ask for help on the community forum: https://jovian.ml/forum/t/assignment-3-pandas-practice/11225/3 . You can get help with errors or ask for hints, describe your approach in simple words, link to documentation, but please don't ask for or share the full working answer code on the forum.

How to run the code and save your work

The recommended way to run this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.

Before starting the assignment, let's save a snapshot of the assignment to your Jovian.ml profile, so that you can access it later and continue your work.

In [1]:
import jovian
In [3]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment
In [2]:
# Run the next line to install Pandas
!pip install pandas --upgrade
Collecting pandas
  Downloading pandas-1.2.3-cp38-cp38-manylinux1_x86_64.whl (9.7 MB)
     |████████████████████████████████| 9.7 MB 4.4 MB/s eta 0:00:01
Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /opt/conda/lib/python3.8/site-packages (from pandas) (2.8.1)
Requirement already satisfied, skipping upgrade: numpy>=1.16.5 in /opt/conda/lib/python3.8/site-packages (from pandas) (1.19.2)
Requirement already satisfied, skipping upgrade: pytz>=2017.3 in /opt/conda/lib/python3.8/site-packages (from pandas) (2020.1)
Requirement already satisfied, skipping upgrade: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.1.3
    Uninstalling pandas-1.1.3:
      Successfully uninstalled pandas-1.1.3
Successfully installed pandas-1.2.3
In [3]:
import pandas as pd

In this assignment, we're going to analyze and operate on data from a CSV file. Let's begin by downloading the CSV file.

In [4]:
from urllib.request import urlretrieve

urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/countries.csv', 
            'countries.csv')
Out[4]:
('countries.csv', <http.client.HTTPMessage at 0x7fef7caa5970>)

Let's load the data from the CSV file into a Pandas data frame.

In [5]:
countries_df = pd.read_csv('countries.csv')
In [9]:
countries_df
Out[9]:

Q: How many countries does the dataframe contain?

Hint: Use the .shape property. It is an attribute (a tuple of rows and columns), not a method, so it takes no parentheses.

In [10]:
num_countries = countries_df.shape[0]
In [11]:
print('There are {} countries in the dataset'.format(num_countries))
There are 210 countries in the dataset
In [12]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Retrieve a list of continents from the dataframe.

Hint: Use the .unique method of a series.
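
As a rough illustration of the hint above, here is a minimal sketch on a small made-up data frame (toy_df and its values are hypothetical, not taken from countries.csv). It shows that .unique returns the distinct values in order of first appearance, and that the result can be turned into a plain Python list if needed.

```python
import pandas as pd

# Hypothetical toy data, standing in for the assignment's countries_df
toy_df = pd.DataFrame({
    'location': ['India', 'France', 'Kenya', 'Japan', 'Spain'],
    'continent': ['Asia', 'Europe', 'Africa', 'Asia', 'Europe'],
})

# Series.unique() keeps each distinct value once, in order of first appearance
toy_continents = toy_df['continent'].unique()

# The result is an array, not a list; wrap it in list() when a list is required
toy_continent_list = list(toy_continents)
print(toy_continent_list)
```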

In [22]:
continents = pd.unique(countries_df.continent)
# print(pd.unique.__doc__)
In [23]:
continents
# print(type(continents))
Out[23]:
array(['Asia', 'Europe', 'Africa', 'North America', 'South America',
       'Oceania'], dtype=object)
In [24]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: What is the total population of all the countries listed in this dataset?

In [28]:
total_population = countries_df.population.sum()
In [29]:
print('The total population is {}.'.format(int(total_population)))
The total population is 7757980095.
In [30]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: (Optional) What is the overall life expectancy across the world?

Hint: You'll need to take a weighted average of life expectancy using populations as weights.
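
To make the weighted-average hint concrete, the sketch below works through it on a tiny made-up data frame (toy_df and all its numbers are hypothetical). It also illustrates one thing worth checking: if a row has a missing life expectancy, its population still counts in the denominator unless that row is dropped first. Whether this affects the assignment's dataset depends on whether it has such rows, which is not checked here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; country 'D' has no life expectancy recorded
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'population': [100, 300, 600, 1000],
    'life_expectancy': [80.0, 70.0, 60.0, None],
})

# Naive version: D contributes nothing to the numerator but 1000 to the denominator
naive = (toy_df['life_expectancy'] * toy_df['population']).sum() / toy_df['population'].sum()

# Drop rows where either value is missing, then weight by population
valid = toy_df.dropna(subset=['life_expectancy', 'population'])
weighted = (valid['life_expectancy'] * valid['population']).sum() / valid['population'].sum()

# Cross-check with NumPy's built-in weighted average
via_numpy = np.average(valid['life_expectancy'], weights=valid['population'])

# Unweighted mean for comparison: every country counts equally
simple = valid['life_expectancy'].mean()

print(naive, weighted, via_numpy, simple)
```

In this toy case the weighted value (65.0) sits below the simple mean (70.0) because the most populous country with data has the lowest life expectancy.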

In [34]:
weighted_average = (countries_df['life_expectancy']*countries_df['population']).sum() / countries_df['population'].sum()
In [35]:
weighted_average 
Out[35]:
72.72165193409664
In [36]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Create a dataframe containing 10 countries with the highest population.

Hint: Chain the sort_values and head methods.
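
The chaining described in the hint can be sketched on a small made-up data frame (toy_df is hypothetical). The sketch also shows DataFrame.nlargest, which is an alternative way to get the same rows; it is included only for comparison and is not what the hint asks for.

```python
import pandas as pd

# Hypothetical toy data
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'population': [50, 400, 120, 300],
})

# Sort in descending order, then keep the first two rows
top2_sorted = toy_df.sort_values('population', ascending=False).head(2)

# Alternative: nlargest selects the rows with the largest values directly
top2_nlargest = toy_df.nlargest(2, 'population')

print(top2_sorted['location'].tolist())
```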

In [44]:
most_populous_df = countries_df.sort_values('population', ascending=False).head(10)

In [45]:
most_populous_df
Out[45]:
In [46]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Add a new column in countries_df to record the overall GDP per country (product of population & per capita GDP).

In [48]:
countries_df['gdp'] = countries_df.population * countries_df.gdp_per_capita
In [49]:
countries_df
Out[49]:
In [50]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: (Optional) Create a dataframe containing 10 countries with the lowest GDP per capita, among the countries with population greater than 100 million.
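
One way to approach a "lowest N among a subset" question is to filter first and sort second. The sketch below uses a made-up data frame (toy_df and its numbers are hypothetical) and n = 2 instead of 10. Filtering before sorting means the boolean mask and the frame it is applied to share the same index order, which avoids pandas having to realign the mask and possibly warning about it.

```python
import pandas as pd

# Hypothetical toy data
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D', 'E'],
    'population': [150e6, 20e6, 300e6, 120e6, 5e6],
    'gdp_per_capita': [9000.0, 500.0, 2000.0, 6000.0, 300.0],
})

# Step 1: keep only rows above the population threshold
large = toy_df[toy_df['population'] > 100_000_000]

# Step 2: sort the filtered rows by GDP per capita (ascending) and take the first n
lowest2 = large.sort_values('gdp_per_capita').head(2)

print(lowest2['location'].tolist())
```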

In [72]:
# Filter on population first, then sort the filtered rows by GDP per capita
df = countries_df[countries_df.population > 100000000].sort_values('gdp_per_capita').head(10)

Out[72]:
In [73]:
df

Out[73]:
In [74]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Create a data frame that counts the number of countries in each continent.

Hint: Use groupby, select the location column and aggregate using count.
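
The groupby-then-count pattern from the hint can be sketched on a small made-up data frame (toy_df is hypothetical). The sketch also shows why the hint says to select the location column: count only counts non-missing values, so counting a column with gaps can give a smaller number than the number of rows in the group.

```python
import pandas as pd

# Hypothetical toy data; Spain's population is deliberately missing
toy_df = pd.DataFrame({
    'location': ['India', 'Japan', 'France', 'Spain', 'Kenya'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Africa'],
    'population': [1000, 100, 60, None, 50],
})

# Double brackets keep the result as a data frame rather than a series
counts = toy_df.groupby('continent')[['location']].count()

# count() skips missing values, so this undercounts Europe
pop_counts = toy_df.groupby('continent')['population'].count()

# size() counts rows per group regardless of missing values
sizes = toy_df.groupby('continent').size()

print(counts)
```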

In [83]:
country_counts_df = countries_df.groupby('continent')[['location']].count()

In [84]:
country_counts_df
Out[84]:
In [ ]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook..

Q: Create a data frame showing the total population of each continent.

Hint: Use groupby, select the population column and aggregate using sum.

In [14]:
continent_populations_df = countries_df.groupby('continent')[['population']].sum()

In [15]:
continent_populations_df
Out[15]:
In [16]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Let's download another CSV file containing overall Covid-19 stats for various countries, and read the data into another Pandas data frame.

In [6]:
urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/covid-countries-data.csv', 
            'covid-countries-data.csv')
Out[6]:
('covid-countries-data.csv', <http.client.HTTPMessage at 0x7fef7caa5d00>)
In [7]:
covid_data_df = pd.read_csv('covid-countries-data.csv')
In [8]:
covid_data_df
Out[8]:

Q: Count the number of countries for which the total_tests data is missing.

Hint: Use the .isna method.
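
As an illustration of the hint, the sketch below applies .isna to a made-up data frame (toy_df is hypothetical). .isna returns a boolean series, and since True counts as 1, summing it gives the number of missing entries; the same mask can also be used to list which rows are affected.

```python
import pandas as pd

# Hypothetical toy data with two missing total_tests values
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'total_tests': [1000.0, None, 2500.0, None],
})

# Boolean mask: True where total_tests is missing
mask = toy_df['total_tests'].isna()

# Summing booleans counts the True values
missing = int(mask.sum())

# The mask can also select the affected rows
missing_locations = toy_df.loc[mask, 'location'].tolist()

print(missing, missing_locations)
```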

In [26]:
total_tests_missing = covid_data_df[covid_data_df.total_tests.isna()].location.count()
total_tests_missing
Out[26]:
122
In [27]:
print("The data for total tests is missing for {} countries.".format(int(total_tests_missing)))
The data for total tests is missing for 122 countries.
In [28]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Let's merge the two data frames, and compute some more metrics.

Q: Merge countries_df with covid_data_df on the location column.

Hint: Use the .merge method on countries_df.
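
To show what .merge does with rows that do not match, here is a minimal sketch on two made-up data frames (left and right are hypothetical stand-ins for countries_df and covid_data_df). By default merge performs an inner join, so locations present in only one frame are dropped; how='left' with indicator=True is one way to see which rows had no match.

```python
import pandas as pd

# Hypothetical toy frames sharing a 'location' column
left = pd.DataFrame({'location': ['A', 'B', 'C'], 'population': [10, 20, 30]})
right = pd.DataFrame({'location': ['B', 'C', 'D'], 'total_cases': [5, 6, 7]})

# Default how='inner': only locations found in both frames survive
inner = left.merge(right, on='location')

# Left join keeps every row of `left`; indicator adds a '_merge' column
left_join = left.merge(right, on='location', how='left', indicator=True)

# Rows of `left` that found no partner in `right`
unmatched = left_join[left_join['_merge'] == 'left_only']['location'].tolist()

print(inner['location'].tolist(), unmatched)
```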

In [9]:
combined_df = countries_df.merge(covid_data_df, on='location')
In [10]:
combined_df
Out[10]:
In [47]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Add columns tests_per_million, cases_per_million and deaths_per_million into combined_df.
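
The per-million metrics are the raw total divided by the population and scaled by one million. A small worked sketch on made-up numbers (toy_df is hypothetical) makes the arithmetic easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data
toy_df = pd.DataFrame({
    'location': ['A', 'B'],
    'population': [2_000_000, 500_000],
    'total_cases': [4000, 250],
})

# cases per million = total cases * 1,000,000 / population
toy_df['cases_per_million'] = toy_df['total_cases'] * 1e6 / toy_df['population']

# A: 4000 cases among 2 million people; B: 250 cases among half a million people
print(toy_df['cases_per_million'].tolist())
```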

In [11]:
combined_df['tests_per_million'] = combined_df['total_tests'] * 1e6 / combined_df['population']
In [12]:
combined_df['cases_per_million'] = combined_df['total_cases'] * 1e6 / combined_df['population']
In [13]:
combined_df['deaths_per_million'] = combined_df['total_deaths'] * 1e6 / combined_df['population']
In [14]:
combined_df
Out[14]:
In [52]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Create a dataframe with the 10 countries that have the highest number of tests per million people.

In [15]:
highest_tests_df = combined_df.sort_values('tests_per_million',  ascending=False).head(10)
In [16]:
highest_tests_df
Out[16]:
In [17]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Create a dataframe with the 10 countries that have the highest number of positive cases per million people.

In [18]:
highest_cases_df = combined_df.sort_values('cases_per_million',  ascending=False).head(10)
In [19]:
highest_cases_df
Out[19]:
In [20]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Q: Create a dataframe with the 10 countries that have the highest number of deaths per million people.

In [22]:
highest_deaths_df = combined_df.sort_values('deaths_per_million',  ascending=False).head(10)
In [23]:
highest_deaths_df
Out[23]:
In [24]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

(Optional) Q: Count the number of countries that feature in both the lists of "highest number of tests per million" and "highest number of cases per million".
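
One possible approach, sketched on a made-up data frame (toy_df and its values are hypothetical, and n = 3 is used instead of 10), is to build both top-n lists and then keep the rows of one list whose location also appears in the other, using .isin.

```python
import pandas as pd

# Hypothetical toy data
toy_df = pd.DataFrame({
    'location': list('ABCDEF'),
    'tests_per_million': [900, 800, 100, 700, 200, 50],
    'cases_per_million': [10, 500, 400, 300, 20, 5],
})

n = 3  # the assignment's lists use 10

# Build the two top-n lists
top_tests = toy_df.sort_values('tests_per_million', ascending=False).head(n)
top_cases = toy_df.sort_values('cases_per_million', ascending=False).head(n)

# Keep rows of top_tests whose location also appears in top_cases
in_both = top_tests[top_tests['location'].isin(top_cases['location'])]
overlap = len(in_both)

print(overlap, in_both['location'].tolist())
```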

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
jovian.commit(project='pandas-practice-assignment', environment=None)

(Optional) Q: Count the number of countries that feature in both the lists "20 countries with the lowest GDP per capita" and "20 countries with the lowest number of hospital beds per thousand population". Only consider countries with a population higher than 10 million while creating the lists.
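
A possible outline for this kind of question, sketched on a made-up data frame: apply the population filter once, take the n smallest rows for each metric, and intersect the two sets of locations. Everything here is hypothetical, including the column name hospital_beds_per_thousand, which is assumed for illustration and should be checked against the actual columns of the dataset; n = 2 is used instead of 20.

```python
import pandas as pd

# Hypothetical toy data; the beds column name is an assumption
toy_df = pd.DataFrame({
    'location': list('ABCDEF'),
    'population': [50e6, 5e6, 20e6, 30e6, 15e6, 12e6],
    'gdp_per_capita': [1000.0, 200.0, 800.0, 5000.0, 600.0, 9000.0],
    'hospital_beds_per_thousand': [0.5, 0.1, 3.0, 0.8, 0.4, 4.0],
})

n = 2  # the assignment's lists use 20

# Apply the population filter before building either list
large = toy_df[toy_df['population'] > 10_000_000]

# nsmallest returns the n rows with the smallest values in the given column
lowest_gdp = large.nsmallest(n, 'gdp_per_capita')
lowest_beds = large.nsmallest(n, 'hospital_beds_per_thousand')

# Set intersection gives the locations present in both lists
common = set(lowest_gdp['location']) & set(lowest_beds['location'])

print(len(common), common)
```

Country B has the lowest value on both metrics in this toy data but is excluded by the population filter, which is why the filter has to come first.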

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [2]:
import jovian
In [25]:
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abdullah-alashafi/pandas-practice-assignment" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abdullah-alashafi/pandas-practice-assignment

Submission

Congratulations on making it this far! You've reached the end of this assignment, and you just completed your first real-world data analysis problem. It's time to record one final version of your notebook for submission.

Make a submission here by filling the submission form: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/assignment-3-pandas-practice

Also make sure to help others on the forum: https://jovian.ml/forum/t/assignment-3-pandas-practice/11225/2

In [ ]:
jovian.submit(assignment="zero-to-pandas-a3")
[jovian] Attempting to save notebook..