abhyu420/pandas-practice-assignment - Jovian

Assignment 3 - Pandas Data Analysis Practice

This assignment is a part of the course "Data Analysis with Python: Zero to Pandas"

In this assignment, you'll get to practice some of the concepts and skills covered in this tutorial: https://jovian.ml/aakashns/python-pandas-data-analysis

As you go through this notebook, you will find a ??? in certain places. To complete this assignment, you must replace all the ??? with appropriate values, expressions or statements to ensure that the notebook runs properly end-to-end.

Some things to keep in mind:

  • Make sure to run all the code cells, otherwise you may get errors like NameError for undefined variables.
  • Do not change variable names, delete cells or disturb other existing code. It may cause problems during evaluation.
  • In some cases, you may need to add some code cells or new statements before or after the line of code containing the ???.
  • Since you'll be using a temporary online service for code execution, save your work by running jovian.commit at regular intervals.
  • Questions marked (Optional) will not be considered for evaluation, and can be skipped. They are for your learning.

You can make submissions on this page: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/assignment-3-pandas-practice

If you are stuck, you can ask for help on the community forum: https://jovian.ml/forum/t/assignment-3-pandas-practice/11225/3 . You can get help with errors, ask for hints, describe your approach in simple words, or link to documentation, but please don't ask for or share the full working answer code on the forum.

How to run the code and save your work

The recommended way to run this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.

Before starting the assignment, let's save a snapshot of the assignment to your Jovian.ml profile, so that you can access it later and continue your work.

import jovian
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Please enter your API key ( from https://jovian.ai/ ): API KEY: ·········· [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment
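# Run the next line to install (or upgrade) Pandas.
# On Colab, restart the runtime after upgrading so the new version is actually loaded;
# otherwise the old, already-imported modules get mixed with the new files on disk.
!pip install pandas --upgrade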
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (1.1.5)
Collecting pandas
  Downloading pandas-1.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
     |████████████████████████████████| 11.3 MB 4.1 MB/s
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.7/dist-packages (from pandas) (1.19.5)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.1.5
    Uninstalling pandas-1.1.5:
      Successfully uninstalled pandas-1.1.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas~=1.1.0; python_version >= "3.0", but you have pandas 1.3.3 which is incompatible.
Successfully installed pandas-1.3.3
import pandas as pd

In this assignment, we're going to analyze and operate on data from a CSV file. Let's begin by downloading the CSV file.

from urllib.request import urlretrieve

urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/countries.csv', 
            'countries.csv')
('countries.csv', <http.client.HTTPMessage at 0x7fdc53b9ee10>)

Let's load the data from the CSV file into a Pandas data frame.

countries_df = pd.read_csv('countries.csv')
countries_df
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /usr/local/lib/python3.7/dist-packages/IPython/core/formatters.py in __call__(self, obj) 336 method = get_real_method(obj, self.print_method) 337 if method is not None: --> 338 return method() 339 return None 340 else: /usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in _repr_html_(self) 796 797 @property --> 798 def shape(self) -> tuple[int, int]: 799 """ 800 Return a tuple representing the dimensionality of the DataFrame. /usr/local/lib/python3.7/dist-packages/pandas/io/formats/format.py in to_html(self, buf, encoding, classes, notebook, border) 986 encoding: str | None = None, 987 classes: str | list | tuple | None = None, --> 988 notebook: bool = False, 989 border: int | None = None, 990 table_id: str | None = None, AttributeError: 'NotebookFormatter' object has no attribute 'get_result'
           location continent  ...  hospital_beds_per_thousand  gdp_per_capita
0       Afghanistan      Asia  ...                        0.50        1803.987
1           Albania    Europe  ...                        2.89       11803.431
2           Algeria    Africa  ...                        1.90       13913.839
3           Andorra    Europe  ...                         NaN             NaN
4            Angola    Africa  ...                         NaN        5819.495
..              ...       ...  ...                         ...             ...
205         Vietnam      Asia  ...                        2.60        6171.884
206  Western Sahara    Africa  ...                         NaN             NaN
207           Yemen      Asia  ...                        0.70        1479.147
208          Zambia    Africa  ...                        2.00        3689.251
209        Zimbabwe    Africa  ...                        1.70        1899.775

[210 rows x 6 columns]

Q1: How many countries does the dataframe contain?

Hint: Use the .shape attribute.

num_countries = countries_df.shape[0]
print('There are {} countries in the dataset'.format(num_countries))
There are 210 countries in the dataset
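Since the hint refers to .shape, here is a minimal sketch on a small made-up frame (not the real dataset) showing that .shape is a property holding a (rows, columns) tuple, so the row count is simply its first element.

```python
import pandas as pd

# Tiny stand-in frame (hypothetical); the real countries_df has 210 rows and 6 columns.
df = pd.DataFrame({
    "location": ["Afghanistan", "Albania", "Algeria"],
    "continent": ["Asia", "Europe", "Africa"],
})

# .shape is an attribute (no parentheses) holding (number_of_rows, number_of_columns).
num_rows, num_cols = df.shape
print(num_rows, num_cols)  # 3 2

# len(df) returns the same row count.
print(len(df) == num_rows)  # True
```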
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q2: Retrieve a list of continents from the data frame.

Hint: Use the .unique method of a series.

continents = countries_df["continent"].unique()
continents
array(['Asia', 'Europe', 'Africa', 'North America', 'South America',
       'Oceania'], dtype=object)
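As the output shows, .unique returns a NumPy array rather than a Python list. A minimal sketch on a toy Series (values chosen for illustration) of converting it with .tolist, plus .nunique for just the count:

```python
import pandas as pd

continent = pd.Series(["Asia", "Europe", "Asia", "Africa", "Europe"])

# .unique() keeps values in order of first appearance and returns a NumPy array.
unique_arr = continent.unique()

# .tolist() converts the array into a plain Python list.
continent_list = unique_arr.tolist()
print(continent_list)  # ['Asia', 'Europe', 'Africa']

# .nunique() counts the distinct values directly.
print(continent.nunique())  # 3
```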
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q3: What is the total population of all the countries listed in this dataset?

total_population = countries_df["population"].sum()

print('The total population is {}.'.format(int(total_population)))
The total population is 7757980095.
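Series.sum skips missing values by default, which is why the total above is a plain number even if some rows lack data. A minimal sketch with toy values (not taken from the dataset):

```python
import numpy as np
import pandas as pd

population = pd.Series([100.0, np.nan, 250.0])

# skipna=True is the default, so the NaN is ignored.
total = population.sum()
print(total)  # 350.0

# With skipna=False the missing value propagates to the result.
total_strict = population.sum(skipna=False)
print(total_strict)  # nan
```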
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q: (Optional) What is the overall life expectancy across the world?

Hint: You'll need to take a weighted average of life expectancy using populations as weights.
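A weighted average multiplies each value by its weight, sums, and divides by the total weight. The sketch below assumes the life expectancy column is named life_expectancy (the column is elided from the printed frame above, so treat the name as an assumption) and uses a toy frame. The helper weighted_life_expectancy is hypothetical; it drops rows where either number is missing so that missing life expectancies do not silently count as zero.

```python
import numpy as np
import pandas as pd

def weighted_life_expectancy(df, value_col="life_expectancy", weight_col="population"):
    """Population-weighted mean of value_col (hypothetical helper).

    Rows where either column is NaN are dropped first, so their
    population is excluded from the denominator too.
    """
    valid = df[[value_col, weight_col]].dropna()
    return np.average(valid[value_col], weights=valid[weight_col])

# Toy data: the third row has no life expectancy and is ignored.
toy = pd.DataFrame({
    "population": [100.0, 300.0, 50.0],
    "life_expectancy": [60.0, 80.0, np.nan],
})
result = weighted_life_expectancy(toy)
print(result)  # 75.0  i.e. (60*100 + 80*300) / 400
```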

 
 
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q4: Create a dataframe containing 10 countries with the highest population.

Hint: Chain the sort_values and head methods.

most_populous_df = countries_df.sort_values("population", ascending=False).head(10)
most_populous_df
          location      continent  ...  hospital_beds_per_thousand  gdp_per_capita
41           China           Asia  ...                        4.34       15308.712
90           India           Asia  ...                        0.53        6426.674
199  United States  North America  ...                        2.77       54225.446
91       Indonesia           Asia  ...                        1.04       11188.744
145       Pakistan           Asia  ...                        0.60        5034.708
27          Brazil  South America  ...                        2.20       14103.452
141        Nigeria         Africa  ...                         NaN        5338.454
15      Bangladesh           Asia  ...                        0.80        3523.984
157         Russia         Europe  ...                        8.05       24765.954
125         Mexico  North America  ...                        1.38       17336.469

[10 rows x 6 columns]
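Chaining sort_values and head works; pandas also provides DataFrame.nlargest, which selects the top n rows by a column directly. A minimal sketch on a made-up frame comparing the two:

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["A", "B", "C", "D"],
    "population": [50, 400, 120, 300],
})

# Sort everything descending, then keep the first two rows.
top_sorted = df.sort_values("population", ascending=False).head(2)

# nlargest picks the two largest values without a full sort.
top_nlargest = df.nlargest(2, "population")

print(top_sorted["location"].tolist())    # ['B', 'D']
print(top_nlargest["location"].tolist())  # ['B', 'D']
```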
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q5: Add a new column in countries_df to record the overall GDP per country (product of population & per capita GDP).

countries_df['gdp'] = countries_df["population"] * countries_df["gdp_per_capita"]
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-22-7dcfb8e5d41d> in <module>() ----> 1 countries_df['gdp'] = countries_df["population"] * countries_df["gdp_per_capita"] /usr/local/lib/python3.7/dist-packages/pandas/core/ops/common.py in new_method(self, other) 63 break 64 if isinstance(other, cls): ---> 65 return NotImplemented 66 67 other = item_from_zerodim(other) /usr/local/lib/python3.7/dist-packages/pandas/core/ops/__init__.py in wrapper(left, right) 341 342 --> 343 def frame_arith_method_with_reindex(left: DataFrame, right: DataFrame, op) -> DataFrame: 344 """ 345 For DataFrame-with-DataFrame operations that require reindexing, /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op) 188 189 Note: the caller is responsible for ensuring that numpy warnings are --> 190 suppressed (with np.errstate(all="ignore")) if needed. 191 192 Parameters /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp) 138 def _na_arithmetic_op(left, right, op, is_cmp: bool = False): 139 """ --> 140 Return the result of evaluating op on the passed in values. 141 142 If native types are not compatible, try coercion to object dtype. /usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in <module>() 17 from pandas._typing import FuncType 18 ---> 19 from pandas.core.computation.check import NUMEXPR_INSTALLED 20 from pandas.core.ops import roperator 21 /usr/local/lib/python3.7/dist-packages/pandas/core/computation/check.py in <module>() 1 from pandas.compat._optional import import_optional_dependency 2 ----> 3 ne = import_optional_dependency("numexpr", errors="warn") 4 NUMEXPR_INSTALLED = ne is not None 5 if NUMEXPR_INSTALLED: TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'
countries_df
           location continent  ...  hospital_beds_per_thousand  gdp_per_capita
0       Afghanistan      Asia  ...                        0.50        1803.987
1           Albania    Europe  ...                        2.89       11803.431
2           Algeria    Africa  ...                        1.90       13913.839
3           Andorra    Europe  ...                         NaN             NaN
4            Angola    Africa  ...                         NaN        5819.495
..              ...       ...  ...                         ...             ...
205         Vietnam      Asia  ...                        2.60        6171.884
206  Western Sahara    Africa  ...                         NaN             NaN
207           Yemen      Asia  ...                        0.70        1479.147
208          Zambia    Africa  ...                        2.00        3689.251
209        Zimbabwe    Africa  ...                        1.70        1899.775

[210 rows x 6 columns]
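Multiplying two columns is an element-wise operation aligned on the row index, and a NaN in either operand gives NaN in the result (as for Andorra and Western Sahara, whose gdp_per_capita is missing). A minimal sketch on toy numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "population": [10.0, 20.0, 30.0],
    "gdp_per_capita": [1000.0, np.nan, 500.0],
})

# Row-by-row product; the missing per-capita value yields NaN.
df["gdp"] = df["population"] * df["gdp_per_capita"]
print(df["gdp"].tolist())  # [10000.0, nan, 15000.0]
```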
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q: (Optional) Create a dataframe containing 10 countries with the lowest GDP per capita, among the countries with population greater than 100 million.

# Filter first, then sort the remaining rows by GDP per capita (lowest first) and keep 10
new_df = countries_df[countries_df.population > 100000000].sort_values("gdp_per_capita", ascending=True).head(10)
new_df
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-28-0a6bea4a515b> in <module>() ----> 1 new_df = countries_df[countries_df.population > 100000000].head(10) 2 new_df = countries_df.sort_values("gdp_per_capita",ascending = True).head(10) /usr/local/lib/python3.7/dist-packages/pandas/core/ops/common.py in new_method(self, other) 63 break 64 if isinstance(other, cls): ---> 65 return NotImplemented 66 67 other = item_from_zerodim(other) /usr/local/lib/python3.7/dist-packages/pandas/core/ops/__init__.py in wrapper(self, other) 368 # to avoid constructing two potentially large/sparse DataFrames 369 join_columns, _, _ = left.columns.join( --> 370 right.columns, how="outer", level=None, return_indexers=True 371 ) 372 /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in comparison_op(left, right, op) 249 rvalues = ensure_wrapped_if_datetimelike(right) 250 --> 251 rvalues = lib.item_from_zerodim(rvalues) 252 if isinstance(rvalues, list): 253 # TODO: same for tuples? /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp) 138 def _na_arithmetic_op(left, right, op, is_cmp: bool = False): 139 """ --> 140 Return the result of evaluating op on the passed in values. 141 142 If native types are not compatible, try coercion to object dtype. 
/usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in <module>() 17 from pandas._typing import FuncType 18 ---> 19 from pandas.core.computation.check import NUMEXPR_INSTALLED 20 from pandas.core.ops import roperator 21 /usr/local/lib/python3.7/dist-packages/pandas/core/computation/check.py in <module>() 1 from pandas.compat._optional import import_optional_dependency 2 ----> 3 ne = import_optional_dependency("numexpr", errors="warn") 4 NUMEXPR_INSTALLED = ne is not None 5 if NUMEXPR_INSTALLED: TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'
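The order of operations matters here: filter by population, then sort by GDP per capita, then take the first rows. Calling head before sorting, or sorting the unfiltered frame, answers a different question. A minimal sketch with made-up values, using n = 3 instead of 10:

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["A", "B", "C", "D", "E"],
    "population": [150e6, 90e6, 210e6, 300e6, 120e6],
    "gdp_per_capita": [2000.0, 500.0, 9000.0, 4000.0, 1500.0],
})

low_gdp_df = (
    df[df["population"] > 100_000_000]   # 1) B is excluded (90 million)
    .sort_values("gdp_per_capita")       # 2) ascending: lowest first
    .head(3)                             # 3) keep the first three
)
print(low_gdp_df["location"].tolist())  # ['E', 'A', 'D']
```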
 
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q6: Create a data frame that counts the number of countries in each continent.

Hint: Use groupby, select the location column and aggregate using count.
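Selecting the location column before counting matters: .count() counts non-missing values, so applying it to every column gives different numbers wherever data is missing. A minimal sketch using four rows from the listing below:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location": ["Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
    "gdp_per_capita": [11803.431, 13913.839, np.nan, 5819.495],
})

# One count per continent, based only on the location column.
counts = df.groupby("continent")[["location"]].count()
print(counts)

# Without selecting a column, every column is counted separately;
# Europe's gdp_per_capita count is 1 because Andorra's value is NaN.
all_counts = df.groupby("continent").count()
print(all_counts)
```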

countries_df
           location continent  ...  hospital_beds_per_thousand  gdp_per_capita
0       Afghanistan      Asia  ...                        0.50        1803.987
1           Albania    Europe  ...                        2.89       11803.431
2           Algeria    Africa  ...                        1.90       13913.839
3           Andorra    Europe  ...                         NaN             NaN
4            Angola    Africa  ...                         NaN        5819.495
..              ...       ...  ...                         ...             ...
205         Vietnam      Asia  ...                        2.60        6171.884
206  Western Sahara    Africa  ...                         NaN             NaN
207           Yemen      Asia  ...                        0.70        1479.147
208          Zambia    Africa  ...                        2.00        3689.251
209        Zimbabwe    Africa  ...                        1.70        1899.775

[210 rows x 6 columns]
country_counts_df = countries_df.groupby("continent")[["location"]].count()
country_counts_df
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q7: Create a data frame showing the total population of each continent.

Hint: Use groupby, select the population column and aggregate using sum.
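Summing the whole grouped frame also adds up columns such as gdp_per_capita, where a total is not meaningful. A minimal sketch on a toy frame that selects population first:

```python
import pandas as pd

df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia"],
    "population": [100, 50, 300],
    "gdp_per_capita": [1000.0, 2000.0, 3000.0],
})

# Double brackets keep the result a DataFrame with a single population column.
continent_pop = df.groupby("continent")[["population"]].sum()
print(continent_pop)
```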

continent_populations_df = countries_df.groupby("continent")[["population"]].sum()
continent_populations_df
                 population  ...  gdp_per_capita
continent                    ...                
Africa         1.339424e+09  ...      288523.368
Asia           4.607388e+09  ...     1032210.905
Europe         7.485062e+08  ...     1401145.971
North America  5.912425e+08  ...      584691.580
Oceania        4.095832e+07  ...       93260.722
South America  4.304611e+08  ...      166089.423

[6 rows x 4 columns]
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Let's download another CSV file containing overall Covid-19 stats for various countries, and read the data into another Pandas data frame.

urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/covid-countries-data.csv', 
            'covid-countries-data.csv')
('covid-countries-data.csv', <http.client.HTTPMessage at 0x7fdc534dcf90>)
covid_data_df = pd.read_csv('covid-countries-data.csv')
covid_data_df
           location  total_cases  total_deaths  total_tests
0       Afghanistan      38243.0        1409.0          NaN
1           Albania       9728.0         296.0          NaN
2           Algeria      45158.0        1525.0          NaN
3           Andorra       1199.0          53.0          NaN
4            Angola       2729.0         109.0          NaN
..              ...          ...           ...          ...
207  Western Sahara        766.0           1.0          NaN
208           World   26059065.0      863535.0          NaN
209           Yemen       1976.0         571.0          NaN
210          Zambia      12415.0         292.0          NaN
211        Zimbabwe       6638.0         206.0      97272.0

[212 rows x 4 columns]

Q8: Count the number of countries for which the total_tests data is missing.

Hint: Use the .isna method.
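.isna returns a boolean Series, and summing it counts the True values (True counts as 1). A minimal sketch on toy values; note that the Covid frame also contains aggregate rows such as World, so the result counts rows rather than strictly countries.

```python
import numpy as np
import pandas as pd

total_tests = pd.Series([np.nan, 261004.0, np.nan, 97272.0])

# Number of missing entries.
missing = total_tests.isna().sum()
print(missing)  # 2

# .count() gives the complementary number of non-missing entries.
present = total_tests.count()
print(present)  # 2
```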

total_tests_missing = covid_data_df.total_tests.isna().sum()
total_tests_missing
122
print("The data for total tests is missing for {} countries.".format(int(total_tests_missing)))
The data for total tests is missing for 122 countries.
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Let's merge the two data frames, and compute some more metrics.

Q9: Merge countries_df with covid_data_df on the location column.

Hint: Use the .merge method on countries_df.

combined_df = countries_df.merge(covid_data_df, on="location")
combined_df
           location continent  ...  total_deaths  total_tests
0       Afghanistan      Asia  ...        1409.0          NaN
1           Albania    Europe  ...         296.0          NaN
2           Algeria    Africa  ...        1525.0          NaN
3           Andorra    Europe  ...          53.0          NaN
4            Angola    Africa  ...         109.0          NaN
..              ...       ...  ...           ...          ...
205         Vietnam      Asia  ...          35.0     261004.0
206  Western Sahara    Africa  ...           1.0          NaN
207           Yemen      Asia  ...         571.0          NaN
208          Zambia    Africa  ...         292.0          NaN
209        Zimbabwe    Africa  ...         206.0      97272.0

[210 rows x 9 columns]
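The merge defaults to an inner join, so the 212-row Covid frame shrinks to 210 rows: locations without a counterpart in countries_df (presumably including the World row) are dropped. A minimal sketch with a toy frame (the population values are placeholders) showing how how="outer" with indicator=True reveals which rows an inner merge would lose:

```python
import pandas as pd

countries = pd.DataFrame({"location": ["Albania", "Yemen"], "population": [1, 2]})
covid = pd.DataFrame({
    "location": ["Albania", "Yemen", "World"],
    "total_cases": [9728.0, 1976.0, 26059065.0],
})

# Default how="inner": only locations present in both frames survive.
inner = countries.merge(covid, on="location")
print(len(inner))  # 2

# Outer merge plus indicator adds a _merge column: "both", "left_only" or "right_only".
check = countries.merge(covid, on="location", how="outer", indicator=True)
dropped = check.loc[check["_merge"] != "both", "location"].tolist()
print(dropped)  # ['World']
```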
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q10: Add columns tests_per_million, cases_per_million and deaths_per_million into combined_df.
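Each per-million column is the raw total scaled by 1e6 / population; missing totals stay NaN. A minimal sketch on toy numbers that builds two of the columns in a loop:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "population": [2_000_000.0, 500_000.0],
    "total_cases": [1_000.0, 250.0],
    "total_tests": [np.nan, 10_000.0],
})

# Scale each total to "per million people".
for col, new_col in [("total_tests", "tests_per_million"),
                     ("total_cases", "cases_per_million")]:
    df[new_col] = df[col] * 1e6 / df["population"]

print(df["cases_per_million"].tolist())  # [500.0, 500.0]
print(df["tests_per_million"].tolist())  # [nan, 20000.0]
```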

combined_df['tests_per_million'] = combined_df['total_tests'] * 1e6 / combined_df['population']
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-51-f238a7cdcd92> in <module>() ----> 1 combined_df['tests_per_million'] = combined_df['total_tests'] * 1e6 / combined_df['population'] /usr/local/lib/python3.7/dist-packages/pandas/core/ops/common.py in new_method(self, other) 63 break 64 if isinstance(other, cls): ---> 65 return NotImplemented 66 67 other = item_from_zerodim(other) /usr/local/lib/python3.7/dist-packages/pandas/core/ops/__init__.py in wrapper(left, right) 341 342 --> 343 def frame_arith_method_with_reindex(left: DataFrame, right: DataFrame, op) -> DataFrame: 344 """ 345 For DataFrame-with-DataFrame operations that require reindexing, /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op) 188 189 Note: the caller is responsible for ensuring that numpy warnings are --> 190 suppressed (with np.errstate(all="ignore")) if needed. 191 192 Parameters /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp) 138 def _na_arithmetic_op(left, right, op, is_cmp: bool = False): 139 """ --> 140 Return the result of evaluating op on the passed in values. 141 142 If native types are not compatible, try coercion to object dtype. /usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in <module>() 17 from pandas._typing import FuncType 18 ---> 19 from pandas.core.computation.check import NUMEXPR_INSTALLED 20 from pandas.core.ops import roperator 21 /usr/local/lib/python3.7/dist-packages/pandas/core/computation/check.py in <module>() 1 from pandas.compat._optional import import_optional_dependency 2 ----> 3 ne = import_optional_dependency("numexpr", errors="warn") 4 NUMEXPR_INSTALLED = ne is not None 5 if NUMEXPR_INSTALLED: TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'
combined_df['cases_per_million'] = combined_df['total_cases'] * 1e6 / combined_df['population']
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-52-b4c8d7e3bfd3> in <module>() ----> 1 combined_df['cases_per_million'] = combined_df['total_cases'] * 1e6 / combined_df['population'] /usr/local/lib/python3.7/dist-packages/pandas/core/ops/common.py in new_method(self, other) 63 break 64 if isinstance(other, cls): ---> 65 return NotImplemented 66 67 other = item_from_zerodim(other) /usr/local/lib/python3.7/dist-packages/pandas/core/ops/__init__.py in wrapper(left, right) 341 342 --> 343 def frame_arith_method_with_reindex(left: DataFrame, right: DataFrame, op) -> DataFrame: 344 """ 345 For DataFrame-with-DataFrame operations that require reindexing, /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op) 188 189 Note: the caller is responsible for ensuring that numpy warnings are --> 190 suppressed (with np.errstate(all="ignore")) if needed. 191 192 Parameters /usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp) 138 def _na_arithmetic_op(left, right, op, is_cmp: bool = False): 139 """ --> 140 Return the result of evaluating op on the passed in values. 141 142 If native types are not compatible, try coercion to object dtype. /usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in <module>() 17 from pandas._typing import FuncType 18 ---> 19 from pandas.core.computation.check import NUMEXPR_INSTALLED 20 from pandas.core.ops import roperator 21 /usr/local/lib/python3.7/dist-packages/pandas/core/computation/check.py in <module>() 1 from pandas.compat._optional import import_optional_dependency 2 ----> 3 ne = import_optional_dependency("numexpr", errors="warn") 4 NUMEXPR_INSTALLED = ne is not None 5 if NUMEXPR_INSTALLED: TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'
combined_df['deaths_per_million'] = combined_df['total_deaths'] * 1e6 / combined_df['population']
combined_df
           location continent  ...  total_deaths  total_tests
0       Afghanistan      Asia  ...        1409.0          NaN
1           Albania    Europe  ...         296.0          NaN
2           Algeria    Africa  ...        1525.0          NaN
3           Andorra    Europe  ...          53.0          NaN
4            Angola    Africa  ...         109.0          NaN
..              ...       ...  ...           ...          ...
205         Vietnam      Asia  ...          35.0     261004.0
206  Western Sahara    Africa  ...           1.0          NaN
207           Yemen      Asia  ...         571.0          NaN
208          Zambia    Africa  ...         292.0          NaN
209        Zimbabwe    Africa  ...         206.0      97272.0

[210 rows x 9 columns]
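The per-million columns used in Q11 to Q13 come from plain element-wise Series arithmetic: `count * 1e6 / population`. Below is a minimal sketch on a small made-up DataFrame. The countries and numbers are invented. It assumes `combined_df` has the column names `total_tests`, `total_cases`, `total_deaths` and `population`, and it assumes `tests_per_million` is built with the same formula. A missing test count produces NaN in the result rather than raising an error.

```python
import numpy as np
import pandas as pd

# Toy stand-in for combined_df; all values are made up for illustration.
df = pd.DataFrame({
    "location": ["Country A", "Country B", "Country C"],
    "population": [2_000_000, 500_000, 10_000_000],
    "total_cases": [4_000, 1_000, 5_000],
    "total_deaths": [100, 10, 50],
    "total_tests": [200_000, np.nan, 2_000_000],  # NaN = no test data
})

# Derive each per-million column the same way: count * 1e6 / population.
for count_col, new_col in [("total_tests", "tests_per_million"),
                           ("total_cases", "cases_per_million"),
                           ("total_deaths", "deaths_per_million")]:
    df[new_col] = df[count_col] * 1e6 / df["population"]

print(df[["location", "tests_per_million",
          "cases_per_million", "deaths_per_million"]])
```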
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q11: Create a dataframe with the 10 countries that have the highest number of tests per million people.

highest_tests_df = combined_df.sort_values("tests_per_million",ascending=False).head(10)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-55-92a5a33fdb19> in <module>()
----> 1 highest_tests_df = combined_df.sort_values("tests_per_million",ascending=False).head(10)
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in _get_label_or_level_values(self, key, axis)
KeyError: 'tests_per_million'
highest_tests_df
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-56-c8d5719d0e47> in <module>()
----> 1 highest_tests_df
NameError: name 'highest_tests_df' is not defined
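For Q11, many countries have no `total_tests` value, so `tests_per_million` will contain NaN for them. `sort_values` puts NaN rows at the end by default (`na_position='last'`), and this also holds when `ascending=False`. As a result, `.head(10)` returns only countries with real values, provided at least ten such countries exist. The sketch below uses made-up values to demonstrate this.

```python
import numpy as np
import pandas as pd

# Made-up per-million values; NaN mimics a country with no test data.
df = pd.DataFrame({
    "location": ["Country A", "Country B", "Country C", "Country D"],
    "tests_per_million": [100000.0, np.nan, 200000.0, 50000.0],
})

# Descending sort; the NaN row goes last because na_position defaults to 'last'.
top = df.sort_values("tests_per_million", ascending=False).head(2)
print(top)
```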
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q12: Create a dataframe with the 10 countries that have the highest number of positive cases per million people.

highest_cases_df = combined_df.sort_values("cases_per_million",ascending=False).head(10)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-58-5ec6204634db> in <module>()
----> 1 highest_cases_df = combined_df.sort_values("cases_per_million",ascending=False).head(10)
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in _get_label_or_level_values(self, key, axis)
KeyError: 'cases_per_million'
highest_cases_df
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-59-90b89c40ad19> in <module>()
----> 1 highest_cases_df
NameError: name 'highest_cases_df' is not defined
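Instead of sorting the whole frame and then calling `.head(n)`, you can use `DataFrame.nlargest(n, columns)`. It returns the top `n` rows by that column, already in descending order. The sketch below uses made-up values with no ties and no NaN, and checks that both approaches select the same rows. When there are ties, the two approaches may order the tied rows differently. `nlargest` has a `keep` parameter to control that case.

```python
import pandas as pd

# Made-up per-million values with no ties and no missing data.
df = pd.DataFrame({
    "location": ["Country A", "Country B", "Country C", "Country D"],
    "cases_per_million": [2000.0, 8000.0, 500.0, 3000.0],
})

# Two ways to get the top 3 rows by cases_per_million.
via_sort = df.sort_values("cases_per_million", ascending=False).head(3)
via_nlargest = df.nlargest(3, "cases_per_million")
print(via_nlargest)
```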
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

Q13: Create a dataframe with the 10 countries that have the highest number of deaths per million people.

highest_deaths_df = combined_df.sort_values("deaths_per_million",ascending=False).head(10)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-61-18f41a0cc37b> in <module>()
----> 1 highest_deaths_df = combined_df.sort_values("deaths_per_million",ascending=False).head(10)
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in _get_label_or_level_values(self, key, axis)
KeyError: 'deaths_per_million'
highest_deaths_df
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-62-7d7035da9bf2> in <module>()
----> 1 highest_deaths_df
NameError: name 'highest_deaths_df' is not defined
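The `KeyError: 'deaths_per_million'` in Q13 means `combined_df` has no column with that name when the sort runs. One way to make the ranking independent of earlier cells is to compute the ratio inline with `DataFrame.assign`. This method returns a new DataFrame and leaves the original unchanged. The sketch below is only an illustration. `top_per_million` is a hypothetical helper, not part of pandas or the assignment, and the data is made up.

```python
import pandas as pd

def top_per_million(df, count_col, n=10):
    """Return the n rows with the highest count_col per million people.

    Hypothetical helper: the ratio is computed on the fly with assign(),
    so it works even if no per-million column was stored beforehand.
    """
    ratio = df[count_col] * 1e6 / df["population"]
    return (df.assign(per_million=ratio)
              .sort_values("per_million", ascending=False)
              .head(n))

# Made-up data standing in for combined_df.
df = pd.DataFrame({
    "location": ["Country A", "Country B", "Country C"],
    "population": [2_000_000, 500_000, 10_000_000],
    "total_deaths": [100, 10, 50],
})

result = top_per_million(df, "total_deaths", n=2)
print(result)
```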
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/abhyu420/pandas-practice-assignment

(Optional) Q: Count the number of countries that feature in both lists: "highest number of tests per million" and "highest number of cases per million".
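To count the countries that appear in both top-10 lists, intersect the two lists on the `location` column. The sketch below assumes `highest_tests_df` and `highest_cases_df` were created successfully in Q11 and Q12. Two small made-up frames stand in for them here. The example shows a set intersection and an equivalent `isin` count.

```python
import pandas as pd

# Stand-ins for highest_tests_df and highest_cases_df (made-up contents).
tests_top = pd.DataFrame({"location": ["Country A", "Country B", "Country C"]})
cases_top = pd.DataFrame({"location": ["Country B", "Country C", "Country D"]})

# Option 1: set intersection of the two location columns.
common = set(tests_top["location"]) & set(cases_top["location"])

# Option 2: count rows of one list whose location appears in the other.
n_common = tests_top["location"].isin(cases_top["location"]).sum()

print(len(common), n_common)
```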

 
 
 
jovian.commit(project='pandas-practice-assignment', environment=None)

(Optional) Q: Count the number of countries that feature in both lists: "20 countries with the lowest GDP per capita" and "20 countries with the lowest number of hospital beds per thousand population". Only consider countries with a population higher than 10 million when creating the lists.
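This question adds a filter step before the two lists are built. First keep only countries with a population above 10 million. Then take the `n` smallest values of each column with `nsmallest`, and intersect the two lists. This sketch assumes `combined_df` has `gdp_per_capita` and `hospital_beds_per_thousand` columns. Those names are assumptions, because the column list is truncated in the output above. `count_common_lowest` is a hypothetical helper, and the toy data uses `n=2` instead of 20.

```python
import pandas as pd

def count_common_lowest(df, col_a, col_b, n=20, min_population=10_000_000):
    """Hypothetical helper: count countries present in both 'n lowest' lists."""
    # Only consider countries above the population threshold.
    big = df[df["population"] > min_population]
    # Drop missing values explicitly so they cannot enter either list.
    lowest_a = big.dropna(subset=[col_a]).nsmallest(n, col_a)
    lowest_b = big.dropna(subset=[col_b]).nsmallest(n, col_b)
    return len(set(lowest_a["location"]) & set(lowest_b["location"]))

# Made-up data standing in for combined_df.
df = pd.DataFrame({
    "location": ["Country A", "Country B", "Country C", "Country D", "Country E"],
    "population": [50_000_000, 20_000_000, 5_000_000, 30_000_000, 15_000_000],
    "gdp_per_capita": [1500.0, 900.0, 400.0, 12000.0, 2500.0],
    "hospital_beds_per_thousand": [0.5, 0.6, 0.1, 0.7, 3.0],
})

print(count_common_lowest(df, "gdp_per_capita", "hospital_beds_per_thousand", n=2))
```

The population filter matters in this toy data. Country C has the lowest values in both columns, but its population is only 5 million. It is therefore excluded, which changes the count.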

 
 
 
import jovian
jovian.commit(project='pandas-practice-assignment', environment=None)
[jovian] Attempting to save notebook..

Submission

Congratulations on making it this far! You've reached the end of this assignment, and you just completed your first real-world data analysis problem. It's time to record one final version of your notebook for submission.

Make a submission by filling out the submission form: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/assignment-3-pandas-practice

Also make sure to help others on the forum: https://jovian.ml/forum/t/assignment-3-pandas-practice/11225/2