Covid Vaccination Data Analysis

Hello and welcome to my project notebook!

The COVID-19 outbreak has shaken the global health system and economy to their roots. The pandemic continues to spread and shows no signs of slowing down. Vaccination could be the only effective and economical means of controlling or stopping it. Many research institutions and pharmaceutical companies worldwide are involved in developing a suitable coronavirus vaccine. Vaccine efforts began in China as soon as the outbreak erupted, and then spread worldwide once the WHO declared the disease a pandemic. Eventually, each country joined the race to develop a vaccine first, to safeguard its population and gain an advantage over other countries. On 2 December 2020, the United Kingdom's Medicines and Healthcare products Regulatory Agency (MHRA) gave temporary regulatory approval for the Pfizer–BioNTech vaccine, making the UK the first country to approve this vaccine.

In this notebook I picked a dataset containing day-wise COVID-19 vaccination details for different countries. So far, more than 200 countries and territories have started vaccinating their people.
I found this dataset on Kaggle. If you want to see the dataset on Kaggle, click here.

Let's first talk a little bit about the dataset:

The dataset contains 71,815 rows and 15 columns (row count as of 24 January 2022).

The columns in the dataset include country, date, total_vaccinations, people_vaccinated, people_fully_vaccinated, daily_vaccinations_raw, daily_vaccinations, vaccines, source, etc. (We are interested in only a few of them, so I will not discuss the remaining columns.)

Let's talk about some specific columns:

total_vaccinations: The absolute number of vaccine doses administered in the country.

people_vaccinated: The number of people who have received at least one dose. Depending on the immunization scheme, a person receives one or more (typically 2) doses, so at any given moment the number of vaccinations may be larger than the number of people vaccinated.

people_fully_vaccinated: The number of people who have received the entire set of doses required by the immunization scheme (typically 2). At any given moment, some people will have received only one dose, while a smaller number will have received all doses in the scheme.

daily_vaccinations_raw: For a given data entry, the number of vaccinations for that date and country.

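daily_vaccinations: For a given data entry, the number of vaccinations for that date and country. The Kaggle page describes this column in the same words as daily_vaccinations_raw; as far as I understand from the Our World in Data source behind this dataset, this column is the smoothed (7-day) daily figure, while the raw column is the unsmoothed day-to-day change.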
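To make the relationship between these columns concrete, here is a minimal sketch on a small made-up DataFrame. The country name and all numbers are invented for illustration and are not taken from the real dataset. It assumes a strict two-dose scheme, and the `daily_change` column is only my assumption of how a raw daily count could be derived from the cumulative total.

```python
import pandas as pd

# Toy data for one hypothetical country; all values are invented.
toy = pd.DataFrame({
    "country": ["Exampleland"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "total_vaccinations": [100, 250, 450],
    "people_vaccinated": [100, 200, 300],
    "people_fully_vaccinated": [0, 50, 150],
})

# Assumption: strict two-dose scheme, so total doses = first doses + second doses.
doses_from_people = toy["people_vaccinated"] + toy["people_fully_vaccinated"]
consistent = (toy["total_vaccinations"] == doses_from_people).all()

# Doses >= people with at least one dose >= people fully vaccinated.
ordered = (
    (toy["total_vaccinations"] >= toy["people_vaccinated"])
    & (toy["people_vaccinated"] >= toy["people_fully_vaccinated"])
).all()

# Assumption: a raw daily count can be approximated as the day-over-day
# change in the cumulative total, computed separately for each country.
toy["daily_change"] = toy.groupby("country")["total_vaccinations"].diff()

print(toy)
print("two-dose identity holds:", consistent, "| ordering holds:", ordered)
```

In the real dataset the two-dose identity will not hold exactly (single-dose vaccines, boosters and reporting gaps all break it), but the ordering check is a reasonable sanity test.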

Note: For details of the rest of the columns, please visit the dataset page; the link is available above.

Outline:

We will finish our project in four steps as follows:

1. Data Downloading: We will install all the required libraries and download the dataset.

2. Data Preparation & Cleaning: We will check whether the dataset is clean, looking for duplicate entries, missing values, or any other misleading data that could lead to bad results.

3. Visualization: We will analyze the dataset with visualizations of different columns, look for relationships between columns, and draw inferences.

4. Q & A: We will try to answer some interesting questions, based on the available data, that a person might ask in general.
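As a rough preview of the checks in step 2, here is a minimal sketch on a tiny invented DataFrame. The values are made up and this is not the actual cleaning code used later in the notebook. It counts duplicate rows and missing values, drops the duplicates, and converts the date strings to datetimes.

```python
import pandas as pd

# Toy frame with invented values; None marks a missing entry.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-01"],
    "total_vaccinations": [10.0, 10.0, None, 5.0],
})

# Count rows that are exact copies of an earlier row.
n_duplicates = int(toy.duplicated().sum())

# Count missing values in each column.
missing_per_column = toy.isna().sum()

# Drop the duplicate rows and parse the date strings into datetimes.
cleaned = toy.drop_duplicates().reset_index(drop=True)
cleaned["date"] = pd.to_datetime(cleaned["date"])

print("duplicate rows:", n_duplicates)
print(missing_per_column)
print(cleaned.dtypes)
```

Whether missing values should be dropped, filled, or left alone depends on the question being asked, so the sketch only counts them.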

Step 1: Data Downloading

Let's start by installing and importing the libraries and modules that we are going to use throughout this project notebook.

We will install the NumPy library for mathematical computations.
We will install the pandas library, as we will do our whole analysis of the dataset using pandas DataFrames.
We will install the jovian library to keep a copy of our notebook on the Jovian platform.
We will install the opendatasets library to download the dataset from Kaggle.
We will install Plotly to create some visualizations.

And finally we will import some useful modules from these libraries.

# Install the pandas library
!pip install pandas==1.1.5 --quiet