
Overview

We have been given technology employment survey data for the years 2018, 2019, and 2020. We are to perform data wrangling and data analysis with the aid of pandas, matplotlib, and plotly.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Create a default figure: 8 x 6 inches at 80 dpi (640 x 480 pixels)
plt.figure(num=None, figsize=(8, 6), dpi=80, facecolor='w', edgecolor='k')

Task 1: Data Loading and Data Aggregation

  • Load the 3 data files into the variables data_18, data_19, data_20.
data_18 = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20/Survey_2018.csv")
data_19 = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20/Survey_2019.csv")
data_20 = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20/Survey_2020.csv")
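
Since the three file URLs differ only in the year, the loading step could also be written as a loop. The following is a minimal sketch, not part of the original exercise: the helper name load_surveys, the BASE_URL constant, and the reader parameter are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Common prefix of the three survey URLs used in Task 1.
BASE_URL = "https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20"


def load_surveys(years=(2018, 2019, 2020), base_url=BASE_URL, reader=pd.read_csv):
    """Return a dict mapping each year to the DataFrame read from its CSV file.

    `reader` is a hypothetical hook (defaulting to pd.read_csv) so the helper
    can be exercised without network access.
    """
    return {year: reader(f"{base_url}/Survey_{year}.csv") for year in years}
```

Under these assumptions, calling `surveys = load_surveys()` downloads the three files, and `surveys[2018]`, `surveys[2019]`, and `surveys[2020]` play the roles of data_18, data_19, and data_20.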

Task 2: Data Analysis

  • Display the first 5 rows of the 2018 survey data
  • Display a concise summary of the 2020 data using the info() method, and list 3 observations or inferences drawn from the result.
  • Display the descriptive statistics of the 2018 survey data
  • Display the number of missing values in each column of the 2018 survey data
  • How many people responded to the survey in each of the 3 years? Has the number increased or decreased over the years?
  • Display all the unique values and their frequencies in the “Number of vacation days” column of the 2020 data. Write down at least one observation about this result.
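
The steps listed above map onto a handful of pandas calls. The sketch below shows one possible way to approach them. It runs on a small made-up frame (toy_20) so it works without downloading anything; the rows, the "Age" column, and the helper response_counts are hypothetical. In the actual exercise the same calls would be applied to data_18, data_19, and data_20.

```python
import pandas as pd

# Hypothetical stand-in rows; the real frames come from the CSV files in Task 1.
toy_20 = pd.DataFrame({
    "Age": [26.0, 31.0, None, 40.0],
    "Number of vacation days": ["30", "28", "30", None],
})

first_rows = toy_20.head(5)       # first 5 rows (this toy frame only has 4)
toy_20.info()                     # prints dtypes, non-null counts, memory usage
stats = toy_20.describe()         # descriptive statistics of the numeric columns
missing = toy_20.isnull().sum()   # number of missing values in each column

# Unique values and their frequencies, counting missing entries as well.
vacation_counts = toy_20["Number of vacation days"].value_counts(dropna=False)


def response_counts(frames):
    """Map each label to its row count, assuming one row per respondent."""
    return {label: len(df) for label, df in frames.items()}


counts = response_counts({"2020 (toy)": toy_20})
```

With the real data, passing {2018: data_18, 2019: data_19, 2020: data_20} to response_counts would give the respondent counts per year, which can then be compared to see whether participation rose or fell.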