Analysis of Premier League data from 2015/16 to 2019/20

The Premier League is one of the world's most entertaining leagues. It has some of the best managers, players and fans! But what makes it truly entertaining is its sheer unpredictability. Six strong clubs, the so-called top 6, compete for the title, and the trophy changes hands often. Not only that, the league has also been won by a team from outside the top 6. So, let us analyse some of these instances.

Note that we won't be looking for NaN values and eliminating them in this dataset, because here the NaN values have a purpose. For example, attributes like Saves, Sweeper Clearances, Punches, etc. are specific to goalkeepers, so a striker will have NaN for these attributes.
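To see why dropping NaN rows would hurt here, consider a minimal sketch on a made-up three-player table. The player names, the numbers and the `Position` column are assumptions for illustration only; just the `Saves` and `Punches` attribute names come from the dataset described above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the real season files have many more rows and columns,
# and the "Position" column name is an assumption.
players = pd.DataFrame({
    "Name": ["Keeper A", "Striker B", "Defender C"],
    "Position": ["Goalkeeper", "Forward", "Defender"],
    "Saves": [85.0, np.nan, np.nan],
    "Punches": [12.0, np.nan, np.nan],
    "Goals": [0.0, 21.0, 2.0],
})

# Dropping every row that contains a NaN would discard all outfield players.
after_dropna = players.dropna()

# Filtering by position keeps the goalkeeper-only columns usable instead.
keepers = players[players["Position"] == "Goalkeeper"]

# pandas aggregations skip NaN by default, so this mean uses goalkeepers only.
mean_saves = players["Saves"].mean()

# Share of missing values per column, a quick way to spot position-specific stats.
missing_share = players.isna().mean()
```

In this sketch `dropna()` leaves only the goalkeeper row, which is why the NaN values are kept and each analysis filters down to the relevant players.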

# Importing pandas, NumPy and some visualisation libraries.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import cufflinks as cf

import plotly.graph_objects as go
from plotly.subplots import make_subplots

from plotly.offline import init_notebook_mode, download_plotlyjs, plot, iplot
import plotly.express as px

%matplotlib inline
cf.go_offline()

init_notebook_mode(connected=True)

# Reading the CSV files for each season of the Premier League (2015/16 to 2019/20)

pl16 = pd.read_csv('pl_15-16.csv')
pl17 = pd.read_csv('pl_16-17.csv')
pl18 = pd.read_csv('pl_17-18.csv')
pl19 = pd.read_csv('pl_18-19.csv')
pl20 = pd.read_csv('pl_19-20.csv')

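# Tagging every row with its season so the seasons stay distinguishable after concatenation.
pl16['Year'] = '2015/16'
pl17['Year'] = '2016/17'
pl18['Year'] = '2017/18'
pl19['Year'] = '2018/19'
pl20['Year'] = '2019/20'

# Concatenating all the seasons' data into one dataframe for specific analysis later.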

pl_upto20 = pd.concat([pl16, pl17, pl18, pl19, pl20])
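
One detail worth knowing about `pd.concat`: by default each input keeps its own row index, so the combined frame contains repeated index labels. The sketch below uses two small made-up frames in place of the season files (all names and values are hypothetical) to show the effect and the `ignore_index=True` option, which may or may not be wanted depending on how the combined frame is used later.

```python
import pandas as pd

# Hypothetical stand-ins for two season files.
season_a = pd.DataFrame({"Name": ["Player 1", "Player 2"], "Goals": [10, 3]})
season_b = pd.DataFrame({"Name": ["Player 1", "Player 3"], "Goals": [7, 5]})
season_a["Year"] = "2015/16"
season_b["Year"] = "2016/17"

# Default behaviour: each frame keeps its own 0..n-1 index, so labels repeat.
stacked = pd.concat([season_a, season_b])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows so every row has a unique label.
combined = pd.concat([season_a, season_b], ignore_index=True)

# Sanity check: the number of rows contributed by each season.
rows_per_season = combined["Year"].value_counts()
```

Repeated labels are harmless for group-by style analysis on `Year`, but label-based lookups such as `stacked.loc[0]` return one row per season rather than a single row.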