Premier League - An Exploratory Data Analysis

The 2020-21 season of the English Premier League, arguably the world's most entertaining football league, has come to an end. Congratulations to Manchester City on winning the title. It has been a different season without fans in the stadiums, but it has given us nail-biting finishes, wild and breathtaking matches, memorable goals and moments to cherish. Before we get our studs and shin pads ready for next season, why not look at some interesting trends across the 29 seasons of the beautiful game played so far, and learn a step-by-step approach to Exploratory Data Analysis (EDA) along the way? Let's dig in.

The dataset we're using is hosted on Kaggle and contains information about more than 10,000 Premier League matches. It can be found at https://www.kaggle.com/irkaal/english-premier-league-results.

We will use the Python libraries NumPy, Pandas, Matplotlib and Seaborn in this project. If you read through to the end, you should be able to perform EDA on a dataset of your choice!

Downloading the Dataset

Let's start by downloading the dataset from Kaggle. Here we use the opendatasets Python library to do so. Passing the URL of the dataset's Kaggle page to opendatasets.download() downloads the dataset directly into the notebook's working directory. To execute the following cells, select each cell and press Shift + Enter. Make sure you execute all the cells.

!pip install jovian opendatasets --upgrade --quiet

Let's begin by downloading the data and listing the files within the dataset.

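import opendatasets as od

# URL of the dataset's Kaggle page
dataset_url = 'https://www.kaggle.com/irkaal/english-premier-league-results'

# Prompts for Kaggle credentials, then downloads into a local folder
od.download(dataset_url)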

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: adityapatkar
Your Kaggle Key: ··········

100%|██████████| 289k/289k [00:00<00:00, 35.6MB/s]

Downloading english-premier-league-results.zip to ./english-premier-league-results
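
Once the download finishes, a quick way to confirm what arrived is to list the contents of the target folder. The sketch below assumes the folder name shown in the download message above (./english-premier-league-results). The helper name list_dataset_files is hypothetical, introduced only for illustration, and it returns an empty list when the folder does not exist.

```python
import os

def list_dataset_files(data_dir):
    """Return the sorted entry names in data_dir, or [] if the folder is missing."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(os.listdir(data_dir))

# Folder name taken from the download message; adjust if yours differs
data_dir = './english-premier-league-results'
print(list_dataset_files(data_dir))
```

Returning an empty list rather than raising an error keeps the cell safe to run even if the download step was skipped.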