abhishek1567-cse18/ipl-data-analysis-and-visualization - Jovian

IPL Data Analysis & Visualization

project_name = 'ipl_data_analysis_and_visualization'
import jovian
jovian.commit(project = project_name, files = ['matches.csv'])
[jovian] Attempting to save notebook.. [jovian] Updating notebook "abhishek1567-cse18/ipl-data-analysis-and-visualization" on https://jovian.ai [jovian] Uploading notebook.. [jovian] Uploading additional files... [jovian] Committed successfully! https://jovian.ai/abhishek1567-cse18/ipl-data-analysis-and-visualization

The Indian Premier League (IPL) is a professional Twenty20 cricket league in India, usually contested between March and May each year by eight teams representing eight different cities in India. The league was founded by the Board of Control for Cricket in India (BCCI) in 2008. The IPL has an exclusive window in the ICC Future Tours Programme.

Section-1: Data Preparation and Cleaning

In this analysis we will use the pandas, NumPy, Seaborn and Matplotlib libraries.

Installing The Mentioned Libraries

pip install numpy
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (1.19.2) Note: you may need to restart the kernel to use updated packages.
pip install pandas
Requirement already satisfied: pandas in /opt/conda/lib/python3.8/site-packages (1.1.3) Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.8/site-packages (from pandas) (2.8.1) Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.8/site-packages (from pandas) (2020.1) Requirement already satisfied: numpy>=1.15.4 in /opt/conda/lib/python3.8/site-packages (from pandas) (1.19.2) Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0) Note: you may need to restart the kernel to use updated packages.
pip install seaborn
Requirement already satisfied: seaborn in /opt/conda/lib/python3.8/site-packages (0.11.0) Requirement already satisfied: scipy>=1.0 in /opt/conda/lib/python3.8/site-packages (from seaborn) (1.5.2) Requirement already satisfied: matplotlib>=2.2 in /opt/conda/lib/python3.8/site-packages (from seaborn) (3.3.2) Requirement already satisfied: numpy>=1.15 in /opt/conda/lib/python3.8/site-packages (from seaborn) (1.19.2) Requirement already satisfied: pandas>=0.23 in /opt/conda/lib/python3.8/site-packages (from seaborn) (1.1.3) Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn) (0.10.0) Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn) (8.0.0) Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn) (2.8.1) Requirement already satisfied: certifi>=2020.06.20 in /opt/conda/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn) (2020.6.20) Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn) (1.2.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/conda/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn) (2.4.7) Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.8/site-packages (from pandas>=0.23->seaborn) (2020.1) Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from cycler>=0.10->matplotlib>=2.2->seaborn) (1.15.0) Note: you may need to restart the kernel to use updated packages.
pip install matplotlib
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.8/site-packages (3.3.2) Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (8.0.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (2.4.7) Requirement already satisfied: certifi>=2020.06.20 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (2020.6.20) Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (0.10.0) Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (1.2.0) Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (2.8.1) Requirement already satisfied: numpy>=1.15 in /opt/conda/lib/python3.8/site-packages (from matplotlib) (1.19.2) Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from cycler>=0.10->matplotlib) (1.15.0) Note: you may need to restart the kernel to use updated packages.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

The dataset I am using was downloaded from Kaggle and contains around six CSV files, but for the current analysis we will use only the match-level data, i.e. matches.csv.

Reading Data using Pandas
ipl_df = pd.read_csv('matches.csv')

Using the .info() method we can see the data type and the non-null count of each column.

ipl_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 756 entries, 0 to 755
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   id               756 non-null    int64
 1   Season           756 non-null    object
 2   city             749 non-null    object
 3   date             756 non-null    object
 4   team1            756 non-null    object
 5   team2            756 non-null    object
 6   toss_winner      756 non-null    object
 7   toss_decision    756 non-null    object
 8   result           756 non-null    object
 9   dl_applied       756 non-null    int64
 10  winner           752 non-null    object
 11  win_by_runs      756 non-null    int64
 12  win_by_wickets   756 non-null    int64
 13  player_of_match  752 non-null    object
 14  venue            756 non-null    object
 15  umpire1          754 non-null    object
 16  umpire2          754 non-null    object
 17  umpire3          119 non-null    object
dtypes: int64(4), object(14)
memory usage: 106.4+ KB

To get the number of rows and columns of the dataset we use the .shape attribute.

ipl_df.shape
(756, 18)
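# Note: describe is written here without parentheses, so the output below is the repr of the bound method,
# which prints the whole DataFrame. Call ipl_df.describe() to get summary statistics of the numeric columns.
ipl_df.describe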
<bound method NDFrame.describe of         id    Season           city        date                        team1  \
0        1  IPL-2017      Hyderabad  05-04-2017          Sunrisers Hyderabad   
1        2  IPL-2017           Pune  06-04-2017               Mumbai Indians   
2        3  IPL-2017         Rajkot  07-04-2017                Gujarat Lions   
3        4  IPL-2017         Indore  08-04-2017       Rising Pune Supergiant   
4        5  IPL-2017      Bangalore  08-04-2017  Royal Challengers Bangalore   
..     ...       ...            ...         ...                          ...   
751  11347  IPL-2019         Mumbai  05-05-2019        Kolkata Knight Riders   
752  11412  IPL-2019        Chennai  07-05-2019          Chennai Super Kings   
753  11413  IPL-2019  Visakhapatnam  08-05-2019          Sunrisers Hyderabad   
754  11414  IPL-2019  Visakhapatnam  10-05-2019               Delhi Capitals   
755  11415  IPL-2019      Hyderabad  12-05-2019               Mumbai Indians   

                           team2                  toss_winner toss_decision  \
0    Royal Challengers Bangalore  Royal Challengers Bangalore         field   
1         Rising Pune Supergiant       Rising Pune Supergiant         field   
2          Kolkata Knight Riders        Kolkata Knight Riders         field   
3                Kings XI Punjab              Kings XI Punjab         field   
4               Delhi Daredevils  Royal Challengers Bangalore           bat   
..                           ...                          ...           ...   
751               Mumbai Indians               Mumbai Indians         field   
752               Mumbai Indians          Chennai Super Kings           bat   
753               Delhi Capitals               Delhi Capitals         field   
754          Chennai Super Kings          Chennai Super Kings         field   
755          Chennai Super Kings               Mumbai Indians           bat   

     result  dl_applied                       winner  win_by_runs  \
0    normal           0          Sunrisers Hyderabad           35   
1    normal           0       Rising Pune Supergiant            0   
2    normal           0        Kolkata Knight Riders            0   
3    normal           0              Kings XI Punjab            0   
4    normal           0  Royal Challengers Bangalore           15   
..      ...         ...                          ...          ...   
751  normal           0               Mumbai Indians            0   
752  normal           0               Mumbai Indians            0   
753  normal           0               Delhi Capitals            0   
754  normal           0          Chennai Super Kings            0   
755  normal           0               Mumbai Indians            1   

     win_by_wickets player_of_match  \
0                 0    Yuvraj Singh   
1                 7       SPD Smith   
2                10         CA Lynn   
3                 6      GJ Maxwell   
4                 0       KM Jadhav   
..              ...             ...   
751               9       HH Pandya   
752               6        AS Yadav   
753               2         RR Pant   
754               6    F du Plessis   
755               0       JJ Bumrah   

                                         venue         umpire1  \
0    Rajiv Gandhi International Stadium, Uppal     AY Dandekar   
1      Maharashtra Cricket Association Stadium  A Nand Kishore   
2       Saurashtra Cricket Association Stadium     Nitin Menon   
3                       Holkar Cricket Stadium    AK Chaudhary   
4                        M Chinnaswamy Stadium             NaN   
..                                         ...             ...   
751                           Wankhede Stadium   Nanda Kishore   
752                  M. A. Chidambaram Stadium     Nigel Llong   
753                           ACA-VDCA Stadium             NaN   
754                           ACA-VDCA Stadium   Sundaram Ravi   
755         Rajiv Gandhi Intl. Cricket Stadium     Nitin Menon   

            umpire2                  umpire3  
0          NJ Llong                      NaN  
1            S Ravi                      NaN  
2         CK Nandan                      NaN  
3     C Shamshuddin                      NaN  
4               NaN                      NaN  
..              ...                      ...  
751        O Nandan                   S Ravi  
752     Nitin Menon                Ian Gould  
753             NaN                      NaN  
754  Bruce Oxenford  Chettithody Shamshuddin  
755       Ian Gould              Nigel Llong  

[756 rows x 18 columns]>
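
To make the difference between the attribute and the method call concrete, here is a minimal sketch. The small DataFrame is made up for illustration (it only borrows two column names from matches.csv), so the numbers are not results from the real dataset.

```python
import pandas as pd

# Made-up rows that mimic two numeric columns of matches.csv (illustrative only)
sample_df = pd.DataFrame({
    'win_by_runs': [35, 0, 0, 15],
    'win_by_wickets': [0, 7, 10, 0],
})

# With parentheses, describe() returns a DataFrame of summary statistics
summary = sample_df.describe()
print(summary)

# Without parentheses we only get a reference to the method itself
print(callable(sample_df.describe))
```

The rows of the summary are labelled count, mean, std, min, the quartiles and max, so a single value can be read with something like summary.loc['mean', 'win_by_runs'].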

Data Cleaning and Processing

ipl_df

We won't be using the umpire columns ('umpire1', 'umpire2', 'umpire3') in this analysis, so we will remove them using the .drop() method.

# inplace=True modifies ipl_df directly instead of returning a new DataFrame
ipl_df.drop(columns=['umpire1','umpire2','umpire3'],inplace=True)
# Exploring all column names in the data frame
ipl_df.columns
Index(['id', 'Season', 'city', 'date', 'team1', 'team2', 'toss_winner',
       'toss_decision', 'result', 'dl_applied', 'winner', 'win_by_runs',
       'win_by_wickets', 'player_of_match', 'venue'],
      dtype='object')
# Now let's see which seasons we have in our dataframe
# The .unique() method lists the distinct values in the selected column
ipl_df.Season.unique()
array(['IPL-2017', 'IPL-2008', 'IPL-2009', 'IPL-2010', 'IPL-2011',
       'IPL-2012', 'IPL-2013', 'IPL-2014', 'IPL-2015', 'IPL-2016',
       'IPL-2018', 'IPL-2019'], dtype=object)
# Now let's see all the teams that have played so far
ipl_df.team1.unique()
array(['Sunrisers Hyderabad', 'Mumbai Indians', 'Gujarat Lions',
       'Rising Pune Supergiant', 'Royal Challengers Bangalore',
       'Kolkata Knight Riders', 'Delhi Daredevils', 'Kings XI Punjab',
       'Chennai Super Kings', 'Rajasthan Royals', 'Deccan Chargers',
       'Kochi Tuskers Kerala', 'Pune Warriors', 'Rising Pune Supergiants',
       'Delhi Capitals'], dtype=object)
ipl_df.city.unique()
array(['Hyderabad', 'Pune', 'Rajkot', 'Indore', 'Bangalore', 'Mumbai',
       'Kolkata', 'Delhi', 'Chandigarh', 'Kanpur', 'Jaipur', 'Chennai',
       'Cape Town', 'Port Elizabeth', 'Durban', 'Centurion',
       'East London', 'Johannesburg', 'Kimberley', 'Bloemfontein',
       'Ahmedabad', 'Cuttack', 'Nagpur', 'Dharamsala', 'Kochi',
       'Visakhapatnam', 'Raipur', 'Ranchi', 'Abu Dhabi', 'Sharjah', nan,
       'Mohali', 'Bengaluru'], dtype=object)

From the above observations, some data cleaning is required:

  1. Pune was represented under several team names: 'Rising Pune Supergiant', 'Pune Warriors' and 'Rising Pune Supergiants'. For convenience we will replace all of these with the most recent name, 'Rising Pune Supergiant', in every column that contains team names, i.e. 'team1', 'team2', 'winner' and 'toss_winner'.
  2. The Delhi team was earlier called 'Delhi Daredevils' and was later renamed 'Delhi Capitals', so we will replace 'Delhi Daredevils' with 'Delhi Capitals'.
  3. Bangalore was renamed Bengaluru in 2014, so we will change the city name 'Bangalore' to 'Bengaluru' so that the same city is not counted under two names.
# We will use the .replace() method for the cleaning described above
team_renames = {'Rising Pune Supergiants': 'Rising Pune Supergiant',
                'Pune Warriors': 'Rising Pune Supergiant',
                'Delhi Daredevils': 'Delhi Capitals'}
team_columns = ['team1', 'team2', 'toss_winner', 'winner']
# Assigning the result back avoids relying on inplace changes made through attribute access
ipl_df[team_columns] = ipl_df[team_columns].replace(team_renames)
ipl_df['city'] = ipl_df['city'].replace({'Bangalore': 'Bengaluru'})
ipl_df.team1.unique()
array(['Sunrisers Hyderabad', 'Mumbai Indians', 'Gujarat Lions',
       'Rising Pune Supergiant', 'Royal Challengers Bangalore',
       'Kolkata Knight Riders', 'Delhi Capitals', 'Kings XI Punjab',
       'Chennai Super Kings', 'Rajasthan Royals', 'Deccan Chargers',
       'Kochi Tuskers Kerala'], dtype=object)
ipl_df.team2.unique()
array(['Royal Challengers Bangalore', 'Rising Pune Supergiant',
       'Kolkata Knight Riders', 'Kings XI Punjab', 'Delhi Capitals',
       'Sunrisers Hyderabad', 'Mumbai Indians', 'Gujarat Lions',
       'Rajasthan Royals', 'Chennai Super Kings', 'Deccan Chargers',
       'Kochi Tuskers Kerala'], dtype=object)
ipl_df.city.unique()
array(['Hyderabad', 'Pune', 'Rajkot', 'Indore', 'Bengaluru', 'Mumbai',
       'Kolkata', 'Delhi', 'Chandigarh', 'Kanpur', 'Jaipur', 'Chennai',
       'Cape Town', 'Port Elizabeth', 'Durban', 'Centurion',
       'East London', 'Johannesburg', 'Kimberley', 'Bloemfontein',
       'Ahmedabad', 'Cuttack', 'Nagpur', 'Dharamsala', 'Kochi',
       'Visakhapatnam', 'Raipur', 'Ranchi', 'Abu Dhabi', 'Sharjah', nan,
       'Mohali'], dtype=object)

So we have cleaned up the duplicated and inconsistently spelled names
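
The checks above only print 'team1', 'team2' and 'city'. A quick way to confirm that none of the old names survive in any of the four team columns could look like the sketch below. The rows are made up for illustration, and the 'winner' column deliberately keeps one old name so that the check has something to find.

```python
import pandas as pd

# Made-up rows (illustrative only); 'winner' deliberately keeps one old name
sample_df = pd.DataFrame({
    'team1': ['Rising Pune Supergiant', 'Delhi Capitals'],
    'team2': ['Mumbai Indians', 'Rising Pune Supergiant'],
    'toss_winner': ['Rising Pune Supergiant', 'Delhi Capitals'],
    'winner': ['Mumbai Indians', 'Pune Warriors'],
})

old_names = ['Rising Pune Supergiants', 'Pune Warriors', 'Delhi Daredevils']
team_columns = ['team1', 'team2', 'toss_winner', 'winner']

# For each team column, count the rows that still hold one of the old names
leftover_counts = sample_df[team_columns].isin(old_names).sum()
print(leftover_counts)

# Columns where the cleaning is incomplete
columns_to_fix = leftover_counts[leftover_counts > 0].index.tolist()
print(columns_to_fix)
```

On the real dataframe the same two lines can be run with ipl_df in place of sample_df; an empty list means every column is clean.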

Let's commit our work so that it is saved

jovian.commit(project=project_name, files = ['matches.csv'])

Let's Check For Missing Values

# .isnull() marks missing values as True; the first .sum() counts them per column and the second adds those counts up
ipl_df.isnull().sum().sum()
15

The result above shows that we have 15 null values in our dataset. Now we will look for them.

null_df = ipl_df[ipl_df.isna().any(axis=1)]
null_df
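
Before scanning the rows by eye, it can help to see how the missing values are spread across columns. This is a minimal sketch on made-up rows (only the column names and the venue strings are borrowed from the dataset), not the actual counts for matches.csv.

```python
import numpy as np
import pandas as pd

# Made-up rows (illustrative only)
sample_df = pd.DataFrame({
    'city': ['Mumbai', np.nan, 'Delhi', np.nan],
    'winner': ['Mumbai Indians', 'Kings XI Punjab', np.nan, 'Chennai Super Kings'],
    'venue': ['Wankhede Stadium', 'Dubai International Cricket Stadium',
              'Feroz Shah Kotla', 'Dubai International Cricket Stadium'],
})

# Number of missing values in every column
null_counts = sample_df.isnull().sum()

# Keep only the columns that actually have missing values
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```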

From the rows above we can see NaN values in the 'city', 'winner' and 'player_of_match' columns. The NaN values in 'winner' and 'player_of_match' occur only for matches that had "no result", so we can assume those matches may have been drawn or cancelled due to weather or technical conditions. The 'city' column, on the other hand, has NaN values in the rows where the stadium is located in Dubai, so we will replace those NaN values with "Dubai".

We can see these values are at index 461, 462, 466, 468, 469, 474 and 476

ipl_df.loc[460:476]

Now we will replace them with "Dubai"

ipl_df.loc[[461,462,466,468,469,474,476],'city'] = "Dubai"
#Let's see the changed values
ipl_df.loc[461:480]
#Now let's confirm that no NaN values remain in the city column
ipl_df.city.isnull().any()
False
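
Hard-coded index numbers only work for this exact copy of the file. As an alternative, the same fill can be expressed as a condition on the data. The sketch below uses made-up rows and assumes that every row with a missing city has the venue 'Dubai International Cricket Stadium' (a venue name that does appear in the dataset); that assumption would need to be checked on the real data first.

```python
import numpy as np
import pandas as pd

# Made-up rows (illustrative only)
sample_df = pd.DataFrame({
    'city': ['Mumbai', np.nan, 'Abu Dhabi', np.nan],
    'venue': ['Wankhede Stadium', 'Dubai International Cricket Stadium',
              'Sheikh Zayed Stadium', 'Dubai International Cricket Stadium'],
})

# Rows where the city is missing AND the venue is the Dubai stadium
missing_dubai_city = sample_df['city'].isna() & (
    sample_df['venue'] == 'Dubai International Cricket Stadium'
)

# Fill only those rows, leaving every other city untouched
sample_df.loc[missing_dubai_city, 'city'] = 'Dubai'
print(sample_df)
```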

Now let's check for the remaining NaN values

# Let's check whether any other columns have NaN values
ipl_df.isna().any()[lambda x: x]
winner             True
player_of_match    True
dtype: bool

From the results above it is clear that the only remaining NaN values are in the 'winner' and 'player_of_match' columns

We have now completed the data cleaning and can move on to the next steps

Let's commit the work completed so far

jovian.commit(project=project_name)

Section-2: Exploratory Analysis and Visualization

Now we will analyse the data to answer different kinds of questions

#Let's find the total number of matches played from 2008 to 2019
ipl_df.id.count()
756

We can see that 756 matches have been played in 12 seasons (2008 to 2019)

#Now let's find the number of matches where the result was normal, i.e. neither a tie nor a no-result
regular_matches = ipl_df[ipl_df.result == 'normal'].count()
regular_matches.result
743

We can see that, of the 756 matches played, only 13 did not have a normal result
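
To see what the non-normal results actually are, .value_counts() gives the full breakdown in one call. A minimal sketch on made-up rows follows; the category names 'tie' and 'no result' are assumed here from the way the notebook talks about ties and matches with no result.

```python
import pandas as pd

# Made-up rows (illustrative only); the category names are assumptions
sample_df = pd.DataFrame({
    'result': ['normal', 'normal', 'tie', 'normal', 'no result', 'normal'],
})

# Count how many matches fall into each result category
result_counts = sample_df['result'].value_counts()
print(result_counts)

# Matches that did not end with a normal result
non_normal = len(sample_df) - result_counts['normal']
print(non_normal)
```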

Total Matches Played in Each City

#Let's look at the cities where matches have been played
ipl_df.city.unique()
array(['Hyderabad', 'Pune', 'Rajkot', 'Indore', 'Bengaluru', 'Mumbai',
       'Kolkata', 'Delhi', 'Chandigarh', 'Kanpur', 'Jaipur', 'Chennai',
       'Cape Town', 'Port Elizabeth', 'Durban', 'Centurion',
       'East London', 'Johannesburg', 'Kimberley', 'Bloemfontein',
       'Ahmedabad', 'Cuttack', 'Nagpur', 'Dharamsala', 'Kochi',
       'Visakhapatnam', 'Raipur', 'Ranchi', 'Abu Dhabi', 'Sharjah',
       'Dubai', 'Mohali'], dtype=object)

Now let's count the matches played in each of the above cities

cities = ipl_df.groupby('city')[['id']].count()
cities

Let's arrange this data in a more organised manner

cities.rename(columns={'id':'matches'},inplace=True)
cities = cities.sort_values('matches',ascending=True).reset_index()
cities

We can see that IPL matches were played in 32 cities in total from 2008 to 2019. As many readers may know, this year's IPL is again being held in the UAE

Let's plot the cities in a bar chart

plt.figure(figsize=(20,10))
plt.grid()
plt.title('Number Of Matches Played In Each City')
sns.barplot(x='matches',y='city',data=cities);
[Figure: bar chart of the number of matches played in each city]

It seems Mumbai has been the favourite location, followed by Bengaluru and Kolkata

Now let's see the matches won by each team

Total Matches Won By Each Team

ipl_df.winner.unique()
array(['Sunrisers Hyderabad', 'Rising Pune Supergiant',
       'Kolkata Knight Riders', 'Kings XI Punjab',
       'Royal Challengers Bangalore', 'Mumbai Indians', 'Delhi Capitals',
       'Gujarat Lions', 'Chennai Super Kings', 'Rajasthan Royals',
       'Deccan Chargers', 'Kochi Tuskers Kerala', nan], dtype=object)
winner_df = ipl_df.groupby('winner')[['id']].count()
winner_df = winner_df.sort_values('id', ascending=False).reset_index()

winner_df.rename(columns = {'id':'wins','winner':'Teams'},inplace=True)
winner_df

It seems Mumbai Indians have won the most matches in the IPL to date, followed by Chennai Super Kings.

Now let's plot these wins

#Plotting Wins vs Teams
plt.figure(figsize=(30,20))
plt.xlabel('Teams')
plt.ylabel('Wins')
plt.title('Matches Won By Each Team');
plt.bar(winner_df.Teams,winner_df.wins);
[Figure: bar chart of matches won by each team]

Let's add a colour for each team so that we get a clearer picture

We can do this using the color argument of the bar() function

#Plotting Wins vs Teams
#We will use the colour of each team's jersey to make the chart easier to read
plt.figure(figsize=(20,10))
# No legend is drawn: each bar is already labelled on the x-axis, and calling plt.legend() before plotting finds nothing to label
plt.xlabel('Teams',fontweight='bold',fontsize=30)
plt.ylabel('Wins',fontweight='bold',fontsize=30)
plt.tick_params(labelsize=20)
plt.xticks(rotation=90)
plt.title('Matches Won By Each Team',fontweight='bold',fontsize=30);
plt.bar(winner_df.Teams, winner_df.wins, color = ['blue','#FFD801','#461B7E','#C11B17','#F660AB','#000080','#F535AA','#F87217','#BCC6CC','#2C04A2','#E04F16','#632B72']);
[Figure: bar chart of matches won by each team, coloured by team]

From the above graph it is clear that Mumbai Indians have won the most matches
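
A plain list of colours only lines up with the right teams while the rows stay in the same order. One way to make the colouring independent of the sort order is to keep the colours in a dictionary keyed by team name. This is a sketch on made-up win counts; the three team-to-colour pairs are read off the colour lists used in this notebook, and the grey fallback is my own addition.

```python
import pandas as pd

# Made-up win counts (illustrative only)
wins_df = pd.DataFrame({
    'Teams': ['Mumbai Indians', 'Chennai Super Kings',
              'Kolkata Knight Riders', 'Gujarat Lions'],
    'wins': [4, 3, 2, 1],
})

# Colour per team; teams missing from the dictionary fall back to grey
team_colors = {
    'Mumbai Indians': 'blue',
    'Chennai Super Kings': '#FFD801',
    'Kolkata Knight Riders': '#461B7E',
}

# Re-sorting the rows does not break the pairing, because colours are looked up by name
sorted_df = wins_df.sort_values('wins')
bar_colors = sorted_df['Teams'].map(team_colors).fillna('grey').tolist()
print(bar_colors)
```

The resulting list can then be passed as the color argument, for example plt.bar(sorted_df.Teams, sorted_df.wins, color=bar_colors).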

Let's commit our work so far

jovian.commit(project=project_name)

Now let's see which season had the most matches

season_df = ipl_df.groupby('Season')[['id']].count()
season_df = season_df.sort_values('Season', ascending=False).reset_index()
season_df.rename(columns = {'id':'Matches','Season':'Year'},inplace = True)
season_df

Now let's plot this information

#To make it look neater we rotate the x-axis labels by 60 degrees using plt.xticks()
# We also make the font bold and increase its size for readability
plt.figure(figsize=(20,10))
plt.title("Matches Played In Each Season",fontweight='bold',fontsize=30)
plt.xlabel('Season',fontweight='bold',fontsize=30)
plt.ylabel('Total Matches',fontweight='bold',fontsize=30)
plt.xticks(rotation=60)
plt.tick_params(labelsize=20)
plt.bar(season_df.Year,season_df.Matches,color=['#98AFC7','#6D7B8D']);
[Figure: bar chart of matches played in each season]

From the above graph it is clear that the 2013 season had the most matches played (76)
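
The same answer can be read directly from the counts instead of from the chart. A minimal sketch on made-up rows (only the 'id' and 'Season' column names are taken from the dataset):

```python
import pandas as pd

# Made-up rows (illustrative only)
sample_df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'Season': ['IPL-2012', 'IPL-2013', 'IPL-2013', 'IPL-2013', 'IPL-2014', 'IPL-2012'],
})

# Matches per season as a Series indexed by season
matches_per_season = sample_df.groupby('Season')['id'].count()

# idxmax() returns the index label (the season) with the largest count
busiest_season = matches_per_season.idxmax()
most_matches = matches_per_season.max()
print(busiest_season, most_matches)
```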

Let's commit our work and move further with the analysis

jovian.commit(project=project_name)

Section-3: Asking Interesting Questions About the Data

I will be asking the following questions:

  1. What was the most preferred decision on winning the toss, i.e. choose to bat / choose to field

  2. Which decision has proved most beneficial, i.e. field / bat

  3. Which venue has hosted the most IPL matches

  4. Who has been awarded Player of the Match the most times

  5. Who has won the IPL trophy the most times

  6. Which season had the most matches played

Q1. What was the most preferred decision on winning the toss, i.e. bat / field

ipl_df
# We can see toss decision is either bat/field
ipl_df.toss_decision.unique()
array(['field', 'bat'], dtype=object)
decision_df = ipl_df.groupby('toss_decision')[['id']].count()
decision_df = decision_df.sort_values('id').reset_index()
decision_df.rename(columns={'id':'Total','toss_decision':'Decision'},inplace=True)
decision_df
#Let's plot the result
plt.figure(figsize=(10,10))
plt.title("Preferred Decision",fontweight='bold',fontsize=30)
plt.xlabel('Decision',fontweight='bold',fontsize=30)
plt.ylabel('Total',fontweight='bold',fontsize=30)
plt.tick_params(labelsize=20)
plt.grid()
plt.bar(decision_df.Decision, decision_df.Total, color=['#4863A0','#566D7E']);
[Figure: bar chart of the preferred toss decision]
print('The Most Preferred Decision After Winning Toss in the IPL Until 2019 has been "Choose to Field First"')
The Most Preferred Decision After Winning Toss in the IPL Until 2019 has been "Choose to Field First"
jovian.commit(project=project_name)

Now let's see which decision has proven more beneficial

Q2. Which decision has proved most beneficial, i.e. field / bat

field_df = ipl_df.loc[(ipl_df['toss_winner'] == ipl_df['winner']) & (ipl_df['toss_decision'] == 'field'), ['id', 'winner','toss_decision']]
field_df.winner.count()
259
bat_df = ipl_df.loc[(ipl_df['toss_winner'] == ipl_df['winner']) & (ipl_df['toss_decision'] == 'bat'), ['id', 'winner','toss_decision']]
bat_df.winner.count()
134
frames = [bat_df, field_df]
result_df = pd.concat(frames)
result_df = result_df.groupby('toss_decision')[['id']].count()
result_df
#From the earlier analysis we know that out of 756 tosses (2008 - 2019) the toss-winning team chose to field first 463 times and chose to bat only 293 times
# Now let's plot how often each of these decisions ended in a win for the toss winner
result_df = result_df.sort_values('id').reset_index()
result_df.rename(columns={'id':'Total','toss_decision':'Decision'},inplace=True)
result_df
plt.figure(figsize=(10,10))
plt.title("Decision Success",fontweight='bold',fontsize=30)
plt.xlabel('Decision',fontweight='bold',fontsize=30)
plt.ylabel('Total',fontweight='bold',fontsize=30)
plt.tick_params(labelsize=20)
plt.bar(decision_df.Decision, decision_df.Total, color=['#4CC552','#4CC552']);
plt.bar(result_df.Decision, result_df.Total, color=['#00FF00','#00FF00']);
plt.legend(['Decision Taken','Decision Proved Right']);
[Figure: bar chart comparing how often each toss decision was taken and how often it proved right]

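We can see that choosing to field after winning the toss has not only been the preferred decision but has also worked out well: the team that chose to field went on to win about 56% of the time (259 of 463 matches), compared with about 46% for teams that chose to bat (134 of 293)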
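
The success rates quoted above come from dividing the wins by the number of times each decision was taken. The sketch below first repeats that arithmetic with the counts from this notebook, and then shows one possible way to get the same kind of rate straight from a dataframe. The six rows and the team names A, B and C are made up for illustration.

```python
import pandas as pd

# Arithmetic with the counts found in this notebook
field_rate = 259 / 463   # toss winner fielded first and won / times fielding was chosen
bat_rate = 134 / 293     # toss winner batted first and won / times batting was chosen
print(round(field_rate, 3), round(bat_rate, 3))

# Made-up rows (illustrative only) with placeholder team names
sample_df = pd.DataFrame({
    'toss_winner': ['A', 'B', 'A', 'C', 'B', 'C'],
    'winner': ['A', 'A', 'A', 'C', 'C', 'B'],
    'toss_decision': ['field', 'field', 'bat', 'field', 'bat', 'bat'],
})

# True where the team that won the toss also won the match
toss_winner_won = sample_df['toss_winner'] == sample_df['winner']

# The mean of a boolean Series is the share of True values, computed per decision
success_rate = toss_winner_won.groupby(sample_df['toss_decision']).mean()
print(success_rate)
```

Comparing rates rather than raw win counts matters here because fielding was chosen far more often than batting, so it would have more wins even if both decisions were equally effective.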

Q3. Which Venue has hosted the Most Number Of Matches

# Let's see how many venues have hosted IPL matches
ipl_df.venue.unique()
array(['Rajiv Gandhi International Stadium, Uppal',
       'Maharashtra Cricket Association Stadium',
       'Saurashtra Cricket Association Stadium', 'Holkar Cricket Stadium',
       'M Chinnaswamy Stadium', 'Wankhede Stadium', 'Eden Gardens',
       'Feroz Shah Kotla',
       'Punjab Cricket Association IS Bindra Stadium, Mohali',
       'Green Park', 'Punjab Cricket Association Stadium, Mohali',
       'Sawai Mansingh Stadium', 'MA Chidambaram Stadium, Chepauk',
       'Dr DY Patil Sports Academy', 'Newlands', "St George's Park",
       'Kingsmead', 'SuperSport Park', 'Buffalo Park',
       'New Wanderers Stadium', 'De Beers Diamond Oval',
       'OUTsurance Oval', 'Brabourne Stadium',
       'Sardar Patel Stadium, Motera', 'Barabati Stadium',
       'Vidarbha Cricket Association Stadium, Jamtha',
       'Himachal Pradesh Cricket Association Stadium', 'Nehru Stadium',
       'Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium',
       'Subrata Roy Sahara Stadium',
       'Shaheed Veer Narayan Singh International Stadium',
       'JSCA International Stadium Complex', 'Sheikh Zayed Stadium',
       'Sharjah Cricket Stadium', 'Dubai International Cricket Stadium',
       'M. A. Chidambaram Stadium', 'Feroz Shah Kotla Ground',
       'M. Chinnaswamy Stadium', 'Rajiv Gandhi Intl. Cricket Stadium',
       'IS Bindra Stadium', 'ACA-VDCA Stadium'], dtype=object)
total_venue = list(ipl_df.venue.unique())
len(total_venue)
41

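We can see that the venue column has 41 distinct values. Some of these look like alternate spellings of the same ground (for example 'M Chinnaswamy Stadium' and 'M. Chinnaswamy Stadium', or 'Feroz Shah Kotla' and 'Feroz Shah Kotla Ground'), so the number of distinct grounds is likely somewhat lower than 41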
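
Alternate spellings of one ground can be merged with the same .replace() approach used earlier for team names. The sketch below uses made-up rows, and the alias dictionary is my assumption about which names refer to the same stadium, judged only from how similar the names are.

```python
import pandas as pd

# Made-up rows (illustrative only), using venue names that appear in the dataset
sample_df = pd.DataFrame({
    'venue': ['M Chinnaswamy Stadium', 'M. Chinnaswamy Stadium',
              'Feroz Shah Kotla', 'Feroz Shah Kotla Ground', 'Eden Gardens'],
})

# Assumed aliases: alternate spelling -> name to keep
venue_aliases = {
    'M. Chinnaswamy Stadium': 'M Chinnaswamy Stadium',
    'Feroz Shah Kotla Ground': 'Feroz Shah Kotla',
}

# Store the cleaned names in a new column so the original values stay available
sample_df['venue_clean'] = sample_df['venue'].replace(venue_aliases)
print(sample_df['venue'].nunique(), sample_df['venue_clean'].nunique())
```

If names are merged like this on the real data, the per-venue match counts change as well, so the ranking of venues would need to be recomputed.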

Let's see which venue hosted the most matches

venue_df = ipl_df.groupby('venue')[['id']].count()
venue_df = venue_df.sort_values('id',ascending=False).reset_index()
venue_df.rename(columns={'id':'Total','venue':'Stadium'},inplace=True)
labels = list(venue_df.Stadium)
venue_df

As the list is long, we will take only the top 10 venues for our graphical representation

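# Keep only the 10 venues with the most matches (venue_df is already sorted in descending order)
top_venues = venue_df.head(10)
plt.figure(figsize=(20,20))
plt.title("Top 10 Venues",fontweight='bold',fontsize=30)
plt.pie(top_venues.Total,labels=top_venues.Stadium,textprops={'fontsize': 13});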
[Figure: pie chart of matches hosted per venue]

So we can see that the most matches were played at Eden Gardens (77), followed by Wankhede Stadium (73)

#Let's commit the work
jovian.commit(project=project_name)

Q4. Who has been awarded with Player Of the Match maximum Number Of Times

#Let's check how many players have received the player of the match award
len(ipl_df.player_of_match.unique())
227

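This is a large number. Note that .unique() also counts NaN as a value, and 'player_of_match' has missing entries, so the 227 above corresponds to 226 distinct players who have received the player of the match award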
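
The off-by-one caused by NaN is easy to reproduce. A minimal sketch on a made-up Series, showing that .nunique() skips missing values while len(.unique()) counts them:

```python
import numpy as np
import pandas as pd

# Made-up awards column with one missing value (illustrative only)
awards = pd.Series(['CH Gayle', 'AB de Villiers', 'CH Gayle', np.nan],
                   name='player_of_match')

# unique() keeps NaN as one of the distinct values
count_with_nan = len(awards.unique())

# nunique() ignores NaN by default
count_without_nan = awards.nunique()
print(count_with_nan, count_without_nan)
```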

Now let's see which of these players have received the most player of the match awards

player_df = ipl_df.groupby('player_of_match')[['id']].count()
player_df
player_df = player_df.sort_values('id',ascending=False).reset_index()
player_df
#Now let's extract the top 10 players
players_df = player_df.head(10).copy()
players_df.rename(columns={'id':'Total_Awards','player_of_match':'Man_Of_The_Match'},inplace=True)
players_df

From the above result it is clear that Chris Gayle has received 21 Man of the Match titles, followed by AB de Villiers with 20

plt.figure(figsize=(15,10))
plt.title("Top 10 Players with Highest Man Of the Match Titles",fontweight='bold' )
plt.xticks(rotation=90)
plt.yticks(ticks=np.arange(0,25,5))
plt.ylabel('No. of Awards')
plt.xlabel('Players')
sns.barplot(x=players_df.Man_Of_The_Match,y=players_df.Total_Awards, alpha=0.6);
[Figure: bar chart of the top 10 players by Man of the Match titles]
#Let's commit
jovian.commit(project=project_name)

Q5. Who has won the IPL trophy the most times

Now let's look for the team with the most season wins

We will have to extract the final matches from the entire data

To do that we can group the matches by season and then select the last match of each season. This assumes that the rows within each season are in chronological order, so that the last row is the final

final_df = ipl_df.groupby('Season').tail(1).copy()
final_df
#Now let's sort the data by season
final_df = final_df.sort_values('Season')
final_df
final_df.winner.unique()
array(['Rajasthan Royals', 'Deccan Chargers', 'Chennai Super Kings',
       'Kolkata Knight Riders', 'Mumbai Indians', 'Sunrisers Hyderabad'],
      dtype=object)
final_df['winner'].value_counts()
Mumbai Indians           4
Chennai Super Kings      3
Kolkata Knight Riders    2
Sunrisers Hyderabad      1
Deccan Chargers          1
Rajasthan Royals         1
Name: winner, dtype: int64

We can see that Mumbai Indians have won the most season titles up to 2019
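
Taking the last row per season depends on the order of the rows in the file. A variant that does not depend on row order is to parse the dates and pick the latest match of each season. The sketch below uses made-up rows with placeholder winners, and it assumes every date is written as day-month-year, as in the rows displayed earlier; that would need to be checked for the whole file.

```python
import pandas as pd

# Made-up rows (illustrative only), deliberately not in chronological order
sample_df = pd.DataFrame({
    'Season': ['IPL-2018', 'IPL-2019', 'IPL-2018', 'IPL-2019'],
    'date': ['27-05-2018', '12-05-2019', '07-04-2018', '23-03-2019'],
    'winner': ['Team A', 'Team B', 'Team C', 'Team D'],  # placeholder names
})

# Parse the text dates, assuming a day-month-year layout
sample_df['match_date'] = pd.to_datetime(sample_df['date'], format='%d-%m-%Y')

# For every season, find the index label of the row with the latest date
last_match_idx = sample_df.groupby('Season')['match_date'].idxmax()

# Select those rows: one final per season
finals_df = sample_df.loc[last_match_idx]
print(finals_df[['Season', 'date', 'winner']])
```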

plt.figure(figsize=(20,10))
plt.title("Season Champions",fontweight='bold',fontsize=20)
plt.xlabel('Teams',fontweight='bold',fontsize=30)
plt.ylabel('Total Seasons',fontweight='bold',fontsize=20)
plt.xticks(rotation=60)
plt.tick_params(labelsize=10)
sns.countplot(x=final_df['winner'],palette=['#F535AA','#BCC6CC','yellow','#461B7E','blue','#F87217']);
[Figure: count plot of season titles won by each team]

So the chart confirms that Mumbai Indians have been season champions the most times

jovian.commit(project=project_name)

Q6. Which Season Had the Most Matches

#We explored this earlier
season_df
#To make it look neater we rotate the x-axis labels by 60 degrees using plt.xticks()
# We also make the font bold and increase its size for readability
plt.figure(figsize=(20,10))
plt.title("Matches Played In Each Season",fontweight='bold',fontsize=30)
plt.xlabel('Season',fontweight='bold',fontsize=30)
plt.ylabel('Total Matches',fontweight='bold',fontsize=30)
plt.xticks(rotation=60)
plt.tick_params(labelsize=20)
plt.bar(season_df.Year,season_df.Matches,color=['#98AFC7','#6D7B8D']);
[Figure: bar chart of matches played in each season]

So we can see that IPL-2013 had the most matches

jovian.commit(project=project_name)

Section-4: Inferences and Conclusion

In this analysis I used the matches.csv file from the Kaggle dataset. My conclusions are as follows:

  1. A total of 756 matches were played from 2008 to 2019
  2. Out of these 756 matches, 743 had a normal result
  3. The most matches were played in Mumbai (101)
  4. Mumbai Indians have won the most matches (109), followed by Chennai Super Kings with 100
  5. The IPL-2013 season had the most matches (76)
  6. Eden Gardens hosted the most matches (77), followed by Wankhede Stadium (73)
  7. Chris Gayle has been Man of the Match the most times with 21 awards, followed by AB de Villiers (20) and MS Dhoni (17)
  8. Mumbai Indians have been IPL champions the most times (4), followed by Chennai Super Kings (3)
  9. Mumbai Indians and Chennai Super Kings have been the dominant teams

Future Work

The dataset has 6 different CSV files. I will try to explore 'Players.csv', 'Deliveries.csv' and the others in detail

References

jovian.commit(project=project_name, files = ['matches.csv'])