
Exploratory Data Analysis Using Python : IPL (2008-2020)

The IPL (Indian Premier League) dataset for 2008-2020 was taken from https://www.kaggle.com/patrickb1912/ipl-complete-dataset-20082020 . It consists of two separate CSV files, one for matches and one for deliveries, which contain each match's summary and ball-by-ball details, respectively. The analysis considers both former and current teams, and the hidden insights are drawn from all matches played through 2020.

IPL is a professional Twenty20 cricket league in India, usually contested between March and May each year by eight teams representing eight different cities or states in India. The league was founded by the Board of Control for Cricket in India (BCCI) in 2007.

Current Teams : Chennai Super Kings, Delhi Capitals (formerly Delhi Daredevils; the two are treated as the same team), Kings XI Punjab, Kolkata Knight Riders, Mumbai Indians, Rajasthan Royals, Royal Challengers Bangalore, Sunrisers Hyderabad.

Former teams : Deccan Chargers, Kochi Tuskers Kerala, Pune Warriors India, Rising Pune Supergiant, Gujarat Lions

This analysis uses Python helper libraries for data analysis and visualization. The techniques used here were learned from Data Analysis with Python: Zero to Pandas. The course has assignments dedicated to practice, along with the Jupyter notebooks used during teaching for better understanding and experimentation. It also has active and supportive community forums for resolving queries. It is a good option for beginners to get started.

Table of Contents


1. Downloading the Dataset

  1. Installing opendatasets helper library
  2. Saving dataset url
  3. Import opendatasets and operating system libraries
  4. Downloading dataset
  5. View downloaded files

2. Data Preparation and Cleaning

  1. Import Pandas
  2. Load IPL Ball-by-Ball 2008-2020.csv
  3. Check datatype of the loaded dataset
  4. View Columns
  5. Filter unwanted columns
  6. View 10 random rows
  7. Load IPL Matches 2008-2020.csv
  8. View Columns
  9. View content of columns
  10. Change 'Delhi Daredevils' to its present name 'Delhi Capitals', and fix the typo 'Rising Pune Supergiants' (extra 's' at the end) to 'Rising Pune Supergiant'
  11. Dealing with NaN values
  12. Creating dataframe with the columns of interest
  13. Check info of the dataframe
  14. Change datatype of date column from 'object' to 'datetime'
  15. Add a weekday, month and year columns
  16. Drop date column
  17. Check shape of the prepared dataframe
  18. Check 10 random rows of dataframe ready for analysis

3. Exploratory Analysis and Visualization

  1. Import helper libraries matplotlib.pyplot, seaborn and numpy for visualization
  2. Rename some columns
  3. Check number of matches won by each team
  4. Top 3 cities
  5. Top 3 stadiums
  6. Matches played in respective Year
  7. Matches played in respective Months

4. Asking and Answering Questions

  1. Win the Toss, Win the Match
  2. King of Boundaries
  3. Luckiest Stadium for Mumbai Indians
  4. Wicket King
  5. Max matches won, CSK VS MI
  6. Player of the Match
  7. 2020 : Top 10 Boundary rivals

5. Hidden Insights

6. References

For the analysis we'll use the following libraries :


  • opendatasets
  • os
  • pandas
  • numpy
  • matplotlib
  • matplotlib.pyplot
  • seaborn
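
Before starting, it can help to confirm that the libraries listed above are importable in the current environment. The snippet below is a minimal sketch, not part of the original notebook: the helper name `missing_libraries` and the `REQUIRED` list are assumptions made for illustration, and it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Top-level import names of the libraries listed above.
# (matplotlib.pyplot is a submodule of matplotlib; os ships with Python.)
REQUIRED = ["opendatasets", "os", "pandas", "numpy", "matplotlib", "seaborn"]


def missing_libraries(names):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries(REQUIRED)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All libraries are available.")
```

Anything reported as missing can be installed with `pip` before moving on to the download step.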

1. Downloading the Dataset


1.1 We'll use the opendatasets helper library to download the files.

!pip install jovian opendatasets --upgrade --quiet