Exploratory Data Analysis Using Python: IPL (2008-2020)
The IPL (Indian Premier League) 2008-2020 dataset was taken from https://www.kaggle.com/patrickb1912/ipl-complete-dataset-20082020. It consists of two separate CSV files, one for matches and one for deliveries, which contain each match summary and the ball-by-ball details, respectively. The analysis considers both former and current teams, and the hidden insights are drawn from all matches played through 2020.
IPL is a professional Twenty20 cricket league in India, usually contested between March and May every year by eight teams representing eight different cities or states in India. The league was founded by the Board of Control for Cricket in India (BCCI) in 2007.
Current teams: Chennai Super Kings, Delhi Capitals (formerly Delhi Daredevils), Kings XI Punjab, Kolkata Knight Riders, Mumbai Indians, Rajasthan Royals, Royal Challengers Bangalore, Sunrisers Hyderabad.
Former teams: Deccan Chargers, Kochi Tuskers Kerala, Pune Warriors India, Rising Pune Supergiant, Gujarat Lions.
These are some Python helper libraries for data analysis and visualization, learned from the course Data Analysis with Python: Zero to Pandas. The course has assignments dedicated to practice, along with the Jupyter notebooks used while teaching, for better understanding and experimentation. It also has active and supportive community forums for resolving queries. It is a good option for beginners looking to kick-start.
Table of Contents
1. Downloading the Dataset
- Installing the opendatasets helper library
- Saving dataset url
- Import the opendatasets and operating system (os) libraries
- Downloading dataset
- View downloaded files
2. Data Preparation and Cleaning
- Import Pandas
- Load IPL Ball-by-Ball 2008-2020.csv
- Check datatype of the loaded dataset
- View Columns
- Filter unwanted columns
- View 10 random rows
- Load IPL Matches 2008-2020.csv
- View Columns
- View content of columns
- Change 'Delhi Daredevils' to its present name 'Delhi Capitals', and fix the typo 'Rising Pune Supergiants' (extra 's' at the end) to 'Rising Pune Supergiant'
- Dealing with NaN values
- Creating a dataframe with the columns of interest
- Check info of the dataframe
- Change datatype of date column from 'object' to 'datetime'
- Add a weekday, month and year columns
- Drop date column
- Check shape of the prepared dataframe
- Check 10 random rows of dataframe ready for analysis
3. Exploratory Analysis and Visualization
- Import helper libraries matplotlib.pyplot, seaborn and numpy for visualization
- Rename some columns
- Check number of matches won by individual team
- Top 3 cities
- Top 3 stadiums
- Matches played in each year
- Matches played in each month
4. Asking and Answering Questions
- Win the Toss, Win the Match
- King of Boundaries
- Luckiest Stadium for Mumbai Indians
- Wicket King
- Max matches won, CSK VS MI
- Player of the Match
- 2020 : Top 10 Boundary rivals
5. Hidden Insights
6. References
For the analysis, we'll use the following libraries:
- opendatasets
- os
- pandas
- numpy
- matplotlib
- matplotlib.pyplot
- seaborn
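Before starting, it can help to confirm that these libraries are installed. The snippet below is a minimal sketch using only the standard library; the helper name `missing_libraries` is hypothetical and not part of the original notebook.

```python
from importlib.util import find_spec

# Top-level packages used in this analysis (matplotlib.pyplot ships with matplotlib).
required = ["opendatasets", "os", "pandas", "numpy", "matplotlib", "seaborn"]


def missing_libraries(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# An empty list means everything needed is available.
print("Missing libraries:", missing_libraries(required))
```

Anything reported as missing can be installed with pip before running the rest of the notebook.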
1. Downloading the Dataset
1.1 We'll use the opendatasets helper library to download the files.
!pip install jovian opendatasets --upgrade --quiet
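Once the library is installed, the download itself can be sketched as follows. This is a minimal sketch, not necessarily the exact code used in the notebook: it assumes Kaggle API credentials are at hand (opendatasets asks for a Kaggle username and key when given a Kaggle URL) and that the files land in a folder named after the last segment of the URL. The helper names `dataset_dir` and `download_dataset` are hypothetical.

```python
import os

# Dataset URL from the introduction.
dataset_url = "https://www.kaggle.com/patrickb1912/ipl-complete-dataset-20082020"


def dataset_dir(url):
    """Folder name assumed for the download: the last segment of the URL."""
    return url.rstrip("/").split("/")[-1]


def download_dataset(url=dataset_url):
    """Download the dataset and return the names of the downloaded files."""
    import opendatasets as od  # third-party; imported here so the helper above works without it

    od.download(url)  # prompts for Kaggle credentials if they are not configured
    data_dir = os.path.join(".", dataset_dir(url))
    return sorted(os.listdir(data_dir))


# Usage (needs network access and Kaggle credentials):
# files = download_dataset()
# print(files)
```

Keeping the download inside a function avoids re-triggering the credential prompt every time the cell defining the URL is rerun.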