NBA PLAYER STATS 2020 EDA

In this Jupyter notebook, I will show you how to perform exploratory data analysis (EDA) on NBA player stats scraped directly from http://www.basketball-reference.com/

Introduction

Data Set Information:

This dataset contains per-game NBA player stats for the 2019-20 season, which can be scraped from https://www.basketball-reference.com/leagues/NBA_2020_per_game.html

Data Attributes
Acronym   Description
Rk        Rank
Pos       Position
Age       Player's age on February 1 of the season
Tm        Team
G         Games
GS        Games Started
MP        Minutes Played Per Game
FG        Field Goals Per Game
FGA       Field Goal Attempts Per Game
FG%       Field Goal Percentage
3P        3-Point Field Goals Per Game
3PA       3-Point Field Goal Attempts Per Game
3P%       FG% on 3-Pt FGAs
2P        2-Point Field Goals Per Game
2PA       2-Point Field Goal Attempts Per Game
2P%       FG% on 2-Pt FGAs
eFG%      Effective Field Goal Percentage (Note: this statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal.)
FT        Free Throws Per Game
FTA       Free Throw Attempts Per Game
FT%       Free Throw Percentage
ORB       Offensive Rebounds Per Game
DRB       Defensive Rebounds Per Game
TRB       Total Rebounds Per Game
AST       Assists Per Game
STL       Steals Per Game
BLK       Blocks Per Game
TOV       Turnovers Per Game
PF        Personal Fouls Per Game
PTS       Points Per Game
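
To make the eFG% adjustment concrete, here is a minimal sketch of the usual definition, eFG% = (FG + 0.5 * 3P) / FGA, written with the column acronyms from the table above. The function name effective_fg_pct and the sample numbers are made up for illustration and are not taken from the dataset.

```python
def effective_fg_pct(fg: float, three_p: float, fga: float) -> float:
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA."""
    if fga == 0:
        # No attempts means the percentage is undefined.
        return float("nan")
    return (fg + 0.5 * three_p) / fga


# Made-up per-game line: 8 field goals (2 of them threes) on 16 attempts.
plain_fg_pct = 8.0 / 16.0                      # FG% ignores shot value
efg_pct = effective_fg_pct(fg=8.0, three_p=2.0, fga=16.0)

print(plain_fg_pct)  # 0.5
print(efg_pct)       # 0.5625
```

Each made three adds half a field goal to the numerator, which is why eFG% is never lower than FG% for the same player.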

There are several options for getting the dataset into Jupyter:

  • Download the CSV manually and upload it via Jupyter's GUI
  • Use the urlretrieve function from the urllib.request module to download a CSV file directly from a raw URL
  • Use a helper library such as opendatasets, which contains a collection of curated datasets and provides a function for downloading them directly
  • Web scrape the HTML page containing the stats using the pandas read_html() function
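
As an illustration of the urlretrieve option, here is a minimal sketch. The helper name download_csv and the file names are hypothetical, and a local file:// URL stands in for a raw CSV URL so that the example runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download_csv(url: str, dest: str) -> Path:
    """Download the file at `url` to `dest` and return the local path."""
    local_path, _headers = urlretrieve(url, dest)
    return Path(local_path)


# Offline demo: a file:// URL plays the role of a raw CSV URL.
# In practice, pass the real raw URL of the CSV file instead.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    source.write_text("Player,PTS\nExample Player,10.0\n")  # made-up data

    saved = download_csv(source.as_uri(), str(Path(tmp) / "copy.csv"))
    first_line = saved.read_text().splitlines()[0]

print(first_line)  # Player,PTS
```

This option only works when the data is already published as a CSV file at a stable URL, which is not the case for the stats page used here.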

We'll use the pandas read_html() function to download the stats table and save it to a CSV file.

!pip install jovian numpy pandas matplotlib seaborn opendatasets --upgrade --quiet

Let's begin by downloading the data and taking a first look at it.

Web scraping data using pandas

The code block below retrieves the "2019-20 NBA Player Stats: Per Game" data from http://www.basketball-reference.com/