NOTE -
The dataset used in this workshop was recently updated on Kaggle. The following has changed since the live workshop:

  • The dataset's filename has changed. The new filename is './us-accidents/US_Accidents_Dec20_updated.csv'. Update the filename if you get errors while creating the dataframe.
  • The "Source" column has been removed from the updated dataset. Ignore any mention of "Source" in this notebook when using the updated dataset.
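One way to cope with the filename change is to check which file is actually on disk before creating the dataframe. The following is a minimal sketch, not part of the original workshop: the helper name `resolve_data_filename` is hypothetical, and the two candidate paths are the ones mentioned in this notebook.

```python
import os

# Candidate paths taken from this notebook; the list name is illustrative.
CANDIDATE_FILENAMES = [
    './us-accidents/US_Accidents_Dec20_updated.csv',  # updated dataset
    './us-accidents/US_Accidents_Dec20.csv',          # filename from the live workshop
]


def resolve_data_filename(candidates=CANDIDATE_FILENAMES):
    """Return the first candidate path that exists on disk, or None if none do."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None
```

Calling `resolve_data_filename()` after the download step should return whichever of the two files is present. For the removed "Source" column, pandas' `df.drop(columns=['Source'], errors='ignore')` drops it when present and does nothing otherwise.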

US Accidents Exploratory Data Analysis

TODO - talk about EDA

TODO - talk about the dataset (source, what it contains, how it will be useful)

  • Kaggle
  • information about accidents
  • can be useful to prevent accidents
  • mention that this does not contain data about New York
pip install opendatasets --upgrade --quiet
import opendatasets as od

download_url = 'https://www.kaggle.com/sobhanmoosavi/us-accidents'

od.download(download_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: aakashns1
Your Kaggle Key: ··········
0%| | 0.00/299M [00:00<?, ?B/s]
Downloading us-accidents.zip to ./us-accidents
100%|██████████| 299M/299M [00:05<00:00, 56.6MB/s]
data_filename = './us-accidents/US_Accidents_Dec20.csv'

Data Preparation and Cleaning

  1. Load the file using Pandas
  2. Look at some information about the data & the columns
  3. Fix any missing or incorrect values
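The three steps above can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the workshop's own code: the tiny in-memory CSV and its column names are made up to stand in for the real file, and `load_data` and `missing_percentages` are hypothetical helper names. With the real dataset you would pass `data_filename` to `load_data` instead.

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV (assumption, for illustration only).
SAMPLE_CSV = """ID,Severity,City,Temperature(F)
A-1,3,Dayton,36.9
A-2,2,,37.9
A-3,2,Cincinnati,
"""


def load_data(source):
    """Step 1: load the file (a path or file-like object) using pandas."""
    return pd.read_csv(source)


def missing_percentages(df):
    """Step 3 (first half): percentage of missing values per column.

    Columns with no missing values are dropped; the rest are sorted
    from most to least missing.
    """
    pct = df.isna().mean() * 100
    return pct[pct > 0].sort_values(ascending=False)


df = load_data(io.StringIO(SAMPLE_CSV))

# Step 2: look at the columns, dtypes and non-null counts.
df.info()

# Step 3: find out where values are missing before deciding how to fix them.
print(missing_percentages(df))
```

How to fix the missing values (dropping rows, dropping columns, or filling in defaults) depends on the column and on the questions being asked, so the sketch stops at measuring them.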