US Accidents Exploratory Data Analysis

  • In statistics, exploratory data analysis is an approach to analyzing data sets that summarizes their main characteristics, often using statistical graphs and other data visualization methods.
  • In this exploratory data analysis, we analyse data on accidents that occurred in various cities across the US.
  • I got this dataset from Kaggle, a site that hosts real-world datasets for analysis; insights drawn from such data can inform improvements in the corresponding field.
  • This is a countrywide car accident dataset that covers 49 states of the USA. The accident data were collected from February 2016 to December 2020 using multiple APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. Currently, there are about 3 million accident records in this dataset.
  • The findings of this analysis can help guide changes aimed at preventing future accidents.

Download the data

The steps to download the dataset from Kaggle are:
  • Install the opendatasets library using the !pip command.
  • Download the dataset using opendatasets.download(dataset_url).
  • opendatasets uses the official Kaggle API to download datasets from Kaggle.
  • Follow these steps to get your Kaggle API credentials:
  1. Sign in to https://kaggle.com/, then click on your profile picture on the top right and select "My Account" from the menu.
  2. Scroll down to the "API" section and click "Create New API Token". This will download a file kaggle.json with the contents:
    {"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_KEY"}
  3. When you run opendatasets.download, you will be asked to enter your Kaggle username and API key, both of which you can find in the downloaded file.
  • Note that you need to download the kaggle.json file only once.
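Since kaggle.json holds only the two values shown above, it can also be read programmatically instead of retyping them. The sketch below is a minimal illustration, assuming kaggle.json sits in the notebook's working directory; `load_kaggle_credentials` is a hypothetical helper, not part of opendatasets. It relies on the official Kaggle API client honoring the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, as its documentation describes.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(path="kaggle.json"):
    """Read kaggle.json and export its values as environment variables.

    Hypothetical helper for illustration; it is not part of opendatasets.
    Returns the username so the caller can confirm which account is used.
    """
    with Path(path).open() as f:
        creds = json.load(f)

    # kaggle.json is expected to contain a "username" and a "key" entry.
    missing = {"username", "key"} - set(creds)
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(sorted(missing))}")

    # The official Kaggle API client reads these variables when they are set.
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]
```

Raising an error on a missing entry surfaces a truncated or hand-edited file immediately, rather than as an authentication failure during the download.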

Let's download the dataset from Kaggle using the opendatasets library.
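A minimal sketch of this step follows. The dataset URL is an assumption (the commonly used Kaggle page for the US Accidents dataset) and should be checked before running; `download_dataset` and `dataset_folder` are hypothetical helper names. The `opendatasets` import sits inside the function so the cell can be defined before the library is installed with `!pip install opendatasets`.

```python
import os

# Assumption: Kaggle page for the US Accidents dataset; verify before running.
DATASET_URL = "https://www.kaggle.com/sobhanmoosavi/us-accidents"


def dataset_folder(dataset_url):
    """Folder name opendatasets uses for a Kaggle dataset: the URL's last segment."""
    return os.path.join(".", dataset_url.rstrip("/").split("/")[-1])


def download_dataset(dataset_url=DATASET_URL):
    """Download a Kaggle dataset with opendatasets and return its local folder."""
    import opendatasets as od  # imported here so defining this cell needs no install

    # Prompts for the Kaggle username and key unless kaggle.json is in the
    # current working directory.
    od.download(dataset_url)
    return dataset_folder(dataset_url)
```

In the notebook, running `data_dir = download_dataset()` followed by `os.listdir(data_dir)` should show the downloaded CSV file(s) under `./us-accidents`.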