IBM Employee Attrition & Performance EDA

In this tutorial, we'll analyze the IBM HR Analytics Employee Attrition & Performance dataset, which is hosted on Kaggle. You can find the raw data here: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset.

There are several options for getting the dataset into Jupyter:

Download the CSV manually and upload it via Jupyter's GUI.
Use the urlretrieve function from the urllib.request module to download CSV files from a raw URL.
Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.
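
For comparison, the urlretrieve option could look roughly like the sketch below. Note that download_csv is a hypothetical helper name introduced here for illustration, and the URL in the usage note is a placeholder rather than a real location for this dataset.

```python
import os
from urllib.request import urlretrieve


def download_csv(url, dest_path):
    """Download the file at `url` to `dest_path` and return the saved path.

    `download_csv` is a hypothetical helper, not part of any library.
    """
    # Create the destination folder if it does not exist yet.
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)
    # urlretrieve returns a (filename, headers) tuple.
    saved_path, _headers = urlretrieve(url, dest_path)
    return saved_path
```

A call would look like download_csv('https://example.com/data.csv', 'data/data.csv'), where the URL is a placeholder. This approach suits publicly reachable raw files; Kaggle downloads ask for credentials, which is one reason to prefer a helper library here.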

We'll use the opendatasets helper library to download the files.

Environment setup

Let's begin by downloading the data and listing the files within the dataset.

!pip install opendatasets --upgrade --quiet

dataset_url = 'https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset'

import opendatasets as od
od.download(dataset_url)

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: irisle2712
Your Kaggle Key: ········

Downloading ibm-hr-analytics-attrition-dataset.zip to .\ibm-hr-analytics-attrition-dataset

100%|██████████| 50.1k/50.1k [00:00<00:00, 5.17MB/s]
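
With the download finished, the files can be listed using the standard library. This is a minimal sketch: list_dataset_files is a hypothetical helper name, and the folder path is assumed from the download message above.

```python
import os


def list_dataset_files(data_dir):
    """Return the sorted file names in `data_dir`, or [] if it does not exist."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(os.listdir(data_dir))


# Folder name assumed from the download message; adjust if yours differs.
data_dir = './ibm-hr-analytics-attrition-dataset'
print(list_dataset_files(data_dir))
```

Returning an empty list for a missing folder keeps the cell from raising an error if the download step was skipped.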