
IBM Employee Attrition & Performance EDA

In this tutorial, we'll analyze the IBM HR Analytics Employee Attrition & Performance dataset, which is hosted on Kaggle. You can find the raw data here: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset.

There are several options for getting the dataset into Jupyter:

  • Download the CSV manually and upload it via Jupyter's GUI
  • Use the urlretrieve function from the urllib.request module to download CSV files from a raw URL
  • Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.
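For comparison, the urlretrieve option could look roughly like the sketch below. The download_csv helper and the sample file are hypothetical and exist only for illustration; a local file exposed through a file:// URL stands in for a raw CSV URL so the snippet runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download_csv(url, dest):
    """Download the file at `url` to `dest` and return the destination path."""
    # urlretrieve returns a (local_filename, headers) tuple
    path, _headers = urlretrieve(url, str(dest))
    return Path(path)


# Hypothetical stand-in for a raw CSV URL: a small local file served via file://
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("id,label\n1,a\n2,b\n")

saved = download_csv(source.as_uri(), workdir / "downloaded.csv")
print(saved.name)
```

With a real dataset, the first argument would be the raw CSV URL instead of the file:// placeholder.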

We'll use the opendatasets helper library to download the files.

Environment setup

Let's begin by downloading the data and listing the files within the dataset.

!pip install opendatasets --upgrade --quiet
dataset_url = 'https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset'
import opendatasets as od
od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: irisle2712
Your Kaggle Key: ········
100%|██████████| 50.1k/50.1k [00:00<00:00, 5.17MB/s]
Downloading ibm-hr-analytics-attrition-dataset.zip to .\ibm-hr-analytics-attrition-dataset
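Once the download finishes, the files can be listed with the standard library. This is a minimal sketch: the folder name is an assumption taken from the download log above, and list_files is a hypothetical helper that returns an empty list when the folder is missing.

```python
import os

# Assumption: od.download saved the files to this folder, as the log above suggests
data_dir = './ibm-hr-analytics-attrition-dataset'


def list_files(folder):
    """Return the sorted entry names in `folder`, or an empty list if it does not exist."""
    if not os.path.isdir(folder):
        return []
    return sorted(os.listdir(folder))


print(list_files(data_dir))
```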