Exploratory Data Analysis using Python - A Case Study

Analyzing responses from the Stack Overflow Annual Developer Survey 2020

Part 9 of "Data Analysis with Python: Zero to Pandas"

This tutorial series is a beginner-friendly introduction to programming and data analysis using the Python programming language. These tutorials take a practical and coding-focused approach. The best way to learn the material is to execute the code and experiment with it yourself. Check out the full series here:

  1. First Steps with Python and Jupyter
  2. A Quick Tour of Variables and Data Types
  3. Branching using Conditional Statements and Loops
  4. Writing Reusable Code Using Functions
  5. Reading from and Writing to Files
  6. Numerical Computing with Python and Numpy
  7. Analyzing Tabular Data using Pandas
  8. Data Visualization using Matplotlib & Seaborn
  9. Exploratory Data Analysis - A Case Study

The following topics are covered in this tutorial:

  • Selecting and downloading a dataset
  • Data preparation and cleaning
  • Exploratory analysis and visualization
  • Asking and answering interesting questions
  • Summarizing inferences and drawing conclusions

How to run the code

This tutorial is an executable Jupyter notebook hosted on Jovian. You can run this tutorial and experiment with the code examples in a couple of ways: using free online resources (recommended) or on your computer.

Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Binder. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms.

Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.

Jupyter Notebooks: This tutorial is a Jupyter notebook, a document made of cells. Each cell can contain code written in Python or explanations in plain English. You can execute code cells and view the results (numbers, messages, graphs, tables, files, etc.) instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don't be afraid to mess around with the code and break things; you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear Output" menu option to clear all outputs and start again from the top.

Introduction

In this tutorial, we'll analyze the Stack Overflow developer survey dataset. The dataset contains responses to an annual survey conducted by Stack Overflow. You can find the raw data and official analysis here: https://insights.stackoverflow.com/survey.

There are several options for getting the dataset into Jupyter:

  • Download the CSV manually and upload it via Jupyter's GUI
  • Use the urlretrieve function from the urllib.request module to download CSV files from a raw URL
  • Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.
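As a rough illustration of the second option, here is a minimal sketch of downloading a file with urlretrieve. The download_csv helper, the file names, and the two-row sample CSV are made up for this example and are not part of the survey dataset; a local file:// URL stands in for a real raw CSV URL so that the snippet runs without network access.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download_csv(url, dest_path):
    """Download the file at `url` to `dest_path` and return the local path.

    Hypothetical helper for illustration; it only wraps urlretrieve.
    """
    local_path, _headers = urlretrieve(url, str(dest_path))
    return local_path


# Create a tiny stand-in CSV (made-up rows, not real survey data).
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("Respondent,Country\n1,Germany\n2,India\n")

# A file:// URL is used here in place of a real raw CSV URL.
saved = download_csv(source.as_uri(), workdir / "survey_copy.csv")

# Read back the first line to confirm the file was copied.
with open(saved) as f:
    header = f.readline().strip()
print(header)
```

With a real dataset, you would pass the raw CSV URL from the hosting site instead of the file:// URL; the rest of the call stays the same.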

We'll use the opendatasets helper library to download the files.

pip install opendatasets --upgrade
Collecting opendatasets
  Downloading opendatasets-0.1.6-py3-none-any.whl (14 kB)
Requirement already satisfied, skipping upgrade: tqdm in /opt/conda/lib/python3.8/site-packages (from opendatasets) (4.50.2)
Collecting kaggle
  Downloading kaggle-1.5.9.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 3.5 MB/s eta 0:00:011
Requirement already satisfied, skipping upgrade: click in /opt/conda/lib/python3.8/site-packages (from opendatasets) (7.1.2)
Requirement already satisfied, skipping upgrade: six>=1.10 in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (1.15.0)
Requirement already satisfied, skipping upgrade: certifi in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2020.6.20)
Requirement already satisfied, skipping upgrade: python-dateutil in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2.8.1)
Requirement already satisfied, skipping upgrade: requests in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2.24.0)
Collecting python-slugify
  Downloading python-slugify-4.0.1.tar.gz (11 kB)
Collecting slugify
  Downloading slugify-0.0.1.tar.gz (1.2 kB)
Requirement already satisfied, skipping upgrade: urllib3 in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (1.25.11)
Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->kaggle->opendatasets) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->kaggle->opendatasets) (2.10)
Collecting text-unidecode>=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 4.1 MB/s eta 0:00:011
Building wheels for collected packages: kaggle, python-slugify, slugify
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.9-py3-none-any.whl size=73265 sha256=c39728b07c748f5a3de01422306a05b64baeb0873b842ae4f820421cce79163b
  Stored in directory: /home/jovyan/.cache/pip/wheels/ed/bd/77/e12cac6080de7e1185049bbfb87c7f250d74fd4f9389af0c9c
  Building wheel for python-slugify (setup.py) ... done
  Created wheel for python-slugify: filename=python_slugify-4.0.1-py2.py3-none-any.whl size=6769 sha256=5609f7fdb545567ecc97705c2e65f600043965979916be927e9d56f6c2cb9f90
  Stored in directory: /home/jovyan/.cache/pip/wheels/91/4d/4f/e740a68c215791688c46c4d6251770a570e8dfea91af1acb5c
  Building wheel for slugify (setup.py) ... done
  Created wheel for slugify: filename=slugify-0.0.1-py3-none-any.whl size=1909 sha256=e4343ac0aa4ebeeb3a83de49c2dbdf245bb73c0df48f86aae35b9db2e3c22582
  Stored in directory: /home/jovyan/.cache/pip/wheels/a2/49/ff/b5d3130b393f908f0faebf7b4069b259e97d23821826553a76
Successfully built kaggle python-slugify slugify
Installing collected packages: text-unidecode, python-slugify, slugify, kaggle, opendatasets
Successfully installed kaggle-1.5.9 opendatasets-0.1.6 python-slugify-4.0.1 slugify-0.0.1 text-unidecode-1.3
Note: you may need to restart the kernel to use updated packages.