Hypothesis Testing and Statistical Significance

This tutorial is a part of the Zero to Data Analyst Bootcamp by Jovian

Hypothesis testing is a technique used by statisticians, scientists, and data analysts to assess whether the results of an experiment are meaningful and reliable, rather than a product of chance. The statistical significance of an experiment's results is often quantified using a P-value. This tutorial aims to build intuition for hypothesis testing using some real-world examples.

How to Run the Code

The best way to learn the material is to execute the code and experiment with it yourself. This tutorial is an executable Jupyter notebook, and you can run it in one of two ways: using free online resources (recommended) or on your own computer.

Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Binder. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms.

Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.

Problem Statement

Let's work through a real-world example to understand how statistical tests are performed:

QUESTION: You're an analyst at the investment firm Capital Ventures, and you're evaluating the company Jovian for a potential investment. The founders of Jovian claim that completing a data science bootcamp offered by Jovian helps you land a data science job faster.

A 2020 McKinley report suggests that candidates apply to an average of 37 data science roles before getting hired. You've surveyed 42 Jovian bootcamp graduates who are now working in data science roles and recorded the number of jobs each one applied to before getting hired: 31, 23, 19, 42, 37, 18, 7, 53, 33, 17, 27, 41, 36, 29, 60, 34, 21, 18, 45, 33, 16, 10, 48, 32, 19, 29, 40, 35, 28, 57, 25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39

Is there a statistically significant decrease in the number of jobs candidates need to apply to before getting hired if they've completed a bootcamp offered by Jovian?

# Number of jobs each of the 42 surveyed graduates applied to before getting hired
jobs_applied = [31, 23, 19, 42, 37, 18, 7, 53, 33, 17,
                27, 41, 36, 29, 60, 34, 21, 18, 45, 33,
                16, 10, 48, 32, 19, 29, 40, 35, 28, 57,
                25, 31, 19, 40, 37, 33, 38, 28, 40, 36,
                42, 39]
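
Before working through the test formally, it can help to see how the survey data compares with the benchmark of 37 applications. The following is a minimal sketch, not necessarily the approach this tutorial takes. It assumes a one-sided test of the null hypothesis "the mean is 37" against the alternative "the mean is less than 37", and it approximates the P-value with a standard normal distribution, which is a reasonable simplification for a sample of this size. The names `benchmark_mean`, `standard_error`, `test_statistic`, and `p_value` are illustrative choices.

```python
import math
import statistics

# Survey data from the problem statement (42 bootcamp graduates).
jobs_applied = [31, 23, 19, 42, 37, 18, 7, 53, 33, 17,
                27, 41, 36, 29, 60, 34, 21, 18, 45, 33,
                16, 10, 48, 32, 19, 29, 40, 35, 28, 57,
                25, 31, 19, 40, 37, 33, 38, 28, 40, 36,
                42, 39]

# Benchmark from the report cited in the problem statement.
benchmark_mean = 37

n = len(jobs_applied)
sample_mean = statistics.mean(jobs_applied)
sample_std = statistics.stdev(jobs_applied)  # sample std, n - 1 denominator

# How many standard errors the sample mean lies from the benchmark.
standard_error = sample_std / math.sqrt(n)
test_statistic = (sample_mean - benchmark_mean) / standard_error

# Assumption: normal approximation to the sampling distribution of the mean.
# One-sided P-value: probability of a statistic this low or lower if the
# true mean were 37.
p_value = statistics.NormalDist().cdf(test_statistic)

print(f"n = {n}")
print(f"sample mean = {sample_mean:.2f}")
print(f"sample std = {sample_std:.2f}")
print(f"test statistic = {test_statistic:.2f}")
print(f"approximate one-sided P-value = {p_value:.4f}")
```

A P-value below a chosen significance level (0.05 is a common convention) would suggest that the lower average among graduates is unlikely to be due to chance alone. It would not, by itself, show that the bootcamp caused the difference.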