Random Variables and Probability Distributions

This tutorial is a part of the Zero to Data Analyst Bootcamp by Jovian

In this tutorial, we study random variables, a powerful tool for modeling real-world processes and phenomena using statistics and probability. We will also explore some commonly used probability distributions and study their applications with real-world examples.

The tutorial covers the following topics:

  • Introduction to random variables and probability distributions
  • Discrete distributions: Bernoulli, Binomial, Poisson etc.
  • Continuous distributions: Uniform, Gaussian, Exponential
  • Expected value, variance and standard deviation
  • Simulating randomness with Python

How to Run the Code

The best way to learn the material is to execute the code and experiment with it yourself. This tutorial is an executable Jupyter notebook. You can run this tutorial and experiment with the code examples in a couple of ways: using free online resources (recommended) or on your computer.

Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Binder. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms.

Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.

Random Variables

A random variable maps the outcome of a random process to a number. Here are some examples:

  1. A random variable C has the value 1 if the result of tossing a fair coin is a "head", and 0 if the result is a "tail".
  2. A random variable H counts the number of heads obtained on tossing a fair coin 10 times. It can take the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
  3. A random variable D indicates the result of rolling a fair die. It can take the values 1, 2, 3, 4, 5, 6.
  4. A random variable S indicates the sum of the numbers that show up when 2 fair dice are rolled. It can take the values 2, 3, 4, ..., 12.
  5. A random variable R indicates the amount of rainfall (in mm) in Bengaluru on a random day during the past week. Unlike the previous examples, it can take any value greater than or equal to 0. R is a continuous random variable, while the others are discrete.
  6. A random variable L indicates the exact duration of a video lesson on Jovian. What values can L take? Is it continuous or discrete?
  7. A random variable W indicates the width (diameter) of the trunk of a random tree in a forest (see picture above). Is W continuous or discrete?
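
The discrete examples above (C, H, D and S) can be simulated in a few lines of Python. The following is a minimal sketch using the standard library's `random` module. The function names (`sample_C`, `sample_H`, `sample_D`, `sample_S`) and the seed value are hypothetical choices made for this illustration, not part of the original notebook.

```python
import random


def sample_C(rng):
    """C: 1 if a fair coin toss is a head, 0 if it is a tail."""
    return rng.randint(0, 1)


def sample_H(rng, tosses=10):
    """H: number of heads obtained in `tosses` fair coin tosses."""
    return sum(sample_C(rng) for _ in range(tosses))


def sample_D(rng):
    """D: result of rolling a fair six-sided die."""
    return rng.randint(1, 6)


def sample_S(rng):
    """S: sum of the numbers shown by two fair dice."""
    return sample_D(rng) + sample_D(rng)


# A fixed seed (arbitrary value) makes the run repeatable.
rng = random.Random(42)

# Draw five values of each random variable and print them.
samples = {
    "C": [sample_C(rng) for _ in range(5)],
    "H": [sample_H(rng) for _ in range(5)],
    "D": [sample_D(rng) for _ in range(5)],
    "S": [sample_S(rng) for _ in range(5)],
}
for name, values in samples.items():
    print(name, values)
```

Each call produces one value of the random variable, so calling a function repeatedly is the code equivalent of repeating the underlying experiment. Try changing the seed or the number of draws and observe which values each variable can take.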

EXERCISE: Here's a simple thought experiment: think of a number between 1 and 10. Let the random variable T indicate the result of this experiment. Repeat the experiment 20 times, and list all the values taken by T. Do some values show up more frequently than others?
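
Once you have written down your 20 values, one way to check which ones show up most often is to tally them. The sketch below counts values with `collections.Counter` and, for comparison, generates 20 numbers by computer, assuming every number from 1 to 10 is equally likely (an assumption that may not hold for numbers picked by a person). The helper name `tally` and the seed are hypothetical choices for this illustration. Replace `simulated_T` with your own list to analyze your results.

```python
import random
from collections import Counter


def tally(values):
    """Count how often each value appears, ordered by value."""
    counts = Counter(values)
    return dict(sorted(counts.items()))


# Computer-generated stand-in for the thought experiment:
# 20 picks, each number from 1 to 10 assumed equally likely.
rng = random.Random(0)  # arbitrary seed for repeatability
simulated_T = [rng.randint(1, 10) for _ in range(20)]

frequencies = tally(simulated_T)

# Print a simple text histogram: one '#' per occurrence.
for value, count in frequencies.items():
    print(f"{value:2d}: {'#' * count}")
```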