
Insurance cost prediction using linear regression

In this assignment we're going to use information such as a person's age, sex, BMI, number of children and smoking habit to predict their yearly medical charges. This kind of model is useful for insurance companies when determining the yearly insurance premium for a person. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance
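To make the idea concrete, here is a minimal sketch of what a linear model computes for one person: a weighted sum of the numeric features plus a bias. The weights, the bias, the function name `predict_charges` and the 0/1 encoding of `sex` and `smoker` are all assumptions made up for illustration; they are not fitted to the dataset and are not part of the assignment.

```python
# Hypothetical weights and bias, chosen only for illustration (NOT learned from data).
WEIGHTS = {"age": 250.0, "sex": -100.0, "bmi": 300.0, "children": 500.0, "smoker": 20000.0}
BIAS = -10000.0


def predict_charges(person):
    """Return the weighted sum of the encoded features plus the bias."""
    return sum(WEIGHTS[name] * person[name] for name in WEIGHTS) + BIAS


# Assumed encoding for the categorical columns: sex (female=0, male=1), smoker (no=0, yes=1).
example = {"age": 40, "sex": 1, "bmi": 30.0, "children": 2, "smoker": 0}
print(predict_charges(example))
```

Training a linear regression model amounts to finding the weights and bias that make these predictions as close as possible to the charges observed in the data.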

We will create a model with the following steps:

  1. Download and explore the dataset
  2. Prepare the dataset for training
  3. Create a linear regression model
  4. Train the model to fit the data
  5. Make predictions using the trained model
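The five steps above can be sketched end to end on synthetic data. This is only a rough outline under stated assumptions: the data is randomly generated (it is not the insurance dataset), and the loss function (`nn.MSELoss`), optimizer (`torch.optim.SGD`), learning rate, batch size and epoch count are illustrative choices rather than the values the assignment expects.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # make the sketch reproducible

# Steps 1-2: synthetic stand-in data (NOT the insurance dataset): 100 rows, 5 features.
inputs = torch.randn(100, 5)
true_w = torch.tensor([[1.0], [-2.0], [0.5], [3.0], [4.0]])  # hypothetical weights
targets = inputs @ true_w + 0.1 * torch.randn(100, 1)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=20, shuffle=True)

# Step 3: a linear regression model with 5 inputs and 1 output.
model = nn.Linear(5, 1)
loss_fn = nn.MSELoss()  # assumed loss; other regression losses also work
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def full_loss():
    """Loss over the whole synthetic dataset, without tracking gradients."""
    with torch.no_grad():
        return loss_fn(model(inputs), targets).item()


initial_loss = full_loss()

# Step 4: train with mini-batch gradient descent.
for epoch in range(50):
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

final_loss = full_loss()

# Step 5: make a prediction for a single row.
with torch.no_grad():
    prediction = model(inputs[:1])

print(initial_loss, final_loss, prediction)
```

The same structure carries over to the real data once the CSV columns have been converted to tensors; only the dataset preparation and the hyperparameters change.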

This assignment builds upon the concepts from the first 2 lectures. It will help to review the Jupyter notebooks from those lectures.

As you go through this notebook, you will find a ??? in certain places. Your job is to replace each ??? with appropriate code or values so that the notebook runs properly end-to-end. In some cases, you'll be required to choose hyperparameters (learning rate, batch size, etc.). Experiment with the hyperparameters to get the lowest loss.

# Run the commands below if imports fail (comment them out once the packages are installed)
!conda install pandas seaborn numpy pytorch torchvision cpuonly -c pytorch -y
!pip install matplotlib --upgrade 
!pip install jovian --upgrade 
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name = '02-insurance-linear-regression'  # will be used by jovian.commit

Step 1: Download and explore the data

Let us begin by downloading the data. We'll use the download_url function from torchvision (imported above from torchvision.datasets.utils) to get the data as a CSV (comma-separated values) file.