Insurance cost prediction using linear regression

In this assignment, we're going to use information such as a person's age, sex, BMI, number of children and smoking habit to predict their yearly medical bills. This kind of model is useful for insurance companies to determine the yearly insurance premium for a person. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance

We will create a model with the following steps:

  1. Download and explore the dataset
  2. Prepare the dataset for training
  3. Create a linear regression model
  4. Train the model to fit the data
  5. Make predictions using the trained model
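
To make the five steps concrete before filling in the notebook, here is a minimal sketch in plain Python. It is not the assignment's solution: the data is a synthetic stand-in (points near the line y = 3x + 2), and the names `predict`, `mse` and `fit` are illustrative helpers, not part of PyTorch or this notebook. The notebook itself performs the same steps with PyTorch tensors and the insurance data.

```python
import random

random.seed(0)

# Steps 1-2: "load" and prepare data. Synthetic stand-in, NOT the insurance dataset.
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 2.0 + random.uniform(-0.05, 0.05) for x in xs]

# Step 3: a linear regression model is just a weight and a bias.
def predict(w, b, x):
    return w * x + b

def mse(w, b, xs, ys):
    """Mean squared error between predictions and targets."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Step 4: train with gradient descent on the mean squared error.
def fit(xs, ys, lr, epochs):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

initial_loss = mse(0.0, 0.0, xs, ys)
w, b = fit(xs, ys, lr=0.1, epochs=2000)
final_loss = mse(w, b, xs, ys)

# Step 5: make a prediction with the trained model.
prediction = predict(w, b, 1.5)
print(f"w={w:.3f} b={b:.3f} loss {initial_loss:.3f} -> {final_loss:.5f}")
```

In the notebook, the hand-written gradients are replaced by PyTorch's automatic differentiation, and the single input `x` becomes a row of several input columns.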

This assignment builds upon the concepts from the first two lectures. It will help to review the Jupyter notebooks from those lectures.

As you go through this notebook, you will find a ??? in certain places. Your job is to replace the ??? with appropriate code or values, to ensure that the notebook runs properly end-to-end. In some cases, you'll be required to choose some hyperparameters (learning rate, batch size, etc.). Try to experiment with the hyperparameters to get the lowest loss.
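
One simple way to experiment with a hyperparameter is to train once per candidate value and compare the final losses. The sketch below does this for the learning rate on a tiny toy problem in plain Python. The data, the candidate values and the helper `final_loss` are all assumptions made for illustration; the values that work best for the insurance data will differ.

```python
def final_loss(lr, epochs=200):
    """Fit y = w*x + b by gradient descent on toy data and return the final MSE."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]  # toy targets lying exactly on y = 2x + 1
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2 * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * 2 * sum(errs) / n
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

# Candidate learning rates (illustrative); train once per value and compare.
candidates = [0.001, 0.01, 0.1, 0.5]
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)

for lr, loss in results.items():
    print(f"lr={lr}: final loss {loss:.3e}")
print("best learning rate:", best_lr)
```

A rate that is too small leaves the loss high after a fixed number of epochs, while one that is too large makes the updates overshoot and the loss grow, so it is worth trying values spread over several orders of magnitude.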

# Uncomment and run the commands below if imports fail
!conda install numpy pytorch pandas torchvision cpuonly -c pytorch -y
!pip install matplotlib --upgrade --quiet
!pip install jovian --upgrade --quiet
Collecting package metadata (current_repodata.json): done
Solving environment: done
[conda output abridged: installed pytorch-1.5.1 (py3.8_cpu_0), torchvision-0.6.1 (py38_cpu), pandas-1.0.4, numpy-1.18.1, mkl-2020.1 and their dependencies into /cargo/.local/anaconda3/envs/jovian; total download 195.6 MB]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name='02-insurance-linear-regression' # will be used by jovian.commit

Step 1: Download and explore the data

Let us begin by downloading the data. We'll use the download_url function from torchvision (imported above from torchvision.datasets.utils) to get the data as a CSV (comma-separated values) file.
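
A minimal sketch of this step follows. The URL here is a hypothetical placeholder, not the dataset's real location, and `local_path` and `fetch_dataset` are illustrative helper names. The sketch relies on `download_url(url, root)` saving the file under `root` using the last component of the URL as its name when no `filename` argument is given.

```python
import os

# Hypothetical placeholder URL; substitute the raw CSV link used in your notebook.
DATASET_URL = "https://example.com/insurance.csv"
DATA_DIR = "."

def local_path(url, root):
    """Path where the file lands when download_url is called without a filename."""
    return os.path.join(root, os.path.basename(url))

def fetch_dataset(url=DATASET_URL, root=DATA_DIR):
    """Download the CSV and load it into a pandas DataFrame."""
    # Imported here so the helpers above can be used without torchvision installed.
    from torchvision.datasets.utils import download_url
    import pandas as pd

    download_url(url, root)
    return pd.read_csv(local_path(url, root))

print(local_path(DATASET_URL, DATA_DIR))
```

Calling `fetch_dataset()` with a real URL returns a DataFrame, on which `.head()` gives a quick first look at the rows.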