Learn practical skills, build real-world projects, and advance your career

Insurance cost prediction using linear regression

In this assignment we're going to use information like a person's age, sex, BMI, no. of children and smoking habit to predict the price of yearly medical bills. This kind of model is useful for insurance companies to determine the yearly insurance premium for a person. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance

We will create a model with the following steps:

  1. Download and explore the dataset
  2. Prepare the dataset for training
  3. Create a linear regression model
  4. Train the model to fit the data
  5. Make predictions using the trained model

This assignment builds upon the concepts from the first 2 lectures. It will help to review these Jupyter notebooks:

As you go through this notebook, you will find a ??? in certain places. Your job is to replace the ??? with appropriate code or values, to ensure that the notebook runs properly end-to-end . In some cases, you'll be required to choose some hyperparameters (learning rate, batch size etc.). Try to experiment with the hyperparameters to get the lowest loss.

# Uncomment and run the commands below if imports fail
!conda install numpy pytorch torchvision cpuonly -c pytorch -y
!pip install matplotlib --upgrade --quiet
!pip install jovian --upgrade --quiet
Collecting package metadata (current_repodata.json): done Solving environment: done ## Package Plan ## environment location: /opt/conda added / updated specs: - cpuonly - numpy - pytorch - torchvision The following packages will be downloaded: package | build ---------------------------|----------------- ca-certificates-2020.4.5.2 | hecda079_0 147 KB conda-forge certifi-2020.4.5.2 | py37hc8dfbb8_0 152 KB conda-forge numpy-1.18.5 | py37h8960a57_0 5.1 MB conda-forge ------------------------------------------------------------ Total: 5.4 MB The following packages will be UPDATED: ca-certificates 2020.4.5.1-hecc5488_0 --> 2020.4.5.2-hecda079_0 certifi 2020.4.5.1-py37hc8dfbb8_0 --> 2020.4.5.2-py37hc8dfbb8_0 numpy 1.18.1-py37h8960a57_1 --> 1.18.5-py37h8960a57_0 Downloading and Extracting Packages certifi-2020.4.5.2 | 152 KB | ##################################### | 100% ca-certificates-2020 | 147 KB | ##################################### | 100% numpy-1.18.5 | 5.1 MB | ##################################### | 100% Preparing transaction: done Verifying transaction: done Executing transaction: done
pip install pandas
Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (1.0.3) Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas) (2019.3) Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.7/site-packages (from pandas) (1.18.5) Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.7/site-packages (from pandas) (2.8.1) Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas) (1.14.0) Note: you may need to restart the kernel to use updated packages.
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name='02-insurance-linear-regression' # will be used by jovian.commit