Insurance cost prediction using linear regression
In this assignment we're going to use information like a person's age, sex, BMI, no. of children and smoking habit to predict the price of yearly medical bills. This kind of model is useful for insurance companies to determine the yearly insurance premium for a person. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance
We will create a model with the following steps:
- Download and explore the dataset
- Prepare the dataset for training
- Create a linear regression model
- Train the model to fit the data
- Make predictions using the trained model
This assignment builds upon the concepts from the first 2 lectures. It will help to review these Jupyter notebooks:
- PyTorch basics: https://jovian.ml/aakashns/01-pytorch-basics
- Linear Regression: https://jovian.ml/aakashns/02-linear-regression
- Logistic Regression: https://jovian.ml/aakashns/03-logistic-regression
- Linear regression (minimal): https://jovian.ml/aakashns/housing-linear-minimal
- Logistic regression (minimal): https://jovian.ml/aakashns/mnist-logistic-minimal
As you go through this notebook, you will find a ??? in certain places. Your job is to replace the ??? with appropriate code or values, to ensure that the notebook runs properly end-to-end . In some cases, you'll be required to choose some hyperparameters (learning rate, batch size etc.). Try to experiment with the hypeparameters to get the lowest loss.
# Uncomment and run the commands below if imports fail
!conda install numpy pytorch torchvision cpuonly -c pytorch -y
!pip install matplotlib --upgrade --quiet
!pip install jovian --upgrade --quiet
!pip install pandas
Collecting package metadata (current_repodata.json): done
Solving environment: /
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:
- pytorch/linux-64::pytorch==1.5.0=py3.7_cpu_0
- pytorch/linux-64::torchvision==0.6.0=py37_cpu
done
==> WARNING: A newer version of conda exists. <==
current version: 4.8.2
latest version: 4.8.3
Please update conda by running
$ conda update -n base conda
## Package Plan ##
environment location: /srv/conda/envs/notebook
added / updated specs:
- cpuonly
- numpy
- pytorch
- torchvision
The following packages will be downloaded:
package | build
---------------------------|-----------------
numpy-1.18.4 | py37h8960a57_0 5.2 MB conda-forge
------------------------------------------------------------
Total: 5.2 MB
The following NEW packages will be INSTALLED:
numpy conda-forge/linux-64::numpy-1.18.4-py37h8960a57_0
Downloading and Extracting Packages
numpy-1.18.4 | 5.2 MB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Requirement already satisfied: pandas in /srv/conda/envs/notebook/lib/python3.7/site-packages (1.0.4)
Requirement already satisfied: numpy>=1.13.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (1.18.5)
Requirement already satisfied: python-dateutil>=2.6.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas) (2020.1)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas) (1.15.0)
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name='02-insurance-linear-regression' # will be used by jovian.commit
Step 1: Download and explore the data
Let us begin by downloading the data. We'll use the download_url
function from PyTorch to get the data as a CSV (comma-separated values) file.