Insurance cost prediction using linear regression
In this assignment we're going to use information such as a person's age, sex, BMI, number of children and smoking habit to predict their yearly medical charges. A model like this can help an insurance company set a person's yearly insurance premium. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance
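For a linear regression model, the prediction is a weighted sum of the input features plus a bias. Sex and smoking habit first have to be encoded as numbers. The 0/1 encoding assumed here is only an example and is not something the dataset prescribes. With that encoding, the predicted yearly charges for one person would be:

$$
\widehat{\text{charges}} = w_1 \cdot \text{age} + w_2 \cdot \text{sex} + w_3 \cdot \text{BMI} + w_4 \cdot \text{children} + w_5 \cdot \text{smoker} + b
$$

Training means finding the weights $w_1, \dots, w_5$ and the bias $b$ that bring these predictions as close as possible to the recorded charges.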
We will create a model with the following steps:
- Download and explore the dataset
- Prepare the dataset for training
- Create a linear regression model
- Train the model to fit the data
- Make predictions using the trained model
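The five steps above can be sketched end to end with the same PyTorch utilities this notebook imports (TensorDataset, random_split, DataLoader). This is a minimal sketch, not the assignment's solution. It uses synthetic data with five numeric features in place of the insurance CSV. The 160/40 split, batch size 32, learning rate 0.05, 100 epochs and the choice of mean squared error loss with plain SGD are all assumptions. The helper names `evaluate` and `fit` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)  # reproducible synthetic data, split and initial weights

# Steps 1-2 (stand-in, assumption): 200 synthetic rows with 5 numeric features,
# generated from a known linear rule plus a little noise.
num_rows, num_features = 200, 5
inputs = torch.randn(num_rows, num_features)
true_weights = torch.tensor([[3.0], [-1.0], [2.0], [0.5], [4.0]])
targets = inputs @ true_weights + 1.0 + 0.1 * torch.randn(num_rows, 1)

# Wrap the tensors, hold out 40 rows for validation, and batch them.
dataset = TensorDataset(inputs, targets)
train_ds, val_ds = random_split(dataset, [160, 40])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=40)

# Step 3: a linear model with one weight per input feature plus a bias.
model = nn.Linear(num_features, 1)

def evaluate(model, loader):
    """Mean of the per-batch MSE losses over a loader, without tracking gradients."""
    with torch.no_grad():
        losses = [F.mse_loss(model(xb), yb) for xb, yb in loader]
    return torch.stack(losses).mean().item()

def fit(epochs, lr, model, loader):
    """Plain mini-batch gradient descent on the MSE loss."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            loss = F.mse_loss(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Step 4: train, comparing the validation loss before and after.
loss_before = evaluate(model, val_loader)
fit(100, 0.05, model, train_loader)
loss_after = evaluate(model, val_loader)

# Step 5: predict the target for a single input row.
with torch.no_grad():
    prediction = model(inputs[:1])
print(f"val loss: {loss_before:.2f} -> {loss_after:.4f}, prediction: {prediction.item():.2f}")
```

With the real dataset, step 2 would also need to convert the DataFrame's columns, including the encoded categorical ones, into float tensors before wrapping them in a TensorDataset.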
This assignment builds upon the concepts from the first 2 lectures. It will help to review these Jupyter notebooks:
- PyTorch basics: https://jovian.ml/aakashns/01-pytorch-basics
- Linear Regression: https://jovian.ml/aakashns/02-linear-regression
- Logistic Regression: https://jovian.ml/aakashns/03-logistic-regression
- Linear regression (minimal): https://jovian.ml/aakashns/housing-linear-minimal
- Logistic regression (minimal): https://jovian.ml/aakashns/mnist-logistic-minimal
As you go through this notebook, you will find a ??? in certain places. Your job is to replace each ??? with appropriate code or values so that the notebook runs properly end-to-end. In some cases, you'll be required to choose hyperparameters (learning rate, batch size, etc.). Experiment with the hyperparameters to get the lowest loss.
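One simple way to run that experiment is a small grid search. You train a fresh model for each combination of settings and keep the one with the lowest validation loss. The sketch below is illustrative only. It assumes synthetic, noise-free data, mean squared error and plain SGD. The candidate values and the helper name `train_and_score` are hypothetical and not part of the assignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def train_and_score(lr, batch_size, epochs=30, seed=0):
    """Train a fresh linear model on synthetic data and return its validation MSE."""
    torch.manual_seed(seed)  # same data and initial weights for every setting
    inputs = torch.randn(200, 5)
    true_weights = torch.tensor([[3.0], [-1.0], [2.0], [0.5], [4.0]])
    targets = inputs @ true_weights + 1.0
    # First 160 rows for training, last 40 held out for validation.
    train_x, train_y = inputs[:160], targets[:160]
    val_x, val_y = inputs[160:], targets[160:]
    loader = DataLoader(TensorDataset(train_x, train_y), batch_size=batch_size, shuffle=True)
    model = nn.Linear(5, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            loss = F.mse_loss(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    with torch.no_grad():
        return F.mse_loss(model(val_x), val_y).item()

# Try every combination of the candidate learning rates and batch sizes.
results = {}
for lr in (1e-4, 1e-3, 1e-2, 1e-1):
    for batch_size in (16, 64):
        results[(lr, batch_size)] = train_and_score(lr, batch_size)

best = min(results, key=results.get)
print("best (lr, batch_size):", best, "validation MSE:", results[best])
```

Scoring every setting on the same held-out rows, with the same seed, keeps the comparison fair. Differences in the loss then come from the hyperparameters rather than from a lucky split or initialization.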
# Uncomment and run the commands below if imports fail
!conda install numpy pytorch pandas torchvision cpuonly -c pytorch -y
!pip install matplotlib --upgrade --quiet
!pip install jovian --upgrade --quiet
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name='02-insurance-linear-regression' # will be used by jovian.commit
Step 1: Download and explore the data
Let us begin by downloading the data. We'll use the download_url function from torchvision (torchvision.datasets.utils) to get the data as a CSV (comma-separated values) file.
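A minimal sketch of this step might wrap download_url and pandas.read_csv, then summarize what was loaded. The function names `download_insurance_csv` and `explore` are hypothetical. The dataset URL is deliberately left as a parameter, because the exact CSV link is not given here. download_url is imported inside the function so that `explore` can be used without torchvision installed.

```python
import os
import pandas as pd

def download_insurance_csv(url, root="."):
    """Download a CSV with torchvision's download_url and load it with pandas."""
    # Imported lazily so that explore() below works even without torchvision.
    from torchvision.datasets.utils import download_url
    filename = os.path.basename(url)
    download_url(url, root)  # with no filename given, saves to root/<basename of url>
    return pd.read_csv(os.path.join(root, filename))

def explore(df):
    """Collect a few basic facts about a DataFrame before training on it."""
    return {
        "num_rows": len(df),
        "num_cols": df.shape[1],
        "columns": list(df.columns),
        "numeric_cols": list(df.select_dtypes(include="number").columns),
        "categorical_cols": list(df.select_dtypes(exclude="number").columns),
        "missing_values": int(df.isna().sum().sum()),
    }

# Usage (DATASET_URL is a placeholder for the CSV link used in the assignment):
# df = download_insurance_csv(DATASET_URL)
# print(explore(df))
```

If the CSV stores sex and smoking habit as text labels, they should appear under `categorical_cols`. That is a reminder that they need a numeric encoding before they can be fed to a linear model.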