Insurance cost prediction using linear regression
In this assignment, we'll use information such as a person's age, sex, BMI, number of children and smoking habit to predict their yearly medical bills. Insurance companies can use this kind of model to determine a person's yearly insurance premium. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance
We will create a model with the following steps:
- Download and explore the dataset
- Prepare the dataset for training
- Create a linear regression model
- Train the model to fit the data
- Make predictions using the trained model
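To make these steps concrete before working with the real data, here is a minimal sketch of the same workflow on synthetic data. Everything specific in it is an assumption made for illustration: the random inputs stand in for the insurance columns, and the row count, batch size, learning rate and number of epochs are arbitrary choices, not the values expected in this assignment. The helper `evaluate` is hypothetical and not part of the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# 1. "Download and explore": a synthetic stand-in for the real dataset.
#    Assumption: 5 numeric input columns and 1 target column.
num_rows, num_inputs = 200, 5
inputs = torch.rand(num_rows, num_inputs)
true_weights = torch.tensor([[3.0], [0.5], [2.0], [1.0], [8.0]])
targets = inputs @ true_weights + 0.1 * torch.randn(num_rows, 1)

# 2. Prepare the dataset: wrap tensors, split into train/validation, batch.
dataset = TensorDataset(inputs, targets)
train_ds, val_ds = random_split(dataset, [160, 40])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=40)

# 3. Create a linear regression model: one linear layer, one output.
model = nn.Linear(num_inputs, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def evaluate(model, loader):
    """Return the mean squared error averaged over the loader's batches."""
    with torch.no_grad():
        losses = [F.mse_loss(model(xb), yb).item() for xb, yb in loader]
    return sum(losses) / len(losses)


initial_val_loss = evaluate(model, val_loader)

# 4. Train the model: gradient descent on the mean squared error.
for epoch in range(200):
    for xb, yb in train_loader:
        loss = F.mse_loss(model(xb), yb)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

final_val_loss = evaluate(model, val_loader)

# 5. Make a prediction for a single validation row.
sample_input, sample_target = val_ds[0]
with torch.no_grad():
    prediction = model(sample_input.unsqueeze(0))

print("validation loss before training:", initial_val_loss)
print("validation loss after training:", final_val_loss)
print("prediction:", prediction.item(), "target:", sample_target.item())
```

The real dataset also contains text columns, which need to be converted to numbers before they can be placed in a tensor; the sketch skips that part of the preparation step.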
This assignment builds upon the concepts from the first 2 lectures. It will help to review these Jupyter notebooks:
- PyTorch basics: https://jovian.ml/aakashns/01-pytorch-basics
- Linear Regression: https://jovian.ml/aakashns/02-linear-regression
- Logistic Regression: https://jovian.ml/aakashns/03-logistic-regression
- Linear regression (minimal): https://jovian.ml/aakashns/housing-linear-minimal
- Logistic regression (minimal): https://jovian.ml/aakashns/mnist-logistic-minimal
As you go through this notebook, you will find ??? in certain places. Your job is to replace each ??? with appropriate code or values so that the notebook runs properly end-to-end. In some cases, you'll be required to choose hyperparameters (learning rate, batch size, etc.). Experiment with the hyperparameters to get the lowest loss.
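One simple way to experiment with a hyperparameter is to train the same model several times, changing only that value, and compare the final losses. The sketch below does this for the learning rate on a small synthetic problem. The function `final_loss_for`, the candidate values and the toy data are all assumptions made for illustration; they are not part of this assignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def final_loss_for(lr, epochs=100):
    """Train a small linear model with the given learning rate; return the final loss."""
    torch.manual_seed(0)  # same data and same initial weights for every run
    inputs = torch.rand(100, 3)
    targets = inputs @ torch.tensor([[2.0], [-1.0], [0.5]]) + 1.0
    model = nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    with torch.no_grad():
        return F.mse_loss(model(inputs), targets).item()


# Hypothetical candidate values; pick a range that suits your own data.
candidate_lrs = [1e-3, 1e-2, 1e-1]
results = {lr: final_loss_for(lr) for lr in candidate_lrs}
best_lr = min(results, key=results.get)

for lr, loss in results.items():
    print(f"lr={lr}: final loss {loss:.4f}")
print("lowest loss with lr =", best_lr)
```

Resetting the random seed inside the function keeps the comparison fair, since each run starts from identical data and weights. The same pattern works for batch size or the number of epochs.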
# Install the required libraries (comment these commands out if the imports below already work)
!conda install numpy pytorch torchvision cpuonly -c pytorch -y
!pip install matplotlib --upgrade --quiet
!pip install jovian --upgrade --quiet
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /opt/conda
added / updated specs:
- cpuonly
- numpy
- pytorch
- torchvision
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2020.4.5.2 | hecda079_0 147 KB conda-forge
certifi-2020.4.5.2 | py37hc8dfbb8_0 152 KB conda-forge
numpy-1.18.5 | py37h8960a57_0 5.1 MB conda-forge
------------------------------------------------------------
Total: 5.4 MB
The following packages will be UPDATED:
ca-certificates 2020.4.5.1-hecc5488_0 --> 2020.4.5.2-hecda079_0
certifi 2020.4.5.1-py37hc8dfbb8_0 --> 2020.4.5.2-py37hc8dfbb8_0
numpy 1.18.1-py37h8960a57_1 --> 1.18.5-py37h8960a57_0
Downloading and Extracting Packages
certifi-2020.4.5.2 | 152 KB | ##################################### | 100%
ca-certificates-2020 | 147 KB | ##################################### | 100%
numpy-1.18.5 | 5.1 MB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
pip install pandas
Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (1.0.3)
Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas) (2019.3)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.7/site-packages (from pandas) (1.18.5)
Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas) (1.14.0)
Note: you may need to restart the kernel to use updated packages.
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name = '02-insurance-linear-regression'  # will be used by jovian.commit