Problem Statement

QUESTION: ACME Insurance Inc. offers affordable health insurance to thousands of customers all over the United States. As the lead data scientist at ACME, **you're tasked with creating an automated system to estimate the annual medical expenditure for new customers**, using information such as their age, sex, BMI, number of children, smoking habits, and region of residence.

Estimates from your system will be used to determine the annual insurance premium (amount paid every month) offered to the customer. Due to regulatory requirements, you must be able to explain why your system outputs a certain prediction.

You're given a CSV file containing verified historical data, consisting of the aforementioned information and the actual medical charges incurred by over 1300 customers.

Dataset source: https://github.com/stedy/Machine-Learning-with-R-datasets

from urllib.request import urlretrieve

# Download the dataset and save it locally as medical.csv
medical_charges_url = 'https://raw.githubusercontent.com/JovianML/opendatasets/master/data/medical-charges.csv'
urlretrieve(medical_charges_url, 'medical.csv')
('medical.csv', <http.client.HTTPMessage at 0x7f1d54925250>)
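Re-running the download cell fetches the file again every time. A minimal sketch of one way to avoid that is shown here; the helper name `download_if_missing` and its `fetch` parameter are hypothetical and not part of the original notebook.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, path, fetch=urlretrieve):
    """Download `url` to `path` unless a file already exists there.

    `fetch` defaults to urlretrieve; it is exposed as a parameter
    (an assumption of this sketch) so the helper can be tried out
    without network access.
    Returns True if a download happened, False if the file was reused.
    """
    if os.path.exists(path):
        # The file is already on disk, so skip the network request
        return False
    fetch(url, path)
    return True
```

With this helper, `download_if_missing(medical_charges_url, 'medical.csv')` could replace the direct `urlretrieve` call. Note that it only checks whether the file exists, not whether its contents are complete or up to date.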
import pandas as pd