Linear regression model to estimate medical charges for smokers

EXERCISE: Repeat the steps in this section to train a linear regression model to estimate medical charges for smokers. Visualize the targets and predictions, and compute the loss.

model3 = LinearRegression()
smoker_df = medical_df[medical_df.smoker == 'yes']

inputs = smoker_df[['age']]
targets = smoker_df.charges
print('inputs.shape :', inputs.shape)
print('targets.shape :', targets.shape)

model3.fit(inputs, targets)

predictions = model3.predict(inputs)  # predict with model3, the model fitted on smoker_df

rmse(targets, predictions)
26148.867808867617

try_parameters(model3.coef_, model3.intercept_)
RMSE Loss: 24338.502872599212

I did this and just wanted to know: am I heading in the right direction, and am I getting the correct output?
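One way to sanity-check a result like this is to compute the loss twice: once from the model's own predictions and once by hand from its fitted coefficient and intercept. The sketch below does that on a small made-up table. `toy_df` and its values are hypothetical stand-ins for medical_df, and `rmse` here is a simple stand-in for the course helper of the same name, so treat it as an illustration rather than the reference solution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(targets) - np.asarray(predictions)
    return np.sqrt(np.mean(np.square(diff)))


# Hypothetical stand-in for medical_df; the values are made up.
toy_df = pd.DataFrame({
    "age": [19, 25, 33, 41, 52, 60],
    "smoker": ["yes", "no", "yes", "yes", "no", "yes"],
    "charges": [17000.0, 3000.0, 21000.0, 25000.0, 9000.0, 33000.0],
})

# Keep only the smokers, then use age as the single numeric input.
smoker_df = toy_df[toy_df.smoker == "yes"]
inputs = smoker_df[["age"]]
targets = smoker_df.charges

# Fit and predict with the same model object.
model3 = LinearRegression().fit(inputs, targets)
predictions = model3.predict(inputs)
loss = rmse(targets, predictions)

# Recompute the predictions by hand from the fitted parameters.
manual_predictions = model3.coef_[0] * smoker_df.age + model3.intercept_
manual_loss = rmse(targets, manual_predictions)

print("RMSE from predict():", loss)
print("RMSE from coef_/intercept_:", manual_loss)
```

If the two numbers disagree on your own data, the prediction step and the loss step are probably using different models or different rows.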

I have written my code this way, but it shows this error: "ValueError: could not convert string to float: 'yes'".
import numpy as np
smoker_df = medical_df[medical_df.smoker == 'yes']
inputs = smoker_df[['smoker']]
targets = smoker_df.charges
print('inputs.shape :', inputs.shape)
print('targets.shape :', targets.shape)
from sklearn import linear_model
from sklearn.linear_model import SGDRegressor
SGDReg = linear_model.SGDRegressor(loss='huber')
SGDReg.fit(inputs, targets)
SGDReg.predict(np.array([[23], [37], [61]]))

Can someone please let me know what is the mistake in my code?

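You are using 'smoker' as your input, which is a categorical column (i.e. it can only be "yes" or "no"). Linear regression models (and most sklearn models) only accept numerical inputs, so you'll have to encode the categorical values as numbers, e.g. "yes" -> 1, "no" -> 0. Also note that you have already filtered the data to rows where smoker is 'yes', so that column holds the same value in every row and carries no information. The values you pass to predict (23, 37, 61) look like ages, so 'age' is probably the input you meant to use. (And do you really want to run a linear regression model on a single binary input?)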
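A minimal sketch of that encoding, assuming a small made-up table in place of medical_df (`toy_df`, its values, and the `smoker_code` column name are all hypothetical). It maps "yes"/"no" to 1/0 with pandas and fits on the unfiltered rows, so the encoded column actually varies:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for medical_df; the values are made up.
toy_df = pd.DataFrame({
    "age": [19, 25, 33, 41, 52, 60],
    "smoker": ["yes", "no", "yes", "yes", "no", "yes"],
    "charges": [17000.0, 3000.0, 21000.0, 25000.0, 9000.0, 33000.0],
})

# Encode the categorical column as numbers: "yes" -> 1, "no" -> 0.
toy_df["smoker_code"] = toy_df.smoker.map({"yes": 1, "no": 0})

# Use only numeric columns as inputs; do not filter by smoker here,
# otherwise smoker_code would be constant.
inputs = toy_df[["age", "smoker_code"]]
targets = toy_df.charges

model = LinearRegression().fit(inputs, targets)
print("coefficients (age, smoker_code):", model.coef_)
print("intercept:", model.intercept_)
```

With the string column replaced by a numeric one, fit no longer has to convert 'yes' to a float, which is what raised the ValueError.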