Assignment 1 - Train Your First ML Model

Ask questions related to the assignment on this thread. Submissions can be made on the link above.

:zap: In this assignment, you’re going to predict the price of a house using information like its location, area, number of rooms, etc. You’ll follow a step-by-step process to train your model:

  1. Download and explore a dataset
  2. Prepare the dataset for training
  3. Train a linear regression model
  4. Make predictions and evaluate the model
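
The four steps above can be sketched roughly as follows. This is only a minimal illustration on synthetic data, not the assignment solution: the `houses_df` frame, its `area`/`rooms`/`price` columns, and the price rule are all made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 1. "Download" and explore: a synthetic stand-in for the real dataset.
rng = np.random.default_rng(42)
n = 200
houses_df = pd.DataFrame({
    "area": rng.uniform(500, 3000, n),
    "rooms": rng.integers(1, 7, n),
})
# Hypothetical price rule plus noise, so the model has something to learn.
houses_df["price"] = (
    50_000 + 120 * houses_df["area"] + 8_000 * houses_df["rooms"]
    + rng.normal(0, 10_000, n)
)
print(houses_df.describe())

# 2. Prepare the dataset: pick inputs/target and hold out a validation set.
input_cols = ["area", "rooms"]
target_col = "price"
train_inputs, val_inputs, train_targets, val_targets = train_test_split(
    houses_df[input_cols], houses_df[target_col], test_size=0.25, random_state=42
)

# 3. Train a linear regression model.
model = LinearRegression().fit(train_inputs, train_targets)

# 4. Make predictions and evaluate with root mean squared error.
val_preds = model.predict(val_inputs)
val_rmse = np.sqrt(mean_squared_error(val_targets, val_preds))
print(f"Validation RMSE: {val_rmse:.2f}")
```

The real assignment also needs scaling and one-hot encoding in step 2, which this sketch leaves out.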

:ledger: Assignment Notebook


:computer: Join the Jovian Discord Server to interact with the course team, share resources and attend the study hours :point_right: Jovian Discord Server

2 Likes

Good afternoon. Is it possible for me to skip the Jovian notebook provided, write the code in JupyterLab instead, and send that as my submission?

1 Like

Yes, you can do it in any Jupyter notebook. At the end, commit it with your API key and it will be submitted automatically.

1 Like

Hi fellow learners and teachers @hemanth @aakashns-6l3, while one-hot encoding the categorical columns, I found some NaN values which cannot be converted. I read about two ways to solve the issue:

  1. Convert the NaN values to an “Other” value of object type
  2. Use an imputer to insert the most_frequent value of that column in place of the NaN

I used the second one because it somehow gave a better result, but which one do you think is better? Or is there a third option that might be the best?
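
For reference, the two options above might look like this with scikit-learn's `SimpleImputer`. It is a small sketch, assuming a made-up `GarageType` column stored as object dtype; the column name and values are only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical categorical column with two missing entries (object dtype).
df = pd.DataFrame(
    {"GarageType": ["Attchd", np.nan, "Detchd", "Attchd", np.nan]},
    dtype=object,
)

# Option 1: replace NaN with a constant placeholder category.
constant_imputer = SimpleImputer(strategy="constant", fill_value="Other")
filled_other = constant_imputer.fit_transform(df)[:, 0]

# Option 2: replace NaN with the most frequent value in the column.
mode_imputer = SimpleImputer(strategy="most_frequent")
filled_mode = mode_imputer.fit_transform(df)[:, 0]

print(filled_other.tolist())
print(filled_mode.tolist())
```

Which one scores better depends on the data, so comparing the validation loss for both is a reasonable way to decide.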
2 Likes

Could you share which dataset you are using? It depends entirely on the data.
If the missing value can be calculated from other columns, use the calculated value as the filler. For example, if Body Mass Index (BMI) is missing but height and weight are given, you can write a line to compute and insert it; if either of those is also missing, just drop the row. Otherwise, you can fill in the average of the column’s values or use the method you mentioned (the imputer), but the results may be skewed, since these are not the actual values: we are assigning a value that we think is plausible.
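
The BMI idea above could be sketched like this in pandas. The `people_df` frame and its column names are invented for the example; BMI is computed as weight in kilograms divided by height in metres squared.

```python
import numpy as np
import pandas as pd

# Hypothetical data: some BMI values are missing, one height is missing too.
people_df = pd.DataFrame({
    "height_m": [1.70, 1.80, np.nan, 1.60],
    "weight_kg": [65.0, 81.0, 70.0, 64.0],
    "bmi": [np.nan, 25.0, np.nan, np.nan],
    "age": [30.0, np.nan, 40.0, 50.0],
})

# Fill BMI from height and weight wherever both are available.
computed_bmi = people_df["weight_kg"] / people_df["height_m"] ** 2
people_df["bmi"] = people_df["bmi"].fillna(computed_bmi)

# A column that cannot be derived falls back to the column average.
people_df["age"] = people_df["age"].fillna(people_df["age"].mean())

# Rows where BMI is still missing (height or weight absent) are dropped.
people_df = people_df.dropna(subset=["bmi"])
print(people_df)
```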

What is the best way to deal with NaN values in ‘object’ type variables? Can someone help me with this?

Hey @mrunalnarayana, welcome to the forum.
I don’t think I know the best way. It completely depends on your data, how it is structured, and how the model performs with the approach you choose.
I can suggest two ways to fill in missing values in a categorical column.

  1. Create a new category, “Unknown”, that is separate from all existing categories. (used in lectures)
  2. You can fill the values with the mode value present in the column.

Also, you can drop the rows with NaN values (if possible). There might be other interesting ways too (Tip: Google it). Try other ways and see which works better for your model. Reply to this thread if you find something more interesting that works better.
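
To see what each choice does to the data the model receives, here is a small pandas sketch. The `Fence` column and its values are hypothetical; `pd.get_dummies` is used only as a quick way to look at the resulting one-hot columns.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with missing values.
df = pd.DataFrame({
    "Fence": ["MnPrv", np.nan, "GdWo", "MnPrv", np.nan, "MnPrv"],
    "price": [200, 150, 180, 210, 160, 190],
})

# 1. New "Unknown" category: keeps every row and adds one extra one-hot column.
unknown_filled = df["Fence"].fillna("Unknown")
unknown_columns = list(pd.get_dummies(unknown_filled).columns)

# 2. Mode: keeps every row, but missing entries merge into the commonest category.
mode_value = df["Fence"].mode()[0]
mode_filled = df["Fence"].fillna(mode_value)
mode_columns = list(pd.get_dummies(mode_filled).columns)

# 3. Drop rows: no invented values, but fewer rows to train on.
dropped = df.dropna(subset=["Fence"])

print(unknown_columns, mode_columns, len(dropped))
```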

5 Likes

Hello! I ran into a problem while using plotly.express. I tried to draw a simple graph like this:
fig = px.imshow(df)
fig.show()
It doesn’t raise any error, but the image is not shown. Is it a problem with the settings on my computer?

Hey,
while one-hot encoding the categorical columns I get the error shown in the image below.

2 Likes

Hey, I’m getting an error while fitting the one-hot encoder.
Can you help me out?

I think you didn’t handle the missing values in prices_df[categorical_cols].

Well, you don’t need to handle missing values yourself in the latest version of scikit-learn; the encoder handles them automatically. Colab uses an older version, so if you run !pip install scikit-learn --upgrade, restart the runtime, and run the notebook again, the missing values will be handled automatically. The assignment notebook will be updated for this issue too. I think you shouldn’t face this issue if you are using Binder or a Jupyter notebook, as it uses the latest version of scikit-learn.
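
A quick way to check whether your environment is affected is to print the installed version and try fitting the encoder on a column that contains NaN. The sketch below uses a made-up `Street` column; as far as I know, `OneHotEncoder` treats NaN as its own category from scikit-learn 0.24 onwards, and older versions raise a `ValueError`.

```python
import numpy as np
import pandas as pd
import sklearn
from sklearn.preprocessing import OneHotEncoder

print("scikit-learn version:", sklearn.__version__)

# Hypothetical categorical column with one missing value (object dtype).
df = pd.DataFrame({"Street": ["Pave", np.nan, "Grvl", "Pave"]}, dtype=object)

encoder = OneHotEncoder(handle_unknown="ignore")
try:
    encoder.fit(df)
    nan_supported = True   # NaN was accepted and becomes its own category
except ValueError:
    nan_supported = False  # older version: "Input contains NaN"

print("Encoder accepts NaN:", nan_supported)
```

If this prints `False`, either upgrade scikit-learn as described above or fill the NaN values before fitting.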

1 Like

Hey, px.imshow() is used to show an image using plotly. Are you trying to show an image or a graph?

Hi,

I need some advice on how to resolve the error below.

image

Thanks

Hey, the input columns variable is supposed to be a list of column names, not a DataFrame.
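
As a small sketch of that distinction (the tiny `prices_df` below and its column names are made up, and `input_cols` is just an illustrative variable name):

```python
import pandas as pd

# Hypothetical stand-in for the assignment's DataFrame.
prices_df = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Pave", "Grvl"],
    "SalePrice": [208500, 181500, 223500],
})

# A plain Python list of column names (strings), not a DataFrame.
input_cols = ["LotArea", "Street"]
target_col = "SalePrice"

# The list is then used to select the matching columns.
inputs_df = prices_df[input_cols].copy()
targets = prices_df[target_col].copy()

# Split the input columns into numeric and categorical names.
numeric_cols = inputs_df.select_dtypes(include="number").columns.tolist()
categorical_cols = [col for col in input_cols if col not in numeric_cols]
print(numeric_cols, categorical_cols)
```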

2. Fit the encoder to the categorical columns

prices_df2 = prices_df[categorical_cols].fillna('unknown')
encoder.fit(prices_df2)

I have used this code, which fills the NaN values.

I am using Jupyter and am still getting the same error.


I am getting the error “Input contains NaN”
due to this code:
input_df[encoded_cols] = encoder.transform(input_df[categorical_cols].values)
I have corrected it by replacing .values with .fillna('unknown'). Is this the correct way?
Can you tell me why it does not show the error after replacing the NaN values with 'unknown'?
Thanks in advance.

1 Like

Hey, are you using Jupyter on your local machine? Try updating the version of scikit-learn. It should work with the latest version of scikit-learn.

This was such a great assignment.
I was able to practice what I learnt in the first two lectures.
So glad to finally gain this confidence.

Thank you for curating this :slight_smile:

3 Likes