Lesson 6 - Unsupervised Learning and Recommendations

:arrow_forward: Lecture Video will be available on the course page :point_up_2:

Topics Covered:

  • Data exploration and feature engineering
  • Training and comparing multiple models
  • Hyperparameter tuning & ensembling

:spiral_notepad: Notebooks used in this lesson:

:writing_hand: Please provide your valuable feedback on this link to help us improve the course experience.

:computer: Join the Jovian Discord Server to interact with the course team, share resources and attend the study hours :point_right: Jovian Discord Server

:question: Asking/Answering Questions

Reply to this thread to ask questions. Before asking, scroll through the thread and check if your question (or a similar one) is already present. If yes, just like it. We will give priority to the questions with the most likes. The rest will be answered by our mentors or the community. If you see a question you know the answer to, please post your answer as a reply to that question. Let’s help each other learn!

My kernel keeps dying when I transform train_input[numeric_cols] and test_input[numeric_cols] to impute missing numerical data using SimpleImputer for my course project.

Kernel doesn’t automatically restart either.

Please guide me to the easiest solution.

How large is your data set?

Yeah, I reduced my dataset size and it worked, thanks.
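For anyone hitting the same memory issue, two tricks that often help before resorting to dropping data are downcasting the numeric columns to float32 (halving memory) and then imputing. A minimal sketch, assuming scikit-learn's SimpleImputer and a pandas DataFrame (the tiny frame here just stands in for the real course data):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Small example frame standing in for the real course data
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, 2.0, 2.0, 8.0],
})
numeric_cols = ["a", "b"]

# Downcast to float32 to roughly halve memory before transforming
df[numeric_cols] = df[numeric_cols].astype("float32")

imputer = SimpleImputer(strategy="mean")
imputer.fit(df[numeric_cols])

# Transform fills each NaN with its column mean
df[numeric_cols] = imputer.transform(df[numeric_cols])
print(df)
```

If the data still doesn't fit, fitting the imputer on a random sample of rows and then transforming the full set in chunks is another option.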

How do I reduce the number of rows instead of the number of columns? I mean, if I have 1000 rows, how do I reduce them to 10, the way we can for columns (e.g. with PCA, setting n_components=10)? Is there a method for this?

For tabular data we tend to think of the columns as features or dimensions. The rows are not dimensions but observations, so dimensionality reduction algorithms like PCA won't reduce the number of rows. Reducing the number of rows means reducing the sample size. If you want to randomly sample rows, you can use the pandas sample function. Otherwise, you'll have to find a way to group the data by rows and look at one group at a time — for example, by grouping on another feature or by using a clustering algorithm (e.g. k-means) to group similar rows together.
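Both approaches from the answer above can be sketched in a few lines — random sampling with pandas, and k-means where each of the 10 cluster centroids serves as one "representative row". The data here is randomly generated just for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy data standing in for a 1000-row dataset
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(1000, 4)), columns=list("abcd"))

# Option 1: randomly sample 10 rows
sampled = df.sample(n=10, random_state=42)

# Option 2: cluster into 10 groups; each centroid summarizes one group of rows
km = KMeans(n_clusters=10, n_init=10, random_state=42)
km.fit(df)
centroids = pd.DataFrame(km.cluster_centers_, columns=df.columns)

print(sampled.shape, centroids.shape)
```

Note the difference: sampling keeps 10 actual rows from the data, while the centroids are synthetic rows that average each cluster.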

Should we perform dimensionality reduction before preprocessing the data (treating NaNs, encoding categorical columns, scaling, etc.) or after? :thinking: :thinking:

When should we use t-SNE, PCA, or ICA?
