Course Project - Real-World Machine Learning Model

Machine Learning with Python: Zero to GBMs

Lesson 1 - Linear Regression with Scikit Learn
Lesson 2 - Logistic Regression for Classification
Assignment 1 - Train Your First ML Model
Lesson 3 - Decision Trees and Hyperparameters
Lesson 4 - Random Forests and Regularization
Assignment 2 - Decision Trees and Random Forests
Lesson 5 - Gradient Boosting with XGBoost
Lesson 6 - Unsupervised Learning and Recommendations

In the course project, you will apply the machine learning skills covered in this course by training an ML model on a real-world dataset. Follow these steps to complete your project:

Pick a large real-world dataset from Kaggle (see the "Resources" section below) and download it using the opendatasets library. Your training set should contain at least 50,000 rows and at least 5 columns of data.
Read the dataset description, understand the problem statement and describe the modeling objective clearly. You can also browse through existing notebooks created by others for inspiration.
Perform exploratory data analysis, gather insights about the data, perform feature engineering, create a training-validation split, and prepare the data for modeling.
Train and evaluate different machine learning models, tune hyperparameters, and reduce overfitting to improve the model.
Report the final performance of your best model(s), show sample predictions, and save model weights. Summarize your work, share links to references, and suggest ideas for future work.
Publish your Jupyter notebook to Jovian, make a submission below, and share your project with the community. Optionally, you may also write a blog post and contribute it to the official Jovian blog.
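The data-preparation steps above can be sketched in plain Python. This is a minimal sketch, assuming the downloaded data is a CSV file whose first line is a header row. It counts data rows and columns to check the 50,000-row and 5-column minimums, then makes a shuffled, reproducible training-validation split. The function names, the 25% validation fraction, and the seed are illustrative choices, not part of the course material. In a real project you would more likely load the data with pandas and split it with scikit-learn's train_test_split.

```python
import csv
import random

# Project minimums stated in the course guidelines.
MIN_ROWS = 50_000
MIN_COLS = 5


def count_rows_and_cols(lines):
    """Count data rows and columns in CSV text that starts with a header row.

    `lines` can be an open file or any iterable of strings.
    """
    reader = csv.reader(lines)
    header = next(reader, [])                 # column names (empty if no input)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, len(header)


def check_dataset_size(path, min_rows=MIN_ROWS, min_cols=MIN_COLS):
    """Return (n_rows, n_cols, meets_minimum) for a CSV file on disk."""
    with open(path, newline="") as f:
        n_rows, n_cols = count_rows_and_cols(f)
    return n_rows, n_cols, n_rows >= min_rows and n_cols >= min_cols


def train_val_split(rows, val_fraction=0.25, seed=42):
    """Shuffle rows and split them into (train, validation) lists."""
    rows = list(rows)                  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(rows)  # fixed seed makes the split reproducible
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]
```

For example, check_dataset_size("train.csv") returns three values: the row count, the column count, and whether both minimums are met. The file name "train.csv" is hypothetical.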

There is no starter notebook for the course project. Please use the "New" button on Jovian to create a new notebook, "Run on Colab" to execute it, and jovian.commit to record versions. Please review the "Evaluation Criteria" and "Resources" sections below carefully before starting your project.

Submit

Google Colab Public Notebook Link (Required)

You can submit multiple times. Only your last submission will be evaluated.

Resources

Evaluation Criteria

Your submission must satisfy the following criteria:

Training set should contain at least 50,000 rows and at least 5 columns of data
Notebook must include all the steps listed in the project guidelines above
Notebook must be executed end-to-end with error-free outputs for all cells
You must train at least 2 different types of machine learning models
You must tune at least 2 different hyperparameters for your chosen model
Your model's performance on the validation set must be reasonably good
Your project must be documented extensively using markdown cells
Notebook must include references to relevant notebooks/tutorials/documentation sites
Your notebook must not be plagiarized (i.e., directly copied) from another project
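One simple way to meet the hyperparameter-tuning criterion is an exhaustive grid search scored on the validation set. The sketch below assumes you write a function, here called train_and_score (a hypothetical name). It should train a model with the given hyperparameters and return a validation score where higher is better. grid_search then tries every combination and keeps the best one. The helper is illustrative and not part of any library.

```python
from itertools import product


def grid_search(train_and_score, param_grid):
    """Try every hyperparameter combination and return the best one.

    train_and_score: user-supplied function that trains a model with the given
        keyword hyperparameters and returns its validation score (higher is better).
    param_grid: dict mapping each hyperparameter name to a list of candidate values.
    Returns (best_params, best_score, all_results).
    """
    names = list(param_grid)
    results = []
    # product() enumerates every combination of the candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)  # validation score for this combination
        results.append((score, params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results
```

For instance, train_and_score could fit scikit-learn's RandomForestRegressor with the given max_depth and n_estimators and return its score on the validation set. A matching grid would be param_grid = {"max_depth": [5, 10, 20], "n_estimators": [100, 300]}. scikit-learn's GridSearchCV performs a similar search with cross-validation built in.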

NOTE: It may take 7-10 days for your project to be evaluated, due to a high volume of submissions.