Course Project - Real-World Machine Learning Model

Machine Learning with Python: Zero to GBMs

Deadline: Aug 14, 11:59 PM GMT

In the course project, you will apply the machine learning skills covered in this course by training an ML model on a real-world dataset. Follow these steps to complete your project:

  1. Pick a large real-world dataset from Kaggle (see the "Recommended Datasets" section below) and download it using opendatasets. Your training set should contain at least 50,000 rows and at least 5 columns of data.

  2. Read the dataset description, understand the problem statement and describe the modeling objective clearly. You can also browse through existing notebooks created by others for inspiration.

  3. Perform exploratory data analysis, gather insights about the data, perform feature engineering, create a training-validation split, and prepare the data for modeling.

  4. Train & evaluate different machine learning models, tune hyperparameters and reduce overfitting to improve the model.

  5. Report the final performance of your best model(s), show sample predictions, and save model weights. Summarize your work, share links to references, and suggest ideas for future work.

  6. Publish your Jupyter notebook to Jovian, make a submission below and share your project with the community. Optionally, you may also write a blog post and contribute to the Jovian official blog.
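
As an illustration of the training-validation split mentioned in step 3, here is a minimal sketch using only the Python standard library. The function name `train_val_split`, the 25% validation fraction, the seed, and the synthetic `rows` list are all assumptions made for this example and are not prescribed by the course.

```python
import random


def train_val_split(rows, val_fraction=0.25, seed=42):
    """Shuffle row indices and split the rows into training and validation sets.

    `val_fraction` and `seed` are illustrative defaults, not course requirements.
    """
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_val = int(len(rows) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train = [rows[i] for i in train_idx]
    val = [rows[i] for i in val_idx]
    return train, val


# Synthetic stand-in for a real dataset: (feature, label) pairs.
rows = [(i, i % 2) for i in range(50_000)]
train_rows, val_rows = train_val_split(rows)
print(len(train_rows), len(val_rows))
```

In a real notebook you would more likely call `train_test_split` from scikit-learn on a pandas DataFrame. For time-ordered data, a split by date is usually a better choice than a random shuffle.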

There is no starter notebook for the course project. Please use the "New" button on Jovian to create a new notebook, "Run on Colab" to execute it, and jovian.commit to record versions. Please review the "Evaluation Criteria" and "Recommended Datasets" sections below carefully before starting your project.

Make a Submission

Use the following command to submit directly from the notebook:

jovian.submit(assignment="zerotogbms-project")

Alternatively, submit the link to your notebook on this page (the notebook link is required). You can submit multiple times. Only your last submission will be evaluated.

Evaluation Criteria

Your submission must satisfy the following criteria:

  • Training set must contain at least 50,000 rows and at least 5 columns of data
  • Notebook must include all the steps listed in the project guidelines above
  • Notebook must be executed end-to-end with error-free outputs for all cells
  • You must train at least 2 different types of machine learning models
  • You must tune at least 2 different hyperparameters for your chosen model
  • Your model's performance on the validation set must be reasonably good
  • Your project must be documented extensively using markdown cells
  • Notebook must include references to relevant notebooks/tutorials/documentation sites
  • Your notebook must not be plagiarized (i.e., directly copied) from another project
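
The dataset size criterion in the list above can be checked before any modeling work begins. The following is a minimal sketch, assuming the training data is a CSV file with a single header row. The helper name `meets_size_criteria` and the in-memory sample data are hypothetical and exist only for illustration.

```python
import csv
import io

MIN_ROWS = 50_000   # minimum number of training rows from the criteria
MIN_COLUMNS = 5     # minimum number of columns from the criteria


def meets_size_criteria(csv_file, min_rows=MIN_ROWS, min_columns=MIN_COLUMNS):
    """Return (n_rows, n_columns, ok) for an open CSV file with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])          # first line is assumed to be the header
    n_rows = sum(1 for _ in reader)    # count data rows without loading them all
    n_columns = len(header)
    ok = n_rows >= min_rows and n_columns >= min_columns
    return n_rows, n_columns, ok


# Hypothetical in-memory data standing in for a downloaded training file.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(["id", "f1", "f2", "f3", "target"])
writer.writerows([i, i * 2, i % 7, i % 3, i % 2] for i in range(60_000))
sample.seek(0)

n_rows, n_columns, ok = meets_size_criteria(sample)
print(n_rows, n_columns, ok)
```

With a real dataset, pass a file opened with `open(path, newline="")` instead of the in-memory sample. If the data is already loaded into a pandas DataFrame, `df.shape` gives the same two numbers.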

Recommended Datasets

Use Kaggle to find real-world datasets and past competitions for machine learning.

Here are some project ideas to choose from:

  1. Driver Alertness Detection - https://www.kaggle.com/c/stayalert
  2. Insurance Claim Prediction - https://www.kaggle.com/c/ClaimPredictionChallenge
  3. Financial Distress Prediction - https://www.kaggle.com/c/GiveMeSomeCredit
  4. Used Car Quality Detection - https://www.kaggle.com/c/DontGetKicked
  5. Photo Quality Prediction - https://www.kaggle.com/c/PhotoQualityPrediction
  6. Bond Price Prediction - https://www.kaggle.com/c/benchmark-bond-trade-price-challenge
  7. Biological Response Prediction - https://www.kaggle.com/c/bioresponse
  8. Eye Movement Verification and Identification - https://www.kaggle.com/c/emvic
  9. Next Song Suggestion - https://www.kaggle.com/c/msdchallenge
  10. Xbox Game Suggestion - https://www.kaggle.com/c/acm-sf-chapter-hackathon-small
  11. E-commerce Product Suggestion - https://www.kaggle.com/c/acm-sf-chapter-hackathon-big
  12. Home Credit Default Risk Prediction - https://www.kaggle.com/c/home-credit-default-risk
  13. Expedia Hotel Recommendation - https://www.kaggle.com/c/expedia-hotel-recommendations
  14. Movie Box Office Revenue Prediction - https://www.kaggle.com/c/tmdb-box-office-prediction
  15. Job Posting Recommendation - https://www.kaggle.com/c/job-recommendation
  16. New Song Recommendation - https://www.kaggle.com/c/MusicHackathon
  17. Non-Profit Donor Targeting - https://www.kaggle.com/c/Raising-Money-to-Fund-an-Organizational-Mission
  18. Product Sales Prediction - https://www.kaggle.com/c/online-sales
  19. Air Quality Prediction - https://www.kaggle.com/c/dsg-hackathon
  20. Reducing Commercial Aviation Fatalities - https://www.kaggle.com/c/reducing-commercial-aviation-fatalities
  21. Bulldozer Auction Price Prediction - https://www.kaggle.com/c/bluebook-for-bulldozers
  22. Event Recommendation Engine - https://www.kaggle.com/c/event-recommendation-engine-challenge
  23. Wind Power Generation Forecasting - https://www.kaggle.com/c/GEF2012-wind-forecasting
  24. Bank Customer Category Classification - https://www.kaggle.com/c/elo-merchant-category-recommendation
  25. Zillow Home Value Prediction - https://www.kaggle.com/c/zillow-prize-1
  26. Santander Customer Satisfaction - https://www.kaggle.com/c/santander-customer-satisfaction
  27. Insurance Decision Prediction - https://www.kaggle.com/c/prudential-life-insurance-assessment
  28. Insurance Quote Conversion - https://www.kaggle.com/c/homesite-quote-conversion
  29. Energy Load Forecasting - https://www.kaggle.com/c/global-energy-forecasting-competition-2012-load-forecasting
  30. Product Launch Failure Prediction - https://www.kaggle.com/c/hack-reduce-dunnhumby-hackathon
  31. Airbnb Booking Destination Prediction - https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings
  32. Yelp Business Rating Prediction - https://www.kaggle.com/c/yelp-recsys-2013
  33. Yelp Business Votes Prediction - https://www.kaggle.com/c/yelp-recruiting
  34. Predicting Weather From Tweets - https://www.kaggle.com/c/crowdflower-weather-twitter
  35. Inventory Demand Prediction - https://www.kaggle.com/c/grupo-bimbo-inventory-demand
  36. Manufacturing Failure Prediction - https://www.kaggle.com/c/bosch-production-line-performance
  37. Customer Business Value Prediction - https://www.kaggle.com/c/predicting-red-hat-business-value
  38. PUBG Finish Placement Prediction - https://www.kaggle.com/c/pubg-finish-placement-prediction
  39. Online Ad Demand Prediction - https://www.kaggle.com/c/avito-demand-prediction
  40. Hotel Search Ranking - https://www.kaggle.com/c/expedia-personalized-sort
  41. Webpage Evergreen Rating - https://www.kaggle.com/c/stumbleupon
  42. User Detection from Smartphone Accelerometer Data - https://www.kaggle.com/c/accelerometer-biometric-competition
  43. Loyal Shoppers Detection - https://www.kaggle.com/c/acquire-valued-shoppers-challenge
  44. Walmart Sales Forecasting - https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting
  45. Grocery Sales Forecasting - https://www.kaggle.com/c/favorita-grocery-sales-forecasting
  46. Insurance Policy Purchase Prediction - https://www.kaggle.com/c/allstate-purchase-prediction-challenge
  47. Laptop Malfunctioning Parts Prediction - https://www.kaggle.com/c/pakdd-cup-2014
  48. Mobile Ad Click-Through Prediction - https://www.kaggle.com/c/avazu-ctr-prediction
  49. Auto Insurance Claim Prediction - https://www.kaggle.com/c/porto-seguro-safe-driver-prediction
  50. Taxi Trip Duration Prediction - https://www.kaggle.com/c/nyc-taxi-trip-duration
  51. African Soil Property Prediction - https://www.kaggle.com/c/afsis-soil-properties
  52. Search Result Relevance Prediction - https://www.kaggle.com/c/crowdflower-search-relevance
  53. West Nile Virus Detection - https://www.kaggle.com/c/predict-west-nile-virus
  54. E-commerce Product Classification - https://www.kaggle.com/c/otto-group-product-classification-challenge
  55. Windows Malware Detection - https://www.kaggle.com/c/malware-classification
  56. Hourly Rain Prediction - https://www.kaggle.com/c/how-much-did-it-rain
  57. Shopping Trip Type Classification - https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
  58. Context Ad Click Prediction - https://www.kaggle.com/c/avito-context-ad-clicks
  59. Duplicate Ads Detection - https://www.kaggle.com/c/avito-duplicate-ads-detection
  60. Instacart Market Basket Prediction - https://www.kaggle.com/c/instacart-market-basket-analysis
  61. Russian Real Estate Price Prediction - https://www.kaggle.com/c/sberbank-russian-housing-market
  62. Santander Customer Transaction Prediction - https://www.kaggle.com/c/santander-customer-transaction-prediction

NOTE: It is not compulsory to use one of the above datasets. You can select a dataset from any online source of your choice.