Course Project - Real-World Machine Learning Model
Machine Learning with Python: Zero to GBMs
In the course project, you will apply the machine learning skills covered in this course by training an ML model on a real-world dataset. Follow these steps to complete your project:
- Pick a large real-world dataset from Kaggle (see the "Resources" section below) and download it using opendatasets. Your training set should contain at least 50,000 rows and 5 columns of data.
- Read the dataset description, understand the problem statement and describe the modeling objective clearly. You can also browse through existing notebooks created by others for inspiration.
- Perform exploratory data analysis, gather insights about the data, perform feature engineering, create a training-validation split, and prepare the data for modeling.
- Train & evaluate different machine learning models, tune hyperparameters and reduce overfitting to improve the model.
- Report the final performance of your best model(s), show sample predictions, and save model weights. Summarize your work, share links to references, and suggest ideas for future work.
- Publish your Jupyter notebook to Jovian, make a submission below and share your project with the community. Optionally, you may also write a blog post and contribute to the Jovian official blog.
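The modeling steps above (training-validation split, training and evaluating models, reducing overfitting, showing sample predictions, and saving the model) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a small synthetic dataset from scikit-learn instead of a real Kaggle dataset, assumes a binary classification problem, picks a random forest and accuracy arbitrarily, and saves the fitted model with joblib to a temporary folder. Your own project will need a model, metric, and hyperparameters that suit your data.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (assumption: binary classification).
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)

# Hold out 25% of the rows as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Try two settings of one hyperparameter. A large gap between training and
# validation accuracy is a sign of overfitting; limiting tree depth is one
# common way to reduce it.
results = {}
for max_depth in (None, 5):
    model = RandomForestClassifier(
        n_estimators=50, max_depth=max_depth, random_state=42
    )
    model.fit(X_train, y_train)
    results[max_depth] = {
        "train_acc": accuracy_score(y_train, model.predict(X_train)),
        "val_acc": accuracy_score(y_val, model.predict(X_val)),
        "model": model,
    }
    print(
        f"max_depth={max_depth}: "
        f"train={results[max_depth]['train_acc']:.3f}, "
        f"val={results[max_depth]['val_acc']:.3f}"
    )

# Keep the setting with the best validation accuracy.
best_depth = max(results, key=lambda d: results[d]["val_acc"])
best_model = results[best_depth]["model"]

# Save the fitted model, reload it, and show a few sample predictions.
model_path = os.path.join(tempfile.mkdtemp(), "best_model.joblib")
joblib.dump(best_model, model_path)
restored = joblib.load(model_path)
sample_preds = restored.predict(X_val[:5])
print("Sample predictions:", sample_preds)
print("Actual labels:     ", y_val[:5])
```

Comparing training and validation scores side by side for each setting makes it easier to justify the final choice when you report your best model's performance.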
There is no starter notebook for the course project. Please use the "New" button on Jovian to create a new notebook, "Run on Colab" to execute it, and jovian.commit to record versions. Please review the "Evaluation Criteria" and "Resources" sections below carefully before starting your project.
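The download and versioning calls mentioned above might be wrapped as shown below. Treat this as a sketch under assumptions: the helper names (`meets_size_requirement`, `download_dataset`, `record_version`) and the project name in the usage comments are hypothetical, and the competition URL is just one of the project ideas listed in this document. `opendatasets` asks for your Kaggle username and API key when it downloads from Kaggle, and `jovian.commit` is intended to be run from inside a notebook, so only the size check is executed here.

```python
import pandas as pd

MIN_ROWS = 50_000  # minimum number of training rows asked for in the project steps
MIN_COLS = 5       # minimum number of columns asked for in the project steps


def meets_size_requirement(df, min_rows=MIN_ROWS, min_cols=MIN_COLS):
    """Return True if the DataFrame has at least min_rows rows and min_cols columns."""
    n_rows, n_cols = df.shape
    return n_rows >= min_rows and n_cols >= min_cols


def download_dataset(url, data_dir="."):
    """Download a Kaggle dataset or competition into data_dir using opendatasets."""
    import opendatasets as od  # imported lazily; install with `pip install opendatasets`

    od.download(url, data_dir=data_dir)


def record_version(project_name):
    """Record a version of the current notebook on Jovian."""
    import jovian  # imported lazily; install with `pip install jovian`

    jovian.commit(project=project_name)


# Offline check of the size rule on a tiny stand-in DataFrame (10 rows, 2 columns).
toy_df = pd.DataFrame({"a": range(10), "b": range(10)})
print("Toy frame meets requirement:", meets_size_requirement(toy_df))

# Typical notebook usage (not run here; needs network access and credentials):
#   download_dataset("https://www.kaggle.com/c/nyc-taxi-trip-duration")
#   record_version("ml-course-project")  # hypothetical project name
```

Running the size check immediately after loading the training file is a cheap way to confirm a dataset qualifies before you invest time in exploratory analysis.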
Recommended Datasets
Use Kaggle to find real-world datasets and past competitions for machine learning.
Here are some project ideas to choose from:
- Driver Alertness Detection - https://www.kaggle.com/c/stayalert
- Insurance Claim Prediction - https://www.kaggle.com/c/ClaimPredictionChallenge
- Financial Distress Prediction - https://www.kaggle.com/c/GiveMeSomeCredit
- Used Car Quality Detection - https://www.kaggle.com/c/DontGetKicked
- Photo Quality Prediction - https://www.kaggle.com/c/PhotoQualityPrediction
- Bond Price Prediction - https://www.kaggle.com/c/benchmark-bond-trade-price-challenge
- Biological Response Prediction - https://www.kaggle.com/c/bioresponse
- Eye Movement Verification and Identification - https://www.kaggle.com/c/emvic
- Next Song Suggestion - https://www.kaggle.com/c/msdchallenge
- Xbox Game Suggestion - https://www.kaggle.com/c/acm-sf-chapter-hackathon-small
- E-commerce Product Suggestion - https://www.kaggle.com/c/acm-sf-chapter-hackathon-big
- Home Credit Default Risk Prediction - https://www.kaggle.com/c/home-credit-default-risk
- Expedia Hotel Recommendation - https://www.kaggle.com/c/expedia-hotel-recommendations
- Job Posting Recommendation - https://www.kaggle.com/c/job-recommendation
- New Song Recommendation - https://www.kaggle.com/c/MusicHackathon
- Non-Profit Donor Targeting - https://www.kaggle.com/c/Raising-Money-to-Fund-an-Organizational-Mission
- Product Sales Prediction - https://www.kaggle.com/c/online-sales
- Air Quality Prediction - https://www.kaggle.com/c/dsg-hackathon
- Reducing Commercial Aviation Fatalities - https://www.kaggle.com/c/reducing-commercial-aviation-fatalities
- Bulldozer Auction Price Prediction - https://www.kaggle.com/c/bluebook-for-bulldozers
- Event Recommendation Engine - https://www.kaggle.com/c/event-recommendation-engine-challenge
- Wind Power Generation Forecasting - https://www.kaggle.com/c/GEF2012-wind-forecasting
- Bank Customer Category Classification - https://www.kaggle.com/c/elo-merchant-category-recommendation
- Zillow Home Value Prediction - https://www.kaggle.com/c/zillow-prize-1
- Santander Customer Satisfaction - https://www.kaggle.com/c/santander-customer-satisfaction
- Insurance Decision Prediction - https://www.kaggle.com/c/prudential-life-insurance-assessment
- Insurance Quote Conversion - https://www.kaggle.com/c/homesite-quote-conversion
- Energy Load Forecasting - https://www.kaggle.com/c/global-energy-forecasting-competition-2012-load-forecasting
- Product Launch Failure Prediction - https://www.kaggle.com/c/hack-reduce-dunnhumby-hackathon
- Airbnb Booking Destination Prediction - https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings
- Yelp Business Rating Prediction - https://www.kaggle.com/c/yelp-recsys-2013
- Yelp Business Votes Prediction - https://www.kaggle.com/c/yelp-recruiting
- Predicting Weather From Tweets - https://www.kaggle.com/c/crowdflower-weather-twitter
- Inventory Demand Prediction - https://www.kaggle.com/c/grupo-bimbo-inventory-demand
- Manufacturing Failure Prediction - https://www.kaggle.com/c/bosch-production-line-performance
- Customer Business Value Prediction - https://www.kaggle.com/c/predicting-red-hat-business-value
- PUBG Finish Placement Prediction - https://www.kaggle.com/c/pubg-finish-placement-prediction
- Online Ad Demand Prediction - https://www.kaggle.com/c/avito-demand-prediction
- Hotel Search Ranking - https://www.kaggle.com/c/expedia-personalized-sort
- Webpage Evergreen Rating - https://www.kaggle.com/c/stumbleupon
- User Detection from Smartphone Accelerometer Data - https://www.kaggle.com/c/accelerometer-biometric-competition
- Loyal Shoppers Detection - https://www.kaggle.com/c/acquire-valued-shoppers-challenge
- Walmart Sales Forecasting - https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting
- Grocery Sales Forecasting - https://www.kaggle.com/c/favorita-grocery-sales-forecasting
- Insurance Policy Purchase Prediction - https://www.kaggle.com/c/allstate-purchase-prediction-challenge
- Laptop Malfunctioning Parts Prediction - https://www.kaggle.com/c/pakdd-cup-2014
- Mobile Ad Click-Through Prediction - https://www.kaggle.com/c/avazu-ctr-prediction
- Auto Insurance Claim Prediction - https://www.kaggle.com/c/porto-seguro-safe-driver-prediction
- Taxi Trip Duration Prediction - https://www.kaggle.com/c/nyc-taxi-trip-duration
- African Soil Property Prediction - https://www.kaggle.com/c/afsis-soil-properties
- Search Result Relevance Prediction - https://www.kaggle.com/c/crowdflower-search-relevance
- West Nile Virus Detection - https://www.kaggle.com/c/predict-west-nile-virus
- ECommerce Product Classification - https://www.kaggle.com/c/otto-group-product-classification-challenge
- Windows Malware Detection - https://www.kaggle.com/c/malware-classification
- Hourly Rain Prediction - https://www.kaggle.com/c/how-much-did-it-rain
- Shopping Trip Type Classification - https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
- Context Ad Click Prediction - https://www.kaggle.com/c/avito-context-ad-clicks
- Duplicate Ads Detection - https://www.kaggle.com/c/avito-duplicate-ads-detection
- Instacart Market Basket Prediction - https://www.kaggle.com/c/instacart-market-basket-analysis
- Russian Real Estate Price Prediction - https://www.kaggle.com/c/sberbank-russian-housing-market
- Santander Customer Transaction Prediction - https://www.kaggle.com/c/santander-customer-transaction-prediction
NOTE: It is not compulsory to use one of the above datasets. You can select a dataset from any online source of your choice.