
Problem Statement

  • COMPETITION: Bluebook for Bulldozers
  • QUESTION: The dataset includes machine configuration fields that describe the options a machine can have. The task is to predict the sale price of bulldozers sold at auction.

The plan is to use the Bluebook dataset to train a regression model that predicts the sale price of the machines.

Before we begin, there are a few key points to remember:

  • Two types of models are used to solve this problem. Both require a substantial amount of time and resources to train. Because resources are limited, I ran only a limited number of RandomizedSearchCV and GridSearchCV iterations when tuning the parameters of both models.
  • Second, RandomizedSearchCV, as the name suggests, samples random parameter combinations from the defined grid. Unless a fixed random_state is passed, reproducing the exact same best_params_ is not feasible. I suggest saving your model every now and then until you get satisfactory scores.
  • The charts for tuning individual RandomForestRegressor parameters were prepared separately, as each one takes a long time to run. If you want to regenerate the charts with a different set of inputs, uncomment the test_param_and_plot method call.
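The point about random parameter combinations can be illustrated in plain Python. This is a minimal sketch, not scikit-learn's implementation: `sample_param_combinations` and the grid values are hypothetical, and it draws with replacement for simplicity. It only shows that random draws from a grid are repeatable when a seed is fixed and differ between runs otherwise.

```python
import random

# Hypothetical grid, for illustration only; not the values used in this notebook.
param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 4, 8],
}

def sample_param_combinations(grid, n_iter, seed=None):
    """Draw n_iter random parameter combinations from grid (with replacement)."""
    rng = random.Random(seed)  # seed=None gives a different sequence on each run
    keys = sorted(grid)
    return [{key: rng.choice(grid[key]) for key in keys} for _ in range(n_iter)]

# Two runs with the same seed produce the same candidate list.
first = sample_param_combinations(param_grid, n_iter=5, seed=42)
second = sample_param_combinations(param_grid, n_iter=5, seed=42)
print(first == second)  # True
```

In scikit-learn, RandomizedSearchCV accepts a `random_state` argument that plays the same role as `seed` here. Saving the fitted model after each search remains a sensible safeguard either way.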

Package Requirements

Package Installation

!pip install tabulate pandas numpy seaborn matplotlib plotly scikit-learn py7zr jovian opendatasets lightgbm --quiet
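After installation, a quick check can confirm that every package is importable before any long-running cell is started. This is a minimal sketch using only the standard library. The helper name `find_missing` is hypothetical, and the list assumes each package's import name matches its install name, except scikit-learn, which is imported as `sklearn`.

```python
import importlib.util

# Import names for the packages installed above; scikit-learn is imported as `sklearn`.
required = [
    "tabulate", "pandas", "numpy", "seaborn", "matplotlib", "plotly",
    "sklearn", "py7zr", "jovian", "opendatasets", "lightgbm",
]

def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(required)
print("Missing packages:", missing or "none")
```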