Gradient Boosting Machines (GBMs) with XGBoost

This tutorial is part of Machine Learning with Python: Zero to GBMs and Zero to Data Science Bootcamp by Jovian.

The following topics are covered in this tutorial:

  • Downloading a real-world dataset from a Kaggle competition
  • Performing feature engineering and preparing the dataset for training
  • Training and interpreting a gradient boosting model using XGBoost
  • Training with K-fold cross-validation and ensembling results
  • Configuring the gradient boosting model and tuning hyperparameters

Let's begin by installing the required libraries.

!pip install numpy pandas matplotlib seaborn
!pip install jovian opendatasets xgboost graphviz lightgbm scikit-learn --upgrade
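Before moving on, it can help to confirm which of the libraries are actually available in the current environment, since pip may not be able to reach the package index on every machine. The following is a minimal sketch using only the standard library. The helper name `installed_versions` and the `REQUIRED` list are illustrative assumptions, not part of the original notebook; the package names are the ones passed to the two pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the two pip commands above.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn", "jovian",
    "opendatasets", "xgboost", "graphviz", "lightgbm", "scikit-learn",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the packages themselves.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package so missing libraries are easy to spot.
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```

Any package reported as not installed can be installed by re-running the corresponding pip command once the network is available.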