Created 3 years ago
Cross-Validation with Linear Regression
This notebook demonstrates how to do cross-validation (CV), using linear regression as the example model. CV itself is used with almost all modelling techniques, such as decision trees and SVMs. We will mainly use sklearn to do cross-validation.
This notebook is divided into the following parts:
0. Experiments to understand overfitting
1. Building a linear regression model without cross-validation
2. Problems in the current approach
3. Cross-validation: A quick recap
4. Cross-validation in sklearn:
   - 4.1 K-fold CV
   - 4.2 Hyperparameter tuning using CV
   - 4.3 Other CV schemes
0. Experiments to Understand Overfitting
In this section, let's quickly run a few experiments with polynomial regression to see what overfitting looks like.
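Before turning to the housing data, here is a minimal, self-contained sketch of the idea. It uses synthetic data (an assumption made purely for illustration: a noisy sine curve, not the housing data), and the degrees 1, 3 and 12, the sample size and the noise level are arbitrary choices. It fits polynomial regressions of increasing degree on a small training set and compares R^2 on the training and test halves.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration, not the housing data):
# a smooth curve plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=30)

# Hold out half of the points so we can measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# degree -> (train R^2, test R^2)
scores = {}
for degree in (1, 3, 12):
    # Expand x into polynomial terms, then fit ordinary least squares.
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False), LinearRegression()
    )
    model.fit(X_train, y_train)
    scores[degree] = (
        r2_score(y_train, model.predict(X_train)),
        r2_score(y_test, model.predict(X_test)),
    )

for degree, (train_r2, test_r2) in scores.items():
    print(f"degree={degree:2d}  train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

Typically the training R^2 keeps rising with the degree, while the test R^2 stops improving or drops once the model starts fitting noise. That growing gap between training and test scores is the signature of overfitting; the exact numbers depend on the seed and the split.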
# import all libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import scale
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
import warnings # suppress warnings
warnings.filterwarnings('ignore')
# import Housing.csv
housing = pd.read_csv('Housing.csv')
housing.head()
# number of observations
len(housing.index)
545