
UK House Prices from 1995 to 2017

Dataset from Kaggle
Source: https://www.kaggle.com/hm-land-registry/uk-housing-prices-paid
  • I am using the 'uk-housing-prices-paid' dataset, which has more than 22 million rows and 11 columns.

  • As I live in the county of Dorset, I will filter the CSV to retain only Dorset data, which gives an initial dataset of more than 200,000 rows and 11 columns.

  • The filtering will be performed by my Python script 'filtercsv.py'.

  • To achieve reasonably fast performance, I will create a dataset containing only "North Dorset" data.

  • The objectives are to:

    • understand which types of property have the highest number of sales
    • remove any data that skews the prices, e.g. commercial properties or land acquired for development with very large selling prices (> £2,000,000)
    • predict the year-on-year increase in house prices
    • identify the time of year when most houses sell
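The two filtering steps described above (all data to Dorset, then Dorset to North Dorset) could look something like the sketch below. This is not the actual 'filtercsv.py'. It is a minimal illustration that streams the file row by row with the standard-library csv module, so the 22 million rows never need to fit in memory at once. The helper name filter_rows, the column names "County" and "District", and the sample rows are all assumptions made for the example.

```python
import csv
import io


def filter_rows(src, dst, column, value):
    """Copy the header plus every row whose `column` matches `value`.

    `src` and `dst` are open text-file-like objects. The comparison
    ignores case and surrounding whitespace. Returns the number of
    rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    target = value.strip().upper()
    kept = 0
    for row in reader:
        # row[column] can be None if a line has fewer fields than the header
        if (row[column] or "").strip().upper() == target:
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory sample standing in for the real file (made-up rows;
# the column names are assumed, not confirmed against the dataset).
sample = io.StringIO(
    "Price,Date of Transfer,Property Type,District,County\n"
    "150000,2015-03-01 00:00,D,NORTH DORSET,DORSET\n"
    "95000,2015-04-01 00:00,T,POOLE,POOLE\n"
    "210000,2016-07-15 00:00,S,WEST DORSET,DORSET\n"
)

# Step 1: keep only rows for the county of Dorset
dorset = io.StringIO()
n_dorset = filter_rows(sample, dorset, "County", "DORSET")

# Step 2: from those, keep only the North Dorset district
dorset.seek(0)
north_dorset = io.StringIO()
n_north = filter_rows(dorset, north_dorset, "District", "NORTH DORSET")

print(n_dorset, n_north)
```

With real files, the in-memory buffers would be replaced by files opened with open(path, newline=""), which is how the csv module expects files to be opened.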
# install essential python libraries
!pip install pandas numpy matplotlib seaborn plotly scikit-learn --quiet
# import essential python libraries
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import plot, iplot
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.linear_model import LinearRegression, SGDRegressor