UK House Prices from 1995 to 2017
Dataset from Kaggle
Source: https://www.kaggle.com/hm-land-registry/uk-housing-prices-paid
I am using the 'uk-housing-prices-paid' dataset which has in excess of 22 million rows and 11 columns.
As I live in the county of Dorset, I will filter the CSV to retain only Dorset data, which gives an initial dataset in excess of 200,000 rows and 11 columns.
The filtering will be performed by my Python script 'filtercsv.py'.
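The script 'filtercsv.py' itself is not shown here, so the following is only a minimal sketch of how a file of this size could be filtered without loading all 22 million rows at once. The function name `filter_csv_by_county`, the column name `County`, the value `DORSET`, and the chunk size are assumptions for illustration, not the contents of the original script.

```python
import pandas as pd


def filter_csv_by_county(src_path, dest_path, county="DORSET",
                         county_col="County", chunksize=500_000):
    """Stream a large CSV in chunks and keep only the rows for one county.

    The column name and county value are assumptions; check them against
    the header of the downloaded file before relying on this.
    """
    rows_kept = 0
    first_chunk = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        # Case-insensitive comparison on the (assumed) county column
        mask = chunk[county_col].astype(str).str.upper() == county.upper()
        subset = chunk[mask]
        # Write the header with the first chunk only, then append
        subset.to_csv(dest_path, mode="w" if first_chunk else "a",
                      header=first_chunk, index=False)
        first_chunk = False
        rows_kept += len(subset)
    return rows_kept


# Example call (file names are placeholders):
# filter_csv_by_county("price_paid_records.csv", "dorset.csv")
```

Reading in chunks keeps memory use roughly proportional to `chunksize` rather than to the size of the whole file, at the cost of one pass over the source CSV.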
To achieve reasonably fast performance, I will create a dataset containing only "North Dorset" data.
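Narrowing the Dorset data to one district could look roughly like the sketch below. The helper name `subset_by_district`, the column name `District`, and the value `NORTH DORSET` are assumptions, and the sample rows are made up purely to show the call.

```python
import pandas as pd


def subset_by_district(df, district="NORTH DORSET", district_col="District"):
    """Return only the rows whose district matches (case-insensitive)."""
    names = df[district_col].astype(str).str.strip().str.upper()
    return df[names == district.upper()].reset_index(drop=True)


# Toy rows for illustration only; not taken from the real dataset
sample = pd.DataFrame({
    "Price": [150000, 210000, 185000],
    "District": ["NORTH DORSET", "WEST DORSET", "North Dorset"],
})
north_dorset = subset_by_district(sample)
print(len(north_dorset))
```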
The objectives are to:
- understand which types of property have the highest number of sales
- remove any data that skews the prices, e.g. commercial properties or land acquired for development with large selling prices (> £2,000,000)
- predict the year-on-year increase in house prices
- identify the time of year when most houses sell
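As a rough starting point, the objectives above can be sketched with a few pandas helpers. The column names `Price`, `Date of Transfer`, and `Property Type` are assumptions about the CSV header, the function names are hypothetical, and the year-on-year helper only describes historical change; it is not a prediction model.

```python
import pandas as pd

PRICE_CAP = 2_000_000  # threshold taken from the objectives above


def sales_by_property_type(df, type_col="Property Type"):
    """Count sales per property type, most frequent first."""
    return df[type_col].value_counts()


def drop_high_prices(df, price_col="Price", cap=PRICE_CAP):
    """Remove rows whose price exceeds the cap."""
    return df[df[price_col] <= cap]


def sales_by_month(df, date_col="Date of Transfer"):
    """Count sales per calendar month (1 = January)."""
    months = pd.to_datetime(df[date_col]).dt.month
    return months.value_counts().sort_index()


def yearly_price_change(df, date_col="Date of Transfer", price_col="Price"):
    """Percentage change in mean price from one year to the next."""
    years = pd.to_datetime(df[date_col]).dt.year
    mean_by_year = df.groupby(years)[price_col].mean()
    return mean_by_year.pct_change() * 100
```

A typical order would be to call `drop_high_prices` first and pass its result to the other helpers, so that the counts and averages are not distorted by the very large sales.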
# Install essential Python libraries
!pip install pandas numpy matplotlib seaborn plotly scikit-learn --quiet
# Import essential Python libraries
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import plot, iplot
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.linear_model import LinearRegression, SGDRegressor