Multiseasonal time series analysis: decomposition and forecasting with Python

Author: Daniel J., TOTH

https://tothjd.medium.com

https://www.linkedin.com/in/tothjd/

Scope

This notebook processes the Hourly Energy Consumption dataset from www.kaggle.com. The dataset contains hourly energy demand series from several power suppliers with diverse service areas. One series (American Electric Power, AEP) was selected for demonstration purposes.

The primary goals of my analysis are to:

  • gain insights into power demand dynamics and present them clearly to a lay audience
  • build a model that forecasts values in a meaningful way
  • forecast over a longer, one-year horizon

Description of data

Data is available at https://www.kaggle.com/robikscube/hourly-energy-consumption. From the available files, AEP_hourly.csv is used.
Index values are strings rather than datetime objects. Some timestamps are missing and some are duplicated; the duplicated timestamps carry different values. Both issues have to be addressed, because time series model classes expect a datetime index with a specified frequency. NaN values are not present.
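The index issues described above can be sketched on a toy frame. This is a minimal illustration under stated assumptions, not necessarily the cleaning performed later in this notebook: the column name AEP_MW, the helper name clean_hourly, averaging duplicated timestamps, and time-based interpolation of missing hours are all choices made for the example.

```python
import pandas as pd

# Toy stand-in for AEP_hourly.csv (column name "AEP_MW" is an assumption):
# string index, one duplicated timestamp with different values (02:00)
# and one missing hour (03:00).
raw = pd.DataFrame(
    {"AEP_MW": [13000.0, 12500.0, 12700.0, 12000.0, 11800.0]},
    index=[
        "2004-10-01 01:00:00",
        "2004-10-01 02:00:00",
        "2004-10-01 02:00:00",
        "2004-10-01 04:00:00",
        "2004-10-01 05:00:00",
    ],
)


def clean_hourly(df):
    """Hypothetical helper: return a gap-free, duplicate-free hourly frame."""
    out = df.copy()
    # strings -> datetime index
    out.index = pd.to_datetime(out.index)
    # duplicated timestamps: average their values (one possible choice);
    # groupby also returns the index sorted and unique
    out = out.groupby(level=0).mean()
    # missing timestamps: reindex on a complete hourly range, which also
    # gives the index an explicit frequency
    full_index = pd.date_range(out.index.min(), out.index.max(), freq="h")
    out = out.reindex(full_index)
    # fill the gaps created by reindexing (one possible choice)
    return out.interpolate(method="time")


clean = clean_hourly(raw)
print(clean)
```

Averaging keeps the information from both duplicated readings; keeping only the first or last value would be an equally defensible choice, depending on how the duplicates arose.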

Methods

Eventually, I will show that the UnobservedComponents (UC) class of Statsmodels provides an efficient way to handle complex multiseasonal time series in relatively few lines of code. Before applying UC, I perform a multiseasonal decomposition with the seasonal_decompose function, step by step, in several lines. As it turns out, UC does essentially the same, but it can also take arrays of exogenous variables as arguments to regress the residual and refine the model. The code below is extensively commented and hopefully offers some useful snippets and tips for the reader, such as:

  • dealing with datetime indices (model classes need formatting)
  • dealing with missing data and duplicates
  • plotting time series at random time intervals (applying random choice)
  • drawing dashed vertical lines for inspecting seasonal effects
  • placing annotations with text boxes and arrows on subplots
  • extracting data from the seasonal_decompose results object
  • approximating time series components with polynomial or trigonometric functions using Numpy and optimizing them with the scipy.optimize module
  • evaluating models by mean absolute error (MAE) and root mean squared error (RMSE) using the sklearn.metrics module
  • performing model residual diagnostics

Table of Contents

1. Loading and cleaning dataframe

2. Exploratory data analysis

3. Decomposition of time series to individual components

4. In-sample prediction of model after decomposition

5. Approximation functions of model components

6. Component analysis of optimized model

7. Unobserved Components Model (UCM)

8. UCM residual diagnostics

9. UCM supplemented by exogenous variables

10. Second UCM residual diagnostics

11. UCM as pure regression analysis

12. Third UCM residual analysis

13. UCM supplemented by additional exogenous variables

14. Fourth UCM residual analysis

15. Evaluation of models

#mathematical operations
import math
import scipy as sp
import numpy as np

#data handling
import pandas as pd

#plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
import seaborn as sns
sns.set()

#machine learning and statistical methods
import statsmodels.api as sm

#dataframe index manipulations
import datetime

#selected preprocessing and evaluation methods
from sklearn.preprocessing import StandardScaler
from statsmodels.tsa.stattools import kpss
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

#muting unnecessary warnings if needed
import warnings

1. Loading and cleaning dataframe

#loading raw data
df_aep = pd.read_csv("AEP_hourly.csv", index_col=0)
df_aep
#sorting unordered indices
df_aep.sort_index(inplace = True)
df_aep