SageMaker/DeepAR demo on electricity dataset

This notebook complements the DeepAR introduction notebook.

Here, we will consider a real use case and show how to use DeepAR on SageMaker for predicting energy consumption of 370 customers over time, based on a dataset that was used in the academic papers [1] and [2].

In particular, we will see how to:

Prepare the dataset
Use the SageMaker Python SDK to train a DeepAR model and deploy it
Make requests to the deployed model to obtain forecasts interactively
Illustrate advanced features of DeepAR: missing values, additional time features, non-regular frequencies and category information

Running this notebook takes around 40 minutes, with training on an ml.c4.2xlarge instance. Inference runs on an ml.m4.xlarge instance (the usage time depends on how long you leave the deployed model running).

For more information, see the DeepAR documentation or paper.

%matplotlib inline

import sys
from urllib.request import urlretrieve
import zipfile
from dateutil.parser import parse
import json
from random import shuffle
import random
import datetime
import os

import boto3
import s3fs
import sagemaker
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from ipywidgets import IntSlider, FloatSlider, Checkbox

# set random seeds for reproducibility
np.random.seed(42)
random.seed(42)

sagemaker_session = sagemaker.Session()

Before starting, we can override the default values for the following:

The S3 bucket and prefix that you want to use for training and model data. The bucket should be in the same region as the notebook instance, training, and hosting.
The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these.
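
As a minimal sketch of how these settings fit together, the snippet below combines a bucket and a prefix into the S3 locations for training data and model output. The helper `s3_uri`, the bucket name, the prefix, and the role ARN are all hypothetical placeholders chosen for illustration, not values from this notebook. On a SageMaker notebook instance you would more likely obtain the bucket from `sagemaker.Session().default_bucket()` and the role from `sagemaker.get_execution_role()`.

```python
def s3_uri(bucket, prefix, *parts):
    """Join a bucket, a key prefix, and optional sub-paths into an s3:// URI."""
    # Strip stray slashes so segments join cleanly, and drop empty segments.
    segments = [prefix.strip("/")] + [p.strip("/") for p in parts]
    key = "/".join(s for s in segments if s)
    if key:
        return "s3://{}/{}".format(bucket, key)
    return "s3://{}".format(bucket)


# Hypothetical values (assumptions): replace with your own bucket, prefix, and role.
s3_bucket = "my-example-bucket"
s3_prefix = "deepar-electricity-demo"
role_arn = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"  # placeholder ARN

# Locations for the training data and for the trained model artifacts.
s3_data_path = s3_uri(s3_bucket, s3_prefix, "data")
s3_output_path = s3_uri(s3_bucket, s3_prefix, "output")

print(s3_data_path)
print(s3_output_path)
```

Keeping data and output under one prefix makes it easy to find, and later delete, everything this notebook writes to the bucket.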