Next-Frame Video Prediction with Convolutional LSTMs
Author: Amogh Joshi
Date created: 2021/06/02
Last modified: 2021/06/05
Description: How to build and train a convolutional LSTM model for next-frame video prediction.
Introduction
The Convolutional LSTM architecture brings together time series processing and computer vision by introducing a convolutional recurrent cell in an LSTM layer. In this example, we will explore the Convolutional LSTM model in an application to next-frame prediction, the process of predicting what video frames come next given a series of past frames.
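As a minimal sketch of the core building block (assuming TensorFlow/Keras is installed, as in the Setup below): a ConvLSTM2D layer consumes a 5D tensor of shape (batch, time, height, width, channels) and applies convolutional gates at each time step. The layer configuration and dummy shapes here are illustrative, not the model trained later.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# A single ConvLSTM2D layer: convolutional gates inside a recurrent cell.
# With return_sequences=True, it emits one hidden state per time step.
conv_lstm = layers.ConvLSTM2D(
    filters=16, kernel_size=(3, 3), padding="same", return_sequences=True
)

# Dummy batch: 2 videos, 5 frames each, 16x16 pixels, 1 channel.
frames = np.random.rand(2, 5, 16, 16, 1).astype("float32")
out = conv_lstm(frames)
print(out.shape)  # (2, 5, 16, 16, 16): a 16-filter feature map per frame
```

Because `padding="same"` preserves spatial size and `return_sequences=True` preserves the time axis, the output keeps the input's frame count and resolution, with the channel axis replaced by the number of filters.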
Setup
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import io
import imageio
from IPython.display import Image, display
from ipywidgets import widgets, Layout, HBox
Dataset Construction
For this example, we will be using the Moving MNIST dataset. We will download the dataset and then construct and preprocess the training and validation sets.

For next-frame prediction, our model will be using a previous frame, which we'll call f_n, to predict a new frame, called f_(n + 1). To allow the model to create these predictions, we'll need to process the data such that we have "shifted" inputs and outputs, where the input data is frame x_n, being used to predict frame y_(n + 1).
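Concretely, the shift amounts to slicing each video so that the inputs drop the last frame and the targets drop the first. A minimal NumPy sketch (the helper name `create_shifted_frames` and the dummy shapes are our own, for illustration):

```python
import numpy as np

def create_shifted_frames(data):
    """Split videos of shape (samples, frames, height, width, channels)
    into inputs (frames 0..n-1) and targets (frames 1..n)."""
    x = data[:, :-1, ...]  # every frame except the last
    y = data[:, 1:, ...]   # every frame except the first
    return x, y

# Dummy batch: 4 videos of 20 frames, 64x64 pixels, 1 channel.
videos = np.random.rand(4, 20, 64, 64, 1)
x, y = create_shifted_frames(videos)
print(x.shape, y.shape)  # (4, 19, 64, 64, 1) (4, 19, 64, 64, 1)
```

After the shift, frame i of `y` is frame i + 1 of `x`, so at every position the model is supervised to predict the frame that comes next.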