
Using LSTM to predict Dow Jones price

This was a great class and really got me thinking about how to apply deep learning solutions.
One of the things I'm interested in is analysis and prediction on time-series data.
I did some reading and practicing outside of the course and learned something about the LSTM architecture. An LSTM (long short-term memory) network is a type of recurrent neural net that lets the model use information contained in sequences of data. In other words, the model can use relationships between successive data points, not just individual data points.
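To make the idea of "using the sequence" concrete, here is a minimal sketch of an LSTM regressor in PyTorch. The class name `TinyLSTMRegressor`, the hidden size, and the input shape (5 days of 5 features) are assumptions chosen for illustration, not the model used later in this post.

```python
import torch
import torch.nn as nn

class TinyLSTMRegressor(nn.Module):
    """Hypothetical model: one LSTM layer followed by a linear read-out."""

    def __init__(self, n_features=5, hidden_size=16):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, time steps, features)
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        # `out` holds the hidden state at every time step; keep only the last one
        return self.head(out[:, -1, :])

torch.manual_seed(0)
model = TinyLSTMRegressor()
x = torch.randn(8, 5, 5)  # 8 samples, 5 days each, 5 features per day (random toy data)
y_hat = model(x)
print(y_hat.shape)  # torch.Size([8, 1])
```

Because the hidden state is carried from one time step to the next, feeding the same rows in a different order generally gives a different prediction, which is what distinguishes this from a model that looks at each data point in isolation.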
I also learned how to get started with the Skorch library, which is a wrapper around PyTorch that allows PyTorch models to be used in a typical Scikit-Learn pipeline. Since I use Scikit-Learn at work, I thought it might be a good idea to get up to speed with Skorch.
So my general approach was to use:

  • Skorch
  • LSTM

to predict a time series.

I'm joining two daily datasets:

  • Dow Jones Industrial Average
  • Foreign exchange rates (Euro, Canadian Dollar, Chinese Yuan, Mexican Peso) per US Dollar

The goal is to use the previous 5 days of data (i.e. a moving window) to predict the closing Dow Jones price on the following day.
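The moving-window idea can be sketched with plain NumPy. The helper name `make_windows` and the toy numbers are hypothetical; the real features would be the joined Dow Jones and forex columns.

```python
import numpy as np

def make_windows(features, target, window=5):
    """Pair each run of `window` consecutive rows with the next row's target."""
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])  # days start .. start+window-1
        y.append(target[start + window])          # the day right after the window
    return np.array(X), np.array(y)

# Toy data: 8 days, 2 features per day (made-up numbers, not the real series)
features = np.arange(16, dtype=float).reshape(8, 2)
close = np.arange(100, 108, dtype=float)

X, y = make_windows(features, close, window=5)
print(X.shape, y.shape)  # (3, 5, 2) (3,)
print(y)                 # [105. 106. 107.]
```

With 8 days and a window of 5 there are only 3 usable samples, because the first 5 days have no complete history behind them.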

import pandas as pd
import numpy as np
import sqlite3

from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
import torch.nn.functional as F

from skorch import NeuralNetRegressor

import seaborn as sns
import matplotlib.pyplot as plt

import pickle
con = sqlite3.connect(":memory:")
#read in stock market data:
df_djia = pd.read_csv("/mnt/c/Users/jdbri/Downloads/^DJI.csv",index_col="Date")
print(df_djia.head())
df_djia.to_sql("djia", con, if_exists="replace")
                    Open          High           Low         Close  \
Date
2000-01-03  11501.849609  11522.009766  11305.690430  11357.509766
2000-01-04  11349.750000  11350.059570  10986.450195  10997.929688
2000-01-05  10989.370117  11215.099609  10938.669922  11122.650391
2000-01-06  11113.370117  11313.450195  11098.450195  11253.259766
2000-01-07  11247.059570  11528.139648  11239.919922  11522.559570

               Adj Close     Volume
Date
2000-01-03  11357.509766  169750000
2000-01-04  10997.929688  178420000
2000-01-05  11122.650391  203190000
2000-01-06  11253.259766  176550000
2000-01-07  11522.559570  184900000

/home/jeff/.local/lib/python3.8/site-packages/pandas/core/generic.py:2602: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  sql.to_sql(
#read in forex data:
# first 5 rows are metadata; "ND" marks days with no data
df = pd.read_csv("./FRB_H10.csv", na_values=['ND'], skiprows=5, index_col="Time Period").dropna()
df.columns = "EUR CAD CNY MXN".split()
df['EUR'] = 1 / df['EUR']  # the EUR column is originally in $/EUR, need to change to EUR/$
print(df.head())
df.to_sql("forex",con,if_exists="replace")
                  EUR     CAD     CNY     MXN
Time Period
2000-01-03   0.984737  1.4465  8.2798  9.4015
2000-01-04   0.970026  1.4518  8.2799  9.4570
2000-01-05   0.967586  1.4518  8.2798  9.5350
2000-01-06   0.968617  1.4571  8.2797  9.5670
2000-01-07   0.971440  1.4505  8.2794  9.5200

/home/jeff/.local/lib/python3.8/site-packages/pandas/core/generic.py:2602: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  sql.to_sql(