This is a Transformer for time series classification. It is heavily inspired by Peter Bloem's code and explanations. The idea of adding positional encodings with 1D convolutions comes from the Attend and Diagnose paper.
Given a time series sequence, determine which class it belongs to. In a financial context, this would be something like "Can we predict whether the future price will go up or down given the sequence of the last n prices?".
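To make the task concrete, here is a minimal sketch of how a price series could be turned into labelled windows for such a classifier. The function name `make_windows`, the labelling rule (1 if the next price is above the last price in the window, otherwise 0), and the sample prices are assumptions made for illustration; they are not taken from the notebook.

```python
from typing import List, Tuple


def make_windows(prices: List[float], n: int) -> Tuple[List[List[float]], List[int]]:
    """Split a price series into (window, label) pairs.

    Each window holds n consecutive prices. The label is 1 if the price
    immediately after the window is higher than the window's last price,
    otherwise 0. This labelling rule is an illustrative assumption.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    windows: List[List[float]] = []
    labels: List[int] = []
    # Stop early enough that every window has a following price to compare with.
    for start in range(len(prices) - n):
        window = prices[start:start + n]
        next_price = prices[start + n]
        windows.append(window)
        labels.append(1 if next_price > window[-1] else 0)
    return windows, labels


# Made-up prices, used only to show the shapes involved.
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
windows, labels = make_windows(prices, n=3)
```

Each entry of `windows` is one input sequence for the model and the matching entry of `labels` is its class.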
Instead of using something like an LSTM, RNN or TCN, we decided to build a Transformer. To start with, Medium has a great review of various methods.
We mostly followed the approach from the Attend & Diagnose paper; Dense Interpolation is taken from here. However, in this case 1D convolution and max pooling worked better than Dense Interpolation. See the chart below for a very high-level architecture overview:
Image taken from here.
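As a rough illustration of the 1D convolution and max pooling step mentioned above, the sketch below shows how the two operations reduce a sequence of any length to a fixed-size feature vector. It is written in plain Python for readability and is not the code used in this repository; a real model would use the corresponding layers of a deep learning framework. The names `conv1d` and `global_max_pool`, the kernel values, and the use of pooling over the whole time axis are assumptions made for this example.

```python
from typing import List

Series = List[List[float]]  # shape: (time steps, channels)


def conv1d(x: Series, kernels: List[List[List[float]]]) -> Series:
    """Valid (no padding) 1D convolution over the time axis.

    kernels has shape (out_channels, kernel_size, in_channels). As in most
    deep learning libraries, this is technically a cross-correlation: the
    kernel is not flipped.
    """
    kernel_size = len(kernels[0])
    out: Series = []
    for t in range(len(x) - kernel_size + 1):
        row = []
        for kernel in kernels:
            acc = 0.0
            for k in range(kernel_size):
                for c, weight in enumerate(kernel[k]):
                    acc += weight * x[t + k][c]
            row.append(acc)
        out.append(row)
    return out


def global_max_pool(x: Series) -> List[float]:
    """Keep the maximum of each channel across all time steps."""
    return [max(channel) for channel in zip(*x)]


# Toy input: 4 time steps, 1 channel (values are made up).
x = [[1.0], [3.0], [2.0], [5.0]]
# Two hypothetical kernels of size 2: a sum and a first difference.
kernels = [
    [[1.0], [1.0]],
    [[-1.0], [1.0]],
]
features = conv1d(x, kernels)       # shape: (3 time steps, 2 channels)
pooled = global_max_pool(features)  # shape: (2,), independent of input length
```

The pooled vector has one entry per kernel regardless of how many time steps go in, which is what lets a classification layer sit on top of sequences of different lengths.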
See transformers_time_series.ipynb for an example.