Transformers are attention-based neural networks designed to solve NLP tasks. Their key features are:
linear complexity in the dimension of the feature vector ;
parallelisation of the computation over a sequence, as opposed to sequential computing ;
long-term memory, as any time step of the input sequence can be attended to directly.
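As an illustration of the last two points, here is a minimal NumPy sketch of scaled dot-product attention. It is not taken from this repo's code: the function name and the toy input are assumptions made for the example. All time steps are processed in one matrix product, and each output step can weight any input step directly.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Illustrative helper (hypothetical name).

    q, k, v: arrays of shape (seq_len, d).
    Returns the attended values and the (seq_len, seq_len) attention map.
    """
    d = q.shape[-1]
    # One matrix product scores every pair of time steps at once.
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy input: 6 time steps, 4 features (arbitrary values).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
out, weights = scaled_dot_product_attention(x, x, x)
```

Row `i` of `weights` holds the attention paid by step `i` to every step of the sequence, which is what gives direct access to distant time steps.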
This repo will focus on their application to time series.
Dataset and application as metamodel
Our use case is modeling a numerical simulator for building consumption prediction. To this end, we created a dataset by sampling random inputs (building characteristics and usage, weather, ...) and running the simulator to obtain the corresponding outputs. We then convert these variables to time series format and feed them to the Transformer.
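The conversion step is not detailed here. One plausible way to do it, shown below as a sketch only, is to repeat the time-independent inputs (such as building characteristics) along the time axis and concatenate them with the time-dependent ones (such as weather). The function name, array shapes and feature counts are assumptions for the example, not the repo's actual preprocessing.

```python
import numpy as np


def to_time_series(static, dynamic):
    """Hypothetical helper: merge static and time-dependent inputs.

    static:  (n_samples, n_static), one value per simulation.
    dynamic: (n_samples, seq_len, n_dynamic), one value per time step.
    Returns an array of shape (n_samples, seq_len, n_static + n_dynamic).
    """
    seq_len = dynamic.shape[1]
    # Repeat each static vector at every time step.
    static_tiled = np.repeat(static[:, None, :], seq_len, axis=1)
    return np.concatenate([static_tiled, dynamic], axis=-1)


# Toy shapes (assumed): 3 simulations, 48 time steps.
rng = np.random.default_rng(0)
static = rng.normal(size=(3, 5))       # e.g. building characteristics
dynamic = rng.normal(size=(3, 48, 2))  # e.g. weather variables
x = to_time_series(static, dynamic)
```

Every sample then has the same `(seq_len, n_features)` layout, which is the shape a sequence model expects.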
Adaptations for time series
In order to perform well on time series, a few adjustments had to be made:
The embedding layer is replaced by a generic linear layer ;
The original positional encodings are removed. A "regular" version, better matching the day/night patterns of the input sequence, can be used instead ;
A window is applied on the attention map to limit backward attention, and focus on short term patterns.
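The three adjustments above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the repo's implementation: the function names, the sinusoidal form of the "regular" encoding, the period of 24 steps (hourly data) and the window size are all hypothetical choices.

```python
import numpy as np


def linear_embedding(x, weight, bias):
    """Project raw features (seq_len, d_input) to (seq_len, d_model)."""
    return x @ weight + bias


def regular_positional_encoding(seq_len, d_model, period=24):
    """Assumed form: one sine wave with a fixed period (e.g. 24 hourly steps)."""
    t = np.arange(seq_len)[:, None]
    pe = np.sin(2 * np.pi * t / period)
    return np.repeat(pe, d_model, axis=1)


def backward_window_mask(seq_len, window):
    """True where key step j lies more than `window` steps before query step i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (i - j) > window


def windowed_attention(q, k, v, window):
    """Scaled dot-product attention with the backward window applied."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Masked positions get zero weight after the softmax.
    scores = np.where(backward_window_mask(len(q), window), -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: 48 time steps, 5 input features, model dimension 8.
rng = np.random.default_rng(0)
seq_len, d_input, d_model = 48, 5, 8
x = rng.normal(size=(seq_len, d_input))
W = rng.normal(scale=0.1, size=(d_input, d_model))
b = np.zeros(d_model)

pe = regular_positional_encoding(seq_len, d_model)
h = linear_embedding(x, W, b) + pe
out, weights = windowed_attention(h, h, h, window=12)
```

With `window=12`, step 20 can no longer attend to steps 0 to 7. This sketch only limits backward attention, as described above; whether future steps should also be masked depends on the task.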