Transformer-based models have been successfully applied in many fields, including natural language processing (think BERT or GPT) and computer vision.
However, when it comes to time series forecasting, state-of-the-art results have mostly been achieved by multilayer perceptron (MLP) models such as N-BEATS and N-HiTS. A recent paper even shows that simple linear models outperform complex transformer-based forecasting models on many benchmark datasets (see Zeng et al., 2022).
Still, a new transformer-based model has been proposed that achieves state-of-the-art results for long-term forecasting tasks: PatchTST.