Can we generalize a single model to handle multiple time series tasks better than many task-specific models?
The rise of foundation models in generative artificial intelligence has already had a massive impact on deep learning. Foundation models are trained on broad arrays of data and are capable of performing a wide variety of tasks. The most notable examples are large language models, which were the first to arise, initially in the area of natural language processing. Starting off capable of only text-to-text tasks, foundation models have since developed the ability to perform tasks such as image generation, code completion, and visual comprehension, among many others. However, one area that has lacked a unified foundation model is time series analysis. Time series analysis tasks can be broken down into several categories.
Gao et al. [1] developed the UniTS model to handle multiple time series tasks with shared weights. Their generalized model outperforms task-tuned baseline models in 27 of 38 cases. The tasks consist of forecasting, imputation, anomaly detection, and classification, all of which can be performed by the same model without requiring any additional task-specific modules. UniTS takes a token-based approach inspired by LLMs, introducing three kinds of tokens: sequence tokens, prompt tokens, and task tokens. The time series input is encoded into the sequence tokens, the prompt tokens carry information about the task and domain, and the task tokens are concatenated to the sequence and prompt tokens and later transformed into the output prediction. Their results indicate a model capable of zero-shot, few-shot, and prompt-based learning across many domains. We want to identify the strengths and weaknesses of the proposed Unified Time Series (UniTS) model by evaluating it on different human activity monitoring tasks. The researchers documented results averaged across several datasets and tasks, but it is unclear how the model performs on each individual task for a novel human activity dataset.
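To make the token scheme concrete, the sketch below shows one way sequence, prompt, and task tokens could be assembled before entering a shared transformer backbone. This is our own PyTorch illustration, not the authors' implementation; the patch-based embedding, the class name, and all sizes are assumptions.

    import torch
    import torch.nn as nn

    class UniTSStyleTokens(nn.Module):
        # Minimal sketch of the token scheme described above (our illustration,
        # not the authors' code). Sizes and the patch embedding are assumptions.
        def __init__(self, patch_len=16, d_model=64, n_prompt=8):
            super().__init__()
            self.patch_len = patch_len
            self.embed = nn.Linear(patch_len, d_model)                  # patch -> sequence token
            self.prompt = nn.Parameter(torch.randn(n_prompt, d_model))  # learned prompt tokens
            self.task = nn.Parameter(torch.randn(1, d_model))           # learned task token

        def forward(self, x):  # x: (batch, length) univariate series
            b = x.shape[0]
            patches = x.unfold(1, self.patch_len, self.patch_len)  # (b, n_patches, patch_len)
            seq_tokens = self.embed(patches)                       # sequence tokens
            prompt = self.prompt.expand(b, -1, -1)                 # task/domain context
            task = self.task.expand(b, -1, -1)                     # decoded into the prediction
            # The concatenation below is what the shared backbone processes; only
            # the prompt and task tokens change between tasks, not the backbone.
            return torch.cat([prompt, seq_tokens, task], dim=1)

Under this scheme, prompt-tuning (which we use below) would update only the prompt and task parameters while the backbone and embedding weights stay frozen.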
To conduct our evaluation, we pulled the repository from the authors' GitHub. From there, we obtained the models they used as well as the pre-trained weights that are shared across tasks. After some code changes to get the model running properly on our machine, we were able to test the transfer and zero-shot learning capabilities of the model on some of the provided data. The data the authors used was formatted in the .ts time series format used for classification, so some preprocessing was needed to reformat our data for the forecasting analysis. The repository provides separate data loaders and dataset definitions for the different tasks; we followed the format already in place for other datasets and added ours to the corresponding files for classification, forecasting, and anomaly detection. Our first goal was to test the transfer learning capabilities for classification, so we ran our dataset alongside the other datasets that were already present. We chose to run only prompt-tuning rather than further supervised training, meaning that only the tokens were updated and not the underlying model. Next, we tested the transfer learning capabilities on forecasting, following similar steps and adding our data to the data loader for the prompt-tuning forecasting task. The two tasks were defined in the following way:
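(The summary below is our own Python-style reconstruction of the two runs; the key names are illustrative shorthand rather than the exact fields in the authors' configuration files, and the sequence and horizon lengths are assumptions.)

    tasks = {
        "CLS_Project": {                      # multi-task classification on our data
            "task_name": "classification",
            "dataset": "Project",             # our human activity dataset (.ts format)
            "tuning": "prompt",               # update only prompt/task tokens
        },
        "LTF_Project": {                      # long-term forecasting on our data
            "task_name": "long_term_forecast",
            "dataset": "Project",
            "seq_len": 96,                    # assumed context length
            "pred_len": 96,                   # assumed forecast horizon
            "tuning": "prompt",
        },
    }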
After running the model under the conditions specified above, we obtained the following results. For the multi-task classification run, with our new dataset included among those predefined by the authors, the model achieved an accuracy of 0.70625 on our data, as can be seen in the task named CLS_Project.
For our forecasting task, we obtained an MSE of 0.4882508 and an MAE of 0.45813653, as can be seen in the task named LTF_Project.
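For reference, the accuracy, MSE, and MAE reported here follow their standard definitions; a minimal NumPy sketch with hypothetical helper names:

    import numpy as np

    def forecast_metrics(pred, true):
        # Standard definitions of the forecasting metrics reported above.
        mse = np.mean((pred - true) ** 2)    # mean squared error
        mae = np.mean(np.abs(pred - true))   # mean absolute error
        return mse, mae

    def classification_accuracy(pred_labels, true_labels):
        # Fraction of windows assigned the correct activity label.
        return np.mean(np.asarray(pred_labels) == np.asarray(true_labels))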
We also ran TimesNet [2], generally accepted as the current state-of-the-art time series model, on our forecasting task.
We were not able to run TimesNet on the classification task. However, on a similar dataset with 3 dimensions, TimesNet achieved an accuracy of 78.0%. This indicates that UniTS did not perform quite as well as TimesNet, but was comparable.
Despite the mixed results of our experiments, we are confident in the ability of the UniTS model to produce results for multiple tasks using shared weights without any task-specific modules. UniTS did not always outperform state-of-the-art models, but it did outperform them in many respects, which represents a large step forward for time series representations in deep learning. The unified model generalizes temporal data well enough to outperform, in some cases, state-of-the-art models fine-tuned for each individual task. Once the model's parameters have been trained, it can be applied to any of the tasks discussed above without any fine-tuning modules. This offers real utility to deep learning researchers, who can save resources and improve efficiency by using a single unified time series model to capture general temporal dynamics across several tasks. There is much more to be explored with this model and with time series analysis in general. We ran into many difficulties with the setup of the model and with the format of the data. Given more time, we would like to explore the model's other capabilities and test more data; niche datasets and use cases outside the domains covered by the training data would be particularly interesting to look at.
[1] Gao, S., Koker, T., Queen, O., Hartvigsen, T., Tsiligkaridis, T., & Zitnik, M. (2024). UniTS: Building a Unified Time Series Model.
[2] Wu, H., Hu, T., Liu, Y., Zhou, H., Wang, J., & Long, M. (2023). TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis.
[3] Liu, Y., Hu, T., Zhang, H., Wu, H., Wang, S., Ma, L., & Long, M. (2024). iTransformer: Inverted Transformers Are Effective for Time Series Forecasting.
[4] Wang, S., Wu, H., Shi, X., Hu, T., Luo, H., Ma, L., Zhang, J., & Zhou, J. (2024). TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting.
John Berry and Nikola Dobrev.