We use the same Kaggle dataset that the paper's authors used in the corresponding Kaggle competition. It contains five CSV files, two of which serve only for submission evaluation. The three we will be using are:
This dataset contains the data for 3049 products across ten stores in three US states. The products are further divided into three product categories and seven product departments.
Time series data has a natural temporal order: it is collected at a sequence of time points, usually at regular intervals. On closer inspection, many time series contain an embedded hierarchical aggregation structure. For example, consider the daily sales of coffee drinks in Boston over the past three months. These sales can be disaggregated by type into the sales of latte, cappuccino, espresso, americano, and so on. When time series data is collected with such a hierarchical aggregation structure, we call it hierarchical time series (HTS) data (Hyndman and Athanasopoulos, 2018). Although dealing with HTS data can be daunting, forecasting it is essential for decision making in business.
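The coffee example can be sketched in a few lines of Python. The sales figures below are invented for illustration, and `aggregate` is a hypothetical helper, not part of the dataset or the paper; it only shows how an upper-level series arises by summing the bottom-level series day by day.

```python
# Hypothetical daily sales (cups) per drink type in one city; all numbers are made up.
bottom_level = {
    "latte":      [120, 135, 128],
    "cappuccino": [80, 75, 90],
    "espresso":   [40, 42, 38],
    "americano":  [60, 58, 65],
}


def aggregate(series_by_key):
    """Sum equally long bottom-level series day by day into one upper-level series."""
    lengths = {len(series) for series in series_by_key.values()}
    if len(lengths) != 1:
        raise ValueError("all series must cover the same days")
    return [sum(day) for day in zip(*series_by_key.values())]


# The top level of this tiny hierarchy: total coffee sales per day.
total_coffee = aggregate(bottom_level)
print(total_coffee)  # [300, 310, 321]
```

A forecast for the total and separate forecasts for each drink type are consistent only if they satisfy this same summation.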
However, one of the main challenges of HTS forecasting is keeping the forecasts consistent across the levels of the hierarchy. Often the lowest level of the hierarchy exhibits a strongly intermittent pattern, while the upper levels, being aggregates of the bottom level, contain more forecastable components. By spatially aggregating the intermittent time series (Zufferey et al., 2016) or temporally aggregating them (Nikolopoulos et al., 2011), overall forecast accuracy can be improved.
In this paper, the authors split the hierarchy into the continuous time series at the top levels and the intermittent time series at the bottom level. By treating the bottom-level forecasts as mutable, they can achieve higher accuracy on the top levels of the hierarchy.
To achieve this, they combine two models: N-BEATS (Oreshkin et al., 2019), a deep neural network architecture that uses deep fully connected layers and backward and forward residual links, and LightGBM (Ke et al., 2017), which uses gradient boosting decision trees, partitioning input variables into tree structures which are then used in the final decision.
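To make the "backward and forward residual links" concrete, here is a minimal NumPy sketch of the doubly residual idea. It is a simplification under stated assumptions: each block is a single hidden layer with random, untrained weights, and the names `make_block`, `run_block`, and `doubly_residual_forward` are hypothetical. The real N-BEATS blocks are deeper and are trained end to end.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_block(lookback, horizon, hidden=16):
    """Untrained fully connected block with random weights (illustration only)."""
    return {
        "w_hidden": rng.normal(scale=0.1, size=(lookback, hidden)),
        "w_backcast": rng.normal(scale=0.1, size=(hidden, lookback)),
        "w_forecast": rng.normal(scale=0.1, size=(hidden, horizon)),
    }


def run_block(block, x):
    """Return the block's backcast (its reconstruction of x) and its partial forecast."""
    h = np.maximum(x @ block["w_hidden"], 0.0)  # ReLU activation
    return h @ block["w_backcast"], h @ block["w_forecast"]


def doubly_residual_forward(blocks, history, horizon):
    residual = history.copy()
    forecast = np.zeros(horizon)
    for block in blocks:
        backcast, block_forecast = run_block(block, residual)
        residual = residual - backcast        # backward link: pass on what is still unexplained
        forecast = forecast + block_forecast  # forward link: partial forecasts are summed
    return forecast


lookback, horizon = 8, 3
blocks = [make_block(lookback, horizon) for _ in range(4)]
history = np.sin(np.arange(lookback) / 2.0)
forecast = doubly_residual_forward(blocks, history, horizon)
print(forecast.shape)  # (3,)
```

Each block sees only the part of the input that earlier blocks failed to reconstruct, while the final forecast is the sum of all partial forecasts.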
The authors apply the deep learning forecasting model N-BEATS, with 30 stacks, at the top five levels and the tree-based algorithm LightGBM at the bottom level to increase HTS forecasting accuracy. In addition, to overcome inconsistency between forecasts at different levels, they use a variant of the bottom-up method called the hierarchical-forecasting-with-alignment approach, which means adding a bias to the bottom level of the hierarchy.
The authors apply an intricate approach that combines N-BEATS and LightGBM, which leads to heavy computation and potential overfitting. Therefore, we want to compare the prediction accuracy of this complex model with that of a shallow convolutional neural network (CNN) with 3 layers. We would also like to apply dropout to the neurons in the CNN to see if we can achieve better performance. We hope that this comparison helps us better understand the effectiveness of the authors' solution and the motivation behind its construction.
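One simple way to read "adding a bias" is sketched below. This is an assumption made for illustration, not the authors' actual procedure: the hypothetical function `align_bottom_up` shifts every bottom-level forecast by the same per-step amount so that the bottom-up sum matches a given top-level forecast. All numbers are invented.

```python
def align_bottom_up(bottom_forecasts, top_forecast):
    """Shift every bottom-level forecast by a shared per-step bias so that
    the bottom-up sum equals the top-level forecast at each step."""
    n_series = len(bottom_forecasts)
    horizon = len(top_forecast)
    if any(len(series) != horizon for series in bottom_forecasts):
        raise ValueError("every bottom-level forecast must span the full horizon")

    aligned = [list(series) for series in bottom_forecasts]
    for t in range(horizon):
        gap = top_forecast[t] - sum(series[t] for series in bottom_forecasts)
        bias = gap / n_series  # spread the disagreement evenly over all series
        for series in aligned:
            series[t] += bias
    return aligned


# Three bottom-level forecasts over two steps, and a top-level forecast they disagree with.
bottom = [[10.0, 12.0], [5.0, 4.0], [3.0, 2.0]]
top = [21.0, 15.0]
aligned = align_bottom_up(bottom, top)
print(aligned)  # [[11.0, 11.0], [6.0, 3.0], [4.0, 1.0]]
```

An evenly spread additive bias can push small forecasts below zero, so a practical variant would need to handle intermittent series with care.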
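As a rough picture of the baseline we have in mind, the sketch below implements the forward pass of a small 1D CNN with dropout in plain NumPy. The details are our assumptions for illustration: "3 layers" is read as two convolutional layers plus one dense output layer, the window length, horizon, channel counts, and dropout rate are arbitrary, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d(x, kernels):
    """Valid 1D cross-correlation.
    x: (channels_in, length); kernels: (channels_out, channels_in, width)."""
    c_out, _, width = kernels.shape
    out_len = x.shape[1] - width + 1
    out = np.empty((c_out, out_len))
    for t in range(out_len):
        window = x[:, t:t + width]
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out


def dropout(x, rate, training):
    """Inverted dropout: zero units with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def forward(params, window, rate=0.2, training=False):
    x = window[np.newaxis, :]  # one input channel: (1, lookback)
    x = dropout(np.maximum(conv1d(x, params["conv1"]), 0.0), rate, training)
    x = dropout(np.maximum(conv1d(x, params["conv2"]), 0.0), rate, training)
    return x.reshape(-1) @ params["dense"]  # dense layer maps features to the horizon


lookback, horizon = 28, 7  # assumed: 28 days in, 7 days out
params = {
    "conv1": rng.normal(scale=0.1, size=(8, 1, 3)),
    "conv2": rng.normal(scale=0.1, size=(8, 8, 3)),
    # Two width-3 valid convolutions shorten the sequence by 4 steps in total.
    "dense": rng.normal(scale=0.1, size=(8 * (lookback - 4), horizon)),
}
window = rng.random(lookback)
prediction = forward(params, window, training=False)
print(prediction.shape)  # (7,)
```

Dropout is active only when `training=True`; at inference time the network is deterministic, which is what makes a fair accuracy comparison possible.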
[1] Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: principles and practice. Melbourne, Australia: OTexts.
[2] Oreshkin, B. N., Carpov, D., Chapados, N., and Bengio, Y. (2019). "N-BEATS: Neural basis expansion analysis for interpretable time series forecasting". In: arXiv preprint arXiv:1905.10437.
[3] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). "LightGBM: A highly efficient gradient boosting decision tree". In: Advances in neural information processing systems, pp. 3146-3154.
[4] Zufferey, T., Ulbig, A., Koch, S., and Hug, G. (2016). "Forecasting of smart meter time series based on neural networks". In: International workshop on data analytics for renewable energy integration. Springer, pp. 10-21.
[5] Nikolopoulos, K., Syntetos, A. A., Boylan, J. E., Petropoulos, F., and Assimakopoulos, V. (2011). "An aggregate-disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis". In: Journal of the Operational Research Society 62.3, pp. 544-554.
Levi Kaplan, Ming Luo