Restaurant Daily Revenue Prediction : Utilizing Synthetic Time Series Data for Improved Model Performance

University essay from Uppsala universitet/Avdelningen för beräkningsvetenskap

Abstract: This study aims to enhance the accuracy of a demand forecasting model, XGBoost, by incorporating synthetic multivariate restaurant time series data during the training process. The research addresses the limited availability of training data by generating synthetic data using TimeGAN, a generative adversarial deep neural network tailored for time series data. A one-year daily time series dataset, comprising numerical and categorical features based on a real restaurant's sales history, supplemented by relevant external data, serves as the original data. TimeGAN learns from this dataset to create synthetic data that closely resembles the original data in terms of temporal and distributional dynamics. Statistical and visual analyses demonstrate a strong similarity between the synthetic and original data. To evaluate the usefulness of the synthetic data, an experiment is conducted where varying lengths of synthetic data are iteratively combined with the one-year real dataset. Each iteration involves retraining the XGBoost model and assessing its accuracy for a one-week forecast using the Root Mean Square Error (RMSE). The results indicate that incorporating 6 years of synthetic data improves the model's performance by 65%. The hyperparameter configurations suggest that deeper tree structures benefit the XGBoost model when synthetic data is added. Furthermore, the model exhibits improved feature selection with an increased amount of training data. This study demonstrates that incorporating synthetic data closely resembling the original data can effectively enhance the accuracy of predictive models, particularly when training data is limited.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)