Forecasting Hotel Industry Demand Using Time Series Analysis

Rojas, Cristof

This master's thesis presents an in-depth analysis and comparison of hotel demand forecasting models. It focuses on how well the state-of-the-art Temporal Fusion Transformer (TFT) performs against ARIMA, LightGBM, LSTM, and N-BEATS. The study is structured around two research questions: first, how much forecasting accuracy improves when advanced models are used instead of baselines; and second, what effect integrating additional explanatory variables, such as weather conditions and Google Trends data, has on the forecasting process.

The study begins with a comprehensive literature review that establishes a baseline understanding of hotel demand forecasting and identifies gaps in current methodologies. The methodology section outlines the data collection process, which covers historical booking data from multiple hotels as well as external variables such as weather and online search trends. The data is preprocessed to ensure its validity and reliability for model training and testing. The dataset comprises several years of historical booking data from 11 hotels, each with multiple room types. The final year of data serves as the validation set, which provides a robust and realistic out-of-sample evaluation of the forecasting models.
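Holding out the final year is a chronological split: every validation day lies after every training day. The sketch below is a minimal illustration, assuming each series is a list of `(date, bookings)` pairs for one hotel and room type; the helper name `split_final_year` and the 365-day cutoff are illustrative assumptions, not code from the thesis.

```python
from datetime import date, timedelta


def split_final_year(
    records: list[tuple[date, float]],
) -> tuple[list[tuple[date, float]], list[tuple[date, float]]]:
    """Hypothetical helper: chronological holdout of the last 365 days.

    Unlike a random split, no validation observation precedes a training
    observation, so the model never trains on data from its own test period.
    """
    ordered = sorted(records, key=lambda r: r[0])
    cutoff = ordered[-1][0] - timedelta(days=365)  # last date minus one year
    train = [r for r in ordered if r[0] <= cutoff]
    valid = [r for r in ordered if r[0] > cutoff]
    return train, valid


# Usage with a synthetic daily series (three years of placeholder values).
start = date(2019, 1, 1)
series = [(start + timedelta(days=i), float(i % 7)) for i in range(3 * 365)]
train, valid = split_final_year(series)
print(len(train), len(valid))  # 730 365
```

In a multi-hotel, multi-room-type setting the same cutoff would typically be applied to every series so that all validation periods cover the same calendar year.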

The methodology includes a thorough time series analysis of the dataset, including time series decomposition and autocorrelation analysis, to better understand its underlying patterns and dependencies. A key methodological element is time series cross-validation, which evaluates each model only on data that lies after its training period, so that the evaluation reflects how the models would perform in real-world use. The primary forecast horizon was 21 days, evaluated over a full year without retraining the models.
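Rolling-origin (expanding-window) cross-validation is one common form of time series cross-validation. The thesis's exact fold layout is not given here, so this sketch assumes daily observations, the 21-day horizon stated above, a hypothetical one-year initial training window, and forecast origins advanced by one horizon at a time. The helper name `rolling_origin_splits` is illustrative.

```python
from collections.abc import Iterator


def rolling_origin_splits(
    n_obs: int, horizon: int = 21, initial_train: int = 365, step: int = 21
) -> Iterator[tuple[range, range]]:
    """Hypothetical helper: expanding-window (rolling-origin) CV splits.

    Each fold trains on every observation before the forecast origin and tests
    on the next `horizon` observations, so no fold uses future data.
    """
    if horizon <= 0 or step <= 0:
        raise ValueError("horizon and step must be positive")
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step  # move the forecast origin forward in time


# Usage: one year of training data followed by 63 days to forecast.
folds = list(rolling_origin_splits(n_obs=365 + 63))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx.start, test_idx.stop)
# 365 365 386
# 386 386 407
# 407 407 428
```

The splits only determine which data each forecast may see. Evaluating over a year without retraining corresponds to fitting the model once and then only moving the forecast origin. Refitting on each expanding training window is the more expensive alternative.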

The results highlight the TFT's superior performance in forecasting hotel demand and its ability to handle complex temporal data. The TFT achieved a 20 % relative decrease in Mean Absolute Error (MAE) compared to the second-best model, LSTM. Hyperparameter optimization improved it by a further 10 %, resulting in an MAE of 4.25 and a Mean Absolute Percentage Error (MAPE) of 11.64 %. The thesis also examines probabilistic forecasting and evaluates performance for individual room types and over long-term forecast horizons of up to one year. Of the longer horizons evaluated, a one-year horizon resulted in an MAE of 5.26. Moreover, including external variables such as weather and Google Trends data further improved the precision of the TFT model, supporting the hypothesis that additional explanatory variables contribute positively to the forecasting process. Backward stepwise regression showed that removing the external variables reduced the relative MAE score by almost half. The final product is a pipeline that continuously tracks the performance of the forecasting model and automatically optimizes, retrains, and redeploys it to a user-friendly API.
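The reported metrics can be made concrete with a few lines of plain Python. This is a minimal sketch with hypothetical function names. MAE and MAPE follow their standard definitions. The pinball (quantile) loss is included because it is the usual way to score quantile forecasts, and the TFT is commonly trained with it. Whether the thesis scored its probabilistic forecasts exactly this way is an assumption. The toy numbers below are illustrative, not results from the thesis.

```python
def _check(actual: list[float], forecast: list[float]) -> None:
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")


def mae(actual: list[float], forecast: list[float]) -> float:
    """Mean Absolute Error, in the units of the series (e.g. rooms booked)."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean Absolute Percentage Error in percent; undefined for zero actuals."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def relative_decrease(baseline: float, model: float) -> float:
    """Percentage by which the model's error is lower than the baseline's."""
    return 100 * (baseline - model) / baseline


def pinball_loss(actual: list[float], forecast: list[float], q: float) -> float:
    """Average quantile (pinball) loss of a forecast for quantile q in (0, 1)."""
    _check(actual, forecast)
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for a, f in zip(actual, forecast):
        diff = a - f
        # Under-forecasting costs q per unit; over-forecasting costs (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(actual)


# Usage with toy numbers (not results from the thesis).
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 40.0]
print(mae(actual, forecast))  # 1.75
print(round(mape(actual, forecast), 6))  # 10.0
print(relative_decrease(5.0, 4.0))  # 20.0
print(pinball_loss(actual, forecast, 0.5))  # 0.875, i.e. half the MAE
```

MAPE is undefined when an actual value is zero, which can occur for sparsely booked room types. The sketch raises an error rather than silently dropping those days.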
