Implementation of Machine Learning Models to Nowcast GDP

Overview

Teaching: 130 min
Exercises: 130 min
Questions
  • Can Google Trends and machine learning predict Nigeria’s GDP? We trained nine machine learning models to answer this question.

Objectives
  • Train and compare machine learning models that use Google Trends data to nowcast Nigeria’s quarterly GDP, so that policymakers and businesses can react sooner than official GDP releases allow.

Leveraging Machine Learning Models and Google Trends Data to Nowcast Nigeria’s Quarterly GDP

Why GDP nowcasting matters

Objective

Train and compare machine learning models that use Google Trends data to nowcast Nigeria’s quarterly GDP, so that policymakers and businesses can react sooner than official GDP releases allow.

GDP Nowcasting Workflow

Data Collection & Preparation

The chart below shows Nigeria’s quarterly GDP (in billions of Naira) from 2010 through 2024. GDP trends steadily upward over the period, reflecting growth in the country’s economic output, while seasonal or cyclical fluctuations recur each year. By 2024, GDP has climbed to nearly twice its 2010 level, underscoring significant long-term expansion despite the short-term fluctuations.

Data Preprocessing Steps

Data Aggregation converts monthly Google Trends data into quarterly values to match the frequency of GDP. Normalization/Standardization then rescales each variable to a common scale, so that large-valued features don’t dominate the model. Next, Trend Removal strips long-term upward or downward drift from GDP, letting us focus on short-term or cyclical fluctuations. Finally, Seasonality Removal subtracts repeating seasonal effects, such as quarter-to-quarter spikes, so that the underlying patterns in the data are clearer for nowcasting.
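The four steps can be sketched with pandas and NumPy as follows. This is a minimal illustration on synthetic data, not the lesson's actual pipeline: the series values, the quarterly averaging, the straight-line trend, and the per-quarter-mean seasonal adjustment are all assumptions chosen for simplicity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical monthly Google Trends index for a single search term.
months = pd.date_range("2010-01-01", periods=48, freq="MS")
trends_monthly = pd.Series(rng.uniform(20, 80, size=48), index=months)

# 1. Aggregation: average the three months that fall in each quarter.
trends_quarterly = trends_monthly.groupby(trends_monthly.index.to_period("Q")).mean()

# 2. Standardization: rescale to zero mean and unit variance.
trends_scaled = (trends_quarterly - trends_quarterly.mean()) / trends_quarterly.std(ddof=0)

# Hypothetical quarterly GDP: linear trend + fixed quarterly pattern + noise.
t = np.arange(len(trends_quarterly))
seasonal = np.tile([-2.0, 0.5, 2.5, -1.0], len(t) // 4)
gdp = pd.Series(100 + 1.5 * t + seasonal + rng.normal(0, 0.3, len(t)),
                index=trends_quarterly.index)

# 3. Trend removal: fit a straight line and subtract it.
slope, intercept = np.polyfit(t, gdp.to_numpy(), deg=1)
gdp_detrended = gdp - (intercept + slope * t)

# 4. Seasonality removal: subtract the average of each calendar quarter.
quarter_means = gdp_detrended.groupby(gdp_detrended.index.quarter).transform("mean")
gdp_adjusted = gdp_detrended - quarter_means

print(gdp_adjusted.round(2).head())
```

In practice the scaling statistics, trend, and seasonal averages should be estimated on the training period only and then applied to the test period, so that no information from the test set leaks into the features.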

Feature Engineering Process

This workflow begins by calculating the GDP growth rate from raw GDP data, which serves as the main indicator of economic performance. Next, we create lag features by shifting variables one or more quarters, to capture predictive relationships over time. Finally, we merge all features (including the newly generated lags) into a single dataset for modeling and analysis.
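A minimal pandas sketch of these three steps is shown below. The column names `jobs` and `loans`, the GDP values, and the choice of one- and two-quarter lags are hypothetical; the lesson's notebooks may use different search terms and lag depths.

```python
import pandas as pd

quarters = pd.period_range("2020Q1", periods=8, freq="Q")

# Hypothetical raw GDP levels and two hypothetical Google Trends series.
gdp = pd.Series([100.0, 110.0, 104.5, 115.0, 108.0, 118.0, 112.0, 124.0],
                index=quarters, name="gdp")
trends = pd.DataFrame(
    {"jobs": [55, 60, 58, 63, 61, 66, 64, 70],
     "loans": [40, 42, 45, 43, 47, 49, 48, 52]},
    index=quarters,
)

# Step 1: quarter-on-quarter GDP growth rate, in percent.
growth = gdp.pct_change().mul(100).rename("gdp_growth")

# Step 2: lag features, shifting each trend series by 1 and 2 quarters.
lagged = pd.DataFrame(
    {f"{col}_lag{k}": trends[col].shift(k) for col in trends for k in (1, 2)}
)

# Step 3: merge everything on the quarterly index and drop rows
# made incomplete by differencing and lagging.
features = pd.concat([growth, trends, lagged], axis=1).dropna()

print(features.head())
```

The first two quarters are dropped because the two-quarter lag has no value for them, which is the usual cost of adding lag features to a short quarterly series.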

This plot presents the quarterly GDP growth rates from 2010 to 2024, clearly demonstrating a recurring pattern of substantial peaks and troughs that appear to occur annually.

Train–Test Split

The final dataset was partitioned chronologically into a training set (80%), covering 2010-09-30 to 2021-12-31, and an out-of-sample test set (20%), covering 2022-03-31 to 2024-09-30.
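A chronological split on these dates can be sketched as below. The data frame contents are placeholders, and the quarter-end dates from the text are represented as quarterly periods (2010Q3 to 2024Q3); only the cut-off after 2021Q4 comes from the lesson.

```python
import numpy as np
import pandas as pd

# Placeholder dataset indexed by quarter, 2010Q3 through 2024Q3.
quarters = pd.period_range("2010Q3", "2024Q3", freq="Q")
data = pd.DataFrame({"gdp_growth": np.linspace(-1.0, 1.0, len(quarters))},
                    index=quarters)

# Everything up to and including 2021Q4 is training data;
# later quarters are held out. No shuffling: order must be preserved.
split_point = pd.Period("2021Q4", freq="Q")
train = data[data.index <= split_point]
test = data[data.index > split_point]

print(len(train), len(test), round(len(train) / len(data), 3))
```

With 57 quarters in total, this gives 46 training quarters and 11 test quarters, which is the roughly 80/20 split described above.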

Model Training & Forecasting

Nine machine learning algorithms were trained on the final dataset.

Two forecasting methods were employed:

The implementation notebooks for each of the nine models are available below. You are welcome to modify or improve them.

ML Models
LGBM Regressor
Extra Trees Regressor
Random Forest Regressor
ElasticNet Regressor
XGBoost Regressor
Gradient Boosting Regressor
Lasso Regressor
Decision Tree Regressor
Ridge Regressor

Hyperparameter Tuning & Cross Validation

Optimization: Hyperparameters were tuned with an exhaustive grid search over a predefined parameter space.

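Cross-Validation Strategy: Time-series cross-validation (for example, scikit-learn’s sklearn.model_selection.TimeSeriesSplit) was employed so that every validation fold comes after its training data in time, giving a more robust performance estimate.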
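The combination of grid search and time-series cross-validation might look like the sketch below. It uses Ridge on synthetic data for brevity; the parameter grid, the number of splits, and the scoring metric are assumptions, not the settings used in the lesson's notebooks.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the 46 training quarters and three features.
rng = np.random.default_rng(42)
X = rng.normal(size=(46, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(scale=0.1, size=46)

# Expanding-window folds: each validation fold lies after its training fold.
cv = TimeSeriesSplit(n_splits=5)

# Hypothetical search space; every value is tried (exhaustive search).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(Ridge(), param_grid, cv=cv, scoring="r2")
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
print("mean CV R2:", round(search.best_score_, 3))
```

The same pattern applies to the other eight models by swapping the estimator and its grid. Unlike ordinary k-fold cross-validation, `TimeSeriesSplit` never validates on quarters that precede the training data.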

Model Evaluation

The coefficient of determination (R²) was calculated for both the training and test datasets. The R² values reported below reflect each model’s performance on the held-out test set.

ML Model                       R² (test set)
LGBM Regressor                 0.868
Extra Trees Regressor          0.856
Random Forest Regressor        0.798
ElasticNet Regressor           0.790
XGBoost Regressor              0.780
Gradient Boosting Regressor    0.775
Lasso Regressor                0.732
Decision Tree Regressor        0.688
Ridge Regressor                0.646
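For reference, R² is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The short sketch below computes it by hand and with scikit-learn's `r2_score`; the four values are made up for illustration and are not GDP data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted values.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# R2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y_true, y_pred)

print(round(r2_manual, 4), round(r2_sklearn, 4))
```

On a held-out test set, R² can be negative when a model predicts worse than the test-set mean, so a value near 0.87 on unseen quarters is a meaningfully stronger result than the same value on training data.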

Model Uncertainty (Confidence Intervals)

Non-parametric bootstrap resampling was used to estimate intervals spanning the 5th to 95th percentiles (a 90% interval) for each model’s predictions.
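One common way to build such intervals is sketched below: resample the training rows with replacement, refit the model on each resample, and take percentiles of the resulting test predictions. The lesson does not state exactly what was resampled, so the row-level resampling, the Ridge model, the synthetic data, and the 200 replicates are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)

# Synthetic stand-ins: 46 training quarters, 11 test quarters, 3 features.
X_train = rng.normal(size=(46, 3))
y_train = X_train @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.3, size=46)
X_test = rng.normal(size=(11, 3))

n_boot = 200  # number of bootstrap replicates (assumed)
boot_preds = np.empty((n_boot, len(X_test)))

for b in range(n_boot):
    # Draw training rows with replacement and refit the model.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = Ridge(alpha=1.0).fit(X_train[idx], y_train[idx])
    boot_preds[b] = model.predict(X_test)

# 5th and 95th percentiles across replicates, one pair per test quarter.
lower, upper = np.percentile(boot_preds, [5, 95], axis=0)

# Point prediction from the model fitted on the full training set.
point = Ridge(alpha=1.0).fit(X_train, y_train).predict(X_test)

print(np.round(np.column_stack([lower, point, upper]), 2))
```

Resampling individual rows treats quarters as exchangeable, which ignores serial dependence; a block bootstrap that resamples runs of consecutive quarters is an alternative when autocorrelation matters.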

Visualization

Actual and predicted GDP time series were compared graphically, with the forecast uncertainty represented by shaded confidence intervals.

Training Period:

Test Period:

Confidence Interval (Shaded Area): Using the bootstrap, we bracket predicted GDP between the 5th and 95th percentiles. This range accounts for model and sampling uncertainty, which policymakers need to see in order to hedge risks.
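A plot of this kind can be sketched with Matplotlib as below. All series are synthetic and the fixed-width band stands in for bootstrap percentiles; the colors follow the description in this lesson (black for actual values, blue for training predictions, red for test predictions and their interval).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Synthetic quarterly series: 46 training quarters followed by 11 test quarters.
quarters = np.arange(57)
split = 46
actual = 100 + 0.8 * quarters + rng.normal(0, 2, 57)
predicted = actual + rng.normal(0, 1.5, 57)

# Placeholder interval for the test period (would come from the bootstrap).
lower = predicted[split:] - 4
upper = predicted[split:] + 4

fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(quarters, actual, "k-o", markersize=3, label="Actual GDP")
ax.plot(quarters[:split], predicted[:split], "b-", label="Predicted (training)")
ax.plot(quarters[split:], predicted[split:], "r-", label="Predicted (test)")
ax.fill_between(quarters[split:], lower, upper, color="red", alpha=0.2,
                label="5th-95th percentile")
ax.axvline(split - 0.5, color="gray", linestyle="--")  # train/test boundary
ax.set_xlabel("Quarter index")
ax.set_ylabel("GDP (synthetic units)")
ax.legend()
fig.tight_layout()
```

Call `fig.savefig(...)` or `plt.show()` afterwards to write or display the figure.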

Key Insights

LGBM Regressor

The plot shows GDP (in Naira) predicted by the LGBM Regressor on a quarterly timeline, with 5th-95th percentile confidence intervals. The black line with dots shows actual GDP across both the training and test periods. The blue line shows the model’s predictions during the training period, indicating how well it fits historical data. The red line shows the GDP nowcast for the out-of-sample test period. The blue-shaded area corresponds to the training period, while the red-shaded region is the confidence interval for the test-period predictions, which widens over time. The model achieves an R² of 0.87 on the test set, a strong fit that explains 87% of the variance in GDP. Overall, the model captures seasonal and trend patterns well, making it suitable for short-term forecasting, though the widening confidence intervals call for caution when interpreting longer-horizon projections.

Extra Trees Regressor

The Extra Trees Regressor model achieves an R² score of 0.86, suggesting a strong fit, explaining 86% of the variance in GDP. Compared to the previous LGBM model, this Extra Trees model exhibits more rigid step-like predictions, a characteristic of tree-based models.

Random Forest Regressor

The Random Forest Regressor achieved an R² of 0.80, explaining 80% of the variance in GDP, which suggests a strong fit to the data. Relative to the other models evaluated, Random Forest struck a good balance between capturing historical GDP trends and reflecting model uncertainty.

ElasticNet Regressor

The ElasticNet Regressor achieved an R² of 0.79, explaining 79% of the variance in GDP, a slightly weaker fit than the models above. While the model follows the general GDP trend, the widening confidence intervals, coupled with the lower R², suggest increased uncertainty in its predictions, particularly at longer forecasting horizons.

XGBoost Regressor

The XGBoost model achieved an R² of 0.78, explaining 78% of the variance in GDP, which indicates a reasonable fit. Its performance was slightly lower than that of Random Forest and Extra Trees. While XGBoost models past trends effectively, the widening confidence intervals after 2022 suggest that near-term forecasts are more reliable than long-term projections.

Gradient Boosting Regressor

The Gradient Boosting Regressor achieved an R² of 0.78 (0.775 before rounding), explaining about 78% of the variance in GDP, a fit comparable to that of the XGBoost model. However, the widening confidence intervals after 2022 underscore the increasing uncertainty of longer-term forecasts, so caution should be exercised when interpreting predictions beyond the near term.

Lasso Regressor

The Lasso Regression model achieved an R² of 0.73, accounting for 73% of the variance in GDP, a somewhat weaker fit than models such as Gradient Boosting and XGBoost.

Decision Tree Regressor

The Decision Tree model achieved an R² of 0.69, accounting for 69% of the variance in GDP, a weaker fit than models such as Gradient Boosting, Random Forest, and LGBM.

Ridge Regressor

The Ridge Regression model achieved an R² of 0.65, explaining 65% of the variance in GDP, the weakest fit of the nine models and well below models such as Gradient Boosting, LGBM, and Random Forest.

Future Enhancements

Final Wrap

By combining Google Trends signals and advanced ML models, we achieved a test-set R² of about 0.87 (roughly 87% of the variance in GDP explained) in nowcasting Nigeria’s GDP, offering an actionable, near-real-time view of economic performance.

Key Points

  • Google Trends, Machine Learning, Quarterly GDP, LGBM Regressor, Extra Trees Regressor, Random Forest Regressor, ElasticNet Regressor, XGBoost Regressor, Gradient Boosting Regressor, Lasso Regressor, Decision Tree Regressor, Ridge Regressor

Copyright © 2024 UNECA-ACS
