
Ensemble Forecasting: Why Combining Models Beats Picking the 'Best' One

Learn how ensemble forecasting reduces prediction errors by combining multiple models. We analyze real quarterly revenue data to show when and why the 'wisdom of the crowd' approach outperforms individual models.

January 20, 2026 · 11 min read

You've run four forecasting models on your data. LightGBM says revenue will hit €57M next quarter. SARIMAX says €51M. Prophet says €59M. Theta-GAM says €62M.

Which one do you trust?

Here's a counterintuitive answer: trust all of them—by averaging their predictions into an ensemble forecast.

The Wisdom of the Crowd, Applied to Forecasting

In 1906, the statistician Francis Galton observed something strange at an English livestock fair. Nearly 800 people guessed the weight of an ox. Individual guesses were wildly off, but the average of all guesses was almost perfect—within 1% of the actual weight.

This "wisdom of the crowd" effect appears everywhere: prediction markets, jury decisions, and yes, forecasting models.

Different forecasting models make different assumptions about your data:

  • LightGBM learns complex non-linear patterns through gradient boosting
  • SARIMAX-Ridge models autoregressive relationships with regularization
  • Prophet detects changepoints and handles seasonality flexibly
  • Theta-GAM decomposes trends and applies exponential smoothing

When these models agree, you can be confident. When they disagree, the average often lands closer to the truth than any single prediction—because individual model errors tend to cancel out.
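The core idea is almost embarrassingly simple. As a minimal sketch (using the illustrative forecasts from the opening paragraph, not real model output):

```python
# Hypothetical next-quarter forecasts (in €M) from the four models described above
forecasts = {
    "LightGBM": 57.0,
    "SARIMAX": 51.0,
    "Prophet": 59.0,
    "Theta-GAM": 62.0,
}

# The ensemble is just the unweighted mean of the individual predictions
ensemble = sum(forecasts.values()) / len(forecasts)
print(f"Ensemble forecast: €{ensemble:.2f}M")
```

No single model "wins"; the average sits inside the range the models span, which is exactly where the errors get a chance to cancel.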

A Real Example: Quarterly Revenue Forecasting

Let's look at actual quarterly revenue data spanning 30+ years. We trained all four models and generated forecasts for 12 quarters (2022-Q1 through 2024-Q4), then compared predictions against actual results.

Figure: Individual model forecasts showing varying predictions. Four models, four different forecasts. Which one is "right"?

Here's what each model predicted versus what actually happened:

| Period  | LightGBM | SARIMAX | Theta-GAM | Prophet | Ensemble | Actual |
|---------|----------|---------|-----------|---------|----------|--------|
| 2022-Q1 | 54.96M   | 51.92M  | 56.04M    | 59.48M  | 55.60M   | 53.05M |
| 2022-Q2 | 54.76M   | 51.59M  | 55.98M    | 58.54M  | 55.22M   | 52.94M |
| 2022-Q3 | 55.66M   | 51.42M  | 56.72M    | 56.81M  | 55.15M   | 53.67M |
| 2022-Q4 | 56.05M   | 51.45M  | 56.17M    | 60.77M  | 56.11M   | 55.41M |
| 2023-Q1 | 56.03M   | 51.35M  | 66.80M    | 59.54M  | 58.43M   | 62.94M |
| 2023-Q2 | 55.39M   | 51.40M  | 57.37M    | 58.60M  | 55.69M   | 61.40M |
| 2023-Q3 | 58.82M   | 51.31M  | 59.30M    | 56.87M  | 56.58M   | 53.39M |
| 2023-Q4 | 58.36M   | 51.36M  | 59.57M    | 60.83M  | 57.53M   | 55.68M |
| 2024-Q1 | 57.38M   | 51.27M  | 62.42M    | 59.60M  | 57.67M   | 55.91M |
| 2024-Q2 | 59.03M   | 51.32M  | 60.07M    | 58.66M  | 57.27M   | 54.92M |
| 2024-Q3 | 59.17M   | 51.23M  | 61.08M    | 56.94M  | 57.10M   | 51.59M |
| 2024-Q4 | 59.05M   | 51.27M  | 62.09M    | 60.90M  | 58.33M   | 53.42M |

Now let's calculate the Mean Absolute Percentage Error (MAPE) for each approach:

| Model         | MAPE  |
|---------------|-------|
| Ensemble      | 5.51% |
| SARIMAX-Ridge | 6.83% |
| LightGBM      | 6.92% |
| Prophet       | 8.49% |
| Theta-GAM     | 8.74% |

The ensemble—a simple average of all four models—achieved the lowest error rate, beating even the best individual model by 1.3 percentage points.
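MAPE is straightforward to reproduce. Here is the calculation for the ensemble column, using the rounded values from the table above (the rounding shifts the last digit slightly relative to the article's 5.51%, which was computed on unrounded forecasts):

```python
# Actual and ensemble values (in €M) from the quarterly table, 2022-Q1 through 2024-Q4
actual   = [53.05, 52.94, 53.67, 55.41, 62.94, 61.40, 53.39, 55.68, 55.91, 54.92, 51.59, 53.42]
ensemble = [55.60, 55.22, 55.15, 56.11, 58.43, 55.69, 56.58, 57.53, 57.67, 57.27, 57.10, 58.33]

# Mean Absolute Percentage Error: average of |forecast - actual| / actual
mape = sum(abs(f - a) / a for f, a in zip(ensemble, actual)) / len(actual) * 100
print(f"Ensemble MAPE: {mape:.2f}%")  # ~5.5%
```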

Figure: Ensemble forecast result. The ensemble forecast (green) smooths out individual model volatility.

Why Does This Work?

The ensemble outperforms for a simple statistical reason: uncorrelated errors cancel out.

Consider 2022-Q4, where the actual value was €55.41M:

  • LightGBM overshot by €0.64M (+1.2%)
  • SARIMAX undershot by €3.96M (-7.1%)
  • Theta-GAM overshot by €0.76M (+1.4%)
  • Prophet overshot by €5.36M (+9.7%)

The ensemble prediction of €56.11M overshot by just €0.70M (+1.3%)—better than three of the four individual models.

This isn't magic. It's what happens when you combine predictions that err in different directions. Over 12 quarters, the models' biases partially offset each other.
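The 2022-Q4 arithmetic makes the cancellation concrete. Because the ensemble is the mean of the predictions, its error is exactly the mean of the signed errors, so opposing errors offset:

```python
# 2022-Q4: individual predictions vs the €55.41M actual (values from the table)
actual = 55.41
preds = {"LightGBM": 56.05, "SARIMAX": 51.45, "Theta-GAM": 56.17, "Prophet": 60.77}

# Signed errors in €M: positive = overshoot, negative = undershoot
errors = {name: p - actual for name, p in preds.items()}

# Mean of signed errors == error of the mean prediction
ensemble_error = sum(errors.values()) / len(errors)
print(f"Ensemble error: €{ensemble_error:+.2f}M")  # €+0.70M
```

SARIMAX's €-3.96M undershoot absorbs most of Prophet's €+5.36M overshoot, leaving the ensemble just €0.70M high.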

When Ensembles Work Best

Ensemble forecasting shines when:

1. Models produce plausible but different predictions

If all four models cluster tightly around €55M, the ensemble won't add much. But if they spread from €51M to €62M, you're getting diverse perspectives that can be productively combined.

2. You don't have strong prior knowledge about which model fits best

If you know your data has strong seasonality and Prophet historically nails it, maybe trust Prophet. But if you're forecasting a new product line or entering a new market, model uncertainty is high—and ensembles hedge that uncertainty.

3. The forecast horizon is medium to long-term

Short-term forecasts (1-2 periods) may be dominated by recent momentum that one model captures well. Longer horizons involve more uncertainty where ensemble averaging pays off.

4. Accuracy matters more than interpretability

A single model's forecast is easier to explain: "Prophet detected a trend change in Q3 2021." An ensemble is a black box that's harder to narratively justify—but often more accurate.

When to Skip Ensembles

Don't blindly combine models. Ensembles can hurt when:

One model is clearly wrong

Including it in the ensemble might drag down accuracy. You might get better results ensembling only the other three.

Models are highly correlated

If you run three variations of gradient boosting, their errors will be correlated. Averaging them won't help much—you're just averaging similar mistakes.
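A small simulation (synthetic data, not real forecasts) illustrates why correlation kills the benefit. Three members with independent errors average out; three members whose errors share a common component do not:

```python
import random

random.seed(0)
N = 1000  # number of simulated forecast periods

def mean_abs_ensemble_error(runs):
    """Average |mean of member errors| across simulated periods."""
    return sum(abs(sum(errs) / len(errs)) for errs in runs) / len(runs)

# Three models with independent (uncorrelated) errors
uncorrelated = [[random.gauss(0, 1) for _ in range(3)] for _ in range(N)]

# Three near-identical models: most of each error is a shared component
correlated = []
for _ in range(N):
    shared = random.gauss(0, 1)
    correlated.append([shared + random.gauss(0, 0.2) for _ in range(3)])

err_uncorr = mean_abs_ensemble_error(uncorrelated)
err_corr = mean_abs_ensemble_error(correlated)
print(f"Ensemble error, uncorrelated members: {err_uncorr:.2f}")
print(f"Ensemble error, correlated members:   {err_corr:.2f}")
```

With uncorrelated members the ensemble error shrinks by roughly the square root of the ensemble size; with correlated members it barely moves from a single model's error.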

You need calibrated uncertainty, not just a point forecast

A simple ensemble average gives downstream systems the single number they want, but naively averaging models discards the prediction intervals each model provides. If your inventory or financial planning depends on confidence bounds, a single well-calibrated model may serve you better.

How to Build an Ensemble in Sanvia

Building an ensemble takes three clicks:

  1. Run your forecast with multiple models selected
  2. On the results page, click Ensemble
  3. Select which models to include (or use all)

Sanvia calculates a simple average across selected models for each forecast period. You can then compare the ensemble against individual models and export whichever you prefer.

For more advanced use cases, you can also:

  • Weight models differently: If LightGBM historically outperforms on your data, give it 40% weight vs. 20% for others
  • Exclude outlier models: Drop any model that seems systematically biased
  • Ensemble subsets: Try combining just the top 2 performers
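The weighting idea from the list above is a one-liner in practice. A minimal sketch, with illustrative weights (these are not Sanvia defaults, and the forecast values are the hypothetical ones from the opening paragraph):

```python
# Hypothetical forecasts (€M) and illustrative weights favoring LightGBM
forecasts = {"LightGBM": 57.0, "SARIMAX": 51.0, "Prophet": 59.0, "Theta-GAM": 62.0}
weights   = {"LightGBM": 0.4, "SARIMAX": 0.2, "Prophet": 0.2, "Theta-GAM": 0.2}

assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1

# Weighted ensemble: each model contributes in proportion to its weight
weighted = sum(forecasts[m] * weights[m] for m in forecasts)
print(f"Weighted ensemble: €{weighted:.2f}M")
```

Ensembling a subset is the same calculation with zero weight on the excluded models.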

Practical Recommendations

Based on our analysis and experience across hundreds of forecasts:

Start with a simple average of all models. It's robust and requires no tuning. In most cases, it will match or beat your best individual model.

Look at the spread between models. If predictions range widely (>15-20% difference between highest and lowest), the ensemble is likely adding value. If they're tightly clustered (<5%), you could just pick any one.
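That spread heuristic is easy to automate. A sketch, again using the hypothetical forecasts from the opening paragraph:

```python
# Quick spread check: is the ensemble likely to add value?
forecasts = [57.0, 51.0, 59.0, 62.0]  # the four models' predictions in €M

# Spread as percent difference between the highest and lowest prediction
spread = (max(forecasts) - min(forecasts)) / min(forecasts) * 100
print(f"Spread: {spread:.1f}%")  # well above the ~15-20% threshold, so ensemble away
```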

Review ensemble vs. individual performance regularly. Your data patterns may shift. A model that was dragging down the ensemble might start outperforming—or vice versa.

Don't ensemble when you have domain knowledge. If you know a structural change is coming (new product launch, market entry, regulatory change), individual models that can incorporate that knowledge may beat a backward-looking ensemble.

The Bottom Line

Ensemble forecasting isn't a silver bullet, but it's one of the easiest ways to improve forecast accuracy without additional data or model tuning.

In our test case, combining four models cut MAPE from 6.83% (best individual model) to 5.51% (ensemble)—a 19% reduction in error with zero extra effort.

When your models disagree, don't agonize over which one to trust. Trust all of them.


Ready to try ensemble forecasting on your data? Start your free trial and run multiple models in minutes.
