You've run four forecasting models on your data. LightGBM says revenue will hit €57M next quarter. SARIMAX says €51M. Prophet says €59M. Theta-GAM says €62M.
Which one do you trust?
Here's a counterintuitive answer: trust all of them—by averaging their predictions into an ensemble forecast.
The Wisdom of the Crowd, Applied to Forecasting
In 1906, statistician Francis Galton observed something strange at a country fair. Nearly 800 people guessed the weight of an ox. Individual guesses were wildly off, but the average of all guesses was almost perfect: within 1% of the actual weight.
This "wisdom of the crowd" effect appears everywhere: prediction markets, jury decisions, and yes, forecasting models.
Different forecasting models make different assumptions about your data:
- LightGBM learns complex non-linear patterns through gradient boosting
- SARIMAX-Ridge models autoregressive relationships with regularization
- Prophet detects changepoints and handles seasonality flexibly
- Theta-GAM decomposes trends and applies exponential smoothing
When these models agree, you can be confident. When they disagree, the average often lands closer to the truth than any single prediction—because individual model errors tend to cancel out.
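Mechanically, a simple-average ensemble is trivial. Here's a minimal sketch in Python (assuming NumPy), using the four point forecasts from the opening paragraph:

```python
import numpy as np

# Point forecasts (EUR M) for next quarter from four models (from the intro)
forecasts = {
    "LightGBM": 57.0,
    "SARIMAX-Ridge": 51.0,
    "Prophet": 59.0,
    "Theta-GAM": 62.0,
}

# A simple ensemble is just the equal-weight mean of the point forecasts
ensemble = np.mean(list(forecasts.values()))
print(f"Ensemble forecast: {ensemble:.2f}M")  # 57.25M
```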
A Real Example: Quarterly Revenue Forecasting
Let's look at actual quarterly revenue data spanning 30+ years. We trained all four models and generated forecasts for 12 quarters (2022-Q1 through 2024-Q4), then compared predictions against actual results.
Four models, four different forecasts. Which one is "right"?
Here's what each model predicted versus what actually happened:
| Period | LightGBM | SARIMAX-Ridge | Theta-GAM | Prophet | Ensemble | Actual |
|---|---|---|---|---|---|---|
| 2022-Q1 | 54.96M | 51.92M | 56.04M | 59.48M | 55.60M | 53.05M |
| 2022-Q2 | 54.76M | 51.59M | 55.98M | 58.54M | 55.22M | 52.94M |
| 2022-Q3 | 55.66M | 51.42M | 56.72M | 56.81M | 55.15M | 53.67M |
| 2022-Q4 | 56.05M | 51.45M | 56.17M | 60.77M | 56.11M | 55.41M |
| 2023-Q1 | 56.03M | 51.35M | 66.80M | 59.54M | 58.43M | 62.94M |
| 2023-Q2 | 55.39M | 51.40M | 57.37M | 58.60M | 55.69M | 61.40M |
| 2023-Q3 | 58.82M | 51.31M | 59.30M | 56.87M | 56.58M | 53.39M |
| 2023-Q4 | 58.36M | 51.36M | 59.57M | 60.83M | 57.53M | 55.68M |
| 2024-Q1 | 57.38M | 51.27M | 62.42M | 59.60M | 57.67M | 55.91M |
| 2024-Q2 | 59.03M | 51.32M | 60.07M | 58.66M | 57.27M | 54.92M |
| 2024-Q3 | 59.17M | 51.23M | 61.08M | 56.94M | 57.10M | 51.59M |
| 2024-Q4 | 59.05M | 51.27M | 62.09M | 60.90M | 58.33M | 53.42M |
Now let's calculate the Mean Absolute Percentage Error (MAPE) for each approach:
| Model | MAPE |
|---|---|
| Ensemble | 5.51% |
| SARIMAX-Ridge | 6.83% |
| LightGBM | 6.92% |
| Prophet | 8.49% |
| Theta-GAM | 8.74% |
The ensemble—a simple average of all four models—achieved the lowest error rate, beating even the best individual model by 1.3 percentage points.
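If you want to check these numbers yourself, MAPE is a one-liner. A sketch using the Ensemble and Actual columns from the table above:

```python
import numpy as np

# Ensemble forecasts and actuals (EUR M) from the table, 2022-Q1 through 2024-Q4
ensemble = np.array([55.60, 55.22, 55.15, 56.11, 58.43, 55.69,
                     56.58, 57.53, 57.67, 57.27, 57.10, 58.33])
actual   = np.array([53.05, 52.94, 53.67, 55.41, 62.94, 61.40,
                     53.39, 55.68, 55.91, 54.92, 51.59, 53.42])

# MAPE: mean of |forecast - actual| / actual, expressed as a percentage
mape = np.mean(np.abs(ensemble - actual) / actual) * 100
print(f"Ensemble MAPE: {mape:.2f}%")  # ~5.5%; the tiny gap vs. 5.51% is table rounding
```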
Plotted against the actuals, the ensemble forecast smooths out the volatility of the individual models.
Why Does This Work?
The ensemble outperforms for a simple statistical reason: when models' errors aren't perfectly correlated, averaging makes them partially cancel out.
Consider 2022-Q4, where the actual value was €55.41M:
- LightGBM overshot by €0.64M (+1.1%)
- SARIMAX undershot by €3.96M (-7.1%)
- Theta-GAM overshot by €0.76M (+1.4%)
- Prophet overshot by €5.36M (+9.7%)
The ensemble prediction of €56.11M overshot by just €0.70M (+1.3%)—better than three of the four individual models.
This isn't magic. It's what happens when you combine predictions that err in different directions. Over 12 quarters, the models' biases partially offset each other.
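A quick simulation makes the cancellation effect visible. This is a stylized sketch: it assumes four models with independent, zero-mean errors of equal magnitude, which real models only approximate:

```python
import numpy as np

rng = np.random.default_rng(42)

# 10,000 simulated forecast rounds; 4 models with independent,
# zero-mean errors (std = 3 EUR M each)
errors = rng.normal(loc=0.0, scale=3.0, size=(10_000, 4))

# The ensemble's error is simply the mean of the individual errors
ensemble_errors = errors.mean(axis=1)

# Averaging n uncorrelated errors shrinks the error std by a factor of sqrt(n)
print(f"Mean |error|, single model: {np.abs(errors[:, 0]).mean():.2f}")  # ~2.4
print(f"Mean |error|, ensemble:     {np.abs(ensemble_errors).mean():.2f}")  # ~1.2
```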
When Ensembles Work Best
Ensemble forecasting shines when:
1. Models produce plausible but different predictions
If all four models cluster tightly around €55M, the ensemble won't add much. But if they spread from €51M to €62M, you're getting diverse perspectives that can be productively combined.
2. You don't have strong prior knowledge about which model fits best
If you know your data has strong seasonality and Prophet historically nails it, maybe trust Prophet. But if you're forecasting a new product line or entering a new market, model uncertainty is high—and ensembles hedge that uncertainty.
3. The forecast horizon is medium to long-term
Short-term forecasts (1-2 periods) may be dominated by recent momentum that one model captures well. Longer horizons involve more uncertainty where ensemble averaging pays off.
4. Accuracy matters more than interpretability
A single model's forecast is easier to explain: "Prophet detected a trend change in Q3 2021." An ensemble is harder to justify with a single narrative, but it's often more accurate.
When to Skip Ensembles
Don't blindly combine models. Ensembles can hurt when:
One model is clearly wrong
Including it in the ensemble might drag down accuracy. You might get better results ensembling only the other three.
Models are highly correlated
If you run three variations of gradient boosting, their errors will be correlated. Averaging them won't help much—you're just averaging similar mistakes.
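One way to check for this before ensembling is to correlate the models' backtest errors. A sketch using the four 2022 rows from the table above (residual = forecast minus actual):

```python
import numpy as np

# Residuals (forecast - actual, EUR M) for 2022-Q1..Q4, computed from the table;
# columns are LightGBM, SARIMAX-Ridge, Theta-GAM, Prophet
residuals = np.array([
    [1.91, -1.13, 2.99, 6.43],
    [1.82, -1.35, 3.04, 5.60],
    [1.99, -2.25, 3.05, 3.14],
    [0.64, -3.96, 0.76, 5.36],
])

# Pairwise error correlations; np.corrcoef treats rows as variables,
# so transpose to get one row per model
corr = np.corrcoef(residuals.T)
print(np.round(corr, 2))
# Pairs with correlation near +1 add little diversity to the ensemble.
# Four quarters is a tiny sample; use a longer backtest window in practice.
```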
You need prediction intervals, not just a point forecast
A simple average of point forecasts gives you a central estimate, but it discards the confidence intervals that individual models provide. If downstream inventory or financial planning systems consume uncertainty ranges rather than a single number, a single model with well-calibrated intervals may serve you better.
How to Build an Ensemble in Sanvia
Building an ensemble takes three clicks:
- Run your forecast with multiple models selected
- On the results page, click Ensemble
- Select which models to include (or use all)
Sanvia calculates a simple average across selected models for each forecast period. You can then compare the ensemble against individual models and export whichever you prefer.
For more advanced use cases, you can also:
- Weight models differently: If LightGBM historically outperforms on your data, give it 40% weight vs. 20% for others
- Exclude outlier models: Drop any model that seems systematically biased
- Ensemble subsets: Try combining just the top 2 performers
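If you want to prototype weights before committing to them, a weighted ensemble is a few lines of NumPy. A sketch; the 40/20/20/20 split below is the illustrative weighting mentioned above, not a recommendation:

```python
import numpy as np

# Point forecasts (EUR M) for 2022-Q4, in a fixed model order:
# LightGBM, SARIMAX-Ridge, Theta-GAM, Prophet (values from the table above)
forecasts = np.array([56.05, 51.45, 56.17, 60.77])

# Equal weights reproduce the simple average
equal = np.full(4, 0.25)
print(f"Simple average:   {forecasts @ equal:.2f}")    # 56.11

# Heavier weight on a historically stronger model
weights = np.array([0.40, 0.20, 0.20, 0.20])
assert np.isclose(weights.sum(), 1.0)  # weights must sum to 1
print(f"Weighted average: {forecasts @ weights:.2f}")  # 56.10
```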
Practical Recommendations
Based on our analysis and experience across hundreds of forecasts:
Start with a simple average of all models. It's robust and requires no tuning. In most cases, it will match or beat your best individual model.
Look at the spread between models. If predictions range widely (>15-20% difference between highest and lowest), the ensemble is likely adding value. If they're tightly clustered (<5%), you could just pick any one.
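That spread check is easy to automate. A sketch, using the four forecasts from the intro:

```python
import numpy as np

forecasts = np.array([57.0, 51.0, 59.0, 62.0])  # EUR M, from the opening example

# Relative spread: gap between highest and lowest, relative to the lowest
spread = (forecasts.max() - forecasts.min()) / forecasts.min()
print(f"Spread: {spread:.1%}")  # 21.6% -> wide enough that the ensemble likely adds value
```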
Review ensemble vs. individual performance regularly. Your data patterns may shift. A model that was dragging down the ensemble might start outperforming—or vice versa.
Don't ensemble when you have domain knowledge. If you know a structural change is coming (new product launch, market entry, regulatory change), individual models that can incorporate that knowledge may beat a backward-looking ensemble.
The Bottom Line
Ensemble forecasting isn't a silver bullet, but it's one of the easiest ways to improve forecast accuracy without additional data or model tuning.
In our test case, combining four models cut the error from 6.83% (the best individual model) to 5.51% (the ensemble), a 19% relative reduction in error with zero extra effort.
When your models disagree, don't agonize over which one to trust. Trust all of them.
Ready to try ensemble forecasting on your data? Start your free trial and run multiple models in minutes.