Why Your Air Quality Forecast Might Be Less Reliable Than You Think
Air quality forecasting with AI models is often hyped, but a new analysis reveals potential missteps in how those models are evaluated. Are practitioners relying on flawed methods?
Air quality forecasting has been touted as a success story for machine learning adoption. But a closer look at the numbers suggests those claims might be overstated. What’s going on beneath the surface of those AI-driven predictions?
The Hidden Flaw in Static Evaluations
Many studies highlight the potential of machine-learning models like XGBoost for predicting daily air pollution levels. However, they often rely on a single static chronological split and ignore simple persistence baselines, which just carry the most recent observation forward as the forecast. That means their practical usefulness could be less than advertised.
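To make that baseline concrete, here is a minimal sketch of a persistence forecast; the function name and the use of pandas are illustrative assumptions, not details taken from the study.

```python
import numpy as np
import pandas as pd

def persistence_forecast(train: pd.Series, horizon: int) -> np.ndarray:
    """Naive persistence baseline: every day in the forecast horizon is
    predicted to equal the last observed value in the training series."""
    return np.repeat(train.iloc[-1], horizon)
```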
In a study with 2,350 daily PM10 observations from 2017 to 2024 in southern Europe, researchers put XGBoost up against SARIMA and persistence models. Under a static chronological split, XGBoost seemed to shine, especially at horizons up to a week ahead. But when the evaluation was switched to a rolling-origin protocol with monthly model updates, those findings flipped: XGBoost was no longer consistently better than plain persistence, while SARIMA held its ground across the board.
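A rolling-origin protocol with monthly updates can be sketched roughly as below. This is not the study's exact setup: the initial window length, helper names, and pooling of errors into a single MAE are assumptions for illustration, and the persistence_forecast helper from the sketch above serves as the baseline.

```python
import numpy as np
import pandas as pd

def rolling_origin_mae(series: pd.Series, fit_and_forecast,
                       horizon: int = 7, initial_years: int = 2,
                       freq: str = "MS") -> float:
    """Rolling-origin evaluation: at each monthly origin, re-fit on all data
    up to the origin, forecast the next `horizon` days, and pool the absolute
    errors into a single MAE. Assumes a gap-free daily DatetimeIndex."""
    start = series.index[0] + pd.DateOffset(years=initial_years)
    end = series.index[-1] - pd.Timedelta(days=horizon)
    errors = []
    for origin in pd.date_range(start, end, freq=freq):
        train = series.loc[:origin]                       # everything observed so far
        actual = series.loc[origin:].iloc[1:horizon + 1]  # the next `horizon` days
        preds = fit_and_forecast(train, horizon)[:len(actual)]
        errors.append(np.abs(actual.to_numpy() - preds))
    return float(np.mean(np.concatenate(errors)))

# The same loop scores any model: pass a callable that fits on `train` and
# returns a `horizon`-step forecast (persistence, SARIMA, XGBoost, ...).
# Example usage, assuming `pm10` is a daily pd.Series of observations:
#   baseline_mae = rolling_origin_mae(pm10, persistence_forecast)
```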
Why Rolling-Origin Matters
The real story here is how much the evaluation method matters. Static splits can paint an overly rosy picture of a model's operational value, potentially misleading researchers and practitioners alike. In contrast, the rolling-origin method, which keeps re-fitting models and issuing forecasts from successive points in time, offers a much clearer view of which models hold up as new data arrive.
Consider this: if you're a practitioner relying on AI forecasts, do you want to bet your air quality strategies on models that crumble under more realistic conditions? The gap between the keynote and the cubicle is enormous, and sometimes, it's just easier to stick with what works, like persistence.
The Bigger Picture
Researchers should take note: static evaluations might give you quick wins for a conference presentation, but they don't necessarily translate to real-world reliability. Practitioners need models that adapt and stay solid as conditions change, and that’s where rolling-origin evaluations come in.
In an era where AI promises to revolutionize every industry, it's essential to scrutinize how these technologies are evaluated. Otherwise, we risk investing in solutions that look great on paper but falter on the ground.