West Nile virus (WNV) is the leading cause of mosquito-borne illness in the continental United States (CONUS). Spatial heterogeneity in historical incidence, environmental factors, and complex ecology make prediction of spatiotemporal variation in WNV transmission challenging. Machine learning provides promising tools for identifying important variables in such situations. To predict annual WNV neuroinvasive disease (WNND) cases in CONUS (2015–2021), we fitted 10 probabilistic models ranging in complexity from naïve to machine learning algorithms, and an ensemble. We made predictions in each of nine climate regions on a hexagonal grid and evaluated each model's predictive accuracy. Using the machine learning models (random forest and neural network), we identified the relative importance of predictors (historical WNND cases, climate anomalies, human demographics, and land use) and how their ranking varied across regions. We found that historical WNND cases and population density were among the most important factors, while anomalies in temperature and precipitation often had relatively low importance. Although the relative performance of each model varied across climate regions, the magnitude of the difference between models was small. All models except the naïve model had non-significant differences in performance relative to the baseline model (a negative binomial model fit per hexagon). No model, including the ensemble and the more complex machine learning models, outperformed models based on historical case counts at the hexagon or region level; these models are good forecasting benchmarks. Further work is needed to assess whether predictive capacity can be improved beyond that of these historical baselines.
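To make the idea of a historical-count baseline concrete, the following is a minimal sketch of a negative binomial model fit separately to each hexagon's past annual counts. It is an illustration under stated assumptions, not the authors' implementation: the method-of-moments fit, the Poisson fallback when counts are not overdispersed, the function names, the hexagon identifiers, and the case counts are all hypothetical.

```python
import math
from statistics import mean, variance


def fit_negative_binomial(counts):
    """Method-of-moments fit (an assumption; the source does not state the estimator).

    Returns (mu, size). size = inf signals a Poisson fallback, used when the
    sample variance does not exceed the mean.
    """
    mu = mean(counts)
    var = variance(counts) if len(counts) > 1 else 0.0
    if var <= mu:
        return mu, math.inf
    size = mu * mu / (var - mu)  # dispersion parameter: var = mu + mu^2 / size
    return mu, size


def nb_pmf(k, mu, size):
    """Probability of observing k cases under the fitted distribution."""
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    if math.isinf(size):  # Poisson fallback
        return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    p = size / (size + mu)
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1 - p)
    )
    return math.exp(log_pmf)


def predictive_quantile(q, mu, size, max_count=100_000):
    """Smallest count k whose cumulative probability reaches q."""
    cumulative = 0.0
    for k in range(max_count + 1):
        cumulative += nb_pmf(k, mu, size)
        if cumulative >= q:
            return k
    return max_count


# Hypothetical annual WNND counts per hexagon (illustrative values only).
history = {
    "hex_A": [0, 2, 1, 5, 0, 3, 1],
    "hex_B": [0, 0, 0, 0, 0, 0, 0],
}

# One probabilistic forecast per hexagon, summarized by three quantiles.
forecasts = {}
for hex_id, counts in history.items():
    mu, size = fit_negative_binomial(counts)
    forecasts[hex_id] = {
        q: predictive_quantile(q, mu, size) for q in (0.05, 0.5, 0.95)
    }
```

Each entry of `forecasts` summarizes a predictive distribution by a few quantiles, which is one common way to represent a probabilistic forecast; a hexagon with no historical cases collapses to a point forecast of zero under this sketch.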
Key Points

Historical case counts and population density were the most important variables for predicting West Nile neuroinvasive disease cases

No model, including machine learning models, significantly improved prediction beyond that of historical patterns in cases

We found variability in the relative importance of climatic anomalies across models and regions, with generally low importance

Plain Language Summary

West Nile virus (WNV) is spread by mosquitoes and can cause a severe disease in humans when it enters the central nervous system (i.e., neuroinvasive disease). Predicting the number of these neuroinvasive disease cases that will occur each year is important for planning timely public health responses. However, prediction is difficult given the complexity of WNV transmission. For example, temperature and precipitation impact mosquito populations, viral transmission rates, and interactions of birds and mosquitoes. To identify the most important factors, we built several probabilistic models, ranging from simple to more complex, to predict annual numbers of WNV neuroinvasive cases. Overall, all our models, except for the simple model, had similar accuracy. Historical trends in case counts were highly predictive of the number of observed cases. In contrast, anomalies in temperature and precipitation generally had low impact on predicting case counts. Models based on historical case counts can be used for future prediction of WNV. Further work needs to be done to assess if we can improve prediction beyond this or if we have reached the upper limit on predictive capacity for WNV.