Traditional statistical measures used to evaluate precipitation forecast skill are affected by variations in the scale of the features in both the forecast and verification fields. This scale dependence complicates the comparison of precipitation fields that contain differing degrees of small-scale detail and is especially important for warm-season precipitation, which is dominated by convective storms. These storms produce precipitation patterns with significant small-scale variability that are extremely difficult to predict accurately. With the ever-increasing resolution of numerical models, forecast precipitation fields with a similarly high degree of small-scale detail can now be generated. Frequently, traditional scores (such as the equitable threat score) are worse for these highly detailed forecasts than for forecast fields with less small-scale detail, because the detailed forecasts often produce near misses for precipitation maxima even when they quite accurately depict the overall character of the precipitation. The degree to which small-scale detail should be retained in mesoscale models (and more sophisticated techniques used to verify them) is an important question confronting the mesoscale modeling community. In this study, we quantitatively document the scale sensitivity of precipitation skill scores for four numerical model formulations run during IHOP: the operational 12-km Eta, the operational 20-km RUC, an experimental 10-km RUC, and an experimental 12-km LAPS/MM5. Comparisons of the equitable threat score (ETS) and bias are made for each model (verified against Stage IV precipitation data) on its native grid and on systematically coarsened grids. By systematically upscaling the higher-resolution forecasts to coarser grids, we isolate the impact on the skill scores due solely to smoothing the forecast and verification fields.
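The ETS and bias have standard contingency-table definitions, and upscaling is commonly done by block-averaging onto a coarser grid. The sketch below (our own illustration in Python/NumPy, not code from the study; the function names and the simple non-overlapping block averager are assumptions) shows one way these quantities could be computed:

```python
import numpy as np

def contingency(fcst, obs, thresh):
    """Yes/no contingency counts for exceedance of a precipitation threshold."""
    f = fcst >= thresh
    o = obs >= thresh
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    return hits, misses, false_alarms, fcst.size

def ets(hits, misses, false_alarms, n):
    """Equitable threat score: threat score corrected for random-chance hits."""
    hits_rand = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_rand
    return (hits - hits_rand) / denom if denom != 0 else np.nan

def bias(hits, misses, false_alarms):
    """Frequency bias: forecast yes-area divided by observed yes-area."""
    return (hits + false_alarms) / (hits + misses)

def upscale(field, factor):
    """Block-average a 2-D field onto a grid `factor` times coarser
    (edges that do not fill a full block are trimmed)."""
    ny, nx = field.shape
    ny2, nx2 = ny // factor, nx // factor
    trimmed = field[:ny2 * factor, :nx2 * factor]
    return trimmed.reshape(ny2, factor, nx2, factor).mean(axis=(1, 3))
```

Scoring both fields on their native grid and again after passing each through `upscale` with the same factor is the pattern of the first set of experiments described below.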
This comparison of traditional skill scores is complemented by spectral analyses of the various forecast and verification fields. In the first set of experiments, both the forecast and verification fields are upscaled, allowing us to assess the scale impacts on comparisons of forecasts with significantly different spectra. In the second set of experiments, only the forecast fields are smoothed, allowing us to evaluate the usefulness of enhanced precipitation detail as reflected in the traditional skill scores. The focus of the work so far is on the first set of experiments, in which both the forecast and verification precipitation fields are systematically coarsened. For these experiments, we document the dependence of the skill scores on the spectral characteristics and bias of the precipitation field, thereby confirming the significant scale impacts on precipitation skill scores. Work continues on the second set of experiments, in which only the forecast fields are smoothed. Overall, our results support earlier research suggesting that it may be difficult to show improvement in ETSs for models with increasingly fine resolution.
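A common way to summarize the spectral characteristics of a 2-D precipitation field is an azimuthally averaged power spectrum. The following minimal sketch (again our own illustration, not the study's analysis code; the binning scheme is an assumption) shows the idea:

```python
import numpy as np

def radial_power_spectrum(field):
    """Azimuthally averaged power spectrum of a 2-D field.

    Returns (wavenumbers, spectrum): |FFT|^2 averaged in annular bins
    of radial wavenumber, a simple proxy for how variance is
    distributed across spatial scales.
    """
    ny, nx = field.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ky = np.fft.fftshift(np.fft.fftfreq(ny))
    kx = np.fft.fftshift(np.fft.fftfreq(nx))
    KX, KY = np.meshgrid(kx, ky)
    k = np.sqrt(KX ** 2 + KY ** 2)          # radial wavenumber at each point
    kbins = np.linspace(0.0, k.max(), min(ny, nx) // 2)
    which = np.digitize(k.ravel(), kbins)   # annular bin index per point
    spec = np.array([power.ravel()[which == i].mean()
                     if np.any(which == i) else 0.0
                     for i in range(1, len(kbins))])
    return kbins[1:], spec
```

Comparing such spectra for the forecast and Stage IV fields, before and after coarsening, makes explicit how much small-scale variance each model carries relative to the observations.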