Three storm-scale ensemble prediction models (AROME-EPS, COSMO-DE-EPS and MOGREPS-UK) were combined over Western Europe to create two convection-allowing, multi-model ensemble prediction systems (EPSs), with the goal of improving ensemble spread–skill relationships and probabilistic forecasting ability. A shared, uniform grid of ∼2.2 km resolution was used, containing two regions where model domains overlapped (AROME-EPS/COSMO-DE-EPS and AROME-EPS/MOGREPS-UK). Verification was conducted over a five-month period spanning two years, using near-surface observations of wind, temperature, relative humidity, and precipitation. Bias correction was also applied to each model in order to remove systematic error and to better assess the added value of the multi-model ensembles. Analyses of EPS errors and ensemble scores are presented, including comparisons between individual EPS and multi-model scores. Verification results show that the multi-model ensembles exhibit generally lower root-mean-square error (RMSE), increased spread and improved relative operating characteristic (ROC), ROC area (ROCA) and Brier scores compared with the individual EPSs. In addition, a case study was selected to highlight discrepancies in precipitation frequency bias between the AROME-EPS and COSMO-DE-EPS models, attributable to differences in the distribution and intensity of precipitation. When combined as a multi-model EPS, the 6 h precipitation accumulation forecast for this case study matched the observations better than the individual EPS forecasts. Together, these results highlight the potential advantage of using multiple models with differing dynamics and physics parametrizations when developing a convection-allowing EPS.
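As an illustration of the verification quantities named above, the following minimal sketch computes the RMSE of the ensemble mean, the ensemble spread, and the Brier score for a pooled multi-model ensemble. It is not the procedure used in the study: pooling members with equal weight, the function names, the exceedance threshold and the toy values are all assumptions made for the example, and no bias correction or regridding is included.

```python
import math
from statistics import fmean, variance


def pool(*ensembles):
    """Concatenate the members of several EPSs into one multi-model ensemble.

    Each ensemble is a list of members; each member is a list of forecast
    values at the same verification points. Equal weighting is an assumption.
    """
    return [member for ens in ensembles for member in ens]


def rmse_of_mean(members, obs):
    """RMSE of the ensemble-mean forecast against the observations."""
    means = [fmean(vals) for vals in zip(*members)]
    return math.sqrt(fmean([(m - o) ** 2 for m, o in zip(means, obs)]))


def spread(members):
    """Square root of the mean (sample) ensemble variance over all points."""
    return math.sqrt(fmean([variance(vals) for vals in zip(*members)]))


def brier_score(members, obs, threshold):
    """Brier score for the event 'value exceeds threshold'.

    The forecast probability is the fraction of members exceeding the threshold.
    """
    n = len(members)
    probs = [sum(v > threshold for v in vals) / n for vals in zip(*members)]
    outcomes = [1.0 if o > threshold else 0.0 for o in obs]
    return fmean([(p - y) ** 2 for p, y in zip(probs, outcomes)])


# Synthetic values for two hypothetical EPSs (2 members, 2 points each).
eps_a = [[1.0, 2.0], [3.0, 4.0]]
eps_b = [[2.0, 6.0], [2.0, 0.0]]
obs = [2.0, 4.0]

multi = pool(eps_a, eps_b)
multi_rmse = rmse_of_mean(multi, obs)
multi_spread = spread(multi)
multi_bs = brier_score(multi, obs, threshold=2.5)
```

Comparing `multi_spread` with `multi_rmse` gives a simple view of the spread–skill relationship: a well-calibrated ensemble would be expected to show values of similar magnitude when averaged over many cases.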