The surface energy budgets estimated from the 0- to 12-hour forecasts of three operational model suites and the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis are analyzed at local to continental scales. The models are (1) the Eta model, (2) the Mesoscale Analysis and Prediction System (MAPS), and (3) the Global Environmental Multiscale (GEM) model. The first two are regional; the third is global, with a variable grid whose resolution over North America is equivalent to that of the regional models. This assessment of one summer month (August 1997) and one winter month (January 1998) aims to estimate the reliability of the surface energy budgets within the context of the Global Energy and Water Cycle Experiment (GEWEX) Continental-Scale International Project (GCIP) goals. Satellite estimates were used to evaluate the downward shortwave radiation at the surface, while measurements from the southern Great Plains region were used to evaluate the model-computed surface energy budget estimates. The results show that the surface shortwave radiation biases of the models are widespread and of the order of 25-50 W m⁻², and that averaging over larger areas does not reduce them. These biases are compensated by the other longwave and shortwave radiation terms, so that the resulting errors in the net radiation are smaller. During August 1997, continental east-west gradients of latent heat flux and Bowen ratio were surprisingly dissimilar among models. Still, the Bowen ratio estimated from the Eta and GEM models was close to observations over the southern Great Plains region, while both the Reanalysis and MAPS had ratios at least twice the observed ones. In the case of MAPS, a revised latent heat flux formulation was introduced in fall 1997, and subsequently, for January 1998, its estimates were closer to those of the other models.
However, during January 1998 all models had difficulty reproducing the observed Bowen ratios. Further, daily time series showed that the models' estimates also tended to miss the amplitude of the day-to-day variability. This may result from difficulties in parameterizing the total cloud cover; in particular, the modeled attenuation by clouds may still be insufficient.
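
For readers less familiar with the quantities compared above, the following is a minimal sketch of how net radiation and the Bowen ratio are conventionally computed from the surface flux components. The function names and all numerical values are illustrative assumptions chosen for this example; they are not taken from the models or observations discussed here.

```python
def net_radiation(sw_down, sw_up, lw_down, lw_up):
    """Net surface radiation (W m^-2): net shortwave plus net longwave."""
    return (sw_down - sw_up) + (lw_down - lw_up)


def bowen_ratio(sensible_heat, latent_heat):
    """Bowen ratio: sensible heat flux divided by latent heat flux."""
    if latent_heat == 0:
        raise ValueError("Bowen ratio is undefined for zero latent heat flux")
    return sensible_heat / latent_heat


# Hypothetical values (W m^-2), for illustration only.
# The "model" overestimates downward shortwave radiation by 40 W m^-2,
# but a larger reflected shortwave term offsets part of that bias.
observed_net = net_radiation(sw_down=250, sw_up=50, lw_down=380, lw_up=440)
modeled_net = net_radiation(sw_down=290, sw_up=58, lw_down=380, lw_up=440)

shortwave_bias = 290 - 250                      # bias in downward shortwave
net_radiation_bias = modeled_net - observed_net  # smaller than shortwave_bias

# Hypothetical turbulent fluxes (W m^-2): sensible 60, latent 120.
example_bowen = bowen_ratio(sensible_heat=60, latent_heat=120)
```

In this made-up case the bias in net radiation is smaller than the bias in downward shortwave radiation, which is the kind of compensation among radiation terms described above.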