Several air quality forecasting ensembles were created from seven models run in real time during the 2006 Texas Air Quality (TEXAQS-II) experiment. These multi-model ensembles incorporated a diverse set of meteorological models, chemical mechanisms, and emission inventories. Individual model and ensemble forecasts of surface ozone and particulate matter (PM) were evaluated using data from 119 EPA AIRNow ozone sites and 38 PM sites during a 50-day period in August and September of 2006. From the original set of models, two bias-corrected model data sets were built, one by applying a running mean over the past 7 days of data and the other by a Kalman filter approach. From the original and the two bias-corrected data sets, three ensembles were created by simple averaging of the seven models. For further improvement, three additional weighted ensembles were created, in which the individual model weights were calculated using the singular value decomposition method. All six ensembles are compared to the individual models and to each other in terms of root mean square error, correlation, and contingency and probabilistic statistics. In most cases, each of the ensembles shows improved skill compared to the best of the individual models. The overall best ensemble technique was found to be the combination of Kalman filtering and weighted averaging. The PM2.5 aerosol ensembles demonstrated significant improvement, mostly because the skill of the original models was very low.
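As an illustration only, the running-mean bias correction and the SVD-based weighting described above might be sketched as follows. The function names, the synthetic data, the 25-day training split, and the choice to fit the weights by least squares against observations are all assumptions made for this sketch; the abstract does not specify these details, and the Kalman filter variant is not shown.

```python
import numpy as np


def running_mean_bias_correct(forecast, obs, window=7):
    """Subtract the mean forecast error of the previous `window` days.

    Hypothetical helper: day 0 is left uncorrected because no history exists,
    and early days use however many past days are available.
    """
    forecast = np.asarray(forecast, dtype=float)
    obs = np.asarray(obs, dtype=float)
    corrected = forecast.copy()
    for t in range(1, len(forecast)):
        start = max(0, t - window)
        bias = np.mean(forecast[start:t] - obs[start:t])
        corrected[t] = forecast[t] - bias
    return corrected


def svd_weights(forecasts, obs):
    """Least-squares model weights w minimizing ||forecasts @ w - obs||.

    Solved through the singular value decomposition, dropping near-zero
    singular values (assumed formulation, not taken from the paper).
    """
    forecasts = np.asarray(forecasts, dtype=float)
    obs = np.asarray(obs, dtype=float)
    u, s, vt = np.linalg.svd(forecasts, full_matrices=False)
    keep = s > 1e-10 * s.max()
    return vt[keep].T @ ((u[:, keep].T @ obs) / s[keep])


def rmse(a, b):
    """Root mean square error between two series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic stand-in for 7 models over 50 days (not real TEXAQS-II data).
rng = np.random.default_rng(0)
n_days, n_models = 50, 7
truth = 50.0 + 10.0 * np.sin(np.arange(n_days) / 5.0)
model_bias = rng.uniform(-15.0, 15.0, n_models)
raw = truth[:, None] + model_bias + rng.normal(0.0, 5.0, (n_days, n_models))

# Bias-correct each model separately, then build two ensembles.
corrected = np.column_stack(
    [running_mean_bias_correct(raw[:, m], truth) for m in range(n_models)]
)
train, test = slice(0, 25), slice(25, n_days)  # assumed split
weights = svd_weights(corrected[train], truth[train])
mean_ensemble = corrected.mean(axis=1)
weighted_ensemble = corrected @ weights

print("RMSE, simple mean:  ", rmse(mean_ensemble[test], truth[test]))
print("RMSE, SVD-weighted: ", rmse(weighted_ensemble[test], truth[test]))
```

Fitting the weights on one part of the record and scoring on another keeps the comparison between the simple and weighted ensembles from being inflated by in-sample fitting.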