Originally posted on Mar 10, 2014 at 1:56 PM at Climate Audit
This new Nature Climate Change paper by Drew Shindell claims that the lowest end of transient climate response (TCR) – below 1.3°C – in CMIP5 models is very unlikely, and that this suggests the lowest end of model equilibrium climate sensitivity estimates – modestly above 2°C – is also unlikely. The reason is that CMIP5 models display substantially greater transient climate sensitivity to forcing from aerosols and ozone than to forcing from CO2. Allowing for this, Shindell estimates that TCR is 1.7°C, very close to the CMIP5 multimodel mean of ~1.8°C. Accordingly, he sees no reason to doubt the models. In this connection, I would note (without criticising it) that Drew Shindell is arguing against the findings of the Otto et al (2013) study, of which he and I were two of the authors.
As with most papers by establishment climate scientists, no data or computer code appears to be archived in relation to the paper. Nor are the six models/model-averages shown on the graphs identified there. However, useful model-by-model information is given in the Supplementary Information. I was rather surprised that the first piece of data I looked at – the WM-GHG (well-mixed greenhouse gas) global forcing for the average of the MIROC, MRI and NorESM climate models, in Table S2 – is given as 1.91 W/m², when the three individual model values obviously don’t average that. They actually average 2.05 W/m². Whether this is a simple typo or an error affecting the analysis I cannot tell, but the apparent lack of care it shows reinforces the view that little confidence should be placed in studies that do not archive data and full computer code – and so cannot be properly checked.
The extensive adjustments made by Shindell to the data he uses are a source of concern. One of those adjustments is to add +0.3 W/m² to the figures used for model aerosol forcing to bring the estimated model aerosol forcing into line with the AR5 best estimate of -0.9 W/m². He notes that the study’s main results are very sensitive to the magnitude of this adjustment. If it were removed, the estimated mean TCR would increase by 0.7°C. If it were increased by 0.15 W/m², presumably the mean TCR estimate of 1.7°C would fall to 1.35°C – in line with the Otto et al (2013) estimate. Now, so far as I know, model aerosol forcing values are generally for the change from the 1850s, or thereabouts, to ~2000, not – as is the AR5 estimate – for the change from 1750. Since the AR5 aerosol forcing best estimate for the 1850s was -0.19 W/m², the adjustment required to bring the aerosol forcing estimates for the models into line with the AR5 best estimate is ~0.49 W/m², not ~0.3 W/m². On the face of it, using that adjustment would bring Shindell’s TCR estimate down to around 1.26°C.
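The arithmetic behind these figures can be reproduced with a short sketch. It assumes, as the "presumably" above implies, that the TCR estimate responds linearly to the size of the aerosol forcing adjustment. The function name and constants are illustrative only and are not taken from Shindell's paper or its Supplementary Information.

```python
# Linear extrapolation of the TCR estimate against the aerosol forcing adjustment.
# Assumption (not from the paper): the response is linear, so the quoted 0.7 C
# rise on removing the 0.3 W/m^2 adjustment fixes the slope.

BASE_TCR = 1.7          # C, Shindell's estimate with the 0.3 W/m^2 adjustment
BASE_ADJUSTMENT = 0.3   # W/m^2, adjustment used by Shindell
SLOPE = 0.7 / 0.3       # C of TCR per W/m^2 of adjustment


def tcr_for_adjustment(adjustment):
    """Return the TCR (C) implied by a given aerosol adjustment (W/m^2)."""
    return BASE_TCR - SLOPE * (adjustment - BASE_ADJUSTMENT)


if __name__ == "__main__":
    for adj in (0.0, 0.3, 0.45, 0.49):
        print(f"adjustment {adj:.2f} W/m^2 -> TCR {tcr_for_adjustment(adj):.2f} C")
```

Under that linearity assumption the four cases give 2.40, 1.70, 1.35 and 1.26°C respectively, matching the figures quoted above.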
Additionally, the estimates of aerosol forcing in the models that Shindell uses to derive the 0.3 W/m² adjustment are themselves quite uncertain. He gives a figure of -0.98 W/m² for the NorESM1‑M model, but the estimate by the modelling team appears to be -1.29 W/m². Likewise, Shindell’s figure of -1.44 W/m² for the GFDL-CM3 model appears to be contradicted by the estimate of -1.59 W/m² (or -1.68 W/m², depending on version) by the team involved with the model’s development. Substituting these two estimates for those used by Shindell would bring his TCR estimate down even further.
In any event, since the AR5 uncertainty range for aerosol forcing is very wide (5–95% range: -1.9 to -0.1 W/m²), the sensitivity of Shindell’s TCR estimate to the aerosol forcing bias adjustment is such that the true uncertainty of Shindell’s TCR range must be huge – so large as to make his estimate worthless.
I’ll set aside further consideration of the detailed methodology Shindell used and the adjustments and assumptions he made. In the rest of this analysis I deal with the extent to which the model simulations used by Shindell can be regarded as providing reliable information about how the real climate system responds to forcing from aerosols, ozone and other forcing components.
First, it is generally accepted that global forcing from aerosols has changed little over the well-observed period since 1980, and most of the uncertainty in aerosol forcing relates to changes from preindustrial (1750) to 1980. So, if TCR values in CMIP5 models are on average correct, as Shindell claims, one would expect global warming simulated by those models to be, on average, in line with reality. But as Steve McIntyre showed (see the source given in the Figure 1 caption), that is far from being the case. On average, CMIP5 models overestimate the warming trend between 1979 and 2013 by 50%. See Figure 1, below.
Figure 1: Modelled versus observed decadal global surface temperature trend 1979–2013
Temperature trends in °C/decade. Virtually all model climates warmed much faster than the real climate over the last 35 years. Source: https://climateaudit.org/2013/09/24/two-minutes-to-midnight/. Models with multiple runs have separate boxplots; models with single runs are grouped together in the boxplot marked ‘singleton’. The orange boxplot at the right combines all model runs together. The default settings in the R boxplot function have been used; the ends of the boxes represent the 25th and 75th percentiles. The red dotted line shows the actual trend in global surface temperature over the same period per the HadCRUT4 observational dataset. The 1979–2013 observed global temperature trends from the three datasets used in AR5 are very similar; the HadCRUT4 trend shown is the middle of the three.
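For readers who want to see how a trend in °C/decade and a percentage overestimate are defined, here is a minimal sketch. The two series are synthetic straight lines with round-number slopes chosen purely for illustration. They are not HadCRUT4 or CMIP5 data, and the function names are hypothetical.

```python
# Least-squares linear trend of an annual series, expressed per decade.
# The series below are synthetic (exact straight lines), NOT observational
# or model data; they only illustrate how the quantities are defined.


def trend_per_decade(years, values):
    """Ordinary least squares slope of values against years, times ten."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx


def overestimate_percent(model_trend, observed_trend):
    """Percentage by which a modelled trend exceeds the observed trend."""
    return (model_trend / observed_trend - 1.0) * 100.0


years = list(range(1979, 2014))                   # 1979 to 2013 inclusive
observed = [0.02 * (y - 1979) for y in years]     # hypothetical anomalies, C
modelled = [0.03 * (y - 1979) for y in years]     # hypothetical anomalies, C

obs_trend = trend_per_decade(years, observed)     # C per decade
mod_trend = trend_per_decade(years, modelled)     # C per decade
print(f"overestimate: {overestimate_percent(mod_trend, obs_trend):.0f}%")
```

With these made-up slopes the modelled trend exceeds the observed one by 50%, the same ratio as the average CMIP5 overestimate quoted above.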
Secondly, the paper relies on the simulation of the response of the CMIP5 models to aerosol, ozone and land use changes being realistic, and not overstated. Those components dominate the change in total non-greenhouse gas anthropogenic forcing over the 1850–2000 period considered in the paper. Aerosol forcing changes are most important by a wide margin, and land use changes (which Shindell excludes in some analyses) are of relatively little significance.
For its flagship 90% and 95% certainty attribution statements, AR5 relies on the ‘gold standard’ of detection and attribution studies. In order to separate out the effects of greenhouse gases (GHG), these analyses typically regress time series of many observational variables – including latitudinally and/or otherwise spatially distinguished surface temperatures – on model-simulated changes arising not only from separate greenhouse gas and natural forcings but also from other separate non-GHG anthropogenic forcings. The resulting regression coefficients – ‘scaling factors’ – indicate to what extent the changes simulated by the model(s) concerned have to be scaled up or down to match observations. There is a large literature on this approach and the associated statistical optimal fingerprint methodology. The IPCC, and the climate science community as a whole, evidently considers this observationally-based-scaling approach to be a more robust way of identifying the influence of aerosols and other inhomogeneous forcings than the almost purely climate-model-simulations-based approach used by Shindell. I agree with that view.
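The idea of a scaling factor can be illustrated with a minimal sketch. It uses ordinary least squares on made-up numbers, not the noise-weighted optimal fingerprint methods used in the studies AR5 relies on, and every name and value in it is hypothetical. Synthetic "observations" are regressed on three simulated response series; a coefficient below one means the simulated response to that forcing has to be scaled down to match the observations.

```python
# Ordinary least squares sketch of detection-and-attribution scaling factors.
# All series are synthetic and purely illustrative; real studies use
# noise-weighted (optimal fingerprint) regression on many variables.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def scaling_factors(observed, simulated):
    """Regress observed on the simulated responses (no intercept).

    simulated is a list of series, one per forcing component. Returns one
    scaling factor per component, found from the normal equations.
    """
    k = len(simulated)
    xtx = [[sum(p * q for p, q in zip(simulated[i], simulated[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * y for p, y in zip(simulated[i], observed))
           for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical model-simulated responses (C) to each forcing component.
ghg = [0.1, 0.2, 0.35, 0.5, 0.7, 0.95]
other_anthro = [-0.05, -0.15, -0.3, -0.35, -0.4, -0.4]
natural = [0.05, -0.1, 0.0, 0.1, -0.05, 0.05]

# Synthetic "observations" built from known scaling factors with no noise,
# so the regression should recover those factors.
true_beta = [0.9, 0.6, 1.0]
observed = [true_beta[0] * g + true_beta[1] * o + true_beta[2] * n
            for g, o, n in zip(ghg, other_anthro, natural)]

beta = scaling_factors(observed, [ghg, other_anthro, natural])
print([round(b, 3) for b in beta])
```

Because the synthetic observations contain no noise, the regression recovers the factors used to build them. In real analyses, internal variability and observational error are what produce the wide confidence intervals on the estimated scaling factors.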
Figure 10.4 of AR5, reproduced as Figure 2 below, shows in panel (b) estimated scaling factors for three forcing components: natural (blue bars), GHG (green bars) and ‘other anthropogenic’ – largely aerosols, ozone and land use change (yellow bars). The bars show 5–95% confidence intervals from separate studies based on 1861–2010, 1901–2010 and 1951–2010 periods. Best estimates from the studies using those three periods are shown respectively by triangles, squares and diamonds. Previous research (Gillett et al, 2012) has shown that scaling factors based on a 1901 start date are more sensitive to end date than those starting in the middle of the 19th century, with temperatures in the first two decades of the 20th century having been anomalously low, so the 1861–2010 estimates are probably more reliable than the 1901–2010 ones.
Multimodel estimates are given for the 1861–2010 and 1951–2010 periods (“multi”, at the top of the figure). The best estimate scaling factors for ‘other anthropogenic’ over those periods are respectively 0.58 and 0.61. The consistency of the two best estimates is encouraging, suggesting that the choice between these two periods does not greatly affect results. The average of the two scaling factors implies that the CMIP5 models analysed on average exaggerate the response to aerosols, ozone and other non-greenhouse gas anthropogenic forcings by almost 70%. However, the ‘other anthropogenic’ scaling factors for both periods have wide ranges. The possibility that the true scaling factor is zero is not ruled out at a 95% confidence level (although zero is almost ruled out using 1951–2010 data alone).
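The step from a scaling factor to a percentage exaggeration is a simple reciprocal, sketched below. The function name is hypothetical; the two input values are the multimodel best estimates quoted above.

```python
# Convert a regression scaling factor into the implied model exaggeration.
# A scaling factor s means observations match s times the modelled response,
# so the modelled response is 1/s times what the observations support.


def exaggeration_percent(scaling_factor):
    """Percentage by which the modelled response exceeds the scaled one."""
    if scaling_factor <= 0:
        raise ValueError("undefined for non-positive scaling factors")
    return (1.0 / scaling_factor - 1.0) * 100.0


multi_1861_2010 = 0.58   # multimodel best estimate, 1861-2010
multi_1951_2010 = 0.61   # multimodel best estimate, 1951-2010
mean_factor = (multi_1861_2010 + multi_1951_2010) / 2

print(f"mean scaling factor {mean_factor:.3f} -> "
      f"{exaggeration_percent(mean_factor):.0f}% exaggeration")
```

The mean scaling factor of 0.595 corresponds to an exaggeration of about 68%, which is the "almost 70%" stated above.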
Figure 2: Reproduction of Figure 10.4 of IPCC AR5 WGI report
The individual results for models used by Shindell are of particular interest.
The first of the five individual CMIP5 models included in Shindell’s analysis, CanESM2, shows negative scaling factors for ‘other anthropogenic’ over all three periods – strongly negative over 1901–2010. The best estimates for its GHG scaling factor are also far below one over both 1861–2010 and 1951–2010. So it would be inappropriate to place any weight on simulations by this model. In Figure 1, it is CanESM2 that shows the greatest overestimate of 1979–2013 warming.
The second CMIP5 model in Shindell’s analysis, CSIRO-Mk3-6-0, shows completely unconstrained scaling factors using 1901–2010 data, and extremely high scaling factors for both GHG and ‘other anthropogenic’ over both 1861–2010 and 1951–2010 – so much so that the GHG scaling factor is inconsistent with unity at better than 95% confidence for the longer period, and at almost 95% for the shorter period. This indicates that the model should be rejected as a representation of the real world, and no confidence put in its simulated responses to aerosols, ozone or any other forcings.
The third of Shindell’s models, GFDL-CM3, is not included in AR5 Figure 10.4.
The fourth of Shindell’s models, HadGEM2, shows scaling factors for ‘other anthropogenic’ averaging 0.44, with all but the 1901–2010 analyses being inconsistent with unity at a 95% confidence level. The best defined scaling factor, using 1861–2010 data, is only 0.31, with a 95% bound of 0.58. So HadGEM2 appears to have a vastly exaggerated response to aerosol, ozone etc. forcing.
The fifth and last of Shindell’s separate models, IPSL-CM5A-LR, is included in Figure 10.4 in respect of 1861–2010 and 1901–2010. The scaling factors using 1861–2010 data are much the better defined. They are inconsistent with unity for all three forcing components, as are those over 1901–2010 for natural and GHG components. That indicates no confidence should be put in the model as a representation of the real climate system. The best estimate scaling factor for ‘other anthropogenic’ for the 1861–2010 period is 0.49, indicating that the model exaggerates the response to aerosols, ozone etc. by a factor of two.
Shindell also includes the average of the MIROC-CHEM, MRI-CGCM3 and NorESM1-M models. Only one of those, NorESM1-M, is included in AR5 Figure 10.4.
To summarise, four out of six models/model-averages used by Shindell are included in the detection and attribution analyses whose results are summarised in AR5 Figure 10.4. Leaving aside the generally less well constrained results using the 1901–2010 period that started with two anomalously cold decades, none of these show scaling factors for ‘other anthropogenic’ – predominantly aerosol and to a lesser extent ozone, with minor contributions from land use and other factors – that are consistent with unity at a 95% confidence level. In a nutshell, these models at least do not realistically simulate the response of surface temperatures and other variables to these factors.
A recent open-access paper in GRL by Chylek et al (2014) throws further light on the behaviour of three of the models used by Shindell. The authors conclude from an inverse structural analysis that the CanESM2, GFDL-CM3 and HadGEM2-ES models all strongly overestimate GHG warming and compensate by a very strongly overestimated aerosol cooling, which simulates AMO-like behaviour with the correct timing – something that would not occur if the models were generating true AMO behaviour from natural internal variability. Interestingly, the paper also estimates that only about two-thirds of the post-1975 global warming is due to anthropogenic effects, with the other one-third being due to the positive phase of the AMO.
In the light of the analyses of the characteristics of the models used in Shindell’s analysis, as outlined above, combined with the evidence that Shindell’s aerosol forcing bias-adjustment is very likely understated and that his results’ sensitivity to it makes his TCR estimate far more uncertain than claimed, it is difficult to see that any weight should be put on Shindell’s findings.
Otto, A., F. E. L. Otto, O. Boucher, J. Church, G. Hegerl, P. M. Forster, N. P. Gillett, J. Gregory, G. C. Johnson, R. Knutti, N. Lewis, U. Lohmann, J. Marotzke, G. Myhre, D. Shindell, B. Stevens and M. R. Allen, 2013: Energy budget constraints on climate response. Nature Geosci., 6, 415–416.
Kirkevag, A., et al, 2013: Aerosol-climate interactions in the Norwegian Earth System Model-NorESM1-M. GMD.
Salzmann, M., et al, 2010: Two-moment bulk stratiform cloud microphysics in the GFDL AM3 GCM: description, evaluation, and sensitivity tests. ACP.
Gillett, N. P., V. K. Arora, G. M. Flato, J. F. Scinocca and K. von Salzen, 2012: Improved constraints on 21st-century warming derived using 160 years of temperature observations. Geophys. Res. Lett., 39, L01704, doi:10.1029/2011GL050226.
Chylek, P., et al, 2014: The Atlantic Multidecadal Oscillation as a dominant factor of oceanic influence on climate. GRL.