Statistical errors in the Forest 2006 climate sensitivity study

Originally a guest post on Nov 8, 2012 at 8:03 AM at Climate Audit

First, my thanks to Steve for providing this platform. Some of you will know of me as a co-author of the O’Donnell, Lewis, McIntyre and Condon paper on an improved temperature reconstruction for Antarctica. Since then I have mainly been investigating studies of equilibrium climate sensitivity (S) and related issues, since climate sensitivity lies at the centre of the AGW/dangerous climate change thesis. (The equilibrium referred to is that of the ocean – it doesn’t include very slow changes in polar ice sheets, etc.) Obviously, the upper tail of the estimated distribution for S is important, not just its central value.

People convinced as to the accuracy of AO-GCM (Atmosphere Ocean General Circulation Model) simulations may believe that these provide acceptable estimates of S, but even the IPCC does not deny the importance of observational evidence. The most popular observationally-constrained method of estimating climate sensitivity involves comparing data whose relation to S is too complex to permit direct estimation, such as temperatures over a spatio-temporal grid, with simulations thereof by a simplified climate model that has adjustable parameters for setting S and other key climate properties. The model is run at many different parameter combinations; the likelihood of each being the true combination is then assessed from the resulting discrepancies between the modelled and observed temperatures. This procedure requires estimates of the natural spatio-temporal covariability of the observations, which are usually derived from AO-GCM control runs, employing an optimal fingerprinting approach. A Bayesian approach is then used to infer an estimated probability density function (PDF) for climate sensitivity. A more detailed description of these methods is given in AR4 WG1 Appendix 9B, here.

I have concentrated on the Bayesian inference involved in such studies, since they seem to me in many cases to use inappropriate prior distributions that heavily fatten the upper tail of the estimated PDF for S. I may write a future post concerning that issue, but in this post I want to deal with more basic statistical issues arising in what is, probably, the most important of the Bayesian studies whose PDFs for climate sensitivity were featured in AR4. I refer to the 2006 GRL paper by Forest et al.: Estimated PDFs of climate system properties including natural and anthropogenic forcings, henceforth Forest 2006, available here, with the SI here. Forest 2006 is largely an update, using a more complete set of forcings, of a 2002 paper by Forest et al., also featured in AR4, available here, in which a more detailed description of the methods used is given. Forest 2006, along with several other climate sensitivity studies, used simulations by the MIT 2D model of zonal surface and upper-air temperatures and global deep-ocean temperature, the upper-air data being least influential. Effective ocean diffusivity, Kv, and total aerosol forcing, Faer, are estimated simultaneously with S. It is the use of multiple sets of observational data, combined with the joint estimation of three key uncertain climate parameters, that makes Forest 2006 stand out from similar Bayesian studies.

Forest completely stonewalled my requests to provide data and code for over a year (for part of which time, to be fair, he was recovering from an accident). However, I was able to undertake a study based on the same approach as in Forest 2006 but using improved statistical methods, thanks to data very helpfully made available by the lead authors, respectively Charles Curry and Bruno Sanso, of two related studies that Forest co-authored. Although Forest 2006 stated that the Curry et al. 2005 study used the Forest 2006 data (and indeed relied upon that study’s results in relation to the surface dataset), the MIT model surface temperature simulation dataset for Curry et al. 2005 was very different from that used in the other study, Sanso et al. 2008. The Sanso et al. 2008 dataset turned out to correspond to that actually used in Forest 2006. The saga of the two conflicting datasets was the subject of an article of mine posted at Judith Curry’s blog Climate Etc this summer, here , which largely consisted of an open letter to the chief editor of GRL. Whilst I failed to persuade GRL to require Forest to provide any verifiable data or computer code, he had a change of heart – perhaps prompted by the negative publicity at Climate Etc – and a month later archived the complete code used for Forest 2006, along with semi-processed versions of the relevant MIT model, observational and AO-GCM control run data – the raw MIT model run data having been lost. Well done, Dr Forest. The archive (2GB) is available at http://bricker.met.psu.edu/~cef13/GRL2006_reproduce.tgz .

The code, written in IDL, that Forest has made available is both important and revealing. Important, because all or much of it has been used in many studies cited, or expected to be cited, by the IPCC. That includes, most probably, all the studies based on simulations by the MIT 2D model, both before and after AR4. Moreover, it also includes a number of detection and attribution studies, the IPCC’s “gold standard” in terms of inferring climate change and establishing consistency of AO-GCM simulations of greenhouse gas induced warming with observations. Much of the core code was originally written by Myles Allen, whose heavily-cited 1999 Allen and Tett optimal fingerprinting paper, here, provided the statistical theory on which Forest 2006 and its predecessor and successor studies were based. Allen was a co-author of the Forest et al. 2000 (MIT Report preprint version of GRL paper here), Forest et al. 2001 (MIT Report preprint version of Climate Dynamics paper here) and Forest et al. 2002 studies, in which the methods used in Forest 2006 were developed.

The IDL code is revealing because it incorporates some fundamental statistical errors in the derivation of the likelihood functions from the model-simulation – observation discrepancies. The errors are in the bayesupdatenew2.pro module (written by Forest or under his direction, not by Allen, I think) that computes the relevant likelihood function and combines it with a specified initial distribution (prior) using Bayes theorem. There are three likelihood functions involved, one for each of the three “diagnostics” – surface, upper-air, and deep-ocean, which involve respectively 20, 218 and 1 observation(s). The bayesupdatenew2 module is therefore called (by module bu_lev05.pro) three times, if an “expert” prior is being used. Where the prior used is uniform, on the first pass bayesupdatenew2 also computes a likelihood for a second set of discrepancies and uses that as a “data prior”, so the module is only called twice.

Each likelihood is based on the sum of the squares of the ‘whitened’ model-simulation – observation discrepancies, r2. Whitening involving transforming the discrepancies, using an estimate of the inverse spatio-temporal natural variability covariance matrix, so that they would, in a perfect world, be independent standardised normal random variables. The likelihoods are computed from the excess, delta.r2, of r2 over its minimum value, minr2 (occurring where the model run parameters provide the best fit to observations), divided by m, the number of free model parameters, here 3. The statistical derivation implies that delta.r2/m has an F_m,v statistical distribution, in this case that delta.r2/3 has an F_3,v distribution, v being the number of degrees of freedom in estimating the spatio-temporal natural variability covariance matrix. The math is developed in Forest et al. 2000 and 2001 out of that in Allen and Tett 1999, and I will not argue here about its validity.

The statistical errors I want to highlight are as follows:

(a) the likelihoods are computed using (1 minus the) cumulative distribution function (CDF) for a F_3,v(delta.r2/3) distribution, rather than its probability density function. A likelihood function is the density of the data, viewed as a function of the parameters. Therefore, it must be based on the PDF, not the CDF, of the F distribution. The following code segment in bayesupdatenew2 incorporates this error:

r2 = r2 – minr2 +1e-6
nf = 3
pp= 1.- f_pdf(r2/float(nf),nf,dof1)

where dof1 is the degrees of freedom used for the diagnostic concerned. Despite the name “f_pdf”, this function gives the F-distribution CDF.

(b) the same calculation, using the F_3,v(delta.r2/3) function, is used not only for the surface and upper-air diagnostics, but also for the univariate deep-ocean diagnostic. The use of the F_3,v distribution, with its argument being delta.r2/3, is based on delta.r2 being, when the model parameter settings are the true ones, the sum of the squares of 3 independent N(0,1) distributed random variables. That there are only 3 such variables, when the diagnostic involves a larger number of observations and hence whitened discrepancies, reflects the higher dimensional set of whitened discrepancies all having to lie on a 3D hypersurface (assumed flat) spanned by the parameters. However, when there is only one data variable, as with the deep-ocean diagnostic (being its temperature trend), and hence one whitened discrepancy, delta.r2 is the square of one N(0,1) variable, not the sum of the squares of three N(0,1) variables. Therefore, the deep-ocean delta.r2, divided by 1 not 3, will have a F_1,v distribution – implying the unsquared whitened deep ocean discrepancy r will have a Student’s t_v distribution. Here, delta.r2 = r2, since a perfect fit to a single data point can be obtained by varying the parameters, implying minr2 = 0.

(c) there is a statistical shortcoming in the Forest et al 2006 method in relation to the use of an F-distribution based PDF as a likelihood function. A geometrical correction to the F-distribution density is need in order to convert it from a PDF for the sum of the squares of three N(0,1) distributed variables to a joint PDF for those three variables. The appropriate correction, which follows from the form of the Chi-square distribution PDF, is division of the F-distribution PDF by sqrt(delta.r2). Without that correction, the likelihood function goes to zero, rather than to a maximum, at the best-fit point.

There may be many other errors in the IDL code archive – I’ve identified a couple. Any readers who are familiar with IDL and have the time might find it interesting to study it, with a view to posting any significant findings – or even to rewriting key parts of it in R.

As it happens the F_3,v PDF is not that different from {1 – CDF} once the parameter combination is well away from the best fit point, so the effect of error (a) is not very substantial. Nevertheless, that Forest made this error – and it was not a mis-coding – seems very surprising.

The effect of (c), which is partially compensated by error (a), is likewise not very substantial.

However, the difference in the behaviour of the likelihood function resulting from error (b) is substantial; the ratio of the Forest 2006 to the correct likelihood varies by approaching 3x as the parameters are moved away from the best fit point. And the deep-ocean likelihood is what largely causes the estimated PDF for S ultimately to decline with increasing S: the two other diagnostics provide almost no constraint on very high values for S.

In addition to the Forest 2002 and 2006 papers, I believe these errors also affected the Forest et al. 2008 Tellus A and the Libardoni and Forest 2011 GRL papers, and probably also 2009 and 2010 papers lead authored by Forest’s regular co-author Sokolov. It is to be expected that there will be multiple citations of results from these various studies in the AR5 WG1 report, . I put it to Myles Allen – who seems, along with Gabi Hegerl, to be the lead author of Chapter 10 primarily responsible for the sections relating to climate sensitivity – that in view of these serious statistical errors, results from the affected papers should not be cited in the IPCC report. However, whilst accepting that the errors were real, he expressed the view that the existence of these statistical errors didn’t really matter to the results of the papers concerned. His reasoning was that only error (b) had a potentially substantial effect, and that didn’t much matter since there was anyway considerable uncertainty in the ocean data that the studies used. I’m not sure that I agree with this approach.

I would be surprised if the basic statistical errors in the IDL code do not significantly affect the results of some or all of the papers involved. I would like to test this in regard to the Forest 2006 paper, by running the IDL code with the errors corrected, in time to put on record in my “expert reviewer” comments on Chapter 10 of the Second Order Draft of IPCC AR5 WG1 report what the differences in Forest 2006’s results resulting from correcting these errors are, if significant. At least Myles Allen and Gabi Hegerl will then be aware of the size of the differences when deciding whether to ignore them.

Unfortunately, IDL seems to be a very expensive product – the company selling it won’t even give any pricing information without many personal details being provided! There is an open source clone, GDL, which I have tried using, but it lacks too much functionality to run most of the code. I’ve implemented part of the IDL code in R, but it would take far too long to convert it all, and I couldn’t be confident that the results would be correct.

So, my question is, could any reader assist me by running the relevant IDL code and letting me have the results? If so, please can you email me via Steve M. The code in the GRL2006_reproduce archive should be fully turnkey, although it is written for an old version of IDL. I can supply an alternative, corrected version of the bayesupdatenew2.pro module and relevant information/instructions.

In case any of you are wondering, I submitted a paper to Journal of Climate in July detailing my reanalysis of the Forest 2006 datasets, focussing on improving the methodology, in particular by using an objective Bayesian approach. That was just before Forest released the IDL code, so I was unaware then that he had made serious statistical errors in implementing his method, not that they directly affect my paper.

Nicholas Lewis

For the convenience of readers who would like to look at the bayesupdatenew2.pro code without downloading a 2GB archive file, it is as follows:

pro bayesupdatenew2,prior,data,post,expert=expert,hik=hik,mtit=mtit,usegcms=usegcms,alf=alf,dt=dt,indiv=indiv,yrng=yrng,$
  label=label,dataprior=dataprior,dof1=dof1,dof2=dof2,noplot=noplot,igsm2=igsm2,mcmc=mcmc,i2post=i2post

if (n_params() eq 0) then begin
  print, 'Usage: bayesupdatenew2,prior,newp,post,expert=expert,$'
  print, '   hik=hik,mtit=mtit,usegcms=usegcms,alf=alf,dt=dt,$'
  print, '   indiv=indiv,yrng=yrng,label=label,dataprior=dataprior,$'
  print, '   dof1=dof1,dof2=dof2,noplot=noplot,igsm2=igsm2,mcmc=mcmc,i2post=i2post'
  print, ' prior = a priori estimate of pdf'
  print, ' data = r2 data used to estimate new pdf'
  print, ' post = a posteriori estimate of pdf'
  print, ' dataprior = 1 if using r2 values for first prior'
  print, ' dof1, dof2 = degreees of freedom for likelihood estimators'
  print, ' dof2 = used for dataprior'
  print, ' FOR USE WITH GSOVSV FORCING'
  print, 'i2post = 1, use igsm2 posterior for aerosol prior'
  return
end

if (not keyword_set(yrng)) then yr = [0,10.] else yr = yrng
if (not keyword_set(dof1)) then dof1 = 12
if (not keyword_set(dof2)) then dof2 = 12 

;set colors
loadct,1 
gamma_ct,0.3



; prior - taken from somewhere old posterior or expert judgement
; data - provided by new diagnostic
; post - returned by updating procedure

kk= findgen(81)*.1 & ss = findgen(101)*.1 +.5 & aa = -findgen(41)*.05+0.5

np = n_elements(data)
;print, 'np = ',np

; create probability from r2 values
r2 = data
pp = fltarr(np)

r2neg = where(r2 le 0., nu)
print, 'Number of negative points=', nu

if (nu ne 0) then r2(r2neg) = 0.

if (keyword_set(igsm2)) then begin
  minr2 = min(r2(1:50,0:75,0:40)) 
  print, 'Minrp2 in igsm2 domain:',minr2
endif else minr2 = min(r2)

print, 'minr2 =',minr2
print, 'Range r2:',range(r2)
;stop

r2 = r2 - minr2 +1e-6
nf = 3
;dof = 12
;for i = 0,np-1 do pp(i)=  1.- f_pdf(r2(i)/float(nf),nf,dof)
print, 'Computing p(r2) for HUGE data vector'
pp=  1.- f_pdf(r2/float(nf),nf,dof1)
help, dof1

;stop

;interpolate prior to r2 points
; note: no prior for aa = alpha
if (keyword_set(expert)) then begin

; returns 3d joint prior on r2interp grid
  expertprior,krange,kprior,srange,sprior,prior,opt=expert,$
      hik=hik,/threeD,/forcing,igsm2=igsm2,mcmc=mcmc,i2post=i2post
;  priorx = krange & priory = srange & priorz = jprior = jointprior
endif ;else begin
;  priorx = kk & priory = ss & priorz = prior 
;endelse

if (not keyword_set(expert) and  keyword_set(dataprior)) then begin
    r2p = prior 
    r2pneg = where(r2p le 0., nup)
    print, 'Number of negative points=', nup
    
    if (nup ne 0) then r2p(r2pneg) = 0.

    print, 'Range(r2p)', range(r2p)
    if (keyword_set(igsm2)) then begin
      print, 'Keyword set: igsm2'
      minr2p = min(r2p(1:50,0:75,0:40)) 
      print, 'minr2p ',minr2p
    endif else minr2p = min(r2p)
    r2p = r2p - minr2p + 1e-6
    print, 'Computing p(r2) for HUGE prior vector'

    prior2 =  1.- f_pdf(r2p/float(nf),nf,dof2)
    print,'Range prior2 before ', range(prior2)

    help, dof2
    if (keyword_set(igsm2)) then begin
      prior2(0,*,*) = 0.        ;KV = 0.
      prior2(51:80,*,*) = 0.    ;KV > 25.
;      prior2(*,76:100,*) = 0.   ;CS > 8.
      prior2(*,76:100,*) = 0.   ;CS > 8.
;      prior2(*,*,0:4) = 0.      ;FA > 0.25 
    endif else begin
      prior2(0,*,*) = 0.
      prior2(*,96:100,*) = 0.
    endelse
    
endif else prior2 = prior

;stop

; multiply probabilities
post = prior2 * pp

; interpolate to finer grid to compute integral
; separate into aa levels
nk = findgen(81)*.1
nk = nk*nk
ns = findgen(101)*.1 + 0.5

; estimate integral to normalize

ds = 0.1 & dk = 0.1 & da = 0.05
;totpl = fltarr(6) ; totpl = total prob at level aa(i)
;for i =0,5 do totpl(i) = total( smpostall(i,*,*) )/(8.*9.5*2.0)   

;where intervals are k=[0,8], s=[0.5,10.], a=[0.5,-1.5]
totp = total(post)/(8.*9.5*2.)

;stop

;normalize here
post = post/totp
;print, post

smpostall = post

if (not keyword_set(noplot)) then begin 

;estimate levels for contour from volume integral
clevs = c_int_pdf3d([0.99,.9,.8],smpostall)
;clevs = c_int_pdf3d([0.90],smpostall)
rr= range(smpostall)
print, 'range post:', rr(1) -rr(0)
;clevs = findgen(3)*(rr(1) - rr(0))/4+min(smpostall)
print,' clevs =', clevs
;stop
if (not keyword_set(indiv)) then !p.multi=[0,2,4] else !p.multi=0
clabs=replicate(1,40)
pmax = max(post)
;print,'max(post), imax', max(post), where(post eq max(post))
ccols = [indgen(3)*50 + 99,255]

pane = ['(a) : ','(b) : ','(c) : ','(d) : ','(e) : ','(f) : ','(g) : ']
titl= ['F!daer!n = 0.5 W/m!u2!n','F!daer!n = 0.0 W/m!u2!n','F!daer!n = -0.25 W/m!u2!n',$
       'F!daer!n = -0.5 W/m!u2!n','F!daer!n = -0.75 W/m!u2!n',$
       'F!daer!n = -1.0 W/m!u2!n','F!daer!n = -1.5 W/m!u2!n']

alevs = [0,10,15,20,25,30,40]
for i = 0,6 do begin
  ii =  alevs(i)
  ka = nk & sa = ns

  contour, post(*,*,ii), sqrt(ka), sa, $
  xrange=[0,8],yrange=yr,/xstyle,/ystyle,$
  levels=clevs,c_labels=0,/cell_fill, c_colors=ccols, $
  title=pane(i)+mtit+' : '+titl(i),$
  xtitle='Effective ocean diffusivity [cm!u2!n/s]',$
  ytitle='Climate Sensitivity [!uo!nC]', $
  xticks = 8, xtickv=findgen(9),$
  xtickname=strmid((findgen(9))^2,6,4); , charsize=chsz

  contour, post(*,*,ii), sqrt(ka), sa,/overplot, $
  levels=clevs,c_labels=0
  
  imax = where(post(*,*,ii) eq pmax, icount)
  ix = where(post(*,*,ii) eq max(post(*,*,ii)))
;  print, imax, ix
  if (icount ne 0) then oplot,[sqrt(ka(imax))],[sa(imax)],psym=sym(1) else $
  oplot, [sqrt(ka(ix))],[sa(ix)], psym = sym(6)
;  for j=0,ni-1 do  oplot, [sqrt(ka(j))],[sa(j)], psym = sym(11)
  
  if (keyword_set(usegcms)) then begin
    getaogcmlist,gcms,nms,opt=usegcms
    oplot,sqrt(gcms(0,*)),gcms(1,*),psym=sym(4),symsize=1.0
    if (keyword_set(label)) then $
          xyouts,sqrt(gcms(0,*))+.15,gcms(1,*),nms,charsize=chsz
  endif

  if (keyword_set(dt)) then begin
      dtread,i,dtdata,obs,dttitl,period=dt,/smooth
      dtlabs = replicate(1,31)
      dtlevs =  findgen(31)/20.
      dr =  range(dtdata.dt)
      ddr = dr(1) - dr(0)
      if (ddr lt 0.5) then dtlevs = findgen(31)/50.
      contour,dtdata.dt,sqrt(dtdata.k),dtdata.s,/overplot, levels=dtlevs,$ 
        c_labels=dtlabs, thick=5
      contour,dtdata.dt,sqrt(dtdata.k),dtdata.s,/overplot, levels=[obs],$ 
        c_labels=dtlabs, thick=5, c_linestyle=5
  endif
    
endfor
!p.multi=0

if (keyword_set(alf)) then begin
  na = -findgen(41)*.05+0.5
  set_plot, 'ps'
  !p.font=0
  device, /encapsulated, /helvetica, font_size=18
  device,/color,bits_per_pixel=8,xsize=20, ysize=5./7.*20
  i = where( na eq alf, nl)
  i = i(0)
  if (nl lt 1) then print, 'No matching aerosol forcing' else begin
    ii = where( na eq alf,ni)
    ka = nk & sa = ns
    
    contour, post(*,*,ii), sqrt(ka), sa,$
    xrange=[0,8],yrange=yrng,/xstyle,/ystyle,$
    xtitle='Rate of Ocean Heat Uptake [Sqrt(K!dv!n)]',ytitle='Climate Sensitivity (K)',title=mtit+' : '+titl(i),$
    levels=clevs,c_labels=0,/cell_fill, c_colors=ccols
    
    contour, post(ii), sqrt(ka), sa,/irregular,/overplot, $
    levels=clevs,c_labels=0
      
    if (keyword_set(usegcms)) then begin
      getaogcmlist,gcms,nms,opt=usegcms
      oplot,sqrt(gcms(0,*)),gcms(1,*),psym=sym(4),symsize=1.0
;    xyouts,sqrt(gcms(0,*))+.15,gcms(1,*),nms,charsize=chsz
    endif

    if (keyword_set(dt)) then begin
      dtread,i,dtdata,obs,dttitl,period=dt,/smooth
      dtlabs = replicate(1,31)
      dtlevs =  findgen(31)/20.
      dr =  range(dtdata.dt)
      ddr = dr(1) - dr(0)
      if (ddr lt 0.5) then dtlevs = findgen(31)/50.
      contour,dtdata.dt,sqrt(dtdata.k),dtdata.s,/overplot, levels=dtlevs,$ 
        c_labels=dtlabs, thick=5
      contour,dtdata.dt,sqrt(dtdata.k),dtdata.s,/overplot, levels=[obs],$ 
        c_labels=dtlabs, thick=5, c_linestyle=5
    endif

  endelse

  device, /close,color=0,encapsulated=0
  set_plot, 'x'
  !p.font = -1

endif
endif 

return
end

Statistical errors in the Forest 2006 climate sensitivity study