Song's Blog: Using Bayesian to Cheat

A few days before the end of last semester, I stumbled upon a paper in the journal Ecological Indicators [Song and Guan(2013)]. The title indicates that a Bayesian estimation method was used. The authors started with a discussion of “environmental efficiency analysis” and introduced an indicator, which is some kind of mathematical function of inputs and outputs. There was no intuitive explanation of the indicator. It was based on three input variables (population, “fixed capital formation,” and nonrenewable energy consumption), one desirable output variable (GDP), and one undesirable output (industrial SO2 emission). The calculation resulted in 17 environmental efficiency scores (9 cities in 2 years).
The main objective of the paper is to explore factors affecting these environmental efficiency scores (EE), using regression. The potential factors are (1) per capita GDP (RGDP), (2) total import and export volume (IE), (3) the proportion of the “second industry” in GDP (GY), (4) the proportion of “the industry of the second industry” in GDP (GGY), and (5) the proportion of environmental spending in GDP (HZ). The authors explained that (1) is an indicator of economic scale, (2) is a measure of economic exchange with the outside world (the authors used the term “opening up,” a bad translation of a Chinese term), (3) and (4) are measures of industry structure, and (5) is the “government factor.” Not being an economist, I don’t want to comment on the choice of these potential factors, except to note that GDP is used twice: once as part of the environmental efficiency score, and again (per capita) as a potential factor to explain the variance of that same score.
The authors used a multiple regression approach, but the regression coefficients were estimated using MCMC. I was expecting a discussion of the choice of prior distributions for these coefficients, but it was soon clear that there was no prior distribution. So why did they use MCMC for a multiple regression problem? Based on the authors’ affiliation (School of Statistics and Mathematics), I assume they know that, without informative priors, there should be no substantive difference between using MCMC and using OLS. The authors presented a linear regression model of EE on the five factors, with an intercept α and slopes β1 through β5.
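To make the "no substantive difference" point concrete, here is a minimal sketch, not the paper's analysis. It uses simulated data (not the paper's), a single predictor instead of five, a noise standard deviation assumed known, and a plain random-walk Metropolis sampler as a stand-in for whatever MCMC software the authors used. With flat priors, the posterior mean and standard deviation of the slope should land close to the OLS estimate and its standard error.

```python
import math
import random

random.seed(1)

# Simulated data, NOT the paper's: 17 points from y = 1 + 2*x + noise.
SIGMA = 1.0  # noise sd, assumed known to keep the sketch short
n = 17
x = [i / (n - 1) for i in range(n)]
x_bar = sum(x) / n
xc = [xi - x_bar for xi in x]  # centered predictor
y = [1.0 + 2.0 * xi + random.gauss(0.0, SIGMA) for xi in x]

# OLS slope and its standard error in closed form.
sxx = sum(v * v for v in xc)
b_ols = sum(v * yi for v, yi in zip(xc, y)) / sxx
se_b = SIGMA / math.sqrt(sxx)


def log_post(a, b):
    """Log posterior under flat priors: the Gaussian log likelihood, up to a constant."""
    sse = sum((yi - a - b * v) ** 2 for v, yi in zip(xc, y))
    return -0.5 * sse / SIGMA**2


# Random-walk Metropolis on (intercept a, slope b).
a, b = 0.0, 0.0
lp = log_post(a, b)
draws = []
for step in range(30000):
    a_new = a + random.gauss(0.0, 0.3)
    b_new = b + random.gauss(0.0, 1.0)
    lp_new = log_post(a_new, b_new)
    # Accept with probability min(1, posterior ratio).
    if random.random() < math.exp(min(0.0, lp_new - lp)):
        a, b, lp = a_new, b_new, lp_new
    if step >= 5000:  # discard burn-in
        draws.append(b)

post_mean_b = sum(draws) / len(draws)
post_sd_b = math.sqrt(sum((d - post_mean_b) ** 2 for d in draws) / (len(draws) - 1))

print(f"OLS slope  {b_ols:.3f}  (se {se_b:.3f})")
print(f"MCMC slope {post_mean_b:.3f}  (sd {post_sd_b:.3f})")
```

The two printed lines should agree up to Monte Carlo noise, which is why choosing MCMC over OLS here changes the presentation of the results rather than the results themselves.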

The estimated model coefficients may have revealed the answer:

coefficient   estimate   standard error
α              -0.1188           1.7260
β1             -0.1564           0.1562
β2              3.8130           1.5510
β3              4.8050           8.0620
β4             -3.3960           5.5560
β5              0.9362          18.8900

All slopes except β2 are not statistically different from 0! Only the estimate of β2 is more than two standard errors away from zero. Had the authors used OLS, the typical regression output would have included a column of p-values, which would have made the paper unpublishable. Using MCMC, the authors were able to present the estimated coefficients, standard errors, and selected quantiles. Without the column of p-values, a busy reviewer may not catch the problem. (But all the students in my introductory biostatistics class recognized it.)
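The check the students made can be sketched in a few lines. The estimates and standard errors below are copied from the table above. The p-values use a normal approximation, which is my assumption rather than anything reported in the paper; with 17 scores and six coefficients, a t reference distribution with 11 degrees of freedom would only make them larger.

```python
import math

# (estimate, standard error) pairs copied from the table in the post.
coefs = {
    "alpha": (-0.1188, 1.7260),
    "beta1": (-0.1564, 0.1562),
    "beta2": (3.8130, 1.5510),
    "beta3": (4.8050, 8.0620),
    "beta4": (-3.3960, 5.5560),
    "beta5": (0.9362, 18.8900),
}


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference (an approximation)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


rows = [(name, est / se, two_sided_p(est / se)) for name, (est, se) in coefs.items()]
for name, z, p in rows:
    print(f"{name:6s} estimate/se = {z:6.2f}   approx. p = {p:.3f}")

# Coefficients whose estimate is more than 1.96 standard errors from zero.
flagged = [name for name, z, _ in rows if abs(z) > 1.96]
print("distinguishable from 0:", flagged)
```

Only β2 clears the usual threshold; every other ratio is about 1 or smaller in absolute value.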

Is this a successful story of cheating by using “Bayesian” statistics?

References
[Song and Guan(2013)] Malin Song and Youyi Guan. The environmental efficiency of Wanjiang demonstration area: a Bayesian estimation approach. Ecological Indicators, 36:59–67, 2013.

2 comments:

Unknown said...: Thanks for highlighting this example. Obviously the approach is concerning, but I worry more about the peer-review process. For example, whatever methods the authors used and for whatever motivations they may have had, shouldn't reviewers and editors be able to see basic statistical concerns? I'm less familiar with this journal than with others--is this typical of this journal?; January 19, 2014 at 1:20 PM
Song Qian said...: The topic of the paper is not something this journal typically publishes. As a result, I suspect that the associate editor simply used the author-recommended reviewers, who may well be professionally or personally connected with the authors. I cannot believe that an independent reviewer would have missed the problem. If authorship can be bought in China (http://www.sciencemag.org/content/342/6162/1035.short), why not reviewers?; January 20, 2014 at 4:45 PM

Monday, January 6, 2014
