Song's Blog: Reproducible Results

Over a year ago, Duke was in the spotlight for academic fraud involving some biologists who misused statistics and fabricated results. The main culprit resigned and is now practicing medicine in South Carolina. Duke University apparently does not agree that its rising star was a fraud, as the head of Duke cancer research wrote a very positive letter of recommendation for him.

One clue that something was wrong was that no one could reproduce the results the Duke team published in Nature. Being unable to reproduce a result in a published paper is common in ecological studies, because no one can repeat a costly experiment or even obtain the data used in a paper. Even when authors supply code and data as a supplement to a paper, reviewers rarely check these materials. For a reviewer to be able to check the work in a submitted paper, we should ask the author to provide something like a Sweave file: code plus documentation and data. This is why I came back to Sweave today to prepare my report, so that those interested can repeat the work I did.

During the summer, I read a paper published in the journal Methods in Ecology and Evolution (1:25-37, 2010) advocating the use of a program called "TITAN" for detecting and estimating community thresholds using species compositional data. In one example, the authors studied the effect of watershed urbanization on stream biodiversity using data from multiple watersheds in Maryland. The conclusion that a mere 1 to 2% urban land cover in a watershed can result in a dramatic shift in stream biodiversity is highly suspicious, because the measurement error of land cover (as a percentage of total land area in a watershed) can be very high (5 to 10%). Subsequent papers by the same authors reached similar conclusions (very small urban land cover will lead to large changes in aquatic ecosystem biodiversity).

After a careful examination of the code, I realized that the statistics behind the method were wrong. The mistake is not obvious in the description of the method, but it should have been detected if the reviewers had been critical enough to try the method with a simple simulation. A colleague and I conducted an extensive simulation study and found that TITAN cannot detect known thresholds in simulated data unless the threshold is clearly a step function, noticeable without using a computer. The effort took us several weeks. Reviewers of this paper should have had the code and the data set for one example. But the TITAN authors included a bootstrapping procedure that made running the program time consuming. As a result, I suspect that reviewers of the manuscript never ran the code. If they had, they would have discovered that TITAN produces different estimates every time the model is executed using the same data. That is, the reviewers would likely have seen a result different from the one in the manuscript.
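As a rough illustration of the last two points (testing a method against a known threshold, and bootstrap output that changes from run to run), here is a minimal Python sketch. It does not reproduce TITAN: a simple least-squares step change-point estimator stands in for it, and every function name, sample size, and seed below is a hypothetical choice made for the example. It also assumes that the run-to-run differences come from resampling with an unseeded random number generator, which is one likely cause but not something I have confirmed in the TITAN code.

```python
import random
import statistics


def simulate(n, threshold, jump, noise_sd, rng):
    """Simulate a response with a known step change at `threshold`."""
    xs = sorted(rng.uniform(0.0, 10.0) for _ in range(n))
    ys = [(jump if x > threshold else 0.0) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def estimate_threshold(xs, ys):
    """Stand-in estimator (NOT TITAN): the split point that minimizes
    the within-group sum of squares of a two-mean step model."""
    pairs = sorted(zip(xs, ys))
    best_sse, best_x = float("inf"), None
    for i in range(2, len(pairs) - 1):  # keep at least 2 points per side
        total = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if total < best_sse:
            best_sse = total
            best_x = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_x


def bootstrap_median(xs, ys, n_boot, rng):
    """Median of the threshold estimates over `n_boot` bootstrap resamples."""
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            estimate_threshold([xs[i] for i in idx], [ys[i] for i in idx])
        )
    return statistics.median(estimates)


# One simulated data set with a known threshold at x = 4.
xs, ys = simulate(n=60, threshold=4.0, jump=1.0, noise_sd=1.0,
                  rng=random.Random(2011))

# Same data, same seed: the bootstrap summary is identical on every run.
seeded_a = bootstrap_median(xs, ys, 50, random.Random(1))
seeded_b = bootstrap_median(xs, ys, 50, random.Random(1))

# Same data, generator seeded from the system: the summary can change per run.
unseeded = bootstrap_median(xs, ys, 50, random.Random())

print("seeded runs:", seeded_a, seeded_b)
print("unseeded run:", unseeded)
```

Recording the seed in the Sweave file (or printing it in the report) lets a reviewer get the same numbers as the manuscript, and comparing the estimate with the known threshold of 4 over many simulated data sets is the kind of simple check described above.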


Friday, September 30, 2011
