Saturday, May 19, 2018

Peer-Review Fraud: Cite my paper or else

May 19, 2016 (revised in 2018)

I serve as a peer reviewer often because I value the peer-review process. In the first few years after graduate school, reviewer comments on my manuscripts were often the most helpful part of the writing process. I benefited from the process and I am willing to do what I can to contribute to it. Reviewers are volunteers, and their service is a critical part of academic publication. As a result, I treat my review assignments seriously: I always write reviews objectively and provide constructive recommendations. I want to do my part to keep this academic commons a sustainable endeavor.

In 2016, reviews of two manuscripts disturbed me. The lead authors of the two manuscripts were former students of mine. One manuscript is about the use of a generalized propensity score method to estimate the causal effect of nitrogen on the stream benthic community, and the other is on statistical issues of discretizing a continuous variable when constructing a Bayesian networks model. These two manuscripts have nothing in common, except that I was the second author on both. Reviewers' comments on the two manuscripts came back in the same week. One reviewer apparently reviewed both papers. This reviewer's comments on both papers were essentially the same, and the suggestions were irrelevant to our work. It was clear to us that this reviewer was sending a message: cite my papers and I will let you go.

For the Bayesian networks paper, we chose to ignore this reviewer, as he was one of four reviewers who commented on our paper. We copied this reviewer's comments on our propensity score paper to the editor, and the paper is now published. The propensity score paper had only one reviewer. The lead author was a student at the time and was eager to add more publications to his resume before graduation. After discussion, I wrote to the editor of the journal to explain our concerns. I requested that the manuscript be considered as a new submission and go through the review process again. Although it would have been easy to add a sentence or two with the recommended citations, I believe that it is important to uphold the principle. The associate editor ignored my request for communication, so I sent the request to the editor-in-chief. Although the editor promised to handle the re-review himself, he delegated the work to the same associate editor, who in turn made sure that the paper went through repeated reviews until it was rejected. The paper is now published in a different journal.

I copy the reviews in question below. Hopefully readers will reach the same conclusion as I did. We want to publish, and we want our peers to read and cite our work because the work is worthwhile. Abusing the "power" of a reviewer is just as bad as cheating!

Review on the Bayesian networks model paper:

General comments:

Overall I like the study and I feel it is fairly well written. My two observations are about the lack of global sensitivity and uncertainty analyses (GSUA) and a conversation about management implications that we can extract from the model/GSUA. Note that here with ''model'' I mean any method that use the data, yet any model that process the data in input and produce an output. That is useful for assessing input factor importance and interaction, regimes, and scaling laws between model input factors and outcomes. This differs from traditional sensitivity analysis methods. Thus, GSUA is very useful for finding out optimal management/design strategies. GSUA is a variance-based method for analyzing data and models given an objective function. It is a bit unclear how many realizations of the model have been run and how the authors maximized prediction accuracy. Are the values of the input factors taken to maximize predictions? GSUA (see references below) typically assigns probability distribution functions to all model factors and propagate those into model outputs.

In this context, that is about discretization methods for pdfs, the impact of discretization may be small or large depending on the pdf chosen (or suitable) for the variables; yet, the discretization may have different results as a function of the nature of the variables of interest as well as of the model used.

I think that independently of the model / variables used the authors should discuss these issues in their paper and possibly postpone further research along these lines to another paper.

Specific comments:

Variance-based methods (see Saltelli and Convertino below) are a class of probabilistic approaches which quantify the input and output uncertainties as probability distributions, and decompose the output variance into parts attributable to input variables and combinations of variables. The sensitivity of the output to an input variable is therefore measured by the amount of variance in the output caused by that input. Variance-based methods allow full exploration of the input space, accounting for interactions, and nonlinear responses. For these reasons they are widely used when it is feasible to calculate them. Typically this calculation involves the use of Monte Carlo methods, but since this can involve many thousands of model runs, other methods (such as emulators) can be used to reduce computational expense when necessary. Note that full variance decompositions are only meaningful when the input factors are independent from one another. If that is not the case information theory based GSUA is necessary (see Ludtke et al. )

Thus, I really would like to see GSUA done because it (i) informs about the dynamics of the processes investigated and (ii) is very important for management purposes.

Convertino et al. Untangling drivers of species distributions: Global sensitivity and uncertainty analyses of MaxEnt. Journal Environmental Modelling & Software archive Volume 51, January, 2014 Pages 296-309

Saltelli A, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, Stefano Tarantola Global Sensitivity Analysis: The Primer ISBN: 978-0-470-05997-5

Ludtke et al. (2007), Information-theoretic Sensitivity Analysis: a general method for credit assignment in complex networks J. Royal Soc. Interface

Review on the propensity score paper:

GENERAL COMMENTS

After a careful reading of the manuscript I really like the study and I feel it can have some impact into the theory of biodiversity and biogeography at multiple scales. My two technical observations are about the lack of global sensitivity and uncertainty analyses (GSUA) and a conversation about management implications that we can extract from the model/GSUA. Also, I think the findings can be presented in a clearer way by focusing on (i) the universality of findings across macro-geographical areas, (2) probabilistic structure of the variable considered and (3) the possibility to discuss gradual and sudden change in a non-linear theoretical framework (tipping points and gradual change). I would strongly suggest to talk about ''potential causal factors/relationship'' rather than talking about true causality because that is very difficulty proven and many causality assessment methods exist (e.g. transfer entropy, conergence cross mapping, scaling analysis, etc.). Also, can you provide an explanation for Eq. 6? Figure 2 does not show regressions but scaling law relationship since you plot everything in loglog. This can be an important results, in fact I suggest you to consider this avenue of interpretation (see Convertino et al. 2014 but also other work or Rinaldo and Rodriguez-Iturbe).

Note that here with ''model'' I mean any method that use the data, yet any model that process the data in input and produce an output. Data in fact can be thought as a model and probability distribution functions (pdfs) can be assigned to data variables (see Convertino et al. 2014). These pdfs can be assigned to any source of uncertainty about a variable (e.g. changing presence / absence into a continuous variable) and the uncertainty of outputs (e.g. species richness) can be tested against the uncertainty of all input variables. I believe that just considering average values is not enough.

As for the rest I really love the paper. I suggest to also plot the patterns in Convertino et al (2009): these are for instance the JSI and the Regional Species Richness; in ecological terms these can be defined as alpha, beta and gamma diversity. These patters can be studied as a function of geomorphological patterns such as the distance from the coat in order to find potential drivers of diversity. These are just ideas that can be pursued further. Lastly I wonder if the data can be made available to the community for further studies. For all above motivations I suggest to accept the paper only after Moderate or Major Revisions. Again, I think that these revisions can just make better the paper.

SPECIFIC COMMENTS:

In any context, e.g. as in this paper GSUA is very important because it given an idea of what is driving the output in term of model input factor importance and interaction, and how that can be used for management. GSUA is a variance-based method for analyzing data and models given an objective function. It is a bit unclear how many realizations of the model have been run and how the authors maximized prediction accuracy. Are the values of the input factors taken to maximize predictions? GSUA (see references below) typically assigns probability distribution functions to all model factors and propagate that into model outputs. That is useful for assessing input factor importance and interaction, regimes, and scaling laws between model input factors and outcomes. This differs from traditional sensitivity analysis methods (that are even missing here)

Variance-based methods (see Saltelli and Convertino below) are a class of probabilistic approaches which quantify the input and output uncertainties as probability distributions, and decompose the output variance into parts attributable to input variables and combinations of variables. The sensitivity of the output to an input variable is therefore measured by the amount of variance in the output caused by that input. Variance-based methods allow full exploration of the input space, accounting for interactions, and nonlinear responses. For these reasons they are widely used when it is feasible to calculate them. Typically this calculation involves the use of Monte Carlo methods, but since this can involve many thousands of model runs, other methods (such as emulators) can be used to reduce computational expense when necessary. Note that full variance decompositions are only meaningful when the input factors are independent from one another. If that is not the case information theory based GSUA is necessary (see Ludtke et al. for an information theory model of GSUA).

Thus, I really would like to see GSUA done because it (i) informs about the dynamics of the processes investigated and (ii) is very important for management purposes.

REFERENCES

Convertino, M. et al (2009) On neutral metacommunity patterns of river basins at different scales of aggregation http://www1.maths.leeds.ac.uk/~fbssaz/articles/Convertino_WRR09.pdf

Convertino, M.; Baker, K.M.; Vogel, J.T.; Lu, C.; Suedel, B.; and Linkov, I., "Multi-criteria decision analysis to select metrics for design and monitoring of sustainable ecosystem restorations" (2013). US Army Research. Paper 190. http://digitalcommons.unl.edu/usarmyresearch/190 http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1189&context=usarmyresearch

Convertino et al. Untangling drivers of species distributions: Global sensitivity and uncertainty analyses of MaxEnt Journal Environmental Modelling & Software archive Volume 51, January, 2014 Pages 296-309

Saltelli A, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, Stefano Tarantola Global Sensitivity Analysis: The Primer ISBN: 978-0-470-05997-5

Ludtke et al. (2007), Information-theoretic Sensitivity Analysis: a general method for credit assignment in complex networks J. Royal Soc. Interface
