Song's Blog: October 2015

In a recent paper, my co-author and I discussed the use of statistical causal analysis (propensity score matching) for analyzing observational data. The example estimates the effect of water and soil conservation practices on controlling nutrient loss from farm fields. The data were observational: a collection of measured P and N losses from various field-level studies. My interest in the subject started in 2009, after I completed this book, when I was thinking about whether the topic of causal analysis should be included in the future. When a colleague shared a dataset collected by USDA, I decided to study causal analysis and use it in my class. The effect of conservation practices on nutrient loss has been a topic of agricultural studies for a long time, but the method used in these studies is invariably modeling. These models were never properly calibrated, and their basic input is the nutrient yield. In many cases, researchers simply assume a fixed rate of reduction in nutrient yield and plug that rate into the model to calculate the total reduction. I have not seen a study that properly documents the effect of conservation practices using a randomized experiment. After discussing this with colleagues, I realized that a randomized experiment is practically impossible. In practice, the decision to implement conservation practices is often based on whether a field is prone to soil and water loss. As a result, if we combine data from many studies and compare the nutrient loss from fields with and without conservation practices, we often find that fields with conservation practices have larger nutrient losses. The reason is that we are often comparing fields with row crops (with conservation practices) to pastures (without conservation practices). In one published paper, the authors were puzzled by the result of such a comparison.
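The confounding described above can be sketched with a small synthetic example in Python. Everything in it is an assumption made for illustration: the field counts, the baseline losses, the adoption rates, the assumed true effect of -2, and the use of land use as the only covariate. It is not the data or the model from the paper. With a single categorical covariate, the propensity score reduces to the share of fields with a practice within each land-use class, so matching on the score amounts to comparing like with like.

```python
import random
from statistics import mean

random.seed(2015)
TRUE_EFFECT = -2.0  # assumed change in nutrient loss due to a practice (synthetic)


def simulate_field():
    """Return one hypothetical field; all numbers are invented for illustration."""
    land_use = random.choice(["row_crop", "pasture"])
    # Practices are adopted mostly on fields that are prone to loss (row crops).
    p_adopt = 0.8 if land_use == "row_crop" else 0.1
    treated = random.random() < p_adopt
    baseline = 20.0 if land_use == "row_crop" else 5.0
    loss = baseline + (TRUE_EFFECT if treated else 0.0) + random.uniform(-1, 1)
    return {"land_use": land_use, "treated": treated, "loss": loss}


fields = [simulate_field() for _ in range(2000)]
treated = [f for f in fields if f["treated"]]
control = [f for f in fields if not f["treated"]]

# Naive comparison: pool all fields and ignore land use.
naive_diff = mean(f["loss"] for f in treated) - mean(f["loss"] for f in control)


def propensity(land_use):
    """Estimated probability of having a practice, given land use."""
    group = [f for f in fields if f["land_use"] == land_use]
    return mean(1.0 if f["treated"] else 0.0 for f in group)


score = {lu: propensity(lu) for lu in ("row_crop", "pasture")}

# Group control fields by propensity score and average their losses.
control_losses = {}
for c in control:
    control_losses.setdefault(score[c["land_use"]], []).append(c["loss"])
control_mean = {s: mean(v) for s, v in control_losses.items()}


def matched_control_loss(field):
    """Mean loss of the control fields whose score is nearest to this field's."""
    target = score[field["land_use"]]
    nearest = min(control_mean, key=lambda s: abs(s - target))
    return control_mean[nearest]


# Effect on the treated fields: each one is compared to its matched controls.
att = mean(f["loss"] - matched_control_loss(f) for f in treated)

print(f"naive difference:   {naive_diff:+.2f}")
print(f"matched difference: {att:+.2f}  (assumed true effect {TRUE_EFFECT:+.2f})")
```

In this setup the naive difference should come out positive, as if the practices increased nutrient loss, while the matched difference should land close to the assumed effect. A real analysis would have several covariates and would typically estimate the propensity score with a model such as logistic regression.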
I worked on a dataset consisting of measurements from about 160 papers with field-scale measurements of nutrient loss, fertilizer application rate and methods, and other routinely measured variables (crops, best management practices, conservation practices). Using the dataset as an example, I taught statistical causal analysis in 2011, 2013, and 2014 in my graduate-level statistics classes. I wrote the paper based on how students responded to the material. I found that the concept of a confounding factor is often new to students, which is not surprising, as most students don't have a good conceptual understanding of statistics. I therefore included a long background subsection in the Introduction. From a statistical perspective, the methods section on propensity score matching can be written in general terms. The results, in my mind, include not only the estimated effects but also the additional factors to be controlled. In other words, the derived statistical model and its interpretation should be part of the results section. One reviewer and the associate editor insisted that we move the description of the model to the methods section. They recommended rejection because of the "organization" problem. I explained in my response that the model is a result of the causal analysis process. By describing it as a result, I emphasize the process of finding the appropriate model (hence the model is part of the results). By presenting the model in the methods section, we give the impression that causal analysis is a simple application of yet another statistical procedure. The editor eventually decided on a compromise: we presented more in the methods section, and I added more to the results on the process of deriving the appropriate model. In my opinion, the usual structure of Introduction -- Methods -- Results -- Discussion (IMRD) is not always effective. My experience showed that the structure can be a hindrance to presenting the process of doing research.
In two recent papers I published in Environmental Science and Technology, I deviated from IMRD to include background information. When conducting research, we often have to change our approach or abandon our initial hypothesis. After all, research is a learning process. The IMRD structure encourages us to avoid discussing that process. Avoiding it can be hazardous when presenting statistical modeling.

Thursday, October 29, 2015

Results or Methods
